I have a problem that I don't understand, in this code:
ilProbekUcz = valuesUcz.Count; // valuesUcz is a List of float[]
for (int i = 0; i < ilWezlowDanych; i++)
    nodesValueArrayUcz[i] = new BitArray(ilProbekUcz);
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < ilProbekUcz; i++)
{
    int index = 0;
    linia = (float[])valuesUcz[i]; // removing this line does not solve the problem
    for (int a = 0; a < ileRazem; a++)
        for (int b = 0; b < ileRazem; b++)
            if (a != b)
            {
                bool value = linia[a] >= linia[b];
                nodesValueArrayUcz[index][i] = value;
                nodesValueArrayUcz[ilWezlowDanychP2 + index][i] = !value;
                index++;
            }
}
sw.Stop();
When I increase the size of valuesUcz 2x, the execution time is 4x longer.
When I increase the size of valuesUcz 4x, the execution time is 8x longer.
And so on...
(ileRazem and ilWezlowDanych stay the same.)
I understand that increasing ilProbekUcz increases the size of the BitArrays, but I have tested that many times and it is not the problem; the time should grow linearly. In this code:
ilProbekUcz = valuesUcz.Count; // valuesUcz is a List of float[]
for (int i = 0; i < ilWezlowDanych; i++)
    nodesValueArrayUcz[i] = new BitArray(ilProbekUcz);
BitArray test1 = nodesValueArrayUcz[10];
BitArray test2 = nodesValueArrayUcz[20];
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < ilProbekUcz; i++)
{
    int index = 0;
    linia = (float[])valuesUcz[i]; // removing this line does not solve the problem
    for (int a = 0; a < ileRazem; a++)
        for (int b = 0; b < ileRazem; b++)
            if (a != b)
            {
                bool value = linia[a] >= linia[b];
                test1[i] = value;
                test2[i] = !value;
                index++;
            }
}
Here the time grows linearly, so the problem is fetching a BitArray out of the array...
Is there any way to do this faster? (I want the time to grow linearly.)
You have to understand that when measuring time there are many factors that make the results inaccurate. The biggest factor when you have huge arrays, as in your example, is cache misses. The same code, rewritten with the cache in mind, can often be 2-5 or more times faster. Two words on how the cache works, very roughly: the cache is memory inside the CPU. It is far faster than RAM, so when you fetch a variable from memory you want it to be in the cache and not in RAM. If it is stored in the cache we say we have a hit, otherwise a miss. Sometimes, not so often, a program is so big that some of its memory is paged out to the hard drive; in that case you take a huge delay when you fetch those values. An example of how the cache works:
Let's say we have an array of 10 elements in memory (RAM). When you read the first element, testArray[0], the CPU brings that value into the cache along with a number of adjacent elements of the array (let's say 3; the number depends on the CPU), e.g. it caches testArray[0], testArray[1], testArray[2], testArray[3].
Now when you read testArray[1] it is in the cache, so we have a hit. The same with testArray[2] and testArray[3]. testArray[4] isn't in the cache, so the CPU fetches testArray[4] along with another three: testArray[5], testArray[6], testArray[7].
And so on...
Cache misses are very costly. You might expect an array of double the size to be traversed in double the time, but this is not true: bigger arrays mean more misses, and the time may increase 2 or 3 or 4 or more times over what you expect. This is normal. In your example that is what is happening: from 100 million elements (first array) you go to 400 million (second one), and the misses are not merely doubled but grow far more, as you saw. A very useful trick has to do with the order in which you access an array. In your example, ba1[j][i] = (j % 2) == 0; is much worse than ba1[i][j] = (j % 2) == 0;, and the same holds for ba2[j][i] versus ba2[i][j]. You can test it: just swap i and j. It comes down to how the 2D array is laid out in memory, so in the second form you get far more hits than in the first.
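To make the access-order effect concrete, here is a minimal self-contained sketch (my illustration, with made-up names and sizes, not code from the question); on most machines the second loop is noticeably slower:

using System;
using System.Diagnostics;

class CacheDemo
{
    static void Main()
    {
        const int n = 5000;
        bool[][] ba = new bool[n][];
        for (int i = 0; i < n; i++) ba[i] = new bool[n];

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ba[i][j] = (j % 2) == 0; // walks each row sequentially: mostly cache hits
        Console.WriteLine("row-major:    {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ba[j][i] = (j % 2) == 0; // jumps to a different row every step: mostly misses
        Console.WriteLine("column-major: {0} ms", sw.ElapsedMilliseconds);
    }
}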
As you can read from the title, I'm trying to store all the numbers between two numbers in an array.
For example, store the numbers between 21 and 43 (22, 23, 24, 25, 26, 27, 28, 29, ...) in an array.
This is the code; I don't know why, but it prints only the higher number minus one.
class Program
{
    static void Main(string[] args)
    {
        int higher = 43;
        int lower = 21;
        int[] numbers = new int[22]; // the numbers between 21 and 43 are 22
        for (int i = lower; i < higher; i++)
        {
            for (int a = 0; a < 22; a++)
            {
                numbers[a] = i;
            }
        }
        for (int c = 0; c < 22; c++)
        {
            Console.WriteLine(numbers[c]);
        }
        Console.ReadLine();
    }
}
This question will attract answers giving you a half dozen solutions you can cut and paste to do your assignment.
I note you did not ask a question in your question -- next time, please format your question in the form of a question. The right question to ask here is "how do I learn to spot mistakes in code I've written?", because that is the vital skill you lack. Answers that give you the code will not answer that question.
I already gave you a link to a recent answer where I explain that in detail, so study that.
In particular, in your case you have to read the program you wrote as though you had not written it. As though you were coming fresh to the program that someone else wrote and trying to figure out what it does.
The first thing I would do is look at the inner loop and say to myself "what does this do, in words?"
for (int a = 0; a < 22; a++)
{
    numbers[a] = i;
}
That is "put the value i in every slot of the array. Now look at the outer loop:
for (int i = lower; i < higher; i++)
{
    put the value i in every slot of the array
}
Now the technique to use here is to logically "unroll" the loop. A loop just does something multiple times, so write that out. It starts with lower and it goes to higher-1, so that loop does this:
put the value lower in every slot of the array
put the value lower+1 in every slot of the array
…
put the value higher-1 in every slot of the array
What does the third loop do?
print every item in the array
And now you know why it prints the highest number minus one multiple times. Because that's what the program does. We just reasoned it out.
Incidentally the answers posted so far are correct, but some are not the best.
You have a technique that you understand for "do something to every member of an array", and that is:
loop an indexer from 0 to the array size minus one
do something to the array slot at the indexer
But the solutions the other answers are proposing are the opposite:
loop an indexer from the lower to the higher value
compute an index
do something to the array slot at that index
It's important to understand that both are correct, but my feeling is that for the beginner you should stick with the pattern you know. How would we
loop an indexer from 0 to the array size minus one
do something to the array slot at the indexer
for your problem? Let's start with giving you a much better technique for looping the indexer:
for (int i = 0; i < numbers.Length; ++i)
That's a better technique because when you change the size of the array, you don't have to change the loop! And also you are guaranteed that every slot in the array is covered. Design your loops so that they are robust to changes and have good invariants.
Now you have to work out what the right loop body is:
{
    int number = i + lower;
    numbers[i] = number;
}
Now you know that your loop invariant is "when the loop is done, the array is full of consecutive numbers starting at lower".
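For reference, here are those pieces assembled into one runnable sketch (my assembly of the fragments above, keeping the question's array size of 22, so the array holds lower through higher - 1):

int lower = 21;
int[] numbers = new int[22];
for (int i = 0; i < numbers.Length; ++i)
{
    int number = i + lower;   // consecutive numbers starting at lower
    numbers[i] = number;
}
for (int c = 0; c < numbers.Length; c++)
{
    Console.WriteLine(numbers[c]); // prints 21 through 42
}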
Every time you loop through i, you put that number in every slot of the array. The inner loop is what is causing your issue. A better solution would be:
int higher = 43;
int lower = 21;
int[] numbers = new int[21];
int index = 0;
// if you want to store everything between 21 and 43,
// you need to start with 22, thus lower + 1
for (int i = lower + 1; i < higher; i++)
{
    numbers[index] = i;
    index++;
}
for (int c = 0; c < 21; c++)
{
    Console.WriteLine(numbers[c]);
}
Console.ReadLine();
Replace the index a with a direct translation of i:
for (int i = lower; i < higher; i++)
{
    numbers[i - lower] = i;
}
Use the below:
int higher = 43;
int lower = 21;
int[] numbers = new int[22];
for (int i = lower + 1; i < higher; i++)
{
    numbers[i - lower] = i; // fills indices 1 through 21 with 22 through 42
}
for (int c = 1; c < 22; c++)
{
    Console.WriteLine(numbers[c]);
}
Console.ReadLine();
I think higher and lower are variables, so the following will give you output for any higher and lower numbers:
class Program
{
    static void Main(string[] args)
    {
        int higher = 43;
        int lower = 21;
        int numDiff = higher - lower - 1; // 21 numbers lie strictly between 21 and 43
        int[] numbers = new int[numDiff];
        for (int i = 0; i < numbers.Length; i++)
        {
            numbers[i] = lower + i + 1;
        }
        for (int b = 0; b < numbers.Length; b++)
        {
            Console.WriteLine(numbers[b]);
        }
        Console.ReadLine();
    }
}
My problem is the following: I need to create some random values for scheduling. For example, process times are given. Let's say a job j on a machine i gets a random value between (1, 99); that's the time the job needs on this machine.
Now I need to manipulate these random values. I'd like to say that, of all the random process times, 20% of them are zero process times. Does anybody know how to give an array of integers a specific share of a specific value?
Here is the normal random:
p_machine_job_completionTime = new int[Constants.numberMachines][];
for (i = 0; i < Constants.numberMachines; i++)
{
    p_machine_job_completionTime[i] = new int[Constants.numberJobs];
    for (j = 0; j < Constants.numberJobs; j++)
    {
        p_machine_job_completionTime[i][j] = random.Next(1, 99);
    }
}
Now, jobs may skip a machine and consequently have a processing time of 0. Is it possible to limit the random values while guaranteeing that x% of all my random values have the value 0?
e.g.:
20% of p_machine_job_completionTime[i][j] = 0
80% of p_machine_job_completionTime[i][j] = random.Next(1, 99)
I am very thankful for any advice, however small.
Just separate the two cases: 20% where 0 should be returned, and 80% where 1..99 is the outcome:
Random random;
...
int value = random.Next(5) == 0 ? 0 : random.Next(99) + 1;
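If you want to convince yourself of the split, a quick sanity check (my illustration, not part of the answer) is to tally a large sample:

Random random = new Random();
int zeros = 0;
const int trials = 1000000;
for (int t = 0; t < trials; t++)
{
    int value = random.Next(5) == 0 ? 0 : random.Next(99) + 1;
    if (value == 0) zeros++;
}
Console.WriteLine("zero fraction: {0:P1}", (double)zeros / trials); // prints roughly 20.0 %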
One way to do this would be to generate two random values: one to determine whether to use 0, and (possibly) another to generate the non-zero value. However, you can combine both randoms into one by increasing the range of your random values by the appropriate amount and converting any result above your limit to 0:
int val = random.Next(1, 123);
if (val >= 99)
    val = 0;
In this case, your target range contains 98 possible values (1 to 98, since the upper bound is exclusive). To get 0 with roughly 20% probability, you need to extend the range of your random generator by a factor of 1 / (1 - 20%), i.e. to 125% of its present value: 98 × 1.25 = 122.5, which rounds to an exclusive upper bound of 123.
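The same trick wrapped in a hypothetical helper (my naming, not from the answer), parameterized by the zero probability p and the exclusive non-zero bound max:

// Returns 0 with roughly probability p, otherwise a uniform value in [1, max).
static int NextWithZeros(Random random, double p, int max)
{
    int nonZeroCount = max - 1;                               // e.g. 98 for max = 99
    int extended = (int)Math.Round(nonZeroCount / (1.0 - p)); // e.g. ~122 for p = 0.2
    int val = random.Next(1, extended + 1);
    return val >= max ? 0 : val;                              // map the overflow band to 0
}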
I think what the existing answers are missing is this important point:
"Is it possible to limit the random values, with guaranteeing that x% of all my random values has the value 0"
If you need to guarantee that, at the end of the day, exactly x% of the items are given a value of zero, then you can't use Random as in the answer from @Douglas. As @Douglas says, that gets 0 with 20% probability; but as stated in the question we don't want a 20% probability, we want EXACTLY 20%, with the other exactly 80% getting random values. I think the code below does what you want. I filled in some values for numberMachines and numberJobs so the code can be run.
int numberMachines = 5;
int numberJobs = 20;
Random random = new Random();
var p_machine_job_completionTime = new int[numberMachines][];
// Shuffle all cell indices and take exactly 20% of them.
var theTwentyPercent = new HashSet<int>(
    Enumerable.Range(0, numberMachines * numberJobs)
              .OrderBy(x => Guid.NewGuid())
              .Take(Convert.ToInt32((numberMachines * numberJobs) * 0.2)));
for (int i = 0; i < numberMachines; i++) {
    p_machine_job_completionTime[i] = new int[numberJobs];
    for (int j = 0; j < numberJobs; j++) {
        int index = (i * numberJobs) + j;
        if (theTwentyPercent.Contains(index)) {
            p_machine_job_completionTime[i][j] = 0;
        }
        else {
            p_machine_job_completionTime[i][j] = random.Next(1, 99);
        }
    }
}
Debug.Assert(p_machine_job_completionTime.SelectMany(x => x).Count(val => val == 0)
             == (numberMachines * numberJobs) * 0.2);
I think you do not need to involve Random for this at all in your case. You could simply use modulo as part of your for loop:
p_machine_job_completionTime = new int[Constants.numberMachines][];
int counter = 0;
for (i = 0; i < Constants.numberMachines; i++)
{
    p_machine_job_completionTime[i] = new int[Constants.numberJobs];
    for (j = 0; j < Constants.numberJobs; j++)
    {
        if (counter % 5 == 0)
        {
            p_machine_job_completionTime[i][j] = 0;
        }
        else
        {
            p_machine_job_completionTime[i][j] = random.Next(1, 99);
        }
        counter++;
    }
}
If I have a standard for loop, is there a more efficient way to omit certain occurrences?
For example:
A:
for (int i = 0; i < n; i++)
{
    if (i != n / 2 && i != n / 3 && i != n / 4)
    {
        val += DoWork(i);
    }
    else
    {
        continue;
    }
}
B:
for (int i = 0; i < n / 4; i++)
{
    val += DoWork(i);
}
for (int i = n / 4 + 1; i < n / 3; i++)
{
    val += DoWork(i);
}
for (int i = n / 3 + 1; i < n / 2; i++)
{
    val += DoWork(i);
}
for (int i = n / 2 + 1; i < n; i++)
{
    val += DoWork(i);
}
C:
for (int i = 0; i < n; i++)
{
    if (i == n / 2 || i == n / 3 || i == n / 4)
    {
        continue;
    }
    else
    {
        val += DoWork(i);
    }
}
For n = int.MaxValue the results were as follows:
A Results: 57498 milliseconds.
B Results: 42204 milliseconds.
C Results: 57643 milliseconds.
EDIT based on @Churk's answer.
I added another test case method D as follows:
D:
for (int i = 0; i < n; i = (i != n / 2 - 1 && i != n / 3 - 1 && i != n / 4 - 1) ? i + 1 : i + 2)
{
    val += DoWork(i);
}
and got the following results:
A Results: 56355 milliseconds.
B Results: 40612 milliseconds.
C Results: 56214 milliseconds.
D Results: 51810 milliseconds.
Edit 2: After pre-calculating the values of n/2, n/3, and n/4 outside the loop, I got:
A Results: 50873 milliseconds.
B Results: 39514 milliseconds.
C Results: 51167 milliseconds.
D Results: 42808 milliseconds.
The D loop seems to be close again to the "unrolling" of B, with A and C still taking significantly more time.
These results are about what I expected, as far as comparisons between the methods go.
My question is, is there a more efficient way of doing this?
It depends a bit on the context. One possibility in many scenarios is to cheat: instead of omitting the numbers, just include them, and afterwards subtract the contributions of the numbers you didn't want:
for (int i = 0; i < n; i++)
{
    val += DoWork(i);
}
val -= DoWork(n / 2); // undo the unwanted contributions
val -= DoWork(n / 3);
val -= DoWork(n / 4);
The time you save on comparisons might outweigh the cost of computing some values twice, depending on how expensive the DoWork operation is.
Embed your secondary condition into your for loop.
Untested:
for (int i = 0; i < n || (i != n / 2 && i != n / 3 && i != n / 4); i++)
{
    val += DoWork(i);
}
I think this will work: as long as one of those conditions is true, it will keep going.
First, just pause it a few times during that 40-60 seconds so you get a fair idea of what fraction of the time is spent in DoWork. Even if you made the looping take no time at all, you would still have to spend that part.
Now look at your comparisons.
Each one is asking a question.
If the answer to a question is almost always True or almost always False, it is an opportunity to get speedup.
(All the log(n) and n*log(n) algorithms work by making their decision points more like fair coins.)
So you can see why your B is faster. It is asking fewer questions per loop, on average.
You can make it even faster, by unrolling the loops (asking the questions even less often).
(I know, I know, compilers can unroll loops. Well, maybe. Do it yourself and you won't have to bet on it.)
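As an illustration of manual unrolling (my sketch, not code from the answer), here is a 4x unroll of the hot part; the skip indices would still be handled separately, e.g. by splitting into ranges as in option B:

int i = 0;
for (; i <= n - 4; i += 4) // one loop test per four calls
{
    val += DoWork(i);
    val += DoWork(i + 1);
    val += DoWork(i + 2);
    val += DoWork(i + 3);
}
for (; i < n; i++)         // remainder when n is not a multiple of 4
{
    val += DoWork(i);
}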
You could calculate the values outside of the loop and then just skip them:
int skip1 = n / 2;
int skip2 = n / 3;
int skip3 = n / 4;
for (int i = 0; i < n; i++)
{
    if (i != skip1 && i != skip2 && i != skip3)
    {
        val += DoWork(i);
    }
}
Not sure if this will save enough milliseconds to be worthwhile, though.
Update: I've just noticed that @surfen suggested this in the comments.
If DoWork() isn't the bottleneck, it is a small enough method to inline into the loop, so you won't need the call, which costs time by itself. If DoWork() does most of the work, though, you're wasting your time optimizing the loop :)
You can count the operations...
Option A:
n checks on i in the for loop, and then 3 checks on each i that isn't one of those values, so there are about 4n operations just for the checks.
Option B:
You're just looping through intervals, so you're doing n - 3 operations.
Option C:
Same as option A: about 4n operations.
I wouldn't bother complicating the code with such modifications (as many have already commented on your question).
Instead, you might want to run your DoWork simultaneously on multiple threads using Parallel.ForEach. This would have a much larger impact on performance, if your DoWork() does anything substantial.
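A sketch of what that might look like with Parallel.For and per-thread subtotals (my illustration, using n and DoWork from the question; it assumes DoWork is thread-safe and that summation order doesn't matter):

using System.Threading;
using System.Threading.Tasks;

long total = 0;
int skip1 = n / 2, skip2 = n / 3, skip3 = n / 4;
Parallel.For(0, n,
    () => 0L,                               // per-thread subtotal
    (i, state, subtotal) =>
    {
        if (i != skip1 && i != skip2 && i != skip3)
            subtotal += DoWork(i);
        return subtotal;
    },
    subtotal => Interlocked.Add(ref total, subtotal)); // merge each thread's result once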
Look at these loops:
const int arrayLength = ...
Version 0
public void RunTestFrom0()
{
    int sum = 0;
    for (int i = 0; i < arrayLength; i++)
        for (int j = 0; j < arrayLength; j++)
            for (int k = 0; k < arrayLength; k++)
                for (int l = 0; l < arrayLength; l++)
                    for (int m = 0; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Version 1
public void RunTestFrom1()
{
    int sum = 0;
    for (int i = 1; i < arrayLength; i++)
        for (int j = 1; j < arrayLength; j++)
            for (int k = 1; k < arrayLength; k++)
                for (int l = 1; l < arrayLength; l++)
                    for (int m = 1; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Version 2
public void RunTestFrom2()
{
    int sum = 0;
    for (int i = 2; i < arrayLength; i++)
        for (int j = 2; j < arrayLength; j++)
            for (int k = 2; k < arrayLength; k++)
                for (int l = 2; l < arrayLength; l++)
                    for (int m = 2; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Results for arrayLength = 50 are (averages from multiple samplings, compiled x64):
Version 0: 0.998s (Standard error of the mean 0.001s) total loops: 312500000
Version 1: 1.449s (Standard error of the mean 0.000s) total loops: 282475249
Version 2: 0.774s (Standard error of the mean 0.006s) total loops: 254803968
Version 3: 1.183s (Standard error of the mean 0.001s) total loops: 229345007
If we make arrayLength = 45, then:
Version 0: 0.495s (Standard error of the mean 0.003s) total loops: 184528125
Version 1: 0.527s (Standard error of the mean 0.001s) total loops: 164916224
Version 2: 0.752s (Standard error of the mean 0.001s) total loops: 147008443
Version 3: 0.356s (Standard error of the mean 0.000s) total loops: 130691232
Why:
Why is the loop starting from 0 faster than the loop starting from 1, even though it runs more iterations?
And why does the loop starting from 2 behave so strangely?
Update:
I did each run 10 times (that's where the standard error of the mean comes from).
I also switched the order of the version tests a couple of times; it made no big difference.
The length of myArray in each dimension equals arrayLength. I initialize it at the beginning, and that time is excluded from the measurement. Every value is 1, so sum gives the total loop count.
The compiled version is in Release mode, and I run it from outside VS (with VS closed).
Update 2:
Now I discard myArray completely, do sum++ instead, and added GC.Collect():
public void RunTestConstStartConstEnd()
{
    int sum = 0;
    for (int i = constStart; i < constEnd; i++)
        for (int j = constStart; j < constEnd; j++)
            for (int k = constStart; k < constEnd; k++)
                for (int l = constStart; l < constEnd; l++)
                    for (int m = constStart; m < constEnd; m++)
                    {
                        sum++;
                    }
}
Update
This appears to me to be a result of an unsuccessful attempt at optimization by the jitter, not the compiler. In short, if the jitter can determine the lower bound is a constant it will do something different which turns out to actually be slower. The basis for my conclusions takes some proving, so bear with me. Or go read something else if you're not interested!
I concluded this after testing four different ways to set the lower bound of the loop:
Hard coded in each level, as in colinfang's question
Use a local variable, assigned dynamically through a command line argument
Use a local variable but assign it a constant value
Use a local variable and assign it a constant value, but first pass the value through a goofy sausage-grinding identity function. This confuses the jitter enough to prevent it from applying its constant-value "optimization".
The compiled intermediate language for all four versions of the looping section is almost identical. The only difference is that in version 1 the lower bound is loaded with the command ldc.i4.#, where # is 0, 1, 2, or 3. That stands for load constant. (See ldc.i4 opcode). In all other versions, the lower bound is loaded with ldloc. This is true even in case 3, where the compiler could infer that lowerBound is really a constant.
The resulting performance is not constant. Version 1 (explicit constant) is slower than version 2 (run-time argument) along similar lines as found by the OP. What is very interesting is that version 3 is also slower, with comparable times to version 1. So even though the IL treats the lower bound as a variable, the jitter appears to have figured out that the value never changes, and substitutes a constant as in version 1, with the corresponding performance reduction. In version 4 the jitter can't infer what I know -- that Confuser is actually an identity function -- and so it leaves the variable as a variable. The resulting performance is the same as the command line argument version (2).
My theory on the cause of the performance difference: The jitter is aware and makes use of the fine details of actual processor architecture. When it decides to use a constant other than 0, it has to actually go fetch that literal value from some storage which is not in the L2 cache. When it is fetching a frequently used local variable it instead reads its value from the L2 cache, which is insanely fast. Normally it doesn't make sense to be taking up room in the precious cache with something as dumb as a known literal integer value. In this case we care more about read time than storage, though, so it has an undesired impact on performance.
Here is the full code for the version 2 (command line arg):
class Program {
    static void Main(string[] args) {
        List<double> testResults = new List<double>();
        Stopwatch sw = new Stopwatch();
        int upperBound = int.Parse(args[0]) + 1;
        int tests = int.Parse(args[1]);
        int lowerBound = int.Parse(args[2]); // THIS LINE CHANGES
        int sum = 0;
        for (int iTest = 0; iTest < tests; iTest++) {
            sum = 0;
            GC.Collect();
            sw.Reset();
            sw.Start();
            for (int lvl1 = lowerBound; lvl1 < upperBound; lvl1++)
                for (int lvl2 = lowerBound; lvl2 < upperBound; lvl2++)
                    for (int lvl3 = lowerBound; lvl3 < upperBound; lvl3++)
                        for (int lvl4 = lowerBound; lvl4 < upperBound; lvl4++)
                            for (int lvl5 = lowerBound; lvl5 < upperBound; lvl5++)
                                sum++;
            sw.Stop();
            testResults.Add(sw.Elapsed.TotalMilliseconds);
        }
        double avg = testResults.Average();
        double stdev = testResults.StdDev();
        string fmt = "{0,13} {1,13} {2,13} {3,13}";
        string bar = new string('-', 13);
        Console.WriteLine();
        Console.WriteLine(fmt, "Iterations", "Average (ms)", "Std Dev (ms)", "Per It. (ns)");
        Console.WriteLine(fmt, bar, bar, bar, bar);
        Console.WriteLine(fmt, sum, avg.ToString("F3"), stdev.ToString("F3"),
            ((avg * 1000000) / (double)sum).ToString("F3"));
    }
}
public static class Ext {
    public static double StdDev(this IEnumerable<double> vals) {
        double result = 0;
        int cnt = vals.Count();
        if (cnt > 1) {
            double avg = vals.Average();
            double sum = vals.Sum(d => Math.Pow(d - avg, 2));
            result = Math.Sqrt(sum / (cnt - 1));
        }
        return result;
    }
}
For version 1: same as above except remove lowerBound declaration and replace all lowerBound instances with literal 0, 1, 2, or 3 (compiled and executed separately).
For version 3: same as above except replace lowerBound declaration with
int lowerBound = 0; // or 1, 2, or 3
For version 4: same as above except replace lowerBound declaration with
int lowerBound = Ext.Confuser<int>(0); // or 1, 2, or 3
Where Confuser is:
public static T Confuser<T>(T d) {
    decimal d1 = (decimal)Convert.ChangeType(d, typeof(decimal));
    List<decimal> L = new List<decimal>() { d1, d1 };
    decimal d2 = L.Average();
    if (d1 - d2 < 0.1m) {
        return (T)Convert.ChangeType(d2, typeof(T));
    } else {
        // This will never actually happen :)
        return (T)Convert.ChangeType(0, typeof(T));
    }
}
Results (50 iterations of each test, in 5 batches of 10):
1: Lower bound hard-coded in all loops:
Program Iterations Average (ms) Std Dev (ms) Per It. (ns)
-------- ------------- ------------- ------------- -------------
Looper0 345025251 267.813 1.776 0.776
Looper1 312500000 344.596 0.597 1.103
Looper2 282475249 311.951 0.803 1.104
Looper3 254803968 282.710 2.042 1.109
2: Lower bound supplied at command line:
Program Iterations Average (ms) Std Dev (ms) Per It. (ns)
-------- ------------- ------------- ------------- -------------
Looper 345025251 269.317 0.853 0.781
Looper 312500000 244.946 1.434 0.784
Looper 282475249 222.029 0.919 0.786
Looper 254803968 201.238 1.158 0.790
3: Lower bound hard-coded but copied to local variable:
Program Iterations Average (ms) Std Dev (ms) Per It. (ns)
-------- ------------- ------------- ------------- -------------
LooperX0 345025251 267.496 1.055 0.775
LooperX1 312500000 345.614 1.633 1.106
LooperX2 282475249 311.868 0.441 1.104
LooperX3 254803968 281.983 0.681 1.107
4: Lower bound hard-coded but ground through Confuser:
Program Iterations Average (ms) Std Dev (ms) Per It. (ns)
-------- ------------- ------------- ------------- -------------
LooperZ0 345025251 266.203 0.489 0.772
LooperZ1 312500000 241.689 0.571 0.774
LooperZ2 282475249 219.533 1.205 0.777
LooperZ3 254803968 198.308 0.416 0.778
That is an enormous array. For all practical purposes you are testing how long it takes your operating system to fetch the value of each element from memory, not to compare whether j, k, etc. are less than arrayLength, to increment the counters, and to increment your sum. The latency to fetch those values has little to do with the runtime or jitter per se, and a lot to do with whatever else happens to be running on your system as a whole and the current compression and organization of the heap.
In addition, because your array is taking up so much room and being accessed frequently it's quite possible that garbage collection is running during some of your test iterations, which would completely inflate the apparent CPU time.
Try doing your test without the array lookup -- just add 1 (sum++) and then see what happens. To be even more thorough, call GC.Collect() just before each test to avoid a collection during the loop.
I think Version 0 is faster because the compiler generates special code without range checking in that case. See http://msdn.microsoft.com/library/ms973858.aspx (section "Range Check Elimination").
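For reference, a sketch of the pattern that article describes (my illustration; the comments reflect typical jitter behavior, not a guarantee): the jitter can drop the per-access bounds check when the loop bound is visibly the array's own Length.

int[] arr = new int[1000];
int sum = 0;

// Bounds check can be eliminated: the jitter sees i < arr.Length directly.
for (int i = 0; i < arr.Length; i++)
    sum += arr[i];

// Bounds check may remain: 'len' is just another variable to the jitter.
int len = arr.Length;
for (int i = 0; i < len; i++)
    sum += arr[i];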
Just an idea: maybe there are bit-shifting optimizations in the loops, so it takes longer when beginning at an odd count.
I also don't know whether your particular processor might account for the different results.
Off the top of my head, perhaps there's a compiler optimization for 0 -> length. Check your build settings (release vs debug).
Beyond that, unless it's just an issue of your computer doing other work that is influencing the benchmarks, I'm not sure. Perhaps you should alter your benchmark to run each test multiple times and average the results.
I have some large arrays of 2D data elements. A and B aren't equally sized dimensions:
A) is between 5 and 20
B) is between 1000 and 100000
The initialization time is no problem, as these are only going to be lookup tables for a realtime application, so lookup performance when indexing an element from known values of A and B is crucial. The data stored is currently a single byte value.
I was considering these solutions:
byte[A][B] datalist1a;
or
byte[B][A] datalist2a;
or
byte[A,B] datalist1b;
or
byte[B,A] datalist2b;
or perhaps losing the multiple dimensions, since I know the fixed sizes, and just multiplying the two values before looking it up:
byte[A*Bmax + B] datalist3;
or
byte[B*Amax + A] datalist4;
What I need to know is which datatype/array structure to use for the most efficient lookup in C# given this setup.
Edit 1
The first two solutions were meant to be multidimensional arrays, not jagged arrays.
Edit 2
All data in the smallest dimension is read at each lookup, but the large dimension is only used for indexing once at a time.
So it's something like: grab all A's from sample B.
I'd bet on the jagged arrays, unless Amax or Bmax is a power of 2.
I'd say so because a jagged array needs two indexed accesses, which is very fast. The other forms imply a multiplication, either implicit or explicit, and unless that multiplication is a simple shift, I think it could be a bit heavier than a couple of indexed accesses.
EDIT: Here is the small program used for the test:
class Program
{
    private static int A = 10;
    private static int B = 100;
    private static byte[] _linear;
    private static byte[,] _square;
    private static byte[][] _jagged;

    unsafe static void Main(string[] args)
    {
        // init arrays
        _linear = new byte[A * B];
        _square = new byte[A, B];
        _jagged = new byte[A][];
        for (int i = 0; i < A; i++)
            _jagged[i] = new byte[B];

        // set up the params
        var sw = new Stopwatch();
        byte b;
        const int N = 100000;

        // one-dim array (buffer)
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _linear[r * B + c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("linear={0}", sw.ElapsedMilliseconds);

        // two-dim array
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _square[r, c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("square={0}", sw.ElapsedMilliseconds);

        // jagged array
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                for (int c = 0; c < B; c++)
                {
                    b = _jagged[r][c];
                }
            }
        }
        sw.Stop();
        Console.WriteLine("jagged={0}", sw.ElapsedMilliseconds);

        // one-dim array with unsafe access (and context)
        sw.Restart();
        for (int i = 0; i < N; i++)
        {
            for (int r = 0; r < A; r++)
            {
                fixed (byte* offset = &_linear[r * B])
                {
                    for (int c = 0; c < B; c++)
                    {
                        b = *(offset + c);
                    }
                }
            }
        }
        sw.Stop();
        Console.WriteLine("unsafe={0}", sw.ElapsedMilliseconds);

        Console.Write("Press any key...");
        Console.ReadKey();
        Console.WriteLine();
    }
}
Multidimensional ([,]) arrays are nearly always the slowest, unless under a heavy random access scenario. In theory they shouldn't be, but it's one of the CLR oddities.
Jagged arrays ([][]) are nearly always faster than multidimensional arrays; even under random access scenarios. These have a memory overhead.
Singledimensional ([]) and algebraic arrays ([y * stride + x]) are the fastest for random access in safe code.
Unsafe code is, normally, fastest in all cases (provided you don't pin it repeatedly).
The only useful answer to "which X is faster" (for all X) is: you have to do performance tests that reflect your requirements.
And remember to consider, in general*:
Maintenance of the program. If this is not a quick one off, a slightly slower but maintainable program is a better option in most cases.
Micro benchmarks can be deceptive. For instance a tight loop just reading from a collection might be optimised away in ways not possible when real work is being done.
Additionally, consider that you need to look at the complete program to decide where to optimise. Speeding up a loop by 1% might be useful for that loop, but if it is only 1% of the complete runtime then it is not making much difference.
* But all rules have exceptions.
On most modern computers, arithmetic operations are far, far faster than memory lookups.
If you fetch a memory address that isn't in cache, or the out-of-order execution pulls from the wrong place, you are looking at 10-100 clocks; a pipelined multiply is 1 clock.
The other issue is cache locality.
byte[B * Amax + A] datalist4; seems like the best bet if you are accessing with A varying sequentially.
When datalist4[b * Amax + a] is accessed, the computer will usually start pulling in datalist4[b * Amax + a + 64 / sizeof(dataListType)], ... + 128 ... etc., or if it detects a reverse iteration, datalist4[b * Amax + a - 64 / sizeof(dataListType)].
Hope that helps!
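A minimal sketch of that layout (a hypothetical wrapper with names of my choosing; A varies fastest, so all A's for a given B sit in contiguous memory):

// Flat [B * Amax + A] layout: "grab all A's from sample B" walks sequential addresses.
class FlatTable
{
    private readonly byte[] _data;
    private readonly int _amax;

    public FlatTable(int amax, int bmax)
    {
        _amax = amax;
        _data = new byte[amax * bmax];
    }

    public byte Get(int a, int b)
    {
        return _data[b * _amax + a]; // one multiply + one index
    }

    public void CopyRow(int b, byte[] dest)
    {
        Array.Copy(_data, b * _amax, dest, 0, _amax); // all A's for sample B at once
    }
}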
Maybe the best way for you would be to use a HashMap.
Dictionary?
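For completeness, a sketch of what that suggestion would look like in C# (illustrative key values of my choosing). Note that each lookup involves hashing and probing, which is generally slower than direct array indexing for dense integer ranges like these:

using System.Collections.Generic;

var table = new Dictionary<(int a, int b), byte>();
table[(3, 1500)] = 42;      // store
byte v = table[(3, 1500)];  // lookup: hash + probe per access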