I've written a simple for loop that iterates through an array, and a Parallel.ForEach loop that does the same thing. However, the results I get are different, so I want to ask: what the heck is going on? :D
class Program
{
static void Main(string[] args)
{
long creating = 0;
long reading = 0;
long readingParallel = 0;
for (int j = 0; j < 10; j++)
{
Stopwatch timer1 = new Stopwatch();
Random rnd = new Random();
int[] array = new int[100000000];
timer1.Start();
for (int i = 0; i < 100000000; i++)
{
array[i] = rnd.Next(5);
}
timer1.Stop();
long result = 0;
Stopwatch timer2 = new Stopwatch();
timer2.Start();
for (int i = 0; i < 100000000; i++)
{
result += array[i];
}
timer2.Stop();
Stopwatch timer3 = new Stopwatch();
long result2 = 0;
timer3.Start();
Parallel.ForEach(array, (item) =>
{
result2 += item;
});
if (result != result2)
{
Console.WriteLine(result + " - " + result2);
}
timer3.Stop();
creating += timer1.ElapsedMilliseconds;
reading += timer2.ElapsedMilliseconds;
readingParallel += timer3.ElapsedMilliseconds;
}
Console.WriteLine("Create : \t" + creating / 100);
Console.WriteLine("Read: \t\t" + reading / 100);
Console.WriteLine("ReadP: \t\t" + readingParallel / 100);
Console.ReadKey();
}
}
So in the condition I get results:
result = 200009295;
result2 = 35163054;
Is there anything wrong?
The += operator is non-atomic and actually performs multiple operations:
load the value at the location that result is pointing to into a register
add array[i] to the loaded value (I'm simplifying here)
write the sum back to result
Since a lot of these add operations will be running in parallel it is not just possible, but likely that there will be races between some of these operations where one thread reads a result value and performs the addition, but before it has the chance to write it back, another thread grabs the old result value (which hasn't yet been updated) and also performs the addition. Then both threads write their respective values to result. Regardless of which one wins the race, you end up with a smaller number than expected.
This is why the Interlocked class exists.
Your code could very easily be fixed:
Parallel.ForEach(array, (item) =>
{
Interlocked.Add(ref result2, item);
});
Don't be surprised if Parallel.ForEach ends up slower than the fully synchronous version in this case though. This is due to the fact that
the amount of work inside the delegate you pass to Parallel.ForEach is very small
Interlocked methods incur a slight but non-negligible overhead, which will be quite noticeable in this particular case
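One way to reduce that overhead, offered here as a sketch and not as part of the original answer, is the Parallel.ForEach overload that takes localInit and localFinally delegates. Each worker adds into its own private subtotal, and Interlocked.Add is called only once per worker. The class and method names below are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelSumSketch
{
    // Sums the array using one private subtotal per worker.
    public static long Sum(int[] array)
    {
        long total = 0;
        Parallel.ForEach<int, long>(
            array,
            () => 0L,                                          // localInit: each worker starts at 0
            (item, loopState, subtotal) => subtotal + item,    // body: touches no shared state
            subtotal => Interlocked.Add(ref total, subtotal)); // localFinally: one atomic add per worker
        return total;
    }

    static void Main()
    {
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i % 5;
        Console.WriteLine(Sum(data));
    }
}
```

Whether this beats the plain for loop still depends on the machine and the array size, so it is worth measuring.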
I'm working an Euler problem with an outer and inner loop. The outer loop contains the value being checked, the inner loop controls how many test iterations pass, in this case looking for Lychrel numbers.
The outer loop works in parallel just fine, but the inner loop is extremely inconsistent. You can see from my commented out lines that I've tried a List<T> and used locking, as well as using ConcurrentQueue<T>. My initial implementation used a bool set to true (that the number IS a Lychrel number) which would get set to false if proven otherwise after n-iterations. It would then just count the number of collected Lychrel numbers. The bool operation wasn't working so well, jumping out of the inner loop (even with a lock). I even tried to implement a threadsafe boolean, but so far nothing has kept the inner loop consistent. At this point it's become a learning exercise. I'm generally familiar with threading, and use it fairly regularly even with collections, but this one stumps me as to the root cause of the problem.
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
BigInteger answer = 0;
List<long> lychrels = new List<long>();
ConcurrentQueue<long> CQlychrels = new ConcurrentQueue<long>();
long maxValue = 10000;
long maxIterations = 50;
sw.Start();
//for (int i = 1; i < maxValue; i++)
Parallel.For(1, maxValue, i =>
{
BigInteger workingValue = i;
//bool lychrel = true;
//for (int w = 1; w < maxIterations; w++)
Parallel.For(1, maxIterations, (w, loopstate) =>
{
workingValue = workingValue.LychrelAdd();
if (workingValue.ToString().Length > 1)
if (IsPalindrome(workingValue))
{
//lychrel = false;
CQlychrels.Enqueue(i);
//lock (lychrels)
//lychrels.Add(i);
loopstate.Break();
//break;
}
});
//if (!lychrel)
//lock (lychrels)
//lychrels.Add(i);
});
answer = maxValue - CQlychrels.Count();
sw.Stop();
Console.WriteLine("Answer: " + answer);
Console.WriteLine("Found in " + sw.ElapsedTicks + " ticks.");
Console.WriteLine("Found in " + sw.ElapsedMilliseconds + "ms.");
while (Console.ReadKey() == null)
{ }
Environment.Exit(0);
}
BigInteger.LychrelAdd() just takes the value and a mirror of its value and adds them together.
I suspect, perhaps, that either that or IsPalindrome() not being threadsafe may be the cause? Setting workingValue outside of that loop and working on it inside? Something to do with BigInteger being a reference value and that reference changing?
No matter what I use, the Thread class or the TPL task-based pattern, there is always an index out of bounds on the data.
From further research, I found that the value of the counter i can be 4, which should not even be possible.
What have I missed? I'm looking forward to your expert opinions!
Tested with Visual Studio 15.8 (2017) and 16.1 (2019), project targeting .NET Framework 4.7.2.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
// a multi-threading search demo, omit much code for simple and clear
// generate 0-99, total 100 elements with ascending order
List<int> testData = new List<int>();
for (int i = 0; i < 100; i++)
{
testData.Add(i);
}
List<int> searchFor = new List<int>() {
67, 0, 99,
23, 24, 25,
-1, 106
};
const int threadsCount = 4;
// Test switch
bool useThreadInsteadOfTaskTPL = true;
if (useThreadInsteadOfTaskTPL)
{
// search every piece of data
for (int j = 0; j < searchFor.Count; j++)
{
Thread[] threads = new Thread[threadsCount];
Console.WriteLine("Search for: {0}", searchFor[j]);
// trying to divide the data into 4 parts, and search in parallel
for (int i = 0; i < threadsCount; i++)
{
Thread thread = new Thread(() => {
// Capture the counters to make sure no lambda pitfall
int counterI = i;
int counterJ = j;
Console.WriteLine("i value: {0}", counterI);
Console.WriteLine("j value: {0}", counterJ);
// your code
});
threads[i] = thread;
threads[i].Start();
}
for (int i = 0; i < threads.Length; i++)
{
threads[i].Join();
}
Console.WriteLine();
}
}
else
{
for (int j = 0; j < searchFor.Count; j++)
{
Task[] tasks = new Task[threadsCount];
Console.WriteLine("Search for: {0}", searchFor[j]);
// trying to divide the data into 4 parts, and search in parallel
for (int i = 0; i < threadsCount; i++)
{
Task task = Task.Factory.StartNew(() => {
// Capture the counters to make sure no lambda pitfall
int counterI = i;
int counterJ = j;
Console.WriteLine("i value: {0}", counterI);
Console.WriteLine("j value: {0}", counterJ);
// your code
}, new CancellationTokenSource().Token,
TaskCreationOptions.None, TaskScheduler.Default);
tasks[i] = task;
}
Task.WaitAll(tasks);
Console.WriteLine();
}
}
Console.ReadKey();
}
}
}
The expected values of i are 0 through 3,
but the actual value of i may equal 4 or stay unchanged between iterations.
You should copy i and j at the start of each loop iteration (not inside the lambda):
for (int i = 0; i < threadsCount; i++)
{
// Copy the counters outside the lambda so each thread captures its own values
int counterI = i;
int counterJ = j;
Thread thread = new Thread(() =>
{
Console.WriteLine("i value: {0}", counterI);
Console.WriteLine("j value: {0}", counterJ);
// your code
});
threads[i] = thread;
threads[i].Start();
}
Your thread is scheduled for execution (it is not started immediately after Start() is called) and when it starts running the value of i (and j) can be already changed. (You can take a look at compiler generated code for this case and for yours).
And same for tasks - they are scheduled, not started immediately.
More details:
See this example (Action delegate is used instead of Thread) and generated code.
You can see the difference (the generated code creates an instance of a class which stores the value to print, and a method which actually prints it):
reassign inside delegate - the same instance is used for every iteration and the value is incremented after calling the delegate. With Action it works as expected, because it executes immediately (calling the method from the generated class to print the value), then the value in the generated class is incremented and a new iteration is started.
reassign outside delegate - an instance of the generated class is created for every iteration, so there is no increment. Every iteration has an independent instance and the next iteration can't change the value for the previous one.
With threads the only difference is that the thread is not started immediately; it is scheduled for execution, and this takes some time. In the first case, by the time the method that prints the value is called, the value may already have been incremented (because the same instance is shared by all iterations), and you get an unexpected result.
You can check this by running the application multiple times (for the first case): the printed values of i will differ between runs. Sometimes i has already been incremented when you don't expect it (because some time passed between calling Start() and the thread actually starting), and sometimes the values are correct (because the thread was scheduled and started almost immediately after Start(), before the increment occurred).
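The effect can be reproduced deterministically without threads by storing the delegates and invoking them only after the loop has finished, which mimics a thread that starts late. This is a minimal sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class ClosureCaptureSketch
{
    // Every lambda captures the single loop variable i, so all of them
    // observe its final value once the loop has ended.
    public static List<int> CaptureLoopVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 4; i++)
        {
            funcs.Add(() => i);
        }
        var seen = new List<int>();
        foreach (var f in funcs) seen.Add(f());
        return seen;
    }

    // Each iteration declares a fresh copy, so every lambda keeps its own value.
    public static List<int> CaptureCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 4; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        var seen = new List<int>();
        foreach (var f in funcs) seen.Add(f());
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", CaptureLoopVariable())); // 4, 4, 4, 4
        Console.WriteLine(string.Join(", ", CaptureCopy()));         // 0, 1, 2, 3
    }
}
```

The first method shows where the "impossible" value 4 comes from: it is the value of i after the loop condition has failed.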
I'm optimizing every line of code in my application, as performance is key. I'm testing all assumptions, as what I expect is not what I see in reality.
A strange occurrence to me is the performance of function calls. Below are two scenarios. Iterating an integer within the loop, and with a function in the loop. I expected the function call to be slower, however it is faster??
Can anyone explain this? I'm using .NET 4.7.1
Without function: 2808ms
With function 2295ms
UPDATE:
Switching the loops switches the runtime as well - I don't understand why, but will accept it as it is. Running the two different loops in different applications give similar results. I'll assume in the future that a function call won't create any additional overhead
public static int a = 0;
public static void Increment()
{
a = a + 1;
}
static void Main(string[] args)
{
//There were suggestions that the first for loop always runs faster. I have included a 'dummy' for loop here to warm up.
a = 0;
for (int i = 0;i < 1000;i++)
{
a = a + 1;
}
//Normal increment
Stopwatch sw = new Stopwatch();
sw.Start();
a = 0;
for (int i = 0; i < 900000000;i++)
{
a = a + 1;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
//Increment with function
Stopwatch sw2 = new Stopwatch();
sw2.Start();
a = 0;
for (int i = 0; i < 900000000; i++)
{
Increment();
}
sw2.Stop();
Console.WriteLine(sw2.ElapsedMilliseconds);
Console.ReadLine();
}
So I am looking at this question and the general consensus is that uint cast version is more efficient than range check with 0. Since the code is also in MS's implementation of List I assume it is a real optimization. However I have failed to produce a code sample that results in better performance for the uint version. I have tried different tests and there is something missing or some other part of my code is dwarfing the time for the checks. My last attempt looks like this:
class TestType
{
public TestType(int size)
{
MaxSize = size;
Random rand = new Random(100);
for (int i = 0; i < MaxIterations; i++)
{
indexes[i] = rand.Next(0, MaxSize);
}
}
public const int MaxIterations = 10000000;
private int MaxSize;
private int[] indexes = new int[MaxIterations];
public void Test()
{
var timer = new Stopwatch();
int inRange = 0;
int outOfRange = 0;
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if (x < 0 || x > MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
inRange = 0;
outOfRange = 0;
timer.Reset();
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if ((uint)x > (uint)MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
}
}
class Program
{
static void Main()
{
TestType t = new TestType(TestType.MaxIterations);
t.Test();
TestType t2 = new TestType(TestType.MaxIterations);
t2.Test();
TestType t3 = new TestType(TestType.MaxIterations);
t3.Test();
}
}
The code is a bit of a mess because I tried many things to make uint check perform faster like moving the compared variable into a field of a class, generating random index access and so on but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors and can someone demonstrate it somehow?
Note that I am not asking for someone to fix my sample or explain what is wrong with it. I just want to see the case where the optimization does work.
if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF), which lists the cost of instructions. Find your processor in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that could use an explanation. Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without an interdependency, they can all execute at the same time, in effect making the program 4 times faster. You don't always manage to keep all 4 of them busy at the same time with your code; it is actually pretty rare. But you do keep 2 of them busy in this case. Or in other words, two comparisons take just as long as a single one, 1 cycle.
There are other factors at play that make the execution time identical. One thing that helps is that the processor can predict the branch very well, so it can speculatively execute x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result since the branch is never taken.
And the true bottleneck in this code is the array indexing; accessing memory is one of the slowest things the processor can do. So the "fast" version of the code isn't faster even though it provides more opportunity to allow the processor to concurrently execute instructions. It isn't much of an opportunity today anyway, since a processor has too many execution units to keep busy. That is otherwise the feature that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000) {
outOfRange++;
}
else {
inRange++;
}
Using 5 compares, now I can see a difference, 61 vs 47 msec. Or in other words, this is a way to count the number of integer engines in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)
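For readers unfamiliar with the trick being discussed: casting a negative int to uint yields a value above int.MaxValue, so a single unsigned comparison covers both the lower and the upper bound. The sketch below, with hypothetical names and assuming size is never negative, prints both forms side by side so their agreement can be checked.

```csharp
using System;

static class RangeCheckSketch
{
    // Conventional form: two signed comparisons.
    public static bool IsOutOfRangeSigned(int index, int size)
    {
        return index < 0 || index >= size;
    }

    // Cast form: a negative index wraps to a huge unsigned value,
    // so one comparison is enough (assumes size >= 0).
    public static bool IsOutOfRangeUnsigned(int index, int size)
    {
        return unchecked((uint)index) >= unchecked((uint)size);
    }

    static void Main()
    {
        int[] samples = { int.MinValue, -1, 0, 1, 9, 10, 11, int.MaxValue };
        foreach (int index in samples)
        {
            Console.WriteLine("{0}: {1} {2}", index,
                IsOutOfRangeSigned(index, 10), IsOutOfRangeUnsigned(index, 10));
        }
    }
}
```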
I would suggest attempting code which does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your bench results.
The code below does a timed-average bench for 1,000 iterations of 1,000,000 results.
using System;
using System.Diagnostics;
namespace BenchTest
{
class Program
{
const int LoopCount = 1000000;
const int AverageCount = 1000;
static void Main(string[] args)
{
Console.WriteLine("Starting Benchmark");
RunTest();
Console.WriteLine("Finished Benchmark");
Console.Write("Press any key to exit...");
Console.ReadKey();
}
static void RunTest()
{
int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
long totalTime1 = 0; long totalTime2 = 0;
long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
for (int i = 0; i < AverageCount; i++)
{
Console.SetCursorPosition(cursorCol, cursorRow);
Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
}
PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
}
static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
{
Console.WriteLine(testName);
Console.WriteLine(" Average Time: {0}", averageTime);
Console.WriteLine(" Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
}
static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
{
Stopwatch sw = new Stopwatch();
Console.Write("Running {0} sub-iterations", LoopCount);
sw.Start();
for (int i = 0; i < LoopCount; i++)
{
invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
}
sw.Stop();
// Stopwatch.ElapsedTicks is counted in Stopwatch.Frequency units, not the 100 ns ticks
// that TimeSpan expects, so read Elapsed.Ticks instead.
long elapsedTickCount = sw.Elapsed.Ticks;
Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
return elapsedTickCount;
}
static int[] RandomFill(int size, int minValue, int maxValue)
{
int[] randomArray = new int[size];
Random rng = new Random();
for (int i = 0; i < size; i++)
{
randomArray[i] = rng.Next(minValue, maxValue);
}
return randomArray;
}
static int TestMethod1(int index, int size)
{
return (index < 0 || index >= size) ? 1 : 0;
}
static int TestMethod2(int index, int size)
{
return ((uint)(index) >= (uint)(size)) ? 1 : 0;
}
}
}
You aren't comparing like with like.
The code you were talking about not only saved one branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method 4 bytes can be the difference in being inlined and not being inlined.
And if the method calling that method is also written to be small, then that can mean two (or more) method calls are jitted as one piece of inline code.
And maybe some of it is then, because it is inline and available for analysis by the jitter, optimised further again.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that has repeated efforts to minimise the method body size and code that does not. Look for example at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does).
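The pattern described above can be sketched as follows. TinyList and ThrowIndexOutOfRange are hypothetical names chosen for illustration; this is not the framework's actual source. The indexer keeps its body small by using the single unsigned comparison and by delegating the throw to a separate method.

```csharp
using System;

sealed class TinyList
{
    private readonly int[] _items;
    private readonly int _size;

    public TinyList(int[] items)
    {
        _items = items;
        _size = items.Length;
    }

    public int this[int index]
    {
        get
        {
            // One unsigned comparison rejects negative and too-large indexes.
            if (unchecked((uint)index) >= (uint)_size)
            {
                ThrowIndexOutOfRange();
            }
            return _items[index];
        }
    }

    // The throw lives in its own method so the indexer body stays small.
    private static void ThrowIndexOutOfRange()
    {
        throw new ArgumentOutOfRangeException("index");
    }

    static void Main()
    {
        var list = new TinyList(new[] { 10, 20, 30 });
        Console.WriteLine(list[1]);
    }
}
```

Whether the JIT actually inlines the indexer depends on the runtime version, so treat the benefit as something to measure.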
I can't figure out a discrepancy between the time it takes for the Contains method to find an element in an ArrayList and the time it takes for a small function that I wrote to do the same thing. The documentation states that Contains performs a linear search, so it's supposed to be in O(n) and not any other faster method. However, while the exact values may not be relevant, the Contains method returns in 00:00:00.1087087 seconds while my function takes 00:00:00.1876165. It might not be much, but this difference becomes more evident when dealing with even larger arrays. What am I missing and how should I write my function to match Contains's performances?
I'm using C# on .NET 3.5.
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
for (int i = 0; i < list.Count; i++)
if (list[i].Equals(element)) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
sw.Start();
//Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed);
Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed);
}
}
EDIT:
Okay, now, lads, look:
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public bool DoesContain1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
long total = 0;
int nr = 100;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain(list,"zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain1(list, "zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
list.Contains("zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
}
}
I made an average of 100 running times for two versions of my function (forward and backward loop) and for the default Contains function. The times I've got are 136 and 133 milliseconds for my functions and a distant winner of 87 for the Contains version. Well now, if before you could argue that the data was scarce and I based my conclusions on a first, isolated run, what do you say about this test? Not only does Contains perform better on average, but it achieves consistently better results in each run. So, is there some kind of disadvantage in here for 3rd party functions, or what?
First, you're not running it many times and comparing averages.
Second, your method isn't being jitted until it actually runs. So the just in time compile time is added into its execution time.
A true test would run each multiple times and average the results (any number of things could cause one or the other to be slower for run X out of a total of Y), and your assemblies should be pre-jitted using ngen.exe.
As you're using .NET 3.5, why are you using ArrayList to start with, rather than List<string>?
A few things to try:
You could see whether using foreach instead of a for loop helps
You could cache the count:
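public bool DoesContain(ArrayList list, object element)
{
// Read Count once instead of on every iteration
int count = list.Count;
for (int i = 0; i < count; i++)
{
if (list[i].Equals(element))
{
return true;
}
}
// Only report "not found" after the whole list has been scanned
return false;
}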
You could reverse the comparison:
if (element.Equals(list[i]))
While I don't expect any of these to make a significant (positive) difference, they're the next things I'd try.
Do you need to do this containment test more than once? If so, you might want to build a HashSet<T> and use that repeatedly.
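A minimal sketch of that idea, assuming the elements are strings as in the question (the class and method names are hypothetical): the set is built once, and every later Contains call is a hash lookup instead of a linear scan.

```csharp
using System;
using System.Collections.Generic;

static class HashSetLookupSketch
{
    // Builds the set once; this costs O(n), but each lookup afterwards is O(1) on average.
    public static HashSet<string> BuildIndex(int count)
    {
        var set = new HashSet<string>();
        for (int i = 0; i < count; i++)
        {
            set.Add("zzz " + i);
        }
        return set;
    }

    static void Main()
    {
        HashSet<string> index = BuildIndex(100000);
        Console.WriteLine(index.Contains("zzz 90000"));
        Console.WriteLine(index.Contains("zzz"));
    }
}
```

Building the set is itself a full pass over the data, so this only pays off when the lookup is repeated.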
I'm not sure if you're allowed to post Reflector code, but if you open the method using Reflector, you can see that it's essentially the same (there are some optimizations for null values, but your test harness doesn't include nulls).
The only difference that I can see is that calling list[i] does bounds checking on i whereas the Contains method does not.
Using the code below I was able to get the following timings relatively consistently (within a few ms):
1: 190ms DoesContainRev
2: 198ms DoesContainRev1
3: 188ms DoesContainFwd
4: 203ms DoesContainFwd1
5: 199ms Contains
Several things to notice here.
This is run with release compiled code from the commandline. Many people make the mistake of benchmarking code inside the Visual Studio debugging environment, not to say anyone here did but something to be careful of.
The list[i].Equals(element) appears to be just a bit slower than element.Equals(list[i]).
using System;
using System.Diagnostics;
using System.Collections;
namespace ArrayListBenchmark
{
class Program
{
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
const int arrayCount = 10000000;
ArrayList list = new ArrayList(arrayCount);
for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
sw.Start();
DoesContainRev(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainRev1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
list.Contains("zzz");
sw.Stop();
Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
sw.Reset();
Console.ReadKey();
}
public static bool DoesContainRev(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainFwd(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainRev1(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (list[i].Equals(element)) return true;
return false;
}
public static bool DoesContainFwd1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (list[i].Equals(element)) return true;
return false;
}
}
}
With a really good optimizer there should be no difference at all, because the semantics seem to be the same. However, the existing optimizer may not optimize your function as well as the hard-coded Contains is optimized. Some points for optimization:
comparing to a property each time can be slower than counting downwards and comparing against 0
function call itself has its performance penalty
using iterators instead of explicit indexing can be faster (foreach loop instead of plain for)
First, if you are using types you know ahead of time, I'd suggest using generics, so List<T> instead of ArrayList. Under the hood, ArrayList.Contains actually does a bit more than what you are doing. The following is from Reflector:
public virtual bool Contains(object item)
{
if (item == null)
{
for (int j = 0; j < this._size; j++)
{
if (this._items[j] == null)
{
return true;
}
}
return false;
}
for (int i = 0; i < this._size; i++)
{
if ((this._items[i] != null) && this._items[i].Equals(item))
{
return true;
}
}
return false;
}
Notice that it forks itself on being passed a null value for item. However, since all the values in your example are not null, the additional check on null at the beginning and in the second loop should in theory take longer.
Are you positive you are dealing with fully compiled code? I.e., when your code runs the first time it gets JIT compiled, whereas the framework is obviously already compiled.
After your Edit, I copied the code and made a few improvements to it.
The difference was not reproducible; it turns out to be a measuring/rounding issue.
To see that, change your runs to this form:
sw.Reset();
sw.Start();
for (int i = 0; i < nr; i++)
{
DoesContain(list,"zzz");
}
total += sw.ElapsedMilliseconds;
Console.WriteLine(total / nr);
I just moved some lines. The JIT issue was insignificant with this number of repetitions.
My guess would be that ArrayList is written in C++ and could be taking advantage of some micro-optimizations (note: this is a guess).
For instance, in C++ you can use pointer arithmetic (specifically incrementing a pointer to iterate an array) to be faster than using an index.
Using an array structure, you can't search faster than O(n) without any additional information.
If you know that the array is sorted, then you can use the binary search algorithm and spend only O(log n).
Otherwise you should use a set.
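For the sorted case, a minimal sketch using List<string>.BinarySearch follows. The class and method names are hypothetical, and the list must be sorted with the same comparer that is passed to the search.

```csharp
using System;
using System.Collections.Generic;

static class SortedLookupSketch
{
    // BinarySearch returns a non-negative index when the item is found,
    // and a negative number otherwise.
    public static bool ContainsSorted(List<string> sorted, string item)
    {
        return sorted.BinarySearch(item, StringComparer.Ordinal) >= 0;
    }

    public static List<string> BuildSorted(int count)
    {
        var list = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add("zzz " + i);
        }
        list.Sort(StringComparer.Ordinal); // sort once, with the same comparer used for searching
        return list;
    }

    static void Main()
    {
        List<string> sorted = BuildSorted(100000);
        Console.WriteLine(ContainsSorted(sorted, "zzz 90000"));
        Console.WriteLine(ContainsSorted(sorted, "zzz"));
    }
}
```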
Revised after reading comments:
ArrayList does not use a hashing algorithm to enable fast lookup.
Use SortedList<TKey,TValue>, Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for fast access based on a key.
var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast
KeyedCollection<TKey, TValue> is also fast and allows indexed lookup, however, it needs to be inherited as it is abstract. Therefore, you need a specific collection. However, with the following you can create a generic KeyedCollection.
public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
this.keyExtractor = keyExtractor;
}
private Func<TValue, TKey> keyExtractor;
protected override TKey GetKeyForItem(TValue value) {
return this.keyExtractor(value);
}
}
The advantage of using the KeyedCollection is that the Add method does not require that a key is specified.