Iterating a dictionary in C#

var dict = new Dictionary<int, string>();
for (int i = 0; i < 200000; i++)
    dict[i] = "test " + i;
I iterated this dictionary using the code below:
foreach (var pair in dict)
    Console.WriteLine(pair.Value);
Then, I iterated it using this:
foreach (var key in dict.Keys)
    Console.WriteLine(dict[key]);
And the second iteration took about 3 seconds less.
I can get both keys and values with either method. What I want to know is whether the second approach has any drawback. Since the most upvoted question I could find about this doesn't include this way of iterating a dictionary, I wanted to know why nobody uses it and why it runs faster.

Your timing tests have some fundamental flaws:
Console.WriteLine is an I/O operation, which takes orders of magnitude more time than memory accesses and CPU calculations. Any difference in iteration time is being dwarfed by the cost of that call; it's like trying to weigh a penny by weighing the cast-iron stove it is sitting on.
You don't mention how long the overall operation took, so saying that one took 3 seconds less than the other is meaningless. If the first took 300 seconds and the second took 303, you are micro-optimizing.
You don't mention how you measured running time. Did it include the time to load and bootstrap the program assembly?
You don't mention repeatability: did you run these operations several times? Several hundred times? In different orders?
Here are my tests. Note how I try my best to ensure that the method of iteration is the only thing that changes, and I include a control to see how much of the time is taken up purely because of a for loop and assignment:
void Main()
{
    // Insert code here to set up your test: anything that you don't want to include as
    // part of the timed tests.
    var dict = new Dictionary<int, string>();
    for (int i = 0; i < 2000; i++)
        dict[i] = "test " + i;
    string s = null;

    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            for (int i = 0; i < 2000; i++)
                s = "hi";
        }),
        new TimedAction("first", () =>
        {
            foreach (var pair in dict)
                s = pair.Value;
        }),
        new TimedAction("second", () =>
        {
            foreach (var key in dict.Keys)
                s = dict[key];
        })
    };

    TimeActions(100, // change this number as desired.
        actions);
}
#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    foreach (var action in actions)
    {
        var milliseconds = s.Time(action.Action, iterations);
        Console.WriteLine("{0}: {1}ms", action.Message, milliseconds);
    }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }

    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion
Result
control: 1.2173ms
first: 9.0233ms
second: 18.1301ms
So in these tests, using the indexer takes roughly twice as long as iterating key-value pairs, which is what I would expect*. This stays roughly proportionate if I increase the number of entries and the number of repetitions by an order of magnitude, and I get the same results if I run the two tests in reverse order.
* Why would I expect this result? The Dictionary class probably represents its entries as KeyValuePairs internally, so all it really has to do when you iterate it directly is walk through its data structure once, handing the caller each entry as it comes to it. If you iterate just the Keys, it still has to visit each KeyValuePair to give you the value of its Key property, so that step alone costs roughly the same as iterating the whole dictionary in the first place. Then you have to call the indexer, which has to calculate a hash for the provided key, jump to the correct hashtable bucket, and do an equality check on the keys of any KeyValuePairs it finds there. These operations aren't terribly expensive, but once you do them N times, it's roughly as expensive as if you'd iterated over the internal hashtable structure again.
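If all you need is to read both keys and values, iterating the pairs directly is the idiomatic route. On runtimes where KeyValuePair<TKey, TValue>.Deconstruct is available (.NET Core 2.0 and later; an assumption about your target), you can also deconstruct each pair. A minimal sketch:

using System;
using System.Collections.Generic;

class DeconstructExample
{
    static void Main()
    {
        var dict = new Dictionary<int, string> { [1] = "one", [2] = "two" };

        // One pass over the internal entries; no per-key hash lookup.
        foreach (var (key, value) in dict)
            Console.WriteLine("{0}: {1}", key, value);
    }
}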


Cached arrays for subcomputation results in Parallel.For/Foreach

I have a method that calculates some arrays in each iteration, then sums them (adding each array index-wise) and returns the array of sums.
I'm looking for a performant parallel method of doing so. In essence I want to avoid creating arrays constantly and reuse them instead.
Example:
private void Computation(float[] result)
{
    var _lock = new object();
    Parallel.For(0, 50, x =>
    {
        var subResult = new float[200 * 1000];
        CalculateSubResult(subResult);
        lock (_lock)
            for (int i = 0; i < subResult.Length; i++)
                result[i] += subResult[i];
    });
}
The problem is that these Parallel.For loops are nested (CalculateSubResult runs a similar loop internally). This leads to constantly creating and discarding the arrays used to hold subcomputation results.
To get around this, I'm using ThreadLocal to cache the arrays:
ThreadLocal<float[]> threadLocalArray = new ThreadLocal<float[]>(() => new float[200 * 1000]);

private void ComputationWithThreadLocal(float[] result)
{
    var _lock = new object();
    Parallel.For(0, 50, x =>
    {
        var subResult = threadLocalArray.Value;
        CalculateSubResult(subResult);
        lock (_lock)
            for (int i = 0; i < subResult.Length; i++)
                result[i] += subResult[i];
        Array.Clear(subResult, 0, subResult.Length);
    });
}
There are two problems with this approach:
Over time (and this is part of a long-running computation) threadLocalArray accumulates more and more arrays, even though there are only a few logical cores, which amounts to a memory leak. I cannot call threadLocalArray.Dispose() until the computation is finished, so there is no way to stop it from growing.
Summing the subcomputation results requires a lock and is wasteful. I would like a local per-thread sum, with the summation into the shared result happening only after a thread finishes its partition of the work.
There is an overload that deals with point 2:
public static ParallelLoopResult For<TLocal>(
    int fromInclusive,
    int toExclusive,
    Func<TLocal> localInit,
    Func<int, ParallelLoopState, TLocal, TLocal> body,
    Action<TLocal> localFinally
)
The Func<TLocal> localInit would just initialize a subResult array, e.g. () => new float[200 * 1000]. In that case the Action<TLocal> localFinally executes only once per partition, which is more efficient. But the problem is that localInit cannot use () => threadLocalArray.Value to fetch a thread-local array, because if the Parallel.For loops are recursively nested, the same array gets used for subcomputations at different nesting levels, leading to incorrect results.
I think I'm looking for something like this (CachedArrayProvider is a class that would manage array reuse):
Parallel.For(0, 50,
    () => CachedArrayProvider.GetArray(),
    (i, state, localArray) =>
    {
        /* body */
        return localArray;
    },
    localArray =>
    {
        // do the final sum here
        CachedArrayProvider.ReturnArray(localArray);
    }
);
Any ideas?
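One possible direction, rather than hand-writing CachedArrayProvider: System.Buffers.ArrayPool<T>. Renting in localInit and returning in localFinally keeps the number of live arrays bounded by the actual degree of parallelism, and it stays correct under nested loops because every Rent hands out a distinct array. A minimal sketch, assuming (as the question's code suggests) that CalculateSubResult accumulates into the array it is given:

using System;
using System.Buffers;
using System.Threading.Tasks;

static class PooledComputation
{
    // Hypothetical stand-in for the question's method; assumed to
    // accumulate its contribution into the array it is given.
    static void CalculateSubResult(float[] subResult) { /* ... */ }

    static void ComputationWithPool(float[] result)
    {
        var gate = new object();
        Parallel.For(0, 50,
            // localInit: rent one scratch array per partition, not per iteration.
            () =>
            {
                float[] local = ArrayPool<float>.Shared.Rent(result.Length);
                Array.Clear(local, 0, result.Length); // rented arrays can hold stale data
                return local;
            },
            // body: accumulate into the partition-local array; no lock needed here.
            (i, state, local) =>
            {
                CalculateSubResult(local);
                return local;
            },
            // localFinally: fold this partition's sums into the shared result once,
            // then hand the array back to the pool.
            local =>
            {
                lock (gate)
                    for (int j = 0; j < result.Length; j++)
                        result[j] += local[j];
                ArrayPool<float>.Shared.Return(local);
            });
    }
}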

What is the most efficient loop in C#?

There are a number of different ways to accomplish the same simple loop through the items of an object in C#.
This has made me wonder if there is any reason, be it performance or ease of use, to use one over the others, or if it just comes down to personal preference.
Take a simple object:
var myList = new List<MyObject>();
Let's assume the list is filled and we want to iterate over its items.
Method 1
foreach (var item in myList)
{
    // Do stuff
}
Method 2
myList.ForEach(ml =>
{
    // Do stuff
});
Method 3
var enumerator = myList.GetEnumerator();
while (enumerator.MoveNext())
{
    // Do stuff
}
Method 4
for (int i = 0; i < myList.Count; i++)
{
    // Do stuff
}
What I was wondering is: do each of these compile down to the same thing? Is there a clear performance advantage to using one over the others,
or is it just down to personal preference when coding?
Have I missed any?
The answer, the majority of the time, is that it does not matter. The number of items in the loop (even what one might consider a "large" number of items, say in the thousands) isn't going to have a meaningful impact.
Of course, if you identify this as a bottleneck in your situation, by all means, address it, but you have to identify the bottleneck first.
That said, there are a number of things to take into consideration with each approach, which I'll outline here.
Let's define a few things first:
All of the tests were run on .NET 4.0 on a 32-bit processor.
TimeSpan.TicksPerSecond on my machine = 10,000,000
All tests were performed in separate unit test sessions, not in the same one (so as not to possibly interfere with garbage collections, etc.)
Here's some helpers that are needed for each test:
The MyObject class:
public class MyObject
{
    public int IntValue { get; set; }
    public double DoubleValue { get; set; }
}
A method to create a List<T> of any length of MyClass instances:
public static List<MyObject> CreateList(int items)
{
    // Validate parameters.
    if (items < 0)
        throw new ArgumentOutOfRangeException("items", items,
            "The items parameter must be a non-negative value.");

    // Return the items in a list.
    return Enumerable.Range(0, items).
        Select(i => new MyObject { IntValue = i, DoubleValue = i }).
        ToList();
}
An action to perform for each item in the list (needed because Method 2 uses a delegate, and a call needs to be made to something to measure impact):
public static void MyObjectAction(MyObject obj, TextWriter writer)
{
    // Validate parameters.
    Debug.Assert(obj != null);
    Debug.Assert(writer != null);

    // Write.
    writer.WriteLine("MyObject.IntValue: {0}, MyObject.DoubleValue: {1}",
        obj.IntValue, obj.DoubleValue);
}
A method to create a TextWriter which writes to a null Stream (basically a data sink):
public static TextWriter CreateNullTextWriter()
{
    // Create a stream writer off a null stream.
    return new StreamWriter(Stream.Null);
}
And let's fix the number of items at one million (1,000,000), which should be sufficiently high to show that, in general, these all have about the same performance impact:
// The number of items to test.
public const int ItemsToTest = 1000000;
Let's get into the methods:
Method 1: foreach
The following code:
foreach (var item in myList)
{
    // Do stuff
}
Compiles down into the following:
using (var enumerator = myList.GetEnumerator())
    while (enumerator.MoveNext())
    {
        var item = enumerator.Current;
        // Do stuff.
    }
There's quite a bit going on there. You have the method calls (which may or may not be against the IEnumerator<T> or IEnumerator interfaces, as the compiler respects duck typing in this case), and your // Do stuff is hoisted into that while structure.
Here's the test to measure the performance:
[TestMethod]
public void TestForEachKeyword()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        foreach (var item in list)
        {
            // Write the values.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Foreach loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Foreach loop ticks: 3210872841
Method 2: .ForEach method on List<T>
The code for the .ForEach method on List<T> looks something like this:
public void ForEach(Action<T> action)
{
    // Error handling omitted.

    // Cycle through the items, perform action.
    for (int index = 0; index < Count; ++index)
    {
        // Perform action.
        action(this[index]);
    }
}
Note that this is functionally equivalent to Method 4, with one exception: the code that is hoisted into the for loop is passed as a delegate. This requires a dereference to get at the code that actually needs to be executed. While delegate performance has improved from .NET 3.0 on, that overhead is there.
However, it's negligible. The test to measure the performance:
[TestMethod]
public void TestForEachMethod()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        list.ForEach(i => MyObjectAction(i, writer));

        // Write out the number of ticks.
        Debug.WriteLine("ForEach method ticks: {0}", s.ElapsedTicks);
    }
}
The output:
ForEach method ticks: 3135132204
That's actually ~7.5 seconds faster than using the foreach loop. Not completely surprising, given that it uses direct array access instead of going through IEnumerable<T>.
Remember, though, that this translates to 0.0000075740637 seconds saved per item. That's not worth it for small lists of items.
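(Derivation: 3,210,872,841 − 3,135,132,204 = 75,740,637 ticks; at TimeSpan.TicksPerSecond = 10,000,000 that is ≈ 7.57 seconds over the whole run, which spread across the 1,000,000 items gives the per-item figure above.)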
Method 3: while (enumerator.MoveNext())
As shown in Method 1, this is exactly what the compiler does (with the addition of the using statement, which is good practice). You're not gaining anything here by unrolling by hand the code the compiler would otherwise generate.
For kicks, let's do it anyways:
[TestMethod]
public void TestEnumerator()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    // Get the enumerator.
    using (IEnumerator<MyObject> enumerator = list.GetEnumerator())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        while (enumerator.MoveNext())
        {
            // Write.
            MyObjectAction(enumerator.Current, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Enumerator loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Enumerator loop ticks: 3241289895
Method 4: for
In this particular case, you're going to gain some speed, as the list indexer goes directly to the underlying array to perform the lookup (that's an implementation detail, by the way; there's nothing to say a tree structure couldn't back List<T> instead).
[TestMethod]
public void TestListIndexer()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < list.Count; ++i)
        {
            // Get the item.
            MyObject item = list[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("List indexer loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
List indexer loop ticks: 3039649305
However, the place where this can make a difference is arrays. Arrays can be unrolled by the compiler to process multiple items at a time.
Instead of doing ten iterations of one item in a ten-item loop, the compiler can unroll this into five iterations of two items.
However, I'm not positive this is actually happening here (I'd have to look at the IL and the output of the compiled IL).
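Separately, a better-documented array optimization is bounds-check elimination. A hedged sketch of the canonical pattern the JIT recognizes:

int[] data = new int[10000];
long total = 0;
// Comparing i directly against data.Length lets the JIT prove that
// data[i] is always in range, so the per-access range check is removed.
for (int i = 0; i < data.Length; i++)
{
    total += data[i];
}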
Here's the test:
[TestMethod]
public void TestArray()
{
    // Create the array.
    MyObject[] array = CreateList(ItemsToTest).ToArray();

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < array.Length; ++i)
        {
            // Get the item.
            MyObject item = array[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Array loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Array loop ticks: 3102911316
It should be noted that, out of the box, ReSharper offers a refactoring suggestion to change the above for statements to foreach statements. That's not to say this is right, but the aim is to reduce the amount of technical debt in the code.
TL;DR
You really shouldn't be concerned with the performance of these things, unless testing in your situation shows that you have a real bottleneck (and you'll have to have massive numbers of items to have an impact).
Generally, you should go for what's most maintainable, in which case, Method 1 (foreach) is the way to go.
Regarding the final bit of the question, "Have I missed any?": yes, and I feel I would be remiss not to mention it even though the question is quite old. While those four ways of doing it will execute in relatively the same amount of time, there is a way not shown above that runs faster than all of them, and quite significantly so as the number of items in the iterated list increases. It is exactly the same as the last method, but instead of reading .Count in the loop's condition check, you assign this value to a variable before setting up the loop and use that instead, which leaves you with something like this:
var countVar = list.Count;
for (int i = 0; i < countVar; i++)
{
    // loop logic
}
By doing it this way, you're only reading a local variable at each iteration rather than resolving the Count or Length property over and over, which is considerably cheaper. (For arrays, note the caveat that comparing directly against array.Length is the very pattern the JIT recognizes for bounds-check elimination, so measure before hoisting it.)
I would suggest an even better and not well-known approach for faster loop iteration over a list. First read about Span<T>; note that you can use it if you are targeting .NET Core.
using System.Runtime.InteropServices; // for CollectionsMarshal

List<MyObject> list = new();
foreach (MyObject item in CollectionsMarshal.AsSpan(list))
{
    // Do something
}
Be aware of the caveats:
CollectionsMarshal.AsSpan is unsafe and should be used only if you know what you're doing. It returns a Span<T> over the private backing array of the List<T>. Iterating over a Span<T> is fast because the JIT uses the same tricks as for optimizing arrays. With this approach, nothing checks whether the list is modified during the enumeration.

Dictionary.Count performance

This question seems to be nonsense. The behaviour cannot be reproduced reliably.
Comparing the following test programs, I observed a huge performance difference between the first and the second of the following examples (the first example is slower than the second by a factor of ten):
First example (slow):
interface IWrappedDict {
    int Number { get; }
    void AddSomething (string k, string v);
}

class WrappedDict : IWrappedDict {
    private Dictionary<string, string> dict = new Dictionary<string, string> ();
    public void AddSomething (string k, string v) {
        dict.Add (k, v);
    }
    public int Number { get { return dict.Count; } }
}
class TestClass {
    private IWrappedDict wrappedDict;
    public TestClass (IWrappedDict theWrappedDict) {
        wrappedDict = theWrappedDict;
    }
    public void DoSomething () {
        // this function does the performance test
        for (int i = 0; i < 1000000; ++i) {
            var c = wrappedDict.Number; wrappedDict.AddSomething (...);
        }
    }
}
Second example (fast):
// IWrappedDict as above
class WrappedDict : IWrappedDict {
    private Dictionary<string, string> dict = new Dictionary<string, string> ();
    private int c = 0;
    public void AddSomething (string k, string v) {
        dict.Add (k, v); ++c;
    }
    public int Number { get { return c; } }
}
// rest as above
Funnily, the difference vanishes (the first example gets fast as well) if I change the type of the member variable TestClass.wrappedDict from IWrappedDict to WrappedDict. My interpretation is that Dictionary.Count re-counts the elements every time it is accessed and that any caching of the element count is done by compiler optimization only.
Can anybody confirm this? Is there any way to get the number of elements in a Dictionary in a performant way?
No, Dictionary.Count does not recount the elements every time it's used. The dictionary maintains a count, and should be as fast as your second version.
I suspect that in your test of the second example, you already had WrappedDict instead of IWrappedDict, and this is actually about interface member access (which is always virtual) and the JIT compiling inlining calls to the property when it knows the concrete type.
If you still believe Count is the problem, you should be able to edit your question to show a short but complete program which demonstrates both the fast and slow versions, including how you're timing it all.
Sounds like your timing is off; I get:
#1: 330ms
#2: 335ms
when running the following in release mode, outside of the IDE:
public void DoSomething(int count) {
    // this function does the performance test
    for (int i = 0; i < count; ++i) {
        var c = wrappedDict.Number; wrappedDict.AddSomething(i.ToString(), "a");
    }
}

static void Execute(int count, bool show)
{
    var obj1 = new TestClass(new WrappedDict1());
    var obj2 = new TestClass(new WrappedDict2());

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    GC.WaitForPendingFinalizers();
    var watch = Stopwatch.StartNew();
    obj1.DoSomething(count);
    watch.Stop();
    if (show) Console.WriteLine("#1: {0}ms", watch.ElapsedMilliseconds);

    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    GC.WaitForPendingFinalizers();
    watch = Stopwatch.StartNew();
    obj2.DoSomething(count);
    watch.Stop();
    if (show) Console.WriteLine("#2: {0}ms", watch.ElapsedMilliseconds);
}

static void Main()
{
    Execute(1, false); // for JIT
    Execute(1000000, true); // for measuring
}
Basically: "cannot reproduce". Also: for completeness, no: .Count does not count all the items (it already knows the count), nor does the compiler add any magic automatic caching code (note: there are a few limited examples of things like that; for example, the JIT can remove bounds-checking on a for loop over a vector).
No, a dictionary or hashtable never iterates the entries to determine the length.
It will (or should) always keep track of the number of entries.
Thus, the time complexity is O(1).
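An illustrative sketch, not the actual BCL implementation, of what "keeping track" means: the count is maintained at mutation time, so reading Count is just a field read.

using System.Collections.Generic;

// Hypothetical miniature dictionary, for illustration only.
class TinyDict<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>> entries =
        new List<KeyValuePair<TKey, TValue>>();
    private int count;

    // O(1): returns the stored field; nothing is scanned or recounted.
    public int Count { get { return count; } }

    public void Add(TKey key, TValue value)
    {
        entries.Add(new KeyValuePair<TKey, TValue>(key, value));
        ++count; // bookkeeping happens when the collection is mutated
    }
}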

ContainsKey vs. try/catch

I have a generated list of Vector2s that I have to check against a dictionary to see if they exist, and this function gets executed every tick.
Which would run fastest / be better: doing it this way?
public static bool exists(Vector2 Position, Dictionary<Vector2, object> ToCheck)
{
    try
    {
        object Test = ToCheck[Position];
        return true;
    }
    catch
    {
        return false;
    }
}
Or should I stick with the norm?
public static bool exists(Vector2 Position, Dictionary<Vector2, object> ToCheck)
{
    if (ToCheck.ContainsKey(Position))
    {
        return true;
    }
    return false;
}
Thanks for the input :)
Side Note: (The Value for the key doesn't matter at this point or i would use TryGetValue instead of ContainsKey)
I know it's an old question, but just to add a bit of empirical data...
Running 50,000,000 look-ups on a dictionary with 10,000 entries and comparing relative times to complete:
..if every look-up is successful:
a straight (unchecked) run takes 1.2 seconds
a guarded (ContainsKey) run takes 2 seconds
a handled (try-catch) run takes 1.21 seconds
..if 1 out of every 10,000 look-ups fail:
a guarded (ContainsKey) run takes 2 seconds
a handled (try-catch) run takes 1.37 seconds
..if 16 out of every 10,000 look-ups fail:
a guarded (ContainsKey) run takes 2 seconds
a handled (try-catch) run takes 3.27 seconds
..if 250 out of every 10,000 look-ups fail:
a guarded (ContainsKey) run takes 2 seconds
a handled (try-catch) run takes 32 seconds
..so a guarded test adds a constant overhead and nothing more, while a try-catch test runs almost as fast as no test at all when it never fails, but kills performance in proportion to the number of failures.
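Working those numbers through: at 250 failures per 10,000, the 50,000,000 look-ups include 1,250,000 failures, so (32 s − 1.21 s) / 1,250,000 ≈ 25 microseconds per thrown-and-caught exception, while the guard costs (2 s − 1.2 s) / 50,000,000 ≈ 16 nanoseconds per look-up.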
Code I used to run tests:
using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Test(0);
            Test(1);
            Test(16);
            Test(250);
        }

        private static void Test(int failsPerSet)
        {
            Dictionary<int, bool> items = new Dictionary<int, bool>();
            for (int i = 0; i < 10000; i++)
                if (i >= failsPerSet)
                    items[i] = true;

            if (failsPerSet == 0)
                RawLookup(items, failsPerSet);
            GuardedLookup(items, failsPerSet);
            CaughtLookup(items, failsPerSet);
        }

        private static void RawLookup(Dictionary<int, bool> items, int failsPerSet)
        {
            int found = 0;
            DateTime start;
            Console.Write("Raw (");
            Console.Write(failsPerSet);
            Console.Write("): ");
            start = DateTime.Now;
            for (int i = 0; i < 50000000; i++)
            {
                int pick = i % 10000;
                if (items[pick])
                    found++;
            }
            Console.WriteLine(DateTime.Now - start);
        }

        private static void GuardedLookup(Dictionary<int, bool> items, int failsPerSet)
        {
            int found = 0;
            DateTime start;
            Console.Write("Guarded (");
            Console.Write(failsPerSet);
            Console.Write("): ");
            start = DateTime.Now;
            for (int i = 0; i < 50000000; i++)
            {
                int pick = i % 10000;
                if (items.ContainsKey(pick))
                    if (items[pick])
                        found++;
            }
            Console.WriteLine(DateTime.Now - start);
        }

        private static void CaughtLookup(Dictionary<int, bool> items, int failsPerSet)
        {
            int found = 0;
            DateTime start;
            Console.Write("Caught (");
            Console.Write(failsPerSet);
            Console.Write("): ");
            start = DateTime.Now;
            for (int i = 0; i < 50000000; i++)
            {
                int pick = i % 10000;
                try
                {
                    if (items[pick])
                        found++;
                }
                catch
                {
                }
            }
            Console.WriteLine(DateTime.Now - start);
        }
    }
}
Definitely use the ContainsKey check; exception handling can add a large overhead.
Throwing exceptions can negatively impact performance. For code that routinely fails, you can use design patterns to minimize performance issues.
Exceptions are not meant to be used for conditions you can check for.
I recommend reading the MSDN documentation on exceptions generally, and on exception handling in particular.
Never use try/catch as a part of your regular program path. It is really expensive and should only catch errors that you cannot prevent. ContainsKey is the way to go here.
Side note: no, you would not. If the value matters, you check with ContainsKey whether it exists and retrieve it if it does, not with try/catch.
Side Note: (The Value for the key doesn't matter at this point or i would use TryGetValue instead of ContainsKey)
The answer you accepted is correct, but just to add, if you only care about the key and not the value, maybe you're looking for a HashSet rather than a Dictionary?
In addition, your second code snippet is a method which literally adds zero value. Just use ToCheck.ContainsKey(Position); don't write a method that merely calls that method and returns its value while doing nothing else.
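A minimal sketch of both suggestions (assuming the XNA Vector2 from the question, which supplies value-based Equals and GetHashCode):

using System;
using System.Collections.Generic;
using Microsoft.Xna.Framework; // assumption: the source of Vector2

class MembershipExamples
{
    static void Main()
    {
        // Key-only membership: a HashSet stores no values at all.
        var occupied = new HashSet<Vector2>();
        occupied.Add(new Vector2(1, 2));
        Console.WriteLine(occupied.Contains(new Vector2(1, 2))); // one hash lookup

        // If the value starts to matter, TryGetValue tests and fetches in one lookup.
        var tiles = new Dictionary<Vector2, object>();
        object tile;
        if (tiles.TryGetValue(new Vector2(1, 2), out tile))
        {
            Console.WriteLine(tile);
        }
    }
}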

Why, in this simple test, does the speed of a method depend on the order in which it is triggered?

I was doing other experiments until this strange behaviour caught my eye.
The code is compiled as x64 Release.
If I enter 1, the third run of the List method costs 40% more time than the first two. The output is:
List costs 9312
List costs 9289
Array costs 12730
List costs 11950
If I enter 2, the third run of the Array method costs 30% more time than the first two. The output is:
Array costs 8082
Array costs 8086
List costs 11937
Array costs 12698
You can see the pattern; the complete code is attached below (just compile and run).
(The code presented is the minimum needed to run the test. The actual code used to get reliable results is more complicated; I wrapped the method and tested it 100+ times after a proper warm-up.)
class ListArrayLoop
{
    readonly int[] myArray;
    readonly List<int> myList;
    readonly int totalSessions;

    public ListArrayLoop(int loopRange, int totalSessions)
    {
        myArray = new int[loopRange];
        for (int i = 0; i < myArray.Length; i++)
        {
            myArray[i] = i;
        }
        myList = myArray.ToList();
        this.totalSessions = totalSessions;
    }

    public void ArraySum()
    {
        var pool = myArray;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }

    public void ListSum()
    {
        var pool = myList;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }
}
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        ListArrayLoop test = new ListArrayLoop(10000, 100000);
        string input = Console.ReadLine();
        if (input == "1")
        {
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
        }
        else
        {
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
        }
        Console.ReadKey();
    }
}
Contrived problems give you contrived answers.
Optimization should be done after code is written and not before. Write your solution the way that is easiest to understand and maintain. Then if the program is not fast enough for your use case, then you use a profiling tool and go back and see where the actual bottleneck is, not where you "think" it is.
Most optimizations people attempt in your situation amount to spending six hours to shave one second off the run time. Most small programs will not be run enough times to offset the cost spent trying to "optimize" them.
That being said, this is a strange edge case. I modified it a bit and am running it through a profiler, but I need to downgrade my VS2010 install so I can get .NET Framework source stepping back.
I ran the larger example through the profiler and can find no good reason why it would take longer.
Your issue is your test. When you are benchmarking code, there are a couple of guiding principles you should always follow:
Processor Affinity: Use only a single processor, usually not #1.
Warmup: Always perform the test a small number of times up front.
Duration: Make sure your test duration is at least 500ms.
Average: Average together multiple runs to remove anomalies.
Cleanup: Force the GC to collect allocated objects between tests.
Cooldown: Allow the process to sleep for a short-period of time.
So using these guidelines and rewriting your tests I get the following results:
Run 1
Enter test number (1|2): 1
ListSum averages 776
ListSum averages 753
ArraySum averages 1102
ListSum averages 753
Press any key to continue . . .
Run 2
Enter test number (1|2): 2
ArraySum averages 1155
ArraySum averages 1102
ListSum averages 753
ArraySum averages 1067
Press any key to continue . . .
So here is the final test code used:
static void Main(string[] args)
{
//We just need a single-thread for this test.
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
System.Threading.Thread.BeginThreadAffinity();
Console.Write("Enter test number (1|2): ");
string input = Console.ReadLine();
//perform the action just a few times to jit the code.
ListArrayLoop warmup = new ListArrayLoop(10, 10);
Console.WriteLine("Performing warmup...");
Test(warmup.ListSum);
Test(warmup.ArraySum);
Console.WriteLine("Warmup complete...");
Console.WriteLine();
ListArrayLoop test = new ListArrayLoop(10000, 10000);
if (input == "1")
{
Test(test.ListSum);
Test(test.ListSum);
Test(test.ArraySum);
Test(test.ListSum);
}
else
{
Test(test.ArraySum);
Test(test.ArraySum);
Test(test.ListSum);
Test(test.ArraySum);
}
}
private static void Test(Action test)
{
long totalElapsed = 0;
for (int counter = 10; counter > 0; counter--)
{
try
{
var sw = Stopwatch.StartNew();
test();
totalElapsed += sw.ElapsedMilliseconds;
}
finally { }
GC.Collect(0, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
//cooldown
for (int i = 0; i < 100; i++)
System.Threading.Thread.Sleep(0);
}
Console.WriteLine("{0} averages {1}", test.Method.Name, totalElapsed / 10);
}
Note: some people may debate the usefulness of the cool-down; however, everyone agrees that even if it's not helpful, it is not harmful. I find that on some tests it can yield a more reliable result, although in the example above I doubt it makes any difference.
Short answer: it is because the CLR has an optimization for dispatching methods called on interface-typed variables. As long as calls to a particular interface method are made on the same type (the one implementing the interface), the CLR uses a fast dispatching routine (only three instructions) that merely checks the actual type of the instance and, on a match, jumps directly to a precomputed address for that particular method. But when the same interface method is then called on an instance of another type, the CLR switches dispatching to a slower routine (one that can dispatch the method for any actual instance type).
Long answer:
Firstly, take a look at how the method System.Linq.Enumerable.Sum() is declared (I omitted the validity check of the source parameter because it's not important in this case):
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    foreach (int num2 in source)
        num += num2;
    return num;
}
So all types that implement IEnumerable<int> can call this extension method, including int[] and List<int>. The foreach keyword is just an abbreviation for getting an enumerator via IEnumerable<T>.GetEnumerator() and iterating through all values. So this method actually does this:
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    IEnumerator<int> enumerator = source.GetEnumerator();
    while (enumerator.MoveNext())
        num += enumerator.Current;
    return num;
}
Now you can clearly see that the method body contains three member calls on interface-typed variables: GetEnumerator(), MoveNext(), and Current (although Current is actually a property, not a method; reading its value just calls the corresponding getter method).
GetEnumerator() typically creates a new instance of some auxiliary class which implements IEnumerator<T> and is thus able to return all values one by one. It is important to note that for int[] and List<int>, the enumerator types returned by GetEnumerator() are different. If the argument source is of type int[], GetEnumerator() returns an instance of SZGenericArrayEnumerator<int>, and if source is of type List<int>, it returns an instance of List<int>.Enumerator.
The two other members (MoveNext() and Current) are called repeatedly in a tight loop, so their speed is crucial for overall performance. Unfortunately, calling a method through an interface-typed variable (such as IEnumerator<int>) is not as straightforward as an ordinary instance method call. The CLR must dynamically find out the actual type of the object in the variable and then find out which of that object's methods implements the corresponding interface method.
The CLR tries to avoid doing this time-consuming lookup on every call with a little trick. When a particular method (such as MoveNext()) is called for the first time, the CLR finds the actual type of the instance on which the call is made (for example SZGenericArrayEnumerator<int>, in case you called Sum on an int[]) and finds the address of the method that implements the interface method for that type (that is, the address of SZGenericArrayEnumerator<int>.MoveNext()). Then it uses this information to generate an auxiliary dispatching stub, which simply checks whether the actual instance type is the same as on the first call (SZGenericArrayEnumerator<int>) and, if it is, jumps directly to the method address found earlier. So on subsequent calls, no complicated method lookup is made as long as the type of the instance remains the same. But when a call is made on an enumerator of a different type (such as List<int>.Enumerator when calculating the sum of a List<int>), the CLR abandons this fast dispatching stub. Instead, another, general-purpose and much slower dispatching routine is used.
So as long as Sum() is called on arrays only, the CLR dispatches calls to GetEnumerator(), MoveNext(), and Current via the fast route. When Sum() is called on a list too, the CLR switches to the slower dispatching routine, and performance decreases.
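A hedged, self-contained demonstration of the effect (the class names and loop count are illustrative assumptions, and modern JITs may devirtualize or inline such calls, so results vary by runtime):

using System;
using System.Diagnostics;

interface IValue { int Get(); }
class A : IValue { public int Get() { return 1; } }
class B : IValue { public int Get() { return 2; } }

static class DispatchDemo
{
    // The interface call inside this loop is the call site whose dispatch
    // stub the CLR specializes for the first concrete type it sees.
    static long Sum(IValue v, int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += v.Get();
        return total;
    }

    static void Main()
    {
        const int N = 100000000;
        var sw = Stopwatch.StartNew();
        Sum(new A(), N); // the call site sees only type A: fast, type-checked jump
        Console.WriteLine("A only: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        Sum(new B(), N); // a second type reaches the same call site; on runtimes
                         // matching the description above, this run may be slower
        Console.WriteLine("B after A: {0} ms", sw.ElapsedMilliseconds);
    }
}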
If performance is your concern, implement your own separate Sum() extension method for every type on which you want to call Sum(). This keeps each call site tied to a single concrete type, so the CLR keeps using the fast dispatching routine. For example:
public static class FasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }
}
Or, even better, avoid using the IEnumerable<T> interface at all (it still brings noticeable overhead). For example:
public static class EvenFasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        for (int i = 0; i < source.Length; i++)
            num += source[i];
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        for (int i = 0; i < source.Count; i++)
            num += source[i];
        return num;
    }
}
Here are results from my computer:
Your original program: 9844, 9841, 12545, 14384
FasterSumExtensions: 6149, 6445, 754, 6145
EvenFasterSumExtensions: 1557, 1561, 553, 1574
Way too much for a comment so it's CW -- feel free to incorporate and I'll delete this. The given code is a little off to me but the problem is still interesting. If you mix calls, you get poorer performance. This code highlights it:
static void Main(string[] args)
{
    var input = Console.ReadLine();
    var test = new ListArrayLoop(10000, 1000);
    switch (input)
    {
        case "1":
            Test(test.ListSum);
            break;
        case "2":
            Test(test.ArraySum);
            break;
        case "3":
            // adds about 40 ms
            test.ArraySum();
            Test(test.ListSum);
            break;
        default:
            // adds about 35 ms
            test.ListSum();
            Test(test.ArraySum);
            break;
    }
}

private static void Test(Action toTest)
{
    for (int i = 0; i < 100; i++)
    {
        var sw = Stopwatch.StartNew();
        toTest();
        sw.Stop();
        Console.WriteLine("costs {0}", sw.ElapsedMilliseconds);
        sw.Reset();
    }
}
Lists are implemented in .NET with arrays, so the average performance should be the same (since you do not change the length of either).
It looks like you have averaged the Sum() times sufficiently; this could be a GC issue with an iterator used in the Sum() method.
Hm, it really looks strange...
My guess: you are calling Sum() on the variable pool, declared with var. As long as you are working with only one type (either list or array), the call to Sum() is unambiguous and can be optimized. Once a new class is involved, the call is ambiguous and must be resolved, so further calls take a performance hit.
I do not have a compiler handy, so try loading another class that supports Sum() and compare the times. If I am right, I would expect another performance hit, but this time not as large.
In my opinion it's caching (because of read-ahead).
The first time you access an array, many of its elements get into the cache at once (read-ahead). This prefetching mechanism expects the program to access memory near the address just requested.
Further calls already benefit from this (provided the array fits into the cache). When you change the method, the cached data is invalidated and you need to fetch everything from memory again.
So calling list, array, list, array, list, array should be slower than list, list, list, array, array, array.
But this is not deterministic from the programmer's point of view, as you don't know the state of the cache or of other units affecting caching decisions.
