Dictionary.Count performance - c#

This question seems to be nonsense. The behaviour cannot be reproduced reliably.
Comparing the following test programs, I observed a huge performance difference between the two examples below (the first is slower than the second by a factor of ten):
First example (slow):
interface IWrappedDict {
    int Number { get; }
    void AddSomething (string k, string v);
}

class WrappedDict : IWrappedDict {
    private Dictionary<string, string> dict = new Dictionary<string, string> ();
    public void AddSomething (string k, string v) {
        dict.Add (k, v);
    }
    public int Number { get { return dict.Count; } }
}

class TestClass {
    private IWrappedDict wrappedDict;
    public TestClass (IWrappedDict theWrappedDict) {
        wrappedDict = theWrappedDict;
    }
    public void DoSomething () {
        // this function does the performance test
        for (int i = 0; i < 1000000; ++i) {
            var c = wrappedDict.Number; wrappedDict.AddSomething (...);
        }
    }
}
Second example (fast):
// IWrappedDict as above
class WrappedDict : IWrappedDict {
    private Dictionary<string, string> dict = new Dictionary<string, string> ();
    private int c = 0;
    public void AddSomething (string k, string v) {
        dict.Add (k, v); ++c;
    }
    public int Number { get { return c; } }
}
// rest as above
Funnily, the difference vanishes (the first example becomes just as fast) if I change the type of the member variable TestClass.wrappedDict from IWrappedDict to WrappedDict. My interpretation is that Dictionary.Count re-counts the elements every time it is accessed, and that any caching of the element count is done by compiler optimization only.
Can anybody confirm this? Is there any way to get the number of elements in a Dictionary in a performant way?

No, Dictionary.Count does not recount the elements every time it's used. The dictionary maintains a count, and it should be as fast as your second version.
I suspect that in your test of the second example you still had WrappedDict rather than IWrappedDict, and that this is actually about interface member access (which is always virtual) versus the JIT compiler inlining calls to the property when it knows the concrete type.
If you still believe Count is the problem, you should be able to edit your question to show a short but complete program which demonstrates both the fast and slow versions, including how you're timing it all.
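As a rough sketch of the effect I'm describing (the types here are made up for illustration, not taken from your code):
interface ICounter {
    int Value { get; }
}

class Counter : ICounter {
    private int value;
    public int Value { get { return value; } }
    public void Increment () { ++value; }
}

static class DispatchDemo {
    // Every call here is an interface dispatch, which is always virtual
    // and which the JIT will not inline as readily.
    public static int SumViaInterface (ICounter c, int n) {
        int total = 0;
        for (int i = 0; i < n; ++i) total += c.Value;
        return total;
    }

    // Here the JIT knows the concrete type, so it can inline the property
    // getter down to a plain field read.
    public static int SumViaConcrete (Counter c, int n) {
        int total = 0;
        for (int i = 0; i < n; ++i) total += c.Value;
        return total;
    }
}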

Sounds like your timing is off; I get:
#1: 330ms
#2: 335ms
when running the following in release mode, outside of the IDE:
public void DoSomething(int count) {
    // this function does the performance test
    for (int i = 0; i < count; ++i) {
        var c = wrappedDict.Number; wrappedDict.AddSomething(i.ToString(), "a");
    }
}

static void Execute(int count, bool show)
{
    var obj1 = new TestClass(new WrappedDict1());
    var obj2 = new TestClass(new WrappedDict2());
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    GC.WaitForPendingFinalizers();
    var watch = Stopwatch.StartNew();
    obj1.DoSomething(count);
    watch.Stop();
    if (show) Console.WriteLine("#1: {0}ms", watch.ElapsedMilliseconds);
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    GC.WaitForPendingFinalizers();
    watch = Stopwatch.StartNew();
    obj2.DoSomething(count);
    watch.Stop();
    if (show) Console.WriteLine("#2: {0}ms", watch.ElapsedMilliseconds);
}

static void Main()
{
    Execute(1, false); // for JIT
    Execute(1000000, true); // for measuring
}
Basically: "cannot reproduce". Also: for completeness, no: .Count does not count all the items (it already knows the count), nor does the compiler add any magic automatic caching code (note: there are a few limited examples of things like that; for example, the JIT can remove bounds-checking on a for loop over a vector).

No, a dictionary or hashtable never iterates the entries to determine the length.
It will (or should) always keep track of the number of entries.
Thus, the time complexity is O(1).
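As a simplified sketch (this is illustrative bookkeeping, not the actual BCL source), the collection does the same thing the questioner's second WrappedDict does:
class SketchDictionary<TKey, TValue>
{
    private int count;

    public void Add(TKey key, TValue value)
    {
        // ... insert into the bucket/entry structures ...
        count++; // bookkeeping on every successful insert
    }

    public int Count
    {
        get { return count; } // O(1): just a field read
    }
}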

Related

While updating a value in concurrent dictionary is better to lock dictionary or value

I am performing two updates on a value I get from TryGetValue. I would like to know which of these is better.
Option 1: Locking only the out value?
if (HubMemory.AppUsers.TryGetValue(ConID, out OnlineInfo onlineinfo))
{
    lock (onlineinfo)
    {
        onlineinfo.SessionRequestId = 0;
        onlineinfo.AudioSessionRequestId = 0;
        onlineinfo.VideoSessionRequestId = 0;
    }
}
Option 2: Locking whole dictionary?
if (HubMemory.AppUsers.TryGetValue(ConID, out OnlineInfo onlineinfo))
{
    lock (HubMemory.AppUsers)
    {
        onlineinfo.SessionRequestId = 0;
        onlineinfo.AudioSessionRequestId = 0;
        onlineinfo.VideoSessionRequestId = 0;
    }
}
I'm going to suggest something different.
Firstly, you should be storing immutable types in the dictionary to avoid a lot of threading issues. As it is, any code could modify the contents of any items in the dictionary just by retrieving an item from it and changing its properties.
Secondly, ConcurrentDictionary provides the TryUpdate() method to allow you to update values in the dictionary without having to implement explicit locking.
TryUpdate() takes three parameters: the key of the item to update, the new value, and the comparison value, i.e. the original that you got from the dictionary and then updated.
TryUpdate() then checks that the original has NOT been changed, by comparing the value currently in the dictionary with the original that you pass to it. Only if they are the SAME does it actually store the new value and return true. Otherwise it returns false without updating anything.
This allows you to detect and respond appropriately to cases where some other thread has changed the value of the item you're updating while you were updating it. You can either ignore this (in which case the first change wins) or try again until you succeed (in which case the last change wins). What you do depends on your situation.
Note that this requires that your type implements IEquatable<T>, since that is used by the ConcurrentDictionary to compare values.
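To see why, note that EqualityComparer<Test>.Default performs the comparison; without IEquatable<T>, a class falls back to Object.Equals, i.e. reference equality. A minimal sketch of the consequence (MutableValue is a hypothetical class, not from the question):
// Hypothetical class with no IEquatable<T>: compared by reference.
class MutableValue { public int X; }

var dict = new ConcurrentDictionary<int, MutableValue>();
var original = new MutableValue { X = 1 };
dict.TryAdd(0, original);

// Succeeds: the comparison value is the very reference that is stored.
dict.TryUpdate(0, new MutableValue { X = 2 }, original);

// Fails: an equal-looking copy is still a different reference.
dict.TryUpdate(0, new MutableValue { X = 3 }, new MutableValue { X = 2 });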
Here's a sample console app that demonstrates this:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    sealed class Test : IEquatable<Test>
    {
        public Test(int value1, int value2, int value3)
        {
            Value1 = value1;
            Value2 = value2;
            Value3 = value3;
        }

        public Test(Test other) // Copy ctor.
        {
            Value1 = other.Value1;
            Value2 = other.Value2;
            Value3 = other.Value3;
        }

        public int Value1 { get; }
        public int Value2 { get; }
        public int Value3 { get; }

        #region IEquatable<Test> implementation (generated using Resharper)

        public bool Equals(Test other)
        {
            if (other is null)
                return false;
            if (ReferenceEquals(this, other))
                return true;
            return Value1 == other.Value1 && Value2 == other.Value2 && Value3 == other.Value3;
        }

        public override bool Equals(object obj)
        {
            return ReferenceEquals(this, obj) || obj is Test other && Equals(other);
        }

        public override int GetHashCode()
        {
            unchecked
            {
                return (Value1 * 397) ^ Value2;
            }
        }

        public static bool operator ==(Test left, Test right)
        {
            return Equals(left, right);
        }

        public static bool operator !=(Test left, Test right)
        {
            return !Equals(left, right);
        }

        #endregion
    }

    static class Program
    {
        static void Main()
        {
            var dict = new ConcurrentDictionary<int, Test>();
            dict.TryAdd(0, new Test(1000, 2000, 3000));
            dict.TryAdd(1, new Test(4000, 5000, 6000));
            dict.TryAdd(2, new Test(7000, 8000, 9000));
            Parallel.Invoke(() => update(dict), () => update(dict));
        }

        static void update(ConcurrentDictionary<int, Test> dict)
        {
            for (int i = 0; i < 100000; ++i)
            {
                for (int attempt = 0; ; ++attempt)
                {
                    var original = dict[1];
                    var modified = new Test(original.Value1 + 1, original.Value2 + 1, original.Value3 + 1);
                    var updatedOk = dict.TryUpdate(1, modified, original);
                    if (updatedOk) // Updated OK so don't try again.
                        break;     // In some cases you might not care, so you would never try again.
                    Console.WriteLine($"dict.TryUpdate() returned false in iteration {i} attempt {attempt} on thread {Thread.CurrentThread.ManagedThreadId}");
                }
            }
        }
    }
}
There's a lot of boilerplate code there to support the IEquatable<T> implementation and also to support the immutability.
Fortunately, C# 9 introduced the record type, which makes immutable types much easier to implement. Here's the same sample console app using a record instead. Note that positional record types are immutable and also implement IEquatable<T> for you:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

namespace System.Runtime.CompilerServices // Remove this if compiling with .NET 5
{
    // This is to allow earlier versions of .NET to use records.
    class IsExternalInit { }
}

namespace Demo
{
    record Test(int Value1, int Value2, int Value3);

    static class Program
    {
        static void Main()
        {
            var dict = new ConcurrentDictionary<int, Test>();
            dict.TryAdd(0, new Test(1000, 2000, 3000));
            dict.TryAdd(1, new Test(4000, 5000, 6000));
            dict.TryAdd(2, new Test(7000, 8000, 9000));
            Parallel.Invoke(() => update(dict), () => update(dict));
        }

        static void update(ConcurrentDictionary<int, Test> dict)
        {
            for (int i = 0; i < 100000; ++i)
            {
                for (int attempt = 0; ; ++attempt)
                {
                    var original = dict[1];
                    var modified = original with
                    {
                        Value1 = original.Value1 + 1,
                        Value2 = original.Value2 + 1,
                        Value3 = original.Value3 + 1
                    };
                    var updatedOk = dict.TryUpdate(1, modified, original);
                    if (updatedOk) // Updated OK so don't try again.
                        break;     // In some cases you might not care, so you would never try again.
                    Console.WriteLine($"dict.TryUpdate() returned false in iteration {i} attempt {attempt} on thread {Thread.CurrentThread.ManagedThreadId}");
                }
            }
        }
    }
}
Note how much shorter record Test is compared to class Test, even though it provides the same functionality. (Also note that I added class IsExternalInit to allow records to be used with .NET versions prior to .NET 5. If you're using .NET 5 or later, you don't need it.)
Finally, note that you don't need to make your class immutable. The code I posted for the first example will work perfectly well if your class is mutable; it just won't stop other code from breaking things.
Addendum 1:
You may look at the output and wonder why so many retry attempts are made when the TryUpdate() fails. You might expect it to only need to retry a few times (depending on how many threads are concurrently attempting to modify the data). The answer to this is simply that the Console.WriteLine() takes so long that it's much more likely that some other thread changed the value in the dictionary again while we were writing to the console.
We can change the code slightly to only print the number of attempts OUTSIDE the loop like so (modifying the second example):
static void update(ConcurrentDictionary<int, Test> dict)
{
    for (int i = 0; i < 100000; ++i)
    {
        int attempt = 0;
        while (true)
        {
            var original = dict[1];
            var modified = original with
            {
                Value1 = original.Value1 + 1,
                Value2 = original.Value2 + 1,
                Value3 = original.Value3 + 1
            };
            var updatedOk = dict.TryUpdate(1, modified, original);
            if (updatedOk) // Updated OK so don't try again.
                break;     // In some cases you might not care, so you would never try again.
            ++attempt;
        }
        if (attempt > 0)
            Console.WriteLine($"dict.TryUpdate() took {attempt} retries in iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
    }
}
With this change, we see that the number of retry attempts drops significantly. This shows the importance of minimising the amount of time spent in code between TryUpdate() attempts.
Addendum 2:
As noted by Theodor Zoulias below, you could also use ConcurrentDictionary<TKey,TValue>.AddOrUpdate(), as the example below shows. This is probably a better approach, but it is slightly harder to understand:
static void update(ConcurrentDictionary<int, Test> dict)
{
    for (int i = 0; i < 100000; ++i)
    {
        int attempt = 0;
        dict.AddOrUpdate(
            1,                        // Key to update.
            key => new Test(1, 2, 3), // Create new element; won't actually be called for this example.
            (key, existing) =>        // Update existing element. Key not needed for this example.
            {
                ++attempt;
                return existing with
                {
                    Value1 = existing.Value1 + 1,
                    Value2 = existing.Value2 + 1,
                    Value3 = existing.Value3 + 1
                };
            }
        );
        if (attempt > 1)
            Console.WriteLine($"dict.AddOrUpdate() took {attempt - 1} retries in iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
    }
}
If you just need to lock the dictionary value, for instance to make sure the three values are set at the same time, then it doesn't really matter which reference type you lock on, as long as it is a reference type, it's the same instance everywhere, and everything else that needs to read or modify those values also locks on that same instance.
You can read more on how the Microsoft CLR implementation deals with locking, and how and why locks work with reference types, here:
Why Do Locks Require Instances In C#?
If, instead, you are trying to maintain consistency between the dictionary and the values, that is to say, to protect not only the internal consistency of the dictionary but also the setting and reading of the objects stored in it, then your lock is not appropriate at all.
You would need to place a lock around the entire statement (including the TryGetValue) and around every other place where you add to the dictionary or read/modify a value. Once again, the object you lock on is not important, as long as it is used consistently.
Note 1: it is normal to use a dedicated instance to lock on (i.e. some object instantiated purely for that purpose), either static or an instance member depending on your needs, as there is less chance of shooting yourself in the foot; a sketch follows below.
Note 2: there are a lot more ways to implement thread safety here, depending on your needs: whether you are happy with stale values, whether you need every ounce of performance, and whether you have a degree in minimal-lock coding. How much effort and innate safety you want to bake in is entirely up to you and your solution.
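A minimal sketch of the dedicated lock instance from Note 1 (the field name is illustrative):
// A private object whose only job is to be locked on. Nothing else can
// ever lock on it, so unrelated code cannot contend with or deadlock you.
private static readonly object _appUsersSync = new object();

// ...
lock (_appUsersSync)
{
    if (HubMemory.AppUsers.TryGetValue(ConID, out OnlineInfo onlineinfo))
    {
        onlineinfo.SessionRequestId = 0;
        onlineinfo.AudioSessionRequestId = 0;
        onlineinfo.VideoSessionRequestId = 0;
    }
}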
The first option (locking on the entry of the dictionary) is more efficient because it is unlikely to create significant contention for the lock. For that to happen, two threads would have to try to update the same entry at the same time. The second option (locking on the entire dictionary) can quite possibly create contention under heavy usage, because two threads will block each other even if they try to update different entries concurrently.
The first option is also more in the spirit of using a ConcurrentDictionary<K,V> in the first place. If you are going to lock on the entire dictionary, you might as well use a normal Dictionary<K,V> instead. Regarding this dilemma, you may find this question interesting: When should I use ConcurrentDictionary and Dictionary?
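For completeness, the plain Dictionary<K,V> alternative mentioned above would look something like this sketch (names assumed from the question, with the key type guessed as string):
private static readonly Dictionary<string, OnlineInfo> appUsers
    = new Dictionary<string, OnlineInfo>();
private static readonly object sync = new object();

public static void ResetSessions(string conId)
{
    // One lock guards both the dictionary structure and the values in it,
    // trading away concurrency for simplicity.
    lock (sync)
    {
        if (appUsers.TryGetValue(conId, out OnlineInfo onlineinfo))
        {
            onlineinfo.SessionRequestId = 0;
            onlineinfo.AudioSessionRequestId = 0;
            onlineinfo.VideoSessionRequestId = 0;
        }
    }
}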

IEnumerable vs List while iterating a collection

My question is basically about good programming practice. With IEnumerable, each item is evaluated one at a time as needed, whereas with ToList the whole collection is iterated before the foreach loop starts.
As per below code which function (GetBool1 vs GetBool2) should be used and why.
public class TestListAndEnumerable1
{
    public static void Test()
    {
        GetBool1();
        GetBool2();
        Console.ReadLine();
    }

    private static void GetBool1()
    {
        var list = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        foreach (var item in list.Where(PrintAndEvaluate))
        {
            Thread.Sleep(1000);
        }
    }

    private static bool PrintAndEvaluate(int x)
    {
        Console.WriteLine("Hi from " + x);
        return x % 2 == 0;
    }

    private static void GetBool2()
    {
        List<int> list = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        foreach (var item in list.Where(PrintAndEvaluate).ToList())
        {
            Thread.Sleep(1000);
        }
    }
}
The behaviour of the two loops is different. In the first case, the console is written to as each item is iterated and evaluated, and a Sleep occurs between console writes.
In the second case, the console writes all happen first, before any of the Sleeps: these begin only once all the PrintAndEvaluate calls have finished.
The second case enumerates the members of the list twice, allocating and fragmenting memory as it does so.
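A stripped-down illustration of that ordering (using the PrintAndEvaluate method from the question; the expected output order is in the comments):
var numbers = new List<int> { 0, 1, 2, 3 };

// Lazy: evaluation is interleaved with the loop body.
// Order: "Hi from 0", sleep, "Hi from 1", "Hi from 2", sleep, "Hi from 3"
foreach (var item in numbers.Where(PrintAndEvaluate))
    Thread.Sleep(1000);

// Eager: ToList() drains the whole filter before the first sleep.
// Order: "Hi from 0" .. "Hi from 3", then sleep, sleep
foreach (var item in numbers.Where(PrintAndEvaluate).ToList())
    Thread.Sleep(1000);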
If your question is "which is most efficient", then the answer is the first example. But if you want to know "is there an even more efficient method", just use a loop like:
for (int counter = 0; counter < list.Count; counter++)
{
    if (PrintAndEvaluate(list[counter]))
    {
        Thread.Sleep(1000);
    }
}
This avoids constructing an iterator instance, so it does not contribute to heap fragmentation.
GetBool1 should be used.
The sole difference between the two methods is the presence of the ToList() call. Right?
Let's look at the significance of the ToList call by first reading its docs:
Creates a List<T> from an IEnumerable<T>.
This means that a new list will be created when you call ToList. As you may know, creating a new list takes time and memory.
On the other hand GetBool1 does not have a ToList call, so it does not take as much time to execute.
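In code terms, the only extra work in GetBool2 is roughly this (a sketch):
// No filtering happens yet; this just builds a lazy query.
IEnumerable<int> filtered = list.Where(PrintAndEvaluate);

// ToList() runs the filter once and copies every match into a new list:
// that allocation and copy is the extra cost.
List<int> copy = filtered.ToList();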
GetBool1 is the better option. With option 2, even though you convert the IEnumerable to a List, the foreach still calls GetEnumerator again. But the difference is very small. I made a small change to your code to output the execution time:
public static void Test()
{
    var list = new List<int>();
    for (int i = 0; i < 10000; i++)
    {
        list.Add(i);
    }
    GetBool1(list);
    GetBool2(list);
    GetBool3(list);
    Console.ReadLine();
}

private static void GetBool1(List<int> list)
{
    System.Diagnostics.Stopwatch watcher = new System.Diagnostics.Stopwatch();
    watcher.Start();
    foreach (var item in list.Where(PrintAndEvaluate))
    {
        Thread.Sleep(1);
    }
    watcher.Stop();
    Console.WriteLine("GetBool1 - {0}", watcher.ElapsedMilliseconds);
}

private static bool PrintAndEvaluate(int x)
{
    return x % 2 == 0;
}

private static void GetBool2(List<int> list)
{
    System.Diagnostics.Stopwatch watcher = new System.Diagnostics.Stopwatch();
    watcher.Start();
    foreach (var item in list.Where(PrintAndEvaluate).ToList())
    {
        Thread.Sleep(1);
    }
    watcher.Stop();
    Console.WriteLine("GetBool2 - {0}", watcher.ElapsedMilliseconds);
}
The output is:

Iterating a dictionary in C#

var dict = new Dictionary<int, string>();
for (int i = 0; i < 200000; i++)
dict[i] = "test " + i;
I iterated this dictionary using the code below:
foreach (var pair in dict)
Console.WriteLine(pair.Value);
Then, I iterated it using this:
foreach (var key in dict.Keys)
Console.WriteLine(dict[key]);
And the second iteration took ~3 seconds less.
I can get both keys and values via both methods. What I wonder is whether the second approach has a drawback. Since the most-upvoted question I can find about this doesn't include this way of iterating a dictionary, I wanted to know why no one uses it and why it ran faster here.
Your time tests have some fundamental flaws:
Console.WriteLine is an I/O operation, which takes orders of magnitude more time than memory accesses and CPU calculations. Any difference in iteration times is probably being dwarfed by the cost of this operation. It's like measuring the weights of pennies in a cast-iron stove.
You don't mention how long the overall operation took, so saying that one took 3 seconds less than another is meaningless. If it took 300 seconds to run the first, and 303 seconds to run the second, then you are micro-optimizing.
You don't mention how you measured running time. Did running time include the time getting the program assembly loaded and bootstrapped?
You don't mention repeatability: Did you run these operations several times? Several hundred times? In different orders?
Here are my tests. Note how I try my best to ensure that the method of iteration is the only thing that changes, and I include a control to see how much of the time is taken up purely because of a for loop and assignment:
void Main()
{
    // Insert code here to set up your test: anything that you don't want to include as
    // part of the timed tests.
    var dict = new Dictionary<int, string>();
    for (int i = 0; i < 2000; i++)
        dict[i] = "test " + i;
    string s = null;

    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            for (int i = 0; i < 2000; i++)
                s = "hi";
        }),
        new TimedAction("first", () =>
        {
            foreach (var pair in dict)
                s = pair.Value;
        }),
        new TimedAction("second", () =>
        {
            foreach (var key in dict.Keys)
                s = dict[key];
        })
    };

    TimeActions(100, // change this number as desired.
        actions);
}

#region timer helper methods

// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    foreach (var action in actions)
    {
        var milliseconds = s.Time(action.Action, iterations);
        Console.WriteLine("{0}: {1}ms ", action.Message, milliseconds);
    }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion
Result
control: 1.2173ms
first: 9.0233ms
second: 18.1301ms
So in these tests, using the indexer takes roughly twice as long as iterating key-value pairs, which is what I would expect*. This stays roughly proportionate if I increase the number of entries and the number of repetitions by an order of magnitude, and I get the same results if I run the two tests in reverse order.
* Why would I expect this result? The Dictionary class probably represents its entries as KeyValuePairs internally, so all it really has to do when you iterate it directly is walk through its data structure once, handing the caller each entry as it comes to it. If you iterate just the Keys, it still has to find each KeyValuePair and give you the value of the Key property from it, so that step alone costs roughly the same as iterating across the whole thing in the first place. Then you have to call the indexer, which has to calculate a hash for the provided key, jump to the correct hashtable bucket, and do an equality check on the keys of any KeyValuePairs it finds there. These operations aren't terribly expensive, but once you do them N times, it's roughly as expensive as if you'd iterated over the internal hashtable structure again.
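Incidentally, if only the values are needed, the dictionary also exposes a Values collection, which walks the entries directly and avoids the per-key hashing entirely; a small variant that could be added to the tests above:
new TimedAction("values", () =>
{
    // Enumerates each entry's value directly; no hashing, no indexer.
    foreach (var value in dict.Values)
        s = value;
})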

why in this simple test the speed of method relates to the order of triggering?

I was doing other experiments until this strange behaviour caught my eye.
The code is compiled in x64 release mode.
If the input key is 1, the 3rd run of the List method costs 40% more time than the first 2. The output is:
List costs 9312
List costs 9289
Array costs 12730
List costs 11950
If the input key is 2, the 3rd run of the Array method costs 30% more time than the first 2. The output is:
Array costs 8082
Array costs 8086
List costs 11937
Array costs 12698
You can see the pattern; the complete code is attached below (just compile and run):
(The code presented is minimal, just enough to run the test. The actual code used to get reliable results is more complicated; I wrapped the method and tested it 100+ times after a proper warm-up.)
class ListArrayLoop
{
    readonly int[] myArray;
    readonly List<int> myList;
    readonly int totalSessions;

    public ListArrayLoop(int loopRange, int totalSessions)
    {
        myArray = new int[loopRange];
        for (int i = 0; i < myArray.Length; i++)
        {
            myArray[i] = i;
        }
        myList = myArray.ToList();
        this.totalSessions = totalSessions;
    }

    public void ArraySum()
    {
        var pool = myArray;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }

    public void ListSum()
    {
        var pool = myList;
        long sum = 0;
        for (int j = 0; j < totalSessions; j++)
        {
            sum += pool.Sum();
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        ListArrayLoop test = new ListArrayLoop(10000, 100000);
        string input = Console.ReadLine();
        if (input == "1")
        {
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
        }
        else
        {
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ListSum();
            sw.Stop();
            Console.WriteLine("List costs {0}", sw.ElapsedMilliseconds);
            sw.Reset();
            sw.Start();
            test.ArraySum();
            sw.Stop();
            Console.WriteLine("Array costs {0}", sw.ElapsedMilliseconds);
        }
        Console.ReadKey();
    }
}
Contrived problems give you contrived answers.
Optimization should be done after code is written and not before. Write your solution the way that is easiest to understand and maintain. Then if the program is not fast enough for your use case, then you use a profiling tool and go back and see where the actual bottleneck is, not where you "think" it is.
Most optimizations people try to do in your situation amount to spending 6 hours to do something that will decrease the run time by 1 second. Most small programs will not be run enough times to offset the cost you spent trying to "optimize" them.
That being said, this is a strange edge case. I modified it a bit and am running it through a profiler, but I need to downgrade my VS2010 install so I can get .NET framework source stepping back.
I ran the larger example through the profiler, and I can find no good reason why it would take longer.
Your issue is your test. When you are benchmarking code, there are a couple of guiding principles you should always follow:
Processor Affinity: Use only a single processor, usually not #1.
Warmup: Always perform the test a small number of times up front.
Duration: Make sure your test duration is at least 500ms.
Average: Average together multiple runs to remove anomalies.
Cleanup: Force the GC to collect allocated objects between tests.
Cooldown: Allow the process to sleep for a short-period of time.
So using these guidelines and rewriting your tests I get the following results:
Run 1
Enter test number (1|2): 1
ListSum averages 776
ListSum averages 753
ArraySum averages 1102
ListSum averages 753
Press any key to continue . . .
Run 2
Enter test number (1|2): 2
ArraySum averages 1155
ArraySum averages 1102
ListSum averages 753
ArraySum averages 1067
Press any key to continue . . .
So here is the final test code used:
static void Main(string[] args)
{
//We just need a single-thread for this test.
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
System.Threading.Thread.BeginThreadAffinity();
Console.Write("Enter test number (1|2): ");
string input = Console.ReadLine();
//perform the action just a few times to jit the code.
ListArrayLoop warmup = new ListArrayLoop(10, 10);
Console.WriteLine("Performing warmup...");
Test(warmup.ListSum);
Test(warmup.ArraySum);
Console.WriteLine("Warmup complete...");
Console.WriteLine();
ListArrayLoop test = new ListArrayLoop(10000, 10000);
if (input == "1")
{
Test(test.ListSum);
Test(test.ListSum);
Test(test.ArraySum);
Test(test.ListSum);
}
else
{
Test(test.ArraySum);
Test(test.ArraySum);
Test(test.ListSum);
Test(test.ArraySum);
}
}
private static void Test(Action test)
{
long totalElapsed = 0;
for (int counter = 10; counter > 0; counter--)
{
try
{
var sw = Stopwatch.StartNew();
test();
totalElapsed += sw.ElapsedMilliseconds;
}
finally { }
GC.Collect(0, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
//cooldown
for (int i = 0; i < 100; i++)
System.Threading.Thread.Sleep(0);
}
Console.WriteLine("{0} averages {1}", test.Method.Name, totalElapsed / 10);
}
Note: some people may debate the usefulness of the cooldown; however, everyone agrees that even if it's not helpful, it is not harmful. I find that on some tests it can yield a more reliable result; in the example above, though, I doubt it makes any difference.
Short answer: it is because the CLR has an optimization for dispatching methods called on interface-typed variables. As long as calls to a particular interface method are made on the same type (the one implementing the interface), the CLR uses a fast dispatching routine (only 3 instructions) that simply checks the actual type of the instance and, on a match, jumps directly to the precomputed address of the particular method. But when the same interface method is then called on an instance of another type, the CLR switches dispatching to a slower routine (which can dispatch methods for any actual instance type).
Long answer:
Firstly, take a look at how the method System.Linq.Enumerable.Sum() is declared (I omitted the validity check of the source parameter because it's not important in this case):
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    foreach (int num2 in source)
        num += num2;
    return num;
}
So all types that implement IEnumerable<int> can call this extension method, including int[] and List<int>. The foreach keyword is just an abbreviation for getting an enumerator via IEnumerable<T>.GetEnumerator() and iterating through all values. So this method actually does this:
public static int Sum(this IEnumerable<int> source)
{
    int num = 0;
    IEnumerator<int> enumerator = source.GetEnumerator();
    while (enumerator.MoveNext())
        num += enumerator.Current;
    return num;
}
Now you can clearly see that the method body contains three method calls on interface-typed variables: GetEnumerator(), MoveNext(), and Current (although Current is actually a property, not a method, reading its value just calls the corresponding getter method).
GetEnumerator() typically creates a new instance of some auxiliary class which implements IEnumerator<T> and is thus able to return all values one by one. It is important to note that in the case of int[] and List<int>, the types of the enumerators returned by GetEnumerator() of these two classes are different. If the argument source is of type int[], then GetEnumerator() returns an instance of type SZGenericArrayEnumerator<int>, and if source is of type List<int>, it returns an instance of type List<int>+Enumerator<int>.
The two other methods (MoveNext() and Current) are repeatedly called in a tight loop, so their speed is crucial for overall performance. Unfortunately, calling a method on an interface-typed variable (such as IEnumerator<int>) is not as straightforward as an ordinary instance method call. The CLR must dynamically find out the actual type of the object in the variable and then find out which of the object's methods implements the corresponding interface method.
The CLR tries to avoid doing this time-consuming lookup on every call with a little trick. When a particular method (such as MoveNext()) is called for the first time, the CLR finds the actual type of the instance on which the call is made (for example SZGenericArrayEnumerator<int>, in case you called Sum on an int[]) and finds the address of the method that implements the corresponding interface method for that type (that is, the address of SZGenericArrayEnumerator<int>.MoveNext()). It then uses this information to generate an auxiliary dispatching method, which simply checks whether the actual instance type is the same as when the first call was made (that is, SZGenericArrayEnumerator<int>) and, if it is, jumps directly to the method address found earlier. So on subsequent calls, no complicated method lookup is made as long as the type of the instance remains the same. But when a call is made on an enumerator of a different type (such as List<int>+Enumerator<int>, when calculating the sum of a List<int>), the CLR no longer uses this fast dispatching method. Instead another, general-purpose and much slower, dispatching method is used.
So as long as Sum() is called on an array only, the CLR dispatches the calls to GetEnumerator(), MoveNext(), and Current using the fast method. When Sum() is called on a list too, the CLR switches to the slower dispatching method, and performance decreases.
If performance is your concern, implement your own separate Sum() extension method for every type on which you want to call Sum(). This ensures that the CLR will use the fast dispatching method. For example:
public static class FasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        foreach (int num2 in source)
            num += num2;
        return num;
    }
}
Or even better, avoid using the IEnumerable<T> interface at all (because it still brings noticeable overhead). For example:
public static class EvenFasterSumExtensions
{
    public static int Sum(this int[] source)
    {
        int num = 0;
        for (int i = 0; i < source.Length; i++)
            num += source[i];
        return num;
    }

    public static int Sum(this List<int> source)
    {
        int num = 0;
        for (int i = 0; i < source.Count; i++)
            num += source[i];
        return num;
    }
}
Here are results from my computer:
Your original program: 9844, 9841, 12545, 14384
FasterSumExtensions: 6149, 6445, 754, 6145
EvenFasterSumExtensions: 1557, 1561, 553, 1574
Way too much for a comment so it's CW -- feel free to incorporate and I'll delete this. The given code is a little off to me but the problem is still interesting. If you mix calls, you get poorer performance. This code highlights it:
static void Main(string[] args)
{
    var input = Console.ReadLine();
    var test = new ListArrayLoop(10000, 1000);
    switch (input)
    {
        case "1":
            Test(test.ListSum);
            break;
        case "2":
            Test(test.ArraySum);
            break;
        case "3":
            // adds about 40 ms
            test.ArraySum();
            Test(test.ListSum);
            break;
        default:
            // adds about 35 ms
            test.ListSum();
            Test(test.ArraySum);
            break;
    }
}

private static void Test(Action toTest)
{
    for (int i = 0; i < 100; i++)
    {
        var sw = Stopwatch.StartNew();
        toTest();
        sw.Stop();
        Console.WriteLine("costs {0}", sw.ElapsedMilliseconds);
        sw.Reset();
    }
}
Lists are implemented in .NET with arrays, so the average performance should be the same (since you do not change the length of either).
It looks like you have averaged the Sum() times sufficiently; this could be a GC issue with the iterator used in the Sum() method.
Hm, it really looks strange...
My guess: you are calling Sum() on the pool variable, which is declared with var. As long as you are working with only one type (either list or array), the call to Sum() is unambiguous and can be optimized. By bringing in a new class, the call is ambiguous and must be resolved, so further calls cause a performance hit.
I do not have a compiler at hand, so try loading another class which supports Sum() and compare the times. If I am right, I would expect another performance hit, but this time not as large.
In my opinion it's caching (because of read-ahead).
The first time you access an array, many of its elements get into the cache at once (read-ahead). This prefetching mechanism expects the program to be likely to access memory near the address just requested.
Further calls already benefit from this (provided the array fits into the cache). When you change the method, the cache is invalidated and you need to fetch everything from memory again.
So calling: list, array, list, array, list, array
should be slower than: list, list, list, array, array, array.
But this is not deterministic from the programmer's point of view, as you don't know the state of the cache or of the other units affecting caching decisions.

Is one of these for loops faster than the other?

for (var keyValue = 0; keyValue < dwhSessionDto.KeyValues.Count; keyValue++)
{...}

var count = dwhSessionDto.KeyValues.Count;
for (var keyValue = 0; keyValue < count; keyValue++)
{...}
I know there's a difference between the two, but is one of them faster than the other? I would think the second is faster.
Yes, the first version is much slower. After all, I'm assuming you're dealing with types like this:
public class SlowCountProvider
{
    public int Count
    {
        get
        {
            Thread.Sleep(1000);
            return 10;
        }
    }
}

public class KeyValuesWithSlowCountProvider
{
    public SlowCountProvider KeyValues
    {
        get { return new SlowCountProvider(); }
    }
}
Here, your first loop will take ~10 seconds, whereas your second loop will take ~1 second.
Of course, you might argue that the assumption that you're using this code is unjustified - but my point is that the right answer will depend on the types involved, and the question doesn't state what those types are.
Now if you're actually dealing with a type where accessing KeyValues and Count is cheap (which is quite likely) I wouldn't expect there to be much difference. Mind you, I'd almost always prefer to use foreach where possible:
foreach (var pair in dwhSessionDto.KeyValues)
{
    // Use pair here
}
That way you never need the count. But then, you haven't said what you're trying to do inside the loop either. (Hint: to get more useful answers, provide more information.)
It depends how difficult it is to compute dwhSessionDto.KeyValues.Count. If it's just a pointer to an int, the speed of each version will be the same. However, if the Count value needs to be calculated, then it will be calculated every time, and that will hurt performance.
EDIT: here's some code to demonstrate that the condition is always re-evaluated:
public class Temp
{
    public int Count { get; set; }
}

static void Main(string[] args)
{
    var t = new Temp() { Count = 5 };
    for (int i = 0; i < t.Count; i++)
    {
        Console.WriteLine(i);
        t.Count--;
    }
    Console.ReadLine();
}
The output is 0, 1, 2 only!
See comments for reasons why this answer is wrong.
If there is a difference, it's the other way round: indeed, the first one might be faster. That's because the compiler recognizes that you are iterating from 0 to the end of the array, and it can therefore elide bounds checks within the loop (i.e. when you access dwhSessionDto.KeyValues[i]).
However, I believe the compiler only applies this optimization to arrays so there probably will be no difference here.
It is impossible to say without knowing the implementation of dwhSessionDto.KeyValues.Count and the loop body.
Assume a global variable bool foo = false; and then the following implementations:
/* Loop body... */
{
    if (foo) Thread.Sleep(1000);
}
/* ... */
public int Count
{
    get
    {
        foo = !foo;
        return 10;
    }
}
/* ... */
Now, the first loop will perform approximately twice as fast as the second ;D
However, assuming non-moronic implementation, the second one is indeed more likely to be faster.
No. There is no performance difference between these two loops. With JIT and Code Optimization, it does not make any difference.
There is no difference, but why do you think there is one? Can you please post your findings?
If you look at the implementation of inserting an item into a Dictionary using Reflector:
private void Insert(TKey key, TValue value, bool add)
{
    int freeList;
    if (key == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (this.buckets == null)
    {
        this.Initialize(0);
    }
    int num = this.comparer.GetHashCode(key) & 0x7fffffff;
    int index = num % this.buckets.Length;
    for (int i = this.buckets[index]; i >= 0; i = this.entries[i].next)
    {
        if ((this.entries[i].hashCode == num) && this.comparer.Equals(this.entries[i].key, key))
        {
            if (add)
            {
                ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
            }
            this.entries[i].value = value;
            this.version++;
            return;
        }
    }
    if (this.freeCount > 0)
    {
        freeList = this.freeList;
        this.freeList = this.entries[freeList].next;
        this.freeCount--;
    }
    else
    {
        if (this.count == this.entries.Length)
        {
            this.Resize();
            index = num % this.buckets.Length;
        }
        freeList = this.count;
        this.count++;
    }
    this.entries[freeList].hashCode = num;
    this.entries[freeList].next = this.buckets[index];
    this.entries[freeList].key = key;
    this.entries[freeList].value = value;
    this.buckets[index] = freeList;
    this.version++;
}
count is an internal member of this class which is incremented each time you insert an item into the dictionary, so I believe that there is no difference at all.
The second version can sometimes be faster. The point is that the condition is re-evaluated after every iteration, so if e.g. the getter of Count actually counts the elements of an IEnumerable, or queries a database, etc., this will slow things down.
So I'd say that if you don't affect the value of Count inside the for loop, the second version is safer.
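Tying this back to the Temp example above, hoisting the count freezes the loop bound (a sketch reusing that class):
var t = new Temp() { Count = 5 };
var count = t.Count;   // evaluated exactly once
for (int i = 0; i < count; i++)
{
    Console.WriteLine(i);
    t.Count--;         // no longer affects the loop bound
}
// Prints 0, 1, 2, 3, 4, unlike the earlier loop, which stopped after 2.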
