C# Unexpected Performance - Function Calls

C# Unexpected Performance - Function Calls - c#

I'm optimizing every line of code in my application, as performance is key. I'm testing all assumptions, as what I expect is not what I see in reality.
A strange occurrence to me is the performance of function calls. Below are two scenarios. Iterating an integer within the loop, and with a function in the loop. I expected the function call to be slower, however it is faster??
Can anyone explain this? I'm using .NET 4.7.1
Without function: 2808ms
With function 2295ms
UPDATE:
Switching the loops switches the runtime as well - I don't understand why, but will accept it as it is. Running the two different loops in different applications give similar results. I'll assume in the future that a function call won't create any additional overhead
public static int a = 0;
public static void Increment()
{
a = a + 1;
}
static void Main(string[] args)
{
//There were suggestions that the first for loop always runs faster. I have included a 'dummy' for loop here to warm up.
a = 0;
for (int i = 0;i < 1000;i++)
{
a = a + 1;
}
//Normal increment
Stopwatch sw = new Stopwatch();
sw.Start();
a = 0;
for (int i = 0; i < 900000000;i++)
{
a = a + 1;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
//Increment with function
Stopwatch sw2 = new Stopwatch();
sw2.Start();
a = 0;
for (int i = 0; i < 900000000; i++)
{
Increment();
}
sw2.Stop();
Console.WriteLine(sw2.ElapsedMilliseconds);
Console.ReadLine();
}

Related

C# Thread example code working without a lock

I'm having a play with Threads to remind myself how they work, I haven't done any Threading code in ages.
So I thought I'd start with the most basic example, create n threads, get them updating a static int without a lock, so I can see it go wrong. However, it's working? The final value is 500 every time, it should be slightly random as Threads update the value at the same time
I know I'm doing something really stupid here but I can't see what
https://dotnetfiddle.net/w9TK5W
using System;
using System.Threading;
public class Program
{
public class Department
{
public static int a = 0;
public Department()
{
}
public void Inc()
{
a = a +5;
a = a -4;
}
}
public static void Main()
{
int count = 500;
Thread[] threads = new Thread[count];
Department dep = new Department();
for (int i = 0; i < count; i++)
{
Thread t = new Thread(new ThreadStart(dep.Inc));
threads[i] = t;
}
for (int i = 0; i < count; i++)
{
threads[i].Start();
}
Thread.Sleep(2000);
for (int i = 0; i < count; i++)
{
threads[i].Join();
}
Console.WriteLine(Department.a.ToString());
}
}
[Edit, more info]
So I changed the loop to look like this, it now works as expected
int b = a;
b = b+1;
int j=0;
for (int i=0; i<1E5; i++)
{
j += i;
}
a = b;

The computation in each thread is so short that probably each started thread is already ended before the next one actually starts.
This happens just by chance.
You should run in each thread a loop that performs millions of operations on a, and then you will probably detect the expected inconsistency.
The question has been edited with millions of operation which do not alter a, thus by chance again, the very few operations that actually alter a don't happen at the same time.
The question has been edited since the previous remark, but now it fails for a different reason than the expected initial one.
Reading a, waiting a bit, and altering a based on the read value is obviously not atomic.
If the loop simply performed a+=1, you could also see that this apparently trivial operation is not atomic either.

Weird DirectoryInfo.EnumerateFiles() performance pattern?

I was experimenting with DirectoryInfo.EnumerateFiles() and found this weird performance pattern that I can't understand. If I perform several successive enumerations, each successive enumeration takes longer than the previous one by a substantial amount of time. That is weird. But what's even weirder, it's that if I put several searches in a for loop, then with each iteration, the search times reset. Here's my method and my results:
static int i;
static void Main(string[] args)
{
for (int j = 0; j < 15; j++)
{
var sw = new System.Diagnostics.Stopwatch();
i = 0;
sw.Start();
EnumerateFiles();
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString());
i = 0;
sw.Start();
EnumerateFiles();
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString());
i = 0;
sw.Start();
EnumerateFiles();
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString());
i = 0;
sw.Start();
EnumerateFiles();
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString());
Console.WriteLine("====================================");
}
Console.ReadLine();
}
private static void EnumerateFiles()
{
foreach (var item in new System.IO.DirectoryInfo("d:\\aac").EnumerateFiles("*.*", System.IO.SearchOption.AllDirectories))
{
i++;
}
}
And the results:
Total files: 5386
00:00:00.2868080
00:00:00.5720745
00:00:00.8443089
00:00:01.1315225
====================================
00:00:00.2729422
00:00:00.5275304
00:00:00.8259863
00:00:01.0712183
====================================
00:00:00.2457264
00:00:00.4642581
00:00:00.6948112
00:00:00.9178203
====================================
00:00:00.2198666
00:00:00.4503493
00:00:00.6717144
00:00:00.8951899
====================================
00:00:00.2391378
00:00:00.4602923
00:00:00.6767395
00:00:00.9082248
====================================
//last one (i == 15):
00:00:00.2138526
00:00:00.4437129
00:00:00.6626495
00:00:00.8794025
Does anyone know why that is happening?
Now, the answer to the question about why am I doing four searches in one iteration is: I was trying out some things, measuring the performance and stumbled upon this anomaly, and now I want to know why it is like so.

instead of all the sw.Start();
do sw = System.Diagnostics.Stopwatch.StartNew();
and try again
the issue here is your not resetting the stopwatch, your pausing it
you can also call the sw.Reset(); sw.Start(); or sw.Retart() instead

Parallel.Foreach loop gets different result than For loop?

I've
written simple for loop iterating through array and Parallel.ForEach loop doing the same thing. However, resuls I've get are different so I want to ask what the heck is going on? :D
class Program
{
static void Main(string[] args)
{
long creating = 0;
long reading = 0;
long readingParallel = 0;
for (int j = 0; j < 10; j++)
{
Stopwatch timer1 = new Stopwatch();
Random rnd = new Random();
int[] array = new int[100000000];
timer1.Start();
for (int i = 0; i < 100000000; i++)
{
array[i] = rnd.Next(5);
}
timer1.Stop();
long result = 0;
Stopwatch timer2 = new Stopwatch();
timer2.Start();
for (int i = 0; i < 100000000; i++)
{
result += array[i];
}
timer2.Stop();
Stopwatch timer3 = new Stopwatch();
long result2 = 0;
timer3.Start();
Parallel.ForEach(array, (item) =>
{
result2 += item;
});
if (result != result2)
{
Console.WriteLine(result + " - " + result2);
}
timer3.Stop();
creating += timer1.ElapsedMilliseconds;
reading += timer2.ElapsedMilliseconds;
readingParallel += timer3.ElapsedMilliseconds;
}
Console.WriteLine("Create : \t" + creating / 100);
Console.WriteLine("Read: \t\t" + reading / 100);
Console.WriteLine("ReadP: \t\t" + readingParallel / 100);
Console.ReadKey();
}
}
So in the condition I get results:
result = 200009295;
result2 = 35163054;
Is there anything wrong?

The += operator is non-atomic and actually performs multiple operations:
load value at location that result is pointing to, into memory
add array[i] to the in-memory value (I'm simplifying here)
write the result back to result
Since a lot of these add operations will be running in parallel it is not just possible, but likely that there will be races between some of these operations where one thread reads a result value and performs the addition, but before it has the chance to write it back, another thread grabs the old result value (which hasn't yet been updated) and also performs the addition. Then both threads write their respective values to result. Regardless of which one wins the race, you end up with a smaller number than expected.
This is why the Interlocked class exists.
Your code could very easily be fixed:
Parallel.ForEach(array, (item) =>
{
Interlocked.Add(ref result2, item);
});
Don't be surprised if Parallel.ForEach ends up slower than the fully synchronous version in this case though. This is due to the fact that
the amount of work inside the delegate you pass to Parallel.ForEach is very small
Interlocked methods incur a slight but non-negligible overhead, which will be quite noticeable in this particular case

Writing to console char by char, fastest way

In a current project of mine I have to parse a string, and write parts of it to the console. While testing how to do this without too much overhead, I discovered that one way I was testing is actually faster than Console.WriteLine, which is slightly confusing to me.
I'm aware this is not the proper way to benchmark stuff, but I'm usually fine with a rough "this is faster than this", which I can tell after running it a few times.
static void Main(string[] args)
{
var timer = new Stopwatch();
timer.Restart();
Test1("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
timer.Restart();
Test2("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
timer.Restart();
Test3("just a little test string.");
timer.Stop();
Console.WriteLine(timer.Elapsed);
}
static void Test1(string str)
{
Console.WriteLine(str);
}
static void Test2(string str)
{
foreach (var c in str)
Console.Write(c);
Console.Write('\n');
}
static void Test3(string str)
{
using (var stream = new StreamWriter(Console.OpenStandardOutput()))
{
foreach (var c in str)
stream.Write(c);
stream.Write('\n');
}
}
As you can see, Test1 is using Console.WriteLine. My first thought was to simply call Write for every char, see Test2. But this resulted in taking roughly twice as long. My guess would be that it flushes after every write, which makes it slower. So I tried Test3, using a StreamWriter (AutoFlush off), which resulted in being about 25% faster than Test1, and I'm really curious why that is. Or is it that writing to the console can't be benchmarked properly? (noticed some strange data when adding more test cases...)
Can someone enlighten me?
Also, if there's a better way to do this (going though a string and only writing parts of it to the console), feel free to comment on that.

First I agree with the other comments that your test harness leaves something to be desired... I rewrote it and included it below. The result after rewrite post a clear winner:
//Test 1 = 00:00:03.7066514
//Test 2 = 00:00:24.6765818
//Test 3 = 00:00:00.8609692
From this you are correct the buffered stream writer is better than 25% faster. It's faster only because it's buffered. Internally the StreamWriter implementation uses a default buffer size of around 1~4kb (depending on the stream type). If you construct the StreamWriter with an 8-byte buffer (the smallest allowed) you will see most of your performance improvement disappear. You can also see this by using a Flush() call following each write.
Here is the test rewritten to obtain the numbers above:
private static StreamWriter stdout = new StreamWriter(Console.OpenStandardOutput());
static void Main(string[] args)
{
Action<string>[] tests = new Action<string>[] { Test1, Test2, Test3 };
TimeSpan[] timming = new TimeSpan[tests.Length];
// Repeat the entire sequence of tests many times to accumulate the result
for (int i = 0; i < 100; i++)
{
for( int itest =0; itest < tests.Length; itest++)
{
string text = String.Format("just a little test string, test = {0}, iteration = {1}", itest, i);
Action<string> thisTest = tests[itest];
//Clear the console so that each test begins from the same state
Console.Clear();
var timer = Stopwatch.StartNew();
//Repeat the test many times, if this was not using the console
//I would use a much higher number, say 10,000
for (int j = 0; j < 100; j++)
thisTest(text);
timer.Stop();
//Accumulate the result, but ignore the first run
if (i != 0)
timming[itest] += timer.Elapsed;
//Depending on what you are benchmarking you may need to force GC here
}
}
//Now print the results we have collected
Console.Clear();
for (int itest = 0; itest < tests.Length; itest++)
Console.WriteLine("Test {0} = {1}", itest + 1, timming[itest]);
Console.ReadLine();
}
static void Test1(string str)
{
Console.WriteLine(str);
}
static void Test2(string str)
{
foreach (var c in str)
Console.Write(c);
Console.Write('\n');
}
static void Test3(string str)
{
foreach (var c in str)
stdout.Write(c);
stdout.Write('\n');
}

I've ran your test for 10000 times each and the results are the following on my machine:
test1 - 0.6164241
test2 - 8.8143273
test3 - 0.9537039
this is the script I used:
static void Main(string[] args)
{
Test1("just a little test string."); // warm up
GC.Collect(); // compact Heap
GC.WaitForPendingFinalizers(); // and wait for the finalizer queue to empty
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < 10000; i++)
{
Test1("just a little test string.");
}
timer.Stop();
Console.WriteLine(timer.Elapsed);
}

I changed the code to run each test 1000 times.
static void Main(string[] args) {
var timer = new Stopwatch();
timer.Restart();
for (int i = 0; i < 1000; i++)
Test1("just a little test string.");
timer.Stop();
TimeSpan elapsed1 = timer.Elapsed;
timer.Restart();
for (int i = 0; i < 1000; i++)
Test2("just a little test string.");
timer.Stop();
TimeSpan elapsed2 = timer.Elapsed;
timer.Restart();
for (int i = 0; i < 1000; i++)
Test3("just a little test string.");
timer.Stop();
TimeSpan elapsed3 = timer.Elapsed;
Console.WriteLine(elapsed1);
Console.WriteLine(elapsed2);
Console.WriteLine(elapsed3);
Console.Read();
}
My output:
00:00:05.2172738
00:00:09.3893525
00:00:05.9624869

I also ran this one 10000 times and got these results:
00:00:00.6947374
00:00:09.6185047
00:00:00.8006468
Which seems in keeping with what others observed. I was curious why Test3 was slower than Test1, so wrote a fourth test:
timer.Start();
using (var stream = new StreamWriter(Console.OpenStandardOutput()))
{
for (int i = 0; i < testSize; i++)
{
Test4("just a little test string.", stream);
}
}
timer.Stop();
This one reuses the stream for each test, thus avoiding the overhead of recreating it each time. Result:
00:00:00.4090399
Although this is the fastest, it writes all the output at the end of the using block, which may not be what you are after. I would imagine that this approach would chew up more memory as well.

How can I make my function run as fast as "Contains" on an ArrayList?

I can't figure out a discrepancy between the time it takes for the Contains method to find an element in an ArrayList and the time it takes for a small function that I wrote to do the same thing. The documentation states that Contains performs a linear search, so it's supposed to be in O(n) and not any other faster method. However, while the exact values may not be relevant, the Contains method returns in 00:00:00.1087087 seconds while my function takes 00:00:00.1876165. It might not be much, but this difference becomes more evident when dealing with even larger arrays. What am I missing and how should I write my function to match Contains's performances?
I'm using C# on .NET 3.5.
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
for (int i = 0; i < list.Count; i++)
if (list[i].Equals(element)) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
sw.Start();
//Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed);
Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed);
}
}
EDIT:
Okay, now, lads, look:
public partial class Window1 : Window
{
public bool DoesContain(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public bool DoesContain1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public Window1()
{
InitializeComponent();
ArrayList list = new ArrayList();
for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);
Stopwatch sw = new Stopwatch();
long total = 0;
int nr = 100;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain(list,"zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
DoesContain1(list, "zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
total = 0;
for (int i = 0; i < nr; i++)
{
sw.Reset();
sw.Start();
list.Contains("zzz");
total += sw.ElapsedMilliseconds;
}
Console.Out.WriteLine(total / nr);
}
}
I made an average of 100 running times for two versions of my function(forward and backward loop) and for the default Contains function. The times I've got are 136 and
133 milliseconds for my functions and a distant winner of 87 for the Contains version. Well now, if before you could argue that the data was scarce and I based my conclusions on a first, isolated run, what do you say about this test? Not does only on average Contains perform better, but it achieves consistently better results in each run. So, is there some kind of disadvantage in here for 3rd party functions, or what?

First, you're not running it many times and comparing averages.
Second, your method isn't being jitted until it actually runs. So the just in time compile time is added into its execution time.
A true test would run each multiple times and average the results (any number of things could cause one or the other to be slower for run X out of a total of Y), and your assemblies should be pre-jitted using ngen.exe.

As you're using .NET 3.5, why are you using ArrayList to start with, rather than List<string>?
A few things to try:
You could see whether using foreach instead of a for loop helps
You could cache the count:
public bool DoesContain(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
{
if (list[i].Equals(element))
{
return true;
}
return false;
}
}
You could reverse the comparison:
if (element.Equals(list[i]))
While I don't expect any of these to make a significant (positive) difference, they're the next things I'd try.
Do you need to do this containment test more than once? If so, you might want to build a HashSet<T> and use that repeatedly.

I'm not sure if you're allowed to post Reflector code, but if you open the method using Reflector, you can see that's it's essentially the same (there are some optimizations for null values, but your test harness doesn't include nulls).
The only difference that I can see is that calling list[i] does bounds checking on i whereas the Contains method does not.

Using the code below I was able to get the following timings relatively consitently (within a few ms):
1: 190ms DoesContainRev
2: 198ms DoesContainRev1
3: 188ms DoesContainFwd
4: 203ms DoesContainFwd1
5: 199ms Contains
Several things to notice here.
This is run with release compiled code from the commandline. Many people make the mistake of benchmarking code inside the Visual Studio debugging environment, not to say anyone here did but something to be careful of.
The list[i].Equals(element) appears to be just a bit slower than element.Equals(list[i]).
using System;
using System.Diagnostics;
using System.Collections;
namespace ArrayListBenchmark
{
class Program
{
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
const int arrayCount = 10000000;
ArrayList list = new ArrayList(arrayCount);
for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
sw.Start();
DoesContainRev(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainRev1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
DoesContainFwd1(list, "zzz");
sw.Stop();
Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
sw.Reset();
sw.Start();
list.Contains("zzz");
sw.Stop();
Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
sw.Reset();
Console.ReadKey();
}
public static bool DoesContainRev(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainFwd(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (element.Equals(list[i])) return true;
return false;
}
public static bool DoesContainRev1(ArrayList list, object element)
{
int count = list.Count;
for (int i = count - 1; i >= 0; i--)
if (list[i].Equals(element)) return true;
return false;
}
public static bool DoesContainFwd1(ArrayList list, object element)
{
int count = list.Count;
for (int i = 0; i < count; i++)
if (list[i].Equals(element)) return true;
return false;
}
}
}

With a really good optimizer there should not be difference at all, because the semantics seems to be the same. However the existing optimizer can optimize your function not so good as the hardcoded Contains is optimized. Some of the points for optimization:
comparing to a property each time can be slower than counting downwards and comparing against 0
function call itself has its performance penalty
using iterators instead of explicit indexing can be faster (foreach loop instead of plain for)

First, if you are using types you know ahead of time, I'd suggest using generics. So List instead of ArrayList. Underneath the hood, ArrayList.Contains actually does a bit more than what you are doing. The following is from reflector:
public virtual bool Contains(object item)
{
if (item == null)
{
for (int j = 0; j < this._size; j++)
{
if (this._items[j] == null)
{
return true;
}
}
return false;
}
for (int i = 0; i < this._size; i++)
{
if ((this._items[i] != null) && this._items[i].Equals(item))
{
return true;
}
}
return false;
}
Notice that it forks itself on being passed a null value for item. However, since all the values in your example are not null, the additional check on null at the beginning and in the second loop should in theory take longer.
Are you positive you are dealing with fully compiled code? I.e., when your code runs the first time it gets JIT compiled where as the framework is obviously already compiled.

After your Edit, I copied the code and made a few improvements to it.
The difference was not reproducable, it turns out to be a measuring/rounding issue.
To see that, change your runs to this form:
sw.Reset();
sw.Start();
for (int i = 0; i < nr; i++)
{
DoesContain(list,"zzz");
}
total += sw.ElapsedMilliseconds;
Console.WriteLine(total / nr);
I just moved some lines. The JIT issue was insignificant with this numbr of repetitions.

My guess would be that ArrayList is written in C++ and could be taking advantage of some micro-optimizations (note: this is a guess).
For instance, in C++ you can use pointer arithmetic (specifically incrementing a pointer to iterate an array) to be faster than using an index.

using an array structure, you can't search faster than O(n) whithout any additional information.
if you know that the array is sorted, then you can use binary search algorithm and spent only o(log(n))
otherwise you should use a set.

Revised after reading comments:
It does not use some Hash-alogorithm to enable fast lookup.

Use SortedList<TKey,TValue>, Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for fast access based on a key.
var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast
KeyedCollection<TKey, TValue> is also fast and allows indexed lookup, however, it needs to be inherited as it is abstract. Therefore, you need a specific collection. However, with the following you can create a generic KeyedCollection.
public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
this.keyExtractor = keyExtractor;
}
private Func<TValue, TKey> keyExtractor;
protected override TKey GetKeyForItem(TValue value) {
return this.keyExtractor(value);
}
}
The advantage of using the KeyedCollection is that the Add method does not require that a key is specified.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# Unexpected Performance - Function Calls - c#

Related

C# Thread example code working without a lock

Weird DirectoryInfo.EnumerateFiles() performance pattern?

Parallel.Foreach loop gets different result than For loop?

Writing to console char by char, fastest way

How can I make my function run as fast as "Contains" on an ArrayList?

Categories

Resources