This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
// Make and populate a list
List<short> test = new List<short>();
for (int x = 0; x <= 10000000; x++)
{
test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
}
// Initialize result variables
short rMin = short.MaxValue;
short rMax = 0;
// Do min/max normal search
timer.Start();
foreach (var amp in test)
{
rMin = Math.Min(rMin, amp);
rMax = Math.Max(rMax, amp);
}
timer.Stop();
// Display results
Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
// Initialize parallel result variables
short pMin = short.MaxValue;
short pMax = 0;
// Create list partitioner
var rangePortioner = Partitioner.Create(0, test.Count);
// Do min/max parallel search
timer.Restart();
Parallel.ForEach(rangePortioner, (range, loop) =>
{
short min = short.MaxValue;
short max = 0;
for (int i = range.Item1; i < range.Item2; i++)
{
min = Math.Min(min, test[i]);
max = Math.Max(max, test[i]);
}
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
});
timer.Stop();
// Display results
Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit
The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
items,
() => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
(item, state, localMinMax) =>
{
var localMin = Math.Min(item, localMinMax.Min);
var localMax = Math.Max(item, localMinMax.Max);
return (localMin, localMax); // return the new min/max values for this thread
},
localMinMax => // called one last time for each thread used
{
lock(items) // Since this may run concurrently, synchronization is needed
{
globalMin = Math.Min(globalMin, localMinMax.Min);
globalMax = Math.Max(globalMax, localMinMax.Max);
}
});
As you can see, this is quite a bit more complex than a regular loop, and it is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but that is omitted for simplicity, and it looks like the OP is already aware of such issues.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest starting by studying the potential dangers of thread safety; good resources on the topic are fairly easy to find.
Not all problems will be as obviously wrong as this one, and it is quite easy to cause issues that break once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or issues that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types, and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
Interlocked.Exchange only makes the exchange itself atomic; the surrounding Math.Min and Math.Max calls can still race with other threads. You should compute the min/max for each batch separately and then merge the results.
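To sketch what a low-lock merge could look like if you did want to stay with Interlocked (an illustration, not the only way to do it): Interlocked has no overloads for short, so the shared results would have to be widened to int, and the update has to be a CompareExchange retry loop rather than a plain Exchange.
using System.Threading;

static void InterlockedMin(ref int target, int value)
{
    // Keep retrying until either our value is no longer smaller, or we manage to swap it in atomically.
    int current = Volatile.Read(ref target);
    while (value < current)
    {
        int previous = Interlocked.CompareExchange(ref target, value, current);
        if (previous == current)
            break;              // the swap succeeded
        current = previous;     // another thread changed target; re-check against the new value
    }
}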
Using low-lock techniques like the Interlocked class is tricky and advanced. Considering that your multithreading experience is limited, I would say go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
pMin = Math.Min(pMin, min);
pMax = Math.Max(pMax, max);
}
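Plugged into the partitioned loop from the question (same variable names as the OP's code, shown here only to make the placement clear), that would look roughly like this:
object locker = new object();
// ... build the list and the partitioner as before ...
Parallel.ForEach(rangePortioner, (range, loop) =>
{
    short min = short.MaxValue;
    short max = 0;
    for (int i = range.Item1; i < range.Item2; i++)
    {
        min = Math.Min(min, test[i]);
        max = Math.Max(max, test[i]);
    }
    lock (locker)   // merge this partition's local result into the shared result
    {
        pMin = Math.Min(pMin, min);
        pMax = Math.Max(pMax, max);
    }
});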
How expensive are exceptions in C#? It seems like they are not incredibly expensive as long as the stack is not deep; however I have read conflicting reports.
Is there a definitive report that hasn't been rebutted?
Having read that exceptions are costly in terms of performance I threw together a simple measurement program, very similar to the one Jon Skeet published years ago. I mention this here mainly to provide updated numbers.
It took the program below 29914 milliseconds to process one million exceptions, which amounts to 33 exceptions per millisecond. That is fast enough to make exceptions a viable alternative to return codes for most situations.
Please note, though, that with return codes instead of exceptions the same program runs in less than one millisecond, which means exceptions are at least 30,000 times slower than return codes. As stressed by Rico Mariani, these numbers are also minimum numbers. In practice, throwing and catching an exception will take more time.
Measured on a laptop with an Intel Core2 Duo T8100 @ 2.1 GHz with .NET 4.0, in a release build, not run under a debugger (which would make it way slower).
This is my test code:
static void Main(string[] args)
{
int iterations = 1000000;
Console.WriteLine("Starting " + iterations.ToString() + " iterations...\n");
var stopwatch = new Stopwatch();
// Test exceptions
stopwatch.Reset();
stopwatch.Start();
for (int i = 1; i <= iterations; i++)
{
try
{
TestExceptions();
}
catch (Exception)
{
// Do nothing
}
}
stopwatch.Stop();
Console.WriteLine("Exceptions: " + stopwatch.ElapsedMilliseconds.ToString() + " ms");
// Test return codes
stopwatch.Reset();
stopwatch.Start();
int retcode;
for (int i = 1; i <= iterations; i++)
{
retcode = TestReturnCodes();
if (retcode == 1)
{
// Do nothing
}
}
stopwatch.Stop();
Console.WriteLine("Return codes: " + stopwatch.ElapsedMilliseconds.ToString() + " ms");
Console.WriteLine("\nFinished.");
Console.ReadKey();
}
static void TestExceptions()
{
throw new Exception("Failed");
}
static int TestReturnCodes()
{
return 1;
}
I guess I'm in the camp that if performance of exceptions impacts your application then you're throwing WAY too many of them. Exceptions should be for exceptional conditions, not as routine error handling.
That said, my recollection of how exceptions are handled is essentially walking up the stack finding a catch statement that matches the type of the exception thrown. So performance will be impacted most by how deep you are from the catch and how many catch statements you have.
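If you want to see the stack-depth effect for yourself, a quick sketch along these lines (my own illustration, not from the answer above) makes it visible: the deeper the throw site sits below the catch, the longer each throw/catch takes.
using System;
using System.Diagnostics;

class DepthTest
{
    static void ThrowAtDepth(int depth)
    {
        if (depth == 0)
            throw new InvalidOperationException("Failed");
        ThrowAtDepth(depth - 1);   // descend further before throwing
    }

    static void Main()
    {
        const int iterations = 100000;
        foreach (int depth in new[] { 1, 10, 100 })
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                try { ThrowAtDepth(depth); }
                catch (InvalidOperationException) { /* swallow */ }
            }
            sw.Stop();
            Console.WriteLine($"Depth {depth}: {sw.ElapsedMilliseconds} ms");
        }
    }
}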
In my case, exceptions were very expensive. I rewrote this:
public BlockTemplate this[int x,int y, int z]
{
get
{
try
{
return Data.BlockTemplate[World[Center.X + x, Center.Y + y, Center.Z + z]];
}
catch(IndexOutOfRangeException e)
{
return Data.BlockTemplate[BlockType.Air];
}
}
}
Into this:
public BlockTemplate this[int x,int y, int z]
{
get
{
int ix = Center.X + x;
int iy = Center.Y + y;
int iz = Center.Z + z;
if (ix < 0 || ix >= World.GetLength(0)
|| iy < 0 || iy >= World.GetLength(1)
|| iz < 0 || iz >= World.GetLength(2))
return Data.BlockTemplate[BlockType.Air];
return Data.BlockTemplate[World[ix, iy, iz]];
}
}
And I noticed a good speed improvement of about 30 seconds. This function gets called at least 32,000 times at startup. The code isn't as clear about the intention, but the cost savings were huge.
I did my own measurements to find out how serious the performance implications of exceptions were. I didn't try to measure the absolute time for throwing/catching an exception; I was mostly interested in how much slower a loop becomes if an exception is thrown in each pass. The measuring code looks like this:
for(; ; ) {
iValue = Level1(iValue);
lCounter += 1;
if(DateTime.Now >= sFinish)
break;
}
vs.
for(; ; ) {
try {
iValue = Level3Throw(iValue);
}
catch(InvalidOperationException) {
iValue += 3;
}
lCounter += 1;
if(DateTime.Now >= sFinish)
break;
}
The difference is a factor of 20: the second snippet is 20 times slower.
Barebones exception objects in C# are fairly lightweight; it's usually the ability to encapsulate an InnerException that makes it heavy when the object tree becomes too deep.
As for a definitive report, I'm not aware of any, although a cursory dotTrace profile (or any other profiler) for memory consumption and speed will be fairly easy to do.
Just to give my personal experience:
I'm working on a program that parses JSON files and extracts data from them, with Newtonsoft (Json.NET).
I rewrote this:
Option 1, with exceptions
try
{
name = rawPropWithChildren.Value["title"].ToString();
}
catch(System.NullReferenceException)
{
name = rawPropWithChildren.Name;
}
To this:
Option 2, without exceptions
if(rawPropWithChildren.Value["title"] == null)
{
name = rawPropWithChildren.Name;
}
else
{
name = rawPropWithChildren.Value["title"].ToString();
}
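For what it's worth, the same null check can be written more compactly with the null-conditional and null-coalescing operators; this sketch should behave like option 2, assuming the Json.NET indexer returns null for a missing property, as in the original check:
name = rawPropWithChildren.Value["title"]?.ToString() ?? rawPropWithChildren.Name;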
Of course, you don't really have context to judge about it, but here are my results (in debug mode):
Option 1, with exceptions.
38.50 seconds
Option 2, without exceptions.
06.48 seconds
To give a little bit of context, I'm working with thousands of JSON properties that can be null. Exceptions were thrown far too often, maybe in 15% of the iterations. That isn't a precise figure, but they were thrown too many times.
I wanted to fix this, so I changed my code, and at first I did not understand why the execution time was so much shorter. It was because of my poor exception handling.
So, what I've learned from this: use exceptions only in particular cases and for things that can't be tested with a simple conditional statement, and throw them as rarely as possible.
This is kind of a random story for you, but I will definitely think twice before using exceptions in my code from now on!
The performance hit with exceptions seems to be at the point of generating the exception object (albeit too small to cause any concern 90% of the time). The recommendation therefore is to profile your code: if exceptions are causing a performance hit, you write a new high-perf method that does not use exceptions. (An example that comes to mind is TryParse, introduced to overcome the performance issues of Parse, which uses exceptions.)
That said, exceptions do not cause significant performance hits in most situations, so the MS Design Guidelines recommend reporting failures by throwing exceptions.
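For reference, the Parse/TryParse contrast mentioned above looks roughly like this (illustrative sketch; input is just a placeholder string):
string input = "123a";   // placeholder input

// Exception-based: int.Parse throws FormatException on badly formatted input.
int value1;
try
{
    value1 = int.Parse(input);
}
catch (FormatException)
{
    value1 = 0;   // fallback
}

// Exception-free: int.TryParse reports failure through its return value instead of throwing.
if (!int.TryParse(input, out int value2))
{
    value2 = 0;   // fallback
}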
TLDR;
The answer should always start by answering "expensive compared to what?"
Exceptions are likely orders of magnitude faster than any connected service or data call so it's unlikely that avoiding their use is providing a tangible benefit over the improved information and control flow that they provide.
Throwing an exception can be measured in MICROseconds (but it depends on stack depth):
The chart comes from the article ".Net exceptions performance", which also includes the testing code.
Do you generally get what you pay for? Most of the time yes.
Longer Explanation:
I'm quite interested in the origins of this question. As far as I can tell it is residual distaste for marginally useful c++ exceptions. .Net exceptions have a wealth of info in them and allow for neat and tidy code without excessive checks for success and logging. I explain much of the benefit of exceptions in another answer.
In 20 years of programming, I've never removed a throw or a catch to make something faster (not to say that I couldn't, just to say that there was lower hanging fruit and after that, nobody complained).
There is a separate question with competing answers: one catches an exception (the built-in method provides no "Try" variant) and one avoids exceptions.
I decided to do a head-to-head performance comparison of the two. For a smaller number of columns the non-exception version was faster, but the exception version scaled better and eventually outperformed the exception-avoiding version.
The linqpad code for that test is below (including the graph rendering).
The point here, though, is that this idea of "exceptions are slow" raises the question "slower than what?" If a deep-stack exception costs 500 microseconds, does it matter if it occurs in response to a unique constraint violation that took the database 3000 microseconds to produce? In any case, this demonstrates that generalized avoidance of exceptions for performance reasons will not necessarily yield more performant code.
Code for performance test:
void Main()
{
var loopResults = new List<Results>();
var exceptionResults = new List<Results>();
var totalRuns = 10000;
for (var colCount = 1; colCount < 20; colCount++)
{
using (var conn = new SqlConnection(@"Data Source=(localdb)\MSSQLLocalDb;Initial Catalog=master;Integrated Security=True;"))
{
conn.Open();
//create a dummy table where we can control the total columns
var columns = String.Join(",",
(new int[colCount]).Select((item, i) => $"'{i}' as col{i}")
);
var sql = $"select {columns} into #dummyTable";
var cmd = new SqlCommand(sql,conn);
cmd.ExecuteNonQuery();
var cmd2 = new SqlCommand("select * from #dummyTable", conn);
var reader = cmd2.ExecuteReader();
reader.Read();
Func<Func<IDataRecord, String, Boolean>, List<Results>> test = funcToTest =>
{
var results = new List<Results>();
Random r = new Random();
for (var faultRate = 0.1; faultRate <= 0.5; faultRate += 0.1)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var faultCount=0;
for (var testRun = 0; testRun < totalRuns; testRun++)
{
if (r.NextDouble() <= faultRate)
{
faultCount++;
if(funcToTest(reader, "colDNE"))
throw new ApplicationException("Should have thrown false");
}
else
{
for (var col = 0; col < colCount; col++)
{
if(!funcToTest(reader, $"col{col}"))
throw new ApplicationException("Should have thrown true");
}
}
}
stopwatch.Stop();
results.Add(new UserQuery.Results{
ColumnCount = colCount,
TargetNotFoundRate = faultRate,
NotFoundRate = faultCount * 1.0f / totalRuns,
TotalTime=stopwatch.Elapsed
});
}
return results;
};
loopResults.AddRange(test(HasColumnLoop));
exceptionResults.AddRange(test(HasColumnException));
}
}
"Loop".Dump();
loopResults.Dump();
"Exception".Dump();
exceptionResults.Dump();
var combinedResults = loopResults.Join(exceptionResults,l => l.ResultKey, e=> e.ResultKey, (l, e) => new{ResultKey = l.ResultKey, LoopResult=l.TotalTime, ExceptionResult=e.TotalTime});
combinedResults.Dump();
combinedResults
.Chart(r => r.ResultKey, r => r.LoopResult.Milliseconds * 1.0 / totalRuns, LINQPad.Util.SeriesType.Line)
.AddYSeries(r => r.ExceptionResult.Milliseconds * 1.0 / totalRuns, LINQPad.Util.SeriesType.Line)
.Dump();
}
public static bool HasColumnLoop(IDataRecord dr, string columnName)
{
for (int i = 0; i < dr.FieldCount; i++)
{
if (dr.GetName(i).Equals(columnName, StringComparison.InvariantCultureIgnoreCase))
return true;
}
return false;
}
public static bool HasColumnException(IDataRecord r, string columnName)
{
try
{
return r.GetOrdinal(columnName) >= 0;
}
catch (IndexOutOfRangeException)
{
return false;
}
}
public class Results
{
public double NotFoundRate { get; set; }
public double TargetNotFoundRate { get; set; }
public int ColumnCount { get; set; }
public double ResultKey {get => ColumnCount + TargetNotFoundRate;}
public TimeSpan TotalTime { get; set; }
}
I recently measured C# exceptions (throw and catch) in a summation loop that threw an arithmetic overflow on every addition. Throw and catch of arithmetic overflow was around 8.5 microseconds = 117 KiloExceptions/second, on a quad-core laptop.
Exceptions are expensive, but there is more to it when you want to choose between exception and return codes.
Historically the argument was: exceptions ensure that code is forced to handle the situation, whereas return codes can be ignored. I never favoured that argument, since no programmer wants to ignore errors and break their code on purpose, and a good test team or a well-written test suite will not let ignored return codes slip through.
From a modern programming practices point of view, managing exceptions needs to be looked at not only for its cost, but also for its viability.
First
Most front ends are disconnected from the API that throws the exception - for example, a mobile app using a REST API, and the same API can also be used for an Angular-based web frontend.
Either scenario will prefer return codes over exceptions.
Second
Nowadays, attackers routinely probe every web-facing utility. In such a scenario, if they are constantly attacking your app's login API and the app is constantly throwing exceptions, you will end up dealing with thousands of exceptions a day. Of course, many will say the firewall will take care of such attacks, but not everyone spends money on a dedicated firewall or an expensive anti-spam service. It is better that your code is prepared for these scenarios.
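A sketch of that idea for a login check (hypothetical names throughout; the point is only that failure is reported as a value instead of a thrown exception):
public enum LoginResult { Success, InvalidCredentials, LockedOut }

public LoginResult TryLogin(string user, string password)
{
    // IsLockedOut and CheckPassword are hypothetical helpers.
    if (IsLockedOut(user))
        return LoginResult.LockedOut;

    return CheckPassword(user, password)
        ? LoginResult.Success
        : LoginResult.InvalidCredentials;   // no exception, even under a flood of bad attempts
}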
The title may seem a little odd, because I have no idea how to describe this in one sentence.
For the course Algorithms we have to micro-optimize some things; one of them is finding out how deleting from an array works. The assignment is to delete something from an array and re-align the contents so that there are no gaps; I think it is quite similar to how std::vector::erase works in C++.
Because I like the idea of understanding everything low-level, I went a little further and tried to benchmark my solutions. This produced some weird results.
At first, here is a little code that I used:
class Test {
Stopwatch sw;
Obj[] objs;
public Test() {
this.sw = new Stopwatch();
this.objs = new Obj[1000000];
// Fill objs
for (int i = 0; i < objs.Length; i++) {
objs[i] = new Obj(i);
}
}
public void test() {
// Time deletion
sw.Restart();
deleteValue(400000, objs);
sw.Stop();
// Show timings
Console.WriteLine(sw.Elapsed);
}
// Delete function
// value is the to-search-for item in the list of objects
private static void deleteValue(int value, Obj[] list) {
for (int i = 0; i < list.Length; i++) {
if (list[i].Value == value) {
for (int j = i; j < list.Length - 1; j++) {
list[j] = list[j + 1];
//if (list[j + 1] == null) {
// break;
//}
}
list[list.Length - 1] = null;
break;
}
}
}
}
I would just create this class and call the test() method. I did this in a loop for 25 times.
My findings:
The first round takes a lot longer than the other 24; I think this is because of caching, but I am not sure.
When I use a value that is at the start of the list, it has to move more items in memory than when I use a value at the end, yet it still seems to take less time.
Benchtimes differ quite a bit.
When I enable the commented if, performance goes up (10-20%) even if the value I search for is almost at the end of the list (which means the if goes off a lot of times without actually being useful).
I have no idea why these things happen. Can someone explain (some of) them? And if someone who is a pro at this sees this: where can I find more info on doing this in the most efficient way?
Edit after testing:
I did some testing and found some interesting results. I ran the test on an array of a million items, filled with a million objects. I ran that 25 times and recorded the cumulative time in milliseconds. I did that 10 times and took the average as the final value.
When I run the test with my function described just above here I get a score of:
362,1
When I run it with the answer of dbc I get a score of:
846,4
So mine was faster, but then I started to experiment with a half-empty array and things started to get weird. To get rid of the inevitable NullReferenceExceptions I added an extra check to the if (thinking it would hurt performance a bit more), like so:
if (fromItem != null && fromItem.Value != value)
list[to++] = fromItem;
This seemed to not only work, but improve performance dramatically! Now I get a score of:
247,9
The weird thing is, the scores seem too low to be true, but they sometimes spike; this is the set I took the average from:
94, 26, 966, 36, 632, 95, 47, 35, 109, 439
So the extra evaluation seems to improve my performance, despite it doing an extra check. How is this possible?
You are using Stopwatch to time your method. This calculates the total clock time taken during your method call, which could include the time required for .Net to initially JIT your method, interruptions for garbage collection, or slowdowns caused by system loads from other processes. Noise from these sources will likely dominate noise due to cache misses.
This answer gives some suggestions as to how you can minimize some of the noise from garbage collection or other processes. To eliminate JIT noise, you should call your method once without timing it -- or show the time taken by the first call in a separate column in your results table since it will be so different. You might also consider using a proper profiler which will report exactly how much time your code used exclusive of "noise" from other threads or processes.
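A sketch of those suggestions applied to the test harness from the question (assuming the same Test class; the GC calls are just the usual noise-reduction ritual):
// Warm-up: pay the JIT cost once, outside the timed runs.
new Test().test();

// Try to avoid a garbage collection landing in the middle of a timed run.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

for (int run = 0; run < 25; run++)
{
    new Test().test();   // the timed runs, as in the question
}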
Finally, I'll note that your algorithm to remove matching items from an array and shift everything else down uses a nested loop, which is not necessary and will access items in the array after the matching index twice. The standard algorithm looks like this:
public static void RemoveFromArray(this Obj[] array, int value)
{
int to = 0;
for (int from = 0; from < array.Length; from++)
{
var fromItem = array[from];
if (fromItem.Value != value)
array[to++] = fromItem;
}
for (; to < array.Length; to++)
{
array[to] = default(Obj);
}
}
However, instead of using the standard algorithm, you might experiment with using Array.Copy() to do the shifting in your version, since (I believe) internally it does the copy in unmanaged code.
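A rough sketch of that variant, assuming the same Obj type from the question:
private static void DeleteValueWithCopy(int value, Obj[] list)
{
    for (int i = 0; i < list.Length; i++)
    {
        if (list[i].Value == value)
        {
            // Shift everything after index i one slot to the left in a single bulk copy.
            Array.Copy(list, i + 1, list, i, list.Length - i - 1);
            list[list.Length - 1] = null;
            break;
        }
    }
}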
In C#/VB.NET/.NET, which loop runs faster, for or foreach?
Ever since I read, a long time ago, that a for loop works faster than a foreach loop, I have assumed it holds true for all collections, generic collections, all arrays, etc.
I scoured Google and found a few articles, but most of them are inconclusive (read comments on the articles) and open ended.
What would be ideal is to have each scenario listed and the best solution for the same.
For example (just an example of how it should be):
for iterating an array of 1000+ strings - for is better than foreach
for iterating over IList (non generic) strings - foreach is better than for
A few references found on the web for the same:
Original grand old article by Emmanuel Schanzer
CodeProject FOREACH Vs. FOR
Blog - To foreach or not to foreach, that is the question
ASP.NET forum - NET 1.1 C# for vs foreach
[Edit]
Apart from the readability aspect of it, I am really interested in facts and figures. There are applications where squeezing out the last mile of performance optimization does matter.
Patrick Smacchia blogged about this last month, with the following conclusions:
for loops on List are a bit more than 2 times cheaper than foreach loops on List.
Looping on array is around 2 times cheaper than looping on List.
As a consequence, looping on array using for is 5 times cheaper than looping on List using foreach (which I believe, is what we all do).
First, a counter-claim to Dmitry's (now deleted) answer. For arrays, the C# compiler emits largely the same code for foreach as it would for an equivalent for loop. That explains why for this benchmark, the results are basically the same:
using System;
using System.Diagnostics;
using System.Linq;
class Test
{
const int Size = 1000000;
const int Iterations = 10000;
static void Main()
{
double[] data = new double[Size];
Random rng = new Random();
for (int i=0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
for (int j=0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop: {0}", sw.ElapsedMilliseconds);
}
}
Results:
For loop: 16638
Foreach loop: 16529
Next, validation that Greg's point about the collection type being important - change the array to a List<double> in the above, and you get radically different results. Not only is it significantly slower in general, but foreach becomes significantly slower than accessing by index. Having said that, I would still almost always prefer foreach to a for loop where it makes the code simpler - because readability is almost always important, whereas micro-optimisation rarely is.
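To make the counter-claim at the top concrete: for arrays, the compiler lowers the foreach into an index-based loop, so the summation above is compiled into roughly this (a conceptual sketch, not the exact emitted code):
// foreach (double d in data) { sum += d; }   becomes, for an array, roughly:
for (int index = 0; index < data.Length; index++)
{
    double d = data[index];
    sum += d;
}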
foreach loops demonstrate more specific intent than for loops.
Using a foreach loop demonstrates to anyone using your code that you are planning to do something to each member of a collection irrespective of its place in the collection. It also shows you aren't modifying the original collection (and throws an exception if you try to).
The other advantage of foreach is that it works on any IEnumerable, whereas for only makes sense for IList, where each element actually has an index.
However, if you need to use the index of an element, then of course you should be allowed to use a for loop. But if you don't need to use an index, having one is just cluttering your code.
There are no significant performance implications as far as I'm aware. At some stage in the future it might be easier to adapt code using foreach to run on multiple cores, but that's not something to worry about right now.
Any time there's arguments over performance, you just need to write a small test so that you can use quantitative results to support your case.
Use the Stopwatch class and repeat something a few million times, for accuracy. (This might be hard without a for loop):
using System.Diagnostics;
//...
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
//do whatever it is you need to time
}
sw.Stop();
//print out sw.ElapsedMilliseconds
Fingers crossed the results of this show that the difference is negligible, and you might as well just do whatever results in the most maintainable code
It will always be close. For an array, sometimes for is slightly quicker, but foreach is more expressive, and offers LINQ, etc. In general, stick with foreach.
Additionally, foreach may be optimised in some scenarios. For example, a linked list might be terrible by indexer, but it might be quick by foreach. Actually, the standard LinkedList<T> doesn't even offer an indexer for this reason.
My guess is that it will probably not be significant in 99% of the cases, so why would you choose the faster instead of the most appropriate (as in easiest to understand/maintain)?
There are very good reasons to prefer foreach loops over for loops. If you can use a foreach loop, your boss is right that you should.
However, not every iteration is simply going through a list in order one by one. If he is forbidding for, yes that is wrong.
If I were you, what I would do is turn all of your natural for loops into recursion. That'd teach him, and it's also a good mental exercise for you.
There is unlikely to be a huge performance difference between the two. As always, when faced with a "which is faster?" question, you should always think "I can measure this."
Write two loops that do the same thing in the body of the loop, execute and time them both, and see what the difference in speed is. Do this with both an almost-empty body, and a loop body similar to what you'll actually be doing. Also try it with the collection type that you're using, because different types of collections can have different performance characteristics.
Jeffrey Richter on TechEd 2005:
"I have come to learn over the years the C# compiler is basically a liar to me." .. "It lies about many things." .. "Like when you do a foreach loop..." .. "...that is one little line of code that you write, but what the C# compiler spits out in order to do that it's phenomenal. It puts out a try/finally block in there, inside the finally block it casts your variable to an IDisposable interface, and if the cast suceeds it calls the Dispose method on it, inside the loop it calls the Current property and the MoveNext method repeatedly inside the loop, objects are being created underneath the covers. A lot of people use foreach because it's very easy coding, very easy to do.." .. "foreach is not very good in terms of performance, if you iterated over a collection instead by using square bracket notation, just doing index, that's just much faster, and it doesn't create any objects on the heap..."
On-Demand Webcast:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032292286&EventCategory=3&culture=en-US&CountryCode=US
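A conceptual sketch of the expansion Richter is describing, for a general IEnumerable collection (collection and Use are placeholders; the exact shape depends on the collection's static type):
// foreach (var item in collection) { Use(item); }   expands to roughly:
var enumerator = collection.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        var item = enumerator.Current;
        Use(item);
    }
}
finally
{
    (enumerator as IDisposable)?.Dispose();   // the try/finally + IDisposable cast he mentions
}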
You can read about it in Deep .NET - Part 1: Iteration.
It covers the results (excluding the first initialization) from the .NET source code all the way down to the disassembly, with charts for array iteration with a foreach loop, list iteration with a foreach loop, and the end results.
In cases where you work with a collection of objects, foreach is better, but if you increment a number, a for loop is better.
Note that in the last case, you could do something like:
foreach (int i in Enumerable.Range(1, 10))...
But it certainly doesn't perform better; it actually performs worse than a for.
This should save you:
public static IEnumerable<int> For(int start, int end, int step) {
int n = start;
while (n <= end) {
yield return n;
n += step;
}
}
Use:
foreach (int n in For(1, 200, 4)) {
Console.WriteLine(n);
}
For a greater win, you could take three delegates as parameters, as sketched below.
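One way that delegate-based variant could look (my sketch, not code from the original answer), mirroring the three clauses of a for statement:
public static IEnumerable<int> For(Func<int> init, Func<int, bool> condition, Func<int, int> step)
{
    for (int n = init(); condition(n); n = step(n))
    {
        yield return n;
    }
}

// Usage:
foreach (int n in For(() => 1, n => n <= 200, n => n + 4))
{
    Console.WriteLine(n);
}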
The differences in speed in a for- and a foreach-loop are tiny when you're looping through common structures like arrays, lists, etc, and doing a LINQ query over the collection is almost always slightly slower, although it's nicer to write! As the other posters said, go for expressiveness rather than a millisecond of extra performance.
What hasn't been said so far is that when a foreach loop is compiled, it is optimised by the compiler based on the collection it is iterating over. That means that when you're not sure which loop to use, you should use the foreach loop - it will generate the best loop for you when it gets compiled. It's more readable too.
Another key advantage with the foreach loop is that if your collection implementation changes (from an int array to a List<int> for example) then your foreach loop won't require any code changes:
foreach (int i in myCollection)
The above is the same no matter what type your collection is, whereas in your for loop, the following will not build if you changed myCollection from an array to a List:
for (int i = 0; i < myCollection.Length; i++)
This has the same two answers as most "which is faster" questions:
1) If you don't measure, you don't know.
2) (Because...) It depends.
It depends on how expensive the "MoveNext()" method is, relative to how expensive the "this[int index]" method is, for the type (or types) of IEnumerable that you will be iterating over.
The "foreach" keyword is shorthand for a series of operations - it calls GetEnumerator() once on the IEnumerable, it calls MoveNext() once per iteration, it does some type checking, and so on. The thing most likely to impact performance measurements is the cost of MoveNext() since that gets invoked O(N) times. Maybe it's cheap, but maybe it's not.
The "for" keyword looks more predictable, but inside most "for" loops you'll find something like "collection[index]". This looks like a simple array indexing operation, but it's actually a method call, whose cost depends entirely on the nature of the collection that you're iterating over. Probably it's cheap, but maybe it's not.
If the collection's underlying structure is essentially a linked list, MoveNext is dirt-cheap, but the indexer might have O(N) cost, making the true cost of a "for" loop O(N*N).
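To illustrate the linked-list case with a concrete (hypothetical) example: LinkedList<T> has no indexer at all, and positional access via LINQ's ElementAt() walks the list from the head each time, so an index-style loop degrades to O(N^2) while foreach stays O(N).
var linked = new LinkedList<int>(Enumerable.Range(0, 50000));

// foreach: each MoveNext() just follows a node reference -- O(N) in total.
long sumForeach = 0;
foreach (int value in linked)
    sumForeach += value;

// Index-style: ElementAt(i) re-walks the list from the head every time -- O(N^2) in total.
long sumIndexed = 0;
for (int i = 0; i < linked.Count; i++)
    sumIndexed += linked.ElementAt(i);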
"Are there any arguments I could use to help me convince him the for loop is acceptable to use?"
No, if your boss is micromanaging to the level of telling you what programming language constructs to use, there's really nothing you can say. Sorry.
Every language construct has an appropriate time and place for usage. There is a reason the C# language has four separate iteration statements - each is there for a specific purpose, and has an appropriate use.
I recommend sitting down with your boss and trying to rationally explain why a for loop has a purpose. There are times when a for iteration block more clearly describes an algorithm than a foreach iteration. When this is true, it is appropriate to use them.
I'd also point out to your boss that performance is not, and should not be, an issue in any practical way - it's more a matter of expressing the algorithm in a succinct, meaningful, maintainable manner. Micro-optimizations like this miss the point of performance optimization completely, since any real performance benefit will come from algorithmic redesign and refactoring, not loop restructuring.
If, after a rational discussion, there is still this authoritarian view, it is up to you as to how to proceed. Personally, I would not be happy working in an environment where rational thought is discouraged, and would consider moving to another position under a different employer. However, I strongly recommend discussion prior to getting upset - there may just be a simple misunderstanding in place.
It probably depends on the type of collection you are enumerating and the implementation of its indexer. In general though, using foreach is likely to be a better approach.
Also, it'll work with any IEnumerable - not just things with indexers.
Whether for is faster than foreach is really beside the point. I seriously doubt that choosing one over the other will make a significant impact on your performance.
The best way to optimize your application is through profiling of the actual code. That will pinpoint the methods that account for the most work/time. Optimize those first. If performance is still not acceptable, repeat the procedure.
As a general rule I would recommend staying away from micro-optimizations, since they will rarely yield any significant gains. The only exception is when optimizing identified hot paths (i.e. if your profiling identifies a few highly used methods, it may make sense to optimize these extensively).
It is what you do inside the loop that affects performance, not the actual looping construct (assuming your case is non-trivial).
The two will run almost exactly the same way. Write some code to use both, then show him the IL. It should show comparable computations, meaning no difference in performance.
In most cases there's really no difference.
Typically you always have to use foreach when you don't have an explicit numerical index, and you always have to use for when you don't actually have an iterable collection (e.g. iterating over a two-dimensional array grid in an upper triangle). There are some cases where you have a choice.
One could argue that for loops can be a little more difficult to maintain if magic numbers start to appear in the code. You would be right to be annoyed at not being able to use a for loop and having to build a collection or use a lambda to build a subcollection instead, just because for loops have been banned.
It seems a bit strange to totally forbid the use of something like a for loop.
There's an interesting article here that covers a lot of the performance differences between the two loops.
I would say personally I find foreach a bit more readable than for loops, but you should use the best tool for the job at hand and not have to write extra-long code to include a foreach loop if a for loop is more appropriate.
You can really screw with his head and go for a .ForEach() call with a lambda closure instead:
myList.ForEach(c => Console.WriteLine(c.ToString()));
for has simpler logic to implement, so it's faster than foreach.
Unless you're in a specific speed optimization process, I would say use whichever method produces the easiest to read and maintain code.
If an iterator is already setup, like with one of the collection classes, then the foreach is a good easy option. And if it's an integer range you're iterating, then for is probably cleaner.
Jeffrey Richter talked the performance difference between for and foreach on a recent podcast: http://pixel8.infragistics.com/shows/everything.aspx#Episode:9317
I did test it a while ago, with the result that a for loop is much faster than a foreach loop. The cause is simple: the foreach loop first needs to instantiate an IEnumerator for the collection.
I found the foreach loop iterating through a List to be faster. See my test results below. In the code below I iterate over an array of size 100, 10000 and 100000 separately, using for and foreach loops, to measure the time.
private static void MeasureTime()
{
var array = new int[10000];
var list = array.ToList();
Console.WriteLine("Array size: {0}", array.Length);
Console.WriteLine("Array For loop ......");
var stopWatch = Stopwatch.StartNew();
for (int i = 0; i < array.Length; i++)
{
Thread.Sleep(1);
}
stopWatch.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("Array Foreach loop ......");
var stopWatch1 = Stopwatch.StartNew();
foreach (var item in array)
{
Thread.Sleep(1);
}
stopWatch1.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch1.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List For loop ......");
var stopWatch2 = Stopwatch.StartNew();
for (int i = 0; i < list.Count; i++)
{
Thread.Sleep(1);
}
stopWatch2.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch2.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List Foreach loop ......");
var stopWatch3 = Stopwatch.StartNew();
foreach (var item in list)
{
Thread.Sleep(1);
}
stopWatch3.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch3.ElapsedMilliseconds);
}
UPDATED
After @jgauffin's suggestion I used @johnskeet's code and found that the for loop with an array is faster than the following:
Foreach loop with array.
For loop with list.
Foreach loop with list.
See my test results and code below,
private static void MeasureNewTime()
{
var data = new double[Size];
var rng = new Random();
for (int i = 0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
Console.WriteLine("Lenght of array: {0}", data.Length);
Console.WriteLine("No. of iteration: {0}", Iterations);
Console.WriteLine(" ");
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with Array: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with Array: {0}", sw.ElapsedMilliseconds);
Console.WriteLine(" ");
var dataList = data.ToList();
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < dataList.Count; j++)
{
sum += dataList[j]; // read from the list being timed, not the original array
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with List: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in dataList)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with List: {0}", sw.ElapsedMilliseconds);
}
A powerful and precise way to measure time is by using the BenchmarkDotNet library.
In the following sample, I did a loop on 1,000,000,000 integer records on for/foreach and measured it with BenchmarkDotNet:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
BenchmarkRunner.Run<LoopsBenchmarks>();
}
}
[MemoryDiagnoser]
public class LoopsBenchmarks
{
private List<int> arr = Enumerable.Range(1, 1_000_000_000).ToList();
[Benchmark]
public void For()
{
for (int i = 0; i < arr.Count; i++)
{
int item = arr[i];
}
}
[Benchmark]
public void Foreach()
{
foreach (int item in arr)
{
}
}
}
And here are the results:
Conclusion
In the example above we can see that the for loop is slightly faster than the foreach loop for lists. We can also see that both have the same memory allocation.
I wouldn't expect anyone to find a "huge" performance difference between the two.
I guess the answer depends on whether the collection you are trying to access has a faster indexer access implementation or a faster IEnumerator access implementation. Since IEnumerator often uses the indexer and just holds a copy of the current index position, I would expect enumerator access to be at least as slow as, or slower than, direct index access, but not by much.
Of course this answer doesn't account for any optimizations the compiler may implement.