C# Double.ToString() performance issue

I have the following method to convert a double array to a List<string>:
static Dest Test(Source s)
{
Dest d = new Dest();
if (s.A24 != null)
{
double[] dd = s.A24;
int cnt = dd.Length;
List<string> lst = new List<string>();
for (int i = 0; i < cnt; i++)
lst.Add(((double)dd[i]).ToString());
d.A24 = lst;
}
else
{
d.A24 = null;
}
return d;
}
Doing a List.Add() in a loop seems like the fastest way according to my benchmarks, beating all the various LINQ and Convert tricks.
Even so, this is really slow: 2400 ms for a million calls (Any CPU, prefer 64-bit). So I was experimenting with various ways to make it faster. Let's assume I cannot cache the source or destination lists, etc.
So anyways, I stumbled across something weird here... if I change the lst.Add() line to cast to a decimal instead of a double, it is much, MUCH faster. 900ms vs 2400ms.
Here are my questions:
1) decimal has greater precision than double, so I shouldn't lose anything in the type cast, correct?
2) why is Decimal.ToString() so much faster than Double.ToString()?
3) is this a reasonable optimization, or am I missing some key detail where this will come back to bite me?
I'm not concerned about using up a little bit more memory, I am only concerned about performance.
Nothing sophisticated for the test data at this point, just using:
s.A24 = new double[] { 1.2, 3.4, 5.6 };

For what it's worth, I ran the following and got different results, with decimal usually taking slightly longer (but both the calls to lst.Add() and number.ToString() being roughly equivalent).
What type of collection is A24 in your code? I wouldn't be surprised if the additional overhead you're seeing is actually in casting or something you're not currently looking at.
var iterations = 1000000;
var lst = new List<string>();
var rnd = new Random();
var dblArray = new double[iterations];
for (var i = 0; i < iterations; i++)
//INTERESTING FINDING FROM THE COMMENTS
//double.ToString() is faster if this line is rnd.NextDouble()
//but decimal.ToString() is faster if hard-coding the value "3.5"
//(despite the overhead of casting to decimal)
dblArray[i] = rnd.NextDouble();
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < iterations; i++)
lst.Add(dblArray[i].ToString());
sw.Stop();
//takes 280-300 MS
Debug.WriteLine("Double loop MS: " + sw.ElapsedMilliseconds);
//reset list
lst = new List<string>();
sw.Restart();
for (var i = 0; i < iterations; i++)
lst.Add(((decimal)dblArray[i]).ToString());
sw.Stop();
//takes 280-320 MS
Debug.WriteLine("Decimal loop MS: " + sw.ElapsedMilliseconds);

A Decimal and a Double are often confused and interchanged, but they are completely different animals at the processor level. If I had to imagine writing the code for Double.ToString(), I can see the problem... it's hard. Comparatively, writing the code for Decimal.ToString() shouldn't be much more difficult than Int32.ToString(). I'm sure if you compare Int32.ToString() to Decimal.ToString() you will find the results are very close.
FYI: Double and Float (Single) are not exact, and many numbers can't be expressed in a Double. In your example, you give 1.2, which is really 1 + 1/5. That can't exist as a true double (even if the VS IDE covers for it). You would get something like 1.1999999999999999. If you want performance, use a Decimal.
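For illustration (not from the original answer), one quick way to see this is to print the value with a round-trip format; the exact digits can vary slightly between runtime versions:
double d = 1.2;
Console.WriteLine(d);                  // 1.2 - default formatting picks a short representation
Console.WriteLine(d.ToString("G17"));  // 1.1999999999999999 - the nearest representable double
Console.WriteLine((decimal)d);         // 1.2 - the double-to-decimal conversion rounds to ~15 significant digits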

2) why is Decimal.ToString() so much faster than Double.ToString()?
Because Double.ToString() is actually much more complex. Compare the core implementations of Decimal.ToString() and Double.ToString(): Decimal.ToString() has fixed precision, whereas Double.ToString() has to produce precision on demand. Double also has its IEEE 754 floating-point definition, which is much more complex than Decimal.
The current Double.ToString() implementation relies on _ecvt on Windows and snprintf on Linux. They're inefficient (especially the Linux implementation). There's an in-progress PR to rewrite Double.ToString() in an efficient way that removes the dependency on _ecvt and snprintf.

Related

Calculating the approximate run time of a for loop

I have a piece of code in my C# Windows Form Application that looks like below:
List<string> RESULT_LIST = new List<string>();
int[] arr = My_LIST.ToArray();
string s = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < arr.Length; i++)
{
int counter = i;
for (int j = 1; j <= arr.Length; j++)
{
counter++;
if (counter == arr.Length)
{
counter = 0;
}
s += arr[counter].ToString();
RESULT_LIST.Add(s);
}
s = "";
}
sw.Stop();
TimeSpan ts = sw.Elapsed;
string elapsedTime = String.Format("{0:00}", ts.TotalMilliseconds * 1000);
MessageBox.Show(elapsedTime);
I use this code to get every combination of the numbers in my list. I have treated My_LIST as if it wraps around, like a circular list. (The original post includes an image that demonstrates the purpose very clearly.)
All I need to do is:
Make a formula to calculate the approximate run time of these two nested for loops, to guess the run time for any length and help the user know the approximate time that he/she must wait.
I have used a C# Stopwatch like this: Stopwatch sw = new Stopwatch(); to show the run time, and below are the results. (Note that in order to reduce the chance of error I've repeated the calculation three times for each length, and the numbers show the time in microseconds - TotalMilliseconds * 1000 - for the first, second and third attempt respectively.)
arr.Length = 400; 127838 - 107251 - 100898
arr.Length = 800; 751282 - 750574 - 739869
arr.Length = 1200; 2320517 - 2136107 - 2146099
arr.Length = 2000; 8502631 - 7554743 - 7635173
Note that there are only one-digit numbers in My_LIST, to make the time of adding numbers to the list approximately equal.
How can I find out the relation between arr.Length and run time?
First, let's suppose you have examined the algorithm and noticed that it appears to be quadratic in the array length. This suggests to us that the time taken to run should be a function of the form
t = A + B·n + C·n²
You've gathered some observations by running the code multiple times with different values for n and measuring t. That's a good approach.
The question now is: what are the best values for A, B and C such that they match your observations closely?
This problem can be solved in a variety of ways; I would suggest to you that the least-squares method of regression would be the place to start, and see if you get good results. There's a page on it here:
www.efunda.com/math/leastsquares/lstsqr2dcurve.cfm
UPDATE: I just looked at your algorithm again and realized it is cubic because you have a quadratic string concat in the inner loop. So this technique might not work so well. I suggest you use StringBuilder to make your algorithm quadratic.
Now, suppose you did not know ahead of time that the problem was quadratic. How would you determine the formula then? A good start would be to graph your points on log scale paper; if they roughly form a straight line then the slope of the line gives you a clue as to the power of the polynomial. If they don't form a straight line -- well, cross that bridge when you come to it.
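For illustration, here is a minimal sketch (not from the answer) of fitting t = A + B·n + C·n² by least squares, solving the 3x3 normal equations with Gauss-Jordan elimination; you would pass in your measured (n, t) pairs:
static double[] FitQuadratic(double[] n, double[] t)
{
    // Build the 3x3 normal equations X'X * coeffs = X't for design-matrix rows [1, n, n^2].
    var xtx = new double[3, 3];
    var xtt = new double[3];
    for (int i = 0; i < n.Length; i++)
    {
        double[] row = { 1.0, n[i], n[i] * n[i] };
        for (int r = 0; r < 3; r++)
        {
            xtt[r] += row[r] * t[i];
            for (int c = 0; c < 3; c++)
                xtx[r, c] += row[r] * row[c];
        }
    }
    // Gauss-Jordan elimination with partial pivoting; leaves xtx diagonal.
    for (int col = 0; col < 3; col++)
    {
        int pivot = col;
        for (int r = col + 1; r < 3; r++)
            if (Math.Abs(xtx[r, col]) > Math.Abs(xtx[pivot, col])) pivot = r;
        for (int c = 0; c < 3; c++) { double tmp = xtx[col, c]; xtx[col, c] = xtx[pivot, c]; xtx[pivot, c] = tmp; }
        double tb = xtt[col]; xtt[col] = xtt[pivot]; xtt[pivot] = tb;
        for (int r = 0; r < 3; r++)
        {
            if (r == col) continue;
            double factor = xtx[r, col] / xtx[col, col];
            for (int c = 0; c < 3; c++) xtx[r, c] -= factor * xtx[col, c];
            xtt[r] -= factor * xtt[col];
        }
    }
    return new[] { xtt[0] / xtx[0, 0], xtt[1] / xtx[1, 1], xtt[2] / xtx[2, 2] }; // { A, B, C }
}
With A, B and C in hand, plugging a new n into A + B·n + C·n² gives the estimated run time.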
Well, you're gonna do some math here.
Since the total number of runs is exactly n² - not O(n²), but exactly n² times - what you could do is keep a counter variable for the number of items processed and use some math to find an estimate:
int numItemProcessed;
long timeElapsed; //read from stop watch
int totalItems = n * n;
double remainingEstimate = ((double)totalItems - numItemProcessed) / numItemProcessed * timeElapsed;
Don't assume the algorithm is necessarily N^2 in time complexity.
Take the averages of your numbers, and plot the best fit on a log-log plot, then measure the gradient. This will give you an idea as to the largest term in the polynomial. (see wikipedia log-log plot)
Once you have that, you can do a least-squares regression to work out the coefficients of the polynomial of the correct order. This will allow an estimate from the data, of the time taken for an unseen problem.
Note: As Eric Lippert said, it depends on what you want to measure - averaging may not be appropriate depending on your use case - the first run time might be more correct.
This method will work for any polynomial algorithm. It will also tell you if the algorithm is polynomial (non-polynomial running times will not give straight lines on the log-log plot).
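As a rough sketch of that approach (using the per-length averages of the timings quoted in the question; the resulting exponent of roughly 2.6 is consistent with the string concatenation pushing the cost above quadratic):
double[] n = { 400, 800, 1200, 2000 };
double[] t = { 111995.7, 747241.7, 2200907.7, 7897515.7 };  // averages of the three runs above
double meanX = 0, meanY = 0;
for (int i = 0; i < n.Length; i++) { meanX += Math.Log(n[i]); meanY += Math.Log(t[i]); }
meanX /= n.Length; meanY /= n.Length;
double num = 0, den = 0;
for (int i = 0; i < n.Length; i++)
{
    double dx = Math.Log(n[i]) - meanX;
    num += dx * (Math.Log(t[i]) - meanY);
    den += dx * dx;
}
Console.WriteLine("Estimated exponent (slope of the log-log fit): " + num / den);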

C# Finding an element in a List

Let's say I have the following C# code
var my_list = new List<string>();
// Filling the list with tons of sentences.
string sentence = Console.ReadLine();
Is there any difference between doing either of the following ?
bool c1 = my_list.Contains(sentence);
bool c2 = my_list.Any(s => s == sentence);
I imagine the pure algorithmic behind isn't exactly the same. But what are the actual differences on my side? Is one way faster or more efficient than the other? Will one method sometime return true and the other false? What should I consider to pick one method or the other? Or is it purely up to me and both work in any situation?
The most upvoted answer isn't completely correct (and it's one reason big O doesn't always tell the whole story). Any will be slower than Contains in this scenario (by about a factor of two).
Any will make an extra call on every iteration - the delegate you specified is invoked on every item in your list - something Contains does not have to do. An extra call slows it down substantially.
The results will be the same, but the speed will be very different.
Example benchmark:
Stopwatch watch = new Stopwatch();
List<string> stringList = new List<string>();
for (int i = 0; i < 10000000; i++)
{
stringList.Add(i.ToString());
}
int t = 0;
watch.Start();
for (int i = 0; i < 1000000; i++)
if (stringList.Any(x => x == "29"))
t = i;
watch.Stop();
("Any takes: " + watch.ElapsedMilliseconds).Dump();
GC.Collect();
watch.Restart();
for (int i = 0; i < 1000000; i++)
if (stringList.Contains("29"))
t = i;
watch.Stop();
("Contains takes: " + watch.ElapsedMilliseconds).Dump();
Results:
Any takes: 481
Contains takes: 235
The size and number of iterations will not affect the percentage difference; Any will always be slower.
Realistically, the two will operate in almost the same fashion: iterate the list's items and check to see if sentence matches any list element, giving a complexity of about O(n). I would argue for List.Contains since that is a little easier and more natural, but it's entirely preferential!
Now, if you're looking for something faster in terms of lookup complexity and speed, I'd suggest a HashSet<T>. HashSets have, generally speaking, a lookup of about O(1) since the hashing function, theoretically, should be a constant time operation. Again, just a suggestion :)
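As a quick sketch of that suggestion (assuming the list's contents don't change between lookups):
var sentences = new HashSet<string>(my_list);   // build once: O(n)
bool c3 = sentences.Contains(sentence);         // each lookup is O(1) on average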
For string objects, there's no difference, since the == operator simply calls String.Equals.
However, for other objects, there could be differences between == and .Equals - looking at the implementation of .Contains, it will use EqualityComparer<T>.Default, which hooks into Equals(T) as long as your class implements IEquatable<T> (where T is itself). Without overloading ==, most classes instead use reference comparison for == since that's what they inherit from Object.

Efficient way of counting specific occurrences during a for loop

I have a loop of about one million values. I want to store just the exponent of each value to display them e.g. in a histogram.
At the moment I'm doing it this way:
int[] histogram = new int[51]; //init all with 0
for(int i = 0; i < 1000000; i++)
{
int exponent = getExponent(getValue(i));
//getExponent(double value) gives the exponent(base 10)
//getValue(int i) gives the value for loop i
if(exponent > 25)
exponent = 25;
if(exponent < -25)
exponent = -25;
histogram[exponent+25]++;
}
Is there a more efficient and elegant way of doing this?
Perhaps without using an array?
Is there a more efficient and elegant way of doing this?
The only more elegant way would be to use Math.Min and Math.Max, but it wouldn't be any more efficient. histogram[Math.Max(0, Math.Min(50, exponent + 25))]++ is fewer characters but no more performant or elegant in my opinion. (A Math.Clamp variant is sketched after this answer.)
Perhaps without using an array?
An array is a raw set of values so is the most straightforward way of storing them.
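If you're on a runtime that has Math.Clamp (.NET Core 2.0 / .NET Standard 2.1 and later), a slightly tidier equivalent is possible; it's no faster, just shorter:
histogram[Math.Clamp(exponent, -25, 25) + 25]++;   // clamps to [-25, 25], then shifts into [0, 50]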
Assuming that getExponent and getValue are optimized, the only way to optimize this further is to use Parallel.For. I don't think the difference will be significant, but I see it as the only way.
As for the array, it is the best indexed low-level data structure that you can use for storing the data.
Use the Parallel library [ using System.Threading.Tasks; ] - note that the increment has to be made thread-safe, e.g. with Interlocked from System.Threading:
int[] histogram = new int[51]; //init all with 0
Action<int> work = (i) =>
{
int exponent = getExponent(getValue(i));
//getExponent(double value) gives the exponent (base 10)
//getValue(int i) gives the value for loop i
if (exponent > 25)
exponent = 25;
if (exponent < -25)
exponent = -25;
Interlocked.Increment(ref histogram[exponent + 25]); //a plain ++ would race across threads
};
Parallel.For(0, 1000000, work);
There's not much to optimize with the conditional and array access. However, if the getExponent function is a time consuming operation, you can cache the results. A lot is really going to depend on what typical data looks like. Profile to see if it helps. Also, someone else mentioned using Parallel. That's worth profiling too as it could make a difference if getValue or getExponent are slow enough to overcome the Parallel and locking overhead.
Important note: using a double is probably a bad idea as a dictionary key, but I'm confused by the math going on and the conversions from doubles to ints.
Either way, the idea here is to cache a calculation. So perhaps you can find a better way if this test proves useful.
int[] histogram = Enumerable.Repeat(0, 51).ToArray();
...
Dictionary<double, int> cache = new Dictionary<double, int>(histogram.Length);
...
for (int i = 0; i < 1000000; i++)
{
double value = getValue(i);
int exponent;
if(!cache.TryGetValue(value, out exponent))
{
exponent = getExponent(value);
cache[value] = exponent;
}
if (exponent > 25)
exponent = 25;
else if (exponent < -25)
exponent = -25;
histogram[exponent + 25]++;
}
Try using a public property to store the values for your exponent, so you can test and check your results.
As far as elegance goes, I would simplify the if conditions with actual functions or subroutines, and create additional abstraction with clear and concise naming.

Optimization of algorithm that convert 8-bit number to char

I need to convert an 8-bit number such as 00001110 to a char. The problem is easy, so I wrote the code and everything is working fine, but now I need to optimize for speed as much as possible.
In test class :
class Program
{
static void Main(string[] args)
{
Random r = new Random();
int[] testTab = new int[8];
Normal n = new Normal();
long time;
Stopwatch watch = new Stopwatch();
watch.Start();
for (int i = 0; i < 9000; i++)
{
for (int j = 0; j < 8; j++)
{
testTab[j] = r.Next(2);
}
n.SetTable(testTab);
n.Decode();
}
watch.Stop();
time = watch.ElapsedTicks;
Console.WriteLine(time);
time = watch.ElapsedMilliseconds;
Console.WriteLine(time);
Console.ReadKey();
}
}
and class with algorithm :
class Normal
{
private int[] _tab = new int[8];
public void SetTable(int[] tab)
{
_tab = tab;
}
public void Decode()
{
char a = ((char)( _tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 +
_tab[6]*64 + _tab[7]*128));
}
}
In the output, for 9000 iterations I get about 2 ms. That is not a long time for 9000 runs, but I have a fast processor in my PC.
The final code will be running on a smartphone, so there is no powerful CPU. In my algorithm I use random data; in the final version I will load data from the camera (so it will take longer) and try to repeat this operation 10 times per second, which is why I need the best time even in the smallest operations.
Is there a faster way to convert byte to char than this?
char a = ((char)( _tab[0]*1 + _tab[1]*2 + _tab[2]*4 + _tab[3]*8 + _tab[4]*16 + _tab[5]*32 + _tab[6]*64 + _tab[7]*128));
tl;dr Your conversion code is already efficient, and is not your bottleneck.
Your benchmarking is flawed. You are not just timing the conversion of binary stored in int[] to integer value. You are also timing the generation of your random data. I expect that the majority of the time is spent generating the random data.
Re-write your benchmarking program to operate on data prepared before you start timing. Make sure that the duration of the test is at least 5 or 10 seconds so that you can generate meaningful answers. If you only run for two milliseconds then the granularity of your timer affects the quality of your results.
Bear in mind that in your real application you will be taking a picture on a camera of a QR code and decoding that. The cost of that is many orders of magnitude greater than the cost of converting the 8 bit int arrays.
Your code to do that conversion is already efficient. Do not seek to optimize it further. Not only is there no need to optimize it, there is little hope for significant gains. For the sake of clarity and conciseness you may well opt to use one of the .net library methods that perform such a conversion, but performance of this part of your program is not an issue.
As an aside, it looks like you need to be converting the 8 bit value to byte, adding these values to a byte array, and then feeding to Encoding.GetString to obtain your text. A cast to UTF-16 char as per your code is not correct.
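A small sketch of that suggestion (assuming the decoded 8-bit values are ASCII codes; requires using System.Text;):
byte[] decodedBytes = { 72, 105 };                     // e.g. the values 01001000 and 01101001
string text = Encoding.ASCII.GetString(decodedBytes);  // "Hi"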
It's worth a try:
var yourString = "00100000";
char yourChar = (char) Convert.ToByte(yourString, 2); // you get ' ' (space)
It may or may not be faster, but it's definitely simpler, more stable and more maintainable.
I ran some tests with different implementations.
The first was @Melnikovl's answer.
The second was mine, where I replaced + with | and * with the << operator.
The third was the author's original solution.
I tested with the modified code and measured only the conversion code.
The first and second solutions showed slightly better performance, but BitConverter was a little more often better, so I think you should choose it (also because of the simplicity of the code):
byte[] bytes = { 1, 1, 1, 1 };
int i = BitConverter.ToInt32(bytes, 0);
char a = (char)i;
Don't forget to check whether the byte array is little or big endian.
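For reference, a rough sketch of the shift/OR variant mentioned above (same int[8] layout as the question, least significant bit first); in practice it measures about the same as the original:
static char Decode(int[] tab)
{
    int value = 0;
    for (int i = 0; i < 8; i++)
        value |= tab[i] << i;   // each tab[i] is 0 or 1
    return (char)value;
}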

In .NET, which loop runs faster, 'for' or 'foreach'?

In C#/VB.NET/.NET, which loop runs faster, for or foreach?
Ever since I read that a for loop works faster than a foreach loop a long time ago I assumed it stood true for all collections, generic collections, all arrays, etc.
I scoured Google and found a few articles, but most of them are inconclusive (read comments on the articles) and open ended.
What would be ideal is to have each scenario listed and the best solution for the same.
For example (just an example of how it should be):
for iterating an array of 1000+ strings - for is better than foreach
for iterating over IList (non-generic) strings - foreach is better than for
A few references found on the web for the same:
Original grand old article by Emmanuel Schanzer
CodeProject FOREACH Vs. FOR
Blog - To foreach or not to foreach, that is the question
ASP.NET forum - NET 1.1 C# for vs foreach
[Edit]
Apart from the readability aspect of it, I am really interested in facts and figures. There are applications where the last mile of squeezed-out performance optimization does matter.
Patrick Smacchia blogged about this last month, with the following conclusions:
for loops on List are a bit more than 2 times cheaper than foreach loops on List.
Looping on array is around 2 times cheaper than looping on List.
As a consequence, looping on array using for is 5 times cheaper than looping on List using foreach (which I believe, is what we all do).
First, a counter-claim to Dmitry's (now deleted) answer. For arrays, the C# compiler emits largely the same code for foreach as it would for an equivalent for loop. That explains why for this benchmark, the results are basically the same:
using System;
using System.Diagnostics;
using System.Linq;
class Test
{
const int Size = 1000000;
const int Iterations = 10000;
static void Main()
{
double[] data = new double[Size];
Random rng = new Random();
for (int i=0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
for (int j=0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop: {0}", sw.ElapsedMilliseconds);
}
}
Results:
For loop: 16638
Foreach loop: 16529
Next, validation that Greg's point about the collection type being important - change the array to a List<double> in the above, and you get radically different results. Not only is it significantly slower in general, but foreach becomes significantly slower than accessing by index. Having said that, I would still almost always prefer foreach to a for loop where it makes the code simpler - because readability is almost always important, whereas micro-optimisation rarely is.
foreach loops demonstrate more specific intent than for loops.
Using a foreach loop demonstrates to anyone using your code that you are planning to do something to each member of a collection irrespective of its place in the collection. It also shows you aren't modifying the original collection (and throws an exception if you try to).
The other advantage of foreach is that it works on any IEnumerable, whereas for only makes sense for IList, where each element actually has an index.
However, if you need to use the index of an element, then of course you should be allowed to use a for loop. But if you don't need to use an index, having one is just cluttering your code.
There are no significant performance implications as far as I'm aware. At some stage in the future it might be easier to adapt code using foreach to run on multiple cores, but that's not something to worry about right now.
Any time there's arguments over performance, you just need to write a small test so that you can use quantitative results to support your case.
Use the StopWatch class and repeat something a few million times, for accuracy. (This might be hard without a for loop):
using System.Diagnostics;
//...
Stopwatch sw = new Stopwatch();
sw.Start();
for(int i = 0; i < 1000000;i ++)
{
//do whatever it is you need to time
}
sw.Stop();
//print out sw.ElapsedMilliseconds
Fingers crossed the results of this show that the difference is negligible, and you might as well just do whatever results in the most maintainable code
It will always be close. For an array, sometimes for is slightly quicker, but foreach is more expressive, and offers LINQ, etc. In general, stick with foreach.
Additionally, foreach may be optimised in some scenarios. For example, a linked list might be terrible by indexer, but it might be quick by foreach. Actually, the standard LinkedList<T> doesn't even offer an indexer for this reason.
My guess is that it will probably not be significant in 99% of the cases, so why would you choose the faster instead of the most appropriate (as in easiest to understand/maintain)?
There are very good reasons to prefer foreach loops over for loops. If you can use a foreach loop, your boss is right that you should.
However, not every iteration is simply going through a list in order one by one. If he is forbidding for, yes that is wrong.
If I were you, what I would do is turn all of your natural for loops into recursion. That'd teach him, and it's also a good mental exercise for you.
There is unlikely to be a huge performance difference between the two. As always, when faced with a "which is faster?" question, you should always think "I can measure this."
Write two loops that do the same thing in the body of the loop, execute and time them both, and see what the difference in speed is. Do this with both an almost-empty body, and a loop body similar to what you'll actually be doing. Also try it with the collection type that you're using, because different types of collections can have different performance characteristics.
Jeffrey Richter on TechEd 2005:
"I have come to learn over the years the C# compiler is basically a liar to me." .. "It lies about many things." .. "Like when you do a foreach loop..." .. "...that is one little line of code that you write, but what the C# compiler spits out in order to do that it's phenomenal. It puts out a try/finally block in there, inside the finally block it casts your variable to an IDisposable interface, and if the cast suceeds it calls the Dispose method on it, inside the loop it calls the Current property and the MoveNext method repeatedly inside the loop, objects are being created underneath the covers. A lot of people use foreach because it's very easy coding, very easy to do.." .. "foreach is not very good in terms of performance, if you iterated over a collection instead by using square bracket notation, just doing index, that's just much faster, and it doesn't create any objects on the heap..."
On-Demand Webcast:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032292286&EventCategory=3&culture=en-US&CountryCode=US
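For a concrete picture of what Richter is describing, this is roughly what the compiler emits for a foreach over an IEnumerable<string> (simplified; arrays and strings get a plain indexed loop instead):
IEnumerable<string> source = new List<string> { "a", "b" };
IEnumerator<string> e = source.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        string s = e.Current;
        Console.WriteLine(s);   // the body of your foreach goes here
    }
}
finally
{
    e.Dispose();   // IEnumerator<T> is IDisposable; non-generic enumerators get an 'as IDisposable' check
}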
You can read about it in Deep .NET - part 1 Iteration.
It covers the results (without the first initialization) from the .NET source code all the way down to the disassembly - for example, array iteration with a foreach loop, list iteration with a foreach loop, and the end results (the charts are in the linked article).
In cases where you work with a collection of objects, foreach is better, but if you increment a number, a for loop is better.
Note that in the last case, you could do something like:
foreach (int i in Enumerable.Range(1, 10))...
But it certainly doesn't perform better; it actually has worse performance compared to a for loop.
This should save you some typing:
public IEnumerable<int> For(int start, int end, int step) {
int n = start;
while (n <= end) {
yield return n;
n += step;
}
}
Use:
foreach (int n in For(1, 200, 4)) {
Console.WriteLine(n);
}
For a greater win, you could take three delegates as parameters.
The differences in speed in a for- and a foreach-loop are tiny when you're looping through common structures like arrays, lists, etc, and doing a LINQ query over the collection is almost always slightly slower, although it's nicer to write! As the other posters said, go for expressiveness rather than a millisecond of extra performance.
What hasn't been said so far is that when a foreach loop is compiled, it is optimised by the compiler based on the collection it is iterating over. That means that when you're not sure which loop to use, you should use the foreach loop - it will generate the best loop for you when it gets compiled. It's more readable too.
Another key advantage with the foreach loop is that if your collection implementation changes (from an int array to a List<int> for example) then your foreach loop won't require any code changes:
foreach (int i in myCollection)
The above is the same no matter what type your collection is, whereas in your for loop, the following will not build if you changed myCollection from an array to a List:
for (int i = 0; i < myCollection.Length; i++)
This has the same two answers as most "which is faster" questions:
1) If you don't measure, you don't know.
2) (Because...) It depends.
It depends on how expensive the "MoveNext()" method is, relative to how expensive the "this[int index]" method is, for the type (or types) of IEnumerable that you will be iterating over.
The "foreach" keyword is shorthand for a series of operations - it calls GetEnumerator() once on the IEnumerable, it calls MoveNext() once per iteration, it does some type checking, and so on. The thing most likely to impact performance measurements is the cost of MoveNext() since that gets invoked O(N) times. Maybe it's cheap, but maybe it's not.
The "for" keyword looks more predictable, but inside most "for" loops you'll find something like "collection[index]". This looks like a simple array indexing operation, but it's actually a method call, whose cost depends entirely on the nature of the collection that you're iterating over. Probably it's cheap, but maybe it's not.
If the collection's underlying structure is essentially a linked list, MoveNext is dirt-cheap, but the indexer might have O(N) cost, making the true cost of a "for" loop O(N*N).
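A small sketch of that linked-list case (LinkedList<T> has no indexer, so indexed-style access has to be simulated with Enumerable.ElementAt, which re-walks from the head each time):
var linked = new LinkedList<int>(Enumerable.Range(0, 10000));
long sumForeach = 0;
foreach (int x in linked)
    sumForeach += x;                       // MoveNext is cheap: O(n) total
long sumIndexed = 0;
for (int i = 0; i < linked.Count; i++)
    sumIndexed += linked.ElementAt(i);     // each ElementAt is O(n), so O(n^2) total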
"Are there any arguments I could use to help me convince him the for loop is acceptable to use?"
No, if your boss is micromanaging to the level of telling you what programming language constructs to use, there's really nothing you can say. Sorry.
Every language construct has an appropriate time and place for usage. There is a reason the C# language has four separate iteration statements - each is there for a specific purpose, and has an appropriate use.
I recommend sitting down with your boss and trying to rationally explain why a for loop has a purpose. There are times when a for iteration block more clearly describes an algorithm than a foreach iteration. When this is true, it is appropriate to use them.
I'd also point out to your boss - Performance is not, and should not be an issue in any practical way - it's more a matter of expression the algorithm in a succinct, meaningful, maintainable manner. Micro-optimizations like this miss the point of performance optimization completely, since any real performance benefit will come from algorithmic redesign and refactoring, not loop restructuring.
If, after a rational discussion, there is still this authoritarian view, it is up to you as to how to proceed. Personally, I would not be happy working in an environment where rational thought is discouraged, and would consider moving to another position under a different employer. However, I strongly recommend discussion prior to getting upset - there may just be a simple misunderstanding in place.
It probably depends on the type of collection you are enumerating and the implementation of its indexer. In general though, using foreach is likely to be a better approach.
Also, it'll work with any IEnumerable - not just things with indexers.
Whether for is faster than foreach is really beside the point. I seriously doubt that choosing one over the other will make a significant impact on your performance.
The best way to optimize your application is through profiling of the actual code. That will pinpoint the methods that account for the most work/time. Optimize those first. If performance is still not acceptable, repeat the procedure.
As a general rule I would recommend to stay away from micro optimizations as they will rarely yield any significant gains. Only exception is when optimizing identified hot paths (i.e. if your profiling identifies a few highly used methods, it may make sense to optimize these extensively).
It is what you do inside the loop that affects performance, not the actual looping construct (assuming your case is non-trivial).
The two will run almost exactly the same way. Write some code to use both, then show him the IL. It should show comparable computations, meaning no difference in performance.
In most cases there's really no difference.
Typically you always have to use foreach when you don't have an explicit numerical index, and you always have to use for when you don't actually have an iterable collection (e.g. iterating over a two-dimensional array grid in an upper triangle). There are some cases where you have a choice.
One could argue that for loops can be a little more difficult to maintain if magic numbers start to appear in the code. You would be right to be annoyed at not being able to use a for loop and having to build a collection or use a lambda to build a subcollection instead, just because for loops have been banned.
It seems a bit strange to totally forbid the use of something like a for loop.
There's an interesting article here that covers a lot of the performance differences between the two loops.
I would say personally I find foreach a bit more readable over for loops but you should use the best for the job at hand and not have to write extra long code to include a foreach loop if a for loop is more appropriate.
You can really screw with his head and go for a List<T>.ForEach call with a lambda instead:
myList.ForEach(c => Console.WriteLine(c.ToString()));
for has simpler logic to implement, so it's faster than foreach.
Unless you're in a specific speed optimization process, I would say use whichever method produces the easiest to read and maintain code.
If an iterator is already setup, like with one of the collection classes, then the foreach is a good easy option. And if it's an integer range you're iterating, then for is probably cleaner.
Jeffrey Richter talked the performance difference between for and foreach on a recent podcast: http://pixel8.infragistics.com/shows/everything.aspx#Episode:9317
I did test it a while ago, with the result that a for loop is much faster than a foreach loop. The cause is simple: the foreach loop first needs to instantiate an IEnumerator for the collection.
I found the foreach loop iterating through a List faster. See my test results below. In the code below I iterate an array of size 100, 10000 and 100000 separately, using a for and a foreach loop, to measure the time.
private static void MeasureTime()
{
var array = new int[10000];
var list = array.ToList();
Console.WriteLine("Array size: {0}", array.Length);
Console.WriteLine("Array For loop ......");
var stopWatch = Stopwatch.StartNew();
for (int i = 0; i < array.Length; i++)
{
Thread.Sleep(1);
}
stopWatch.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("Array Foreach loop ......");
var stopWatch1 = Stopwatch.StartNew();
foreach (var item in array)
{
Thread.Sleep(1);
}
stopWatch1.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch1.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List For loop ......");
var stopWatch2 = Stopwatch.StartNew();
for (int i = 0; i < list.Count; i++)
{
Thread.Sleep(1);
}
stopWatch2.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch2.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List Foreach loop ......");
var stopWatch3 = Stopwatch.StartNew();
foreach (var item in list)
{
Thread.Sleep(1);
}
stopWatch3.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch3.ElapsedMilliseconds);
}
UPDATED
After @jgauffin's suggestion I used @johnskeet's code and found that the for loop with an array is faster than the following:
Foreach loop with array.
For loop with list.
Foreach loop with list.
See my test results and code below.
private static void MeasureNewTime()
{
var data = new double[Size];
var rng = new Random();
for (int i = 0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
Console.WriteLine("Lenght of array: {0}", data.Length);
Console.WriteLine("No. of iteration: {0}", Iterations);
Console.WriteLine(" ");
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with Array: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with Array: {0}", sw.ElapsedMilliseconds);
Console.WriteLine(" ");
var dataList = data.ToList();
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < dataList.Count; j++)
{
sum += dataList[j];
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with List: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in dataList)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with List: {0}", sw.ElapsedMilliseconds);
}
A powerful and precise way to measure time is by using the BenchmarkDotNet library.
In the following sample, I did a loop on 1,000,000,000 integer records on for/foreach and measured it with BenchmarkDotNet:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class Program
{
public static void Main()
{
BenchmarkRunner.Run<LoopsBenchmarks>();
}
}
[MemoryDiagnoser]
public class LoopsBenchmarks
{
private List<int> arr = Enumerable.Range(1, 1_000_000_000).ToList();
[Benchmark]
public void For()
{
for (int i = 0; i < arr.Count; i++)
{
int item = arr[i];
}
}
[Benchmark]
public void Foreach()
{
foreach (int item in arr)
{
}
}
}
And here are the results (the results table from the original post is omitted here):
Conclusion
In the example above we can see that for loop is slightly faster than foreach loop for lists. We can also see that both use the same memory allocation.
I wouldn't expect anyone to find a "huge" performance difference between the two.
I guess the answer depends on whether the collection you are trying to access has a faster indexer access implementation or a faster IEnumerator access implementation. Since IEnumerator often uses the indexer and just holds a copy of the current index position, I would expect enumerator access to be at least as slow as or slower than direct index access, but not by much.
Of course this answer doesn't account for any optimizations the compiler may implement.
