Fastest way to count number of uppercase characters in c# - c#

Any thoughts on the efficiency of this? ...
CommentText.ToCharArray().Where(c => c >= 'A' && c <= 'Z').Count()

Ok, just knocked up some code to time your method against this:
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (char.IsUpper(s[i])) count++;
}
The result:
Yours: 19737 ticks
Mine: 118 ticks
Pretty big difference! Sometimes the most straight-forward way is the most efficient.
Edit
Just out of interest, this:
int count = s.Count(c => char.IsUpper(c));
Comes in at at around 2500 ticks. So for a "Linqy" one-liner it's pretty quick.

First there is no reason you need to call ToCharArray() since, assuming CommentText is a string it is already an IEnumerable<char>. Second, you should probably be calling char.IsUpper instead of assuming you are only dealing with ASCII values. The code should really look like,
CommentText.Count(char.IsUpper)
Third, if you are worried about speed there isn't much that can beat the old for loop,
int count = 0;
for (int i = 0; i < CommentText.Length; i++)
if (char.IsUpper(CommentText[i]) count++;
In general, calling any method is going to be slower than inlining the code but this kind of optimization should only be done if you are absolutely sure this is the bottle-neck in your code.

You're only counting standard ASCII and not ÃÐÊ etc.
How about
CommentText.ToCharArray().Where(c => Char.IsUpper(c)).Count()

Without even testing I'd say
int count = 0;
foreach (char c in commentText)
{
if (Char.IsUpper(c))
count++;
}
is faster, off now to test it.

What you are doing with that code is to create a collection with the characters, then create a new collection containing only the uppercase characters, then loop through that collection only to find out how many there are.
This will perform better (but still not quite as good as a plain loop), as it doesn't create the intermediate collections:
CommentText.Count(c => Char.IsUpper(c))
Edit: Removed the ToCharArray call also, as Matt suggested.

I've got this
Regex x = new Regex("[A-Z]{1}",
RegexOptions.Compiled | RegexOptions.CultureInvariant);
int c = x.Matches(s).Count;
but I don't know if its particularly quick. It won't get special chars either, I s'pose
EDIT:
Quick comparison to this question's answer. Debug in vshost, 10'000 iterations with the string:
aBcDeFGHi1287jKK6437628asghwHllmTbynerA
The answer: 20-30 ms
The regex solution: 140-170 ms

Related

C# Finding an element in a List

Let's say I have the following C# code
var my_list = new List<string>();
// Filling the list with tons of sentences.
string sentence = Console.ReadLine();
Is there any difference between doing either of the following ?
bool c1 = my_list.Contains(sentence);
bool c2 = my_list.Any(s => s == sentence);
I imagine the pure algorithmic behind isn't exactly the same. But what are the actual differences on my side? Is one way faster or more efficient than the other? Will one method sometime return true and the other false? What should I consider to pick one method or the other? Or is it purely up to me and both work in any situation?
The most upvoted answer isn't completely correct (and it's a reason big O doesn't always work). Any will be slower than Contains in this scenario (by about double).
Any will have an extra call every iteration, the delegate you specified on every item in your list, something contain does not have to do. An extra call will slow it down substantially.
The results will be the same, but the speed will be very different.
Example benchmark:
Stopwatch watch = new Stopwatch();
List<string> stringList = new List<string>();
for (int i = 0; i < 10000000; i++)
{
stringList.Add(i.ToString());
}
int t = 0;
watch.Start();
for (int i = 0; i < 1000000; i++)
if (stringList.Any(x => x == "29"))
t = i;
watch.Stop();
("Any takes: " + watch.ElapsedMilliseconds).Dump();
GC.Collect();
watch.Restart();
for (int i = 0; i < 1000000; i++)
if (stringList.Contains("29"))
t = i;
watch.Stop();
("Contains takes: " + watch.ElapsedMilliseconds).Dump();
Results:
Any takes: 481
Contains takes: 235
Size and amount of iterations will not effect the % difference, Any will always be slower.
Realistically, the two will operate in almost the same fashion: iterate the list's items and check to see if sentence matches any list elements, giving a complexity of about O(n). I would argue List.Contains since that is a little easier and more natural, but it's entirely preferential!
Now, if you're looking for something faster in terms of lookup complexity and speed, I'd suggest a HashSet<T>. HashSets have, generally speaking, a lookup of about O(1) since the hashing function, theoretically, should be a constant time operation. Again, just a suggestion :)
For string objects, there's no difference, since the == operator simply calls String.Equals.
However, for other objects, there could be differences between == and .Equals - looking at the implementation of .Contains, it will use the EqualityComparer<T>.Default, which hooks into Equals(T) as long as you class implements IEquatable<T> (where T is itself). Without overloading ==, most classes instead use referential comparison for == since that's what they inherit from Object.

What would be the shortest way to sum up the digits in odd and even places separately

I've always loved reducing number of code lines by using simple but smart math approaches. This situation seems to be one of those that need this approach. So what I basically need is to sum up digits in the odd and even places separately with minimum code. So far this is the best way I have been able to think of:
string number = "123456789";
int sumOfDigitsInOddPlaces=0;
int sumOfDigitsInEvenPlaces=0;
for (int i=0;i<number.length;i++){
if(i%2==0)//Means odd ones
sumOfDigitsInOddPlaces+=number[i];
else
sumOfDigitsInEvenPlaces+=number[i];
{
//The rest is not important
Do you have a better idea? Something without needing to use if else
int* sum[2] = {&sumOfDigitsInOddPlaces,&sumOfDigitsInEvenPlaces};
for (int i=0;i<number.length;i++)
{
*(sum[i&1])+=number[i];
}
You could use two separate loops, one for the odd indexed digits and one for the even indexed digits.
Also your modulus conditional may be wrong, you're placing the even indexed digits (0,2,4...) in the odd accumulator. Could just be that you're considering the number to be 1-based indexing with the number array being 0-based (maybe what you intended), but for algorithms sake I will consider the number to be 0-based.
Here's my proposition
number = 123456789;
sumOfDigitsInOddPlaces=0;
sumOfDigitsInEvenPlaces=0;
//even digits
for (int i = 0; i < number.length; i = i + 2){
sumOfDigitsInEvenPlaces += number[i];
}
//odd digits, note the start at j = 1
for (int j = 1; i < number.length; i = i + 2){
sumOfDigitsInOddPlaces += number[j];
}
On the large scale this doesn't improve efficiency, still an O(N) algorithm, but it eliminates the branching
Since you added C# to the question:
var numString = "123456789";
var odds = numString.Split().Where((v, i) => i % 2 == 1);
var evens = numString.Split().Where((v, i) => i % 2 == 0);
var sumOfOdds = odds.Select(int.Parse).Sum();
var sumOfEvens = evens.Select(int.Parse).Sum();
Do you like Python?
num_string = "123456789"
odds = sum(map(int, num_string[::2]))
evens = sum(map(int, num_string[1::2]))
This Java solution requires no if/else, has no code duplication and is O(N):
number = "123456789";
int[] sums = new int[2]; //sums[0] == sum of even digits, sums[1] == sum of odd
for(int arrayIndex=0; arrayIndex < 2; ++arrayIndex)
{
for (int i=0; i < number.length()-arrayIndex; i += 2)
{
sums[arrayIndex] += Character.getNumericValue(number.charAt(i+arrayIndex));
}
}
Assuming number.length is even, it is quite simple. Then the corner case is to consider the last element if number is uneven.
int i=0;
while(i<number.length-1)
{
sumOfDigitsInEvenPlaces += number[ i++ ];
sumOfDigitsInOddPlaces += number[ i++ ];
}
if( i < number.length )
sumOfDigitsInEvenPlaces += number[ i ];
Because the loop goes over i 2 by 2, if number.length is even, removing 1 does nothing.
If number.length is uneven, it removes the last item.
If number.length is uneven, then the last value of i when exiting the loop is that of the not yet visited last element.
If number.length is uneven, by modulo 2 reasoning, you have to add the last item to sumOfDigitsInEvenPlaces.
This seems slightly more verbose, but also more readable, to me than Anonymous' (accepted) answer. However, benchmarks to come.
Well, the compiler seems to think my code more understandable as well, since he removes it all if I don't print the results (which explains why I kept getting a time of 0 all along...). The other code though is obfuscated enough for even the compiler.
In the end, even with huge arrays, it's pretty hard for clock_t to tell the difference between the two. You get about a third less instructions in the second case, but since everything's in cache (and your running sums even in registers) it doesn't matter much.
For the curious, I've put the disassembly of both versions (compiled from C) here : http://pastebin.com/2fciLEMw

Most efficient way to determine first character in string?

Which of these methods are the most efficient one or is there a better way to do it?
this.returnList[i].Title[0].ToString()
or
this.returnList[i].Title.Substring(0, 1)
They're both very fast:
Char Index
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample[0].ToString();
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.2012243
00:00:00.2207168
00:00:00.2184807
00:00:00.2258847
00:00:00.2296456
00:00:00.2261465
00:00:00.2120131
00:00:00.2221702
00:00:00.2346083
00:00:00.2330840
Substring
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample.Substring(0, 1);
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.3268155
00:00:00.3337077
00:00:00.3439908
00:00:00.3273090
00:00:00.3380794
00:00:00.3400650
00:00:00.3280275
00:00:00.3333719
00:00:00.3295982
00:00:00.3368425
I also agree with BrokenGlass that using the char index is a cleaner way of writing it. Plus if you're doing it 10 trillion times it'll be much faster!
There is a big loophole in your code that may cause problems, depending on what you mean by "first character" and what returnList contains.
C# strings contain UTF-16, which is a variable-length encoding, and if returnList is an array of strings, then returnList[i] might only be one char of a Unicode point. If you want to return the first Unicode grapheme of a string:
string s = returnList[i].Title;
if (string.IsNullOrEmpty(s))
return s;
int charsInGlyph = char.IsSurrogatePair(s, 0) ? 2 : 1;
return s.Substring(0, charsInGlyph);
You can run into the same problems with BOMs, tagged, and combining characters; these are all valid characters but are not meaningful if displayed to a user.
If you want Unicode points or graphemes, not chars, you must use strings; Unicode graphemes can be more than one char.
I don't think it would matter much efficiency wise, but in my opinion the clearer, more idiomatic and hence more maintainable way of returning the first character is using the index operator:
char c = returnList[i].Title[0];
This assumes of course there is at least one character, if that's not a given you have to check for that.
Those should be close to identical in performance.
The expensive part of the operation is to create the string, and there is no more efficient way to do that.
Unless of couse you want to pre-create strings for all possible characters and store in a dictionary, but that would use up a lot of memory for such a trivial task.
returnList[I].Title[0] is much faster as it does not need to create a new string, only accessing a char from the original one.
Of course, it will throw an exception if the string is empty, so you should check that first.
As a rule of thumb, never use strings with a fixed length of 1. that's what char is for.
The performance difference is not likely to matter though, but the better readability certainly will.

Does any one know of a faster method to do String.Split()?

I am reading each line of a CSV file and need to get the individual values in each column. So right now I am just using:
values = line.Split(delimiter);
where line is the a string that holds the values that are seperated by the delimiter.
Measuring the performance of my ReadNextRow method I noticed that it spends 66% on String.Split, so I was wondering if someone knows of a faster method to do this.
Thanks!
The BCL implementation of string.Split is actually quite fast, I've done some testing here trying to out preform it and it's not easy.
But there's one thing you can do and that's to implement this as a generator:
public static IEnumerable<string> GetSplit( this string s, char c )
{
int l = s.Length;
int i = 0, j = s.IndexOf( c, 0, l );
if ( j == -1 ) // No such substring
{
yield return s; // Return original and break
yield break;
}
while ( j != -1 )
{
if ( j - i > 0 ) // Non empty?
{
yield return s.Substring( i, j - i ); // Return non-empty match
}
i = j + 1;
j = s.IndexOf( c, i, l - i );
}
if ( i < l ) // Has remainder?
{
yield return s.Substring( i, l - i ); // Return remaining trail
}
}
The above method is not necessarily faster than string.Split for small strings but it returns results as it finds them, this is the power of lazy evaluation. If you have long lines or need to conserve memory, this is the way to go.
The above method is bounded by the performance of IndexOf and Substring which does too much index of out range checking and to be faster you need to optimize away these and implement your own helper methods. You can beat the string.Split performance but it's gonna take cleaver int-hacking. You can read my post about that here.
It should be pointed out that split() is a questionable approach for parsing CSV files in case you come across commas in the file eg:
1,"Something, with a comma",2,3
The other thing I'll point out without knowing how you profiled is be careful about profiling this kind of low level detail. The granularity of the Windows/PC timer might come into play and you may have a significant overhead in just looping so use some sort of control value.
That being said, split() is built to handle regular expressions, which are obviously more complex than you need (and the wrong tool to deal with escaped commas anyway). Also, split() creates lots of temporary objects.
So if you want to speed it up (and I have trouble believing that performance of this part is really an issue) then you want to do it by hand and you want to reuse your buffer objects so you're not constantly creating objects and giving the garbage collector work to do in cleaning them up.
The algorithm for that is relatively simple:
Stop at every comma;
When you hit quotes continue until you hit the next set of quotes;
Handle escaped quotes (ie \") and arguably escaped commas (\,).
Oh and to give you some idea of the cost of regex, there was a question (Java not C# but the principle was the same) where someone wanted to replace every n-th character with a string. I suggested using replaceAll() on String. Jon Skeet manually coded the loop. Out of curiosity I compared the two versions and his was an order of magnitude better.
So if you really want performance, it's time to hand parse.
Or, better yet, use someone else's optimized solution like this fast CSV reader.
By the way, while this is in relation to Java it concerns the performance of regular expressions in general (which is universal) and replaceAll() vs a hand-coded loop: Putting char into a java string for each N characters.
Here's a very basic example using ReadOnlySpan. On my machine this takes around 150ns as opposed to string.Split() which takes around 250ns. That's a nice 40% improvement right there.
string serialized = "1577836800;1000;1";
ReadOnlySpan<char> span = serialized.AsSpan();
Trade result = new Trade();
index = span.IndexOf(';');
result.UnixTimestamp = long.Parse(span.Slice(0, index));
span = span.Slice(index + 1);
index = span.IndexOf(';');
result.Price = float.Parse(span.Slice(0, index));
span = span.Slice(index + 1);
index = span.IndexOf(';');
result.Quantity = float.Parse(span.Slice(0, index));
return result;
Note that a ReadOnlySpan.Split() will soon be part of the framework. See
https://github.com/dotnet/runtime/pull/295
Depending on use, you can speed this up by using Pattern.split instead of String.split. If you have this code in a loop (which I assume you probably do since it sounds like you are parsing lines from a file) String.split(String regex) will call Pattern.compile on your regex string every time that statement of the loop executes. To optimize this, Pattern.compile the pattern once outside the loop and then use Pattern.split, passing the line you want to split, inside the loop.
Hope this helps
I found this implementation which is 30% faster from Dejan Pelzel's blog. I qoute from there:
The Solution
With this in mind, I set to create a string splitter that would use an internal buffer similarly to a StringBuilder. It uses very simple logic of going through the string and saving the value parts into the buffer as it goes along.
public int Split(string value, char separator)
{
int resultIndex = 0;
int startIndex = 0;
// Find the mid-parts
for (int i = 0; i < value.Length; i++)
{
if (value[i] == separator)
{
this.buffer[resultIndex] = value.Substring(startIndex, i - startIndex);
resultIndex++;
startIndex = i + 1;
}
}
// Find the last part
this.buffer[resultIndex] = value.Substring(startIndex, value.Length - startIndex);
resultIndex++;
return resultIndex;
How To Use
The StringSplitter class is incredibly simple to use as you can see in the example below. Just be careful to reuse the StringSplitter object and not create a new instance of it in loops or for a single time use. In this case it would be better to juse use the built in String.Split.
var splitter = new StringSplitter(2);
splitter.Split("Hello World", ' ');
if (splitter.Results[0] == "Hello" && splitter.Results[1] == "World")
{
Console.WriteLine("It works!");
}
The Split methods returns the number of items found, so you can easily iterate through the results like this:
var splitter = new StringSplitter(2);
var len = splitter.Split("Hello World", ' ');
for (int i = 0; i < len; i++)
{
Console.WriteLine(splitter.Results[i]);
}
This approach has advantages and disadvantages.
You might think that there are optimizations to be had, but the reality will be you'll pay for them elsewhere.
You could, for example, do the split 'yourself' and walk through all the characters and process each column as you encounter it, but you'd be copying all the parts of the string in the long run anyhow.
One of the optimizations we could do in C or C++, for example, is replace all the delimiters with '\0' characters, and keep pointers to the start of the column. Then, we wouldn't have to copy all of the string data just to get to a part of it. But this you can't do in C#, nor would you want to.
If there is a big difference between the number of columns that are in the source, and the number of columns that you need, walking the string manually may yield some benefit. But that benefit would cost you the time to develop it and maintain it.
I've been told that 90% of the CPU time is spent in 10% of the code. There are variations to this "truth". In my opinion, spending 66% of your time in Split is not that bad if processing CSV is the thing that your app needs to do.
Dave
Some very thorough analysis on String.Slit() vs Regex and other methods.
We are talking ms savings over very large strings though.
The main problem(?) with String.Split is that it's general, in that it caters for many needs.
If you know more about your data than Split would, it can make an improvement to make your own.
For instance, if:
You don't care about empty strings, so you don't need to handle those any special way
You don't need to trim strings, so you don't need to do anything with or around those
You don't need to check for quoted commas or quotes
You don't need to handle quotes at all
If any of these are true, you might see an improvement by writing your own more specific version of String.Split.
Having said that, the first question you should ask is whether this actually is a problem worth solving. Is the time taken to read and import the file so long that you actually feel this is a good use of your time? If not, then I would leave it alone.
The second question is why String.Split is using that much time compared to the rest of your code. If the answer is that the code is doing very little with the data, then I would probably not bother.
However, if, say, you're stuffing the data into a database, then 66% of the time of your code spent in String.Split constitutes a big big problem.
CSV parsing is actually fiendishly complex to get right, I used classes based on wrapping the ODBC Text driver the one and only time I had to do this.
The ODBC solution recommended above looks at first glance to be basically the same approach.
I thoroughly recommend you do some research on CSV parsing before you get too far down a path that nearly-but-not-quite works (all too common). The Excel thing of only double-quoting strings that need it is one of the trickiest to deal with in my experience.
As others have said, String.Split() will not always work well with CSV files. Consider a file that looks like this:
"First Name","Last Name","Address","Town","Postcode"
David,O'Leary,"12 Acacia Avenue",London,NW5 3DF
June,Robinson,"14, Abbey Court","Putney",SW6 4FG
Greg,Hampton,"",,
Stephen,James,"""Dunroamin"" 45 Bridge Street",Bristol,BS2 6TG
(e.g. inconsistent use of speechmarks, strings including commas and speechmarks, etc)
This CSV reading framework will deal with all of that, and is also very efficient:
LumenWorks.Framework.IO.Csv by Sebastien Lorien
This is my solution:
Public Shared Function FastSplit(inputString As String, separator As String) As String()
Dim kwds(1) As String
Dim k = 0
Dim tmp As String = ""
For l = 1 To inputString.Length - 1
tmp = Mid(inputString, l, 1)
If tmp = separator Then k += 1 : tmp = "" : ReDim Preserve kwds(k + 1)
kwds(k) &= tmp
Next
Return kwds
End Function
Here is a version with benchmarking:
Public Shared Function FastSplit(inputString As String, separator As String) As String()
Dim sw As New Stopwatch
sw.Start()
Dim kwds(1) As String
Dim k = 0
Dim tmp As String = ""
For l = 1 To inputString.Length - 1
tmp = Mid(inputString, l, 1)
If tmp = separator Then k += 1 : tmp = "" : ReDim Preserve kwds(k + 1)
kwds(k) &= tmp
Next
sw.Stop()
Dim fsTime As Long = sw.ElapsedTicks
sw.Start()
Dim strings() As String = inputString.Split(separator)
sw.Stop()
Debug.Print("FastSplit took " + fsTime.ToString + " whereas split took " + sw.ElapsedTicks.ToString)
Return kwds
End Function
Here are some results on relatively small strings but with varying sizes, up to 8kb blocks. (times are in ticks)
FastSplit took 8 whereas split took 10
FastSplit took 214 whereas split took 216
FastSplit took 10 whereas split took 12
FastSplit took 8 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 10 whereas split took 12
FastSplit took 7 whereas split took 9
FastSplit took 6 whereas split took 8
FastSplit took 5 whereas split took 7
FastSplit took 10 whereas split took 13
FastSplit took 9 whereas split took 232
FastSplit took 7 whereas split took 8
FastSplit took 8 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 215 whereas split took 217
FastSplit took 10 whereas split took 231
FastSplit took 8 whereas split took 10
FastSplit took 8 whereas split took 10
FastSplit took 7 whereas split took 9
FastSplit took 8 whereas split took 10
FastSplit took 10 whereas split took 1405
FastSplit took 9 whereas split took 11
FastSplit took 8 whereas split took 10
Also, I know someone will discourage my use of ReDim Preserve instead of using a list... The reason is, the list really didn't provide any speed difference in my benchmarks so I went back to the "simple" way.
public static unsafe List<string> SplitString(char separator, string input)
{
List<string> result = new List<string>();
int i = 0;
fixed(char* buffer = input)
{
for (int j = 0; j < input.Length; j++)
{
if (buffer[j] == separator)
{
buffer[i] = (char)0;
result.Add(new String(buffer));
i = 0;
}
else
{
buffer[i] = buffer[j];
i++;
}
}
buffer[i] = (char)0;
result.Add(new String(buffer));
}
return result;
}
You can assume that String.Split will be close to optimal; i.e. it could be quite hard to improve on it. By far the easier solution is to check whether you need to split the string at all. It's quite likely that you'll be using the individual strings directly. If you define a StringShim class (reference to String, begin & end index) you'll be able to split a String into a set of shims instead. These will have a small, fixed size, and will not cause string data copies.
String.split is rather slow, if you want some faster methods, here you go. :)
However CSV is much better parsed by a rule based parser.
This guy, has made a rule based tokenizer for java. (requires some copy and pasting unfortunately)
http://www.csdgn.org/code/rule-tokenizer
private static final String[] fSplit(String src, char delim) {
ArrayList<String> output = new ArrayList<String>();
int index = 0;
int lindex = 0;
while((index = src.indexOf(delim,lindex)) != -1) {
output.add(src.substring(lindex,index));
lindex = index+1;
}
output.add(src.substring(lindex));
return output.toArray(new String[output.size()]);
}
private static final String[] fSplit(String src, String delim) {
ArrayList<String> output = new ArrayList<String>();
int index = 0;
int lindex = 0;
while((index = src.indexOf(delim,lindex)) != -1) {
output.add(src.substring(lindex,index));
lindex = index+delim.length();
}
output.add(src.substring(lindex));
return output.toArray(new String[output.size()]);
}

In .NET, which loop runs faster, 'for' or 'foreach'?

In C#/VB.NET/.NET, which loop runs faster, for or foreach?
Ever since I read that a for loop works faster than a foreach loop a long time ago I assumed it stood true for all collections, generic collections, all arrays, etc.
I scoured Google and found a few articles, but most of them are inconclusive (read comments on the articles) and open ended.
What would be ideal is to have each scenario listed and the best solution for the same.
For example (just an example of how it should be):
for iterating an array of 1000+
strings - for is better than foreach
for iterating over IList (non generic) strings - foreach is better
than for
A few references found on the web for the same:
Original grand old article by Emmanuel Schanzer
CodeProject FOREACH Vs. FOR
Blog - To foreach or not to foreach, that is the question
ASP.NET forum - NET 1.1 C# for vs foreach
[Edit]
Apart from the readability aspect of it, I am really interested in facts and figures. There are applications where the last mile of performance optimization squeezed do matter.
Patrick Smacchia blogged about this last month, with the following conclusions:
for loops on List are a bit more than 2 times cheaper than foreach
loops on List.
Looping on array is around 2 times cheaper than looping on List.
As a consequence, looping on array using for is 5 times cheaper
than looping on List using foreach
(which I believe, is what we all do).
First, a counter-claim to Dmitry's (now deleted) answer. For arrays, the C# compiler emits largely the same code for foreach as it would for an equivalent for loop. That explains why for this benchmark, the results are basically the same:
using System;
using System.Diagnostics;
using System.Linq;
class Test
{
const int Size = 1000000;
const int Iterations = 10000;
static void Main()
{
double[] data = new double[Size];
Random rng = new Random();
for (int i=0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
for (int j=0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i=0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum-correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop: {0}", sw.ElapsedMilliseconds);
}
}
Results:
For loop: 16638
Foreach loop: 16529
Next, validation that Greg's point about the collection type being important - change the array to a List<double> in the above, and you get radically different results. Not only is it significantly slower in general, but foreach becomes significantly slower than accessing by index. Having said that, I would still almost always prefer foreach to a for loop where it makes the code simpler - because readability is almost always important, whereas micro-optimisation rarely is.
foreach loops demonstrate more specific intent than for loops.
Using a foreach loop demonstrates to anyone using your code that you are planning to do something to each member of a collection irrespective of its place in the collection. It also shows you aren't modifying the original collection (and throws an exception if you try to).
The other advantage of foreach is that it works on any IEnumerable, where as for only makes sense for IList, where each element actually has an index.
However, if you need to use the index of an element, then of course you should be allowed to use a for loop. But if you don't need to use an index, having one is just cluttering your code.
There are no significant performance implications as far as I'm aware. At some stage in the future it might be easier to adapt code using foreach to run on multiple cores, but that's not something to worry about right now.
Any time there's arguments over performance, you just need to write a small test so that you can use quantitative results to support your case.
Use the StopWatch class and repeat something a few million times, for accuracy. (This might be hard without a for loop):
using System.Diagnostics;
//...
Stopwatch sw = new Stopwatch()
sw.Start()
for(int i = 0; i < 1000000;i ++)
{
//do whatever it is you need to time
}
sw.Stop();
//print out sw.ElapsedMilliseconds
Fingers crossed the results of this show that the difference is negligible, and you might as well just do whatever results in the most maintainable code
It will always be close. For an array, sometimes for is slightly quicker, but foreach is more expressive, and offers LINQ, etc. In general, stick with foreach.
Additionally, foreach may be optimised in some scenarios. For example, a linked list might be terrible by indexer, but it might be quick by foreach. Actually, the standard LinkedList<T> doesn't even offer an indexer for this reason.
My guess is that it will probably not be significant in 99% of the cases, so why would you choose the faster instead of the most appropriate (as in easiest to understand/maintain)?
There are very good reasons to prefer foreach loops over for loops. If you can use a foreach loop, your boss is right that you should.
However, not every iteration is simply going through a list in order one by one. If he is forbidding for, yes that is wrong.
If I were you, what I would do is turn all of your natural for loops into recursion. That'd teach him, and it's also a good mental exercise for you.
There is unlikely to be a huge performance difference between the two. As always, when faced with a "which is faster?" question, you should always think "I can measure this."
Write two loops that do the same thing in the body of the loop, execute and time them both, and see what the difference in speed is. Do this with both an almost-empty body, and a loop body similar to what you'll actually be doing. Also try it with the collection type that you're using, because different types of collections can have different performance characteristics.
Jeffrey Richter on TechEd 2005:
"I have come to learn over the years the C# compiler is basically a liar to me." .. "It lies about many things." .. "Like when you do a foreach loop..." .. "...that is one little line of code that you write, but what the C# compiler spits out in order to do that it's phenomenal. It puts out a try/finally block in there, inside the finally block it casts your variable to an IDisposable interface, and if the cast suceeds it calls the Dispose method on it, inside the loop it calls the Current property and the MoveNext method repeatedly inside the loop, objects are being created underneath the covers. A lot of people use foreach because it's very easy coding, very easy to do.." .. "foreach is not very good in terms of performance, if you iterated over a collection instead by using square bracket notation, just doing index, that's just much faster, and it doesn't create any objects on the heap..."
On-Demand Webcast:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032292286&EventCategory=3&culture=en-US&CountryCode=US
you can read about it in Deep .NET - part 1 Iteration
it's cover the results (without the first initialization) from .NET source code all the way to the disassembly.
for example - Array Iteration with a foreach loop:
and - list iteration with foreach loop:
and the end results:
In cases where you work with a collection of objects, foreach is better, but if you increment a number, a for loop is better.
Note that in the last case, you could do something like:
foreach (int i in Enumerable.Range(1, 10))...
But it certainly doesn't perform better, it actually has worse performance compared to a for.
This should save you:
public IEnumerator<int> For(int start, int end, int step) {
int n = start;
while (n <= end) {
yield n;
n += step;
}
}
Use:
foreach (int n in For(1, 200, 4)) {
Console.WriteLine(n);
}
For greater win, you may take three delegates as parameters.
The differences in speed in a for- and a foreach-loop are tiny when you're looping through common structures like arrays, lists, etc, and doing a LINQ query over the collection is almost always slightly slower, although it's nicer to write! As the other posters said, go for expressiveness rather than a millisecond of extra performance.
What hasn't been said so far is that when a foreach loop is compiled, it is optimised by the compiler based on the collection it is iterating over. That means that when you're not sure which loop to use, you should use the foreach loop - it will generate the best loop for you when it gets compiled. It's more readable too.
Another key advantage with the foreach loop is that if your collection implementation changes (from an int array to a List<int> for example) then your foreach loop won't require any code changes:
foreach (int i in myCollection)
The above is the same no matter what type your collection is, whereas in your for loop, the following will not build if you changed myCollection from an array to a List:
for (int i = 0; i < myCollection.Length, i++)
This has the same two answers as most "which is faster" questions:
1) If you don't measure, you don't know.
2) (Because...) It depends.
It depends on how expensive the "MoveNext()" method is, relative to how expensive the "this[int index]" method is, for the type (or types) of IEnumerable that you will be iterating over.
The "foreach" keyword is shorthand for a series of operations - it calls GetEnumerator() once on the IEnumerable, it calls MoveNext() once per iteration, it does some type checking, and so on. The thing most likely to impact performance measurements is the cost of MoveNext() since that gets invoked O(N) times. Maybe it's cheap, but maybe it's not.
The "for" keyword looks more predictable, but inside most "for" loops you'll find something like "collection[index]". This looks like a simple array indexing operation, but it's actually a method call, whose cost depends entirely on the nature of the collection that you're iterating over. Probably it's cheap, but maybe it's not.
If the collection's underlying structure is essentially a linked list, MoveNext is dirt-cheap, but the indexer might have O(N) cost, making the true cost of a "for" loop O(N*N).
"Are there any arguments I could use to help me convince him the for loop is acceptable to use?"
No, if your boss is micromanaging to the level of telling you what programming language constructs to use, there's really nothing you can say. Sorry.
Every language construct has an appropriate time and place for usage. There is a reason the C# language has a four separate iteration statements - each is there for a specific purpose, and has an appropriate use.
I recommend sitting down with your boss and trying to rationally explain why a for loop has a purpose. There are times when a for iteration block more clearly describes an algorithm than a foreach iteration. When this is true, it is appropriate to use them.
I'd also point out to your boss - Performance is not, and should not be an issue in any practical way - it's more a matter of expression the algorithm in a succinct, meaningful, maintainable manner. Micro-optimizations like this miss the point of performance optimization completely, since any real performance benefit will come from algorithmic redesign and refactoring, not loop restructuring.
If, after a rational discussion, there is still this authoritarian view, it is up to you as to how to proceed. Personally, I would not be happy working in an environment where rational thought is discouraged, and would consider moving to another position under a different employer. However, I strongly recommend discussion prior to getting upset - there may just be a simple misunderstanding in place.
It probably depends on the type of collection you are enumerating and the implementation of its indexer. In general though, using foreach is likely to be a better approach.
Also, it'll work with any IEnumerable - not just things with indexers.
Whether for is faster than foreach is really besides the point. I seriously doubt that choosing one over the other will make a significant impact on your performance.
The best way to optimize your application is through profiling of the actual code. That will pinpoint the methods that account for the most work/time. Optimize those first. If performance is still not acceptable, repeat the procedure.
As a general rule I would recommend to stay away from micro optimizations as they will rarely yield any significant gains. Only exception is when optimizing identified hot paths (i.e. if your profiling identifies a few highly used methods, it may make sense to optimize these extensively).
It is what you do inside the loop that affects perfomance, not the actual looping construct (assuming your case is non-trivial).
The two will run almost exactly the same way. Write some code to use both, then show him the IL. It should show comparable computations, meaning no difference in performance.
In most cases there's really no difference.
Typically you always have to use foreach when you don't have an explicit numerical index, and you always have to use for when you don't actually have an iterable collection (e.g. iterating over a two-dimensional array grid in an upper triangle). There are some cases where you have a choice.
One could argue that for loops can be a little more difficult to maintain if magic numbers start to appear in the code. You should be right to be annoyed at not being able to use a for loop and have to build a collection or use a lambda to build a subcollection instead just because for loops have been banned.
It seems a bit strange to totally forbid the use of something like a for loop.
There's an interesting article here that covers a lot of the performance differences between the two loops.
I would say personally I find foreach a bit more readable over for loops but you should use the best for the job at hand and not have to write extra long code to include a foreach loop if a for loop is more appropriate.
You can really screw with his head and go for an IQueryable .foreach closure instead:
myList.ForEach(c => Console.WriteLine(c.ToString());
for has more simple logic to implement so it's faster than foreach.
Unless you're in a specific speed optimization process, I would say use whichever method produces the easiest to read and maintain code.
If an iterator is already setup, like with one of the collection classes, then the foreach is a good easy option. And if it's an integer range you're iterating, then for is probably cleaner.
Jeffrey Richter talked the performance difference between for and foreach on a recent podcast: http://pixel8.infragistics.com/shows/everything.aspx#Episode:9317
I did test it a while ago, with the result that a for loop is much faster than a foreach loop. The cause is simple, the foreach loop first needs to instantiate an IEnumerator for the collection.
I found the foreach loop which iterating through a List faster. See my test results below. In the code below I iterate an array of size 100, 10000 and 100000 separately using for and foreach loop to measure the time.
private static void MeasureTime()
{
var array = new int[10000];
var list = array.ToList();
Console.WriteLine("Array size: {0}", array.Length);
Console.WriteLine("Array For loop ......");
var stopWatch = Stopwatch.StartNew();
for (int i = 0; i < array.Length; i++)
{
Thread.Sleep(1);
}
stopWatch.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("Array Foreach loop ......");
var stopWatch1 = Stopwatch.StartNew();
foreach (var item in array)
{
Thread.Sleep(1);
}
stopWatch1.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch1.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List For loop ......");
var stopWatch2 = Stopwatch.StartNew();
for (int i = 0; i < list.Count; i++)
{
Thread.Sleep(1);
}
stopWatch2.Stop();
Console.WriteLine("Time take to run the for loop is {0} millisecond", stopWatch2.ElapsedMilliseconds);
Console.WriteLine(" ");
Console.WriteLine("List Foreach loop ......");
var stopWatch3 = Stopwatch.StartNew();
foreach (var item in list)
{
Thread.Sleep(1);
}
stopWatch3.Stop();
Console.WriteLine("Time take to run the foreach loop is {0} millisecond", stopWatch3.ElapsedMilliseconds);
}
UPDATED
After #jgauffin suggestion I used #johnskeet code and found that the for loop with array is faster than following,
Foreach loop with array.
For loop with list.
Foreach loop with list.
See my test results and code below,
private static void MeasureNewTime()
{
var data = new double[Size];
var rng = new Random();
for (int i = 0; i < data.Length; i++)
{
data[i] = rng.NextDouble();
}
Console.WriteLine("Lenght of array: {0}", data.Length);
Console.WriteLine("No. of iteration: {0}", Iterations);
Console.WriteLine(" ");
double correctSum = data.Sum();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < data.Length; j++)
{
sum += data[j];
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with Array: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in data)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with Array: {0}", sw.ElapsedMilliseconds);
Console.WriteLine(" ");
var dataList = data.ToList();
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
for (int j = 0; j < dataList.Count; j++)
{
sum += data[j];
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("For loop with List: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
double sum = 0;
foreach (double d in dataList)
{
sum += d;
}
if (Math.Abs(sum - correctSum) > 0.1)
{
Console.WriteLine("Summation failed");
return;
}
}
sw.Stop();
Console.WriteLine("Foreach loop with List: {0}", sw.ElapsedMilliseconds);
}
A powerful and precise way to measure time is by using the BenchmarkDotNet library.
In the following sample, I did a loop on 1,000,000,000 integer records on for/foreach and measured it with BenchmarkDotNet:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class Program
{
public static void Main()
{
BenchmarkRunner.Run<LoopsBenchmarks>();
}
}
[MemoryDiagnoser]
public class LoopsBenchmarks
{
private List<int> arr = Enumerable.Range(1, 1_000_000_000).ToList();
[Benchmark]
public void For()
{
for (int i = 0; i < arr.Count; i++)
{
int item = arr[i];
}
}
[Benchmark]
public void Foreach()
{
foreach (int item in arr)
{
}
}
}
And here are the results:
Conclusion
In the example above we can see that for loop is slightly faster than foreach loop for lists. We can also see that both use the same memory allocation.
I wouldn't expect anyone to find a "huge" performance difference between the two.
I guess the answer depends on the whether the collection you are trying to access has a faster indexer access implementation or a faster IEnumerator access implementation. Since IEnumerator often uses the indexer and just holds a copy of the current index position, I would expect enumerator access to be at least as slow or slower than direct index access, but not by much.
Of course this answer doesn't account for any optimizations the compiler may implement.

Categories

Resources