Copy an array backwards? Array.Copy? - C#

I have a List<T> that I want to copy to an array backwards: starting at the end of the list, copy maybe 5 items, working backwards towards the front. I could do this with a simple reverse for loop; however, there is probably a faster/more efficient way of doing this, so I thought I should ask. Can I use Array.Copy somehow?
Originally I was using a Queue, as that pops items off in the order I need, but I now need to pop off multiple items at once into an array, and I thought a list would be faster.

Looks like Array.Reverse has a native-code path for reversing an array; it doesn't apply in every case, and when it doesn't, the method falls back to a simple for loop. In my testing, Array.Reverse is very slightly faster than a simple for loop: reversing a 1,000,000-element array 1,000 times takes about 600ms with Array.Reverse versus about 800ms with a for loop.
I wouldn't recommend choosing Array.Reverse for performance, though. It's a very minor difference, which you'll lose the minute you load the result into a List, which loops through the array again. Regardless, you shouldn't worry about performance until you've profiled your app and identified the actual bottlenecks.
public static void Test()
{
    var a = Enumerable.Range(0, 1000000).ToArray();

    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        Array.Reverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed Array.Reverse: " + stopwatch.ElapsedMilliseconds);

    stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        MyReverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed MyReverse: " + stopwatch.ElapsedMilliseconds);
}

private static void MyReverse(int[] a)
{
    // Swap elements from both ends, moving towards the middle.
    int j = a.Length - 1;
    for (int i = 0; i < j; i++, j--)
    {
        int z = a[i];
        a[i] = a[j];
        a[j] = z;
    }
}

It is not possible to do this faster than a simple for loop.

You can accomplish it any number of ways, but the fastest way is to get the elements in exactly the manner you already are. You can use Array.Reverse, Array.Copy, etc., or you can use LINQ and extension methods; both are valid alternatives, but they won't be any faster.
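For reference, the copy-then-reverse variant might look like this (a sketch with a hypothetical helper name; as noted above, don't expect it to beat the loop):
static T[] TailReversed<T>(List<T> list, int count)
{
    var result = new T[count];
    list.CopyTo(list.Count - count, result, 0, count); // copy the last `count` items in their original order
    Array.Reverse(result);                             // then flip them in place
    return result;
}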

In one of your comments:
Currently we are pulling out one result and committing it to a database one at a time
There is a big difference between using a for loop to iterate backwards over a List<T> and committing records to a database one at a time. The former is fine; nobody's endorsing the latter.
Why not just iterate first, to populate an array, and then send that populated array to the database all at once?
var myArray = new T[numItemsYouWantToSend];
int arrayIndex = 0;
for (int i = myList.Count - 1; arrayIndex < myArray.Length; --i)
{
    if (i < 0) break;
    myArray[arrayIndex++] = myList[i];
}
UpdateDatabase(myArray);

Related

Time complexity of Method

I want to learn about big-O. I hope someone can help me count the operations in this method, tell me what its time complexity is, and teach me how to count it. I tried to study on YouTube but I was a bit confused.
static void SelectionSort(int[] data)
{
    int temp, min;
    for (int i = 0; i < data.Length - 1; i++)
    {
        min = i;
        for (int j = 0; j < data.Length; j++)
        {
            if (data[j] < data[min])
            {
                min = j;
            }
            temp = data[min];
            data[min] = data[i];
            data[i] = temp;
        }
    }
}
First of all, in C# this is a method: a method is simply a function that belongs to a class, and this one does.
The time complexity of this algorithm is O(n^2), because of the nested for loops; the algorithm takes around n^2 operations to finish.
For example, if you input an array of length 10 it will take roughly 100 steps. That's not an exact count, but it tells you a lot: with an array of length 100 it will take roughly 10,000 steps, i.e. 100 times longer to finish.
So the lower the time complexity, the faster the algorithm.
To learn about time complexity, check out this video; it will help a lot: https://www.youtube.com/watch?v=6aDHWSNKlVw&t=6s
Time complexity will be O(n^2) for this one. The reason is that you have one loop nested inside another: the outer loop iterates n times, and for each of those iterations the inner loop runs n more times.
https://adrianmejia.com/most-popular-algorithms-time-complexity-every-programmer-should-know-free-online-tutorial-course/#Bubble-sort
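To make the counting concrete, here is a small sketch (a hypothetical counting harness, not part of either answer) that tallies how many times the inner-loop body runs, using the same loop bounds as the question:
static long CountSteps(int n)
{
    long steps = 0;
    for (int i = 0; i < n - 1; i++)      // outer loop: n - 1 passes
    {
        for (int j = 0; j < n; j++)      // inner loop: n passes each time
        {
            steps++;                     // one unit of work per inner iteration
        }
    }
    return steps;                        // n * (n - 1), which grows like n^2
}
// CountSteps(10) returns 90; CountSteps(100) returns 9900 -- roughly 100 times more.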

2D Array vs Array of Arrays in tight loop performance C#

I had a look and couldn't see anything quite answering my question.
I'm not exactly the best at creating accurate 'real life' tests, so I'm not sure if that's the problem here. Basically, I want to create a few simple neural networks, to the effect of Gridworld. Performance of these networks will be critical, and I don't want the hidden layer to be a bottleneck if I can avoid it.
I would rather use more memory and be faster, so I opted for arrays instead of lists (lists carry an extra bounds check on top of the array's own). The arrays aren't always full, but because the if statement (checking whether the element is null) evaluates the same way until the end, it can be branch-predicted, and there is no performance drop from it at all.
My question comes down to how I store the data for the network to process. I figured that because a 2D array stores all its data together, it would be more cache friendly and run faster. But in my mock-up test, an array of arrays performs much better in this scenario.
Some Code:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        for (int j = 0; j < testArray[i].Length; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < data[i].weights.Length; k++) {
                inputTotal += testArray[i][k];
            }
        }
    }
}

private void Run2DArrayTest(float[,] testArray, Data[] data, int maxI, int maxJ)
{
    for (int i = 0; i < maxI; i++) {
        for (int j = 0; j < maxJ; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < maxJ; k++) {
                inputTotal += testArray[i, k];
            }
        }
    }
}
These are the two functions being timed. Each 'creature' has its own network (the first for loop), each network has hidden nodes (the second for loop), and I need to find the sum of the weights for each input (the third loop). In my test I stripped it down, so it's not exactly what my actual code does, but the same number of loop iterations happen (the data variable would have its own 2D array, but I didn't want to risk skewing the results). From this I was trying to get a feel for which one is faster, and to my surprise the array of arrays was.
Code to start the tests:
// Array of Array test
Stopwatch timer = Stopwatch.StartNew();
RunArrayOfArrayTest(arrayOfArrays, dataArrays);
timer.Stop();
Console.WriteLine("Array of Arrays finished in: " + timer.ElapsedTicks);
// 2D Array test
timer = Stopwatch.StartNew();
Run2DArrayTest(array2D, dataArrays, NumberOfNetworks, NumberOfInputNeurons);
timer.Stop();
Console.WriteLine("2D Array finished in: " + timer.ElapsedTicks);
Just wanted to show how I was testing it. The results from this in release mode give me values like:
Array of Arrays finished in: 8972
2D Array finished in: 16376
Can someone explain what I'm doing wrong? Why is the array of arrays so much faster in this situation? Isn't a 2D array stored in one contiguous block, meaning it should be more cache friendly?
Note that I really do need this to be fast, as it needs to sum hundreds of thousands to millions of numbers per frame, and as I said I don't want this to be a problem. I know this can be multithreaded quite easily in the future, because each network is completely separate, and even each node is completely separate.
Last question, I suppose: would something like this be possible to run on the GPU instead? I figure a GPU would not struggle to handle much larger numbers of networks with much larger numbers of input/hidden neurons.
In the CLR, there are two different types of array:
Vectors, which are zero-based, single-dimensional arrays
Arrays, which can have non-zero bases and multiple dimensions
Your "array of arrays" is a "vector of vectors" in CLR terms.
Vectors are significantly faster than arrays, basically. It's possible that arrays could be optimized further in later CLR versions, but I doubt they'll get the same amount of love as vectors, as they're used relatively rarely. There's not a lot you can do to make CLR arrays faster. As you say, they'd be more cache friendly, but they carry this CLR penalty.
You can improve your array-of-arrays code already, however, by only performing the first indexing operation once per row:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        // These don't change in the loops below, so extract them
        var row = testArray[i];
        var inputTotal = data[i].bias;
        var weightLength = data[i].weights.Length;
        for (int j = 0; j < row.Length; j++) {
            for (int k = 0; k < weightLength; k++) {
                inputTotal += row[k];
            }
        }
    }
}
If you want to get the cache friendliness and still use a vector, you could have a single float[] and perform the indexing yourself... but I'd probably start off with the array-of-arrays approach.
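That last idea might look something like this (a sketch, assuming the same Data type as the question and a fixed row width of cols):
private void RunFlatArrayTest(float[] testArray, Data[] data, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        int rowStart = i * cols;                   // compute the row offset once per row
        var inputTotal = data[i].bias;
        for (int k = 0; k < cols; k++) {
            inputTotal += testArray[rowStart + k]; // indexes a vector, so no multi-dimensional array penalty
        }
    }
}
This keeps the data contiguous like the float[,] version, but it's still a vector in CLR terms, so it avoids the penalty described above.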

Binary search slower, what am I doing wrong?

EDIT: It looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
So my problem is this. I have 8000 lists (strings in each list). For each list (ranging in size from 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection count. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, just looping through the other list if it's smaller/bigger
}
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
}
The first version takes about 6 seconds; the other one takes more than 20 (I just stopped it there, because otherwise it would take more than a minute!), and this is on a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well, I have tried three distinct methods of achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
Setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for(int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    for(int j = 0; j < rand.Next(50, 400); ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
The three methods I checked (the code shown is what runs in the inner loop):
The loop:
Note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A). It won't affect your performance, thanks to the list-length check you are doing, but iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
    }
}
First method:
Used IEnumerable.Intersect() to get the intersection with the other list and IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
This was the slowest one, averaging 177s.
Second method:
Cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if(allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
This one was slightly faster, averaging 154s.
Third method:
Did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
HashSet<int> loopingSet, containsSet;
int count;
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
        count = 0;
        if(allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach(int k in loopingSet) {
            if(containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
This method was by far the fastest (as expected), averaging 66s.
Conclusion:
The method you're using is the fastest of these three. I certainly can't think of a faster single-threaded way to do this. Perhaps there is a better concurrent solution.
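On the concurrency point: each (i, j) pair is independent, so one simple option (a hypothetical, unbenchmarked sketch using Parallel.For from System.Threading.Tasks) is to parallelize the outer loop:
int[][] counts = new int[allSets.Count][];
Parallel.For(0, allSets.Count, i => {
    counts[i] = new int[allSets.Count];    // each task writes only its own row, so no locking is needed
    for(int j = i + 1; j < allSets.Count; ++j) {
        HashSet<int> loopingSet = allSets[i];
        HashSet<int> containsSet = allSets[j];
        if(loopingSet.Count > containsSet.Count) {
            var tmp = loopingSet; loopingSet = containsSet; containsSet = tmp;
        }
        int count = 0;
        foreach(int k in loopingSet) {
            if(containsSet.Contains(k)) {
                ++count;
            }
        }
        counts[i][j] = count;
    }
});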
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. To iterate through a normal collection for your purposes will not be the most optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter of the two lists (as you mentioned you're already doing) should give close to O(1) performance, the same as key lookups in the generic Dictionary type.

Fastest algorithm to create an array of known size

I need to create an array of boolean values, which could be on the scale of 100,000s or even millions of entries. It also needs to be super-fast, so every millisecond per iteration counts.
At the time of beginning the loop, I will already know how many entries there are going to be in the array. The question is, will it be faster to create a bool array up front and fill in the values by index (which is random access - could be slow?), or should I create a List<bool>, keep adding entries to the list, and at the end return .ToArray()?
In other words:
Option 1
var array = new bool[size];
for (var n=0; n<size; n++)
array[n] = GetValue(n);
return array;
Option 2
var list = new List<bool>();
for (var n=0; n<size; n++)
list.Add(GetValue(n));
return list.ToArray();
Or maybe there's a 3rd way that's even faster?
Use a System.Collections.BitArray and don't worry about speed.
What you are suggesting above will only waste memory. A BitArray optimizes for both speed and size, and packs your bool values nicely (8 per byte, as the gods intended :).
Reply to the comments below: if you use a BitArray, everything starts out as zero. Set only those bits for which GetValue returns true.
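A minimal sketch of that suggestion (size and GetValue are the asker's hypothetical names):
var bits = new System.Collections.BitArray(size); // every bit starts out false
for (var n = 0; n < size; n++)
{
    if (GetValue(n))
        bits[n] = true; // write only the true values
}
return bits;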
The following code seems to show (at least to me) that of the methods discussed on this page, the simple allocation to a bool[] using a loop is quickest.
The code also seems to show me that unless GetValue(n) is computationally trivial, the overhead of allocating the bytes is not the part of the process I would be hoping to optimise.
Hope this helps in some way.
edit: added the results from the run (on my machine)
-- 187ms BitArray
-- 171ms List<bool>().ToArray
-- 168ms bool[] set only if true
-- 130ms bool[] always set
--11460ms bool[] always set with 'complex' GetValue()
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        BitArray bitArray = new BitArray(10000000);
        bool[] boolArray = new bool[10000000];

        // BitArray, always set
        Stopwatch sw1 = new Stopwatch();
        sw1.Start();
        for (int i = 0; i < 10000000; i++)
        {
            bitArray[i] = GetMod2(i);
        }
        Console.WriteLine(sw1.ElapsedMilliseconds);

        // List<bool>().ToArray
        sw1.Restart();
        var list = new List<bool>();
        for (int i = 0; i < 10000000; i++)
            list.Add(GetMod2(i));
        var boolArray2 = list.ToArray();
        Console.WriteLine(sw1.ElapsedMilliseconds);

        // bool[], set only if true
        sw1.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            bool nextVal = GetMod2(i);
            if (nextVal)
                boolArray[i] = true;
        }
        Console.WriteLine(sw1.ElapsedMilliseconds);

        // bool[], always set
        sw1.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            boolArray[i] = GetMod2(i);
        }
        Console.WriteLine(sw1.ElapsedMilliseconds);

        // bool[], always set, with 'complex' GetValue()
        sw1.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            boolArray[i] = GetRand(i);
        }
        Console.WriteLine(sw1.ElapsedMilliseconds);

        Console.ReadLine();
    }

    static bool GetMod2(int i)
    {
        return (i % 2) == 1;
    }

    static bool GetRand(int i)
    {
        // Deliberately expensive: allocates a new Random on every call.
        return new Random().Next(2) == 1;
    }
}
Go with the first. The only reason it might be "slow" is if it keeps paging data from outside the processor cache.
The list will have exactly the same problem, except it will also need to perform several memory allocations and copies.
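If you do end up with Option 2, one small mitigation (a sketch; it doesn't remove the final copy) is to give the List its capacity up front so it never has to grow:
var list = new List<bool>(size); // capacity reserved once, no grow-and-copy cycles
for (var n = 0; n < size; n++)
    list.Add(GetValue(n));
return list.ToArray();           // still one final copy into the array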
Now here's a funny old thing. Inspired by @paul, I ran these benchmark tests myself, on 10,000,000 booleans. The results (in milliseconds) are very surprising, given the discussion in the comments to this question:
BitArray: 517
BitArray + CopyTo(array): 536
List + ToArray(): 455
bool array: 483
And what a turn-up for the books! Despite the fact that the List<bool> inserts a new record every time, while the bool[] and BitArray are initialized to false for every record and I only updated them where the value should be true, the List<bool> comes out on top, consistently, even including the .ToArray() call.
Yet another case where practical application is better than textbook knowledge, it seems... :)

In .NET, which loop runs faster, 'for' or 'foreach'?

In C#/VB.NET/.NET, which loop runs faster, for or foreach?
Ever since I read that a for loop works faster than a foreach loop a long time ago I assumed it stood true for all collections, generic collections, all arrays, etc.
I scoured Google and found a few articles, but most of them are inconclusive (read comments on the articles) and open ended.
What would be ideal is to have each scenario listed and the best solution for the same.
For example (just an example of how it should be):
for iterating an array of 1000+ strings - for is better than foreach
for iterating over IList (non-generic) strings - foreach is better than for
A few references found on the web for the same:
Original grand old article by Emmanuel Schanzer
CodeProject FOREACH Vs. FOR
Blog - To foreach or not to foreach, that is the question
ASP.NET forum - NET 1.1 C# for vs foreach
[Edit]
Apart from the readability aspect of it, I am really interested in facts and figures. There are applications where the last bit of performance optimization squeezed out really does matter.
Patrick Smacchia blogged about this last month, with the following conclusions:
for loops on List are a bit more than 2 times cheaper than foreach loops on List.
Looping on array is around 2 times cheaper than looping on List.
As a consequence, looping on array using for is 5 times cheaper than looping on List using foreach (which, I believe, is what we all do).
First, a counter-claim to Dmitry's (now deleted) answer. For arrays, the C# compiler emits largely the same code for foreach as it would for an equivalent for loop. That explains why for this benchmark, the results are basically the same:
using System;
using System.Diagnostics;
using System.Linq;

class Test
{
    const int Size = 1000000;
    const int Iterations = 10000;

    static void Main()
    {
        double[] data = new double[Size];
        Random rng = new Random();
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = rng.NextDouble();
        }
        double correctSum = data.Sum();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            double sum = 0;
            for (int j = 0; j < data.Length; j++)
            {
                sum += data[j];
            }
            if (Math.Abs(sum - correctSum) > 0.1)
            {
                Console.WriteLine("Summation failed");
                return;
            }
        }
        sw.Stop();
        Console.WriteLine("For loop: {0}", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            double sum = 0;
            foreach (double d in data)
            {
                sum += d;
            }
            if (Math.Abs(sum - correctSum) > 0.1)
            {
                Console.WriteLine("Summation failed");
                return;
            }
        }
        sw.Stop();
        Console.WriteLine("Foreach loop: {0}", sw.ElapsedMilliseconds);
    }
}
Results:
For loop: 16638
Foreach loop: 16529
Next, validation that Greg's point about the collection type being important - change the array to a List<double> in the above, and you get radically different results. Not only is it significantly slower in general, but foreach becomes significantly slower than accessing by index. Having said that, I would still almost always prefer foreach to a for loop where it makes the code simpler - because readability is almost always important, whereas micro-optimisation rarely is.
foreach loops demonstrate more specific intent than for loops.
Using a foreach loop demonstrates to anyone using your code that you are planning to do something to each member of a collection irrespective of its place in the collection. It also shows you aren't modifying the original collection (and throws an exception if you try to).
The other advantage of foreach is that it works on any IEnumerable, whereas for only makes sense for IList, where each element actually has an index.
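For example, an iterator method produces a sequence with no indexer at all: foreach consumes it naturally, while a classic for loop has nothing to index (a small illustrative sketch):
static IEnumerable<string> ReadWords()
{
    // A lazily produced sequence: no Count, no indexer.
    yield return "alpha";
    yield return "beta";
    yield return "gamma";
}
foreach (var word in ReadWords())
{
    Console.WriteLine(word);
}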
However, if you need to use the index of an element, then of course you should be allowed to use a for loop. But if you don't need to use an index, having one is just cluttering your code.
There are no significant performance implications as far as I'm aware. At some stage in the future it might be easier to adapt code using foreach to run on multiple cores, but that's not something to worry about right now.
Any time there are arguments over performance, you just need to write a small test so that you can use quantitative results to support your case.
Use the Stopwatch class and repeat something a few million times for accuracy. (This might be hard without a for loop!):
using System.Diagnostics;
//...
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
    // do whatever it is you need to time
}
sw.Stop();
// print out sw.ElapsedMilliseconds
Fingers crossed, the results of this will show that the difference is negligible, and you might as well just do whatever results in the most maintainable code.
It will always be close. For an array, sometimes for is slightly quicker, but foreach is more expressive, and offers LINQ, etc. In general, stick with foreach.
Additionally, foreach may be optimised in some scenarios. For example, a linked list might be terrible by indexer, but it might be quick by foreach. Actually, the standard LinkedList<T> doesn't even offer an indexer for this reason.
My guess is that it will probably not be significant in 99% of the cases, so why would you choose the faster instead of the most appropriate (as in easiest to understand/maintain)?
There are very good reasons to prefer foreach loops over for loops. If you can use a foreach loop, your boss is right that you should.
However, not every iteration is simply going through a list in order, one by one. If he is forbidding for outright, yes, that is wrong.
If I were you, what I would do is turn all of your natural for loops into recursion. That'd teach him, and it's also a good mental exercise for you.
There is unlikely to be a huge performance difference between the two. As always, when faced with a "which is faster?" question, you should always think "I can measure this."
Write two loops that do the same thing in the body of the loop, execute and time them both, and see what the difference in speed is. Do this with both an almost-empty body, and a loop body similar to what you'll actually be doing. Also try it with the collection type that you're using, because different types of collections can have different performance characteristics.
Jeffrey Richter on TechEd 2005:
"I have come to learn over the years the C# compiler is basically a liar to me." .. "It lies about many things." .. "Like when you do a foreach loop..." .. "...that is one little line of code that you write, but what the C# compiler spits out in order to do that it's phenomenal. It puts out a try/finally block in there, inside the finally block it casts your variable to an IDisposable interface, and if the cast suceeds it calls the Dispose method on it, inside the loop it calls the Current property and the MoveNext method repeatedly inside the loop, objects are being created underneath the covers. A lot of people use foreach because it's very easy coding, very easy to do.." .. "foreach is not very good in terms of performance, if you iterated over a collection instead by using square bracket notation, just doing index, that's just much faster, and it doesn't create any objects on the heap..."
On-Demand Webcast:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032292286&EventCategory=3&culture=en-US&CountryCode=US
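Roughly, the expansion Richter describes looks like this (a sketch; the exact code the compiler emits varies by collection type, and for arrays it becomes a plain indexed loop instead):
// What `foreach (T item in collection) { ... }` approximately becomes
// when collection is an IEnumerable<T>:
IEnumerator<T> e = collection.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        T item = e.Current;
        // ...loop body...
    }
}
finally
{
    e.Dispose(); // IEnumerator<T> is IDisposable; the non-generic case
                 // involves the cast Richter mentions
}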
You can read about it in Deep .NET - Part 1: Iteration. It covers the results (without the first initialization), from the .NET source code all the way down to the disassembly. (The original answer presents these as images: array iteration with a foreach loop, list iteration with a foreach loop, and the end results.)
In cases where you work with a collection of objects, foreach is better, but if you increment a number, a for loop is better.
Note that in the last case, you could do something like:
foreach (int i in Enumerable.Range(1, 10))...
But it certainly doesn't perform better; it actually performs worse than a for loop.
This should save you:
public IEnumerable<int> For(int start, int end, int step) {
    int n = start;
    while (n <= end) {
        yield return n;
        n += step;
    }
}
Use:
foreach (int n in For(1, 200, 4)) {
    Console.WriteLine(n);
}
For a bigger win, you may take three delegates as parameters.
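That generalization might look something like this (a sketch; the parameter shapes are one possible choice):
public static IEnumerable<int> For(int start, Func<int, bool> condition, Func<int, int> step)
{
    // start value, continuation test, and step function, all supplied by the caller
    for (int n = start; condition(n); n = step(n))
    {
        yield return n;
    }
}
// Usage:
foreach (int n in For(1, n => n <= 200, n => n + 4))
{
    Console.WriteLine(n);
}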
The differences in speed between a for loop and a foreach loop are tiny when you're looping through common structures like arrays, lists, etc., and doing a LINQ query over the collection is almost always slightly slower, although it's nicer to write! As the other posters said, go for expressiveness rather than a millisecond of extra performance.
What hasn't been said so far is that when a foreach loop is compiled, the compiler optimises it based on the collection it is iterating over. That means that when you're not sure which loop to use, you should use the foreach loop: it will generate the best loop for you when it gets compiled. It's more readable too.
Another key advantage with the foreach loop is that if your collection implementation changes (from an int array to a List<int> for example) then your foreach loop won't require any code changes:
foreach (int i in myCollection)
The above is the same no matter what type your collection is, whereas in your for loop, the following will not build if you change myCollection from an array to a List (Length becomes Count):
for (int i = 0; i < myCollection.Length; i++)
This has the same two answers as most "which is faster" questions:
1) If you don't measure, you don't know.
2) (Because...) It depends.
It depends on how expensive the "MoveNext()" method is, relative to how expensive the "this[int index]" method is, for the type (or types) of IEnumerable that you will be iterating over.
The "foreach" keyword is shorthand for a series of operations - it calls GetEnumerator() once on the IEnumerable, it calls MoveNext() once per iteration, it does some type checking, and so on. The thing most likely to impact performance measurements is the cost of MoveNext() since that gets invoked O(N) times. Maybe it's cheap, but maybe it's not.
The "for" keyword looks more predictable, but inside most "for" loops you'll find something like "collection[index]". This looks like a simple array indexing operation, but it's actually a method call, whose cost depends entirely on the nature of the collection that you're iterating over. Probably it's cheap, but maybe it's not.
If the collection's underlying structure is essentially a linked list, MoveNext is dirt-cheap, but the indexer might have O(N) cost, making the true cost of a "for" loop O(N*N).
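LinkedList<T> makes this concrete. A small sketch (illustrative; Enumerable.ElementAt stands in for the indexer the class deliberately doesn't expose):
var linked = new LinkedList<int>(Enumerable.Range(0, 10000));

// foreach: MoveNext just follows a node pointer, so the whole pass is O(N).
long sum1 = 0;
foreach (int x in linked) { sum1 += x; }

// Index-style access: each ElementAt(i) walks from the head, O(N) per element,
// so the whole pass is O(N^2).
long sum2 = 0;
for (int i = 0; i < linked.Count; i++) { sum2 += linked.ElementAt(i); }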
"Are there any arguments I could use to help me convince him the for loop is acceptable to use?"
No, if your boss is micromanaging to the level of telling you what programming language constructs to use, there's really nothing you can say. Sorry.
Every language construct has an appropriate time and place for usage. There is a reason the C# language has four separate iteration statements: each is there for a specific purpose and has an appropriate use.
I recommend sitting down with your boss and trying to rationally explain why a for loop has a purpose. There are times when a for iteration block more clearly describes an algorithm than a foreach iteration. When this is true, it is appropriate to use them.
I'd also point out to your boss that performance is not, and should not be, an issue in any practical way here; it's more a matter of expressing the algorithm in a succinct, meaningful, maintainable manner. Micro-optimizations like this miss the point of performance optimization completely, since any real performance benefit will come from algorithmic redesign and refactoring, not loop restructuring.
If, after a rational discussion, there is still this authoritarian view, it is up to you as to how to proceed. Personally, I would not be happy working in an environment where rational thought is discouraged, and would consider moving to another position under a different employer. However, I strongly recommend discussion prior to getting upset - there may just be a simple misunderstanding in place.
It probably depends on the type of collection you are enumerating and the implementation of its indexer. In general though, using foreach is likely to be a better approach.
Also, it'll work with any IEnumerable - not just things with indexers.
Whether for is faster than foreach is really beside the point. I seriously doubt that choosing one over the other will make a significant impact on your performance.
The best way to optimize your application is through profiling of the actual code. That will pinpoint the methods that account for the most work/time. Optimize those first. If performance is still not acceptable, repeat the procedure.
As a general rule I would recommend to stay away from micro optimizations as they will rarely yield any significant gains. Only exception is when optimizing identified hot paths (i.e. if your profiling identifies a few highly used methods, it may make sense to optimize these extensively).
It is what you do inside the loop that affects performance, not the actual looping construct (assuming your case is non-trivial).
The two will run almost exactly the same way. Write some code to use both, then show him the IL. It should show comparable computations, meaning no difference in performance.
In most cases there's really no difference.
Typically you always have to use foreach when you don't have an explicit numerical index, and you always have to use for when you don't actually have an iterable collection (e.g. iterating over a two-dimensional array grid in an upper triangle). There are some cases where you have a choice.
One could argue that for loops can be a little more difficult to maintain if magic numbers start to appear in the code. You should be right to be annoyed at not being able to use a for loop and have to build a collection or use a lambda to build a subcollection instead just because for loops have been banned.
It seems a bit strange to totally forbid the use of something like a for loop.
There's an interesting article here that covers a lot of the performance differences between the two loops.
Personally, I find foreach a bit more readable than a for loop, but you should use the best tool for the job at hand and not write extra-long code just to fit in a foreach loop when a for loop is more appropriate.
You can really screw with his head and go for a ForEach closure instead:
myList.ForEach(c => Console.WriteLine(c.ToString()));
for has simpler logic to implement, so it's faster than foreach.
Unless you're in a specific speed optimization process, I would say use whichever method produces the easiest to read and maintain code.
If an iterator is already setup, like with one of the collection classes, then the foreach is a good easy option. And if it's an integer range you're iterating, then for is probably cleaner.
Jeffrey Richter talked the performance difference between for and foreach on a recent podcast: http://pixel8.infragistics.com/shows/everything.aspx#Episode:9317
I tested this a while ago, with the result that a for loop is much faster than a foreach loop. The cause is simple: the foreach loop first needs to instantiate an IEnumerator for the collection.
I found the foreach loop faster when iterating through a List. See my test results below. In the code below I iterate arrays of size 100, 10000 and 100000 separately, using for and foreach loops, to measure the time.
private static void MeasureTime()
{
    var array = new int[10000];
    var list = array.ToList();
    Console.WriteLine("Array size: {0}", array.Length);

    Console.WriteLine("Array For loop ......");
    var stopWatch = Stopwatch.StartNew();
    for (int i = 0; i < array.Length; i++)
    {
        Thread.Sleep(1);
    }
    stopWatch.Stop();
    Console.WriteLine("Time taken to run the for loop is {0} milliseconds", stopWatch.ElapsedMilliseconds);

    Console.WriteLine(" ");
    Console.WriteLine("Array Foreach loop ......");
    var stopWatch1 = Stopwatch.StartNew();
    foreach (var item in array)
    {
        Thread.Sleep(1);
    }
    stopWatch1.Stop();
    Console.WriteLine("Time taken to run the foreach loop is {0} milliseconds", stopWatch1.ElapsedMilliseconds);

    Console.WriteLine(" ");
    Console.WriteLine("List For loop ......");
    var stopWatch2 = Stopwatch.StartNew();
    for (int i = 0; i < list.Count; i++)
    {
        Thread.Sleep(1);
    }
    stopWatch2.Stop();
    Console.WriteLine("Time taken to run the for loop is {0} milliseconds", stopWatch2.ElapsedMilliseconds);

    Console.WriteLine(" ");
    Console.WriteLine("List Foreach loop ......");
    var stopWatch3 = Stopwatch.StartNew();
    foreach (var item in list)
    {
        Thread.Sleep(1);
    }
    stopWatch3.Stop();
    Console.WriteLine("Time taken to run the foreach loop is {0} milliseconds", stopWatch3.ElapsedMilliseconds);
}
UPDATED
After @jgauffin's suggestion I used @johnskeet's code and found that the for loop with an array is faster than the following:
Foreach loop with array.
For loop with list.
Foreach loop with list.
See my test results and code below.
private static void MeasureNewTime()
{
    var data = new double[Size];
    var rng = new Random();
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = rng.NextDouble();
    }
    Console.WriteLine("Length of array: {0}", data.Length);
    Console.WriteLine("No. of iterations: {0}", Iterations);
    Console.WriteLine(" ");
    double correctSum = data.Sum();

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        double sum = 0;
        for (int j = 0; j < data.Length; j++)
        {
            sum += data[j];
        }
        if (Math.Abs(sum - correctSum) > 0.1)
        {
            Console.WriteLine("Summation failed");
            return;
        }
    }
    sw.Stop();
    Console.WriteLine("For loop with Array: {0}", sw.ElapsedMilliseconds);

    sw = Stopwatch.StartNew();
    for (var i = 0; i < Iterations; i++)
    {
        double sum = 0;
        foreach (double d in data)
        {
            sum += d;
        }
        if (Math.Abs(sum - correctSum) > 0.1)
        {
            Console.WriteLine("Summation failed");
            return;
        }
    }
    sw.Stop();
    Console.WriteLine("Foreach loop with Array: {0}", sw.ElapsedMilliseconds);
    Console.WriteLine(" ");

    var dataList = data.ToList();
    sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        double sum = 0;
        for (int j = 0; j < dataList.Count; j++)
        {
            sum += dataList[j];
        }
        if (Math.Abs(sum - correctSum) > 0.1)
        {
            Console.WriteLine("Summation failed");
            return;
        }
    }
    sw.Stop();
    Console.WriteLine("For loop with List: {0}", sw.ElapsedMilliseconds);

    sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        double sum = 0;
        foreach (double d in dataList)
        {
            sum += d;
        }
        if (Math.Abs(sum - correctSum) > 0.1)
        {
            Console.WriteLine("Summation failed");
            return;
        }
    }
    sw.Stop();
    Console.WriteLine("Foreach loop with List: {0}", sw.ElapsedMilliseconds);
}
A powerful and precise way to measure time is to use the BenchmarkDotNet library.
In the following sample, I loop over 1,000,000,000 integer records with for/foreach and measure the results with BenchmarkDotNet:
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main()
    {
        BenchmarkRunner.Run<LoopsBenchmarks>();
    }
}

[MemoryDiagnoser]
public class LoopsBenchmarks
{
    private List<int> arr = Enumerable.Range(1, 1_000_000_000).ToList();

    [Benchmark]
    public void For()
    {
        for (int i = 0; i < arr.Count; i++)
        {
            int item = arr[i];
        }
    }

    [Benchmark]
    public void Foreach()
    {
        foreach (int item in arr)
        {
        }
    }
}
And here are the results (presented as a results table image in the original post):
Conclusion
In the example above we can see that the for loop is slightly faster than the foreach loop for lists. We can also see that both have the same memory allocation.
I wouldn't expect anyone to find a "huge" performance difference between the two.
I guess the answer depends on whether the collection you are trying to access has a faster indexer access implementation or a faster IEnumerator access implementation. Since IEnumerator often uses the indexer and just holds a copy of the current index position, I would expect enumerator access to be at least as slow as, or slower than, direct index access, but not by much.
Of course this answer doesn't account for any optimizations the compiler may implement.
