I am using c# to go through a loop and do something (this loop is massive, sometimes as big as 1,000,000 records long). I wanted to replace the inline code with code that does the exact same thing, except in a function.
I am guessing there is a slight decrease in performance, but will it actually be noticeable?
If I have a loop:
public void main()
{
    int x = 0;
    for (int i = 0; i < 1000; i++)
    {
        x += 1;
    }
}
Would my loop slow down if I did the same thing except this time making use of a function?
public void main()
{
    int x = 0;
    for (int i = 0; i < 1000; i++)
    {
        x = incrementInt(x);
    }
}
public int incrementInt(int x)
{
    return x + 1;
}
EDIT:
Fixed logic bug, sorry for that.
A method call will always cost something. But the JIT compiler can inline your method if a set of conditions is fulfilled, which results in machine code equivalent to your first example (once you fix the logic bug in it).
The question you are indirectly asking is: under which circumstances is my method inlined? There are many different rules, but the easiest way to be sure that inlining works is to measure it.
You can also use PerfView to find out, for each method, why it was not inlined. With .NET 4.5 you can give the JIT compiler a hint to relax some of the rules and inline a method.
See http://blogs.microsoft.co.il/sasha/2012/01/20/aggressive-inlining-in-the-clr-45-jit/
There are some conditions described which prevent inlining:
Methods marked with MethodImplOptions.NoInlining
Methods larger than 32 bytes of IL
Virtual methods
Methods that take a large value type as a parameter
Methods on MarshalByRef classes
Methods with complicated flowgraphs
Methods meeting other, more exotic criteria
If you follow the rules and measure carefully you can write highly performant code while keeping readable and maintainable code.
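As a sketch of the .NET 4.5 hint mentioned above, the attribute lives in System.Runtime.CompilerServices; the method and variable names here are just illustrative:

```csharp
using System;
using System.Runtime.CompilerServices;

static class InlineDemo
{
    // Hints the JIT to relax its size rules for this method. It cannot
    // force inlining - virtual methods, for example, still won't be inlined.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int IncrementInt(int x)
    {
        return x + 1;
    }

    static void Main()
    {
        int x = 0;
        for (int i = 0; i < 1000; i++)
        {
            x = IncrementInt(x);
        }
        Console.WriteLine(x); // prints 1000
    }
}
```

Whether the call actually disappears is still the JIT's decision, which is why measuring (or checking with PerfView) remains the only reliable verification.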
I have written a test application and run the performance analyzer on the code and the function call is slower than the loop (Although as mentioned above the two do different things.)
It is very simple to analyze these things in VS2012. Just click the "ANALYZE" menu item and select "Start Performance Analysis".
Calling a function is slower than not calling it, but you can really ignore this.
In C#, I have got a collection of unique elements and I want to efficiently execute some code for each unordered pair. For instance, if my container holds {a,b,c}, the unordered pairs are (a,b), (a,c) and (b,c). The problem arises in the scope of performing a 2-opt optimization, thus efficiency is an issue.
My current solution looks like:
foreach (var a in container)
{
    foreach (var b in container)
    {
        if (a < b)
        {
            // execute code
        }
    }
}
Obviously, this can be modified easily if the [] operator is available to get the i-th element (i.e. if the underlying data structure is a list). But for all other containers, the solution depends on the existence of some comparison function and is not very efficient.
I've also tried a formulation based on a LINQ statement that generates each desired pair exactly once. However, as expected, this was much slower than the first approach. This applies to a solution using ElementAt too.
Edit: here is the (improved) LINQ code that was used:
var x = from a in container
        from b in container
        where a < b
        select new KeyValuePair<int, int>(a, b);
Still, execution is slower by a factor of 3-5 compared to the other solutions.
Here is the way I would do it in C++ (obtaining good efficiency):
for (auto it1 = container.begin(); it1 != container.end(); ++it1)
{
    auto it2 = it1;
    for (++it2; it2 != container.end(); ++it2)
    {
        // execute code
    }
}
Unfortunately, transforming this into C# would require cloning the (internally used) enumerator, which the language does not support.
Has anyone a better idea / solution?
Did you try copying the elements into a list first and then running the algorithm with the indexer ([i]) operator? Since the algorithm has quadratic runtime anyway, a linear copy operation in front of it may be negligible. You would have to measure the actual runtime for small, medium, and large containers yourself...
I think it may be worth a try, this may well be a lot faster than working with the comparison operator each time.
You could also check if the container is of type IList<T> and jump past the copy operation.
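A sketch of that idea; the helper name ForEachUnorderedPair is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairDemo
{
    // Runs `action` once for each unordered pair. If the source already
    // supports the indexer (IList<T>), no copy is made; otherwise the
    // elements are copied into a List<T> first (a single linear pass in
    // front of the quadratic pair loop).
    static void ForEachUnorderedPair<T>(IEnumerable<T> source, Action<T, T> action)
    {
        IList<T> list = source as IList<T> ?? source.ToList();
        for (int i = 0; i < list.Count; i++)
            for (int j = i + 1; j < list.Count; j++)
                action(list[i], list[j]);
    }

    static void Main()
    {
        int count = 0;
        ForEachUnorderedPair(new[] { 1, 2, 3 }, (a, b) => count++);
        Console.WriteLine(count); // 3 pairs: (1,2), (1,3), (2,3)
    }
}
```

Note that this requires no comparison operator at all; starting the inner loop at i + 1 guarantees each pair is visited exactly once.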
If you don't care about the order, you can do it like this:
int i = 0;
foreach (var a in list)
{
    int j = 0;
    foreach (var b in list)
    {
        if (i <= j)
            break;
        // execute code
        j++;
    }
    i++;
}
If you do care about the order, you can limit yourself to collections that implement IList<T>, which contains the [] operator. Or you could copy the collection into a List<T> first, and then work with that.
Enumerators in C# are not the same as the C++ iterators in your question. In C# you have neither begin nor end positions in the container; you only have a Current element and a MoveNext() method. This lets you yield many more kinds of sequences - for example, you can enumerate an infinite sequence of random numbers, which obviously has no begin or end.
So you can't do it in C# like in your C++ code using only IEnumerable classes. The best way is to use the System.Collections.Generic.IList<T> interface. Many types (such as arrays) implement this interface.
If you use IEnumerable, then inside your type you (in most cases) iterate through some collection anyway. If you do this, you can just implement the IList<T> interface as well.
There is another solution: in C#, lists and arrays of reference types contain only references to objects. So you can copy your data to a local list and operate on that. But it depends on your memory and performance requirements.
Put your items into a List or IList, and then you can access them using indices in a very similar pattern to your C++ code.
for (int i = 0; i < container.Count; i++)
{
    for (int j = i + 1; j < container.Count; j++)
    {
        var item1 = container[i];
        var item2 = container[j];
        // execute code
    }
}
I'd expect it to be more efficient to iterate an ordered collection using indices rather than the n^2 comparisons. The efficiency of the comparison operator is important but you shouldn't need to compare at all.
I am making an XNA game and I am calling the following code 2 to 20 times per update. I tried googling and it seems this is semi-slow, so I just thought I'd ask if there is any faster way to compare types.
Code:
public Modifier this[Type type]
{
    get
    {
        for (int i = 0; i < this.Count; i++)
        {
            if (this[i].GetType() == type)
            {
                return this[i];
            }
        }
        throw new NotImplementedException("Fix this");
    }
    set
    {
        for (int i = 0; i < this.Count; i++)
        {
            if (this[i].GetType() == type)
            {
                this[i] = value;
            }
        }
        if (System.Diagnostics.Debugger.IsAttached)
            System.Diagnostics.Debugger.Break();
    }
}
This code is in a ModifierCollection class which inherits from a List. Modifier is part of a particle engine. Also, my game isn't in a state where I can actually test this yet, but it should work, right?
I read something about RuntimeTypeHandle, which is supposed to be faster; should I use it?
EDIT: What I am aiming to do with this is that I can do the following:
(particleEffect["NameOfEmitter"].Modifiers[typeof(SomeCoolModifier)] as SomeCoolModifier).Variable = Value;
Basically I just want to change the value of some Modifiers in runtime.
EDIT 2: I just realized that I can simply save the reference to the Modifier in the class from which I am currently calling this :P Maybe not as clean if I have 5-10 modifiers, but it should remove this problem.
If you don't need any of the extra functionality exposed by Type, and you're only concerned with absolute equality between types--i.e., you don't need to support inheritance--RuntimeTypeHandle is the fastest way to do this comparison.
Really, though, I would question whether this isn't a weakness of your class design. Unless you have a compelling reason to check the type directly, it's probably better to expose some sort of value (probably an enum) on your objects that represents what they are, and do your comparisons against that.
If you want to be really fast and can trust the code that is calling you, change the indexer to just take an int. Then in whatever method (which you didn't show) that callers use to add Types to the list, return back to them the corresponding int. It's a worse API but it means you don't have to do any loops or lookups.
You could store the values in a dictionary indexed by type rather than a list so you wouldn't have to do an O(n) iteration over the list each time.
As noted in the comments, this does depend on the size of n and may be a micro-optimization. I'd recommend profiling your application.
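A minimal sketch of the dictionary approach; the Modifier subclass and member names here are hypothetical stand-ins for the particle engine's types:

```csharp
using System;
using System.Collections.Generic;

abstract class Modifier { }
class ScaleModifier : Modifier { public float Variable; }

// Keeps a Type-keyed dictionary so lookups are O(1) rather than an O(n)
// scan over a list.
class ModifierCollection
{
    private readonly Dictionary<Type, Modifier> byType =
        new Dictionary<Type, Modifier>();

    public void Add(Modifier m)
    {
        byType[m.GetType()] = m;
    }

    public Modifier this[Type type]
    {
        get { return byType[type]; }   // throws KeyNotFoundException if absent
        set { byType[type] = value; }
    }
}

class Program
{
    static void Main()
    {
        var mods = new ModifierCollection();
        mods.Add(new ScaleModifier());
        ((ScaleModifier)mods[typeof(ScaleModifier)]).Variable = 2f;
        Console.WriteLine(((ScaleModifier)mods[typeof(ScaleModifier)]).Variable);
    }
}
```

This keeps the convenient typeof(SomeCoolModifier) indexing syntax from the question while avoiding the per-call loop.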
The question field is a bit too short to pose my real question. If anyone can recapitulate it better, please feel free.
My real question is this: I'm reading a lot of other people's code in C# these days, and I have noticed that one particular form of iteration is widespread (see the code below).
My first question is:
Are all these iterations equivalent?
And my second is: why prefer the first? Does it have something to do with readability? I don't believe the first form is more readable than the for-form once you get used to it, and readability is far too subjective in these constructs; of course, whatever you use most will seem more readable. But I can assure everyone that the for-form is at least as readable, since it has everything on one line, and you can even read the initialization inside the construct.
Thus the second question: why is the third form seen much less in code?
// the 'widespread' construct
int nr = getNumber();
while (NotZero(nr))
{
    Console.Write(1 / nr);
    nr = getNumber();
}

// the somewhat shorter form
int nr;
while (NotZero(nr = getNumber()))
    Console.Write(1 / nr);

// the for-form
for (int nr = getNumber(); NotZero(nr); nr = getNumber())
    Console.Write(1 / nr);
The first and third forms you've shown repeat the call to getNumber. I prefer the second form, although it has the disadvantage of using a side effect within a condition, of course. However, I pretty much only do that with a while loop. Usually I don't end up passing the result as an argument though - the common situations I find myself in are:
string line;
while ( (line = reader.ReadLine()) != null)
...
and
int bytesRead;
while ( (bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
...
Both of these are now so idiomatic to me that they don't cause me any problems - and as I say, they allow me to only state each piece of logic once.
If you don't like the variable having too much scope, you can just introduce an extra block:
{
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Body
    }
}
Personally I don't tend to do this - the "too-wide" scope doesn't bother me that much.
I suspect it wouldn't be too hard to write a method to encapsulate all of this. Something like:
ForEach(() => reader.ReadLine(), // way to obtain a value
        line => line != null,    // condition
        line =>
        {
            // body
        });
Mind you, for line reading I have a class which helps:
foreach (string line in new LineReader(file))
{
// body
}
(It doesn't just work with files - it's pretty flexible.)
Are all these iterations equivalent?
Yes.
Why prefer the first? Does it have something to do with readability?
Because you may want to extend the scope of the nr variable beyond the while loop.
Why is the third form seen much less in code?
It is equivalent - the same statements!
You may prefer the latter when you don't want to extend the scope of the nr variable.
I think that the third form (for-loop) is the best of these alternatives, because it puts things into the right scope. On the other hand, having to repeat the call to getNumber() is a bit awkward, too.
Generally, I think that explicit looping is widely overused. High-level languages should provide mapping, filtering, and reducing. When these high-level constructs are applicable and available, writing an explicit loop instead is like using goto instead of structured loops.
If mapping, filtering, or reducing is not applicable, I would perhaps write a little macro for this kind of loop (C# doesn't have those, though, does it?).
I offer another alternative
foreach (var x in InitInfinite(() => GetNumber()).TakeWhile(NotZero))
{
    Console.WriteLine(1.0 / x);
}
where InitInfinite is a trivial helper function. Whole program:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static IEnumerable<T> InitInfinite<T>(Func<T> f)
    {
        while (true)
        {
            yield return f();
        }
    }

    static int N = 5;

    static int GetNumber()
    {
        N--;
        return N;
    }

    static bool NotZero(int n) { return n != 0; }

    static void Main(string[] args)
    {
        foreach (var x in InitInfinite(() => GetNumber()).TakeWhile(NotZero))
        {
            Console.WriteLine(1.0 / x);
        }
    }
}
I think people use the while() loop often because it best represents the way you would visualize the task in your head. I don't think there are any performance benefits to using it over any other loop structure.
Here is a random speculation:
When I write C# code, the only two looping constructs I write are while() and foreach(). That is, no one uses 'for' any more, since 'foreach' often works and is often superior. (This is an overgeneralization, but it has a core of truth.) As a result, my brain has to strain to read any 'for' loop because it's unfamiliar.
As for why (1) and (2) are "preferred" over (3), my feeling is that most people think of the latter as a way to iterate over a range, using the condition to define the range, rather than continuing to iterate over a block while some condition still holds. The keyword semantics lend themselves to this interpretation and I suspect that, partly because of that, people find that the expressions are most readable in that context. For instance, I would never use (1) or (2) to iterate over a range, though I could.
Between (1) and (2), I'm torn. I used to use (2) (in C) most often due to the compactness, but now (in C#) I generally write (1). I suppose that I've come to value readability over compactness and (1) seems easier to parse quickly and thus more readable to my mind even though I do end up repeating a small amount of logic.
Honestly, I rarely write while statements anymore, typically using foreach -- or LINQ -- in the cases where while statements would previously been used. Come to think of it, I'm not sure I use many for statements, either, except in unit tests where I'm generating some fixed number of a test object.
There are apparently many ways to iterate over a collection. Curious if there are any differences, or why you'd use one way over the other.
First type:
List<string> someList = <some way to init>
foreach(string s in someList) {
<process the string>
}
Other Way:
List<string> someList = <some way to init>
someList.ForEach(delegate(string s) {
<process the string>
});
I suppose off the top of my head, that instead of the anonymous delegate I use above, you'd have a reusable delegate you could specify...
There is one important, and useful, distinction between the two.
Because .ForEach uses a for loop to iterate the collection, this is valid (edit: prior to .net 4.5 - the implementation changed and they both throw):
someList.ForEach(x => { if(x.RemoveMe) someList.Remove(x); });
whereas foreach uses an enumerator, so this is not valid:
foreach(var item in someList)
if(item.RemoveMe) someList.Remove(item);
tl;dr: Do NOT copypaste this code into your application!
These examples aren't best practice, they are just to demonstrate the differences between ForEach() and foreach.
Removing items from a list within a for loop can have side effects. The most common one is described in the comments to this question.
Generally, if you are looking to remove multiple items from a list, you would want to separate the determination of which items to remove from the actual removal. It doesn't keep your code compact, but it guarantees that you do not miss any items.
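For example, List<T>.RemoveAll encapsulates exactly that two-phase pattern, or you can materialize the doomed items first and remove them afterwards:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var someList = new List<int> { 1, 2, 3, 4, 5, 6 };

        // Option 1: RemoveAll determines the matches first, then compacts
        // the list - no enumerator is invalidated.
        someList.RemoveAll(x => x % 2 == 0);   // leaves 1, 3, 5

        // Option 2: materialize the items to remove, then remove them in a
        // separate pass over the snapshot.
        var toRemove = someList.Where(x => x > 3).ToList();
        foreach (var x in toRemove)
            someList.Remove(x);

        Console.WriteLine(string.Join(",", someList)); // 1,3
    }
}
```

Both variants avoid mutating the list while an enumerator over it is live, which is the root cause of the problems discussed above.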
We had some code here (in VS2005 and C#2.0) where the previous engineers went out of their way to use list.ForEach( delegate(item) { foo;}); instead of foreach(item in list) {foo; }; for all the code that they wrote. e.g. a block of code for reading rows from a dataReader.
I still don't know exactly why they did this.
The drawbacks of list.ForEach() are:
It is more verbose in C# 2.0. However, in C# 3 onwards, you can use the "=>" syntax to make some nicely terse expressions.
It is less familiar. People who have to maintain this code will wonder why you did it that way. It took me a while to decide that there wasn't any reason, except maybe to make the writer seem clever (the quality of the rest of the code undermined that). It was also less readable, with the "})" at the end of the delegate code block.
See also Bill Wagner's book "Effective C#: 50 Specific Ways to Improve Your C#" where he talks about why foreach is preferred to other loops like for or while loops - the main point is that you are letting the compiler decide the best way to construct the loop. If a future version of the compiler manages to use a faster technique, then you will get this for free by using foreach and rebuilding, rather than changing your code.
a foreach(item in list) construct allows you to use break or continue if you need to exit the iteration or the loop. But you cannot alter the list inside a foreach loop.
I'm surprised to see that list.ForEach is slightly faster. But that's probably not a valid reason to use it throughout; that would be premature optimisation. If your application uses a database or web service, that, not loop control, is almost always where the time goes. And have you benchmarked it against a for loop too? list.ForEach could be faster due to using a for loop internally, and a for loop without the wrapper would be faster still.
I disagree that the list.ForEach(delegate) version is "more functional" in any significant way. It does pass a function to a function, but there's no big difference in the outcome or program organisation.
I don't think that foreach(item in list) "says exactly how you want it done" - a for(int 1 = 0; i < count; i++) loop does that, a foreach loop leaves the choice of control up to the compiler.
My feeling is, on a new project, to use foreach(item in list) for most loops in order to adhere to the common usage and for readability, and use list.Foreach() only for short blocks, when you can do something more elegantly or compactly with the C# 3 "=>" operator. In cases like that, there may already be a LINQ extension method that is more specific than ForEach(). See if Where(), Select(), Any(), All(), Max() or one of the many other LINQ methods doesn't already do what you want from the loop.
As they say, the devil is in the details...
The biggest difference between the two methods of collection enumeration is that foreach carries state, whereas ForEach(x => { }) does not.
But lets dig a little deeper, because there are some things you should be aware of that can influence your decision, and there are some caveats you should be aware of when coding for either case.
Let's use List<T> in our little experiment to observe behavior. For this experiment, I am using .NET 4.7.2:
var names = new List<string>
{
    "Henry",
    "Shirley",
    "Ann",
    "Peter",
    "Nancy"
};
Let's iterate over this with foreach first:
foreach (var name in names)
{
    Console.WriteLine(name);
}
We could expand this into:
using (var enumerator = names.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}
With the enumerator in hand, looking under the covers we get:
public List<T>.Enumerator GetEnumerator()
{
    return new List<T>.Enumerator(this);
}

internal Enumerator(List<T> list)
{
    this.list = list;
    this.index = 0;
    this.version = list._version;
    this.current = default (T);
}

public bool MoveNext()
{
    List<T> list = this.list;
    if (this.version != list._version || (uint) this.index >= (uint) list._size)
        return this.MoveNextRare();
    this.current = list._items[this.index];
    ++this.index;
    return true;
}

object IEnumerator.Current
{
    get
    {
        if (this.index == 0 || this.index == this.list._size + 1)
            ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
        return (object) this.Current;
    }
}
Two things become immediately evident:
We are returned a stateful object with intimate knowledge of the underlying collection.
The copy of the collection is a shallow copy.
This is of course in no way thread safe. As was pointed out above, changing the collection while iterating is just bad mojo.
But what about the problem of the collection becoming invalid during iteration by means outside of us mucking with the collection during iteration? Best practices suggests versioning the collection during operations and iteration, and checking versions to detect when the underlying collection changes.
Here's where things get really murky. According to the Microsoft documentation:
If changes are made to the collection, such as adding, modifying, or
deleting elements, the behavior of the enumerator is undefined.
Well, what does that mean? By way of example, just because List<T> implements exception handling does not mean that all collections that implement IList<T> will do the same. That seems to be a clear violation of the Liskov Substitution Principle:
Objects of a superclass shall be replaceable with objects of its
subclasses without breaking the application.
Another problem is that the enumerator must implement IDisposable -- that means another source of potential memory leaks, not only if the caller gets it wrong, but if the author does not implement the Dispose pattern correctly.
Lastly, we have a lifetime issue... what happens if the iterator is valid, but the underlying collection is gone? We now have a snapshot of what was... when you separate the lifetime of a collection and its iterators, you are asking for trouble.
Let's now examine ForEach(x => { }):
names.ForEach(name =>
{
    Console.WriteLine(name);
});
This expands to:
public void ForEach(Action<T> action)
{
    if (action == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    int version = this._version;
    for (int index = 0; index < this._size && (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5); ++index)
        action(this._items[index]);
    if (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)
        return;
    ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
Of important note is the following:
for (int index = 0; index < this._size && ... ; ++index)
action(this._items[index]);
This code does not allocate any enumerators (nothing to Dispose), and does not pause while iterating.
Note that this also performs a shallow copy of the underlying collection, but the collection is now a snapshot in time. If the author does not correctly implement a check for the collection changing or going 'stale', the snapshot is still valid.
This doesn't in any way protect you from the problem of the lifetime issues... if the underlying collection disappears, you now have a shallow copy that points to what was... but at least you don't have a Dispose problem to deal with on orphaned iterators...
Yes, I said iterators... sometimes it's advantageous to have state. Suppose you want to maintain something akin to a database cursor... maybe multiple foreach-style Iterator<T>s is the way to go. I personally dislike this style of design, as there are too many lifetime issues, and you rely on the good graces of the authors of the collections you depend on (unless you literally write everything yourself from scratch).
There is always a third option...
for (var i = 0; i < names.Count; i++)
{
    Console.WriteLine(names[i]);
}
It ain't sexy, but it's got teeth (apologies to Tom Cruise and the movie The Firm).
It's your choice, but now you know, and it can be an informed one.
For fun, I popped List into reflector and this is the resulting C#:
public void ForEach(Action<T> action)
{
    if (action == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    }
    for (int i = 0; i < this._size; i++)
    {
        action(this._items[i]);
    }
}
Similarly, the MoveNext in Enumerator which is what is used by foreach is this:
public bool MoveNext()
{
    if (this.version != this.list._version)
    {
        ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
    }
    if (this.index < this.list._size)
    {
        this.current = this.list._items[this.index];
        this.index++;
        return true;
    }
    this.index = this.list._size + 1;
    this.current = default(T);
    return false;
}
List.ForEach is much more trimmed down than MoveNext - far less processing - and will more likely JIT into something efficient.
In addition, foreach() will allocate a new Enumerator no matter what. The GC is your friend, but if you're doing the same foreach repeatedly, this will make more throwaway objects, as opposed to reusing the same delegate - BUT - this is really a fringe case. In typical usage you will see little or no difference.
I guess the someList.ForEach() call could be easily parallelized, whereas the normal foreach is not that easy to run in parallel.
You could easily run several different delegates on different cores, which is not that easy to do with a normal foreach.
Just my 2 cents
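With .NET 4's Task Parallel Library this is spelled Parallel.ForEach; a minimal sketch (note that the body must be thread-safe, hence Interlocked.Add instead of a plain +=):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var someList = new List<int> { 1, 2, 3, 4 };
        int total = 0;

        // Runs the delegate for each element, potentially on multiple cores.
        // Iteration order and thread assignment are not guaranteed.
        Parallel.ForEach(someList, x => Interlocked.Add(ref total, x));

        Console.WriteLine(total); // 10
    }
}
```

The sequential foreach keyword has no such drop-in parallel counterpart; you would have to partition the work yourself.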
I know two obscure-ish things that make them different. Go me!
Firstly, there's the classic bug of making a delegate for each item in the list. If you use the foreach keyword, all your delegates can end up referring to the last item of the list (note: this applies to compilers before C# 5, which shared one loop variable across iterations; C# 5 changed foreach to scope the variable per iteration):
// A list of actions to execute later
List<Action> actions = new List<Action>();

// Numbers 0 to 9
List<int> numbers = Enumerable.Range(0, 10).ToList();

// Store an action that prints each number (WRONG!)
foreach (int number in numbers)
    actions.Add(() => Console.WriteLine(number));

// Run the actions; we actually print 10 copies of "9"
foreach (Action action in actions)
    action();

// So try again
actions.Clear();

// Store an action that prints each number (RIGHT!)
numbers.ForEach(number =>
    actions.Add(() => Console.WriteLine(number)));

// Run the actions
foreach (Action action in actions)
    action();
The List.ForEach method doesn't have this problem. The current item of the iteration is passed by value as an argument to the outer lambda, and then the inner lambda correctly captures that argument in its own closure. Problem solved.
(Sadly I believe ForEach is a member of List, rather than an extension method, though it's easy to define it yourself so you have this facility on any enumerable type.)
Secondly, the ForEach method approach has a limitation. If you are implementing IEnumerable by using yield return, you can't do a yield return inside the lambda. So looping through the items in a collection in order to yield return things is not possible by this method. You'll have to use the foreach keyword and work around the closure problem by manually making a copy of the current loop value inside the loop.
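The manual-copy workaround mentioned above looks like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var actions = new List<Action>();

        foreach (int number in Enumerable.Range(0, 10))
        {
            int copy = number;           // fresh variable per iteration,
                                         // so each closure captures its own value
            actions.Add(() => Console.WriteLine(copy));
        }

        foreach (Action action in actions)
            action();                    // prints 0 through 9
    }
}
```

The copy gives each closure its own captured variable, which is exactly what the ForEach lambda parameter provides for free.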
You could name the anonymous delegate :-)
And you can write the second as:
someList.ForEach(s => s.ToUpper())
Which I prefer, and saves a lot of typing.
As Joachim says, parallelism is easier to apply to the second form.
List.ForEach() is considered to be more functional.
List.ForEach() says what you want done. foreach(item in list) also says exactly how you want it done. This leaves List.ForEach free to change the implementation of the how part in the future. For example, a hypothetical future version of .Net might always run List.ForEach in parallel, under the assumption that at this point everyone has a number of cpu cores that are generally sitting idle.
On the other hand, foreach (item in list) gives you a little more control over the loop. For example, you know that the items will be iterated in some kind of sequential order, and you could easily break in the middle if an item meets some condition.
Some more recent remarks on this issue are available here:
https://stackoverflow.com/a/529197/3043
The entire ForEach scope (delegate function) is treated as a single line of code (calling the function), and you cannot set breakpoints or step into the code. If an unhandled exception occurs the entire block is marked.
Behind the scenes, the anonymous delegate gets turned into an actual method so you could have some overhead with the second choice if the compiler didn't choose to inline the function. Additionally, any local variables referenced by the body of the anonymous delegate example would change in nature because of compiler tricks to hide the fact that it gets compiled to a new method. More info here on how C# does this magic:
http://blogs.msdn.com/oldnewthing/archive/2006/08/04/688527.aspx
The ForEach function is a member of the generic class List<T>.
I have created the following extension to reproduce the internal code:
public static class MyExtension
{
    public static void MyForEach<T>(this IEnumerable<T> collection, Action<T> action)
    {
        foreach (T item in collection)
            action.Invoke(item);
    }
}
So at the end we are using a normal foreach (or a for loop if you want).
On the other hand, using a delegate function is just another way to define a function, this code:
delegate(string s) {
    <process the string>
}
is equivalent to:
private static void myFunction(string s, <other variables...>)
{
    <process the string>
}
or using lambda expressions:
(s) => <process the string>
The second way you showed uses an extension method to execute the delegate method for each of the elements in the list.
This way, you have another delegate (=method) call.
Additionally, there is the possibility to iterate the list with a for loop.
One thing to be wary of is how to exit from the Generic .ForEach method - see this discussion. Although the link seems to say that this way is the fastest. Not sure why - you'd think they would be equivalent once compiled...
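To illustrate the exit problem: a return inside the delegate passed to List<T>.ForEach only ends the current invocation, so it behaves like continue rather than break:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };

        // `return` here exits only this one call of the delegate; the loop
        // inside ForEach keeps going. There is no way to stop ForEach early
        // short of throwing an exception.
        numbers.ForEach(n =>
        {
            if (n % 2 == 0)
                return;                  // skips even numbers, like `continue`
            Console.WriteLine(n);        // prints 1, 3, 5
        });
    }
}
```

With the foreach keyword, by contrast, break and continue are available directly, which is one practical reason to prefer it when early exit matters.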