I came across a way to modify a list inside a foreach loop by iterating over a copy of it, like this:
foreach (var item in myList.ToList())
{
//add or remove items from myList
}
(If you modify myList directly while a foreach is enumerating it, an InvalidOperationException is thrown, because the enumerator detects that the list has changed.)
This works because the loop enumerates the copy returned by ToList(), not the myList that is being modified. My question is: does this method create garbage when the loop is over (namely the List that's returned from the ToList method)? For small loops, would it be preferable to use a for loop to avoid creating garbage?
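A minimal sketch of the two behaviours described in the question, using a List<int> as a stand-in for myList (the values and the conditions are made up for illustration): adding to the list while foreach enumerates it directly throws, while enumerating a ToList() copy lets the loop change the original.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<int> { 1, 2, 3, 4 };

// Modifying the list while foreach enumerates it directly:
// List<T>'s enumerator notices the change on the next MoveNext() and throws.
bool threw = false;
try
{
    foreach (var item in myList)
    {
        if (item == 2)
            myList.Add(99);
    }
}
catch (InvalidOperationException)
{
    threw = true;
}

// Enumerating a ToList() copy instead: the loop walks the snapshot,
// so removing items from myList does not disturb it.
myList = new List<int> { 1, 2, 3, 4 };
foreach (var item in myList.ToList())
{
    if (item % 2 == 0)
        myList.Remove(item);
}

Console.WriteLine(threw);                    // True
Console.WriteLine(string.Join(",", myList)); // 1,3
```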
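The second list is going to be garbage. Building it may also allocate an enumerator, although when the source is an ICollection<T> such as List<T>, the copy is made with CopyTo instead. Then add in the enumerator that the foreach itself uses, which you would have had with or without the second list (for a List<T> that enumerator is a struct, so it is not a heap allocation).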
Should you switch to a for? Maybe, if you can point to this region of code being a true performance bottleneck. Otherwise, code for simplicity and maintainability.
Yes. ToList() would create another list that would need to be garbage collected.
That's an interesting technique which I will keep in mind for the future! (I can't believe I've never thought of that!)
Anyway, yes, the list that you are building doesn't magically unallocate itself. The possible performance problems with this technique are:
Increased memory usage (building a List, separate from the IEnumerable). Probably not that big of a deal, unless you do this very frequently, or the IEnumerable is very large.
Decreased speed, since it has to go through the entire IEnumerable up front to build the List.
Also, if enumerating the IEnumerable has side effects, they will all be triggered by this process.
Unless this is actually inside an inner loop, or you're working with very large data sets, you can probably do this without any problems.
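To make the side-effect point concrete, here is a small sketch where a counter stands in for whatever side effect the enumeration has (the query and the counter are hypothetical). Stopping early only saves work when the loop runs over the lazy query itself; with ToList() the whole sequence has already been evaluated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluated = 0;

// Hypothetical lazy query: the Select lambda runs only when the sequence
// is enumerated. The counter stands in for any side effect.
IEnumerable<int> query = Enumerable.Range(1, 5).Select(x =>
{
    evaluated++;
    return x * 10;
});

// Looping over the query directly and stopping early:
// only the items actually reached are evaluated.
foreach (var value in query)
{
    if (value == 10) break;
}
int lazyCount = evaluated;

// ToList() enumerates the whole query before the loop body runs even once.
evaluated = 0;
foreach (var value in query.ToList())
{
    if (value == 10) break;
}
int eagerCount = evaluated;

Console.WriteLine($"{lazyCount} {eagerCount}"); // 1 5
```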
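Yes, the ToList() method creates "garbage". I would just use indexing, iterating backwards so that removing an item does not shift the items you have not visited yet: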
for (int i = MyList.Count - 1; 0 <= i; --i)
{
var item = MyList[i];
//add or remove items from myList
}
When it happens is non-deterministic, but the list created by the call to ToList() will be garbage-collected eventually.
I wouldn't worry about it too much, since all it would be holding at most would be references or small value types.
Related
Hi guys, I have found an issue that I'm unable to explain logically. In the following snippet flpRecordIndexes is a FlowLayoutPanel that contains lots of RecordIndexControl instances (a user control that I created). I want to delete everything except the first control. The same idea applies to flpRecordContainer.
If I execute this (without the ToList call), it only removes half of the controls; if it was a sequence, for example, it would remove (2,4,6,8) etc.
foreach (var recordIndexControl in flpRecordIndexes.Controls.Cast<RecordIndexControl>().Skip(1))
{
flpRecordIndexes.Controls.Remove(recordIndexControl);
}
foreach (var recordControl in flpRecordContainer.Controls.Cast<RecordControl>().Skip(1))
{
flpRecordContainer.Controls.Remove(recordControl);
}
If I execute this (with the ToList), it removes everything except the first control, what I wanted.
foreach (var recordIndexControl in flpRecordIndexes.Controls.Cast<RecordIndexControl>().ToList().Skip(1))
{
flpRecordIndexes.Controls.Remove(recordIndexControl);
}
foreach (var recordControl in flpRecordContainer.Controls.Cast<RecordControl>().ToList().Skip(1))
{
flpRecordContainer.Controls.Remove(recordControl);
}
Why does calling Cast without ToList produce this behavior?
This is entirely normal: you are modifying the collection you are iterating over with the Controls.Remove() call. The Controls collection behaves differently from other framework collections; it doesn't throw an exception when you do this. So in effect you remove every other control, depending on the mix.
The ToList() call creates a copy of the Controls collection, it is no longer affected by the Remove() calls. It is the correct workaround.
Do keep in mind that you most likely have a nasty leak. The controls you remove must be disposed. You can no longer rely on Winforms doing this for you, it can't since they are no longer in the Controls collection. Failure to dispose them is a permanent leak, the garbage collector cannot help.
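The every-other-control effect can be reproduced without WinForms. This sketch assumes the Controls enumerator walks by index without checking for changes, and uses a hypothetical ByIndex iterator over a List<int> to mimic that; it is an illustration, not the actual ControlCollection code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Controls enumerator: it walks by index and
// never checks whether the underlying list changed.
IEnumerable<T> ByIndex<T>(List<T> list)
{
    for (int i = 0; i < list.Count; i++)
        yield return list[i];
}

// Enumerating the live collection: each Remove shifts the remaining items
// left by one, so the next index jumps over one of them.
var controls = new List<int> { 0, 1, 2, 3, 4, 5, 6 };
foreach (var c in ByIndex(controls).Skip(1))
    controls.Remove(c);
string live = string.Join(",", controls);

// Enumerating a ToList() snapshot: the loop sees every original item.
controls = new List<int> { 0, 1, 2, 3, 4, 5, 6 };
foreach (var c in ByIndex(controls).ToList().Skip(1))
    controls.Remove(c);
string snapshot = string.Join(",", controls);

Console.WriteLine(live);     // 0,2,4,6
Console.WriteLine(snapshot); // 0
```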
Why does calling Cast without ToList produce this behavior?
Invoking ToList() materializes the collection, whereas Cast<T> does not. Once ToList() has run, you have a separate list with a fixed number of items, so the Remove() calls no longer change what the loop iterates over.
I would suggest iterating over Control.Controls with a for loop instead of a foreach. This avoids the issue you're seeing entirely and is also more performant. The ControlCollection class implements IList, so you should be good with that.
for (var index = flpRecordContainer.Controls.Count - 1; index >= 1; --index)
{
    flpRecordContainer.Controls.RemoveAt(index);
}
Note the index >= 1 to ensure we leave the first control in the list.
Regarding collections that implement this[int], and assuming the collection won't change during the enumeration, does the foreach (var item in list) loop always produce the same sequence as for (var i = 0; i < list.Count; ++i)?
That is, when I need ascending order by index, can I use foreach, or is it simply safer to use for? Or does it depend on the current collection implementation, and might it vary or change over time?
foreach (var item in list)
{
// do things
}
translates to
var enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
var item = enumerator.Current;
// do things
}
So as you can see, it's not using the indexer list[i] in the general case.
For most collection types, however, the semantics are the same.
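A quick check of the point above for List<T>, writing out the foreach expansion by hand next to the index-based loop (the list contents are arbitrary):

```csharp
using System;
using System.Collections.Generic;

var list = new List<string> { "a", "b", "c" };

// Roughly what foreach expands to: GetEnumerator / MoveNext / Current.
var viaEnumerator = new List<string>();
using (IEnumerator<string> enumerator = list.GetEnumerator())
{
    while (enumerator.MoveNext())
        viaEnumerator.Add(enumerator.Current);
}

// The index-based loop goes through the this[int] indexer instead.
var viaIndexer = new List<string>();
for (var i = 0; i < list.Count; ++i)
    viaIndexer.Add(list[i]);

// For List<T> both paths give the same order; nothing forces a custom
// IList<T> implementation to keep them in sync.
Console.WriteLine(string.Join(",", viaEnumerator)); // a,b,c
Console.WriteLine(string.Join(",", viaIndexer));    // a,b,c
```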
edit
There may be IList<T> implementations where the enumerator does not use the indexer at all. If you implement IList<T> as a linked list, for example, it's very unlikely you will use the indexer in your enumerator implementation, as it would be very inefficient.
As a rule of thumb, using foreach ensures you use the most efficient algorithm for the class at hand, as it is the one chosen by the class's creator. In the worst case, you will just suffer a small indirection overhead that is very unlikely to be noticeable.
edit 2 after nos's comment
There is a case where the semantics of the two constructs vary widely: modifying the collection.
While using a simple for loop, nothing in particular will happen if you change the collection while iterating through it. The program will behave as if it assumed you know what you're doing. This could result in some values being iterated over more than once or others being skipped, but no exception as long as you're not accessing outside the range of the indexer (which would require a multithreaded program to happen).
While using a foreach loop, if you modify the collection while iterating through it, you enter undefined behavior. The documentation tells us:
An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.
In that case, expect most of the C# built-in types to throw an InvalidOperationException, but anything can happen in a custom implementation, from missed values to repeated values, including infinite loops...
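A small sketch of the for-loop case, assuming a List<int> and an insertion at the front partway through (both chosen only for illustration): the loop keeps going without an exception, but one value is visited twice.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };
var visited = new List<int>();
bool inserted = false;

// A plain for loop re-reads Count and the indexer on every iteration and
// never checks for modifications, so no exception is thrown.
for (int i = 0; i < list.Count; i++)
{
    visited.Add(list[i]);
    if (list[i] == 2 && !inserted)
    {
        // Inserting at the front shifts everything right by one,
        // so the next index lands on the element we just saw.
        list.Insert(0, 0);
        inserted = true;
    }
}

Console.WriteLine(string.Join(",", visited)); // 1,2,2,3
```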
Generally speaking, yes, but strictly spoken: no. It really depends on the implementation.
Usually with for you would use the this[] indexer property. foreach uses GetEnumerator() to get the enumerator that iterates over the collection. Depending on the implementation, the enumerator might yield a different result than the for loop.
The implied logic of a list is that it has a specific order, and when implementing IList you may state that it is safe to assume that the indexer property and the enumerator produce the same order.
There is no guarantee that this would be the case. The code paths can be completely separate. Of course collections like List will produce the same result but you can write data structures (even useful ones) that do not.
The indexer is just a property with additional index argument. You can return a random value if you feel like it.
One important thing to keep in mind as a difference between the two is that inside foreach you can't make any changes to the collection being enumerated (adding, removing, or replacing items); you can still change the properties of the objects themselves.
If you wish to alter (basically delete) items from the collection, you must use a for loop.
I have a method which returns an array of fixed type objects (let's say MyObject).
The method creates a new empty Stack<MyObject>. Then, it does some work and pushes some number of MyObjects onto the Stack. Finally, it returns Stack.ToArray().
It does not change already added items or their properties, nor remove them. The number of elements to add will cost performance. There is no need to sort/order the elements.
Is Stack the best thing to use? Or should I switch to Collection or List to ensure better performance and/or lower memory cost?
Stack<T> will not be any faster than List<T>.
For optimal performance, you should use a List<T> and set the Capacity to a number larger than or equal to the number of items you plan to add.
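A minimal sketch of the capacity advice, with strings standing in for MyObject and an assumed expected count of 1000: because the capacity is set up front, adding that many items never triggers a reallocation.

```csharp
using System;
using System.Collections.Generic;

// Assumed item count for illustration; in practice use your best estimate.
const int expectedCount = 1000;

var items = new List<string>(expectedCount); // sets Capacity up front
int initialCapacity = items.Capacity;

for (int i = 0; i < expectedCount; i++)
    items.Add("item" + i); // no reallocation while Count <= Capacity

string[] result = items.ToArray();

Console.WriteLine(initialCapacity == items.Capacity); // True
Console.WriteLine(result.Length);                      // 1000
```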
If the ordering doesn't matter and your method doesn't need to add/remove/edit items that have already been processed then why not return IEnumerable<MyObject> and just yield each item as you go?
Then your calling code can either use the IEnumerable<MyObject> sequence directly, or call ToArray, ToList etc as required.
For example...
// use the sequence directly
foreach (MyObject item in GetObjects())
{
Console.WriteLine(item.ToString());
}
// ...
// convert to an array
MyObject[] myArray = GetObjects().ToArray();
// ...
// convert to a list
List<MyObject> myList = GetObjects().ToList();
// ...
public IEnumerable<MyObject> GetObjects()
{
foreach (MyObject foo in GetObjectsFromSomewhereElse())
{
MyObject bar = DoSomeProcessing(foo);
yield return bar;
}
}
Stack<T> is not any faster than List<T> in this case, so I would probably use List, unless something about what you are doing is "stack-like". List<T> is the more standard data structure to use when what you want is basically a growable array, whereas stacks are usually used when you need LIFO behavior for the collection.
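One practical difference if you swap one for the other, sketched with small integers: Stack<T>.ToArray() copies the items in pop (LIFO) order, while List<T>.ToArray() keeps insertion order. Both constructors here take an initial capacity.

```csharp
using System;
using System.Collections.Generic;

var stack = new Stack<int>(3); // initial capacity of 3 items
var list = new List<int>(3);

foreach (var n in new[] { 1, 2, 3 })
{
    stack.Push(n);
    list.Add(n);
}

// Stack<T>.ToArray() copies in last-in, first-out order;
// List<T>.ToArray() keeps the order the items were added.
Console.WriteLine(string.Join(",", stack.ToArray())); // 3,2,1
Console.WriteLine(string.Join(",", list.ToArray()));  // 1,2,3
```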
For this purpose, there are no other collections in the framework that will perform considerably better than a Stack<T>.
However, both Stack<T> and List<T> auto-grow their internal array of items when the initial capacity is exceeded. This involves creating a new, larger array and copying all items, which costs some performance.
If you know the number of items beforehand, initialize your collection to that capacity to avoid auto-growth. If you don't know exactly, choose a capacity that is unlikely to be insufficient.
Most of the built in collections take the initial capacity as a constructor argument:
var stack = new Stack<T>(200); // Initial capacity of 200 items.
Use a LinkedList maybe?
Though LinkedLists are only useful with sequential data.
You don't need Stack<> if all you're going to do is append. You can use List<>.Add (http://msdn.microsoft.com/en-us/library/d9hw1as6.aspx) and then ToArray.
(You'll also want to set initial capacity, as others have pointed out.)
If you need the semantics of a stack (last-in first-out), then the answer is, without any doubt, yes, a stack is your best solution. If you know from the start how many elements it will end up with, you can avoid the cost of automatic resizing by calling the constructor that receives a capacity.
If you're worried about the memory cost of copying the stack into an array, and you only need sequential access to the result, then, you can return the Stack<T> as an IEnumerable<T> instead of an array and iterate it with foreach.
All that said, unless this code proves to be problematic in terms of performance (i.e., by looking at data from a profiler), I wouldn't bother much and would go with the collection whose semantics fit.
I have been told that there is a performance difference between the following code blocks.
foreach (Entity e in entityList)
{
....
}
and
for (int i=0; i<entityList.Count; i++)
{
Entity e = (Entity)entityList[i];
...
}
where
List<Entity> entityList;
I am no CLR expert, but from what I can tell they should boil down to basically the same code. Does anybody have concrete (heck, I'd take packed dirt) evidence one way or the other?
foreach creates an instance of an enumerator (returned from GetEnumerator), and that enumerator keeps state throughout the course of the foreach loop. It then repeatedly calls MoveNext() on the enumerator, reads Current, and runs your code for each object it returns.
They don't boil down to the same code in any way, really, which you'd see if you wrote your own enumerator.
Here is a good article that shows the IL differences between the two loops.
foreach is technically slower, but much easier to use and easier to read. Unless performance is critical, I prefer the foreach loop over the for loop.
The foreach sample roughly corresponds to this code:
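using (IEnumerator<Entity> e = entityList.GetEnumerator()) {
    while (e.MoveNext()) {
        Entity entity = e.Current;
        ...
    }
}
There are two costs here that a regular for loop does not have to pay:
The cost of obtaining the enumerator from entityList.GetEnumerator(). Written as above, this is a heap allocation (the enumerator is boxed); a foreach directly over a List<T> uses the List<T>.Enumerator struct and avoids it.
The cost of two method calls (MoveNext and Current) for each element of the list. Through the IEnumerator<T> interface these are virtual calls; on the List<T>.Enumerator struct they are direct calls.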
One point missed here:
A List has a Count property, it internally keeps track of how many elements are in it.
An IEnumerable DOES NOT.
If you program to the IEnumerable interface and use the Count() extension method, it has to enumerate the sequence just to count the elements, unless the object behind it implements ICollection<T> (as List<T> does), in which case Count() uses its Count property.
A moot point, though, since through IEnumerable you cannot refer to items by index.
So if you want to lock in to lists and arrays, you can get small performance increases.
If you want flexibility, use foreach and program to IEnumerable (allowing the use of LINQ and/or yield return).
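A sketch of the Count() point, using a hypothetical iterator with a counter to show when the sequence actually gets enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Hypothetical lazy sequence: it has no Count property,
// so Count() must walk it (running the iterator body) to count.
IEnumerable<int> Numbers()
{
    for (int i = 0; i < 100; i++)
    {
        produced++;
        yield return i;
    }
}

int lazyCount = Numbers().Count();
int producedByCount = produced; // every element was generated

// List<T> implements ICollection<T>, so Count() reads its Count property
// instead of enumerating.
List<int> list = Numbers().ToList();
produced = 0;
int listCount = ((IEnumerable<int>)list).Count();

Console.WriteLine($"{lazyCount} {producedByCount} {listCount} {produced}"); // 100 100 100 0
```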
In terms of allocations, it'd be better to look at this blog post. It shows exactly in what circumstances an enumerator is allocated on the heap.
I think one possible situation where you might get a performance gain is if the enumerable type's size and the loop condition are constant; for example:
const int ArraySize = 10;
int[] values = new int[ArraySize];
//...
for (int i = 0; i < ArraySize; i++)
{
    //...
}
In this case, depending on the complexity of the loop body, the compiler might be able to unroll the loop, replacing it with inline copies of the body. I have no idea if the .NET compiler does this, and it's of limited utility if the size of the enumerable type is dynamic.
One situation where foreach might perform better is with data structures like a linked list, where random access means traversing the list. The enumerator used by foreach will probably iterate one item at a time, making each step O(1) and the full loop O(n), but calling the indexer means starting at the head and finding the item at the right index: O(n) per iteration, for O(n^2) overall.
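The linked-list case can be sketched with LinkedList<T>, which has no indexer; ElementAt(i) stands in for indexed access here and has to walk from the head on every call. Both loops produce the same sum; only the cost differs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var linked = new LinkedList<int>(Enumerable.Range(1, 1000));

// foreach: the enumerator follows Next links, O(1) per step, O(n) in total.
long sumForeach = 0;
foreach (var x in linked)
    sumForeach += x;

// "Indexed" access: LinkedList<T> does not implement IList<T>, so
// ElementAt(i) walks from the head each time, O(i) per call, O(n^2) overall.
long sumIndexed = 0;
for (int i = 0; i < linked.Count; i++)
    sumIndexed += linked.ElementAt(i);

Console.WriteLine(sumForeach == sumIndexed); // True
```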
Personally I don't usually worry about it and use foreach any time I need all items and don't care about the index of the item. If I'm not working with all of the items or I really need to know the index, I use for. The only time I could see it being a big concern is with structures like linked lists.
For Loop
A for loop is used to perform an operation n times:
for (int i = 0; i < n; i++)
{
    l = i;
}
foreach loop
A foreach loop is used to perform an operation on each value/object in an IEnumerable:
int[] i = {1, 2, 3, 4, 5, 6};
foreach (var k in i)
{
    l = k;
}
I need to enumerate through a generic IList<> of objects. The contents of the list may change, as items are added or removed by other threads, and this will kill my enumeration with "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
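A minimal sketch combining the first two suggestions, assuming a plain List<int> shared between one writer thread and one reader, and a private lock object (all names are made up): copy the list while holding the lock, then enumerate the copy without it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var shared = new List<int>();
var gate = new object(); // every reader and writer must lock on this same object

// Clone under the lock: the critical section only lasts as long as the copy.
List<int> Snapshot()
{
    lock (gate)
    {
        return new List<int>(shared);
    }
}

// Writer thread: every modification takes the same lock.
var writer = new Thread(() =>
{
    for (int i = 0; i < 10_000; i++)
    {
        lock (gate) { shared.Add(i); }
    }
});
writer.Start();

// Reader: enumerate private snapshots without holding the lock.
// foreach over the copy cannot throw "Collection was modified",
// because no other thread ever touches the copy.
while (writer.IsAlive)
{
    long sum = 0;
    foreach (var x in Snapshot())
        sum += x;
}
writer.Join();

Console.WriteLine(Snapshot().Count); // 10000
```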
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called convoy problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it had arrived, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration was, I can only imagine, someone's (bad) idea of doing what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly: they have called their shiny new unbroken collections concurrent collections (System.Collections.Concurrent), in .NET 4.0.
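A small sketch of that behaviour with ConcurrentQueue<T>, whose enumerator is documented as a moment-in-time snapshot: enqueuing inside the loop does not throw, and the loop does not see the new items. Other concurrent collections make different guarantees, so check the documentation of the one you use.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });
var seen = new List<int>();

// Enqueuing while enumerating does not throw. The enumeration reflects the
// queue as it was when GetEnumerator was called, so new items are not visited.
foreach (var item in queue)
{
    seen.Add(item);
    queue.Enqueue(item * 10);
}

Console.WriteLine(string.Join(",", seen)); // 1,2,3
Console.WriteLine(queue.Count);            // 6
```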
I recently spent some time multithreading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for loop and immediately assign the object to a local variable to use inside the loop. Just keep in mind that all threads writing to the objects in your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock that allows multiple concurrent readers but also a single writer (when there are no readers).
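A minimal sketch of the reader/writer idea using ReaderWriterLockSlim, assuming a plain List<int> and hypothetical AddItem/SumItems helpers: writers take the exclusive write lock, while readers take the shared read lock and can enumerate concurrently.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var shared = new List<int>();
var rwLock = new ReaderWriterLockSlim();

// Writers take the exclusive write lock.
void AddItem(int value)
{
    rwLock.EnterWriteLock();
    try { shared.Add(value); }
    finally { rwLock.ExitWriteLock(); }
}

// Readers take the shared read lock: several can enumerate at once,
// but never while a writer holds the lock.
long SumItems()
{
    rwLock.EnterReadLock();
    try
    {
        long total = 0;
        foreach (var x in shared) total += x;
        return total;
    }
    finally { rwLock.ExitReadLock(); }
}

var writerTask = Task.Run(() => { for (int i = 1; i <= 1000; i++) AddItem(i); });
var readerTasks = new Task[4];
for (int r = 0; r < readerTasks.Length; r++)
    readerTasks[r] = Task.Run(() => { for (int k = 0; k < 100; k++) SumItems(); });

Task.WaitAll(readerTasks);
writerTask.Wait();

long finalSum = SumItems();
Console.WriteLine(finalSum); // 500500
```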
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch to a FOR loop with a counter. At the point you grab the count, you're only seeing X elements in the list. You can walk through those elements while others are adding to the end of it, which should not cause a problem (though List<T> itself makes no thread-safety guarantee for reading while another thread writes).
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to use a dictionary).