I see the following code:
using(var iterator = source.GetEnumerator()) {...}
where source is an IEnumerable<T>.
What is the advantage of doing the above versus converting source into a List<T> and then iterating over it?
Converting it to a list will iterate the enumerable once and copy all the references (or even values for value types) into a new List<>. Then, you would iterate over the list. That means you would iterate twice.
Using the IEnumerable<> as a source for enumeration iterates over the sequence only once.
Why someone decided to do the iteration manually using the enumerator instead of leaving the details to a foreach is unclear from the small scope you posted.
Converting to a List<T> would require additional memory and CPU cycles to perform the conversion, not to mention that you'd be iterating over the data twice.
There's no need to convert to a List<T> before iterating. foreach can iterate over anything that implements IEnumerable<T>.
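To make the cost of the extra copy concrete, here is a minimal sketch. The Numbers iterator and the counter are hypothetical helpers invented for illustration; they are not part of the question's code. It compares how many items the source actually produces when the loop stops early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationCost
{
    // Hypothetical source: reports every item it actually produces.
    static IEnumerable<int> Numbers(int count, Action onProduced)
    {
        for (int i = 0; i < count; i++)
        {
            onProduced();
            yield return i;
        }
    }

    // Iterates the source directly and stops after the first item.
    public static int ProducedWhenIteratingDirectly()
    {
        int produced = 0;
        foreach (int n in Numbers(1000, () => produced++))
        {
            break; // only one item was ever pulled from the source
        }
        return produced;
    }

    // Copies the source into a list first, then stops after the first item.
    public static int ProducedWhenCopyingToList()
    {
        int produced = 0;
        foreach (int n in Numbers(1000, () => produced++).ToList())
        {
            break; // too late: ToList() already consumed all 1000 items
        }
        return produced;
    }

    static void Main()
    {
        Console.WriteLine(ProducedWhenIteratingDirectly()); // 1
        Console.WriteLine(ProducedWhenCopyingToList());     // 1000
    }
}
```

When the loop always runs to the end, both versions pull every item from the source; the list version then additionally walks the copy.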
Related
Regarding collections that implement this[int], and assuming the collection won't change during the enumeration, does the foreach (var item in list) loop always produce the same sequence as for (var i = 0; i < list.Count; ++i)?
This means, when I need ascending order by index, can I use foreach, or is it simply safer to use for? Or does it just depend on the current collection implementation, so that it might vary or change over time?
foreach (var item in list)
{
// do things
}
translates to
var enumerator = list.GetEnumerator();
while(enumerator.MoveNext())
{
var item = enumerator.Current;
// do things
}
So as you can see, it's not using the indexer list[i] in the general case.
For most collection types, however, the semantics are the same.
edit
There are IList<T> implementations where the enumerator does not go through the indexer. If you implement IList<T> as a linked list, for example, it's very unlikely you will use the indexer in your enumerator implementation, as it would be very inefficient.
As a rule of thumb, using foreach ensures you use the most efficient algorithm for the class at hand, as it is the one chosen by the class's author. In the worst case, you will just suffer a small indirection overhead that is very unlikely to be noticeable.
edit 2 after nos's comment
There is a case where the semantics of the two constructs vary widely: modification of the collection.
While using a simple for loop, nothing particular will happen if you change the collection while iterating through it. The program will behave as if it assumed you know what you're doing. This could result in some values being iterated over more than once or others being skipped, but no exception is thrown as long as you don't access an index outside the range of the indexer (which would require a multithreaded program to happen).
While using a foreach loop, if you modify the collection while iterating through it, you enter undefined behavior. The documentation tells us:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
In that case, expect most of the C# built-in types to throw an InvalidOperationException, but anything can happen in a custom implementation, from missed values to repeated values to infinite loops...
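The difference described above can be sketched with List<T>. This is an illustration only; the class and method names are made up for this example. The foreach version is stopped by an InvalidOperationException, while the forward for loop runs to completion but silently skips an element.

```csharp
using System;
using System.Collections.Generic;

static class ModifyWhileIterating
{
    // Removes even numbers inside a foreach; List<T> detects the change.
    public static bool ForeachThrows()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        try
        {
            foreach (int n in list)
            {
                if (n % 2 == 0) list.Remove(n);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // thrown by the MoveNext() call that follows the Remove
        }
    }

    // Same idea with a forward for loop: no exception, but 4 is never examined,
    // because it slides into index 0 right after 2 is removed.
    public static List<int> ForwardForLoop()
    {
        var list = new List<int> { 2, 4, 5 };
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] % 2 == 0) list.RemoveAt(i);
        }
        return list;
    }

    static void Main()
    {
        Console.WriteLine(ForeachThrows());                      // True
        Console.WriteLine(string.Join(",", ForwardForLoop()));   // 4,5
    }
}
```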
Generally speaking, yes, but strictly speaking, no. It really depends on the implementation.
Usually with for you would use the this indexer properties. foreach uses GetEnumerator() to get the enumerator that iterates over the collection. Depending on the implementation the enumerator might yield another result than the for.
The implied logic of a list is that it has a specific order, and for a type implementing IList it is safe to assume that the indexer and the enumerator return the elements in the same order.
There is no guarantee that this would be the case. The code paths can be completely separate. Of course collections like List will produce the same result but you can write data structures (even useful ones) that do not.
The indexer is just a property with an additional index argument. You can return a random value if you feel like it.
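As a sketch of how the two code paths can diverge, the hypothetical ReversedView type below (invented for this illustration) exposes an indexer that reads front to back and an enumerator that yields back to front. Nothing in the language prevents this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: the indexer and the enumerator disagree on order.
sealed class ReversedView : IReadOnlyList<int>
{
    private readonly int[] items;

    public ReversedView(params int[] items) { this.items = items; }

    public int Count => items.Length;

    // Indexer: front to back.
    public int this[int index] => items[index];

    // Enumerator: back to front.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = items.Length - 1; i >= 0; i--) yield return items[i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class OrderDemo
{
    public static string ViaFor(ReversedView view)
    {
        var seen = new List<int>();
        for (int i = 0; i < view.Count; i++) seen.Add(view[i]);
        return string.Join(",", seen);
    }

    public static string ViaForeach(ReversedView view)
    {
        var seen = new List<int>();
        foreach (int item in view) seen.Add(item);
        return string.Join(",", seen);
    }

    static void Main()
    {
        var view = new ReversedView(1, 2, 3);
        Console.WriteLine(ViaFor(view));     // 1,2,3
        Console.WriteLine(ViaForeach(view)); // 3,2,1
    }
}
```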
One important thing to keep in mind, as a difference between the two, is that inside a foreach you can't add items to or remove items from the collection being enumerated.
If you wish to alter the collection (basically, delete objects from it) while iterating, you must use a for loop.
Today, I faced a performance problem while iterating through a list of items. After doing some diagnostics, I finally figured out what slowed it down. It turned out that iterating through an IEnumerable<T> took much more time than iterating through a List<T>. Please help me understand why IEnumerable<T> is slower than List<T>.
UPDATE benchmark context:
I'm using NHibernate to fetch a collection of items from a database into an IEnumerable<T> and sum its property's value. This is just a simple entity without any reference type:
public class SimpleEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class Test
{
    void Main()
    {
        // This query gets a list of about 200 items.
        IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                             select entity;
        decimal value = 0.0m;
        foreach (SimpleEntity item in entities)
        {
            // This foreach loop took 1.5 seconds.
            value += item.Price;
        }
        List<SimpleEntity> lstEntities = entities.ToList();
        foreach (SimpleEntity item in lstEntities)
        {
            // This foreach loop took less than a millisecond.
            value += item.Price;
        }
    }
}
Enumerating an IEnumerable<T> is 2 to 3 times slower than enumerating the same List<T> directly. This is due to a subtlety on how C# selects its enumerator for a given type.
List<T> exposes 3 enumerators:
List<T>.Enumerator List<T>.GetEnumerator()
IEnumerator<T> IEnumerable<T>.GetEnumerator()
IEnumerator IEnumerable.GetEnumerator()
When C# compiles a foreach loop, it will select the enumerator in the above order. Note that a type doesn't need to implement IEnumerable or IEnumerable<T> to be enumerable, it just needs a method named GetEnumerator() that returns an enumerator.
Now, List<T>.GetEnumerator() has the advantage of being statically typed which makes all calls to List<T>.Enumerator.get_Current and List<T>.Enumerator.MoveNext() static-bound instead of virtual.
10M iterations (coreclr):
for(int i ...)                    73 ms
foreach(... List<T>)             215 ms
foreach(... IEnumerable<T>)      698 ms
foreach(... IEnumerable)        1028 ms
for(int *p ...)                   50 ms
10M iterations (Framework):
for(int i ...)                   210 ms
foreach(... List<T>)             252 ms
foreach(... IEnumerable<T>)      537 ms
foreach(... IEnumerable)         844 ms
for(int *p ...)                  202 ms
Disclaimer
I should point out the actual iteration in a list is rarely the bottleneck. Keep in mind those are hundreds of milliseconds over millions of iterations. Any work in the loop more complicated than a few arithmetic operations will be overwhelmingly costlier than the iteration itself.
List<T> is an IEnumerable<T>. When you are iterating through your List<T>, you are performing the same sequence of operations as you are for any other IEnumerable<T>:
Get an IEnumerator<T>.
Invoke IEnumerator<T>.MoveNext() on your enumerator.
Take the IEnumerator<T>.Current element from the IEnumerator interface while MoveNext() returns true.
Dispose of the IEnumerator<T>.
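The four steps above can be sketched as a hand-written loop. This is a minimal illustration of the pattern, not the exact code the compiler emits; the Sum method is a made-up example.

```csharp
using System;
using System.Collections.Generic;

static class ManualEnumeration
{
    public static int Sum(IEnumerable<int> source)
    {
        int total = 0;
        // Step 1: get an IEnumerator<T>. The using block covers step 4 (Dispose).
        using (IEnumerator<int> enumerator = source.GetEnumerator())
        {
            // Step 2: advance for as long as MoveNext() returns true.
            while (enumerator.MoveNext())
            {
                // Step 3: read the current element.
                total += enumerator.Current;
            }
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new List<int> { 1, 2, 3 })); // 6
    }
}
```

How expensive each MoveNext() call is depends entirely on the source: for an in-memory list it is an index increment, for a query-backed sequence it may involve I/O.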
What we know about List<T> is that it is an in-memory collection, so the MoveNext() function on its enumerator is going to be very cheap. It looks like your collection gives an enumerator whose MoveNext() method is more expensive, perhaps because it is interacting with some external resource such as a database connection.
When you call ToList() on your IEnumerable<T>, you are running a full iteration of your collection and loading all of the elements into memory with that iteration. This is worth doing if you expect to be iterating through the same collection multiple times. If you expect to iterate through the collection only once, then ToList() is a false economy: all it does is to create an in-memory collection that will later have to be garbage collected.
List<T> is an implementation of the IEnumerable<T> interface. To use the foreach syntax, you don't need a List<T> type or an IEnumerable<T> type, but you are required to use a type with a GetEnumerator() method. Quote from Microsoft docs:
The foreach statement isn't limited to those types. You can use it with an instance of any type that satisfies the following conditions:
A type has the public parameterless GetEnumerator method whose return type is either class, struct, or interface type. Beginning with C# 9.0, the GetEnumerator method can be a type's extension method.
The return type of the GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.
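A small sketch of the pattern the quote describes: the hypothetical Countdown and CountdownEnumerator types below (names invented for illustration) implement neither IEnumerable nor IEnumerator, yet foreach accepts them because they have the required members.

```csharp
using System;
using System.Text;

// Hypothetical type: no IEnumerable, only a public GetEnumerator() method.
sealed class Countdown
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    public CountdownEnumerator GetEnumerator() => new CountdownEnumerator(start);
}

// Hypothetical struct enumerator: public Current property and bool MoveNext().
struct CountdownEnumerator
{
    private int next;

    public CountdownEnumerator(int start) { next = start + 1; }

    public int Current => next;

    public bool MoveNext()
    {
        next--;
        return next > 0;
    }
}

static class DuckTypingDemo
{
    public static string Run()
    {
        var seen = new StringBuilder();
        foreach (int n in new Countdown(3)) seen.Append(n);
        return seen.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 321
    }
}
```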
Consider a LINQ context, for example. When a query is held as an IEnumerable, you get deferred execution: the query is executed only when its results are needed. Calling ToList() instead requests that the query be executed (evaluated) immediately and that the results be stored in memory, in a list, so that you can later perform operations on them, such as changing some values.
About the performance, it depends on what you're trying to do. We don't know which operations you're performing (like fetching data from a database), which collection types you're using and so on.
UPDATE
The reason why you have a different timing between the IEnumerable collection iteration and the List collection iteration, is, like I said, that you have a deferred execution of the query when you're invoking:
IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                     select entity;
That means the query is executed only when you iterate over the IEnumerable collection. With entities.ToList(), the query is executed immediately inside that call, for the reasons I described above, so the loop over the resulting list only reads items that are already in memory.
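Deferred execution can be observed with plain LINQ to Objects. In this sketch an in-memory array and a counting lambda stand in for the database query; that substitution is an assumption made for illustration, and NHibernate itself is not involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecution
{
    // Returns how many times the query body had run at four points in time.
    public static int[] Run()
    {
        int evaluations = 0;
        var prices = new[] { 1.5m, 2.5m, 4.0m };

        // Only declares the query; the lambda has not run yet.
        IEnumerable<decimal> query = prices.Select(p => { evaluations++; return p; });
        int afterDeclaration = evaluations;      // 0

        decimal total = 0m;
        foreach (decimal p in query) total += p; // the query executes here
        int afterFirstLoop = evaluations;        // 3

        List<decimal> list = query.ToList();     // the query executes again here
        int afterToList = evaluations;           // 6

        foreach (decimal p in list) total += p;  // plain in-memory walk
        int afterListLoop = evaluations;         // still 6

        return new[] { afterDeclaration, afterFirstLoop, afterToList, afterListLoop };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // 0,3,6,6
    }
}
```

A timer placed only around the second loop therefore never sees the cost of executing the query.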
I believe it has nothing to do with IEnumerable. It's because on the first loop, when you are iterating over the IEnumerable, you are actually executing the query.
Which is completely different from the second case, when you would be executing the query here:
List<SimpleEntity> lstEntities = entities.ToList();
That makes the iteration much faster, because inside the loop you are no longer querying the DB and transforming the result into a list; that work was already done by ToList().
If you instead do this:
foreach(SimpleEntity item in entities.ToList())
{
// ToList() executes the query as part of this statement
value += item.Price;
}
Perhaps you would get a similar performance.
You are using LINQ.
IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                     select entity;
just declares the query. It will be executed when foreach gets the enumerator. The 1.5 seconds include the execution of Session.Query<>.
If you measure the line
List<SimpleEntity> lstEntities = entities.ToList();
You should get the 1.5 seconds or at least more than 1 second.
Are you sure your measurements are being taken correctly? You should measure the second loop including entities.ToList().
Cheers!
The code I'm maintaining has a common pattern like the following: a nested loop with an if to find certain elements.
foreach (Storage storage in mStorage.Values)
foreach (OrderStorage oStorage in storage.OrderStorage)
if (oStorage.OrderStorageId == orderStorageId)
I was thinking to change this into LINQ:
foreach (OrderStorage oStorage in (from storage in mStorage.Values
from oStorage in storage.OrderStorage
where oStorage.OrderStorageId == orderStorageId
select oStorage))
But it doesn't seem all that appealing, because it's less transparent what's going on here, and more objects may be created, costing performance in terms of both memory and CPU. Will there actually be more objects created or does the C# compiler emit code resembling the nested loop with an if inside?
Will there actually be more objects created or does the C# compiler emit code resembling the nested loop with an if inside?
More objects; each LINQ operation (SelectMany, Where, Select etc) will result in a new place-holder object that represents the pending IEnumerable<T> query of that operation, and then when it is finally iterated, each of those will result in an enumerator instance, along with the context etc. Plus there is a captured-variable context for the hoisted orderStorageId, etc.
Note that the regular foreach will also result in an enumerator instance, but foreach has the advantage that it can also use duck-typed enumerators - which means that for things like List<T> it is actually using a struct enumerator, not a class enumerator. And of course using the local variable (orderStorageId) directly (rather than in an anonymous method) means that it doesn't need to be hoisted into a state/context object.
So yes, the raw foreach is more direct and efficient. The interesting question is: is the difference important. Sometimes it is, sometimes it isn't.
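For comparison, here is a sketch of both forms side by side. The Storage and OrderStorage classes are hypothetical stand-ins for the question's types (the collection property is called Orders here), and the method-syntax chain gives the same result as the query expression without being the compiler's exact translation of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
sealed class OrderStorage
{
    public int OrderStorageId { get; set; }
}

sealed class Storage
{
    public List<OrderStorage> Orders { get; } = new List<OrderStorage>();
}

static class NestedLoopVsLinq
{
    // Nested loops: the only allocations are the enumerators and the result list.
    public static List<OrderStorage> WithLoops(IEnumerable<Storage> storages, int id)
    {
        var found = new List<OrderStorage>();
        foreach (Storage storage in storages)
            foreach (OrderStorage order in storage.Orders)
                if (order.OrderStorageId == id)
                    found.Add(order);
        return found;
    }

    // LINQ: each operator adds an iterator object, and the lambda captures id.
    public static List<OrderStorage> WithLinq(IEnumerable<Storage> storages, int id)
    {
        return storages
            .SelectMany(storage => storage.Orders)      // flatten
            .Where(order => order.OrderStorageId == id) // filter
            .ToList();
    }

    public static List<Storage> Sample()
    {
        var first = new Storage();
        first.Orders.Add(new OrderStorage { OrderStorageId = 1 });
        first.Orders.Add(new OrderStorage { OrderStorageId = 2 });
        var second = new Storage();
        second.Orders.Add(new OrderStorage { OrderStorageId = 2 });
        return new List<Storage> { first, second };
    }

    static void Main()
    {
        Console.WriteLine(WithLoops(Sample(), 2).Count); // 2
        Console.WriteLine(WithLinq(Sample(), 2).Count);  // 2
    }
}
```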
I came across a method to change a list in a foreach loop by converting to a list in itself like this:
foreach (var item in myList.ToList())
{
//add or remove items from myList
}
(If you attempt to modify myList directly an error is thrown since the enumerator basically locks it)
This works because it's not the original myList that's being enumerated. My question is, does this method create garbage when the loop is over (namely, the List that's returned from the ToList method)? For small loops, would it be preferable to use a for loop to avoid the creation of garbage?
The second list is going to be garbage, there will be garbage for an enumerator that is used in building the second list, and add in the enumerator that the foreach would spawn, which you would have had with or without the second list.
Should you switch to a for? Maybe, if you can point to this region of code being a true performance bottleneck. Otherwise, code for simplicity and maintainability.
Yes. ToList() would create another list that would need to be garbage collected.
That's an interesting technique which I will keep in mind for the future! (I can't believe I've never thought of that!)
Anyway, yes, the list that you are building doesn't magically deallocate itself. The possible performance problems with this technique are:
Increased memory usage (building a List, separate from the IEnumerable). Probably not that big of a deal, unless you do this very frequently, or the IEnumerable is very large.
Decreased speed, since it has to go through the IEnumerable once to build the List.
Also, if enumerating the IEnumerable has side effects, they will all be triggered by this process.
Unless this is actually inside an inner loop, or you're working with very large data sets, you can probably do this without any problems.
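As a sketch of the trade-off, the snippet below removes even numbers in two ways: once by iterating a ToList() snapshot as in the question, and once with List<T>.RemoveAll, which filters in place without the temporary list. The class and method names are invented for this example, and RemoveAll only fits when the modification is a removal by predicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RemoveWhileLooping
{
    // Iterates a snapshot so the original list can be modified safely.
    public static List<int> WithSnapshot()
    {
        var myList = new List<int> { 1, 2, 3, 4, 5, 6 };
        foreach (int item in myList.ToList()) // allocates a temporary copy
        {
            if (item % 2 == 0) myList.Remove(item);
        }
        return myList;
    }

    // Filters in place; no second list is created.
    public static List<int> WithRemoveAll()
    {
        var myList = new List<int> { 1, 2, 3, 4, 5, 6 };
        myList.RemoveAll(item => item % 2 == 0);
        return myList;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", WithSnapshot()));  // 1,3,5
        Console.WriteLine(string.Join(",", WithRemoveAll())); // 1,3,5
    }
}
```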
Yes, the ToList() method creates "garbage". I would just use indexing, iterating backwards so that removals don't shift the items still to be visited.
for (int i = myList.Count - 1; i >= 0; --i)
{
    var item = myList[i];
    //add or remove items from myList
}
The list created by the call to ToList() will be garbage collected eventually, but exactly when is non-deterministic.
I wouldn't worry about it too much, since all it would be holding at most would be references or small value types.
What's the best/easiest way to obtain a count of items within an IEnumerable collection without enumerating over all of the items in the collection?
Possible with LINQ or Lambda?
In any case, you have to loop through it. Linq offers the Count method:
var result = myenum.Count();
The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk) speed shouldn't be a problem though.
If it's because enumerating the collection will require more code then it's already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted then the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating and it will carry on from where it was before the count. For example:
IEnumerable<int> myEnumerable = Series.Generate(5);
foreach (int item in myEnumerable)
{
    Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}
will give the results
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
If it's because the enumeration has side effects (e.g. writes to disk/console) or is dependent on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture as this can cause a lot of problems] then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible you might want to consider changing to use an ICollection or IList (ICollection being more widely scoped than IList) which has a Count property on it.
You will have to enumerate to get a count. Other constructs like the List keep a running count.
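The sketch below illustrates both halves of that statement. The Generate iterator and its counter are hypothetical helpers invented for this example. For a lazy sequence, Count() has to pull every item; for a source that is really an ICollection<T>, such as a List<T>, the stored count can be read without enumerating. LINQ's Count() is documented to use the ICollection<T> implementation itself when the source provides one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CountingDemo
{
    // Hypothetical lazy source: reports every item it actually produces.
    static IEnumerable<int> Generate(int count, Action onProduced)
    {
        for (int i = 0; i < count; i++)
        {
            onProduced();
            yield return i;
        }
    }

    // Count() on a lazy sequence has to walk the whole sequence.
    public static int ItemsPulledByCount()
    {
        int pulled = 0;
        Generate(5, () => pulled++).Count();
        return pulled;
    }

    // Reads the stored count when the source is a collection.
    public static int CountOf<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> collection)
            return collection.Count; // no enumeration needed
        return source.Count();       // falls back to walking the sequence
    }

    static void Main()
    {
        Console.WriteLine(ItemsPulledByCount());                 // 5
        Console.WriteLine(CountOf(new List<int> { 7, 8, 9 }));   // 3
    }
}
```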
Use this.
IEnumerable list =..........;
list.OfType<T>().Count()
it will return the count.
There's also IList or ICollection, if you want to use a construct that is still somewhat flexible, but also has the feature you require. They both imply IEnumerable.
It also depends on what you want to achieve by counting. If you only want to find out whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(): the former stops after the first element, while the latter enumerates all the elements.
An IEnumerable has to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is Any(). Any() only checks whether there is at least one value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();
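To see the difference in work done, this sketch counts how many items a lazy source produces for each call. The Generate iterator and its counter are hypothetical helpers invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnyVersusCount
{
    // Hypothetical lazy source: reports every item it actually produces.
    static IEnumerable<int> Generate(int count, Action onProduced)
    {
        for (int i = 0; i < count; i++)
        {
            onProduced();
            yield return i;
        }
    }

    // Any() stops as soon as the first item arrives.
    public static int ItemsPulledByAny()
    {
        int pulled = 0;
        bool hasValues = Generate(1000, () => pulled++).Any();
        return hasValues ? pulled : -1;
    }

    // Count() > 0 answers the same question but walks the whole sequence.
    public static int ItemsPulledByCount()
    {
        int pulled = 0;
        bool hasValues = Generate(1000, () => pulled++).Count() > 0;
        return hasValues ? pulled : -1;
    }

    static void Main()
    {
        Console.WriteLine(ItemsPulledByAny());   // 1
        Console.WriteLine(ItemsPulledByCount()); // 1000
    }
}
```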
Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
...
}
If you need to count and then loop you may be better off with a list.
If you're using count to check for members you can use Any() to avoid enumerating the entire collection.
The best solution, I think, is to do the following:
using System.Linq.Dynamic;
myEnumerable.AsQueryable().Count()
When I want to use the Count property, I use IList<T>, which extends both the ICollection<T> and IEnumerable<T> interfaces. In the example below the underlying object is an array. I stepped through with the VS debugger and found that the .Count property returns the array's Length.
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
return;