Performance between Iterating through IEnumerable<T> and List<T>

Today, I faced a performance problem while iterating through a list of items. After doing some diagnostics, I finally figured out what was slowing things down. It turned out that iterating through an IEnumerable<T> took much more time than iterating through a List<T>. Please help me understand why IEnumerable<T> is slower than List<T>.
UPDATE benchmark context:
I'm using NHibernate to fetch a collection of items from a database into an IEnumerable<T> and sum its property's value. This is just a simple entity without any reference type:
public class SimpleEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class Test
{
    void Main()
    {
        // this query gets a list of about 200 items
        IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                             select entity;
        decimal value = 0.0m;
        foreach (SimpleEntity item in entities)
        {
            // this foreach loop took 1.5 seconds
            value += item.Price;
        }
        List<SimpleEntity> lstEntities = entities.ToList();
        foreach (SimpleEntity item in lstEntities)
        {
            // this foreach loop took less than a millisecond
            value += item.Price;
        }
    }
}

Enumerating a List<T> through the IEnumerable<T> interface is 2 to 3 times slower than enumerating the same List<T> directly. This is due to a subtlety in how C# selects the enumerator for a given type.
List<T> exposes 3 enumerators:
List<T>.Enumerator List<T>.GetEnumerator()
IEnumerator<T> IEnumerable<T>.GetEnumerator()
IEnumerator IEnumerable.GetEnumerator()
When C# compiles a foreach loop, it selects the enumerator in the above order. Note that a type doesn't need to implement IEnumerable or IEnumerable<T> to be enumerable; it just needs a method named GetEnumerator() that returns an enumerator.
Now, List<T>.GetEnumerator() has the advantage of being statically typed, which makes all calls to List<T>.Enumerator.get_Current and List<T>.Enumerator.MoveNext() statically bound instead of virtual.
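To illustrate the point that foreach only needs the GetEnumerator() pattern, here is a minimal sketch. Countdown and PatternDemo are hypothetical names made up for this example; the type implements neither IEnumerable nor IEnumerable<T>, and its enumerator is a struct, so the calls inside the loop are bound statically.

```csharp
using System;

// Hypothetical type for illustration: no IEnumerable, no IEnumerable<T>,
// yet foreach accepts it because of the GetEnumerator() pattern.
public readonly struct Countdown
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    // foreach binds to this public method directly; no interface is involved.
    public Enumerator GetEnumerator() => new Enumerator(_start);

    public struct Enumerator
    {
        private int _next;
        private int _current;

        public Enumerator(int start) { _next = start; _current = 0; }

        public int Current => _current;

        public bool MoveNext()
        {
            if (_next <= 0) return false;
            _current = _next--;
            return true;
        }
    }
}

public static class PatternDemo
{
    public static int SumCountdown(int start)
    {
        int sum = 0;
        foreach (int n in new Countdown(start)) // MoveNext/Current are non-virtual calls
            sum += n;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumCountdown(4)); // 4 + 3 + 2 + 1 = 10
    }
}
```

List<T> follows the same idea: its public GetEnumerator() returns the struct List<T>.Enumerator, which is why looping over a variable typed as List<T> avoids interface dispatch.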
10M iterations (coreclr):
for(int i ...) 73 ms
foreach(... List<T>) 215 ms
foreach(... IEnumerable<T>) 698 ms
foreach(... IEnumerable) 1028 ms
for(int *p ...) 50 ms
10M iterations (Framework):
for(int i ...) 210 ms
foreach(... List<T>) 252 ms
foreach(... IEnumerable<T>) 537 ms
foreach(... IEnumerable) 844 ms
for(int *p ...) 202 ms
Disclaimer
I should point out the actual iteration in a list is rarely the bottleneck. Keep in mind those are hundreds of milliseconds over millions of iterations. Any work in the loop more complicated than a few arithmetic operations will be overwhelmingly costlier than the iteration itself.

List<T> is an IEnumerable<T>. When you are iterating through your List<T>, you are performing the same sequence of operations as you are for any other IEnumerable<T>:
Get an IEnumerator<T>.
Invoke IEnumerator<T>.MoveNext() on your enumerator.
Take the IEnumerator<T>.Current element from the IEnumerator interface while MoveNext() returns true.
Dispose of the IEnumerator<T>.
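The four steps above correspond roughly to what a foreach loop over an IEnumerable<T> expands to. The sketch below writes that expansion by hand; ForeachExpansion and SumManually are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    public static decimal SumManually(IEnumerable<decimal> prices)
    {
        decimal total = 0m;
        // Step 1: get an IEnumerator<T>. Step 4: 'using' disposes it at the end.
        using (IEnumerator<decimal> e = prices.GetEnumerator())
        {
            // Step 2: MoveNext() advances. Step 3: Current reads the element.
            while (e.MoveNext())
            {
                total += e.Current;
            }
        }
        return total;
    }

    public static void Main()
    {
        var prices = new List<decimal> { 1.5m, 2.5m, 4m };
        Console.WriteLine(SumManually(prices));
    }
}
```

Whatever cost the source attaches to MoveNext() (a memory read for a list, possibly a network round trip for a query) is paid inside the while loop.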
What we know about List<T> is that it is an in-memory collection, so the MoveNext() function on its enumerator is going to be very cheap. It looks like your collection gives an enumerator whose MoveNext() method is more expensive, perhaps because it is interacting with some external resource such as a database connection.
When you call ToList() on your IEnumerable<T>, you are running a full iteration of your collection and loading all of the elements into memory with that iteration. This is worth doing if you expect to be iterating through the same collection multiple times. If you expect to iterate through the collection only once, then ToList() is a false economy: all it does is to create an in-memory collection that will later have to be garbage collected.

List<T> is an implementation of the IEnumerable<T> interface. To use the foreach syntax, you don't need a List<T> or an IEnumerable<T>, but you do need a type with a GetEnumerator() method. Quote from the Microsoft docs:
The foreach statement isn't limited to those types. You can use it with an instance of any type that satisfies the following conditions:
A type has the public parameterless GetEnumerator method whose return type is either class, struct, or interface type. Beginning with C# 9.0, the GetEnumerator method can be a type's extension method.
The return type of the GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.
Consider a LINQ context, for example. When you hold a query as an IEnumerable, you get the advantage of deferred execution (the query is executed only when needed). When you call ToList(), you are requesting that the query be executed (evaluated) immediately and that the results be stored in memory in a list, so you can perform operations on them later, such as changing some values.
About the performance, it depends on what you're trying to do. We don't know which operations you're performing (like fetching data from a database), which collection types you're using and so on.
UPDATE
The reason you see different timings between iterating the IEnumerable and iterating the List is, as I said, that the query is deferred when you write:
IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                     select entity;
That means the query is executed only when you iterate over the IEnumerable. The loop over the list pays no such cost, because the query was already executed by the entities.ToList() call before the loop, for the reasons I described above.
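A small sketch of this behaviour using LINQ to Objects rather than NHibernate. The counter and the class name DeferredDemo are assumptions for the example; the selector stands in for the work a real query would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the selector had run after: declaring the query,
    // enumerating it once, calling ToList(), and looping over the list.
    public static int[] CountEvaluations()
    {
        int evaluated = 0;
        var source = new[] { 10m, 20m, 30m };

        IEnumerable<decimal> query = source.Select(p => { evaluated++; return p; });
        int afterDeclare = evaluated;           // nothing has run yet

        decimal sum = 0m;
        foreach (decimal p in query) sum += p;  // the query runs here
        int afterFirstLoop = evaluated;

        List<decimal> list = query.ToList();    // the query runs again here
        int afterToList = evaluated;

        foreach (decimal p in list) sum += p;   // plain in-memory iteration
        int afterListLoop = evaluated;

        return new[] { afterDeclare, afterFirstLoop, afterToList, afterListLoop };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountEvaluations())); // 0, 3, 6, 6
    }
}
```

The last two numbers are equal: looping over the list does no query work at all, which matches the sub-millisecond second loop in the question.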

I believe it has nothing to do with IEnumerable. It's because in the first loop, when you iterate over the IEnumerable, you are actually executing the query.
That is completely different from the second case, where the query is executed here:
List<SimpleEntity> lstEntities = entities.ToList();
This makes the second iteration much faster, because you are no longer querying the DB and transforming the result into a list while you are in the loop.
If you instead do this:
foreach (SimpleEntity item in entities.ToList())
{
    // this foreach loop took less than a millisecond
    value += item.Price;
}
you would probably get timing similar to the first loop, since the query now runs as part of the foreach statement.

You are using LINQ.
IEnumerable<SimpleEntity> entities = from entity in Session.Query<SimpleEntity>()
                                     select entity;
This just declares the query. It will be executed when foreach gets the enumerator. The 1.5 seconds include the execution of Session.Query<>.
If you measure the line
List<SimpleEntity> lstEntities = entities.ToList();
you should get the 1.5 seconds, or at least more than 1 second.
Are you sure your measurements are being taken correctly? You should measure the second loop including entities.ToList().
Cheers!
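One way to check where the time goes is to time the ToList() call and the loop separately. The sketch below is an assumption-laden stand-in: SlowSource is a hypothetical iterator that simulates per-row latency with Thread.Sleep, not an NHibernate query.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class TimingDemo
{
    // Hypothetical stand-in for a query whose rows are expensive to fetch.
    public static IEnumerable<decimal> SlowSource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Thread.Sleep(5); // simulated I/O latency per row
            yield return i;
        }
    }

    // Returns the time spent in ToList() and the time spent looping over the list.
    public static (TimeSpan ToList, TimeSpan Loop) Measure()
    {
        IEnumerable<decimal> entities = SlowSource(20);

        var sw = Stopwatch.StartNew();
        List<decimal> list = entities.ToList(); // the deferred work happens here
        TimeSpan toList = sw.Elapsed;

        sw.Restart();
        decimal total = 0m;
        foreach (decimal price in list) total += price; // in-memory only
        TimeSpan loop = sw.Elapsed;

        return (toList, loop);
    }

    public static void Main()
    {
        var (toList, loop) = Measure();
        Console.WriteLine($"ToList: {toList.TotalMilliseconds} ms, loop: {loop.TotalMilliseconds} ms");
    }
}
```

Timing only the second loop, as in the question, leaves the expensive part outside the measurement.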

Related

Any() Time Complexity [duplicate]

This question already has answers here:
Which method performs better: .Any() vs .Count() > 0?
(11 answers)
Closed 1 year ago.
I believe the answer to this question is well explained here: LINQ Ring: Any() vs Contains() for Huge Collections
But my question is specific for the current implementation
IEnumerable<T> msgs = null;
/// ...
/// some method calls which returns a long list of messages
/// The return type of the method is IEnumerable<T>
/// List<T> ret = new List<T>();
/// ...
/// return ret
/// ...
if (msgs.Any())
    obj = msgs.Last();
msgs is said to be an in-memory collection (exposed as IEnumerable<T>). How does Any() work here? There's no condition for this Any() method call, so isn't it just O(1)? Or does it still look through each element?
I assume that IEnumerable<BaseJournalMessage> msgs is not a collection like an array or list; otherwise Any and Last would be no problem (but you have performance issues). So it seems to be an expensive LINQ query which gets executed twice, once at Any and again at Last.
Any needs to enumerate the sequence to see if there is at least one. Last needs to enumerate it fully to get the last one. You can make it more efficient in this way:
BaseJournalMessage last = msgs.LastOrDefault();
if (last != null)
time = last.JournalTime;
To explain a bit more, suppose msgs were an array:
IEnumerable<BaseJournalMessage> msgs = new BaseJournalMessage[0];
Here Any is simple and efficient, since it just needs to check whether the array's enumerator has at least one element; the same holds for other collections. The complexity is O(1).
Now consider that it's a complex query, as it seems to be in your case. Here the complexity of a subsequent Any is clearly not O(1):
IEnumerable<BaseJournalMessage> msgs = hugeMessageList
.Where(msg => ComplexMethod(msg) && OtherComplexCondition(msg))
.OrderBy(msg => msg.SomeProperty);
This is not a collection, since you don't append ToList/ToArray/ToHashSet. Instead it's a deferred-execution LINQ query. It is executed every time it is enumerated. That could be a foreach loop, an Any or Last call, or any other method that enumerates it. Sometimes it's useful to always get the current state, but normally you should materialize the query into a collection if you have to access it multiple times. So append ToList and everything's fine.
Look for the term "deferred execution" in the documentation of each LINQ method (for example Where, Select or OrderBy) if you want to know whether it executes the query or not. You can chain as many deferred methods as you want without actually executing the query. But if a method's documentation says it "forces immediate query evaluation" (like ToList), the query gets executed (so avoid those methods in the middle of a query).
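The double execution can be made visible with a counter. In this sketch ExpensiveQuery is a hypothetical iterator standing in for the deferred query; Runs counts how often its body starts executing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyLastDemo
{
    public static int Runs; // how many times the query body started executing

    // Hypothetical stand-in for an expensive deferred query.
    public static IEnumerable<string> ExpensiveQuery()
    {
        Runs++;
        yield return "first";
        yield return "second";
        yield return "last";
    }

    public static int RunsForAnyThenLast()
    {
        Runs = 0;
        IEnumerable<string> msgs = ExpensiveQuery();
        string result = null;
        if (msgs.Any()) result = msgs.Last(); // two separate enumerations
        return Runs;
    }

    public static int RunsForLastOrDefault()
    {
        Runs = 0;
        IEnumerable<string> msgs = ExpensiveQuery();
        string result = msgs.LastOrDefault(); // one enumeration
        return Runs;
    }

    public static void Main()
    {
        Console.WriteLine(RunsForAnyThenLast());   // 2
        Console.WriteLine(RunsForLastOrDefault()); // 1
    }
}
```

Calling ToList() once and then using Any and Last on the list would also bring the count down to one run.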
How does Any() work here? There's no condition for this Any() method call, isn't it just O(1) instead? Or it still looks through each element?
As for LINQ to Objects, implemented in the System.Linq.Enumerable static class, the implementation of Any() just gets the IEnumerator and invokes MoveNext() once. If the result is true, Any() returns true. Otherwise it returns false. It never iterates any further.
So it is a pure O(1) algorithm.
EDIT: I have to correct myself: The time depends on the enumerable "Any" iterates. I had a misconception of the Big O notation and the meaning of "O(1)" and "O(n)".
This is the source code (source available on GitHub these days):
public static bool Any<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        if (e.MoveNext()) return true;
    }
    return false;
}
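A sketch of both halves of this answer: Any() pulls at most one element from its source, but how much work producing that one element takes depends on the sequence. Numbers and the Produced counter are hypothetical helpers for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyShortCircuit
{
    public static int Produced; // elements the source has generated so far

    public static IEnumerable<int> Numbers(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Any() on the raw source stops after the first element.
    public static int ProducedByAny()
    {
        Produced = 0;
        Numbers(1000).Any();
        return Produced;
    }

    // With a filter in front, the first match may be far away.
    public static int ProducedByFilteredAny()
    {
        Produced = 0;
        Numbers(1000).Where(n => n == 1000).Any();
        return Produced;
    }

    public static void Main()
    {
        Console.WriteLine(ProducedByAny());         // 1
        Console.WriteLine(ProducedByFilteredAny()); // 1000
    }
}
```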

Should I use IEnumerable or List for parameters from WebApi? [duplicate]

I have some doubts over how Enumerators work, and LINQ. Consider these two simple selects:
List<Animal> sel = (from animal in Animals
                    join race in Species
                    on animal.SpeciesKey equals race.SpeciesKey
                    select animal).Distinct().ToList();
or
IEnumerable<Animal> sel = (from animal in Animals
                           join race in Species
                           on animal.SpeciesKey equals race.SpeciesKey
                           select animal).Distinct();
I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:
foreach (Animal animal in sel) { /*do stuff*/ }
I noticed that if I use IEnumerable, when I debug and inspect "sel", which in that case is the IEnumerable, it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector", these last 2 appear to be delegates. The "inner" member does not have "Animal" instances in it, but rather "Species" instances, which was very strange for me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine which goes in and what goes out of it?
I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect as only 2 are Distinct), but the "outer" does contain the correct values. Again, probably the delegated methods determine this but this is a bit more than I know about IEnumerable.
Most importantly, which of the two options is the best performance-wise?
The evil List conversion via .ToList()?
Or maybe using the enumerator directly?
If you can, please also explain a bit or throw some links that explain this use of IEnumerable.
IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give LINQ a chance to defer work until later, possibly optimizing along the way. If you use ToList() you force the results to be materialized right away.
Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:
public IEnumerable<Animals> AllSpotted()
{
    return from a in Zoo.Animals
           where a.coat.HasSpots == true
           select a;
}
public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Felidae"
           select a;
}
public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Canidae"
           select a;
}
Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:
var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());
So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well in the above, Leopards and Hyenas get converted into single SQL queries each, and the database only returns the rows that are relevant. (Strictly, that full translation requires the methods to take and return IQueryable<Animals>; with IEnumerable<Animals> parameters, as written, the filters in Feline and Canine bind to LINQ to Objects and run in memory.) But if we had returned a List from AllSpotted(), then it may run slower because the database could return far more data than is actually needed, and we waste cycles doing the filtering in the client.
In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:
List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
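Whether stacked filters reach the database depends on the static type they are written against. The sketch below uses an in-memory array wrapped with AsQueryable() purely to show the mechanism; ComposeDemo and its methods are hypothetical names, and no real database provider is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ComposeDemo
{
    // On IQueryable<T>, Where() only appends a call to an expression tree,
    // which a provider (for example an ORM) could translate as a whole.
    public static IQueryable<int> FilterQueryable(IQueryable<int> sample) =>
        sample.Where(n => n % 2 == 0);

    // On IEnumerable<T>, the same syntax binds to Enumerable.Where and
    // filters in memory, whatever produced the sequence.
    public static IEnumerable<int> FilterEnumerable(IEnumerable<int> sample) =>
        sample.Where(n => n % 2 == 0);

    // Name of the outermost method call recorded in the expression tree.
    public static string OutermostCall(IQueryable<int> query) =>
        query.Expression is MethodCallExpression call ? call.Method.Name : "(none)";

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        IQueryable<int> composed = FilterQueryable(source);

        Console.WriteLine(OutermostCall(composed));                   // Where
        Console.WriteLine(string.Join(", ", composed));               // 2, 4
        Console.WriteLine(string.Join(", ", FilterEnumerable(source))); // 2, 4
    }
}
```

Both versions return the same elements; the difference is only where the filtering could happen when a real query provider sits behind the IQueryable<T>.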
There is a very good article on Claudio Bernasconi's TechBlog: When to use IEnumerable, ICollection, IList and List
Here are some basic points about scenarios and functions:
A class that implements IEnumerable allows you to use the foreach syntax.
Basically it has a method to get the next item in the collection. It doesn't need the whole collection to be in memory and doesn't know how many items are in it, foreach just keeps getting the next item until it runs out.
This can be very useful in certain circumstances, for instance in a massive database table you don't want to copy the entire thing into memory before you start processing the rows.
Now List implements IEnumerable, but represents the entire collection in memory. If you have an IEnumerable and you call .ToList() you create a new list with the contents of the enumeration in memory.
Your LINQ expression returns an enumeration, and by default it executes when you iterate over it with foreach, but you can force it to execute sooner by calling .ToList().
Here's what I mean:
var things =
    from item in BigDatabaseCall()
    where ....
    select item;
// this will iterate through the entire linq statement:
int count = things.Count();
// this will stop after iterating the first one, but will execute the linq again
bool hasAnyRecs = things.Any();
// this will execute the linq statement *again*
foreach( var thing in things ) ...
// this will copy the results to a list in memory
var list = things.ToList();
// this won't iterate through again, the list knows how many items are in it
int count2 = list.Count();
// this won't execute the linq statement - we have it copied to the list
foreach( var thing in list ) ...
Nobody has mentioned one crucial difference, ironically answered on a question closed as a duplicate of this one.
IEnumerable is read-only and List is not.
See Practical difference between List and IEnumerable
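A brief sketch of what "read-only" means here: the IEnumerable<T> interface exposes no members for adding or removing items, but it is only a view, so changes made through the underlying list remain visible. ReadOnlyDemo is a name made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlyDemo
{
    public static int CountThroughView()
    {
        var list = new List<string> { "a", "b" };
        IEnumerable<string> view = list;

        // view.Add("c");    // does not compile: IEnumerable<T> has no Add
        list.Add("c");        // the underlying list can still change...
        return view.Count();  // ...and the view reflects it
    }

    public static void Main()
    {
        Console.WriteLine(CountThroughView()); // 3
    }
}
```

So the interface restricts what a caller can do through that reference; it does not make the data immutable.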
The most important thing to realize is that, using Linq, the query does not get evaluated immediately. It is only run as part of iterating through the resulting IEnumerable<T> in a foreach - that's what all the weird delegates are doing.
So, the first example evaluates the query immediately by calling ToList and putting the query results in a list.
The second example returns an IEnumerable<T> that contains all the information needed to run the query later on.
In terms of performance, the answer is it depends. If you need the results to be evaluated at once (say, you're mutating the structures you're querying later on, or if you don't want the iteration over the IEnumerable<T> to take a long time) use a list. Else use an IEnumerable<T>. The default should be to use the on-demand evaluation in the second example, as that generally uses less memory, unless there is a specific reason to store the results in a list.
The advantage of IEnumerable is deferred execution (usually with databases). The query will not get executed until you actually loop through the data. It's a query waiting until it's needed (aka lazy loading).
If you call ToList, the query will be executed, or "materialized" as I like to say.
There are pros and cons to both. If you call ToList, you may remove some mystery as to when the query gets executed. If you stick to IEnumerable, you get the advantage that the program doesn't do any work until it's actually required.
I will share one misunderstanding that I fell into one day:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
// updating existing list
names[0] = "ford";
// Guess what should be printed before continuing
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Expected result
// I was expecting
print( startingWith_M.ToList() ); // mercedes, mazda
print( startingWith_F.ToList() ); // fiat, ferrari
Actual result
// what was actually printed
print( startingWith_M.ToList() ); // mazda
print( startingWith_F.ToList() ); // ford, fiat, ferrari
Explanation
As per other answers, evaluation of the result was deferred until calling ToList (or a similar materializing method such as ToArray).
So the code in this case effectively behaves as if it had been written as:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
// updating existing list
names[0] = "ford";
// the queries only run when ToList is called, after the update
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Play around:
https://repl.it/E8Ki/0
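To get the output that was originally expected, one option is to materialize the queries before the list is updated. This is a minimal sketch; SnapshotDemo is a hypothetical name, and the results are returned as lists instead of going through the print helper used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    public static (List<string> M, List<string> F) Run()
    {
        var names = new List<string> { "mercedes", "mazda", "bmw", "fiat", "ferrari" };

        // ToList() runs each query now and stores a snapshot of the results.
        List<string> startingWithM = names.Where(x => x.StartsWith("m")).ToList();
        List<string> startingWithF = names.Where(x => x.StartsWith("f")).ToList();

        // Updating the source no longer affects the snapshots.
        names[0] = "ford";

        return (startingWithM, startingWithF);
    }

    public static void Main()
    {
        var (m, f) = Run();
        Console.WriteLine(string.Join(", ", m)); // mercedes, mazda
        Console.WriteLine(string.Join(", ", f)); // fiat, ferrari
    }
}
```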
If all you want to do is enumerate them, use the IEnumerable.
Beware, though, that changing the original collection while it is being enumerated is a dangerous operation; in this case, you will want to call ToList first. This enumerates the IEnumerable and copies every element into a new in-memory list, so it is less performant if you only enumerate once, but it is safer, and sometimes the List methods are handy (for instance for random access).
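A sketch of the danger and of the ToList workaround, using a plain List<int>. MutateDemo and its method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MutateDemo
{
    // Removing from a List<T> while enumerating it invalidates the enumerator.
    public static bool ThrowsWhenMutatingDuringEnumeration()
    {
        var items = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int i in items)
                if (i == 2) items.Remove(i);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Enumerating a copy leaves the original free to change.
    public static List<int> RemoveSafely()
    {
        var items = new List<int> { 1, 2, 3 };
        foreach (int i in items.ToList())
            if (i == 2) items.Remove(i);
        return items;
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWhenMutatingDuringEnumeration()); // True
        Console.WriteLine(string.Join(", ", RemoveSafely()));     // 1, 3
    }
}
```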
In addition to all the answers posted above, here are my two cents. Many types other than List implement IEnumerable, such as arrays, ArrayList, and anything that implements ICollection. So if a method takes an IEnumerable parameter, we can pass any collection type to it; i.e. the method operates on an abstraction, not on a specific implementation.
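For instance, a method written against IEnumerable<int> accepts all of the following without change. AbstractionDemo and Total are hypothetical names for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AbstractionDemo
{
    // Depends only on the ability to enumerate, not on a concrete collection.
    public static int Total(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Total(new[] { 1, 2, 3 }));             // array
        Console.WriteLine(Total(new List<int> { 1, 2, 3 }));     // list
        Console.WriteLine(Total(new HashSet<int> { 1, 2, 3 }));  // set
        Console.WriteLine(Total(Enumerable.Range(1, 3)));        // lazy sequence
    }
}
```

Each call prints 6.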
The downside of IEnumerable (deferred execution) is that until you invoke .ToList(), the underlying data can still change or become unavailable. For a really simple example of this, this would work:
List<Person> persons;
using (MyEntities db = new MyEntities())
{
    persons = db.Persons.ToList(); // It's mine now, in memory
}
// do what you want with the list of persons
and this would not work:
IEnumerable<Person> persons;
using (MyEntities db = new MyEntities())
{
    persons = db.Persons; // nothing is fetched until you use it
}
persons = persons.ToList(); // trying to use it...
// but this throws an exception, because the DbContext (MyEntities)
// that links the query to the database has already been disposed.
There are many cases (such as an infinite list or a very large list) where IEnumerable cannot be transformed to a List. The most obvious examples are all the prime numbers, all the users of facebook with their details, or all the items on ebay.
The difference is that "List" objects are stored "right here and right now", whereas "IEnumerable" objects work "just one at a time". So if I am going through all the items on ebay, one at a time would be something even a small computer can handle, but ".ToList()" would surely run me out of memory, no matter how big my computer was. No computer can by itself contain and handle such a huge amount of data.
[Edit] - Needless to say, it's not "either this or that". Often it makes good sense to use both a list and an IEnumerable in the same class. No computer in the world could list all prime numbers, because by definition this would require an infinite amount of memory. But you could easily think of a class PrimeContainer which exposes an IEnumerable<long> primes and, for obvious reasons, also holds a sorted collection (for example a SortedSet<long> _primes) with all the primes calculated so far. The next candidate to be checked would only be tested against the existing primes (up to its square root). That way you gain both: primes one at a time (IEnumerable) and a good list of "primes so far", which is a pretty good approximation of the entire (infinite) list.
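A minimal sketch of the PrimeContainer idea described above. The member names are assumptions, and it uses a List<long> kept in ascending order for the cache instead of a sorted collection type; it is also not safe for concurrent or interleaved enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PrimeContainer
{
    // Primes found so far, in ascending order.
    private readonly List<long> _primes = new List<long>();

    public IReadOnlyList<long> PrimesSoFar => _primes;

    // An endless, lazily evaluated sequence: callers decide when to stop.
    public IEnumerable<long> Primes()
    {
        // Replay what is already cached.
        for (int i = 0; i < _primes.Count; i++)
            yield return _primes[i];

        // Then continue searching after the largest known prime.
        long candidate = _primes.Count == 0 ? 2 : _primes[_primes.Count - 1] + 1;
        while (true)
        {
            if (IsPrime(candidate))
            {
                _primes.Add(candidate);
                yield return candidate;
            }
            candidate++;
        }
    }

    // Trial division against the cached primes up to the square root.
    private bool IsPrime(long n)
    {
        foreach (long p in _primes)
        {
            if (p * p > n) break;
            if (n % p == 0) return false;
        }
        return true;
    }
}

public static class PrimeDemo
{
    public static void Main()
    {
        var container = new PrimeContainer();
        Console.WriteLine(string.Join(", ", container.Primes().Take(5))); // 2, 3, 5, 7, 11
    }
}
```

Calling ToList() on Primes() would never return, which is the point of the paragraph above: the sequence is only usable one element at a time, for example with Take.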

Efficiency of ToList() [duplicate]

This question already has answers here:
Is there a performance impact when calling ToList()?
(10 answers)
Closed 5 years ago.
A lot of the developers I work with feel more comfortable working with a List as opposed to IEnumerable (for example). I am wondering whether there is any performance impact from overusing ToList(). For example, they will call ToList() after ordering to get a list back out again, i.e.
private void ListThinger(List<T> input)
{
    input = input.OrderBy(s => s.Thing).ToList();
    foreach(var thing in input)
    {
        // do things
    }
}
My question is:
How efficient is the ToList() method? Will it create a new list and how much memory does that take, assuming the contents are POCOs? Does this change if it's a value type rather than a POCO?
Will the size of the list determine efficiency or does size of list not determine cost of ToList()?
If a list is cast to an IEnumerable and then ToList() is called on it, will it just return the original object?
P.S. I understand that a single use of ToList won't break any backs, but we are building a highly concurrent system that is currently CPU bound, so I am looking for little wins that, when scaled, will add up to a big improvement.
How efficient is the ToList() method? Will it create a new list and how much memory does that take, assuming the contents are POCOs? Does this change if it's a value type rather than a POCO?
The ToList() method materializes the given collection by creating a new list and populating it with the items of the given collection. LINQ's ToList() implementation:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}
By doing so you give up the benefit of deferred execution, should you need it.
Will the size of the list determine efficiency or does size of list not determine cost of ToList()?
Since it calls List's copy constructor, which creates a new list, it has to process each of the items. So it runs in O(n), meaning that the list's size matters. MSDN's documentation on the copy constructor:
Initializes a new instance of the List class that contains elements copied from the specified collection and has sufficient capacity to accommodate the number of elements copied.
As @Jason mentioned in the comment below, the copy constructor is smart and efficient, but calling it when not needed is still an O(n) operation that doesn't have to happen.
If a list is cast to an IEnumerable and then ToList() is called on it, will it just return the original object?
No. It will create a new list as seen above
As for your example code:
input = input.OrderBy(s => s.Thing).ToList();
foreach(var thing in input)
{
// do things
}
As you are receiving a materialized list (rather than an IQueryable/IEnumerable that might use deferred execution), adding the ToList after the ordering gives you no benefit.
You can look here, might also help: When to use LINQ's .ToList() or .ToArray()
Yes, it creates a new list. It is hard to measure the memory usage accurately, but it is likely to be class size + (system word size * element count). I recommend a memory profiler.
The algorithmic efficiency of operations will of course be affected by the element count.
Yes, you get a brand new list every time. Reference-type elements are not cloned (only the references are copied), but value-type elements are copied.
Try it yourself:
var list = new List<int>();
bool areListsTheSame = list == ((IEnumerable<int>)list).ToList();
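The same check can be extended to show what is and is not copied. In this sketch Item is a hypothetical reference type; the new list is a different object, but its slots point at the same Item instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Item
{
    public int Value;
}

public static class CopyDemo
{
    public static (bool SameList, bool SameElement, int OriginalCount) Run()
    {
        var original = new List<Item> { new Item { Value = 1 } };
        List<Item> copy = ((IEnumerable<Item>)original).ToList();

        copy.Add(new Item { Value = 2 }); // only the copy grows

        return (ReferenceEquals(original, copy),        // a new list object
                ReferenceEquals(original[0], copy[0]),  // shared element instance
                original.Count);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // (False, True, 1)
    }
}
```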

Why is LINQ ToList not used here?

I see the following code:
using(var iterator = source.GetEnumerator()) {...}
Where source is an IEnumerable<T>.
What is the advantage of doing the above versus converting source into a List<T> and then iterating over it?
Converting it to a list will iterate the enumerable once and copy all the references (or even values for value types) into a new List<>. Then, you would iterate over the list. That means you would iterate twice.
Using the IEnumerable<> as a source for enumeration iterates over the sequence only once.
Why someone decided to do the iteration manually using the enumerator instead of leaving the details to a foreach is unclear from the small scope you posted.
Converting to a List<T> would require additional memory and CPU cycles to perform the conversion not to mention you'd be iterating over the data twice.
There's no need to convert to a List<T> before iterating. foreach can iterate over anything that implements IEnumerable<T>.
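One common reason to work with the enumerator directly is to treat the first element differently from the rest in a single pass. The sketch below is only a guess at that kind of usage, not the code the question refers to; ManualEnumeration and MaxOf are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ManualEnumeration
{
    // Finds the maximum in one pass, without copying the source into a list.
    public static int MaxOf(IEnumerable<int> source)
    {
        using (IEnumerator<int> iterator = source.GetEnumerator())
        {
            if (!iterator.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            int max = iterator.Current; // the first element seeds the result
            while (iterator.MoveNext())
            {
                if (iterator.Current > max) max = iterator.Current;
            }
            return max;
        }
    }

    public static void Main()
    {
        Console.WriteLine(MaxOf(new[] { 3, 9, 4 })); // 9
    }
}
```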

IEnumerable vs List - What to Use? How do they work?

I have some doubts over how Enumerators work, and LINQ. Consider these two simple selects:
List<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct().ToList();
or
IEnumerable<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct();
I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:
foreach (Animal animal in sel) { /*do stuff*/ }
I noticed that if I use IEnumerable, when I debug and inspect "sel", which in that case is the IEnumerable, it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector", these last 2 appear to be delegates. The "inner" member does not have "Animal" instances in it, but rather "Species" instances, which was very strange for me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine which goes in and what goes out of it?
I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect as only 2 are Distinct), but the "outer" does contain the correct values. Again, probably the delegated methods determine this but this is a bit more than I know about IEnumerable.
Most importantly, which of the two options is the best performance-wise?
The evil List conversion via .ToList()?
Or maybe using the enumerator directly?
If you can, please also explain a bit or throw some links that explain this use of IEnumerable.
IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give the compiler a chance to defer work until later, possibly optimizing along the way. If you use ToList() you force the compiler to reify the results right away.
Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:
public IEnumerable<Animals> AllSpotted()
{
return from a in Zoo.Animals
where a.coat.HasSpots == true
select a;
}
public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Felidae"
select a;
}
public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Canidae"
select a;
}
Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:
var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());
So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well in the above, Leopards and Hyenas get converted into single SQL queries each, and the database only returns the rows that are relevant. But if we had returned a List from AllSpotted(), then it may run slower because the database could return far more data than is actually needed, and we waste cycles doing the filtering in the client.
In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:
List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
There is a very good article written by: Claudio Bernasconi's TechBlog here: When to use IEnumerable, ICollection, IList and List
Here some basics points about scenarios and functions:
A class that implement IEnumerable allows you to use the foreach syntax.
Basically it has a method to get the next item in the collection. It doesn't need the whole collection to be in memory and doesn't know how many items are in it, foreach just keeps getting the next item until it runs out.
This can be very useful in certain circumstances, for instance in a massive database table you don't want to copy the entire thing into memory before you start processing the rows.
Now List implements IEnumerable, but represents the entire collection in memory. If you have an IEnumerable and you call .ToList() you create a new list with the contents of the enumeration in memory.
Your linq expression returns an enumeration, and by default the expression executes when you iterate through using the foreach. An IEnumerable linq statement executes when you iterate the foreach, but you can force it to iterate sooner using .ToList().
Here's what I mean:
var things =
from item in BigDatabaseCall()
where ....
select item;
// this will iterate through the entire linq statement:
int count = things.Count();
// this will stop after iterating the first one, but will execute the linq again
bool hasAnyRecs = things.Any();
// this will execute the linq statement *again*
foreach( var thing in things ) ...
// this will copy the results to a list in memory
var list = things.ToList()
// this won't iterate through again, the list knows how many items are in it
int count2 = list.Count();
// this won't execute the linq statement - we have it copied to the list
foreach( var thing in list ) ...
Nobody mentioned one crucial difference, ironically answered on a question closed as a duplicated of this.
IEnumerable is read-only and List is not.
See Practical difference between List and IEnumerable
The most important thing to realize is that, using Linq, the query does not get evaluated immediately. It is only run as part of iterating through the resulting IEnumerable<T> in a foreach - that's what all the weird delegates are doing.
So, the first example evaluates the query immediately by calling ToList and putting the query results in a list.
The second example returns an IEnumerable<T> that contains all the information needed to run the query later on.
In terms of performance, the answer is: it depends. If you need the results to be evaluated at once (say, you're mutating the structures you're querying later on, or you don't want the iteration over the IEnumerable<T> to take a long time), use a list. Otherwise, use an IEnumerable<T>. The default should be the on-demand evaluation in the second example, as that generally uses less memory, unless there is a specific reason to store the results in a list.
The advantage of IEnumerable is deferred execution (usually with databases). The query will not get executed until you actually loop through the data. It's a query waiting until it's needed (loosely speaking, it is evaluated lazily).
If you call ToList, the query will be executed, or "materialized" as I like to say.
There are pros and cons to both. If you call ToList, you may remove some mystery as to when the query gets executed. If you stick to IEnumerable, you get the advantage that the program doesn't do any work until it's actually required.
I will share one misconception that tripped me up one day:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
// updating existing list
names[0] = "ford";
// Guess what should be printed before continuing
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Expected result
// I was expecting
print( startingWith_M.ToList() ); // mercedes, mazda
print( startingWith_F.ToList() ); // fiat, ferrari
Actual result
// what was actually printed
print( startingWith_M.ToList() ); // mazda
print( startingWith_F.ToList() ); // ford, fiat, ferrari
Explanation
As per the other answers, the evaluation of the result was deferred until ToList (or a similar method, for example ToArray) was called. By that time the list had already been modified.
So the code effectively behaves as if it had been written like this:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
// updating existing list
names[0] = "ford";
// the queries only run when ToList is called below
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Play around
https://repl.it/E8Ki/0
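If you actually want the result I was expecting, one option is to materialize the query before touching the list. A runnable sketch (SnapshotDemo and Run are names invented for this example; only the "m" query is shown):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    public static (List<string> Snapshot, List<string> Deferred) Run()
    {
        var names = new List<string> { "mercedes", "mazda", "bmw", "fiat", "ferrari" };

        // Evaluated right now: the result is copied into a new list.
        List<string> snapshotM = names.Where(x => x.StartsWith("m")).ToList();

        // Only a description of the query: nothing has been evaluated yet.
        IEnumerable<string> deferredM = names.Where(x => x.StartsWith("m"));

        // updating existing list
        names[0] = "ford";

        // The deferred query runs here and sees the modified list.
        return (snapshotM, deferredM.ToList());
    }
}
```

Here Snapshot should contain mercedes and mazda, while Deferred contains only mazda.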
If all you want to do is enumerate them, use the IEnumerable.
Beware, though, that changing the original collection while it is being enumerated is a dangerous operation; in that case, you will want to call ToList first. This enumerates the IEnumerable and copies every element into a new in-memory list, so it is less performant if you only enumerate once, but it is safer, and the List methods are sometimes handy (for instance, for random access).
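A minimal sketch of that danger, using a plain List<int> (MutationDemo and its method names are made up for the example). Removing items from a List<T> while a foreach is running over it throws an InvalidOperationException; enumerating a ToList() copy avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MutationDemo
{
    // Returns true if modifying the list during enumeration threw.
    public static bool RemoveWhileEnumeratingThrows()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        try
        {
            foreach (int n in numbers)
                if (n % 2 == 0) numbers.Remove(n);   // invalidates the running enumerator
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Enumerates a snapshot, so the original list can be changed safely.
    public static List<int> RemoveEvensSafely()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        foreach (int n in numbers.ToList())          // iterate over a copy
            if (n % 2 == 0) numbers.Remove(n);       // modify the original
        return numbers;
    }
}
```

The copy costs one extra pass and some memory, which is the trade-off described above.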
In addition to all the answers posted above, here are my two cents. Many types other than List implement IEnumerable, such as ICollection, ArrayList, etc. So if a method takes an IEnumerable as its parameter, we can pass any collection type to it; i.e., the method operates on an abstraction rather than on a specific implementation.
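As a rough illustration (PriceMath and Total are hypothetical names, not from the question), one method written against IEnumerable<decimal> accepts an array, a List, a HashSet or a lazy LINQ query without any changes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PriceMath
{
    // Works with anything that can be enumerated; it never needs to know
    // the concrete collection type.
    public static decimal Total(IEnumerable<decimal> prices)
    {
        decimal total = 0m;
        foreach (decimal price in prices)
            total += price;
        return total;
    }

    public static decimal[] Run()
    {
        decimal[] array = { 1.5m, 2.5m };
        var list = new List<decimal> { 1m, 2m, 3m };
        var set = new HashSet<decimal> { 10m, 20m };
        IEnumerable<decimal> query = list.Where(p => p > 1m);   // deferred query

        return new[] { Total(array), Total(list), Total(set), Total(query) };
    }
}
```

PriceMath.Run() should return 4.0, 6, 30 and 5 for the four sources.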
The downside of IEnumerable (deferred execution) is that until you invoke .ToList(), the underlying data can potentially change or become unavailable. For a really simple example of this, this would work:
List<Person> persons; // "var" cannot be used here without an initializer
using (MyEntities db = new MyEntities()) {
persons = db.Persons.ToList(); // It's mine now. In the memory
}
// do what you want with the list of persons;
and this would not work
IEnumerable<Person> persons;
using (MyEntities db = new MyEntities()) {
persons = db.Persons; // nothing is brought until you use it;
}
persons = persons.ToList(); // trying to use it...
// but this throws an exception, because the DbContext (MyEntities) that
// links the query to the database has already been disposed.
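The same effect can be reproduced without a database. The sketch below uses FakeContext, a hypothetical stand-in that is not an Entity Framework type; it only imitates the relevant behaviour (rows are produced lazily, and the source refuses to work once disposed). The exact exception type a real ORM throws may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DbContext; NOT a real Entity Framework class.
public sealed class FakeContext : IDisposable
{
    private bool _disposed;

    // Lazy sequence: the disposed check runs when enumeration starts,
    // not when the property is read.
    public IEnumerable<string> Persons
    {
        get
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
            yield return "Ann";
            yield return "Bob";
        }
    }

    public void Dispose() => _disposed = true;
}

public static class DisposedDemo
{
    // Works: the data is copied into memory while the context is alive.
    public static List<string> MaterializeInside()
    {
        using (var db = new FakeContext())
        {
            return db.Persons.ToList();
        }
    }

    // Fails: enumeration only starts after the context is gone.
    public static bool EnumerateOutsideThrows()
    {
        IEnumerable<string> persons;
        using (var db = new FakeContext())
        {
            persons = db.Persons;   // nothing is brought in yet
        }
        try
        {
            persons.ToList();       // enumeration starts here, too late
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

MaterializeInside() returns both names, while EnumerateOutsideThrows() returns true because the first attempt to read happens after Dispose.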
There are many cases (such as an infinite list or a very large list) where an IEnumerable cannot be transformed to a List. The most obvious examples are all the prime numbers, all the users of Facebook with their details, or all the items on eBay.
The difference is that "List" objects are stored "right here and right now", whereas "IEnumerable" objects work "just one at a time". So if I am going through all the items on eBay, one at a time would be something even a small computer can handle, but ".ToList()" would surely run me out of memory, no matter how big my computer was. No computer can by itself contain and handle such a huge amount of data.
[Edit] - Needless to say, it's not "either this or that". Often it makes good sense to use both a list and an IEnumerable in the same class. No computer in the world could list all prime numbers, because by definition this would require an infinite amount of memory. But you could easily think of a class PrimeContainer which exposes an IEnumerable<long> primes and, for obvious reasons, also contains a sorted collection (for example a SortedSet<long> _primes) holding all the primes calculated so far. The next candidate would only be checked against the existing primes (up to its square root). That way you gain both: primes one at a time (IEnumerable) and a good list of "primes so far", which is a pretty good approximation of the entire (infinite) list.
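A possible sketch of such a PrimeContainer. The details are my own assumptions: it uses a plain List<long> for the cache (primes are found in ascending order, so it stays sorted anyway) instead of a sorted collection type, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PrimeContainer
{
    // All primes calculated so far, in ascending order.
    private readonly List<long> _primes = new List<long>();

    // Infinite, lazily generated sequence: never call ToList() on it directly,
    // limit it first (e.g. with Take or TakeWhile).
    public IEnumerable<long> Primes
    {
        get
        {
            int index = 0;
            while (true)
            {
                // Replay cached primes first (by index, since the cache may grow).
                if (index < _primes.Count)
                {
                    yield return _primes[index++];
                    continue;
                }

                // Cache exhausted: find the next prime and store it.
                long candidate = _primes.Count == 0 ? 2 : _primes[_primes.Count - 1] + 1;
                while (!IsPrime(candidate)) candidate++;
                _primes.Add(candidate);
            }
        }
    }

    // Number of primes currently held in memory.
    public int CachedCount => _primes.Count;

    // Trial division against the known primes, up to the square root of n.
    private bool IsPrime(long n)
    {
        foreach (long p in _primes)
        {
            if (p * p > n) break;
            if (n % p == 0) return false;
        }
        return true;
    }
}
```

For example, new PrimeContainer().Primes.Take(10) yields 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, and a second enumeration of the same container replays those ten values from the cache instead of recomputing them.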
