I have some code that filters through a collection of sorted objects according to a filter value. For instance, I want to find the objects where Name=="searchquery". Then I want to take the top X values from that collection.
My questions:
My collection is a List<T>. Does this collection guarantee the sort order?
If so, is there a built-in way to find the top X objects that satisfy the condition? I'm looking for something like
collection.FindAll(o=>o.Name=="searchquery",100);
That would give me the top 100 objects that satisfy the condition. The reason is performance: once I've found my 100 objects, I don't want to keep checking the rest of the collection.
If I write:
collection.FindAll(o=>o.Name=="searchquery").Take(100);
will the runtime be intelligent enough to stop checking once it hits 100?
I can of course implement this myself, but if there is a built-in way (like a LINQ method), I'd prefer to use it.
collection.Where(o=>o.Name=="searchquery").Take(100)
The results will be in the same order as the original list, and it will stop checking once it has taken 100 elements (Where returns an enumeration that is only evaluated as you take elements). From the documentation:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
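If you need a different sort order, you will have to specify it (which, of course, means examining all elements). The same is true of the FindAll version in the question: List<T>.FindAll is not deferred, so it builds the complete list of matches before Take runs.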
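A small sketch that makes the difference visible by counting predicate calls. The data (1,000 hypothetical names, every fifth one equal to "searchquery") is an assumption for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: 1,000 names, every 5th one is "searchquery" (200 matches in total).
List<string> collection = Enumerable.Range(0, 1000)
    .Select(i => i % 5 == 0 ? "searchquery" : "other" + i)
    .ToList();

int whereChecks = 0;
List<string> lazyTop = collection
    .Where(name => { whereChecks++; return name == "searchquery"; })
    .Take(100)
    .ToList(); // enumeration starts here and stops once 100 matches have been taken

int findAllChecks = 0;
List<string> eagerTop = collection
    .FindAll(name => { findAllChecks++; return name == "searchquery"; })
    .Take(100)
    .ToList(); // FindAll already tested every element before Take ran

Console.WriteLine($"Where + Take:   {lazyTop.Count} results, {whereChecks} predicate calls");
Console.WriteLine($"FindAll + Take: {eagerTop.Count} results, {findAllChecks} predicate calls");
```

With this data, Where + Take only needs to look at roughly the first half of the list, while FindAll + Take always tests all 1,000 elements.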
Ok,
My collection is a List<T>. Does this collection guarantee the sort order?
No, but it will preserve the order of insertion.
If so, is there a built-in way to find the top X objects that satisfy the condition?
someEnumerable.Where(r => r.Name == "searchquery").Take(100)
If I write:
// Some linq that works
will the runtime be intelligent enough to stop checking once it hits 100?
Yes, when the query uses Where, because Where is evaluated lazily.
Now, if you have an IList that has been sorted and you want to quickly iterate over the top 100 matching items, do this:
var list = sourceEnumerable.OrderBy(r => r.Name).ToList();
foreach (var r in list.Where(x => x.Name == "searchquery").Take(100))
{
    // Do something
}
collection.Where(o=>o.Name=="searchquery").Take(100)
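is the most correct answer, because Where uses deferred execution behind the scenes. Below is a simplified version of how the Where method is implemented:
// Simplified sketch; as an extension method it must be declared in a static class
public static IEnumerable<T> Where<T>(this IEnumerable<T> collection, Func<T, bool> func)
{
    foreach (var item in collection)
    {
        if (func(item))
        {
            // yield return hands back one match and pauses until the caller asks for the next
            yield return item;
        }
    }
}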
So when you call Take(100), the loop only runs until it has found the first 100 items that satisfy the criteria.
If you know for sure that the keys of the objects in your collection are never repeated (e.g. a primary key), you can use SortedList instead of List<T>. This guarantees that your list is sorted when you filter it by a given criterion. Have a look here for a sorted list example:
http://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.100).aspx
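The linked page covers the non-generic System.Collections.SortedList; the generic SortedList<TKey, TValue> orders its entries the same way. A minimal sketch, assuming hypothetical integer IDs as the unique keys: entries come back in ascending key order regardless of insertion order, so Where(...).Take(n) returns the first matches in sorted order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: unique IDs (keys) mapped to names, inserted out of order.
var byId = new SortedList<int, string>();
byId.Add(42, "searchquery");
byId.Add(7, "other");
byId.Add(19, "searchquery");
byId.Add(3, "searchquery");

// Entries are enumerated in ascending key order (3, 7, 19, 42),
// so Take(2) yields the two matches with the smallest keys.
List<int> firstTwo = byId
    .Where(pair => pair.Value == "searchquery")
    .Take(2)
    .Select(pair => pair.Key)
    .ToList();

Console.WriteLine(string.Join(", ", firstTwo)); // 3, 19

// Keys must be unique: adding an existing key throws ArgumentException.
try
{
    byId.Add(7, "duplicate");
}
catch (ArgumentException)
{
    Console.WriteLine("duplicate key rejected");
}
```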
I have some doubts about how enumerators and LINQ work. Consider these two simple selects:
List<Animal> sel = (from animal in Animals
                    join race in Species
                    on animal.SpeciesKey equals race.SpeciesKey
                    select animal).Distinct().ToList();
or
IEnumerable<Animal> sel = (from animal in Animals
                           join race in Species
                           on animal.SpeciesKey equals race.SpeciesKey
                           select animal).Distinct();
I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:
foreach (Animal animal in sel) { /*do stuff*/ }
I noticed that if I use IEnumerable, when I debug and inspect "sel" (the IEnumerable in that case), it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector"; the last two appear to be delegates. The "inner" member does not contain "Animal" instances but "Species" instances, which seemed very strange to me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine what goes in and what comes out?
I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect as only 2 are Distinct), but the "outer" does contain the correct values. Again, probably the delegated methods determine this but this is a bit more than I know about IEnumerable.
Most importantly, which of the two options is the best performance-wise?
The evil List conversion via .ToList()?
Or maybe using the enumerator directly?
If you can, please also explain a bit or throw some links that explain this use of IEnumerable.
IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you allow the work to be deferred until later, possibly optimizing along the way. If you use ToList(), you force the results to be reified (materialized) right away.
Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:
public IEnumerable<Animals> AllSpotted()
{
    return from a in Zoo.Animals
           where a.coat.HasSpots == true
           select a;
}
public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Felidae"
           select a;
}
public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Canidae"
           select a;
}
Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:
var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());
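An in-memory sketch of the same composition, with hypothetical tuples standing in for the Animals class and Zoo.Animals, and a counter added to each predicate. It only demonstrates the deferral; there is no database here, so nothing is translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for Zoo.Animals: (Name, Spotted, Family) tuples.
var zooAnimals = new List<(string Name, bool Spotted, string Family)>
{
    ("Leopard", true, "Felidae"),
    ("Lion", false, "Felidae"),
    ("Dalmatian", true, "Canidae"),
    ("Cheetah", true, "Felidae"),
};

int predicateCalls = 0;

IEnumerable<(string Name, bool Spotted, string Family)> AllSpotted() =>
    zooAnimals.Where(a => { predicateCalls++; return a.Spotted; });

IEnumerable<(string Name, bool Spotted, string Family)> Feline(
    IEnumerable<(string Name, bool Spotted, string Family)> sample) =>
    sample.Where(a => { predicateCalls++; return a.Family == "Felidae"; });

// Composing the filters only builds the query; no predicate has run yet.
var spottedCats = Feline(AllSpotted());
int callsBeforeEnumerating = predicateCalls; // 0

// Enumerating pulls each animal through both filters, one at a time.
List<string> names = spottedCats.Select(a => a.Name).ToList(); // Leopard, Cheetah
Console.WriteLine($"{callsBeforeEnumerating} calls before, {predicateCalls} after"); // 0 calls before, 7 after
```

The 7 calls are 4 for the spotted check (every animal) plus 3 for the family check (only the spotted ones).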
So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? In the above, Leopards and Hyenas can each be converted into a single SQL query, so that the database only returns the rows that are relevant, provided the methods take and return IQueryable<Animals>. (With IEnumerable<Animals> as written, only the filter inside AllSpotted() reaches the database, and the Feline/Canine filters run on the client.) But if we had returned a List from AllSpotted(), it might run slower, because the database could return far more data than is actually needed, and we would waste cycles doing the filtering on the client.
In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:
List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
There is a very good article on Claudio Bernasconi's TechBlog: When to use IEnumerable, ICollection, IList and List
Here are some basic points about scenarios and functions:
A class that implements IEnumerable allows you to use the foreach syntax.
Basically, it has a method to get the next item in the collection. It doesn't need the whole collection to be in memory and doesn't know how many items are in it; foreach just keeps getting the next item until it runs out.
This can be very useful in certain circumstances, for instance in a massive database table you don't want to copy the entire thing into memory before you start processing the rows.
Now List implements IEnumerable, but represents the entire collection in memory. If you have an IEnumerable and you call .ToList() you create a new list with the contents of the enumeration in memory.
Your LINQ expression returns an enumeration, and by default the expression executes only when you iterate through it, for example with foreach. You can force it to execute sooner by calling .ToList().
Here's what I mean:
var things =
from item in BigDatabaseCall()
where ....
select item;
// this will iterate through the entire linq statement:
int count = things.Count();
// this will stop after iterating the first one, but will execute the linq again
bool hasAnyRecs = things.Any();
// this will execute the linq statement *again*
foreach( var thing in things ) ...
// this will copy the results to a list in memory
var list = things.ToList();
// this won't iterate through again, the list knows how many items are in it
int count2 = list.Count();
// this won't execute the linq statement - we have it copied to the list
foreach( var thing in list ) ...
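A runnable sketch of the comments above. BigDatabaseCall is replaced by a hypothetical local iterator that counts how many times its body starts, which is the in-memory equivalent of the query being executed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int executions = 0;

// Hypothetical stand-in for BigDatabaseCall(): the body runs each time the sequence is enumerated.
IEnumerable<int> BigDatabaseCall()
{
    executions++;
    for (int i = 1; i <= 10; i++)
        yield return i;
}

var things = BigDatabaseCall().Where(i => i % 2 == 0);

int count = things.Count();       // executes the query (1st time)
bool hasAnyRecs = things.Any();   // executes it again (2nd time)
foreach (var thing in things) { } // and again (3rd time)

var list = things.ToList();       // executes it once more (4th time) and copies the results
int count2 = list.Count;          // no execution: the list stores its own count
foreach (var thing in list) { }   // iterates the in-memory copy

Console.WriteLine($"count = {count}, count2 = {count2}, executions = {executions}");
// count = 5, count2 = 5, executions = 4
```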
Nobody mentioned one crucial difference, ironically answered on a question closed as a duplicate of this one.
IEnumerable is read-only and List is not.
See Practical difference between List and IEnumerable
The most important thing to realize is that, with LINQ, the query does not get evaluated immediately. It is only run as part of iterating through the resulting IEnumerable<T> in a foreach; that's what all the weird delegates are doing.
So, the first example evaluates the query immediately by calling ToList and putting the query results in a list.
The second example returns an IEnumerable<T> that contains all the information needed to run the query later on.
In terms of performance, the answer is it depends. If you need the results to be evaluated at once (say, you're mutating the structures you're querying later on, or if you don't want the iteration over the IEnumerable<T> to take a long time) use a list. Else use an IEnumerable<T>. The default should be to use the on-demand evaluation in the second example, as that generally uses less memory, unless there is a specific reason to store the results in a list.
The advantage of IEnumerable is deferred execution (usually with databases). The query will not get executed until you actually loop through the data. It's a query waiting until it's needed (aka lazy loading).
If you call ToList, the query will be executed, or "materialized" as I like to say.
There are pros and cons to both. If you call ToList, you may remove some mystery as to when the query gets executed. If you stick to IEnumerable, you get the advantage that the program doesn't do any work until it's actually required.
I will share one misconception that I fell into one day:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
// updating existing list
names[0] = "ford";
// Guess what should be printed before continuing
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Expected result
// I was expecting
print( startingWith_M.ToList() ); // mercedes, mazda
print( startingWith_F.ToList() ); // fiat, ferrari
Actual result
// what was actually printed
print( startingWith_M.ToList() ); // mazda
print( startingWith_F.ToList() ); // ford, fiat, ferrari
Explanation
As the other answers explain, evaluation of the result was deferred until calling ToList or a similar method, such as ToArray.
So I can rewrite the code in this case as:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
// updating existing list
names[0] = "ford";
// before calling ToList directly
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Play around
https://repl.it/E8Ki/0
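A self-contained version of the same experiment, with Console.WriteLine in place of the print helper. It adds one assumption for comparison: a second query materialized with ToList before the update, which captures the values originally expected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "mercedes", "mazda", "bmw", "fiat", "ferrari" };

var startingWith_M = names.Where(x => x.StartsWith("m"));      // deferred: re-reads names when enumerated
var snapshot_M = names.Where(x => x.StartsWith("m")).ToList(); // evaluated immediately

// updating existing list
names[0] = "ford";

Console.WriteLine(string.Join(", ", startingWith_M)); // mazda
Console.WriteLine(string.Join(", ", snapshot_M));     // mercedes, mazda
```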
If all you want to do is enumerate them, use the IEnumerable.
Beware, though, that modifying the original collection while it is being enumerated is a dangerous operation; in this case, you will want to call ToList first. This enumerates the IEnumerable and creates a new list holding every element in memory, so it is less performant if you only enumerate once, but it is safer, and sometimes the List methods are handy (for instance, for random access).
In addition to all the answers posted above, here are my two cents. There are many types other than List that implement IEnumerable, such as ICollection, ArrayList, etc. So if a method takes IEnumerable as a parameter, we can pass any collection type to it; i.e., the method operates on an abstraction rather than on a specific implementation.
The downside of IEnumerable (deferred execution) is that until you invoke .ToList(), the underlying data can change, and its source may no longer be available by the time you enumerate. For a really simple example, this would work:
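A minimal sketch of that danger, using a hypothetical list of integers: removing items from a List<T> inside a foreach over the same list throws, while iterating a ToList() copy lets the original be modified safely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var original = new List<int> { 1, 2, 3, 4 };
bool threw = false;
try
{
    // Removing from a List<T> while foreach enumerates it invalidates the enumerator.
    foreach (var n in original)
        if (n % 2 == 0)
            original.Remove(n);
}
catch (InvalidOperationException)
{
    threw = true;
}

var safe = new List<int> { 1, 2, 3, 4 };
// ToList() takes a snapshot, so the loop enumerates the copy while the original changes.
foreach (var value in safe.ToList())
    if (value % 2 == 0)
        safe.Remove(value);

Console.WriteLine(threw);                   // True
Console.WriteLine(string.Join(", ", safe)); // 1, 3
```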
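A small sketch of programming against the abstraction. SumAll is a hypothetical helper; it accepts an array, a List<T>, a HashSet<T>, a LINQ query, and (through Cast<int>()) a non-generic ArrayList.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: it depends only on IEnumerable<int>, not on a concrete collection type.
int SumAll(IEnumerable<int> values)
{
    int total = 0;
    foreach (int v in values)
        total += v;
    return total;
}

int fromArray = SumAll(new[] { 1, 2, 3 });
int fromList = SumAll(new List<int> { 1, 2, 3 });
int fromSet = SumAll(new HashSet<int> { 1, 2, 3 });
int fromQuery = SumAll(Enumerable.Range(1, 3));
// ArrayList only implements the non-generic IEnumerable, so Cast<int>() adapts it.
int fromArrayList = SumAll(new ArrayList { 1, 2, 3 }.Cast<int>());

Console.WriteLine($"{fromArray} {fromList} {fromSet} {fromQuery} {fromArrayList}"); // 6 6 6 6 6
```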
List<Person> persons;
using (MyEntities db = new MyEntities())
{
    persons = db.Persons.ToList(); // It's mine now, in memory
}
// do what you want with the list of persons
and this would not work:
IEnumerable<Person> persons;
using (MyEntities db = new MyEntities())
{
    persons = db.Persons; // nothing is fetched until you use it
}
persons = persons.ToList(); // trying to use it...
// but this throws an exception, because the DbContext (MyEntities)
// that the query depends on has already been disposed.
There are many cases (such as an infinite list or a very large list) where an IEnumerable cannot be transformed into a List. The most obvious examples are all the prime numbers, all the users of Facebook with their details, or all the items on eBay.
The difference is that "List" objects are stored "right here and right now", whereas "IEnumerable" objects work "just one at a time". So if I am going through all the items on eBay, one at a time is something even a small computer can handle, but ".ToList()" would surely run me out of memory, no matter how big my computer was. No computer can by itself contain and handle such a huge amount of data.
[Edit] Needless to say, it's not "either this or that". It often makes good sense to use both a list and an IEnumerable in the same class. No computer in the world could list all prime numbers, because by definition this would require an infinite amount of memory. But you could easily think of a class PrimeContainer that contains an IEnumerable<long> primes and, for obvious reasons, also a SortedSet<long> _primes holding all the primes calculated so far. The next candidate would only be checked against the existing primes (up to its square root). That way you gain both: primes one at a time (IEnumerable) and a good list of "primes so far", which is a pretty good approximation of the entire (infinite) list.
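A minimal sketch of that idea, using local functions instead of the hypothetical PrimeContainer class. One assumption differs from the description: the cache is a plain List<long>, which stays sorted because primes are always appended in increasing order. Primes() is an endless IEnumerable<long> that replays the cache and then extends it, testing each candidate only against cached primes up to its square root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cache of "primes so far"; it stays sorted because primes are appended in increasing order.
var known = new List<long> { 2 };

// Endless sequence: replays the cached primes, then extends the cache one prime at a time.
IEnumerable<long> Primes()
{
    for (int i = 0; ; i++)
    {
        if (i < known.Count)
        {
            yield return known[i];
            continue;
        }
        long candidate = known[known.Count - 1] + 1;
        while (!IsPrime(candidate))
            candidate++;
        known.Add(candidate);
        yield return candidate;
    }
}

// Trial division using only cached primes up to the square root of n.
bool IsPrime(long n)
{
    foreach (long p in known)
    {
        if (p * p > n) return true;
        if (n % p == 0) return false;
    }
    return true;
}

// Take() makes the infinite sequence safe to consume; Primes().ToList() would never finish.
List<long> firstTen = Primes().Take(10).ToList();
Console.WriteLine(string.Join(", ", firstTen)); // 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
```

A second enumeration such as Primes().Take(5) is served entirely from the cache.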
Suppose I have a given collection. Without ever changing the collection in any way, I loop through its contents twice with a foreach. Barring cosmic rays and what not, is it absolutely guaranteed that the order will be consistent in both loops?
Alternatively, given a HashSet<string> with a number of elements, what can cause the output from the commented lines in the following to be unequal:
{
    var mySet = new HashSet<string>();
    // Some code which populates the HashSet<string>
    // Output1
    printContents(mySet);
    // Output2
    printContents(mySet);
}
public void printContents(HashSet<string> set) {
    foreach(var element in set) {
        Console.WriteLine(element);
    }
}
It would be helpful if I could get a general answer explaining what causes an implementation to not meet the criteria described above. Specifically, though, I am interested in Dictionary, List and arrays.
Array enumeration guarantees order.
List and List<T> are expected to provide stable order (since they are expected to implement sequentially-indexed elements).
Dictionary and HashSet explicitly do not guarantee order. It is very unlikely that two consecutive iterations will return items in a different order, but there are no guarantees or expectations. One should not expect any particular order.
Sorted versions of Dictionary/HashSet (SortedDictionary, SortedSet) return items in sort order.
Other IEnumerable objects are free to do whatever they want. Normally, iterators are implemented to match users' expectations: enumerating something that has an implicit order should be stable, and if an explicit order is provided, it is expected to be stable. A database query that does not specify an order should be expected to return items in semi-random order.
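A short sketch with hypothetical keys: SortedSet<T> and SortedDictionary<TKey, TValue> enumerate in sort order regardless of insertion order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var set = new SortedSet<string> { "pear", "apple", "fig" };
var dict = new SortedDictionary<int, string> { [3] = "three", [1] = "one", [2] = "two" };

// Both enumerate in sort order, independent of the order the items were added.
Console.WriteLine(string.Join(", ", set));       // apple, fig, pear
Console.WriteLine(string.Join(", ", dict.Keys)); // 1, 2, 3
```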
Check this question for links: Does the foreach loop in C# guarantee an order of evaluation?
Everything that implements IEnumerable<T> does so in its own way. There is no general guarantee that any given collection must ensure stability.
If you are referring specifically to Collection<T> (http://msdn.microsoft.com/en-us/library/ms132397.aspx) I don't see any specific guarantee in its MSDN reference that ordering is consistent.
Will it probably be consistent? Yes. Is there a written guarantee? Not that I can find.
For many of the C# collections there are sorted versions of the collection. For instance, a HashSet is to a SortedSet as a Dictionary is to a SortedDictionary. If you're working with something where the order isn't important, like a Dictionary, then you can't assume the loop order will behave the same way every time.
As per your example with HashSet<T>, we now have source code to check: HashSet:Enumerator
As it is, the Slot[] set.m_slots array is iterated.
The array object is only changed in the methods TrimExcess, Initialize (both of which are only called in the constructor), OnDeserialization, and SetCapacity (only called by AddIfNotPresent and AddOrGetLocation).
The values of m_slots are only changed in methods that change elements of the HashSet (Clear, Remove, AddIfNotPresent, IntersectWith, SymmetricExceptWith).
So yes, if nothing touches the set, it enumerates in the same order.
Dictionary:Enumerator works in quite the same way, iterating an Entry[] entries that only changes when such non-readonly methods are called.
I have a List<String> and two List<CustomObject>.
I want to combine two List<CustomObject> but only if second List<CustomObject> property is not contained in List<String>.
So, want to do something like this:
resultSet1.AddRange(resultSet2.Select(x => !names.Contains(x.City)).ToList());
What am I missing here?
You are misusing Select when you should be using Where.
resultSet1.AddRange(resultSet2.Where(x => !names.Contains(x.City)));
Select is a projection method. It takes a sequence and maps each element to something: sometimes the entire item, sometimes a single property, or something else entirely.
Where is a filtering method. It takes a sequence and applies a predicate to it, yielding only the elements that pass the predicate.
(In your example, by using Select, you're effectively trying to add a list of bool to a list of CustomObject, which is not going to work.)
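A runnable sketch of the difference, with hypothetical (Name, City) tuples standing in for CustomObject: Select turns each element into a bool, while Where keeps the original elements that pass the test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for List<CustomObject>: (Name, City) tuples.
var names = new List<string> { "Paris" };
var resultSet1 = new List<(string Name, string City)> { ("A", "Rome") };
var resultSet2 = new List<(string Name, string City)> { ("B", "Paris"), ("C", "Oslo") };

// Select projects each element to a bool, giving an IEnumerable<bool>.
IEnumerable<bool> flags = resultSet2.Select(x => !names.Contains(x.City));

// Where keeps the elements whose City is not in names.
resultSet1.AddRange(resultSet2.Where(x => !names.Contains(x.City)));

Console.WriteLine(string.Join(", ", flags));                          // False, True
Console.WriteLine(string.Join(", ", resultSet1.Select(x => x.Name))); // A, C
```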
Not addressing the specific error at hand, here are some additional ideas to consider.
This is a situation where a HashSet<string> might be beneficial for names, particularly if names is significantly large. Contains on a HashSet<T> is of O(1) complexity, whereas it is O(n) for List<T>. However, there is overhead associated with the HashSet, so if you have any concerns, it's best to measure both and see which is more performant.
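A minimal sketch of that suggestion, with hypothetical city strings in place of CustomObject.City: build the HashSet<string> once, then use it inside the Where predicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Paris", "Berlin", "Madrid" };
var cities = new List<string> { "Oslo", "Paris", "Rome" };

// Build the set once; each Contains call is then an average O(1) hash lookup
// instead of an O(n) scan of the list.
var nameSet = new HashSet<string>(names);

List<string> kept = cities.Where(city => !nameSet.Contains(city)).ToList();
Console.WriteLine(string.Join(", ", kept)); // Oslo, Rome
```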
One more thing that might help: if you simply need to stream one sequence after the other and do not necessarily need to change or add to either collection, you might consider using the Union or Concat operations.
var sequence = resultSet1.Union(resultSet2.Where(x => !names.Contains(x.City)));
var sequence = resultSet1.Concat(resultSet2.Where(x => !names.Contains(x.City)));
The difference between the two is that Union filters out any duplicates in the resulting sequence (from both inputs, not just the second against the first), while Concat applies no duplicate-filtering logic and simply streams one sequence after the other. The input sequences (resultSet1 and resultSet2) are unmodified.
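A small sketch with hypothetical integer arrays showing both behaviors, including a duplicate that exists only inside the first input.

```csharp
using System;
using System.Linq;

int[] first = { 1, 2, 2 };
int[] second = { 2, 3 };

// Concat streams both inputs back to back, duplicates included.
var concatenated = first.Concat(second).ToList();
Console.WriteLine(string.Join(", ", concatenated)); // 1, 2, 2, 2, 3

// Union removes duplicates across both inputs, including repeats inside `first`.
var union = first.Union(second).ToList();
Console.WriteLine(string.Join(", ", union)); // 1, 2, 3
```

For a class like CustomObject, Union uses the default equality comparer, which means reference equality unless Equals/GetHashCode are overridden or an IEqualityComparer<T> is passed in.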
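What's the best/easiest way to obtain a count of items within an IEnumerable collection without enumerating over all of the items in the collection?
Is it possible with LINQ or a lambda?
In general, you have to loop through it. LINQ offers the Count method (which does skip the loop when the sequence is actually an ICollection<T>, such as a List<T>, by reading its Count property):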
var result = myenum.Count();
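A sketch that makes the cost visible, using a hypothetical lazy sequence that records how many elements it produces: Count() has to pull every one of them, while for a List<T> it reads the stored count. On .NET 6 and later, Enumerable.TryGetNonEnumeratedCount can also report a count only when one is available without enumerating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// A lazy sequence that records how many elements it has produced.
IEnumerable<int> Numbers()
{
    for (int i = 0; i < 1000; i++)
    {
        produced++;
        yield return i;
    }
}

int lazyCount = Numbers().Count(); // has to pull all 1,000 elements

var list = new List<int> { 1, 2, 3 };
int listCount = list.Count(); // List<T> implements ICollection<T>, so its Count property is used

Console.WriteLine($"{lazyCount} counted, {produced} produced, list has {listCount}");
// 1000 counted, 1000 produced, list has 3
```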
The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk) speed shouldn't be a problem though.
If it's because enumerating the collection will require more code, then it's already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted, then the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating, and it will carry on from where it was before the count. For example:
var myEnumerable = Series.Generate(5); // Series.Generate: the answer's own helper, yielding 0 to 4
foreach (int item in myEnumerable)
{
    Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}
will give the results
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
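A self-contained version of that example. Series.Generate is not a .NET API, so a hypothetical local iterator named Generate stands in for it; the output lines are collected so they can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the answer's Series.Generate(5): yields 0..count-1.
IEnumerable<int> Generate(int count)
{
    for (int i = 0; i < count; i++)
        yield return i;
}

var myEnumerable = Generate(5);
var lines = new List<string>();

foreach (int item in myEnumerable)
{
    // Count() asks for a fresh enumerator, so the outer loop's position is unaffected.
    lines.Add(item + " (" + myEnumerable.Count() + ")");
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
```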
If it's because the enumeration has side effects (e.g. writes to disk/console) or is dependent on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture, as this can cause a lot of problems], then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible you might want to consider changing to use an ICollection or IList (ICollection being more widely scoped than IList) which has a Count property on it.
You will have to enumerate to get a count. Other constructs like the List keep a running count.
Use this.
IEnumerable list =..........;
list.OfType<T>().Count()
It will return the count of the elements of type T (and it still enumerates the whole sequence to get it).
There's also IList or ICollection, if you want to use a construct that is still somewhat flexible but also has the feature you require. They both extend IEnumerable.
It also depends on what you want to achieve by counting. If you only want to find out whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(): the former stops after the first element, whereas the latter goes through all the elements.
An IEnumerable has to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is to check whether there are any. Any() only checks whether there is a value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();
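A sketch that counts how many elements each call requests, using a hypothetical lazy sequence over the same three strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// A lazy sequence that records how many elements have been requested.
IEnumerable<string> Strings()
{
    foreach (var s in new[] { "one", "two", "three" })
    {
        pulled++;
        yield return s;
    }
}

bool hasValues = Strings().Any(); // stops after the first element
int afterAny = pulled;

int count = Strings().Count();    // walks the whole sequence again
Console.WriteLine($"Any pulled {afterAny}, Count pulled {pulled - afterAny}");
// Any pulled 1, Count pulled 3
```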
Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
    ...
}
If you need to count and then loop you may be better off with a list.
If you're using count to check for members you can use Any() to avoid enumerating the entire collection.
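The best solution, I think, is to do the following (note, though, that for an in-memory IEnumerable, AsQueryable() only wraps the sequence, so Count() still ends up enumerating it just as Enumerable.Count() would):
using System.Linq; // AsQueryable is defined in System.Linq; System.Linq.Dynamic is not required
myEnumerable.AsQueryable().Count()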
When I want to use the Count property, I use IList<T>, which extends the ICollection<T> and IEnumerable<T> interfaces. In the example below, the object behind the IList<string> is an array: I stepped through using the VS debugger and found that the .Count property below returns the Array.Length property.
using System.Collections.Generic;
using System.IO;

// VIDEOSERVERPATH is the answer's own folder path constant
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
    return;
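A tiny sketch of why that assignment works, with hypothetical file names standing in for the string[] that Directory.GetFiles returns: arrays implement IList<T>, and the interface's Count reports the array's length.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical file names standing in for the string[] returned by Directory.GetFiles.
string[] files = { "intro.mp4", "demo.mp4" };

// Arrays implement IList<T>, so no conversion is needed.
IList<string> fileServerVideos = files;

Console.WriteLine(fileServerVideos.Count == files.Length); // True
```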