I have a C# statement which iterates through a collection of rows. One of the fields calls a private method to get an array of objects, but I am getting null. I placed a breakpoint inside the method called from the LINQ query, but it is never hit.
Here is my code
IQueryable<MyObject> myObject = ds.Tables["Table"].AsEnumerable().Select(row => new MyObject
{
id = row.Field<int>("ID"),
MyCollectionArray = this.getCollectionArray(row.Field<string>("MyAggregatedString")),
}).AsQueryable();
private MyObject[] getCollectionArray(string concatString)
{
// placed a breakpoint, it is never called. Not sure why
}
Thanks for any assistance.
What you are facing is called deferred execution, which means that your query is not executed until you use it somewhere.
Here is a part from the documentation:
Deferred execution means that the evaluation of an expression is
delayed until its realized value is actually required. Deferred
execution can greatly improve performance when you have to manipulate
large data collections, especially in programs that contain a series
of chained queries or manipulations. In the best case, deferred
execution enables only a single iteration through the source
collection.
And in order to execute your statement you just have to use it.
The simplest way might be just by calling
myObject.ToList()
It will also be executed when you call methods that need the values to be populated (like Sum, Average, etc.).
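To see the effect without a debugger, here is a minimal sketch. The names DeferredDemo, Split and Calls are made up for illustration (Split is only a stand-in for the asker's getCollectionArray); the counter shows that the projected method does not run until ToList() enumerates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the projection method has actually run.
    public static int Calls;

    // Hypothetical stand-in for getCollectionArray from the question.
    private static string[] Split(string concatString)
    {
        Calls++;
        return concatString.Split(',');
    }

    public static (int before, int after) Run()
    {
        Calls = 0;
        var rows = new[] { "a,b", "c,d,e" };

        // Building the query does not call Split.
        IEnumerable<string[]> query = rows.Select(r => Split(r));
        int before = Calls;

        // Enumerating the query (here via ToList) calls Split once per row.
        List<string[]> list = query.ToList();
        int after = Calls;

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"calls before ToList: {before}, after: {after}");
        // prints "calls before ToList: 0, after: 2"
    }
}
```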
I have some doubts over how Enumerators work, and LINQ. Consider these two simple selects:
List<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct().ToList();
or
IEnumerable<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct();
I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:
foreach (Animal animal in sel) { /*do stuff*/ }
I noticed that if I use IEnumerable, when I debug and inspect "sel", which in that case is the IEnumerable, it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector", these last 2 appear to be delegates. The "inner" member does not have "Animal" instances in it, but rather "Species" instances, which was very strange for me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine which goes in and what goes out of it?
I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect as only 2 are Distinct), but the "outer" does contain the correct values. Again, probably the delegated methods determine this but this is a bit more than I know about IEnumerable.
Most importantly, which of the two options is the best performance-wise?
The evil List conversion via .ToList()?
Or maybe using the enumerator directly?
If you can, please also explain a bit or throw some links that explain this use of IEnumerable.
IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give the compiler a chance to defer work until later, possibly optimizing along the way. If you use ToList() you force the compiler to reify the results right away.
Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:
public IEnumerable<Animals> AllSpotted()
{
return from a in Zoo.Animals
where a.coat.HasSpots == true
select a;
}
public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Felidae"
select a;
}
public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Canidae"
select a;
}
Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:
var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());
So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well in the above, Leopards and Hyenas get converted into single SQL queries each, and the database only returns the rows that are relevant. But if we had returned a List from AllSpotted(), then it may run slower because the database could return far more data than is actually needed, and we waste cycles doing the filtering in the client.
In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:
List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
There is a very good article on Claudio Bernasconi's TechBlog: When to use IEnumerable, ICollection, IList and List
Here are some basic points about scenarios and functions:
A class that implements IEnumerable allows you to use the foreach syntax.
Basically it has a method to get the next item in the collection. It doesn't need the whole collection to be in memory and doesn't know how many items are in it, foreach just keeps getting the next item until it runs out.
This can be very useful in certain circumstances, for instance in a massive database table you don't want to copy the entire thing into memory before you start processing the rows.
Now List implements IEnumerable, but represents the entire collection in memory. If you have an IEnumerable and you call .ToList() you create a new list with the contents of the enumeration in memory.
Your LINQ expression returns an enumeration, and by default the expression executes when you iterate over it with foreach, but you can force it to execute sooner using .ToList().
Here's what I mean:
var things =
from item in BigDatabaseCall()
where ....
select item;
// this will iterate through the entire linq statement:
int count = things.Count();
// this will stop after iterating the first one, but will execute the linq again
bool hasAnyRecs = things.Any();
// this will execute the linq statement *again*
foreach( var thing in things ) ...
// this will copy the results to a list in memory
var list = things.ToList();
// this won't iterate through again, the list knows how many items are in it
int count2 = list.Count();
// this won't execute the linq statement - we have it copied to the list
foreach( var thing in list ) ...
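The same behaviour can be checked with a runnable sketch. Here BigDatabaseCall is only an in-memory stand-in (an assumption, no real database is involved), and the SourceReads counter records how many items were pulled from the source by each operation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Number of items read from the (fake) source so far.
    public static int SourceReads;

    // In-memory stand-in for BigDatabaseCall(); yields 1..5 lazily.
    private static IEnumerable<int> BigDatabaseCall()
    {
        for (int i = 1; i <= 5; i++)
        {
            SourceReads++;
            yield return i;
        }
    }

    public static int[] Run()
    {
        SourceReads = 0;
        IEnumerable<int> things = BigDatabaseCall().Where(x => x % 2 == 1);

        int count = things.Count();       // reads all 5 source items
        int afterCount = SourceReads;

        bool any = things.Any();          // restarts the query, stops at the first match
        int afterAny = SourceReads;

        List<int> list = things.ToList(); // runs the whole query again
        int afterToList = SourceReads;

        int count2 = list.Count;          // answered by the list, no source reads
        foreach (int thing in list) { }   // iterates the in-memory copy only
        int afterListUse = SourceReads;

        return new[] { afterCount, afterAny, afterToList, afterListUse };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "5, 6, 11, 11"
    }
}
```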
Nobody mentioned one crucial difference, ironically answered on a question closed as a duplicate of this one.
IEnumerable is read-only and List is not.
See Practical difference between List and IEnumerable
The most important thing to realize is that, using Linq, the query does not get evaluated immediately. It is only run as part of iterating through the resulting IEnumerable<T> in a foreach - that's what all the weird delegates are doing.
So, the first example evaluates the query immediately by calling ToList and putting the query results in a list.
The second example returns an IEnumerable<T> that contains all the information needed to run the query later on.
In terms of performance, the answer is it depends. If you need the results to be evaluated at once (say, you're mutating the structures you're querying later on, or if you don't want the iteration over the IEnumerable<T> to take a long time) use a list. Else use an IEnumerable<T>. The default should be to use the on-demand evaluation in the second example, as that generally uses less memory, unless there is a specific reason to store the results in a list.
The advantage of IEnumerable is deferred execution (usually with databases). The query will not get executed until you actually loop through the data. It's a query waiting until it's needed (aka lazy loading).
If you call ToList, the query will be executed, or "materialized" as I like to say.
There are pros and cons to both. If you call ToList, you may remove some mystery as to when the query gets executed. If you stick to IEnumerable, you get the advantage that the program doesn't do any work until it's actually required.
I will share one misused concept that I fell into one day:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
// updating existing list
names[0] = "ford";
// Guess what should be printed before continuing
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Expected result
// I was expecting
print( startingWith_M.ToList() ); // mercedes, mazda
print( startingWith_F.ToList() ); // fiat, ferrari
Actual result
// what was actually printed
print( startingWith_M.ToList() ); // mazda
print( startingWith_F.ToList() ); // ford, fiat, ferrari
Explanation
As per other answers, the evaluation of the result was deferred until calling ToList or similar invocation methods for example ToArray.
So I can rewrite the code in this case as:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
// updating existing list
names[0] = "ford";
// before calling ToList directly
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Play around
https://repl.it/E8Ki/0
If all you want to do is enumerate them, use the IEnumerable.
Beware, though, that changing the original collection while it is being enumerated is a dangerous operation; in that case, you will want to call ToList first. This enumerates the IEnumerable and copies every element into a new list in memory, so it is less performant if you only enumerate once, but it is safer, and sometimes the List methods are handy (for instance, for random access).
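A small sketch of that danger, using made-up names (SnapshotDemo and its two methods): removing items from a List<int> while a lazy query over it is being enumerated throws InvalidOperationException, while enumerating a ToList() snapshot is fine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns true if mutating the list during lazy enumeration throws.
    public static bool MutateWhileEnumeratingLazily()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        IEnumerable<int> evens = numbers.Where(x => x % 2 == 0);
        try
        {
            // The query still reads from 'numbers', so Remove invalidates its enumerator.
            foreach (int n in evens) numbers.Remove(n);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Takes a snapshot first, then mutates the original list safely.
    public static List<int> MutateAfterSnapshot()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        List<int> evens = numbers.Where(x => x % 2 == 0).ToList();
        foreach (int n in evens) numbers.Remove(n);
        return numbers;
    }

    public static void Main()
    {
        Console.WriteLine(MutateWhileEnumeratingLazily());            // prints "True"
        Console.WriteLine(string.Join(", ", MutateAfterSnapshot())); // prints "1, 3"
    }
}
```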
In addition to all the answers posted above, here are my two cents. Many types other than List implement IEnumerable, such as ICollection, ArrayList, etc. So if a method takes an IEnumerable parameter, we can pass any collection type to it. That is, the method can operate on an abstraction rather than on a specific implementation.
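As a quick illustration (AbstractionDemo, Total and Generated are hypothetical names), one method written against IEnumerable<int> accepts an array, a List, a HashSet and a lazily generated sequence alike:

```csharp
using System;
using System.Collections.Generic;

public static class AbstractionDemo
{
    // Works with anything that can be enumerated, not just List<int>.
    public static int Total(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }

    // A lazily generated sequence with no backing collection at all.
    public static IEnumerable<int> Generated()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        Console.WriteLine(Total(new[] { 1, 2, 3 }));             // array, prints 6
        Console.WriteLine(Total(new List<int> { 1, 2, 3 }));     // list, prints 6
        Console.WriteLine(Total(new HashSet<int> { 1, 2, 3 }));  // set, prints 6
        Console.WriteLine(Total(Generated()));                   // iterator, prints 6
    }
}
```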
The downside of IEnumerable (a deferred execution) is that until you invoke the .ToList() the list can potentially change. For a really simple example of this - this would work
List<Person> persons;
using (MyEntities db = new MyEntities()) {
persons = db.Persons.ToList(); // It's mine now. In the memory
}
// do what you want with the list of persons;
and this would not work
IEnumerable<Person> persons;
using (MyEntities db = new MyEntities()) {
persons = db.Persons; // nothing is brought until you use it;
}
persons = persons.ToList(); // trying to use it...
// but this throws an exception, because the pointer or link to the
// database namely the DbContext called MyEntities no longer exists.
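The two snippets above need a real Entity Framework context to run. As a rough, self-contained imitation, the sketch below uses a made-up FakeDb class (not an EF type) whose lazy Persons() sequence refuses to produce items once the object has been disposed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DbContext: its data is only readable before Dispose.
public sealed class FakeDb : IDisposable
{
    private bool _disposed;

    public IEnumerable<string> Persons()
    {
        foreach (string name in new[] { "Ann", "Bob" })
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeDb));
            yield return name;
        }
    }

    public void Dispose() => _disposed = true;
}

public static class DisposedContextDemo
{
    public static (int eagerCount, bool lazyThrew) Run()
    {
        List<string> eager;
        IEnumerable<string> lazy;
        using (var db = new FakeDb())
        {
            eager = db.Persons().ToList(); // materialized while the "context" is alive
            lazy = db.Persons();           // nothing has been read yet
        }

        bool lazyThrew = false;
        try { lazy.ToList(); }             // first read happens after Dispose
        catch (ObjectDisposedException) { lazyThrew = true; }

        return (eager.Count, lazyThrew);
    }

    public static void Main()
    {
        var (eagerCount, lazyThrew) = Run();
        Console.WriteLine($"{eagerCount}, {lazyThrew}"); // prints "2, True"
    }
}
```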
There are many cases (such as an infinite list or a very large list) where IEnumerable cannot be transformed to a List. The most obvious examples are all the prime numbers, all the users of facebook with their details, or all the items on ebay.
The difference is that "List" objects are stored "right here and right now", whereas "IEnumerable" objects work "just one at a time". So if I am going through all the items on ebay, one at a time would be something even a small computer can handle, but ".ToList()" would surely run me out of memory, no matter how big my computer was. No computer can by itself contain and handle such a huge amount of data.
[Edit] - Needless to say - it's not "either this or that". Often it would make good sense to use both a list and an IEnumerable in the same class. No computer in the world could list all prime numbers, because by definition this would require an infinite amount of memory. But you could easily think of a class PrimeContainer which contains an IEnumerable<long> primes, which for obvious reasons also contains a SortedList<long> _primes: all the primes calculated so far. The next prime to be checked would only be run against the existing primes (up to the square root). That way you gain both: primes one at a time (IEnumerable) and a good list of "primes so far", which is a pretty good approximation of the entire (infinite) list.
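One possible shape for such a PrimeContainer, as a sketch only: it assumes a plain List<long> for the cache (primes are found in ascending order, so no sorted container is needed here), and the member names Primes and CachedCount are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PrimeContainer
{
    // All primes calculated so far, in ascending order.
    private readonly List<long> _primes = new List<long>();

    public int CachedCount => _primes.Count;

    // Infinite, lazy sequence: replays cached primes, then computes new ones on demand.
    public IEnumerable<long> Primes()
    {
        for (int i = 0; ; i++)
        {
            if (i == _primes.Count) _primes.Add(NextPrime());
            yield return _primes[i];
        }
    }

    private long NextPrime()
    {
        long candidate = _primes.Count == 0 ? 2 : _primes[_primes.Count - 1] + 1;
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    // Trial division against the cached primes, up to the square root of n.
    private bool IsPrime(long n)
    {
        foreach (long p in _primes)
        {
            if (p * p > n) break;
            if (n % p == 0) return false;
        }
        return true;
    }

    public static void Main()
    {
        var container = new PrimeContainer();
        // Take(5) pulls only five items from the infinite sequence.
        Console.WriteLine(string.Join(", ", container.Primes().Take(5))); // prints "2, 3, 5, 7, 11"
    }
}
```

Calling ToList() directly on Primes() would never finish, which is exactly the point made above; Take, First or TakeWhile keep the enumeration bounded.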
In my application there are a fair number of existing "service commands" which generally return a List<TEntity>. However, I wrote them in such a way that any queries would not be evaluated until the very last statement, when they are cast ToList<TEntity> (or at least I think I did).
Now I need to start obtaining some "context-specific" information from the commands, and I am thinking about doing the following:
Keep existing commands largely the same as they are today, but make sure they return an IEnumerable<TEntity> rather than an IList<TEntity>.
Create new commands that call the old commands but return IEnumerable<TResult> where TResult is not an entity but rather a view model, result model, etc - some representation of the data that is useful for the application.
The first case in which I have needed this is while doing a search for a Group entity. In my schema, Groups come with User-specific permissions, but it is not realistic for me to spit out the entire list of users and permissions in my result - first, because there could be many users, second, because there are many permissions, and third, because that information should not be available to insufficiently-privileged users (ie a "guest" should not be able to see what a "member" can do).
So, I want to be able to take the result of the original command, an IEnumerable<Group>, and describe how each Group ought to be transformed into a GroupResult, given a specific input of User (by Username in this case).
If I try to iterate over the result of the original command with ForEach I know this will force the execution of the result and therefore potentially result in a needlessly longer execution time. What if I wanted to further compose the result of the "new" command (that returns GroupResult) and filter out certain groups? Then maybe I would be calculating a ton of privileges for the inputted user, only to filter out the parent GroupResult objects later on anyway!
I guess my question boils down to... how do I tell C# how I'd like to transform each member of the IEnumerable without necessarily doing it at the time the method is run?
To lazily cast an enumerable from one type to another you do this:
IEnumerable<TResult> result = source.Cast<TResult>();
This assumes that the elements of the source enumerable can be cast to TResult. If they can't you need to use a standard projection with .Select(x => ... ).
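For the Group to GroupResult case, a lazy projection could look roughly like this. The Group and GroupResult shapes, the Owner-based permission rule and the Projections counter are all assumptions made up for the example; the point is only that Select does no work until something enumerates the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and result model.
public class Group { public string Name { get; set; } public string Owner { get; set; } }
public class GroupResult { public string Name { get; set; } public bool CanEdit { get; set; } }

public static class GroupQueries
{
    // Counts how many groups have actually been transformed.
    public static int Projections;

    // Describes the transformation; nothing runs until the result is enumerated.
    public static IEnumerable<GroupResult> ToResults(IEnumerable<Group> groups, string userName)
    {
        return groups.Select(g =>
        {
            Projections++;
            return new GroupResult { Name = g.Name, CanEdit = g.Owner == userName };
        });
    }

    public static int[] Run()
    {
        Projections = 0;
        var groups = new[]
        {
            new Group { Name = "admins", Owner = "tacos" },
            new Group { Name = "guests", Owner = "someone" },
            new Group { Name = "members", Owner = "tacos" },
        };

        IEnumerable<GroupResult> results = ToResults(groups, "tacos");
        int afterBuild = Projections;   // 0: query only described

        GroupResult first = results.First();
        int afterFirst = Projections;   // 1: only one group was transformed

        return new[] { afterBuild, afterFirst };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "0, 1"
    }
}
```

Note that a filter placed after the projection still pays for every projected item it inspects, so filter on the entity before the Select where possible.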
Also, be careful returning IEnumerable<T> from a service or database, as often there are resources that you need to open to obtain the data, so you would need to make sure those resources are open whenever you try to evaluate the enumerable. Keeping a database connection open is a bad idea. I would be more inclined to return an array that you've cast as an IEnumerable<>.
However, if you really want to get an IEnumerable<> from a service or database that is truly lazy and will automatically refresh the data, then you need to try Microsoft's Reactive Framework Team's "Interactive Extensions" to help with it.
They have a nice IEnumerable<> extension called Using that makes a "hot" enumerable that opens a resource for each iteration.
It would look something like this:
var d =
EnumerableEx
.Using(
() => new DB(),
db => db.Data.Where(x => x == 2));
It creates a new DB instance every time the enumerable is iterated and will dispose of the database when the enumerable is completed. Something worth considering.
Use NuGet and look for "Ix-Main" for the Interactive Extensions.
You're looking for the yield return command.
When you define a method returning an IEnumerable, and return its data by yield return, the return value is iterated over in the consuming method. This is what it could look like:
IEnumerable<GroupResult> GetGroups(string userName)
{
foreach(var group in context.Groups.Where(g => <some user-specific condition>))
{
var result = new GroupResult();
... // Further compose the result.
yield return result;
}
}
In consuming code:
var groups = GetGroups("tacos");
// At this point no enumeration has occurred yet. Any breakpoints in GetGroups
// have not been hit.
foreach(var g in groups)
{
// Now iteration in GetGroups starts!
}
Does there exist a standard pattern for yield returning all the items within an Enumerable?
More often than I like I find some of my code reflecting the following pattern:
public IEnumerable<object> YieldReturningFunction()
{
...
[logic and various standard yield return]
...
foreach(object obj in methodReturningEnumerable(x,y,z))
{
yield return obj;
}
}
The explicit usage of a foreach loop solely to return the results of an Enumerable reeks of code smell to me.
Obviously I could abandon the use of yield return, increasing the complexity of my code by explicitly building an Enumerable and adding the result of each standard yield return to it, as well as adding the range of the results of methodReturningEnumerable. This would be unfortunate, so I was hoping there is a better way to manage the yield return pattern.
No, there is no way around that.
It's a feature that's been requested, and it's not a bad idea (a yield foreach or equivalent exists in other languages).
At this point Microsoft simply hasn't allocated the time and money to implement it. They may or may not implement it in the future; I would guess (with no factual basis) that it's somewhere on the to do list; it's simply a question of if/when it gets high enough on that list to actually get implemented.
The only possible change that I could see would be to refactor out all of the individual yield returns from the top of the method into their own enumerable returning method, and then add a new method that returns the concatenation of that method and methodReturningEnumerable(x,y,z). Would it be better; no, probably not. The Concat would add back in just as much as you would have saved, if not more.
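For completeness, that refactoring might look something like the sketch below. Head and MethodReturningEnumerable are placeholder bodies invented for the example; only the Concat call is the technique being described.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // The "logic and various standard yield return" part, moved into its own iterator.
    private static IEnumerable<object> Head()
    {
        yield return "first";
        yield return "second";
    }

    // Placeholder for the existing method that already returns an enumerable.
    private static IEnumerable<object> MethodReturningEnumerable()
    {
        return new object[] { 1, 2 };
    }

    // No foreach/yield loop: the two sequences are chained lazily.
    public static IEnumerable<object> YieldReturningFunction()
    {
        return Head().Concat(MethodReturningEnumerable());
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", YieldReturningFunction()));
        // prints "first, second, 1, 2"
    }
}
```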
Can't be done. It's not that bad though. You can shorten it to a single line:
foreach (var o in otherEnumerator) yield return o;
Unrelated note: you should be careful of what logic you include in your generators; all execution is deferred until the returned IEnumerable is actually enumerated (the body first runs on the first MoveNext() call, not when the method is called). I catch myself throwing ArgumentNullExceptions incorrectly this way so often that I thought it was worth mentioning. :)
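A common way around the late exception is to split the method in two: a normal method that validates eagerly, and a private iterator that does the yielding. The sketch below uses made-up names (UpperLazy, UpperEager, ThrowsWhenCalled) to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValidationDemo
{
    // Iterator method: the null check does not run until enumeration starts.
    public static IEnumerable<string> UpperLazy(IEnumerable<string> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (string s in source) yield return s.ToUpperInvariant();
    }

    // Normal method: validates immediately, then hands off to an iterator.
    public static IEnumerable<string> UpperEager(IEnumerable<string> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return UpperIterator(source);
    }

    private static IEnumerable<string> UpperIterator(IEnumerable<string> source)
    {
        foreach (string s in source) yield return s.ToUpperInvariant();
    }

    // True if merely calling the method with null throws.
    public static bool ThrowsWhenCalled(Func<IEnumerable<string>, IEnumerable<string>> method)
    {
        try { method(null); return false; }
        catch (ArgumentNullException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWhenCalled(UpperLazy));  // prints "False"
        Console.WriteLine(ThrowsWhenCalled(UpperEager)); // prints "True"
    }
}
```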
Which are the advantages/drawbacks of both approaches?
return items.Select(item => DoSomething(item));
versus
foreach(var item in items)
{
yield return DoSomething(item);
}
EDIT: As they are roughly equivalent in MSIL, which one do you find more readable?
The yield return technique causes the C# compiler to generate an enumerator class "behind the scenes", while the Select call uses a standard enumerator class parameterized with a delegate. In practice, there shouldn't be a whole lot of difference between the two, other than possibly an extra call frame in the Select case, for the delegate.
For what it's worth, wrapping a lambda around DoSomething is sort of pointless as well; just pass a delegate for it directly.
In the slow-moving corporate world where I currently spend more time than I'd wish, yield return has the enormous advantage that it doesn't need that brand new .NET 3.5 Framework that won't be installed for at least another 2 years.
Select only allows you to return one object for each item in your "items" collection.
Using an additional .Where(x => DoIReallyWantThis(x)) allows you to weed out unwanted items, but still only allows you to return one object per item.
If you want potentially more than one object per item, you can use .SelectMany but it is easy to wind up with a single long line that is less than easy to read.
"yield return" has the potential to make your code more readable if you are looking through a complex data structure and picking out bits of information here and there. The best example of this that I have seen was where there were around a dozen separate conditions which would result in a returned object, and in each case the returned object was constructed differently.
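A toy version of that situation, with invented names and rules (Describe and its conditions are only for illustration): each input can produce zero, one or several results, each built differently, which reads naturally with yield return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldDemo
{
    // Zero, one or two results per input, depending on independent conditions.
    public static IEnumerable<string> Describe(IEnumerable<int> numbers)
    {
        foreach (int n in numbers)
        {
            if (n < 0) continue;                                   // no result at all
            if (n % 2 == 0) yield return $"{n} is even";
            if (n % 3 == 0) yield return $"{n} is divisible by 3";
        }
    }

    public static void Main()
    {
        foreach (string line in Describe(new[] { -1, 2, 3, 6, 7 }))
            Console.WriteLine(line);
        // prints:
        // 2 is even
        // 3 is divisible by 3
        // 6 is even
        // 6 is divisible by 3
    }
}
```

The same result is possible with SelectMany, but each extra condition would then have to be folded into one lambda that builds a small collection per item.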