LINQ Select only throws IOException when the Enumerable is looked at - c#

I'm currently using LINQ to load a list of files into XDocuments, like so:
var fileNames = new List<string>(){ "C:\file.xml" };
var xDocs = fileNames.Select(XDocument.Load);
var xDocs2 = xDocs.ToList(); // Crashes here
If I deliberately 'lock' one of the files with a different process, the IOException is only thrown when I actually start to look at the XDocuments I've been generating, ie when ToList() is called.
Can anyone explain why this is, and how best to handle this error? I'd like to have access to the working XDocuments still, if possible.

Can anyone explain why this is
As many pointed out, this is because of the so called deferred execution of many LINQ methods. For instanse, the Enumerable.Select method documentation states
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
while the Enumerable.ToList documentation contains
The ToList<TSource>(IEnumerable<TSource>) method forces immediate query evaluation and returns a List that contains the query results. You can append this method to your query in order to obtain a cached copy of the query results.
So the XDocument.Load is really executed for each file name during the ToList call. I guess that covers the why part.
and how best to handle this error? I'd like to have access to the working XDocuments still, if possible.
I don't know what does "best" mean in this context, but if you want to ignore the errors and include the "working XDocuments", then you can use something like this
var xDocs = fileNames.Select(fileName =>
{
try { return XDocument.Load(fileName); }
catch { return null; }
});
and then either append .Where(doc => doc != null), or account for null documents when processing the list.

This is why the linq .Select is an IEnumerable and the elements are first called if you make your IEnumerable to an List. Then you go through all of your elements.

Related

Should I use IEnumerable or List for parameters from WebApi? [duplicate]

I have some doubts over how Enumerators work, and LINQ. Consider these two simple selects:
List<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct().ToList();
or
IEnumerable<Animal> sel = (from animal in Animals
join race in Species
on animal.SpeciesKey equals race.SpeciesKey
select animal).Distinct();
I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:
foreach (Animal animal in sel) { /*do stuff*/ }
I noticed that if I use IEnumerable, when I debug and inspect "sel", which in that case is the IEnumerable, it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector", these last 2 appear to be delegates. The "inner" member does not have "Animal" instances in it, but rather "Species" instances, which was very strange for me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine which goes in and what goes out of it?
I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect as only 2 are Distinct), but the "outer" does contain the correct values. Again, probably the delegated methods determine this but this is a bit more than I know about IEnumerable.
Most importantly, which of the two options is the best performance-wise?
The evil List conversion via .ToList()?
Or maybe using the enumerator directly?
If you can, please also explain a bit or throw some links that explain this use of IEnumerable.
IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give the compiler a chance to defer work until later, possibly optimizing along the way. If you use ToList() you force the compiler to reify the results right away.
Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:
public IEnumerable<Animals> AllSpotted()
{
return from a in Zoo.Animals
where a.coat.HasSpots == true
select a;
}
public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Felidae"
select a;
}
public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Canidae"
select a;
}
Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:
var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());
So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well in the above, Leopards and Hyenas get converted into single SQL queries each, and the database only returns the rows that are relevant. But if we had returned a List from AllSpotted(), then it may run slower because the database could return far more data than is actually needed, and we waste cycles doing the filtering in the client.
In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:
List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
There is a very good article written by: Claudio Bernasconi's TechBlog here: When to use IEnumerable, ICollection, IList and List
Here some basics points about scenarios and functions:
A class that implement IEnumerable allows you to use the foreach syntax.
Basically it has a method to get the next item in the collection. It doesn't need the whole collection to be in memory and doesn't know how many items are in it, foreach just keeps getting the next item until it runs out.
This can be very useful in certain circumstances, for instance in a massive database table you don't want to copy the entire thing into memory before you start processing the rows.
Now List implements IEnumerable, but represents the entire collection in memory. If you have an IEnumerable and you call .ToList() you create a new list with the contents of the enumeration in memory.
Your linq expression returns an enumeration, and by default the expression executes when you iterate through using the foreach. An IEnumerable linq statement executes when you iterate the foreach, but you can force it to iterate sooner using .ToList().
Here's what I mean:
var things =
from item in BigDatabaseCall()
where ....
select item;
// this will iterate through the entire linq statement:
int count = things.Count();
// this will stop after iterating the first one, but will execute the linq again
bool hasAnyRecs = things.Any();
// this will execute the linq statement *again*
foreach( var thing in things ) ...
// this will copy the results to a list in memory
var list = things.ToList()
// this won't iterate through again, the list knows how many items are in it
int count2 = list.Count();
// this won't execute the linq statement - we have it copied to the list
foreach( var thing in list ) ...
Nobody mentioned one crucial difference, ironically answered on a question closed as a duplicated of this.
IEnumerable is read-only and List is not.
See Practical difference between List and IEnumerable
The most important thing to realize is that, using Linq, the query does not get evaluated immediately. It is only run as part of iterating through the resulting IEnumerable<T> in a foreach - that's what all the weird delegates are doing.
So, the first example evaluates the query immediately by calling ToList and putting the query results in a list.
The second example returns an IEnumerable<T> that contains all the information needed to run the query later on.
In terms of performance, the answer is it depends. If you need the results to be evaluated at once (say, you're mutating the structures you're querying later on, or if you don't want the iteration over the IEnumerable<T> to take a long time) use a list. Else use an IEnumerable<T>. The default should be to use the on-demand evaluation in the second example, as that generally uses less memory, unless there is a specific reason to store the results in a list.
The advantage of IEnumerable is deferred execution (usually with databases). The query will not get executed until you actually loop through the data. It's a query waiting until it's needed (aka lazy loading).
If you call ToList, the query will be executed, or "materialized" as I like to say.
There are pros and cons to both. If you call ToList, you may remove some mystery as to when the query gets executed. If you stick to IEnumerable, you get the advantage that the program doesn't do any work until it's actually required.
I will share one misused concept that I fell into one day:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
// updating existing list
names[0] = "ford";
// Guess what should be printed before continuing
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Expected result
// I was expecting
print( startingWith_M.ToList() ); // mercedes, mazda
print( startingWith_F.ToList() ); // fiat, ferrari
Actual result
// what printed actualy
print( startingWith_M.ToList() ); // mazda
print( startingWith_F.ToList() ); // ford, fiat, ferrari
Explanation
As per other answers, the evaluation of the result was deferred until calling ToList or similar invocation methods for example ToArray.
So I can rewrite the code in this case as:
var names = new List<string> {"mercedes", "mazda", "bmw", "fiat", "ferrari"};
// updating existing list
names[0] = "ford";
// before calling ToList directly
var startingWith_M = names.Where(x => x.StartsWith("m"));
var startingWith_F = names.Where(x => x.StartsWith("f"));
print( startingWith_M.ToList() );
print( startingWith_F.ToList() );
Play arround
https://repl.it/E8Ki/0
If all you want to do is enumerate them, use the IEnumerable.
Beware, though, that changing the original collection being enumerated is a dangerous operation - in this case, you will want to ToList first. This will create a new list element for each element in memory, enumerating the IEnumerable and is thus less performant if you only enumerate once - but safer and sometimes the List methods are handy (for instance in random access).
In addition to all the answers posted above, here is my two cents. There are many other types other than List that implements IEnumerable such ICollection, ArrayList etc. So if we have IEnumerable as parameter of any method, we can pass any collection types to the function. Ie we can have method to operate on abstraction not any specific implementation.
The downside of IEnumerable (a deferred execution) is that until you invoke the .ToList() the list can potentially change. For a really simple example of this - this would work
var persons;
using (MyEntities db = new MyEntities()) {
persons = db.Persons.ToList(); // It's mine now. In the memory
}
// do what you want with the list of persons;
and this would not work
IEnumerable<Person> persons;
using (MyEntities db = new MyEntities()) {
persons = db.Persons; // nothing is brought until you use it;
}
persons = persons.ToList(); // trying to use it...
// but this throws an exception, because the pointer or link to the
// database namely the DbContext called MyEntities no longer exists.
There are many cases (such as an infinite list or a very large list) where IEnumerable cannot be transformed to a List. The most obvious examples are all the prime numbers, all the users of facebook with their details, or all the items on ebay.
The difference is that "List" objects are stored "right here and right now", whereas "IEnumerable" objects work "just one at a time". So if I am going through all the items on ebay, one at a time would be something even a small computer can handle, but ".ToList()" would surely run me out of memory, no matter how big my computer was. No computer can by itself contain and handle such a huge amount of data.
[Edit] - Needless to say - it's not "either this or that". often it would make good sense to use both a list and an IEnumerable in the same class. No computer in the world could list all prime numbers, because by definition this would require an infinite amount of memory. But you could easily think of a class PrimeContainer which contains an
IEnumerable<long> primes, which for obvious reasons also contains a SortedList<long> _primes. all the primes calculated so far. the next prime to be checked would only be run against the existing primes (up to the square root). That way you gain both - primes one at a time (IEnumerable) and a good list of "primes so far", which is a pretty good approximation of the entire (infinite) list.

Using LINQ First() method shows "Possible multiple iteration of IEnumerable" warning by Resharper

I have code like this:
Person firstPerson = personsEnumerable.First();
, where personsEnumerable is IEnumerable<Person>
Now, Resharper underlines the personsEnumerable variable and says "Possible iteration of IEnumerable". I understand what this warning means from other questions here on SO, but I'm wondering why it's showing it in my example? I think First() returns the first element and there's no need to iterate the collection at all ?
Is this a 'general' warning message which is not applicable in my case (and I can ignore it) or I don't understand how First() actually works ?
Yes, it returns the first element but it still creates an enumerator to get that first element.You could add ToArray or ToList after Where to prevent this. However if you want to just get the first element, you could use the overloaded version of First which takes a Func delegate and you can remove the Where.
Consider this code:
var personsEnumerable = peopleList.OrderBy(p => p.Age);
var firstPerson = personsEnumerable.First();
foreach (var p in personsEnumerable)
{
// whatever
}
The first line of code sets things up to sort the list and create the results, but it doesn't actually do anything until you try to enumerate it. The second line of code, then would end up doing the sort. So although you're not enumerating the entire result, you're executing code to produce the result. And for a sort, that would entail doing all the heavy lifting.
The foreach loop would do the sort again before actually enumerating the results.

Casting an IEnumerable<TEntity> to an IEnumerable<TResult> while ensuring deferred execution

In my application there are a fair number of existing "service commands" which generally return a List<TEntity>. However, I wrote them in such a way that any queries would not be evaluated until the very last statement, when they are cast ToList<TEntity> (or at least I think I did).
Now I need to start obtaining some "context-specific" information from the commands, and I am thinking about doing the following:
Keep existing commands largely the same as they are today, but make sure they return an IEnumerable<TEntity> rather than an IList<TEntity>.
Create new commands that call the old commands but return IEnumerable<TResult> where TResult is not an entity but rather a view model, result model, etc - some representation of the data that is useful for the application.
The first case in which I have needed this is while doing a search for a Group entity. In my schema, Groups come with User-specific permissions, but it is not realistic for me to spit out the entire list of users and permissions in my result - first, because there could be many users, second, because there are many permissions, and third, because that information should not be available to insufficiently-privileged users (ie a "guest" should not be able to see what a "member" can do).
So, I want to be able to take the result of the original command, an IEnumerable<Group>, and describe how each Group ought to be transformed into a GroupResult, given a specific input of User (by Username in this case).
If I try to iterate over the result of the original command with ForEach I know this will force the execution of the result and therefore potentially result in a needlessly longer execution time. What if I wanted to further compose the result of the "new" command (that returns GroupResult) and filter out certain groups? Then maybe I would be calculating a ton of privileges for the inputted user, only to filter out the parent GroupResult objects later on anyway!
I guess my question boils down to... how do I tell C# how I'd like to transform each member of the IEnumerable without necessarily doing it at the time the method is run?
To lazily cast an enumerable from one type to another you do this:
IEnumerable<TResult> result = source.Cast<TResult>();
This assumes that the elements of the source enumerable can be cast to TResult. If they can't you need to use a standard projection with .Select(x => ... ).
Also, be careful returning IEnumerable<T> from a service or database as often there are resources that you need to open to obtain the data so now you would need make sure those resources are open whenever you try to evaluate the enumerable. Keeping a database connection open is a bad idea. I would be more inclined to return an array that you've cast as an IEnumerable<>.
However, if you really want to get an IEnmerable<> from a service or database that is truly lazy and will automatically refresh the data then you need to try Microsoft's Reactive Framework Team's "Interactive Extensions" to help with it.
They have an nice IEnumerable<> extension called Using that makes a "hot" enumerable that opens a resource for each iteration.
It would look something like this:
var d =
EnumerableEx
.Using(
() => new DB(),
db => db.Data.Where(x => x == 2));
It creates a new DB instance every time the enumerable is iterated and will dispose of the database when the enumerable is completed. Something worth considering.
Use NuGet and look for "Ix-Main" for the Interactive Extensions.
You're looking for the yield return command.
When you define a method returning an IEnumerable, and return its data by yield return, the return value is iterated over in the consuming method. This is what it could look like:
IEnumerable<GroupResult> GetGroups(string userName)
{
foreach(var group in context.Groups.Where(g => <some user-specific condition>))
{
var result = new GroupResult()
... // Further compose the result.
yield return result;
}
}
In consuming code:
var groups = GetGroups("tacos");
// At this point no eumeration has occurred yet. Any breakpoints in GetGroups
// have not been hit.
foreach(var g in groups)
{
// Now iteration in GetGroups starts!
}

Using Linq for copying/filtering produces a dynamic result in foreach loops?

So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
Then I iterate through the result removing from the actual dictionary
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
However during that foreach an exception is thrown saying that the list was modified during the loop. But, how so? is the linq query somewhat dynamic and gets re evaluated every time the dictionary changes? A simple .ToArray() call on the end of the query solves the issues, but imo, it shouldn't even occur in the first place.
So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
As I have often said, if I can teach people one thing about LINQ it is this: the result of a query expression is a query, not the results of executing the query. You now have an object that means "the keys of a dictionary such that the key is... something". It is not the results of that query, it is that query. The query is an object unto itself; it does not give you a result set until you ask for one.
Then you do this:
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
So what are you doing? You are iterating over the query. Iterating over the query executes the query, so the query is iterating over the original dictionary. But you then remove an item from the dictionary, while you are iterating over it, which is illegal.
imo, it shouldn't even occur in the first place.
Your opinion about how the world should be is a common one, but doing it your way leads to deep inefficiencies. Let us suppose that creating a query executes the query immediately rather than creates a query object. What does this do?
var query = expensiveRemoteDatabase
.Where(somefilter)
.Where(someotherfilter)
.OrderBy(something);
The first call to Where produces a query, which in your world is then executed, pulling down from the remote database all records which match that query. The second call to Where then says "oh, sorry, I meant to also apply this filter here as well, can we do that whole query again, this time with the second filter?" and so then that whole record set is computed, and then we say "oh, no, wait, I forgot to tell you when you built that last query object, we're going to need to sort it, so database, can you run this query for me a third time?"
Now perhaps do you see why queries produce a query that then does not execute until it needs to?
The reason is that toRemoveList does not contain a list of things to be removed, it contains a description of how to get a list of things that can be removed.
If you step through this in a debugger using F11 you can see this quite clearly for yourself. The first point it stops is with the cursor on foreach which is what you would expect.
Next you stop at toRemoveList (the one in foreach(var key in toRemoveList)). This is where it is setting up the iterator.
When you step through var key (with F11) however it now jumps into the original definition of toRemoveList, specifically the this.removeDic.ContainsKey(key) part. Now you get an idea of what is really happening.
The foreach is calling the iterators Next method to move to the next point in the dictionary's keys and holds onto the list. When you call into this.outputDic.Remove(key); this detects that the iterator hasn't finished and therefore stops you with this error.
As everybody is saying on here, the correct way to solve this is to use ToArray()/ToList() as what these do is to give you another copy of the list. So the you have one to step through, and one to remove from.
The .ToArray() solves the issues because it forces you to evaluate the entire enumeration and cache the local values. Without doing so, when you enumerate through it the enumerable attempts to calculate the first index, return that, then return to the collection and calculate the next index. If the underlying collection you're iterating over changes, you can no longer guarantee that the enumeration will return the appropriate value.
In short: just force the evaluation with .ToArray() (or .ToList(), or whatever).
The LINQ query uses deferred execution. It streams the items one by one, retruning them as they're requested. So yes, every time you try to remove a key it changes the result which is why it throws an exception. When you invoke ToArray() it forces execution of the query which is why it works.
EDIT: This is somewhat in response to your comments. Check out iterator blocks on msdn this is the mechanism being used when your for each executes. Your query just gets turned into an expression tree and the filter, projects, operation ect is applied to the elements one by one when they're retrieved unless it is not possible.
The reason you are getting this error is because of deferred execution of linq. To explain it fully when your loop runs is actually when the data is fetch from the dictionary. Thus modification in outputdic takes place at this point of time and it is not allowed to modify the collection you are looping upon. This is why you get this error. You can get rid of this error by asking the compiler to execute it before you run the loop.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key)).ToList();
Notice the ToList() in the above statement. It will make sure that your query has been executed and you have your list in toRemoveList.

LINQ to SQL use query after DataContext Dispose

Im writing DLL for one project, I just started using LINQ to SQL and after moving all methods to this new dll. I disovered that I can't acess DataContext because it was disposed, I understand why but I'm not sure how I can acess results of query for my main project method so:
My method in DLL
public static IEnumerable<Problem> GetProblemsNewest(int howMuch)
{
using (ProblemClassesDataContext context = new ProblemClassesDataContext())
{
var problems = (from p in context.Problems orderby p.DateTimeCreated select p).Take(howMuch);
return problems;
}
}
Calling it:
IEnumerable<Problem> problems = ProblemsManipulation.GetProblemsNewest(10);
//Error can't acess it because it was disposed..
This is just first method, I have larger ones so I really need a way to do this. There must be a way to use LINQ to SQL in DLL? I know I can do something Like .ToList or .ToArray but then I wouldn't be able to acess row properties directly and would have to reference it as problem[0],problem[1] etc. which is even more messy than having tone of code in main project.
After you are outside of the using statement the context is automatically disposed, so when the IEnumerable is actually enumerated the context is already disposed.
Therefore you need to tell Linq that it should go ahead and actually retrieve the values from the DB while your inside of your using statement. You can do so via ToList() or ToArray (or others).
See the updated code:
public static IList<Problem> GetProblemsNewest(int howMuch)
{
using (ProblemClassesDataContext context = new ProblemClassesDataContext())
{
var problems = (from p in context.Problems orderby p.DateTimeCreated select p).Take(howMuch);
return problems.ToList();
}
}
Change this:
return problems;
to this:
return problems.ToList();
Explanation:
The ToList() will iterate through the results and pull them all into memory. Because this happens inside the using statement, you're fine. And because it creates a list, your values will be returned.
You could do this in other ways. The basic idea is to actually retrieve the results before the using statement closes.
An alternate solution would be to avoid the using statement, create an iterator that owns the object and disposes it when the last item has been iterated past.
You could handle it by doing a .ToList() on the IEnumerable before exiting the using block. This will retrieve the records and populate the list. Depending on your scenario this might not be optimal in terms of performance though (you lose the possibility of lazy retrieval and additional filtering of the query)
You have to finish the query inside the using clause, i.e. use ToList() or First() or Count(), etc...
Currently you returning query, and when you want use it, because database connection closed before your usage, you will get an exception, just do:
return problems.AsEnumerable()
This is because of deffered execution manner of linq. In fact your problems object is just query, and you should convert it to objects to use it somewhere else.
You may not want to use the context in a using; the problem is then you can't use the navigation properties later on, because you'll have the same "object disposed" issue when it tries to load the related data. What you need to do is let the context live, and return the results directly. Or, when you return the results, call ToList(), and later on query all related data.

Categories

Resources