I've noticed that certain commands, for example .ToArray(), cause LINQ to SQL to connect to the database and download the records that are part of the query.
Does .Cast() cause a query to execute? And how can I tell these things in the future? For example:
IRevision<T> current = context.GetTable(typeof(T))
.Cast<IRevision<T>>()
.SingleOrDefault(o => o.ID == recordId);
I know there is a generic version of GetTable that lets you specify the type, but for strange and unexplainable reasons, it cannot be used in this situation.
From Enumerable.Cast()'s remarks:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
The documentation for each LINQ operator tells you whether it uses deferred or immediate execution. Additionally, here are the standard LINQ operators that are NOT deferred:
Aggregate
All
Any
Average
Contains
Count
ElementAt
ElementAtOrDefault
First
FirstOrDefault
Last
LastOrDefault
LongCount
Max
Min
SequenceEqual
Single
SingleOrDefault
Sum
ToArray
ToDictionary
ToList
ToLookup
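A minimal LINQ to Objects sketch of how you can check this yourself (the list contents are made up for illustration, and LINQ to SQL goes through Queryable.Cast rather than Enumerable.Cast). Cast<int>() over a list that contains a string does not fail when the query is built, only when it is enumerated, and the same query object re-reads the list every time it runs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<object> { 1, 2, "three" };

// Building the query does not read the list, so no cast has happened yet.
IEnumerable<int> query = items.Cast<int>();

// Enumerating the query (here via ToArray) performs the casts and fails on "three".
bool threwOnEnumeration = false;
try
{
    _ = query.ToArray();
}
catch (InvalidCastException)
{
    threwOnEnumeration = true;
}

// The same query object re-reads the list each time it is enumerated.
items.Remove("three");
int[] numbers = query.ToArray(); // { 1, 2 }

Console.WriteLine(threwOnEnumeration); // True
```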
No, it does not. It simply performs the cast on each element as you iterate through the IEnumerable.
There isn't any definitive way (in code) to know whether a method will use deferred execution. The documentation is going to be your best friend here, as it tells you whether the method defers execution.
However, that doesn't mean that you can't make some assumptions if the documentation is unclear.
If you have a method that returns another list/structure (like ToList, ToArray), then it will have to execute the query in order to populate the new data structure.
If the method returns a scalar value, then it will have to execute the query to generate that scalar value.
Other than that, if it simply returns IEnumerable<T>, then it is more than likely deferring execution. That doesn't mean it's guaranteed, just that it is more than likely.
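The three rules of thumb above can be observed with a counting predicate. This is a LINQ to Objects sketch with made-up data: Where returns IEnumerable<int> and runs nothing, Count returns a scalar and has to run the query, and ToList returns a new collection and runs the query again:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;
var source = new List<int> { 1, 2, 3, 4, 5 };

// Returns IEnumerable<int>: deferred, the predicate has not run yet.
IEnumerable<int> evens = source.Where(n => { predicateCalls++; return n % 2 == 0; });
int afterWhere = predicateCalls;   // 0

// Returns a scalar: the query must run to produce the number.
int count = evens.Count();
int afterCount = predicateCalls;   // 5

// Returns a new collection: the query runs again to fill the list.
List<int> list = evens.ToList();
int afterToList = predicateCalls;  // 10

Console.WriteLine($"{afterWhere} {afterCount} {afterToList}"); // 0 5 10
```

Note that each executing call runs the whole query again; nothing is cached between Count and ToList.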
What you are looking for is called "Deferred Execution". Statements that defer execution only run when you attempt to access the data. Statements such as ToList execute immediately, as the data is needed to transform it into a list.
Cast can wait until you actually access it, so it is a deferred statement.
Related
I have a C# statement which iterates through a collection of rows. One of the fields calls a private method to get an array of objects, but I am getting null. I placed a breakpoint inside the LINQ query, but it never hits the method.
Here is my code:
IQueryable<MyObject> myObject = ds.Tables["Table"].AsEnumerable().Select(row => new MyObject
{
id = row.Field<int>("ID"),
MyCollectionArray = this.getCollectionArray(row.Field<string>("MyAggregatedString")),
}).AsQueryable();
private MyObject[] getCollectionArray(string concatString)
{
// placed a breakpoint, it is never called. Not sure why
}
Thanks for any assistance.
What you are facing is called Deferred Execution.
Which means that your query is not executed until you use it somewhere.
Here is a part from the documentation:
Deferred execution means that the evaluation of an expression is
delayed until its realized value is actually required. Deferred
execution can greatly improve performance when you have to manipulate
large data collections, especially in programs that contain a series
of chained queries or manipulations. In the best case, deferred
execution enables only a single iteration through the source
collection.
To execute your statement, you just have to use it.
The simplest way is probably to call:
myObject.ToList()
It will also be executed if you use functions that require the values to be populated (like Sum, Average, etc.).
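A minimal sketch of why the breakpoint is never hit. GetCollectionArray and the input strings below are hypothetical stand-ins for the question's getCollectionArray and DataTable column; the point is that Select only records the projection, and the helper runs once per row only when the query is materialized:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int calls = 0;

// Hypothetical stand-in for the question's getCollectionArray helper.
string[] GetCollectionArray(string concatString)
{
    calls++;
    return concatString.Split(',');
}

// Hypothetical rows standing in for the "MyAggregatedString" column.
var aggregatedStrings = new[] { "a,b", "c" };

// Building the query only records the projection; the helper is not called.
IQueryable<string[]> query = aggregatedStrings
    .Select(s => GetCollectionArray(s))
    .AsQueryable();
int callsBeforeUse = calls;     // 0

// Materializing the query runs the projection once per row.
List<string[]> results = query.ToList();
int callsAfterToList = calls;   // 2

Console.WriteLine($"{callsBeforeUse} {callsAfterToList}"); // 0 2
```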
Please consider the following statement:
var matches = person.Contacts.Where(c => c.ContactType == searchContact.ContactType).ToList();
This filters the records whose ContactType matches that of the searchContact object and returns only the filtered Contacts of person.
But without ToList() method call at the end of the Where clause, it will return all the Contacts of person.
Now, consider the following code segment.
Dictionary<int, string> colors = new Dictionary<int, string>(){ {1, "red"}, {2, "blue"}, {3, "green"}, {4, "yellow"}, {5, "red"}, {6, "blue"}, {7, "red"} };
var colorSet = colors.Where(c => c.Value == "red");
This query will filter only the elements with value "red", even without calling ToList() method.
My question is: why do these two statements (one that compares values and one that compares properties) behave differently without a ToList() call?
Why doesn't this problem occur when I use FirstOrDefault instead of Where?
I would really appreciate it if anyone could explain the scenario or point me to some references that I can follow.
Thanks!!
You are mistaken. Without calling ToList() or another method that forces immediate execution, both statements return a query (an IEnumerable<T>, or an IQueryable<T> if the source is queryable), not the results of that query. Until you iterate over the query variable, for example with foreach, it remains just a query.
This article on MSDN should explain things well: Query Execution.
What you are experiencing is called Deferred Query Execution.
In a query that returns a sequence of values, the query variable
itself never holds the query results and only stores the query
commands. Execution of the query is deferred until the query variable
is iterated over in a foreach or For Each loop. This is known as
deferred execution.
When you use ToList() what occurs is known as Immediate Query Execution.
In contrast to the deferred execution of queries that produce a
sequence of values, queries that return a singleton value are executed
immediately. Some examples of singleton queries are Average, Count,
First, and Max. These execute immediately because the query must
produce a sequence to calculate the singleton result. You can also
force immediate execution. This is useful when you want to cache the
results of a query. To force immediate execution of a query that does
not produce a singleton value, you can call the ToList method, the
ToDictionary method, or the ToArray method on a query or query
variable.
These are core behaviors of LINQ.
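A small sketch of the difference, using a cut-down version of the question's colors dictionary: the query variable re-reads the dictionary whenever it is enumerated, while the list produced by ToList() is a snapshot taken at the moment it was called:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var colors = new Dictionary<int, string> { { 1, "red" }, { 2, "blue" }, { 3, "red" } };

// A query: it describes "entries whose value is red" but holds no results.
var colorSet = colors.Where(c => c.Value == "red");

// ToList runs the query now and keeps a snapshot of the results.
var snapshot = colorSet.ToList();

colors.Add(4, "red");  // change the source after both were created

Console.WriteLine(colorSet.Count());  // 3: the query re-reads the dictionary
Console.WriteLine(snapshot.Count);    // 2: the list was filled before the Add
```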
But without ToList() method call at the end of the Where clause, it will return all the Contacts of person.
No, without ToList it will return a query which, when iterated, will yield all of the contacts matching the value you specified to filter on. Calling ToList only materializes that query into the results of that query. Waiting a while and iterating it later, possibly using some other method of iteration such as foreach, will only change the results if the underlying data source (in this case, a database, by the look of things) changes its data.
As to your dictionary filter, the same thing applies. Without calling ToList the variable represents a query to get the data when asked, not the results of that query, which is what you would get by calling ToList.
The use of a property versus a field is irrelevant here. Having said that, both queries are using properties, not fields. Even if one did use a field though, it wouldn't change a thing.
But without ToList() method call at the end of the Where clause, it will return all the Contacts of person.
You are wrong. ToList just forces the iteration and gives you your filtered elements as a List. LINQ uses deferred execution, which means the query is not executed until you iterate over the items with a foreach loop or call a method such as ToList or ToArray. So ToList doesn't change which items you get. Value is also a property (see the KeyValuePair<TKey,TValue> struct), so you are comparing property values in both queries. There is no difference at all.
So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
Then I iterate through the result, removing each key from the actual dictionary:
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
However, during that foreach, an exception is thrown saying that the collection was modified during the loop. How so? Is the LINQ query somehow dynamic, getting re-evaluated every time the dictionary changes? A simple .ToArray() call at the end of the query solves the issue, but imo, it shouldn't even occur in the first place.
So I create this projection of a dictionary of items I would like to remove.
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key));
As I have often said, if I can teach people one thing about LINQ it is this: the result of a query expression is a query, not the results of executing the query. You now have an object that means "the keys of a dictionary such that the key is... something". It is not the results of that query, it is that query. The query is an object unto itself; it does not give you a result set until you ask for one.
Then you do this:
foreach(var key in toRemoveList)
this.outputDic.Remove(key);
So what are you doing? You are iterating over the query. Iterating over the query executes the query, so the query is iterating over the original dictionary. But you then remove an item from the dictionary, while you are iterating over it, which is illegal.
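A minimal sketch of the failure. It uses a List<int> instead of the question's dictionary because, on .NET Core 3.0 and later, Dictionary<TKey,TValue>.Remove no longer invalidates active enumerators, so the original code only throws on .NET Framework. List<T>.Remove still does, on every runtime:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };

// Deferred: this is a query over 'numbers', not a copy of the matching items.
var toRemove = numbers.Where(n => n % 2 == 0);

bool threw = false;
try
{
    // Iterating the query iterates 'numbers' underneath...
    foreach (var n in toRemove)
        numbers.Remove(n); // ...so removing from 'numbers' here breaks that iteration
}
catch (InvalidOperationException ex)
{
    threw = true;
    Console.WriteLine(ex.Message); // "Collection was modified; ..."
}
```

The first matching item (2) is removed, and the next step of the query's enumeration detects the change and throws.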
imo, it shouldn't even occur in the first place.
Your opinion about how the world should be is a common one, but doing it your way leads to deep inefficiencies. Let us suppose that creating a query executes the query immediately rather than creates a query object. What does this do?
var query = expensiveRemoteDatabase
.Where(somefilter)
.Where(someotherfilter)
.OrderBy(something);
The first call to Where produces a query, which in your world is then executed, pulling down from the remote database all records which match that query. The second call to Where then says "oh, sorry, I meant to also apply this filter here as well, can we do that whole query again, this time with the second filter?" and so then that whole record set is computed, and then we say "oh, no, wait, I forgot to tell you when you built that last query object, we're going to need to sort it, so database, can you run this query for me a third time?"
Now perhaps do you see why queries produce a query that then does not execute until it needs to?
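A LINQ to Objects analogue of the argument above. ReadRecords is a hypothetical stand-in for the expensive data source that counts every record it hands out. Because each operator only wraps the previous query, the chained Where/Where/OrderBy run as one execution that reads each record once (a remote provider such as LINQ to SQL goes further and combines the chain into one SQL statement):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int reads = 0;

// Hypothetical stand-in for an expensive data source; counts each record read.
IEnumerable<int> ReadRecords()
{
    foreach (var record in new[] { 5, 3, 8, 1 })
    {
        reads++;
        yield return record;
    }
}

// Each call just wraps the previous query; nothing is read yet.
var query = ReadRecords()
    .Where(x => x > 1)
    .Where(x => x < 8)
    .OrderBy(x => x);
int readsBeforeExecution = reads;   // 0

// One execution, one pass over the source: each record is read once.
List<int> result = query.ToList();  // 3, 5
int readsAfterExecution = reads;    // 4
```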
The reason is that toRemoveList does not contain a list of things to be removed, it contains a description of how to get a list of things that can be removed.
If you step through this in a debugger using F11, you can see this quite clearly for yourself. The first point it stops is with the cursor on foreach, which is what you would expect.
Next you stop at toRemoveList (the one in foreach(var key in toRemoveList)). This is where it is setting up the iterator.
When you step through var key (with F11), however, it now jumps into the original definition of toRemoveList, specifically the this.removeDic.ContainsKey(key) part. Now you get an idea of what is really happening.
The foreach is calling the iterator's MoveNext method to move to the next point in the dictionary's keys, and the iterator holds onto the dictionary. When you call this.outputDic.Remove(key) while that iterator is still active, the next MoveNext call detects that the dictionary has changed and stops you with this error.
As everybody here is saying, the correct way to solve this is to use ToArray()/ToList(), because these give you a separate copy of the list. That way you have one to step through and one to remove from.
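A minimal sketch of that fix, with hypothetical data standing in for the question's outputDic and removeDic. ToArray() runs the query once and copies the matching keys, so the loop walks the array while Remove changes the dictionary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data standing in for the question's outputDic and removeDic.
var outputDic = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3 };
var removeDic = new Dictionary<string, int> { ["a"] = 0, ["c"] = 0 };

// ToArray() executes the query now and copies the matching keys into a new array.
string[] toRemoveList = outputDic.Keys.Where(key => removeDic.ContainsKey(key)).ToArray();

// The loop walks the array, so removing from outputDic no longer disturbs it.
foreach (var key in toRemoveList)
    outputDic.Remove(key);

Console.WriteLine(string.Join(",", outputDic.Keys)); // b
```

This version is safe on any runtime. On .NET Core 3.0 and later, Dictionary.Remove happens not to invalidate enumerators, but relying on that ties the code to the runtime version.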
The .ToArray() solves the issues because it forces you to evaluate the entire enumeration and cache the local values. Without doing so, when you enumerate through it the enumerable attempts to calculate the first index, return that, then return to the collection and calculate the next index. If the underlying collection you're iterating over changes, you can no longer guarantee that the enumeration will return the appropriate value.
In short: just force the evaluation with .ToArray() (or .ToList(), or whatever).
The LINQ query uses deferred execution. It streams the items one by one, returning them as they're requested. So yes, every time you remove a key, you modify the collection the query is still reading from, which is why it throws an exception. When you invoke ToArray(), it forces execution of the query up front, which is why it works.
EDIT: This is somewhat in response to your comments. Check out iterator blocks on MSDN; this is the mechanism being used when your foreach executes. Your query is turned into a chain of iterators, and the filter, projection, and other operations are applied to the elements one by one as they're retrieved, unless that is not possible (operators such as OrderBy have to read the whole sequence first).
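A sketch of the iterator-block mechanism. MyWhere is a hypothetical hand-written filter, not the framework's Where, but it uses the same yield return technique. The log shows the body running only once the foreach starts, interleaved with the loop:

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// A hypothetical hand-written filter using an iterator block (yield return).
static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
            yield return item; // pause here until the caller asks for the next item
    }
}

var query = MyWhere(new[] { 1, 2, 3, 4 }, n => { log.Add($"test {n}"); return n % 2 == 0; });
log.Add("query built"); // the iterator body has not run yet

foreach (var n in query)
    log.Add($"got {n}");

Console.WriteLine(string.Join(", ", log));
// query built, test 1, test 2, got 2, test 3, test 4, got 4
```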
You are getting this error because of LINQ's deferred execution. The data is actually fetched from the dictionary when your loop runs, so your modification of outputDic happens while the dictionary is being enumerated, and you are not allowed to modify a collection you are looping over. You can get rid of this error by forcing the query to execute before you run the loop:
var toRemoveList =
this.outputDic.Keys.Where(key =>
this.removeDic.ContainsKey(key)).ToList();
Notice the ToList() in the above statement. It will make sure that your query has been executed and you have your list in toRemoveList.
If I know there is only one matching item in a collection, is there any way to tell Linq about this so that it will abort the search when it finds it?
I am assuming that both of these search the full collection before returning one item?
var fred = _people.Where((p) => p.Name == "Fred").First();
var bill = _people.Where((p) => p.Name == "Bill").Take(1);
EDIT: People seem fixated on FirstOrDefault, or SingleOrDefault. These have nothing to do with my question. They simply provide a default value if the collection is empty. As I stated, I know that my collection has a single matching item.
AakashM's comment is of most interest to me. It would appear my assumption is wrong, but I'm interested in why.
For instance, when linq to objects is running the Where() function in my example code, how does it know that there are further operations on its return value?
Your assumption is wrong. LINQ uses deferred execution and lazy evaluation a lot. What this means is that, for example, when you call Where(), it doesn't actually iterate the collection. Only when you iterate the object it returns, will it iterate the original collection. And it will do it in a lazy manner: only as much as is necessary.
So, in your case, neither query will iterate the whole collection: both will iterate it only up to the point where they find the first element, and then stop.
Actually, the second query (with Take()) won't do even that, it will iterate the source collection only if you iterate the result.
This all applies to LINQ to objects. Other providers (LINQ to SQL and others) can behave differently, but at least the principle of deferred execution usually still holds.
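A LINQ to Objects sketch with a counting predicate (the names are made up) that shows both points: First() stops as soon as it finds a match, and Take(1) does nothing until its result is enumerated, after which it also stops at the first match:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int checks = 0;
var people = new[] { "Ann", "Fred", "Bill", "Zoe" };

// First() pulls items only until the predicate matches once.
var fred = people.Where(p => { checks++; return p == "Fred"; }).First();
int checksForFred = checks;            // 2: "Ann", "Fred"

checks = 0;
// Take(1) is deferred: nothing is checked until the result is enumerated.
var bill = people.Where(p => { checks++; return p == "Bill"; }).Take(1);
int checksBeforeEnumerating = checks;  // 0

var billList = bill.ToList();
int checksForBill = checks;            // 3: "Ann", "Fred", "Bill"; "Zoe" is never tested
```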
I think First() will not scan the whole collection. It will return immediately after the first match. But I suggest using FirstOrDefault() instead.
EDIT:
Difference between First() and FirstOrDefault() (from MSDN):
The First() method throws an exception if source contains no elements. To instead return a default value when the source sequence is empty, use the FirstOrDefault() method.
Enumerable.First
Substitute .Where( with .SingleOrDefault(
This will find the first and only item for you.
But you can't do this for any given number. If you need 2 items, you'll need to get the entire collection.
However, you shouldn't worry about time. Most of the effort goes into opening a database connection and establishing a query. Executing the query doesn't take that much time, so there's no real reason to stop a query halfway :-)
Consider this code:
var query = db.Table
.Where(t => SomeCondition(t))
.AsEnumerable();
int recordCount = query.Count();
var totalSomeNumber = query.Sum(t => t.SomeNumber);
var average = query.Average(t => t.SomeNumber);
Assume query takes a very long time to run. I need to get the record count, the total of SomeNumber returned, and take an average at the end. I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL, then use LINQ-to-Objects for the Count, Sum, and Average. Instead, when I do this in LINQPad, I see the same query is run three times. If I replace .AsEnumerable() with .ToList(), it only gets queried once.
Am I missing something about what AsEnumerable is/does?
Calling AsEnumerable() does not execute the query, enumerating it does.
IQueryable is the interface that allows LINQ to SQL to perform its magic. IQueryable implements IEnumerable, so when you call AsEnumerable(), you are changing the extension methods being called from there on, ie from the IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects in this particular case). But you are not executing the actual query, just changing how it is going to be executed in its entirety.
To force query execution, you must call ToList().
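A LINQ to Objects simulation of what LINQPad shows. RunQuery is a hypothetical stand-in for db.Table.Where(...) that counts how many times it is executed. After AsEnumerable(), each aggregate enumerates (and so re-executes) the query. After ToList(), the query runs once and the aggregates work on the in-memory list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int executions = 0;

// Hypothetical stand-in for db.Table.Where(...): each enumeration counts as one round trip.
IEnumerable<int> RunQuery()
{
    executions++;
    foreach (var someNumber in new[] { 10, 20, 30 })
        yield return someNumber;
}

// AsEnumerable only changes the static type; every aggregate re-runs the query.
var query = RunQuery().AsEnumerable();
int recordCount = query.Count();
int total = query.Sum();
double average = query.Average();
int executionsWithAsEnumerable = executions;  // 3

executions = 0;
// ToList runs the query once; the aggregates then work on the in-memory list.
var rows = RunQuery().ToList();
int listCount = rows.Count();
int listTotal = rows.Sum();
double listAverage = rows.Average();
int executionsWithToList = executions;        // 1
```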
Yes. All that AsEnumerable will do is cause the Count, Sum, and Average functions to be executed client-side (in other words, it will bring back the entire result set to the client, then the client will perform those aggregates instead of creating COUNT() SUM() and AVG() statements in SQL).
Justin Niessner's answer is perfect.
I just want to quote an MSDN explanation here: .NET Language-Integrated Query for Relational Data
The AsEnumerable() operator, unlike ToList() and ToArray(), does not cause execution of the query. It is still deferred. The AsEnumerable() operator merely changes the static typing of the query, turning a IQueryable into an IEnumerable, tricking the compiler into treating the rest of the query as locally executed.
I hope this is what is meant by:
IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects
Once it is LINQ to Objects, we can call the objects' own .NET methods (e.g. ToString()). This explains one of the most frequently asked questions about LINQ: why does LINQ to Entities not recognize the method 'System.String ToString()'?
According to Jon Skeet's AsEnumerable post on codeblog.jonskeet, AsEnumerable can be handy when you want to do:
some aspects of the query in the database, and then a bit more manipulation in .NET – particularly if there are aspects you basically can’t implement in LINQ to SQL (or whatever provider you’re using).
It also says:
All we’re doing is changing the compile-time type of the sequence which is propagating through our query from IQueryable to IEnumerable – but that means that the compiler will use the methods in Enumerable (taking delegates, and executing in LINQ to Objects) instead of the ones in Queryable (taking expression trees, and usually executing out-of-process).
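A small sketch of that compile-time switch, using AsQueryable() over an array as a stand-in for a real provider. The same lambda binds to Queryable.Where (an expression tree) or to Enumerable.Where (a compiled delegate) depending only on the static type, and the result after AsEnumerable() is no longer an IQueryable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();

// Static type IQueryable<int>: Queryable.Where is chosen and the lambda
// becomes an expression tree that a provider could translate.
IQueryable<int> remote = source.Where(n => n > 2);

// Static type IEnumerable<int>: Enumerable.Where is chosen and the lambda
// is compiled to a delegate that runs locally.
IEnumerable<int> local = source.AsEnumerable().Where(n => n > 2);

Console.WriteLine(remote.Expression);        // the Where call, printed as an expression tree
Console.WriteLine(local is IQueryable<int>); // False
```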
Finally, also see this related question: Returning IEnumerable vs. IQueryable
Well, you are on the right track. The problem is that an IQueryable (what the statement is before the AsEnumerable call) is also an IEnumerable, so that call is, in effect, a no-op. You need to force the results into a specific in-memory data structure (e.g., with ToList()) to execute the query.
I would presume that ToList forces LINQ to fetch the records from the database. When you then perform the subsequent calculations, they are done against the in-memory objects rather than involving the database.
Leaving the return type as an Enumerable means that the data is not fetched until it is called upon by the code performing the calculations. I guess the knock-on effect is that the database is hit three times, once for each calculation, and the data is not kept in memory.
Just adding a little more clarification:
I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL
It will not execute the query right away, as Justin's answer explains. It only will be materialized (hit the database) later on.
Instead, when I do this in LINQPad, I see the same query is run three times.
Yes, and note that all three queries are exactly the same, basically fetching all rows matching the given condition into memory and then computing the count/sum/avg locally.
If I replace .AsEnumerable() with .ToList(), it only gets queried once.
But it still pulls all the data into memory, with the advantage that now it runs only once.
If performance is a concern, just remove .AsEnumerable(), and the count/sum/avg will be translated into their SQL equivalents. Three queries will still run (probably faster if there are indexes satisfying the conditions), but with a much smaller memory footprint.