NHibernate Futures using Linq to query count - c#

I have this method below that queries over database to get for instance User, Orders or any entity for that matter. I want to know if the below query is optimal or do i have to tweak that.
Does the query do where, only after storing all records in the memory ?
Is there a way i can specify Expression for the filtering rather than Func<> ?
What it the optimal way to get Count, specifying a condition/where without much memory consumption?
C# Sample Code
public IList<TEntity> Find(Func<TEntity, bool> predicate)
{
//Criteria for creating
ICriteria filterCriterea = _unitOfWork.CurrentSession.CreateCriteria(typeof(TEntity));
//filtered result
return filterCriterea.Future<TEntity>().Where(predicate).ToList();
}

The are several things that you need to fix:
1. Ask for an Expression, not a Func
Your Find method specifies that it wants a Func. If you want an Expression, you have to say so:
public IList<TEntity> Find(Expression<Func<TEntity, bool>> predicate)
2. Use LINQ or QueryOver instead of ICriteria
Assuming you're not using an ancient version of NHibernate that doesn't have LINQ or QueryOver (oddly, there are a LOT of people still using NHibernate 1.2.1 - I can't figure out why)...
ICriteria doesn't understand Expressions. Replace CreateCriteria(typeof(TEntity)) with Query<TEntity>() or QueryOver<TEntity>() depending on which query syntax you prefer.
3. Where should go before Future
You need to move the Where before the Future. Future is used to batch several queries together so that they all get executed in one round trip to the database as soon as your code tries to evaluate the results of one of the "Future" queries. At the point your above code calls Future, the query only consists of CreateCriteria(typeof(TEntity)), which just tells it which table to query against, like so:
select * from TEntity
If you want a where clause in there, you have to switch those method calls around:
filterCriterea.Where(predicate).Future<TEntity>()
This should give you a SQL query like:
select * from TEntity where ...
It is very important to learn the difference between IQueryable and IEnumerable. You should notice that Query returns an IQueryable, whereas Future returns an IEnumerable. Manipulating an IQueryable using Expressions will result in changes to the SQL that gets executed. Manipulating an IEnumerable using a Func just changes the way you're looking at the in-memory data.
4. ToList immediately after Future negates benefit of Future
ToList iterates over the collection that is passed to it and puts each element in a list. The act of iterating over the collection will cause the Future query to be immediately executed, not giving you any chance to batch it together with other queries. If you must have a list, you should just omit the Future. I think however, that it would be a better idea to change your method to return an IEnumerable, which would give you the ability to batch this query together with other queries. Like so...
public IEnumerable<TEntity> Find(Expression<Func<TEntity, bool>> predicate)
{
var query = _unitOfWork.CurrentSession.Query<TEntity>();
return query.Where(predicate).Future<TEntity>();
}
In order to take advantage of the query batching, you are probably going to have to rearrange the code that calls this Find method. For example, instead of...
foreach (var openThing in thingRepository.Find(x => x.Status == Status.Open))
CloseIt(openThing);
foreach (var negativeWhatsit in whatsitRepository.Find(x => x.Amount < 0))
BePositive(negativeWhatsit);
... you should do this:
var openThings = thingRepository.Find(x => x.Status == Status.Open);
var negativeWhatsits = whatsitRepository.Find(x => x.Amount < 0);
// both queries will be executed here in one round-trip to database
foreach (var openThing in openThings)
CloseIt(openThing);
foreach (var negativeWhatsit in negativeWhatsits)
BePositive(negativeWhatsit);

Related

Force Entity Framework 6.1 to use exists instead of populating entire child object graph [duplicate]

What is the difference between returning IQueryable<T> vs. IEnumerable<T>, when should one be preferred over the other?
IQueryable<Customer> custs = from c in db.Customers
where c.City == "<City>"
select c;
IEnumerable<Customer> custs = from c in db.Customers
where c.City == "<City>"
select c;
Will both be deferred execution and when should one be preferred over the other?
Yes, both will give you deferred execution.
The difference is that IQueryable<T> is the interface that allows LINQ-to-SQL (LINQ.-to-anything really) to work. So if you further refine your query on an IQueryable<T>, that query will be executed in the database, if possible.
For the IEnumerable<T> case, it will be LINQ-to-object, meaning that all objects matching the original query will have to be loaded into memory from the database.
In code:
IQueryable<Customer> custs = ...;
// Later on...
var goldCustomers = custs.Where(c => c.IsGold);
That code will execute SQL to only select gold customers. The following code, on the other hand, will execute the original query in the database, then filtering out the non-gold customers in the memory:
IEnumerable<Customer> custs = ...;
// Later on...
var goldCustomers = custs.Where(c => c.IsGold);
This is quite an important difference, and working on IQueryable<T> can in many cases save you from returning too many rows from the database. Another prime example is doing paging: If you use Take and Skip on IQueryable, you will only get the number of rows requested; doing that on an IEnumerable<T> will cause all of your rows to be loaded in memory.
The top answer is good but it doesn't mention expression trees which explain "how" the two interfaces differ. Basically, there are two identical sets of LINQ extensions. Where(), Sum(), Count(), FirstOrDefault(), etc all have two versions: one that accepts functions and one that accepts expressions.
The IEnumerable version signature is: Where(Func<Customer, bool> predicate)
The IQueryable version signature is: Where(Expression<Func<Customer, bool>> predicate)
You've probably been using both of those without realizing it because both are called using identical syntax:
e.g. Where(x => x.City == "<City>") works on both IEnumerable and IQueryable
When using Where() on an IEnumerable collection, the compiler passes a compiled function to Where()
When using Where() on an IQueryable collection, the compiler passes an expression tree to Where(). An expression tree is like the reflection system but for code. The compiler converts your code into a data structure that describes what your code does in a format that's easily digestible.
Why bother with this expression tree thing? I just want Where() to filter my data.
The main reason is that both the EF and Linq2SQL ORMs can convert expression trees directly into SQL where your code will execute much faster.
Oh, that sounds like a free performance boost, should I use AsQueryable() all over the place in that case?
No, IQueryable is only useful if the underlying data provider can do something with it. Converting something like a regular List to IQueryable will not give you any benefit.
Yes, both use deferred execution. Let's illustrate the difference using the SQL Server profiler....
When we run the following code:
MarketDevEntities db = new MarketDevEntities();
IEnumerable<WebLog> first = db.WebLogs;
var second = first.Where(c => c.DurationSeconds > 10);
var third = second.Where(c => c.WebLogID > 100);
var result = third.Where(c => c.EmailAddress.Length > 11);
Console.Write(result.First().UserName);
In SQL Server profiler we find a command equal to:
"SELECT * FROM [dbo].[WebLog]"
It approximately takes 90 seconds to run that block of code against a WebLog table which has 1 million records.
So, all table records are loaded into memory as objects, and then with each .Where() it will be another filter in memory against these objects.
When we use IQueryable instead of IEnumerable in the above example (second line):
In SQL Server profiler we find a command equal to:
"SELECT TOP 1 * FROM [dbo].[WebLog] WHERE [DurationSeconds] > 10 AND [WebLogID] > 100 AND LEN([EmailAddress]) > 11"
It approximately takes four seconds to run this block of code using IQueryable.
IQueryable has a property called Expression which stores a tree expression which starts being created when we used the result in our example (which is called deferred execution), and at the end this expression will be converted to an SQL query to run on the database engine.
Both will give you deferred execution, yes.
As for which is preferred over the other, it depends on what your underlying datasource is.
Returning an IEnumerable will automatically force the runtime to use LINQ to Objects to query your collection.
Returning an IQueryable (which implements IEnumerable, by the way) provides the extra functionality to translate your query into something that might perform better on the underlying source (LINQ to SQL, LINQ to XML, etc.).
A lot has been said previously, but back to the roots, in a more technical way:
IEnumerable is a collection of objects in memory that you can enumerate - an in-memory sequence that makes it possible to iterate through (makes it way easy for within foreach loop, though you can go with IEnumerator only). They reside in the memory as is.
IQueryable is an expression tree that will get translated into something else at some point with ability to enumerate over the final outcome. I guess this is what confuses most people.
They obviously have different connotations.
IQueryable represents an expression tree (a query, simply) that will be translated to something else by the underlying query provider as soon as release APIs are called, like LINQ aggregate functions (Sum, Count, etc.) or ToList[Array, Dictionary,...]. And IQueryable objects also implement IEnumerable, IEnumerable<T> so that if they represent a query the result of that query could be iterated. It means IQueryable don't have to be queries only. The right term is they are expression trees.
Now how those expressions are executed and what they turn to is all up to so called query providers (expression executors we can think them of).
In the Entity Framework world (which is that mystical underlying data source provider, or the query provider) IQueryable expressions are translated into native T-SQL queries. Nhibernate does similar things with them. You can write your own one following the concepts pretty well described in LINQ: Building an IQueryable Provider link, for example, and you might want to have a custom querying API for your product store provider service.
So basically, IQueryable objects are getting constructed all the way long until we explicitly release them and tell the system to rewrite them into SQL or whatever and send down the execution chain for onward processing.
As if to deferred execution it's a LINQ feature to hold up the expression tree scheme in the memory and send it into the execution only on demand, whenever certain APIs are called against the sequence (the same Count, ToList, etc.).
The proper usage of both heavily depends on the tasks you're facing for the specific case. For the well-known repository pattern I personally opt for returning IList, that is IEnumerable over Lists (indexers and the like). So it is my advice to use IQueryable only within repositories and IEnumerable anywhere else in the code. Not saying about the testability concerns that IQueryable breaks down and ruins the separation of concerns principle. If you return an expression from within repositories consumers may play with the persistence layer as they would wish.
A little addition to the mess :) (from a discussion in the comments))
None of them are objects in memory since they're not real types per se, they're markers of a type - if you want to go that deep. But it makes sense (and that's why even MSDN put it this way) to think of IEnumerables as in-memory collections whereas IQueryables as expression trees. The point is that the IQueryable interface inherits the IEnumerable interface so that if it represents a query, the results of that query can be enumerated. Enumeration causes the expression tree associated with an IQueryable object to be executed.
So, in fact, you can't really call any IEnumerable member without having the object in the memory. It will get in there if you do, anyways, if it's not empty. IQueryables are just queries, not the data.
In general terms I would recommend the following:
Return IQueryable<T> if you want to enable the developer using your method to refine the query you return before executing.
Return IEnumerable if you want to transport a set of Objects to enumerate over.
Imagine an IQueryable as that what it is - a "query" for data (which you can refine if you want to). An IEnumerable is a set of objects (which has already been received or was created) over which you can enumerate.
In general you want to preserve the original static type of the query until it matters.
For this reason, you can define your variable as 'var' instead of either IQueryable<> or IEnumerable<> and you will know that you are not changing the type.
If you start out with an IQueryable<>, you typically want to keep it as an IQueryable<> until there is some compelling reason to change it. The reason for this is that you want to give the query processor as much information as possible. For example, if you're only going to use 10 results (you've called Take(10)) then you want SQL Server to know about that so that it can optimize its query plans and send you only the data you'll use.
A compelling reason to change the type from IQueryable<> to IEnumerable<> might be that you are calling some extension function that the implementation of IQueryable<> in your particular object either cannot handle or handles inefficiently. In that case, you might wish to convert the type to IEnumerable<> (by assigning to a variable of type IEnumerable<> or by using the AsEnumerable extension method for example) so that the extension functions you call end up being the ones in the Enumerable class instead of the Queryable class.
There is a blog post with brief source code sample about how misuse of IEnumerable<T> can dramatically impact LINQ query performance: Entity Framework: IQueryable vs. IEnumerable.
If we dig deeper and look into the sources, we can see that there are obviously different extension methods are perfomed for IEnumerable<T>:
// Type: System.Linq.Enumerable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Enumerable
{
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
return (IEnumerable<TSource>)
new Enumerable.WhereEnumerableIterator<TSource>(source, predicate);
}
}
and IQueryable<T>:
// Type: System.Linq.Queryable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Queryable
{
public static IQueryable<TSource> Where<TSource>(
this IQueryable<TSource> source,
Expression<Func<TSource, bool>> predicate)
{
return source.Provider.CreateQuery<TSource>(
Expression.Call(
null,
((MethodInfo) MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(TSource) }),
new Expression[]
{ source.Expression, Expression.Quote(predicate) }));
}
}
The first one returns enumerable iterator, and the second one creates query through the query provider, specified in IQueryable source.
The main difference between “IEnumerable” and “IQueryable” is about where the filter logic is executed. One executes on the client side (in memory) and the other executes on the database.
For example, we can consider an example where we have 10,000 records for a user in our database and let's say only 900 out which are active users, so in this case if we use “IEnumerable” then first it loads all 10,000 records in memory and then applies the IsActive filter on it which eventually returns the 900 active users.
While on the other hand on the same case if we use “IQueryable” it will directly apply the IsActive filter on the database which directly from there will return the 900 active users.
I would like to clarify a few things due to seemingly conflicting responses (mostly surrounding IEnumerable).
(1) IQueryable extends the IEnumerable interface. (You can send an IQueryable to something which expects IEnumerable without error.)
(2) Both IQueryable and IEnumerable LINQ attempt lazy loading when iterating over the result set. (Note that implementation can be seen in interface extension methods for each type.)
In other words, IEnumerables are not exclusively "in-memory". IQueryables are not always executed on the database. IEnumerable must load things into memory (once retrieved, possibly lazily) because it has no abstract data provider. IQueryables rely on an abstract provider (like LINQ-to-SQL), although this could also be the .NET in-memory provider.
Sample use case
(a) Retrieve list of records as IQueryable from EF context. (No records are in-memory.)
(b) Pass the IQueryable to a view whose model is IEnumerable. (Valid. IQueryable extends IEnumerable.)
(c) Iterate over and access the data set's records, child entities and properties from the view. (May cause exceptions!)
Possible Issues
(1) The IEnumerable attempts lazy loading and your data context is expired. Exception thrown because provider is no longer available.
(2) Entity Framework entity proxies are enabled (the default), and you attempt to access a related (virtual) object with an expired data context. Same as (1).
(3) Multiple Active Result Sets (MARS). If you are iterating over the IEnumerable in a foreach( var record in resultSet ) block and simultaneously attempt to access record.childEntity.childProperty, you may end up with MARS due to lazy loading of both the data set and the relational entity. This will cause an exception if it is not enabled in your connection string.
Solution
I have found that enabling MARS in the connection string works unreliably. I suggest you avoid MARS unless it is well-understood and explicitly desired.
Execute the query and store results by invoking resultList = resultSet.ToList() This seems to be the most straightforward way of ensuring your entities are in-memory.
In cases where the you are accessing related entities, you may still require a data context. Either that, or you can disable entity proxies and explicitly Include related entities from your DbSet.
I recently ran into an issue with IEnumerable v. IQueryable. The algorithm being used first performed an IQueryable query to obtain a set of results. These were then passed to a foreach loop, with the items instantiated as an Entity Framework (EF) class. This EF class was then used in the from clause of a Linq to Entity query, causing the result to be IEnumerable.
I'm fairly new to EF and Linq for Entities, so it took a while to figure out what the bottleneck was. Using MiniProfiling, I found the query and then converted all of the individual operations to a single IQueryable Linq for Entities query. The IEnumerable took 15 seconds and the IQueryable took 0.5 seconds to execute. There were three tables involved and, after reading this, I believe that the IEnumerable query was actually forming a three table cross-product and filtering the results.
Try to use IQueryables as a rule-of-thumb and profile your work to make your changes measurable.
We can use both for the same way, and they are only different in the performance.
IQueryable only executes against the database in an efficient way. It means that it creates an entire select query and only gets the related records.
For example, we want to take the top 10 customers whose name start with ‘Nimal’. In this case the select query will be generated as select top 10 * from Customer where name like ‘Nimal%’.
But if we used IEnumerable, the query would be like select * from Customer where name like ‘Nimal%’ and the top ten will be filtered at the C# coding level (it gets all the customer records from the database and passes them into C#).
In addition to first 2 really good answers (by driis & by Jacob) :
IEnumerable
interface is in the System.Collections namespace.
The IEnumerable object represents a set of data in memory and can move on this data only forward. The query represented by the IEnumerable object is executed immediately and completely, so the application receives data quickly.
When the query is executed, IEnumerable loads all the data, and if we need to filter it, the filtering itself is done on the client side.
IQueryable interface is located in the System.Linq namespace.
The IQueryable object provides remote access to the database and allows you to navigate through the data either in a direct order from beginning to end, or in the reverse order. In the process of creating a query, the returned object is IQueryable, the query is optimized. As a result, less memory is consumed during its execution, less network bandwidth, but at the same time it can be processed slightly more slowly than a query that returns an IEnumerable object.
What to choose?
If you need the entire set of returned data, then it's better to use IEnumerable, which provides the maximum speed.
If you DO NOT need the entire set of returned data, but only some filtered data, then it's better to use IQueryable.
In addition to the above, it's interesting to note that you can get exceptions if you use IQueryable instead of IEnumerable:
The following works fine if products is an IEnumerable:
products.Skip(-4);
However if products is an IQueryable and it's trying to access records from a DB table, then you'll get this error:
The offset specified in a OFFSET clause may not be negative.
This is because the following query was constructed:
SELECT [p].[ProductId]
FROM [Products] AS [p]
ORDER BY (SELECT 1)
OFFSET #__p_0 ROWS
and OFFSET can't have a negative value.

Does Entity Framework Selects all rows in memory when applying a Where filter?

There is one thing makes me confused. I think EF selects all rows (all records) in table.
Let me show you a sample.
public Category GetByID(int Id)
{
return context.Categories.Find(Id);
}
there are a lot of records in table and when i check them with break point i can see all the records not only the Id numbered one. What if there are 10k records in table? I test this. I copied all record manually in database and i make 30k records.
An expression like this,
IEnumerable<Category> categories = categoryRepository
.Where(x => x.Published == true)
.ToList();
I saw 30k records with break point. But at least 10k Published False in table.
Is Entity framework firstly fetches all of the records to memory and after filters them?
TL;DR
It's likely your categoryRepository has broken EF's IQueryable<> expression tree, and is materializing the entire Category table in order to run the .Where predicate. See the examples below.
More Detail
The short answer is no, provided that Entity Framework is able to parse the IQueryable<> expression (which includes the .Where predicates you specify) it will convert the associated expression tree into native Sql using the appropriate query provider for the RDBMS you are targetting, thus allowing all the benefits of Sql, e.g. use of indexes.
As per my comment, one of the common reasons why EF would not do this is if the IQueryable mechanism has been tampered with, for instance, if your Repository pattern implementation uses the IEnumerable<T> overload of the Where predicate and not the IQueryable overload.
As a result, EF has no other option but to fetch the table and execute every row against your predicate function to determine whether the row matches your predicate or not.
As an aside, there is some debate whether there is merit in wrapping a DbContext in a Repository and / or Unit Of Work wrapper, as a DbContext is Transactional, performs caching, can be mocked during unit testing, and now supports a wide range of databases.
Examples of where materialization happens and how this affects performance
(The point at which the Sql query is actually executed is often referred to as materialization)
I've excluded the OP's repository - i.e. we're using the DbContext directly.
Best:
var miniFoos = myDbContext.MyFooSet
.Where(f => f.SomeProperty = "foo")
.Select(f => new {...})
.ToList();
This is good, because, we've applied both the predicate and a projection of just the columns we need in SQL, before we materialize the data into a List (of an anonymous type)
OK:
var foos = myDbContext.MyFooSet
.Where(f => f.SomeProperty = "foo")
.ToList() // Or .AsEnumerable(), or other materialization methods
.Select(f => new {...}); // Subset of fields
This isn't ideal, because although we've applied the .Where clause before we materialize, we're returning the full columns in of Foo table, just to discard unnecessary columns. This means unnecessary I/O, and also, Sql might have been able to use just an index to perform the query.
Bad - Never do this
var foos = myDbContext.MyFooSet
.AsEnumerable() // (or `ToList()`, same problem)
.Where(f => f.SomeProperty = "foo")
.Select(f => new {...}); // Subset of fields
This seems to be what the OP is experiencing - since the table is materialized BEFORE any .Where filtering takes place, the whole table IS retrieved to memory and filtering happens with Linq to Objects, instead.
This problem can also happen when applying custom .Where predicate builders which don't use Expressions, or which use IEnumerable<T> instead of IQueryable<T> - IEnumerable has no associated expression tree and can't be parsed to Sql.

How to Parameterize Linq Query or Lambda Query in Entity Framework?

I write some Parameterize Lambda Queries
//Method 1:
Func<SalesOrderLineEntity, bool> func01 = (o => o.SOLNumber == "123456");
var q01 = _context.SalesOrderLineEntities.Where(func01).ToList();
//Got the result, but SQLServer Read All Records to memory before "where"
//Method 2:
Expression<Func<SalesOrderLineEntity, bool>> exp02 = (o => o.SOLNumber == "123456");
var q02 = _context.SalesOrderLineEntities.Where(exp02).ToList();
//Got the result,Exec "Where" in SQLServer
//Method 3:
Expression<Func<SalesOrderLineEntity, bool>> exp03 = (o => func01(o));
var q03 = _context.SalesOrderLineEntities.Where(exp03.Compile()).ToList();
//Same to Method 1,because Compile() result is Func<SalesOrderLineEntity, bool>
//Method 4:
var q04 = _context.SalesOrderLineEntities.Where(exp03).ToList();
//Error:The LINQ expression node type 'Invoke' is not supported in LINQ to Entities
Method 1 and 3:Efficiency is very low
Method 4:Error
Method 2:Need I Build a Expression through the Lambda. I feel it is very difficult, because i will use many "if,else".it easier to create a function.
What is the correct way to do that?
Variations
Method 1: EF reads all the records from the DB because you pass a Func into the Where clause, which is not the right candidate: EF cannot extract the needed information from it to build the query, it can only use that function on an in-memory collection.
Method 2: this is the correct way to do EF queries because EF builds up the actual query based on the Expression tree. It may look like the same as Method 1 when you write .Where but this is different.
IQueryable extension methods are using Expression trees, so you can (or EF can) evaluate that information at runtime.
Method 3: this is essentially the same as Method 1, because you compile the expression. This is a key difference while you use them: an expression contains the informations to build the actual operation but that's not the operation itself. You need to compile it before (or for example you can build SQL queries based on them, that's how EF works).
Method 4: EF cannot translate your func01() call to any SQL function. It cannot translate any kind of code because it needs an equivalent SQL operation. You can try to use a general method, you will get the same result, it's not about the Func.
What happens here?
If we simplify the underlying process then the answer above might by more clear.
//Method 2:
Expression<Func<SalesOrderLineEntity, bool>> exp02 = (o => o.SOLNumber == "123456");
var q02 = _context.SalesOrderLineEntities.Where(exp02).ToList();
//Got the result,Exec "Where" in SQLServer
EF can read the following (via the expressions):
the user want to filter with Where
here is an expression for that, let's get some information
well, it needs SalesOrderLineEntity and I have a mapping for that type
the expression tells that the property SOLNumber must be equal to "123456"
ok, I have a mapping for SOLNumber so it's good
and I can translate the equal operator to an equivalent SQL operator
everything okay, so we can build the SQL query
Of course, you cannot do this with a Func for example because that object doesn't contain these informations.
Not sure if this is applicable but have you look at compiled queries: Compiled Queries (LINQ to Entities) which should result in a more efficient SQL statement

Rules for LINQ to SQL across method boundaries

To keep my code cleaner I often try to break down parts of my data access code in LINQ to SQL into private sub-methods, just like with plain-old business logic code. Let me give a very simplistic example:
public IEnumerable<Item> GetItemsFromRepository()
{
var setA = from a in this.dataContext.TableA
where /* criteria */
select a.Prop;
return DoSubquery(setA);
}
private IEnumerable<Item> DoSubQuery(IEnumerable<DateTimeOffset> set)
{
return from item in set
where /* criteria */
select item;
}
I'm sure no one's imagination would be stretched by imagining more complex examples with deeper nesting or using results of sets to filter other queries.
My basic question is this: I've seen some significant performance differences and even exceptions being thrown by just simply reorganizing LINQ to SQL code in private methods. Can anyone explain the rules for these behaviors so that I can make informed decisions about how to write efficient, clean data access code?
Some questions I've had:
1) When does passage of System.Linq.Table instace to a method cause query execution?
2) When does using a System.Linq.Table in another query cause execution?
3) Are there limits to what types of operations (Take, First, Last, order by, etc.) can be applied to System.Linq.Table passed a parameters into a method?
The most important rule in terms of LINQ-to-SQL would be: don't return IEnumerable<T> unless you must - as the semantic is unclear. There are two schools of thought beyond that:
if you return IQueryable<T>, it is composable, meaning the where from later queries is combined to make a single TSQL, but as a down-side, it is hard to fully test
otherwise, return List<T> or similar, so it is clear that everything beyond that point is LINQ-to-Objects
Currently, you are doing something in the middle: collapsing it to LINQ-to-Objects (via IEnumerable<T>), but without it being obvious - and keeping the connection open in the middle (again, only a problem because it isn't obvious)
Remove the implicit cast:
public IQueryable<Item> GetItemsFromRepository()
{
var setA = from a in this.dataContext.TableA
where /* criteria */
select a.Prop;
return DoSubquery(setA);
}
private IQueryable<Item> DoSubQuery(IQueryable<DateTimeOffset> set)
{
return from item in set
where /* criteria */
select item;
}
The implicit cast from IQueryable<Item> to IEnumerable<Item> is essentially the same as calling AsEnumerable() on your IQueryable<Item>. There are of course times when you want that, but you should leave things as IQueryable by default, so that the entire query can be performed on the database, rather than merely the GetItemsFromRepository() bit with the rest being done in memory.
The secondary questions:
1) When does passage of System.Linq.Table instace to a method cause query execution?
When something needs a final result, such as Max(), ToList(), etc. that is neither a queryable object, nor a loaded-as-it-goes enumerable.
Note though, that while AsEnumerable() does not cause query execution, it does mean that when execution does happen only that before the AsEnumerable() will be performed against the source datasource, this will then produce an on-demand in-memory datasource against which the rest will be performed.
2) When does using a System.Linq.Table in another query cause
execution?
The same as above. Table<T> implements IQueryable<T>. If you e.g. join two of them together, that won't yet cause anything to be executed.
3) Are there limits to what types of operations (Take,
First, Last, order by, etc.) can be applied to System.Linq.Table
passed a parameters into a method?
Those that are definted by IQueryable<T>.
Edit: Some clarification on the differences and similarities between IEnumerable and IQueryable.
Just about anything you can do on an IQueryable you can do on an IEnumerable and vice-versa, but how it's performed will be different.
Any given IQueryable implementation can be used in linq queries and will have all the linqy extension methods like Take(), Select(), GroupBy and so on.
Just how this is done, depends on the implementation. For example, System.Linq.Data.Table implements those methods by the query being turned into an SQL query, the results of which are turned into a objects on a as-loaded basis. So if mySource is a table then:
var filtered = from item in mySource
where item.ID < 23
select new{item.ID, item.Name};
foreach(var i in filtered)
Console.WriteLine(i.Name);
Gets turned into SQL like:
select id, name from mySourceTable where id < 23
And then an enumerator is created from that such that on each call to MoveNext() another row is read from the results, and a new anonymous object created from it.
On the other hand, if mySource where a List or a HashSet, or anything else that implements IEnumerable<T> but doesn't have its own query engine, then the linq-to-objects code will turn it into something like:
foreach(var item in mySource)
if(item.ID < 23)
yield return new {item.ID, item.Name};
Which is about as efficiently as that code could be done in memory. The results will be the same, but the way to get them, would be different:
Now, since all IQueryable<T> can be converted into the equivalent IEnumerable<T> we could, if we wanted to, take the first mySource (where execution happens in a database) and do the following instead:
var filtered = from item in mySource.AsEnumerable()
where item.ID < 23
select new{item.ID, item.Name};
Here, while there is still nothing executed against the database until we iterate through the results or call something that examines all of those results, once we do so, it's as if we split the execution into two separate steps:
var asEnum = mySource.AsEnumerable();
var filtered = from item in asEnum
where item.ID < 23
select new{item.ID, item.Name};
The implemenatation of the first line would be to execute the SQL SELECT * FROM mySourceTable, and the execution of the rest would be like the linq-to-objects example above.
It's not hard to see how, if the database contained 10 items with an id < 23, and 50,000 items with an id higher, this is now much, much less performant.
As well as offering the explicity AsEnumerable() method, all IQueryable<T> can be implicitly cast to IEnumerable<T>. This lets us do foreach on them and use them with any other existing code that handles IEnumerable<T>, but if we accidentally do it at in inappropriate time, we can make queries much slower, and this is what was happening when your DoSubQuery was defined to take an IEnumerable<DateTimeOffset> and return an IEnumerable<Item>; it implicitly called AsEnumerable() on your IQueryable<DateTimeOffset> and your IQueryable<Item> and caused what could have been performed on the database to be performed in-memory.
For this reason, 99% of the time, we want to keep dealing in IQueryable until the very last moment.
As an example of the opposite though, just to point out that AsEnumerable() and the casts to IEnumerable<T> aren't there out of madness, we should consider two things. The first is that IEnumerable<T> lets us do things that can't be done otherwise, such as joining two completely different sources that don't know about each other (e.g. two different databases, a database and an XML file, etc.)
Another is that sometimes IEnumerable<T> is actually more efficient too. Consider:
IQueryable<IGrouping<string, int>> groupingQuery = from item in mySource select item.ID group by item.Name;
var list1 = groupingQuery.Select(grp => new {Name=grp.Key, Count=grp.Count()}).ToList();//fine
foreach(var grp in groupingQuery)//disaster!
Console.WriteLine(grp.Count());
Here groupingQuery is set up as a queryable that does some grouping, but which hasn't been executed in anyway. When we create list1, then first we create a new IQueryable based on that, and the query engine does it's best to work out what the best SQL for it is, and comes up with something like:
select name, count(id) from mySourceTable group by name
Which is pretty efficiently performed. Then the rows are turned into objects, which are then put into a list.
On the other hand, with the second query, there isn't as natural a SQL conversion for a group by that doesn't perform aggregate methods on all of the non-grouped items, so the best the query engine can come up with is to first do:
select distinct name from mySourceTable,
And then for every name it receives, to do:
select id from mySourceTable where name = '{name found in last query goes here}'
And so on, should this mean 2 SQL queries, or 200,000.
In this case, we're much better working on mySource.AsEnumerable() because here it is more efficient to grab the whole table into memory first. (Even better still would be to work on mySource.Select(item => new {item.ID, item.Name}).AsEnumerable() because then we still only retrieve the columns we care about from the database, and switch to in-memory at that point).
The last bit is worth remembering because it breaks our rule that we should stay with IQueryable<T> as long as possible. It isn't something to worry about much, but it is worth keeping an eye on if you do grouping and find yourself with a very slow query.

Why does EntityFramework's LINQ parser handle an externally defined predicate differently?

I'm using Microsoft's Entity Framework as an ORM and am wondering how to solve the following problem.
I want to get a number of Product objects from the Products collection where the Product.StartDate is greater than today. (This is a simplified version of the whole problem.)
I currently use:
var query = dbContext.Products.Where(p => p.StartDate > DateTime.Now);
When this is executed, after using ToList() for example on query, it works and the SQL created is effectively:
SELECT * FROM Product WHERE StartDate > (GetDate());
However, I want to move the predicate to a function for better maintainability, so I tried this:
private Func<Product, bool> GetFilter()
{
Func<Product, bool> filter = p => p.StartDate > DateTime.Now;
return filter;
}
var query = dbContext.Products.Where(GetFilter());
This also works from a code point of view insofar as it returns the same Product set but this time the SQL created is analogous to:
SELECT * FROM Product;
The filter is moved from the SQL Server to the client making it much less efficient.
So my questions are:
Why is this happening, why does the LINQ parser treat these two formats so differently?
What can I do to take advantage of having the filter separate but having it executed on the server?
You need to use an Expression<Func<Product, bool>> in order for it to work like you intend. A plain Func<Product, bool> tells LINQ that you want it to run the Where in MSIL in your program, not in SQL. That's why the SQL is pulling in the whole table, then your .NET code is running the predicate on the entire table.
You are returning a Func, but to inject the predicate into the SQL, LINQ requires an expression tree. It should work if you change the return type of your method (and of your local variable, of course) to Expression<Func<Product, bool>>.
Since in second case filter func maybe arbitrary LINQ to EF can't parse you filter to SQL and has to resolve it on client side.

Categories

Resources