Our front-end UI has a filtering system that, on the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that match both selections). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Because of the large number of rows it operates over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver IDs in it--but the data comes from two different places. Build a list of all the driver IDs.
driverIds = db.CarDriversManyToManyTable
    .Where(cd => filter.CarIds.Contains(cd.CarId)) //get driver IDs for each car ID listed in the filter object
    .Select(cd => cd.DriverId)
    .Distinct()
    .ToList();
driverIds = driverIds.Concat(
    db.DriverShopManyToManyTable
        .Where(ds => filter.ShopIds.Contains(ds.ShopId)) //get driver IDs for each shop listed in the filter object
        .Select(ds => ds.DriverId)
        .Distinct()
    ).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from LINQ is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All of them require keeping the parts of the query typed as IQueryable<T>, i.e. do not use ToList, ToArray, AsEnumerable, etc. - methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered Ids (which will be unique by definition) and use the join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be to use two OR-ed Any-based conditions, which would translate to an EXISTS(...) OR EXISTS(...) filter in the SQL query:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever smaller number seems reasonable to you.
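For example, a minimal paging sketch over the driver query from the question (pageNum and pageSize are placeholders coming from the UI; in LINQ to Entities, Skip requires an ordered query):
var page = query
    .OrderBy(d => d.Id)            // Skip/Take need a stable sort order
    .Skip(pageNum * pageSize)
    .Take(pageSize)
    .ToList();                     // only one page of rows comes back from the database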
You've mentioned that you need joins to get the data you need. These joins can be done while forming your IQueryable (Entity Framework), rather than in memory (LINQ to Objects). Read up on join syntax in LINQ.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database-first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with navigation properties, which will greatly simplify your code.
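As a rough sketch, assuming the Driver entity ended up with collection navigation properties for the two many-to-many relationships (the property names Cars and Shops are made up here), the whole OR filter could collapse to something like:
query = query.Where(d =>
    d.Cars.Any(c => filter.CarIds.Contains(c.Id)) ||      // hypothetical navigation property
    d.Shops.Any(s => filter.ShopIds.Contains(s.Id)));     // hypothetical navigation property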
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to add indexes on all of the "key" columns that you are joining on. The best way to figure out what indexes you need is to take the SQL query generated by your EF LINQ and bring it over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your @p parameters, just to make an example. Once you've done this, right-click on the query and either use Display Estimated Execution Plan or Include Actual Execution Plan. If indexing can improve your query performance, there is a pretty good chance this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like using the instance versions of the LINQ extensions is creating several collections before you're done. Using the query-syntax (from statement) versions should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
             where filter.CarIds.Contains(record.CarId)
             select record.DriverId)
            .Concat(from record in db.DriverShopManyToManyTable
                    where filter.ShopIds.Contains(record.ShopId)
                    select record.DriverId)
            .Distinct();
Also, using the GroupBy extension would give better performance than querying each driver ID individually.
I have a full outer join query pulling data from a SQL Compact database (I use EF6 for mapping):
var query =
from entry in left.Union(right).AsEnumerable()
select new
{
...
} into e
group e by e.Date.Year into year
select new
{
Year = year.Key,
Quartals = from x in year
group x by (x.Date.Month - 1) / 3 + 1 into quartal
select new
{
Quartal = quartal.Key,
Months = from x in quartal
group x by x.Date.Month into month
select new
{
Month = month.Key,
Contracts = from x in month
group x by x.Contract.extNo into contract
select new
{
ExtNo = contract.Key,
Entries = contract,
}
}
}
};
As you can see, I use nested groups to structure the results.
The interesting thing is, if I remove the AsEnumerable() call, the query takes 3.5x more time to execute: ~210ms vs ~60ms. And when it runs for the first time the difference is much greater: 39000(!)ms vs 1300ms.
My questions are:
What am I doing wrong - maybe those groupings should be done in a different way?
Why does the first execution take so much time? I know expression trees have to be built etc., but 39 seconds?
Why is querying the database directly (LINQ to Entities) slower than doing the grouping in memory (LINQ to Objects) in my case? Is it generally slower, and is it better to load data from the database before processing when possible?
Thanks!
To answer your three questions:
Maybe those groupings should be done in a different way?
No. If you want nested groupings you can only do that by groupings within groupings.
You can group by multiple fields at once:
from entry in left.Union(right)
select new
{
...
} into e
group e by new
{
e.Date.Year,
Quartal = (e.Date.Month - 1) / 3 + 1,
e.Date.Month,
contract = e.Contract.extNo
} into grp
select new
{
    Year = grp.Key.Year,
    Quartal = grp.Key.Quartal,
    Month = grp.Key.Month,
    ExtNo = grp.Key.contract,
    Entries = grp
}
This will remove a lot of complexity from the generated query so it's likely to be (much) faster without AsEnumerable(). But the result is quite different: a flat group (Year, Quartal, etc, in one row), not a nested grouping.
Why the first execution takes so much time?
Because the generated SQL query is probably pretty complex and the database engine's query optimizer can't find a fast execution path.
3a. Why is LINQ to Entities slower than LINQ to Objects in my case?
Because, apparently, in this case it's much more efficient to fetch the data into memory first and do the groupings with LINQ to Objects. This effect will be more significant if left and right are themselves complex queries. In that case, the generated SQL can get hugely bloated, because it has to process two sources of complexity in one statement, which may lead to many repeated identical subqueries. By outsourcing the grouping, the database is probably left with a relatively simple query, and of course the grouping in memory is never affected by the complexity of the SQL query.
3b. Is it generally slower, and is it better to load data from the database before processing when possible?
No, not generally. I'd even say, hardly ever. In this case it is because (as far as I can see) you don't filter data. If, however, the part before AsEnumerable() returned millions of records and you applied filtering afterwards, the query without AsEnumerable() would probably be much faster, because the filtering would be done in the database.
Therefore, you should always keep an eye on generated SQL. It's unrealistic to expect that EF will always generate a super optimized SQL statement. It hardly ever will. Its primary focus is on correctness (and it does an exceptional job there), performance is secondary. It's the developer's job to make LINQ-to-Entities and LINQ-to-object work together as a slick team.
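One low-effort way to keep that eye on the SQL during development (EF6, which the question uses) is to hook up the built-in log; the context and table names here are placeholders:
using (var db = new MyDbContext())
{
    db.Database.Log = Console.Write;   // every generated SQL statement is echoed as it executes
    var data = db.SomeTable.ToList();  // stand-in for the query you want to inspect
}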
Using AsEnumerable() will convert a type that implements IEnumerable<T> to IEnumerable<T> itself.
Read this topic https://msdn.microsoft.com/en-us/library/bb335435.aspx
AsEnumerable<TSource>(IEnumerable<TSource>) can be used to choose between query implementations when a sequence implements IEnumerable<T> but also has a different set of public query methods available. For example, given a generic class Table that implements IEnumerable<T> and has its own methods such as Where, Select, and SelectMany, a call to Where would invoke the public Where method of Table. A Table type that represents a database table could have a Where method that takes the predicate argument as an expression tree and converts the tree to SQL for remote execution. If remote execution is not desired, for example because the predicate invokes a local method, the AsEnumerable<TSource> method can be used to hide the custom methods and instead make the standard query operators available.
When you invoke AsEnumerable() first, the rest of the query is not converted to SQL; instead the results so far are loaded into memory as the Where enumerates them. Since the data is then in memory, the subsequent operators execute faster.
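A small sketch of where that boundary falls (the entity and method names are illustrative): everything before AsEnumerable() is translated to SQL, everything after it runs as LINQ to Objects over the rows streamed back.
var result = db.Orders
    .Where(o => o.CustomerId == customerId)   // runs in the database
    .AsEnumerable()                           // switch to LINQ to Objects from here on
    .Where(o => SomeLocalMethod(o))           // runs in memory, row by row
    .ToList();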
Suppose I have a collection (of arbitrary size) of IQueryable<MyEntity> (all for the same MyEntity type). Each individual query has successfully been dynamically built to encapsulate various pieces of business logic into a form that can be evaluated in a single database trip. Is there any way I can now have all these IQueryables executed in a single round-trip to the database?
For example (simplified; my actual queries are more complex!), if I had
ObjectContext context = ...;
var myQueries = new[] {
context.Widgets.Where(w => w.Price > 500),
context.Widgets.Where(w => w.Colour == 5),
context.Widgets.Where(w => w.Supplier.Name.StartsWith("Foo"))
};
I would like to have EF perform the translation of each query (which it can do individually), then in one database visit, execute
SELECT * FROM Widget WHERE Price > 500
SELECT * FROM Widget WHERE Colour = 5
SELECT Widget.* FROM Widget
INNER JOIN Supplier ON Widget.SupplierId = Supplier.Id
WHERE Supplier.Name LIKE 'Foo%'
then convert each result set into an IEnumerable<Widget>, updating the ObjectContext in the usual way.
I've seen various posts about dealing with multiple result sets from a stored procedure, but this is slightly different (not least because I don't know at compile time how many result sets there are going to be). Is there an easy way, or do I have to use something along the lines of Does the Entity Framework support the ability to have a single stored procedure that returns multiple result sets?
No. EF doesn't have query batching (future queries). One queryable is one database roundtrip. As a workaround you can try to play with it and, for example, use:
string sql = ((ObjectQuery<Widget>)context.Widgets.Where(...)).ToTraceString();
to get the SQL of the query and build your own custom command from all the SQLs to be executed. After that you can use a similar approach as with stored procedures to translate the results.
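A rough sketch of that workaround, using the ObjectContext from the question: concatenate the captured SQL statements, execute them through the underlying store connection, and let Translate materialize each result set. Parameter handling and error handling are omitted, and the sqlStatements collection (the ToTraceString outputs) is assumed. EntityConnection lives in System.Data.EntityClient (EF4/5) or System.Data.Entity.Core.EntityClient (EF6).
var entityConnection = (EntityConnection)context.Connection;
var storeConnection = entityConnection.StoreConnection;
storeConnection.Open();
try
{
    using (var command = storeConnection.CreateCommand())
    {
        command.CommandText = string.Join(";" + Environment.NewLine, sqlStatements);
        using (var reader = command.ExecuteReader())
        {
            var firstBatch = context.Translate<Widget>(reader).ToList();   // materialize result set 1
            reader.NextResult();
            var secondBatch = context.Translate<Widget>(reader).ToList();  // materialize result set 2
        }
    }
}
finally
{
    storeConnection.Close();
}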
Unless you really need to have each query executed separately, you can also union them into a single query:
context.Widgets.Where(...).Union(context.Widgets.Where(...));
This will result in UNION. If you need just UNION ALL you can use Concat method instead.
It might be a late answer, but hopefully it will help someone else with the same issue.
There is the Entity Framework Extended Library on NuGet, which provides the future-queries feature (among others). I played a bit with it and it looks promising.
You can find more information here.
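From memory of that library's API (check its documentation for the exact signatures), the usage looks roughly like this; the pending future queries are sent as one batched command when the first of them is enumerated:
var expensiveWidgets = context.Widgets.Where(w => w.Price > 500).Future();
var colouredWidgets  = context.Widgets.Where(w => w.Colour == 5).Future();

var expensiveList = expensiveWidgets.ToList();   // both queries execute in a single roundtrip here
var colouredList  = colouredWidgets.ToList();    // already materialized, no extra trip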
I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:
var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))
This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?
Edit 1: Additional clarification as to what I'm doing.
Okay, I'm constructing my first query (query1) which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using a ToList() at the end of the LINQ statement---the query is not executed at this time and no results are sent to the client:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)
Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:
var query2 = (from a in DbContext.TableName2 join id in query1 on a.ID equals id select new { a.Column1, a.Column2, a.Column3,...,a.ColumnM }).ToList();
Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:
var query3 = (from a in DbContext.TableName3 join id in query1 on a.ID equals id select new { a.Column1, a.Column2, a.Column3,...,a.ColumnM }).ToList();
I can keep doing this for as many queries as I like:
var queryN = (from a in DbContext.TableNameN join id in query1 on a.ID equals id select new { a.Column1, a.Column2, a.Column3,...,a.ColumnM }).ToList();
The Problem: query1 takes a long time to execute. When I execute query2, query3, ..., queryN, query1 is being executed each time (N-1 times in total)... this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.
Edit 2:
I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:
This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.
Thanks to everyone who commented on this post...I appreciate everyone's input!
If the size of TableName is not too big to load the whole table, you can use
var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);
to fetch the whole table and automatically put it in a local Dictionary with ID as key.
Another way is to just "force" the LINQ evaluation with .ToList(), in that case fetching the whole table and doing the Where part locally with LINQ to Objects.
var query1Lookup = new HashSet<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1Lookup.Contains(x.ID));
Edit:
Storing a list of IDs from one query and using that list as a filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).
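Applied to the code in the question, that rewrite keeps query1 as an IQueryable of IDs (no ToList) and joins against it, so the filtering happens in SQL instead of through a huge IN list:
var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID);

var query2 = (from a in DbContext.TableName2
              join id in query1 on a.ID equals id   // the expensive filter runs inside this one SQL statement
              select a).ToList();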
Since you are running a subsequent query off the results, take your first query and use it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.
Have you considered composing your query as per this article (using the decorator design pattern):
Composed LINQ Queries using the Decorator Pattern
The premise is that, instead of enumerating your first (very costly) query, you basically use the decorator pattern to produce a chain of IQueryable that is the result of query 1 and query N. This way you always execute the filtered form of the query.
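A minimal sketch of that composition, reusing the names from the question (the extension method and SomeOtherColumn are made up for illustration); each call only adds to the expression tree, so nothing hits the database until the final ToList():
public static class QueryDecorators
{
    public static IQueryable<TableName1> ApplyUserFilter(
        this IQueryable<TableName1> source,
        Expression<Func<TableName1, bool>> predicate)
    {
        return source.Where(predicate);   // decorates the query, does not execute it
    }
}

var results = DbContext.TableName1
    .ApplyUserFilter(ComplexFilterLogic)
    .Where(x => x.SomeOtherColumn == 42)   // hypothetical further restriction
    .ToList();                             // one SQL statement containing both filters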
Hope this might help
My database structure is this: an OptiUser belongs to multiple UserGroups through the IdentityMap table, which is a many-to-many junction table with some additional properties attached to it. Each UserGroup has multiple OptiDashboards.
I have a GUID string which identifies a particular user (wlid in this code). I want to get an IEnumerable of all of the OptiDashboards for the user identified by wlid.
Which of these two Linq-to-Entities queries is the most efficient? Do they run the same way on the back-end?
Also, can I shorten option 2's Include statements to just .Include("IdentityMaps.UserGroup.OptiDashboards")?
using (OptiEntities db = new OptiEntities())
{
// option 1
IEnumerable<OptiDashboard> dashboards = db.OptiDashboards
.Where(d => d.UserGroups
.Any(u => u.IdentityMaps
.Any(i => i.OptiUser.WinLiveIDToken == wlid)));
// option 2
OptiUser user = db.OptiUsers
.Include("IdentityMaps")
.Include("IdentityMaps.UserGroup")
.Include("IdentityMaps.UserGroup.OptiDashboards")
.Where(r => r.WinLiveIDToken == wlid).FirstOrDefault();
// then I would get the dashboards through user.IdentityMaps.UserGroup.OptiDashboards
// (through foreach loops...)
}
You may be misunderstanding what the Include function actually does. Option 1 is pure query syntax and has no effect on what is returned by Entity Framework. Option 2, with the Include function, instructs Entity Framework to eagerly fetch the related rows from the database when returning the results of the query.
So option 1 will result in some joins, but the "select" part of the query will be restricted to the OptiDashboards table.
Option 2 will result in joins as well, but in this case it will be returning the results from all the included tables, which obviously is going to introduce more of a performance hit. But at the same time, the results will include all the related entities you need, avoiding the [possible] need for more round-trips to the database.
I think the Includes will render as joins and you will be able to access the data from those tables in your user object (eager loading the properties).
The Any query will render as exists and not load the user object with info from the other tables.
For best performance, if you don't need the additional info, use the Any query.
As has already been pointed out, the first option would almost certainly perform better, simply because it would be retrieving less information. Besides that, I wanted to point out that you could also write the query this way:
var dashboards =
from u in db.OptiUsers where u.WinLiveIDToken == wlid
from im in u.IdentityMaps
from d in im.UserGroup.OptiDashboards
select d;
I would expect the above to perform similarly to the first option, but you may (or may not) prefer the above form.
Long time lurker, first time posting, and newly learning EF4 and MVC3.
I need help making sure I'm using the correct data loading strategy in this case as well as some help finalizing some details of the query. I'm currently using the eager loading approach outlined here for somewhat of a "dashboard" view that requires a small amount of data from about 10 tables (all have FK relationships).
var query = db.Leagues
    .Include("Sport")
    .Include("LeagueContacts")
    .Include("LeagueContacts.User")
    .Include("LeagueContacts.User.UserContactDatas")
    .Include("LeagueEvents")
    .Include("LeagueEvents.Event")
    .Include("Seasons")
    .Include("Seasons.Divisions")
    .Include("Seasons.Divisions.Teams")
    .Where(l => l.URLPart.Equals(leagueName));
model = (Models.League)query.First();
However, I need to do some additional filtering, sorting, and shaping of the data that I haven't been able to work out. Here are my chief needs/concerns from this point:
Several child objects still need additional filtering but I haven't been able to figure out the syntax or best approach yet. Example: "TOP 3 LeagueEvents.Event WHERE StartDate >= getdate() ORDER BY LeagueEvents.Event.StartDate"
I need to sort some of the fields. Examples: ORDERBY Seasons.StartDate, LeagueEvents.Event.StartDate, and LeagueContacts.User.SortOrder, etc.
I'm already very concerned about the overall size of the SQL generated by this query and the number of joins, and am thinking that I may need a different data loading approach altogether. (Explicit loading? Multiple query objects? POCO?)
Any input, direction, or advice on how to resolve these remaining needs as well as ensuring the best performance is greatly appreciated.
Your concerns about the size of the query and the size of the result set are warranted.
As @BrokenGlass mentioned, EF doesn't allow filtering or ordering on Includes. If you want to order or filter relations you must use projection, either to an anonymous type or to a custom (non-mapped) type:
var query = db.Leagues
.Where(l => l.URLPart.Equals(leagueName))
.Select(l => new
{
League = l,
Events = l.LeagueEvents.Where(...)
.OrderBy(...)
.Take(3)
.Select(e => e.Event)
...
});
Unfortunately EF doesn't allow you to selectively load related entities through its navigation properties; it will always load all Foos if you specify Include("Foo").
You will have to do a join on each of the related entities using your Where() clauses as filters where they apply.
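As a sketch of what that looks like for one of the relations in the question, using the navigation properties implied by the Include paths and hoisting DateTime.Now into a local so it translates cleanly:
var now = DateTime.Now;
var upcomingEvents =
    (from l in db.Leagues
     where l.URLPart.Equals(leagueName)
     from le in l.LeagueEvents                 // navigate instead of Include("LeagueEvents")
     where le.Event.StartDate >= now           // the "WHERE StartDate >= getdate()" filter
     orderby le.Event.StartDate
     select le.Event)
    .Take(3)                                   // the "TOP 3"
    .ToList();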