Entity framework the opposite of include - c#

We are working on a Project in which there are many linq queries are not optimized, because as they started on the project they used the property virtual for all of their models.
My task is to optimize the max number of the queries, in order to enhance the app performance.
The problem is if I use the Include function and delete all virtual properties from the model, lot of things stop working and the number of affected functions is huge.
So I thought if I can find some thing resemble to "exclude" to exclude the unnecessary sub queries in some cases.

(with the assumption of your result set implements ienumerable)
My first choice would be:
ListMain.Except(ItemsToExclude);
Or, I would go with (not) "Contains" as follows and have check in-between to exclude the records. This may not be the best way out there but I could work.
!ListMain.Contains(ItemsToExclude)

I don't know if I got the question right, but to avoid loading certain attributes or related objects, you can make an additional Select() including only what's needed.
Example:
A simple ToList() will bring the entire object from the table:
var resultList = await dbContext.ABTests.AsNoTracking().ToListAsync();
It will derives in the query:
SELECT [a].[Id], [a].[AssignedUsers], [a].[EndDate], [a].[Groups], [a].[Json], [a].[MaxUsers], [a].[Name], [a].[NextGroup], [a].[StartDate]
FROM [ABTests] AS [a]
(which includes all mapped fields of the ABTest object)
To avoid fetching all, you can do as follow:
var resultList = await dbContext.ABTests.AsNoTracking().Select(x => new ABTest
{
Id = x.Id,
Name = x.Name
}).ToListAsync();
(supposing that you only wan the fields Id and Name to be eagerly loaded)
The resulting SQL Query will be:
SELECT [a].[Id], [a].[Name]
FROM [ABTests] AS [a]

Related

LINQ Grouping a List of Objects into Anonymous Type

I am having difficulty trying to use LINQ to query a sql database in such a way to group all objects (b) in one table associated with an object (a) in another table into an anonymous type with both (a) and a list of (b)s. Essentially, I have a database with a table of offers, and another table with histories of actions taken related to those offers. What I'd like to be able to do is group them in such a way that I have a list of an anonymous type that contains every offer, and a list of every action taken on that offer, so the signature would be:
List<'a>
where 'a is new { Offer offer, List<OfferHistories> offerHistories}
Here is what I tried initially, which obviously will not work
var query = (from offer in context.Offers
join offerHistory in context.OffersHistories on offer.TransactionId equals offerHistory.TransactionId
group offerHistory by offerHistory.TransactionId into offerHistories
select { offer, offerHistories.ToList() }).ToList();
Normally I wouldn't come to SE with this little information but I have tried many different ways and am at a loss for how to proceed.
Please try to avoid .ToList() calls, only do if really necessary. I have an important question: Do you really need all columns of OffersHistories? Because it is very expensive grouping a full object, try only grouping the necessary columns instead. If you really need all offerHistories for one offer then I'm suggesting to write a sub select (this is also cost more performance):
var query = (from offer in context.Offers
select new { offer, offerHistories = (from offerHistory in context.OffersHistories
where offerHistory.TransactionId == offer.TransactionId
select offerHistory) });
P.s.: it's a good idea to create indexes for foreign key columns, columns that are used in where and group by statements, those are going to make the query faster,

Linq query timing out, how to streamline query

Our front end UI has a filtering system that, in the back end, operates over millions of rows. It uses a an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has has two types of data in it, and the checked items need to be ORd together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable.Where(
cd =>
filter.CarIds.Contains(cd.CarId) && //get driver IDs for each car ID listed in filter object
).Select(cd => cd.DriverId).Distinct().ToList();
driverIds = driverIds.Concat(
db.DriverShopManyToManyTable.Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
.Select(ds => ds.DriverId)
.Distinct()).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. All they require to keep the parts of the query of type IQueryable<T>, i.e. do not use ToList, ToArray, AsEnumerable etc. methods that force them to be executed and evaluated in memory.
One way is to create Union query containing the filtered Ids (which will be unique by definition) and use join operator to apply it on the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be using two OR-ed Any based conditions, which would translate to EXISTS(...) OR EXISTS(...) SQL query filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination. .Skip(PageNum * PageSize).Take(PageSize) I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever other smaller number seems reasonable to you.
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to perform indexing on all of the "key" columns that you are joining on. The best way to figure out what indexes you need to improve performance, take the SQL query generated by your EF linq and bring it on over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your #p parameters just to make an example. Once you've done this, right click on the query and either use display estimated execution plan or include actual execution plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me that using the instance versions of the LINQ extensions is creating several collections before you're done. using the from statement versions should cut that down quite a bit:
driveIds = (from var record in db.CarDriversManyToManyTable
where filter.CarIds.Contains(record.CarId)
select record.DriverId).Concat
(from var record in db.DriverShopManyToManyTable
where filter.ShopIds.Contains(record.ShopId)
select record.DriverId).Distinct()
Also using the groupby extension would give better performance than querying each driver Id.

Entity Framework COUNT is doing a SELECT of all records

Profiling my code because it is taking a long time to execute, it is generating a SELECT instead of a COUNT and as there are 20,000 records it is very very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.
So you have 2 completely separate queries going on here and I think I can explain why you get different results. Let's look at the first one
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account and assuming you are leveraging the frameworks ability to lazily load foreign tables then your first call to catAccounts.Cats is going to result in a SELECT for all the associated Cat records for that particular account. This results in the table being brought into memory therefore the call to Count() would result in an internal check of the Count property of the in-memory collection (hence no COUNT SQL generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
Is directly against the Cats table (which would be IQueryable<T>) therefore the only operations performed against the table are Where/Count, and both of these will be evaluated on the DB-side before execution so it's obviously a lot more efficient than the first.
However, if you need both Account and Cats then I would recommend you eager load the data on the fetch, that way you take the hit upfront once
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);
Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local in-memory collection. The problem is that you don't want that. Now you have two options:
check whether your provider offer some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

Entity Framework include vs where

My database structure is this: an OptiUser belongs to multiple UserGroups through the IdentityMap table, which is a matching table (many to many) with some additional properties attached to it. Each UserGroup has multiple OptiDashboards.
I have a GUID string which identifies a particular user (wlid in this code). I want to get an IEnumerable of all of the OptiDashboards for the user identified by wlid.
Which of these two Linq-to-Entities queries is the most efficient? Do they run the same way on the back-end?
Also, can I shorten option 2's Include statements to just .Include("IdentityMaps.UserGroup.OptiDashboards")?
using (OptiEntities db = new OptiEntities())
{
// option 1
IEnumerable<OptiDashboard> dashboards = db.OptiDashboards
.Where(d => d.UserGroups
.Any(u => u.IdentityMaps
.Any(i => i.OptiUser.WinLiveIDToken == wlid)));
// option 2
OptiUser user = db.OptiUsers
.Include("IdentityMaps")
.Include("IdentityMaps.UserGroup")
.Include("IdentityMaps.UserGroup.OptiDashboards")
.Where(r => r.WinLiveIDToken == wlid).FirstOrDefault();
// then I would get the dashboards through user.IdentityMaps.UserGroup.OptiDashboards
// (through foreach loops...)
}
You may be misunderstanding what the Include function actually does. Option 1 is purely a query syntax which has no effect on what is returned by the entity framework. Option 2, with the Include function instructs the entity framework to Eagerly Fetch the related rows from the database when returns the results of the query.
So option 1 will result in some joins, but the "select" part of the query will be restricted to the OptiDashboards table.
Option 2 will result in joins as well, but in this case it will be returning the results from all the included tables, which obviously is going to introduce more of a performance hit. But at the same time, the results will include all the related entities you need, avoiding the [possible] need for more round-trips to the database.
I think the Include will render as joins an you will the able to access the data from those tables in you user object (Eager Loading the properties).
The Any query will render as exists and not load the user object with info from the other tables.
For best performance if you don't need the additional info use the Any query
As has already been pointed out, the first option would almost certainly perform better, simply because it would be retrieving less information. Besides that, I wanted to point out that you could also write the query this way:
var dashboards =
from u in db.OptiUsers where u.WinLiveIDToken == wlid
from im in u.IdentityMaps
from d in im.UserGroup.OptiDashboards
select d;
I would expect the above to perform similarly to the first option, but you may (or may not) prefer the above form.

Data Loading Strategy/Syntax in EF4

Long time lurker, first time posting, and newly learning EF4 and MVC3.
I need help making sure I'm using the correct data loading strategy in this case as well as some help finalizing some details of the query. I'm currently using the eager loading approach outlined here for somewhat of a "dashboard" view that requires a small amount of data from about 10 tables (all have FK relationships).
var query = from l in db.Leagues
.Include("Sport")
.Include("LeagueContacts")
.Include("LeagueContacts.User")
.Include("LeagueContacts.User.UserContactDatas")
.Include("LeagueEvents")
.Include("LeagueEvents.Event")
.Include("Seasons")
.Include("Seasons.Divisions")
.Include("Seasons.Divisions.Teams")
.Where(l => l.URLPart.Equals(leagueName))
select (l);
model = (Models.League) query.First();
However, I need to do some additional filtering, sorting, and shaping of the data that I haven't been able to work out. Here are my chief needs/concerns from this point:
Several child objects still need additional filtering but I haven't been able to figure out the syntax or best approach yet. Example: "TOP 3 LeagueEvents.Event WHERE StartDate >= getdate() ORDER BY LeagueEvents.Event.StartDate"
I need to sort some of the fields. Examples: ORDERBY Seasons.StartDate, LeagueEvents.Event.StartDate, and LeagueContacts.User.SortOrder, etc.
I'm already very concerned about the overall size of the SQL generated by this query and the number of joins and am thinking that I may need a different data loading approach alltogether.(Explicit loading? Multiple QueryObjects? POCO?)
Any input, direction, or advice on how to resolve these remaining needs as well as ensuring the best performance is greatly appreciated.
Your concern about size of the query and size of the result set are tangible.
As #BrokenGlass mentioned EF doesn't allow you doing filtering or ordering on includes. If you want to order or filter relations you must use projection either to anonymous type or custom (non mapped) type:
var query = db.Leagues
.Where(l => l.URLPart.Equals(leagueName))
.Select(l => new
{
League = l,
Events = l.LeagueEvents.Where(...)
.OrderBy(...)
.Take(3)
.Select(e => e.Event)
...
});
Unfortunately EF doesn't allow to selectively load related entities using its navigation properties, it will always load all Foos if you specify Include("Foo").
You will have to do a join on each of the related entities using your Where() clauses as filters where they apply.

Categories

Resources