Optimising Linq to Entities - c#

I have a set of related entities. I'm using linq to group a collection of an entity type by a property on a related entity and then doing a sum calculation on a property of another related entity:
Vehicles.GroupBy(v => v.Mechanics.Engine.Size)
.Select(g => g.Sum(s => s.Passengers.Count));
I'm trying to do as much as possible via linq to entities because there is a large number of records in the db. However, the generated sql includes 9 select statements and an outer apply which takes more than 5 times as long to execute as writing the simplified sql code to achieve the same in one select statement.
How do I improve the generated sql?

You're in fact counting the number of passengers per engine size. So, the navigation properties permitting, you could also do:
Passengers.GroupBy(p => p.Vehicle.Mechanics.Engine.Size)
.Select(g => g.Count())
This will probably generate more joins and fewer subqueries, with only one aggregate instead of the two in the original query, one of which (Count) is repeated for each size.

Perhaps try the query like this:
Vehicles
.Select(x => new
{
EngineSize = x.Mechanics.Engine.Size,
PassengersCount = x.Passengers.Count,
})
.ToArray()
.GroupBy(v => v.EngineSize)
.Select(g => g.Sum(s => s.PassengersCount));
This will execute as a single query, but the grouping then happens in memory, so it may pull back so much data that it ends up no faster. It's worth timing and profiling to see which is better.

You could also consider a hybrid approach whereby you bypass LINQ query generation yet use EF to project results into strong types like this:
public List<Vehicles> GetVehicleInformation(string vehicleType)
{
    // The SQL text is stored in a resources file (see below).
    var queryString = Resources.Queries.AllVehicles;
    var parms = new List<SqlParameter>();
    parms.Add(new SqlParameter("VehicleType", vehicleType));
    try
    {
        using (var db = new MyEntities())
        {
            // Assumes MyEntities is an EF6 DbContext; SqlQuery<T> maps columns onto T.
            var stuff = db.Database.SqlQuery<Vehicles>(queryString, parms.ToArray());
            return stuff.ToList();
        }
    }
    catch (Exception iox)
    {
        Log.ErrorMessage(iox);
        throw; // rethrow so that every code path either returns or throws
    }
}
The idea is that the GROUP BY is done at the DB layer, which gives you more control than LINQ does. You get the speed of direct SQL queries but still get back strongly typed results. The query string itself is stored in a resources file as a string with parameter placeholders, like this:
Select * from Table Where FieldName = @VehicleType...

Related

Sorting Primary Entities on Properties of Doubly-Nested Entities

I'm using Entity Framework Core with ASP.NET MVC. My business object model consists, in part, of Jobs (the primary entities), each of which contains one or more Projects, and each Project has zero or more Schedules (which link to sets of Departments, but that's not important here). Each Schedule has a StartDate and an EndDate.
Here is a (simplified) class diagram (which reflects the database schema as you would expect):
I want to sort the Jobs list by the earliest StartDateTime value in the Schedule entity. I haven't been able to come up with a LINQ chain that accomplishes this. For the time being, I have cobbled together the functionality I want by using ADO.NET directly in my controller to assemble the Jobs list based on the following SQL statement:
@"SELECT
Jobs.JobId,
Jobs.JobName,
Jobs.jobNumber,
MIN(ProjectSchedules.StartDateTime) AS StartDateTime
FROM
Jobs INNER JOIN Projects ON Jobs.JobID = Projects.JobID
LEFT JOIN ProjectSchedules ON Projects.ProjectID = ProjectSchedules.ProjectID
GROUP BY Jobs.JobId, Jobs.JobName, Jobs.JobNumber
ORDER BY StartDateTime"
I would prefer to use EF Core properly, rather than to do an end-run around it. What would be the LINQ statements to generate this SQL statement (or an equivalent one)?
You need a query that "collects" all StartDateTimes in schedules of projects per job, takes their lowest value then sorts by that value:
context.Jobs
.OrderBy(j => j.Projects
.SelectMany(p => p.Schedules)
.Select(s => s.StartDate).OrderBy(d => d).First())
.Select(j => new { ... })
As you see, there's not even a Min function in there. The function could be used, but it may perform worse, because it has to evaluate all StartDate values, while the ordering could make use of an ordered index.
Either way, for comparison, this is with Min():
context.Jobs
.OrderBy(j => j.Projects
.SelectMany(p => p.Schedules)
.Select(s => s.StartDate).Min())
First we need to retrieve all the entities. We can do this with Include, and we use ThenInclude to retrieve entities one level further down.
dbcontext.Jobs.Include(j => j.Projects).ThenInclude(p => p.Schedules)
Now that we have all the entities, you can do all the sorting, grouping or whatever else you wish to do.
To me it sounds like you want to OrderBy on Schedule.StartDateTime.

C# LINQ to Entities does not recognize the method TryGetValue

I know there are lots of questions related to this error, but I cannot find a way to convert my query so that it works. My error: 'LINQ to Entities does not recognize the method 'Boolean TryGetValue(Int32, System.Collections.Generic.List`1[System.Nullable`1[System.Int32]] ByRef)' method, and this method cannot be translated into a store expression.'
My mind is melting down!
var groupedKeyAndValueOfProjectIdAndZoneIds = groupedProjectDelegationByProjectId.ToDictionary(keySelector: x => x.ProjectId, elementSelector: x => x.ZoneIds);
...
var data = projects
.Select(p => new Project
{
Id = p.Id,
ProjectName = p.Name,
Zones = p.Zones.Where(z =>
(zoneIds.Contains(z.Id) || (groupedKeyAndValueOfProjectIdAndZoneIds.TryGetValue(p.Id, out outValue) ? outValue.Contains(z.Id) : false)))
...
Given that groupedKeyAndValueOfProjectAndZones is Dictionary<int, List<int>>.
Please help me.
The problem is that you are trying to mix two sources of data. Under the hood, LINQ to Entities takes the expression you write in LINQ and translates it into a SQL query. In other words, everything you write inside the LINQ query has to map onto SQL. When you throw a dictionary such as groupedKeyAndValueOfProjectAndZones into the mix, LINQ to Entities doesn't know how to represent it, because it's an in-memory data source that has no SQL equivalent.
To fix this, you need to either move the data contained in groupedKeyAndValueOfProjectAndZones into the database and query it from there, or apply that part of the filtering in memory after the LINQ to Entities query has run.
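The second option (filtering after the database query) can be sketched as follows. This is a minimal illustration, not the asker's actual model: the tuple array stands in for rows that would really come from the EF context, and the names projectZones, zonesByProject and allowed are made up for the example. AsEnumerable() marks the point after which the remaining operators run as LINQ to Objects, where TryGetValue is allowed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for database rows: (ProjectId, ZoneId) pairs.
// In real code this would be an IQueryable coming from the EF context.
var projectZones = new[]
{
    (ProjectId: 1, ZoneId: 10),
    (ProjectId: 1, ZoneId: 11),
    (ProjectId: 2, ZoneId: 20),
    (ProjectId: 2, ZoneId: 21),
}.AsQueryable();

// In-memory filter data, as in the question.
var zoneIds = new List<int> { 10 };
var zonesByProject = new Dictionary<int, List<int>> { [2] = new List<int> { 21 } };

// Everything before AsEnumerable() is what the provider would translate to SQL.
// Everything after it runs in memory, so the dictionary lookup is fine there.
var allowed = projectZones
    .AsEnumerable()
    .Where(r => zoneIds.Contains(r.ZoneId)
             || (zonesByProject.TryGetValue(r.ProjectId, out var ids) && ids.Contains(r.ZoneId)))
    .ToList();

Console.WriteLine(string.Join(", ", allowed.Select(r => $"{r.ProjectId}:{r.ZoneId}")));
// prints "1:10, 2:21"
```

The trade-off is that every row reaching AsEnumerable() is loaded into memory, so it pays to apply any database-translatable filters before that call.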

Linq query timing out, how to streamline query

Our front end UI has a filtering system that, in the back end, operates over millions of rows. It uses an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has two types of data in it, and the checked items need to be ORed together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable.Where(
cd =>
filter.CarIds.Contains(cd.CarId) //get driver IDs for each car ID listed in filter object
).Select(cd => cd.DriverId).Distinct().ToList();
driverIds = driverIds.Concat(
db.DriverShopManyToManyTable.Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
.Select(ds => ds.DriverId)
.Distinct()).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?
There are several ways to produce a single SQL query. They all require keeping the parts of the query as IQueryable<T>, i.e. not using ToList, ToArray, AsEnumerable or similar methods that force them to be executed and evaluated in memory.
One way is to create a Union query containing the filtered Ids (which will be unique by definition) and use the join operator to apply it to the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be using two OR-ed Any based conditions, which would translate to EXISTS(...) OR EXISTS(...) SQL query filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.
The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination: .Skip(PageNum * PageSize).Take(PageSize). I doubt your users need to see millions of rows at once in the front end. Show them only 100, or whatever other smaller number seems reasonable to you.
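As a rough sketch of the paging idea, assuming a zero-based page number: the integer range below is only a stand-in for the real IQueryable, and the variable names are illustrative. Paging needs a stable ordering to be deterministic, and EF6 rejects Skip on unordered input, so an OrderBy comes first.

```csharp
using System;
using System.Linq;

// Stand-in for the real query; in practice this is the IQueryable built from the filters.
var rows = Enumerable.Range(1, 1000).AsQueryable();

int pageNum = 2;     // zero-based page index
int pageSize = 100;

// Order first, then skip the preceding pages and take one page.
var page = rows
    .OrderBy(id => id)
    .Skip(pageNum * pageSize)
    .Take(pageSize)
    .ToList();

Console.WriteLine($"{page.First()}..{page.Last()} ({page.Count} rows)");
// prints "201..300 (100 rows)"
```

Because Skip and Take are applied before ToList, a database provider can push them into the SQL, so only one page of rows travels back to the application.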
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to index all of the "key" columns that you are joining on. The best way to figure out which indexes you need is to take the SQL query generated by your EF LINQ query and bring it over to SQL Server Management Studio. There, update the generated SQL to provide some predefined values for your @p parameters, just to make an example. Once you've done this, right-click on the query and use either Display Estimated Execution Plan or Include Actual Execution Plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.
It looks to me like the chained calls, with their intermediate ToList(), are creating several collections before you're done. Writing it as a single query expression should cut that down quite a bit:
driverIds = (from record in db.CarDriversManyToManyTable
where filter.CarIds.Contains(record.CarId)
select record.DriverId).Concat(
from record in db.DriverShopManyToManyTable
where filter.ShopIds.Contains(record.ShopId)
select record.DriverId).Distinct();
Also, using the GroupBy extension would give better performance than querying each driver ID.

Using Distinct in LINQ to SQL on the primary key

I have a system that uses tags to categorize content similar to the way Stack Overflow does. I am trying to generate a list of the most recently used tags using LINQ to SQL.
(from x in cm.ContentTags
join t in cm.Tags on x.TagID equals t.TagID
orderby x.ContentItem.Date descending select t)
.Distinct(p => p.TagID) // <-- Ideally I'd be able to do this
The Tags table has a many to many relationship with the ContentItems table. ContentTags joins them, each tuple having a reference to the Tag and ContentItem.
I can't just use distinct because it compares on Tablename.* rather than Tablename.PrimaryKey, and I can't implement an IEqualityComparer since that doesn't translate to SQL, and I don't want to pull potential millions of records from the DB with .ToList(). So, what should I do?
You could write your own query provider that supports such an overloaded Distinct operator. It's not cheap, but it would probably be worthwhile, particularly if you could make its query generation composable. That would enable a lot of customizations.
Otherwise you could create a stored proc or a view.
Use a subselect:
var result = from t in cm.Tags
where (
from x in cm.ContentTags
where x.TagID == t.TagID
select x.TagID
).Contains(t.TagID)
select t;
This means only distinct records are returned from cm.Tags. The only problem is that you will need to find some way to order the result.
Use:
.GroupBy(x => x.TagID)
And then extract the data in:
Select(x => x.First().Example)
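A small in-memory sketch of this GroupBy approach follows. The tuple array stands in for the joined ContentTags and Tags rows, and the names tagged, Name, Date and LastUsed are assumptions made for the illustration. Whether a given provider translates g.First() to SQL depends on the provider and version, so treat this as a shape to test rather than a guaranteed translation.

```csharp
using System;
using System.Linq;

// Hypothetical joined rows: one entry per use of a tag on a content item.
var tagged = new[]
{
    (TagID: 1, Name: "linq", Date: new DateTime(2020, 1, 3)),
    (TagID: 2, Name: "sql",  Date: new DateTime(2020, 1, 2)),
    (TagID: 1, Name: "linq", Date: new DateTime(2020, 1, 1)),
};

// Group on the primary key so each tag appears once, keep the most recent
// use per tag, then order the tags by that date.
var recentTags = tagged
    .GroupBy(x => x.TagID)
    .Select(g => new
    {
        TagID = g.Key,
        Name = g.First().Name,
        LastUsed = g.Max(x => x.Date),
    })
    .OrderByDescending(t => t.LastUsed)
    .ToList();

Console.WriteLine(string.Join(", ", recentTags.Select(t => t.Name)));
// prints "linq, sql"
```

Taking the maximum date per group also addresses the ordering problem, since each distinct tag carries the date it was last used.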

slightly complex many-to-many linq query got me stuck

So I'm on Linq-To-Entities with an asp.net mvc project.
I always get a little stumped with this sort of query.
My schema is:
ProductTag
+TagName
+<<ProductNames>>//Many-to-many relationship
ProductName
+FullName
+<<Tag>>//Many-to-many relationship
PurchaseRecord
+Amount
+<<ProductName>>//one productname can be linked to Many purchase records.
I need to get the sum of all purchases for a given tag.
This is what I've tried.
ProductTag thetag//could be some tag
decimal total = myentities.PurchaseRecords
.Where(x => thetag.ProductNames.Any
(a => a.FullName == x.ProductName.FullName))
.Sum(s => s.Amount);
I've tried changing a couple of things, tried using Contains, but I know I'm fundamentally wrong somewhere.
I keep getting :
Unable to create a constant value of type 'ProductName'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.
Update
So with @Alexandre Brisebois's help below it worked simply like this:
var total= item.ProductNames.SelectMany(x => x.PurchaseRecords)
.Sum(s => s.Amount);
When you get this sort of error, you need to do all evaluations outside of the LINQ query and pass the values in as variables.
The problem with your query is that thetag.ProductNames.Any() is out of context.
This evaluation cannot be converted to SQL, since thetag.ProductNames is a collection of entities rather than a primitive value such as a string, Guid or int.
You will need to query for this object within your query and evaluate from there.
I'm not sure if that was clear.
You would need to do something like:
var query1 = (from x in tags where x.tagID == id select x.ProductNames)
.SelectMany(...)
The SelectMany is needed because you are selecting a collection of ProductNames and need to bring it back as a flat set so that you can do a .Any() on it in the next query.
Then use this and do a query1.Any(logic):
decimal total = myentities.PurchaseRecords.
Where(x => query1.Any
(a => a.FullName == x.ProductName.FullName))
.Sum(s => s.Amount);
By doing this you will stay in linq to entity and not convert to linq to objects.
The ForEach is not an option since this will iterate over the collection.
You can use the AsEnumerable method to perform certain portions of a query in C# rather than on SQL Server. This is usually required when part of the data is in memory (a collection of objects), so using it in the query is not easy and part of the query has to execute on the .NET side. Note that everything after AsEnumerable runs in memory, so all PurchaseRecords rows are loaded first. For your problem, please try:
decimal total = myentities.PurchaseRecords.AsEnumerable()
.Where(x => thetag.ProductNames.Any
(a => a.FullName == x.ProductName.FullName))
.Sum(s => s.Amount);
See the documentation for AsEnumerable to find out more.
