LINQ to SQL query LoadWith performance - C#

I found this link, which explains my problem and has an answer, but I don't seem to be able to make it work.
Here's what I have for DataLoadOptions:
options.LoadWith<Support>(p => p.PostBase);
options.LoadWith<Support>(p => p.PostMaterial);
options.LoadWith<Support>(p => p.PostPosition);
options.LoadWith<Support>(p => p.PostSize);
options.LoadWith<Support>(p => p.PostType);
options.LoadWith<Support>(p => p.Signs);
options.LoadWith<Support>(p => p.SupportComments);
options.LoadWith<Support>(p => p.SupportInspections);
options.LoadWith<Support>(p => p.SupportPhotos);
options.LoadWith<Sign>(p => p.SignBacking);
options.LoadWith<Sign>(p => p.SignComments);
options.LoadWith<Sign>(p => p.SignCondition);
options.LoadWith<Sign>(p => p.SignDelineator);
options.LoadWith<Sign>(p => p.SignFace);
options.LoadWith<Sign>(p => p.SignIllumination);
options.LoadWith<Sign>(p => p.SignToSignObstructions);
options.LoadWith<Sign>(p => p.UniformTrafficControlCode);
options.LoadWith<SignToSignObstruction>(p => p.SignObstruction);
I think that will give a good explanation of my object graph. I'm trying to query for Support objects that match a certain search criteria (perhaps someone wants supports with post type of blah).
If I try just pulling back all Supports, I get about 2200 Supports and it takes 17k queries.
I attempted the grouping solution mentioned in the other question, but I wonder if either I'm doing it wrong or my situation is just too complex. I removed the search criteria and just tried returning all Supports. This results in about 21k queries and pulls back about 3000 Supports. Here is my query:
var group = from support in roadDataContext.Supports
            join sign in roadDataContext.Signs on support.SupportID equals sign.SupportID
            group sign by sign.Support into signGroup
            select signGroup;
Am I just missing something simple? Thanks.

We made the same mistake with our L2S data layer. Our load options are ridiculous in some cases. It was a hard lesson learned.
This is known as the SELECT N+1 problem: 1 query for the parent entity, and N for the associated entities being eager-loaded. You'd expect L2S to be smart enough to get it all in one giant query, but unfortunately that is not the case. It creates one query that tells it the IDs of the associations to load, then retrieves those associations one by one.
Perhaps the best work-around is to use projection so your LINQ query returns a new object, rather than an entity. For example:
var fooDtos = from foo in db.Foo
              where foo.Bar == "What a great example"
              select new FooDTO { FooName = foo.Name, FooBar = foo.Bar };
This query returns an IEnumerable<FooDTO> instead of an IQueryable<Foo>. That has two benefits. First, you're telling L2S exactly which columns to retrieve, so it doesn't do a SELECT *. Second, you don't need DataLoadOptions anymore, because you can query any table you want in the query and select from any of them to generate the DTO.
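The projection approach can be sketched against in-memory data. This is a LINQ to Objects sketch with hypothetical stand-in types (`Support`, `Sign`, `SupportDto` here are illustrative, not the asker's actual classes); against a real data context the same shape translates to a single query selecting only the listed columns:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var supports = new List<Support>
{
    new(1, "blah",  new List<Sign> { new(10, "Aluminum"), new(11, "Wood") }),
    new(2, "other", new List<Sign> { new(12, "Steel") }),
};

// Project straight into a DTO instead of returning entities; no
// DataLoadOptions are needed because the query names every column it wants.
var dtos = supports
    .Where(s => s.PostType == "blah")
    .Select(s => new SupportDto(
        s.SupportId,
        s.PostType,
        s.Signs.Select(x => x.Backing).ToList()))
    .ToList();

Console.WriteLine(dtos.Count);                              // 1
Console.WriteLine(string.Join(",", dtos[0].SignBackings));  // Aluminum,Wood

// Hypothetical stand-ins for the entities in the question.
record Sign(int SignId, string Backing);
record Support(int SupportId, string PostType, List<Sign> Signs);
record SupportDto(int SupportId, string PostType, List<string> SignBackings);
```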


Sorting Primary Entities on Properties of Doubly-Nested Entities

I'm using Entity Framework Core with ASP.NET MVC. My business object model consists, in part, of Jobs (the primary entities), each of which contains one or more Projects, and each Project has zero or more Schedules (which link to sets of Departments, but that's not important here). Each Schedule has a StartDate and an EndDate.
Here is a (simplified) class diagram (which reflects the database schema as you would expect):
I want to sort the Jobs list by the earliest StartDateTime value in the Schedule entity. I haven't been able to come up with a LINQ chain that accomplishes this. For the time being, I have cobbled together the functionality I want by using ADO.NET directly in my controller to assemble the Jobs list based on the following SQL statement:
@"SELECT
    Jobs.JobId,
    Jobs.JobName,
    Jobs.JobNumber,
    MIN(ProjectSchedules.StartDateTime) AS StartDateTime
FROM Jobs
INNER JOIN Projects ON Jobs.JobID = Projects.JobID
LEFT JOIN ProjectSchedules ON Projects.ProjectID = ProjectSchedules.ProjectID
GROUP BY Jobs.JobId, Jobs.JobName, Jobs.JobNumber
ORDER BY StartDateTime"
I would prefer to use EF Core properly, rather than to do an end-run around it. What would be the LINQ statements to generate this SQL statement (or an equivalent one)?
You need a query that collects all StartDateTimes of the schedules of each job's projects, takes their lowest value, and then sorts by that value:
context.Jobs
    .OrderBy(j => j.Projects
        .SelectMany(p => p.Schedules)
        .Select(s => s.StartDate)
        .OrderBy(d => d)
        .First())
    .Select(j => new { ... })
As you see, there's not even a Min function in there. The function could be used, but it may perform worse, because it has to evaluate all StartDate values, while the ordering could make use of an ordered index.
Either way, for comparison, this is with Min():
context.Jobs
.OrderBy(j => j.Projects
.SelectMany(p => p.Schedules)
.Select(s => s.StartDate).Min())
First we need to retrieve all the entities. We can do this with the Include statement, and we use ThenInclude to retrieve entities one level further down.
dbcontext.Jobs.Include(j => j.Projects).ThenInclude(p => p.Schedules)
Now that we have all the entities, you can do all the sorting, grouping, or whatever else you wish to do.
To me it sounds like you want to OrderBy on Schedule.StartDateTime.

Optimising Linq to Entities

I have a set of related entities. I'm using linq to group a collection of an entity type by a property on a related entity and then doing a sum calculation on a property of another related entity:
Vehicles.GroupBy(v => v.Mechanics.Engine.Size)
.Select(g => g.Sum(s => s.Passengers.Count));
I'm trying to do as much as possible via LINQ to Entities because there are a large number of records in the db. However, the generated SQL includes 9 SELECT statements and an OUTER APPLY, and it takes more than 5 times as long to execute as handwritten SQL that achieves the same result in one SELECT statement.
How do I improve the generated sql?
You're in fact counting the number of passengers per engine size. So, the navigation properties permitting, you could also do:
Passengers.GroupBy(p => p.Vehicle.Mechanics.Engine.Size)
.Select(g => g.Count())
This will probably generate more joins and fewer subqueries, and only one aggregating statement instead of two in the original query, of which one (Count) is repeated for each size.
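The two groupings can be shown to agree with a LINQ to Objects sketch (the `Vehicle` record is a hypothetical stand-in, with passengers modeled as a child collection):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var vehicles = new List<Vehicle>
{
    new(2.0, new List<string> { "Ann", "Bob" }),
    new(2.0, new List<string> { "Cid" }),
    new(1.6, new List<string> { "Dee", "Eli", "Fay" }),
};

// Original shape: group vehicles by engine size, sum passenger counts.
var fromVehicles = vehicles
    .GroupBy(v => v.EngineSize)
    .ToDictionary(g => g.Key, g => g.Sum(v => v.Passengers.Count));

// Inverted shape: flatten to passengers first, then a plain Count()
// per group. Both give the same totals per engine size.
var fromPassengers = vehicles
    .SelectMany(v => v.Passengers.Select(p => new { v.EngineSize, Passenger = p }))
    .GroupBy(x => x.EngineSize)
    .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine(fromVehicles[2.0]);   // 3
Console.WriteLine(fromPassengers[2.0]); // 3

// Hypothetical stand-in; passengers modeled as a child collection.
record Vehicle(double EngineSize, List<string> Passengers);
```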
Perhaps try the query like this:
Vehicles
    .Select(x => new
    {
        EngineSize = x.Mechanics.Engine.Size,
        PassengersCount = x.Passengers.Count,
    })
    .ToArray()
    .GroupBy(v => v.EngineSize)
    .Select(g => g.Sum(s => s.PassengersCount));
This will execute in a single query, but may pull back too much data to make it faster. It's worth timing and profiling to see which is better.
You could also consider a hybrid approach whereby you bypass LINQ query generation yet use EF to project results into strong types like this:
public List<Vehicles> GetVehicleInformation(string vehicleType)
{
    var queryString = Resources.Queries.AllVehicles;
    var parms = new List<SqlParameter>();
    parms.Add(new SqlParameter("@VehicleType", vehicleType));
    try
    {
        using (var db = new MyEntities())
        {
            var stuff = db.Database.SqlQuery<Vehicles>(queryString, parms.ToArray());
            return stuff.ToList();
        }
    }
    catch (Exception iox)
    {
        Log.ErrorMessage(iox);
        return new List<Vehicles>();
    }
}
The idea is that the GROUP BY is done at the DB layer, which gives you more control than in LINQ. You get the speed of direct SQL queries but get back strongly typed results. The query string itself is stored in a resources file as a string with parameter placeholders like this:
Select * from Table Where FieldName = @VehicleType...

Will this NHibernate query impact performance?

I am creating a website in ASP.NET MVC and use NHibernate as ORM. I have the following tables in my database:
Bookmarks
TagsBookmarks (junction table)
Tags
Mapping:
public BookmarkMap()
{
Table("Bookmarks");
Id(x => x.Id).Column("Id").GeneratedBy.Identity();
Map(x => x.Title);
Map(x => x.Link);
Map(x => x.DateCreated);
Map(x => x.DateModified);
References(x => x.User, "UserId");
HasManyToMany(x => x.Tags).AsSet().Cascade.None().Table("TagsBookmarks").ParentKeyColumn("BookmarkId")
.ChildKeyColumn("TagId");
}
public TagMap()
{
Table("Tags");
Id(x => x.Id).Column("Id").GeneratedBy.Identity();
Map(x => x.Title);
Map(x => x.Description);
Map(x => x.DateCreated);
Map(x => x.DateModified);
References(x => x.User, "UserId");
HasManyToMany(x => x.Bookmarks).AsSet().Cascade.None().Inverse().Table("TagsBookmarks").ParentKeyColumn("TagId")
.ChildKeyColumn("BookmarkId");
}
I need the data from both the Bookmarks and Tags tables. More specifically, I need 20 bookmarks with their related tags. The first thing I do is select 20 bookmark ids from the Bookmarks table. I do this because paging doesn't work well on the cartesian product that I get in the second query.
First query:
IEnumerable<int> bookmarkIds = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>()
where b.User.Username == username
orderby b.DateCreated descending
select b.Id).Skip((page - 1) * pageSize).Take(pageSize).ToList<int>();
After that I select the bookmarks for these ids.
Second query:
IEnumerable<Bookmark> bookmarks = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>().Fetch(t => t.Tags)
where b.User.Username == username && bookmarkIds.Contains(b.Id)
orderby b.DateCreated descending
select b);
The reason I use fetch is because I want to avoid N+1 queries. This works but results in a cartesian product. I have read in some posts that you should avoid cartesian products, but I don't really know how to do this in my case.
I have also read something about setting a batch size for the N+1 queries. Is this really faster than this single query?
A user can add at most 5 tags to a bookmark. I select 20 bookmarks per page, so the worst-case scenario for this second query is 5 * 20 = 100 rows.
Will this impact performance when I have lots of data in the Bookmarks and Tags tables? Should I do this differently?
This is not a Cartesian product.
~ Figure A ~
Bookmarks -> Tags -> Tag
A Cartesian product is all of the possible combinations of two different sets. For example, suppose we had three tables: Customer, CustomerAddress, and CustomerEmail. Customers have many addresses, and they also have many email addresses.
~ Figure B ~
Customers -> Addresses
-> Emails
If you wrote a query like...
select *
from
Customer c
left outer join CustomerAddress a
on c.Id = a.Customer_id
left outer join CustomerEmail e
on c.Id = e.Customer_id
where c.Id = 12345
... and this customer had 5 addresses and 5 email addresses, you would wind up with 5 * 5 = 25 rows returned. You can see why this would be bad for performance. It is unnecessary data. Knowing every possible combination of Address and Email Address for a customer tells us nothing useful.
With your query, you are not returning any unnecessary rows. Every row in the result set corresponds directly to a row in one of the tables you're interested in, and vice-versa. There is no multiplication. Instead you have TagsBookmarksCount + BookmarksThatDontHaveTagsCount.
The key place to look for Cartesian products is when your query branches off into two separate unrelated collections. If you're just digging deeper and deeper into a single chain of child collections, as in Figure A, there is no Cartesian product. The number of rows your query returns will be limited by the number of rows returned by that deepest collection. As soon as you branch off to the side so that you now have two parallel, side-by-side collections in the query, as in Figure B, then you have a Cartesian product, and results will be unnecessarily multiplied.
To fix a Cartesian product, split the query into multiple queries so the number of rows are added, not multiplied. With NHibernate's Future methods, you can batch those separate queries together, so you still only have one round trip to the database. See one of my other answers for an example of how to fix a Cartesian product in NHibernate.
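The multiplication from Figure B versus the addition after the split can be demonstrated with plain LINQ to Objects (the lists here are illustrative placeholders for the child tables):

```csharp
using System;
using System.Linq;

// Five addresses and five email addresses for one customer, as in Figure B.
var addresses = Enumerable.Range(1, 5).Select(i => $"addr{i}").ToList();
var emails    = Enumerable.Range(1, 5).Select(i => $"email{i}").ToList();

// One query joining both unrelated child collections: every combination
// of address and email comes back, multiplying the rows.
var oneQuery = (from a in addresses
                from e in emails
                select (a, e)).ToList();
Console.WriteLine(oneQuery.Count); // 25

// Two separate queries: the row counts add instead of multiply.
var twoQueries = addresses.Count + emails.Count;
Console.WriteLine(twoQueries); // 10
```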
Query<>.Fetch() is intended to ensure that eager loading takes place, and when it's a one-to-many relationship, as this appears to be (i.e. if Bookmark.Tags is a collection), the two ways you are going about this are roughly equivalent. If Tags is lazy-loaded and only rarely accessed, then leaving it non-fetched (as in your first query) may be the best way to go, because you will not always be accessing the Tags much. This depends on the use case.
If, on the other hand, you know that you will always be getting all the tags, it may make more sense to break this off into another query, this time on whatever the Tags type/table is, and look them up instead of using the NHibernate relations to do the job.
If Tag has a foreign key to Bookmarks, like BookmarkId, ToLookup can be useful in this case:
var tagLookup = (from t in SessionFactory.GetCurrentSession().Query<Tag>()
                 // limit the query appropriately for all the bookmarks you need;
                 // this should be done once, in this optimization
                 select new { key = t.BookmarkId, value = t })
                .ToLookup(x => x.key, x => x.value);
Will give you a lookup (ILookup<int, Tag>) where you can do something like:
IEnumerable<Tag> thisBookmarksTags = tagLookup[bookmarkId];
Which will give you the tags you need for that bookmark. This separates it out into another query, thereby avoiding N+1.
This is making quite a few assumptions about your data model and the mappings, but I hope it illustrates a pretty straightforward optimization that you can use.
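The ToLookup step can be sketched on its own with in-memory data; this assumes, as the answer does, a hypothetical `BookmarkId` foreign key on `Tag`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tags fetched in one query; BookmarkId is the assumed foreign key.
var tags = new List<Tag>
{
    new(1, "linq"),
    new(1, "sql"),
    new(2, "nhibernate"),
};

// Build the lookup once, then index it per bookmark with no further queries.
ILookup<int, string> tagLookup = tags.ToLookup(t => t.BookmarkId, t => t.Title);

Console.WriteLine(string.Join(",", tagLookup[1])); // linq,sql
// Unlike a Dictionary, an unknown key yields an empty sequence, not an exception.
Console.WriteLine(tagLookup[99].Count());          // 0

record Tag(int BookmarkId, string Title);
```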

How does a LINQ expression know that Where() comes before Select()?

I'm trying to create a LINQ provider. I'm using the guide LINQ: Building an IQueryable provider series, and I have added the code up to LINQ: Building an IQueryable Provider - Part IV.
I am getting a feel of how it is working and the idea behind it. Now I'm stuck on a problem, which isn't a code problem but more about the understanding.
I'm firing off this statement:
QueryProvider provider = new DbQueryProvider();
Query<Customer> customers = new Query<Customer>(provider);
int i = 3;
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name}).Where(p => p.Id == 2 | p.Id == i).ToList();
Somehow the code, or expression, knows that the Where comes before the Select. But how and where?
There is nothing in the code that sorts the expression; in fact, ToString() in debug mode shows that the Select comes before the Where.
I was trying to make the code fail. Normally I write the Where first and then the Select.
So how does the expression get sorted? I have not made any changes to the code in the guide.
The expressions are "interpreted", "translated" or "executed" in the order you write them, so the Where does not come before the Select.
If you execute:
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name})
.Where(p => p.Id == 2 | p.Id == i).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the anonymous type.
If you execute:
var newLinqCustomer = customers.Where(p => p.Id == 2 | p.Id == i)
.Select(c => new { c.Id, c.Name}).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the customer type.
The only thing I can think of is that maybe you're seeing some generated SQL where the SELECT and WHERE have been reordered? In which case I'd guess that there's an optimisation step somewhere in the (e.g.) LINQ to SQL provider that takes SELECT Id, Name FROM (SELECT Id, Name FROM Customer WHERE Id=2 OR Id=@i) and converts it to SELECT Id, Name FROM Customer WHERE Id=2 OR Id=@i - but this must be a provider-specific optimisation.
No, in the general case (such as LINQ to Objects) the Select will be executed before the Where. Think of it as a pipeline: your first step is a transformation, the second a filter. Not the other way round, as would be the case if you wrote Where...Select.
Now, a LINQ provider has the freedom to walk the expression tree and optimize it as it sees fit, though it may not change the semantics of the expression. This means that a smart LINQ to SQL provider would try to pull as many where clauses as it can into the SQL query to reduce the amount of data travelling over the network. However, keep the example from Stuart in mind: not all query providers are clever, partly because ruling out side effects from query reordering is not as easy as it seems.
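The pipeline ordering is easy to observe with LINQ to Objects (the `Customer` record here is a hypothetical stand-in for the question's entity):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var customers = new List<Customer>
{
    new(1, "Ann"), new(2, "Bob"), new(3, "Cid"),
};
int i = 3;

// Select first: the Where filter receives the anonymous projection,
// so it can only see Id and Name, and it runs after the transformation.
var selectThenWhere = customers
    .Select(c => new { c.Id, c.Name })
    .Where(p => p.Id == 2 || p.Id == i)
    .ToList();

// Where first: the filter sees the full Customer entity. For this query
// the results happen to be identical; only the pipeline order differs.
var whereThenSelect = customers
    .Where(p => p.Id == 2 || p.Id == i)
    .Select(c => new { c.Id, c.Name })
    .ToList();

Console.WriteLine(selectThenWhere.Count); // 2
Console.WriteLine(whereThenSelect.Count); // 2

record Customer(int Id, string Name);
```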

slightly complex many-to-many linq query got me stuck

So I'm on LINQ to Entities with an ASP.NET MVC project.
I always get a little stumped with this sort of query.
My schema is:
ProductTag
+TagName
+<<ProductNames>>//Many-to-many relationship
ProductName
+FullName
+<<Tag>>//Many-to-many relationship
PurchaseRecord
+Amount
+<<ProductName>>//one productname can be linked to Many purchase records.
I need to get the sum of all purchases for a given tag.
This is what I've tried.
ProductTag thetag//could be some tag
decimal total = myentities.PurchaseRecords
.Where(x => thetag.ProductNames.Any
(a => a.FullName == x.ProductName.FullName))
.Sum(s => s.Amount);
I've tried changing a couple of things, tried using Contains, but I know I'm fundamentally wrong somewhere.
I keep getting :
Unable to create a constant value of type 'ProductName'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.
Update
So with @Alexandre Brisebois's help below it worked simply like this:
var total= item.ProductNames.SelectMany(x => x.PurchaseRecords)
.Sum(s => s.Amount);
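The working SelectMany-then-Sum shape can be checked with in-memory data (the record types below are hypothetical stand-ins for the question's entities):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var item = new ProductTag("sale", new List<ProductName>
{
    new("Widget", new List<PurchaseRecord> { new(10m), new(5m) }),
    new("Gadget", new List<PurchaseRecord> { new(2.5m) }),
});

// Flatten every product's purchase records under the tag, then sum.
decimal total = item.ProductNames
    .SelectMany(x => x.PurchaseRecords)
    .Sum(s => s.Amount);

Console.WriteLine(total); // 17.5

// Hypothetical stand-ins for the question's entities.
record PurchaseRecord(decimal Amount);
record ProductName(string FullName, List<PurchaseRecord> PurchaseRecords);
record ProductTag(string TagName, List<ProductName> ProductNames);
```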
When you get this sort of error, you need to do all evaluations outside of the LINQ query and pass the values in as variables.
The problem with your query is that thetag.ProductNames.Any() is out of context.
That evaluation is not converted to SQL, since it is not a string, Guid, or int.
You will need to query for this object within your query and evaluate from this object.
I'm not sure if that was clear.
You would need to do something like
var query1 = (from x in tags where x.tagID == id select x.ProductNames)
             .SelectMany(...)
The SelectMany is because you are selecting a collection (ProductNames) and need to bring it back as a flat collection, so that you can do an .Any() on it in the next query.
Then use this and do a query1.Any(logic):
decimal total = myentities.PurchaseRecords
    .Where(x => query1.Any(a => a.FullName == x.ProductName.FullName))
    .Sum(s => s.Amount);
By doing this you will stay in linq to entity and not convert to linq to objects.
The ForEach is not an option since this will iterate over the collection.
You can use the AsEnumerable method to perform certain portions of the query in C# rather than on SQL Server. This is usually required when you have part of the data in memory (a collection of objects), so using it in the query is not easy; you have to perform part of the query execution on the .NET side. For your problem, try:
decimal total = myentities.PurchaseRecords.AsEnumerable()
    .Where(x => thetag.ProductNames.Any(a => a.FullName == x.ProductName.FullName))
    .Sum(s => s.Amount);
Visit this link to find out more about AsEnumerable.
