Will this NHibernate query impact performance? (C#)

I am creating a website in ASP.NET MVC and use NHibernate as the ORM. I have the following tables in my database:
Bookmarks
TagsBookmarks (junction table)
Tags
Mapping:
public BookmarkMap()
{
Table("Bookmarks");
Id(x => x.Id).Column("Id").GeneratedBy.Identity();
Map(x => x.Title);
Map(x => x.Link);
Map(x => x.DateCreated);
Map(x => x.DateModified);
References(x => x.User, "UserId");
HasManyToMany(x => x.Tags).AsSet().Cascade.None().Table("TagsBookmarks").ParentKeyColumn("BookmarkId")
.ChildKeyColumn("TagId");
}
public TagMap()
{
Table("Tags");
Id(x => x.Id).Column("Id").GeneratedBy.Identity();
Map(x => x.Title);
Map(x => x.Description);
Map(x => x.DateCreated);
Map(x => x.DateModified);
References(x => x.User, "UserId");
HasManyToMany(x => x.Bookmarks).AsSet().Cascade.None().Inverse().Table("TagsBookmarks").ParentKeyColumn("TagId")
.ChildKeyColumn("BookmarkId");
}
I need the data from both the Bookmarks and Tags tables. More specifically: I need 20 bookmarks with their related tags. The first thing I do is select 20 bookmark ids from the Bookmarks table. I do this because paging doesn't work well on the Cartesian product that the second query returns.
First query:
IEnumerable<int> bookmarkIds = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>()
                                where b.User.Username == username
                                orderby b.DateCreated descending
                                select b.Id).Skip((page - 1) * pageSize).Take(pageSize).ToList();
After that I select the bookmarks for these ids.
Second query:
IEnumerable<Bookmark> bookmarks = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>().Fetch(t => t.Tags)
                                   where b.User.Username == username && bookmarkIds.Contains(b.Id)
                                   orderby b.DateCreated descending
                                   select b);
The reason I use Fetch is that I want to avoid N+1 queries. This works, but it results in a Cartesian product. I have read in some posts that you should avoid Cartesian products, but I don't really know how to do that in my case.
I have also read something about setting a batch size for the N+1 queries. Is this really faster than this single query?
A user can add at most 5 tags to a bookmark. I select 20 bookmarks per page, so the worst case for this second query is 5 * 20 = 100 rows.
Will this impact performance when I have lots of data in the Bookmarks and Tags tables? Should I do this differently?

This is not a Cartesian product.
~ Figure A ~
Bookmarks -> Tags -> Tag
A Cartesian product is all of the possible combinations of two different sets. For example, suppose we had three tables: Customer, CustomerAddress, and CustomerEmail. Customers have many addresses, and they also have many email addresses.
~ Figure B ~
Customers -> Addresses
-> Emails
If you wrote a query like...
select *
from
Customer c
left outer join CustomerAddress a
on c.Id = a.Customer_id
left outer join CustomerEmail e
on c.Id = e.Customer_id
where c.Id = 12345
... and this customer had 5 addresses and 5 email addresses, you would wind up with 5 * 5 = 25 rows returned. You can see why this would be bad for performance. It is unnecessary data. Knowing every possible combination of Address and Email Address for a customer tells us nothing useful.
With your query, you are not returning any unnecessary rows. Every row in the result set corresponds directly to a row in one of the tables you're interested in, and vice-versa. There is no multiplication. Instead you have TagsBookmarksCount + BookmarksThatDontHaveTagsCount.
The key place to look for Cartesian products is when your query branches off into two separate unrelated collections. If you're just digging deeper and deeper into a single chain of child collections, as in Figure A, there is no Cartesian product. The number of rows your query returns will be limited by the number of rows returned by that deepest collection. As soon as you branch off to the side so that you now have two parallel, side-by-side collections in the query, as in Figure B, then you have a Cartesian product, and results will be unnecessarily multiplied.
To fix a Cartesian product, split the query into multiple queries so the number of rows are added, not multiplied. With NHibernate's Future methods, you can batch those separate queries together, so you still only have one round trip to the database. See one of my other answers for an example of how to fix a Cartesian product in NHibernate.
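As an illustration (not the linked answer), here is a minimal sketch using the hypothetical Customer model from Figure B; the entity and property names are assumptions:
var session = SessionFactory.GetCurrentSession();

// Each future fetches exactly one collection, so row counts add (5 + 5)
// instead of multiplying (5 * 5). ToFuture() defers execution.
var withAddresses = session.Query<Customer>()
    .Where(c => c.Id == customerId)
    .FetchMany(c => c.Addresses)
    .ToFuture();

var withEmails = session.Query<Customer>()
    .Where(c => c.Id == customerId)
    .FetchMany(c => c.Emails)
    .ToFuture();

// Enumerating the first future sends both statements in a single batch;
// afterwards both collections on the customer are initialized.
var customer = withAddresses.ToList().Single();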

Query<>.Fetch() is intended to ensure that eager loading takes place, and when it's a collection relationship, as this appears to be (i.e. Bookmark.Tags is a collection), the two ways you are going about this are roughly equivalent. If Tags is lazy-loaded and only accessed rarely, then leaving it non-fetched (as in your first query) may be the best way to go, because you will not always be touching the Tags. This depends on the use case.
If, on the other hand, you know that you will always be getting all the tags, it may make more sense to break this off into another query, this time against whatever the Tags type/table is, and look them up yourself instead of using the NHibernate relations to do the job.
If Tag has a foreign key to Bookmarks, like BookmarkId, ToLookup can be useful in this case:
var tagLookup = (from t in SessionFactory.GetCurrentSession().Query<Tag>()
                 // limit the query appropriately for all the bookmarks you need;
                 // in this optimization it should be run only once
                 select new { key = t.BookmarkId, value = t })
                .ToLookup(x => x.key, x => x.value);
That will give you a lookup (ILookup<int, Tag>) on which you can do something like:
IEnumerable<Tag> thisBookmarksTags = tagLookup[bookmarkId];
Which will give you the tags you need for that bookmark. This separates it out into another query, thereby avoiding N+1.
This makes quite a few assumptions about your data model and mappings, but I hope it illustrates a pretty straightforward optimization that you can use.
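For the asker's actual model, Tag has no BookmarkId (the link goes through the TagsBookmarks junction), so a hedged variant of the same idea is to project the pairs through the mapped collection and build the lookup client-side. This is a sketch only, assuming NHibernate's LINQ provider translates the SelectMany into a join:
// One query returns (bookmark id, tag) pairs; ToLookup groups them in memory.
var tagLookup = SessionFactory.GetCurrentSession().Query<Bookmark>()
    .Where(b => bookmarkIds.Contains(b.Id))
    .SelectMany(b => b.Tags, (b, t) => new { BookmarkId = b.Id, Tag = t })
    .ToLookup(x => x.BookmarkId, x => x.Tag);

foreach (var bookmark in bookmarks)
{
    IEnumerable<Tag> tags = tagLookup[bookmark.Id];   // empty for untagged bookmarks
}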

Related

How to query a many to many relationship with an array of ID's in Entity Framework

Say there are stores, customers, and credit accounts: customers and stores have a many-to-many relationship through the credit account table.
If I want every customer that has an account at a given store, I can access them as follows.
var customers = dbContext.Customers
.Where(c => c.CreditAccounts.Select(a => a.StoreID).Contains(storeID));
That seems to work but when I want to search by multiple storeIds, I get an empty result set. Here are some things I've tried.
var customers = dbContext.Customers
.Where(c => c.CreditAccounts.Any(a => storeIDs.Contains(a.StoreID)));
And
var customers = dbContext.Customers
.Where(c => c.CreditAccounts.Select(a => a.StoreID).Intersect(storeIDs).Count() > 0);
These always give me empty results. How can I achieve this without writing a raw SQL query?
Update
I did not have the data that I thought I had. Once that was corrected, my queries began to work. I think the question is still valid though because it's not clear whether any of these perform well, so other users might wish to respond with the most efficient way to retrieve the results.
The following LINQ query will return the results that you are looking for; however, I am not certain that it will perform well at all:
var results = dbContext.Customers
    .Where(c => c.CreditAccounts
        .Join(storeIds.ToList(), a => a.StoreId, s => s, (a, s) => s)
        .Any());
You may get better performance by finding the credit accounts that intersect with the store IDs, and then joining that list of credit accounts with the customers table, although I can't tell from the code you have provided whether you have direct access to a list of credit accounts.
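A sketch of that two-step idea, assuming you do have a CreditAccounts set on the context (the column names here are guesses):
// Step 1: ids of customers holding an account at any of the stores.
var customerIds = dbContext.CreditAccounts
    .Where(a => storeIDs.Contains(a.StoreID))
    .Select(a => a.CustomerID)
    .Distinct();

// Step 2: EF composes the id query into a subquery, so this is one SQL statement.
var customers = dbContext.Customers
    .Where(c => customerIds.Contains(c.CustomerID))
    .ToList();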

Optimising Linq to Entities

I have a set of related entities. I'm using LINQ to group a collection of one entity type by a property on a related entity and then doing a sum calculation on a property of another related entity:
Vehicles.GroupBy(v => v.Mechanics.Engine.Size)
.Select(g => g.Sum(s => s.Passengers.Count));
I'm trying to do as much as possible via LINQ to Entities because there is a large number of records in the DB. However, the generated SQL includes 9 select statements and an outer apply, and it takes more than 5 times as long to execute as handwritten SQL that achieves the same result in one select statement.
How do I improve the generated sql?
You're in fact counting the number of passengers per engine size. So, the navigation properties permitting, you could also do:
Passengers.GroupBy(p => p.Vehicle.Mechanics.Engine.Size)
.Select(g => g.Count())
This will probably generate more joins and fewer subqueries, and only one aggregating statement instead of two in the original query, of which one (Count) is repeated for each size.
Perhaps try the query like this:
Vehicles
    .Select(x => new
    {
        EngineSize = x.Mechanics.Engine.Size,
        PassengersCount = x.Passengers.Count,
    })
    .ToArray()
    .GroupBy(v => v.EngineSize)
    .Select(g => g.Sum(s => s.PassengersCount));
This will execute in a single query, but may pull back too much data to make it faster. It's worth timing and profiling to see which is better.
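A rough way to do that timing, assuming Vehicles is the queryable set from the context:
// Discard the first (cold) run so query compilation doesn't skew the numbers;
// a SQL profiler gives the fuller picture of what each variant sends.
var sw = System.Diagnostics.Stopwatch.StartNew();
var totals = Vehicles
    .GroupBy(v => v.Mechanics.Engine.Size)
    .Select(g => g.Sum(s => s.Passengers.Count))
    .ToList();
sw.Stop();
Console.WriteLine("Server-side grouping: {0} ms", sw.ElapsedMilliseconds);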
You could also consider a hybrid approach whereby you bypass LINQ query generation yet use EF to project results into strong types like this:
public List<Vehicle> GetVehicleInformation(string vehicleType)
{
    var queryString = Resources.Queries.AllVehicles;
    var parms = new List<SqlParameter>();
    parms.Add(new SqlParameter("VehicleType", vehicleType));
    try
    {
        using (var db = new MyEntities())
        {
            var stuff = db.Database.SqlQuery<Vehicle>(queryString, parms.ToArray());
            return stuff.ToList();
        }
    }
    catch (Exception iox)
    {
        Log.ErrorMessage(iox);
        return new List<Vehicle>();  // or rethrow, depending on your error policy
    }
}
The idea is that the group by is done at the DB layer, which gives you more control than in LINQ. You get the speed of direct SQL queries but still get back strongly typed results. The query string itself is stored in a resources file as a string with parameter placeholders, like this:
Select * from Table Where FieldName = @VehicleType...

Nested Select Statements in LINQ to Entity Without Unnecessary Loads

I want to perform a nested select statement in LINQ of the form:
select *
from table_a
where w in (select w
from table_b
where x in (select x
from table_c
where z in (select z
from table_d
where z = z)))
The problem is, the only way I can figure out how to do that is by loading the results from table_b and table_c, which adds an unnecessary expense. For example, say I am attempting to load all of a customer's orderdetaildetails. The following code will load ALL of MyCustomer's orders and ALL of each order's orderdetails and, only then, all of each orderdetail's orderdetaildetails:
customer MyCustomer; //Entity customer already loaded.
var query = MyCustomer.orders.SelectMany(order => order.orderdetails).SelectMany(od => od.orderdetaildetails);
Another approach is to use the .Include function. However, this also loads each level:
var query = MyCustomer.orders.CreateSourceQuery().Include("orderdetails.orderdetaildetails");
Both of these functions load unnecessary data. The first, SelectMany(), actually makes separate roundtrips to the database for each navigation level and then for each returned entity (save entities on the last navigation level). Include() makes one trip to the database and does one giant join statement. This is a little better, but still unseemly.
Is there a way to reach the orderdetaildetails level (from customer) WITHOUT loading orders and orderdetails into memory AND in one trip to the database only?
Thanks guys - Lax
This should get you the orderdetaildetails for a given customer without unnecessary loading.
customer MyCustomer; // Entity customer already loaded
var orderDetailsDetails = context.OrderDetailsDetails
.Where(odd => odd.OrderDetail.Order.Customer.CustomerPK == customer.CustomerPK);
It looks like you have lazy loading enabled, which means that as soon as you access the customer's orders, EF goes off to the database to get them for you. It's the same whenever you access an order's orderdetails. An alternative method similar to what you used would be:
var query = context.Customers.Where(c => c.CustomerPK == customer.CustomerPK)
.SelectMany(c => c.orders)
.SelectMany(order => order.orderdetails)
.SelectMany(od => od.orderdetaildetails);

preloading a tree / hierarchy from database using NHibernate / C#

I need to populate some tree hierarchies and traverse them in order to build up a category menu. Each category can have more than one parent. The problem is how to do this efficiently and avoid the SELECT N+1 problem.
Currently, it is implemented by using two tables / entities:
Category
--------
ID
Title
CategoryLink
---------
ID
CategoryID
ParentID
Ideally, I would use normal object traversal to go through the nodes, i.e. by going through Category.ChildCategories, etc. Can this be done in one SQL statement? And can it be done in NHibernate?
Specify a batch-size on the Category.ChildCategories mapping. That will cause NHibernate to fetch children in batches of the specified size, not one at a time (which will alleviate the N+1 problem).
If you are using .hbm files, you can specify the batch-size like this:
<bag name="ChildCategories" batch-size="30">
or using fluent mapping
HasMany(x => x.ChildCategories).KeyColumn("ParentId").BatchSize(30);
See the NHibernate documentation for more info.
EDIT
Ok, I believe I understand your requirements. With the following configuration
HasManyToMany<Category>(x => x.ChildCategories)
    .Table("CategoryLink")
    .ParentKeyColumn("ParentID")
    .ChildKeyColumn("CategoryID")
    .BatchSize(100)
    .Not.LazyLoad()
    .Fetch.Join();
you should be able to get the entire hierarchy in one call using the following line.
var result = session.CreateCriteria(typeof(Category)).List();
For some reason, retrieving a single category like this
var categoryId = 1;
var result = session.Get<Category>(categoryId);
results in one call per level in the hierarchy. I believe this should still significantly reduce the number of calls to the database, but I was not able to get the example above to work with a single call to the database.
This would retrieve all the categories with their children:
var result = session.Query<Category>()
.FetchMany(x => x.ChildCategories)
.ToList();
The problem is determining what the root categories are. You could either use a flag, or map the inverse collection (ParentCategories) and do this:
var root = session.Query<Category>()
.FetchMany(x => x.ChildCategories)
.FetchMany(x => x.ParentCategories)
.ToList()
.Where(x => !x.ParentCategories.Any());
All sorting should be done client-side (i.e. after ToList)
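Once the hierarchy is prefetched like this, building the menu is plain object traversal with no further SQL. A minimal sketch, where MenuItem is a hypothetical view type (note that a category with several parents will appear once per parent, and cyclic data would need a visited-set guard):
MenuItem BuildMenu(Category category)
{
    var item = new MenuItem { Title = category.Title, Children = new List<MenuItem>() };
    foreach (var child in category.ChildCategories)   // already loaded, no SQL issued
        item.Children.Add(BuildMenu(child));
    return item;
}

var menu = root.Select(BuildMenu).ToList();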

LINQ to SQL query LoadWith performance

I found this link, which explains my problem and has an answer, but I don't seem to be able to make it work.
Here's what I have for DataLoadOptions:
options.LoadWith<Support>(p => p.PostBase);
options.LoadWith<Support>(p => p.PostMaterial);
options.LoadWith<Support>(p => p.PostPosition);
options.LoadWith<Support>(p => p.PostSize);
options.LoadWith<Support>(p => p.PostType);
options.LoadWith<Support>(p => p.Signs);
options.LoadWith<Support>(p => p.SupportComments);
options.LoadWith<Support>(p => p.SupportInspections);
options.LoadWith<Support>(p => p.SupportPhotos);
options.LoadWith<Sign>(p => p.SignBacking);
options.LoadWith<Sign>(p => p.SignComments);
options.LoadWith<Sign>(p => p.SignCondition);
options.LoadWith<Sign>(p => p.SignDelineator);
options.LoadWith<Sign>(p => p.SignFace);
options.LoadWith<Sign>(p => p.SignIllumination);
options.LoadWith<Sign>(p => p.SignToSignObstructions);
options.LoadWith<Sign>(p => p.UniformTrafficControlCode);
options.LoadWith<SignToSignObstruction>(p => p.SignObstruction);
I think that will give a good explanation of my object graph. I'm trying to query for Support objects that match a certain search criteria (perhaps someone wants supports with post type of blah).
If I try just pulling back all Supports, I get about 2200 Supports and it takes 17k queries.
I attempted the grouping solution mentioned in the other question, but I wonder if either I'm doing it wrong or my situation is just too complex. I removed the search criteria and just tried returning all Supports. This results in about 21k queries and pulls back about 3000 Supports. Here is my query:
var group = from support in roadDataContext.Supports
            join sign in roadDataContext.Signs on support.SupportID equals sign.SupportID
            group sign by sign.Support into signGroup
            select signGroup;
Am I just missing something simple? Thanks.
We made the same mistake with our L2S data layer. Our load options are ridiculous in some cases. It was a hard lesson learned.
This is known as the SELECT N+1 problem: one query for the parent entity, and N for the associated entities being eager-loaded. You'd expect L2S to just be smart enough to get it all in one giant query, but unfortunately this is not the case. It will create one giant query that tells it the IDs of the associations to load, then retrieve those associations one by one.
Perhaps the best work-around is to use projection so your LINQ query returns a new object, rather than an entity. For example:
var fooDtos = from foo in db.Foo
              where foo.Bar == "What a great example"
              select new FooDTO { FooName = foo.Name, FooBar = foo.Bar };
This query returns an IQueryable<FooDTO> instead of IQueryable<Foo>. This has two benefits. First, you're instructing L2S specifically which columns to retrieve, so it doesn't do a SELECT *. Second, you don't need DataLoadOptions anymore, because you can query any table you want and select from any table to generate the DTO.
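For the Supports case above, a hedged variant of the projection approach is to flatten parent and child into pairs in one joined query and regroup client-side; the column names here are guesses based on the load options:
// One joined query; note the inner join drops supports that have no signs.
var pairs = (from support in roadDataContext.Supports
             from sign in support.Signs
             select new { support.SupportID, sign.SignID }).ToList();

// Regroup in memory; no further round trips to the database.
var signsBySupport = pairs.ToLookup(p => p.SupportID, p => p.SignID);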
