nhibernate force separate queries instead of join - c#

I'm new to nhibernate and I couldn't figure this one out.
I have an entity similiar to below class;
public class MotherCollection
{
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual ISet<Class1> Collection1 { get; set; }
public virtual ISet<Class2> Collection2 { get; set; }
public virtual ISet<Class3> Collection3 { get; set; }
public virtual ISet<Class4> Collection4 { get; set; }
}
There are numerous one to many relationships to other entities.
I configure this relation with below mappings;
HasMany(d => d.Collection1).KeyColumn("McId");
HasMany(d => d.Collection2).KeyColumn("McId");
HasMany(d => d.Collection3).KeyColumn("McId");
HasMany(d => d.Collection4).KeyColumn("McId");
Child classes are configured similiary;
References(c1=>c1.MotherCollection).Column("McId");
and so on.
When I query this entity from db, fetching all relationships, I get a huge query similar to this one :
SELECT * FROM MotherCollection mc
JOIN c1 on mc.Id=c1.mcId
JOIN c2 on mc.Id=c2.mcId
JOIN c3 on mc.Id=c3.mcId
JOIN c4 on mc.Id=c4.mcId
this query causes alot of duplicate rows and takes alot of time to execute.
I want nhibernate to somehow seperate this query to individual SELECT queries, like below
SELECT * FROM MotherCollection Where Id = #Id
SELECT * FROM c1 Where mcId = #Id
and such. A bit similar to how it happens when the collection is lazy loaded.
I managed to achive this behaviour by setting my desired collections as lazy, and accessing their First() property just before it exits my datalayer. However, I'm guessing there must be a more elegant way of doing this in Nhibernate.
I've tried queries similar to this:
var data = session.QueryOver<DataSet>().Fetch(d=>d.Collection1).Eager.Fetch(d=>d.Collection2).Eager....
Thank you.

You should issue 4 separate queries, each one fetching one collection.
And you should use session.Query. QueryOver is an older way of doing it. To use it add using NHibernate.Linq. I usually use the following extension method to prefetch collections:
static public void Prefetch<T>(this IQueryable<T> query)
{
// ReSharper disable once ReturnValueOfPureMethodIsNotUsed
query.AsEnumerable().FirstOrDefault();
}
And then use:
var data = session.Query<DataSet>().Fetch(d=>d.Collection1).ToList();
session.Query<DataSet>().Fetch(d=>d.Collection2).Prefetch();
session.Query<DataSet>().Fetch(d=>d.Collection3).Prefetch();
session.Query<DataSet>().Fetch(d=>d.Collection4).Prefetch();
Make sure to run the 4 queries before accessing the collections. That way when you access them they will all be initialized already. If you use regular lazy loading you will be initializing one collection for one object at a time.

This is called lazy/eager loading. You have two choices to select from:
1. Lazy load with multiple queries:
This will generate multiple queries. While lazy loading, NHibernate first generates the query to get all MotherCollection data and only IDs (without data) from any dependent tables. Then it generates new query with WHERE clause for Primary Key on dependent table. So, this leads to famous N+1 issue.
With this, referenced collections will NOT be filled by default. Those will get filled up when you first access them while ISession is still valid. This is similar to calling First() as you mentioned in your question.
Look at your HasMany configuration; you have not mentioned LazyLoad but it is default. So, with your current mapping, this is what is happening.
This is recommended by NHibernate.
2. Eager load with single complex query:
If you want to avoid multiple queries and retrieve all the data in one go, try something like following:
HasMany(d => d.Collection1).KeyColumn("McId").Inverse().Not.LazyLoad().Fetch.Join();
With this, referenced collections will be filled up (if data present in database) automatically.
Please note that this is against the NHibernate recommendation. Refer this link.
Instead, we keep the default behavior, and override it for a
particular transaction, using left join fetch in HQL. This tells
NHibernate to fetch the association eagerly in the first select, using
an outer join. In the ICriteria query API, you would use
SetFetchMode(FetchMode.Join).
If you ever feel like you wish you could change the fetching strategy
used by Get() or Load(), simply use a ICriteria query, for
example:
User user = (User) session.CreateCriteria<User>()
.SetFetchMode("Permissions", FetchMode.Join)
.Add( Expression.Eq("Id", userId) )
.UniqueResult();
A completely different way to avoid problems with N+1 selects is to
use the second-level cache.
Duplicate rows and Performance
This is actually a different problem. There are multiple ways to handle this; but it will need additional inputs from you. Before that, you should choose one option from above two. Thus this deserves a new question to be asked.
Refer this answer: https://stackoverflow.com/a/30748639/5779732

Related

Entity Framework LINQ SQL Query Performance

Hello everyone I'm working on an API that returns a dish with its restaurant details from a database that has restaurants and their dishes.
I'm wondering if the following makes the query any efficient by converting the first, to second:
from res in _context.Restaurant
join resdish in _context.RestaurantDish
on res.Id equals resdish.RestaurantId
where resdish.RestaurantDishId == dishId
Second:
from resdish in _context.RestaurantDish
where resdish.RestaurantDishId == dishId
join res in _context.Restaurant
on resdish.RestaurantId equals res.Id
The reason why I'm debating this is because I feel like the second version filters to the single restaurant dish, then joining it, rather than joining all dishes then filtering.
Is this correct?
You can use a profiler on your database to capture the SQL in both cases, or inspect the SQL that EF generates and you'll likely find that the SQL in both cases is virtually identical. It boils down to how the reader (developers) interprets the intention of the logic.
As far as building efficient queries in EF goes, EF is an ORM meaning it offers to map between an object-oriented model and a relational data model. It isn't just an API to enable translating Linq to SQL. Part of the power for writing simple and efficient queries is through the use of navigation properties and projection. A Dish will be considered the property of a particular Restaurant, while a Restaurant has many Dishes on its menu. This forms a One-to-Many relationship in the database, and navigation properties can map this relationship in your object model:
public class Restaurant
{
[Key]
public int RestaurantId { get; set; }
// ... other fields
public virtual ICollection<Dish> Dishes { get; set; } = new List<Dish>();
}
public class Dish
{
[Key]
public int DishId { get; set; }
//[ForeignKey(nameof(Restaurant))]
//public int RestaurantId { get; set; }
public virtual Restaurant Restaurant { get; set; }
}
The FK propery for the Restaurant ID is optional and can be configured to use a Shadow Property. (One that EF knows about and generates, but isn't exposed in the Entity) I recommend using shadow properties for FKs mainly to avoid 2 sources of truth for relationships. (dish.RestaurantId and dish.Restaurant.RestaurantId) Changing the FK does not automatically update the relationship unless you reload the entity, and updating the relationship does not automatically update the FK until you call SaveChanges.
Now if you wanted to get a particular dish and it's associated restaurant:
var dish = _context.Dishes
.Include(d => d.Restaurant)
.Single(d => d.DishId == dishId);
This fetches both entities. Note that there is no need now to manually write Joins like you would with SQL. EF supports Join, but it should only be used in very rare cases where a schema isn't properly normalized/relational and you need to map loosely joined entities/tables. (Such as a table using an "OwnerId" that could join to a "This" or a "That" table based on a discriminator such as OwnerType.)
If you leave off the .Include(d => d.Restaurant) and have lazy loading enabled on the DbContext, then EF would attempt to automatically load the Restaurant if and when the first attempt of the code to access dish.Restaurant. This provides a safety net, but can incur some steep performance penalties in many cases, so it should be avoided or treated as a safety net, not a crutch.
Eager loading works well when dealing with single entities and their related data where you will need to do things with those relationships. For instance if I want to load a Restaurant and review, add/remove dishes, or load a Dish and possibly change the Restaurant. However, eager loading can come at a significant cost in how EF and SQL provides that related data behind the scenes.
By default when you use Include, EF will add an INNER or LEFT join between the associated tables. This creates a Cartesian Product between the involved tables. If you have 100 restaurants that have an average of 30 dishes each and select all 100 restaurants eager loading their dishes, the resulting query is 3000 rows. Now if a Dish has something like Reviews and there are an average of 5 reviews per dish and you eager load Dishes and Reviews, that would be a resultset of every column across all three tables with 15000 rows in total. You can hopefully appreciate how this can grow out of hand pretty fast. EF then goes through that Cartesian and populates the associated entities in the object graph. This can lead to questions about why "my query runs fast in SSMS but slow in EF" since EF can have a lot of work to do, especially if it has been tracking references from restaurants, dishes, and/or reviews to scan through and provide. Later versions of EF can help mitigate this a bit by using query splitting so instead of JOINs, EF can work out to fetch the related data using multiple separate SELECT statements which can execute and process a fair bit faster, but it still amounts to a lot of data going over the wire and needing memory to materialize to work with.
Most of the time though, you won't need ALL rows, nor ALL columns for each and every related entity. This is where Projection comes in such as using Select. When we pull back our list of restaurants, we might want to list the restaurants in a given city along with their top 5 dishes based on user reviews. We only need the RestaurantId & Name to display in these results, along with the Dish name and # of positive reviews. Instead of loading every column from every table, we can define a view model for Restaurants and Dishes for this summary View, and project the entities to these view models:
public class RestaurantSummaryViewModel
{
public int RestaurantId { get; set; }
public string Name { get; set; }
public ICollection<DishSummaryViewModel> Top5Dishes { get; set; } = new List<DishSummaryViewModel>();
}
public class DishSummaryViewModel
{
public string Name { get; set; }
public int PositiveReviewCount {get; set; }
}
var restaurants = _context.Restaurants
.Where(r => r.City.CityId == cityId)
.OrderBy(r => r.Name)
.Select(r => new RestaurantSummaryViewModel
{
RestaurantId = r.RestaurantId,
Name = r.Name,
Top5Dishes = r.Dishes
.OrderByDescending(d => d.Reviews.Where(rv => rv.Score > 3).Count())
.Select(d => new DishSummaryViewModel
{
Name = d.Name,
PositiveReviewCount = d.Reviews.Where(rv => rv.Score > 3).Count()
}).Take(5)
.ToList();
}).ToList();
Notice that the above Linq example doesn't use Join or even Include. Provided you follow a basic set of rules to ensure that EF can work out what you want to project down to SQL you can accomplish a fair bit producing far more efficient queries. The above statement would generate SQL to run across the related tables but would only return the fields needed to populate the desired view models. This allows you to tune indexes based on what data is most commonly needed, and also reduces the amount of data going across the wire, plus memory usage on both the DB and app servers. Libraries like Automapper and it's ProjectTo method can simplify the above statements even more, configuring how to select into the desired view model once, then replacing that whole Select( ... ) with just a ProjectTo<RestaurantSummaryViewModel>(config) where "config" is a reference to the Automapper configuration where it can resolve how to turn Restaurants and their associated entities into the desired view model(s).
In any case it should give you some avenues to explore with EF and learning what it can bring to the table to produce (hopefully:) easy to understand, and efficient query expressions.

Entity Framework 6: virtual collections lazy loaded even explicitly loaded on a query

I have a problem with EF6 when trying to optimize the queries. Consider this class with one collection:
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
}
As you might know, with Lazy Loading I have this n+1 problem, when EF tries to get all the Countries, for each client.
I tried to use Linq projections; for example:
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries
}).ToList().Select(data =>
{
data.client.Countries = data.Countries; // Here is the problem
return data.client;
}).ToList();
Here I'm using two selects: the first for the Linq projection, so EF can create the SQL, and the second to map the result to a Client class. The reason for that is because I'm using a repository interface, which returns List<Client>.
Despite the query is generated with the Countries in it, EF still is using Lazy Loading when I try to render the whole information (the same n+1 problem). The only way to avoid this, is to remove the virtual accessor:
public class Client
{
... a lot of properties
public List<Country> Countries { get; set; }
}
The issue I have with this solution is that we still want to have this property as virtual. This optimization is only necessary for a particular part of the application, whilst on the other sections we want to have this Lazy Loading feature.
I don't know how to "inform" EF about this property, that has been already lazy-loaded via this Linq projection. Is that possible? If not, do we have any other options? The n+1 problems makes the application to take several seconds to load like 1000 rows.
Edit
Thanks for the responses. I know I can use the Include() extension to get the collections, but my problem is with some additional optimizations I need to add (I'm sorry for not posting the complete example, I thought with the Collection issue would be enough):
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
public virtual List<Action> Actions { get; set; }
public virtual List<Investment> Investments { get; set; }
public User LastUpdatedBy {
get {
if(Actions != null) {
return Actions.Last();
}
}
}
}
If I need to render the clients, the information about the last update and the number of investments (Count()), with the Include() I practically need to bring all the information from the database. However, if I use the projection like
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries,
NumberOfInvestments = client.Investments.Count() // this is translated to an SQL query
LastUpdatedBy = client.Audits.OrderByDescending(m => m.Id).FirstOrDefault(),
}).ToList().Select(data =>
{
// here I map back the data
return data.client;
}).ToList();
I can reduce the query, getting only the required information (in the case of LastUpdatedBy I need to change the property to a getter/setter one, which is not a big issue, as its only used for this particular part of the application).
If I use the Select() with this approach (projection and then mapping), the Include() section is not considered by EF.
If i understand correctly you can try this
_dbContext.LazyLoading = false;
var clientWithCountres = _dbContext.Clients
.Include(c=>c.Countries)
.ToList();
This will fetch Client and only including it Countries. If you disable lazy-loading the no other collection will load from the query. Unless you are specifying a include or projection.
FYI : Projection and Include() doesn't work together see this answer
If you are projection it will bypass the include.
https://stackoverflow.com/a/7168225/1876572
don't know what you want to do, you are using lambda expression not linq, and your second select it's unnecessary.
data.client is client, data.Countries is client.Countries, so data.client.Countries = data.Countries alway true.
if you don't want lazy load Countries, use _dbContext.Clients.Include("Countries").Where() or select ().
In order to force eager loading of virtual properties you are supposed to use Include extension method.
Here is a link to MSDN https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113).aspx.
So something like this should work:
return _dbContext.Clients.Include(c=>c.Countries).ToList();
Im not 100% sure but I think your issue is that you are still maintaining a queryable for your inner collection through to the end of the query.
This queryable is lazy (because in the model it was lazy), and you havent done anything to explain that this should not be the case, you have simply projected that same lazy queryable into the result set.
I cant tell you off the top of my head what the right answer here is but I would try things around the following:
1 use a projection on the inner queriable also eg
return _dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.Select(c=>c)// or a new Country
})
2 Put the include at the end of the query (Im pretty sure include applies to the result not the input. It definitally doesnt work if you put it before a projection) eg:
_dbContext.Clients
.Select(client => new
{
client,
client.Countries
}.Include(c=>c.Countries)`
3 Try specifying the enumeration inside the projection eg:
_dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.AsEnumerable() //perhaps tolist if it works
}`
I do want to caviat this by saying that I havent tried any of the above but I think this will set you on the right path.
A note on lazy loading
IMO there are very few good use cases for lazy loading. It almost always causes too many queries to be generated, unless your user is following a lazy path directly on the model. Use it only with extreme caution and IMO not at all in request response (eg web) apps.

Entity Framework Code First - select based on type of child TPH class

I am currently working with Entity Framework 4 on a project that is using Table Per Hierarchy to represent one set of classes using a single table. The way this works is that this table represents states and different states are associated with different other classes.
So you might imagine it to look like this, ignoring the common fields all states share:
InactiveState
has a -> StateStore
ActiveState
has a -> ActivityLocation
CompletedState
has a -> CompletionDate
Each state has a collection of all the InventoryItems that belong to it.
Now each item in my inventory has many states, where the last one in the history is the current state. To save on lists I have a shortcut field that points to the current state of my Inventory:
public class InventoryItem : Entity
{
// whole bunch of other properties
public virtual ICollection<InventoryState> StateHistory { get; set; }
public virtual InventoryState LastState { get; set; }
}
The first problem I am having is when I want to find, for example, all the InventoryItems which are in the Active state.
It turns out Linq-To-Entities doesn't support GetType() so I can't use a statement like InventoryRepository.Get().Where( x => x.LastState.GetType() == someType ). I can use the is operator, but that requires a fixed type so rather than being able to have a method like this:
public ICollection<InventoryItem> GetInventoryItemsByState( Type state )
{
return inventoryRepository.Get().Where( x => x.LastState is state );
}
I have to run some kind of if statement based on the type before I make the Linq query, which feels wrong. The InventoryItem list is likely to get large, so I need to do this at the EF level, I can't pull the whole list into memory and use GetType(), for example.
I have two questions in this situation, connected closely enough that I think they can be combined as they probably reflect a lack of understanding on my part:
Is it possible to find a list of items that share a child table type using Linq To Entities?
Given that I am not using Lazy Loading, is it possible to Include related items for child table types using TPH so that, for example, if I have an InactiveState as the child of my InventoryItem I can preload the StateStore for that InactiveState?
Is it possible to find a list of items that share a child table type
using Linq To Entities?
I don't think it's possible in another way than using an if/switch that checks for the type and builds a filter expression using is T or OfType<T>. You could encapsulate this logic into an extension method for example to have a single place to maintain and a reusable method:
public static class Extensions
{
public static IQueryable<InventoryItem> WhereIsLastState(
this IQueryable<InventoryItem> query, Type state)
{
if (state == typeof(InactiveState))
return query.Where(i => i.LastState is InactiveState);
if (state == typeof(ActiveState))
return query.Where(i => i.LastState is ActiveState);
if (state == typeof(CompletedState))
return query.Where(i => i.LastState is CompletedState);
throw new InvalidOperationException("Unsupported type...");
}
}
To be used like this:
public ICollection<InventoryItem> GetInventoryItemsByState(Type state)
{
return inventoryRepository.Get().WhereIsLastState(state).ToList();
}
I don't know if it would be possible to build the i => i.LastState is XXX expression manually using the .NET Expression API and based on the Type passed into the method. (Would interest me too, to be honest, but I have almost no clue about expression manipulation to answer that myself.)
Given that I am not using Lazy Loading, is it possible to Include
related items for child table types using TPH so that, for example, if
I have an InactiveState as the child of my InventoryItem I can preload
the StateStore for that InactiveState?
I am not sure if I understand that correctly but generally eager loading with Include does not support any filtering or additional operations on specific included children.
One way to circumvent this limitation and still get the result in a single database roundtrip is using a projection which would look like this:
var result = context.InventoryItems
.Select(i => new
{
InventoryItem = i,
LastState = i.LastState,
StateStore = (i.LastState is InactiveState)
? (i.LastState as InactiveState).StateStore
: null
})
.AsEnumerable()
.Select(x => x.InventoryItem)
.ToList();
If the query is a tracked query (which it is in the example above) and the relationships are not many-to-many (they are not in your example) the context will fixup the relationships when the entities are loaded into the context, that is InventoryItem.LastState and InventoryItem.LastState.StateStore (if LastState is of type InactiveState) will be set to the loaded entities (as if they had been loaded with eager loading).

Improving efficiency with Entity Framework

I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
public DbSet<Order> Orders { get; set; }
public DbSet<Contact> Contacts { get; set; }
...
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Customer>().ToTable("vw_Customers");
modelBuilder.Entity<Order>().ToTable("vw_Orders");
...
}
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to be bringing back the entire data set, which can take some time before the LINQ query reduces the set to those which I need. This seems very inefficient when the criteria is only applicable to a few records and I am getting the entire data set back from SQL server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) //or an object passed in!
{
return Database.SqlQuery<Customer>("Exec pp_GetCustomersForCriteria #crit1 = {0}, #crit2 = {1}...", criteria1, criteria2,...);
}
whilst this is much quicker, the problem here is that it doesn't return a DbSet and so I lose all of the connectivity between my objects, e.g. I can't reference any associated objects such as orders or contacts even if I include their IDs because the return type is a collection of 'Customers' rather than a DbSet of them.
Does anyone have a better way of getting SQL server to do the querying so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually be 'bringing back' anything at all - calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList, or FirstOrDefault) that anything will actually be executed and results returned.
If however this Customers method returns a collection of instantiated objects, then yes, since you are asking for all the objects you're getting them all.
I've never used either code-first or indeed even then repository pattern, so I don't know what to advise, other than staying in the realm of IQueryable for as long as possible, and only executing the query once you've applied all relevant filters.
What I would have done to return just a set of data would have been the following:
var customers = (from x in Repository.Customers where <boolean statement> &&/|| <boolean statement select new {variableName = x.Name , ...).Take(<integer amount for amount of records you need>);
so for instance:
var customers = (from x in _repository.Customers where x.ID == id select new {variableName = x.Name} ).take(1000);
then Iterate through the results to get the data: (remember, the linq statement returns an IQueryable)...
foreach (var data in customers)
{
string doSomething = data.variableName; //to get data from your query.
}
hope this helps, not exactly the same methods, but I find this handy in my code
Probably it's because your Cusomters() method in your repository is doing a GetAll() kind of thing and fetching the entire list first. This prohibits LINQ and your SQL Server from creating smart queries.
I don't know if there's a good workaround for your repository, but if you would do something like:
using(var db = new RemoteServerContext())
{
var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data like sql paging like top function in sql or do manual querying using stored procedures. In either cases, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF, because you don't have a lot of control over the code it seems.

EF Query to Get Union of All Child Collections

Assuming I have an Entity Framework 4.2 class like this:
class Company
{
public int ID { get; set; }
public ICollection<Employee> Employees { get; set; }
}
And I have collection like this:
public ICollection<Company> Companies { get; set; }
What's an efficient way to build a list of all employees in the entire Companies collection? I do not need to worry about duplicates.
Note: I'm trying to do it without an Entity Framework context--just using the Companies collection. However, if this entails a huge performance hit, I could get a context if necessary.
I'm not sure what you're trying to achieve, but to query the data you'll want to use the DbContext.
You can configure you context to automatically load the related entities by setting LazyLoadingEnabled to true, or explicitly, i.e. when querying the data, include the path to the employees, using the Include extension provided with EF > 4.0.
As soon as all companies are there you could linq it using the SelectMany method. I haven't checked, but AFAIK, LINQ to SQL will execute the query as one bunch on demand.
If you insist not to expose your DbContext or if you use a different underlying context, I would suggest you to wrap it up in a repository that does the SelectMany behind the since, but your friend are Include and SelectMany.

Categories

Resources