Hello everyone, I'm working on an API that returns a dish along with its restaurant details, from a database that has restaurants and their dishes.
I'm wondering whether converting the first query below into the second makes it any more efficient:
from res in _context.Restaurant
join resdish in _context.RestaurantDish
on res.Id equals resdish.RestaurantId
where resdish.RestaurantDishId == dishId
Second:
from resdish in _context.RestaurantDish
where resdish.RestaurantDishId == dishId
join res in _context.Restaurant
on resdish.RestaurantId equals res.Id
The reason I'm debating this is that I feel like the second version filters down to the single restaurant dish and then joins it, rather than joining all dishes and then filtering.
Is this correct?
You can use a profiler on your database to capture the SQL in both cases, or inspect the SQL that EF generates, and you'll likely find that the SQL in both cases is virtually identical. It boils down to how the reader (other developers) interprets the intention of the logic.
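If you are on EF Core 5 or later you can even skip the profiler: ToQueryString() dumps the SQL an IQueryable will run. A minimal sketch, reusing the second query shape from the question:

// Inspect the generated SQL without executing the query (EF Core 5+).
var query = from resdish in _context.RestaurantDish
            where resdish.RestaurantDishId == dishId
            join res in _context.Restaurant
                on resdish.RestaurantId equals res.Id
            select new { res, resdish };

Console.WriteLine(query.ToQueryString());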
As far as building efficient queries in EF goes, EF is an ORM, meaning it maps between an object-oriented model and a relational data model; it isn't just an API for translating Linq to SQL. A large part of the power for writing simple and efficient queries comes through navigation properties and projection. A Dish is considered the property of a particular Restaurant, while a Restaurant has many Dishes on its menu. This forms a one-to-many relationship in the database, and navigation properties can map this relationship in your object model:
public class Restaurant
{
    [Key]
    public int RestaurantId { get; set; }
    // ... other fields
    public virtual ICollection<Dish> Dishes { get; set; } = new List<Dish>();
}

public class Dish
{
    [Key]
    public int DishId { get; set; }
    //[ForeignKey(nameof(Restaurant))]
    //public int RestaurantId { get; set; }
    public virtual Restaurant Restaurant { get; set; }
}
The FK property for the Restaurant ID is optional and can be configured as a Shadow Property (one that EF knows about and generates, but that isn't exposed on the entity). I recommend using shadow properties for FKs mainly to avoid two sources of truth for relationships (dish.RestaurantId and dish.Restaurant.RestaurantId): changing the FK does not automatically update the relationship unless you reload the entity, and updating the relationship does not automatically update the FK until you call SaveChanges.
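As a minimal sketch, a shadow FK for the above can be configured in OnModelCreating (the property name "RestaurantId" here is just the conventional choice):

// Inside the DbContext: use a shadow property as the FK, not exposed on Dish.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Dish>()
        .HasOne(d => d.Restaurant)
        .WithMany(r => r.Dishes)
        .HasForeignKey("RestaurantId"); // shadow FK property
}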
Now if you wanted to get a particular dish and its associated restaurant:
var dish = _context.Dishes
    .Include(d => d.Restaurant)
    .Single(d => d.DishId == dishId);
This fetches both entities. Note that there is no need to manually write joins as you would with SQL. EF supports Join, but it should only be needed in rare cases where a schema isn't properly normalized/relational and you need to map loosely joined entities/tables (such as a table using an "OwnerId" that could join to a "This" or a "That" table based on a discriminator like OwnerType).
If you leave off the .Include(d => d.Restaurant) and have lazy loading enabled on the DbContext, then EF will attempt to load the Restaurant automatically if and when the code first accesses dish.Restaurant. This provides a safety net, but it can incur steep performance penalties in many cases, so treat it as exactly that, a safety net, not a crutch.
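For reference, lazy loading is opt-in in EF Core; one way to enable it is the proxies package. A sketch, assuming Microsoft.EntityFrameworkCore.Proxies is installed and the navigation properties are virtual as above:

// Proxies intercept access to virtual navigation properties and load them on demand.
protected override void OnConfiguring(DbContextOptionsBuilder options)
    => options
        .UseLazyLoadingProxies()
        .UseSqlServer(connectionString); // provider and connection string are illustrative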
Eager loading works well when dealing with single entities and their related data where you will need to do things with those relationships: for instance, loading a Restaurant to review and add/remove its dishes, or loading a Dish and possibly changing its Restaurant. However, eager loading can come at a significant cost in how EF and SQL provide that related data behind the scenes.
By default, when you use Include, EF adds an INNER or LEFT JOIN between the associated tables, which creates a Cartesian product between them. If you have 100 restaurants averaging 30 dishes each and select all 100 restaurants while eager loading their dishes, the resulting query returns 3,000 rows. If a Dish also has something like Reviews, averaging 5 reviews per dish, and you eager load Dishes and Reviews, the resultset contains every column across all three tables and 15,000 rows in total. You can hopefully appreciate how this grows out of hand fast. EF then walks that Cartesian product to populate the associated entities in the object graph. This leads to questions like "why does my query run fast in SSMS but slow in EF": EF can have a lot of work to do, especially if it has been tracking references from restaurants, dishes, and/or reviews that it has to scan through and reconcile. Later versions of EF can mitigate this with query splitting: instead of JOINs, EF fetches the related data using multiple separate SELECT statements, which can execute and be processed a fair bit faster, but it is still a lot of data going over the wire and needing memory to materialize.
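On EF Core 5+ you can opt into that splitting per query. A sketch, where the Reviews navigation is the hypothetical one from the example above:

// AsSplitQuery() issues one SELECT per Include instead of one big JOIN,
// avoiding the Cartesian product described above.
var restaurants = _context.Restaurants
    .Include(r => r.Dishes)
        .ThenInclude(d => d.Reviews)
    .AsSplitQuery()
    .ToList();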
Most of the time, though, you won't need ALL rows or ALL columns for each and every related entity. This is where projection comes in, such as using Select. When we pull back our list of restaurants, we might want to list the restaurants in a given city along with their top 5 dishes based on user reviews. We only need the RestaurantId and Name to display in these results, along with each dish's name and number of positive reviews. Instead of loading every column from every table, we can define view models for Restaurants and Dishes for this summary view, and project the entities onto these view models:
public class RestaurantSummaryViewModel
{
    public int RestaurantId { get; set; }
    public string Name { get; set; }
    public ICollection<DishSummaryViewModel> Top5Dishes { get; set; } = new List<DishSummaryViewModel>();
}

public class DishSummaryViewModel
{
    public string Name { get; set; }
    public int PositiveReviewCount { get; set; }
}
var restaurants = _context.Restaurants
    .Where(r => r.City.CityId == cityId)
    .OrderBy(r => r.Name)
    .Select(r => new RestaurantSummaryViewModel
    {
        RestaurantId = r.RestaurantId,
        Name = r.Name,
        Top5Dishes = r.Dishes
            .OrderByDescending(d => d.Reviews.Where(rv => rv.Score > 3).Count())
            .Select(d => new DishSummaryViewModel
            {
                Name = d.Name,
                PositiveReviewCount = d.Reviews.Where(rv => rv.Score > 3).Count()
            }).Take(5)
            .ToList()
    }).ToList();
Notice that the above Linq example doesn't use Join or even Include. Provided you follow a basic set of rules so that EF can translate what you want into SQL, you can accomplish a fair bit and produce far more efficient queries. The above statement generates SQL that runs across the related tables but returns only the fields needed to populate the desired view models. This lets you tune indexes based on the data that is most commonly needed, and it reduces the amount of data going across the wire, plus memory usage on both the DB and app servers. Libraries like AutoMapper and its ProjectTo method can simplify the above even further: you configure how to select into the desired view model once, then replace that whole Select( ... ) with just ProjectTo<RestaurantSummaryViewModel>(config), where "config" is a reference to the AutoMapper configuration that resolves how to turn Restaurants and their associated entities into the desired view model(s).
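To illustrate, a minimal sketch of that AutoMapper wiring; the configuration below is an assumption based on the view models defined earlier, not the only way AutoMapper can express it:

using AutoMapper;
using AutoMapper.QueryableExtensions;

// Configure the projections once...
var config = new MapperConfiguration(cfg =>
{
    cfg.CreateMap<Dish, DishSummaryViewModel>()
        .ForMember(vm => vm.PositiveReviewCount,
            opt => opt.MapFrom(d => d.Reviews.Count(rv => rv.Score > 3)));

    cfg.CreateMap<Restaurant, RestaurantSummaryViewModel>()
        .ForMember(vm => vm.Top5Dishes,
            opt => opt.MapFrom(r => r.Dishes
                .OrderByDescending(d => d.Reviews.Count(rv => rv.Score > 3))
                .Take(5)));
});

// ...then the whole Select( ... ) collapses to ProjectTo:
var restaurants = _context.Restaurants
    .Where(r => r.City.CityId == cityId)
    .OrderBy(r => r.Name)
    .ProjectTo<RestaurantSummaryViewModel>(config)
    .ToList();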
In any case, this should give you some avenues to explore with EF and what it can bring to the table to produce (hopefully) easy-to-understand and efficient query expressions.
Let's define Term as a principal entity and Course as a dependent entity in a one-to-many relationship.
public class Term
{
    public int Id { get; set; }
    public List<Course> Courses { get; set; } = new List<Course>();
}

public class Course
{
    public int Id { get; set; }
    public DateTime EndDate { get; set; }
}
A common query criterion for our Term entity is to check whether all of its courses are finished; if so, we deem the Term finished too. This implicit state can show up in a lot of places in our business logic, in simple queries to populate view models, etc.
Using ORMs like Entity Framework Core, this query can pop up in a lot of places:
Terms.Where(t => t.Courses.All(c => c.EndDate <= DateTime.Now))
Terms.Count(t => t.Courses.All(c => c.EndDate <= DateTime.Now))
Other examples of this that come to mind are a product and its current inventory count, posts that only contain unconfirmed comments, etc.
What can we consider best practice if we want to capture these implicit states and make them directly accessible on our principal entity, without needing to rehydrate the dependent entities from the database as well?
Some solutions that come to mind:
Using a computed column to do a subquery and map it to a property on the principal entity, e.g. Term.IsFinished
Defining a normal property on our entity and using a scheduling solution to update its value at predetermined times (which is not acceptable in a lot of cases due to the inconsistency between intervals), or using domain events and reacting to them to update the property on the principal entity
Create a view, with the two tables joined and aggregated per principal entity.
Use the view directly in Entity Framework instead of the base table (a mapping sketch follows after the bonus points below).
For bonus points:
In SQL Server you can create a clustered index on the view, and it will be automatically maintained for you. Oracle has a similar concept.
In other RDBMSs, you would need to create a separate table, and maintain it yourself with triggers.
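As a sketch of the view approach in Entity Framework Core (the view name, entity name, and columns here are assumptions; the view itself would be created in SQL with the join and aggregation per Term):

// Keyless entity mapped to the aggregating view (EF Core 3+).
public class TermStatus
{
    public int TermId { get; set; }
    public DateTime? LastCourseEndDate { get; set; } // e.g. MAX(Course.EndDate) per Term
}

// In the DbContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<TermStatus>(b =>
    {
        b.HasNoKey();
        b.ToView("vw_TermStatus"); // hypothetical view name
    });
}

// Usage: the implicit "is finished" state is read without hydrating any Course rows.
var finishedCount = context.Set<TermStatus>()
    .Count(ts => ts.LastCourseEndDate <= DateTime.Now);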
I'm new to NHibernate and I couldn't figure this one out.
I have an entity similar to the class below:
public class MotherCollection
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual ISet<Class1> Collection1 { get; set; }
    public virtual ISet<Class2> Collection2 { get; set; }
    public virtual ISet<Class3> Collection3 { get; set; }
    public virtual ISet<Class4> Collection4 { get; set; }
}
There are numerous one-to-many relationships to other entities.
I configure these relations with the mappings below:
HasMany(d => d.Collection1).KeyColumn("McId");
HasMany(d => d.Collection2).KeyColumn("McId");
HasMany(d => d.Collection3).KeyColumn("McId");
HasMany(d => d.Collection4).KeyColumn("McId");
Child classes are configured similarly:
References(c1=>c1.MotherCollection).Column("McId");
and so on.
When I query this entity from the db, fetching all relationships, I get a huge query similar to this one:
SELECT * FROM MotherCollection mc
JOIN c1 on mc.Id=c1.mcId
JOIN c2 on mc.Id=c2.mcId
JOIN c3 on mc.Id=c3.mcId
JOIN c4 on mc.Id=c4.mcId
This query produces a lot of duplicate rows and takes a lot of time to execute.
I want NHibernate to somehow separate this into individual SELECT queries, like below:
SELECT * FROM MotherCollection WHERE Id = @Id
SELECT * FROM c1 WHERE mcId = @Id
and so on; a bit similar to what happens when the collections are lazy loaded.
I managed to achieve this behaviour by marking my desired collections as lazy and calling First() on them just before they exit my data layer. However, I'm guessing there must be a more elegant way of doing this in NHibernate.
I've tried queries similar to this:
var data = session.QueryOver<DataSet>().Fetch(d=>d.Collection1).Eager.Fetch(d=>d.Collection2).Eager....
Thank you.
You should issue 4 separate queries, each one fetching one collection.
And you should use session.Query; QueryOver is the older way of doing it. To use Query, add using NHibernate.Linq. I usually use the following extension method to prefetch collections:
// Enumerates the query just enough to force it to execute, so NHibernate
// fetches the collection and caches it in the session; the result is discarded.
public static void Prefetch<T>(this IQueryable<T> query)
{
    // ReSharper disable once ReturnValueOfPureMethodIsNotUsed
    query.AsEnumerable().FirstOrDefault();
}
And then use:
var data = session.Query<DataSet>().Fetch(d => d.Collection1).ToList();
session.Query<DataSet>().Fetch(d => d.Collection2).Prefetch();
session.Query<DataSet>().Fetch(d => d.Collection3).Prefetch();
session.Query<DataSet>().Fetch(d => d.Collection4).Prefetch();
Make sure to run the four queries before accessing the collections; that way, when you access them, they will already be initialized. With regular lazy loading, you would initialize one collection of one object at a time.
This is the lazy vs. eager loading question. You have two options to choose from:
1. Lazy load with multiple queries:
This will generate multiple queries. With lazy loading, NHibernate first issues a query for just the MotherCollection data; each dependent collection is then loaded with its own query, filtered on the foreign key, when it is first accessed. This leads to the famous N+1 issue.
With this, referenced collections will NOT be filled by default. They get filled when you first access them while the ISession is still valid. This is similar to calling First() as you mention in your question.
Look at your HasMany configuration: you have not specified LazyLoad, but it is the default. So with your current mapping, this is what is happening.
This is recommended by NHibernate.
2. Eager load with single complex query:
If you want to avoid multiple queries and retrieve all the data in one go, try something like the following:
HasMany(d => d.Collection1).KeyColumn("McId").Inverse().Not.LazyLoad().Fetch.Join();
With this, referenced collections will be filled automatically (if data is present in the database).
Please note that this goes against NHibernate's recommendation; refer to this link:
Instead, we keep the default behavior, and override it for a
particular transaction, using left join fetch in HQL. This tells
NHibernate to fetch the association eagerly in the first select, using
an outer join. In the ICriteria query API, you would use
SetFetchMode(FetchMode.Join).
If you ever feel like you wish you could change the fetching strategy
used by Get() or Load(), simply use a ICriteria query, for
example:
User user = (User) session.CreateCriteria<User>()
.SetFetchMode("Permissions", FetchMode.Join)
.Add( Expression.Eq("Id", userId) )
.UniqueResult();
A completely different way to avoid problems with N+1 selects is to
use the second-level cache.
Duplicate rows and Performance
This is actually a different problem. There are multiple ways to handle it, but it would need additional input from you, and you should first choose one of the two options above. It therefore deserves a question of its own.
Refer to this answer: https://stackoverflow.com/a/30748639/5779732
I have a one-to-many relationship between a user and his/her schools. I often want to get the primary school for the user (the one with the highest "Type"), which means having to join in the primary school for every query I want to run. A user's schools barely ever change. Are there best practices on how to avoid the constant join? Should I denormalize the models, and if so, how? Are there other approaches that are better?
Thanks.
public class User
{
    public int Id { get; set; }
    public virtual IList<UserSchool> UserSchools { get; set; }
    ...
}

public class UserSchool
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public int Type { get; set; }
    ...
}
...
var schools = (from r in _dbcontext.UserSchools
               group r by r.UserId into grp
               select grp.OrderByDescending(x => x.Type).FirstOrDefault());

var results = (from u in _dbcontext.Users
               join us in schools on u.Id equals us.UserId
               select new UserContract
               {
                   Id = u.Id,
                   School = us.Name
               });
In past projects where I opted to denormalize data, I denormalized it into separate tables which were updated in the background by the database itself, and I tried to keep as much of the process as possible contained in the database software, which handles these things much better. Note that any sort of "run every x seconds" solution will cause a lag in how up-to-date your data is. For something like this, it doesn't sound like the data changes that often, so being a few seconds (or minutes, or days, by the sound of it) out of date is not a big concern. If you're considering denormalization, then retrieval speed must be much more important.
I have never had hard-and-fast criteria for when to denormalize, but in general the data must be:
Accessed often. Like multiple times per page load often. Absolutely critical to the application often. Retrieval time must be paramount.
Time insensitive. If the data you need is changing all the time, and it is critical that the data you retrieve is up-to-the-minute, denormalization will have too much overhead to buy you much benefit.
Either an extremely large data set or the result of a relatively complex query. Simple joins can usually be handled by proper indexing, and maybe an indexed view.
Already optimized as much as possible. We've already tried things like indexed views, reorganizing indexes, rewriting underlying queries, and things are still too slow.
Denormalizing can be very helpful, but it introduces its own headaches, so you want to be very sure that you are ready to deal with those before you commit to it as a solution to your problem.
I started using PetaPoco and Dapper, and they both have their own limitations. On the other hand, they are so much faster than Entity Framework that I tend to let go of those limitations.
My question is: is there any ORM that lets us define one-to-many, many-to-one, and many-to-many relationships concretely? Both Dapper.Net and PetaPoco implement somewhat hack-ish ways of faking these relationships, and they don't scale very well when you have 5-6 joins. If there isn't a single micro ORM that can deal with this, then my second question is: should I accept that these micro ORMs aren't good at defining relationships, and create a new POCO entity for every single type of query that involves these kinds of multi-joins? Can that scale well?
I hope I am clear with my question. If not, let me know.
I generally follow these steps.
I create my view model in such a way that it represents the exact data and format I want to display in a view.
I query straight from the database via PetaPoco onto my view models.
In my branch I have a
T SingleInto<T>(T instance, string sql, params object[] args);
method which takes an existing object and maps columns directly onto it, matched by name. This works brilliantly for this scenario.
My branch can be found here if needed.
https://github.com/schotime/petapoco/
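For illustration, usage might look roughly like this; SingleInto is from the branch linked above, and the SQL, view model, and variable names are made up for the sketch:

// Map the joined columns onto an existing object, matched by column name.
var db = new PetaPoco.Database("connectionStringName"); // illustrative
var order = db.SingleInto(new OrderViewModel(),
    @"SELECT o.ID, o.ProductID, p.Name AS ProductName
      FROM [Order] o
      INNER JOIN Product p ON p.ID = o.ProductID
      WHERE o.ID = @0", orderId);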
they don't even scale very well when you may have 5-6 joins
Yes, they don't, but that is a good thing: when the system you are building starts to get complex, you are free to write exactly the joins you want, without performance penalties or headaches.
Yes, I miss not having to write all these JOINs, like with Linq2SQL, but I created a simple tool that writes the common joins, so I get the basic SQL for any entity and can build from there.
Example:
[TableName("Product")]
[PrimaryKey("ProductID")]
[ExplicitColumns]
public class Product {
[PetaPoco.Column("ProductID")]
public int ProductID { get; set; }
[PetaPoco.Column("Name")]
[Display(Name = "Name")]
[Required]
[StringLength(50)]
public String Name { get; set; }
...
...
[PetaPoco.Column("ProductTypeID")]
[Display(Name = "ProductType")]
public int ProductTypeID { get; set; }
[ResultColumn]
public string ProductType { get; set; }
...
...
public static Product SingleOrDefault(int id) {
var sql = BaseQuery();
sql.Append("WHERE Product.ProductID = #0", id);
return DbHelper.CurrentDb().SingleOrDefault<Product>(sql);
}
public static PetaPoco.Sql BaseQuery(int TopN = 0) {
var sql = PetaPoco.Sql.Builder;
sql.AppendSelectTop(TopN);
sql.Append("Product.*, ProductType.Name as ProductType");
sql.Append("FROM Product");
sql.Append(" INNER JOIN ProductType ON Product.ProductoTypeID = ProductType.ProductTypeID");
return sql;
}
Would QueryFirst help here? You get the speed of micro ORMs with the added comfort of every-error-is-a-compile-time-error, plus IntelliSense both for your queries and their output. You define your joins in SQL, as God intended. If typing out join conditions is really bugging you, DBForge might be the answer, and because you're working in SQL these tools are compatible; you're not locked in.
For example, say I have two tables, Product and Order. Product has ID, Name, Description, Cost, and other detailed columns. Order has ID and ProductID columns (assume an order can only contain one product).
When displaying a list of orders in the system, I would like to also display the associated product name without all of the other data (i.e., eagerly load an order and its associated product name, and lazily load all of the other product properties):
SELECT o.ID, o.ProductID, p.Name FROM Order o JOIN Product p ON o.ProductID=p.ID
If I do this with NHibernate, I have two choices: eager loading or lazy loading.
With eager loading, I get something like:
SELECT o.ID, o.ProductID, p.ID, p.Name, p.Description, p.Cost, p.... FROM Order o JOIN Product p ON o.ProductID=p.ID
With lazy loading, I get something like:
SELECT o.ID, o.ProductID FROM Order o
....
SELECT p.Name, p.Description, p.Cost, p.... FROM Product p WHERE p.ID=?
Update
Here is a more concrete example of what I am trying to achieve. I am working with an existing DAL and trying to integrate NHibernate. One of the functions of the current DAL is that it allows retrieval of some basic foreign key information as part of the parent record. Let's say there is a User table and a Region table. Each user has a foreign key to the region in which they reside. When displaying user information in a GUI, the region name should be displayed with the user, but other details about the region are not necessary.
In the current DAL, the User domain object has a member of type ForeignKeyReference<Region>.
public class ForeignKeyReference<T>
{
    public virtual int ForeignKeyID { get; set; }
    public virtual string ForeignNaturalKey { get; set; }
    public virtual T Reference { get; set; }
}
When the User is retrieved from the database, the primary and natural keys for the Region are also retrieved, and the Reference is set to a proxy object. I would like to simplify this with NHibernate, but still maintain this functionality. For example, I would like to remove the ForeignKeyReference<Region> member and just have a Region member which is an NHibernate proxy. On this proxy, I would like to be able to retrieve the ID and the Name without having to hit the database again.
You can mark a specific column as lazy.
Ayende wrote about this new feature last year.
I would suggest you read the Hibernate documentation about this feature, though:
Hibernate3 supports the lazy fetching of individual properties. This
optimization technique is also known as fetch groups. Please note that
this is mostly a marketing feature; optimizing row reads is much more
important than optimization of column reads. However, only loading
some properties of a class could be useful in extreme cases. For
example, when legacy tables have hundreds of columns and the data
model cannot be improved.
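With Fluent NHibernate (as used elsewhere in this thread), marking individual properties lazy looks roughly like this; a sketch, with names taken from the Product example above. Note that NHibernate implements lazy properties through proxies, so the mapped properties need to be virtual:

// Only ID and Name are selected up front; Description and Cost are fetched
// by a second SELECT the first time they are accessed.
public class ProductMap : ClassMap<Product>
{
    public ProductMap()
    {
        Table("Product");
        Id(p => p.ID);
        Map(p => p.Name);
        Map(p => p.Description).LazyLoad();
        Map(p => p.Cost).LazyLoad();
    }
}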
The other alternative is to create your own queries:
Session.CreateQuery
Have a look at this answer.