I have a SQL Server database table that has a couple million records in it. I have an MVC site with a page to display data from this table, and I'm running into extensive performance issues.
Running a simple query like this takes about 25-30 seconds to return about two thousand rows:
_dbContext.Contracts
.Where(c => c.VendorID == vendorId)
.ToList();
When I run a query against the database, it only takes a couple seconds.
Turns out, EF is loading all the related entities for my Contract, so it's slowing down my query a ton.
In the debugger, the objects returned are of a strange type, not sure if that's an issue:
System.Data.Entity.DynamicProxies.Contract_3EF6BECBB56F2ADDDA6E0050AC82D03A4E993CEDF4FCA49244D3EE4005572C46
And the same with the related entities on my Contract:
System.Data.Entity.DynamicProxies.Vendor_4FB727808BD6E0BF3B25085B40F3F0B9B10EE4BD17D2A4C600214634F494DB66
The site is a bit old, it's MVC 3 with EF 4. I know on the current version of EF, I have to explicitly use Include() to get related entities, but here it seems to be included automatically.
I have an EDMX file, with a .tt file and entity classes under that, but I don't see anywhere that I can prevent my Contracts from loading related objects.
Is there any way for me to do that?
If your MVC controller is returning Entities to the view, the trap you're hitting is that the serializer is iterating over the entities returned and lazy-loading all related data. This is considerably worse than triggering an eager load because in the case of loading collections, this will fetch related entities/sets one parent at a time.
Say I fetch 100 Contracts and contracts contain a Vendor reference.
Eager loading I would use:
context.Contracts.Where(x => /* condition */).Include(x => x.Vendor).ToList();
which would compose 1 query loading all applicable contracts and their vendor details. However, if you let the serializer lazy load Vendors you get effectively the following:
context.Contracts.Where(x => /* condition */).ToList(); // gets applicable contracts...
// This happens behind the scenes for every single related entity touched while serializing...
context.Vendors.Where(x => x.VendorId == 1);
context.Vendors.Where(x => x.VendorId == 1);
// ... continue for each and every contract returned in the above list...
If Contract also has an Employee reference...
context.Employees.Where(x => x.EmployeeId == 16);
context.Employees.Where(x => x.EmployeeId == 12);
context.Employees.Where(x => x.EmployeeId == 11);
... and this continues for every related entity/collection in each contract and each related entity. It adds up, fast. You can see just how crazy it gets by hooking up a profiler to your server and kicking off a read. You expect 1 SQL, but then get hit with hundreds to thousands of calls.
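To make the arithmetic concrete, here is a minimal sketch that simulates the pattern without EF at all. `FakeDb`, `NPlusOneDemo`, and the query counter are hypothetical stand-ins invented for illustration, not EF APIs; each method call stands for one SQL round trip, and the count ignores any caching a real context might do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database: every method call counts as one SQL round trip.
public class FakeDb
{
    public int QueryCount { get; private set; }

    // One query returning n contracts as (ContractId, VendorId) pairs.
    public List<KeyValuePair<int, int>> GetContracts(int n)
    {
        QueryCount++;
        return Enumerable.Range(1, n)
            .Select(i => new KeyValuePair<int, int>(i, i % 10))
            .ToList();
    }

    // One query per call: roughly what a lazy-load proxy does when a property is touched.
    public string GetVendorName(int vendorId)
    {
        QueryCount++;
        return "Vendor " + vendorId;
    }
}

public static class NPlusOneDemo
{
    // Simulates a serializer touching contract.Vendor on every returned row.
    public static int LazyLoadQueryCount(int contractCount)
    {
        var db = new FakeDb();
        foreach (var contract in db.GetContracts(contractCount))
        {
            db.GetVendorName(contract.Value); // contract.Value holds the VendorId
        }
        return db.QueryCount; // 1 query for the list + 1 per contract
    }
}
```

Calling `NPlusOneDemo.LazyLoadQueryCount(100)` yields 101 simulated round trips, and every additional lazily loaded reference adds roughly another 100.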
The best way to avoid this is to simply not return entities from controllers, instead compose a view model with just the detail you want to display and use .Select() or Automapper's .ProjectTo<ViewModel>() to populate it from an EF query. This avoids falling into the trap of having a serializer touching lazy load properties, and also minimizes the payload sent to the client.
So if I wanted to display a list of contracts for a vendor and I only needed to display the Contract ID, the contract #, and a dollar figure:
[Serializable]
public class ContractSummaryViewModel
{
public int ContractId { get; set; }
public string ContractNumber { get; set; }
public decimal Amount { get; set; }
}
var contracts = _dbContext.Contracts
.Where(c => c.VendorID == vendorId)
.Select( c => new ContractSummaryViewModel
{
ContractId = c.ContractId,
ContractNumber = c.ContractNumber,
Amount = c.Amount
})
.ToList();
You can include details from related entities into the view model or compose related view models for key details, all without having to worry about using .Include() or tripping lazy loading. This composes a single SQL statement to load just the data you need, and passes just that back to the UI. By streamlining the payload the performance can increase quite dramatically.
Hello everyone I'm working on an API that returns a dish with its restaurant details from a database that has restaurants and their dishes.
I'm wondering whether converting the first query below into the second makes it any more efficient.
First:
from res in _context.Restaurant
join resdish in _context.RestaurantDish
on res.Id equals resdish.RestaurantId
where resdish.RestaurantDishId == dishId
Second:
from resdish in _context.RestaurantDish
where resdish.RestaurantDishId == dishId
join res in _context.Restaurant
on resdish.RestaurantId equals res.Id
The reason I'm debating this is that I feel like the second version filters down to the single restaurant dish and then joins it, rather than joining all dishes and then filtering.
Is this correct?
You can use a profiler on your database to capture the SQL in both cases, or inspect the SQL that EF generates and you'll likely find that the SQL in both cases is virtually identical. It boils down to how the reader (developers) interprets the intention of the logic.
As far as building efficient queries in EF goes, EF is an ORM meaning it offers to map between an object-oriented model and a relational data model. It isn't just an API to enable translating Linq to SQL. Part of the power for writing simple and efficient queries is through the use of navigation properties and projection. A Dish will be considered the property of a particular Restaurant, while a Restaurant has many Dishes on its menu. This forms a One-to-Many relationship in the database, and navigation properties can map this relationship in your object model:
public class Restaurant
{
[Key]
public int RestaurantId { get; set; }
// ... other fields
public virtual ICollection<Dish> Dishes { get; set; } = new List<Dish>();
}
public class Dish
{
[Key]
public int DishId { get; set; }
//[ForeignKey(nameof(Restaurant))]
//public int RestaurantId { get; set; }
public virtual Restaurant Restaurant { get; set; }
}
The FK property for the Restaurant ID is optional and can be configured to use a shadow property (one that EF knows about and generates, but that isn't exposed in the entity). I recommend using shadow properties for FKs mainly to avoid two sources of truth for relationships (dish.RestaurantId and dish.Restaurant.RestaurantId). Changing the FK does not automatically update the relationship unless you reload the entity, and updating the relationship does not automatically update the FK until you call SaveChanges.
Now if you wanted to get a particular dish and its associated restaurant:
var dish = _context.Dishes
.Include(d => d.Restaurant)
.Single(d => d.DishId == dishId);
This fetches both entities. Note that there is no need now to manually write Joins like you would with SQL. EF supports Join, but it should only be used in very rare cases where a schema isn't properly normalized/relational and you need to map loosely joined entities/tables. (Such as a table using an "OwnerId" that could join to a "This" or a "That" table based on a discriminator such as OwnerType.)
If you leave off the .Include(d => d.Restaurant) and have lazy loading enabled on the DbContext, then EF will attempt to load the Restaurant automatically the first time the code accesses dish.Restaurant. This provides a safety net, but can incur some steep performance penalties in many cases, so it should be avoided or treated as a safety net, not a crutch.
Eager loading works well when dealing with single entities and their related data where you will need to do things with those relationships. For instance if I want to load a Restaurant and review, add/remove dishes, or load a Dish and possibly change the Restaurant. However, eager loading can come at a significant cost in how EF and SQL provides that related data behind the scenes.
By default when you use Include, EF will add an INNER or LEFT join between the associated tables. This creates a Cartesian Product between the involved tables. If you have 100 restaurants that have an average of 30 dishes each and select all 100 restaurants eager loading their dishes, the resulting query is 3000 rows. Now if a Dish has something like Reviews and there are an average of 5 reviews per dish and you eager load Dishes and Reviews, that would be a resultset of every column across all three tables with 15000 rows in total. You can hopefully appreciate how this can grow out of hand pretty fast. EF then goes through that Cartesian and populates the associated entities in the object graph. This can lead to questions about why "my query runs fast in SSMS but slow in EF" since EF can have a lot of work to do, especially if it has been tracking references from restaurants, dishes, and/or reviews to scan through and provide. Later versions of EF can help mitigate this a bit by using query splitting so instead of JOINs, EF can work out to fetch the related data using multiple separate SELECT statements which can execute and process a fair bit faster, but it still amounts to a lot of data going over the wire and needing memory to materialize to work with.
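The row counts quoted above can be checked with a small sketch. It uses plain LINQ to Objects rather than EF, and `JoinRowCountDemo` is a hypothetical name chosen for illustration; it assumes every restaurant has exactly the average number of dishes and every dish exactly the average number of reviews, so it only models the shape of the joined result.

```csharp
using System;
using System.Linq;

public static class JoinRowCountDemo
{
    // Counts the rows a flattened parent/child/grandchild join would return
    // when each parent row is repeated once per child and per grandchild.
    public static int JoinedRowCount(int restaurants, int dishesPerRestaurant, int reviewsPerDish)
    {
        var rows =
            from r in Enumerable.Range(0, restaurants)
            from d in Enumerable.Range(0, dishesPerRestaurant)
            from v in Enumerable.Range(0, reviewsPerDish)
            select new { Restaurant = r, Dish = d, Review = v };

        return rows.Count();
    }
}
```

`JoinedRowCount(100, 30, 1)` gives the 3000 rows for restaurants and dishes alone, and `JoinedRowCount(100, 30, 5)` gives the 15000 rows once reviews are included. Each of those rows also repeats every restaurant and dish column.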
Most of the time though, you won't need ALL rows, nor ALL columns for each and every related entity. This is where Projection comes in such as using Select. When we pull back our list of restaurants, we might want to list the restaurants in a given city along with their top 5 dishes based on user reviews. We only need the RestaurantId & Name to display in these results, along with the Dish name and # of positive reviews. Instead of loading every column from every table, we can define a view model for Restaurants and Dishes for this summary View, and project the entities to these view models:
public class RestaurantSummaryViewModel
{
public int RestaurantId { get; set; }
public string Name { get; set; }
public ICollection<DishSummaryViewModel> Top5Dishes { get; set; } = new List<DishSummaryViewModel>();
}
public class DishSummaryViewModel
{
public string Name { get; set; }
public int PositiveReviewCount {get; set; }
}
var restaurants = _context.Restaurants
.Where(r => r.City.CityId == cityId)
.OrderBy(r => r.Name)
.Select(r => new RestaurantSummaryViewModel
{
RestaurantId = r.RestaurantId,
Name = r.Name,
Top5Dishes = r.Dishes
.OrderByDescending(d => d.Reviews.Where(rv => rv.Score > 3).Count())
.Select(d => new DishSummaryViewModel
{
Name = d.Name,
PositiveReviewCount = d.Reviews.Where(rv => rv.Score > 3).Count()
}).Take(5)
.ToList()
}).ToList();
Notice that the above Linq example doesn't use Join or even Include. Provided you follow a basic set of rules to ensure that EF can work out what you want to project down to SQL, you can accomplish a fair bit and produce far more efficient queries. The above statement would generate SQL to run across the related tables but would only return the fields needed to populate the desired view models. This allows you to tune indexes based on what data is most commonly needed, and also reduces the amount of data going across the wire, plus memory usage on both the DB and app servers. Libraries like Automapper and its ProjectTo method can simplify the above statements even more: you configure how to select into the desired view model once, then replace that whole Select( ... ) with just a ProjectTo<RestaurantSummaryViewModel>(config), where "config" is a reference to the Automapper configuration that describes how to turn Restaurants and their associated entities into the desired view model(s).
In any case it should give you some avenues to explore with EF and learning what it can bring to the table to produce (hopefully:) easy to understand, and efficient query expressions.
I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
if (data.Any())
{
foreach (var pstuu in data)
{
if (pstuu?.Id_user != null)
{
db.Entry(pstuu).Reference(q => q.Users).Load();
}
}
}
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are based on this method it's probably a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or, inside your LoadDependent, do something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is open. This is because your foreach call as-is would be iterating over the IQueryable meaning EF's data reader would be left open so further calls to db, which I'd assume is a module level variable for the DbContext that is injected, cannot be made.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. This just delegates the load call asynchronously, and without awaiting it, it too would fail, just not raise the exception on the continuation thread.
As mentioned in the comments to your question this is a very poor design choice to handle loading references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with its associated User eager loaded:
var ques = db.Ques
.Include(x => x.AspNetUsers)
.Where(x => x.SomeCondition == someCondition)
.ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will be loaded with one query to the DB.
Select n+1 scenarios are bad because in the case where 1000 Ques are returned with a call to fetch dependencies you get:
var ques = db.Ques
.Where(x => x.SomeCondition == someCondition)
.ToList(); // 1 query.
foreach(var q in ques)
db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
This becomes problematic when later code wants to offer pagination, such as taking only 25 items where the total record count could run into the tens of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils, as with lazy loading you know that AspNetUsers would only be selected if any returned Ques actually referenced it, and only for those Ques that actually reference it. So if the pagination only "touched" 25 rows, lazy loading would result in 26 queries. Lazy loading is a trap, however, as later code changes could inadvertently lead to performance issues appearing in seemingly unrelated areas when new references or code changes result in far more references being "touched" and kicking off a query.
If you are going to pursue a LoadDependent() type method then you need to ensure that it is called as late as possible, once you have a known set size to load (i.e. after pagination), because you will need to materialize the collection to load related entities with the same DbContext instance. Trying to work around it using detached instances (AsNoTracking()) or by using a completely new DbContext instance may give you some headway but will invariably lead to more problems later, as you will have a mix of tracked and untracked entities, or worse, entities tracked by different DbContexts depending on how these loaded entities are consumed.
An alternative some teams pursue, rather than a LoadReference() type method, is an IncludeReference() type method. The goal here is to build .Include statements into the IQueryable. This can be done two ways: either by magic strings (property names) or by passing in expressions for the references to include. This avoids the Select n+1 issue by eager loading the required related data. Again, it can turn into a bit of a rabbit hole when handling more deeply nested references (i.e. building .Include().ThenInclude() chains).
I solved the problem by deleting the Load method and using Include() in my initial query, so the reference data is populated in the navigation property.
So far, I have a lot of data to load on my list. When I was using normal (sync) way to load data, it was about 20 seconds to load all the data. I made this asynchronous method and now I need about 7 seconds to load.
I wonder if there is a way to speed it up, for example, loading the first 20 cards as soon as the screen is opened, and then everything else? This is my code so far ..
public async Task<List<CardObject>> GetCardsAsync()
{
using (var context = new MyCARDEntities())
{
return await context.Card
.Include(f => f.Person)
.Include(k => k.CardType)
.Where(arg => arg.LastAction != "D" && arg.PERSON_ID != null)
.Select(k => new CardObject()
{
Id = k.Id,
UID = k.UID,
Person = new PersonBasicObject()
{
Id = k.PersonBasicObject.Id,
OIB = k.PersonBasicObject.OIB,
Name = k.PersonBasicObject.Name,
LastName = k.PersonBasicObject.LastName
}
})
.ToListAsync();
}
}
and this is in viewModel
private async void LoadCards()
{
var cards = await repKartica.GetCardsAsync();
CardLst = new ObservableCollection<CardObject>(cards);
}
private ObservableCollection<CardObject> _CardLst;
public ObservableCollection<CardObject> CardLst
{
get => _CardLst;
set
{
_CardLst= value;
RaisePropertyChanged(() => CardLst);
}
}
Making a method async doesn't make a method faster, it just allows the method to surrender the executing thread to allow other code to run. This can make the code more responsive, but won't make retrieving that particular data any faster.
Firstly, loading a large amount of data is not a good idea if it can be avoided. Do clients need to see all of this data at once, or is it / can it be paginated into pages of 20 cards at a time? If it is, or can be paged, then consider employing a paginated collection which can query specific pages of data (utilizing .Skip() & .Take()) to pull back just 20 or so records at a time as the visible page changes.
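As a rough illustration of the paging idea, here is a minimal sketch of a page helper built on .Skip() and .Take(). The `PagingExtensions` class and `GetPage` method are hypothetical names, not part of EF. The helper takes an `IOrderedQueryable<T>` on the assumption that the query has already been sorted, since paging without a stable order gives unpredictable pages (and EF6 rejects Skip on unsorted input).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingExtensions
{
    // Returns one page of results. pageNumber is 1-based.
    // Skip/Take are composed into the query before ToList() executes it,
    // so a query provider can translate them rather than fetching everything.
    public static List<T> GetPage<T>(this IOrderedQueryable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        return source
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Against the query in the question, usage might look like `context.Card.Where(...).OrderBy(c => c.Id).Select(...)` followed by a page fetch for the visible page; the ordering column is an assumption, and any stable sort key would do.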
The next thing would be to look at the query being executed. Run a profiler such as ExpressProfiler against your database and capture the SQL being run by EF. Execute a copy of those queries in SQL Server Management Studio to get an execution plan and see if there are index recommendations.
Other tips: When utilizing .Select() you do not need to use .Include(). Select will generate a query to pull from related entities automatically.
Are PersonBasicObject and CardObject your Entity definitions? If so, how many fields are in these entities that would remain unfilled by this Select? Ideally you should use dedicated view models rather than passing entities. By using Select() to selectively populate entities, you are composing an entity that is not really an entity, in the sense that it is not a complete representation of one. This can be inefficient because unpopulated fields still take up memory, and it is misleading and can lead to bugs: code that expects entities could get called or reused, and would then be dealing with a mix of "real" complete entities and incomplete entities that were loaded like this. An entity's sole purpose should be to represent data state, not to transport view state.
I have a problem with EF6 when trying to optimize the queries. Consider this class with one collection:
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
}
As you might know, with Lazy Loading I have this n+1 problem, when EF tries to get all the Countries, for each client.
I tried to use Linq projections; for example:
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries
}).ToList().Select(data =>
{
data.client.Countries = data.Countries; // Here is the problem
return data.client;
}).ToList();
Here I'm using two selects: the first for the Linq projection, so EF can create the SQL, and the second to map the result to a Client class. The reason for that is because I'm using a repository interface, which returns List<Client>.
Although the query is generated with the Countries in it, EF still uses lazy loading when I try to render the whole information (the same n+1 problem). The only way to avoid this is to remove the virtual modifier:
public class Client
{
... a lot of properties
public List<Country> Countries { get; set; }
}
The issue I have with this solution is that we still want to have this property as virtual. This optimization is only necessary for a particular part of the application, whilst on the other sections we want to have this Lazy Loading feature.
I don't know how to "inform" EF that this property has already been loaded via this Linq projection. Is that possible? If not, do we have any other options? The n+1 problem makes the application take several seconds to load something like 1000 rows.
Edit
Thanks for the responses. I know I can use the Include() extension to get the collections, but my problem is with some additional optimizations I need to add (I'm sorry for not posting the complete example, I thought with the Collection issue would be enough):
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
public virtual List<Action> Actions { get; set; }
public virtual List<Investment> Investments { get; set; }
public User LastUpdatedBy {
get {
if(Actions != null) {
return Actions.Last();
}
}
}
}
If I need to render the clients, the information about the last update and the number of investments (Count()), with the Include() I practically need to bring all the information from the database. However, if I use the projection like
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries,
NumberOfInvestments = client.Investments.Count(), // this is translated to an SQL query
LastUpdatedBy = client.Audits.OrderByDescending(m => m.Id).FirstOrDefault(),
}).ToList().Select(data =>
{
// here I map back the data
return data.client;
}).ToList();
I can reduce the query, getting only the required information (in the case of LastUpdatedBy I need to change the property to a getter/setter one, which is not a big issue, as its only used for this particular part of the application).
If I use the Select() with this approach (projection and then mapping), the Include() section is not considered by EF.
If I understand correctly, you can try this:
_dbContext.Configuration.LazyLoadingEnabled = false;
var clientsWithCountries = _dbContext.Clients
.Include(c => c.Countries)
.ToList();
This will fetch each Client and include only its Countries. If you disable lazy loading, no other collection will be loaded by the query unless you specify an Include or a projection.
FYI: projection and Include() don't work together. If you are projecting, the Include is bypassed. See this answer:
https://stackoverflow.com/a/7168225/1876572
I'm not sure what you want to do. You are using lambda (method) syntax rather than LINQ query syntax, and your second Select is unnecessary.
data.client is client and data.Countries is client.Countries, so the assignment data.client.Countries = data.Countries is redundant.
If you don't want to lazy load Countries, use _dbContext.Clients.Include("Countries").Where() or Select().
In order to force eager loading of virtual properties you are supposed to use Include extension method.
Here is a link to MSDN https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113).aspx.
So something like this should work:
return _dbContext.Clients.Include(c=>c.Countries).ToList();
I'm not 100% sure, but I think your issue is that you are still maintaining a queryable for your inner collection through to the end of the query.
This queryable is lazy (because in the model it was lazy), and you haven't done anything to indicate that this should not be the case; you have simply projected that same lazy queryable into the result set.
I can't tell you off the top of my head what the right answer here is, but I would try things along the following lines:
1. Use a projection on the inner queryable as well, e.g.:
return _dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.Select(c=>c)// or a new Country
})
2. Put the Include at the end of the query (I'm pretty sure Include applies to the result, not the input; it definitely doesn't work if you put it before a projection), e.g.:
_dbContext.Clients
.Select(client => new
{
client,
client.Countries
}).Include(c => c.Countries)
3. Try specifying the enumeration inside the projection, e.g.:
_dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.AsEnumerable() // perhaps ToList() if it works
})
I do want to caveat this by saying that I haven't tried any of the above, but I think it will set you on the right path.
A note on lazy loading
IMO there are very few good use cases for lazy loading. It almost always causes too many queries to be generated, unless your user is following a lazy path directly on the model. Use it only with extreme caution and, IMO, not at all in request/response (e.g. web) apps.
I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
public DbSet<Order> Orders { get; set; }
public DbSet<Contact> Contacts { get; set; }
...
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Customer>().ToTable("vw_Customers");
modelBuilder.Entity<Order>().ToTable("vw_Orders");
...
}
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to be bringing back the entire data set, which can take some time before the LINQ query reduces the set to those which I need. This seems very inefficient when the criteria is only applicable to a few records and I am getting the entire data set back from SQL server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) //or an object passed in!
{
return Database.SqlQuery<Customer>("Exec pp_GetCustomersForCriteria @crit1 = {0}, @crit2 = {1}...", criteria1, criteria2,...);
}
whilst this is much quicker, the problem here is that it doesn't return a DbSet and so I lose all of the connectivity between my objects, e.g. I can't reference any associated objects such as orders or contacts even if I include their IDs because the return type is a collection of 'Customers' rather than a DbSet of them.
Does anyone have a better way of getting SQL server to do the querying so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually be 'bringing back' anything at all - calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList, or FirstOrDefault) that anything will actually be executed and results returned.
If however this Customers method returns a collection of instantiated objects, then yes, since you are asking for all the objects you're getting them all.
I've never used either code-first or indeed even the repository pattern, so I don't know what to advise, other than staying in the realm of IQueryable for as long as possible, and only executing the query once you've applied all relevant filters.
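The deferred execution described above can be observed with a small sketch. It uses LINQ to Objects wrapped in `AsQueryable()` rather than EF, and `DeferredExecutionDemo` is a hypothetical name for illustration. With this in-memory provider the filters still run over every element once the query executes; with EF the same composed `Where` calls would instead become part of the SQL WHERE clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { elements read before execution, elements read after execution, result count }.
    public static int[] Run()
    {
        int evaluated = 0;

        // The counter increments only when the source is actually enumerated.
        IQueryable<int> source = Enumerable.Range(1, 1000)
            .Select(x => { evaluated++; return x; })
            .AsQueryable();

        // Chaining Where only builds up the query; nothing is read yet.
        IQueryable<int> filtered = source
            .Where(x => x % 100 == 0)
            .Where(x => x > 500);
        int beforeExecution = evaluated;

        // ToList() is what finally executes the query.
        List<int> results = filtered.ToList();

        return new[] { beforeExecution, evaluated, results.Count };
    }
}
```

`Run()` reports 0 elements read before `ToList()`, 1000 after it, and 5 results (600, 700, 800, 900, 1000), which is why a repository method that returns an already materialized list defeats any filters applied afterwards.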
What I would have done to return just a set of data would have been the following:
var customers = (from x in Repository.Customers where <boolean statement> &&/|| <boolean statement> select new { variableName = x.Name, ... }).Take(<integer amount for amount of records you need>);
so for instance:
var customers = (from x in _repository.Customers where x.ID == id select new { variableName = x.Name }).Take(1000);
then Iterate through the results to get the data: (remember, the linq statement returns an IQueryable)...
foreach (var data in customers)
{
string doSomething = data.variableName; //to get data from your query.
}
Hope this helps. These aren't exactly the same methods, but I find this approach handy in my code.
It's probably because the Customers() method in your repository is doing a GetAll() kind of thing and fetching the entire list first. This prevents LINQ and your SQL Server from creating smart queries.
I don't know if there's a good workaround for your repository, but if you would do something like:
using(var db = new RemoteServerContext())
{
var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data, for example by paging (like the TOP function in SQL), or you need to do manual querying using stored procedures. In either case, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF: it seems you don't have a lot of control over the code.