How to speed up loading data using an async method in EF? (C#)

I have a lot of data to load into my list. Loading it the normal (synchronous) way took about 20 seconds; after I made this asynchronous method, it takes about 7 seconds.
I wonder if there is a way to speed it up further, for example loading the first 20 cards as soon as the screen opens and then everything else afterwards. This is my code so far:
public async Task<List<CardObject>> GetCardsAsync()
{
    using (var context = new MyCARDEntities())
    {
        return await context.Card
            .Include(f => f.Person)
            .Include(k => k.CardType)
            .Where(arg => arg.LastAction != "D" && arg.PERSON_ID != null)
            .Select(k => new CardObject()
            {
                Id = k.Id,
                UID = k.UID,
                Person = new PersonBasicObject()
                {
                    Id = k.Person.Id,
                    OIB = k.Person.OIB,
                    Name = k.Person.Name,
                    LastName = k.Person.LastName
                }
            })
            .ToListAsync();
    }
}
and this is in the view model:
private async void LoadCards()
{
    var cards = await repKartica.GetCardsAsync();
    CardLst = new ObservableCollection<CardObject>(cards);
}

private ObservableCollection<CardObject> _CardLst;
public ObservableCollection<CardObject> CardLst
{
    get => _CardLst;
    set
    {
        _CardLst = value;
        RaisePropertyChanged(() => CardLst);
    }
}

Making a method async doesn't make it faster; it just allows the method to surrender the executing thread so other code can run. This can make the application more responsive, but it won't make retrieving that particular data any faster.
Firstly, loading a large amount of data at once is not a good idea if it can be avoided. Do clients need to see all of this data at once, or can it be paginated into pages of, say, 20 cards at a time? If it is, or can be, paged, then consider employing a paginated collection that queries specific pages of data (using .Skip() and .Take()) to pull back just 20 or so records at a time as the visible page changes, as in the sketch below.
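A minimal sketch of a paged variant of your method; GetCardsPageAsync, pageIndex, and pageSize are hypothetical names, not part of the original code:
// Hypothetical paged variant of GetCardsAsync.
public async Task<List<CardObject>> GetCardsPageAsync(int pageIndex, int pageSize)
{
    using (var context = new MyCARDEntities())
    {
        return await context.Card
            .Where(c => c.LastAction != "D" && c.PERSON_ID != null)
            .OrderBy(c => c.Id)         // Skip/Take requires a stable ordering
            .Skip(pageIndex * pageSize) // rows to bypass
            .Take(pageSize)             // rows to return
            .Select(c => new CardObject
            {
                Id = c.Id,
                UID = c.UID,
                Person = new PersonBasicObject
                {
                    Id = c.Person.Id,
                    OIB = c.Person.OIB,
                    Name = c.Person.Name,
                    LastName = c.Person.LastName
                }
            })
            .ToListAsync();
    }
}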
The next thing would be to look at the query being executed. Run a profiler such as ExpressProfiler against your database and capture the SQL that EF generates, then run a copy of those queries in SQL Server Management Studio to get an execution plan and see whether there are index recommendations.
Other tips: when using .Select() you do not need .Include(). Select will generate a query that pulls from the related entities automatically.
Are PersonBasicObject and CardObject your entity definitions? If so, how many fields in those entities would remain unfilled by this Select? Ideally you should project into dedicated view models rather than passing entities around. Filling entities via Select() with only some of their data gives you objects that look like entities but are not complete representations of them. That wastes memory on members that are never populated, and it is misleading and can lead to bugs, because code written to expect entities may get called or reused and then be handed either "real", complete entities or these incomplete ones. An entity's sole purpose should be to represent data state, not to transport view state.
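For illustration, a dedicated view model for this screen might look like the following; CardSummaryViewModel and its members are hypothetical names, not from the original code:
// Hypothetical, trimmed-down view model for the card list screen.
public class CardSummaryViewModel
{
    public int Id { get; set; }
    public string UID { get; set; }
    public string PersonName { get; set; }
    public string PersonLastName { get; set; }
}

// Projection straight into the view model; no Include() needed,
// and no half-populated entities floating around.
var cards = await context.Card
    .Where(c => c.LastAction != "D" && c.PERSON_ID != null)
    .Select(c => new CardSummaryViewModel
    {
        Id = c.Id,
        UID = c.UID,
        PersonName = c.Person.Name,
        PersonLastName = c.Person.LastName
    })
    .ToListAsync();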

Related

There is already an open DataReader associated with this Command without ToList()

I have the method below to load dependent data from a navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot enable MARS in my web.config file because it causes problems for other classes using the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
    if (data.Any())
    {
        foreach (var pstuu in data)
        {
            if (pstuu?.Id_user != null)
            {
                db.Entry(pstuu).Reference(q => q.Users).Load();
            }
        }
    }
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are, based on this method, that there's a call stack of various methods building search expressions and the like, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or, in your LoadDependent(), doing something like:
data = data.ToList();
base.LoadDependent(data);
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is already open. This is because your foreach, as written, iterates over the IQueryable, leaving EF's data reader open, so no further calls can be made against db, which I assume is a module-level DbContext variable that was injected.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. It just dispatches the load call asynchronously; without awaiting it, it too would fail, only the exception would surface on a continuation rather than the calling thread.
As mentioned in the comments on your question, this is a very poor design choice for loading references. If you aren't going to implement the initial fetch properly with either eager loading or projection, you are far, far better off enabling lazy loading and taking the Select n+1 hit if and when a reference is actually needed.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with its associated User eager loaded:
var ques = db.Ques
    .Include(x => x.AspNetUsers)
    .Where(x => x.SomeCondition == someCondition)
    .ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will execute with one query to the DB.
Select n+1 scenarios are bad because, in the case where 1000 Ques are returned, a call to fetch the dependencies gets you:
var ques = db.Ques
    .Where(x => x.SomeCondition == someCondition)
    .ToList(); // 1 query

foreach (var q in ques)
    db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
This becomes problematic when later code wants to offer pagination, such as taking only 25 items where the total record count runs into the tens of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils: with lazy loading, AspNetUsers is only queried if a returned Ques actually references it, and only for those Ques that do. So if the pagination only "touched" 25 rows, lazy loading would result in 26 queries. Lazy loading is a trap, however, because later code changes can inadvertently cause performance issues to appear in seemingly unrelated areas, as new references or code changes result in far more references being "touched", each kicking off its own query.
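As an aside, for lazy loading to work with DbContext-based EF, the navigation property must be declared virtual (with proxy creation left enabled) so EF can intercept access to it. A minimal sketch, reusing the entity names from this question and answer:
public class Ques
{
    public int Id { get; set; }
    public string Id_user { get; set; }

    // virtual lets EF generate a proxy that loads this reference on first access
    public virtual AspNetUsers AspNetUsers { get; set; }
}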
If you are going to pursue a LoadDependent()-style method, then you need to ensure it is called as late as possible, once you have a known set size to load (i.e. after pagination), because you will need to materialize the collection to load related entities with the same DbContext instance. Trying to work around that using detached instances (AsNoTracking()) or a completely new DbContext instance may give you some headway, but it will invariably lead to more problems later, as you end up with a mix of tracked and untracked entities, or worse, entities tracked by different DbContexts, depending on how these loaded entities are consumed.
An alternative some teams pursue, rather than a LoadReference()-style method, is an IncludeReference()-style method. The goal here is to build .Include() statements into the IQueryable. This can be done two ways: with magic strings (property names) or by passing in expressions for the references to include. Again, this can turn into a bit of a rabbit hole when handling more deeply nested references (i.e. building .Include().ThenInclude() chains). It avoids the Select n+1 issue by eager loading the required related data; a rough sketch follows.
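A sketch of what the expression-based form might look like in EF6; the helper's name and signature are illustrative, not an established API:
// Requires: using System.Data.Entity; using System.Linq.Expressions;
public static IQueryable<T> IncludeReferences<T>(
    this IQueryable<T> query,
    params Expression<Func<T, object>>[] includes) where T : class
{
    foreach (var include in includes)
        query = query.Include(include); // EF6's QueryableExtensions.Include
    return query;
}

// Usage: build the includes into the query, then materialize once.
var ques = db.Ques
    .Where(x => x.SomeCondition == someCondition)
    .IncludeReferences(x => x.AspNetUsers)
    .ToList();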
I solved the problem by deleting the Load method and using Include() in my initial data query to load the reference data for the navigation property.

Preventing EF from including related entities

I have a SQL Server database table that has a couple million records in it. I have an MVC site with a page to display data from this table, and I'm running into extensive performance issues.
Running a simple query like this takes about 25-30 seconds to return about two thousand rows:
_dbContext.Contracts
    .Where(c => c.VendorID == vendorId)
    .ToList();
When I run the equivalent query directly against the database, it only takes a couple of seconds.
Turns out, EF is loading all the related entities for my Contract, so it's slowing down my query a ton.
In the debugger, the objects returned are of a strange type; I'm not sure if that's an issue:
System.Data.Entity.DynamicProxies.Contract_3EF6BECBB56F2ADDDA6E0050AC82D03A4E993CEDF4FCA49244D3EE4005572C46
And the same with the related entities on my Contract:
System.Data.Entity.DynamicProxies.Vendor_4FB727808BD6E0BF3B25085B40F3F0B9B10EE4BD17D2A4C600214634F494DB66
The site is a bit old: it's MVC 3 with EF 4. I know that on current versions of EF I have to explicitly use Include() to get related entities, but here they seem to be included automatically.
I have an EDMX file, with a .tt file and entity classes under that, but I don't see anywhere that I can prevent my Contracts from pulling in related objects.
Is there any way for me to do that?
If your MVC controller is returning entities to the view, the trap you're hitting is that the serializer iterates over the returned entities and lazy-loads all related data. This is considerably worse than triggering an eager load because, when loading collections, it fetches related entities/sets one parent at a time.
Say I fetch 100 Contracts, and each contract contains a Vendor reference.
With eager loading I would use:
context.Contracts.Where(x => /* condition */).Include(x => x.Vendor).ToList();
which composes one query loading all applicable contracts and their vendor details. However, if you let the serializer lazy-load the Vendors, you effectively get the following:
context.Contracts.Where(x => /* condition */).ToList(); // gets applicable contracts...
// This happens behind the scenes for every single related entity touched while serializing...
context.Vendors.Where(x => x.VendorId == 1);
context.Vendors.Where(x => x.VendorId == 1);
// ... continue for each and every contract returned in the above list...
If Contract also has an Employee reference...
context.Employees.Where(x => x.EmployeeId == 16);
context.Employees.Where(x => x.EmployeeId == 12);
context.Employees.Where(x => x.EmployeeId == 11);
... and this continues for every related entity/collection in each contract and in each related entity. It adds up, fast. You can see just how crazy it gets by hooking a profiler up to your server and kicking off a read: you expect 1 SQL statement, then get hit with hundreds to thousands of calls.
The best way to avoid this is simply not to return entities from controllers. Instead, compose a view model with just the details you want to display and use .Select() or AutoMapper's .ProjectTo<ViewModel>() to populate it from an EF query. This avoids the trap of having the serializer touch lazy-load properties, and also minimizes the payload sent to the client.
So if I wanted to display a list of contracts for a vendor and I only needed to display the Contract ID, the contract #, and a dollar figure:
[Serializable]
public class ContractSummaryViewModel
{
    public int ContractId { get; set; }
    public string ContractNumber { get; set; }
    public decimal Amount { get; set; }
}

var contracts = _dbContext.Contracts
    .Where(c => c.VendorID == vendorId)
    .Select(c => new ContractSummaryViewModel
    {
        ContractId = c.ContractId,
        ContractNumber = c.ContractNumber,
        Amount = c.Amount
    })
    .ToList();
You can include details from related entities in the view model, or compose nested view models for key details, all without having to worry about .Include() or tripping lazy loading. This composes a single SQL statement that loads just the data you need and passes only that back to the UI. Streamlining the payload like this can improve performance quite dramatically.
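If you must keep returning entities, you can also switch lazy loading and proxy creation off. A minimal sketch, assuming the DbContext API from EF 4.1 onward (for a pure EDMX ObjectContext the equivalent switch is ContextOptions.LazyLoadingEnabled):
public class MyContext : DbContext
{
    public MyContext()
    {
        // Entities come back as plain POCOs rather than DynamicProxies,
        // and navigation properties stay null unless explicitly loaded.
        Configuration.ProxyCreationEnabled = false;
        Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<Contract> Contracts { get; set; }
}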

Entity Framework - Reference not loading

I have a model-first Entity Framework design like this (version 4.4).
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is, SnapshotPart, Quote, QuoteType) except the DataInfo. Looking into the SQL, this appears to be because Quote has no FK to DataInfo, due to the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current work around is this:
foreach (var quote in snapshot.ComponentQuotes)
{
var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic LINQ building blocks interact with Entity Framework.
Take the following (pseudo)code:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000);
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This looks OK, but it will throw a runtime error, because of how the IQueryable interface works.
IQueryable implements IEnumerable and adds information about an expression and a provider. This basically allows it to build and execute SQL statements against a database, instead of loading whole tables and iterating over them in memory the way you would with an IEnumerable.
Because LINQ defers execution of the expression until it is used, the IQueryable is compiled into SQL and the database query is executed only right before the results are needed. This speeds things up a lot and allows expression chaining without going to the database every time a Where() or Select() is applied. The side effect is that if the object is used outside db's scope, the SQL statement is executed after db has been disposed.
To force LINQ to execute immediately, you can use ToList(), like this:
List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This forces LINQ to execute the expression against db and fetch all addresses with a number greater than a thousand. That is all good if you just need a field from the addresses table, but since we want the name of a city (a 1..1 relationship similar to yours), we hit another bump before this can run: lazy loading.
Entity Framework lazy-loads entities by default, so nothing is fetched from the database until it is needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but it depends on the context still being available.
You could set 'Lazy Loading Enabled' to False in your model's properties, but on its own that just stops related data from loading at all; you would then have to either load references explicitly or eager load them, which could bring in a lot of data you don't use.
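If you'd rather toggle this in code than in the designer, the ObjectContext equivalent is a one-liner; a sketch, assuming db is your generated ObjectContext:
// Per-instance override of the model's lazy loading setting.
db.ContextOptions.LazyLoadingEnabled = false;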
The best fix for this problem is to execute everything inside db's scope:
using (var db = new ObjectContext())
{
    var addresses = db.Addresses.Where(addr => addr.Number > 1000);
    foreach (var addr in addresses)
        Console.WriteLine(addr.City.Name);
}
I know this is a really simple example, but in the real world you can use a DI container like Ninject to handle your dependencies and have your db available to you throughout the execution of the app.
This leaves us with Include. Include makes the IQueryable include all the specified relation paths when building the SQL statement:
List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do is map multiple tables to a single entity (entity splitting). In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it; a rough sketch follows.
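For illustration, here is roughly what entity splitting looks like with the code-first fluent API (the EDMX designer has an equivalent mapping). The property names and table split here are assumptions, not taken from the actual model:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Entity splitting: one Quote entity materialized from two tables
    // that share the same primary key.
    modelBuilder.Entity<Quote>()
        .Map(m =>
        {
            m.Properties(q => new { q.Price });  // hypothetical columns in the Quote table
            m.ToTable("Quote");
        })
        .Map(m =>
        {
            m.Properties(q => new { q.Source }); // hypothetical columns in the DataInfo table
            m.ToTable("DataInfo");
        });
}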

Am I getting a list in a good way in EF?

I am still new to Entity Framework, so forgive me if the question is a dumb one :)
I have a domain class that gets a list of some data from the database:
public IEnumerable<Item> GetItems()
{
    return context.Items.ToList();
}
This code returns all items from the database.
On the site I use paging, so I need only 10 items per page.
So I did something like this:
var model = itemsRepository.GetItems()
    .Where(x => x.CategoryId == categoryId)
    .OrderByDescending(x => x.CreatedOnDate)
    .Skip(0)
    .Take(pageSize);
As I see it, what I did here is pull all the items from the DB and then filter them.
Will I get some benefit if I put a new method in the domain with the following code in it?
return context.Items
    .Where(x => x.CategoryId == categoryId)
    .OrderByDescending(x => x.CreatedOnDate)
    .Skip(0)
    .Take(pageSize);
Yes. The benefit is that in the latter case your LINQ query gets translated to SQL and executed in the database. Your first example loads the entire table into memory, while the second runs a much more efficient query in the database.
Essentially, the .ToList() breaks deferred execution. Depending on your requirements, it might also make sense to return IQueryable<T> rather than IEnumerable<T> and work on that in the upper layers; see the sketch below. Also, try reading this question and answer.
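A minimal sketch of an IQueryable-returning repository method (the method name is illustrative):
// Exposes a composable query; nothing executes until the caller materializes it.
public IQueryable<Item> QueryItems()
{
    return context.Items;
}

// Callers compose filtering and paging, then hit the database exactly once:
var page = itemsRepository.QueryItems()
    .Where(x => x.CategoryId == categoryId)
    .OrderByDescending(x => x.CreatedOnDate)
    .Skip(0)
    .Take(pageSize)
    .ToList(); // SQL executes here, returning just one page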
Yes, you should. I'm assuming you're using SQL Server as the backend for your context. The query that gets constructed by your new method will pull only those 10 records out and return them as an IEnumerable (deferred execution), rather than pulling everything from the database and then filtering down to the first 10 results.
I think you're better off with the second (new) method using deferred execution.
Are you seeing an improvement in performance via SQL Profiler, too?
There are some problems in your code:
Do not keep the context in a class-level variable. Every time you need it, create it and dispose of it (with using):
using (var context = new ...)
{
    // do DB stuff
}
Do not call ToList() to fetch all the items; page the query first and only then call ToList() (something like your second sample, but with using, etc.).
The problem with the second approach is that the domain is now coupled to the context. This defeats one of the main purposes of the repository pattern. I suggest you put the second method inside the repository, where you pass it the page you want retrieved and it returns those items to you. In your repository, have something like:
public IEnumerable<Item> GetItemsForPage(int categoryId, int pageNumber, int pageSize)
{
    return context.Items
        .Where(x => x.CategoryId == categoryId)
        .OrderByDescending(x => x.CreatedOnDate)
        .Skip(pageNumber * pageSize) // note: not always 0
        .Take(pageSize);
}
In your domain you would call repository.GetItemsForPage(categoryId, pageNumber, pageSize). This gives you the benefit of deferred execution while maintaining the decoupling of domain and context.

Improving efficiency with Entity Framework

I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
    public DbSet<Contact> Contacts { get; set; }
    ...

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>().ToTable("vw_Customers");
        modelBuilder.Entity<Order>().ToTable("vw_Orders");
        ...
    }
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to bring back the entire data set, which can take some time before the LINQ query reduces it to the records I need. This seems very inefficient when the criteria apply to only a few records and I am getting the entire data set back from SQL Server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) // or an object passed in!
{
    return Database.SqlQuery<Customer>(
        "EXEC pp_GetCustomersForCriteria @crit1 = {0}, @crit2 = {1} ...",
        criteria1, criteria2, ...);
}
Whilst this is much quicker, the problem here is that it doesn't return a DbSet, so I lose all of the connectivity between my objects; e.g., I can't reference associated objects such as orders or contacts, even if I include their IDs, because the return type is a plain collection of Customers rather than a DbSet of them.
Does anyone have a better way of getting SQL Server to do the querying so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually 'bring back' anything at all; calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList or FirstOrDefault) that anything is actually executed and results are returned.
If, however, this Customers method returns a collection of instantiated objects, then yes: since you are asking for all the objects, you're getting them all.
I've never used either code-first or indeed even the repository pattern, so I don't know what to advise other than staying in the realm of IQueryable for as long as possible and only executing the query once you've applied all the relevant filters.
What I would have done to return just a subset of the data is the following:
var customers = (from x in _repository.Customers
                 where <boolean expression> && <boolean expression>
                 select new { variableName = x.Name, ... })
                .Take(<number of records you need>);
so for instance:
var customers = (from x in _repository.Customers
                 where x.ID == id
                 select new { variableName = x.Name }).Take(1000);
Then iterate through the results to get the data (remember, the LINQ statement returns an IQueryable):
foreach (var data in customers)
{
    string doSomething = data.variableName; // to get data from your query
}
Hope this helps. It's not exactly the same method, but I find this handy in my own code.
It's probably because the Customers() method in your repository is doing a GetAll()-style fetch, pulling the entire list first. This prevents LINQ and your SQL Server from creating smart queries.
I don't know if there's a good workaround for your repository, but if you would do something like:
using (var db = new RemoteServerContext())
{
    var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data, either via SQL paging (like the TOP function in SQL) or by doing manual querying using stored procedures. In either case, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF: you don't seem to have a lot of control over the code it generates.
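One note on the stored procedure route mentioned in the question: if losing tracking is the main objection, DbSet<T>.SqlQuery (unlike Database.SqlQuery<T>) returns entities that are attached to the context. A sketch, assuming the procedure's result set maps onto the Customer entity's columns:
// Results are tracked by the context's change tracker, unlike
// Database.SqlQuery<Customer>, which returns detached objects.
var customers = _dbContext.Customers
    .SqlQuery("EXEC pp_GetCustomersForCriteria @crit1 = {0}, @crit2 = {1}", criteria1, criteria2)
    .ToList();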
