I would like to know when to use tolist. In the following example, both of the following do not result in error. So, which way to use?
var employees = db.Employees.Include(e => e.Department);
return View(employees);
var employees = db.Employees.Include(e => e.Department);
return View(employees.ToList());
Seems like the code is ASP.Net MVC code.. given return View(employees);
I also assume that the data is being pulled from the DB, using some LinqToSQL or EntityFramework like technology.
Given those two assumptions, I'd recommend that the latter be used. i.e. with .ToList()
Reason being, if the query is lazily evaluated, if you pass employees without .ToList(), you are essentially passing the query to the View, and it will execute when query is enumerated while rendering the View. View rendering should be fast, and not be blocked by a call to database.
.ToList() would avoid that, and force execution of the query in controller, and View will have data available in memory, for fast rendering...
Hope it answers your question.
EDIT: One caveat.. there some scenarios, for example when building APIs, like with OData APIs with WebAPI, that you actually want to return the query than the materialized list. The reason there is that by design, you want Framework to build on top of that query, before the filtered data is returned to the caller. In other words, framework does some more leg work for you, before view (serialized data - typically not HTML) is actually rendered.
After the first line is executed, employees collection will not be loaded into the memory (Lazy Loading). It is loaded when the collection is first accessed. When you call ToList() collection will be forced to be loaded into memory.
Usage is based on the trade-off between memory limitation and speed. Accessing from the memory is faster than lazy loading.
Related
I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
if (data.Any())
{
foreach (var pstuu in data)
{
if (pstuu?.Id_user != null)
{
db.Entry(pstuu).Reference(q => q.Users).Load();
}
}
}
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are based on this method it's probably a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or.. in your LoadDependent changing doing something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is open. This is because your foreach call as-is would be iterating over the IQueryable meaning EF's data reader would be left open so further calls to db, which I'd assume is a module level variable for the DbContext that is injected, cannot be made.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. This just delegates the load call asynchronously, and without awaiting it, it too would fail, just not raise the exception on the continuation thread.
As mentioned in the comments to your question this is a very poor design choice to handle loading references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with it's associated User eager loaded:
var ques = db.Ques
.Include(x => x.AspNetUsers)
.Where(x => x.SomeCondition == someCondition)
.ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will execute with one query to the DB.
Select n+1 scenarios are bad because in the case where 1000 Ques are returned with a call to fetch dependencies you get:
var ques = db.Ques
.Where(x => x.SomeCondition == someCondition)
.ToList(); // 1 query.
foreach(var q in ques)
db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
Which then looks problematic where later code might want to offer pagination such as to take only 25 items where the total record count could run in the 10's of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils, as with lazy loading you know that AspNetUsers would only be selected if any returned Ques actually referenced it, and only for those Ques that actually reference it. So if the pagination only "touched" 25 rows, Lazy Loading would result in 26 queries. Lazy loading is a trap however as later code changes could inadvertently lead to performance issues appearing in seemingly unrelated areas as new referenences or code changes result in far more references being "touched" and kicking off a query.
If you are going to pursue a LoadDependent() type method then you need to ensure that it is called as late as possible, once you have a known set size to load because you will need to materialize the collection to load related entities with the same DbContext instance. (I.e. after pagination) Trying to work around it using detached instances (AsNoTracking()) or by using a completely new DbContext instance may give you some headway but will invariably lead to more problems later, as you will have a mix of tracked an untracked entities, or worse, entities tracked by different DbContexts depending on how these loaded entities are consumed.
An alternative teams pursue is rather than a LoadReference() type method would be an IncludeReference() type method. The goal here being to build .Include statements into the IQueryable. This can be done two ways, either by magic strings (property names) or by passing in expressions for the references to include. Again this can turn into a bit of a rabbit hole when handling more deeply nested references. (I.e. building .Include().ThenInclude() chains.) This avoids the Select n+1 issue by eager loading the required related data.
I have solved the problem by deletion the method Load and I have used Include() in my first query of data to show the reference data in navigation property
I am making a call to a function in .net from angular js, the time it takes to get the response to angular from .net is more than 5 seconds. How can I make the mapping of the result of a sql query decrease in time, I have the following code.
List<CarDTO> result = new List<CarDTO>();
var cars = await _carsUnitOfWork.CarRepository.GetCarDefault(carFilter,true,_options.Value.priorityLabel);
result = cars.Select(car => _mapper.Map<Car, CarDTO>(car)).ToList();
The code you have provided isn't expanded enough to identify a cause, but there are a number of clues:
Check / post the code for CareRepository.GetCarDefault(). The call implies that this is returning an IEnumerable given it is Awaited. It isn't clear what all of the parameters are and how they affect the query. As your database grows, this appears to return all cars, rather than supporting pagination. (What happens when you have 10,000 Car records, or 1,000,000?)
Next would be the use of Automapper's Map method. Combined with IEnumerable this means that your repository is going through the hassle of loading all Care entities into memory, then Automapper is allocating a duplicate set of DTOs into memory copying across data from the entities.
Lazy loading is a distinct risk with an approach like this. If the CarDTO pulls any fields from entities referenced by a Car, this will trip off additional queries for each individual car.
For best performance, I highly recommend adopting an IQueryable return type on Repository methods and leveraging Automapper's ProjectTo method rather than Map. This is equivalent to using Select in Linq, as ProjectTo will bubble down into the SQL generation to build efficient queries and return the collection of DTOs. This removes the risk of lazy loading calls as well as the double memory allocation for entities then DTOs.
Implementing this with your Unit of Work pattern is a bit of an unknown without seeing the code. However it would look something more like:
var result = await _carsUnitOfWork.CarRepository
.GetCarDefault(carFilter,true,_options.Value.priorityLabel)
.ProjectTo<CarDto>(mapperConfig)
.ToListAsync(); // Add Skip() and Take() to support pagination.
The repository method would be changed from being something like:
public async IEnumerable<Car> GetCarDefault( ... )
to
public IQueryable<Car> GetCarDefault( ... )
Rather than the method returning something like .ToListAsync(), you return the built Linq expression.
I.e. change from something like:
var result = _context.Cars.Include().Where(x => ...).ToListAsync();
return result;
to
var query = _context.Cars.Where(x => ....);
return query;
The key differences is that where the existing method likely returns ToListAsync() we remove that and return the unmaterialized IQueryable that Linq is building. Also, if the current implementation is Eager Loading any relations /w .Include() we exclude those. The caller performing projection doesn't need that. If the caller does need Car entity graphs, (such as when updating data) the caller can append .Include() statements.
It is also worth running an SQL Profiler to look at what queries are being run against the database server. This can give you the queries to inspect and test, as well as highlight any unexpected queries being triggered. (I.e. caused by lazy loading calls)
That should give you some ideas on where to start.
I'm looking to get a better understanding on when we should look to use IEnumerable over IQueryablewith LINQ to Entities.
With really basic calls to the database, IQueryable is way quicker, but when do i need to think about using an IEnumerable in its place?
Where is an IEnumerable optimal over an IQueryable??
Basically, IQueryables are executed by a query provider (for example a database) and some operations cannot be or should not be done by the database. For example, if you want to call a C# function (here as an example, capitalize a name correctly) using a value you got from the database you may try something like;
db.Users.Select(x => Capitalize(x.Name)) // Tries to make the db call Capitalize.
.ToList();
Since the Select is executed on an IQueryable, and the underlying database has no idea about your Capitalize function, the query will fail. What you can do instead is to get the correct data from the database and convert the IQueryable to an IEnumerable (which is basically just a way to iterate through collections in-memory) to do the rest of the operation in local memory, as in;
db.Users.Select(x => x.Name) // Gets only the name from the database
.AsEnumerable() // Do the rest of the operations in memory
.Select(x => Capitalize(x)) // Capitalize in memory
.ToList();
The most important thing when it comes to performance of IQueryable vs. IEnumerable from the side of EF, is that you should always try to filter the data using an IQueryable to get as little data as possible to convert to an IEnumerable. What the AsEnumerable call basically does is to tell the database "give me the data as it is filtered now", and if you didn't filter it, you'll get everything fetched to memory, even data you may not need.
IEnumerable represents a sequence of elements which you enumerate one by one until you find the answer you need, so for example if I wanted all entities that had some property greater than 10, I'd need to go through each one in turn and return only those that matched. Pulling every row of a database table into memory in order to do this would not maybe be a great idea.
IQueryable on the other hand represents a set of elements on which operations like filtering can be deferred to the underlying data source, so in the filtering case, if I were to implement IQueryable on top of a custom data source (or use LINQ to Entities!) then I could give the hard work of filtering / grouping etc to the data source (e.g. a database).
The major downside of IQueryable is that implementing it is pretty hard - queries are constructed as Expression trees which as the implementer you then have to parse in order to resolve the query. If you're not planning to write a provider though then this isn't going to hurt you.
Another aspect of IQueryable that it's worth being aware of (although this is really just a generic caveat about passing processing off to another system that may make different assumptions about the world) is that you may find things like string comparison work in the manner they are supported in the source system, not in the manner they are implemented by the consumer, e.g. if your source database is case-insensitive but your default comparison in .NET is case-sensitive.
Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, manually generated.
Everything is working, but there's some complaining about the performances. I've never had performance problems with EF so I'm wondering if this time I just did something terribly wrong, or the problem could reside somewhere else.
The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:
if (parameter != null) { query = query.Where(c => c.Field == parameter); }
Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.
The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.
Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.
After all this, the IQueryable is sent through layers, reaches the MVC layer where Automapper is employed.
Here, when Automapper starts iterating (and thus the query is really executed) it calls a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to Lazy Loading according to EF recommendations to avoid eager loading if you have more than 3 or 4 distinct entities to include. My scenario is something like this:
Orders (maximum 10)
Many navigation properties under Order
Some of these have other navigation under them (localization entities)
Order details (many order details per order)
Many navigation properties under each Order detail
Some of these have other navigation under them (localization entities)
This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but still there are 2 main concerns:
The lazy loaded properties are called in sequence and not parallelized, thus taking more time
As a consequence of previous point, there's some dead time between each query, as the database has to receive the sql, run it, return it and so on for each query.
Just to see how it went, I tried to make the same query with eager loading, and as I predicted it was a total disaster, with a translated sql of more than 7K lines (yes, seven thousands) and way more slow overall.
Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.
I thought of some things I could do to improve, like:
Table splitting to reduce the selected columns
Turn off object tracking, as this scenario is read only (have untracked entities)
With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems all "Server Time" and not "Network Time", taking about from 8 to 12 seconds of server time.
From my experience, this should not be happening. I'm wondering if I'm approaching this query need in a wrong way, or if I have to turn my attention to something else (maybe a bad configured IIS server, or anything else I'm really clueless). Needles to say, the database has its indexes ok, checked very carefully by our dba.
So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.
For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:
(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:
var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
// by deferring the load until here, we can bring only the information we actually need
// into memory with a single query
.ToList();
This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.
(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:
// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());
This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that, in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries) but can't reuse logic you've already written in LINQ.
(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.
Try to think of an efficient yet simple sql query to get the data for your views.
Is it even possible?
If not, try to decompose (denormalize) your tables so that less joins is required to get data. Also, are there efficient indexes on table colums to speed up data retrieval?
If yes, forget EF, write a stored procedure and use it to get the data.
Turning tracking off for selected queries is a-must for a read-only scenario. Take a look at my numbers:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
As you can see, the difference between tracking and notracking scenario is significant.
I would experiment with eager loading but not everywhere (so you don't end up with 7k lines long query) but in selected subqueries.
One point to consider, EF definitely helps make development time much quicker. However, you must remember that when you're returning lots of data from the DB, that EF is using dynamic SQL. This means that EF must 1. Create the SQL, 2.SQL Server then needs to create an execution plan. this happens before the query is run.
When using stored procedures, SQL Server can cache the execution plan (which can be edited for performance), which does make it faster than using EF. BUT... you can always create your stored proc and then execute it from EF. Any complex procedures or queries I would convert to stored procs and then call from EF. Then you can see your performance gain(s) and reevaluate from there.
In some cases, you can use Compiled Queries MSDN to improve query performance drastically. The idea is that if you have a common query that is run many times that might generate the same SQL call with different parameters, you compile the query tie first time it's run then pass it as a delegate, eliminating the overhead of Entity Framework re-generating the SQL for each subsequent call.
I am still new in entity framework. So forgive me if question is dummy :)
I have a domain class that get's a list of some data from database:
public IEnumerable<Item> GetItems()
{
return context.Items.ToList();
}
This code return all items from database.
On the site I use paging so I need only 10 items per page.
So I did something like this:
var model = itemsRepository.GetItems().
Where(x => x.CategoryId == categoryId).
OrderByDescending(x => x.CreatedOnDate).
Skip(0).
Take(pageSize);
Now as I see what I did here is, I take all items from db and filter them.
Will I get some benefit if I put new method in domain and put the following code in it:
return context.Items.Where(x => x.CategoryId == categoryId).
OrderByDescending(x => x.CreatedOnDate).
Skip(0).
Take(pageSize);
Yes. You will get the benefit that your LINQ query in the latter case will get translated to SQL and executed in the database. Therefore, your first example will load the entire table into memory - while the second example will do a much more efficient query in the database.
Essentially, the .ToList() breaks deferred execution - but it might also make sense for you to return IQueryable<T> rather than IEnumerable<T>, and then working on that in upper layers - depending on your requirements. Also, try reading this question and answer.
Yes, you should. I'm assuming you're using SQL as the backend for your context, but the query that ends up getting constructed with your new method will only pull those 10 records out and return them as an IEnumerable (deferred execution) rather than pulling everything from the database and then just filtering out the first 10 results.
I think you're better off with the second (new) method using deferred execution.
Are you seeing an improvement in performance via SQL Profiler, too?
There are some problems in your code:
Do not use variable in class for context, every time you need it, create it and dispose it (with using):
using(var context = new ...)
{
// do DB stuffs
}
Do not call ToList(), to fetch all items, use normal paging then call ToList (something like your second sample but with using, ...).
The problem with the second approach is that the domain now is coupled to the context. This defeats one of the main purposes of the repository pattern. I suggest you have the second method inside the repository where you passed it the page number you want retrieved and it returns them to you. In your repository have something like
public IEnumerable<Item> GetItemsForPage(int pageNumber)
{
return context.Items.Where(x => x.CategoryId == categoryId).
OrderByDescending(x => x.CreatedOnDate).
Skip(pageNumber * pageSize). //Note not always 0
Take(pageSize);
}
In your domain you would call repository.GetItemsForPage(). This gives you the benefit of delayed execution while maintaining the decoupling of domain and context.