Entity Framework 5 performance concerns

Entity Framework 5 performance concerns - c#

Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, manually generated.
Everything is working, but there's some complaining about the performances. I've never had performance problems with EF so I'm wondering if this time I just did something terribly wrong, or the problem could reside somewhere else.
The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:
if (parameter != null) { query = query.Where(c => c.Field == parameter); }
Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.
The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.
Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.
After all this, the IQueryable is sent through layers, reaches the MVC layer where Automapper is employed.
Here, when Automapper starts iterating (and thus the query is really executed) it calls a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to Lazy Loading according to EF recommendations to avoid eager loading if you have more than 3 or 4 distinct entities to include. My scenario is something like this:
Orders (maximum 10)
Many navigation properties under Order
Some of these have other navigation under them (localization entities)
Order details (many order details per order)
Many navigation properties under each Order detail
Some of these have other navigation under them (localization entities)
This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but still there are 2 main concerns:
The lazy loaded properties are called in sequence and not parallelized, thus taking more time
As a consequence of previous point, there's some dead time between each query, as the database has to receive the sql, run it, return it and so on for each query.
Just to see how it went, I tried to make the same query with eager loading, and as I predicted it was a total disaster, with a translated sql of more than 7K lines (yes, seven thousands) and way more slow overall.
Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.
I thought of some things I could do to improve, like:
Table splitting to reduce the selected columns
Turn off object tracking, as this scenario is read only (have untracked entities)
With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems all "Server Time" and not "Network Time", taking about from 8 to 12 seconds of server time.
From my experience, this should not be happening. I'm wondering if I'm approaching this query need in a wrong way, or if I have to turn my attention to something else (maybe a bad configured IIS server, or anything else I'm really clueless). Needles to say, the database has its indexes ok, checked very carefully by our dba.
So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.

For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:
(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:
var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
// by deferring the load until here, we can bring only the information we actually need
// into memory with a single query
.ToList();
This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.
(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:
// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());
This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that, in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries) but can't reuse logic you've already written in LINQ.
(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.

Try to think of an efficient yet simple sql query to get the data for your views.
Is it even possible?
If not, try to decompose (denormalize) your tables so that less joins is required to get data. Also, are there efficient indexes on table colums to speed up data retrieval?
If yes, forget EF, write a stored procedure and use it to get the data.
Turning tracking off for selected queries is a-must for a read-only scenario. Take a look at my numbers:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
As you can see, the difference between tracking and notracking scenario is significant.
I would experiment with eager loading but not everywhere (so you don't end up with 7k lines long query) but in selected subqueries.

One point to consider, EF definitely helps make development time much quicker. However, you must remember that when you're returning lots of data from the DB, that EF is using dynamic SQL. This means that EF must 1. Create the SQL, 2.SQL Server then needs to create an execution plan. this happens before the query is run.
When using stored procedures, SQL Server can cache the execution plan (which can be edited for performance), which does make it faster than using EF. BUT... you can always create your stored proc and then execute it from EF. Any complex procedures or queries I would convert to stored procs and then call from EF. Then you can see your performance gain(s) and reevaluate from there.

In some cases, you can use Compiled Queries MSDN to improve query performance drastically. The idea is that if you have a common query that is run many times that might generate the same SQL call with different parameters, you compile the query tie first time it's run then pass it as a delegate, eliminating the overhead of Entity Framework re-generating the SQL for each subsequent call.

Related

Load Entities AsNoTracking() with navigation properties, without specifying includes

I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with the option AsNoTracking since they are all like static tables that cannot be changed by user.
Those tables also happen to be navigation property of others. Up till now I relied on the AutoMapping feature of the Entity Framework, and don't use the .Include() or LazyLoading functionality.
So instead of:
var result = from x in context.TestTable
.Include("ChildTestTable")
select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This is working smoothly because the application is so designed that the tables within the Database are very small, there won't be a table that exceeds 600 rows (and that's already pretty high value in my app).
Now my way of loading data, isn't working with .AsNoTracking().
Is there any way to make it working?
So I can write:
context.ChildTestTable.AsNoTracking().List();
var result = context.TestTable.AsNoTracking().List();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
.Include("ChildTestTable")
select x;
So basically, I want to have 1 or more tables loaded with AutoMapping feature on but without loading them into the Object State Manager, is that a possibility?

The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.

optimize linq query with related entities

i am new to linq, i started writing this query:
var dProjects = Projects
.Select(p => new Models.Project {
ProjectID = p.ProjectID,
Status = p.Status,
ExpiresOn = p.ExpiresOn,
LatestComments = p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault(),
ProjectFileIDs = p.ProjectFiles
.Select(pf => pf.BinaryFileID)
.AsQueryable()
})
.AsQueryable<Models.Project>();
I already know this query will perform really slow because related entities like ProjectComments and ProjectFiles will create nested selects, though it works and gives me right results that i need.
How can i optimize this query and get the same results? One of my guesses would be using inner join but ProjectComments and ProjectFiles already has a relationship in database through keys, so not sure what we can achieve by setting the relationship again.
Basically, need to know which is the best approach to take here from performance perspective. One thing to note is i am sorting ProjectComments and only taking the most recent one. Should i be using combination of join and group by into? Help will be much appreciated. Thanks.
UPDATED:
Sorry, if i wasn't clear enough on what i am trying to do. Basically, in front end, i have a grid, which shows list of projects with latest project comments and list of all the files associated to project, so users can click on those links and actually open those documents. So the query that i have above is working and it does show the following in the grid:
Project ID (From Project table)
Status (From Project table)
ExpiresOn (From Project table)
LatestComments (latest entry from ProjectComments table which has project ID as foreign key)
ProjectFileIDs (list of file ids from ProjectFiles table which has Project ID as foreign key - i am using those File IDs and creating links so users can open those files).
So everything is working, i have it all setup, but the query is little slow. Right now we have very little data (only test data), but once this is launched, i am expecting lot of users/data and thus i want to optimize this query to the best, before it goes live. So, the goal here is to basically optimize. I am pretty sure this is not the best approach, because this will create nested selects.

In Entity Framework, you can drastically improve the performance of the queries by returning the objects back as an object graph instead of a projection. Entity Framework is extremely efficient at optimizing all but the most complex SQL queries, and can take advantage of deferred "Eager" loading vs. "Lazy" Loading (not loading related items from the db until they are actually accessed). This MSDN reference is a good place to start.
As far as your specific query is concerned, you could use this technique something like the following:
var dbProjects = yourContext.Projects
.Include(p => p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault()
)
.Include(p => p.ProjectFileIDs)
.AsQueryable<Models.Project>();
note the .Include() being used to imply Eager Loading.
From the MDSN Reference on Loading Related Objects,
Performance Considerations
When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.
Explicit and lazy loading enables you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded.

Do you get any boost in performance if you add Include statements before the Select?
Example:
var dProjects = Projects
.Include(p => p.ProjectComments)
.Include(p => p.ProjectFiles)
Include allows all matching ProjectComments and ProjectFiles to be eagerly loaded. See Loading Related Entities for more details.

How does the Entity Framework walks through a collection that is too big?

I am new to Entity Framework.
And I have one concern:
I need to walktrhough a quite big amount of data that is gathered via a LINQ to Entities that combines couple of properties from different entities in an anonymous type.
If I need to read the returned items of this query one by one until the end, am I under the risk of OutOfMemory exception as the collection is BIG or EF uses SqlDataReader implicitly?
( Or should I use EntityDateReader to ensure that I am reading the Db in a sequential order (But then I have to generate my query as a string I guess) )

As I see it there are two things you can do, firstly turn off tracking by using .AsNoTracking this will in most cases cut your memory set in half which may be enough.
If your set is still too big use skip and take to pull down the resultset in chunks. You should also use this in conjunction with AsNoTracking to ensure no memory is consumed with tracking
EDIT:
For example you could use something like the following the following to loop through all items in chunks of 1000. The below code should only hold 1000 items at a time in memory.
int numberOfItems = ctx.MySet.Count();
for(int i = 0; i < numberOfItems + 1000; i+=1000)
{
foreach(var item in ctx.MySet.AsNoTracking().Skip(i).Take(1000).AsEnumerable())
{
//do stuff with your entity
}
}

If the amount of the data is as big as you said i would recommend that you don't use EF for such a case. EF is great but you sometimes need to fallback and use standard SQL to get better performance.
Take a look at Dapper.NET https://github.com/SamSaffron/dapper-dot-net
If you really want to use EF in every case i would recommend that you use a Bounded Contexts (multiple DBContexts).
Splitting your model in multiple smaller contexts will improve the performance as you will use less resources when EF creates an in-memory model of the context.
The larger the context is the more resources are expended to generate
and maintain that in-memory model.
Also you can Detach and Attach when sharing instances of multiple context so you still don't load all the model.

Nhibernate Polymorphic Query - Eager Load Associations Without Polymorphic Fetch

I will start by saying I have already looked thoroughly in stack overflow, nhusers and the documentation for a possible solution to my issue.
I need to be able to query only the base class table in parts of my multi/future query when eagerly loading associations (although from the research I have done I don't think this is possible)
I have started to map an existing schema using fluent nhibernate as a proof of concept. I have mapped an inheritance hierarchy using table per sub class (The mappings all work perfectly fine so I won't paste them all in here). The hierarchy has around 15 sub classes and the base class has some additional associations. E.g.
Base
Dictionary<string, Attribute> Attributes
List<EntityChange> Changes
I need to eagerly load both of the collections as in the given scenario they are required for post processing and lazily loading them causes performance issues. I am eagerly loading them by a multi / future query:
var baseQuery = session.CreateCriteria<Base>("b")
.CreateCriteria("Nested", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2.AdditionalNested", JoinType.LeftOuterJoin);
var logsQuery = session.CreateCriteria<Base>("b").CreateAlias("Changes", "c", JoinType.LeftOuterJoin,
Expression.And(Expression.Ge("c.EntryDate", changesStartDate), Expression.Le("c.EntryDate", changesEndDate)))
.AddOrder(Order.Desc("c.EntryDate"));
var attributesQuery = session.CreateCriteria<Base>("t").SetFetchMode("Attributes", FetchMode.Join);
logsQuery.Future<Base>();
attributesQuery.Future<Base>();
var results = baseQuery.Future<Base>().ToList();
The queries execute and return the correct results. But just to eagerly load the associations in this manner means that the attribute and changes queries have to perform a polymorphic fetch (the addition of about 15 left outer joins per query that aren't required). I know this is required for polymorphic querying but the base query will return the hierarchy that I desire. The other parts of the multi query that are issuing a polymorphic query are redundant.
I haven't yet mapped the whole of the hierarchy so there will be additional unecessary joins being performed and there are also other associations that could be loaded up front. These two combined without the addition of an increase in volume will lead to performance issues. The performance currently of this query is about 6 seconds (which admittedly is better than the 20 it's currently taking) but by messing around a bit with the query and taking out the extra joins I can get it down to about 2 seconds (this is a common query so getting it as low as possible is beneficial not just pleasing to me. It will also be run from multiple distributed machine so I would rather not get into a discussion about caching / 2nd level caching).
I have tried
using the class modifier in the query 'class = base'. I initially done this blindly but believe this is for discriminator values. Even if it is for the case statement in the SQL this will not prevent the extra joins.
Doing everything in a single query. This is slower than splitting it up as above and gives the cartesian product
Using 'Polymorphism.Explicit();' in the fluent mappings. This has no effect as I am using ClassMap with SubclassMaps so it is ignored. I tried changing all the maps to ClassMaps and using Join but this didn't give the desired behaviour.
Tried to trick nhibernate into joining the base class table onto itself for loading associations (basically load a more concrete type to prevent the polymorphic query) - create a derived class 'BaseOnlyLoading' which uses the same table and primary key as the base class. This was obviously a hack but I was just trying to see what's possible. NHibernate doesn't allow the class and sub class to use the same table.
Define the BaseOnlyLoadingMap to be a classmap with the same assocations as the BaseMap with a join back onto the Base. This was hopeful as assocation collections are resolved in the context based on full type name.
Use an interceptor which modifies the SQL that before it's execute. I wouldn't use this in production and just tried it out of interest. I passed an interceptor into a local session. This caused issues and I didn't proceed.
The HQL 'Type' query operator as explained here although I am not sure this has been implemented in the .NET version and might behave similarly to 1.
There is comment on highest rated answer (How to perform a non-polymorphic HQL query in Hibernate?) which suggest overriding the IsExplicitPolymorphism on the persister. I had a quick look and from what I remember the persister was either global per entity or created in the SessionImpl from a static factory which would prevent doing this. Even if this was possible I am not sure what sort of side effects this would have.
I tried using some SQL to load everything but even if I use a stored proc I am not sure how nhibernate will piece the graph back together. Maybe I could specify all the entities and aliases?
Specifying explicit per query would be nice. Any suggestions?
Thanks in advance.

Which ORM Supports Multiple row Updates and Deletes

I have a query that deletes all rows that have been marked for deletion. There is a column in a table that is named IsDeleted. It is a boolean data type if it is true the row is suppose to be deleted along with all related rows in different tables.
If an article row is marked for deletion then the article comments, votes are also supose to be deleted. Which ORM can efficiently handle this?
Edit
I need this for C# .NET

DataObjects.Net offers an intermediate solution:
Currently it can't perform server-side deletion of entities selected by query. This will be implemented some day, but for now there is another solution.
One the other hand, is supports so-called generalized batching: queries it sends are sent in batches by up to 25 items at once, when this is possible. "Possible" means "query result won't be necessary right now". This is almost always correct for creations, updates and deletes. Since such queries always lead to a single (or few, if there is inheritance) seek operations, they're pretty cheap. If they're sent in bulks, SQL Server can cache plans for the whole bulks, not for just individual queries there.
So this is very fast, although not yet ideal:
For now DO4 doesn't use IN (...) to optimize such deletions.
So far it doesn't support asynchronous batch execution. When this is done (I hope this will be done in a month or so), its speed on CUD (a subset from CRUD) operations will be nearly the same as of SqlBulkCopy (~= 1.5 ... 2 times faster than now).
So in case with DO bulk deletion looks as follows:
var customersToRemove =
from customer in Query<Customer>.All
where customer.IsDeleted
select customer;
foreach (customer in customersToRemove)
customer.Remove(); // This will be automatically batched
I can name a benefit of this approach: any of such objects will be able to react on deletion; Session event subscribers will be notified about each deletion as well. So any common logic related to deletions will work as expected. This is impossible, if such operation is executed on server.
Code for soft delete must look like:
var customersToRemove =
from customer in Query<Customer>.All
where ...
select customer;
foreach (customer in customersToRemove)
customer.IsRemoved = true; // This will be automatically batched
Obviously, such an approach is slower that bulk server-side update. By our estimates, what we have now is about 5 times slower than true server-side deletion in worst case ([bigint Id, bigint Value] table, clustered primary index, no other indexes); on real-life cases (more columns, more indexes, more data) it must bring a comparable performance right now (i.e. be 2-3 times slower). Asynchronous batch execution will improve this further.
Btw, we shared tests for bulk CUD operations with entities for various ORM frameworks at ORMBattle.NET. Note that tests there don't use bulk server-side updates (in fact, such test would be a test for database performance rather than ORM); instead they test if ORM is capable of optimizing this. Anyway, the info provided there + test code might be helpful, if you're evaluating multiple ORM tools.

Typically, if you are already using an IsDeleted flag paradigm, the items are normally ignored by the application object model, and this is efficient and reliable because no referential integrity is needed to be checked (no cascade), and no data is permanently destroyed.
If you want IsDeleted rows purged on a regular basis, it is far more efficient to schedule these as batch jobs in the RDBMS using native SQL, as long as you remove things in the right order so that referential integrity is not compromised. If you do not enforce referential integrity at the DB-level, then the order doesn't matter.
Even with strong referential integrity and constraints in all my database designs over the years, I have never used cascading RI - it has never been desirable in my designs.

NHibernate supports HQL (the object oriented Hibernate Query Language) updates and deletes.
There are some examples in this Blog Post by Fabio Maulo and this Blog Post by Ayende Rahien.
It would probably look like this:
using (var session = OpenSession())
using (var tx = s.BeginTransaction())
{
session
.CreateQuery("delete from Whatever where IsDelete = true")
.ExecuteUpdate();
tx.Commit();
}
Note: this is not SQL. This is HQL containing class names and property names and it translates to (almost) any database.

Which ORM's support "criteria" based delete... Both of the ones i've worked with (Propel, Doctrine). I would think that nearly all do unless they are early in development as its pretty basic thing. But what language are you working?
As far as your deletion cascade, this is best implemented at the database level with foreign keys. Most RDBMS's support this. If youre using something that doesn't some ORM's implement this as well if support isnt available. But my advice would be to just use an RDBMS that does support it. It will be less headaches in the long run.

I am using LLBLgen, which can do cascading deletes. You might want to try it, it's very good.
Example. Delete all users in username[] from all roles in rolenames[]
string[] usernames;
string[] rolenames;
UserRoleCollection userRoles = new UserRoleCollection ();
PredicateExpression filter = new PredicateExpression();
filter.Add(new FieldCompareRangePredicate(UserFields.logincode, usernames));
filter.AddWithAnd(new FieldCompareRangePredicate(RoleFields.Name, rolenames));
userRoles.DeleteMulti(filter)

Most ORMs will allow you to either give SQL hints, or execute SQL within their framework.
For example, you can use the ExecuteQuery method in DLINQ to do what you want. Here is a brief tutorial on using custom sql with DLINQ.
http://weblogs.asp.net/scottgu/archive/2007/08/27/linq-to-sql-part-8-executing-custom-sql-expressions.aspx
I would expect that the Entity Framework would also allow it, but I have never used it but you can look into it.
Basically, find an ORM that has the features you need, and then you could ask how to do this query in your selected ORM. I think picking an ORM for this one feature is risky as there are many other factors that should go into the selection.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.