Extremely slow and inefficient query execution from Entity Framework - C#

I've got Entity Framework 4.1 with .NET 4.5 running under ASP.NET on Windows Server 2008 R2. I'm using EF code-first to connect to SQL Server 2008 R2, and I'm executing a fairly complex LINQ query that results in just a Count().
I've reproduced the problem on two different web servers but only one database (production of course). It recently started happening with no application, database structure, or server changes on the web or database side.
My problem is that executing the query under certain circumstances takes a ridiculous amount of time (close to 4 minutes). I can take the actual query, pulled from SQL Profiler, and execute it in SSMS in about 1 second. This is consistent and reproducible for me, but if I change the value of one of the parameters (a "Date after 2015-01-22" parameter) to something earlier, like 2015-01-01, or later, like 2015-02-01, it works fine in EF. If I put it back to 2015-01-22, it's slow again. I can repeat this over and over.
I can then run a similar but unrelated query in EF, then come back to the original, and it runs fine this time - same exact query as before. But if I open a new browser, the cycle starts over again. That part also makes no sense - we're not doing anything to retain the data context in a user session, so I have no clue whatsoever why that comes into play.
But this all tells me that the data itself is fine.
In Profiler, when the query runs properly, it takes about a second or two, and shows about 2,000,000 in reads and about 2,000 in CPU. When it runs slowly, it takes 3.5 minutes, and the values are 300,000,000 and 200,000 - so reads are about 150 times higher and CPU is 100 times higher. Again, for the identical SQL statement.
Any suggestions on what EF might be doing differently that wouldn't show up in the query text? Is there some kind of hidden connection property which might cause a different execution plan in certain circumstances?
EDIT
The query that EF builds is one of those where it embeds the parameter value directly in the query text rather than passing it as a SQL parameter:
exec sp_executesql
N'SELECT [GroupBy1].[A1] AS [C1]
FROM (
SELECT COUNT(1) AS [A1]
...
AND ([Extent1].[Added_Time] >= convert(datetime2, ''2015-01-22 00:00:00.0000000'', 121))
...
) AS [GroupBy1]'
EDIT
I'm not adding this as an answer since it doesn't actually address the underlying issue, but this did end up getting resolved by rebuilding indexes and recomputing statistics. That hadn't been done in longer than usual, and it seems to have cleared up whatever caused the issue.
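For reference, the kind of maintenance that resolved it looks roughly like the following. This is a sketch only: the table name is hypothetical, and you would normally run these statements from SSMS or a scheduled maintenance job rather than from application code (EF's ExecuteSqlCommand is used here just to keep the example in C#).

    // Sketch only: "Orders" is a hypothetical table name.
    using (var db = new MyContext())
    {
        // Rebuild all indexes on the table (this also refreshes their statistics).
        db.Database.ExecuteSqlCommand("ALTER INDEX ALL ON dbo.Orders REBUILD");

        // Refresh the remaining column statistics with a full scan.
        db.Database.ExecuteSqlCommand("UPDATE STATISTICS dbo.Orders WITH FULLSCAN");
    }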
I'll keep reading up on some of the links here in case this happens again, but since it's all working now and unreproducible, I don't know if I'll ever know for sure exactly what it was doing.
Thanks for all the ideas.

I recently had a very similar scenario: a query would run very fast when executed directly against the database, but had terrible performance through EF (version 5, in my case). It was not a network issue; the difference was 4 ms versus 10 minutes.
The problem ended up being a mapping mismatch. I had a column mapped as NVARCHAR while it was VARCHAR in the database. It seems harmless, but it resulted in an implicit conversion in the database, which totally ruined the performance.
I'm not entirely sure why this happens, but from the tests I made, it caused the database to do an Index Scan instead of an Index Seek, and those are worlds apart performance-wise.
I blogged about this here (disclaimer: the post is in Portuguese), but later I found that Jimmy Bogard described this exact problem in a post from 2012; I suggest you check it out.
Since you do have a convert in your query, I would say start there. Double-check all your column mappings and look for differences between a table's column and the corresponding entity property. Avoid implicit conversions in your query.
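For example, with code-first you can force a string property to map to VARCHAR instead of EF's default NVARCHAR. A minimal sketch (EF 5/6 fluent API; the entity and column names are hypothetical):

    using System.Data.Entity;

    public class Customer
    {
        public int Id { get; set; }
        public string Code { get; set; }    // stored as VARCHAR in the database
    }

    public class MyContext : DbContext
    {
        public DbSet<Customer> Customers { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Without this, EF maps strings to NVARCHAR, so the generated SQL
            // compares an NVARCHAR parameter against a VARCHAR column, and the
            // implicit conversion turns an Index Seek into an Index Scan.
            modelBuilder.Entity<Customer>()
                .Property(c => c.Code)
                .IsUnicode(false);   // map as VARCHAR (HasColumnType("varchar") also works)
        }
    }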
If you can, check your execution plan for inconsistencies, and watch for the yellow warning triangle, which can indicate problems like this implicit conversion.
I hope this helps you somehow. It was a really difficult problem for us to track down, but it made sense in the end.

Just to put this out there since it has not been addressed as a possibility:
Given that you are using Entity Framework (EF), if you are using lazy loading of entities, then EF requires Multiple Active Result Sets (MARS) to be enabled via the connection string. While it might seem entirely unrelated, MARS does sometimes produce exactly this behavior of something running quickly in SSMS but horribly slowly (seconds becoming minutes) via EF.
One way to test this is to turn off lazy loading and either remove MultipleActiveResultSets=True; from the connection string (the default is false) or at least change it to MultipleActiveResultSets=False;.
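A minimal sketch of that test (DbContext API, EF 4.1+; the context name and connection string are hypothetical):

    public class MyContext : DbContext
    {
        public MyContext()
            : base("Server=.;Database=MyDb;Integrated Security=True;" +
                   "MultipleActiveResultSets=False;")    // or omit it entirely; false is the default
        {
            Configuration.LazyLoadingEnabled = false;    // navigation properties must now be loaded explicitly
            Configuration.ProxyCreationEnabled = false;  // optional: no lazy-loading proxies at all
        }
    }

With lazy loading off, any navigation property you still need has to be eager-loaded or loaded explicitly, so expect some query changes while testing.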
As far as I know, there is unfortunately no work-around or fix (currently) for this behavior.
Here is an instance of this issue: Same query with the same query plan takes ~10x longer when executed from ADO.NET vs. SSMS

There is an excellent article about Entity Framework performance considerations here.
I would like to draw your attention to the section on cold vs. warm query execution:
The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.
During LINQ query execution, the "metadata loading" step has a high impact on cold query performance. However, once loaded, the metadata is cached and future queries run much faster. The metadata is cached outside of the DbContext and is reusable for as long as the application pool lives.
In order to improve performance, consider the following actions:
use pre-generated views
use query plan caching
use no-tracking queries (only for read-only access; sketched below)
create a native image of Entity Framework (only relevant if using EF 6 or later)
All those points are well documented in the link provided above. In addition, you can find more information about creating a native image of Entity Framework here.
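As a small illustration of the no-tracking point, here is a minimal sketch (DbContext API, EF 4.1+; the entity and property names are hypothetical):

    using System;
    using System.Data.Entity;
    using System.Linq;

    using (var db = new MyContext())
    {
        var cutoff = new DateTime(2015, 1, 22);

        // AsNoTracking skips the change-tracking bookkeeping, which is a
        // measurable win for large result sets that are only read, never updated.
        var recentOrders = db.Orders
            .AsNoTracking()
            .Where(o => o.AddedTime >= cutoff)
            .ToList();
    }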

I don't have a specific answer as to WHY this is happening, but it certainly looks to be related to how the query is handled rather than the query itself. Since you don't have any issues running the same generated query from SSMS, the query isn't the problem.
A workaround you can try: a stored procedure. EF handles them very well, and it is an ideal way to deal with potentially complicated or expensive queries.
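A minimal sketch of calling one through EF (DbContext API; the procedure and parameter names are hypothetical):

    using System.Data.SqlClient;
    using System.Linq;

    using (var db = new MyContext())
    {
        // Map the single scalar result of the procedure straight to an int.
        var count = db.Database.SqlQuery<int>(
                "EXEC dbo.CountRecentOrders @addedAfter",
                new SqlParameter("@addedAfter", new System.DateTime(2015, 1, 22)))
            .Single();
    }

A procedure also gives you full control over the SQL that runs, independent of whatever EF generates.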

Since you are using Entity Framework 4.1, I would suggest you upgrade to Entity Framework 6.
There have been a lot of performance improvements, and EF 6 is much faster than EF 4.1.
The MSDN article about Entity Framework performance considerations mentioned in my other response also has a comparison between EF 4.1 and EF 6.
There might be a bit of refactoring needed as a result, but the improvement in performance should be worth it (and that would reduce the technical debt at the same time).

Related

Entity Framework upgrade to 6.2.0 from 6.1.x breaks certain queries unless I enable MARS

I recently upgraded from EF 6.1.3 to 6.2.0 on one of our large projects, and it has broken a significant number of our LINQ queries. Enabling MultipleActiveResultSets makes everything work as normal again, but I'm struggling to understand the change. We have been using EF for years and have gone through multiple major version changes without any issue. If I simply revert to 6.1.3, everything works again as expected; in fact, everything works even if I explicitly disable MARS in 6.1.3.
Let me give a few simplified examples. The first problem is with nested queries:
foreach (var row in dbSet.Where(<condition>))
{
    foreach (var innerRow in otherDbSet.Where(_ => _.Property == row.Property))
    {
        // ...
    }
}
This works fine in 6.1.3, but in 6.2.0 throws a "There is already an open DataReader..." exception. I understand the nature of the exception, and I can solve this by calling ToList() on the outer query to push the results into memory first - what I don't understand is why I didn't have to do this in 6.1.3 (even with MARS disabled). It isn't always desirable to simply load the whole outer set into memory.
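For reference, a minimal sketch of that fix, with a hypothetical x => x.IsActive standing in for <condition>:

    // Materialize the outer result set first, so its DataReader is closed
    // before the inner query opens one (at the cost of buffering in memory).
    foreach (var row in dbSet.Where(x => x.IsActive).ToList())
    {
        foreach (var innerRow in otherDbSet.Where(x => x.Property == row.Property))
        {
            // only one reader is ever open at a time now
        }
    }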
This also seems to impact lazy-loaded properties. For example, we build ComboBoxes from simple queries like this:
return db.Collection
    .Where(<condition>)
    .AsEnumerable()
    .Select(_ => new ListItem(_.Id, _.LazyNavigationProperty.Description))
    .ToList();
This works fine in 6.1.3, but again throws the "There is already an open DataReader..." exception in 6.2.0. The fix is that I now have to eager-load the navigation property, as sketched below.
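A minimal sketch of that fix (EF 6; again using a hypothetical x => x.IsActive for <condition>):

    using System.Data.Entity;   // for the lambda overload of Include
    using System.Linq;

    return db.Collection
        .Where(x => x.IsActive)
        .Include(x => x.LazyNavigationProperty)   // eager-load up front: one reader, one round trip
        .AsEnumerable()
        .Select(x => new ListItem(x.Id, x.LazyNavigationProperty.Description))
        .ToList();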
Ultimately I don't have an explicit question; I'm just trying to understand why a minor version update seemingly caused major breaking changes in how queries are handled.
Moving forward, this impacts far too many queries for us to refactor. When I was researching the problem, I saw vague warnings about enabling MARS, but nobody really gave anything concrete. Is there a compelling reason not to enable it?
You get this error because you're iterating through one result set while trying to open another before the first has finished, which is essentially lazy loading at work (the first foreach iteration in your case). There are a lot of ways to solve this, as you've already seen for yourself: calling ToList() drops the outer results into memory first, so the DataReader is no longer being used to stream that set.
It looks like it MIGHT be related to a bug fix in 6.2 (release notes: https://entityframework.net/ef-version-history), possibly the one described as "Bug: Retrying queries or SQL commands fails with 'The SqlParameter is already contained by another SqlParameterCollection.'"
Regarding enabling MARS, you can find a specific warning here:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/enabling-multiple-active-result-sets
Entity Framework is supposed to provide a thin abstraction over your database model.
Doing that requires performing multiple queries under the hood, and the engine may issue more queries than the same workload coded by hand would need.
This is a natural evolution: the engine has to be able to handle every possible user request, so simply upgrading to a different Entity Framework version can change the database workload emitted under the hood.
MARS is required because EF changed the way object retrieval is performed (particularly within loops combined with lazy loading). Unfortunately, most of the time you are required to use MARS when using Entity Framework.
Nowadays, using async/await usually requires MARS too.
You can find additional information about how related entities are loaded in MSDN's Loading Related Objects and Enabling Multiple Active Result Sets. This interesting blog post goes a little deeper.

Using EF for a simple query is 1s faster than using NHibernate. Why?

I thought NHibernate was faster than EF, but this code shows EF at around 1s while NHibernate is around 2~4s. Is anything wrong with the query?
gist link: https://gist.github.com/d271f4ca0276cca7d481
It is a single table with no links or relations to other tables, just 300k rows of data.
MySQL, EF 5, NHibernate 3.3.
I can see nothing wrong with the query used in the test. The thing about the test is that it measures a bulk-processing operation, which is outside NHibernate's target use case.
Also, NHibernate does not have performance as its top goal, and it should not be evaluated solely on that parameter. That is to say, if performance is your single most important factor, you might be better off with something simpler.
From my experience with NHibernate, the problem is that the ISessionFactory takes the most time to create: that is where all the mappings are built, the cache is initialized, and so on.
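The usual mitigation is to build the factory once and cache it. A minimal sketch (NHibernate 3.x; assumes the mappings live in hibernate.cfg.xml, and the class name is hypothetical):

    using NHibernate;
    using NHibernate.Cfg;

    public static class NhSessionSource
    {
        // Built exactly once per process: this is the expensive step.
        private static readonly ISessionFactory Factory =
            new Configuration().Configure().BuildSessionFactory();

        public static ISession OpenSession()
        {
            return Factory.OpenSession();   // sessions themselves are cheap
        }
    }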
Querying also works differently in NHibernate than in EF. EF compiles the LINQ expression tree and builds the SQL statement based on the driver being used.
In NHibernate you write the query yourself in its own syntax, which, compared to EF, is slower.
That's the experience I've had. Maybe other folks can give a deeper dive.

GetExecutionPlan and EnsureConnection causing really slow performance for EF

I am using Entity Framework to look up (and save) an entity in my SQL Server 2008 R2 database. My problem is with a simple ObjectContext.FirstOrDefault call (though it is abstracted via an IRepository pattern).
I am noticing really, really poor performance, so I attached a profiler and found that the first query I run is where most of the slowdown occurs.
The first thing I thought was that I had a bad index, but running the lookup in SSMS is nearly instantaneous, so that is not the problem. I also tried changing which query I call first, and the performance hit stayed mostly with whichever query ran first.
Two methods that EF calls account for a high percentage of my run time: GetExecutionPlan and EnsureConnection.
Are these just overhead that I have to accept if I want to use EF? Or is there a way to optimize these calls?
One thing I thought of is reusing my Entity Framework ObjectContext; I think some of the slowdown would then be absorbed by caching. However, I have read bad things about reusing the ObjectContext (which is why I was creating a new one with each of my service calls).

IQueryable Count method takes longer to execute

With a WCF service built with Entity Framework on top of a database containing around 200 tables, it takes a lot of time (around 2 minutes) to perform a login the first time after building the WCF code.
Stepping into the code revealed the IQueryable.Count method to be the culprit.
This happens only the first time after building the WCF code; consecutive executions of the Count method are fast, as expected.
What could be the reason? Is Entity Framework doing some kind of background caching after the code is rebuilt?
Please share your thoughts!
UPDATED:
@Craig: Thanks for the pre-generation of views link.
Also, this link has a lot of performance improvement suggestions for EF.
Also, check out lazy loading for the EF library.
This is a known problem that will be resolved with .NET 4.0.
When you first run a web-based application, the code must be cached; from then on, it runs at full speed. The article shows current methods of avoiding this initial slowdown by pre-running the code before your first user hits the service.
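A minimal warm-up sketch of that idea for an ASP.NET-hosted service (the context and entity-set names are hypothetical):

    using System.Linq;

    // In Global.asax.cs: pay the cold-query cost at startup instead of on
    // the first user request. Any cheap query touching the model will do.
    protected void Application_Start()
    {
        using (var ctx = new MyEntities())
        {
            ctx.Users.Count();   // forces model load/validation and view generation
        }
    }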
The Lame Duck's answer is helpful (up-voted), but it does not tell the full story. The first time you execute an Entity Framework query, several things happen. One is view generation, where SQL is compiled for common queries, like loading entity sets and loading individual entities. View generation can also be done at compile time, which saves the first, unlucky person to run a query the performance overhead of this step. This step is repeated whenever a new ObjectContext is initialized, so the small overhead of doing view generation at compile time pays off in a big way at runtime.
The second is compilation of an IQueryable into a canonical command tree, which can be optimized with CompiledQuery. You may be facing one or both of these issues, so before you write this off as a .NET 3.5 SP1 problem, it is worth checking them out.
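A minimal sketch of the CompiledQuery side (ObjectContext API, .NET 3.5 SP1 / 4.0; the context and entity names are hypothetical):

    using System;
    using System.Data.Objects;
    using System.Linq;

    public static class Queries
    {
        // Compiled once; subsequent invocations skip the LINQ-to-command-tree step.
        public static readonly Func<MyEntities, string, int> CountByLogin =
            CompiledQuery.Compile((MyEntities ctx, string login) =>
                ctx.Users.Count(u => u.Login == login));
    }

    // Usage: the first call pays the compilation cost, later calls reuse it.
    // int n = Queries.CountByLogin(context, "jsmith");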

How is the performance of Entity Framework 4 vs Entity Framework 3.5?

I have one query on my page that takes at least half a second to execute using EF 3.5. When I used a stored procedure, the speed was noticeably faster. It is a very complex query. Will there be any performance improvements in the upcoming EF 4.0? And does EF 4.0 really beat 3.5 performance-wise?
The short answer is that it's too early to tell. The .NET team is focusing almost entirely on performance between now and the April 12th release, when everything has to be finalized and localized. Also, what is meant by faster? Faster can be viewed in many ways; for example:
Entity Framework 4.0 has new features; the object-tracking improvements alone may mean huge wins, since you're no longer doing that manual work yourself. In any case, at least the development's faster.
Lighter-weight objects with POCO support (which simply wasn't available before) may mean a lot less memory being shifted when dealing with lots of objects. However small the cost of extra properties being populated when fetching from the DB, there is a cost both in instantiating and in tracking them (load time and memory consumption).
In your specific case, half a second is a long time for anything but a very complex or high-volume query. Have you looked at how much time is spent in the database versus how much is spent once .NET has the data? If you're spending most of your time outside of SQL, then yes, the base improvements in reflection in .NET 4.0 should provide some speed improvement; however, if you're spending almost all your time in SQL, it won't help much at all. The bulk of your performance problem may be indexing and the generated SQL rather than Entity Framework hydration performance.
I would follow Kane's comment: look at the SQL it's generating for your query. Could you post that, along with the stored procedure that is quick, so we can try to find where the problem lies?
From the ADO.NET blog:
Customizing Queries – Adding support for existing LINQ operators, recognizing a larger set of patterns with LINQ, writing model-defined functions along with the ability to use these in LINQ, and a number of other ways to create and customize queries.
SQL Generation Readability Improvements – Improving the readability, along with TSQL performance optimizations, of the generated queries to make it much easier to understand what is happening.
So these two points imply you could see improvements in the way your LINQ queries are translated to SQL.
However, it's unlikely that an ORM will ever out-perform a query you've written from scratch, as it has to cater for so many different scenarios and usually defaults to the most common one. EF 3.5 seemed to produce some very efficient join SQL when I used it, probably the best I've seen from an ORM, so there is hope you can ditch the SP in 4.0.
If you've got a stored procedure, I'm guessing it's a big query; sending that SQL text to the server each time will cause a lot of network traffic, which is one other thing you may or may not have considered. Obviously, on the same server or inside the same internal network, this is a 'cutting your hair to lose weight' style of optimisation.
When it comes to really complex queries, I've seen no evidence that any of L2S, NH, or EF can generate a better query plan than I can in a sproc. I love ORMs (especially NH), but there are still times when ORM execution time gets curb-stomped by a well-written sproc.
