I noticed that if I specify a constant in an EF query, the value gets inlined into the generated SQL, but if I specify that same value as a variable, EF parameterizes the query and passes the value in as a SQL parameter. Are there any performance differences between the two approaches?
I have some massive LINQ queries and I'm wondering whether using constants might help performance, both in query execution (and plan caching) and in the translation from LINQ to SQL.
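To make the two cases concrete, here is a minimal sketch of the difference; the context and entity names are hypothetical, and the SQL shown in the comments is only illustrative of what EF typically emits:

```csharp
using System.Linq;

// Hypothetical context with a Products set, for illustration only.

// Constant in the expression tree: EF inlines the literal,
// e.g. ... WHERE [CategoryId] = 5
var byConstant = context.Products.Where(p => p.CategoryId == 5);

// Captured local variable: EF emits a parameter instead,
// e.g. ... WHERE [CategoryId] = @p__linq__0
int categoryId = 5;
var byVariable = context.Products.Where(p => p.CategoryId == categoryId);
```

With the parameterized form, different values of `categoryId` reuse one cached plan; with the inlined constant, each distinct literal is a textually different statement unless the server parameterizes it for you.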
I suggest you check out this article from TechNet:
https://technet.microsoft.com/en-us/library/ms175580(v=sql.105).aspx
It states that if you use literals, the query optimizer should recognize that the statements are equivalent and reuse the plan, but sometimes it does not.
The only difference between the execution plans for these queries is
the value stored for the comparison against the ProductSubcategoryID
column. While the goal is for SQL Server to always recognize that the
statements generate essentially the same plan and reuse the plans, SQL
Server sometimes does not detect this in complex SQL statements.
Note the words "complex" and "sometimes" — quite a concrete explanation, isn't it? :)
The article also goes on to explain that using parameters "helps" the engine reuse plans (again, concrete), and covers simple parameterization and forced parameterization.
So the documentation says it is not certain, but usually this should not make a difference. As for my own experience: I've found that the engine recognizes the constants in EF-generated queries quite well. I had the same question a while back and did some checking on Azure SQL. I wouldn't say I examined the most complex generated SQL queries in the world, but they were not just simple select-where-let-join combos.
But again, that was for my queries on a specific version of the engine. To be certain, I would suggest you check the queries in question yourself.
So there are a few posts around complaining about the performance of the MySQL plugin for Entity Framework 6. Most of these, however, seem to come down to it generating bad SQL. I'm hitting a performance problem too, but it seems to come from the plugin itself rather than the generated SQL.
Here's my query in LINQ:
List<Address> matches = _rep.GetAddresses(s => s.AddressKey == cleanAddress).ToList();
And in the repository (_rep) I have this:
public IQueryable<Address> GetAddresses(Expression<Func<Address, bool>> query)
{
    //var foo = Addresses.AsNoTracking().Where(query);
    //var bar = foo.ToString();
    return Addresses.AsNoTracking().Where(query);
}
So I'm already using AsNoTracking to try and improve performance. The commented out lines are there so I can see the SQL that's being generated, which turns out to be:
SELECT
`Extent1`.`AddressId`,
`Extent1`.`AddressKey`,
`Extent1`.`NameKey`,
`Extent1`.`Title`,
`Extent1`.`Forename`,
`Extent1`.`Surname`
FROM `Addresses` AS `Extent1`
WHERE (`Extent1`.`AddressKey` = #p__linq__0)
OR ((`Extent1`.`AddressKey` IS NULL) AND (#p__linq__0 IS NULL))
Simple enough. Worth noting that AddressKey is a varchar(255) column with an index.
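Worth noting about the generated SQL: the `OR ((... IS NULL) AND (... IS NULL))` tail is EF6's compensation for C# null-comparison semantics, and a clause like that can stop the engine from using the index on `AddressKey` cleanly. If C# null semantics aren't needed, EF6 can be told to emit plain database semantics; a sketch (the context name is hypothetical):

```csharp
using System.Data.Entity;

public class AddressContext : DbContext
{
    public AddressContext()
    {
        // EF6+: emit plain `WHERE AddressKey = @p` instead of the
        // null-compensating OR clause. Changes behavior when the
        // parameter is null, so only use it if that is acceptable.
        Configuration.UseDatabaseNullSemantics = true;
    }

    public DbSet<Address> Addresses { get; set; }
}
```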
Now, here's the thing. If I stick that query into MySQL workbench and run it (with a varied value of #p__linq__0) it doesn't even register the run time. It lists the duration as 0.000 seconds.
However putting a Stopwatch around my query and logging the time taken to execute the Linq works out around 0.004 seconds. Not a big difference you might think, but this is part of a speed-critical application that runs this code millions of times over. It soon adds up.
I have the same issue with a later block of upsert code. Run natively in Workbench it takes under a millisecond. Again, via EF, it takes 3-4 milliseconds.
Am I right in thinking this is down to weak design in the EF MySQL plugin? If so, can I presume that I'll run into the same problem if I try and change this to run via a stored procedure or submitting SQL directly with ExecuteSqlCommand()?
Is there anything else I can try to clear up this performance lag?
There is a big difference between running a generated query and a LINQ expression.
Let's look at what Entity Framework does:
1. Convert the LINQ expression to a DbExpression
This part can sometimes take more time than running the query itself. In some cases, with multiple Includes, I have seen it take as long as a few hundred milliseconds.
2. Generate the query, or take it from the cache
It can take some time to generate the SQL query the first time, but subsequent calls will get the generated query from the cache.
The cache uses the DbExpression generated previously to create the cache key.
3. Execute the query
Server latency + time to run the query.
4. Object materialization
Time to create the entities. Normally very fast; since you used AsNoTracking, EF doesn't need to track them.
I may be wrong, but my guess is that most of your time is spent converting the LINQ expression to the DbExpression.
You can easily verify this by measuring the time taken to generate the query without executing it, via the ToTraceString method. Be careful: ToTraceString doesn't count the time taken by the MySQL interceptor, but it gives you at least a rough idea.
Here is an example of a ToTraceString extension method (I have not tested it): Obtain ToTraceString
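A sketch of such an extension method for EF6 (untested, as the answer above also warns; in EF6 a `DbQuery<T>` returns its generated SQL from `ToString()`, and an underlying `ObjectQuery<T>` exposes `ToTraceString()` directly):

```csharp
using System.Data.Entity.Core.Objects;
using System.Linq;

public static class QueryTraceExtensions
{
    // Returns the store command text EF would generate for the query,
    // without executing it against the database.
    public static string ToTraceString<T>(this IQueryable<T> query)
    {
        // If the query is backed directly by an ObjectQuery, ask it.
        var objectQuery = query as ObjectQuery<T>;
        if (objectQuery != null)
            return objectQuery.ToTraceString();

        // EF6 DbQuery<T>.ToString() also returns the generated SQL.
        return query.ToString();
    }
}
```

Wrapping the call in a `Stopwatch` then isolates the translation/generation time from the execution time.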
I've got Entity Framework 4.1 with .NET 4.5 running on ASP.NET in Windows 2008R2. I'm using EF code-first to connect to SQL Server 2008R2, and executing a fairly complex LINQ query, but resulting in just a Count().
I've reproduced the problem on two different web servers but only one database (production of course). It recently started happening with no application, database structure, or server changes on the web or database side.
My problem is that executing the query under certain circumstances takes a ridiculous amount of time (close to 4 minutes). I can take the actual query, pulled from SQL Profiler, and execute in SSMS in about 1 second. This is consistent and reproducible for me, but if I change the value of one of the parameters (a "Date after 2015-01-22" parameter) to something earlier, like 2015-01-01, or later like 2015-02-01, it works fine in EF. But I put it back to 2015-01-22 and it's slow again. I can repeat this over and over again.
I can then run a similar but unrelated query in EF, then come back to the original, and it runs fine this time - same exact query as before. But if I open a new browser, the cycle starts over again. That part also makes no sense - we're not doing anything to retain the data context in a user session, so I have no clue whatsoever why that comes into play.
But this all tells me that the data itself is fine.
In Profiler, when the query runs properly, it takes about a second or two, and shows about 2,000,000 in reads and about 2,000 in CPU. When it runs slowly, it takes 3.5 minutes, and the values are 300,000,000 and 200,000 - so reads are about 150 times higher and CPU is 100 times higher. Again, for the identical SQL statement.
Any suggestions on what EF might be doing differently that wouldn't show up in the query text? Is there some kind of hidden connection property which might cause a different execution plan in certain circumstances?
EDIT
The query that EF builds is one of the ones where it builds a giant string with the parameter included in the text, not as a SQL parameter:
exec sp_executesql
N'SELECT [GroupBy1].[A1] AS [C1]
FROM (
SELECT COUNT(1) AS [A1]
...
AND ([Extent1].[Added_Time] >= convert(datetime2, ''2015-01-22 00:00:00.0000000'', 121))
...
) AS [GroupBy1]'
EDIT
I'm not adding this as an answer since it doesn't actually address the underlying issue, but this did end up getting resolved by rebuilding indexes and recomputing statistics. That hadn't been done in longer than usual, and it seems to have cleared up whatever caused the issue.
I'll keep reading up on some of the links here in case this happens again, but since it's all working now and unreproducible, I don't know if I'll ever know for sure exactly what it was doing.
Thanks for all the ideas.
I recently had a very similar scenario, a query would run very fast executing it directly in the database, but had terrible performance using EF (version 5, in my case). It was not a network issue, the difference was from 4ms to 10 minutes.
The problem ended up being a mapping problem. I had a column mapped to NVARCHAR, while it was VARCHAR in the database. Seems harmless, but it resulted in an implicit conversion in the database, which totally ruined the performance.
I'm not entirely sure on why this happens, but from the tests I made, this resulted in the database doing an Index Scan instead of an Index Seek, and apparently they are very different performance-wise.
I blogged about this here (disclaimer: it is in Portuguese), but later I found that Jimmy Bogard described this exact problem in a post from 2012, I suggest you check it out.
Since you do have a convert in your query, I would say start from there. Double check all your column mappings and check for differences between your table's column and your entity's property. Avoid having implicit conversions in your query.
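For example, in EF code-first the mapping can be made explicit so string parameters are sent as VARCHAR rather than the default NVARCHAR (a sketch; the entity and property names are hypothetical):

```csharp
using System.Data.Entity;

public class MyContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        // Without IsUnicode(false), EF maps string properties to NVARCHAR.
        // Comparing an NVARCHAR parameter against a VARCHAR column forces
        // an implicit conversion on the column side, which can turn an
        // Index Seek into an Index Scan.
        modelBuilder.Entity<Customer>()
                    .Property(c => c.Name)
                    .IsUnicode(false)
                    .HasMaxLength(255);
    }
}
```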
If you can, check your execution plan for inconsistencies, and watch for the yellow warning triangle that can indicate problems like this implicit conversion:
I hope this helps you somehow, it was a really difficult problem for us to find out, but made sense in the end.
Just to put this out there since it has not been addressed as a possibility:
Given that you are using Entity Framework (EF), if you are using Lazy Loading of entities, then EF requires Multiple Active Result Sets (MARS) to be enabled via the connection string. While it might seem entirely unrelated, MARS does sometimes produce this exact behavior of something running quickly in SSMS but horribly slow (seconds become several minutes) via EF.
One way to test this is to turn off Lazy Loading and either remove MultipleActiveResultSets=True; from the connection string (the default is "false") or at least change it to MultipleActiveResultSets=False;.
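For example (a sketch; the connection string values and context name are placeholders):

```csharp
using System.Data.Entity;

// App.config / Web.config connection string without MARS
// (omitting the keyword is equivalent to MultipleActiveResultSets=False):
// "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True;MultipleActiveResultSets=False"

public class MyContext : DbContext
{
    public MyContext() : base("name=MyConnection")
    {
        // With lazy loading off, EF no longer needs MARS; related data
        // must then be loaded explicitly via Include() or Load().
        Configuration.LazyLoadingEnabled = false;
        Configuration.ProxyCreationEnabled = false;
    }
}
```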
As far as I know, there is unfortunately no work-around or fix (currently) for this behavior.
Here is an instance of this issue: Same query with the same query plan takes ~10x longer when executed from ADO.NET vs. SMSS
There is an excellent article about Entity Framework performance consideration here.
I would like to draw your attention to the section on Cold vs. Warm Query Execution:
The very first time any query is made against a given model, the
Entity Framework does a lot of work behind the scenes to load and
validate the model. We frequently refer to this first query as a
"cold" query. Further queries against an already loaded model are
known as "warm" queries, and are much faster.
During LINQ query execution, the "Metadata loading" step has a high impact on performance for cold query execution. However, once loaded, the metadata is cached and future queries run much faster. The metadata is cached outside of the DbContext and remains reusable as long as the application pool lives.
In order to improve performance, consider the following actions:
use pre-generated views
use query plan caching
use no tracking queries (only if accessing for read-only)
create a native image of Entity Framework (only relevant if using EF 6 or later)
All those points are well documented in the link provided above. In addition, you can find additional information about creating a native image of Entity Framework here.
I don't have a specific answer as to WHY this is happening, but it certainly looks to be related to how the query is handled rather than the query itself. If you don't have any issues running the same generated query from SSMS, then the query isn't the problem.
A workaround you can try: A stored procedure. EF can handle them very well, and it is the ideal way to deal with potentially complicated or expensive queries.
Realizing you are using Entity Framework 4.1, I would suggest you upgrade to Entity Framework 6.
There has been a lot of performance improvement and EF 6 is much faster than EF 4.1.
The MSDN article about Entity Framework performance considerations mentioned in my other response also has a comparison between EF 4.1 and EF 6.
There might be a bit of refactoring needed as a result, but the improvement in performance should be worth it (and that would reduce the technical debt at the same time).
What are the common things we can keep in mind while writing LINQ to SQL queries to optimize or speed them up?
For example, ordinarily, LINQ to SQL must translate LINQ queries to SQL every time a query executes; this involves recursing the expression tree that makes up the query in several stages. What we do is like precompiling the query using the CompiledQuery class.
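A minimal sketch of precompiling a LINQ to SQL query with `CompiledQuery.Compile` (the DataContext and entity names are hypothetical):

```csharp
using System;
using System.Data.Linq;
using System.Linq;

public static class Queries
{
    // Compiled once: the expression tree is translated to SQL a single
    // time, instead of on every execution of the query.
    public static readonly Func<MyDataContext, int, IQueryable<Order>> OrdersByCustomer =
        CompiledQuery.Compile((MyDataContext db, int customerId) =>
            db.Orders.Where(o => o.CustomerId == customerId));
}

// Usage: each call reuses the already-translated command.
// var orders = Queries.OrdersByCustomer(db, 42).ToList();
```

Storing the compiled delegate in a static field matters: recompiling it on every call would throw away the benefit.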
There is one helpful thing about LINQ that every developer should know.
It is about performance of Join vs Where.
The full discussion can be seen here: why is join so much faster than where
Usually the native LINQ to SQL compiler lets you forget about query optimization entirely; however, there are some caveats, mostly around compiled-query misuse.
Here are some resources you should check about it. :)
http://visualstudiomagazine.com/articles/2010/06/24/five-tips-linq-to-sql.aspx
http://weblogs.asp.net/dixin/archive/2011/01/31/understanding-linq-to-sql-11-performance.aspx
I've inherited a C# / ASP.NET MVC / Entity Framework project with some slowness. There's not a lot of data in the DB but calls to .Include() were causing slowdowns.
However, I found something very strange. I have a 2k row table with just numbers (5 columns). I have indexes on the columns I'm searching.
When doing:
_entities.MyTable.Where(x=> x.Id1 == 4 && x.Id2 == 5).First()
it takes 1800ms on my development machine.
However, when I do :
_entities.MyTable.Where("it.Id1 = 4 and it.Id2 = 5").First()
it takes like 10ms.
What's the deal? I don't understand why the LINQ expression would be so slow.
Open SQL Profiler and look through the queries coming from EF. Try to analyze them and build their plans. It may be that EF implements the queries in a strange way that doesn't use the indexes.
Could it be that EF has to generate the SQL for the where clause in the first example, while in the second the SQL is much easier to generate because it can just plug in the SQL you already provided?
I've found that EF can be very slow at generating queries, though it seems unlikely in this case as it's a rather simple query either way.
Have you tried compiling the first query and running it multiple times to check that the time to execute only includes actually running the SQL and not just generating it?
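For EF 4 with an ObjectContext, a compiled query pays the LINQ-to-SQL translation cost once; a sketch (the context and entity names are hypothetical):

```csharp
using System;
using System.Data.Objects;
using System.Linq;

public static class CompiledQueries
{
    // CompiledQuery caches the translated command, so repeated calls
    // skip the expression-tree translation step and measure only the
    // actual SQL execution time.
    public static readonly Func<MyEntities, int, int, IQueryable<MyTableRow>> ByIds =
        CompiledQuery.Compile((MyEntities ctx, int id1, int id2) =>
            ctx.MyTable.Where(x => x.Id1 == id1 && x.Id2 == id2));
}

// Usage: time the second and later calls, not the first.
// var row = CompiledQueries.ByIds(_entities, 4, 5).First();
```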