I have a LINQ query that takes 11 minutes to execute against SQL Server 2008. I used SQL Server Profiler to find the query that was taking so long, and I ran it on its own against my database.
I also removed all the parameters, added the values directly, and ran the query again. It took less than 1 second to execute!
I have googled and found that using parameters can really hurt performance, because the plan is compiled before the values in the WHERE clause are known.
Since LINQ to SQL always runs parameterized SQL, what can I do to improve performance in this case?
I haven't found anything I can improve regarding indexes on the columns involved. The first table in the inner join has 192,014 rows, and the SQL without parameters takes less than a second to execute. Screenshots of the execution plans are attached.
This is the LINQ query:
var criteria = CreateBaseCriteria();
var wordsGroup = from word in QueryExecutor.GetSearchWords()
                 join searchEntry in QueryExecutor.GetReportData(criteria)
                     on (word.SearchID + 100000000) equals searchEntry.EventId
                 group searchEntry by word.SearchWord into wg
                 select new SearchAggregate
                 {
                     Value = wg.Key,
                     FirstTime = wg.Min(l => l.EventTime),
                     LastTime = wg.Max(l => l.EventTime),
                     AverageHits = wg.Average(l => l.NumberOfHits.HasValue ? l.NumberOfHits.Value : 0),
                     Count = wg.Count()
                 };
return wordsGroup.OrderByDescending(w => w.Count).Take(maxRows);
Edit: The screenshots came out a little small here. There are only 5 parameters in the parameterized SQL.
Edit 2: It is the inner join with parameter @p0 that causes the execution plan to change. When I just replaced the @p0 variable with its value, the query ran in less than a second. If this value is constant in all cases (I have to investigate that), can I do anything so that it doesn't get treated as a parameter?
The hint in green above your query plan is advising you to create an index. Try this first.
I found a way to work around the statement that was causing the execution time to blow up:
on (word.SearchID + 100000000) equals searchEntry.EventId
What I did was add a computed column: [SearchIdUnique] AS ([SearchID]+(100000000)). Then I could change my LINQ query to this:
on word.SearchIdUnique equals searchEntry.EventId
Query execution is down to less than a second. Issue solved.
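For reference, the schema change itself can be scripted. A minimal sketch, assuming the underlying table is called SearchWords and a LINQ to SQL DataContext named ctx (both names are assumptions; in practice this DDL belongs in a migration script):

// One-time schema change; table and index names are illustrative.
ctx.ExecuteCommand(
    @"ALTER TABLE SearchWords
      ADD SearchIdUnique AS ([SearchID] + 100000000) PERSISTED");
// Indexing the computed column lets the join seek on it instead of scanning.
ctx.ExecuteCommand(
    "CREATE INDEX IX_SearchWords_SearchIdUnique ON SearchWords (SearchIdUnique)");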
Related
I need to get the number of records matching a certain filter.
In theory, this instruction:
_dbContext.People.Count(w => w.Type == 1);
should generate SQL like:
SELECT COUNT(*)
FROM People
WHERE Type = 1
However, the generated SQL is:
SELECT Id, Name, Type, DateCreated, DateLastUpdate, Address
FROM People
WHERE Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count();
Entity Framework generates the following query:
SELECT COUNT(*)
FROM People
... which runs very fast.
How can I generate this second kind of query while passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way you can make it do so by rewriting the query (and you shouldn't have to do that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries, which allows them to ship EF Core versions with incomplete or very inefficient query processing, as in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen or to what degree (remember, the mixed mode allows them to consider such a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and this concrete issue will be gone.
But note that along with improvements, new releases introduce bugs/regressions (see, for instance, EFCore returning too many columns for a simple LEFT OUTER join), so do that at your own risk (and consider the first option again, i.e. Which One Is Right for You :)
Try this lambda expression to execute the query faster:
_dbContext.People.Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always fall back to raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql(
    "SELECT Id, Name, Type, DateCreated, DateLastUpdate, Address " +
    "FROM People " +
    "WHERE Type = {0}",
    1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
connection.Open();
using (var command = connection.CreateCommand())
{
command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
var result = command.ExecuteScalar().ToString();
}
}
It seems that there was a problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified the exact version, so I am not able to dig into the EF source code to tell what exactly went wrong.
To test this scenario, I installed the latest EF Core package and got the correct result.
Here is my test program:
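The program was posted as a screenshot originally; a minimal reconstruction, with the context and entity names assumed from the question, looks like this:

// Hypothetical reconstruction of the test program; only the Count call matters.
using (var ctx = new PeopleContext())
{
    var count = ctx.People.Count(w => w.Type == 1);
    Console.WriteLine(count);
}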
And here is the SQL that gets generated, as captured by SQL Server Profiler: a plain SELECT COUNT(*) from People with the Type filter in the WHERE clause.
As you can see, it matches all the expectations.
Here is the relevant excerpt from the packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation, the only solution is to update to the latest package, which is 1.1.0 at the time of writing.
Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EF Core cannot translate the entire Where clause to SQL. The culprit can be something as simple as DateTime.Now that you might not even think about.
The following statement results in a SQL query that will, surprisingly, run a SELECT * and then do the .Count() in C# once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
Sorry for the bump, but...
Probably the reason the query with the WHERE clause is slow is that you didn't give your database a fast way to execute it.
In the case of the select count(*) from People query, we don't need the actual data for any field, so the database can use a small index that doesn't carry all those fields and doesn't cost us slow I/O. The database software is clever enough to see that the primary key index requires the least I/O to do the count on. The PK ids take less space than the full rows, so more of them fit per I/O block and the count completes faster.
Now, in the case of the query filtered on Type, the database needs to read Type to determine its value. You should create an index on Type if you want the query to be fast; otherwise it has to do a very slow full table scan, reading all rows. It also helps when your values are more discriminating: a Gender column (usually) has only two values and isn't very discriminating, while a primary key column where every value is unique is highly discriminating. More discriminating values result in a shorter index range scan and a faster count.
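As a concrete illustration of the advice above, here is a minimal sketch; the table and index names are assumptions based on the question, and in practice the DDL belongs in a migration or SQL script rather than in application code:

// Hypothetical index to support the filtered count; with it, SQL Server can
// answer SELECT COUNT(*) ... WHERE Type = 1 from a narrow index instead of
// scanning the whole table.
_dbContext.Database.ExecuteSqlCommand(
    "CREATE INDEX IX_People_Type ON People ([Type])");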
What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.
I have a simple count query using LINQ and EF:
var count = (from I in db.mytable
where xyz
select I).Count();
The code above shows the query getting locked in the database,
while the raw SQL executes right away:
var count = db.SqlQuery<int>("select count(*) from mytable where xyz").FirstOrDefault();
The code above returns immediately.
A few people have suggested removing the .ToList(), which I did, and it made no difference. One thing is that this only happens on the PROD server; the QA server executes it as fast as expected. But the prod server shows that the query gets suspended. I suspect this could be a data storage limitation or something server-related, but I wanted to make sure I am not doing something stupid in the code.
UPDATE:
One thing I noticed is that it takes longer the first time it executes. When I use Set Next Statement to run it again, it executes immediately. Is the query compiled the first time?
Because you were calling ToList in the first query, which fetches all the records from the DB and does the counting in memory. Instead of ToList you can just call Count() to get the intended behaviour:
var count = (from I in db.mytable
where xyz
select I).Count();
You must not call the .ToList() method, because it retrieves all the objects from the database.
Just call .Count():
var count = (from I in db.mytable
where xyz
select I).Count();
Count can take a predicate. I'm not sure whether it will speed up your code any, but you can write the count like this:
var count = db.mytable.Count(x => predicate);
Where predicate is whatever you are testing for in your where clause.
Simple fiddling in LINQPad shows that this will generate similar, if not exactly the same, SQL as above. This is about the simplest way, in terseness of code, that I know how to do it.
If you need much higher speeds than what EF provides, yet stay in the confines of EF without using inline SQL you could make a stored procedure and call it from EF.
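For instance, a hedged sketch of that stored-procedure route, assuming db is an EF6 DbContext (the procedure name is hypothetical, and the original "where xyz" placeholder is kept as-is):

// Assumes a procedure such as:
//   CREATE PROCEDURE dbo.CountMyTable AS SELECT COUNT(*) FROM mytable WHERE xyz;
var count = db.Database.SqlQuery<int>("EXEC dbo.CountMyTable").Single();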
This seems like an inconsistency, but I'm probably just missing something obvious. The base query is:
var events = db.tbl_special_events.Where(x => x.TimeStamp >= startDate);
The apparent inconsistency comes when I run the following code block:
int c1 = 0;
foreach (var e in events)
{
if (e.TimeStamp.DayOfWeek.ToString() == "Tuesday") c1++;
}
int c2 = events.Where(e => e.TimeStamp.DayOfWeek.ToString() == "Tuesday").Count();
After that runs, c1 is 1832, but c2 is 0. What am I missing?
I recreated this test and found that it may be directly related to the DateTime function.
The query that is generated:
exec sp_executesql N'SELECT [t0].[User_ID], [t0].[Email], [t0].[Password], [t0].[BrandID], [t0].[CustomerID], [t0].[Created], [t0].[EndDate], [t0].[AccountStatusID], [t0].[Verified], [t0].[ResetPasswordFailCount], [t0].[ResetPasswordLocked]
FROM [dbo].[User] AS [t0]
WHERE ((CONVERT(NVarChar,CONVERT(Int,(DATEPART(dw, [t0].[Created]) + (@@DATEFIRST) + 6) % 7))) = @p0) AND ([t0].[BrandID] = @p1)',N'@p0 nvarchar(4000),@p1 int',@p0=N'Tuesday',@p1=3
Notice where @p0=N'Tuesday'.
Keep in mind how IQueryable and IEnumerable differ: where IEnumerable works against actual .NET objects in memory, IQueryable converts your expression into an actual SQL statement used to query the database. So any values that you provide in that expression are sent to the database as-is.
It is returning 0 results because there is no match: the date conversion in SQL returns a 2 instead of 'Tuesday'. You can test this by replacing Tuesday with 2 in your LINQ where clause; it will actually work. The comparison works after enumerating because the results have by then been mapped into usable .NET objects, where the DateTime.DayOfWeek conversion to "Tuesday" behaves properly.
Your count is acting on an IQueryable<Event>, while e is an enumerated instance. Therefore, some operations might not work, as they cannot be translated into SQL [Edit: or will be translated into nonsensical SQL].
To ensure your Where clause works, add an AsEnumerable() before it. This converts the IQueryable<Event> into an IEnumerable<Event> and tells the LINQ provider that it should stop generating SQL at this point.
So, this statement should provide the correct result:
int c2 = events.AsEnumerable()
.Where(e => e.TimeStamp.DayOfWeek.ToString() == "Tuesday")
.Count();
The actual code that cannot be converted to SQL, and therefore causes the problem, is e.TimeStamp.DayOfWeek.ToString().
Alternatively you can use the System.Data.Objects.SqlClient.SqlFunctions (doc here) class to hint to the LINQ provider what it should be doing in SQL.
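For example, a sketch of that hint (this applies to Entity Framework; in EF6 the class lives in System.Data.Entity.SqlServer):

// SqlFunctions.DateName translates to DATENAME(weekday, ...), which is
// evaluated on the server and returns the full day name, e.g. 'Tuesday'.
int c2 = events
    .Where(e => SqlFunctions.DateName("weekday", e.TimeStamp) == "Tuesday")
    .Count();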
Edit
As @Servy pointed out, this is a LINQ to SQL question. However, the problem is quite common, so I'll leave the answer rather than delete it.
Looking at the OP again, there could be another variable in the whole game: lazy loading.
In the foreach loop, the TimeStamp is lazy loaded. In the count query, the provider tries to construct a SQL query; during this construction it might not be able to work with ToString (which is known to be problematic) and may evaluate e.TimeStamp.DayOfWeek.ToString() to something different from "Tuesday".
The AsEnumerable() forces the provider to stop generating SQL, so that e.TimeStamp is lazy loaded again.
The only way of knowing exactly what is going on is to use a tracing tool on the DB (e.g. SQL Server Profiler) to actually see the query (or queries) executed on the server.
Edit 2
Building on @Sinaesthetic's answer: the reason 0 is returned is that the query ends up comparing "4" to "Tuesday", which is never true, so a count of 0 is in fact the correct result.
This can be tested by executing
select ((CONVERT(NVarChar,CONVERT(Int,(DATEPART(dw, getdate()) + (@@DATEFIRST) + 6) % 7))))
against the DB.
However, this also shows that it is up to the provider to decide what SQL it generates, which statements it supports, and which it doesn't.
It also shows that using AsEnumerable to stop the SQL generation process can have an influence on the semantic evaluation of a query.
I got a very simple LINQ query:
List<table> list = ( from t in ctx.table
where
t.test == someString
&& t.date >= dateStartInt
&& t.date <= dateEndInt
select t ).ToList<table>();
The table which gets queried has got about 30 million rows, but the columns test and date are indexed.
When it should return around 5000 rows it takes several minutes to complete.
I also checked the SQL command which LINQ generates.
If I run that command on the SQL Server it takes 2 seconds to complete.
What's the problem with LINQ here?
It's just a very simple query without any joins.
That's the query SQL Profiler shows:
exec sp_executesql N'SELECT [t0].[test]
FROM [dbo].[table] AS [t0]
WHERE ([t0].[test] IN (@p0)) AND ([t0].[date] >= @p1)
AND ([t0].[date] <= @p2)',
N'@p0 nvarchar(12),@p1 int,@p2 int',@p0=N'123test',@p1=110801,@p2=110804
EDIT:
It's really weird. While testing I noticed that it's much faster now. The LINQ query now takes 3 seconds for around 20000 rows, which is quite ok.
What's even more confusing:
It's the same behaviour on our production server. An hour ago it was really slow, now it's fast again. As I was testing on the development server, I didn't change anything on the production server. The only thing I can think of being a problem is that both servers are virtualized and share the SAN with lots of other servers.
How can I find out if that's the problem?
Before blaming LINQ, first find out where the actual delay is taking place:
Sending the query
Execution of the query
Receiving the results
Transforming the results into local types
Binding/showing the result in the UI
And any other events tied to this process
Then start blaming LINQ ;)
If I had to guess, I would say "parameter sniffing" is likely, i.e. it has built and cached a query plan based on one set of parameters that is very suboptimal for your current parameter values. You can tackle this with OPTION (OPTIMIZE FOR UNKNOWN) in regular TSQL, but there is no LINQ-to-SQL / EF way of exposing this.
My plan would be:
use profiling to prove that the time is being lost in the query (as opposed to materialization etc)
once confirmed, consider using direct TSQL methods to invoke
For example, with LINQ-to-SQL, ctx.ExecuteQuery<YourType>(tsql, arg0, ...) can be used to throw raw TSQL at the server (with parameters as {0} etc, like string.Format). Personally, I'd lean towards "dapper" instead - very similar usage, but a faster materializer (but it doesn't support EntityRef<> etc for lazy-loading values - which is usually a bad thing anyway as it leads to N+1).
i.e. (with dapper)
List<table> list = ctx.Query<table>(@"
    select * from table
    where test = @someString
    and date >= @dateStartInt
    and date <= @dateEndInt
    OPTION (OPTIMIZE FOR UNKNOWN)",
    new { someString, dateStartInt, dateEndInt }).ToList();
or (LINQ-to-SQL):
List<table> list = ctx.ExecuteQuery<table>(@"
    select * from table
    where test = {0}
    and date >= {1}
    and date <= {2}
    OPTION (OPTIMIZE FOR UNKNOWN)",
    someString, dateStartInt, dateEndInt).ToList();
I want to find the fastest way to get the min and max of a column in a table with a single LINQ to SQL round-trip. So I know this would work in two round-trips:
int min = MyTable.Min(row => row.FavoriteNumber);
int max = MyTable.Max(row => row.FavoriteNumber);
I know I can use group, but I don't have a group by clause; I want to aggregate over the whole table! And I can't use .Min without grouping first. I did try this:
from row in MyTable
group row by true into r
select new {
min = r.Min(z => z.FavoriteNumber),
max = r.Max(z => z.FavoriteNumber)
}
But that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be.
So, is there any way to just get the correct SQL out?
EDIT: These guys failed too: Linq to SQL: how to aggregate without a group by? ... lame oversight by LINQ designers if there's really no answer.
EDIT 2: I looked at my own solution (with the nonsensical constant group by clause) in SQL Server Management Studio's execution plan analysis, and the plan looks identical to the one generated by:
SELECT MIN(FavoriteNumber), MAX(FavoriteNumber)
FROM MyTable
so unless someone can come up with a simpler or equally good answer, I think I'll have to mark this as answered by myself. Thoughts?
As stated in the question, this method seems to actually generate optimal SQL code, so while it looks a bit squirrely in LINQ, it should be optimal performance-wise.
from row in MyTable
group row by true into r
select new {
min = r.Min(z => z.FavoriteNumber),
max = r.Max(z => z.FavoriteNumber)
}
I could only find this one, which produces somewhat clean SQL, though still not as efficient as select min(val), max(val) from table:
var r =
(from min in items.OrderBy(i => i.Value)
from max in items.OrderByDescending(i => i.Value)
select new {min, max}).First();
The SQL is:
SELECT TOP (1)
[t0].[Value],
[t1].[Value] AS [Value2]
FROM
[TestTable] AS [t0],
[TestTable] AS [t1]
ORDER BY
[t0].[Value],
[t1].[Value] DESC
Still, there is another option: use a single connection for both the min and max queries (see Multiple Active Result Sets (MARS)), or a stored procedure...
I'm not sure how to translate it into C# yet (I'm working on it).
This is the Haskell version:
minAndMax :: Ord a => [a] -> (a,a)
minAndMax [x] = (x,x)
minAndMax (x:xs) = (min a x, max b x)
    where (a,b) = minAndMax xs
The C# version should involve Aggregate somehow (I think).
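For what it's worth, a direct client-side translation in C# might use Aggregate like this. Note that this is just a sketch: LINQ to SQL will not translate Aggregate to SQL, so the values are fetched and folded on the client:

// Single pass over the projected column on the client side; all values still
// travel over the wire. Assumes the table has at least one row.
var minMax = MyTable
    .Select(row => row.FavoriteNumber)
    .AsEnumerable()
    .Aggregate(
        new { Min = int.MaxValue, Max = int.MinValue },
        (acc, n) => new { Min = Math.Min(acc.Min, n), Max = Math.Max(acc.Max, n) });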
You could select the whole table, and do your min and max operations in memory:
var cache = MyTable.ToList(); // select *: materialize the whole table once
var min = cache.Min(row => row.FavoriteNumber);
var max = cache.Max(row => row.FavoriteNumber);
Depending on how large your dataset is, this might be the way to go about not hitting your database more than once.
A LINQ to SQL query is a single expression. Thus, if you can't express your query in a single expression (or don't like it once you do) then you have to look at other options.
Stored procedures, since they can have statements, enable you to accomplish this in a single round-trip. You will either have two output parameters or select a result set with two rows. Either way, you will need custom code to read the stored procedure's result.
(I don't personally see the need to avoid two round-trips here. It seems like a premature optimization, especially since you will probably have to jump through hoops to get it working. Not to mention the time you will spend justifying this decision and explaining the solution to other developers.)
Put another way: you've already answered your own question. "I can't use the .Min without grouping first", followed by "that crazy group clause seems silly, and the SQL it makes is more complex than it needs to be", are clues that the simple and easily-understood two-round-trip solution is the best expression of your intent (unless you write custom SQL).
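For completeness, here is a sketch of that custom-SQL route using LINQ to SQL's ExecuteQuery; the result type name is illustrative, and ctx is assumed to be your DataContext:

// One round-trip: map the single aggregate row onto a plain result type.
public class MinMaxResult
{
    public int MinVal { get; set; }
    public int MaxVal { get; set; }
}

var range = ctx.ExecuteQuery<MinMaxResult>(
    "SELECT MIN(FavoriteNumber) AS MinVal, MAX(FavoriteNumber) AS MaxVal FROM MyTable")
    .Single();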