Understanding how LINQ compiles to ADO.NET SQL statements - c#

I have what seems like a pretty straightforward LINQ statement, but the SQL it's generating doesn't make sense to me and it's taking much longer to execute than I'm expecting. I'm just trying to understand what LINQ is doing so I can figure out why it's running so slowly. A comparable SQL statement takes less than a second but the LINQ is taking around 20 seconds.
Here's the code:
// This line takes 20 seconds to return.
var alertEvents = GetFilteredAlertEvents(alert.AlertEvents, db).ToList<AlertEvent>();
private static IEnumerable<AlertEvent> GetFilteredAlertEvents(ICollection<AlertEvent> alertEvents, SlalomEREntities db)
{
Guid marketAlertReceiverGroupId = new Guid(ConfigurationManager.AppSettings["MarketAlertReceiverGroupId"]);
var subQuery = from ae in alertEvents
join tr in db.TargetResources on ae.ResourceId equals tr.ResourceID
join atr in db.AlertTargetResources on tr.ResourceID equals atr.TargetResourceID
where atr.AlertTargetID == marketAlertReceiverGroupId
select ae.AlertEventId;
return from alertEvent in alertEvents
where !subQuery.Contains(alertEvent.AlertEventId)
select alertEvent;
}
In SSMS, the sub-select returns 3126 rows without the where, and only 127 rows with it. The primary select in SSMS returns 365 rows without the sub-select. The full select with sub-select returns 238 rows. Not a lot of data.
Using Visual Studio's Diagnostic Tools, I'm seeing 14 SQL statements generated from the LINQ. Every one of them is a simple SQL select and I don't see any joins nor do I see the where comparison. Here's a sample SQL statement that I'm seeing in the Diagnostic Tools Events window:
SELECT
[Extent1].[AlertTargetID] AS [AlertTargetID],
[Extent1].[TargetResourceID] AS [TargetResourceID],
[Extent1].[CreatedAt] AS [CreatedAt],
[Extent1].[CreatedBy] AS [CreatedBy]
FROM [dbo].[AlertTargetResource] AS [Extent1]
There are 13 more similar SQL statements.
Here's the SQL I'm trying to replicate.
select *
from AlertEvent ae1
where ae1.AlertEventId not in
(select ae.AlertEventId
from AlertEvent ae
join TargetResource tr on ae.ResourceId = tr.ResourceID
join AlertTargetResource atr on atr.TargetResourceID = tr.ResourceID
where atr.AlertTargetID = '89bd4ea5-5d56-4b8a-81ba-5a9e5991ba64')
Here are my questions:
Why is LINQ generating 14 simple selects rather than a single select with joins and the where?
How can I speed up what I thought was a simple bit of code?

In the first part of your method, subQuery is a query that has not yet been run against the database. In the second part (around your return statement), you are invoking that query a number of times.
It's not always obvious how Entity Framework will handle a case like this, but here it appears to invoke the query for each item in alertEvents.
What you really want is the list of IDs returned by the query, and then use that list for the second part of the method. To convert a query to the data returned by that query, you can use ToList(). This extension method will execute the query and return the data results.
In the code below, subQuery is now a collection of IDs.
private static IEnumerable<AlertEvent> GetFilteredAlertEvents(ICollection<AlertEvent> alertEvents, SlalomEREntities db)
{
Guid marketAlertReceiverGroupId = new Guid(ConfigurationManager.AppSettings["MarketAlertReceiverGroupId"]);
var subQuery = (from ae in alertEvents
join tr in db.TargetResources on ae.ResourceId equals tr.ResourceID
join atr in db.AlertTargetResources on tr.ResourceID equals atr.TargetResourceID
where atr.AlertTargetID == marketAlertReceiverGroupId
select ae.AlertEventId).ToList();
return from alertEvent in alertEvents
where !subQuery.Contains(alertEvent.AlertEventId)
select alertEvent;
}
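The effect of deferred execution can be reproduced without a database. The following is a minimal sketch using LINQ to Objects only, not the Entity Framework pipeline itself. The names alertEventIds, excluded and ExcludedSource are hypothetical stand-ins, and the counter records how often the sub-query's source gets enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory data standing in for the alert events and the sub-query rows.
var alertEventIds = new List<int> { 1, 2, 3, 4, 5 };
var excluded = new List<int> { 2, 4 };

int sourceReads = 0; // how many times the sub-query source has been enumerated

// Iterator standing in for a database-backed sequence: its body runs once per enumeration.
IEnumerable<int> ExcludedSource()
{
    sourceReads++;
    foreach (var id in excluded)
        yield return id;
}

// Deferred: Contains re-enumerates the sub-query for every alert event.
IEnumerable<int> deferredSubQuery = ExcludedSource();
var deferredResult = alertEventIds.Where(id => !deferredSubQuery.Contains(id)).ToList();
int deferredReads = sourceReads;

// Materialized: ToList() runs the sub-query once; Contains then searches the list in memory.
sourceReads = 0;
List<int> materializedSubQuery = ExcludedSource().ToList();
var materializedResult = alertEventIds.Where(id => !materializedSubQuery.Contains(id)).ToList();
int materializedReads = sourceReads;

Console.WriteLine($"Deferred reads: {deferredReads}, materialized reads: {materializedReads}");
Console.WriteLine(string.Join(", ", materializedResult));
```

With this sample data the deferred version enumerates the source once per alert event (five times), while the materialized version enumerates it once; both produce the same filtered result. If the ID list is large, a HashSet<int> in place of the List<int> would likely make each Contains call cheaper.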

Related

.netcore EF linq - this is a BUG? Very strange behavior

I have two tables in SQL: Document and User. Document has a relation to User, and I want to get the users I recently sent documents to.
I need to sort by the date the document was sent and get the unique (distinct) users related to those documents.
This is my LINQ query:
var recentClients = documentCaseRepository.Entities
.Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
.OrderByDescending(userWithDate => userWithDate.LastUpdateDate)
.Take(1000) // I need this because if I comment this line then EF generate completely different sql query.
.Select(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
.Distinct()
.Take(configuration.MaxRecentClientsResults)
.ToList();
and generated sql query is:
SELECT DISTINCT TOP(5) [t].*
FROM (
SELECT TOP(1000) [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
ORDER BY [docCase].[LastUpdateDate] DESC
)
AS [t]
Everything is correct so far. But if I delete this line
.Take(1000) // I need this because...
EF generates a completely different query:
SELECT DISTINCT TOP(5)
[docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
My question is: why does EF not generate the ORDER BY clause and the subquery with DISTINCT?
Is this an EF bug, or am I doing something wrong? And what must I write in LINQ to generate this SQL query?
SELECT DISTINCT TOP 5 [t].*
FROM ( SELECT [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON [docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id]
WHERE [docCase].[AssignedByAgentId] = 1
ORDER BY [docCase].[LastUpdateDate] DESC
) AS [t]
OrderBy information is not always retained across other operators such as Distinct. Entity Framework does not document (to my knowledge) exactly how OrderBy is propagated.
This kind of makes sense because some operators have undefined output order. The fact that ordering is retained in many situations is a convenience for the developer.
Move the OrderBy to the end of the query (or at least past the Distinct).
The reason for the difference in queries is that Distinct does not preserve result order. So when you first execute OrderBy and then Distinct, you might as well not execute OrderBy at all, because the order is lost anyway. EF can therefore optimize it away.
Calling Take in between causes the result set to be semantically different: You first order the items, take the first 1000 items of that order and then call Distinct on them.
What you can change in your query depends mainly on the result you want to achieve. Maybe you want to first make the result set distinct, then order by date, and finally take the required number of items. Other options are also conceivable, depending on your requirements.
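One way to read "distinct first, then order, then take" is to group the documents by client, keep each client's most recent date, and only then sort and limit. The following is a minimal sketch on in-memory data, assuming that is the desired result; the sample rows, the maxRecentClients variable and the LastSent property are hypothetical, and whether a given EF version translates this shape into a single SQL statement is not verified here.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for DocumentCase joined to its client.
var documents = new[]
{
    new { ClientId = 1, FirstName = "Ann", LastUpdateDate = new DateTime(2020, 1, 1) },
    new { ClientId = 2, FirstName = "Bob", LastUpdateDate = new DateTime(2020, 1, 5) },
    new { ClientId = 1, FirstName = "Ann", LastUpdateDate = new DateTime(2020, 1, 10) },
    new { ClientId = 3, FirstName = "Cy",  LastUpdateDate = new DateTime(2020, 1, 3) },
};

int maxRecentClients = 2; // stands in for configuration.MaxRecentClientsResults

var recentClients = documents
    // One group per client makes the result distinct while keeping the dates available.
    .GroupBy(d => new { d.ClientId, d.FirstName })
    .Select(g => new { g.Key.ClientId, g.Key.FirstName, LastSent = g.Max(d => d.LastUpdateDate) })
    // Ordering comes after the de-duplication, so it cannot be discarded by it.
    .OrderByDescending(c => c.LastSent)
    .Take(maxRecentClients)
    .ToList();

foreach (var c in recentClients)
    Console.WriteLine($"{c.ClientId} {c.FirstName} {c.LastSent:yyyy-MM-dd}");
```

The grouping replaces Distinct because a projection that drops LastUpdateDate leaves nothing to sort by afterwards.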

Alternative methods/functions in LINQ while converting from SQL to Linq

Hello, I am totally new to LINQ. I need to convert the following query to LINQ and I am having a pretty hard time. I have spent almost 3 hours and still cannot figure it out. Most of the functions/methods in SQL like Distinct, Not In, etc. are missing in LINQ. Even if they are available, I am unable to figure out how to use them. Are there alternative methods/functions in LINQ with different names that I should be using, or do they not exist in LINQ at all, so that I need a different approach? It would be really helpful if someone could help me convert the following query to LINQ.
SQL Query
Select count(distinct(UserID)) from dbo.DeansStudents
inner join dbo.UserBarSession on DeansStudents.UserBarSessionID = UserBarSession.ID
inner join dbo.Users on users.ID = UserBarSession.UserID
where UserBarSessionID not in (
Select b.ID from dbo.DeansStudents,dbo.Users
left join dbo.Answers on answers.Student=users.ID
left join dbo.UserBarSession b on Answers.Student = b.UserID
where AnswerDate between b.StartDate and b.EndDate
and AnswerDate between 7/10/2011 and 3/12/2018
and UserBarSessionID = b.ID and DeanID= 12296 group by Answers.Student,users.FirstName,users.LastName,b.ID) and DeanID =12296
Query converted so far to LINQ
After a couple of days with LINQ, I managed to convert the first part of the SQL query. I am unable to continue with the second part, starting from "Select b.ID ...".
var query = from deansStudent in dbo.DeansStudents
join userBarSession in dbo.UserBarSession
on deansStudent.UserBarSessionId equals userBarSession.Id
join users in dbo.Users
on userBarSession.UserId equals users.Id
//Need continuation from here
Most of the functions/methods in Sql like Distinct, Not in etc are missing in Linq
Distinct() is part of Linq, and NOT IN can be accomplished with the Except() Linq extension method.
With this answer, you should be able to finish your query.
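As a rough illustration of those operators, here is a minimal sketch on in-memory collections; the ID values and variable names are hypothetical and unrelated to the tables in the question. It shows Except() and a negated Contains() as two ways to express NOT IN, plus Distinct().Count() for count(distinct ...).

```csharp
using System;
using System.Linq;

// Hypothetical session IDs; 10 appears twice on purpose.
var sessionIds = new[] { 10, 11, 12, 13, 10 };
// Hypothetical IDs returned by the inner "NOT IN" sub-select.
var excludedIds = new[] { 11, 13 };

// Except() is a set difference: excluded IDs are removed and duplicates collapse.
var viaExcept = sessionIds.Except(excludedIds).ToList();

// Where(!Contains) filters row by row, so duplicates in the source are kept.
var viaContains = sessionIds.Where(id => !excludedIds.Contains(id)).ToList();

// Equivalent of count(distinct ...) over the source IDs.
int distinctCount = sessionIds.Distinct().Count();

Console.WriteLine(string.Join(", ", viaExcept));
Console.WriteLine(string.Join(", ", viaContains));
Console.WriteLine(distinctCount);
```

The difference matters when counting: Except() already yields distinct values, whereas the Where/Contains form needs an explicit Distinct() before Count() to match count(distinct ...).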

OleDB JOIN Syntax Not Correct

I recently asked a question on StackOverflow (MySQL Returns All Rows When field = 0) regarding a query statement not working in MySQL. I now have a very similar problem, this time using OleDB where I am trying to use a join to include fields that have 0 as an entry, but not select every field in the table as a result.
The new look MySQL query posted in the above question as the accepted answer works without a hitch. However the OleDB counterpart I have written to do almost the same does not. It's a bit messy as the tables are not named very well (I didn't create this database) and I'm getting a simple syntax error;
myQuery.CommandText = "SELECT s.scm_num, s.scm_name, c.cr3_text, q.qsp_value, s.scm_bprefix, s.scm_nxbnum FROM qspreset q INNER JOIN sdccomp s LEFT OUTER JOIN cntref3 c ON s.scm_cntref3=c.cr3_id AND q.qsp_relat=s.scm_invtype AND q.qsp_what=13";
I'm querying another table here as well as the two involved in the LEFT OUTER JOIN and I believe that is where I am making the mistake.
Join conditions need to be with the join
// A verbatim string (@"...") lets the SQL span several lines.
myQuery.CommandText =
@"SELECT s.scm_num, s.scm_name, c.cr3_text, q.qsp_value, s.scm_bprefix, s.scm_nxbnum
FROM qspreset q
INNER JOIN sdccomp s
ON q.qsp_relat = s.scm_invtype AND q.qsp_what = 13
LEFT OUTER JOIN cntref3 c
ON s.scm_cntref3 = c.cr3_id";
q.qsp_what = 13 can be moved to a where
I happen to like this style
In the case of MSSQL T-SQL and some queries with a lot of joins, I have gotten a more efficient query plan by moving a where condition up into a join. The filter happened early rather than last.
If you don't believe you can put a hard value in a join see SQLfiddle

IQueryable Count() method issue with Inner Join created by .Include() related entities

IQueryable<EntityOne> query = entities.EntityOne
.Include(t => t.EntityRelated1)
.Include(t => t.EntityRelated2)
.AsQueryable();
The query generated in "query" variable :
SELECT
[Extent1].[Id] AS [IdEntityOne],
...
[Extent2].[Id] AS [IdEntityRelatedOne],
...
[Extent3].[Id] AS [IdEntityRelatedTwo],
...
FROM [dbo].[EntityOne] AS [Extent1]
INNER JOIN [dbo].[EntityRelatedOne] AS [Extent2]
ON [Extent1].[IdRelatedOne] = [Extent2].[Id]
INNER JOIN [dbo].[EntityRelatedTwo] AS [Extent3]
ON [Extent1].[IdRelatedTwo] = [Extent3].[Id]
After that, in the C# code, these are the results of counting:
var wrongCount = query.Count(); // Returns 295
var correctCount = query.AsEnumerable().Count(); // Returns 10
The count of 295 is the total number of records in the full EntityOne set (wrong).
The 10 Count is the desired count after Inner Join.
It looks like IQueryable.Count() is counting before the inner join is executed on the database. I don't want to generate the IEnumerable, since I want the count to be executed on SQL Server together with the inner join.
UPDATE 1
Trying to manually execute the inner join:
IQueryable<EntityOne> query2 = entities.EntityOne.Join(entities.EntityTwo,
eone=> eone.IdRelatedOne, en => en.Id,
(x, y) => x);
The SQL code generated in "query2" is :
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[EntityOne] AS [Extent1]
As you can see, the inner join requested by the LINQ Join statement does not appear in the generated SQL.
Update 2
I don't know if it matters, but IdEntityRelated1 on EntityOne is a required property, and it's not a foreign key in the database, just an integer field that stores the Id of the related entity. (I'm working with POCO classes with Database First.)
I have other working sources where the corresponding fields are nullable integers instead of required. Maybe I should not try to use Include to force an inner join between required relationships?
You have a required association, but the expected objects are not present in the database.
But let's first see what EF does.
In the first count...
var wrongCount = query.Count();
...the Includes are ignored. There's no reason to execute them because EF has been told that the referred objects (EntityRelated1 and EntityRelated2) are mandatory, so inner joins are expected to find the related records. If they do, EF figures it may as well just count entities.EntityOne and skip the rest. The Includes are only going to make the query more expensive and they don't affect the result.
You can check that by monitoring the SQL that's executed for the count. That's not the SQL generated when you look at query only! It's probably something that simply boils down to
SELECT COUNT(*) FROM [dbo].[EntityOne]
So the first count returns a correct count of all EntityOne records in the database.
For the second count you force execution of the entire query that's stored in the query variable, the SQL statement that you show. Then you count its results in memory – and it returns 10. This means that the query with the inner joins does actually return 10 records. That, in turn, can only mean one thing: there are 285 EntityOne.IdRelatedOne values that don't point to an existing EntityRelatedOne record. But you mapped the association as required, so EF generates an inner join. An outer join would also return 295.
Include is not a proper LINQ method; it is an Entity Framework extension designed to do eager loading and not much else. Includes are lost if the query shape changes:
When you call the Include method, the query path is only valid on the returned instance of the IQueryable of T. Other instances of IQueryable of T and the context itself are not affected.
Specifically, this means that aggregates on top of the Included IQueryable<T>, for instance, are going to lose the Include (which is exactly what you see).
See Tip 22 - How to make Include really Include, IQueryable.Include() gets ignored, Include in following query does not include really and many more.
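The relationship between the two counts can be mimicked in memory. This is a minimal sketch with LINQ to Objects, assuming a tiny hypothetical data set in which two of four EntityOne rows point at related records that do not exist; it does not involve Include or Entity Framework itself.

```csharp
using System;
using System.Linq;

// Hypothetical data: four EntityOne rows, given by their IdRelatedOne values,
// and the Ids that actually exist in EntityRelatedOne. 98 and 99 are dangling.
var entityOneRelatedIds = new[] { 1, 2, 99, 98 };
var existingRelatedIds = new[] { 1, 2 };

// What a plain count of EntityOne sees.
int totalCount = entityOneRelatedIds.Length;

// Inner join: rows without a matching related record drop out.
int innerJoinCount = entityOneRelatedIds
    .Join(existingRelatedIds, fk => fk, id => id, (fk, id) => fk)
    .Count();

// Left outer join (GroupJoin + DefaultIfEmpty): every EntityOne row survives.
int outerJoinCount = entityOneRelatedIds
    .GroupJoin(existingRelatedIds, fk => fk, id => id, (fk, matches) => new { fk, matches })
    .SelectMany(x => x.matches.Select(id => (int?)id).DefaultIfEmpty(),
                (x, related) => new { x.fk, related })
    .Count();

Console.WriteLine($"total={totalCount}, inner={innerJoinCount}, outer={outerJoinCount}");
```

Here the inner join count is smaller than the total only because of the dangling references, which mirrors the 295 versus 10 discrepancy described above.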

Why does the Entity Framework generate nested SQL queries?

I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
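// {0} is replaced by a generated parameter bound to serverId.
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = {0}",
serverId);
// Sorting and limiting happen in memory, on the rows returned above.
var limited = results.OrderBy(x => x.StartTime).Take(limit);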
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
I just stumbled upon this post because I suffer from the same problem. I have already spent days tracking this down, and it is just poor query generation in MySQL.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Still, the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
This is bad because the ORDER BY is in the outer query. That means MySQL has to pull every record in order to perform the sort, which results in using filesort.
I verified that SQL Server (Compact, at least) does not generate nested queries for the same code:
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Actually, the queries generated by Entity Framework are a bit ugly; less so than those from LINQ to SQL, but still ugly.
However, your database engine will most likely produce the desired execution plan, and the query will run smoothly.
