Hello i am totally new to Linq. I need to convert the following query to Linq and having a pretty hard time. Almost 3 hrs spent still unable to figure out. Most of the functions/methods in Sql like Distinct, Not in etc are missing in Linq. Even if they are available i am unable to figure out how to use them. Are there any alternative Methods/Functions in Linq with different names that i should be using or they don't even exist in Linq and i need to use a different approach ? I would be really helpful if someone could help me in converting the following query to Linq.
SQL Query
Select count(distinct(UserID)) from dbo.DeansStudents
inner join dbo.UserBarSession on DeansStudents.UserBarSessionID = UserBarSession.ID
inner join dbo.Users on users.ID = UserBarSession.UserID
where UserBarSessionID not in (
Select b.ID from dbo.DeansStudents,dbo.Users
left join dbo.Answers on answers.Student=users.ID
left join dbo.UserBarSession b on Answers.Student = b.UserID
where AnswerDate between b.StartDate and b.EndDate
and AnswerDate between 7/10/2011 and 3/12/2018
and UserBarSessionID = b.ID and DeanID= 12296 group by Answers.Student,users.FirstName,users.LastName,b.ID) and DeanID =12296
Query converted so far to LINQ
From my past couple of days into LINQ i managed to converted the first part of the Sql Query to LINQ. I am unable to continue with the second part. From "Select b.id........ "
var query = from deansStudent in dbo.DeansStudents
join userBarSession in dbo.UserBarSession
on deansStudent.UserBarSessionId equals userBarSession.Id
join users in dbo.Users
on userBarSession.UserId equals users.Id
//Need continuation from here
Most of the functions/methods in Sql like Distinct, Not in etc are missing in Linq
Distinct() is part of Linq, and NOT IN can be accomplished with the Except() Linq extension method.
With this answer, you should be able to finish your query.
Ok so I have been trying to figure out how to embed multiple select's
in a LINQ query and have been failing being that I don't use LINQ all that often.
I have come up with a fairly straight forward simple SQL statement I am trying to convert and I would appreciate it if anyone can help convert this to LINQ and explain the logic there.
SELECT * FROM TABLE_A
WHERE TABLE_B_REF IN
(SELECT TABLE_B_REF FROM TABLE_B
WHERE TABLE_B.TABLE_C_REF IN
(SELECT TABLE_C_REF FROM TABLE_C WHERE C_NUMBER = '10001'))
I have googled around but what I am getting is too complex to follow for multiple embedded
queries.
A SQL join seems much simpler here. Assuming you've set up navigation properties correctly, that would look a bit like this:
from a in db.TableA
where a.TableB.TableC.C_Number == "10001"
select a;
Or in fluent syntax:
db.TableA.Where(a => a.TableB.TableC.C_Number == "10001")
The corresponding SQL would be something like this:
SELECT a.*
FROM TABLE_A a
JOIN TABLE_B b ON a.TABLE_B_REF = b.TABLE_B_REF
JOIN TABLE_C c ON b.TABLE_C_REF = c.TABLE_C_REF
WHERE c.C_NUMBER = '10001'
I want to know which one is better for performance:
//Logical
var query = from i in db.Item
from c in db.Category
where i.FK_IdCategory == c.IdCategory
Select new{i.name, c.name};
or
//Join
var query2 = from i in db.Item
join c in db.Category
on c.ID equals i.FK_IdCategory
Select new{i.name, c.name};
Performance of the two queries really depends on which LINQ provider and which RDBMS you're using. Assuming SQL Server, the first would generate the following query:
select i.name, c.name
from Item i, Category c
where i.FK_idCategory = c.IdCategory
Whereas the second would generate:
select i.name, c.name
from Item i
inner join Category c
on i.FK_idCategory = c.IdCategory
Which operate exactly the same in SQL Server as is explained in: Explicit vs implicit SQL joins
This depends on the ORM you're using and how intelligent it is at optimizing your queries for your backend.
Entity Framework can generate some pretty awful SQL if you don't do your linq perfectly, so I'd assume query2 is better.
The only way for you to know for sure would be to inspect the SQL being generated by the two queries.
Eyeballing it, it looks like query1 would result in both tables being pulled in their entirety and then being filtered against each other in your application, while query2 will for sure generate an INNER JOIN in the query, which will let SQL Server do what it does best - set logic.
Is that FK_IdCategory field a member of an actual foreign key index on that table? If not, make it so (and include the name column as an included column in the index) and your query will be very highly performant.
With linq2Sql or EntityFramework, you would probably do something like this:
var query = from i in db.Item
select new {i.name, i.Category.Name}
This will generate a proper SQL inner join.
I do assume that there is a foreign key relation between Item and Category defined.
I've seen this issue come up again and again and again. As I read through the answers, the common-ish answer is summarized well by this statement:
any manual join or projection will change the shape of the query and Include will not be used
-Ladislav Mrnka
Ok, so I decided to make a "hello world" sample EF code-first project with DbContext that would test that statement. I created the following query:
var result =
from c in context.Customers.Include(i => i.Addresses)
from a in c.Accounts
where a.ID > 4
select c;
The Include() statement should work, because I'm clearly meeting the requirements: (1) I am not modifying the projection by using an anonymous type and (2) I am not manually handling the joins.
Nevertheless, it doesn't work. The SQL query generated by this query is this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Customers] AS [Extent1]
INNER JOIN [dbo].[Accounts] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Customer_ID]
WHERE [Extent2].[ID] > 4
If I remove the join and filtering on the Accounts, then the include statement is correctly generated. Why is this happening?
I'm also bothered that EF official documentation doesn't seem to explain the rules of when Include() is or is not honored. Did I simply overlook something?
In a way you are "manually handling a join" by specifying the second from statement with c.Accounts.
Try the below query, whose context is solely based on a Customer entity:
from c in context.Customers.Include( i => i.Addresses )
where c.Accounts.Any( a => a.ID > 4 )
select c
Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = #p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = #ID",
serverId);
results = results.OrderBy(x=> x.StartTime).Take(limit);
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
I just stumbled upon this post because I suffer from the same problem. I already spend days tracking this down and it it is just a poor query generation in mysql.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now If I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
Which is bad because the order by is in the outer query. Theat means MySQL has to pull every record in order to perform an orderby which results in using filesort
I verified that SQL Server (Comapact at least) does not generate nested queries for the same code
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1],
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Actually the queries generated by Entity Framework are few ugly, less than LINQ 2 SQL but still ugly.
However, very probably you database engine will make the desired execution plan, and the query will run smoothly.