Include method in LINQ is used for Left Join? - c#

I'm using Entity Framework 6 and dotConnect for Oracle, and I have these two queries:
First one, using a simple join (LINQ and Output SQL):
LINQ:
var joinQuery = Db.Products
.Join(Db.Product_Categories.AsEnumerable(), p => p.ProductID,
pc => pc.CategoryID, (pc, p) => new { pc, p })
.ToList();
Output SQL:
SELECT * FROM Products
Second, using Include:
LINQ:
var includeQuery = Db.Products.Include("Product_Categories").ToList();
Output SQL:
SELECT * FROM Products
LEFT OUTER JOIN Product_Categories
ON Products.CategoryID = Product_Categories.CategoryID
I am not sure whether I can always use the "Include" method for left joins. This method is not clear to me.

In the first example, the join should not call .AsEnumerable() on Db.Product_Categories. By doing that you are causing EF to fetch all the records from Product_Categories and then do the join in memory, which can be very inefficient because it doesn't use any kind of index.
The second option you have isn't pure LINQ. Include is an EF-specific extension method that is not available in other providers.
So if you want common LINQ that you could use with other DB providers, go with option 1. If you want simpler syntax and are okay with being EF-specific, option 2 might be better.
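If a provider-independent left join is what you are after, the usual LINQ pattern is a group join followed by DefaultIfEmpty. Below is a minimal sketch against in-memory data; the arrays, the Name and Title properties, and the sample values are assumptions made up for illustration, not the asker's schema.

```csharp
using System;
using System.Linq;

class LeftJoinSketch
{
    static void Main()
    {
        // Hypothetical in-memory stand-ins for the Products and Product_Categories tables.
        var products = new[]
        {
            new { ProductID = 1, Name = "Keyboard", CategoryID = 10 },
            new { ProductID = 2, Name = "Mystery box", CategoryID = 99 }, // no matching category
        };
        var categories = new[]
        {
            new { CategoryID = 10, Title = "Peripherals" },
        };

        // "join ... into" is a group join; DefaultIfEmpty() keeps products
        // that have no matching category, which is what makes it a left join.
        var leftJoin =
            from p in products
            join c in categories on p.CategoryID equals c.CategoryID into matches
            from c in matches.DefaultIfEmpty()
            select new { p.Name, Category = c == null ? "(none)" : c.Title };

        foreach (var row in leftJoin)
            Console.WriteLine($"{row.Name}: {row.Category}");
    }
}
```

The same query shape can be written against DbSets, but what SQL a given provider emits for it should be checked by logging the generated query.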

Related

.netcore EF linq - this is a BUG? Very strange behavior

I have two tables in SQL: Document and User. Document has a relation to User, and I want to get the users I recently sent documents to.
I need to sort by the date the document was sent and get the unique (distinct) users related to those documents.
This is my LINQ query:
var recentClients = documentCaseRepository.Entities
.Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
.OrderByDescending(userWithDate => userWithDate.LastUpdateDate)
.Take(1000) // I need this because if I comment out this line, EF generates a completely different SQL query.
.Select(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
.Distinct()
.Take(configuration.MaxRecentClientsResults)
.ToList();
and the generated SQL query is:
SELECT DISTINCT TOP(5) [t].*
FROM (
SELECT TOP(1000) [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
ORDER BY [docCase].[LastUpdateDate] DESC
)
AS [t]
Everything is correct so far. But if I delete this line
.Take(1000) // I need this because...
EF generates a completely different query, such as:
SELECT DISTINCT TOP(5)
[docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
My question is: why did EF not generate the ORDER BY clause and the subquery with DISTINCT?
Is this a bug in EF, or am I doing something wrong? And what must I write in LINQ to generate this SQL query:
SELECT DISTINCT TOP 5 [t].*
FROM ( SELECT [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON [docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id]
WHERE [docCase].[AssignedByAgentId] = 1
ORDER BY [docCase].[LastUpdateDate] DESC
) AS [t]
OrderBy information is not always retained across other operators such as Distinct. Entity Framework does not document (to my knowledge) how exactly OrderBy is propagated.
This kind of makes sense because some operators have undefined output order. The fact that ordering is retained in many situations is a convenience for the developer.
Move the OrderBy to the end of the query (or at least past the Distinct).
The reason for the difference in queries is that Distinct messes up the result order. So when you first execute OrderBy and then Distinct, you might just as well not execute OrderBy, because this order is lost anyway. So EF can just optimize it away.
Calling Take in between causes the result set to be semantically different: you first order the items, take the first 1000 items of that order, and then call Distinct on them.
What you can change in your query depends mainly on the result you want to achieve. Maybe you want to first make the result set distinct, then order by date, and finally take the required number of items. Other options are also possible depending on your requirements.
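One way to express "distinct first, then order, then take" is to group by the user, keep the latest date per group, and sort on that. The sketch below runs on in-memory data; the ClientId and LastSent names and the sample rows are assumptions for illustration, and whether a particular EF Core version translates this GroupBy to SQL is something you would need to verify.

```csharp
using System;
using System.Linq;

class RecentClientsSketch
{
    static void Main()
    {
        // Hypothetical rows: which client a document went to and when it was last updated.
        var documents = new[]
        {
            new { ClientId = 1, LastUpdateDate = new DateTime(2018, 1, 5) },
            new { ClientId = 2, LastUpdateDate = new DateTime(2018, 3, 1) },
            new { ClientId = 1, LastUpdateDate = new DateTime(2018, 4, 9) },
            new { ClientId = 3, LastUpdateDate = new DateTime(2018, 2, 2) },
        };

        var recentClients = documents
            .GroupBy(d => d.ClientId)                      // one group per distinct client
            .Select(g => new { ClientId = g.Key, LastSent = g.Max(d => d.LastUpdateDate) })
            .OrderByDescending(c => c.LastSent)            // order after de-duplicating
            .Take(2)                                       // stand-in for MaxRecentClientsResults
            .ToList();

        // Prints client 1 (2018-04-09), then client 2 (2018-03-01).
        foreach (var c in recentClients)
            Console.WriteLine($"{c.ClientId} {c.LastSent:yyyy-MM-dd}");
    }
}
```

Because the ordering is applied after the grouping, there is no earlier OrderBy for a later operator to discard.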

Alternative methods/functions in LINQ while converting from SQL to Linq

Hello, I am totally new to LINQ. I need to convert the following query to LINQ and am having a pretty hard time. I have spent almost 3 hours and am still unable to figure it out. Most of the functions/methods in Sql like Distinct, Not in etc are missing in Linq. Even if they are available, I am unable to figure out how to use them. Are there alternative methods/functions in LINQ with different names that I should be using, or do they not exist in LINQ at all, so that I need a different approach? It would be really helpful if someone could help me convert the following query to LINQ.
SQL Query
Select count(distinct(UserID)) from dbo.DeansStudents
inner join dbo.UserBarSession on DeansStudents.UserBarSessionID = UserBarSession.ID
inner join dbo.Users on users.ID = UserBarSession.UserID
where UserBarSessionID not in (
Select b.ID from dbo.DeansStudents,dbo.Users
left join dbo.Answers on answers.Student=users.ID
left join dbo.UserBarSession b on Answers.Student = b.UserID
where AnswerDate between b.StartDate and b.EndDate
and AnswerDate between 7/10/2011 and 3/12/2018
and UserBarSessionID = b.ID and DeanID= 12296 group by Answers.Student,users.FirstName,users.LastName,b.ID) and DeanID =12296
Query converted so far to LINQ
After a couple of days with LINQ, I managed to convert the first part of the SQL query. I am unable to continue with the second part, from "Select b.ID ...".
var query = from deansStudent in dbo.DeansStudents
join userBarSession in dbo.UserBarSession
on deansStudent.UserBarSessionId equals userBarSession.Id
join users in dbo.Users
on userBarSession.UserId equals users.Id
//Need continuation from here
"Most of the functions/methods in Sql like Distinct, Not in etc are missing in Linq"
Distinct() is part of Linq, and NOT IN can be accomplished with the Except() Linq extension method.
With this answer, you should be able to finish your query.
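As a rough illustration of how these operators map, here is a small in-memory sketch. The arrays and their values are made up; they stand in for the session IDs in the outer query and the IDs returned by the NOT IN subquery.

```csharp
using System;
using System.Linq;

class DistinctAndNotInSketch
{
    static void Main()
    {
        // Hypothetical data standing in for the outer query and the subquery.
        int[] allSessionIds = { 1, 2, 2, 3, 4 };
        int[] answeredSessionIds = { 2, 4 };

        // SQL: COUNT(DISTINCT x)
        int distinctCount = allSessionIds.Distinct().Count();

        // SQL: x NOT IN (...). Except is a set difference, so it also removes duplicates.
        var viaExcept = allSessionIds.Except(answeredSessionIds);

        // Alternative that keeps duplicates: filter with a negated Contains.
        var viaContains = allSessionIds.Where(id => !answeredSessionIds.Contains(id));

        Console.WriteLine(distinctCount);                  // 4
        Console.WriteLine(string.Join(",", viaExcept));    // 1,3
        Console.WriteLine(string.Join(",", viaContains));  // 1,3
    }
}
```

Except compares whole elements, so when the two sides have different shapes, the negated Contains form on a projected key is often the easier fit.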

A simple join consumes too much memory - LINQ

I have this join:
var andlist = (from cust in custFinal
join serv in db.Service on cust.ID equals serv.CustID
select new JoinObj
{
Name = cust.name,
ServiceID = serv.ID,
});
custFinal is a list of Customers that contains only one object. db.Service is a DbSet, and there are only four rows in the Service table whose CustID equals the customer object's ID. When I use ToList() or Count(), used memory quickly exceeds 1 GB and I get an OutOfMemoryException. Can you tell me what is wrong with this code? Thanks in advance.
The reason is that you don't really perform the join on the server. custFinal, as you said, is just an in-memory list, not a database table or query. So it is an IEnumerable, not an IQueryable. When you perform a join, it calls Enumerable.Join, not Queryable.Join. The latter would build a query, but the former just pulls all arguments into memory and performs the join there. As a result, the whole Service table is pulled into memory and joined there (easy to check if you log EF context queries: you will see that it just performs a select-all query on Service).
If you change the order of arguments in the join so that Queryable.Join is executed, that won't help either, because you cannot join a database table with an in-memory list in Entity Framework anyway. So you have to find another way, for example:
// Collect the customer IDs in memory, then let the database filter with IN (...).
var ids = custFinal.Select(c => c.ID).ToArray();
var matchingServices = db.Service.Where(serv => ids.Contains(serv.CustID)).Select(serv => new { ServiceID = serv.ID, serv.CustID }).ToArray();
// now filter `custFinal` based on `matchingServices`, in memory.
That will perform a CustID IN (...) query instead of a join. If you insist on having a join, you will have to do that with raw SQL, without Entity Framework (you will also need to create a custom table type if you use SQL Server).
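The final in-memory step could look something like the sketch below. To keep it runnable without a database, a plain array named services stands in for db.Service; that array, the sample values, and the anonymous result type are assumptions for illustration only.

```csharp
using System;
using System.Linq;

class ContainsThenJoinSketch
{
    static void Main()
    {
        // Stand-in for custFinal: an in-memory list with a single customer.
        var custFinal = new[] { new { ID = 7, name = "Acme" } };

        // Stand-in for db.Service; with EF the Where below would run on the server.
        var services = new[]
        {
            new { ID = 100, CustID = 7 },
            new { ID = 101, CustID = 7 },
            new { ID = 200, CustID = 8 },
        };

        // Step 1: fetch only the services belonging to the known customers.
        var ids = custFinal.Select(c => c.ID).ToArray();
        var matchingServices = services
            .Where(s => ids.Contains(s.CustID))
            .Select(s => new { ServiceID = s.ID, s.CustID })
            .ToArray();

        // Step 2: join the two small in-memory collections.
        var joined = from cust in custFinal
                     join serv in matchingServices on cust.ID equals serv.CustID
                     select new { Name = cust.name, serv.ServiceID };

        // Prints "Acme -> 100" and "Acme -> 101".
        foreach (var row in joined)
            Console.WriteLine($"{row.Name} -> {row.ServiceID}");
    }
}
```

Only the rows that survive the Contains filter ever reach the in-memory join, so its cost stays proportional to the matches rather than to the whole table.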

Entity Framework 6 - MySQL Query Generates Unnecessary SQL

This is my first time using EF 6 as well as MySQL. I came across an annoyance while updating my LINQ statement from explicitly using joins to using navigation properties to fetch related data.
Here is the statement I'm executing to get a user and all the user's locations.
AspNetUsers.Include("UserLocations")
.Select(u => new {
FullName = u.FullName,
Locations = u.UserLocations.Select(l => l.Title)
})
This statement, using LinqPad4, generates the following SQL:
Why does it join using a select statement instead of doing a join on the table itself, and why does it add all the location columns to the join when the only column needed is the Title?
Wouldn't the following SQL query be better:
SELECT
u.FullName,
l.Title
FROM AspNetUsers u
JOIN UserLocations ul ON u.Id = ul.UserId
JOIN Locations l ON ul.LocationId = l.LocationId;
This is my first time using EF, and I have read that in the past the generated SQL has not been so great. I was wondering if this is just one of those cases or if there is something I could do to minimize the SQL generated.
Thank you in advance!

LINQ Logical join VS inner join

I want to know which one is better for performance:
//Logical
var query = from i in db.Item
from c in db.Category
where i.FK_IdCategory == c.IdCategory
select new { ItemName = i.name, CategoryName = c.name };
or
//Join
var query2 = from i in db.Item
join c in db.Category
on i.FK_IdCategory equals c.IdCategory
select new { ItemName = i.name, CategoryName = c.name };
Performance of the two queries really depends on which LINQ provider and which RDBMS you're using. Assuming SQL Server, the first would generate the following query:
select i.name, c.name
from Item i, Category c
where i.FK_idCategory = c.IdCategory
Whereas the second would generate:
select i.name, c.name
from Item i
inner join Category c
on i.FK_idCategory = c.IdCategory
These two operate exactly the same in SQL Server, as explained in: Explicit vs implicit SQL joins
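To see that the two forms are at least logically equivalent, here is a small sketch that runs both over in-memory collections and compares the results. The sample data is made up, and this says nothing about the SQL a provider generates; note also that in LINQ to Objects the first form is a nested loop while the join form uses a hash lookup.

```csharp
using System;
using System.Linq;

class JoinFormsSketch
{
    static void Main()
    {
        // Hypothetical in-memory stand-ins for db.Item and db.Category.
        var items = new[]
        {
            new { name = "Hammer", FK_IdCategory = 1 },
            new { name = "Apple", FK_IdCategory = 2 },
        };
        var categories = new[]
        {
            new { IdCategory = 1, name = "Tools" },
            new { IdCategory = 2, name = "Food" },
        };

        // "Logical" join: cross product filtered by a where clause.
        var implicitJoin = from i in items
                           from c in categories
                           where i.FK_IdCategory == c.IdCategory
                           select new { Item = i.name, Category = c.name };

        // Explicit join clause.
        var explicitJoin = from i in items
                           join c in categories on i.FK_IdCategory equals c.IdCategory
                           select new { Item = i.name, Category = c.name };

        // Anonymous types compare by value, so this checks both rows and order.
        Console.WriteLine(implicitJoin.SequenceEqual(explicitJoin)); // True
    }
}
```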
This depends on the ORM you're using and how intelligent it is at optimizing your queries for your backend.
Entity Framework can generate some pretty awful SQL if you don't write your LINQ perfectly, so I'd assume query2 is better.
The only way for you to know for sure would be to inspect the SQL being generated by the two queries.
Eyeballing it, it looks like query1 would result in both tables being pulled in their entirety and then being filtered against each other in your application, while query2 will for sure generate an INNER JOIN in the query, which will let SQL Server do what it does best - set logic.
Is that FK_IdCategory field a member of an actual foreign key index on that table? If not, make it so (and include the name column as an included column in the index) and your query will be very highly performant.
With LINQ to SQL or Entity Framework, you would probably do something like this:
var query = from i in db.Item
select new { i.name, i.Category.Name };
This will generate a proper SQL inner join.
I do assume that there is a foreign key relation between Item and Category defined.
