Hello, I have a LINQ query that I created for a left outer join. I am wondering why LINQ generates the SQL it does, and how to make it a better query.
Here's the C# query:
var query=
(
from subject in subjects
join statement in statements.DefaultIfEmpty() on subject.Id equals statement.SubjectId
select subject
);
query.Take(100).Dump();
And the SQL that it sends:
SELECT TOP (100)
--some fields here
FROM [Subject] AS [t0]
INNER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t1]
LEFT OUTER JOIN [SubjectStatement] AS [t2] ON 1=1 ) ON [t0].[id] = [t2].[SubjectId]
What I would like to see sent is:
SELECT TOP(100)
--some fields here
FROM Subject
LEFT OUTER JOIN SubjectStatement ON Subject.Id = SubjectStatement.SubjectId
Is there a way to control the SQL that is passed to SQL Server?
You are using the syntax of an inner join, and while that might work out sometimes, you would normally create a left join using the following syntax:
var query =
(
from subject in subjects
join statement in statements on subject.Id equals statement.SubjectId into ljStatement
from statement in ljStatement.DefaultIfEmpty()
select subject
);
query.Take(100).Dump();
This would result in:
SELECT TOP (100) [t0].[Id]
FROM [Subject] AS [t0]
LEFT OUTER JOIN [SubjectStatement] AS [t1] ON [t0].[Id] = [t1].[SubjectId]
into (C# Reference)
The into contextual keyword can be used to create a temporary
identifier to store the results of a group, join or select clause into
a new identifier.
join clause (C# Reference)
A join clause with an into expression is called a group join.
...
A group join produces a hierarchical result sequence, which associates elements in the left source sequence with one or more matching elements in the right side source sequence. A group join has no equivalent in relational terms; it is essentially a sequence of object arrays.
If no elements from the right source sequence are found to match an element in the left source, the join clause will produce an empty array for that item. Therefore, the group join is still basically an inner-equijoin except that the result sequence is organized into groups.
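To make the difference concrete, here is a minimal sketch that runs both shapes against in-memory lists (LINQ to Objects), so no database is needed. The Subject and Statement classes, their properties, and the class name LeftJoinSketch are assumptions made up for illustration; they are not the asker's real model.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the asker's tables.
public class Subject { public int Id { get; set; } public string Name { get; set; } }
public class Statement { public int SubjectId { get; set; } public string Text { get; set; } }

public static class LeftJoinSketch
{
    // Plain join: subjects without a matching statement are dropped.
    public static List<string> InnerJoin(IEnumerable<Subject> subjects, IEnumerable<Statement> statements)
    {
        return (from subject in subjects
                join statement in statements on subject.Id equals statement.SubjectId
                select subject.Name + ": " + statement.Text).ToList();
    }

    // Group join ("into") followed by DefaultIfEmpty(): every subject is kept,
    // and statement is null when the group of matches is empty.
    public static List<string> LeftJoin(IEnumerable<Subject> subjects, IEnumerable<Statement> statements)
    {
        return (from subject in subjects
                join statement in statements on subject.Id equals statement.SubjectId into matches
                from statement in matches.DefaultIfEmpty()
                select subject.Name + ": " + (statement == null ? "(no statement)" : statement.Text)).ToList();
    }
}
```

With two subjects, Alice (Id 1) and Bob (Id 2), and two statements that both belong to Alice, InnerJoin returns only Alice's two rows, while LeftJoin returns those two rows plus one row for Bob with "(no statement)".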
Related
I converted a LINQ query to SQL using LINQPad 4, but I am confused by the generated SQL. I have a Job table that is related to AppliedJob. AppliedJob is related to JobOffer, and JobOffer is related to Contract. The Contract table has a CompletedDate field that is set to null when a job contract starts. When the job is completed, the field is updated with the current date. I want to get the list of jobs that have CompletedDate != null (if found in the Contract table). That means a contract related to a job is not completed yet or is not found in the Contract table. Not found means no contract has been started for the job.
My Linq:
from j in Jobs
join jobContract in
(
from appliedJob in AppliedJobs.DefaultIfEmpty()
from offer in appliedJob.JobOffers.DefaultIfEmpty()
from contract in Contracts.DefaultIfEmpty()
select new { appliedJob, offer, contract }
).DefaultIfEmpty()
on j.JobID equals jobContract.appliedJob.JobID into jobContracts
where jobContracts.Any(jobContract => jobContract.contract.CompletedDate != null)
select j.JobTitle
The SQL query that LINQPad generated:
SELECT [t0].[JobTitle]
FROM [Job] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT NULL AS [EMPTY]
) AS [t1]
LEFT OUTER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t2]
LEFT OUTER JOIN ([AppliedJob] AS [t3]
LEFT OUTER JOIN [JobOffer] AS [t4] ON [t4].[AppliedJobID] = [t3].[AppliedJobID]
LEFT OUTER JOIN [Contract] AS [t5] ON 1=1 ) ON 1=1 ) ON 1=1
WHERE ([t5].[CompletedDate] IS NOT NULL) AND ([t0].[JobID] = [t3].[JobID])
)
My question is: why does it generate so many SELECT NULL AS [EMPTY] and LEFT OUTER JOIN clauses in the query?
Can I make a simpler, more understandable query from this? Or is it OK as it is?
DefaultIfEmpty() translates to left outer join. See LEFT OUTER JOIN in LINQ
There are so many "NULL as [Empty]" because NULL != NULL in SQL. See Why does NULL = NULL evaluate to false in SQL server
It's been a while since I've touched C# and LINQ, but this is my take.
The reason for the multiple left outer joins and nulls is because you have several (deferred?) calls to DefaultIfEmpty().
No pun intended, but what is the default return value of Enumerable.DefaultIfEmpty()? It is default(T), which is null for reference types. And they are all evaluated and gathered before you get to the point of evaluating the join criteria in the LINQ code snippet.
And that code snippet represents the non-null right side of equation. And the whole thing can return an empty set.
So a compatible SQL statement must create a left outer join between an empty set recursively all the way down to the actual SQL join criteria.
It's almost algebraic. Try to understand what both the LINQ and SQL statements are doing. Work them both out, backwards from the end all the way to the beginning of each, and you'll see the equivalence.
The reason for all the SELECT NULL AS [EMPTY]s is that these subqueries are not being utilized to return data, only to verify that there is data there. In other words, the LINQ code is optimizing the query to not actually bring in any column data, since it's completely unnecessary for the purposes of these subqueries.
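As a small illustration of what DefaultIfEmpty() does before any SQL translation is involved, the sketch below runs it on in-memory lists (LINQ to Objects). The class and method names are made up for this example. An empty source comes back as a single placeholder element, which is plausibly what the one-row SELECT NULL AS [EMPTY] source joined with ON 1=1 reproduces on the SQL side.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptySketch
{
    // Empty sequence of a reference type: one element, and that element is null.
    public static List<string> OnEmptyStrings()
    {
        return new List<string>().DefaultIfEmpty().ToList();
    }

    // Empty sequence of a value type: one element holding default(int), i.e. 0.
    public static List<int> OnEmptyInts()
    {
        return new List<int>().DefaultIfEmpty().ToList();
    }

    // Non-empty sequence: passed through unchanged.
    public static List<int> OnNonEmpty()
    {
        return new List<int> { 7, 8 }.DefaultIfEmpty().ToList();
    }

    // The overload with an argument substitutes that value instead of default(T).
    public static List<int> WithExplicitDefault()
    {
        return new List<int>().DefaultIfEmpty(-1).ToList();
    }
}
```

Each DefaultIfEmpty() call in the query above therefore guarantees at least one row from its source, which is why every call shows up as its own outer join in the generated SQL.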
Consider the following LINQ statement:
var users = from a in dbContext.Users
select a;
var list = (from a in users
let count = users.Count()
where a.IsActive == true
select new { a.UserId, count }).ToList();
If we check the profiler for this LINQ statement, it shows a cross join to get the count for every record.
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
I think the cross join is overhead for the SQL statement and may cause a performance issue when there is a huge number of records.
As a solution, I can move users.Count() outside of the LINQ statement and then put it in the select, but that causes two database operations.
var count = (from a in dbContext.Users
select a).Count();
var list = (from a in dbContext.Users
where a.IsActive == true
select new { a.UserId, count }).ToList();
Looking at the profiler, it generates the two operations below.
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent1]
) AS [GroupBy1]
exec sp_executesql N'SELECT
[Extent1].[UserId] AS [UserId],
@p__linq__0 AS [C1]
FROM [dbo].[Users] AS [Extent1]
WHERE 1 = [Extent1].[IsActive]',N'@p__linq__0 int',@p__linq__0=26
Does anybody have a better solution than this? Or can anybody suggest which is better: putting let inside the LINQ statement or getting the count beforehand?
I think the cross join is overhead for the SQL statement and may cause a performance issue when there is a huge number of records.
Not necessarily. Notice that this is joining to a sub-query, which is a single row/column of data (count). You can write this query in different ways, but in the end, it needs to join in order to return {UserId,count}. You can't return that data without a join. And the join it's doing right now is pretty efficient. So, I would recommend not trying to optimize a problem you don't have (i.e. premature optimization).
UPDATE: adding an actual execution plan (see how to) for the following query. You can see that it's joining to a scalar value (e.g. only running the Count select query once).
Query:
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
Execution plan:
There shouldn't be any performance issues with the generated SQL. The cross join results in one record, and the optimizer will only have to calculate it once regardless of the number of active users in your table.
If you are not convinced, compare the execution plan to your alternative. I can only think of using a sub select, but it doesn't look better to me.
Sub Select
SELECT
[UserId],
(SELECT count(*) FROM [dbo].[Users]) as [Cnt]
FROM [dbo].[Users]
WHERE 1 = [IsActive]
I have an entity framework 5 project hooked up to a SQLite database.
I did the model first approach and I was able to query 30,000 records from Table_A in roughly 3 seconds.
Now all I did was add another Table_B, which has a 0-to-1 reference to a parent record in Table_A. It now takes over 3 minutes to run the SAME query on Table_A. Table_B has ZERO records in it.
It's also worth noting that the EDMX added Navigation Properties to Table_A and Table_B. However it only added the foreign key column to Table_B. What would cause Entity Framework to slow down that much? When I revert my changes back to the old model, it runs fast.
Update
For reference the query is a standard linq to sql query.
var matches = Table_A.Where(it => it.UserName == "Waldo" || it.TimeStamp < oneMonthAgo);
I just ran ToTraceString() to find the generated SQL query, as suggested in another answer.
Turns out Entity Framework tried to be "smart" anticipating that I would use data from the child record. This is actually pretty cool! Just slows down my query a bit, so I might find a faster workaround.
Please note that this query is identical in LINQ syntax. This is just the underlying SQL that is generated as soon as I added another Table into the EDMX diagram.
Here is the FAST query: (abbreviated for clarity)
SELECT *
FROM [Table_A] AS [Extent1]
INNER JOIN (SELECT
[Extent2].[OID] AS [K1],
[Extent2].[C_Column1] AS [K2],
Max([Extent2].[Id]) AS [A1]
FROM [Table_A] AS [Extent2]
GROUP BY [Extent2].[OID], [Extent2].[C_Column1] ) AS [GroupBy1] ON [Extent1].[Id] = [GroupBy1].[A1]
INNER JOIN [OtherExistingTable] AS [Extent3] ON [Extent1].[C_Column1] = [Extent3].[Id]
After adding Table_B this was the new query that was generated which made things much much slower.
SELECT *
FROM [Table_A] AS [Extent1]
LEFT OUTER JOIN [Table_B] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Table_B_ForeignKey_To_Table_A]
INNER JOIN (SELECT
[Join2].[K1] AS [K1],
[Join2].[K2] AS [K2],
Max([Join2].[A1]) AS [A1]
FROM ( SELECT
[Extent3].[OID] AS [K1],
[Extent3].[C_Column1] AS [K2],
[Extent3].[Id] AS [A1]
FROM [Table_A] AS [Extent3]
LEFT OUTER JOIN [Table_B] AS [Extent4] ON [Extent3].[Id] = [Extent4].[Table_B_ForeignKey_To_Table_A]
) AS [Join2]
GROUP BY [K1], [K2] ) AS [GroupBy1] ON [Extent1].[Id] = [GroupBy1].[A1]
INNER JOIN [FeatureServices] AS [Extent5] ON [Extent1].[C_Column1] = [Extent5].[Id]
1) The first issue I'm having is that if you do an include and then an order by, the generated SQL contains both an inner join and an outer join:
var query = from l in Lead.Include("Contact")
orderby l.Contact.FirstName
select l;
Which generates the following inner join and outer join on the same table
INNER JOIN [dbo].[Contact] AS [Extent2]
ON [Extent1].[ContactId] = [Extent2].[ContactId]
LEFT OUTER JOIN [dbo].[Contact] AS [Extent3]
ON [Extent1].[ContactId] = [Extent3].[ContactId]
ORDER BY [Extent2].[FirstName] ASC
Which makes for a slightly inefficient query.
2) If I do multiple includes, it always does the second one as an outer join, like so:
Lead.Include("OneToOne").Include("OtherOneToOne") <- in this scenario OtherOneToOne is an outer join and OneToOne is an inner join
Lead.Include("OtherOneToOne").Include("OneToOne") <- in this scenario OneToOne is an outer join and OtherOneToOne is an inner join
Is that just how it works?
I found another post where someone was talking about this, and they said that it was fixed in the June CTP release:
http://blogs.msdn.com/b/adonet/archive/2011/06/30/announcing-the-microsoft-entity-framework-june-2011-ctp.aspx
But I installed it and set it up, and it still doesn't work.
It won't let me answer my own question, so:
EDIT:
I set up an isolated test and found that http://blogs.msdn.com/b/adonet/archive/2011/06/30/announcing-the-microsoft-entity-framework-june-2011-ctp.aspx seems to have resolved these issues.
But since I'm using RIA, I'm out of luck, because the June CTP doesn't support RIA :-/
A solution is to do the include yourself:
var query = from l in Lead
select new { l, l.Contact } into row
orderby row.Contact.FirstName
select row;
Let's say I have a variable 'userid', and I want to select from the aspnet_Membership AND aspnet_AccountProfile tables. They both have the column UserId. I just want to be able to write a statement like SELECT * FROM aspnet_AccountProfile, aspnet_Membership WHERE UserId=@UserId and have it get the records with the matching user id from BOTH tables. How do I do this? Thank you!
That is called a JOIN:
There are several basic types of join based on what data exactly you want. These are related to set theory/relational algebra. I'll list the most common ones:
INNER JOIN
Use this when you want to return every possible combination of rows where both tables have a matching UserId. Some rows in either table may not get returned in an inner join.
SELECT * FROM aspnet_AccountProfile INNER JOIN aspnet_Membership
ON aspnet_AccountProfile.UserId = aspnet_Membership.UserId
Another way of writing an INNER JOIN (Which I wouldn't encourage if you want to understand joins) is:
SELECT * FROM aspnet_AccountProfile, aspnet_Membership
WHERE aspnet_AccountProfile.UserId = aspnet_Membership.UserId
Of course, to select the specific UserId you want, you add a condition on either table, e.g.:
AND aspnet_AccountProfile.UserId = @UserId
or
AND aspnet_Membership.UserId = @UserId
Either of those two will work fine for an inner join.
LEFT OUTER JOIN
Use this when you want to return all rows from the first table in your query, and every combination where the UserId in the second table matches the first. Some rows in the second table (Membership, in this case) may not get returned at all.
SELECT * FROM aspnet_AccountProfile LEFT JOIN aspnet_Membership
ON aspnet_AccountProfile.UserId = aspnet_Membership.UserId
You have to use a column from the left table to narrow down your criteria in this case; a WHERE condition on a column from the right table rejects the NULL rows, so the query effectively becomes an INNER JOIN.
WHERE aspnet_AccountProfile.UserId = @UserId
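The effect of filtering on the wrong side of a left join can be reproduced in memory with LINQ to Objects. This is only a sketch: the AccountProfile and Membership classes and their DisplayName and Email properties are hypothetical stand-ins for the two tables, not the real ASP.NET membership schema.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for aspnet_AccountProfile and aspnet_Membership.
public class AccountProfile { public int UserId { get; set; } public string DisplayName { get; set; } }
public class Membership { public int UserId { get; set; } public string Email { get; set; } }

public static class LeftJoinFilterSketch
{
    // Filter on the left side: the profile row survives even without a membership row.
    public static List<string> FilterOnLeft(List<AccountProfile> profiles, List<Membership> memberships, int userId)
    {
        return (from p in profiles
                join m in memberships on p.UserId equals m.UserId into matches
                from m in matches.DefaultIfEmpty()
                where p.UserId == userId
                select p.DisplayName + " / " + (m == null ? "NULL" : m.Email)).ToList();
    }

    // Filter on the right side: unmatched rows have m == null, so the condition
    // rejects them and the result is the same as an inner join.
    public static List<string> FilterOnRight(List<AccountProfile> profiles, List<Membership> memberships, int userId)
    {
        return (from p in profiles
                join m in memberships on p.UserId equals m.UserId into matches
                from m in matches.DefaultIfEmpty()
                where m != null && m.UserId == userId
                select p.DisplayName + " / " + m.Email).ToList();
    }
}
```

For a user who has a profile but no membership row, FilterOnLeft returns one row with NULL on the membership side, while FilterOnRight returns nothing.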
RIGHT OUTER JOIN
This is fairly uncommon, because it can usually be written as a LEFT outer join. It's like a left outer join, but all rows from the second table in the relation are returned instead of the first.
SELECT * FROM aspnet_AccountProfile RIGHT JOIN aspnet_Membership
ON aspnet_AccountProfile.UserId = aspnet_Membership.UserId
FULL OUTER JOIN
Use this if you need to relate all the rows with a matching UserId in AccountProfile to the corresponding rows in Membership, but also need to know which rows in either table don't have a match in the other one.
SELECT * FROM aspnet_AccountProfile FULL OUTER JOIN aspnet_Membership
ON aspnet_AccountProfile.UserId = aspnet_Membership.UserId
Getting results for only a single user is a little trickier in a FULL OUTER JOIN. You have to specify that a NULL or the correct value is okay in either table.
Hi,
you can do it by using
"SELECT * FROM aspnet_AccountProfile ap, aspnet_Membership m WHERE ap.UserId=m.UserId AND ap.UserId=@UserId"
You can do this with an inner join.
Here is an example:
Select aspnet_Membership.*, aspnet_AccountProfile.* from aspnet_AccountProfile
inner join aspnet_Membership on aspnet_Membership.userid = aspnet_AccountProfile.userid
where aspnet_Membership.UserId=@UserId
This will return a record only when the userid is present in both tables.
If you want the records that are in one table and may or may not be in the other, then you must use a left join, filtering on the table whose rows are always kept:
Select aspnet_Membership.*, aspnet_AccountProfile.* from aspnet_AccountProfile
left join aspnet_Membership on aspnet_Membership.userid = aspnet_AccountProfile.userid
where aspnet_AccountProfile.UserId=@UserId
You can use a Join.
Something like:
Select * from aspnet_AccountProfile INNER JOIN aspnet_Membership
ON aspnet_AccountProfile.UserId = aspnet_Membership.UserId
Where aspnet_AccountProfile.UserId = @UserId
Thanks,
Vamyip
You can use a simple inner join for this.