The setup is simple: a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their top chronological child value (updated, accessed, etc.). A query would look something like this:
var masters = from m in Master
let mc = m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
A potential problem occurs if a master has no children: the subquery will yield NULL, and the conversion from NULL to DateTime will fail with:
InvalidOperationException: The null value cannot be assigned to a member with type System.DateTime which is a non-nullable value type.
The solution to the exception is to cast mc to DateTime?, but what I really want is to keep only the masters that have children and simply skip the few that have none yet.
Solution #1: add where m.Childs.Count() > 0.
This one kicked me hard and unexpectedly: the generated SQL was just plain awful (as was its execution plan) and took almost twice as long:
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ((
SELECT COUNT(*)
FROM [Child] AS [t3]
WHERE [t3].[masterId] = [t2].[id]
)) > @p0
Solution #2, with where mc != null, is even worse: it gives a shorter script, but it executes far longer than the one above (taking roughly as long as the other two combined):
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ([t2].[value]) IS NOT NULL
All in all, a lot of wasted SQL time just to eliminate a few rows out of tens of thousands or more. This led me to Solution #3: get everything and eliminate the empty ones client side. But to do that, I had to kiss IQueryable goodbye:
var masters = from m in Master
let mc = (DateTime?)m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
var mastersNotNull = masters.AsEnumerable().Where(m => m.mc != null);
and this works. However, I am trying to figure out whether there are any downsides to this. Will it behave in any way fundamentally differently than the full-monty IQueryable version? I imagine this also means I cannot (or should not) use masters as a building block in a different IQueryable? Any input/observation/alternative is welcome.
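To make the composition concern concrete, here is a minimal sketch (same Master/Child model as above, my variable names) of what changes once AsEnumerable() enters the picture:
var masters = from m in Master
              let mc = (DateTime?)m.Childs.Max(c => c.CreatedOn)
              select new { m, mc };

// From here on the type is IEnumerable, not IQueryable: this filter runs
// in memory, after every row (childless or not) has come over the wire.
var mastersNotNull = masters.AsEnumerable().Where(x => x.mc != null);

// Anything composed afterwards also runs client side; this ordering
// never becomes an ORDER BY on the server.
var ordered = mastersNotNull.OrderByDescending(x => x.mc);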
Just based on this requirement:

a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their top chronological child value
SELECT [m].[Name] AS [MasterName]
     , MAX([c].[CreatedOn]) AS [cm]
FROM [Master] AS [m]
LEFT OUTER JOIN [Child] AS [c] ON [m].[id] = [c].[masterId]
GROUP BY [m].[Name]
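If you'd rather stay in LINQ, a rough equivalent of that grouped left join might look like the untested sketch below (it assumes the Childs association and the id/Name columns from the question, and relies on the provider translating it rather than running it in memory):
var result = from m in Master
             from c in m.Childs.DefaultIfEmpty()   // LEFT OUTER JOIN
             group c by new { m.id, m.Name } into g
             select new
             {
                 MasterName = g.Key.Name,
                 cm = g.Max(c => (DateTime?)c.CreatedOn)  // NULL for childless masters
             };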
Related
I converted a LINQ query to SQL using LINQPad 4, but I am quite confused by the converted SQL query. I have a Job table that is related to AppliedJob; AppliedJob is related to JobOffer; and JobOffer is related to Contract. The Contract table has a CompletedDate field that is NULL when a job contract starts; when a job is completed, the field is updated with the current date. I want to get the list of jobs that have CompletedDate != null (if found in the Contract table). That means a contract related to a job is not completed yet, or is not found in the Contract table at all ('not found' meaning no contract has been started for the job).
My LINQ:
from j in Jobs
join jobContract in
(
from appliedJob in AppliedJobs.DefaultIfEmpty()
from offer in appliedJob.JobOffers.DefaultIfEmpty()
from contract in Contracts.DefaultIfEmpty()
select new { appliedJob, offer, contract }
).DefaultIfEmpty()
on j.JobID equals jobContract.appliedJob.JobID into jobContracts
where jobContracts.Any(jobContract => jobContract.contract.CompletedDate != null)
select j.JobTitle
The SQL query that LINQPad generated:
SELECT [t0].[JobTitle]
FROM [Job] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT NULL AS [EMPTY]
) AS [t1]
LEFT OUTER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t2]
LEFT OUTER JOIN ([AppliedJob] AS [t3]
LEFT OUTER JOIN [JobOffer] AS [t4] ON [t4].[AppliedJobID] = [t3].[AppliedJobID]
LEFT OUTER JOIN [Contract] AS [t5] ON 1=1 ) ON 1=1 ) ON 1=1
WHERE ([t5].[CompletedDate] IS NOT NULL) AND ([t0].[JobID] = [t3].[JobID])
)
My question is: why does it generate so many SELECT NULL AS [EMPTY]s and LEFT OUTER JOINs in the query?
Can I make a simpler, more understandable query than this? Or is it OK as it is?
DefaultIfEmpty() translates to a LEFT OUTER JOIN. See LEFT OUTER JOIN in LINQ.
There are so many "NULL AS [EMPTY]"s because NULL != NULL in SQL. See Why does NULL = NULL evaluate to false in SQL Server.
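For reference, the usual LINQ shape that yields a single LEFT OUTER JOIN looks like this (a sketch using the Jobs/AppliedJobs names from the question; the projected columns are illustrative):
var query = from j in Jobs
            join a in AppliedJobs on j.JobID equals a.JobID into ja
            from a in ja.DefaultIfEmpty()   // becomes LEFT OUTER JOIN AppliedJob
            select new { j.JobTitle, AppliedJobID = (int?)a.AppliedJobID };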
It's been a while since I've touched C# and LINQ, but this is my take.
The reason for the multiple left outer joins and nulls is that you have several (deferred?) calls to DefaultIfEmpty().
No pun intended, but what is the default return value of Enumerable.DefaultIfEmpty()? It is null. And they are all evaluated and gathered before you get to the point of evaluating the join criteria in the LINQ code snippet.
And that code snippet represents the non-null right side of the equation. And the whole thing can return an empty set.
So a compatible SQL statement must create left outer joins against an empty set, recursively, all the way down to the actual SQL join criteria.
It's almost algebraic. Try to understand what both the LINQ and SQL statements are doing. Work them both out, backwards from the end all the way to the beginning of each, and you'll see the equivalence.
The reason for all the SELECT NULL AS [EMPTY]s is that these subqueries are not being used to return data, only to verify that data exists. In other words, the provider is optimizing the query so as not to bring back any column data, since that is completely unnecessary for the purposes of these subqueries.
Currently I am trying to load a large dataset into a GridView, which of course is timing out because the number of rows being returned is so large. Instead of paging in the GridView in memory, how might I page on the server to prevent the timeout that occurs from loading the dataset all at once? I have tried the method described here, but the query still seems to hang. It may also be worth noting that I have checked the query's execution plan, and it has not suggested anything to consider. Please see my current implementation below:
with result_set as (
    select distinct row_number() over (order by a.date desc) as [row_number],
           a.date, vw.Name, a.accountNum, a.action, z.loc, b.name, a.col
    from tbl1 as a
    inner join tbl2 as b on b.id = a.id
    inner join tbl3 as z on a.zip = z.zip
    inner join tbl4 as vw on a.accountNum = vw.accountNum collate database_default
    where a.col is not null
)
select * from result_set where [row_number] between 1 and 20
Am I the victim of poor indexing on my part? Or is it something else I missed? Please share your thoughts.
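Edit: for context, this is roughly how such a windowed query can be invoked from C#; the GetPage method shape and the @start/@end parameterization are illustrative stand-ins for my hard-coded 1 and 20:
using System.Data;
using System.Data.SqlClient;

static DataTable GetPage(string connectionString, int page, int pageSize)
{
    const string sql = @"
with result_set as (
    select distinct row_number() over (order by a.date desc) as [row_number],
           a.date, vw.Name, a.accountNum, a.action, z.loc, b.name, a.col
    from tbl1 as a
    inner join tbl2 as b on b.id = a.id
    inner join tbl3 as z on a.zip = z.zip
    inner join tbl4 as vw on a.accountNum = vw.accountNum collate database_default
    where a.col is not null
)
select * from result_set where [row_number] between @start and @end;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@start", (page - 1) * pageSize + 1);
        cmd.Parameters.AddWithValue("@end", page * pageSize);
        var table = new DataTable();
        new SqlDataAdapter(cmd).Fill(table);   // Fill opens and closes the connection
        return table;
    }
}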
There is a list in memory of 50,000 Product IDs. I would like to get all of these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL, WHERE ID IN (2134,1324543,5675,32451,45735...), and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and partly because the execution plan is bad (I know this from trying a temporary table instead).
So I used SqlBulkCopy to insert the IDs into a temporary table and then ran:
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products together with their suppliers (multiple for every product). Using a custom SQL command, there is no way I know of to get back a complex object. So how can I get the products with their suppliers, using the temporary table?
(If I could somehow refer to the temporary table in LINQ, it would be OK: I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID == p.ID)). If I could refer to it in a UDF, that would also be good, but you can't. And I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks
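Edit: for reference, the bulk-copy part looks roughly like this (a simplified sketch, error handling omitted). The key detail is opening the context's connection manually and keeping it open, because #tmp only survives as long as the connection that created it:
using System.Data;
using System.Data.SqlClient;

var conn = (SqlConnection)dbContext.Database.Connection;
conn.Open();   // keep open so #tmp survives across the following commands

using (var cmd = new SqlCommand("CREATE TABLE #tmp (ID int PRIMARY KEY)", conn))
    cmd.ExecuteNonQuery();

// Stage the 50,000 ids in a DataTable and bulk-copy them in.
var ids = new DataTable();
ids.Columns.Add("ID", typeof(int));
foreach (var id in list) ids.Rows.Add(id);

using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#tmp" })
    bulk.WriteToServer(ids);

var products = dbContext.Set<Product>()
    .SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
    .ToList();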
I was curious to explore the SQL generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the SQL generated using Contains (sql1):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the SQL generated using Join (sql2):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the SQL creates a big select statement using UNION ALL to build the equivalent of your temporary table, then joins to that table. The SQL is more verbose, but it may well be efficient; I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK... now it's a giant UNION ALL... anyway, I hope this contribution is useful to someone.
I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as the ProductIDs:
this will give you all the performance you're after
it will work for concurrent users
If this filter table changes a lot, then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on the table for longer than necessary.
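A sketch of what that could look like, assuming a mapped ProductFilter entity with SessionId and ProductID columns and a dbContext.ProductFilters set (all names hypothetical):
// Tag this user's id set with a session key so concurrent users
// don't interfere with each other's rows.
var sessionId = Guid.NewGuid();

// Bulk-insert the (sessionId, productId) pairs here, e.g. with
// SqlBulkCopy as in the question, then join in ordinary LINQ:
var products = from p in dbContext.Products
               join f in dbContext.ProductFilters on p.ID equals f.ProductID
               where f.SessionId == sessionId
               select p;

// Remove this session's rows when done (ideally in a separate transaction).
dbContext.Database.ExecuteSqlCommand(
    "DELETE FROM ProductFilters WHERE SessionId = {0}", sessionId);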
I have this SQL query that I want my Entity Framework query to translate to, pretty much 1:1:
SELECT GroupId, ItemId, count(*) as total
FROM [TESTDB].[dbo].[TestTable]
WHERE GroupId = '64025'
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC
This SQL query should sort based on the number of occurrences of the same ItemId (for that group).
I have this now:
from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};
But this generates the following SQL code:
SELECT
[GroupBy1].[K1] AS [GroupId],
[GroupBy1].[K2] AS [ItemId],
[GroupBy1].[A2] AS [C1]
FROM ( SELECT
[Extent1].[GroupId] AS [K1],
[Extent1].[ItemId] AS [K2],
COUNT(1) AS [A1],
COUNT(1) AS [A2]
FROM [dbo].[TestTable] AS [Extent1]
WHERE 64025 = [Extent1].[GroupId]
GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
) AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC
This also works, but it is a factor of 2 slower than the SQL I wrote by hand.
I've been fiddling around with the LINQ code for a while, but I haven't managed to produce anything closer to my query.
Execution plan (only the last two items, the first two are identical):
FIRST: |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)
SECOND: |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)
The query that Entity Framework generates and your hand crafted query are semantically the same and will give the same plan.
The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
The snippets of SHOWPLAN_TEXT you have posted are the same plan. The only difference is the aliases. It looks as though your table definition is something like:
CREATE TABLE [dbo].[TestTable]
(
[GroupId] INT,
[ItemId] INT
)
CREATE NONCLUSTERED INDEX IX_Group ON [dbo].[TestTable] ([GroupId], [ItemId])
And you are getting a plan like this:
To all intents and purposes the plans are the same. Your performance testing methodology is probably flawed; maybe your first query brought pages into cache that then benefited the second query, for example.
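If you want to re-test, a fairer harness looks something like this (an illustrative sketch): run each variant several times, throw away the first (cold-cache) run, and compare only the warm timings.
using System;
using System.Diagnostics;

static double AverageWarmMs(Action runQuery, int runs = 5)
{
    runQuery();   // cold run: pays for plan compilation and buffer-cache misses
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++) runQuery();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / runs;
}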
I did the following query:
var list = from book in books
where book.price > 50
select book;
list = list.Take(50);
I would expect the above to generate something like:
SELECT top 50 id, title, price, author
FROM Books
WHERE price > 50
but it generates:
SELECT
[Limit1].[C1] as [C1],
[Limit1].[id] as [Id],
[Limit1].[title] as [title],
[Limit1].[price] as [price],
[Limit1].[author]
FROM (SELECT TOP (50)
[Extent1].[id] as [Id],
[Extent1].[title] as [title],
[Extent1].[price] as [price],
[Extent1].[author] as [author]
FROM Books as [Extent1]
WHERE [Extent1].[price] > 50
) AS [Limit1]
Why does the above LINQ query generate a subquery, and where does the C1 come from?
Disclaimer: I've never used LINQ before...
My guess would be paging support? I guess there is some sort of Take(50, 50) overload that gets 50 records starting at record 50. Take a look at the SQL such a query generates and you will probably find that it uses a similar subquery structure, allowing it to return any 50 rows of a query in approximately the time it takes to return the first 50 rows.
In any case, the nested subquery doesn't add any performance overhead, as it's automagically optimised away during compilation of the execution plan.
You could still make it cleaner like this:
var c = (from co in db.countries
where co.regionID == 5
select co).Take(50);
This will result in:
Table(country).Where(co => (co.regionID = Convert(5))).Take(50)
Equivalent to:
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = 5
EDIT: In response to the comments: it's not necessarily because of the separate Take(); you can still use it like this:
var c = (from co in db.countries
where co.regionID == 5
select co);
var l = c.Take(50).ToList();
And the result would be the same as before:
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = @p0
The fact that you wrote IQueryable = IQueryable.Take(50) is the tricky part here.
The subquery is generated for projection purposes. It makes more sense when you select from multiple tables into a single anonymous object; then the outer query is used to gather the results.
Try what happens with something like this:
from book in books
where book.price > 50
select new
{
Title = book.title,
Chapters = from chapter in book.Chapters
select chapter.Title
}
Isn't it a case of the first query returning the total number of rows while the second extracts the subset of rows based on the call to the .Take() method?
I agree with @Justin Swartsel. There was no error involved, so this is largely an academic matter.
LINQ to SQL endeavors to generate SQL that runs efficiently (which it did in your case), but it makes no effort to generate the conventional SQL that a human would likely write.
The Linq-to-SQL implementers likely used the builder pattern to generate the SQL.
If so, it would be easier to append a substring (or a subquery in this case) than it would be to backtrack and insert a 'TOP x' fragment into the SELECT clause.
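A toy sketch of that idea (purely illustrative, nothing like the actual LINQ to SQL source):
// With an append-only builder, applying Take(n) wraps the SQL generated
// so far; inserting "TOP n" would mean re-parsing and editing the
// SELECT clause that has already been emitted.
static string ApplyTake(string innerSql, int n)
{
    return "SELECT TOP (" + n + ") [Limit1].* FROM (" + innerSql + ") AS [Limit1]";
}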