SQL Server paging large dataset with multiple tbl joins - c#

Currently I am trying to load a large dataset into a gridview, which of course is timing out because the number of rows being returned is so large. Instead of trying to page in the gridview in memory, how might I page on the server to prevent the timeout that occurs from loading the dataset all at once? I have tried using the method described here, but the query still seems to hang. It may also be worth noting that I have examined the query's execution plan and it has not suggested anything to consider. Please see my current implementation below:
with result_set as (
    select distinct
        row_number() over (order by a.date desc) as [row_number],
        a.date, vw.Name, a.accountNum, a.action, z.loc, b.name, a.col
    from tbl1 as a
    inner join tbl2 as b
        on b.id = a.id -- the original read "ch.id", but no alias "ch" exists in this query
    inner join tbl3 as z
        on a.zip = z.zip
    inner join tbl4 as vw
        on a.accountNum = vw.accountNum collate database_default
    where a.col is not null
)
select * from result_set where [row_number] between 1 and 20
Am I the victim of poor indexing on my part? Or is it something else I missed? Please share your thoughts.
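For what it's worth, on SQL Server 2012 and later the same page can be expressed with OFFSET/FETCH instead of a ROW_NUMBER() CTE; whether it is faster depends on your indexes, but it is worth comparing. A sketch against the same (hypothetical) tables, assuming the tbl2 join is on a.id:

```sql
-- Sketch only: same joins as the question, paged with OFFSET/FETCH
-- (SQL Server 2012+). @PageNumber/@PageSize are assumed to come from the caller.
declare @PageNumber int = 1, @PageSize int = 20;

select distinct a.date, vw.Name, a.accountNum, a.action, z.loc, b.name, a.col
from tbl1 as a
inner join tbl2 as b on b.id = a.id
inner join tbl3 as z on a.zip = z.zip
inner join tbl4 as vw on a.accountNum = vw.accountNum collate database_default
where a.col is not null
order by a.date desc
offset (@PageNumber - 1) * @PageSize rows
fetch next @PageSize rows only;
```

Note that either paging style still has to sort the whole filtered result for every page, so an index on tbl1(date desc) that covers the filter column is likely to matter more than the paging syntax.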

Related

postgresql query to get sum and columns from two tables using joins

These are the tables that I have created: Tables
I want to get accounts.account, accounts.type, DATE(transactions.date), transactions.transactionid, transactions.amount, transactions.note from the two tables between '10-11-2021' and '31-12-2021' (whatever the type may be).
I want to get Sum(amount) from the transactions table where type='income' and the date is between '10-11-2021' and '31-12-2021'.
I want to get Sum(amount) from the transactions table where type='expense' and the date is between '10-11-2021' and '31-12-2021'.
But I need all three queries in a single statement (that's what I am struggling with).
My query:
SELECT accounts.account,accounts.type,DATE(transactions.date),transactions.transactionid,transactions.amount,transactions.note
FROM transactions
FULL JOIN accounts ON transactions.accountid=accounts.accountid
WHERE transactions.date BETWEEN '{0}' AND '{1}' ORDER BY transactions.date
UNION
select sum(amount)
FROM transactions
FULL JOIN accounts ON transactions.accountid=accounts.accountid
WHERE accounts.type='income'
I need to add the other two queries to fit the above as well.
Can anyone help me?
Given the poor information about the source tables, the best I could do is the following:
SELECT
    a.account,
    a.type,
    DATE(t.date),
    SUM(CASE WHEN t.type = 'income' THEN t.amount ELSE 0 END) AS sum_of_income,
    SUM(CASE WHEN t.type = 'expense' THEN t.amount ELSE 0 END) AS sum_of_expense
FROM transactions t
LEFT JOIN accounts a
    ON t.accountid = a.accountid
GROUP BY
    a.account,
    a.type,
    DATE(t.date)
Tip: you should almost never write your SQL script as a single line of code.
It would also be more informative to have the table contents, or at least screenshots.
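For completeness, the date range the asker mentioned can be bolted onto the same conditional-aggregation query with a WHERE clause (still a sketch against the assumed schema; in PostgreSQL, ISO-format literals avoid the ambiguity of strings like '10-11-2021'):

```sql
SELECT
    a.account,
    a.type,
    DATE(t.date),
    SUM(CASE WHEN t.type = 'income'  THEN t.amount ELSE 0 END) AS sum_of_income,
    SUM(CASE WHEN t.type = 'expense' THEN t.amount ELSE 0 END) AS sum_of_expense
FROM transactions t
LEFT JOIN accounts a ON t.accountid = a.accountid
WHERE t.date BETWEEN DATE '2021-11-10' AND DATE '2021-12-31'
GROUP BY a.account, a.type, DATE(t.date);
```

If t.date carries a time component, an upper bound of t.date < DATE '2022-01-01' is safer than BETWEEN, which would exclude rows after midnight on the last day.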

Refer to temporary table in Entity Framework query

There is a list in memory of 50,000 Product IDs. I would like to get all these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...), and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and also because the execution plan is bad. (I know this from trying to use a temporary table instead.)
So I used SQLBulkCopy to insert the IDs to a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products, with their suppliers (multiple for every product). Using a custom SQL command there is no way to get back a complex object that I know of. So how can I get the products with their suppliers, using the temporary table?
(If I can somehow refer to the temporary table in LINQ, then it would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks
I was curious to explore the sql generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the sql generated using Contains (sql1)
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the sql generated using Join:
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the sql creates a big select statement using union all to create the equivalent of your temporary table, then it joins to that table. The sql is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK... now it's a giant UNION ALL... anyway, I hope this contribution is useful to someone.
I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as ProductIDs:
this will give you all the performance you're after
it will work for concurrent users
If this filter table is changing a lot then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on this table for longer than necessary.
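A sketch of what that extended filter table and the join could look like (all names here are assumptions, not from the question):

```sql
-- Hypothetical permanent filter table keyed by session, so concurrent
-- users don't collide. SqlBulkCopy inserts the 50,000 (SessionId, ProductId) rows.
CREATE TABLE ProductFilter (
    SessionId uniqueidentifier NOT NULL,
    ProductId int NOT NULL,
    PRIMARY KEY (SessionId, ProductId)
);

-- Fetch only this session's products:
SELECT p.*
FROM Products AS p
INNER JOIN ProductFilter AS f ON f.ProductId = p.ID
WHERE f.SessionId = @SessionId;

-- Clean up afterwards (in its own short transaction, per the advice above):
DELETE FROM ProductFilter WHERE SessionId = @SessionId;
```

Because ProductFilter is a real mapped table, it can also appear in LINQ queries, which is what the question was after.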

Choosing better query out of 2 queries, both returning same results

I have two different SQL queries, one written by me and one automatically generated by C# when used with LINQ; both give the same results.
I am not sure which one to choose. I am looking for:
What's the best way to choose one query out of many, when all return the same result (the most optimized query)?
Out of my queries (written below), which one should I choose?
Hand Written
select * from People P
inner join SubscriptionItemXes S
on
P.Id=S.Person_Id
inner join FoodTagXFoods T1
on T1.FoodTagX_Id = S.Tag2
inner join FoodTagXFoods T2
on T2.FoodTagX_Id = S.Tag1
inner join Foods F
on
F.Id= T1.Food_Id and F.Id= T2.Food_Id
where p.id='1'
Automatically Generated by LINQ
SELECT
[Distinct1].[Id] AS [Id],
[Distinct1].[Item] AS [Item]
FROM ( SELECT DISTINCT
[Extent2].[Id] AS [Id],
[Extent2].[Item] AS [Item]
FROM [dbo].[People] AS [Extent1]
CROSS JOIN [dbo].[Foods] AS [Extent2]
INNER JOIN [dbo].[FoodTagXFoods] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[Food_Id]
INNER JOIN [dbo].[SubscriptionItemXes] AS [Extent4]
ON [Extent1].[Id] = [Extent4].[Person_Id]
WHERE (N'rusi' = [Extent1].[Name]) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent5]
WHERE ([Extent2].[Id] = [Extent5].[Food_Id])
AND ([Extent5].[FoodTagX_Id] = [Extent4].[Tag1])
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent6]
WHERE ([Extent2].[Id] = [Extent6].[Food_Id])
AND ([Extent6].[FoodTagX_Id] = [Extent4].[Tag2])
))
) AS [Distinct1]
Execution Plan Results
Hand Written: Query Cost (relative to batch):33%
Linq Generated: Query Cost (relative to batch):67%
I have found that two different queries, one hand-written and one generated by Linq, might look wildly different; but when you analyse the query plans in SSMS, you find that they are actually almost identical.
You need to actually run these queries in SSMS with Display Actual Execution Plan switched on, and analyse the different plans. It's the only way to correctly analyse the two and find out which is better.
In general, Linq is actually very good at generating efficient queries, even if the actual SQL itself is pretty ugly (in some cases, it's the kind of SQL that a human would write if they had the time!). Of course, that said, it can also generate some pigs!
Additionally, asking SO to help with performance of a query over so many tables is fraught with problems for us, since it will be governed so much by your indexes :)
But they aren't quite returning the same thing... The first query grabs everything (SELECT *) while LINQ extracts only what you really want (id and item). Trivial, you may say, but streaming back lots of data that's never used is a waste of bandwidth and will make your application appear sluggish. Additionally, the LINQ query seems to be doing a lot more, which may or may not be the correct solution, especially as data is populated into FoodTagXFoods.
As for which performs better, I couldn't tell you without something like the actual query plans and/or results of statistics io from both queries. My money is on hand-written but maybe because I like my hands.
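To get the "statistics io" figures mentioned above, run both candidates in SSMS with the relevant session options enabled:

```sql
SET STATISTICS IO ON;   -- reports logical/physical reads per table
SET STATISTICS TIME ON; -- reports parse/compile and CPU/elapsed time

-- ... execute each candidate query here and compare the figures ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Logical reads are usually the most stable number to compare, since elapsed time fluctuates with caching and load.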
Examining the SQL Server query execution plan is the best way to choose the better query.
I hope the tutorials below help.
SQL Tuning Tutorial - Understanding a Database Execution Plan (1)
Examining Query Execution Plans
SQL Server Query Execution Plan Analysis
SQL SERVER – Actual Execution Plan vs. Estimated Execution Plan

Limiting the NULL subqueries from result set

The setup is simple: a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their top chronological child value (updated, accessed, etc.). A query would look like this (for example):
var masters = from m in Master
let mc = m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
A potential problem occurs if a master has no children: the subquery will yield NULL, and the conversion from NULL to DateTime will fail with
InvalidOperationException: The null value cannot be assigned to a member with type System.DateTime which is a non-nullable value type.
Solution to exception is to cast mc to DateTime?, but I need masters that have some children and just bypass few which have no children yet.
Solution #1: add where m.Childs.Count() > 0.
This one kicked me hard and unexpectedly: the generated SQL was just plain awful (as was its execution plan) and ran almost twice as slow:
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ((
SELECT COUNT(*)
FROM [Child] AS [t3]
WHERE [t3].[masterId] = [t2].[id]
)) > @p0
Solution #2, with where mc != null, is even worse: it gives a shorter script, but it executes for far longer than the one above (it takes roughly the same time as the two above together):
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ([t2].[value]) IS NOT NULL
All in all, a lot of wasted SQL time just to eliminate a few rows out of tens of thousands or more. This led me to Solution #3: get everything and eliminate the empty ones client side, but to do that I had to kiss IQueryable goodbye:
var masters = from m in Master
let mc = (DateTime?)m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
var mastersNotNull = masters.AsEnumerable().Where(m => m.mc != null);
and this works; however, I am trying to figure out whether there are any downsides to this. Will this behave in any way fundamentally differently than the full monty IQueryable? I imagine this also means I cannot (or should not) use masters as a factor in a different IQueryable? Any input/observation/alternative is welcome.
Just based on this requirement:
a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their top chronological child value
SELECT [m].[Name] AS [MasterName]
     , MAX([c].[CreatedOn]) AS [cm] -- [CreatedOn] per the question; [value] was only a subquery alias
FROM [Master] AS [m]
LEFT OUTER JOIN [Child] AS [c] ON [m].[id] = [c].[masterId] -- join on the child's FK, not c.id
GROUP BY [m].[Name]
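Note that a left outer join keeps child-less masters with a NULL [cm]; since the asker wants those rows excluded anyway, an inner join does both jobs at once (a sketch using the column names from the generated SQL earlier in the question):

```sql
-- Masters with no Child rows simply drop out of the inner join,
-- so no NULL handling and no extra WHERE/HAVING is needed.
SELECT [m].[Name] AS [MasterName],
       MAX([c].[CreatedOn]) AS [cm]
FROM [Master] AS [m]
INNER JOIN [Child] AS [c] ON [m].[id] = [c].[masterId]
GROUP BY [m].[Name];
```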

Why did the following linq to sql query generate a subquery?

I did the following query:
var list = from book in books
where book.price > 50
select book;
list = list.Take(50);
I would expect the above to generate something like:
SELECT top 50 id, title, price, author
FROM Books
WHERE price > 50
but it generates:
SELECT
[Limit1].[C1] as [C1],
[Limit1].[id] as [Id],
[Limit1].[title] as [title],
[Limit1].[price] as [price],
[Limit1].[author] as [author]
FROM (SELECT TOP (50)
[Extent1].[id] as [Id],
[Extent1].[title] as [title],
[Extent1].[price] as [price],
[Extent1].[author] as [author]
FROM Books as [Extent1]
WHERE [Extent1].[price] > 50
) AS [Limit1]
Why does the above linq query generate a subquery and where does the C1 come from?
Disclaimer: I've never used LINQ before...
My guess would be paging support? I guess you have some sort of Take(50, 50) method that gets 50 records, starting at record 50. Take a look at the SQL that query generates and you will probably find that it uses a similar sub query structure to allow it to return any 50 rows in a query in approximately the amount of time that it returns the first 50 rows.
In any case, the nested sub query doesn't add any performance overhead as it's automagically optimised away during compilation of the execution plan.
You could still make it cleaner like this:
var c = (from co in db.countries
where co.regionID == 5
select co).Take(50);
This will result in:
Table(country).Where(co => (co.regionID = Convert(5))).Take(50)
Equivalent to:
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = 5
EDIT: per the comments, it's not necessarily so, because with a separate Take() you can still use it like this:
var c = (from co in db.countries
where co.regionID == 5
select co);
var l = c.Take(50).ToList();
And the Result would be the same as before.
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = @p0
The fact that you wrote IQueryable = IQueryable.Take(50) is the tricky part here.
The subquery is generated for projection purposes, it makes more sense when you select from multiple tables into a single anonymous object, then the outer query is used to gather the results.
Try what happens with something like this:
from book in books
where price > 50
select new
{
Title = book.title,
Chapters = from chapter in book.Chapters
select chapter.Title
}
Isn't it a case of the first query returning the total number of rows while the second extracts the subset of rows based on the call to the .Take() method?
I agree with @Justin Swartsel. There was no error involved, so this is largely an academic matter.
Linq-to-SQL endeavors to generate SQL that runs efficiently (which it did in your case).
But it does not make any effort to generate conventional SQL that a human would likely create.
The Linq-to-SQL implementers likely used the builder pattern to generate the SQL.
If so, it would be easier to append a substring (or a subquery in this case) than it would be to backtrack and insert a 'TOP x' fragment into the SELECT clause.
