A more refined version of this LINQ to SQL query - c#

I'm trying to convert the following T-SQL query into a LINQ to SQL query with near-equivalent performance:
SELECT
j1.JOB,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'PREP' THEN 'X' ELSE ' ' END) AS prep,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'PRINT' THEN 'X' ELSE ' ' END) AS press,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'BIND' THEN 'X' ELSE ' ' END) AS bind,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'SHIP' THEN 'X' ELSE ' ' END) AS ship
from
job j1
left outer join
(
select
j.job,
l.statcategory,
cnt=count(*)
from
job j
join
jobloc jl
join location l
on
l.code = jl.location and
l.site = jl.site
on j.job = jl.job
WHERE
j.stat = 'O'
group by
j.job,l.statcategory
) logs
on
j1.job = logs.job
WHERE
j1.stat = 'O'
group by
j1.job
This query currently runs in just under 0.2 seconds on MS SQL Server. The following LINQ query is what I've come up with; it returns exactly the same records but runs nearly 30x slower:
from a0 in Jobs
join a1 in
(
from a0 in Jobs
join a1 in JobLocs on a0.Content equals a1.Job
join a2 in Locations on new {Code = a1.Location, a1.Site} equals new {a2.Code, a2.Site}
where a0.Stat == 'O'
select new {a0.Content, a2.StatCategory}
) on a0.Content equals a1.Content into a1
from a2 in a1.DefaultIfEmpty()
where a0.Stat == 'O'
group a2 by a0.Content into a0
orderby a0.Key
select new
{
Job = a0.Key,
Prep = (bool?)a0.Max(a1 => a1.StatCategory == "PREP" ? true : false),
Print = (bool?)a0.Max(a1 => a1.StatCategory == "PRINT" ? true : false),
BIND = (bool?)a0.Max(a1 => a1.StatCategory == "BIND" ? true : false),
SHIP = (bool?)a0.Max(a1 => a1.StatCategory == "SHIP" ? true : false),
}
Here is the generated SQL from the LINQ query (using LINQPad):
-- Region Parameters
DECLARE @p0 Int = 79
DECLARE @p1 Int = 79
DECLARE @p2 VarChar(1000) = 'PREP'
DECLARE @p3 VarChar(1000) = 'PRINT'
DECLARE @p4 VarChar(1000) = 'BIND'
DECLARE @p5 VarChar(1000) = 'SHIP'
-- EndRegion
SELECT [t4].[Job], [t4].[value] AS [Prep], [t4].[value2] AS [Print], [t4].[value3] AS [BIND], [t4].[value4] AS [SHIP]
FROM (
SELECT MAX(
(CASE
WHEN [t3].[StatCategory] = @p2 THEN 1
WHEN NOT ([t3].[StatCategory] = @p2) THEN 0
ELSE NULL
END)) AS [value], MAX(
(CASE
WHEN [t3].[StatCategory] = @p3 THEN 1
WHEN NOT ([t3].[StatCategory] = @p3) THEN 0
ELSE NULL
END)) AS [value2], MAX(
(CASE
WHEN [t3].[StatCategory] = @p4 THEN 1
WHEN NOT ([t3].[StatCategory] = @p4) THEN 0
ELSE NULL
END)) AS [value3], MAX(
(CASE
WHEN [t3].[StatCategory] = @p5 THEN 1
WHEN NOT ([t3].[StatCategory] = @p5) THEN 0
ELSE NULL
END)) AS [value4], [t0].[Job]
FROM [Job] AS [t0]
LEFT OUTER JOIN ([Job] AS [t1]
INNER JOIN [JobLoc] AS [t2] ON [t1].[Job] = [t2].[Job]
INNER JOIN [Location] AS [t3] ON ([t2].[Location] = [t3].[Code]) AND ([t2].[Site] = [t3].[Site])) ON ([t0].[Job] = [t1].[Job]) AND (UNICODE([t1].[Stat]) = @p0)
WHERE UNICODE([t0].[Stat]) = @p1
GROUP BY [t0].[Job]
) AS [t4]
ORDER BY [t4].[Job]
One thing that stands out is that the generated SQL from the LINQ query runs the aggregate for each column returned in a subquery, whereas in the original it is part of the outer SELECT. I can imagine part of the performance decrease is there.
I'm (tentatively) willing to accept that there is no better way to write this, and just use the DataContext.ExecuteQuery() method in the LINQ API (and just run and shape the first SQL statement directly). However, I'm trying to not include embedded SQL as much as possible in a project that I'm currently working on, so if it can be made to be near the performance of the original query, that'd be ideal. I've been hacking away at this for some time (partly as an academic exercise, and also to actually use this or similar queries like it), and this is the best I've come up with (I did not write the original query BTW--it was part of an older project that is being migrated to a newer one).
Thanks for any assistance.

As per our discussion in the comments:
The issue is the UNICODE conversion that LINQ to SQL adds to the comparison on Stat.
The database cannot use the index because of the (unnecessary) conversion.
You can use .Equals instead of ==, which avoids the UNICODE call, or change the column type to varchar(1) in the database.
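A likely reason for the UNICODE call is visible without a database: when two char values are compared with ==, the C# compiler widens them to int, and that conversion ends up in the expression tree the query provider translates. The sketch below only inspects the expression trees; the Job class and the CharComparison holder are hypothetical stand-ins, and it does not show what LINQ to SQL itself emits.

```csharp
using System;
using System.Linq.Expressions;

// Inspect how the C# compiler encodes each comparison in an expression tree.
var viaOperator = (BinaryExpression)CharComparison.ViaOperator.Body;
Console.WriteLine(viaOperator.Left.NodeType);                      // Convert: the char is widened to int
Console.WriteLine(((ConstantExpression)viaOperator.Right).Value);  // 79, the code point of 'O'
Console.WriteLine(CharComparison.ViaEquals.Body.NodeType);         // Call: char.Equals(char), no widening

// Hypothetical stand-in for the mapped Job entity; only the char column matters.
public class Job
{
    public char Stat { get; set; }
}

public static class CharComparison
{
    // == on two chars is compiled as an int comparison.
    public static readonly Expression<Func<Job, bool>> ViaOperator = j => j.Stat == 'O';

    // Equals keeps both operands as char.
    public static readonly Expression<Func<Job, bool>> ViaEquals = j => j.Stat.Equals('O');
}
```

The constant 79 matches the @p0 and @p1 parameters in the generated SQL above, which is consistent with the provider translating the int conversion as UNICODE(...).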

Related

How to compare a delimited string to a column value in SQL without considering sequence

Suppose a SQL column [fruits] holds the value mango, apple, cherry... and I have a list in ASP.NET C# containing cherry, mango, apple... I want to write a SQL query that matches the two regardless of order.
I suggest that you look at the fabulous answers in this SO question
How to split a comma-separated value to columns
That said, your solution should be to pass each column that contains words to this function and then store the results in a table along with a column ID.
So "mango,apple,cherry" becomes a table with values
ColID Value
_______________
1 mango
1 apple
1 cherry
Now order the tables by ColID ASC, Value ASC and compare both the tables.
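The same split-then-compare idea can be sketched on the C# side, where the list from the question already lives. This is only an illustration under assumptions: FruitSets and SameItems are made-up names, items are trimmed and compared case-insensitively, and set semantics ignore duplicates, which may or may not suit your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(FruitSets.SameItems("mango, apple, cherry", new[] { "cherry", "mango", "apple" }));
Console.WriteLine(FruitSets.SameItems("mango, apple, cherry", new[] { "cherry", "mango", "peach" }));

public static class FruitSets
{
    // True when the delimited column value and the list hold the same items, in any order.
    public static bool SameItems(string delimited, IEnumerable<string> items)
    {
        var fromColumn = new HashSet<string>(
            delimited.Split(',').Select(s => s.Trim()).Where(s => s.Length > 0),
            StringComparer.OrdinalIgnoreCase);

        // SetEquals uses the set's comparer and ignores order and duplicates.
        return fromColumn.SetEquals(items.Select(s => s.Trim()));
    }
}
```

This compares one value at a time in memory; for filtering many rows inside the database, the SQL approaches in the answers here are the better fit.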
This should do it.
DECLARE @str NVARCHAR(MAX)
, @Delim NVARCHAR(255)
SELECT @str = 'cherry,mango,peach,apple'
SELECT @Delim = ','
CREATE TABLE #Fruits ( Fruit VARCHAR(255) )
INSERT INTO #Fruits
( Fruit )
VALUES ( 'cherry' ),
( 'Mango' ),
( 'Apple' ) ,
( 'Banana' )
;WITH lv0 AS (SELECT 0 g UNION ALL SELECT 0)
,lv1 AS (SELECT 0 g FROM lv0 a CROSS JOIN lv0 b) -- 4
,lv2 AS (SELECT 0 g FROM lv1 a CROSS JOIN lv1 b) -- 16
,lv3 AS (SELECT 0 g FROM lv2 a CROSS JOIN lv2 b) -- 256
,lv4 AS (SELECT 0 g FROM lv3 a CROSS JOIN lv3 b) -- 65,536
,lv5 AS (SELECT 0 g FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296
,Tally_CTE (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5)
SELECT SUBSTRING(@str, N, CHARINDEX(@Delim, @str + @Delim, N) - N) AS Item
INTO #StrTable
FROM Tally_CTE
WHERE N BETWEEN 1 AND DATALENGTH(@str) + DATALENGTH(@Delim)
AND SUBSTRING(@Delim + @str, N, LEN(@Delim)) = @Delim;
--#############################################################################
-- in both
--#############################################################################
SELECT *
FROM #Fruits F
JOIN #StrTable ST ON F.Fruit = ST.Item
--#############################################################################
-- in table but not string
--#############################################################################
SELECT *
FROM #Fruits F
LEFT JOIN #StrTable ST ON ST.Item = F.Fruit
WHERE ST.Item IS NULL
--#############################################################################
-- in string but not table
--#############################################################################
SELECT *
FROM #StrTable ST
LEFT JOIN #Fruits F ON ST.Item = F.Fruit
WHERE F.Fruit IS NULL
GO
DROP TABLE #Fruits
DROP TABLE #StrTable
You can use string_split function to do this. I tested this on SQL Server 2017 ctp 2.0 but it should work on 2016 too.
drop table if exists dbo.Fruits;
create table dbo.Fruits (
Fruits varchar(100)
);
insert into dbo.Fruits (Fruits)
values ('cherry,mango,apple'), ('peanut,cherry,mango'),
('apple,cherry,mango')
declare @str varchar(100) = 'apple,mango,cherry';
select
tt.Fruits
, COUNT(tt.value) as Value01
, COUNT(app.value) as Value02
from (
select
*
from dbo.Fruits f
outer apply string_split (f.Fruits, ',') t
) tt
left join string_split (@str, ',') app on tt.value = app.value
group by tt.Fruits

Converting SQL Statement with Multiple Left Outer Joins and Various Clauses to LINQ

I've been trying to create associations and navigation properties to simplify my LINQ query a little; however, because of the complicated, proprietary database I'm working with it's been really difficult.
For most of this process, I've been using the database first approach on EF5. I couldn't create associations and navigation properties using database first, so I tried to create foreign keys on SQL Server, but was unsuccessful in that.
For my next step, I tried to create the tables, associations, and navigation properties using the code first approach; however, again, because of the necessities of the database and server I'm working with, I was unsuccessful in simplifying my query.
My last resort is just to do what I was originally trying to do in this question. That is to get this LINQ query:
from item in db2.OrderFormDump
join icp in db2.IcPricP on item.NODASHITEMNO equals icp.ITEMNO into icpGroup
from iG in icpGroup.DefaultIfEmpty()
join itemInfo in db2.WebItemInfo on item.ITEMNO equals itemInfo.ITEMNO into itemInfoGroup
from iIG in itemInfoGroup.DefaultIfEmpty()
join weboeordh in db2.WebOEOrdH on "brian" equals weboeordh.USER into weboeordhGroup
from wOEODHG in weboeordhGroup.DefaultIfEmpty()
join weboeordd in db2.WebOEOrdD on new { itemno = item.NODASHITEMNO, orduniq = wOEODHG.ORDUNIQ } equals new { itemno = weboeordd.ITEMNO, orduniq = weboeordd.ORDUNIQ } into weboeorddGroup
from wOEODG in weboeorddGroup.DefaultIfEmpty()
join weboeordsubmit in db2.WebOEOrdSubmit on wOEODG.ORDUNIQ equals weboeordsubmit.ORDUNIQ into weboeordsubmitGroup
from wOEOSG in weboeordsubmitGroup.DefaultIfEmpty()
join webloginaccess in db2.WebLoginAccess on "brian" equals webloginaccess.USER into webloginaccessGroup
from wLAG in webloginaccessGroup.DefaultIfEmpty()
join arcus in db2.Arcus on wLAG.CUSTID equals arcus.IDCUST into arcusGroup
from aG in arcusGroup
where (item.ALLOWINBC == "Yes" && item.ALLOWINAB == "Yes")
&& (item.BASEDESCRIPTION.Contains("dude") || item.DESCRIPTION.Contains("dude") || item.CATEGORY.Contains("dude") || item.FOODACCSPEC.Contains("dude") || item.ITEMBRAND.Contains("dude") || item.ITEMGROUP.Contains("dude") || item.ITEMNO.Contains("dude") || item.ITEMSUBTYPE.Contains("dude") || item.ITEMTYPE.Contains("dude") || iIG.INFO.Contains("dude") || item.UPC.Contains("dude") || item.UPC.Substring(2, 10).Contains("dude"))
&& (iG.CURRENCY == "CDN" && iG.DPRICETYPE == 1)
&& wOEODG.ORDUNIQ != wOEODHG.ORDUNIQ
&& iG.PRICELIST == aG.PRICLIST
orderby item.BASEDESCRIPTION
select new { item.ITEMNO, item.BASEDESCRIPTION, iIG.INFO, item.UPC, iG.UNITPRICE, item.CASEQTY, wOEODG.QTY } into x
group x by new { x.ITEMNO, x.BASEDESCRIPTION, x.INFO, x.UPC, x.UNITPRICE, x.CASEQTY, x.QTY } into items
select items;
To get the same results as this SQL query:
DECLARE @search varchar(50) = 'dude'
SELECT orderformdump.itemno,basedescription,info,upc,CAST(UNITPRICE AS DECIMAL(18,2)),caseqty, sum(qty) AS userquantity
FROM PPPLTD.[dbo].[ORDERFORMDUMP]
LEFT JOIN PPPLTD.dbo.ICPRICP ON replace(PPPLTD.[dbo].[ORDERFORMDUMP].[ITEMNO],'-','') = ICPRICP.ITEMNO
LEFT JOIN PPPLTD.dbo.WEBITEMINFO ON ORDERFORMDUMP.ITEMNO = WEBITEMINFO.ITEMNO
LEFT JOIN pppltd.dbo.weboeordh ON [user] = 'brian'
LEFT JOIN pppltd.dbo.weboeordd ON pppltd.dbo.WEBOEORDD.ITEMNO = REPLACE(pppltd.dbo.ORDERFORMDUMP.ITEMNO,'-','') and weboeordd.ORDUNIQ = weboeordh.orduniq
Left JOIN pppltd.dbo.weboeordsubmit ON weboeordsubmit.orduniq = weboeordd.ORDUNIQ and weboeordd.ORDUNIQ != weboeordsubmit.orduniq
LEFT JOIN PPPLTD.dbo.WEBLOGINACCESS ON WEBLOGINACCESS.[USER] = 'brian'
LEFT JOIN PPPLTD.dbo.ARCUS ON ARCUS.IDCUST = WEBLOGINACCESS.CUSTID
where (allowinbc = 'Yes' or allowinab = 'Yes')
AND [PRICELIST] = ARCUS.PRICLIST
and [CURRENCY] = 'CDN' and DPRICETYPE = 1
and (itemgroup like '%' + @search + '%' or itemtype like '%' + @search + '%' or itembrand like '%' + @search + '%'
or subcat like '%' + @search + '%' or orderformdump.description like '%' + @search + '%' or basedescription like '%'+ @search + '%'
or orderformdump.ITEMNO like '%'+@search+'%' or UPC like '%'+@search+'%' or (select top 1 1 from pppltd.dbo.ICITEMO where OPTFIELD like 'UPC%' and VALUE like '%'+@search+'%'
and ITEMNO = pppltd.dbo.ORDERFORMDUMP.itemno) is not null)
group by ORDERFORMDUMP.ITEMNO,BASEDESCRIPTION,info,UPC,CAST(UNITPRICE AS DECIMAL(18,2)),caseqty
order by basedescription
When I execute the LINQ on LINQPad, it produces this SQL:
DECLARE @p0 NVarChar(1000) = '-'
DECLARE @p1 NVarChar(1000) = ''
DECLARE @p2 VarChar(1000) = 'brian'
DECLARE @p3 NVarChar(1000) = '-'
DECLARE @p4 NVarChar(1000) = ''
DECLARE @p5 VarChar(1000) = 'brian'
DECLARE @p6 VarChar(1000) = 'Yes'
DECLARE @p7 VarChar(1000) = 'Yes'
DECLARE @p8 VarChar(1000) = '%dude%'
DECLARE @p9 VarChar(1000) = '%dude%'
DECLARE @p10 VarChar(1000) = '%dude%'
DECLARE @p11 VarChar(1000) = '%dude%'
DECLARE @p12 VarChar(1000) = '%dude%'
DECLARE @p13 VarChar(1000) = '%dude%'
DECLARE @p14 VarChar(1000) = '%dude%'
DECLARE @p15 VarChar(1000) = '%dude%'
DECLARE @p16 VarChar(1000) = '%dude%'
DECLARE @p17 VarChar(1000) = '%dude%'
DECLARE @p18 VarChar(1000) = '%dude%'
DECLARE @p19 Int = 2
DECLARE @p20 Int = 10
DECLARE @p21 VarChar(1000) = '%dude%'
DECLARE @p22 VarChar(1000) = 'CDN'
DECLARE @p23 Int = 1
-- EndRegion
SELECT [t10].[ITEMNO], [t10].[BASEDESCRIPTION], [t10].[value] AS [INFO], [t10].[UPC], [t10].[value2] AS [UNITPRICE], [t10].[CASEQTY], [t10].[value3] AS [QTY]
FROM (
SELECT [t9].[ITEMNO], [t9].[BASEDESCRIPTION], [t9].[value], [t9].[UPC], [t9].[value2], [t9].[CASEQTY], [t9].[value3]
FROM (
SELECT [t0].[ITEMNO], [t0].[BASEDESCRIPTION], [t2].[INFO] AS [value], [t0].[UPC], [t1].[UNITPRICE] AS [value2], [t0].[CASEQTY], [t5].[QTY] AS [value3], [t0].[ALLOWINBC], [t0].[ALLOWINAB], [t0].[DESCRIPTION], [t0].[CATEGORY], [t0].[FOODACCSPEC], [t0].[ITEMBRAND], [t0].[ITEMGROUP], [t0].[ITEMSUBTYPE], [t0].[ITEMTYPE], [t1].[CURRENCY], [t1].[DPRICETYPE], [t5].[ORDUNIQ], [t4].[ORDUNIQ] AS [ORDUNIQ2], [t1].[PRICELIST], [t8].[PRICLIST]
FROM [ORDERFORMDUMP] AS [t0]
LEFT OUTER JOIN [ICPRICP] AS [t1] ON REPLACE([t0].[ITEMNO], @p0, @p1) = [t1].[ITEMNO]
LEFT OUTER JOIN [WEBITEMINFO] AS [t2] ON [t0].[ITEMNO] = [t2].[ITEMNO]
LEFT OUTER JOIN (
SELECT [t3].[ORDUNIQ]
FROM [WEBOEORDH] AS [t3]
WHERE @p2 = [t3].[USER]
) AS [t4] ON 1=1
LEFT OUTER JOIN [WEBOEORDD] AS [t5] ON (REPLACE([t0].[ITEMNO], @p3, @p4) = [t5].[ITEMNO]) AND ([t4].[ORDUNIQ] = [t5].[ORDUNIQ])
LEFT OUTER JOIN (
SELECT [t6].[CUSTID]
FROM [WEBLOGINACCESS] AS [t6]
WHERE @p5 = [t6].[USER]
) AS [t7] ON 1=1
LEFT OUTER JOIN [ARCUS] AS [t8] ON [t7].[CUSTID] = [t8].[IDCUST]
) AS [t9]
WHERE ([t9].[ALLOWINBC] = @p6) AND ([t9].[ALLOWINAB] = @p7) AND (([t9].[BASEDESCRIPTION] LIKE @p8) OR ([t9].[DESCRIPTION] LIKE @p9) OR ([t9].[CATEGORY] LIKE @p10) OR ([t9].[FOODACCSPEC] LIKE @p11) OR ([t9].[ITEMBRAND] LIKE @p12) OR ([t9].[ITEMGROUP] LIKE @p13) OR ([t9].[ITEMNO] LIKE @p14) OR ([t9].[ITEMSUBTYPE] LIKE @p15) OR ([t9].[ITEMTYPE] LIKE @p16) OR ([t9].[value] LIKE @p17) OR ([t9].[UPC] LIKE @p18) OR (SUBSTRING([t9].[UPC], @p19 + 1, @p20) LIKE @p21)) AND ([t9].[CURRENCY] = @p22) AND ([t9].[DPRICETYPE] = @p23) AND ([t9].[ORDUNIQ] <> [t9].[ORDUNIQ2]) AND ([t9].[PRICELIST] = [t9].[PRICLIST])
GROUP BY [t9].[ITEMNO], [t9].[BASEDESCRIPTION], [t9].[value], [t9].[UPC], [t9].[value2], [t9].[CASEQTY], [t9].[value3]
) AS [t10]
ORDER BY [t10].[BASEDESCRIPTION]
UPDATE
As per HBomb's answer, I decided to create a stored procedure with parameters instead of doing multiple joins:
CREATE PROCEDURE PRODUCT_PROCEDURE
@USERID VARCHAR(MAX)
AS
BEGIN
SELECT distinct datawarehouse.dbo.orderformdump.itemno, basedescription,info,upc,CAST((SELECT [UNITPRICE] FROM PPPLTD.dbo.[ICPRICP] WHERE [ITEMNO] = replace([DataWarehouse].[dbo].[ORDERFORMDUMP].[ITEMNO],'-','') AND [PRICELIST] = (select top 1 priclist from PPPLTD.dbo.ARCUS where IDCUST = (select top 1 CUSTID from PPPLTD.dbo.WEBLOGINACCESS where [USER] = @USERID)) and [CURRENCY] = 'CDN' and DPRICETYPE = 1) AS DECIMAL(18,2))as price,caseqty, qty AS userquantity FROM [DataWarehouse].[dbo].[ORDERFORMDUMP] LEFT JOIN pppltd.dbo.weboeordd ON pppltd.dbo.WEBOEORDD.ITEMNO = REPLACE(datawarehouse.dbo.ORDERFORMDUMP.ITEMNO,'-','') and orduniq not in (select orduniq from pppltd.dbo.weboeordsubmit) and WEBOEORDD.ORDUNIQ in (select orduniq from pppltd.dbo.weboeordh where [user] = @USERID) LEFT JOIN DATAWAREHOUSE.dbo.webiteminfo on webiteminfo.itemno = orderformdump.itemno where (allowinbc = 'Yes' or allowinab = 'Yes') order by BASEDESCRIPTION
END
Then I used Entity Framework's Database First approach to add my stored procedure and it has created a new DbContext with a method that sets the 'USERID' parameter in my stored procedure:
public partial class DataWarehouseEntities : DbContext
{
public DataWarehouseEntities()
: base("name=DataWarehouseEntities")
{
}
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
throw new UnintentionalCodeFirstException();
}
public virtual ObjectResult<PRODUCT_PROCEDURE_Result> PRODUCT_PROCEDURE(string USERID)
{
var USERIDParameter = USERID != null ?
new ObjectParameter("USERID", USERID) :
new ObjectParameter("USERID", typeof(string));
return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction<PRODUCT_PROCEDURE_Result>("PRODUCT_PROCEDURE", USERIDParameter);
}
}
I've also tried:
var USERIDParameter = USERID != null ? new SqlParameter("USERID", USERID) : new SqlParameter("USERID", typeof(string));
return ((IObjectContextAdapter)this).ObjectContext.ExecuteStoreQuery<PRODUCT_PROCEDURE_Result>("PRODUCT_PROCEDURE @USERID", USERIDParameter);
and finally, I tried running a much more simplified LINQ query on the results of my stored procedure:
var products = db2.PRODUCT_PROCEDURE(username).Where
(item => item.basedescription.Contains(searchword)
|| item.info.Contains(searchword)
|| item.itemno.Contains(searchword)
|| item.itemno.Contains(searchword.Replace("-", ""))
|| item.upc.Contains(searchword));
However, now I'm getting a NullReferenceException because the query isn't returning any results.
UPDATE #2
Executing the stored procedure is not causing the NullReferenceException; the problem is the LINQ query.
I found out that when I call var products = db2.PRODUCT_PROCEDURE(username).ToList() on its own, it returns results, but as soon as I add a where clause to it, it returns null.
SOLUTION
With HBomb's help, I solved this issue. First of all, instead of writing multiple joins in LINQ, or creating associations and navigation properties, it's much easier to create a view or a stored procedure in your database and then write a simple LINQ query over its results (an example of how to do that is above).
I found out that I was getting my NullReferenceException because some values in the database for my info property were null. All I had to do to fix that was modify the stored procedure to change the info column to isnull(info,'') as info.
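If changing the stored procedure is not an option, the same null problem can be handled on the C# side. The sketch below is a minimal in-memory illustration, assuming the rows have already been materialized; ProductRow and ProductSearch are hypothetical names that mirror only two of the result columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<ProductRow>
{
    new ProductRow { basedescription = "Dude Ranch Dip", info = null },
    new ProductRow { basedescription = "Plain Chips", info = "for the dude" },
    new ProductRow { basedescription = "Salsa", info = null },
};
Console.WriteLine(ProductSearch.Filter(rows, "dude").Count());  // 2

// Hypothetical stand-in for the generated PRODUCT_PROCEDURE_Result type.
public class ProductRow
{
    public string basedescription { get; set; }
    public string info { get; set; }
}

public static class ProductSearch
{
    public static IEnumerable<ProductRow> Filter(IEnumerable<ProductRow> rows, string word) =>
        rows.Where(r => Has(r.basedescription, word) || Has(r.info, word));

    // Treat a null column as an empty string so the search never dereferences null.
    private static bool Has(string field, string word) =>
        (field ?? "").IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
}
```

Calling item.info.Contains(...) directly on the row with a null info is what throws; coalescing to an empty string has the same effect as isnull(info,'') in the procedure.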
Lastly, just for better search results, I changed my query:
var searchWords = searchword.ToLower().Split(' ');
var products = db2.PRODUCT_PROCEDURE(username).ToList()
.Where
(item => item.basedescription.ToLower().Contains(searchWords[0])
|| item.info.ToLower().Contains(searchWords[0])
|| item.itemno.Contains(searchword)
|| item.itemno.Contains(searchword.Replace("-", ""))
|| item.upc.Contains(searchword)
|| (item.price.ToString() == searchword
&& item.price.ToString() != null));
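if (searchWords.Length > 1)
{
// Require every additional word as well; word 0 is already covered above.
for (int x = 1; x < searchWords.Length; x++)
{
// Copy into a local so each deferred Where clause keeps its own word.
var word = searchWords[x];
products = products.Where(i => i.basedescription.ToLower().Contains(word) || i.info.ToLower().Contains(word));
}
}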
Thank you.
If you wrap the SQL in a stored procedure, you can just use EF to call the procedure. Here's one way:
var searchParameter = new SqlParameter("@search", mySearchValue);
this.Database.SqlQuery<YourEntityTypeForReturnVal>("MyProcedureName @search", searchParameter);
You can then create a parameterized stored procedure as a wrapper for your SQL:
CREATE PROCEDURE MyProcedureName
@search varchar(50)
AS
BEGIN
-- Copy and paste your existing Sql Query here, minus the variable declaration
END
Sometimes the best (and hardest) thing we can do as developers is to take a step back and ask "should I be doing it this way" instead of asking "how do I do it this way."
I'm arriving after the battle, but could this be a job for QueryFirst? You can paste your SQL into a query template as is, but it stays in your application (easier source-control, versioning, maintenance). And the generated classes make it directly usable from your application code.

Entity Framework SUM CASE not optimized

I'm trying to write a simple SQL query in LINQ, and no matter how hard I try, I always get a complex query.
Here is the SQL I am trying to achieve (this is not what I'm getting):
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
Here is the code:
from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
ID = clearingAccount.ID,
Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}
The generated query reads from the respective table once for each sum, so four times in total. I'm not sure whether it's even possible to optimize this.
EDIT: Here is the generated query:
SELECT
[Project5].[ID] AS [ID],
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4]
FROM ( SELECT
[Project4].[ID] AS [ID],
[Project4].[C1] AS [C1],
[Project4].[C2] AS [C2],
[Project4].[C3] AS [C3],
(SELECT
SUM([Filter5].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent5]
WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
) AS [Filter5]) AS [C4]
FROM ( SELECT
[Project3].[ID] AS [ID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2],
(SELECT
SUM([Filter4].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent4]
WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
) AS [Filter4]) AS [C3]
FROM ( SELECT
[Project2].[ID] AS [ID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter3].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent3]
WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
) AS [Filter3]) AS [C2]
FROM ( SELECT
[Project1].[ID] AS [ID],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent2]
WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
) AS [Filter2]) AS [C1]
FROM ( SELECT
[Extent1].[ID] AS [ID]
FROM [dbo].[ClearingAccounts] AS [Extent1]
WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]
Edit
Note that as per @usr's comment, your original SQL query is broken. By LEFT OUTER joining on two independent tables and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed columns (and vice versa). E.g., if a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better SQL query would be:
WITH ctePayments AS
(
SELECT
ClearingAccounts.ID,
-- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
FROM ClearingAccounts
INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
FROM ClearingAccounts
INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2,
COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
ClearingAccounts ca
LEFT OUTER JOIN ctePayments p
ON ca.ID = p.ID
LEFT OUTER JOIN cteDirectDebits d
ON ca.ID = d.ID;
-- GROUP BY not required, since we have already guaranteed at most one row
-- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;
You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.
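The row multiplication described above is easy to reproduce in memory, which can serve as a first test case. The sketch below uses made-up amounts for a single clearing account with 4 payments and 3 direct debits; FanOut and its members are hypothetical names, and the nested from clauses stand in for joining both child tables on the same account key.

```csharp
using System;
using System.Linq;

Console.WriteLine(FanOut.JoinedRowCount());       // 12 rows from 4 payments x 3 direct debits
Console.WriteLine(FanOut.PaymentSumAfterJoin());  // 300: every payment is counted 3 times
Console.WriteLine(FanOut.PaymentSumAlone());      // 100: the intended total

public static class FanOut
{
    // Made-up amounts for one clearing account.
    static readonly decimal[] Payments = { 10m, 20m, 30m, 40m };
    static readonly decimal[] DirectDebits = { 1m, 2m, 3m };

    // Joining both child sets on the same key pairs every payment with every direct debit.
    public static int JoinedRowCount() =>
        (from p in Payments from d in DirectDebits select (p, d)).Count();

    // Summing payments over the joined rows inflates the total.
    public static decimal PaymentSumAfterJoin() =>
        (from p in Payments from d in DirectDebits select p).Sum();

    // Aggregating each child set on its own gives the correct figure.
    public static decimal PaymentSumAlone() => Payments.Sum();
}
```

Aggregating each child set separately, as the two CTEs do, avoids the inflated totals.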
Old Answer(s)
The Sql construct:
SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something
when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched). It isn't really a sum at all; it's a 'matched' count.
With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:
var result2 = Db.ClearingAccounts
.Include(c => c.Payments)
.Include(c => c.DirectDebits)
.GroupBy(c => c.Id)
.ToList() // or any other means to force materialization here.
.ToDictionary(
grp => grp.Key,
grp => new
{
PaymentsByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
DirectDebitByStatus = grp.SelectMany(x => x.DirectDebits)
.GroupBy(p => p.StatusId),
})
.Select(ca => new
{
ID = ca.Key,
Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
});
However, personally, I would leave this pivot projection directly in SQL, and then use something like SqlQuery to deserialize the result from SQL directly into the final entity type.
1) Add AsNoTracking in EF to avoid tracking changes. Check that you have indexes on the columns you are using for the JOINs, especially the column you group by. Profile the query and optimize it. EF also adds overhead compared with a stored procedure.
or
2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query may run faster that way.

What is the best way to convert this to SQL

I know that for the database gurus here this should be a doddle. I have a field in my database in the format 'A/B/C/D/E/F'.
The exact format is irrelevant; I generally need the last two parts, so for the above it would be
'EF'
But if I had another string
AB/CD/EF/GH == EFGH
I want the last two parts returned like this: 'EFGH'
Does anyone know a SQL function that will split this?
I am using Microsoft SQL Server 2012. I hope this helps.
Here is the C# code:
var myText = "A/B/C/D/E/F";
var identificationArray = myText.Split('/');
if(identificationArray.Length >= 2)
{
var friendlyId = identificationArray[identificationArray.Length - 2] + identificationArray[identificationArray.Length - 1];
return friendlyId;
}
return "";
Here is one answer that searches a string in reverse order for the second forward slash and returns that substring with forward slashes removed:
declare @s varchar(20)
set @s = 'A/B/C/D/E/F'
-- result: 'EF'
select reverse(replace(left(reverse(@s), charindex('/', reverse(@s), charindex('/', reverse(@s)) + 1)), '/', ''))
set @s = 'AB/CD/EF/GH'
-- result: 'EFGH'
select reverse(replace(left(reverse(@s), charindex('/', reverse(@s), charindex('/', reverse(@s)) + 1)), '/', ''))
Testing this with a couple of other inputs:
set @s = '/AB/CD' -- result: 'ABCD'
set @s = 'AB/CD' -- result: an empty string '' -- you may not want this result
set @s = 'AB' -- result: an empty string ''
Here is a ridiculously complicated way to do the same thing with a series of common table expressions (CTEs). Credit goes to Itzik Ben-Gan for the CTE technique to generate a tally table using cross-joins:
declare @s varchar(50)
set @s = 'A/B/C/D/E/F/G'
--set @s = 'AB/CD/EF/GH'
--set @s = 'AB/CD'
--set @s = 'ABCD/EFGH/IJKL'
--set @s = 'A/B'
-- set @s = 'A'
declare @result varchar(50)
set @result = ''
;with
-- cross-join a meaningless set of data together to create a lot of rows
Nbrs_2 (n) AS (SELECT 1 UNION SELECT 0 ),
Nbrs_4 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2),
Nbrs_16 (n) AS (SELECT 1 FROM Nbrs_4 n1 CROSS JOIN Nbrs_4 n2),
Nbrs_256 (n) AS (SELECT 1 FROM Nbrs_16 n1 CROSS JOIN Nbrs_16 n2),
Nbrs_65536(n) AS (SELECT 1 FROM Nbrs_256 n1 CROSS JOIN Nbrs_256 n2),
Nbrs (n) AS (SELECT 1 FROM Nbrs_65536 n1 CROSS JOIN Nbrs_65536 n2),
-- build a table of numbers from the data above; this is insanely fast
nums(n) as
(
select row_number() over(order by n) from Nbrs
),
-- split the string into separate rows per letter
letters(n, c) as
(
select n, substring(@s, n, 1)
from nums
where n < len(@s) + 1
),
-- count the slashes from the rows in descending order
-- the important slash is the second one from the end
slashes(n, num) as
(
select n, ROW_NUMBER() over (order by n desc)
from letters
where c = '/'
)
select @result = @result + c
from letters
where n > (select n from slashes where num = 2) -- get everything after the second slash
and c <> '/' -- and drop out the other slash
select @result
You need to reverse the string and find the second occurrence of the / character. Once you have that, it is pretty straightforward, just a lot of function calls to get the desired format:
declare @test varchar(max);
set @test = 'b/b/a/v/d';
select
case
when charindex('/', reverse(@test), charindex('/', reverse(@test))+1) = 0 then ''
else replace(reverse(substring(reverse(@test), 0, charindex('/', reverse(@test), charindex('/', reverse(@test))+1))), '/', '')
end
I understand that you want to do this in SQL, but have you thought about using SQL CLR user-defined functions? They are likely to execute faster than the equivalent T-SQL string handling, and you already have the logic implemented in C#, which is definitely simpler than the logic in SQL.
Late to the party, but here is my attempt:
declare @text varchar(max), @reversedtext varchar(max)
select @text = 'AB/CD/EF/GH'
select @reversedtext = reverse(@text)
declare @pos1 int
declare @pos2 int
declare @pos3 int
select @pos1 = charindex('/', @reversedtext)
select @pos2 = charindex('/', replace(@reversedtext, left(@reversedtext, @pos1), ''))
select @pos3 = @pos1 + @pos2
select REPLACE(RIGHT(@text, @pos3), '/', '')

What's the difference between these two LINQtoSQL statements?

These two statements look the same logically to me, but they're resulting in different SQL being generated:
#1
var people = _DB.People.Where(p => p.Status == MyPersonEnum.STUDENT.ToString());
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
#2
string s = MyPersonEnum.STUDENT.ToString();
var people = _DB.People.Where(p => p.Status == s);
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
Example #1 doesn't work, but example #2 does.
The generated SQL for the var people query is identical for both, but the SQL in the final query differs like this:
#1
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = (CONVERT(NVarChar,@p0)))
)
#2
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = @p0)
)
Why is there this difference?
Edit:
Up until now all I've done to get the SQL generated is to inspect the queryable in the debugger. However, after setting up a logger as Jon suggested, it seems that the real sql executed is different.
#1
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = (CONVERT(NVarChar,@p0)))
)) AND ([t1].[Status] = @p1)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
#2
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = @p0)
)) AND ([t1].[Status] = @p1)
-- @p0: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
-- @p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
First, think of the dual nature of an enum:
enum MyPersonEnum
{
STUDENT, // implicit 0
TEACHER, // implicit 1
DIRECTOR = 10 // explicit 10
}
...
Assert.AreEqual(0, (int)MyPersonEnum.STUDENT);
Assert.AreEqual("STUDENT", MyPersonEnum.STUDENT.ToString());
In the second example, C# has already converted the enum to a string, so no conversion is needed; it's assumed that your database People.Status column accepts the "STUDENT", "TEACHER", "DIRECTOR" strings as valid values.
The difference is that the enum's internal representation in the CLR is an integer, and in the first example the @p parameter is passed as an integer. That is the L2S query builder's behaviour, and it is why the conversion appears.
The first one would work if your database column were an int that takes the values assigned to the enum members ({0, 1, 10} in my example).
No, they're different. In the first version, the expression MyPersonEnum.STUDENT.ToString() is within the expression tree - it's part of what LINQ to SQL has to convert into SQL. I'd be interested to see what #p0 is when the query is executed...
In the second version, you've already evaluated the expression, so LINQ to SQL just sees a reference to a variable which is already a string.
We know that they mean the same thing, but presumably LINQ to SQL doesn't have quite enough knowledge to understand that.
Out of interest, do both of them work?
EDIT: Okay, so the second version works. I suggest you use that form then :) In an ideal world, both would work - but in this case it seems you need to help LINQ to SQL a bit.
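The difference between the two forms can be seen by inspecting the expression trees directly, without a database. This is only a sketch: Person and StatusFilters are hypothetical stand-ins for the mapped entity and the calling code, and it shows what the query provider receives, not what LINQ to SQL does with it.

```csharp
using System;
using System.Linq.Expressions;

Console.WriteLine(((BinaryExpression)StatusFilters.Inline().Body).Right.NodeType);       // Call
Console.WriteLine(((BinaryExpression)StatusFilters.Precomputed().Body).Right.NodeType);  // MemberAccess

public enum MyPersonEnum { STUDENT, TEACHER }

// Hypothetical stand-in for the mapped People entity.
public class Person
{
    public string Status { get; set; }
}

public static class StatusFilters
{
    // ToString() stays inside the tree, so the query provider has to translate it.
    public static Expression<Func<Person, bool>> Inline() =>
        p => p.Status == MyPersonEnum.STUDENT.ToString();

    // ToString() runs in C# first; the tree only references a captured string variable.
    public static Expression<Func<Person, bool>> Precomputed()
    {
        string s = MyPersonEnum.STUDENT.ToString();
        return p => p.Status == s;
    }
}
```

In the first tree the right-hand side is a method call on the enum value, which fits the logged SQL where the parameter arrives as an Int wrapped in CONVERT; in the second it is just a reference to a string.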
