C# SQL to LINQ translation

I have a table with transactions, similar to:
REQUEST_ID  ITEM_ID  ITEM_STATUS_CD  EXECUTION_DTTM
1           1        1               2016-08-29 12:36:07.000
1           2        0               2016-08-29 12:37:07.000
2           3        5               2016-08-29 13:37:07.000
2           4        1               2016-08-29 15:37:07.000
2           5        10              2016-08-29 15:41:07.000
3           6        0               2016-08-29 15:41:07.000
What I want is a table showing, per REQUEST_ID, the success/warning/error rates in %, together with the end time of the latest transaction for that REQUEST_ID:
REQUEST_ID  Transactions  EndTime                  Success  Warning  Error
1           2             2016-08-29 12:37:07.000  50       50       0
2           3             2016-08-29 15:41:07.000  0        33       66
3           1             2016-08-29 15:41:07.000  100      0        0
I get the table I want with the following SQL, but I don't know how to do it in LINQ (C#). Anyone?
SELECT DISTINCT t1.[REQUEST_ID],
       t2.Transactions,
       t2.EndTime,
       t2.Success,
       t2.Warning,
       t2.Error
FROM [dbo].[jp_R_ITEM] t1
INNER JOIN (
    SELECT TOP(100) MAX([EXECUTION_DTTM]) AS EndTime,
           REQUEST_ID,
           COUNT([ITEM_ID]) AS Transactions,
           COALESCE(COUNT(CASE WHEN [ITEM_STATUS_CD] = 0 THEN 1 END), 0) * 100 / COUNT([ITEM_ID]) AS Success,
           COALESCE(COUNT(CASE WHEN [ITEM_STATUS_CD] = 1 THEN 1 END), 0) * 100 / COUNT([ITEM_ID]) AS Warning,
           COALESCE(COUNT(CASE WHEN [ITEM_STATUS_CD] > 1 THEN 1 END), 0) * 100 / COUNT([ITEM_ID]) AS Error
    FROM [dbo].[jp_R_ITEM]
    GROUP BY REQUEST_ID
    ORDER BY REQUEST_ID DESC
) t2 ON t1.[REQUEST_ID] = t2.REQUEST_ID AND t1.[EXECUTION_DTTM] = t2.EndTime
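For reference, the answers below assume an entity shaped roughly like the table; the class and property names here are hypothetical, chosen to match the identifiers the answers use:
// Hypothetical model; the properties mirror the table columns.
public class Transaction
{
    public int RequestId { get; set; }           // REQUEST_ID
    public int ItemId { get; set; }              // ITEM_ID
    public int ItemStatus { get; set; }          // ITEM_STATUS_CD
    public DateTime ExecutionDttm { get; set; }  // EXECUTION_DTTM
}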

So from all your transactions with RequestId 1 you want to make one element. This one element should have the RequestId (in this case 1), it should have the latest value of all ExecutionDttms of the transactions with that RequestId, and finally, from all those transactions, you want the percentage of successes, warnings and errors.
You want something similar for the transactions with RequestId 2, for the transactions with RequestId 3, etc.
Whenever you see something like "I want to group all items from a sequence into one object", you should immediately think of GroupBy.
This one object might be a very complex object: a List, a Dictionary, or an object of a class with a lot of properties.
So let's first make groups of Transactions that have the same RequestId:
var groupsWithSameRequestId = Transactions
    .GroupBy(transaction => transaction.RequestId);
Every group has a Key, which is the RequestId of all elements in the group. Every group is (not has) the sequence of all Transactions that have this RequestId value.
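A tiny LINQ-to-Objects illustration of that Key/sequence duality (hypothetical data, not from the question):
var words = new[] { "apple", "avocado", "banana" };
foreach (var group in words.GroupBy(word => word[0]))
{
    // group is an IGrouping<char, string>: it has a Key and is the sequence itself.
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
}
// Output:
// a: apple, avocado
// b: banana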
You want to transform every group into one result element. Every result element has a property RequestId and the number of transactions with this RequestId.
The RequestId is the Key of the group; the TransactionCount is of course the number of elements in the group:
var result = groupsWithSameRequestId.Select(group => new
{
    RequestId = group.Key,
    TransactionCount = group.Count(),
    ...
});
Those were the easiest ones.
EndTime is the maximum value of all ExecutionDttms in your group:
var result = groupsWithSameRequestId.Select(group => new
{
    RequestId = group.Key,
    TransactionCount = group.Count(),
    EndTime = group.Select(groupElement => groupElement.ExecutionDttm).Max(),
    ...
});
It might be that your query translator does not support Max on DateTime. In that case: order descending and take the first:
EndTime = group.Select(groupElement => groupElement.ExecutionDttm)
    .OrderByDescending(date => date)
    .First(),
First() is enough; FirstOrDefault() is not needed, because we know there are no groups without any elements.
We have saved the most difficult / fun part for last. You want the percentage of successes / warnings / errors, which is the number of elements with ItemStatus 0 / 1 / more than 1:
Success = 100 * group
    .Where(groupElement => groupElement.ItemStatus == 0).Count()
    / group.Count(),
Warning = 100 * group
    .Where(groupElement => groupElement.ItemStatus == 1).Count()
    / group.Count(),
Error = 100 * group
    .Where(groupElement => groupElement.ItemStatus > 1).Count()
    / group.Count(),
It depends a bit on how smart your IQueryable / DbContext is, but at first glance it seems that Count() is called quite often. Introducing an extra Select will prevent this.
So combining this all into one LINQ statement:
var result = Transactions
    .GroupBy(transaction => transaction.RequestId)
    .Select(group => new
    {
        RequestId = group.Key,
        GroupCount = group.Count(),
        SuccessCount = group
            .Where(groupElement => groupElement.ItemStatus == 0).Count(),
        WarningCount = group
            .Where(groupElement => groupElement.ItemStatus == 1).Count(),
        ErrorCount = group
            .Where(groupElement => groupElement.ItemStatus > 1).Count(),
        EndTime = group
            .Select(groupElement => groupElement.ExecutionDttm)
            .Max(),
    })
    .Select(item => new
    {
        RequestId = item.RequestId,
        TransactionCount = item.GroupCount,
        EndTime = item.EndTime,
        SuccessCount = 100.0 * item.SuccessCount / item.GroupCount,
        WarningCount = 100.0 * item.WarningCount / item.GroupCount,
        ErrorCount = 100.0 * item.ErrorCount / item.GroupCount,
    });
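A quick LINQ-to-Objects sanity check with the sample rows (using the hypothetical Transaction class from above). Note that the 100.0 forces floating-point division, so group 2 yields 33.3/66.7 rather than the truncated 33/66 of the SQL's integer division:
var Transactions = new List<Transaction>
{
    new Transaction { RequestId = 1, ItemId = 1, ItemStatus = 1,  ExecutionDttm = DateTime.Parse("2016-08-29 12:36:07") },
    new Transaction { RequestId = 1, ItemId = 2, ItemStatus = 0,  ExecutionDttm = DateTime.Parse("2016-08-29 12:37:07") },
    new Transaction { RequestId = 2, ItemId = 3, ItemStatus = 5,  ExecutionDttm = DateTime.Parse("2016-08-29 13:37:07") },
    new Transaction { RequestId = 2, ItemId = 4, ItemStatus = 1,  ExecutionDttm = DateTime.Parse("2016-08-29 15:37:07") },
    new Transaction { RequestId = 2, ItemId = 5, ItemStatus = 10, ExecutionDttm = DateTime.Parse("2016-08-29 15:41:07") },
    new Transaction { RequestId = 3, ItemId = 6, ItemStatus = 0,  ExecutionDttm = DateTime.Parse("2016-08-29 15:41:07") },
};
// result: { 1, 2, 2016-08-29 12:37:07,  50, 50,   0   }
//         { 2, 3, 2016-08-29 15:41:07,   0, 33.3, 66.7 }
//         { 3, 1, 2016-08-29 15:41:07, 100,  0,   0   }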

The same query expressed in query syntax, joining the grouped aggregates back to the original list (Take(100) mirrors the SQL's top(100), Distinct() the SELECT DISTINCT):
var query = (from t1 in lst
             join t2 in (from b in lst
                         group b by b.REQUEST_ID into grp
                         select new
                         {
                             EndTime = (from g1 in grp select g1.EXECUTION_DTTM).Max(),
                             REQUEST_ID = grp.Key,
                             Transactions = grp.Count(),
                             Success = ((from g2 in grp select g2.ITEM_STATUS_CD).Count(x => x == 0)) * 100 / grp.Count(),
                             Warning = ((from g3 in grp select g3.ITEM_STATUS_CD).Count(x => x == 1)) * 100 / grp.Count(),
                             Error = ((from g4 in grp select g4.ITEM_STATUS_CD).Count(x => x > 1)) * 100 / grp.Count(),
                         }).OrderByDescending(x => x.REQUEST_ID).Take(100)
             on new { RID = t1.REQUEST_ID, EXDT = t1.EXECUTION_DTTM } equals new { RID = t2.REQUEST_ID, EXDT = t2.EndTime }
             select new
             {
                 t1.REQUEST_ID,
                 t2.Transactions,
                 t2.EndTime,
                 t2.Success,
                 t2.Warning,
                 t2.Error
             }).Distinct().ToList();

Related

SQL from entity core taking 40x longer than raw SQL

This is the SQL generated by Entity Framework (query insight: https://i.imgur.com/spanWyN.png):
SELECT
t."Id" AS "Id1",
t."PositionId" AS "PositionId1",
t."OddsDecimal" AS "OddsDecimal1",
t0."Id" AS "Id2",
t0."PositionId" AS "PositionId2",
t0."OddsDecimal" AS "OddsDecimal2"
FROM (
SELECT
f."Id",
f."EventKey",
f."BetCategory",
f."CustomLine",
f."CustomParticipant",
f."PositionId",
f."OddsDecimal",
f."BetPosition"
FROM
"FanduelPositions" AS f
WHERE
f."LastUpdated" > (now() + INTERVAL $1)
UNION ALL
SELECT
d."Id",
d."EventKey",
d."BetCategory",
d."CustomLine",
d."CustomParticipant",
d."PositionId",
d."OddsDecimal",
d."BetPosition"
FROM
"DraftkingsPositions" AS d
WHERE
d."LastUpdated" > (now() + INTERVAL $2) ) AS t
INNER JOIN (
SELECT
f0."Id",
f0."EventKey",
f0."BetCategory",
f0."CustomLine",
f0."CustomParticipant",
f0."PositionId",
f0."OddsDecimal",
f0."BetPosition"
FROM
"FanduelPositions" AS f0
WHERE
f0."LastUpdated" > (now() + INTERVAL $3)
UNION ALL
SELECT
d0."Id",
d0."EventKey",
d0."BetCategory",
d0."CustomLine",
d0."CustomParticipant",
d0."PositionId",
d0."OddsDecimal",
d0."BetPosition"
FROM
"DraftkingsPositions" AS d0
WHERE
d0."LastUpdated" > (now() + INTERVAL $4) ) AS t0
ON
(((t."EventKey" = t0."EventKey")
AND (t."CustomParticipant" = t0."CustomParticipant"))
AND (t."BetCategory" = t0."BetCategory"))
AND (t."CustomLine" = t0."CustomLine")
WHERE
(t."BetPosition" <> t0."BetPosition")
AND ((($5 / t."OddsDecimal") + ($6 / t0."OddsDecimal")) < $7)
This is the raw SQL I'm executing (query insight: https://i.imgur.com/o2Z0Sl6.png):
WITH
tt AS (
SELECT
"Id",
"EventKey",
"BetCategory",
"BetPosition",
"OddsDecimal",
"PositionId",
"CustomLine",
"CustomParticipant"
FROM
public."FanduelPositions" fd
WHERE
"LastUpdated" > CURRENT_TIMESTAMP - INTERVAL $1 second
UNION ALL
SELECT
"Id",
"EventKey",
"BetCategory",
"BetPosition",
"OddsDecimal",
"PositionId",
"CustomLine",
"CustomParticipant"
FROM
public."DraftkingsPositions" dk
WHERE
"LastUpdated" > CURRENT_TIMESTAMP - INTERVAL $2 second )
SELECT
t1."PositionId",
t1."OddsDecimal",
t2."PositionId",
t2."OddsDecimal"
FROM
tt t1
JOIN
tt t2
ON
t1."EventKey" = t2."EventKey"
AND t1."BetCategory" = t2."BetCategory"
AND t1."CustomParticipant" = t2."CustomParticipant"
AND t1."CustomLine" = t2."CustomLine"
AND t1."BetPosition" <> t2."BetPosition"
AND $3 / t1."OddsDecimal" + $4 / t2."OddsDecimal" < $5
The raw SQL takes ~200 ms while the SQL generated by Entity Framework takes over 15 seconds.
Here is my Entity Framework code:
var fd = dbContext.FanduelPositions
    .Where(x => x.LastUpdated > DateTime.UtcNow.AddSeconds(-15))
    .Select(x => new { x.Id, x.EventKey, x.BetCategory, x.CustomLine, x.CustomParticipant, x.PositionId, x.OddsDecimal, x.BetPosition });
var dk = dbContext.DraftkingsPositions
    .Where(x => x.LastUpdated > DateTime.UtcNow.AddSeconds(-15))
    .Select(x => new { x.Id, x.EventKey, x.BetCategory, x.CustomLine, x.CustomParticipant, x.PositionId, x.OddsDecimal, x.BetPosition });
var baseTable = fd.Concat(dk);
var positions = (from x in baseTable
join y in baseTable
on new { x.EventKey, x.CustomParticipant, x.BetCategory, x.CustomLine } equals new { y.EventKey, y.CustomParticipant, y.BetCategory, y.CustomLine }
where
x.BetPosition != y.BetPosition && (1 / x.OddsDecimal + 1 / y.OddsDecimal) < .985
select new
{
Id1 = x.Id,
PositionId1 = x.PositionId,
OddsDecimal1 = x.OddsDecimal,
Id2 = y.Id,
PositionId2 = y.PositionId,
OddsDecimal2 = y.OddsDecimal
});
Is it possible to move the where clause to the on clause somehow? Is it possible to force Entity Framework to use a WITH statement? It is causing two full table scans of each table. I am not quite sure how these two things cause a 40x increase in query time (even when executing the EF-generated SQL independently of .NET), but I have added the query insights for the two queries as possible clues; I am not sure how to digest that information though.
See if the following is faster:
var groups = baseTable.GroupBy(x => new { x.EventKey, x.CustomParticipant, x.BetCategory, x.CustomLine });
var positions = groups.SelectMany(g => g.SelectMany(y => g
    .Where(z => y.BetPosition != z.BetPosition && (1 / y.OddsDecimal + 1 / z.OddsDecimal) < .985)
    .Select(z => new
    {
        Id1 = y.Id,
        PositionId1 = y.PositionId,
        OddsDecimal1 = y.OddsDecimal,
        Id2 = z.Id,
        PositionId2 = z.PositionId,
        OddsDecimal2 = z.OddsDecimal
    })));
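One more option worth testing (my own sketch, not part of the answer above): if the 15-second LastUpdated window keeps the row count small, materialize the filtered union once with ToList() and self-join in memory, so each table is scanned exactly once. This assumes OddsDecimal is a double, as the question's comparison with .985 suggests:
var rows = baseTable.ToList(); // one round trip; both tables scanned once, join happens client-side
var positions = (from x in rows
                 join y in rows
                     on new { x.EventKey, x.CustomParticipant, x.BetCategory, x.CustomLine }
                     equals new { y.EventKey, y.CustomParticipant, y.BetCategory, y.CustomLine }
                 where x.BetPosition != y.BetPosition
                       && (1 / x.OddsDecimal + 1 / y.OddsDecimal) < .985
                 select new
                 {
                     Id1 = x.Id, PositionId1 = x.PositionId, OddsDecimal1 = x.OddsDecimal,
                     Id2 = y.Id, PositionId2 = y.PositionId, OddsDecimal2 = y.OddsDecimal
                 }).ToList();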

Return array from Linq SQL query on 80 million row table using EFCore

I'm querying a SQL database using Linq and Entity Framework Core in a Razorpages project to generate a residency plot, here's one that I made earlier.
I'm struggling to optimise this query despite many attempts and iterations; it is slow and often times out. I need the final array of the Count() values that make up each square of the residency plot and am not interested in the raw data.
The data are from a table with ~80 million rows and I've found solutions from SO which might work with fewer entries but that aren't suitable in this use case (generally searching Linq, group, join). I think the problem is the combination of filters, groups and joins followed by a count occurring server-side without first downloading the raw data.
Reviewing the SQL command in SSMS (pulled from LINQPad) it is very poorly optimised - I can post this if it would be useful but it's 236 lines long made up of repeated sections.
The Linq I've hobbled together performs the required operation in the 4 steps outlined here.
Step 1 (rows between a certain time, with a certain LocationTypeId, and a channelId = engSpeed):
var speedRows = context.TestData
.Where(a => a.Time >= start
&& a.Time < end
&& a.LocationTypeId == 3
&& a.channelId == 7)
.Select(s => new
{
s.Time,
s.ChannelValue
})
.Distinct();
Step 2 (rows with a channelId = torque):
var torqueRows = context.TestData
.Where(a => a.LocationTypeId == 3
&& a.channelId == 8)
.Select(s => new
{
s.Time,
s.ChannelValue
})
.Distinct();
Step 3 (join the speed and torque rows from Step 1 and Step 2 on Time):
var joinedRows = speedRows.Join(torqueRows, arg => arg.Time, arg => arg.Time,
(speed, torque) => new
{
Id = speed.Time,
Speed = Convert.ToDouble(speed.ChannelValue),
Torque = Convert.ToInt16(torque.ChannelValue)
});
Step 4 (create the dynamic groupings using the joined table from Step 3):
var response = (from a in joinedRows
group a by (a.Torque / 100) into torqueGroup
orderby torqueGroup.Key
select new
{
TorqueBracket = $"{100 * torqueGroup.Key} <> {100 + (100 * torqueGroup.Key)}",
TorqueMin = 100 * torqueGroup.Key,
TorqueMax = 100 + (100 * torqueGroup.Key),
Speeds = (from d in torqueGroup
group d by (Math.Floor((d.Speed) / 500)) into speedGroup
orderby speedGroup.Key
select new
{
SpeedBracket = $"{500 * speedGroup.Key} <> {500 + (500 * speedGroup.Key)}",
SpeedMin = 500 * (int)speedGroup.Key,
SpeedMax = 500 + (500 * (int)speedGroup.Key),
Minutes = speedGroup.Count()
})
}).ToList();
I could be missing something obvious but I've tried many attempts and this is the best I've got.
The TestData class:
public partial class TestData {
public int LiveDataId { get; set; }
public DateTime? Time { get; set; }
public int? LocationTypeId { get; set; }
public int? TestNo { get; set; }
public int? LogNo { get; set; }
public int? LiveDataChannelId { get; set; }
public decimal? ChannelValue { get; set; }
public virtual LiveDataChannelNames LiveDataChannel { get; set; }
public virtual LocationType LocationType { get; set; }
}
Any help or pointers would be appreciated.
Thank you.
I doubt the actual generated SQL command is so big - you're probably checking the SQL command generated by EF6.
The SQL generated by EF Core is not so big, but the problem is that the Speeds = ... part of the GroupBy cannot be translated to SQL, and is evaluated client side after retrieving all the data from the previous parts of the query.
What you can do is create an intermediate query which retrieves only the data needed (the 2 grouping keys + a count) and then do the rest client side.
First you need to make sure that the subqueries from Step 1, 2 and 3 are translatable to SQL. Convert.ToDouble and Convert.ToInt16 are not translatable, so replace them with casts:
Speed = (double)speed.ChannelValue,
Torque = (short)torque.ChannelValue
Then split the Step4 on two parts. The server part:
var groupedData = joinedRows
.GroupBy(arg => new { TorqueGroupKey = arg.Torque / 100, SpeedGroupKey = Math.Floor((arg.Speed) / 500) })
.Select(g => new
{
g.Key.TorqueGroupKey,
g.Key.SpeedGroupKey,
Minutes = g.Count()
});
and the client part:
var response = (from a in groupedData.AsEnumerable() // <-- switch to client evaluation
group a by a.TorqueGroupKey into torqueGroup
orderby torqueGroup.Key
select new
{
TorqueBracket = $"{100 * torqueGroup.Key} <> {100 + (100 * torqueGroup.Key)}",
TorqueMin = 100 * torqueGroup.Key,
TorqueMax = 100 + (100 * torqueGroup.Key),
Speeds = (from d in torqueGroup
orderby d.SpeedGroupKey
select new
{
SpeedBracket = $"{500 * d.SpeedGroupKey} <> {500 + (500 * d.SpeedGroupKey)}",
SpeedMin = 500 * (int)d.SpeedGroupKey,
SpeedMax = 500 + (500 * (int)d.SpeedGroupKey),
Minutes = d.Minutes
})
}).ToList();
Note that in EF Core 3.0+ you'll be forced to do something like this because implicit client evaluation has been removed.
The generated SQL query now should be something like this:
SELECT [t].[ChannelValue] / 100 AS [TorqueGroupKey], FLOOR([t].[ChannelValue] / 500.0E0) AS [SpeedGroupKey], COUNT(*) AS [Minutes]
FROM (
SELECT DISTINCT [a].[Time], [a].[ChannelValue]
FROM [TestData] AS [a]
WHERE ((([a].[Time] >= @__start_0) AND ([a].[Time] < @__end_1)) AND ([a].[LocationTypeId] = 3)) AND ([a].[LiveDataChannelId] = 7)
) AS [t]
INNER JOIN (
SELECT DISTINCT [a0].[Time], [a0].[ChannelValue]
FROM [TestData] AS [a0]
WHERE ([a0].[LocationTypeId] = 3) AND ([a0].[LiveDataChannelId] = 8)
) AS [t0] ON [t].[Time] = [t0].[Time]
GROUP BY [t].[ChannelValue] / 100, FLOOR([t].[ChannelValue] / 500.0E0)
Although the answers posted here were helpful in making me understand the intricacies of the problem (thank you to all who posted), they did not work with Entity Framework Core version 2.2.6. I've managed to get reasonable and stable performance with the C# code below.
Crucially, in steps 1 and 2 the .ToList() stops timeouts on longer queries by (I assume) dividing the enumeration of results, possibly with a small time penalty. Also, the (double) and (short) conversions are performed server-side natively, as opposed to Convert.ToDouble and Convert.ToInt16 respectively.
Step 1 (rows between a certain time, with a certain LocationTypeId, and a channelId = engSpeed):
var speedRows = context.TestData
.Where(a => a.Time >= start
&& a.Time < end
&& a.LocationTypeId == 3
&& a.channelId == 7)
.Select(s => new
{
s.Time,
ChannelValue = (double)s.ChannelValue
})
.Distinct().ToList();
Step 2 (rows with a channelId = torque):
var torqueRows = context.TestData
.Where(a => a.LocationTypeId == 3
&& a.channelId == 8)
.Select(s => new
{
s.Time,
ChannelValue = (short)s.ChannelValue
})
.Distinct().ToList();
Step 3 (join the speed and torque rows from Step 1 and Step 2 on Time):
var joinedRows = speedRows.Join(torqueRows, arg => arg.Time, arg => arg.Time,
(speed, torque) => new
{
Id = speed.Time,
Speed = speed.ChannelValue,
Torque = torque.ChannelValue
});
Step 4 (group the rows into keyed groups)
var groupedData = joinedRows
.GroupBy(arg => new { TorqueGroupKey = (arg.Torque / 100), SpeedGroupKey = Math.Floor((arg.Speed) / 500) })
.Select(g => new
{
g.Key.TorqueGroupKey,
g.Key.SpeedGroupKey,
Minutes = g.Count()
});
Step 5 (create the dynamic groupings using groupedData from Step 4):
var response = (from a in groupedData.AsEnumerable()
group a by a.TorqueGroupKey into torqueGroup
orderby torqueGroup.Key
select new ResidencySqlResult
{
TorqueBracket = $"{100 * torqueGroup.Key} <> {100 + (100 * torqueGroup.Key)}",
TorqueMin = 100 * torqueGroup.Key,
TorqueMax = 100 + (100 * torqueGroup.Key),
Speeds = (from d in torqueGroup
orderby d.SpeedGroupKey
select new Speeds
{
SpeedBracket = $"{500 * d.SpeedGroupKey} <> {500 + (500 * d.SpeedGroupKey)}",
SpeedMin = 500 * (int)d.SpeedGroupKey,
SpeedMax = 500 + (500 * (int)d.SpeedGroupKey),
Minutes = d.Minutes
})
}).ToList();
Thank you again to all those that helped out.

How to perform operation on grouped records?

These are my records:
Id  EmpName  Stats
1   Abc      1000
1   Abc      3000
1   Abc      2000
2   Pqr      4000
2   Pqr      5000
2   Pqr      6000
2   Pqr      7000
I am trying to group by on the Id field, and after doing the group by I want output like this:
Expected output:
Id  EmpName  Stats
1   Abc      3000
2   Pqr      3000
For the 1st output record the calculation is:
3000 - 1000 = 2000 (i.e. subtract lowest from highest of the 1st and 2nd records)
3000 - 2000 = 1000 (i.e. subtract lowest from highest of the 2nd and 3rd records)
Total = 2000 + 1000 = 3000
For the 2nd output record the calculation is:
5000 - 4000 = 1000 (i.e. subtract lowest from highest of the first two records)
6000 - 5000 = 1000
7000 - 6000 = 1000
Total = 1000 + 1000 + 1000 = 3000
This is a sample fiddle I have created: Fiddle
So far I have managed to group the records by Id, but how do I perform this calculation on the grouped records?
You can use the Aggregate method overload that allows you to maintain custom accumulator state.
In your case, we'll be maintaining the following:
decimal Sum; // Current result
decimal Prev; // Previous element Stats (zero for the first element)
int Index; // The index of the current element
The Index is basically needed just to avoid accumulating the first element's Stats into the result.
And here is the query:
var result = list.GroupBy(t => t.Id)
    .Select(g => new
    {
        ID = g.Key,
        Name = g.First().EmpName,
        Stats = g.Aggregate(
            new { Sum = 0m, Prev = 0m, Index = 0 },
            (a, e) => new
            {
                Sum = (a.Index < 2 ? 0 : a.Sum) + Math.Abs(e.Stats - a.Prev),
                Prev = e.Stats,
                Index = a.Index + 1
            }, a => a.Sum)
    }).ToList();
Edit: As requested in the comments, here is the foreach equivalent of the above Aggregate usage:
static decimal GetStats(IEnumerable<Employee> g)
{
decimal sum = 0;
decimal prev = 0;
int index = 0;
foreach (var e in g)
{
sum = (index < 2 ? 0 : sum) + Math.Abs(e.Stats - prev);
prev = e.Stats;
index++;
}
return sum;
}
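For completeness, a sketch of how this helper could replace the Aggregate call (assuming the Employee class and list variable from the fiddle):
var result = list.GroupBy(t => t.Id)
    .Select(g => new { ID = g.Key, Name = g.First().EmpName, Stats = GetStats(g) })
    .ToList();
// With the sample data: ID 1 -> Stats 3000, ID 2 -> Stats 3000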
Firstly, as mentioned in my comment, this can be done using a single LINQ query, but it would have many complications, one being unreadable code.
Using a simple foreach on the IGrouping list.
Updated (handles dynamic group length):
var list = CreateData();
var groupList = list.GroupBy(t => t.Id);
var finalList = new List<Employee>();
// Iterate over the groups
foreach (var grp in groupList)
{
    var part1 = grp.Count() / 2;
    var part2 = (int)Math.Ceiling((double)grp.Count() / 2);
    var firstSet = grp.Select(i => i.Stats).Take(part2);
    var secondSet = grp.Select(i => i.Stats).Skip(part1).Take(part2);
    var total = (firstSet.Max() - firstSet.Min()) + (secondSet.Max() - secondSet.Min());
    finalList.Add(new Employee
    {
        Id = grp.Key,
        EmpName = grp.FirstOrDefault().EmpName,
        Stats = total
    });
}
*Note -
You can optimize the logic used in getting the data for the calculation.
The more complicated part is dividing the group into equal halves when the group length is not fixed.
Updated Fiddle
The LINQ way:
var list = CreateData();
var groupList = list.GroupBy(t => t.Id);
var testLinq = (from l in list
                group l by l.Id into grp
                let part1 = grp.Count() / 2
                let part2 = (int)Math.Ceiling((double)grp.Count() / 2)
                let firstSet = grp.Select(i => i.Stats).Take(part2)
                let secondSet = grp.Select(i => i.Stats).Skip(part1).Take(part2)
                select new Employee
                {
                    Id = grp.Key,
                    EmpName = grp.FirstOrDefault().EmpName,
                    Stats = (firstSet.Max() - firstSet.Min()) + (secondSet.Max() - secondSet.Min())
                }).ToList();

Merging two tables using criteria with linq

I'm trying to solve something with just one LINQ statement, and I don't know if it is possible.
I have one table named PRICES, with these fields:
pkey: int
region: int?
product_type: int
product_size: int
price: double
desc: string
The unique key is: product_type + product_size
I want to do a query that returns all rows WHERE region == 17
(this is my first set of rows)
AND want to add all rows where region is null
(this is my second set of rows)
BUT
if there are rows with the same product_type and product_size in both sets, I want the final result to contain just the row from the first set.
Example:
pkey | region | product_type | product_size | price | desc
1    | null   | 20           | 7            | 2.70  | salad1
2    | null   | 20           | 3            | 2.50  | salad7
3    | 17     | 20           | 7            | 1.90  | saladspecial
4    | 17     | 20           | 5            | 2.20  | other
I want a LINQ query that returns this:
2    | null   | 20           | 3            | 2.50  | salad7
3    | 17     | 20           | 7            | 1.90  | saladspecial
4    | 17     | 20           | 5            | 2.20  | other
(note that row with pkey 1 is discarded because the row with pkey 3 has the same product_type and product_size)
var query1 = from p in PRICES
             where p.region == 17
             select p;
var query2 = from p in PRICES
             where p.region == null
             select p;
Questions:
How to join query1 and query2 to obtain the expected output?
Can it be done with just 1 query?
The following query selects only prices with region 17 or null, and groups them by the unique key { p.product_type, p.product_size }. Then it checks whether the group contains at least one price with region 17. If yes, we select all region-17 prices from the group (skipping the prices with null region); otherwise we return the whole group (it has null regions only):
var query = from p in PRICES.Where(x => x.region == 17 || x.region == null)
            group p by new { p.product_type, p.product_size } into g
            from pp in g.Any(x => x.region == 17)
                       ? g.Where(x => x.region == 17)
                       : g
            select pp;
Input:
1 null 20 7 2.7 salad1 // goes to group {20,7} with region 17 price
2 null 20 3 2.5 salad7 // goes to group {20,3} without region 17 prices
3 17 20 7 1.9 saladspecial // goes to group {20,7}
4 17 20 5 2.2 other // goes to group {20,5}
Output:
2 null 20 3 2.5 salad7
3 17 20 7 1.9 saladspecial
4 17 20 5 2.2 other
EDIT: The query above works fine with objects in memory (i.e. LINQ to Objects), but LINQ to Entities is not that powerful - it does not support nested queries. So for Entity Framework you will need two queries - one to fetch the prices with null region whose group does not contain region-17 prices, and a second for the prices from region 17:
var pricesWithoutRegion =
    db.PRICES.Where(p => p.region == 17 || p.region == null)
             .GroupBy(p => new { p.product_type, p.product_size })
             .Where(g => !g.Any(p => p.region == 17))
             .SelectMany(g => g);
var query = db.PRICES.Where(p => p.region == 17).Concat(pricesWithoutRegion);
Actually EF executes both sub-queries in one UNION query to the server. The following SQL will be generated (I removed the desc and price columns to fit the screen):
SELECT [UnionAll1].[pkey] AS [C1],
[UnionAll1].[region] AS [C2],
[UnionAll1].[product_type] AS [C3],
[UnionAll1].[product_size] AS [C4]
FROM (SELECT [Extent1].[pkey] AS [pkey],
[Extent1].[region] AS [region],
[Extent1].[product_type] AS [product_type],
[Extent1].[product_size] AS [product_size]
FROM [dbo].[Prices] AS [Extent1] WHERE 17 = [Extent1].[region]
UNION ALL
SELECT [Extent4].[pkey] AS [pkey],
[Extent4].[region] AS [region],
[Extent4].[product_type] AS [product_type],
[Extent4].[product_size] AS [product_size]
FROM (SELECT DISTINCT [Extent2].[product_type] AS [product_type],
[Extent2].[product_size] AS [product_size]
FROM [dbo].[Prices] AS [Extent2]
WHERE ([Extent2].[region] = 17 OR [Extent2].[region] IS NULL) AND
(NOT EXISTS
(SELECT 1 AS [C1] FROM [dbo].[Prices] AS [Extent3]
WHERE ([Extent3].[region] = 17 OR [Extent3].[region] IS NULL)
AND ([Extent2].[product_type] = [Extent3].[product_type])
AND ([Extent2].[product_size] = [Extent3].[product_size])
AND (17 = [Extent3].[region])
))) AS [Distinct1]
INNER JOIN [dbo].[Prices] AS [Extent4]
ON ([Extent4].[region] = 17 OR [Extent4].[region] IS NULL)
AND ([Distinct1].[product_type] = [Extent4].[product_type])
AND ([Distinct1].[product_size] = [Extent4].[product_size]))
AS [UnionAll1]
BTW, it's a surprise to me that the GroupBy was translated into an inner join with conditions.
I think you should go for the 1-query version; with 2 queries we have to repeat something:
//for 2 queries
var query = query1.Union(query2.Except(query2.Where(x => query1.Any(y => x.product_type == y.product_type && x.product_size == y.product_size))))
                  .OrderBy(x => x.pkey);
//for 1 query
//the class/type to make the group key
public class GroupKey
{
    public int ProductType { get; set; }
    public int ProductSize { get; set; }
    public override bool Equals(object obj)
    {
        GroupKey gk = obj as GroupKey;
        return ProductType == gk.ProductType && ProductSize == gk.ProductSize;
    }
    public override int GetHashCode()
    {
        return ProductSize ^ ProductType;
    }
}
//-------
var query = list.Where(x => x.region == 17 || x.region == null)
    .GroupBy(x => new GroupKey { ProductType = x.product_type, ProductSize = x.product_size })
    .SelectMany<IGrouping<GroupKey, Price>, Price, Price>(
        x => x.Where(k => x.Count(y => y.region == 17) == 0 || k.region == 17),
        (x, g) => g)
    .OrderBy(x => x.pkey);
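A quick LINQ-to-Objects check of either approach; the Price class here is a hypothetical stand-in for the PRICES row type, with field names matching the question:
public class Price
{
    public int pkey { get; set; }
    public int? region { get; set; }
    public int product_type { get; set; }
    public int product_size { get; set; }
    public double price { get; set; }
    public string desc { get; set; }
}

var list = new List<Price>
{
    new Price { pkey = 1, region = null, product_type = 20, product_size = 7, price = 2.70, desc = "salad1" },
    new Price { pkey = 2, region = null, product_type = 20, product_size = 3, price = 2.50, desc = "salad7" },
    new Price { pkey = 3, region = 17,   product_type = 20, product_size = 7, price = 1.90, desc = "saladspecial" },
    new Price { pkey = 4, region = 17,   product_type = 20, product_size = 5, price = 2.20, desc = "other" },
};
// Both queries yield pkeys 2, 3 and 4: pkey 1 is shadowed by pkey 3,
// which shares product_type 20 / product_size 7 and has region 17.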

LINQ Query with Aggregates and Group By

I have the following SQL query...
select seaMake AS Make,
seaModel AS Model,
COUNT(*) AS [Count],
MIN(seaPrice) AS [From],
MIN(seaCapId) AS [CapId]
from tblSearch
where seaPrice >= 2000
and seaPrice <= 7000
group by seaMake, seaModel
order by seaMake, seaModel
I'm trying to write this as a LINQ to Entities query, but I'm having problems. This is what I have so far, but I cannot access the make and model values from the range variable s:
var tester = from s in db.tblSearches
where s.seaPrice >= 2000
&& s.seaPrice <= 7000
orderby s.seaMake
group s by s.seaMake into g
select new
{
make = g.seaMake,
model = s.seaModel,
count = g.Max(x => x.seaMake),
PriceFrom = g.Min(s.seaPrice)
};
Where am I going wrong?
This should be a straightforward translation of the SQL:
from s in db.tblSearches
where
s.seaPrice >= 2000 &&
s.seaPrice <= 7000
group s by new {s.seaMake, s.seaModel} into g
orderby g.Key.seaMake, g.Key.seaModel
select new
{
Make = g.Key.seaMake,
Model = g.Key.seaModel,
Count = g.Count(),
From = g.Min(x => x.seaPrice),
CapId = g.Min(x => x.seaCapId)
}
Instead of your original collection of IEnumerable<TypeOfS>, when you grouped into g you converted that collection into an IEnumerable<IGrouping<TypeOfSeaMake, TypeOfS>>, so the collection in the current scope is g. So the following would be valid:
from s in db.tblSearches
where s.seaPrice >= 2000
&& s.seaPrice <= 7000
orderby s.seaMake
group s by s.seaMake into g // the collection is now IEnumerable<IGrouping<TypeOfSeaMake, TypeofS>>
select new {
make = g.Key, // this was populated by s.seaMake
model = g.First().seaModel, // get the first item in the collection
count = g.Max(x => x.seaMake), // get the max value from the collection
PriceFrom = g.Min(x => x.seaPrice), // get the min price from the collection
};
There will now be one item returned for each grouping.
