Group by and left join in linq


There are two tables: Customers, with the fields CustomerID and GroupID, and CustomerGroup, with the fields GroupID and GroupName. I want to get the number of customers in each group. Here is the LINQ statement:
var groups = from customerGroups in db.CustomerGroup
join customers in db.Customers on customerGroups.GroupID equals customers.GroupID into gc
where customerGroups.MerchantID == merchantID
from subCustomerGroups in gc.DefaultIfEmpty()
group customerGroups by customerGroups.GroupName into grpCustomerGroups
select new { GroupName = grpCustomerGroups.Key, Quantity = customers.Count()};
The problem is that Quantity = customers.Count() is invalid. How do I correct the statement?
The expected SQL statement is:
exec sp_executesql N'SELECT
1 AS [C1],
[GroupBy1].[K1] AS [GroupName],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[GroupName] AS [K1],
COUNT(CustomerID) AS [A1]
FROM [dbo].[CustomerGroup] AS [Extent1]
LEFT OUTER JOIN [dbo].[Customer] AS [Extent2] ON [Extent1].[GroupID] = [Extent2].[GroupID]
WHERE [Extent1].[MerchantID] = @p__linq__0
GROUP BY [Extent1].[GroupName]
) AS [GroupBy1]',N'@p__linq__0 bigint',@p__linq__0=9

Usually, if you find yourself doing a left outer join followed by a GroupBy, it is because you want "items with their sub-items", like "Schools with their Students", "Clients with their Orders", or "CustomerGroups with their Customers". If that is what you want, consider using GroupJoin instead of "Join + DefaultIfEmpty + GroupBy".
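For comparison, here is a minimal sketch of how the question's own Join + DefaultIfEmpty + GroupBy query could be corrected. It runs against hypothetical in-memory arrays (LINQ to Objects) standing in for db.CustomerGroup and db.Customers; the sample data is made up. The idea is to group the joined customer rather than the CustomerGroup, and to count only the non-null customers, which mirrors COUNT(CustomerID) skipping the NULL rows produced by the LEFT OUTER JOIN. Whether Entity Framework translates it into exactly the expected SQL is not verified here.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for db.CustomerGroup and db.Customers.
var customerGroups = new[]
{
    new { GroupID = 1, GroupName = "Gold",   MerchantID = 9L },
    new { GroupID = 2, GroupName = "Silver", MerchantID = 9L },
    new { GroupID = 3, GroupName = "Bronze", MerchantID = 7L },
};
var customers = new[]
{
    new { CustomerID = 100, GroupID = 1 },
    new { CustomerID = 101, GroupID = 1 },
    new { CustomerID = 102, GroupID = 3 },
};
long merchantID = 9;

var groups =
    (from customerGroup in customerGroups
     where customerGroup.MerchantID == merchantID
     join customer in customers on customerGroup.GroupID equals customer.GroupID into gc
     from subCustomer in gc.DefaultIfEmpty()        // null when the group has no customers
     group subCustomer by customerGroup.GroupName into grp
     select new
     {
         GroupName = grp.Key,
         Quantity = grp.Count(c => c != null)       // like COUNT(CustomerID): skip the null rows
     }).ToList();

foreach (var g in groups)
    Console.WriteLine($"{g.GroupName}: {g.Quantity}");
```

Counting with grp.Count() instead would report 1 for an empty group, because DefaultIfEmpty contributes one null row.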
I'm more familiar with method syntax, so I'll use that one.
int merchantId = ...
var result = dbContext.CustomerGroups
// keep only the CustomerGroups from merchantId
.Where(customerGroup => customerGroup.MerchantId == merchantId)
.GroupJoin(dbContext.Customers, // GroupJoin with Customers
customerGroup => customerGroup.GroupId, // from every CustomerGroup take the GroupId
customer => customer.GroupId, // from every Customer take the GroupId
// ResultSelector:
(customerGroup, customersInThisGroup) => new // from every CustomerGroup with all its
{ // matching customers make one new object
GroupName = customerGroup.GroupName,
Quantity = customersInThisGroup.Count(), // number of Customers in this group
});
In words:
Take the sequence of CustomerGroups. Keep only those CustomerGroups that have a value for property MerchantId equal to merchantId. From every remaining CustomerGroup, get all its Customers, by comparing the CustomerGroup.GroupId with each Customer.GroupId.
The result is a sequence of CustomerGroups, each with its Customers. From this result (parameter ResultSelector) get the GroupName from the CustomerGroup and the Quantity from the Customers in this group.
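The GroupJoin described above can be tried out on hypothetical in-memory arrays that stand in for dbContext.CustomerGroups and dbContext.Customers (the data below is invented for illustration). It shows that a group without customers still appears, with a count of 0, which is the LEFT OUTER JOIN behaviour the question is after.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data standing in for dbContext.CustomerGroups and dbContext.Customers.
var customerGroups = new[]
{
    new { GroupId = 1, GroupName = "Gold",   MerchantId = 9 },
    new { GroupId = 2, GroupName = "Silver", MerchantId = 9 },
    new { GroupId = 3, GroupName = "Bronze", MerchantId = 7 },
};
var customers = new[]
{
    new { CustomerId = 100, GroupId = 1 },
    new { CustomerId = 101, GroupId = 1 },
    new { CustomerId = 102, GroupId = 3 },
};
int merchantId = 9;

var result = customerGroups
    .Where(customerGroup => customerGroup.MerchantId == merchantId)
    .GroupJoin(customers,
        customerGroup => customerGroup.GroupId,      // key taken from every CustomerGroup
        customer => customer.GroupId,                // key taken from every Customer
        (customerGroup, customersInThisGroup) => new
        {
            GroupName = customerGroup.GroupName,
            Quantity = customersInThisGroup.Count(), // 0 for a group without customers
        })
    .ToList();

foreach (var item in result)
    Console.WriteLine($"{item.GroupName}: {item.Quantity}");
```

One difference from the expected SQL: this yields one row per CustomerGroup, while the SQL groups by GroupName. The two only differ if several groups of the same merchant share a name.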
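In the question, the statement was:
Quantity = customers.Count()
This does not compile: after join ... into gc the range variable customers is no longer in scope (only gc is), and after group ... into grpCustomerGroups only grpCustomerGroups is. In the ResultSelector above, the Customers of each group arrive as the parameter customersInThisGroup, so the quantity is:
Quantity = customersInThisGroup.Count()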
If you want the CustomerIds of all Customers in this CustomerGroup instead:
// ResultSelector:
(customerGroup, customersInThisGroup) => new
{
GroupName = customerGroup.GroupName,
CustomerIds = customersInThisGroup.Select(customer => customer.CustomerId)
.ToList(),
});
If you want, you can use the ResultSelector to get "CustomerGroups with their Customers". It is most efficient to select only the properties you actually plan to use:
// ResultSelector:
(customerGroup, customersInThisGroup) => new
{
// select only the CustomerGroup properties that you plan to use:
Id = customerGroup.GroupId,
Name = customerGroup.GroupName,
... // other properties that you plan to use
Customers = customersInThisGroup.Select(customer => new
{
// again, select only the Customer properties that you plan to use
Id = customer.Id,
Name = customer.Name,
...
// not needed, you know the value:
// GroupId = customer.GroupId
}),
});
The reason not to select the foreign key of the Customers is efficiency. If CustomerGroup [14] has 1000 Customers, then every Customer in this group has a GroupId equal to [14]. It would be a waste to send this value [14] 1001 times (once for the group and once for every Customer) when you already know it.
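A minimal in-memory sketch of such a nested projection, assuming made-up groups and customers: the group id is selected once per group, and each nested customer object carries only Id and Name, not the repeated GroupId.

```csharp
using System;
using System.Linq;

// Hypothetical data: group 14 has two customers, group 15 has none.
var customerGroups = new[]
{
    new { GroupId = 14, GroupName = "Wholesale" },
    new { GroupId = 15, GroupName = "Retail" },
};
var customers = new[]
{
    new { Id = 1, Name = "Ann", GroupId = 14 },
    new { Id = 2, Name = "Bob", GroupId = 14 },
};

var groupsWithCustomers = customerGroups
    .GroupJoin(customers,
        customerGroup => customerGroup.GroupId,
        customer => customer.GroupId,
        (customerGroup, customersInThisGroup) => new
        {
            Id = customerGroup.GroupId,             // the group id is selected once here...
            Name = customerGroup.GroupName,
            Customers = customersInThisGroup
                .Select(customer => new { customer.Id, customer.Name }) // ...and not repeated per customer
                .ToList(),
        })
    .ToList();

foreach (var group in groupsWithCustomers)
    Console.WriteLine($"{group.Name}: {string.Join(", ", group.Customers.Select(c => c.Name))}");
```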

Related

How can I turn SQL query that joins two columns and groups by count of one column and a column of each joined table into LINQ?

In my database, each URI has associated tags (Tag table), and each pageview (PageView table) is associated with someone viewing a particular page. I want to return a list of URIs that have the same tags as a given URI, ordered by the count of each URI that shares those tags. My SQL query looks like this:
select count(URI) as 'Count', p.URI, t.Name
from tracking.PageView as p
inner join Tracking.Tag as t on p.ID = t.PageViewID
where t.name in
(select t.Name
from tracking.PageView as p
inner join Tracking.Tag as t on p.ID = t.PageViewID
where p.URI = 'URI WE WANT TAGS OF'
)
and p.uri like '%/articles/%'
group by p.URI , t.name
order by Count desc
My apologies if the description is too vague for the query or if the query itself is rough; it was just the first one that worked. I've tried to separate the subquery into a variable and select values from that subquery, but it's been some time since I've used LINQ and I'm spinning my wheels at this point.

The following is pretty much an exact translation of your current SQL query, which should get you started.
from p in tracking.PageView
join t in Tracking.Tag on p.ID equals t.PageViewID
where p.URI.Contains("/articles/")
&& (
from p2 in tracking.PageView
join t2 in Tracking.Tag on p2.ID equals t2.PageViewID
where p2.URI == "URI WE WANT TAGS OF"
select t2.Name
).Contains(t.Name)
group new { p, t } by new { p.URI, t.Name } into g
orderby g.Count() descending
select new {
Count = g.Count(),
g.Key.URI,
g.Key.Name
}
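The same query shape can be checked on hypothetical in-memory rows (LINQ to Objects, not the database provider). The subquery becomes a separate variable whose Contains call plays the role of the SQL IN, and the anonymous grouping key exposes URI and Name to the final select. All data and the target URI below are invented for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for tracking.PageView and Tracking.Tag.
var pageViews = new[]
{
    new { ID = 1, URI = "/articles/a" },
    new { ID = 2, URI = "/articles/b" },
    new { ID = 3, URI = "/articles/b" },
    new { ID = 4, URI = "/news/c" },
};
var tags = new[]
{
    new { PageViewID = 1, Name = "linq" },
    new { PageViewID = 2, Name = "linq" },
    new { PageViewID = 3, Name = "linq" },
    new { PageViewID = 3, Name = "sql" },
    new { PageViewID = 4, Name = "linq" },
};
string targetUri = "/articles/a";

// Subquery: the tag names of the URI we want the tags of (the SQL IN (...) list).
var targetTags = from p2 in pageViews
                 join t2 in tags on p2.ID equals t2.PageViewID
                 where p2.URI == targetUri
                 select t2.Name;

var result = (from p in pageViews
              join t in tags on p.ID equals t.PageViewID
              where p.URI.Contains("/articles/") && targetTags.Contains(t.Name)
              group new { p, t } by new { p.URI, t.Name } into g
              orderby g.Count() descending
              select new { Count = g.Count(), g.Key.URI, g.Key.Name }).ToList();

foreach (var row in result)
    Console.WriteLine($"{row.Count} {row.URI} {row.Name}");
```

In memory, targetTags is re-evaluated for every row; for larger in-memory data, materializing it once (for example into a HashSet<string>) would avoid that.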

Basic misunderstanding of LINQ to SQL and foreign keys

I am working on a much larger project, and can't seem to get LINQ to SQL working the way I expect it to. I created a simple subset of the project so I can use LinqPad to try to make sure I have a basic understanding of how this should work.
Clearly, I don't. I've created two very simple tables, customer and job. The customer table has an ID (auto-increment) and a Name; the job table has an ID (auto-increment), a Name, and a CustomerID (a foreign key to ID in the customer table).
When I run the following code against an initially empty database:
void Main()
{
string custName = "James";
string[] jobNames = new string[] {"Home Depot", "Menards", "Sam's Club" };
var cust = customer.FirstOrDefault(c => c.Name == custName);
if (cust == null)
{
cust = new customer
{
Name = custName
};
customer.InsertOnSubmit(cust);
}
foreach(var jn in jobNames)
{
if (!job.Any(j => j.Customer.Name == cust.Name && j.Name == jn))
job.InsertOnSubmit(new job {
Name = jn,
Customer = cust
});
}
SubmitChanges();
customer.Dump();
job.Dump();
}
I would expect to end up with 1 customer and 3 jobs in the database - that's all good. But the generated SQL and the setting of the customer IDs are not at all what I expect:
SQL --
SELECT t0.ID, t0.Name
FROM customer AS t0
WHERE (t0.Name = @p0)
LIMIT 0, 1
-- p0 = [James]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Home Depot]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Menards]
SELECT COUNT(*) AS value
FROM job AS t0
LEFT OUTER JOIN customer AS t1
ON (t1.ID = t0.CustomerID)
WHERE ((t1.Name = @p0) AND (t0.Name = @p1))
-- p0 = [James]
-- p1 = [Sam's Club]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Home Depot]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Menards]
INSERT INTO job(CustomerID, ID, Name)
VALUES (NULL, 0, @p0)
-- p0 = [Sam's Club]
INSERT INTO customer(ID, Name)
VALUES (0, @p0)
-- p0 = [James]
SELECT t0.ID, t0.Name
FROM customer AS t0
SELECT t0.CustomerID, t0.ID, t0.Name
FROM job AS t0
[Screenshot: results in LinqPad]
I thought that the beauty of LINQ to SQL was that I don't have to set the foreign keys manually, and that I should be able to do what I'm trying to do here with a single hit to the database. What am I missing?
EDIT: I think I understand why the customer IDs are set to NULL: the generated SQL runs the INSERTs for the jobs before the INSERT for the customer, so there is no ID yet. Why would it do that? Also, if I run the same code again, I get three more rows in the job table, but the CustomerIDs are all still NULL.

The LINQ query has no issues.
Check the relations between the DB tables.

EF Core 3.0 - Convert SQL to LINQ

The example given in the blog has the following:
from e in s.StudentCourseEnrollments where courseIDs.Contains(e.Course.CourseID) select e
The Contains logic does not work when we are looking for an exact match. If a student has enrolled in 6 courses (e.g. 1,2,3,4,5,6) and the requested list contains 5 (e.g. 1,2,3,4,5), the query returns a match when it should not. It works well the other way round, when the student has enrolled in a subset of the requested list.
The solution below works, but I need help converting the SQL below to LINQ (EF Core 3.0):
Create TABLE dbo.Enrollments (StudentId INT NOT NULL, CourseId INT NOT NULL)
insert into dbo.Enrollments values (1,1)
insert into dbo.Enrollments values (1,2)
insert into dbo.Enrollments values (1,3)
insert into dbo.Enrollments values (1,4)
insert into dbo.Enrollments values (1,5)
insert into dbo.Enrollments values (1,6)
DECLARE @TempCourses TABLE
(
CourseId INT
);
INSERT INTO @TempCourses (CourseId) VALUES (1), (2), (3),(4),(5);
SELECT t.StudentId
FROM
(
SELECT StudentId, cnt=COUNT(*)
FROM dbo.Enrollments
GROUP BY StudentId
) kc
INNER JOIN
(
SELECT cnt=COUNT(*)
FROM @TempCourses
) nc ON nc.cnt = kc.cnt
JOIN dbo.Enrollments t ON t.StudentId = kc.StudentId
JOIN @TempCourses n ON n.CourseId = t.CourseId
GROUP BY t.StudentId
HAVING COUNT(*) = MIN(nc.cnt);
drop table dbo.Enrollments

I don't know about the SQL query, but the EF Core 3.0 LINQ query for the same task is something like this:
var matchIds = new[] { 1, 2, 3, 4, 5 }.AsEnumerable();
var query = dbContext.Students
.Where(s => s.Enrollments.All(e => matchIds.Contains(e.CourseId))
&& s.Enrollments.Count() == matchIds.Count());
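The main matching job is done with the All subquery. Unfortunately that's not enough when a student is enrolled in only a subset of the matching ids (every enrollment is in the list, but some ids are missing), so the additional count comparison solves that. This assumes that neither the enrollments of a student nor matchIds contain duplicates.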
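To see why both conditions are needed, here is an in-memory sketch (LINQ to Objects, not EF Core). As a simplifying assumption, each hypothetical student's Enrollments is reduced to an array of course ids instead of enrollment entities with a CourseId property.

```csharp
using System;
using System.Linq;

// Hypothetical students; Enrollments holds course ids directly (a simplification of e.CourseId).
var students = new[]
{
    new { StudentId = 1, Enrollments = new[] { 1, 2, 3, 4, 5, 6 } }, // superset: All fails on course 6
    new { StudentId = 2, Enrollments = new[] { 1, 2, 3, 4, 5 } },    // exact match
    new { StudentId = 3, Enrollments = new[] { 1, 2 } },             // subset: All passes, count check fails
};
var matchIds = new[] { 1, 2, 3, 4, 5 };

var matches = students
    .Where(s => s.Enrollments.All(courseId => matchIds.Contains(courseId))
             && s.Enrollments.Count() == matchIds.Count())
    .Select(s => s.StudentId)
    .ToList();

Console.WriteLine(string.Join(", ", matches));
```

Without the count comparison, student 3 would also be returned.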

You can achieve it in a simple way like this.
Let's say that you've got the list of enrollments this way:
var enrollments = from s in dc.Students
from c in s.Courses
select new { StudentId = s.StudentID, CourseId = c.CourseID };
Then get the result this way, where courses is the array of requested course ids:
var groupedEnrollment = enrollments.GroupBy(p => p.StudentId)
.Select(g => new
{
StudentId = g.Key,
Courses = g.Select(p => p.CourseId).ToArray()
});
var result = groupedEnrollment.Where(g =>
g.Courses.Length == courses.Length &&
g.Courses.Intersect(courses).Count() == courses.Length);
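The grouping approach above can be sketched in memory with hypothetical flat enrollment rows and a hypothetical courses array (LINQ to Objects only; how EF Core 3.0 would translate or client-evaluate it is not covered here). The length check rejects students with extra or missing courses, and the Intersect count confirms that the ids themselves match, assuming no duplicate ids.

```csharp
using System;
using System.Linq;

// Hypothetical rows: student 1 takes courses 1..6, student 2 takes 1..5, student 3 takes 1..2.
var enrollments = new[] { (StudentId: 1, Last: 6), (StudentId: 2, Last: 5), (StudentId: 3, Last: 2) }
    .SelectMany(s => Enumerable.Range(1, s.Last).Select(c => new { s.StudentId, CourseId = c }))
    .ToArray();
int[] courses = { 1, 2, 3, 4, 5 };

var groupedEnrollment = enrollments.GroupBy(p => p.StudentId)
    .Select(g => new
    {
        StudentId = g.Key,
        Courses = g.Select(p => p.CourseId).ToArray()
    });

var result = groupedEnrollment
    .Where(g => g.Courses.Length == courses.Length &&
                g.Courses.Intersect(courses).Count() == courses.Length)
    .Select(g => g.StudentId)
    .ToList();

Console.WriteLine(string.Join(", ", result));
```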

ORA-00907: Distinct, join and group by in LINQ C#

I'm getting the error code ORA-00907 when executing the LINQ query below. It seems to be Oracle-specific. The problem seems to be the "group by" subquery.
Let's say I have these two tables, USER and ADDRESS, with the columns:
USER{userid, addressid},
ADDRESS{addressid, streetname}
Table ADDRESS contains several rows with the same addressid, so I would like to group the ADDRESS table (DISTINCT) on addressid so that I only get one match per addressid in the USER table. It should also be a LEFT JOIN, so that if there is no match I still get the USER record.
I have tried several different approaches. My code (example):
List<MyObject> result =
(
from u in context.USER.Where(i => i.userid > 100)
join a in (from address in context.ADDRESS group address by address.addressid)
on u.addressid equals a.FirstOrDefault().addressid into joinedaddress
from lfjoinedaddress in joinedaddress.DefaultIfEmpty()
join email in context.EMAIL on u.userid equals email.userid into jemail
from lfjemail in jemail.DefaultIfEmpty()
select new MyObject()
{
UserId = u.userid,
StreetName = lfjoinedaddress.streetname,
UserEmail = lfjemail.emailaddress
}
).ToList();
Does anyone know how to achieve this by rewriting the query so that it works against Oracle?
UPDATE:
This is the generated SQL query, except for the "email" part:
SELECT
1 AS "C1",
"Extent1"."USERID" AS "USERID",
"Extent1"."ADDRESSID" AS "ADDRESSID"
FROM (SELECT
"USER"."USERID" AS "USERID",
"USER"."ADDRESSID" AS "ADDRESSIF",
FROM "EXT"."USER" "USER") "Extent1"
LEFT OUTER JOIN (SELECT "Distinct1"."ADDRESSID" AS "ADDRESSID1", "Limit1"."ADDRESSID" AS "ADDRESSID2", , "Limit1"."STREETNAME" AS "STREETNAME1"
FROM (SELECT DISTINCT
"Extent2"."ADDRESSID" AS "ADDRESSID"
FROM (SELECT
"ADDRESS"."ADDRESSID" AS "ADDRESSID",
"ADDRESS"."STREETNAME" AS "STREETNAME",
FROM "EXT"."ADDRESS" "ADDRESS") "Extent2" ) "Distinct1"
OUTER APPLY (SELECT "Extent3"."ADDRESSID" AS "ADDRESSID", "Extent3"."STREETNAME" AS "STREETNAME"
FROM (SELECT
"ADDRESS"."ADDRESSID" AS "ADDRESSID",
"ADDRESS"."STREETNAME" AS "STREETNAME",
FROM "EXT"."ADDRESS" "ADDRESS") "Extent3"
WHERE ("Distinct1"."ADDRESSID" = "Extent3"."ADDRESSID") AND (ROWNUM <= (1) ) ) "Limit1"
OUTER APPLY (SELECT "Extent4"."ADDRESSID" AS "ADDRESSID", , "Extent4"."STREETNAME" AS "STREETNAME"
FROM (SELECT
"ADDRESS"."ADDRESSID" AS "ADDRESSID",
"ADDRESS"."STREETNAME" AS "STREETNAME",
FROM "EXT"."ADDRESS" "ADDRESS") "Extent4"
WHERE ("Distinct1"."ADDRESSID" = "Extent4"."ADDRESSID") AND (ROWNUM <= (1) ) ) "Limit2" ) "Apply2" ON ("Extent1"."ADDRESSID" = "Apply2"."ADDRESSID2") OR (("Extent1"."ADDRESSID" IS NULL) AND ("Apply2"."ADDRESSID3" IS NULL))))

DISTINCT is applied to whole tuples, not to an individual value within a tuple. If STREETNAME is always the same for a given ADDRESSID in table ADDRESS, then you want the DISTINCT tuples of (ADDRESSID, STREETNAME). You can get those by selecting just these columns of context.ADDRESS in your subquery, applying Distinct(), and omitting the .FirstOrDefault():
join a in
(
from address in context.ADDRESS
select new
{
address.addressid,
address.streetname
}
).Distinct()
on u.addressid equals a.addressid into joinedaddress
from lfjoinedaddress in joinedaddress.DefaultIfEmpty()
If STREETNAME is not always the same for a given ADDRESSID, then you don't want DISTINCT at all.
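A minimal in-memory sketch of the Distinct subquery plus left join, with hypothetical USER and ADDRESS rows (LINQ to Objects, not the Oracle provider). Anonymous types compare by value, so Distinct() collapses the duplicated (addressid, streetname) row. Because this runs in memory, lfjoinedaddress?.streetname is used to handle the missing address; a database provider handles the null itself, and the ?. operator is not allowed inside EF expression trees.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for context.USER and context.ADDRESS.
var users = new[]
{
    new { userid = 101, addressid = 1 },
    new { userid = 102, addressid = 2 },
    new { userid = 103, addressid = 3 },            // no matching address
};
var addresses = new[]
{
    new { addressid = 1, streetname = "Main St" },
    new { addressid = 1, streetname = "Main St" },  // duplicate row
    new { addressid = 2, streetname = "High St" },
};

var result =
    (from u in users.Where(i => i.userid > 100)
     join a in (from address in addresses
                select new { address.addressid, address.streetname }).Distinct()
         on u.addressid equals a.addressid into joinedaddress
     from lfjoinedaddress in joinedaddress.DefaultIfEmpty()
     select new
     {
         UserId = u.userid,
         StreetName = lfjoinedaddress?.streetname   // null when there is no address (LEFT JOIN)
     }).ToList();

foreach (var row in result)
    Console.WriteLine($"{row.UserId}: {row.StreetName ?? "(none)"}");
```

Without Distinct(), user 101 would appear twice.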

Get top five most repeating records in Entity Framework

I want to get the top five most repeating records from a table using LINQ with Entity Framework 4.0. How is this possible in a single query that returns a collection of five records?

You simply group, order the groups descending by their count, and then Take(5). Grouping examples, among others, can be found in 101 LINQ Samples.

Actually, you should group by the fields that define whether a record is repeating or not; in your case, that is probably something like the member id. Then you can introduce a new range variable that keeps the number of records in each group, and use that variable to order the results:
var query = from s in db.Statistics
group s by s.MemberId into g // group by member Id
let loginsCount = g.Count() // get count of entries for each member
orderby loginsCount descending // order by entries count
select new { // create new anonymous object with all data you need
MemberId = g.Key,
LoginsCount = loginsCount
};
Then take first 5:
var top5 = query.Take(5);
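The same query can be run in memory against hypothetical login rows, where member n logs in n times (LINQ to Objects, so no SQL is generated). The let variable is computed once per group and reused for both ordering and projection.

```csharp
using System;
using System.Linq;

// Hypothetical statistics: one row per login; member n has n logins.
int[] memberIdsPerLogin = { 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6 };
var statistics = memberIdsPerLogin.Select(id => new { MemberId = id }).ToList();

var query = from s in statistics
            group s by s.MemberId into g           // group by member id
            let loginsCount = g.Count()           // number of entries for each member
            orderby loginsCount descending        // most repeated first
            select new { MemberId = g.Key, LoginsCount = loginsCount };

var top5 = query.Take(5).ToList();

foreach (var item in top5)
    Console.WriteLine($"{item.MemberId}: {item.LoginsCount}");
```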
That will generate a query like:
SELECT TOP (5) -- Take(5)
[GroupBy1].[K1] AS [MemberId], -- new { MemberId, LoginsCount }
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
[Extent1].[MemberId] AS [K1],
COUNT(1) AS [A1] -- let loginsCount = g.Count()
FROM [dbo].[Statistics] AS [Extent1]
GROUP BY [Extent1].[MemberId] -- group s by s.MemberId
) AS [GroupBy1]
ORDER BY [GroupBy1].[A1] DESC -- orderby loginsCount descending
