LINQ left join, group by and Count generates wrong result - c#

I'm struggling with linq (left join - group - count). Please help me.
Below is my code and it gives me this result.
Geography 2
Economy 1
Biology 1
I'm expecting this...
Geography 2
Economy 1
Biology 0
How can I fix it?
class Department
{
public int DNO { get; set; }
public string DeptName { get; set; }
}
class Student
{
public string Name { get; set; }
public int DNO { get; set; }
}
class Program
{
static void Main(string[] args)
{
List<Department> departments = new List<Department>
{
new Department {DNO=1, DeptName="Geography"},
new Department {DNO=2, DeptName="Economy"},
new Department {DNO=3, DeptName="Biology"}
};
List<Student> students = new List<Student>
{
new Student {Name="Peter", DNO=2},
new Student {Name="Paul", DNO=1},
new Student {Name="Mary", DNO=1},
};
var query = from dp in departments
join st in students on dp.DNO equals st.DNO into gst
from st2 in gst.DefaultIfEmpty()
group st2 by dp.DeptName into g
select new
{
DName = g.Key,
Count = g.Count()
};
foreach (var st in query)
{
Console.WriteLine("{0} \t{1}", st.DName, st.Count);
}
}
}

var query =
from department in departments
join student in students on department.DNO equals student.DNO into gst
select new
{
DepartmentName = department.DeptName,
Count = gst.Count()
};
I don't think any grouping is required for answering your question.
You only want to know 2 things:
- name of department
- number of students per department
By using the 'join' and 'into' you're putting the results of the join in the temp identifier gst. You only have to count the number of results in the gst.

var query = from dp in departments
from st in students.Where(stud => stud.DNO == dp.DNO).DefaultIfEmpty()
group st by dp.DeptName into g
select new
{
DName = g.Key,
Count = g.Count(x => x!=null)
};
You want to group the students by the department name, but you want the count to filter out null students. I did change the join syntax slightly, although that really does not matter too much.
Here is a working fiddle

Well, see what @Danny said in his answer; it's the best and cleanest fix for this case. By the way, you could also rewrite it in lambda syntax:
var query = departments.GroupJoin(students,
dp => dp.DNO, st => st.DNO,
(dept,studs) => new
{
DName = dept.DeptName,
Count = studs.Count()
});
I find this syntax much more predictable in results, and often, shorter.
BTW: .GroupJoin is effectively a "left join", and .Join is "inner join". Be careful to not mistake one for another.
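To make the difference between the two operators concrete, here is a minimal sketch that runs both over the sample data from the question. The class name JoinVsGroupJoinDemo and the use of tuples instead of the Department and Student classes are assumptions made only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinVsGroupJoinDemo
{
    // Same sample data as in the question, stored as tuples (an assumption for brevity).
    static readonly (int DNO, string DeptName)[] Departments =
    {
        (1, "Geography"), (2, "Economy"), (3, "Biology")
    };

    static readonly (string Name, int DNO)[] Students =
    {
        ("Peter", 2), ("Paul", 1), ("Mary", 1)
    };

    // Join: one row per matching (department, student) pair, so Biology produces no row.
    public static List<string> InnerJoinRows() =>
        Departments.Join(Students, d => d.DNO, s => s.DNO,
                         (d, s) => $"{d.DeptName}:{s.Name}")
                   .ToList();

    // GroupJoin: exactly one row per department, with a possibly empty group of students.
    public static List<string> GroupJoinCounts() =>
        Departments.GroupJoin(Students, d => d.DNO, s => s.DNO,
                              (d, studs) => $"{d.DeptName} {studs.Count()}")
                   .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", InnerJoinRows()));
        // Geography:Paul, Geography:Mary, Economy:Peter
        Console.WriteLine(string.Join(", ", GroupJoinCounts()));
        // Geography 2, Economy 1, Biology 0
    }
}
```

Because GroupJoin hands you the whole group of matches, an empty group simply counts as 0 and no null check is needed.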

My answer is similar to @Igor's:
var query = from dp in departments
join st in students on dp.DNO equals st.DNO into gst
from st2 in gst.DefaultIfEmpty()
group st2 by dp.DeptName into g
select new
{
DName = g.Key,
Count = g.Count(std => std != null)
};
Using g.Count(std => std != null) is the only change you need to make.
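The reason the original query reports 1 for Biology is that DefaultIfEmpty() turns an empty sequence into a sequence containing a single default value (null for a reference type), and Count() counts that placeholder. A small sketch, where the class name DefaultIfEmptyDemo and the list of strings standing in for students are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // Counts every element, including the null placeholder added by DefaultIfEmpty.
    public static int CountAll(IEnumerable<string> group) =>
        group.DefaultIfEmpty().Count();

    // Counts only real elements by skipping the null placeholder.
    public static int CountNonNull(IEnumerable<string> group) =>
        group.DefaultIfEmpty().Count(s => s != null);

    public static void Main()
    {
        var noStudents = new List<string>();          // e.g. the students of Biology
        Console.WriteLine(CountAll(noStudents));      // 1
        Console.WriteLine(CountNonNull(noStudents));  // 0
    }
}
```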

Related

How to translate SQL query with multiple grouping in EF equivalent

I have a PostgreSQL database with a main table student, a table amount holding additional information, and three dictionary (lookup) tables. My query groups by the three dictionary IDs and returns the number of students and the sum of the values from the additional table, filtered by a condition. How can I translate it to EF Core 6?
create table region (id serial primary key, name varchar);
create table district (id serial primary key, name varchar);
create table department (id serial primary key, name varchar);
create table student (
id serial primary key,
name varchar,
region_id bigint references region,
district_id bigint references district,
department_id bigint references department
);
create table amount (
id serial primary key,
student_id bigint references student on delete cascade,
value numeric,
year int
);
My SQL query is working well:
select
t.region_id,
region."name" region_name,
t.district_id,
district."name" district_name,
t.department_id,
department."name" department_name,
t.cnt,
t.value
from (
select
region_id,
district_id,
department_id,
count(distinct s.id) cnt,
sum(a.value) "value"
from student s
join amount a on s.id = a.student_id
where a.year = 2020
group by region_id, district_id, department_id
) t
join region on t.region_id = region.id
join district on t.district_id = district.id
join department on t.department_id = department.id
How do I get names from dictionaries when translating a query to EF?
[Table("student")]
public class Student
{
[Key]
[Column("id")]
public int Id { get; set; }
[Column("name")]
public string? Name { get; set; }
[Column("region_id")]
public int? RegionId { get; set; }
[Column("district_id")]
public int? DistrictId { get; set; }
[Column("department_id")]
public int? DepartmentId { get; set; }
[ForeignKey(nameof(RegionId))]
public virtual Region? Region { get; set; }
[ForeignKey(nameof(DistrictId))]
public virtual District? District { get; set; }
[ForeignKey(nameof(DepartmentId))]
public virtual Department? Department { get; set; }
public ICollection<Amount>? Amounts { get; set; }
}
EF query:
var result = await db.Student
.GroupBy(x => new { x.RegionId, x.DistrictId, x.DepartmentId })
.Select(x => new
{
x.Key.RegionId,
x.Key.DistrictId,
x.Key.DepartmentId,
Cnt = x.Count(),
Value = x.Sum(c => c.Amounts.Where(v => v.Year == 2020).Sum(v => v.Value))
})
.ToListAsync();
This is the solution I have at the moment, but will the resulting query be efficient? It also still needs a null check on the navigation properties:
RegionName = x.First().Region.Name,
DistrictName = x.First().District.Name,
DepartmentName = x.First().Department.Name,
This can be done with the following EF Core query:
var query = from student in db.Student
join region in db.Region on student.RegionId equals region.id
join district in db.District on student.DistrictId equals district.id
join department in db.Department on student.DepartmentId equals department.id
join amount in db.Amount on student.Id equals amount.student_id
where amount.Year == 2020
group amount by new
{
student.RegionId,
RegionName = region.Name,
student.DistrictId,
DistrictName = district.Name,
student.DepartmentId,
DepartmentName = department.Name
} into g
select new
{
g.Key.RegionName,
g.Key.DistrictName,
g.Key.DepartmentName,
Cnt = g.Count(),
Value = g.Sum(a => a.Value)
};
var result = await query.ToListAsync();
It is translated into the following SQL:
SELECT r.name AS "RegionName", d.name AS "DistrictName", d0.name AS "DepartmentName",
count(*)::int AS "Cnt", COALESCE(sum(a.value), 0.0) AS "Value"
FROM student AS s
INNER JOIN region AS r ON s.region_id = r.id
INNER JOIN district AS d ON s.district_id = d.id
INNER JOIN department AS d0 ON s.department_id = d0.id
INNER JOIN amount AS a ON s.id = a.student_id
WHERE a.year = 2020
GROUP BY s.region_id, r.name, s.district_id, d.name, s.department_id, d0.name
If you need LEFT JOIN then it will be:
var query = from student in db.Student
join region in db.Region on student.RegionId equals region.id into rg
from r in rg.DefaultIfEmpty()
join district in db.District on student.DistrictId equals district.id into dg
from d in dg.DefaultIfEmpty()
join department in db.Department on student.DepartmentId equals department.id into dpg
from dp in dpg.DefaultIfEmpty()
join amount in db.Amount on student.Id equals amount.student_id
where amount.Year == 2020
group amount by new
{
student.RegionId,
RegionName = r.Name,
student.DistrictId,
DistrictName = d.Name,
student.DepartmentId,
DepartmentName = dp.Name
} into g
select new
{
g.Key.RegionName,
g.Key.DistrictName,
g.Key.DepartmentName,
Cnt = g.Count(),
Value = g.Sum(a => a.Value)
};
Try the following query:
var query =
from s in db.Student
from a in s.Amounts
where a.Year == 2020
group a by new
{
s.RegionId,
RegionName = s.Region.Name,
s.DistrictId,
DistrictName = s.District.Name,
s.DepartmentId,
DepartmentName = s.Department.Name
} into g
select new
{
g.Key.RegionId,
g.Key.RegionName,
g.Key.DistrictId,
g.Key.DistrictName,
g.Key.DepartmentId,
g.Key.DepartmentName,
Cnt = g.Select(v => v.StudentId).Distinct().Count(),
Value = g.Sum(v => v.Value)
};
var result = await query.ToListAsync();
I am not sure that Cnt = g.Select(v => v.StudentId).Distinct().Count() will be translated; that depends on the EF Core version.
UPDATE - added equivalent to the SQL query:
var groupingQuery =
from s in db.Student
from a in s.Amounts
where a.Year == 2020
group a by new
{
s.RegionId,
s.DistrictId,
s.DepartmentId,
} into g
select new
{
g.Key.RegionId,
g.Key.DistrictId,
g.Key.DepartmentId,
Cnt = g.Select(v => v.StudentId).Distinct().Count(),
Value = g.Sum(v => v.Value)
};
var query =
from g in groupingQuery
join region in db.Region on g.RegionId equals region.id
join district in db.District on g.DistrictId equals district.id
join department in db.Department on g.DepartmentId equals department.id
select new
{
g.RegionId,
RegionName = region.Name,
g.DistrictId,
DistrictName = district.Name,
g.DepartmentId,
DepartmentName = department.Name,
g.Cnt,
g.Value
};
var result = await query.ToListAsync();

How to make use of join in linq having multiple tables and use orderby?

I need some help. I have one working join and need another one for a third table. How can I create it? My orderby on the year does not work either. My logic is below; I am using LINQ against a SQL database:
// controller
public IList<ExtractionViewModel> GetExtractionViewModels()
{
ProductionManagementEntities db = new ProductionManagementEntities();
var scheduleList = (from p in db.ProductionDays
join w in db.Weeks on p.WeekId equals w.WeekId
// need other join here for the second table
orderby w.Year ascending // this is not working, year starts in 2017 instead of 2021 downwards
where(w.WeekNum == 9)
select new ExtractionViewModel
{
Year = w.Year,
Week = w.WeekNum,
Day = p.ProductionDate,
}).ToList();
return scheduleList;
}
// Model
public class ExtractionViewModel
{
public string Year { get; set; }
public int Week { get; set; }
public DateTime Day { get; set; }
public string VW250 { get; set; }
public string VW270 { get; set; }
public string VW250_2PA { get; set; }
public string VW_270_PA { get; set; }
}
//Controller
public IList<ExtractionViewModel> GetExtractionViewModels()
{
ProductionManagementEntities db = new ProductionManagementEntities();
var scheduleList = (from p in db.ProductionDays
from m in db.Models
join w in db.Weeks on p.WeekId equals w.WeekId
orderby w.Year descending
orderby m.Name ascending
where(m.Name== "VW250")
where(w.WeekNum == 9)
select new ExtractionViewModel
{
Year = w.Year,
Week = w.WeekNum,
Day = p.ProductionDate,
VW250 = m.Name
}).ToList();
return scheduleList;
}
Using a LINQ query, this should work:
var scheduleList = (from p in db.ProductionDays
                    join w in db.Weeks on p.WeekId equals w.WeekId
                    join n in db.NewTable on p.WeekId equals n.WeekId
                    where w.WeekNum == 9
                    orderby w.Year descending
                    select new ExtractionViewModel
                    {
                        Year = w.Year,
                        Week = w.WeekNum,
                        Day = p.ProductionDate,
                        Property = n.Property
                    }).ToList();
Conditions in a where clause use == and are combined with &&; the equals keyword is only valid inside a join clause.
A simpler way, which also seemed to run a bit quicker for me, is:
var scheduleList = db.ProductionDays
    .Include(x => x.Week)
    .Include(x => x.NewTable)
    .Where(x => x.Week.WeekNum == 9)
    .OrderByDescending(x => x.Week.Year)
    .Select(x => new ExtractionViewModel
    {
        Year = x.Week.Year,
        Week = x.Week.WeekNum,
        Day = x.ProductionDate,
        Property = x.NewTable.Property
    })
    .ToList();
The second one uses LINQ method syntax with navigation properties (Week and NewTable are assumed to exist on the ProductionDay entity). When stepping through in the debugger it seemed a bit quicker to me, but I have not measured it; the compiler turns query syntax into the same kind of method calls.
Your where(w.WeekNum == 9) is valid C#; the parentheses are simply redundant. The ordering is the real issue: orderby w.Year ascending starts at the oldest year, which is why you see 2017 first. To start at 2021 and go down, order descending, as above. I haven't tested this against your model, so treat it as a sketch.
In the question you don't mention what third table you would like to join but indicate that there is a third table. I added NewTable and NewTable.Property to indicate the third table and one of its columns/Properties.
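One more detail about the second attempt in the question: it uses two consecutive orderby clauses. In query syntax the later clause re-sorts the whole sequence, so the first key is no longer the primary one. To sort by year first and by model name within each year, list both keys in a single orderby, separated by a comma. The sketch below runs in memory; the class name OrderByDemo and the sample rows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    // Hypothetical rows; Year is a string, as in ExtractionViewModel.
    static readonly (string Year, string Model)[] Rows =
    {
        ("2017", "VW270"), ("2021", "VW270"), ("2021", "VW250"), ("2019", "VW250")
    };

    // One orderby with two keys: newest year first, then model name within each year.
    public static List<string> NewestYearFirst() =>
        (from r in Rows
         orderby r.Year descending, r.Model ascending
         select $"{r.Year} {r.Model}").ToList();

    public static void Main()
    {
        foreach (var line in NewestYearFirst())
            Console.WriteLine(line);
        // 2021 VW250
        // 2021 VW270
        // 2019 VW250
        // 2017 VW270
    }
}
```

Sorting the year as a string only behaves like a numeric sort while every year has the same number of digits.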

Remove similar items from List

The following data is returned from an SQL View:
Name CandidateID Filled
Tom Jones 1003436 2014-05-09 07:13:53.087
Tom Jones 1003436 2014-05-09 07:13:18.957
Ed Harris 1421522 2014-05-09 08:17:20.234
I only want the one Tom Jones record with the latest Filled time. How can I achieve this in C#/LINQ, either while getting the data from the server or afterwards?
Maybe something like this:
var q = from n in table
group n by new {n.CandidateID,n.Name} into g
select new
{
CandidateID = g.Key.CandidateID,
Name = g.Key.Name,
Filled = g.Max(t=>t.Filled)
};
Test class
class Foo
{
public string Name { get; set; }
public int CandidateID { get; set; }
public DateTime Filled { get; set; }
}
Test case
var ls=new List<Foo>
{
new Foo(){Name="Tom Jones",CandidateID=1003436,
Filled=DateTime.Parse("2014-05-09 07:13:53.087")},
new Foo(){Name="Tom Jones",CandidateID=1003436,
Filled=DateTime.Parse("2014-05-09 07:13:18.957")},
new Foo(){Name="Ed Harris",CandidateID=1421522,
Filled=DateTime.Parse("2014-05-09 08:17:20.234")}
};
var q =
(from n in ls
group n by new {n.CandidateID,n.Name} into g
select new
{
CandidateID = g.Key.CandidateID,
Name = g.Key.Name,
Filled = g.Max(t=>t.Filled)
});
Output
CandidateID Name Filled
1003436 Tom Jones 09/05/2014 7:13:53 AM
1421522 Ed Harris 09/05/2014 8:17:20 AM
var q = from n in table
group n by n.CandidateID into g
select g.OrderByDescending(t=>t.Filled).FirstOrDefault();
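You need a GroupBy, as shown below. Order each group by Filled descending so that First() returns the latest record:
var distinctItems = ls.GroupBy(x => x.CandidateID).Select(y => y.OrderByDescending(z => z.Filled).First());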
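If you are on .NET 6 or later, Enumerable.MaxBy can pick the latest row of each group without sorting it. This is only a sketch for in-memory data: the record Fill and the class LatestPerCandidateDemo are hypothetical names, and whether MaxBy inside a GroupBy translates to SQL depends on your provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record standing in for one row of the SQL view.
public sealed record Fill(string Name, int CandidateID, DateTime Filled);

public static class LatestPerCandidateDemo
{
    // For every candidate keep the single row with the greatest Filled value.
    public static List<Fill> Latest(IEnumerable<Fill> rows) =>
        rows.GroupBy(r => r.CandidateID)
            .Select(g => g.MaxBy(r => r.Filled))
            .ToList();

    public static void Main()
    {
        var rows = new[]
        {
            new Fill("Tom Jones", 1003436, new DateTime(2014, 5, 9, 7, 13, 53, 87)),
            new Fill("Tom Jones", 1003436, new DateTime(2014, 5, 9, 7, 13, 18, 957)),
            new Fill("Ed Harris", 1421522, new DateTime(2014, 5, 9, 8, 17, 20, 234))
        };

        foreach (var r in Latest(rows))
            Console.WriteLine($"{r.CandidateID} {r.Name} {r.Filled:HH:mm:ss.fff}");
    }
}
```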

Extensible relational division in LINQ

In this example class IcdPatient represents a many-to-many relationship between a Patient table (not shown in this example) and a lookup table Icd.
public class IcdPatient
{
public int PatientId { get; set; }
public int ConditionCode { get; set; }
public static List<IcdPatient> GetIcdPatientList()
{
return new List<IcdPatient>()
{
new IcdPatient { PatientId = 100, ConditionCode = 111 },
new IcdPatient { PatientId = 100, ConditionCode = 222 },
new IcdPatient { PatientId = 200, ConditionCode = 111 },
new IcdPatient { PatientId = 200, ConditionCode = 222 },
new IcdPatient { PatientId = 3, ConditionCode = 222 },
};
}
}
public class Icd
{
public int ConditionCode { get; set; }
public string ConditionName { get; set; }
public static List<Icd> GetIcdList()
{
return new List<Icd>()
{
new Icd() { ConditionCode =111, ConditionName ="Condition 1"},
new Icd() { ConditionCode =222, ConditionName ="Condition 2"},
};
}
}
I would like for the user to be able to enter as many conditions as they want, and get a LINQ object back that tells them how many PatientIds satisfy that query. I've come up with:
List<string> stringFilteredList = new List<string> { "Condition 1", "Condition 2" };
List<int> filteringList = new List<int> { 111,222 };
var manyToMany = IcdPatient.GetIcdPatientList();
var icdList = Icd.GetIcdList();
/*Working method without joining on the lookup table*/
var grouped = from m in manyToMany
group m by m.PatientId into g
where g.Count() == filteringList.Distinct().Count()
select new
{
PatientId = g.Key,
Count = g.Count()
};
/*End*/
foreach (var item in grouped)
{
Console.WriteLine(item.PatientId);
}
Let's say that IcdPatient has a composite primary key on both fields, so we know that each row is unique. If we find the distinct number of entries in filteringList and do a count on the number of times a PatientId shows up, that means we've found all the people who have all conditions.
Because the codes can be esoteric, I would like to let the user type in the ConditionName from type Icd and perform the same operation. I've not used LINQ this way a lot, and this is what I've put together so far:
List<int> filteringList = new List<int> { 111,222 };
List<string> stringFilteredList= new List<string>{"Condition 1","Condition 2" };
filteringList.Distinct();
var manyToMany = IcdPatient.GetIcdPatientList();
var icdList = Icd.GetIcdList();
/*Working method without joining on the lookup table*/
var grouped = from m in manyToMany
join i in icdList on
m.ConditionCode equals i.ConditionCode
//group m by m.PatientId into g
group new {m,i} by new { m.ConditionCode }into g
where g.Count() == filteringList.Distinct().Count()
select new
{
Condition = g.Key.ConditionCode
};
/*End*/
but can't get anything to work. This is essentially a join on top of my first query, but I'm not getting what I need to group on.
You don't need to group anything in this case, just use a join and a contains:
List<string> stringFilteredList= new List<string>{"Condition 1","Condition 2" };
var patients =
from icd in Icd.GetIcdList()
join patient in IcdPatient.GetIcdPatientList() on icd.ConditionCode equals patient.ConditionCode
where stringFilteredList.Contains(icd.ConditionName)
select patient.PatientId;
Let's say that IcdPatient has a composite primary key on both fields, so we know that each row is unique. If we find the distinct number of entries in filteringList and do a count on the number of times a PatientId shows up, that means we've found all the people who have all conditions. Because the codes can be esoteric, I would like to let the user type in the ConditionName from type Icd and perform the same operation.
I believe you're asking:
Given a list of ConditionCodes, return a list of PatientIds where every patient has every condition in the list.
In that case, the easiest thing to do is group your IcdPatients table by Id, so that we can tell every condition that a patient has by looking once. Then we check that every ConditionCode we're looking for is in the group. In code, that looks like:
var result = IcdPatient.GetIcdPatientList()
// group up all the objects with the same PatientId
.GroupBy(patient => patient.PatientId)
// gather the information we care about into a single object: the patient Id and that patient's condition codes
.Select(patients => new {Id = patients.Key,
Conditions = patients.Select(p => p.ConditionCode)})
// keep only the patients who have every condition we are looking for
.Where(conditionsByPatient =>
filteringList.All(condition => conditionsByPatient.Conditions.Contains(condition)))
.Select(conditionsByPatient => conditionsByPatient.Id);
In query format, that looks like:
var groupedInfo = from patient in IcdPatient.GetIcdPatientList()
group patient by patient.PatientId
into patients
select new { Id = patients.Key,
Conditions = patients.Select(patient => patient.ConditionCode) };
var resultAlt = from g in groupedInfo
where filteringList.All(condition => g.Conditions.Contains(condition))
select g.Id;
Edit: If you'd also like to let your user specify the ConditionName rather than the ConditionCode, simply convert from one to the other and store the result in filteringList, like so:
var conditionNames = stringFilteredList; // the list of names entered by the user
var filteringList = Icd.GetIcdList().Where(icd => conditionNames.Contains(icd.ConditionName))
.Select(icd => icd.ConditionCode)
.ToList();
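Putting the pieces together, here is a self-contained sketch of the whole lookup: names are converted to codes, then only patients who have every requested code are kept. It uses the sample data from the question stored as tuples; the class name RelationalDivisionDemo and the method PatientsWithAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelationalDivisionDemo
{
    // Sample data from the question, stored as tuples (an assumption for brevity).
    static readonly (int PatientId, int ConditionCode)[] IcdPatients =
    {
        (100, 111), (100, 222), (200, 111), (200, 222), (3, 222)
    };

    static readonly (int ConditionCode, string ConditionName)[] Icds =
    {
        (111, "Condition 1"), (222, "Condition 2")
    };

    // Returns the ids of patients who have every condition named in conditionNames.
    public static List<int> PatientsWithAll(IEnumerable<string> conditionNames)
    {
        // Translate the user-facing names into the set of codes we require.
        var wanted = Icds.Where(i => conditionNames.Contains(i.ConditionName))
                         .Select(i => i.ConditionCode)
                         .ToHashSet();

        return IcdPatients
            .GroupBy(p => p.PatientId)
            // every wanted code must appear among this patient's rows
            .Where(g => wanted.All(code => g.Any(p => p.ConditionCode == code)))
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        var both = PatientsWithAll(new[] { "Condition 1", "Condition 2" });
        Console.WriteLine(string.Join(", ", both)); // 100, 200
    }
}
```

Note that an empty list of names matches every patient, because All is true for an empty set; guard against that if it is not what you want.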

LINQ: find items in a list that have frequency = 1

I'm struggling with the following task. Any suggestions would be greatly appreciated!
I have a list of Person objects like below:
public class Person {
    public string firstname { get; set; }
    public string lastname { get; set; }
    public string zipcode { get; set; }
    public string id { get; set; }
    public int freq = 1;
    public Person(...) {...}
}
List<Person> PersonList = new List<Person>(); //Gets populated with Person objects
I want to find all the people who have unique names within their zipcode.
So far, I've tried performing a frequency count on all the distinct combinations of (firstname, lastname, zipcode) and then selecting the combinations that have frequency = 1. However, I then lose all information about these peoples' IDs. I need a way to retain the original Person objects despite the grouping operation.
Below is the frequency count I mentioned above, but it isn't the result I'm looking for:
var QueryFreqAnalysis =
from p in PersonList
group p by new { p.firstname, p.lastname, p.zipcode } into g
select new {
fName = g.Key.firstname,
lName = g.Key.lastname,
zip3 = g.Key.zipcode,
freq = g.Sum(p => p.freq)
};
As I mentioned, even though I can now select groups within g that have freq = 1, I have lost all information about the Person IDs.
I hope I've made the problem clear. Thanks in advance for any suggestions!
from p in PersonList
// Group by name and zip
group p by new { p.firstname, p.lastname, p.zipcode } into g
// Only select those who have unique names within zipcode
where g.Count() == 1
// There is guaranteed to be one result per group: use it
let p = g.FirstOrDefault()
select new {
fName = p.firstname,
lName = p.lastname,
zip3 = p.zipcode,
id = p.id
}
I know you probably only need and want a LINQ answer :)
But I just had to write a non-LINQ one:
var dict = new Dictionary<string, Person>(PersonList.Count);
var uniqueList = new List<Person>();
foreach (var p in PersonList)
{
var key = p.firstname + p.lastname + p.zipcode;
if (!dict.ContainsKey(key))
dict.Add(key, p);
else
dict[key] = null;
}
foreach (var kval in dict)
{
if (kval.Value != null)
uniqueList.Add(kval.Value);
}
return uniqueList;
Using Hash Codes is also possible.
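One way to read the hash code remark is to use a value tuple as the key: tuples implement Equals and GetHashCode component by component, so they can be hashed directly and avoid the risk that two different people produce the same concatenated string (for example "ab" + "c" and "a" + "bc"). A minimal sketch that also keeps the original objects, and therefore their ids; the record PersonRecord and the class UniqueNameDemo are hypothetical stand-ins for the Person class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person class from the question.
public sealed record PersonRecord(string FirstName, string LastName, string Zipcode, string Id);

public static class UniqueNameDemo
{
    // Groups by a (first name, last name, zipcode) tuple and keeps the people
    // whose group has exactly one member, returning the original objects.
    public static List<PersonRecord> UniqueWithinZip(IEnumerable<PersonRecord> people) =>
        people.GroupBy(p => (p.FirstName, p.LastName, p.Zipcode))
              .Where(g => g.Count() == 1)
              .Select(g => g.Single())
              .ToList();

    public static void Main()
    {
        var people = new[]
        {
            new PersonRecord("Ann", "Lee", "12345", "a1"),
            new PersonRecord("Ann", "Lee", "12345", "a2"),
            new PersonRecord("Ann", "Lee", "99999", "a3"),
            new PersonRecord("Bob", "Ray", "12345", "b1")
        };

        foreach (var p in UniqueWithinZip(people))
            Console.WriteLine(p.Id); // a3, then b1
    }
}
```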
