Select rows based on group by counts - c#

I have a table (Students) with three columns:
StudentID - MotherID - FatherID
I'm having a hard time working out how to write a LINQ query that does the following:
I want to get back a list of all students with fewer than 'y' full siblings (same mother ID and same father ID) and fewer than 'z' half siblings (same father ID, different mother ID).
Using LINQ, I am able to get the correct rows based on half-sibling relationships, but not full-sibling relationships:
var c = studentsDT
.GroupBy(a => new { a.FatherID}).Where(grp => grp.Count() <= halfSiblings)
.SelectMany(grp => grp.Select(r => r))
.GroupBy(a => new { a.MotherID}).Where(grp1 => grp1.Count() <= fullSiblings)
.SelectMany(grp1 => grp1.Select(r1 => r1));
If table data looked like the following:
1 100 200
2 101 200
3 100 200
4 100 200
5 101 200
In the above data snippet, student 1 has two full siblings and two half siblings by father.
Student 2 has one full sibling and three half siblings by father.
If I wanted a list that only had students with no more than two full siblings and no more than 1 half sibling, how could this be achieved?

You're going to want a GroupJoin. Something like this:
from student in Students
join sibling in Students
on student.FatherID equals sibling.FatherID
into siblings
where
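// the join matches each student with itself, so exclude that row when counting
siblings.Count(s => s.StudentID != student.StudentID && s.MotherID == student.MotherID) < fullSiblingLimit &&
siblings.Count(s => s.MotherID != student.MotherID) < halfSiblingLimit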
select student
Note that you specified half siblings sharing a father and not a mother.
If your data set is very large, there is room to tweak the query for efficiency.

To get the number of full siblings, you need to specify two keys to group by:
var c = studentsDT
.GroupBy(a => new { a.FatherID, a.MotherID })
.Where(g => g.Count() <= fullSiblings)
.SelectMany(g => g)
.GroupBy(a => a.FatherID)
.Where(g => g.Count() <= halfSiblings)
.SelectMany(g => g);
Note that the second grouping counts everyone who shares a father, so full siblings (and the student) count toward halfSiblings as well; that is, it limits the size of the whole father group rather than the number of half siblings alone.
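A minimal in-memory sketch of counting the two kinds of siblings separately, using the sample rows from the question. The tuple shape, the variable names, and the limits are assumptions made for illustration; the limits are read as "no more than", and the half-sibling limit is set to 2 so that the sample data produces a non-empty result.

```csharp
using System;
using System.Linq;

// Sample rows from the question: (StudentID, MotherID, FatherID).
var students = new[]
{
    (StudentID: 1, MotherID: 100, FatherID: 200),
    (StudentID: 2, MotherID: 101, FatherID: 200),
    (StudentID: 3, MotherID: 100, FatherID: 200),
    (StudentID: 4, MotherID: 100, FatherID: 200),
    (StudentID: 5, MotherID: 101, FatherID: 200),
};

// Hypothetical limits, read as "no more than".
int maxFullSiblings = 2;
int maxHalfSiblings = 2;

// Index the students by father once so each lookup is cheap.
var byFather = students.ToLookup(s => s.FatherID);

var counted = students
    .Select(s => new
    {
        s.StudentID,
        // Same father and same mother, excluding the student itself.
        Full = byFather[s.FatherID].Count(o => o.MotherID == s.MotherID && o.StudentID != s.StudentID),
        // Same father, different mother.
        Half = byFather[s.FatherID].Count(o => o.MotherID != s.MotherID),
    })
    .ToList();

var selected = counted
    .Where(row => row.Full <= maxFullSiblings && row.Half <= maxHalfSiblings)
    .Select(row => row.StudentID)
    .ToList();

foreach (var row in counted)
    Console.WriteLine($"{row.StudentID}: full={row.Full}, half={row.Half}");
Console.WriteLine("Selected: " + string.Join(", ", selected));
```

With these limits, students 1, 3 and 4 are selected (two full and two half siblings each), while students 2 and 5 are dropped because they have three half siblings.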

Related

Why is GroupBy translated to ORDER BY by Entity Framework?

I am using Entity Framework in .NET 7.
I have 3 entities:
Course that contains a ProfessorId among other things
Grade that has a CourseId among other things
Professor
I want to get all the courses that are assigned to a professor and have at least one grade associated with them, grouped by semester into a Dictionary<string, List<CourseViewModel>> where the string key is the semester.
I have written the following LINQ query:
var professorGradedCourses = _dbContext.Courses
.Where(course => course.ProfessorId == professorId && course.Grades.Any())
.Select(course => new CourseViewModel
{
Title = course.Title,
Semester = course.Semester,
})
.GroupBy(course => course.Semester)
.OrderBy(course => course.Key)
.ToDictionary(group => group.Key, group => group.ToList());
When that executes I get an exception saying it can't be translated.
If I remove the OrderBy and keep only the GroupBy, it works and the translated SQL in Microsoft SQL Server is:
SELECT [c].[Semester], [c].[Title]
FROM [Courses] AS [c]
WHERE [c].[ProfessorId] = @__professorId_0
AND EXISTS (SELECT 1
FROM [Grades] AS [g]
WHERE [c].[Id] = [g].[CourseId])
ORDER BY [c].[Semester]
As you can see, it adds ORDER BY anyway, even though I removed the OrderBy() and kept only GroupBy(). Can someone explain why that is? What if I wanted to order by descending; would that be possible? The weird thing is that if I remove GroupBy(), keep only OrderBy(), and replace ToDictionary with ToList, it works and the exact same query is produced (only now I can't really use the results without further processing).
LINQ GroupBy :
Groups the elements of a sequence.
SQL GROUP BY :
A SELECT statement clause that divides the query result into groups of rows, usually by performing one or more aggregations on each group. The SELECT statement returns one row per group.
They aren't equivalent. The main difference is that LINQ GroupBy returns a collection of elements per key, whereas SQL GROUP BY returns ONE row per key.
If the projection asks for ONE value per key, then EF Core translates LINQ GroupBy to SQL GROUP BY:
// Get the number of course by semester
context
.Courses
.GroupBy(c => c.Semester)
.Select(cs => new { Semester = cs.Key, Count = cs.Count() })
.ToList();
Translated to :
SELECT [c].[Semester], COUNT(*) AS [Count]
FROM [Courses] AS [c]
GROUP BY [c].[Semester]
But if the projection asks for several elements per key, then EF Core translates LINQ GroupBy to SQL ORDER BY and does the grouping itself on the client.
context
.Courses
.Select(c => new { c.Id, c.Semester })
.GroupBy(c => c.Semester)
.ToDictionary(cs => cs.Key, cs => cs.ToList());
Translated to :
SELECT [c].[Semester], [c].[Id]
FROM [Courses] AS [c]
ORDER BY [c].[Semester]
If the result is:
Semester | Id
2023 S1 | 1
2023 S1 | 4
2023 S2 | 2
... | ...
Then EF Core reads it like this:
Read the first row: Semester is "2023 S1".
There is no group yet,
so create a group and add the row to it.
Read the second row: Semester is "2023 S1".
The key is the same as the previous row's,
so add the row to the current group.
Read the third row: Semester is "2023 S2".
The key differs from the previous row's,
so create a new group and add the row to it.
And so on...
This is why the sort matters: rows with the same key arrive next to each other.
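The reading steps above can be sketched in plain C#. This is only an illustration of grouping rows that arrive already sorted by key, not EF Core's actual implementation; the rows and names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Rows as they might arrive from "ORDER BY Semester": (Semester, Id).
var rows = new[]
{
    (Semester: "2023 S1", Id: 1),
    (Semester: "2023 S1", Id: 4),
    (Semester: "2023 S2", Id: 2),
};

var groups = new List<(string Key, List<int> Ids)>();

foreach (var row in rows)
{
    // Start a new group when there is none yet or when the key changed.
    if (groups.Count == 0 || groups[^1].Key != row.Semester)
        groups.Add((row.Semester, new List<int>()));

    // The row always goes into the current (last) group.
    groups[^1].Ids.Add(row.Id);
}

foreach (var g in groups)
    Console.WriteLine($"{g.Key}: {string.Join(", ", g.Ids)}");
```

Because only the previous key is compared, this single pass works only when the input is sorted by the grouping key, which is what the ORDER BY provides.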
As for the error, I don't know why EF Core can't translate this. The query looks legitimate; maybe it simply isn't implemented yet.
As for what you are trying to do, converting a sorted sequence of groups to a dictionary is odd, because a Dictionary has no defined order. The query sorts the elements and then puts them into an unordered container.
If a Dictionary seems sorted, it's a coincidence, not a feature. Internally, the dictionary places entries by the key's hash code, and the order in which they come back when enumerating is not guaranteed.
If you want a sorted dictionary, you can use SortedDictionary. But it can be tricky if you need a custom sort rule, like:
context
.Courses
.Select(c => new { c.Id, c.Semester })
.GroupBy(c => c.Semester)
.ToImmutableSortedDictionary(cs => cs.Key, cs => cs.ToList(), new ReverseComparer<string>());
public class ReverseComparer<T> : IComparer<T>
{
private IComparer<T> _comparer = Comparer<T>.Default;
public int Compare(T? x, T? y)
{
return _comparer.Compare(y, x); // swap the arguments to reverse the order
}
}
The exception you are encountering is most likely due to the fact that Entity Framework cannot translate an OrderBy applied to the groups produced by GroupBy into SQL. Without the OrderBy, the query ends in the GroupBy, which is why it works when you remove it and keep only the GroupBy clause.
However, if you want the entries in descending order, you can call the Reverse method on the ToDictionary result:
var professorGradedCourses = _dbContext.Courses
.Where(course => course.ProfessorId == professorId && course.Grades.Any())
.Select(course => new CourseViewModel
{
Title = course.Title,
Semester = course.Semester,
})
.GroupBy(course => course.Semester)
// no OrderByDescending here: it fails to translate, and Reverse below flips the order
.ToDictionary(group => group.Key,
group => group.ToList())
.Reverse();
This way, the entries are enumerated in reverse order. Note that Dictionary does not guarantee an enumeration order, so a SortedDictionary is the safer choice if the order matters.
Give this a try and let me know how it works for you.
EDIT:
Converting the IEnumerable back to a Dictionary should work like this:
var professorGradedCourses = _dbContext.Courses
.Where(course => course.ProfessorId == professorId && course.Grades.Any())
.Select(course => new CourseViewModel
{
Title = course.Title,
Semester = course.Semester,
})
.GroupBy(course => course.Semester)
// no OrderByDescending here: it fails to translate, and Reverse below flips the order
.ToDictionary(group => group.Key,
group => group.ToList())
.Reverse()
.ToDictionary(pair => pair.Key,
pair => pair.Value);

Use Linq to filter data from another table

I have a MovieReviewsDataContext context that includes a Movie and Review table.
I would like to retrieve the top 15 movies that have at least two reviews and that have an average review better than the average movie reviews.
The code I was able to write:
var topMovies = movieReviewsDataContext.Movies
.Where(m => m.Reviews.Count >= 2)
.Where(m => m.Reviews.Average(r => r.rating) >
movieReviewsDataContext.Reviews.Average(r => r.rating))
.Take(15);
checks only if the average rating for the movie is higher than the global average. How do I change it to compare it to the average of all the movies' average?
Assuming you have a
public virtual Movie Movie {get;set;} property in the Review entity.
If you don't have that property, you probably have something else (an int MovieId, for example) that identifies the review's movie, which you can use in the GroupBy clause instead.
var averageOfMovieAverages = movieReviewsDataContext.Reviews
.GroupBy(x => x.Movie.Id)
//.Where(x => x.Count() > 2) if you need this clause
.Select(x => x.Average(m => m.rating))
.Average();
which you can use in your query (you can also do this directly in your query, but I just created a variable for readability).
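To see how the average of per-movie averages differs from the global average, here is a small in-memory sketch. The review data and the tuple shape are invented for the example; a real LINQ-to-SQL query would run against the Reviews table instead.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory reviews: (MovieId, Rating).
var reviews = new[]
{
    (MovieId: 1, Rating: 5.0),
    (MovieId: 1, Rating: 3.0),
    (MovieId: 2, Rating: 2.0),
    (MovieId: 2, Rating: 2.0),
    (MovieId: 2, Rating: 2.0),
    (MovieId: 3, Rating: 4.0),
};

// One row per movie with its review count and average rating.
var perMovie = reviews
    .GroupBy(r => r.MovieId)
    .Select(g => new { MovieId = g.Key, Count = g.Count(), Average = g.Average(r => r.Rating) })
    .ToList();

// Every movie weighs the same here, however many reviews it has.
double averageOfMovieAverages = perMovie.Average(m => m.Average);

// Every review weighs the same here, so heavily reviewed movies dominate.
double globalAverage = reviews.Average(r => r.Rating);

var topMovies = perMovie
    .Where(m => m.Count >= 2 && m.Average > averageOfMovieAverages)
    .OrderByDescending(m => m.Average)
    .Take(15)
    .Select(m => m.MovieId)
    .ToList();

Console.WriteLine($"average of averages = {averageOfMovieAverages:F2}, global = {globalAverage:F2}");
Console.WriteLine("top movies: " + string.Join(", ", topMovies));
```

In this data the per-movie averages are 4, 2 and 4, so their mean is 10/3, while the global average over all six reviews is 3. Only movie 1 has at least two reviews and beats the average of averages.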

Retrieve a next record in a list ordered by DateTime

Consider the following list of dates
ID NAME DATE
1 Mary 01-01-1901
2 Mary 01-01-1901
3 Mary 01-01-1901
4 Mary 01-01-1901
5 Lucy 01-01-1951
6 Peter 01-01-1961
The above is the list ORDERED BY DATE; the rows are not stored in that order in the database.
I am trying to fetch the next record in the list BY DATE. What I am doing is, retrieving the list of
persons from the database, then ordering by Date and then by ID. What is happening is that the
IDs returned are always the same:
Next Record -> 2 -> Next Record -> 1 -> Next record 2
and so on... It seems I am stuck on the first 2 records. I am using LINQ-to-SQL
Below is the code I am using to achieve this
string newID = dx.Persons
.AsEnumerable()
.ToList()
.Where(
x => x.DOB.CompareTo(newConvertedValue) == 0 && x.Id > currentID
||
x.DOB.CompareTo(newConvertedValue) > 0 && x.Id != currentID)
.OrderBy(x => x.DOB)
.ThenBy(x => x.Id)
.Select(x => x.Id.ToString(CultureInfo.InvariantCulture))
.First();
To me it makes no sense to order the list after specifying the Where conditions, but when I tried to modify
the statement I got a "sequence contains no elements" exception.
Please note that the variables stated above have the following meanings:
newConvertedValue = date of birth of the currently displayed record
currentID = ID of the currently displayed record.
I have tried using different solutions but I cannot seem to find a way to solve this.
I have implemented similar methods to sort and fetch next records for columns containing integers and strings,
which are working fine (Also thanks to the contribution from this great website).
But this column containing Dates is giving me a hard time.
Thanks for reading.
Use
var Ids = dx.Persons
.AsEnumerable()
.ToList()
.Where(x => x.DOB.CompareTo(newConvertedValue) == 0 && x.Id > currentID
||
x.DOB.CompareTo(newConvertedValue) > 0 && x.Id != currentID)
.OrderBy(x => x.DOB)
.ThenBy(x => x.Id)
.Select(x => new { ID = x.Id, Date = x.DOB })
.ToList();
Ids.ForEach(x => Console.WriteLine(x.ID));
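For comparison, a self-contained sketch of "next record by date, then by ID" over the sample data from the question. NextId is a hypothetical helper, and the in-memory array stands in for dx.Persons. If a loop like this still cycles between the first records in the real application, one thing worth checking is whether newConvertedValue is exactly equal to the stored DOB (for example, no stray time component).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question: (Id, Name, DOB).
var people = new[]
{
    (Id: 1, Name: "Mary", DOB: new DateTime(1901, 1, 1)),
    (Id: 2, Name: "Mary", DOB: new DateTime(1901, 1, 1)),
    (Id: 3, Name: "Mary", DOB: new DateTime(1901, 1, 1)),
    (Id: 4, Name: "Mary", DOB: new DateTime(1901, 1, 1)),
    (Id: 5, Name: "Lucy", DOB: new DateTime(1951, 1, 1)),
    (Id: 6, Name: "Peter", DOB: new DateTime(1961, 1, 1)),
};

// Hypothetical helper: ID of the record that follows (currentDob, currentId)
// in (DOB, Id) order, or null when the current record is the last one.
int? NextId(DateTime currentDob, int currentId) =>
    people
        .Where(p => p.DOB > currentDob || (p.DOB == currentDob && p.Id > currentId))
        .OrderBy(p => p.DOB)
        .ThenBy(p => p.Id)
        .Select(p => (int?)p.Id)
        .FirstOrDefault();

// Walk the whole list starting from the first record.
var current = people.OrderBy(p => p.DOB).ThenBy(p => p.Id).First();
var visited = new List<int> { current.Id };

while (NextId(current.DOB, current.Id) is int next)
{
    current = people.First(p => p.Id == next);
    visited.Add(current.Id);
}

Console.WriteLine(string.Join(" -> ", visited));
```

Using FirstOrDefault on a nullable projection avoids the "sequence contains no elements" exception when the current record is already the last one.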

linq, group by and count

I have a list of Unions with several members and another table with page hits by Union.
I need a report that lists each union, the number of each type of member, and the clicks (page views).
Clicks are in a separate table with a date and a unionID.
This is close but doesn't give the results I want. The count shows the total number of members but not the total number of clicks. How do I get the total clicks (count the records in the clicks table that match the unionID)?
(from c in db.view_Members_Details
join h in db.tbl_Clicks on c.unionID equals h.unionID
group c by new { c.UnionName, h.unionID } into g
select new
{
TotalClicks = g.Count(),
UnionName = g.Key.UnionName,
userTypeID1 = g.Where(x => x.UnionName.Equals(g.Key.UnionName) && x.userTypeID.Equals(1)).Count(),
userTypeID2= g.Where(x => x.UnionName.Equals(g.Key.UnionName) && x.userTypeID.Equals(2)).Count(),
userTypeID3= g.Where(x => x.UnionName.Equals(g.Key.UnionName) && x.userTypeID.Equals(3)).Count(),
}).ToList();
results should be:
Clicks Count | Union Name | userTypeID1 Count | userTypeID2 Count | userTypeID3 Count |
I don't think you need the first condition in your WHERE, because you're already grouping on the UnionName.
g.Count(x => x.userTypeID == 3) //etc for 1, 2, 3
As for the clicks, try the following:
TotalClicks = g.Count(x => x.unionID == g.Key.unionID)
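One thing to keep in mind is that an inner join between members and clicks yields one row per member/click pair, so any count taken over the joined group mixes the two. A hedged in-memory sketch that counts clicks separately from members follows; the arrays and field names are stand-ins for view_Members_Details and tbl_Clicks, not the real schema.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for view_Members_Details and tbl_Clicks.
var members = new[]
{
    (UnionID: 1, UnionName: "Alpha", UserTypeID: 1),
    (UnionID: 1, UnionName: "Alpha", UserTypeID: 2),
    (UnionID: 1, UnionName: "Alpha", UserTypeID: 2),
    (UnionID: 2, UnionName: "Beta", UserTypeID: 3),
};
var clicks = new[]
{
    (UnionID: 1, Date: new DateTime(2020, 1, 1)),
    (UnionID: 1, Date: new DateTime(2020, 1, 2)),
    (UnionID: 2, Date: new DateTime(2020, 1, 3)),
    (UnionID: 2, Date: new DateTime(2020, 1, 4)),
    (UnionID: 2, Date: new DateTime(2020, 1, 5)),
};

// Index the clicks by union once, outside the member grouping.
var clicksByUnion = clicks.ToLookup(c => c.UnionID);

var report = members
    .GroupBy(m => new { m.UnionID, m.UnionName })
    .Select(g => new
    {
        TotalClicks = clicksByUnion[g.Key.UnionID].Count(),
        g.Key.UnionName,
        UserTypeID1 = g.Count(m => m.UserTypeID == 1),
        UserTypeID2 = g.Count(m => m.UserTypeID == 2),
        UserTypeID3 = g.Count(m => m.UserTypeID == 3),
    })
    .OrderBy(r => r.UnionName)
    .ToList();

foreach (var r in report)
    Console.WriteLine($"{r.TotalClicks} | {r.UnionName} | {r.UserTypeID1} | {r.UserTypeID2} | {r.UserTypeID3}");
```

Here "Alpha" has 3 members and 2 clicks; a count over the joined rows would report 6 for it, whereas the lookup reports 2 clicks and the member counts stay at 1, 2 and 0.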

How do I .OrderBy() and .Take(x) this LINQ query?

The LINQ query below is working fine but I need to tweak it a bit.
I want all the records in the file grouped by recordId (a customer number) and then ordered by, in descending order, the date. I'm getting the grouping and the dates are in descending order. Now, here comes the tweaking.
I want the groups to be sorted, in ascending order, by recordId. Currently, the groups are sorted by the date, or so it seems. I tried adding a .OrderBy after the .GroupBy and couldn't get that to work at all.
Last, I want to .take(x) records where x is dependent on some other factors. Basically, the .take(x) will return the most-recent x records. I tried placing a .take(x) in various places and I wasn't getting the correct results.
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = tokens[13],
date = Convert.ToDateTime(tokens[17])
}
)
.OrderByDescending (m => m.date)
.GroupBy (m => m.recordId)
.Dump();
Edit #1 -
recordId is not unique. There may / will likely be multiple records with the same recordId. recordId is actually a customer number.
The output will be a result set with first name, last name, date, and recordId. Depending on several factors, there may be 1 to 5 records returned for each recordId.
Edit #2 -
The .Take(x) is for the recordId. Each recordId may have multiple rows. For now, let's assume I want the most recent date for each recordId. (select top(1) when sorted by date descending)
Edit #3 -
The following query generates the following results. Note that each recordId only produces 1 row in the output (this is okay) and it appears to be the most recent date. I haven't thoroughly checked this yet.
Now, how do I sort, in ascending order, by recordId?
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = Convert.ToInt32(tokens[13]),
date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy (m => m.recordId)
.OrderByDescending (m => m.Max (x => x.date ) )
.Select (m => m.First () )
.Dump();
FirstName LastName recordId date
X X 2531334 3/11/2011 12:00:00 AM
X X 1443809 10/18/2001 12:00:00 AM
X X 2570897 3/10/2011 12:00:00 AM
X X 1960526 3/10/2011 12:00:00 AM
X X 2475293 3/10/2011 12:00:00 AM
X X 2601783 3/10/2011 12:00:00 AM
X X 2581844 3/6/2011 12:00:00 AM
X X 1773430 3/3/2011 12:00:00 AM
X X 1723271 2/4/2003 12:00:00 AM
X X 1341886 2/28/2011 12:00:00 AM
X X 1427818 11/15/1986 12:00:00 AM
You can't easily order by a field that is not part of the group-by key. You get a list for each group, which here means a list of dates for each recordId.
You could order by Max(date) or Min(date).
Or you could group by recordId and date, and order by date.
order by most recent date:
.GroupBy (m => m.recordId)
// take the most recent date in the group
.OrderByDescending (m => m.Max(x => x.date))
.Select(x => x.OrderByDescending(r => r.date).First())
The Take part is another question. You could just add Take(x) to the expression, then you get this number of groups.
Edit:
For a kind of select top(1):
.GroupBy (m => m.recordId)
// take the most recent date in the group
.OrderByDescending (m => m.Max(x => x.date))
// take the most recent record of each group
.Select(x => x.OrderByDescending(r => r.date).First())
// you got the most recent record of each recordId
// and you can take a certain number of it.
.Take(x);
Snippet I had before in my answer; you won't need it according to your question as it is now:
// create a separate group for each unique date and recordId
.GroupBy (m => new { m.date, m.recordId })
.OrderByDescending (m => m.Key.date)
This seems very similar to your other question - Reading a delimted file using LINQ
I don't believe you want to use Group here at all - I believe instead that you want to use OrderBy and ThenBy - something like:
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = tokens[13],
date = Convert.ToDateTime(tokens[17])
}
)
.OrderBy (m => m.recordId)
.ThenByDescending (m => m.date)
.Dump();
For a simple Take... you can just add this .Take(N) just before the Dump()
However, I'm not sure this is what you are looking for? Can you clarify your question?
just add
.OrderBy( g=> g.Key);
after your grouping. This will order your groupings by RecordId ascending.
Last, I want to .take(x) records where
x is dependent on some other factors.
Basically, the .take(x) will return
the most-recent x records.
If you mean by "the most recent" by date, why would you want to group by RecordId in the first place - just order by date descending:
..
.OrderByDescending (m => m.date)
.Take(x)
.Dump();
If you just want to get the top x records in the order established by the grouping though you could do the following:
...
.GroupBy (m => m.recordId)
.SelectMany(s => s)
.Take(x)
.Dump();
If you want something like the first 3 for each group, then I think you need to use a nested query like:
var recipients = File.ReadAllLines(path)
.Select(record => record.Split('|'))
.Select(tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
RecordId = tokens[13],
Date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy(m => m.RecordId)
.Select(grouped => new
{
Id = grouped.Key,
First3 = grouped.OrderByDescending(x => x.Date).Take(3)
})
.Dump();
and if you want this flattened into a record list then you can use SelectMany:
var recipients = File.ReadAllLines(path)
.Select(record => record.Split('|'))
.Select(tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
RecordId = tokens[13],
Date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy(m => m.RecordId)
.Select(grouped => grouped.OrderByDescending(x => x.Date).Take(3))
.SelectMany(item => item)
.Dump();
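Putting the pieces from this thread together, here is a runnable sketch that sorts the groups by recordId ascending and keeps the most recent x records within each group. The three-field line layout, the sample lines, and the perGroup value are assumptions for the example; the real file has more columns and is read with File.ReadAllLines.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical pipe-delimited lines with only three fields: recordId|name|date.
var lines = new[]
{
    "20|Ann|2011-03-10",
    "10|Bob|2011-03-11",
    "20|Ann|2011-03-06",
    "10|Bob|2001-10-18",
    "20|Ann|2011-02-28",
};

int perGroup = 2; // hypothetical value of "x"

var recipients = lines
    .Select(line => line.Split('|'))
    .Select(t => new
    {
        RecordId = int.Parse(t[0], CultureInfo.InvariantCulture),
        Name = t[1],
        Date = DateTime.ParseExact(t[2], "yyyy-MM-dd", CultureInfo.InvariantCulture),
    })
    .GroupBy(r => r.RecordId)
    // Groups in ascending recordId order.
    .OrderBy(g => g.Key)
    // Within each group, newest first, then keep at most perGroup rows.
    .SelectMany(g => g.OrderByDescending(r => r.Date).Take(perGroup))
    .ToList();

foreach (var r in recipients)
    Console.WriteLine($"{r.RecordId} {r.Name} {r.Date:yyyy-MM-dd}");
```

Parsing recordId as an integer matters for the ordering: as strings, "100" would sort before "20".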
