Select records with max property value per group - c#

I've got a data set like this:
GroupName GroupValue MemberName MemberValue
'Group1' 1 'Member1' 1
'Group1' 1 'Member2' 2
'Group2' 2 'Member3' 3
'Group2' 2 'Member4' 2
'Group3' 2 'Member5' 4
'Group3' 2 'Member6' 1
What I want to select is the rows that have the maximum MemberValue per GroupName, but only for those GroupNames that have the largest GroupValue, and pass them into a delegate function. Like this:
'Group2' 2 'Member3' 3
'Group3' 2 'Member5' 4
So far I've tried this format...
data.Where(maxGroupValue =>
maxGroupValue.GroupValue == data.Max(groupValue => groupValue.GroupValue))
.Select(FunctionThatTakesData)
...but that just gives me every member of Group2 and Group3. I've tried putting a GroupBy() before the Select(), but that turns the output into an IGrouping<string, DataType> so FunctionThatTakesData() doesn't know what to do with it, and I can't do another Where() to filter out only the maximum MemberValues.
What can I do to get this data set properly filtered and passed into my function?

You can do that with the following Linq.
var results = data.GroupBy(r => r.GroupValue)
.OrderByDescending(g => g.Key)
.FirstOrDefault()
?.GroupBy(r => r.GroupName)
.Select(g => g.OrderByDescending(r => r.MemberValue).First());
First, group on GroupValue, order the groups in descending order by Key (which is the GroupValue), and take the first one. You now have all the rows with the maximum GroupValue. Then group those rows on GroupName and, within each group, order by MemberValue descending and take the First row, which is the row with the maximum MemberValue for that GroupName. I'm also using the C# 6 null-conditional operator ?. after FirstOrDefault in case data is empty; in that case results is null. If you're not using C# 6, you'll need to handle the empty case up front, and then you can just use First instead.
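To check the query against the sample data from the question, here is a minimal, self-contained sketch. The Row record and the TopGroupDemo class are hypothetical names (the question does not show its actual DataType), and the record syntax assumes C# 9 or later. Instead of the null-conditional operator, this version returns an empty list for empty input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the question does not show its actual DataType.
public record Row(string GroupName, int GroupValue, string MemberName, int MemberValue);

public static class TopGroupDemo
{
    // The six sample rows from the question.
    public static List<Row> SampleData() => new List<Row>
    {
        new Row("Group1", 1, "Member1", 1),
        new Row("Group1", 1, "Member2", 2),
        new Row("Group2", 2, "Member3", 3),
        new Row("Group2", 2, "Member4", 2),
        new Row("Group3", 2, "Member5", 4),
        new Row("Group3", 2, "Member6", 1),
    };

    // Keeps only the rows whose GroupValue is the maximum, then picks the row
    // with the largest MemberValue for each GroupName. Empty input yields an
    // empty list instead of null.
    public static List<Row> TopMembers(IEnumerable<Row> data)
    {
        var topRows = data.GroupBy(r => r.GroupValue)
                          .OrderByDescending(g => g.Key)
                          .FirstOrDefault();
        if (topRows == null) return new List<Row>();

        return topRows.GroupBy(r => r.GroupName)
                      .Select(g => g.OrderByDescending(r => r.MemberValue).First())
                      .ToList();
    }

    public static void Main()
    {
        foreach (var row in TopMembers(SampleData()))
            Console.WriteLine($"{row.GroupName} {row.GroupValue} {row.MemberName} {row.MemberValue}");
        // Group2 2 Member3 3
        // Group3 2 Member5 4
    }
}
```

Because the result is materialized with ToList(), it can be passed on with .Select(FunctionThatTakesData) as in the question.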

So basically what you want, is to divide your data elements into groups with the same value for GroupName. From every group you want to take one element, namely the one with the largest value for property MemberValue.
Whenever you have a sequence of items and you want to divide it into groups based on the value of one or more properties of those items, you use Enumerable.GroupBy.
GroupBy takes your sequence as input, plus one extra parameter: a function that selects the property (or properties) of each item that decides which group the item ends up in.
In your case, you want to divide your sequence into groups where all elements in a group have the same GroupName.
var groups = mySequence.GroupBy(element => element.GroupName);
This takes the GroupName property from every element in mySequence and puts the element into the group of elements that share that GroupName value.
Using your example data, you'll have three groups:
The group with all elements with GroupName == "Group1". The first two elements of your sequence will be in this group
The group with all elements with GroupName == "Group2". The third and fourth element of your sequence will be in this group
The group with all elements with GroupName == "Group3". The last two elements of your sequence will be in this group
Each group has a property Key, containing your selection value. This key identifies the group and is guaranteed to be unique within your collection of groups. So you'll have a group with Key == "Group1", a group with Key == "Group2", etc.
Besides the Key, every group is a sequence of the elements in the group (note: the group IS an enumerable sequence, not: it HAS an enumerable sequence).
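The point that a group is itself a sequence can be seen in a small sketch. The GroupingDemo class, the Describe helper and the tuple element names are made up for illustration; value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Formats each group as "Key: member, member", showing that a group
    // exposes a Key and can be enumerated directly like any other sequence.
    public static List<string> Describe(IEnumerable<(string GroupName, string MemberName)> rows)
    {
        var lines = new List<string>();
        foreach (IGrouping<string, (string GroupName, string MemberName)> group
                 in rows.GroupBy(r => r.GroupName))
        {
            // 'group' itself is the IEnumerable; there is no separate items property.
            var members = group.Select(r => r.MemberName);
            lines.Add($"{group.Key}: {string.Join(", ", members)}");
        }
        return lines;
    }

    public static void Main()
    {
        var rows = new[]
        {
            ("Group1", "Member1"), ("Group1", "Member2"),
            ("Group2", "Member3"), ("Group2", "Member4"),
        };
        foreach (var line in Describe(rows))
            Console.WriteLine(line);
        // Group1: Member1, Member2
        // Group2: Member3, Member4
    }
}
```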
Your second step would be to take from every group the element in the group with the largest value for MemberValue. For this you would order the elements in the group by descending value for property MemberValue and take the first one.
var myResult = mySequence.GroupBy(element => element.GroupName)
    // intermediate result: groups where all elements have the same GroupName
    .Select(group => group.OrderByDescending(groupElement => groupElement.MemberValue)
        // intermediate result: the elements of one group, ordered by descending MemberValue
        .First());
Result: from every group ordered by descending memberValue, take the first element, which should be the largest one.
It is not very efficient to order the complete group if you only want the element with the largest value for MemberValue. The answer for this can be found here on StackOverflow.
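One way to avoid sorting each group, shown here only as a sketch and not necessarily what the linked answer proposes, is Enumerable.Aggregate, which walks each group once and keeps the better of two elements. The Row record and MaxWithoutSortDemo class are hypothetical names; records assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for illustration.
public record Row(string GroupName, string MemberName, int MemberValue);

public static class MaxWithoutSortDemo
{
    // One pass over each group: keep whichever element has the larger
    // MemberValue. On ties the earlier element wins. Groups produced by
    // GroupBy are never empty, so the seedless Aggregate overload is safe.
    public static List<Row> MaxPerGroup(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.GroupName)
            .Select(g => g.Aggregate((best, next) =>
                next.MemberValue > best.MemberValue ? next : best))
            .ToList();

    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row("Group2", "Member3", 3),
            new Row("Group2", "Member4", 2),
            new Row("Group3", "Member5", 4),
            new Row("Group3", "Member6", 1),
        };
        foreach (var row in MaxPerGroup(rows))
            Console.WriteLine($"{row.GroupName} {row.MemberName} {row.MemberValue}");
        // Group2 Member3 3
        // Group3 Member5 4
    }
}
```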

The easier way to solve this problem is to use the new (.NET 6) MaxBy LINQ operator, along with the GroupBy and Select operators:
IEnumerable<Record> query = records
.GroupBy(x => x.GroupName)
.Select(g => g.MaxBy(x => x.MemberValue));
This is an easy but not memory-efficient solution. The reason is that it generates a full-blown Lookup<TKey, TSource> structure under the hood, which is a dictionary-like container that contains all the records associated with each key. This structure is generated before starting to compare the elements contained in each grouping, in order to select the maximum element.
In most cases this inefficiency is not a problem, because the records are not that many, and they are already stored in memory. But if you have a truly deferred enumerable sequence that contains a humongous number of elements, you might run out of memory. In this case you could use the GroupMaxBy operator below. This operator stores in memory only the current maximum element per key:
/// <summary>
/// Groups the elements of a sequence according to a specified key selector
/// function, and then returns the maximum element in each group according to
/// a specified value selector function.
/// </summary>
public static IEnumerable<TSource> GroupMaxBy<TSource, TKey, TValue>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TValue> valueSelector,
IEqualityComparer<TKey> keyComparer = default,
IComparer<TValue> valueComparer = default)
{
// Arguments validation omitted
valueComparer ??= Comparer<TValue>.Default;
var dictionary = new Dictionary<TKey, (TSource Item, TValue Value)>(keyComparer);
foreach (var item in source)
{
var key = keySelector(item);
var value = valueSelector(item);
if (dictionary.TryGetValue(key, out var existing) &&
valueComparer.Compare(existing.Value, value) >= 0) continue;
dictionary[key] = (item, value);
}
foreach (var entry in dictionary.Values)
yield return entry.Item;
}
Usage example:
IEnumerable<Record> query = records
.GroupMaxBy(x => x.GroupName, x => x.MemberValue);
The reverse GroupMinBy can be implemented similarly by replacing the >= with <=.
Below is a demonstration of the difference in memory-efficiency between the two approaches:
var source = Enumerable.Range(1, 1_000_000);
{
var mem0 = GC.GetTotalAllocatedBytes(true);
source.GroupBy(x => x % 1000).Select(g => g.MaxBy(x => x % 3333)).Count();
var mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
{
var mem0 = GC.GetTotalAllocatedBytes(true);
source.GroupMaxBy(x => x % 1000, x => x % 3333).Count();
var mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
Output:
Allocated: 8,571,168 bytes
Allocated: 104,144 bytes
Try it on Fiddle.

Related

linq groupby and max count

I have a class like
public class Test
{
public string name;
public int status;
}
Example data
new Test("Name1", 1);
new Test("Name2", 2);
new Test("Name3", 3);
new Test("Name4", 1);
new Test("Name5", 2);
new Test("Name6", 2);
new Test("Name7", 3);
I'm looking for some linq to return the value 2 - which is the status that occurs the most.
Currently I have the following which is not correct.
var status = listTest.GroupBy(x => x.status).Select(x => x.OrderByDescending(t => t.status).First()).FirstOrDefault().status;
But hoping there is something cleaner?
I think this is what you want
You need to order the groups themselves, not what is in each group.
var status = listTest
.GroupBy(x => x.status)
.OrderByDescending(g => g.Count())
.FirstOrDefault()?.Key;
You can group and pick the top after sorting them descending
var value = list.GroupBy(q => q.status)
.OrderByDescending(gp => gp.Count())
.First().Key;
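For reference, a self-contained sketch of the group-and-order approach from the two answers above, run on the question's sample data. The constructor on Test and the MostFrequentDemo class are additions made for this illustration; the question only shows the two fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Test
{
    public string name;
    public int status;

    // Constructor assumed for this sketch; the question only shows the fields.
    public Test(string name, int status) { this.name = name; this.status = status; }
}

public static class MostFrequentDemo
{
    // The seven sample objects from the question.
    public static List<Test> SampleData() => new List<Test>
    {
        new Test("Name1", 1), new Test("Name2", 2), new Test("Name3", 3),
        new Test("Name4", 1), new Test("Name5", 2), new Test("Name6", 2),
        new Test("Name7", 3),
    };

    // Orders the groups (not their contents) by size and returns the key of
    // the largest one, or null when the input is empty.
    public static int? MostFrequentStatus(IEnumerable<Test> tests) =>
        tests.GroupBy(t => t.status)
             .OrderByDescending(g => g.Count())
             .FirstOrDefault()?.Key;

    public static void Main()
    {
        Console.WriteLine(MostFrequentStatus(SampleData())); // 2
    }
}
```

Status 2 occurs three times in the sample, while 1 and 3 occur twice each, so the result is 2.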
Requirement: Given a sequence of objects of class Test, where every Test has an int property Status, give me the value of Status that occurs the most.
For this, make groups of Test objects that have the same value for property Status. Count the number of elements in each group. Order the result such that the group with the largest count comes first, and take the first element.
IEnumerable<Test> testSequence = ...
var statusThatOccursMost = testSequence
// make Groups of Tests that have the same value for Status:
.GroupBy(test => test.Status,
// parameter resultSelector: for every occurring Status value and all
// Tests with this common status value, make one new object,
// containing the common Status value and the number of Tests that have
// this common Status value
(commonStatusValue, testsThatHaveThisCommonStatusValue) => new
{
Status = commonStatusValue,
Count = testsThatHaveThisCommonStatusValue.Count(),
})
Result: a sequence of [Status, Count] combinations. The Status occurs at least once in testSequence. Count is the number of times that Status occurs. So we know, that Count is >= 1.
Order this sequence of [Status, Count] combinations by descending value of Count, so the first element is the one with the largest value for Count:
.OrderByDescending(statusCountCombination => statusCountCombination.Count)
Result: a sequence of [Status, Count] combinations, where the combination with the largest value of Count comes first.
Extract the value of Status from the combination, and take the first one:
.Select(statusCountCombination => statusCountCombination.Status)
.FirstOrDefault();
Optimization
Although this LINQ is fairly simple, it is not very efficient to count all Status values and order all StatusCount combinations, if you only want the one that has the largest value for Count.
Consider creating an extension method. If you are not familiar with extension methods, read Extension Methods Demystified.
Make a Dictionary: the key is the Status, the value is the number of times this Status has occurred. Then take the Status with the largest count.
public static int ToMostOccurringStatusValueOrDefault(
this IEnumerable<Test> testSequence)
{
// return default if testSequence is empty
if (!testSequence.Any()) return 0;
Dictionary<int, int> statusCountCombinations = new Dictionary<int, int>();
foreach (Test test in testSequence)
{
if (statusCountCombinations.TryGetValue(test.Status, out int count))
{
// Status value already in dictionary: increase count:
statusCountCombinations[test.Status] = count + 1;
}
else
{
// Status value not in dictionary yet. Add with count 1
statusCountCombinations.Add(test.Status, 1);
}
}
GroupBy works similarly to the above, except that it first makes a Dictionary where every Value is a list of Tests. Then it counts the number of Tests and throws away the list. In the extension method we don't have to make the List.
Continuing the extension method: find the KeyValuePair that has the largest Value. We can use Enumerable.Aggregate, or enumerate:
using (var enumerator = statusCountCombinations.GetEnumerator())
{
// we know there is at least one element
enumerator.MoveNext();
// the first element is the largest until now:
KeyValuePair<int, int> largest = enumerator.Current;
// enumerate the rest:
while (enumerator.MoveNext())
{
if (enumerator.Current.Value > largest.Value)
{
// found a new largest one
largest = enumerator.Current;
}
}
return largest.Key;
}
}
In this method we only have to enumerate testSequence once, and the Dictionary once. If you used LINQ GroupBy / OrderByDescending, the result of GroupBy would be enumerated several times.
Usage:
IEnumerable<Test> testSequence = ...
var mostCommonStatus = testSequence.ToMostOccurringStatusValueOrDefault();
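The Enumerable.Aggregate alternative mentioned above could look roughly like the following sketch. To stay self-contained it works on a plain sequence of int status values rather than on Test objects; the StatusExtensions class and the MostOccurringStatusOrDefault name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatusExtensions
{
    // Counts every status in one pass, then uses Aggregate to find the
    // dictionary entry with the largest count. Returns 0 for an empty input.
    public static int MostOccurringStatusOrDefault(this IEnumerable<int> statuses)
    {
        var counts = new Dictionary<int, int>();
        foreach (int status in statuses)
        {
            // TryGetValue leaves count at 0 when the status is not present yet.
            counts.TryGetValue(status, out int count);
            counts[status] = count + 1;
        }
        if (counts.Count == 0) return 0;

        return counts
            .Aggregate((largest, next) => next.Value > largest.Value ? next : largest)
            .Key;
    }
}

public static class StatusDemo
{
    public static void Main()
    {
        var statuses = new[] { 1, 2, 3, 1, 2, 2, 3 };
        Console.WriteLine(statuses.MostOccurringStatusOrDefault()); // 2
    }
}
```

When two statuses share the largest count, which one is returned depends on the dictionary's enumeration order, so callers should not rely on a particular tie-break.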

get priority items from a list and insert them into the same list at given position c#

I have a list of 50 sorted items(say) in which few items are priority ones (assume they have flag set to 1).
By default, i have to show the latest items (based on date) first, but the priority items should appear after some 'x' number of records. Like below
index 0: Item
index 1: Item
index 2: Priority Item (insert priority items from this position)
index 3: Priority Item
index 4: Priority Item
index 5: Item
index 6: Item
The index 'x' at which priority items should be inserted is pre-defined.
To achieve this, i am using following code
These are my 50 sorted items
var list= getMyTop50SortedItems();
fetching all priority items and storing it in another list
var priorityItems = list.Where(x => x.flag == 1).ToList();
filtering out the priority items from main list
list.RemoveAll(x => x.flag == 1);
inserting priority items in the main list at given position
list.InsertRange(1, priorityItems);
This process is doing the job correctly and giving me the expected result. But am not sure whether it is the correct way to do it or is there any better way (considering the performance)?
Please provide your suggestions.
Also, how is the performance affected, given that I am doing many operations (filter, remove, insert), as the number of records increases from 50 to 100000 (or any number)?
Update: How can i use IQueryable to decrease the number of operations on list.
As per documentation on InsertRange:
This method is an O(n * m) operation, where n is the number of
elements to be added and m is Count.
n*m isn't very good, so I would use LINQ's Concat method to create a whole new list from three smaller lists, instead of modifying an existing one.
var allItems = getMyTop50();
var topPriorityItems = allItems.Where(x => x.flag == 1).ToList();
var topNonPriorityItems = allItems.Where(x => x.flag != 1).ToList();
// constant is the predefined index 'x' at which the priority items start
var result = topNonPriorityItems
    .Take(constant)
    .Concat(topPriorityItems)
    .Concat(topNonPriorityItems.Skip(constant));
I am not sure how fast the Concat, Skip and Take methods are for List<T>, but I'd bet they are not slower than O(n).
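A runnable sketch of the Take / Concat / Skip idea on a tiny list. The Item record, its Flag property and the InsertPriorityAt helper are hypothetical stand-ins for the question's types; records assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; Flag == 1 marks a priority item.
public record Item(string Name, int Flag);

public static class ConcatDemo
{
    // Builds a new list: the first 'position' non-priority items, then all
    // priority items, then the remaining non-priority items. The relative
    // order inside each part is preserved.
    public static List<Item> InsertPriorityAt(IReadOnlyList<Item> items, int position)
    {
        var priority = items.Where(i => i.Flag == 1).ToList();
        var normal = items.Where(i => i.Flag != 1).ToList();
        return normal.Take(position)
                     .Concat(priority)
                     .Concat(normal.Skip(position))
                     .ToList();
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item("A", 0), new Item("B", 1), new Item("C", 0),
            new Item("D", 1), new Item("E", 0),
        };
        var result = InsertPriorityAt(items, 2);
        Console.WriteLine(string.Join(", ", result.Select(i => i.Name)));
        // A, C, B, D, E
    }
}
```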
It seems like the problem you're actually trying to solve is just sorting the list of items. If this is the case, you don't need to be concerned with removing the priority items and reinserting them at the correct index, you just need to figure out your sort ordering function. Something like this ought to work:
// Set "x" to be whatever you want based on your requirements --
// this is the number of items that will precede the "priority" items in the
// sorted list
var x = 3;
var sortedList = list
.Select((item, index) => Tuple.Create(item, index))
.OrderBy(item => {
// If the original position of the item is below whatever you've
// defined "x" to be, then keep the original position
if (item.Item2 < x) {
return item.Item2;
}
// Otherwise, ensure that "priority" items appear first
return item.Item1.flag == 1 ? x + item.Item2 : list.Count + x + item.Item2;
}).Select(item => item.Item1);
You may need to tweak this slightly based on what you're trying to do, but it seems much simpler than removing/inserting from multiple lists.
Edit: Forgot that .OrderBy doesn't provide an overload that provides the original index of the item; updated answer to wrap the items in a Tuple that contains the original index. Not as clean as the original answer, but it should still work.
This can be done using a single enumeration of the original collection using linq-to-objects. IMO this also reads pretty clearly based on the original requirements you defined.
First, define the "buckets" that we'll be sorting into: I like using an enum here for clarity, but you could also just use an int.
enum SortBucket
{
RecentItems = 0,
PriorityItems = 1,
Rest = 2,
}
Then we'll define the logic for which "bucket" a particular item will be sorted into:
private static SortBucket GetBucket(Item item, int position, int recentItemCount)
{
if (position < recentItemCount) // positions are zero-based
{
return SortBucket.RecentItems;
}
return item.IsPriority ? SortBucket.PriorityItems : SortBucket.Rest;
}
And then a fairly straightforward linq-to-objects statement to sort first into the buckets we defined, and then by the original position. Written as an extension method:
static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount)
{
return items
.Select((item, originalPosition) => new { item, originalPosition })
.OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
.ThenBy(o => o.originalPosition)
.Select(o => o.item);
}
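To see the bucket sort in action, here is a self-contained sketch. It assumes an Item record with a bool IsPriority property (the answer does not show its Item type) and treats positions as zero-based, so exactly recentItemCount items stay at the front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type for illustration.
public record Item(string Name, bool IsPriority);

public enum SortBucket { RecentItems = 0, PriorityItems = 1, Rest = 2 }

public static class PrioritySortDemo
{
    // Items at zero-based positions below recentItemCount keep their place;
    // everything after that is split into priority items and the rest.
    private static SortBucket GetBucket(Item item, int position, int recentItemCount)
    {
        if (position < recentItemCount) return SortBucket.RecentItems;
        return item.IsPriority ? SortBucket.PriorityItems : SortBucket.Rest;
    }

    public static IEnumerable<Item> PrioritySort(this IEnumerable<Item> items, int recentItemCount) =>
        items.Select((item, originalPosition) => new { item, originalPosition })
             .OrderBy(o => GetBucket(o.item, o.originalPosition, recentItemCount))
             .ThenBy(o => o.originalPosition)
             .Select(o => o.item);

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item("A", false), new Item("B", true), new Item("C", false),
            new Item("D", true), new Item("E", false),
        };
        Console.WriteLine(string.Join(", ", items.PrioritySort(2).Select(i => i.Name)));
        // A, B, D, C, E
    }
}
```

Note that a priority item already within the first recentItemCount positions (B here) stays where it is, which differs slightly from removing all priority items and reinserting them.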

Select item from one list based on item in another list in the same order

exhibits contain ids that are in a certain order. When I query another table to get the BMI ids based on the exhibit ids, the order is not the same. Instead of pulling the first document id in exhibit, I think it is pulling the first record in the database that has the same exhibit id, but I want it to pull the record in the database in the same order as the exhibits ids.
var exhibits = _context.ApExhibits.Where(x => x.CASE_ID == apDockets.CASE_ID)
.Where(x => x.EXHIBIT_NBR != null)
.Where(x => !documents617.Contains(x.DOC_ID))
.OrderBy(x => x.EXHIBIT_NBR)
.Select(x => x.DIM_ID).ToList();
if (exhibits.Count > 0)
{
var bmiIds =
_context.DocumentImages.Where(x => exhibits.Contains((int)x.DIM_ID))
.Select(x => (int)x.BMI_ID).ToList();
}
It seems that your first collection, exhibits, is ordered by EXHIBIT_NBR, whereas the query against _context.DocumentImages has no ordering at all. Its results therefore come back in the order of the source sequence, which in this case is _context.DocumentImages. Essentially, you're saying: "given an element of the source sequence DocumentImages, search the exhibits collection, and if it contains a matching id, retain that element of the source sequence".
Suppose the first element of DocumentImages passed to the Where clause matches an id that sits at the 5th position in exhibits. That element becomes the first element of the result list when ToList() runs, whereas it should be at the 5th position, matching the position of its id in exhibits.
So in order to have elements that are in the same order as exhibits, one solution is to inner join the DocumentImages and exhibits collections which would be the equivalent of checking if one element in one collection is contained within another. Then we can order by the same property as you did with exhibits.
example with query syntax:
var bmiIds = (from e in _context.ApExhibits
join x in _context.DocumentImages on e.DIM_ID equals (int)x.DIM_ID
where exhibits.Contains((int)x.DIM_ID)
orderby e.EXHIBIT_NBR
select (int)x.BMI_ID).ToList();
example with fluent syntax:
var bmiIds = _context.ApExhibits
.Join(_context.DocumentImages,
e => e.DIM_ID,
x => (int)x.DIM_ID,
(e, x) => new { e.EXHIBIT_NBR, x.DIM_ID, x.BMI_ID })
.Where(x => exhibits.Contains((int)x.DIM_ID))
.OrderBy(e => e.EXHIBIT_NBR)
.Select(x => (int)x.BMI_ID).ToList();
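The same idea can be tried in memory with LINQ to Objects, where Enumerable.Join is documented to preserve the order of the outer sequence. This is only a sketch with made-up ids and a hypothetical OrderedJoinDemo class; it is not the Entity Framework query above, and it assumes the image rows have already been loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedJoinDemo
{
    // Joins the already-ordered exhibit ids (outer) with the image rows
    // (inner). Enumerable.Join keeps the order of the outer sequence, so the
    // BMI ids come back in exhibit order regardless of the row order.
    public static List<int> BmiIdsInExhibitOrder(
        IEnumerable<int> orderedDimIds,
        IEnumerable<(int DimId, int BmiId)> documentImages) =>
        orderedDimIds
            .Join(documentImages,
                  dimId => dimId,
                  image => image.DimId,
                  (dimId, image) => image.BmiId)
            .ToList();

    public static void Main()
    {
        var exhibits = new List<int> { 30, 10, 20 };        // ordered by EXHIBIT_NBR
        var images = new[] { (10, 100), (20, 200), (30, 300) }; // storage order
        Console.WriteLine(string.Join(", ", BmiIdsInExhibitOrder(exhibits, images)));
        // 300, 100, 200
    }
}
```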

Use LINQ to group by one field, sum based on group AND store on Dictionary

I am having a hard time achieving a result, likely because I'm misunderstanding the syntax. Here is the case:
I have a class that outputs a list. In this list there are two fields that I want specifically. They are ItemID (string) and TotalNet (string). There are other fields, but I don't need to work with them at this point; I do need them later in the code.
The ItemID is the identifier of an Item, it is not unique in the list, so I must group it into one and account how many times this item appears. I achieved that with the following:
Dictionary<string, int> dstLoops = list.Select(l => l.ItemID).ToList()
.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
So I get my dictionary with the ItemID, now unique, with the total times that Item appears on my list.
The part that I am struggling to achieve is to sum the TotalNet of all the times the ItemID appears on the list.
I managed to get that using this:
var APTotal = from i in list group i by i.ItemID into g select new
{ total = g.Key, totals = g.Sum(i =>Convert.ToDecimal(i.TotalAmountPaid)) };
The problem is, it is not a dictionary, and later in the code that will be a headache... well... not much but I would like to avoid it, so I tried:
Dictionary<string, decimal> liststotal = list.Select(l => l.ItemID).ToList().
GroupBy(x => x).ToDictionary(g => g.Key, g => g.Sum(Convert.ToDecimal(item.TotalAmountPaid)));
Unfortunately it returns me that I cannot convert decimal to System.Func
I have tried some other solutions from this forum and other places, but they usually don't add values to the dictionary.
I can accept two scenarios as a solution:
Exactly what I requested above, where I will get a
dictionary with all Items from my List grouped by
ItemID and sum of their respective item.TotalNet;
FROM:
Item1, 10
Item2, 15
Item2, 15
TO:
Item1, 10
Item2, 30
OR:
2. a var/decimal/whatever that returns me the TotalNet of the current
ItemID iteration (I can use a foreach/for later on the code to
achieve the result I am looking for). So to be clear, this TotalNet
must be the sum of all TotalNets with the same ItemID from my list
that I am iterating at the moment. Something like "I'am on ItemID 1,
go and sum all the TotalNet of ItemID 1 from List".
Honestly... I would like help with the first option, for the sake of learning, but a solution is appreciated whatever the means (sacrificing children in name of an old god to get my output will not be marked as a solution...and possibly down voted).
Would something like this work?
list.GroupBy(x => x.ItemID)
.ToDictionary(x => x.Key,
x => x.Sum(t => Convert.ToDecimal(t.TotalAmountPaid)));
@bashis's answer is functionally what you are looking for. I thought I could explain the logic a bit.
You have a list with items that have an ItemID as the identifier and TotalNet as the amount, given as a string. You would like to calculate the sum of TotalNet of all elements with the same ItemId, and store the results in a dictionary.
First, we begin with a GroupBy statement. It accepts a lambda used for determining what items in the list to group together. Its return value:
The GroupBy<TSource, TKey, TElement>(IEnumerable<TSource>, Func<TSource, TKey>, Func<TSource, TElement>) method returns a collection of IGrouping<TKey, TElement> objects, one for each distinct key that was encountered. An IGrouping<TKey, TElement> is an IEnumerable<T> that also has a key associated with its elements.
So you get a collection of groups. Each group contains an enumeration of all the elements which returned the same value for the grouping expression that you entered into the GroupBy statement. In our example, all elements that have the same ItemId. Furthermore, the value of the ItemId used for each group is stored as the group's key.
The final step presented in this solution uses the ToDictionary method. Namely, this flavor:
public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
Func<TSource, TElement> elementSelector
)
Here, we need to pass two funcs - one to select the key used for the dictionary, and the other to create the value that will be stored in our dictionary.
For the key selector, we choose the group's key, which in turn is the element's ItemId. For the value, remember that each group contains all the elements with the same ItemId, so we can just Sum the TotalAmountPaid value (converting to decimal from string) and get the desired result.
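Putting the two steps together on the FROM/TO example in the question gives the following sketch. The Line record and the TotalsDemo class are hypothetical, and parsing with the invariant culture is an assumption about how the amount strings are formatted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical list element; both fields are strings, as in the question.
public record Line(string ItemID, string TotalAmountPaid);

public static class TotalsDemo
{
    // Groups by ItemID, then builds a dictionary whose value is the sum of
    // the parsed amounts in each group.
    public static Dictionary<string, decimal> TotalsByItem(IEnumerable<Line> list) =>
        list.GroupBy(x => x.ItemID)
            .ToDictionary(
                g => g.Key,
                g => g.Sum(t => Convert.ToDecimal(t.TotalAmountPaid, CultureInfo.InvariantCulture)));

    public static void Main()
    {
        var list = new List<Line>
        {
            new Line("Item1", "10"),
            new Line("Item2", "15"),
            new Line("Item2", "15"),
        };
        foreach (var pair in TotalsByItem(list))
            Console.WriteLine($"{pair.Key}, {pair.Value}");
    }
}
```

For this input the dictionary maps Item1 to 10 and Item2 to 30.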

Group by Linq setting properties

I'm working on a groupby query using Linq, but I want to set the value for a new property in combination with another list. This is my code:
var result = list1.GroupBy(f => f.Name)
.ToList()
.Select(b => new Obj
{
ClientName = b.Name,
Status = (AnotherClass.List().Where(a=>a.state_id=b.????).First()).Status
})
I know I'm using a group by, but I'm not sure how to access the value inside my b collection to compare it with a.state_id.
This snippet:
Status = (AnotherClass.List().Where(a=>a.state_id=b.????).First()).Status
I've done that before, but it was months ago and I don't remember the syntax. When I put a dot after b, I only have access to Key and the LINQ methods... What should the syntax be?
Issue in your code is happening here:
a=>a.state_id=b.????
Why ?
Check the type of b here: it is IGrouping<TKey,TValue>, because calling GroupBy on an IEnumerable gives you an IEnumerable<IGrouping<TKey,TValue>>.
What does that mean?
Think of a grouping operation in a database: when you GROUP BY a given key, the remaining selected columns need an aggregation operation, since there could be more than one record per key and that needs to be represented.
How it is represented in your code
Let's assume list1 has Type T objects
You grouped the data by Name property, which is part of Type T
There's no data projection, so for a given key the remaining data is collected as the grouped values, an IEnumerable<T>
The result has the format IEnumerable<IGrouping<TK, TV>>, where TK is the type of Name and TV is T; each IGrouping<TK, TV> is itself an IEnumerable<T>
Let's check out some code, break your original code in following parts
var result = list1.GroupBy(f => f.Name) - result will be of type IEnumerable<IGrouping<string,T>>, where list1 is IEnumerable<T>
On doing result.Select(b => ...), b is of type IGrouping<string,T>
Further you can run Linq queries on b, as follows:
b.Key gives access to the Name key. There's no b.Value; instead, your options could be the following, or any other relevant LINQ operation:
a => b.Any(x => a.state_id == x.state_id) // true if any element in the group has a matching id
a => a.state_id == b.Select(x => x.state_id).FirstOrDefault() // compares against the first element's id, or the default value
Thus you can create a final result, from the IGrouping<string,T>, as per the logical requirement / use case
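As an illustration of working with the grouping, here is a minimal sketch that takes the state id of the first element in each group and looks up its status. The Client, State and ClientStatus records and the GroupStatusDemo class are hypothetical stand-ins for list1, AnotherClass.List() and Obj; records assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration.
public record Client(string Name, int StateId);
public record State(int StateId, string Status);
public record ClientStatus(string ClientName, string Status);

public static class GroupStatusDemo
{
    // g.Key is the Name; g itself is the sequence of clients with that name.
    // This sketch uses the first client's StateId for the lookup and throws
    // if no state with that id exists.
    public static List<ClientStatus> Summarize(
        IEnumerable<Client> clients, IReadOnlyCollection<State> states) =>
        clients.GroupBy(c => c.Name)
               .Select(g => new ClientStatus(
                   g.Key,
                   states.First(s => s.StateId == g.First().StateId).Status))
               .ToList();

    public static void Main()
    {
        var clients = new List<Client>
        {
            new Client("Alice", 1), new Client("Bob", 2), new Client("Alice", 1),
        };
        var states = new List<State> { new State(1, "Active"), new State(2, "Suspended") };
        foreach (var row in Summarize(clients, states))
            Console.WriteLine($"{row.ClientName}: {row.Status}");
        // Alice: Active
        // Bob: Suspended
    }
}
```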
