LINQ expression: Specifying maximum GroupBy size - C#

Is there an elegant way of doing the following in LINQ, or should I write an extension method for it?
I have a list of objects that need to be grouped by StartDate, let's say:
09.00,
13.00,
13.00,
13.00,
15.00
var groupedStartDates = startDate.GroupBy(x => x.StartDate);
I need the maximum size of each group to be 2. The expected result is:
list1 {09.00}
list2 {13.00; 13.00}
list3 {13.00}
list4 {15.00}

After the initial grouping, you can then group each group's elements by their index within the group divided by 2, and then use SelectMany to flatten the result back out.
var result = startDate.GroupBy(x => x.StartDate)
    .SelectMany(grp => grp.Select((x, i) => new { x, i })
        .GroupBy(a => a.i / 2)
        .Select(sgrp => sgrp.Select(a => a.x)));
Here's a breakdown of what's going on. Note that curly brackets represent collections and square brackets represent objects with multiple properties.
Initial data
09.00, 13.00, 13.00, 13.00, 15.00
After GroupBy(x => x.StartDate)
[Key:09.00, {09.00}], [Key:13.00, {13.00, 13.00, 13.00}], [Key:15.00, {15.00}]
Now it's going to operate on each group, but I'll show the results for all of them at each step.
After the Select((x,i) => new{x,i})
{[x:09.00, i:0]}, {[x:13.00, i:0], [x:13.00, i:1], [x:13.00, i:2]}, {[x:15.00, i:0]}
After the GroupBy(a => a.i / 2)
{[Key:0, {[x:09.00, i:0]}]}, {[Key:0, {[x:13.00, i:0], [x:13.00, i:1]}], [Key:1, {[x:13.00, i:2]}]}, {[Key:0, {[x:15.00, i:0]}]}
After the .Select(sgrp => sgrp.Select(a => a.x))
{{09.00}}, {{13.00, 13.00}, {13.00}}, {{15.00}}
And finally the SelectMany will flatten that to.
{09.00}, {13.00, 13.00}, {13.00}, {15.00}
Note that each line represents a collection, but I didn't put curly braces around them as I felt it made it even harder to read.
Or with an extension method
public static IEnumerable<IEnumerable<T>> Bin<T>(this IEnumerable<T> items, int binSize)
{
    return items
        .Select((x, i) => new { x, i })
        .GroupBy(a => a.i / binSize)
        .Select(grp => grp.Select(a => a.x));
}
You can make it a little nicer.
var result = startDate
    .GroupBy(x => x.StartDate)
    .SelectMany(grp => grp.Bin(2));
Update: As of .NET 6 there is a new LINQ method, Chunk, that does the same thing as my Bin method above. So now you can do:
var result = startDate
    .GroupBy(x => x.StartDate)
    .SelectMany(grp => grp.Chunk(2));
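For reference, here's a self-contained sketch of the Chunk version. The Session type and the times are made up, since the original class isn't shown in the question:

using System;
using System.Linq;

public class Session // hypothetical stand-in for the questioner's type
{
    public TimeSpan StartDate { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var startDate = new[] { "09:00", "13:00", "13:00", "13:00", "15:00" }
            .Select(t => new Session { StartDate = TimeSpan.Parse(t) })
            .ToList();

        var result = startDate
            .GroupBy(x => x.StartDate)
            .SelectMany(grp => grp.Chunk(2)); // .NET 6+

        foreach (var group in result)
            Console.WriteLine(string.Join("; ", group.Select(x => x.StartDate)));
        // 09:00:00
        // 13:00:00; 13:00:00
        // 13:00:00
        // 15:00:00
    }
}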

If I understand your question correctly, you can use Take:
var result = startDate.GroupBy(x => x.StartDate)
    .Select(x => x.Take(2))
    .ToList();
Each group will contain at most 2 members; any additional items in a group will not be returned.
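Note the difference from the Chunk/Bin approach above: Take truncates each group, so with the sample times the third 13.00 is dropped entirely rather than being placed in its own group. Roughly (reusing the hypothetical Session type from the earlier sketch):

var truncated = startDate
    .GroupBy(x => x.StartDate)
    .Select(g => g.Take(2))
    .ToList();
// 09:00 | 13:00, 13:00 | 15:00 -- the third 13:00 is discarded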

Sorting a formatted LINQ query

A sequence of non-empty strings stringList is given, containing only uppercase letters of the Latin alphabet. For all strings starting with the same letter, determine their total length and obtain a sequence of strings of the form "S-C", where S is the total length of all strings from stringList that begin with the character C. Order the resulting sequence in descending order of the numerical values of the sums, and for equal values of the sums, in ascending order of the C character codes.
This question is related to one of my previous questions.
One solution that works is this one:
stringList.GroupBy(x => x[0]).Select(g => $"{g.Sum(x => x.Length)}-{g.Key}");
The problem is that with this given example I don't know where to add the OrderByDescending()/ThenBy() clauses in order to get the correctly sorted list.
Create an intermediate data structure to store needed info and use it for sorting and then building the output:
stringList
    .GroupBy(x => x[0])
    .Select(g => (Length: g.Sum(x => x.Length), Char: g.Key))
    .OrderByDescending(t => t.Length)
    .ThenBy(t => t.Char)
    .Select(t => $"{t.Length}-{t.Char}");
You're almost there. The cleanest way of doing it would be to make a more complex object with the properties you care about, use those to sort, then keep only what you want in the output. Like:
stringList
    .GroupBy(x => x[0])
    .Select(g => new {
        Len = g.Sum(x => x.Length),
        Char = g.Key,
        Val = $"{g.Sum(x => x.Length)}-{g.Key}"
    })
    .OrderByDescending(x => x.Len)
    .ThenBy(x => x.Char)
    .Select(x => x.Val);
You can add a Select after the GroupBy to transform the groups into an anonymous object containing the things you want to sort by. Then you can use OrderByDescending and ThenBy to sort. After that, Select the formatted string you want:
stringList.GroupBy(x => x[0]) // assuming all strings are non-empty
    .Select(g => new {
        LengthSum = g.Sum(x => x.Length),
        FirstChar = g.Key
    })
    .OrderByDescending(x => x.LengthSum)
    .ThenBy(x => x.FirstChar)
    .Select(x => $"{x.LengthSum}-{x.FirstChar}");
Alternatively, do it in the query syntax with let clauses, which I find more readable:
var query = from str in stringList
            group str by str[0] into g
            let lengthSum = g.Sum(x => x.Length)
            let firstChar = g.Key
            orderby lengthSum descending, firstChar
            select $"{lengthSum}-{firstChar}";

Linq get Distinct ToDictionary

I need help to select/get only distinct entries based on i.Code.
There are duplicates, and thus I'm getting an error from my expression: "An item with the same key has already been added."
var myDictionary = dbContext.myDbTable
    .Where(i => i.shoesize >= 4)
    .OrderBy(i => i.Code)
    .ToDictionary(i => i.Code, i => i);
I have tried using Select and/or Distinct in different combinations, and also by themselves, but I am still getting the same error:
var myDictionary = dbContext.myDbTable
    .Where(i => i.shoesize >= 4)
    .OrderBy(i => i.Code)
    //.Select(i => i)
    //.Distinct()
    .ToDictionary(i => i.Code, i => i);
Can anybody help? C#
UPDATE: If there are multiple objects with the same code, I only want to add the first object (with that particular code) to myDictionary.
You can group by Code and select the first item from each group (which is equivalent to distinct):
var myDictionary = dbContext.myDbTable
    .Where(i => i.shoesize >= 4)   // filter
    .GroupBy(x => x.Code)          // group by Code
    .Select(g => g.First())        // select the 1st item from each group
    .ToDictionary(i => i.Code, i => i);
You don't need the OrderBy, since a Dictionary represents an unordered collection. If you need an ordered dictionary, you could use a SortedDictionary.
It sounds to me like what you are looking for is .DistinctBy() (available since .NET 6), which lets you specify which property to use when making the elements in your collection distinct:
var myDictionary = dbContext.myDbTable
    .Where(i => i.shoesize >= 4)
    .DistinctBy(i => i.Code)
    .ToDictionary(i => i.Code, i => i);
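This also matches the update in the question: for in-memory sequences, DistinctBy keeps the first element it encounters for each key. A small sketch with made-up values rather than the real table:

var items = new[]
{
    new { Code = "A", Name = "first A" },
    new { Code = "A", Name = "second A" },
    new { Code = "B", Name = "only B" }
};

var byCode = items
    .DistinctBy(i => i.Code)
    .ToDictionary(i => i.Code, i => i.Name);
// byCode["A"] == "first A", byCode["B"] == "only B"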
Splitting it up and creating a list first worked, compared to when it was all bundled into one LINQ query; I guess the First() needed the data to be in a list before it could be turned into a dictionary.
var firstLinq = dbContext.myDbTable
    .Where(i => i.shoesize >= 4)
    .ToList();
then
var finalLinq = firstLinq
    .GroupBy(i => i.Code)
    .Select(i => i.First())
    .ToList()
    .ToDictionary(i => i.Code, i => i);
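A likely explanation (an assumption, since the provider and the exact error aren't shown) is that the query provider could not translate Select(g => g.First()) after GroupBy into SQL, while running the same operators against an in-memory list always works. If that's the case, the two steps can be combined by switching to LINQ to Objects explicitly:

var myDictionary = dbContext.myDbTable
    .Where(i => i.shoesize >= 4) // translated and executed on the database
    .AsEnumerable()              // everything below runs in memory
    .GroupBy(i => i.Code)
    .Select(g => g.First())      // keep the first item per Code
    .ToDictionary(i => i.Code, i => i);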

Find duplicates and return a List<class> in a fast way

I have a class "litem" that contains strings, integers and doubles. I have a List(litem) called "myList". I would like to find duplicates in myList based on litem.c1 that is a string element of litem. I need to modify those duplicates elements by linking them to their duplicate pair with a unique ID. The problem is that my data is large and my code is very slow. myList has 2.2 million entries. I find the duplicates this way:
var duplicateItems = myList
    .AsParallel()
    .GroupBy(x => x.c1)
    .Where(x => x.Count() > 1)
    .Select(x => x.Key)
    .ToList();
This runs in about 3 seconds and returns about 40,000 strings, which are the litem.c1 values of the duplicates. Then I run:
var result = myList
    .AsParallel()
    .Where(item => duplicateItems.Any(d => d.Equals(item.c1)))
    .ToList();
This returns the list of 80,000 litems that I need, but it runs for more than 30 minutes while loading an i7 CPU at 100%. After this I use a foreach over result to add the link between the duplicate litems found. The question is: how can I get result in a cheaper way?
Instead of a list, use a HashSet and check whether the HashSet contains the item.
var duplicateItems = new HashSet<string>(myList
    .AsParallel()
    .GroupBy(x => x.c1)
    .Where(x => x.Count() > 1)
    .Select(x => x.Key));
var result = myList
    .AsParallel()
    .Where(item => duplicateItems.Contains(item.c1))
    .ToList();
This should speed things up considerably: HashSet<string>.Contains is a constant-time lookup, whereas duplicateItems.Any(...) scans the whole list of keys for every single item.
FYI, HashSet isn't thread-safe, so .AsParallel() may result in errors.
But I don't really understand why you don't just do:
var groups = myList
    .AsParallel()
    .GroupBy(x => x.c1)
    .Where(x => x.Count() > 1);

foreach (var group in groups)
{
    foreach (var value in group)
    {
        // duplicate values
    }
}
Rather than getting all of the duplicated items, projecting that query down to just each group's key, and then going through the list again to find all of the items in those groups, you can use the already grouped records to get your results instead of dropping them on the floor in the first query, which makes the second query unnecessary.
var duplicateItems = myList
    .GroupBy(x => x.c1)
    .Where(x => x.Count() > 1)
    .SelectMany(x => x)
    .ToList();
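To cover the follow-up step from the question (linking each set of duplicates with a shared ID), you can keep the groups instead of flattening them and assign one ID per group. This assumes litem has some writable link property, here called LinkId, which isn't shown in the question:

var duplicateGroups = myList
    .GroupBy(x => x.c1)
    .Where(g => g.Count() > 1);

foreach (var group in duplicateGroups)
{
    var linkId = Guid.NewGuid(); // one shared ID per set of duplicates
    foreach (var item in group)
        item.LinkId = linkId;    // hypothetical property
}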

How can I split a List<T> into two lists, one containing all duplicate values and the other containing the remainder?

I have a basic class for an Account (other properties removed for brevity):
public class Account
{
    public string Email { get; set; }
}
I have a List<T> of these accounts.
I can remove duplicates based on the e-mail address easily:
var uniques = list.GroupBy(x => x.Email).Select(x => x.First()).ToList();
The list named 'uniques' now contains only one of each account based on e-mail address; any that were duplicates were discarded.
I want to do something a little different and split the list into two.
One list will contain only 'true' unique values, the other list will contain all duplicates.
For example the following list of Account e-mails:
unique#email.com
dupe#email.com
dupe#email.com
Would be split into two lists:
Unique
unique#email.com
Duplicates
dupe#email.com
dupe#email.com
I have been able to achieve this already by creating a list of unique values using the example at the top. I then use .Except() on the original list to get the differences, which are the duplicates. Lastly, I can loop over each duplicate to 'pop' it out of the unique list and move it to the duplicate list.
Here is a working example on .NET Fiddle
Can I split the list in a more efficient or syntactically sugary way?
I'd be happy to use a third party library if necessary but I'd rather just stick to pure LINQ.
I'm aware of CodeReview but feel the question also fits here.
var groups = list.GroupBy(x => x.Email)
    .GroupBy(g => g.Count() == 1 ? 0 : 1)
    .OrderBy(g => g.Key)
    .Select(g => g.SelectMany(x => x))
    .ToList();
groups[0] will be the unique ones and groups[1] will be the non-unique ones (assuming both kinds exist; if the list contains only uniques or only duplicates, groups will have a single element).
var duplicates = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);

var uniques = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
    .Where(g => g.Count() == 1)
    .SelectMany(g => g);
Alternatively, once you get one list, you can get the other one using Except:
var uniques = list.Except(duplicates);
// or
var duplicates = list.Except(uniques);
Another way to do it would be to get uniques, and then for duplicates simply get the elements in the original list that aren't in uniques.
IEnumerable<Account> uniques;
IEnumerable<Account> dupes;

dupes = list.Where(d =>
    !(uniques = list.GroupBy(x => x.Email)
        .Where(g => g.Count() == 1)
        .SelectMany(u => u))
    .Contains(d));
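For what it's worth, a single ToLookup also partitions the list cleanly without re-running the grouping; this is a sketch, not one of the original answers:

var byEmail = list.ToLookup(x => x.Email);

var uniques = byEmail.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
var dupes = byEmail.Where(g => g.Count() > 1).SelectMany(g => g).ToList();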

LINQ group by then order groups of result

I have a table that has the following 3 columns: ID, ShortCode, UploadDate.
I want to use LINQ to group the results by ShortCode (and keep all the results), then order those groups and return a list.
I have the following:
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .ToList<PDFDocument>()
    .GroupBy(b => b.ShortCode)
    .SelectMany(b => b)
    .ToList<PDFDocument>()
I want to return all results, grouped by ShortCode, with the items within each group sorted by UploadDate and the groups sorted so that the one containing the most recent document comes first.
Does anyone know if this is even possible?
Try
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .AsEnumerable()
    .OrderByDescending(d => d.UploadDate)
    .GroupBy(d => d.ShortCode)
    .SelectMany(g => g)
    .ToList();
This should
Order the items by upload date (descending so newest first)
Then group them by short code - so within each group the items are still sorted
The groups are still in descending order, so no need to order again
Finally concatenate the results into a single list
If performance is an issue you may be better off doing
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .AsEnumerable()
    .GroupBy(d => d.ShortCode)
    .Select(g => g.OrderByDescending(d => d.UploadDate))
    .OrderByDescending(e => e.First().UploadDate)
    .SelectMany(e => e)
    .ToList();
which sorts the contents of each group separately rather than sorting everything first and then grouping.
In fact, you don't want to group by short code, you want to order by it. So the following query should do the trick:
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .ToList()
    .OrderBy(b => b.ShortCode)
    .ThenBy(b => b.UploadDate)
    .ToList()
Edit
If you really want to use a GroupBy, you can do so this way:
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .ToList()
    .GroupBy(b => b.ShortCode)
    .SelectMany(grouping => grouping.OrderBy(b => b.UploadDate))
    .ToList()
But I discourage it. There is no point creating groups if you do not want groups in the first place!
Second edit
I did not get that you wanted the groups ordered by UploadDate too. It complicates the query a little:
rawData.Provider.CreateQuery<PDFDocument>(qb.rootExperession)
    .ToList()
    .GroupBy(b => b.ShortCode)
    .Select(grouping => grouping.OrderByDescending(b => b.UploadDate))
    .OrderByDescending(grouping => grouping.First().UploadDate)
    .SelectMany(grouping => grouping)
    .ToList()
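To illustrate the intended ordering with made-up data (the PDFDocument shape is assumed from the question; anonymous types stand in for it here, and System plus System.Linq usings are assumed):

var docs = new[]
{
    new { ShortCode = "A", UploadDate = new DateTime(2024, 1, 1) },
    new { ShortCode = "B", UploadDate = new DateTime(2024, 3, 1) },
    new { ShortCode = "A", UploadDate = new DateTime(2024, 2, 1) },
};

var ordered = docs
    .GroupBy(d => d.ShortCode)
    .Select(g => g.OrderByDescending(d => d.UploadDate))
    .OrderByDescending(g => g.First().UploadDate)
    .SelectMany(g => g)
    .ToList();

// Result order: B 2024-03-01, A 2024-02-01, A 2024-01-01
// (the "B" group comes first because it contains the most recent document)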
