Sorting a formatted LINQ query - c#

A sequence of non-empty strings stringList is given, containing only uppercase letters of the Latin alphabet. For all strings starting with the same letter, determine their total length and obtain a sequence of strings of the form "S-C", where S is the total length of all strings from stringList that begin with the character C. Order the resulting sequence in descending order of the numerical values of the sums, and for equal values of the sums, in ascending order of the C character codes.
This question is related to one of my previous questions.
One solution that works is this one:
stringList.GroupBy(x => x[0]).Select(g => $"{g.Sum(x => x.Length)}-{g.Key}");
The problem is that with this given example I don't know where to add the OrderByDescending()/ThenBy() clauses in order to get the correctly sorted list.

Create an intermediate data structure to store needed info and use it for sorting and then building the output:
stringList
.GroupBy(x => x[0])
.Select(g => (Length: g.Sum(x => x.Length), Char: g.Key))
.OrderByDescending(t => t.Length)
.ThenBy(t => t.Char)
.Select(t => $"{t.Length}-{t.Char}");

You're almost there. The cleanest way of doing it would be to make a more complex object with the properties you care about, use those to sort, then keep only what you want in the output. Like:
stringList
.GroupBy(x => x[0])
.Select(g => new {
Len = g.Sum(x => x.Length),
Char = g.Key,
Val = $"{g.Sum(x => x.Length)}-{g.Key}"
})
.OrderByDescending(x => x.Len)
.ThenBy(x => x.Char)
.Select(x => x.Val);

You can add a Select after the GroupBy to transform the groups into an anonymous object containing the things you want to sort by. Then you can use OrderByDescending and ThenBy to sort. After that, Select the formatted string you want:
stringList.GroupBy(x => x[0]) // assuming all strings are non-empty
.Select(g => new {
LengthSum = g.Sum(x => x.Length),
FirstChar = g.Key
})
.OrderByDescending(x => x.LengthSum)
.ThenBy(x => x.FirstChar)
.Select(x => $"{x.LengthSum}-{x.FirstChar}");
Alternatively, do it in the query syntax with let clauses, which I find more readable:
var query = from str in stringList
group str by str[0] into g
let lengthSum = g.Sum(x => x.Length)
let firstChar = g.Key
orderby lengthSum descending, firstChar
select $"{lengthSum}-{firstChar}";
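As a quick check of the sort order, here is a minimal sketch run against a small made-up input (the sample strings and the tuple element names are assumptions for illustration, not part of the question). It is written as top-level statements, so it assumes C# 9 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample input: A and B both total 5, C totals 4.
var stringList = new List<string> { "AB", "BCD", "AEF", "C", "BA", "CDE" };

var result = stringList
    .GroupBy(s => s[0])                                        // group by first letter
    .Select(g => (Total: g.Sum(s => s.Length), Letter: g.Key)) // keep the sort keys
    .OrderByDescending(t => t.Total)                           // larger sums first
    .ThenBy(t => t.Letter)                                     // ties: ascending letter
    .Select(t => $"{t.Total}-{t.Letter}")                      // format last
    .ToList();

Console.WriteLine(string.Join(", ", result)); // prints "5-A, 5-B, 4-C"
```

The tie between A and B shows why the formatting step has to come after the sort: once the values are strings, the numeric sum is no longer available as a sort key.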

Related

LINQ Return max repeated item but sorted in reverse

I have a string array containing names like:
[Alex, Alex, Michael, Michael, Dave, Victor]
I convert it to a List<string>.
Then I need to write a function that returns the most repeated item in the list, with ties broken in descending order, which in this case gives Michael.
I followed the LINQ code given in this link, which is:
string maxRepeated = prod.GroupBy(s => s)
.OrderByDescending(s => s.Count())
.First().Key;
However, the code returns Alex instead of Michael.
I tried adding another OrderByDescending, but then it returns Victor.
string maxRepeated = prod.GroupBy(s => s)
.OrderByDescending(s => s.Count())
.OrderByDescending(b => b)
.First().Key;
I am stuck and don't know what needs to be added to achieve the desired result.
Any help is appreciated.
Don't add a second OrderByDescending, which discards the previous ordering; use ThenByDescending instead:
string maxRepeated = prod.GroupBy(s => s)
.OrderByDescending(g => g.Count())
.ThenByDescending(g => g.Key)
.First().Key;
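The difference can be seen by running both variants side by side on the names from the question. This is only a sketch (top-level statements, C# 9 or later assumed; the variable name resorted is made up here):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var prod = new List<string> { "Alex", "Alex", "Michael", "Michael", "Dave", "Victor" };

// A second OrderByDescending re-sorts everything by key alone,
// so the count no longer decides which group comes first.
string resorted = prod.GroupBy(s => s)
    .OrderByDescending(g => g.Count())
    .OrderByDescending(g => g.Key)
    .First().Key;

// ThenByDescending only breaks ties left by the first ordering.
string maxRepeated = prod.GroupBy(s => s)
    .OrderByDescending(g => g.Count())
    .ThenByDescending(g => g.Key)
    .First().Key;

Console.WriteLine(resorted);    // prints "Victor"
Console.WriteLine(maxRepeated); // prints "Michael"
```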
You probably need to add a condition after the "GroupBy" to limit it to groups with more than one item. This is basically the equivalent of "Having" in SQL. I think this would do what you want:
prod.GroupBy(s => s)
.Where(group => group.Count() > 1)
.Select(group => group.Key)
.OrderByDescending(s => s)
.First();

How can I split a List<T> into two lists, one containing all duplicate values and the other containing the remainder?

I have a basic class for an Account (other properties removed for brevity):
public class Account
{
public string Email { get; set; }
}
I have a List<T> of these accounts.
I can remove duplicates based on the e-mail address easily:
var uniques = list.GroupBy(x => x.Email).Select(x => x.First()).ToList();
The list named 'uniques' now contains only one of each account based on e-mail address, any that were duplicates were discarded.
I want to do something a little different and split the list into two.
One list will contain only 'true' unique values, the other list will contain all duplicates.
For example the following list of Account e-mails:
unique@email.com
dupe@email.com
dupe@email.com
Would be split into two lists:
Unique
unique@email.com
Duplicates
dupe@email.com
dupe@email.com
I have been able to achieve this already by creating a list of unique values using the example at the top. I then use .Except() on the original list to get the differences which are the duplicates. Lastly I can loop over each duplicate to 'pop' it out of the unique list and move it to the duplicate list.
Here is a working example on .NET Fiddle
Can I split the list in a more efficient or syntactically sugary way?
I'd be happy to use a third party library if necessary but I'd rather just stick to pure LINQ.
I'm aware of CodeReview but feel the question also fits here.
var groups = list.GroupBy(x => x.Email)
.GroupBy(g => g.Count() == 1 ? 0 : 1)
.OrderBy(g => g.Key)
.Select(g => g.SelectMany(x => x))
.ToList();
groups[0] will be the unique ones and groups[1] will be the non-unique ones.
var duplicates = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() > 1)
.SelectMany(g => g);
var uniques = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() == 1)
.SelectMany(g => g);
Alternatively, once you get one list, you can get the other one using Except:
var uniques = list.Except(duplicates);
// or
var duplicates = list.Except(uniques);
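Here is a small sketch of the GroupBy split on data modeled on the question. Anonymous objects stand in for the Account class, and the byEmail variable is an assumption added here so the grouping is computed only once (top-level statements, C# 9 or later assumed):

```csharp
using System;
using System.Linq;

// Sample data modeled on the question; anonymous objects stand in for Account.
var list = new[]
{
    new { Email = "unique@email.com" },
    new { Email = "dupe@email.com" },
    new { Email = "dupe@email.com" },
};

// Group once and materialize, so both results reuse the same grouping.
var byEmail = list.GroupBy(a => a.Email).ToList();

var uniques = byEmail.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
var duplicates = byEmail.Where(g => g.Count() > 1).SelectMany(g => g).ToList();

Console.WriteLine($"{uniques.Count} unique, {duplicates.Count} duplicates");
// prints "1 unique, 2 duplicates"
```

One thing to keep in mind with the Except variant: Except is a set operation and yields distinct elements, so for types that compare by value (strings, records, anonymous types) each duplicate would come back only once. With a plain class such as Account, which compares by reference unless equality is overridden, every instance is kept.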
Another way to do it would be to get uniques, and then for duplicates simply get the elements in the original list that aren't in uniques.
IEnumerable<Account> uniques;
IEnumerable<Account> dupes;
dupes = list.Where(d =>
!(uniques = list.GroupBy(x => x.Email)
.Where(g => g.Count() == 1)
.SelectMany(u => u))
.Contains(d));

Selecting distinct attributes where count isn't same

I have the following Linq query which selects a distinct list of attributes from all products:
products
.SelectMany(p => p.Attributes)
.Where(a => a.AttributeGroup.IsProductFilter)
.Distinct()
.ToList();
Any attribute can be assigned to any product, so I only want the attributes whose occurrence count is less than the number of products (they are used for filtering, and an attribute that every product has would not narrow the results).
I'm not sure how to go about doing this - I thought I need to use GroupBy but wasn't sure how to get a list of attributes back:
IEnumerable<ProductAttribute> attributes = products.SelectMany(p => p.Attributes).Where(a => a.AttributeGroup.IsProductFilter);
return attributes.GroupBy(a => a.ID)
.Where(g => g.Count() < products.Count) // this is now an IEnumerable of groups, so I'm not sure how to get it back to an IEnumerable of attributes
Or this seemed a bit better
attributes.GroupBy(a => a)
.Where(g => g.Count() < products.Count)
.Select(g => g.ToList())
.Distinct()
.OrderBy(a => a.AttributeGroup.Order) // this doesn't work as a isn't an attribute
It's probably really simple but I'm not that great with Linq so any help solving this would be appreciated
I'm not sure, but doesn't SelectMany help here too?
return attributes.GroupBy(a => a.ID)
.Where(g => g.Count() < products.Count)
.SelectMany(g => g); // perhaps Distinct after
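As an alternative to flattening and then calling Distinct, each group can be reduced to a single representative. The sketch below uses plain strings as stand-ins for ProductAttribute and made-up product data, so the names and shapes are assumptions rather than the asker's model (top-level statements, C# 9 or later assumed):

```csharp
using System;
using System.Linq;

// Hypothetical products; strings stand in for ProductAttribute objects.
var products = new[]
{
    new { Name = "P1", Attributes = new[] { "Red", "Large" } },
    new { Name = "P2", Attributes = new[] { "Red", "Small" } },
};

var filterable = products
    .SelectMany(p => p.Attributes)
    .GroupBy(a => a)
    .Where(g => g.Count() < products.Length) // drop attributes every product has
    .Select(g => g.Key)                      // one attribute per group
    .ToList();

Console.WriteLine(string.Join(", ", filterable)); // prints "Large, Small"
```

With real attribute objects grouped by ID, g.First() would play the role of g.Key here.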

LINQ expression: Specifying maximum GroupBy size

Is there an elegant way of doing the following in LINQ, or should I write an extension for this?
I have a list of objects that need to be grouped by start date,
let's say
09.00,
13.00,
13.00,
13.00,
15.00,
var groupedStartDates = startDate.GroupBy(x => x.StartDate);
I need to have maximum size of group to be 2.
Expected result is
var groupedStartDates = startDate.GroupBy(x => x.StartDate);
List
list1 {09.00}
list2 {13.00; 13.00}
list3 {13.00}
list4 {15.00}
After the initial grouping you can then group by the index (in the groups) divided by 2 to do a further grouping, then use SelectMany to flatten that back out.
var result = startDate.GroupBy(x => x.StartDate)
.SelectMany(grp => grp.Select((x,i) => new{x,i})
.GroupBy(a => a.i / 2)
.Select(sgrp => sgrp.Select(a => a.x)));
Here's a breakdown of what's going on. Note that curly brackets represent collections and square brackets represent an object with multiple properties.
Initial data
09.00, 13.00, 13.00, 13.00, 15.00
After GroupBy(x => x.StartDate)
[Key:09.00, {09.00}], [Key:13.00, {13.00, 13.00, 13.00}], [Key:15.00, {15.00}]
Now it's going to operate on each group, but I'll show the results for all of them at each step.
After the Select((x,i) => new{x,i})
{[x:09.00, i:0]}, {[x:13.00, i:0], [x:13.00, i:1], [x:13.00, i:2]}, {[x:15.00, i:0]}
After the GroupBy(a => a.i / 2)
{[Key:0, {[x:09.00, i:0]}]}, {[Key:0, {[x:13.00, i:0], [x:13.00, i:1]}], [Key:1, {[x:13.00, i:2]}]}, {[Key:0, {[x:15.00, i:0]}]}
After the .Select(sgrp => sgrp.Select(a => a.x))
{{09.00}}, {{13.00, 13.00}, {13.00}}, {{15.00}}
And finally the SelectMany will flatten that to:
{09.00}, {13.00, 13.00}, {13.00}, {15.00}
Note that each line represents a collection, but I didn't put curly braces around them as I felt it made it even harder to read.
Or with an extension method
public static IEnumerable<IEnumerable<T>> Bin<T>(this IEnumerable<T> items, int binSize)
{
return items
.Select((x,i) => new{x,i})
.GroupBy(a => a.i / binSize)
.Select(grp => grp.Select(a => a.x));
}
You can make it a little nicer.
var result = startDate
.GroupBy(x => x.StartDate)
.SelectMany(grp => grp.Bin(2));
Update: As of .NET 6, the new LINQ method Chunk does the same thing as my Bin method above, so now you can do:
var result = startDate
.GroupBy(x => x.StartDate)
.SelectMany(grp => grp.Chunk(2));
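To see the chunked result concretely, here is a sketch that applies the same query to the start times from the question. The times are kept as plain strings for simplicity, which is an assumption; the real objects have a StartDate property (top-level statements, .NET 6 or later required for Chunk):

```csharp
using System;
using System.Linq;

// Start times from the question, kept as strings for simplicity.
var startTimes = new[] { "09.00", "13.00", "13.00", "13.00", "15.00" };

var result = startTimes
    .GroupBy(t => t)
    .SelectMany(grp => grp.Chunk(2)) // split each group into arrays of at most 2
    .ToList();

foreach (var chunk in result)
    Console.WriteLine("{" + string.Join("; ", chunk) + "}");
// prints:
// {09.00}
// {13.00; 13.00}
// {13.00}
// {15.00}
```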
If I understand your question correctly, you can use Take:
var result= startDate.GroupBy(x => x.StartDate)
.Select(x => x.Take(2))
.ToList();
Each group will contain at most 2 members; any additional items in a group are not returned.

LINQ: Select all from each group except the first item

It is easy to select the first of each group:
var firstOfEachGroup = dbContext.Measurements
.OrderByDescending(m => m.MeasurementId)
.GroupBy(m => new { m.SomeColumn })
.Where(g => g.Count() > 1)
.Select(g => g.First());
But...
Question: how can I select all from each group except the first item?
var everythingButFirstOfEachGroup = dbContext.Measurements
.OrderByDescending(m => m.MeasurementId)
.GroupBy(m => new { m.SomeColumn })
.Where(g => g.Count() > 1)
.Select( ...? );
Additional information:
My real goal is to delete all duplicates except the last (in a bulk way, ie: not using an in-memory foreach), so after the previous query I want to use RemoveRange:
dbContext.Measurements.RemoveRange(everythingButFirstOfEachGroup);
So, if my question makes no sense, this information might be handy.
Use Skip(1) to skip the first record and select the rest.
Something like:
var everythingButFirstOfEachGroup = dbContext.Measurements
.OrderByDescending(m => m.MeasurementId)
.GroupBy(m => new { m.SomeColumn })
.Where(g => g.Count() > 1)
.SelectMany(g => g.OrderByDescending(r => r.SomeColumn).Skip(1));
See: Enumerable.Skip
If you do not need a flattened collection, replace SelectMany with Select in the code snippet.
IGrouping<K, V> implements IEnumerable<V>; you simply need to skip inside the select clause to apply it to each group:
.Select(g => g.Skip(1))
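A minimal in-memory sketch of the Skip(1) idea follows. The rows are made up and stand in for dbContext.Measurements; this runs as LINQ to Objects, and whether a given database provider can translate the same query is not something this example demonstrates (top-level statements, C# 9 or later assumed):

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for dbContext.Measurements.
var measurements = new[]
{
    new { MeasurementId = 1, SomeColumn = "a" },
    new { MeasurementId = 2, SomeColumn = "a" },
    new { MeasurementId = 3, SomeColumn = "b" },
    new { MeasurementId = 4, SomeColumn = "a" },
};

var everythingButFirst = measurements
    .OrderByDescending(m => m.MeasurementId) // newest first within each group
    .GroupBy(m => m.SomeColumn)
    .SelectMany(g => g.Skip(1))              // drop the first (newest) row per group
    .Select(m => m.MeasurementId)
    .ToList();

Console.WriteLine(string.Join(", ", everythingButFirst)); // prints "2, 1"
```

Because Skip(1) on a single-item group yields nothing, the Where(g => g.Count() > 1) filter is not needed in this in-memory version.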
You can always use .Distinct() to remove duplicates; presumably sorting or reverse-sorting and then applying .Distinct() would give you what you want.
