Remove Duplicates and Original from C# List - c#

I have a List of a custom type, and when a duplicate is found I want to remove both the duplicate and the original. Each item can have at most one duplicate.
I can override Equals and GetHashCode and then use Distinct, but that only removes the duplicate. I need to remove both the original and the duplicate. Any ideas for something elegant, so I don't have to use a hammer?

You can use GroupBy, followed by Where (g => g.Count() == 1) to filter out all records that have duplicates:
var res = orig.GroupBy(x => x).Where(g => g.Count() == 1).Select(g => g.Key);
In order for this to work, you still need to override GetHashCode and Equals.

var itemsExistingExactlyOnce = list.GroupBy(x => x)
.Where(group => group.Count() == 1)
.Select(group => group.Key);
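A minimal sketch of the GroupBy / Count() == 1 filter from the answers above. The sample data is hypothetical, and it uses value tuples as a stand-in for the custom type: tuples already compare by value, which plays the role that the overridden Equals and GetHashCode would play on a class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data. Value tuples compare by value, so they stand in
// for a custom class that overrides Equals and GetHashCode.
var orig = new List<(string Name, int Code)>
{
    ("alpha", 1),
    ("beta", 2),
    ("alpha", 1), // duplicate of the first item
    ("gamma", 3),
};

// Keep only the groups with exactly one member. An original and its
// duplicate land in the same group of size two and are dropped together.
var res = orig.GroupBy(x => x)
              .Where(g => g.Count() == 1)
              .Select(g => g.Key)
              .ToList();

Console.WriteLine(string.Join(", ", res));
```

With a class that does not override Equals and GetHashCode, every instance would form its own group and nothing would be removed.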

Related

How can I split a List<T> into two lists, one containing all duplicate values and the other containing the remainder?

I have a basic class for an Account (other properties removed for brevity):
public class Account
{
public string Email { get; set; }
}
I have a List<T> of these accounts.
I can remove duplicates based on the e-mail address easily:
var uniques = list.GroupBy(x => x.Email).Select(x => x.First()).ToList();
The list named 'uniques' now contains only one account per e-mail address; any duplicates are discarded.
I want to do something a little different and split the list into two.
One list will contain only 'true' unique values, the other list will contain all duplicates.
For example the following list of Account e-mails:
unique@email.com
dupe@email.com
dupe@email.com
Would be split into two lists:
Unique
unique@email.com
Duplicates
dupe@email.com
dupe@email.com
I have been able to achieve this already by creating a list of unique values using the example at the top. I then use .Except() on the original list to get the differences which are the duplicates. Lastly I can loop over each duplicate to 'pop' it out of the unique list and move it to the duplicate list.
Here is a working example on .NET Fiddle
Can I split the list in a more efficient or syntactically sugary way?
I'd be happy to use a third party library if necessary but I'd rather just stick to pure LINQ.
I'm aware of CodeReview but feel the question also fits here.
var groups = list.GroupBy(x => x.Email)
.GroupBy(g => g.Count() == 1 ? 0 : 1)
.OrderBy(g => g.Key)
.Select(g => g.SelectMany(x => x))
.ToList();
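groups[0] will be the unique ones and groups[1] will be the non-unique ones, provided the list contains both kinds. If every item is unique, groups[1] does not exist, and if every item is duplicated, groups[0] holds the duplicates instead.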
var duplicates = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() > 1)
.SelectMany(g => g);
var uniques = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() == 1)
.SelectMany(g => g);
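Alternatively, once you get one list, you can get the other one using Except. Note that Except is a set operation that yields each distinct element once, so duplicates that compare equal are collapsed into one: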
var uniques = list.Except(duplicates);
// or
var duplicates = list.Except(uniques);
Another way to do it would be to get uniques, and then for duplicates simply get the elements in the original list that aren't in uniques.
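// Build the uniques once, instead of regrouping the list for every element.
IEnumerable<Account> uniques = list.GroupBy(x => x.Email)
    .Where(g => g.Count() == 1)
    .SelectMany(u => u)
    .ToList();
// Everything that is not in uniques is a duplicate.
IEnumerable<Account> dupes = list.Where(d => !uniques.Contains(d));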
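Another option, not taken from the answers above, is to group once and bucket the groups with ToLookup. This is only a sketch: the accounts are hypothetical and modelled as (Id, Email) tuples rather than the Account class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical accounts, modelled as tuples instead of the Account class.
var list = new List<(int Id, string Email)>
{
    (1, "unique@example.com"),
    (2, "dupe@example.com"),
    (3, "dupe@example.com"),
};

// Group by e-mail once, then bucket the groups by whether they hold one item.
var split = list.GroupBy(x => x.Email)
                .ToLookup(g => g.Count() == 1);

// A lookup returns an empty sequence for a missing key, so neither line
// throws when the input has no uniques or no duplicates.
var uniques = split[true].SelectMany(g => g).ToList();
var duplicates = split[false].SelectMany(g => g).ToList();

Console.WriteLine($"{uniques.Count} unique, {duplicates.Count} duplicates");
```

Compared with indexing into a sorted list of groups, the lookup keys make it explicit which bucket is which, and an empty bucket comes back as an empty list.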

Selecting distinct attributes where count isn't the same

I have the following Linq query which selects a distinct list of attributes from all products:
products
.SelectMany(p => p.Attributes)
.Where(a => a.AttributeGroup.IsProductFilter)
.Distinct()
.ToList();
Any attribute can be assigned to any product, so I only want the attributes that appear on fewer products than the total number of products (they are used for filtering, and an attribute shared by every product would not narrow the results).
I'm not sure how to go about doing this - I thought I need to use GroupBy but wasn't sure how to get a list of attributes back:
IEnumerable<ProductAttribute> attributes = products.SelectMany(p => p.Attributes).Where(a => a.AttributeGroup.IsProductFilter);
return attributes.GroupBy(a => a.ID)
.Where(g => g.Count() < products.Count) // this is now an IEnumerable of groups, so I'm not sure how to get it back to an IEnumerable of attributes
Or this seemed a bit better
attributes.GroupBy(a => a)
.Where(g => g.Count() < products.Count)
.Select(g => g.ToList())
.Distinct()
.OrderBy(a => a.AttributeGroup.Order) // this doesn't work because a is a list here, not an attribute
It's probably really simple but I'm not that great with Linq so any help solving this would be appreciated
I'm not sure, but doesn't SelectMany help here too?
return attributes.GroupBy(a => a.ID)
.Where(g => g.Count() < products.Count)
.SelectMany(g => g); // perhaps Distinct after
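If one attribute per ID is enough, taking First() from each group is another way back from groups to attributes. The sketch below is an assumption-laden illustration: each product is reduced to its list of attributes, each attribute is a hypothetical (ID, Name, Order) tuple, the IsProductFilter check is left out, and every product is assumed to list a given attribute at most once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each product is just its list of attributes,
// and each attribute is an (ID, Name, Order) tuple.
var products = new List<List<(int ID, string Name, int Order)>>
{
    new() { (1, "Colour", 2), (2, "Size", 1) },
    new() { (1, "Colour", 2), (3, "Material", 3) },
    new() { (1, "Colour", 2), (2, "Size", 1) },
};

var attributes = products.SelectMany(p => p);

// One group per attribute ID. Keep the groups that do not cover every
// product, then take one representative per group to get attributes back.
var filters = attributes.GroupBy(a => a.ID)
                        .Where(g => g.Count() < products.Count)
                        .Select(g => g.First())
                        .OrderBy(a => a.Order)
                        .ToList();

Console.WriteLine(string.Join(", ", filters.Select(a => a.Name)));
```

Here "Colour" is on all three products and is dropped, while "Size" and "Material" remain and can be ordered because each element is an attribute again.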

Finding duplicate texts in IEnumerable<TextBox> collection

I have a collection of textboxes in my winform application.
I need help with a LINQ query to get the collection of TextBox controls (i.e. an IEnumerable) that contain duplicate entries. I want to make use of LINQ.
The query I used returns only the repeated entries, but I need every entry that has a duplicate, including the first occurrence.
var duplicates = emailAddressList.GroupBy(t => t.Text)
.Where(g => !string.IsNullOrEmpty(g.Key))
.SelectMany(grp => grp.Skip(1))
.ToList();
Can anyone help me see where I am going wrong?
Regards
"This query I used, is returning just the duplicate entry. But I need all the duplicate entries."
Check that g.Count() > 1 and use SelectMany(g => g) to return every member of each duplicate group, rather than Skip(1), which drops the first member of each group.
var duplicates = emailAddressList
.GroupBy(t => t.Text)
.Where(g => !string.IsNullOrEmpty(g.Key) && g.Count() > 1)
.SelectMany(g => g)
.ToList();
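To make the difference between the two queries concrete, here is a small sketch that runs both on the same data. It does not use WinForms: the TextBox controls are replaced by hypothetical (Name, Text) tuples, and the control names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the TextBox controls: (Name, Text) tuples.
var emailAddressList = new List<(string Name, string Text)>
{
    ("txtA", "a@example.com"),
    ("txtB", ""),
    ("txtC", "b@example.com"),
    ("txtD", "a@example.com"),
    ("txtE", ""),
};

// Skip(1) drops the first member of every group, so only the repeats remain.
var repeatsOnly = emailAddressList
    .GroupBy(t => t.Text)
    .Where(g => !string.IsNullOrEmpty(g.Key))
    .SelectMany(g => g.Skip(1))
    .Select(t => t.Name)
    .ToList();

// Count() > 1 with SelectMany(g => g) keeps every member of a duplicate group.
var allDuplicates = emailAddressList
    .GroupBy(t => t.Text)
    .Where(g => !string.IsNullOrEmpty(g.Key) && g.Count() > 1)
    .SelectMany(g => g)
    .Select(t => t.Name)
    .ToList();

Console.WriteLine(string.Join(", ", repeatsOnly));
Console.WriteLine(string.Join(", ", allDuplicates));
```

The first query reports only txtD, while the second reports both txtA and txtD; the empty boxes are ignored in both.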

LINQ: Select all from each group except the first item

It is easy to select the first of each group:
var firstOfEachGroup = dbContext.Measurements
.OrderByDescending(m => m.MeasurementId)
.GroupBy(m => new { m.SomeColumn })
.Where(g => g.Count() > 1)
.Select(g => g.First());
But...
Question: how can I select all from each group except the first item?
var everythingButFirstOfEachGroup = dbContext.Measurements
.OrderByDescending(m => m.MeasurementId)
.GroupBy(m => new { m.SomeColumn })
.Where(g => g.Count() > 1)
.Select( ...? );
Additional information:
My real goal is to delete all duplicates except the last, in bulk (i.e. without an in-memory foreach), so after the previous query I want to use RemoveRange:
dbContext.Measurements.RemoveRange(everythingButFirstOfEachGroup);
So, if my question makes no sense, this information might be handy.
Use Skip(1) to skip the first record and select the rest.
Something like:
var everythingButFirstOfEachGroup = dbContext.Measurements
    .OrderByDescending(m => m.MeasurementId)
    .GroupBy(m => new { m.SomeColumn })
    .Where(g => g.Count() > 1)
    // order inside each group (newest first), then drop that first record
    .SelectMany(g => g.OrderByDescending(r => r.MeasurementId).Skip(1));
See: Enumerable.Skip
If you do not need a flattened collection then replace SelectMany with Select in code snippet.
IGrouping<K, V> implements IEnumerable<V>; you simply need to skip inside the select clause to apply it to each group:
.Select(g => g.Skip(1))
You can always use .Distinct() to remove duplicates; presumably sorting or reverse-sorting and then applying .Distinct() would give you what you want.
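The Skip(1) idea can be tried on an in-memory list first. This is a LINQ to Objects sketch with hypothetical (MeasurementId, SomeColumn) tuples; whether a given Entity Framework provider can translate the same query shape to SQL depends on the provider and version, which this sketch does not test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory measurements standing in for dbContext.Measurements.
var measurements = new List<(int MeasurementId, string SomeColumn)>
{
    (1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "c"), (6, "c"),
};

// Within each group, sort newest first and skip that one record,
// which leaves every older record of the group.
var everythingButNewest = measurements
    .GroupBy(m => m.SomeColumn)
    .SelectMany(g => g.OrderByDescending(m => m.MeasurementId).Skip(1))
    .ToList();

Console.WriteLine(string.Join(", ", everythingButNewest.Select(m => m.MeasurementId)));
```

Groups with a single record contribute nothing after Skip(1), so a separate Count() > 1 filter is not needed in this in-memory version.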

Remove duplicates of a List, selecting by a property value in C#?

I have a list of objects that I need some duplicates removed from. We consider them duplicates if they have the same Id and prefer the one whose booleanValue is false. Here's what I have so far:
objects.GroupBy(x => x.Id).Select(x => x.Where(y => !y.booleanValue));
I've determined that GroupBy is doing no such grouping, so I don't see if any of the other functions are working. Any ideas on this? Thanks in advance.
You can do this:
var results =
from x in objects
group x by x.Id into g
select g.OrderBy(y => y.booleanValue).First();
For every Id it finds in objects, it will select the first element where booleanValue == false, or the first element of the group if none of them have booleanValue == false.
If you prefer fluent syntax:
var results = objects.GroupBy(x => x.Id)
.Select(g => g.OrderBy(y => y.booleanValue).First());
Something like this should work:
var result =
    objects.GroupBy(x => x.Id).Select(g =>
        g.FirstOrDefault(y => !y.booleanValue) ?? g.First());
This assumes that your objects are of a reference type.
Another possibility might be to use Distinct() with a custom IEqualityComparer<>.
This only partially answers the question above, but I just needed a really basic solution:
objects.GroupBy(x => x.Id)
.Select(x => x.First())
.ToArray();
The key to getting the original objects back from GroupBy() is the Select() that takes First() from each group; ToArray() then gives you an array of your objects rather than a lazily evaluated LINQ query.
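A small sketch of the OrderBy(...).First() approach from the answers above, using hypothetical (Id, booleanValue, Label) tuples in place of the real objects. It relies on false sorting before true and on OrderBy being a stable sort in LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical objects: Label only exists to tell the items apart.
var objects = new List<(int Id, bool booleanValue, string Label)>
{
    (1, true, "a"),
    (1, false, "b"),
    (2, true, "c"),
    (2, true, "d"),
    (3, false, "e"),
};

// false sorts before true, so First() prefers an item whose flag is false
// and falls back to the first item of the group when none is false.
var results = objects.GroupBy(x => x.Id)
                     .Select(g => g.OrderBy(y => y.booleanValue).First())
                     .ToList();

Console.WriteLine(string.Join(", ", results.Select(r => r.Label)));
```

For Id 1 the item with the false flag wins, while for Id 2, where both flags are true, the first item in the original order is kept.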
