Distinct of items in dictionary<T,IEnumerable<T>> - c#

I am Looking for a cleaner way to perform the following operation:
Filter out distinct values from a dictionary of type Dictionary<T,IEnumerable<T>> , based on the uniqueness of the Value (i.e. unique by one of the attributes of T in the IEnumerable<T>).
We can ignore the key on the dictionary. Can someone suggest a good way of achieving the above ?

Jurgen Camilleri has the piece for 'SelectMany().Distinct()'. For the comparer, we use the following comparer all the time to compare based on a property:
Usage
//this uses 'Product' for 'T'. If you want to just use 'T' you'd have to constrain it
var distinctValues = dictionary.SelectMany((entry) => entry.Value)
.Distinct(new LambdaEqualityComparer<Product>(p=>p.ProductName));
Code
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
private Func<T,object> _action;
public LambdaEqualityComparer(Func<T, object> action)
{
_action = action;
}
public bool Equals(T x, T y)
{
var areEqual = baseCheck(x, y) ?? baseCheck(getObj(x), getObj(y));
if (areEqual != null)
{
return areEqual.Value;
}
return _action(x).Equals(_action(y));
}
public int GetHashCode(T obj)
{
return _action(obj).GetHashCode();
}
/// <summary>
/// True = return true
/// False = return false
/// null = continue evaluating
/// </summary>
/// <param name="x"></param>
/// <param name="y"></param>
/// <returns></returns>
private bool? baseCheck(object x, object y)
{
if (x == null && y == null)
{
return true;
}
else if (x == null || y == null)
{
return false;
}
return null;
}
private object getObj(T t)
{
try
{
return _action(t);
}
catch (NullReferenceException)
{
}
return null;
}
}

This should work for your case:
IEnumerable<T> allValues = dictionary.SelectMany((entry) => entry.Value);
IEnumerable<T> distinctValues = allValues.Distinct(comparer);
Make sure that you create a comparer class which implements IEqualityComparer<T> so that Distinct() can differentiate between instances of T (unless T is a class which already has a comparer in the .NET Framework such as System.String).

I wrote a method that will merge dictionaries removing duplicates. Perhaps you could use the guts of this to perform something similar.
public static Dictionary<TKey, List<TValue>> MergeDictionaries<TKey, TValue>(this IEnumerable<Dictionary<TKey, List<TValue>>> dictionaries)
{
return dictionaries.SelectMany(dict => dict)
.ToLookup(pair => pair.Key, pair => pair.Value)
.ToDictionary(group => group.Key, group => group.SelectMany(value => value).Distinct().ToList());
}
Maybe, just this section:
.ToLookup(pair => pair.Key, pair => pair.Value)
.ToDictionary(group => group.Key, group => group.SelectMany(value => value).Distinct().ToList());

Related

Duplicate by using LINQ with mulitiple OR creteria

I can retrieve duplicate by using this query:
var duplicates = grpDupes
.GroupBy(i => new { i.Email })
.Where(g => g.Count() > 1)
.SelectMany(g => g);
But i am interested to find duplicates by using either Email or Address or xyz.
If i modify above query
GroupBy(i => new { i.Email, i.Address })
then it becomes AND condition, any help?
You have to use the overloaded method which accepts an EqualityComparer.
/// <summary>
/// Factory class which creates an EqualityComparer based on lambda expressions.
/// </summary>
/// <typeparam name="T">The type of which a new equality comparer is to be created.</typeparam>
public static class EqualityComparerFactory<T>
{
private class MyComparer : IEqualityComparer<T>
{
private readonly Func<T, int> _getHashCodeFunc;
private readonly Func<T, T, bool> _equalsFunc;
public MyComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
{
_getHashCodeFunc = getHashCodeFunc ?? (a=>0);
_equalsFunc = equalsFunc;
}
public bool Equals(T x, T y)
{
return _equalsFunc(x, y);
}
public int GetHashCode(T obj)
{
return _getHashCodeFunc(obj);
}
}
/// <summary>
/// Creates an <see cref="IEqualityComparer{T}" /> based on an equality function and optionally on a hash function.
/// </summary>
/// <param name="equalsFunc">The equality function.</param>
/// <param name="getHashCodeFunc">The hash function.</param>
/// <returns>
/// A typed Equality Comparer.
/// </returns>
public static IEqualityComparer<T> CreateComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
{
ArgumentValidator.NotNull(() => equalsFunc);
return new MyComparer(equalsFunc, getHashCodeFunc);
}
}
Sample Usage:
var comparer = EqualityComparerFactory<YourClassHere>.CreateComparer((a, b) => a.Address == b.Address || a.Email == b.Email);
data.GroupBy(a => a, comparer);
You could use EXISTS in SQL which is Any in LINQ:
var duplicates = grpDupes
.Where(i => (i.Email.Trim() != "" || i.Address.Trim() != "") && grpDupes
.Any(i2 => i.ID != i2.ID &&
((i.Email.Trim() != "" && i.Email == i2.Email) ||
(i.Address.Trim() != "" && i.Address == i2.Address))));
Note that i've used ID as the primary key column. If you don't have one you need to use the column(s) that you want to use as identifier.
If you use as database driven LINQ provider like LINQ-To-SQL or LINQ-To-Entities this is efficient.
I'd keep this very simple by using .ToLookup().
How about this?
var emailLookup = grpDupes.ToLookup(x => x.Email);
var addressLookup = grpDupes.ToLookup(x => x.Address);
var duplicates = grpDupes
.Where(x =>
emailLookup[x.Email].Count() > 1 || addressLookup[x.Address].Count() > 1);

Remove objects with duplicate properties from List

I have a List of objects in C#. All of the objects contain the properties dept and course.
There are several objects that have the same dept and course.
How can I trim the List(or make a new List) where there is only one object per unique (dept & course) properties.
[Any additional duplicates are dropped out of the List]
I know how to do this with a single property:
fooList.GroupBy(x => x.dept).Select(x => x.First());
However, I am wondering how to do this for multiple properties (2 or more)?
To use multiple properties you can use an anonymous type:
var query = fooList.GroupBy(x => new { x.Dept, x.Course })
.Select(x => x.First());
Of course, this depends on what types Dept and Course are to determine equality. Alternately, your classes can implement IEqualityComparer<T> and then you could use the Enumerable.Distinct method that accepts a comparer.
Another approach is to use the LINQ Distinct extension method together with an IEqualityComparer<Foo>. It requires you to implement a comparer; however, the latter is reusable and testable.
public class FooDeptCourseEqualityComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo x, Foo y)
{
return
x.Dept == y.Dept &&
x.Course.ToLower() == y.Course.ToLower();
}
public int GetHashCode(Foo obj)
{
unchecked {
return 527 + obj.Dept.GetHashCode() * 31 + obj.Course.GetHashCode();
}
}
#region Singleton Pattern
public static readonly FooDeptCourseEqualityComparer Instance =
new FooDeptCourseEqualityComparer();
private FooDeptCourseEqualityComparer() { }
#endregion
}
My example uses the singleton pattern. Since the class does not have any state information, we do not need to create a new instance each time we use it.
My code does not handle null values. Of course you would have to handle them, if they can occur.
The unique values are returned like this
var result = fooList.Distinct(FooDeptCourseEqualityComparer.Instance);
UPDATE
I suggest using a generic EqualityComparer class that accepts lambda expressions in the constructor and can be reused in multiple situations
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
private Func<T, T, bool> _areEqual;
private Func<T, int> _getHashCode;
public LambdaEqualityComparer(Func<T, T, bool> areEqual,
Func<T, int> getHashCode)
{
_areEqual = areEqual;
_getHashCode = getHashCode;
}
public LambdaEqualityComparer(Func<T, T, bool> areEqual)
: this(areEqual, obj => obj.GetHashCode())
{
}
#region IEqualityComparer<T> Members
public bool Equals(T x, T y)
{
return _areEqual(x, y);
}
public int GetHashCode(T obj)
{
return _getHashCode(obj);
}
#endregion
}
You can use it like this
var comparer = new LambdaEqualityComparer<Foo>(
(x, y) => x.Dept == y.Dept && x.Course == y.Course,
obj => {
unchecked {
return 527 + obj.Dept.GetHashCode() * 31 + obj.Course.GetHashCode();
}
}
);
var result = fooList.Distinct(comparer);
Note: You have to provide a calculation of the hash code, since Distinct uses an internal Set<T> class, which in turn uses hash codes.
UPDATE #2
An even more generic equality comparer implements the comparison automatically and accepts a list of property accessors; however, you have no control, on how the comparison is performed.
public class AutoEqualityComparer<T> : IEqualityComparer<T>
{
private Func<T, object>[] _propertyAccessors;
public AutoEqualityComparer(params Func<T, object>[] propertyAccessors)
{
_propertyAccessors = propertyAccessors;
}
#region IEqualityComparer<T> Members
public bool Equals(T x, T y)
{
foreach (var getProp in _propertyAccessors) {
if (!getProp(x).Equals(getProp(y))) {
return false;
}
}
return true;
}
public int GetHashCode(T obj)
{
unchecked {
int hash = 17;
foreach (var getProp in _propertyAccessors) {
hash = hash * 31 + getProp(obj).GetHashCode();
}
return hash;
}
}
#endregion
}
Usage
var comparer = new AutoEqualityComparer<Foo>(foo => foo.Dept,
foo => foo.Course);
var result = fooList.Distinct(comparer);

Can I cache partially-executed LINQ queries?

I have the following code:
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
if (items.Any(pair => pair.Value<0))
throw new ArgumentException("Item weights cannot be less than zero.");
double sum = items.Sum(pair => pair.Value);
foreach (KeyValuePair<T, double> pair in items) {...}
Where weight is a Func<T, double>.
The problem is I want weight to be executed as few times as possible. This means it should be executed at most once for each item. I could achieve this by saving it to an array. However, if any weight returns a negative value, I don't want to continue execution.
Is there any way to accomplish this easily within the LINQ framework?
Sure, that's totally doable:
public static Func<A, double> ThrowIfNegative<A, double>(this Func<A, double> f)
{
return a=>
{
double r = f(a);
// if r is NaN then this will throw.
if ( !(r >= 0.0) )
throw new Exception();
return r;
};
}
public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
var d = new Dictionary<A, R>();
return a=>
{
R r;
if (!d.TryGetValue(a, out r))
{
r = f(a);
d.Add(a, r);
}
return r;
};
}
And now...
Func<T, double> weight = whatever;
weight = weight.ThrowIfNegative().Memoize();
and you're done.
One way is to move the exception into the weight function, or at least simulate doing so, by doing something like:
Func<T, double> weightWithCheck = i =>
{
double result = weight(i);
if (result < 0)
{
throw new ArgumentException("Item weights cannot be less than zero.");
}
return result;
};
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weightWithCheck(item)));
double sum = items.Sum(pair => pair.Value);
By this point, if there is an exception to be had, you should have it. You do have to enumerate items before you can be assured of getting the exception, though, but once you get it, you will not call weight again.
Both answers are good (where to throw the exception, and memoizing the function).
But your real problem is that your LINQ expression is evaluated every time you use it, unless you force it to evaluate and store as a List (or similar). Just change this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
To this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item))).ToList();
You could possibly do it with a foreach loop. Here is a way to do it in one statement:
IEnumerable<KeyValuePair<T, double>> items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Select(kvp =>
{
if (kvp.Value < 0)
throw new ArgumentException("Item weights cannot be less than zero.");
else
return kvp;
}
);
No, there is nothing already IN the LINQ framework to do this, but you could surely write up your own methods and invoke them from the linq query (As has already been shown by many).
Personally, I would either ToList the first query or use Eric's suggestion.
Instead of a functional memoization suggested by other answers, you can also employ a memoization for the whole data sequence:
var items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Memoize();
(Note a call to Memoize() method at the end of the expression above)
A nice property of data memoization is that it represents a drop-in replacement for ToList() or ToArray() approaches.
The fully featured implementation is pretty involved though:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
static class MemoizationExtensions
{
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <remarks>
/// The resulting sequence is not thread safe.
/// </remarks>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source) => Memoize(source, false);
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="isThreadSafe">Indicates whether resulting sequence is thread safe.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source, bool isThreadSafe)
{
switch (source)
{
case null:
return null;
case CachedEnumerable<T> existingCachedEnumerable:
if (!isThreadSafe || existingCachedEnumerable is ThreadSafeCachedEnumerable<T>)
{
// The source is already memoized with compatible parameters.
return existingCachedEnumerable;
}
break;
case IList<T> _:
case IReadOnlyList<T> _:
case string _:
// Given source types are intrinsically memoized by their nature.
return source;
}
if (isThreadSafe)
return new ThreadSafeCachedEnumerable<T>(source);
else
return new CachedEnumerable<T>(source);
}
class CachedEnumerable<T> : IEnumerable<T>, IReadOnlyList<T>
{
public CachedEnumerable(IEnumerable<T> source)
{
_Source = source;
}
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerable<T> _Source;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerator<T> _SourceEnumerator;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
protected readonly IList<T> Cache = new List<T>();
public virtual int Count
{
get
{
while (_TryCacheElementNoLock()) ;
return Cache.Count;
}
}
bool _TryCacheElementNoLock()
{
if (_SourceEnumerator == null && _Source != null)
{
_SourceEnumerator = _Source.GetEnumerator();
_Source = null;
}
if (_SourceEnumerator == null)
{
// Source enumerator already reached the end.
return false;
}
else if (_SourceEnumerator.MoveNext())
{
Cache.Add(_SourceEnumerator.Current);
return true;
}
else
{
// Source enumerator has reached the end, so it is no longer needed.
_SourceEnumerator.Dispose();
_SourceEnumerator = null;
return false;
}
}
public virtual T this[int index]
{
get
{
_EnsureItemIsCachedNoLock(index);
return Cache[index];
}
}
public IEnumerator<T> GetEnumerator() => new CachedEnumerator<T>(this);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
internal virtual bool EnsureItemIsCached(int index) => _EnsureItemIsCachedNoLock(index);
bool _EnsureItemIsCachedNoLock(int index)
{
while (Cache.Count <= index)
{
if (!_TryCacheElementNoLock())
return false;
}
return true;
}
internal virtual T GetCacheItem(int index) => Cache[index];
}
sealed class ThreadSafeCachedEnumerable<T> : CachedEnumerable<T>
{
public ThreadSafeCachedEnumerable(IEnumerable<T> source) :
base(source)
{
}
public override int Count
{
get
{
lock (Cache)
return base.Count;
}
}
public override T this[int index]
{
get
{
lock (Cache)
return base[index];
}
}
internal override bool EnsureItemIsCached(int index)
{
lock (Cache)
return base.EnsureItemIsCached(index);
}
internal override T GetCacheItem(int index)
{
lock (Cache)
return base.GetCacheItem(index);
}
}
sealed class CachedEnumerator<T> : IEnumerator<T>
{
CachedEnumerable<T> _CachedEnumerable;
const int InitialIndex = -1;
const int EofIndex = -2;
int _Index = InitialIndex;
public CachedEnumerator(CachedEnumerable<T> cachedEnumerable)
{
_CachedEnumerable = cachedEnumerable;
}
public T Current
{
get
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
throw new InvalidOperationException();
var index = _Index;
if (index < 0)
throw new InvalidOperationException();
return cachedEnumerable.GetCacheItem(index);
}
}
object IEnumerator.Current => Current;
public void Dispose()
{
_CachedEnumerable = null;
}
public bool MoveNext()
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
{
// Disposed.
return false;
}
if (_Index == EofIndex)
return false;
_Index++;
if (!cachedEnumerable.EnsureItemIsCached(_Index))
{
_Index = EofIndex;
return false;
}
else
{
return true;
}
}
public void Reset()
{
_Index = InitialIndex;
}
}
}
More info and a readily available NuGet package: https://github.com/gapotchenko/Gapotchenko.FX/tree/master/Source/Gapotchenko.FX.Linq#memoize

Filtering duplicates out of an IEnumerable

I have this code:
class MyObj {
int Id;
string Name;
string Location;
}
IEnumerable<MyObj> list;
I want to convert list to a dictionary like this:
list.ToDictionary(x => x.Name);
but it tells me I have duplicate keys. How can I keep only the first item for each key?
I suppose the easiest way would be to group by key and take the first element of each group:
list.GroupBy(x => x.name).Select(g => g.First()).ToDictionary(x => x.name);
Or you could use Distinct if your objects implement IEquatable to compare between themselves by key:
// I'll just randomly call your object Person for this example.
class Person : IEquatable<Person>
{
public string Name { get; set; }
public bool Equals(Person other)
{
if (other == null)
return false;
return Name == other.Name;
}
public override bool Equals(object obj)
{
return base.Equals(obj as Person);
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
...
list.Distinct().ToDictionary(x => x.Name);
Or if you don't want to do that (maybe because you normally want to compare for equality in a different way, so Equals is already in use) you could make a custom implementation of IEqualityComparer just for this case:
class PersonComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
if (x == null)
return y == null;
if (y == null)
return false;
return x.Name == y.Name;
}
public int GetHashCode(Person obj)
{
return obj.Name.GetHashCode();
}
}
...
list.Distinct(new PersonComparer()).ToDictionary(x => x.Name);
list.Distinct().ToDictionary(x => x.Name);
You could also create your own Distinct extension overload method that accepted a Func<> for choosing the distinct key:
public static class EnumerationExtensions
{
public static IEnumerable<TSource> Distinct<TSource,TKey>(
this IEnumerable<TSource> source, Func<TSource,TKey> keySelector)
{
KeyComparer comparer = new KeyComparer(keySelector);
return source.Distinct(comparer);
}
private class KeyComparer<TSource,TKey> : IEqualityComparer<TSource>
{
private Func<TSource,TKey> keySelector;
public DelegatedComparer(Func<TSource,TKey> keySelector)
{
this.keySelector = keySelector;
}
bool IEqualityComparer.Equals(TSource a, TSource b)
{
if (a == null && b == null) return true;
if (a == null || b == null) return false;
return keySelector(a) == keySelector(b);
}
int IEqualityComparer.GetHashCode(TSource obj)
{
return keySelector(obj).GetHashCode();
}
}
}
Apologies for any bad code formatting, I wanted to reduce the size of the code on the page. Anyway, you can then use ToDictionary:
var dictionary = list.Distinct(x => x.Name).ToDictionary(x => x.Name);
Could make your own perhaps? For example:
public static class Extensions
{
public static IDictionary<TKey, TValue> ToDictionary2<TKey, TValue>(
this IEnumerable<TValue> subjects, Func<TValue, TKey> keySelector)
{
var dictionary = new Dictionary<TKey, TValue>();
foreach(var subject in subjects)
{
var key = keySelector(subject);
if(!dictionary.ContainsKey(key))
dictionary.Add(key, subject);
}
return dictionary;
}
}
var dictionary = list.ToDictionary2(x => x.Name);
Haven't tested it, but should work. (and it should probably have a better name than ToDictionary2 :p)
Alternatively, you can implement a DistinctBy method, for example like this:
public static IEnumerable<TSubject> DistinctBy<TSubject, TValue>(this IEnumerable<TSubject> subjects, Func<TSubject, TValue> valueSelector)
{
var set = new HashSet<TValue>();
foreach(var subject in subjects)
if(set.Add(valueSelector(subject)))
yield return subject;
}
var dictionary = list.DistinctBy(x => x.Name).ToDictionary(x => x.Name);
The problem here is that the ToDictionary extension method does not support multiple values with the same key. One solution is to write a version which does and use that instead.
public static Dictionary<TKey,TValue> ToDictionaryAllowDuplicateKeys<TKey,TValue>(
this IEnumerable<TValue> values,
Func<TValue,TKey> keyFunc) {
var map = new Dictionary<TKey,TValue>();
foreach ( var cur in values ) {
var key = keyFunc(cur);
map[key] = cur;
}
return map;
}
Now converting to a dictionary is straight forward
var map = list.ToDictionaryAllowDuplicateKeys(x => x.Name);
The following will work if you have different instances of MyObj with the same value for the Name property. It will take the first instance found for each duplicate (sorry for the obj - obj2 notation, it is just sample code):
list.SelectMany(obj => new MyObj[] {list.Where(obj2 => obj2.Name == obj.Name).First()}).Distinct();
EDIT: Joren's solution is better as it does not create unnecessary arrays in the process.

Distinct() with lambda?

Right, so I have an enumerable and wish to get distinct values from it.
Using System.Linq, there's, of course, an extension method called Distinct. In the simple case, it can be used with no parameters, like:
var distinctValues = myStringList.Distinct();
Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:
var distinctValues = myCustomerList.Distinct(someEqualityComparer);
The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, cludgy.
What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:
var distinctValues = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);
Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?
Alternatively, is there a way of specifying an IEqualityComparer inline (embarrass me)?
Update
I found a reply by Anders Hejlsberg to a post in an MSDN forum on this subject. He says:
The problem you're going to run into is that when two objects compare
equal they must have the same GetHashCode return value (or else the
hash table used internally by Distinct will not function correctly).
We use IEqualityComparer because it packages compatible
implementations of Equals and GetHashCode into a single interface.
I suppose that makes sense.
IEnumerable<Customer> filteredList = originalList
.GroupBy(customer => customer.CustomerId)
.Select(group => group.First());
It looks to me like you want DistinctBy from MoreLINQ. You can then write:
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
Here's a cut-down version of DistinctBy (no nullity checking and no option to specify your own key comparer):
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
To Wrap things up . I think most of the people which came here like me want the simplest solution possible without using any libraries and with best possible performance.
(The accepted group by method for me i think is an overkill in terms of performance. )
Here is a simple extension method using the IEqualityComparer interface which works also for null values.
Usage:
var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();
Extension Method Code
public static class LinqExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
return items.Distinct(comparer);
}
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
private Func<T, TKey> expr { get; set; }
public GeneralPropertyComparer (Func<T, TKey> expr)
{
this.expr = expr;
}
public bool Equals(T left, T right)
{
var leftProp = expr.Invoke(left);
var rightProp = expr.Invoke(right);
if (leftProp == null && rightProp == null)
return true;
else if (leftProp == null ^ rightProp == null)
return false;
else
return leftProp.Equals(rightProp);
}
public int GetHashCode(T obj)
{
var prop = expr.Invoke(obj);
return (prop==null)? 0:prop.GetHashCode();
}
}
Shorthand solution
myCustomerList.GroupBy(c => c.CustomerId, (key, c) => c.FirstOrDefault());
No there is no such extension method overload for this. I've found this frustrating myself in the past and as such I usually write a helper class to deal with this problem. The goal is to convert a Func<T,T,bool> to IEqualityComparer<T,T>.
Example
public class EqualityFactory {
private sealed class Impl<T> : IEqualityComparer<T,T> {
private Func<T,T,bool> m_del;
private IEqualityComparer<T> m_comp;
public Impl(Func<T,T,bool> del) {
m_del = del;
m_comp = EqualityComparer<T>.Default;
}
public bool Equals(T left, T right) {
return m_del(left, right);
}
public int GetHashCode(T value) {
return m_comp.GetHashCode(value);
}
}
public static IEqualityComparer<T,T> Create<T>(Func<T,T,bool> del) {
return new Impl<T>(del);
}
}
This allows you to write the following
var distinctValues = myCustomerList
.Distinct(EqualityFactory.Create((c1, c2) => c1.CustomerId == c2.CustomerId));
Here's a simple extension method that does what I need...
public static class EnumerableExtensions
{
public static IEnumerable<TKey> Distinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> selector)
{
return source.GroupBy(selector).Select(x => x.Key);
}
}
It's a shame they didn't bake a distinct method like this into the framework, but hey ho.
This will do what you want but I don't know about performance:
var distinctValues =
from cust in myCustomerList
group cust by cust.CustomerId
into gcust
select gcust.First();
At least it's not verbose.
From .NET 6 or later, there is a new build-in method Enumerable.DistinctBy to achieve this.
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
// With IEqualityComparer
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId, someEqualityComparer);
Something I have used which worked well for me.
/// <summary>
/// A class to wrap the IEqualityComparer interface into matching functions for simple implementation
/// </summary>
/// <typeparam name="T">The type of object to be compared</typeparam>
public class MyIEqualityComparer<T> : IEqualityComparer<T>
{
/// <summary>
/// Create a new comparer based on the given Equals and GetHashCode methods
/// </summary>
/// <param name="equals">The method to compute equals of two T instances</param>
/// <param name="getHashCode">The method to compute a hashcode for a T instance</param>
public MyIEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
{
if (equals == null)
throw new ArgumentNullException("equals", "Equals parameter is required for all MyIEqualityComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = getHashCode;
}
/// <summary>
/// Gets the method used to compute equals
/// </summary>
public Func<T, T, bool> EqualsMethod { get; private set; }
/// <summary>
/// Gets the method used to compute a hash code
/// </summary>
public Func<T, int> GetHashCodeMethod { get; private set; }
bool IEqualityComparer<T>.Equals(T x, T y)
{
return EqualsMethod(x, y);
}
int IEqualityComparer<T>.GetHashCode(T obj)
{
if (GetHashCodeMethod == null)
return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
All solutions I've seen here rely on selecting an already comparable field. If one needs to compare in a different way, though, this solution here seems to work generally, for something like:
somedoubles.Distinct(new LambdaComparer<double>((x, y) => Math.Abs(x - y) < double.Epsilon)).Count()
Take another way:
var distinctValues = myCustomerList.
Select(x => x._myCaustomerProperty).Distinct();
The sequence return distinct elements compare them by property '_myCaustomerProperty' .
You can use LambdaEqualityComparer:
var distinctValues
= myCustomerList.Distinct(new LambdaEqualityComparer<OurType>((c1, c2) => c1.CustomerId == c2.CustomerId));
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
public LambdaEqualityComparer(Func<T, T, bool> equalsFunction)
{
_equalsFunction = equalsFunction;
}
public bool Equals(T x, T y)
{
return _equalsFunction(x, y);
}
public int GetHashCode(T obj)
{
return obj.GetHashCode();
}
private readonly Func<T, T, bool> _equalsFunction;
}
You can use InlineComparer
public class InlineComparer<T> : IEqualityComparer<T>
{
//private readonly Func<T, T, bool> equalsMethod;
//private readonly Func<T, int> getHashCodeMethod;
public Func<T, T, bool> EqualsMethod { get; private set; }
public Func<T, int> GetHashCodeMethod { get; private set; }
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
if (equals == null) throw new ArgumentNullException("equals", "Equals parameter is required for all InlineComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = hashCode;
}
public bool Equals(T x, T y)
{
return EqualsMethod(x, y);
}
public int GetHashCode(T obj)
{
if (GetHashCodeMethod == null) return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
Usage sample:
var comparer = new InlineComparer<DetalleLog>((i1, i2) => i1.PeticionEV == i2.PeticionEV && i1.Etiqueta == i2.Etiqueta, i => i.PeticionEV.GetHashCode() + i.Etiqueta.GetHashCode());
var peticionesEV = listaLogs.Distinct(comparer).ToList();
Assert.IsNotNull(peticionesEV);
Assert.AreNotEqual(0, peticionesEV.Count);
Source:
https://stackoverflow.com/a/5969691/206730
Using IEqualityComparer for Union
Can I specify my explicit type comparator inline?
If Distinct() doesn't produce unique results, try this one:
var filteredWC = tblWorkCenter.GroupBy(cc => cc.WCID_I).Select(grp => grp.First()).Select(cc => new Model.WorkCenter { WCID = cc.WCID_I }).OrderBy(cc => cc.WCID);
ObservableCollection<Model.WorkCenter> WorkCenter = new ObservableCollection<Model.WorkCenter>(filteredWC);
A tricky way to do this is use Aggregate() extension, using a dictionary as accumulator with the key-property values as keys:
var customers = new List<Customer>();
var distincts = customers.Aggregate(new Dictionary<int, Customer>(),
(d, e) => { d[e.CustomerId] = e; return d; },
d => d.Values);
And a GroupBy-style solution is using ToLookup():
var distincts = customers.ToLookup(c => c.CustomerId).Select(g => g.First());
IEnumerable lambda extension:
public static class ListExtensions
{
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list, Func<T, int> hashCode)
{
Dictionary<int, T> hashCodeDic = new Dictionary<int, T>();
list.ToList().ForEach(t =>
{
var key = hashCode(t);
if (!hashCodeDic.ContainsKey(key))
hashCodeDic.Add(key, t);
});
return hashCodeDic.Select(kvp => kvp.Value);
}
}
Usage:
class Employee
{
public string Name { get; set; }
public int EmployeeID { get; set; }
}
//Add 5 employees to List
List<Employee> lst = new List<Employee>();
Employee e = new Employee { Name = "Shantanu", EmployeeID = 123456 };
lst.Add(e);
lst.Add(e);
Employee e1 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e1);
//Add a space in the Name
Employee e2 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e2);
//Name is different case
Employee e3 = new Employee { Name = "adam warren", EmployeeID = 823456 };
lst.Add(e3);
//Distinct (without IEqalityComparer<T>) - Returns 4 employees
var lstDistinct1 = lst.Distinct();
//Lambda Extension - Return 2 employees
var lstDistinct = lst.Distinct(employee => employee.EmployeeID.GetHashCode() ^ employee.Name.ToUpper().Replace(" ", "").GetHashCode());
The Microsoft System.Interactive package has a version of Distinct that takes a key selector lambda. This is effectively the same as Jon Skeet's solution, but it may be helpful for people to know, and to check out the rest of the library.
Here's how you can do it:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<IGrouping<V,T>,T> h=null)
{
if (h==null) h=(x => x.First());
return query.GroupBy(f).Select(h);
}
}
This method allows you to use it by specifying one parameter like .MyDistinct(d => d.Name), but it also allows you to specify a having condition as a second parameter like so:
var myQuery = (from x in _myObject select x).MyDistinct(d => d.Name,
x => x.FirstOrDefault(y=>y.Name.Contains("1") || y.Name.Contains("2"))
);
N.B. This would also allow you to specify other functions like for example .LastOrDefault(...) as well.
If you want to expose just the condition, you can have it even simpler by implementing it as:
public static IEnumerable<T> MyDistinct2<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<T,bool> h=null
)
{
if (h == null) h = (y => true);
return query.GroupBy(f).Select(x=>x.FirstOrDefault(h));
}
In this case, the query would just look like:
var myQuery2 = (from x in _myObject select x).MyDistinct2(d => d.Name,
y => y.Name.Contains("1") || y.Name.Contains("2")
);
N.B. Here, the expression is simpler, but note .MyDistinct2 uses .FirstOrDefault(...) implicitly.
Note: The examples above are using the following demo class
class MyObject
{
public string Name;
public string Code;
}
private MyObject[] _myObject = {
new MyObject() { Name = "Test1", Code = "T"},
new MyObject() { Name = "Test2", Code = "Q"},
new MyObject() { Name = "Test2", Code = "T"},
new MyObject() { Name = "Test5", Code = "Q"}
};
I'm assuming you have an IEnumerable<T>, and in your example delegate, you would like c1 and c2 to be referring to two elements in this list?
I believe you could achieve this with a self join:
var distinctResults = from c1 in myList
join c2 in myList on <your equality conditions>
I found this as the easiest solution.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector).Select(x => x.FirstOrDefault());
}

Categories

Resources