Linq Distinct not returning expected values - c#

I am trying to get a list of distinct items from a custom collection, however the comparison seems to be getting ignored as I keep getting duplicates appearing in my list. I have debugged the code and I can clearly see that the values in the list that I am comparing are equal...
NOTE: The Id and Id2 values are strings
Custom Comparer:
public class UpsellSimpleComparer : IEqualityComparer<UpsellProduct>
{
public bool Equals(UpsellProduct x, UpsellProduct y)
{
return x.Id == y.Id && x.Id2 == y.Id2;
}
public int GetHashCode(UpsellProduct obj)
{
return obj.GetHashCode();
}
}
Calling code:
var upsellProducts = (Settings.SelectedSeatingPageGuids.Contains(CurrentItem.ID.ToString())) ?
GetAOSUpsellProducts(selectedProductIds) : GetGeneralUpsellProducts(selectedProductIds);
// we use a special comparer here so that same items are not included
var comparer = new UpsellSimpleComparer();
return upsellProducts.Distinct(comparer);

Most likely UpsellProduct has default implementation of GetHashCode that returns unique value for each instance of reference type.
To fix - either implement one correctly in UpsellProduct or in comparer.
public class UpsellSimpleComparer : IEqualityComparer<UpsellProduct>
{
public bool Equals(UpsellProduct x, UpsellProduct y)
{
return x.Id == y.Id && x.Id2 == y.Id2;
}
// sample, correct GetHashCode is a bit more complex
public int GetHashCode(UpsellProduct obj)
{
return obj.Id.GetHashCode() ^ obj.Id2.GetHashCode();
}
}
Note for better code to compute combined GetHashCode check Concise way to combine field hashcodes? and Is it possible to combine hash codes for private members to generate a new hash code?

Your GetHashCode() doesn't return the same values even two UpsellProduct instances are consider equals by your Equals() method.
Use something like this to reflect the same logic instead.
public int GetHashCode(UpsellProduct obj)
{
return obj.Id.GetHashCode() ^ obj.Id2.GetHashCode();
}

Related

How to use C# LINQ Union to get the Union of Custom list1 with list2

I am using the Enumerable.Union<TSource> method to get the union of the Custom List1 with the Custom List2. But somehow it does not work as it should in my case. I am getting all the items also the duplicate once.
I followed the MSDN Link to get the work done, but still I am not able to achieve the same.
Following is the Code of the custom class:-
public class CustomFormat : IEqualityComparer<CustomFormat>
{
private string mask;
public string Mask
{
get { return mask; }
set { mask = value; }
}
private int type;//0 for Default 1 for userdefined
public int Type
{
get { return type; }
set { type = value; }
}
public CustomFormat(string c_maskin, int c_type)
{
mask = c_maskin;
type = c_type;
}
public bool Equals(CustomFormat x, CustomFormat y)
{
if (ReferenceEquals(x, y)) return true;
//Check whether the products' properties are equal.
return x != null && y != null && x.Mask.Equals(y.Mask) && x.Type.Equals(y.Type);
}
public int GetHashCode(CustomFormat obj)
{
//Get hash code for the Name field if it is not null.
int hashProductName = obj.Mask == null ? 0 : obj.Mask.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = obj.Type.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
This I am calling as follows:-
List<CustomFormat> l1 = new List<CustomFormat>();
l1.Add(new CustomFormat("#",1));
l1.Add(new CustomFormat("##",1));
l1.Add(new CustomFormat("###",1));
l1.Add(new CustomFormat("####",1));
List<CustomFormat> l2 = new List<CustomFormat>();
l2.Add(new CustomFormat("#",1));
l2.Add(new CustomFormat("##",1));
l2.Add(new CustomFormat("###",1));
l2.Add(new CustomFormat("####",1));
l2.Add(new CustomFormat("## ###.0",1));
l1 = l1.Union(l2).ToList();
foreach(var l3 in l1)
{
Console.WriteLine(l3.Mask + " " + l3.Type);
}
Please suggest the appropriate way to achieve the same!
The oddity here is that your class implement IEqualityComparer<CustomClass> instead of IEquatable<CustomClass>. You could pass in another instance of CustomClass which would be used as the comparer, but it would be more idiomatic to just make CustomClass implement IEquatable<CustomClass>, and also override Equals(object).
The difference between IEquatable<T> and IEqualityComparer<T> is that IEquatable<T> says "I know how to compare myself with another instance of T" whereas IEqualityComparer<T> says "I know how to compare two instances of T". The latter is normally provided separately - just as it can be provided to Union via another parameter. It's very rare for a type to implement IEqualityComparer<T> for its own type - whereas IEquatable<T> should pretty much only be used to compare values of the same type.
Here's an implementation using automatically implemented properties for simplicity and more idiomatic parameter names. I'd probably change the hash code implementation myself and use expression-bodied members, but that's a different matter.
public class CustomFormat : IEquatable<CustomFormat>
{
public string Mask { get; set; }
public int Type { get; set; }
public CustomFormat(string mask, int type)
{
Mask = mask;
Type = type;
}
public bool Equals(CustomFormat other)
{
if (ReferenceEquals(this, other))
{
return true;
}
return other != null && other.Mask == Mask && other.Type == Type;
}
public override bool Equals(object obj)
{
return Equals(obj as CustomFormat);
}
public override int GetHashCode()
{
// Get hash code for the Name field if it is not null.
int hashProductName = Mask == null ? 0 : Mask.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = Type.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
Now it doesn't help that (as noted in comments) the documentation for Enumerable.Union is wrong. It currently states:
The default equality comparer, Default, is used to compare values of the types that implement the IEqualityComparer<T> generic interface.
It should say something like:
The default equality comparer, Default, is used to compare values when a specific IEqualityComparer<T> is not provided. If T implements IEquatable<T>, the default comparer will use that implementation. Otherwise, it will use the implementation of Equals(object).
You need to pass an instance of an IEqualityComparer to the Union method. The method has an overload to pass in your comparer.
The easiest and ugliest solution is
var comparer = new CustomFormat(null,0);
l1 = l1.Union(l2, comparer).ToList();
You have made some mistakes in your implementation. You should not implement the IEqualityComparer method on your type (CustomFormat), but on a separate class, like CustomFormatComparer.
On your type (CustomFormat) you should implemented IEquatable.

Enumerable.Distinct - What methods does your custom class have to implement for it to work?

I've implemented every function that MSDN says is necessary, plus some additional comparison interfaces - nothing seems to work. Following is code (optimized for LinqPad).
The resulting output is all 4 items, not 2 like I expect.
Please don't post work arounds as answers - I want to know how Distinct works
void Main()
{
List<NameClass> results = new List<NameClass>();
results.Add(new NameClass("hello"));
results.Add(new NameClass("hello"));
results.Add(new NameClass("55"));
results.Add(new NameClass("55"));
results.Distinct().Dump();
}
// Define other methods and classes here
public class NameClass : Object
, IEquatable<NameClass>
, IComparer<NameClass>
, IComparable<NameClass>
, IEqualityComparer<NameClass>
, IEqualityComparer
, IComparable
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public int Compare(NameClass x, NameClass y)
{
return String.Compare(x.Name, y.Name);
}
public int CompareTo(NameClass other)
{
return String.Compare(Name, other.Name);
}
public bool Equals(NameClass x, NameClass y)
{
return (0 == Compare(x, y));
}
public bool Equals(NameClass other)
{
return (0 == CompareTo(other));
}
public int GetHashCode(NameClass obj)
{
return obj.Name.GetHashCode();
}
public new int GetHashCode()
{
return Name.GetHashCode();
}
public new bool Equals(object a)
{
var x = a as NameClass;
if (null == x) { return false; }
return Equals(x);
}
public new bool Equals(object a, object b)
{
if (null == a && null == b) { return true; }
if (null == a && null != b) { return false; }
if (null != a && null == b) { return false; }
var x = a as NameClass;
var y = b as NameClass;
if (null == x && null == y) { return true; }
if (null == x && null != y) { return false; }
if (null != x && null == y) { return false; }
return x.Equals(y);
}
public int GetHashCode(object obj)
{
if (null == obj) { return 0; }
var x = obj as NameClass;
if (null != x) { return x.GetHashCode(); }
return obj.GetHashCode();
}
public int CompareTo(object obj)
{
if (obj == null) return 1;
NameClass x = obj as NameClass;
if (x == null)
{
throw new ArgumentException("Object is not a NameClass");
}
return CompareTo(x);
}
}
How Distinct works:
There is at least no implementation of Object.GetHashCode() which is used for initial comparison of objects: basic version of Distinct compares (actually puts in dictionary) by Object.GetHashCode first, than if hash code matches by Object.Equals.
To be precise Enumerable.Distinct(this IEnumerable source) uses EqualityComparer<NameClass>.Default to finally check for equality (note that if hash codes don't match it will not reach that portion of the comparison which is why your sample does not work).
The default equality comparer, Default, is used to compare values of the types that implement the IEquatable generic interface.
EqualityComparer.Default in turn actually allows to use class without IEquatable<T> at all falling back directly to Object.Equals:
The Default property checks whether type T implements the System.IEquatable interface and, if so, returns an EqualityComparer that uses that implementation. Otherwise, it returns an EqualityComparer that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So for basic Distinct to work you just need correct version of Equals/GetHashCode. IEquatable is optional, but must match behavior of GetHashCode in the class.
How to fix:
Your sample have public new int GetHashCode() method, which likely should be public override int GetHashCode() (Same for Equals).
Note that public new int... does not mean "override", but instead "create new version of the method that hides old one". It does not impact callers that call method via pointer to parent object.
Personally I think new should rarely be used in defining methods. Some suggestions when it is useful are covered in Usecases for method hiding using new.
You don't have to implement any interface, just GetHashCode and Equals methods correctly:
public class NameClass
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public override bool Equals(object obj)
{
var other = obj as NameClass;
return other != null && other.Name == this.Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
Enumerable.Distinct<TSource> Method:
It uses the default equality comparer, Default, to compare values.
EqualityComparer.Default:
The Default property checks whether type T implements the System.IEquatable<T> interface and, if so, returns an EqualityComparer<T> that uses that implementation. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
IEquatable<T> Interface:
If you implement IEquatable<T>, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behavior is consistent with that of the IEquatable<T>.Equals method.
Overriding methods:
The override modifier is required to extend or modify the abstract or virtual implementation of an inherited method, property, indexer, or event.
So your code should look like this:
public class NameClass : IEquatable<NameClass>
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
// implement IEquatable<NameClass>
public bool Equals(NameClass other)
{
return (other != null) && (Name == other.Name);
}
// override Object.Equals(Object)
public override bool Equals(object obj)
{
return Equals(obj as NameClass);
}
// override Object.GetHashCode()
public override GetHashCode()
{
return Name.GetHashCode();
}
}
So, first off, Distinct will, as per it's documentation, use EqualityComparer<T>.Default to compare objects if no custom equality comparer is provided (you provided none).
EqualityComparer<T>.Default, as per its documentation, will look to see if the object implements IEquatable<T>, if it does it will use that implementation of Equals.
Regardless of whether or not the type implements IEquatable<T>, EqualityComparer<T>.Default will use the object.GetHashCode method to get the has code of the object. IEquatable<T>, unfortunately, does not force you to also override the object's GetHashCode implementation, and in your case, while you do implement IEquatable<T>, your code does not override the object's GetHashCode implementation.
As a result of this Distinct is actually using the proper Equals method for your type, but it's using the wrong GetHashCode method. Whenever you're hashing objects and that type has an Equals and GetHashCode implementation that's out of sync problems ensue. What's happening is that in whatever hash based collection it's sending the two "equal" objects to different buckets, so they never even get to the point where their Equals methods are called on each other. If you happened to get lucky and there was a hash collection and the objects were coincidentally sent to the same bucket, then, since the Equals method is what you intended it would actually work, but the odds of that happening are...very low. (In this specific case, about 2/2147483647, or
9.3e-10.
While you do provide a new GetHashCode method in NameClass, it is hiding the object implementation, not overriding it. If you change your GetHashCode implementation to use override rather than new then your code will work.
I just realized I messed up my sample code - my class derives from DependencyObject, not Object. I can't override thew GetHashCode or Equals functions because the DependencyObject class is sealed.

How to get the distinct data from a list?

I want to get distinct list from list of persons .
List<Person> plst = cl.PersonList;
How to do this through LINQ. I want to store the result in List<Person>
Distinct() will give you distinct values - but unless you've overridden Equals / GetHashCode() you'll just get distinct references. For example, if you want two Person objects to be equal if their names are equal, you need to override Equals/GetHashCode to indicate that. (Ideally, implement IEquatable<Person> as well as just overriding Equals(object).)
You'll then need to call ToList() to get the results back as a List<Person>:
var distinct = plst.Distinct().ToList();
If you want to get distinct people by some specific property but that's not a suitable candidate for "natural" equality, you'll either need to use GroupBy like this:
var people = plst.GroupBy(p => p.Name)
.Select(g => g.First())
.ToList();
or use the DistinctBy method from MoreLINQ:
var people = plst.DistinctBy(p => p.Name).ToList();
Using the Distinct extension method will return an IEnumerable which you can then do a ToList() on:
List<Person> plst = cl.PersonList.Distinct().ToList();
You can use Distinct method, you will need to Implement IEquatable and override equals and hashcode.
public class Person : IEquatable<Person>
{
public string Name { get; set; }
public int Code { get; set; }
public bool Equals(Person other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the person' properties are equal.
return Code.Equals(other.Code) && Name.Equals(other.Name);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the Name field if it is not null.
int hashPersonName = Name == null ? 0 : Name.GetHashCode();
//Get hash code for the Code field.
int hashPersonCode = Code.GetHashCode();
//Calculate the hash code for the person.
return hashPersonName ^ hashPersonCode;
}
}
var distinctPersons = plst.Distinct().ToList();

Inheriting the equality comparer

I have a Customer class.
public class Customer
{
private string _id;
private string _name;
// some more properties follow
I am inheriting the EqualityComparer form MyEqualityComparer(of Customer).
This I am intending to use in LINQ queries.
MyEqualityComparer is intended for partial check between two objects.
If the customer.id and customer.name matches I treat the objects the equal.
public class MyComparer : System.Collections.Generic.EqualityComparer<Customer>
{
public override bool Equals(Customer x, Customer y)
{
if (x.Id == y.Id && x.Name == y.Name)
return true;
else
return false;
}
public override int GetHashCode(Customer obj)
{
return string.Concat(obj.Id,obj.Name).GetHashCode();
}
}
I referred to generating hashcode.
I am little unsure about concatenating strings and using that as a hashcode.
Is this safe and sound what I am trying to do ?
See this question on hashcodes for a pretty simple way to return one hashcode based on multiple fields.
Having said that, I wouldn't derive from EqualityComparer<T> myself - I'd just implement IEqualityComparer<T> directly. I'm not sure what value EqualityComparer<T> is really giving you here, other than also implementing the non-generic IEqualityComparer.
A couple more things:
You should handle nullity in Equals
Your present Equals code can be simplified to:
return x.Id == y.Id && x.Name == y.Name;
A fuller implementation of Equals might be:
public override bool Equals(Customer x, Customer y)
{
if (object.ReferenceEquals(x, y))
{
return true;
}
if (x == null || y == null)
{
return false;
}
return x.Id == y.Id && x.Name == y.Name;
}
You should see it from the perspective of possible "collisions", e.g. when two different objects get the same hash code. This could be the case with such pairs as "1,2any" and "12, any", the values in the pairs being "id" and "name". If this is not possible with your data, you're good to go. Otherwise you can change it to something like:
return obj.Id.GetHashCode() ^ obj.Name.GetHashCode();
Resharper (fantastic refactoring plugin from JetBrains) thinks it should be:
public override int GetHashCode(Customer obj)
{
unchecked
{
return ((obj.Id != null ? obj.Id.GetHashCode() : 0) * 397)
^ (obj.Name != null ? obj.Name.GetHashCode() : 0);
}
}
I have to admit I almost always just let Resharper generate the equality and hash code implementations for me. I've tested their implementation a great deal and found it to be as good if not better than anything I'd write by hand. So I'll usually take the implementation I don't have to type.

How do I implement IEqualityComparer on an immutable generic Pair struct?

Currently I have this (edited after reading advice):
struct Pair<T, K> : IEqualityComparer<Pair<T, K>>
{
readonly private T _first;
readonly private K _second;
public Pair(T first, K second)
{
_first = first;
_second = second;
}
public T First { get { return _first; } }
public K Second { get { return _second; } }
#region IEqualityComparer<Pair<T,K>> Members
public bool Equals(Pair<T, K> x, Pair<T, K> y)
{
return x.GetHashCode(x) == y.GetHashCode(y);
}
public int GetHashCode(Pair<T, K> obj)
{
int hashCode = obj.First == null ? 0 : obj._first.GetHashCode();
hashCode ^= obj.Second == null ? 0 : obj._second.GetHashCode();
return hashCode;
}
#endregion
public override int GetHashCode()
{
return this.GetHashCode(this);
}
public override bool Equals(object obj)
{
return (obj != null) &&
(obj is Pair<T, K>) &&
this.Equals(this, (Pair<T, K>) obj);
}
}
The problem is that First and Second may not be reference types (VS actually warns me about this), but the code still compiles. Should I cast them (First and Second) to objects before I compare them, or is there a better way to do this?
Edit:
Note that I want this struct to support value and reference types (in other words, constraining by class is not a valid solution)
Edit 2:
As to what I'm trying to achieve, I want this to work in a Dictionary. Secondly, SRP isn't important to me right now because that isn't really the essence of this problem - it can always be refactored later. Thirdly, comparing to default(T) will not work in lieu of comparing to null - try it.
Your IEqualityComparer implementation should be a different class (and definately not a struct as you want to reuse the reference).
Also, your hashcode should never be cached, as the default GetHashcode implementation for a struct (which you do not override) will take that member into account.
It looks like you need IEquatable instead:
internal struct Pair<T, K> : IEquatable<Pair<T, K>>
{
private readonly T _first;
private readonly K _second;
public Pair(T first, K second)
{
_first = first;
_second = second;
}
public T First
{
get { return _first; }
}
public K Second
{
get { return _second; }
}
public bool Equals(Pair<T, K> obj)
{
return Equals(obj._first, _first) && Equals(obj._second, _second);
}
public override bool Equals(object obj)
{
return obj is Pair<T, K> && Equals((Pair<T, K>) obj);
}
public override int GetHashCode()
{
unchecked
{
return (_first != null ? _first.GetHashCode() * 397 : 0) ^ (_second != null ? _second.GetHashCode() : 0);
}
}
}
If you use hashcodes in comparing methods, you should check for "realy value" if the hash codes are same.
bool result = ( x._hashCode == y._hashCode );
if ( result ) { result = ( x._first == y._first && x._second == y._second ); }
// OR?: if ( result ) { result = object.Equals( x._first, y._first ) && object.Equals( x._second, y._second ); }
// OR?: if ( result ) { result = object.ReferenceEquals( x._first, y._first ) && object.Equals( x._second, y._second ); }
return result;
But there is littlebit problem with comparing "_first" and "_second" fields.
By default reference types uses fore equality comparing "object.ReferenceEquals" method, bud they can override them. So the correct solution depends on the "what exactly should do" the your comparing method. Should use "Equals" method of the "_first" & "_second" fields, or object.ReferenceEquals ? Or something more complex?
Regarding the warning, you can use default(T) and default(K) instead of null.
I can't see what you're trying to achieve, but you shouldn't be using the hashcode to compare for equality - there is no guarantee that two different objects won't have the same hashcode. Also even though your struct is immutable, the members _first and _second aren't.
First of all this code violates SRP principle. Pair class used to hold pairs if items, right? It's incorrect to delegate equality comparing functionality to it.
Next let take a look at your code:
Equals method will fail if one of the arguments is null - no good. Equals uses hash code of Pair class, but take a look at the definition of GetHashCode, it just a combination of pair members hash codes - it's has nothing to do with equality of items. I would expect that Equals method will compare actual data. I'm too busy at the moment to provide correct implementation, unfortunately. But from the first look, you code seems to be wrong. It would be better if you provide us description of what you want to achieve. I'm sure SO members will be able to give you some advices.
Might I suggest the use of Lambda expressions as a parameter ?
this would allow you to specify how to compare the internal generic types.
I don't get any warning when compiling about this but I assume you are talking about the == null comparison? A cast seems like it would make this all somewhat cleaner, yes.
PS. You really should use a separate class for the comparer. This class that fills two roles (being a pair and comparing pairs) is plain ugly.

Categories

Resources