I have a Customer class.
public class Customer
{
private string _id;
private string _name;
// some more properties follow
I am deriving MyComparer from EqualityComparer<Customer>.
I intend to use it in LINQ queries.
MyComparer is intended for a partial check between two objects:
if customer.Id and customer.Name match, I treat the objects as equal.
public class MyComparer : System.Collections.Generic.EqualityComparer<Customer>
{
public override bool Equals(Customer x, Customer y)
{
if (x.Id == y.Id && x.Name == y.Name)
return true;
else
return false;
}
public override int GetHashCode(Customer obj)
{
return string.Concat(obj.Id,obj.Name).GetHashCode();
}
}
I read up on generating hash codes,
but I am a little unsure about concatenating strings and using the result as a hash code.
Is what I am trying to do safe and sound?
See this question on hashcodes for a pretty simple way to return one hashcode based on multiple fields.
Having said that, I wouldn't derive from EqualityComparer<T> myself - I'd just implement IEqualityComparer<T> directly. I'm not sure what value EqualityComparer<T> is really giving you here, other than also implementing the non-generic IEqualityComparer.
A couple more things:
You should handle nullity in Equals
Your present Equals code can be simplified to:
return x.Id == y.Id && x.Name == y.Name;
A fuller implementation of Equals might be:
public override bool Equals(Customer x, Customer y)
{
if (object.ReferenceEquals(x, y))
{
return true;
}
if (x == null || y == null)
{
return false;
}
return x.Id == y.Id && x.Name == y.Name;
}
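Here is a minimal sketch of implementing IEqualityComparer<Customer> directly, with the null handling above. It assumes Customer exposes read-only Id and Name string properties (the question shows only private fields). It also uses System.HashCode.Combine, which needs .NET Core 2.1 or later. On older frameworks, a hand-written combination would take its place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var customers = new List<Customer>
{
    new Customer("1", "Alice"),
    new Customer("1", "Alice"),   // same Id and Name as the first entry
    new Customer("2", "Bob"),
};

// Distinct calls GetHashCode first and Equals only when hash codes match.
var distinct = customers.Distinct(new CustomerComparer()).ToList();
Console.WriteLine(distinct.Count); // 2

// Assumed shape of Customer: Id and Name exposed as read-only string properties.
public class Customer
{
    public Customer(string id, string name)
    {
        Id = id;
        Name = name;
    }

    public string Id { get; }
    public string Name { get; }
}

// Implements the interface directly instead of deriving from EqualityComparer<T>.
public sealed class CustomerComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;     // same instance, or both null
        if (x is null || y is null) return false;   // exactly one is null
        return x.Id == y.Id && x.Name == y.Name;    // partial (Id + Name) equality
    }

    public int GetHashCode(Customer obj)
    {
        if (obj is null) return 0;
        // HashCode.Combine tolerates null fields; requires .NET Core 2.1+ / .NET Standard 2.1.
        return HashCode.Combine(obj.Id, obj.Name);
    }
}
```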
You should look at it from the perspective of possible "collisions", i.e. two different objects getting the same hash code. This can happen with pairs such as "1,2any" and "12,any", where the values in each pair are Id and Name: both concatenate to "12any". If this cannot happen with your data, you're good to go. Otherwise, you can change it to something like:
return obj.Id.GetHashCode() ^ obj.Name.GetHashCode();
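A small sketch of both effects, assuming string Id and Name values. With string.Concat, the pairs ("1", "2any") and ("12", "any") produce the identical string, so they always share a hash code. Plain XOR has its own weak spots. It is commutative, so swapping the two values gives the same result, and two equal hashes XOR to 0. On .NET Core, string hash codes are randomized per process, so the sketch prints only comparisons, not raw values.

```csharp
using System;

// Concatenation: both (Id, Name) pairs become "12any", so their hash codes must match.
string concat1 = string.Concat("1", "2any");
string concat2 = string.Concat("12", "any");
Console.WriteLine(concat1 == concat2);                             // True
Console.WriteLine(concat1.GetHashCode() == concat2.GetHashCode()); // True

// XOR: swapping Id and Name gives the same combined value, because XOR is commutative.
int xorOriginal = "1".GetHashCode() ^ "2any".GetHashCode();
int xorSwapped = "2any".GetHashCode() ^ "1".GetHashCode();
Console.WriteLine(xorOriginal == xorSwapped);                      // True

// XOR of two equal hashes is always 0 (e.g. when Id == Name).
Console.WriteLine(("x".GetHashCode() ^ "x".GetHashCode()) == 0);   // True
```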
ReSharper (the fantastic refactoring plugin from JetBrains) thinks it should be:
public override int GetHashCode(Customer obj)
{
unchecked
{
return ((obj.Id != null ? obj.Id.GetHashCode() : 0) * 397)
^ (obj.Name != null ? obj.Name.GetHashCode() : 0);
}
}
I have to admit I almost always just let ReSharper generate the equality and hash code implementations for me. I've tested their implementation a great deal and found it to be as good as, if not better than, anything I'd write by hand. So I'll usually take the implementation I don't have to type.
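One way to see what the multiplication in the ReSharper-style pattern buys you over plain XOR is to combine small integers, whose GetHashCode() returns the value itself. The helper names XorCombine and PrimeCombine are hypothetical. The sketch shows only that multiplying the first hash by the odd prime 397 before XOR-ing makes the result depend on field order, so swapped values no longer collide automatically.

```csharp
using System;

Console.WriteLine(XorCombine(1, 2));   // 3
Console.WriteLine(XorCombine(2, 1));   // 3   (field order is lost)
Console.WriteLine(PrimeCombine(1, 2)); // 399 (397 ^ 2)
Console.WriteLine(PrimeCombine(2, 1)); // 795 (794 ^ 1)

// Plain XOR of the two field hashes, as in obj.Id.GetHashCode() ^ obj.Name.GetHashCode().
static int XorCombine(int first, int second) =>
    first.GetHashCode() ^ second.GetHashCode();

// ReSharper-style: multiply the first hash by 397 inside unchecked so overflow wraps.
static int PrimeCombine(int first, int second)
{
    unchecked
    {
        return (first.GetHashCode() * 397) ^ second.GetHashCode();
    }
}
```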
Related
I am trying to get a list of distinct items from a custom collection; however, the comparison seems to be ignored, as I keep getting duplicates in my list. I have debugged the code, and I can clearly see that the values in the list that I am comparing are equal...
NOTE: The Id and Id2 values are strings
Custom Comparer:
public class UpsellSimpleComparer : IEqualityComparer<UpsellProduct>
{
public bool Equals(UpsellProduct x, UpsellProduct y)
{
return x.Id == y.Id && x.Id2 == y.Id2;
}
public int GetHashCode(UpsellProduct obj)
{
return obj.GetHashCode();
}
}
Calling code:
var upsellProducts = (Settings.SelectedSeatingPageGuids.Contains(CurrentItem.ID.ToString())) ?
GetAOSUpsellProducts(selectedProductIds) : GetGeneralUpsellProducts(selectedProductIds);
// we use a special comparer here so that same items are not included
var comparer = new UpsellSimpleComparer();
return upsellProducts.Distinct(comparer);
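Most likely, UpsellProduct uses the default implementation of GetHashCode. For a reference type, that returns a value tied to the individual instance. Two separate but equal-looking objects therefore almost always get different hash codes, and Distinct never calls your Equals.
To fix it, implement GetHashCode correctly either in UpsellProduct or in the comparer.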
public class UpsellSimpleComparer : IEqualityComparer<UpsellProduct>
{
public bool Equals(UpsellProduct x, UpsellProduct y)
{
return x.Id == y.Id && x.Id2 == y.Id2;
}
// sample, correct GetHashCode is a bit more complex
public int GetHashCode(UpsellProduct obj)
{
return obj.Id.GetHashCode() ^ obj.Id2.GetHashCode();
}
}
For better ways to compute a combined GetHashCode, see "Concise way to combine field hashcodes?" and "Is it possible to combine hash codes for private members to generate a new hash code?"
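A self-contained sketch of the fix follows. It assumes UpsellProduct has settable string properties Id and Id2, as the question states. The class itself is not shown, so its shape here is a guess. The sketch uses HashCode.Combine (.NET Core 2.1+), which also copes with null Id values that would make obj.Id.GetHashCode() throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var products = new List<UpsellProduct>
{
    new UpsellProduct { Id = "A", Id2 = "1" },
    new UpsellProduct { Id = "A", Id2 = "1" }, // duplicate by value, different instance
    new UpsellProduct { Id = "B", Id2 = "2" },
};

// Both "A"/"1" instances now get the same hash code, so Distinct goes on to call Equals
// and drops the second one.
Console.WriteLine(products.Distinct(new UpsellSimpleComparer()).Count()); // 2

// Assumed shape: the question only says Id and Id2 are strings.
public class UpsellProduct
{
    public string Id { get; set; }
    public string Id2 { get; set; }
}

public class UpsellSimpleComparer : IEqualityComparer<UpsellProduct>
{
    public bool Equals(UpsellProduct x, UpsellProduct y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Id2 == y.Id2;
    }

    // Must agree with Equals: same Id and Id2 => same hash code.
    // The original obj.GetHashCode() is per-instance for a class without overrides.
    public int GetHashCode(UpsellProduct obj) =>
        obj is null ? 0 : HashCode.Combine(obj.Id, obj.Id2);
}
```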
Your GetHashCode() doesn't return the same value even when two UpsellProduct instances are considered equal by your Equals() method.
Use something like this to reflect the same logic instead.
public int GetHashCode(UpsellProduct obj)
{
return obj.Id.GetHashCode() ^ obj.Id2.GetHashCode();
}
Let's say I have a class:
public class Ident
{
public String Name { get; set; }
public String SName { get; set; }
}
and another one:
class IdenNode
{
public Ident id { get; set; }
public List<IdenNode> Nodes { get; set; }
public IdenNode()
{
Nodes = new List<IdenNode>();
}
}
I want to use HashSet<IdenNode> such that two of its elements are considered the same (Equal) if and only if their id.Name values are equal.
So I'm going to override Equals and GetHashCode as follows:
public override bool Equals(object obj)
{
IdenNode otherNode = obj as IdenNode;
return otherNode != null &&
otherNode.id != null &&
id.Name == otherNode.id.Name;
}
public override int GetHashCode()
{
if (id != null)
return id.Name.GetHashCode();
else
// what should I write here?
}
Am I thinking about this correctly? If so, what should I put in GetHashCode?
UPDATE
Could you please tell me whether it is OK to use == and != in the Equals method? Or should I use ReferenceEquals or something else?
Also, should I override operators == and != ?
If id (or id.Name) is null then it's perfectly fine to return 0. Nullable<T> (like int?) returns 0 for "null" values.
Keep in mind that two objects returning the same value from GetHashCode() does NOT imply equality; it only implies that the two objects might be equal. Conversely, two "equal" objects must return the same hash code. Both principles seem to be fulfilled by your definition of Equals and GetHashCode.
Beware of nulls! You've got a lot of them. Also watch out for a StackOverflowException: if you overload == and != in terms of Equals, do not use them inside Equals itself. Usually, we return 0 as the hash code for null, e.g.:
public override bool Equals(object obj) {
// Often we should compare an instance with itself,
// so let's have a special case for it (optimization)
if (Object.ReferenceEquals(obj, this))
return true;
IdenNode other = obj as IdenNode;
// The "otherNode != null" line in your code can cause a StackOverflowException
// if != is overloaded to call Equals, which in turn calls != again, and so on
if (Object.ReferenceEquals(null, other))
return false;
// Id can be null
if (Object.ReferenceEquals(id, other.id))
return true;
else if (Object.ReferenceEquals(id, null) || Object.ReferenceEquals(other.id, null))
return false;
// Let's be exact when comparing strings:
// i.e. should we use current locale or not etc
return String.Equals(id.Name, other.id.Name, StringComparison.Ordinal);
}
public override int GetHashCode() {
// It's typical to return 0 in case of null
if (Object.ReferenceEquals(null, id))
return 0;
else if (Object.ReferenceEquals(null, id.Name)) // <- Name can be null as well!
return 0;
return id.Name.GetHashCode();
}
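Here is a quick check of that implementation with HashSet<IdenNode>, as a sketch. The Ident and IdenNode shapes come from the question (Nodes is initialized with a property initializer instead of a constructor). Equals and GetHashCode are a condensed copy of the ones above. Nodes whose id.Name matches collapse into one entry, and nodes with a null id count as equal to each other, with hash code 0.

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<IdenNode>
{
    new IdenNode { id = new Ident { Name = "Bob", SName = "Smith" } },
    new IdenNode { id = new Ident { Name = "Bob", SName = "Jones" } }, // same Name => not added
    new IdenNode { id = new Ident { Name = "Ann" } },
    new IdenNode(),   // id == null
    new IdenNode(),   // id == null, equal to the previous node => not added
};
Console.WriteLine(set.Count); // 3

public class Ident
{
    public string Name { get; set; }
    public string SName { get; set; }
}

public class IdenNode
{
    public Ident id { get; set; }
    public List<IdenNode> Nodes { get; set; } = new List<IdenNode>();

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(obj, this)) return true;
        var other = obj as IdenNode;
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(id, other.id)) return true;   // also covers both ids being null
        if (ReferenceEquals(id, null) || ReferenceEquals(other.id, null)) return false;
        return string.Equals(id.Name, other.id.Name, StringComparison.Ordinal);
    }

    // string.GetHashCode is ordinal, so it agrees with the ordinal comparison above.
    public override int GetHashCode()
    {
        if (ReferenceEquals(id, null) || ReferenceEquals(id.Name, null)) return 0;
        return id.Name.GetHashCode();
    }
}
```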
What should I place in GetHashCode if so?
Returning zero is fine. Note that defining value equality on a name is a bad idea; I know of at least three other Eric Lipperts in the United States and they're not me. There are literally millions, possibly billions, of people who have a name collision.
Could you please tell me whether it is OK to use "==" and "!=" in the Equals method? Or should I use ReferenceEquals or something else?
My advice is: when mixing reference and value equality, be very clear. If you intend reference equality, say so.
Also, should I override operators "==" and "!=" ?
Yes. It is confusing to have Equals mean one thing and == mean another.
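Here is a sketch of what overloading == and != consistently with Equals can look like. It uses a hypothetical minimal Node type, not the question's IdenNode, that compares by Name. The operators delegate to Equals. Equals itself uses "is null" and ReferenceEquals, neither of which calls the overloaded operators, and that is what avoids the infinite recursion mentioned in the earlier answer.

```csharp
using System;

var a = new Node("Eric");
var b = new Node("Eric");
Console.WriteLine(a == b);                        // True: value equality via Equals
Console.WriteLine(object.ReferenceEquals(a, b));  // False: still two instances
Node none = null;
Console.WriteLine(none == null);                  // True, with no NullReferenceException

public sealed class Node : IEquatable<Node>
{
    public Node(string name) => Name = name;

    public string Name { get; }

    public bool Equals(Node other)
    {
        // "is null" and ReferenceEquals never call the overloaded operators,
        // so there is no infinite recursion here.
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    public override bool Equals(object obj) => Equals(obj as Node);

    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();

    // Keep == and != consistent with Equals.
    public static bool operator ==(Node left, Node right) =>
        left is null ? right is null : left.Equals(right);

    public static bool operator !=(Node left, Node right) => !(left == right);
}
```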
I've implemented every function that MSDN says is necessary, plus some additional comparison interfaces, and nothing seems to work. The code follows (written for LINQPad).
The resulting output is all 4 items, not 2 like I expect.
Please don't post workarounds as answers; I want to know how Distinct works.
void Main()
{
List<NameClass> results = new List<NameClass>();
results.Add(new NameClass("hello"));
results.Add(new NameClass("hello"));
results.Add(new NameClass("55"));
results.Add(new NameClass("55"));
results.Distinct().Dump();
}
// Define other methods and classes here
public class NameClass : Object
, IEquatable<NameClass>
, IComparer<NameClass>
, IComparable<NameClass>
, IEqualityComparer<NameClass>
, IEqualityComparer
, IComparable
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public int Compare(NameClass x, NameClass y)
{
return String.Compare(x.Name, y.Name);
}
public int CompareTo(NameClass other)
{
return String.Compare(Name, other.Name);
}
public bool Equals(NameClass x, NameClass y)
{
return (0 == Compare(x, y));
}
public bool Equals(NameClass other)
{
return (0 == CompareTo(other));
}
public int GetHashCode(NameClass obj)
{
return obj.Name.GetHashCode();
}
public new int GetHashCode()
{
return Name.GetHashCode();
}
public new bool Equals(object a)
{
var x = a as NameClass;
if (null == x) { return false; }
return Equals(x);
}
public new bool Equals(object a, object b)
{
if (null == a && null == b) { return true; }
if (null == a && null != b) { return false; }
if (null != a && null == b) { return false; }
var x = a as NameClass;
var y = b as NameClass;
if (null == x && null == y) { return true; }
if (null == x && null != y) { return false; }
if (null != x && null == y) { return false; }
return x.Equals(y);
}
public int GetHashCode(object obj)
{
if (null == obj) { return 0; }
var x = obj as NameClass;
if (null != x) { return x.GetHashCode(); }
return obj.GetHashCode();
}
public int CompareTo(object obj)
{
if (obj == null) return 1;
NameClass x = obj as NameClass;
if (x == null)
{
throw new ArgumentException("Object is not a NameClass");
}
return CompareTo(x);
}
}
How Distinct works:
At minimum, there is no override of Object.GetHashCode(), which is used for the initial comparison of objects. The basic version of Distinct compares objects (actually, it puts them into a hash-based set) by GetHashCode first and then, only if the hash codes match, by Equals.
To be precise, Enumerable.Distinct<TSource>(this IEnumerable<TSource> source) uses EqualityComparer<TSource>.Default for the final equality check. If the hash codes don't match, it never reaches that part of the comparison, which is why your sample does not work.
The default equality comparer, Default, is used to compare values of the types that implement the IEquatable<T> generic interface.
EqualityComparer<T>.Default, in turn, allows you to use a class without IEquatable<T> at all, falling back directly to Object.Equals:
The Default property checks whether type T implements the System.IEquatable<T> interface and, if so, returns an EqualityComparer<T> that uses that implementation. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So for the basic Distinct to work, you just need correct overrides of Equals and GetHashCode. IEquatable<T> is optional, but if you implement it, its Equals must be consistent with GetHashCode.
How to fix:
Your sample has a public new int GetHashCode() method, which should most likely be public override int GetHashCode() (the same applies to Equals).
Note that public new int ... does not mean "override"; it means "create a new method that hides the inherited one". It has no effect on callers that call the method through a reference typed as the base class (here, object).
Personally I think new should rarely be used in defining methods. Some suggestions when it is useful are covered in Usecases for method hiding using new.
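Here is a minimal sketch of the difference, using two hypothetical classes that differ only in new versus override. When the method is called through a reference typed as object, only the override is found. That is effectively how the default comparer used by Distinct reaches GetHashCode.

```csharp
using System;

var hidden1 = new HidingName("hello");
var hidden2 = new HidingName("hello");
Console.WriteLine(hidden1.Equals((object)hidden2)); // True: static type is HidingName, so the new method is chosen
object boxedHidden = hidden1;
Console.WriteLine(boxedHidden.Equals(hidden2));     // False: the virtual call reaches Object.Equals (reference equality)

object boxedOverride = new OverridingName("hello");
Console.WriteLine(boxedOverride.Equals(new OverridingName("hello"))); // True: the override is found virtually

public class HidingName
{
    public HidingName(string name) => Name = name;
    public string Name { get; }

    // "new" hides Object.Equals/GetHashCode instead of overriding them.
    public new bool Equals(object obj) => obj is HidingName other && other.Name == Name;
    public new int GetHashCode() => Name.GetHashCode();
}

public class OverridingName
{
    public OverridingName(string name) => Name = name;
    public string Name { get; }

    public override bool Equals(object obj) => obj is OverridingName other && other.Name == Name;
    public override int GetHashCode() => Name.GetHashCode();
}
```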
You don't have to implement any interface, just GetHashCode and Equals methods correctly:
public class NameClass
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public override bool Equals(object obj)
{
var other = obj as NameClass;
return other != null && other.Name == this.Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
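To confirm this outside LINQPad, where Dump() is not available, here is a console sketch. It repeats the NameClass above, with a null guard added in GetHashCode, and prints the distinct names, which leaves two entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var results = new List<NameClass>
{
    new NameClass("hello"),
    new NameClass("hello"),
    new NameClass("55"),
    new NameClass("55"),
};

// Console.WriteLine stands in for LINQPad's Dump(); prints two lines: hello and 55.
foreach (var item in results.Distinct())
    Console.WriteLine(item.Name);

public class NameClass
{
    public NameClass(string name) => Name = name;
    public string Name { get; private set; }

    public override bool Equals(object obj)
    {
        var other = obj as NameClass;
        return other != null && other.Name == Name;
    }

    // Overriding (not hiding) is what makes Distinct see this hash code.
    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}
```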
Enumerable.Distinct<TSource> Method:
It uses the default equality comparer, Default, to compare values.
EqualityComparer.Default:
The Default property checks whether type T implements the System.IEquatable<T> interface and, if so, returns an EqualityComparer<T> that uses that implementation. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
IEquatable<T> Interface:
If you implement IEquatable<T>, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behavior is consistent with that of the IEquatable<T>.Equals method.
Overriding methods:
The override modifier is required to extend or modify the abstract or virtual implementation of an inherited method, property, indexer, or event.
So your code should look like this:
public class NameClass : IEquatable<NameClass>
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
// implement IEquatable<NameClass>
public bool Equals(NameClass other)
{
return (other != null) && (Name == other.Name);
}
// override Object.Equals(Object)
public override bool Equals(object obj)
{
return Equals(obj as NameClass);
}
// override Object.GetHashCode()
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
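The rule quoted in this answer, that EqualityComparer<T>.Default prefers IEquatable<T>, can be observed directly. In this sketch, Tracked and its static call counters are hypothetical instrumentation. EqualityComparer<Tracked>.Default.Equals calls the IEquatable<Tracked>.Equals method rather than Equals(object), while GetHashCode goes to the Object.GetHashCode override.

```csharp
using System;
using System.Collections.Generic;

var comparer = EqualityComparer<Tracked>.Default;
var a = new Tracked("x");
var b = new Tracked("x");

Console.WriteLine(comparer.Equals(a, b));      // True
Console.WriteLine(Tracked.TypedEqualsCalls);   // 1: IEquatable<Tracked>.Equals was used
Console.WriteLine(Tracked.ObjectEqualsCalls);  // 0: Equals(object) was not used
Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b)); // True: uses the GetHashCode override

public class Tracked : IEquatable<Tracked>
{
    public static int TypedEqualsCalls;
    public static int ObjectEqualsCalls;

    public Tracked(string name) => Name = name;
    public string Name { get; }

    public bool Equals(Tracked other)
    {
        TypedEqualsCalls++;
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;
        return Equals(obj as Tracked);
    }

    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}
```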
So, first off, Distinct will, as per its documentation, use EqualityComparer<T>.Default to compare objects if no custom equality comparer is provided (you provided none).
EqualityComparer<T>.Default, as per its documentation, will check whether the object implements IEquatable<T>; if it does, it will use that implementation of Equals.
Regardless of whether the type implements IEquatable<T>, EqualityComparer<T>.Default will use the object.GetHashCode method to get the hash code of the object. IEquatable<T>, unfortunately, does not force you to also override the object's GetHashCode implementation, and in your case, while you do implement IEquatable<T>, your code does not override the object's GetHashCode implementation.
As a result, Distinct is actually using the proper Equals method for your type, but the wrong GetHashCode method. Whenever you hash objects whose type has Equals and GetHashCode implementations that are out of sync, problems ensue. What's happening is that whatever hash-based collection Distinct uses internally sends the two "equal" objects to different buckets, so they never even get to the point where their Equals methods are called on each other. If you got lucky and there was a hash collision that coincidentally sent the objects to the same bucket, it would actually work, since the Equals method does what you intended. The odds of that happening are very low (in this specific case, about 2/2147483647, or 9.3e-10).
While you do provide a new GetHashCode method in NameClass, it is hiding the object implementation, not overriding it. If you change your GetHashCode implementation to use override rather than new then your code will work.
I just realized I messed up my sample code: my class derives from DependencyObject, not Object. I can't override the GetHashCode or Equals methods because DependencyObject declares its overrides of them as sealed.
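When Equals and GetHashCode cannot be overridden, the usual alternative is to pass an IEqualityComparer<T> to the Distinct overload that accepts one. Here is a sketch with a plain NameClass standing in for the DependencyObject-derived class (WPF is not referenced here). NameClassComparer is a hypothetical comparer that compares by Name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var results = new List<NameClass>
{
    new NameClass("hello"), new NameClass("hello"),
    new NameClass("55"), new NameClass("55"),
};

// Distinct(IEqualityComparer<T>) uses the comparer instead of the type's own Equals/GetHashCode.
Console.WriteLine(results.Distinct(new NameClassComparer()).Count()); // 2

// Stand-in for the real class, which derives from DependencyObject and cannot override Equals/GetHashCode.
public class NameClass
{
    public NameClass(string name) => Name = name;
    public string Name { get; }
}

public sealed class NameClassComparer : IEqualityComparer<NameClass>
{
    public bool Equals(NameClass x, NameClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    // Ordinal string hash, consistent with the ordinal comparison above.
    public int GetHashCode(NameClass obj) =>
        obj?.Name == null ? 0 : obj.Name.GetHashCode();
}
```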
In the code block below I would expect dictCars to contain:
{ Chevy:Camaro, Dodge:Charger }
But dictCars comes back empty, because this line returns false every time it's called:
if(myCars.Contains(new Car(Convert.ToInt64(strCar.Split(':')[1]),strCar.Split(':')[2])))
Code block:
public class Car
{
public long CarID { get; set; }
public string CarName { get; set; }
public Car(long CarID, string CarName)
{
this.CarID = CarID;
this.CarName = CarName;
}
}
List<Car> myCars = new List<Car>();
myCars.Add(new Car(0,"Pinto"));
myCars.Add(new Car(2,"Camaro"));
myCars.Add(new Car(3,"Charger"));
Dictionary<string, string> dictCars = new Dictionary<string, string>();
string strCars = "Ford:1:Mustang,Chevy:2:Camaro,Dodge:3:Charger";
String[] arrCars = strCars.Split(',');
foreach (string strCar in arrCars)
{
if(myCars.Contains(new Car(Convert.ToInt64(strCar.Split(':')[1]),strCar.Split(':')[2])))
{
if (!dictCars.ContainsKey(strCar.Split(':')[0]))
{
dictCars.Add(strCar.Split(':')[0], strCar.Split(':')[2]);
}
}
}
return dictCars;
Question: What am I doing wrong with my List.Contains implementation?
Thanks in advance!
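You need to tell Contains what makes two Cars equal. By default, List<T>.Contains uses EqualityComparer<Car>.Default, which, for a class that doesn't override Equals, falls back to reference equality: two objects are equal only if they are the same instance.
Either override Equals and GetHashCode in your Car class, or define an IEqualityComparer<Car> class and pass it to the LINQ Enumerable.Contains overload that accepts a comparer (List<T>.Contains itself has no such overload).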
If two Cars that have the same CarID are "equal" then the implementation is pretty straightforward:
public override bool Equals(object o)
{
if (o == null || o.GetType() != typeof(Car))
return false;
return (this.CarID == ((Car)o).CarID);
}
public override int GetHashCode()
{
return CarID.GetHashCode();
}
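Here is that override plugged into the question's loop, as a sketch. Car is the question's class plus the two overrides, with the null guard in Equals. The parsing is unchanged apart from splitting each entry once. CarLookup.Build is a hypothetical wrapper around the question's code. With CarID-based equality, Chevy (ID 2) and Dodge (ID 3) are found and Ford (ID 1) is not.

```csharp
using System;
using System.Collections.Generic;

// Prints Chevy:Camaro and Dodge:Charger.
foreach (var pair in CarLookup.Build())
    Console.WriteLine($"{pair.Key}:{pair.Value}");

public static class CarLookup
{
    public static Dictionary<string, string> Build()
    {
        var myCars = new List<Car> { new Car(0, "Pinto"), new Car(2, "Camaro"), new Car(3, "Charger") };
        var dictCars = new Dictionary<string, string>();
        string strCars = "Ford:1:Mustang,Chevy:2:Camaro,Dodge:3:Charger";

        foreach (string strCar in strCars.Split(','))
        {
            string[] parts = strCar.Split(':');   // [make, id, name]
            // List<T>.Contains now calls the Equals override below.
            if (myCars.Contains(new Car(Convert.ToInt64(parts[1]), parts[2]))
                && !dictCars.ContainsKey(parts[0]))
            {
                dictCars.Add(parts[0], parts[2]);
            }
        }
        return dictCars;
    }
}

public class Car
{
    public long CarID { get; set; }
    public string CarName { get; set; }

    public Car(long CarID, string CarName)
    {
        this.CarID = CarID;
        this.CarName = CarName;
    }

    public override bool Equals(object o)
    {
        if (o == null || o.GetType() != typeof(Car))
            return false;
        return CarID == ((Car)o).CarID;
    }

    public override int GetHashCode() => CarID.GetHashCode();
}
```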
Your Car class is a reference type. By default reference types are compared to each other by reference, meaning they are considered the same if they reference the same instance in memory. In your case you want them to be considered equal if they contain the same values.
To change the equality behavior, you need to override Equals and GetHashCode.
If two cars are equal only when ID and Name are equal, the following is one possible implementation of the equality members:
protected bool Equals(Car other)
{
return CarID == other.CarID && string.Equals(CarName, other.CarName);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
return false;
if (ReferenceEquals(this, obj))
return true;
if (obj.GetType() != GetType())
return false;
return Equals((Car)obj);
}
public override int GetHashCode()
{
unchecked
{
return (CarID.GetHashCode() * 397) ^
(CarName != null ? CarName.GetHashCode() : 0);
}
}
This implementation has been created automatically by ReSharper.
It takes into account null values and the possibility of sub-classes of Car. Additionally, it provides a useful implementation of GetHashCode.
You can do this by implementing IEquatable<Car>:
public class Car: IEquatable<Car>
{
......
public bool Equals( Car other )
{
return other != null && this.CarID == other.CarID && this.CarName == other.CarName;
}
}
Link : http://msdn.microsoft.com/fr-fr/library/vstudio/ms131187.aspx
You are assuming that two Car instances that have the same CarID and CarName are equal.
This is incorrect. By default, each new Car(...) is different from each other car, since they are references to different objects.
There are a few ways to "fix" that:
Use a struct instead of a class for your Car.
Structs inherit ValueType's default implementation of Equals, which compares all fields (including the backing fields of auto-properties) to determine equality.
Note that in this case, it is recommended that you make your Car struct immutable to avoid common problems with mutable structs.
Override Equals and GetHashCode.
That way, List.Contains will know that you intend Cars with the same ID and Name to be equal.
Use another method instead of List.Contains.
For example, Enumerable.Any allows you to specify a predicate that can be matched:
bool exists = myCars.Any(car => car.CarID == Convert.ToInt64(strCar.Split(':')[1])
&& car.CarName == strCar.Split(':')[2]);
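Here is a sketch of the first option, assuming an immutable readonly struct (C# 7.2+) holding the same two values. The inherited ValueType.Equals compares the fields, so Contains finds a freshly constructed value. When a struct contains reference-type fields such as a string, that default comparison goes through reflection. Overriding Equals/GetHashCode, or implementing IEquatable<Car>, is therefore still worthwhile if performance matters.

```csharp
using System;
using System.Collections.Generic;

var myCars = new List<Car> { new Car(0, "Pinto"), new Car(2, "Camaro"), new Car(3, "Charger") };
Console.WriteLine(myCars.Contains(new Car(2, "Camaro")));  // True: field-by-field comparison
Console.WriteLine(myCars.Contains(new Car(1, "Mustang"))); // False

// Hypothetical struct version of the question's Car; readonly keeps it immutable.
public readonly struct Car
{
    public Car(long carId, string carName)
    {
        CarID = carId;
        CarName = carName;
    }

    public long CarID { get; }
    public string CarName { get; }
}
```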
You need to override Equals (and, to keep hashing consistent, GetHashCode). Most probably as:
public override bool Equals(object obj)
{
Car car = obj as Car;
if(car == null) return false;
return car.CarID == this.CarID && car.CarName == this.CarName;
}
Your Car class needs to implement IEquatable<Car> and define an Equals method; otherwise, Contains compares the underlying references.
You need to implement IEqualityComparer<Car>.
More information on how to do it can be found here:
http://msdn.microsoft.com/en-us/library/bb339118.aspx
// Custom comparer for the class
class CarComparer : IEqualityComparer<Car>
{
// Cars are equal if their names and IDs are equal.
public bool Equals(Car x, Car y)
{
//Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
//Check whether the properties are equal.
return x.CarID == y.CarID && x.CarName == y.CarName;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Car car)
{
//Check whether the object is null
if (Object.ReferenceEquals(car, null)) return 0;
//Get hash code for the Name field if it is not null.
int hashCarName = car.CarName == null ? 0 : car.CarName.GetHashCode();
//Get hash code for the ID field.
int hashCarID = car.CarID.GetHashCode();
//Calculate the hash code for the car.
return hashCarName ^ hashCarID;
}
}
Check for equality:
CarComparer carComp = new CarComparer();
// Enumerable.Contains(source, value, comparer) from System.Linq
bool blnIsEqual = myCars.Contains(new Car(2, "Camaro"), carComp);
A collection can never "contain" a newly created object when the default Object.Equals comparison is used. (The default comparison is reference equality, which simply compares instances, so comparing an existing Car with a new Car() will never return true.)
To use Contains in this way, you will need to either:
Override Car.Equals (and Car.GetHashCode) to specify what it means to be equivalent, or
Implement an IEqualityComparer<Car> to compare the instances and specify that in your call to Contains.
Note the side effect that in the first option, other uses of Car.Equals(Car) will also use this comparison.
Otherwise, you can use Any and specify the comparison yourself (but IMHO this smells a little funny - a Car should know how to compare itself):
if(myCars.Any(c=> c.CarID == Convert.ToInt64(strCar.Split(':')[1]) && c.CarName == strCar.Split(':')[2]))
Instead of
myCars.Contains(newCar)
you can write
myCars.Where(c => c.CarID == newCar.CarID && c.CarName==newCar.CarName).Count() > 0
Currently I have this (edited after reading advice):
struct Pair<T, K> : IEqualityComparer<Pair<T, K>>
{
readonly private T _first;
readonly private K _second;
public Pair(T first, K second)
{
_first = first;
_second = second;
}
public T First { get { return _first; } }
public K Second { get { return _second; } }
#region IEqualityComparer<Pair<T,K>> Members
public bool Equals(Pair<T, K> x, Pair<T, K> y)
{
return x.GetHashCode(x) == y.GetHashCode(y);
}
public int GetHashCode(Pair<T, K> obj)
{
int hashCode = obj.First == null ? 0 : obj._first.GetHashCode();
hashCode ^= obj.Second == null ? 0 : obj._second.GetHashCode();
return hashCode;
}
#endregion
public override int GetHashCode()
{
return this.GetHashCode(this);
}
public override bool Equals(object obj)
{
return (obj != null) &&
(obj is Pair<T, K>) &&
this.Equals(this, (Pair<T, K>) obj);
}
}
The problem is that First and Second may not be reference types (VS actually warns me about this), but the code still compiles. Should I cast them (First and Second) to objects before I compare them, or is there a better way to do this?
Edit:
Note that I want this struct to support value and reference types (in other words, constraining by class is not a valid solution)
Edit 2:
As to what I'm trying to achieve, I want this to work in a Dictionary. Secondly, SRP isn't important to me right now because that isn't really the essence of this problem - it can always be refactored later. Thirdly, comparing to default(T) will not work in lieu of comparing to null - try it.
Your IEqualityComparer implementation should be a different class (and definitely not a struct, as you want to reuse the reference).
Also, your hash code should never be cached, as the default GetHashCode implementation for a struct (which you do not override) will take that member into account.
It looks like you need IEquatable instead:
internal struct Pair<T, K> : IEquatable<Pair<T, K>>
{
private readonly T _first;
private readonly K _second;
public Pair(T first, K second)
{
_first = first;
_second = second;
}
public T First
{
get { return _first; }
}
public K Second
{
get { return _second; }
}
public bool Equals(Pair<T, K> obj)
{
return Equals(obj._first, _first) && Equals(obj._second, _second);
}
public override bool Equals(object obj)
{
return obj is Pair<T, K> && Equals((Pair<T, K>) obj);
}
public override int GetHashCode()
{
unchecked
{
return (_first != null ? _first.GetHashCode() * 397 : 0) ^ (_second != null ? _second.GetHashCode() : 0);
}
}
}
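On the original question of whether to cast First and Second to object: one alternative that avoids both the casts and the boxing of value-type members is EqualityComparer<T>.Default. It handles null for reference types and works for unconstrained T. Here is a sketch using that approach, with the struct used as a Dictionary key as Edit 2 asks. System.HashCode needs .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

var lookup = new Dictionary<Pair<string, int>, string>
{
    [new Pair<string, int>("a", 1)] = "first",
    [new Pair<string, int>(null, 2)] = "null first",
};
Console.WriteLine(lookup[new Pair<string, int>("a", 1)]);  // first
Console.WriteLine(lookup[new Pair<string, int>(null, 2)]); // null first

public readonly struct Pair<T, K> : IEquatable<Pair<T, K>>
{
    public Pair(T first, K second)
    {
        First = first;
        Second = second;
    }

    public T First { get; }
    public K Second { get; }

    // EqualityComparer<T>.Default uses IEquatable<T> when available and copes with null.
    public bool Equals(Pair<T, K> other) =>
        EqualityComparer<T>.Default.Equals(First, other.First) &&
        EqualityComparer<K>.Default.Equals(Second, other.Second);

    public override bool Equals(object obj) => obj is Pair<T, K> other && Equals(other);

    // HashCode.Combine also maps null members to 0.
    public override int GetHashCode() => HashCode.Combine(First, Second);
}
```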
If you use hash codes in comparison methods, you should check the "real values" when the hash codes are the same.
bool result = ( x._hashCode == y._hashCode );
if ( result ) { result = ( x._first == y._first && x._second == y._second ); }
// OR?: if ( result ) { result = object.Equals( x._first, y._first ) && object.Equals( x._second, y._second ); }
// OR?: if ( result ) { result = object.ReferenceEquals( x._first, y._first ) && object.Equals( x._second, y._second ); }
return result;
But there is a small problem with comparing the "_first" and "_second" fields.
By default, reference types use "object.ReferenceEquals" semantics for equality comparison, but they can override that. So the correct solution depends on what exactly your comparison method should do. Should it use the "Equals" method of the "_first" and "_second" fields, or object.ReferenceEquals? Or something more complex?
Regarding the warning, you can use default(T) and default(K) instead of null.
I can't see what you're trying to achieve, but you shouldn't be using the hash code to compare for equality: there is no guarantee that two different objects won't have the same hash code. Also, even though your struct is immutable, the members _first and _second aren't.
First of all, this code violates the SRP (single responsibility principle). The Pair class is used to hold pairs of items, right? It's incorrect to delegate equality comparison functionality to it.
Next let take a look at your code:
The Equals method will fail if one of the arguments is null, which is no good. Equals uses the hash code of the Pair class. But look at the definition of GetHashCode: it is just a combination of the pair members' hash codes, and it has nothing to do with equality of the items. I would expect the Equals method to compare the actual data. Unfortunately, I'm too busy at the moment to provide a correct implementation, but at first glance your code seems to be wrong. It would be better if you described what you want to achieve; I'm sure SO members will be able to give you some advice.
Might I suggest using lambda expressions as parameters?
This would allow you to specify how to compare the internal generic types.
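Here is a sketch of that suggestion: a small comparer built from two delegates, so the caller decides how the members are compared. LambdaEqualityComparer is a hypothetical name, not a framework type, and the caller is responsible for keeping the equality and hash delegates consistent. The example compares (string, int) tuples case-insensitively and uses HashCode.Combine (.NET Core 2.1+).

```csharp
using System;
using System.Collections.Generic;

// Case-insensitive comparison of (string, int) tuples, supplied as lambdas.
var comparer = new LambdaEqualityComparer<(string Name, int Id)>(
    (x, y) => string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase) && x.Id == y.Id,
    p => HashCode.Combine(StringComparer.OrdinalIgnoreCase.GetHashCode(p.Name ?? ""), p.Id));

var lookup = new Dictionary<(string Name, int Id), string>(comparer)
{
    [("Alpha", 1)] = "found",
};
Console.WriteLine(lookup[("ALPHA", 1)]); // found

public sealed class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public LambdaEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);
    public int GetHashCode(T obj) => _getHashCode(obj);
}
```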
I don't get any warning when compiling about this but I assume you are talking about the == null comparison? A cast seems like it would make this all somewhat cleaner, yes.
PS. You really should use a separate class for the comparer. This class that fills two roles (being a pair and comparing pairs) is plain ugly.