How should I override Equals and GetHashCode for HashSet? - c#

Let's say I have a class:
public class Ident
{
public String Name { get; set; }
public String SName { get; set; }
}
and also one more:
class IdenNode
{
public Ident id { get; set; }
public List<IdenNode> Nodes { get; set; }
public IdenNode()
{
Nodes = new List<IdenNode>();
}
}
I want to use HashSet<IdenNode>, keeping in mind that two elements are the same (equal) if and only if their id.Name values are equal.
So I'm going to override Equals and GetHashCode like this:
public override bool Equals(object obj)
{
IdenNode otherNode = obj as IdenNode;
return otherNode != null &&
otherNode.id != null &&
id.Name == otherNode.id.Name;
}
public override int GetHashCode()
{
if (id != null)
return id.Name.GetHashCode();
else
// what should I write here?
}
Am I thinking about this correctly? If so, what should I put in GetHashCode?
UPDATE
Could you please tell me whether it is OK to use == and != in the Equals method? Or should I use ReferenceEquals or something else?
Also, should I override the == and != operators?

If id (or id.Name) is null then it's perfectly fine to return 0. Nullable<T> (like int?) returns 0 for "null" values.
Keep in mind that two objects returning the same value from GetHashCode() does NOT imply equality - it only implies that two objects might be equal. The flip, however, is that two "equal" objects must return the same hash code. Both principles seem to be fulfilled by your definition of Equals and GetHashCode

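Beware of nulls! You've got a lot of them. Watch out for a stack overflow: try not to use == and != within the Equals method, because if those operators are ever overloaded to call Equals, the two will call each other forever. Usually, we return 0 as the hash code in the case of null, e.g.: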
public override bool Equals(object obj) {
// Often we should compare an instance with itself,
// so let's have a special case for it (optimization)
if (Object.ReferenceEquals(obj, this))
return true;
IdenNode other = obj as IdenNode;
// If == and != are overloaded to call Equals, the "otherNode != null" line
// in your code can cause a stack overflow: "!=" calls "Equals", which in turn calls "!=", etc.
if (Object.ReferenceEquals(null, other))
return false;
// Id can be null
if (Object.ReferenceEquals(id, other.id))
return true;
else if (Object.ReferenceEquals(id, null) || Object.ReferenceEquals(other.id, null))
return false;
// Let's be exact when comparing strings:
// i.e. should we use current locale or not etc
return String.Equals(id.Name, other.id.Name, StringComparison.Ordinal);
}
public override int GetHashCode() {
// It's typical to return 0 in case of null
if (Object.ReferenceEquals(null, id))
return 0;
else if (Object.ReferenceEquals(null, id.Name)) // <- Name can be null as well!
return 0;
return id.Name.GetHashCode();
}
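To see the effect on a HashSet, here is a minimal sketch that condenses the two overrides above into the question's classes. The HashSetDemo class and its method name are made up for illustration only.

```csharp
using System;
using System.Collections.Generic;

public class Ident
{
    public string Name { get; set; }
    public string SName { get; set; }
}

public class IdenNode
{
    public Ident id { get; set; }
    public List<IdenNode> Nodes { get; set; }

    public IdenNode()
    {
        Nodes = new List<IdenNode>();
    }

    // Two nodes are equal when their id.Name values match (ordinal comparison).
    public override bool Equals(object obj)
    {
        IdenNode other = obj as IdenNode;
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(id, other.id)) return true;
        if (ReferenceEquals(id, null) || ReferenceEquals(other.id, null)) return false;
        return string.Equals(id.Name, other.id.Name, StringComparison.Ordinal);
    }

    // Hash on the same data that Equals compares; 0 stands in for "no name".
    public override int GetHashCode()
    {
        return (id == null || id.Name == null) ? 0 : id.Name.GetHashCode();
    }
}

// Hypothetical demo class, not part of the question.
public static class HashSetDemo
{
    public static int CountDistinctNodes()
    {
        var set = new HashSet<IdenNode>();
        set.Add(new IdenNode { id = new Ident { Name = "a", SName = "x" } });
        set.Add(new IdenNode { id = new Ident { Name = "a", SName = "y" } }); // rejected: same Name
        set.Add(new IdenNode { id = new Ident { Name = "b" } });
        return set.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctNodes()); // prints 2
    }
}
```

The second Add returns false because the set already holds a node whose id.Name is "a"; SName plays no part in the comparison.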

What should I place in GetHashCode if so?
Returning zero is fine. Note that defining value equality on a name is a bad idea; I know of at least three other Eric Lipperts in the United States and they're not me. There are literally millions, possibly billions, of people who have a name collision.
Could please tell me is it OK to use "==" and "!=" in Equals method? Or may be ReferenceEquals or some other?
My advice is: when mixing reference and value equality, be very clear. If you intend reference equality, say so.
Also, should I override operators "==" and "!=" ?
Yes. It is confusing to have Equals mean one thing and == mean another.
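A rough sketch of what keeping == and != consistent with Equals could look like. NamedNode is a made-up, simplified stand-in for IdenNode (a single Name instead of id.Name), so treat it as an illustration rather than a drop-in replacement.

```csharp
using System;

// Hypothetical, simplified stand-in for IdenNode.
public sealed class NamedNode
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as NamedNode;
        // ReferenceEquals avoids calling the overloaded == defined below.
        return !ReferenceEquals(other, null)
            && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }

    public static bool operator ==(NamedNode left, NamedNode right)
    {
        if (ReferenceEquals(left, right)) return true;  // same instance, or both null
        if (ReferenceEquals(left, null)) return false;  // only the left side is null
        return left.Equals(right);                      // Equals handles a null right side
    }

    public static bool operator !=(NamedNode left, NamedNode right)
    {
        return !(left == right);
    }
}

public static class OperatorDemo
{
    public static void Main()
    {
        var a = new NamedNode { Name = "a" };
        var b = new NamedNode { Name = "a" };
        NamedNode none = null;
        Console.WriteLine(a == b);       // True
        Console.WriteLine(a != none);    // True
        Console.WriteLine(none == null); // True
    }
}
```

With this shape, Equals, == and != all agree, and none of them calls an overloaded operator on its way to the null checks.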

Related

Overriding GetHashCode with different properties

I have this object:
public class Foo {
public string MyOwnId { get; set; }
public Guid FooGuid { get; } = Guid.NewGuid();
}
I would like Equals() to only care about those with MyOwnId, otherwise they are never equal. When a Foo has a MyOwnId I try to use it, otherwise I want to use FooGuid.
Since FooGuid probably never will be the same, I did something like this:
public bool Equals(Foo foo) {
if (foo== null) return false;
return MyOwnId.Equals(foo.MyOwnId);
}
public override bool Equals(object obj) {
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Foo)obj);
}
public override int GetHashCode() {
int hash = 13;
hash = (hash*7) + (!string.IsNullOrEmpty(MyOwnId) ? MyOwnId.GetHashCode() : FooGuid.GetHashCode());
return hash;
}
Is this a proper way to do what I want? Or do I also need to change my Equals method so that it mirrors my GetHashCode? For example:
public bool Equals(Foo foo) {
if (foo == null) return false;
if (string.IsNullOrEmpty(MyOwnId) || string.IsNullOrEmpty(foo.MyOwnId)) return false;
return MyOwnId.Equals(foo.MyOwnId);
}
Well, let's see. Your implementation of Equals and GetHashCode is erroneous.
Both Equals and GetHashCode must never throw an exception; the counter example is
Foo A = new Foo();
Foo B = new Foo() {
MyOwnId = "bla-bla-bla",
};
// Throws an exception
if (A.Equals(B)) {}
If two instances are equal via Equals these instances must have the same hash code; the counter example is
Foo A = new Foo() {
MyOwnId = "",
};
Foo B = new Foo() {
MyOwnId = "",
};
if (A.Equals(B)) {
// Hashcodes must be equal and they are not
Console.Write(String.Format("{0} != {1}", A.GetHashCode(), B.GetHashCode()));
}
Possible (simplest) implementation
// since you've declared Equals(Foo other) let others know via interface implementation
public class Foo: IEquatable<Foo> {
public string MyOwnId { get; set; }
public Guid FooGuid { get; } = Guid.NewGuid();
public bool Equals(Foo other) {
if (Object.ReferenceEquals(this, other))
return true;
else if (Object.ReferenceEquals(null, other))
return false;
else
return String.Equals(MyOwnId, other.MyOwnId);
}
public override bool Equals(object obj) {
return Equals(obj as Foo); // do not repeat yourself: you've got Equals already
}
public override int GetHashCode() {
// String.GetHashCode is good enough, do not reinvent the wheel
return null == MyOwnId ? 0 : MyOwnId.GetHashCode();
}
}
Or do I also need change my Equals method so it looks the same like my GetHashCode?
You change your Equals to match how you want equality to be resolved. You've done this.
You change your GetHashCode() to key on the same information. In this case:
public override int GetHashCode()
{
return MyOwnId == null ? 0 : MyOwnId.GetHashCode();
}
Incidentally, your Equals(object) is a bit overly-complicated. I would use:
public override bool Equals(object obj)
{
return Equals(obj as Foo);
}
This passes handling the case of obj being null to the specific Equals() (which has to handle it too), deals with obj being something that isn't a Foo by passing that Equals() a null (so false anyway) and passes the handling of the case of obj being something derived from Foo to the more specific too (which again, has to handle that).
The shortcut on ReferenceEquals isn't worth doing here, as there's only one field being compared, and its comparison will have the same ReferenceEquals shortcut. You don't, though, handle foo being a derived type in the specialised Equals(Foo). If Foo isn't sealed you should include that:
public bool Equals(Foo foo)
{
return (object)foo != null &&
foo.GetType() == GetType() &&
string.Equals(MyOwnId, foo.MyOwnId); // static call: does not throw when MyOwnId is null
}
If Foo is sealed then that GetType() comparison should be omitted.
If the logic of the Equals() was more complicated than this then the likes of:
public bool Equals(Foo foo)
{
if ((object)foo == (object)this)
return true;
return (object)foo != null &&
foo.GetType() == GetType() &&
// Some more complicated logic here.
}
Would indeed be beneficial, but again it should be in the specific overload not the general override.
(Doing a reference-equality check is more beneficial again in == overloads, since they have to consider the possibility of both operands being null so they might as well consider that of them both being the same which implicitly includes that case).
A hash function must have the following properties:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
For the best performance, a hash function should generate an even distribution for all input, including input that is heavily clustered. An implication is that small modifications to object state should result in large modifications to the resulting hash code for best hash table performance.
Hash functions should be inexpensive to compute.
The GetHashCode method should not throw exceptions.
See https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx
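The second property has a practical consequence: if the state that Equals depends on changes while an object sits in a hash-based collection, lookups can stop finding it. A small sketch of that, using a made-up Counter type whose hash code is simply its Value:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: equality and hash both depend on Value.
public class Counter
{
    public int Value { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Counter;
        return other != null && other.Value == Value;
    }

    public override int GetHashCode()
    {
        return Value;
    }
}

public static class MutationDemo
{
    public static bool FoundBeforeMutation()
    {
        var item = new Counter { Value = 1 };
        var set = new HashSet<Counter> { item };
        return set.Contains(item);
    }

    public static bool FoundAfterMutation()
    {
        var item = new Counter { Value = 1 };
        var set = new HashSet<Counter> { item };
        item.Value = 2; // the hash code changes while the item is stored in the set
        return set.Contains(item);
    }

    public static void Main()
    {
        Console.WriteLine(FoundBeforeMutation()); // True
        Console.WriteLine(FoundAfterMutation());  // False
    }
}
```

The set remembers the hash code the item had when it was added, so after the mutation the lookup goes searching under a different hash. This is one reason to prefer immutable (or at least never-mutated) key fields.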

Enumerable.Distinct - What methods does your custom class have to implement for it to work?

I've implemented every function that MSDN says is necessary, plus some additional comparison interfaces - nothing seems to work. Following is code (optimized for LinqPad).
The resulting output is all 4 items, not 2 like I expect.
Please don't post work arounds as answers - I want to know how Distinct works
void Main()
{
List<NameClass> results = new List<NameClass>();
results.Add(new NameClass("hello"));
results.Add(new NameClass("hello"));
results.Add(new NameClass("55"));
results.Add(new NameClass("55"));
results.Distinct().Dump();
}
// Define other methods and classes here
public class NameClass : Object
, IEquatable<NameClass>
, IComparer<NameClass>
, IComparable<NameClass>
, IEqualityComparer<NameClass>
, IEqualityComparer
, IComparable
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public int Compare(NameClass x, NameClass y)
{
return String.Compare(x.Name, y.Name);
}
public int CompareTo(NameClass other)
{
return String.Compare(Name, other.Name);
}
public bool Equals(NameClass x, NameClass y)
{
return (0 == Compare(x, y));
}
public bool Equals(NameClass other)
{
return (0 == CompareTo(other));
}
public int GetHashCode(NameClass obj)
{
return obj.Name.GetHashCode();
}
public new int GetHashCode()
{
return Name.GetHashCode();
}
public new bool Equals(object a)
{
var x = a as NameClass;
if (null == x) { return false; }
return Equals(x);
}
public new bool Equals(object a, object b)
{
if (null == a && null == b) { return true; }
if (null == a && null != b) { return false; }
if (null != a && null == b) { return false; }
var x = a as NameClass;
var y = b as NameClass;
if (null == x && null == y) { return true; }
if (null == x && null != y) { return false; }
if (null != x && null == y) { return false; }
return x.Equals(y);
}
public int GetHashCode(object obj)
{
if (null == obj) { return 0; }
var x = obj as NameClass;
if (null != x) { return x.GetHashCode(); }
return obj.GetHashCode();
}
public int CompareTo(object obj)
{
if (obj == null) return 1;
NameClass x = obj as NameClass;
if (x == null)
{
throw new ArgumentException("Object is not a NameClass");
}
return CompareTo(x);
}
}
How Distinct works:
At the very least, your class has no override of Object.GetHashCode(), which is used for the initial comparison of objects: the basic version of Distinct compares items (it actually puts them in a hash-based collection) by Object.GetHashCode first, and then, if the hash codes match, by Object.Equals.
To be precise, Enumerable.Distinct(this IEnumerable source) uses EqualityComparer<NameClass>.Default for the final equality check (note that if the hash codes don't match, it never reaches that part of the comparison, which is why your sample does not work).
The default equality comparer, Default, is used to compare values of the types that implement the IEquatable generic interface.
EqualityComparer.Default in turn actually lets you use a class without IEquatable<T> at all, falling back directly to Object.Equals:
The Default property checks whether type T implements the System.IEquatable interface and, if so, returns an EqualityComparer that uses that implementation. Otherwise, it returns an EqualityComparer that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
So for basic Distinct to work you just need correct version of Equals/GetHashCode. IEquatable is optional, but must match behavior of GetHashCode in the class.
How to fix:
Your sample has a public new int GetHashCode() method, which likely should be public override int GetHashCode() (same for Equals).
Note that public new int... does not mean "override"; it means "create a new version of the method that hides the old one". It does not affect callers that invoke the method through a reference typed as the base class.
Personally I think new should rarely be used in defining methods. Some suggestions when it is useful are covered in Usecases for method hiding using new.
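The difference between hiding and overriding can be seen in a small sketch. HidingName and OverridingName are made-up classes that differ only in the modifier; neither implements any interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class: hides Object.Equals/GetHashCode with "new".
public class HidingName
{
    public string Name { get; set; }

    public new bool Equals(object obj)
    {
        var other = obj as HidingName;
        return other != null && other.Name == Name;
    }

    public new int GetHashCode()
    {
        return Name.GetHashCode();
    }
}

// Hypothetical class: same bodies, but with "override".
public class OverridingName
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as OverridingName;
        return other != null && other.Name == Name;
    }

    public override int GetHashCode()
    {
        return Name.GetHashCode();
    }
}

public static class HidingDemo
{
    public static int DistinctHiding()
    {
        var items = new List<HidingName>
        {
            new HidingName { Name = "hello" },
            new HidingName { Name = "hello" }
        };
        // Distinct calls the virtual Object members, which were never overridden.
        return items.Distinct().Count();
    }

    public static int DistinctOverriding()
    {
        var items = new List<OverridingName>
        {
            new OverridingName { Name = "hello" },
            new OverridingName { Name = "hello" }
        };
        return items.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctHiding());     // 2
        Console.WriteLine(DistinctOverriding()); // 1
    }
}
```

With new, the framework still sees the reference-based Object implementations, so both "hello" items survive; with override, they collapse into one.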
You don't have to implement any interface, just GetHashCode and Equals methods correctly:
public class NameClass
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
public override bool Equals(object obj)
{
var other = obj as NameClass;
return other != null && other.Name == this.Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
Enumerable.Distinct<TSource> Method:
It uses the default equality comparer, Default, to compare values.
EqualityComparer.Default:
The Default property checks whether type T implements the System.IEquatable<T> interface and, if so, returns an EqualityComparer<T> that uses that implementation. Otherwise, it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T.
IEquatable<T> Interface:
If you implement IEquatable<T>, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behavior is consistent with that of the IEquatable<T>.Equals method.
Overriding methods:
The override modifier is required to extend or modify the abstract or virtual implementation of an inherited method, property, indexer, or event.
So your code should look like this:
public class NameClass : IEquatable<NameClass>
{
public NameClass(string name)
{
Name = name;
}
public string Name { get; private set; }
// implement IEquatable<NameClass>
public bool Equals(NameClass other)
{
return (other != null) && (Name == other.Name);
}
// override Object.Equals(Object)
public override bool Equals(object obj)
{
return Equals(obj as NameClass);
}
// override Object.GetHashCode()
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
So, first off, Distinct will, as per its documentation, use EqualityComparer<T>.Default to compare objects if no custom equality comparer is provided (you provided none).
EqualityComparer<T>.Default, as per its documentation, will look to see if the object implements IEquatable<T>, if it does it will use that implementation of Equals.
Regardless of whether or not the type implements IEquatable<T>, EqualityComparer<T>.Default will use the object.GetHashCode method to get the hash code of the object. IEquatable<T>, unfortunately, does not force you to also override the object's GetHashCode implementation, and in your case, while you do implement IEquatable<T>, your code does not override the object's GetHashCode implementation.
As a result of this, Distinct is actually using the proper Equals method for your type, but it's using the wrong GetHashCode method. Whenever you're hashing objects and the type has Equals and GetHashCode implementations that are out of sync, problems ensue. What's happening is that the hash-based collection is sending the two "equal" objects to different buckets, so they never even get to the point where their Equals methods are called on each other. If you happened to get lucky and there was a hash collision and the objects were coincidentally sent to the same bucket, then, since the Equals method is what you intended, it would actually work, but the odds of that happening are... very low. (In this specific case, about 2/2147483647, or 9.3e-10.)
While you do provide a new GetHashCode method in NameClass, it is hiding the object implementation, not overriding it. If you change your GetHashCode implementation to use override rather than new then your code will work.
I just realized I messed up my sample code - my class derives from DependencyObject, not Object. I can't override the GetHashCode or Equals functions because DependencyObject seals them.
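When the type itself cannot override Equals and GetHashCode, Distinct also has an overload that takes an IEqualityComparer<T>. A minimal sketch of that route, assuming a plain NameItem class as a stand-in (it does not derive from DependencyObject, and both NameItem and NameItemComparer are made-up names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a class whose Equals/GetHashCode cannot be overridden.
public class NameItem
{
    public NameItem(string name)
    {
        Name = name;
    }

    public string Name { get; private set; }
}

// The comparison lives outside the type, in a separate comparer object.
public sealed class NameItemComparer : IEqualityComparer<NameItem>
{
    public bool Equals(NameItem x, NameItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(NameItem obj)
    {
        return (obj == null || obj.Name == null) ? 0 : obj.Name.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static int CountDistinct()
    {
        var results = new List<NameItem>
        {
            new NameItem("hello"),
            new NameItem("hello"),
            new NameItem("55"),
            new NameItem("55")
        };
        return results.Distinct(new NameItemComparer()).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct()); // 2
    }
}
```

Here Distinct consults only the comparer, so the type's own (sealed or inherited) Equals and GetHashCode are never involved.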

c# List<T>.Contains() Method Returns False

In the code block below I would expect dictCars to contain:
{ Chevy:Camaro, Dodge:Charger }
But, dictCars comes back empty. Because this line returns false each time it's called:
if(myCars.Contains(new Car(Convert.ToInt64(strCar.Split(':')[1]),strCar.Split(':')[2])))
Code block:
public class Car
{
public long CarID { get; set; }
public string CarName { get; set; }
public Car(long CarID, string CarName)
{
this.CarID = CarID;
this.CarName = CarName;
}
}
List<Car> myCars = new List<Car>();
myCars.Add(new Car(0,"Pinto"));
myCars.Add(new Car(2,"Camaro"));
myCars.Add(new Car(3,"Charger"));
Dictionary<string, string> dictCars = new Dictionary<string, string>();
string strCars = "Ford:1:Mustang,Chevy:2:Camaro,Dodge:3:Charger";
String[] arrCars = strCars.Split(',');
foreach (string strCar in arrCars)
{
if(myCars.Contains(new Car(Convert.ToInt64(strCar.Split(':')[1]),strCar.Split(':')[2])))
{
if (!dictCars.ContainsKey(strCar.Split(':')[0]))
{
dictCars.Add(strCar.Split(':')[0], strCar.Split(':')[2]);
}
}
}
return dictCars;
Question: What am I doing wrong with my List.Contains implementation?
Thanks in advance!
You need to tell Contains what makes two Cars equal. By default it will use ReferenceEquals which will only call two objects equal if they are the same instance.
Either override Equals and GetHashCode in your Car class, or define an IEqualityComparer<Car> class and pass that to the LINQ Contains overload that accepts a comparer.
If two Cars that have the same CarID are "equal" then the implementation is pretty straightforward:
public override bool Equals(object o)
{
if(o == null || o.GetType() != typeof(Car)) // the null check avoids a NullReferenceException
return false;
return (this.CarID == ((Car)o).CarID);
}
public override int GetHashCode()
{
return CarID.GetHashCode();
}
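Put together with the question's Car class, a minimal sketch of the ID-only approach might look like this. The ContainsDemo class is made up for the example, and the Equals body uses "as" instead of GetType purely for brevity.

```csharp
using System;
using System.Collections.Generic;

public class Car
{
    public long CarID { get; set; }
    public string CarName { get; set; }

    public Car(long carID, string carName)
    {
        CarID = carID;
        CarName = carName;
    }

    // Assumption for this sketch: two cars are "the same" when their IDs match.
    public override bool Equals(object o)
    {
        var other = o as Car;
        return other != null && CarID == other.CarID;
    }

    public override int GetHashCode()
    {
        return CarID.GetHashCode();
    }
}

// Hypothetical demo class.
public static class ContainsDemo
{
    public static List<Car> BuildList()
    {
        return new List<Car>
        {
            new Car(0, "Pinto"),
            new Car(2, "Camaro"),
            new Car(3, "Charger")
        };
    }

    public static void Main()
    {
        List<Car> myCars = BuildList();
        Console.WriteLine(myCars.Contains(new Car(2, "Camaro")));  // True
        Console.WriteLine(myCars.Contains(new Car(1, "Mustang"))); // False
    }
}
```

List<T>.Contains only calls Equals, but overriding GetHashCode alongside it keeps the type safe to use in HashSet and Dictionary as well.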
Your Car class is a reference type. By default reference types are compared to each other by reference, meaning they are considered the same if they reference the same instance in memory. In your case you want them to be considered equal if they contain the same values.
To change the equality behavior, you need to override Equals and GetHashCode.
If two cars are equal only when ID and Name are equal, the following is one possible implementation of the equality members:
protected bool Equals(Car other)
{
return CarID == other.CarID && string.Equals(CarName, other.CarName);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
return false;
if (ReferenceEquals(this, obj))
return true;
var other = obj as Car;
return other != null && Equals(other);
}
public override int GetHashCode()
{
unchecked
{
return (CarID.GetHashCode() * 397) ^
(CarName != null ? CarName.GetHashCode() : 0);
}
}
This implementation has been created automatically by ReSharper.
It takes into account null values and the possibility of sub-classes of Car. Additionally, it provides a useful implementation of GetHashCode.
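On newer frameworks, the hand-written multiply-and-xor can be replaced by System.HashCode.Combine. This is only a sketch under that assumption: System.HashCode ships with .NET Core 2.1 and later, and may not be available on older .NET Framework targets without an extra package. CombineDemo is a made-up name.

```csharp
using System;

public class Car : IEquatable<Car>
{
    public long CarID { get; set; }
    public string CarName { get; set; }

    public Car(long carID, string carName)
    {
        CarID = carID;
        CarName = carName;
    }

    public bool Equals(Car other)
    {
        return !ReferenceEquals(other, null)
            && CarID == other.CarID
            && string.Equals(CarName, other.CarName);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Car);
    }

    // Combines the same two members that Equals compares; a null CarName is handled.
    public override int GetHashCode()
    {
        return HashCode.Combine(CarID, CarName);
    }
}

// Hypothetical demo class.
public static class CombineDemo
{
    public static void Main()
    {
        var a = new Car(2, "Camaro");
        var b = new Car(2, "Camaro");
        Console.WriteLine(a.Equals(b));                       // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

The actual hash values may differ between runs of the program, which is fine as long as they are only used in memory.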
You can add this code by implementing IEquatable<Car>:
public class Car: IEquatable<Car>
{
......
public bool Equals( Car other )
{
return other != null && this.CarID == other.CarID && this.CarName == other.CarName;
}
}
Link : http://msdn.microsoft.com/fr-fr/library/vstudio/ms131187.aspx
You are assuming that two Car instances that have the same CarID and CarName are equal.
This is incorrect. By default, each new Car(...) is different from each other car, since they are references to different objects.
There are a few ways to "fix" that:
Use a struct instead of a class for your Car.
Structs inherit ValueType's default implementation of Equals, which compares all fields to determine equality.
Note that in this case, it is recommended that you make your Car struct immutable to avoid common problems with mutable structs.
Override Equals and GetHashCode.
That way, List.Contains will know that you intend Cars with the same ID and Name to be equal.
Use another method instead of List.Contains.
For example, Enumerable.Any allows you to specify a predicate that can be matched:
bool exists = myCars.Any(car => car.CarID == Convert.ToInt64(strCar.Split(':')[1])
&& car.CarName == strCar.Split(':')[2]);
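For the struct option, a minimal sketch could look like the following. CarValue is a made-up immutable struct, not the question's Car class, and it relies on the inherited ValueType.Equals rather than defining its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable struct version of Car; no Equals override is written here.
public struct CarValue
{
    public long CarID { get; }
    public string CarName { get; }

    public CarValue(long carID, string carName)
    {
        CarID = carID;
        CarName = carName;
    }
}

public static class StructDemo
{
    public static List<CarValue> BuildList()
    {
        return new List<CarValue>
        {
            new CarValue(0, "Pinto"),
            new CarValue(2, "Camaro"),
            new CarValue(3, "Charger")
        };
    }

    public static void Main()
    {
        List<CarValue> myCars = BuildList();
        // The inherited ValueType.Equals compares the fields of the two values.
        Console.WriteLine(myCars.Contains(new CarValue(2, "Camaro")));  // True
        Console.WriteLine(myCars.Contains(new CarValue(2, "Mustang"))); // False
    }
}
```

The default ValueType.Equals can fall back to reflection and boxes its argument, so for hot paths it is usually worth implementing IEquatable<CarValue> on the struct anyway.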
You need to implement Equals. Most probably as:
public override bool Equals(object obj)
{
Car car = obj as Car;
if(car == null) return false;
return car.CarID == this.CarID && car.CarName == this.CarName;
}
Your car class needs to implement interface IEquatable and define an Equals method, otherwise the contains method is comparing the underlying references.
You need to implement the IEqualityComparer
More information on how to do it can be found here;
http://msdn.microsoft.com/en-us/library/bb339118.aspx
// Custom comparer for the class
class CarComparer : IEqualityComparer<Car>
{
// Cars are equal if their IDs and names are equal.
public bool Equals(Car x, Car y)
{
//Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
//Check whether the properties are equal.
return x.CarID == y.CarID && x.CarName == y.CarName;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Car car)
{
//Check whether the object is null
if (Object.ReferenceEquals(car, null)) return 0;
//Get hash code for the Name field if it is not null.
int hashCarName = car.CarName == null ? 0 : car.CarName.GetHashCode();
//Get hash code for the ID field.
int hashCarID = car.CarID.GetHashCode();
//Calculate the hash code for the car.
return hashCarName ^ hashCarID;
}
}
Check for equality:
CarComparer carComp = new CarComparer();
// Enumerable.Contains (System.Linq) takes a single item plus a comparer
bool blnIsEqual = myCars.Contains(new Car(2, "Camaro"), carComp);
A collection can never "contain" a newly newed object which uses the default Object.Equals comparison. (The default comparison is ReferenceEquals, which simply compares instances. This will never be true comparing an existing Car with a new Car())
To use Contains in this way, you will need to either:
Override Car.Equals (and Car.GetHashCode) to specify what it means to be equivalent, or
Implement an IEqualityComparer<Car> to compare the instances and specify that in your call to Contains.
Note the side effect that in the first option, other uses of Car.Equals(Car) will also use this comparison.
Otherwise, you can use Any and specify the comparison yourself (but IMHO this smells a little funny - a Car should know how to compare itself):
if(myCars.Any(c=> c.CarID == Convert.ToInt64(strCar.Split(':')[1]) && c.CarName == strCar.Split(':')[2]))
myCars.Contains(newCar)
myCars.Where(c => c.CarID == newCar.CarID && c.CarName==newCar.CarName).Count() > 0

Overriding == operator. How to compare to null? [duplicate]

Possible Duplicate:
How do I check for nulls in an ‘==’ operator overload without infinite recursion?
There is probably an easy answer to this...but it seems to be eluding me. Here is a simplified example:
public class Person
{
public string SocialSecurityNumber;
public string FirstName;
public string LastName;
}
Let's say that for this particular application, it is valid to say that if the social security numbers match, and both names match, then we are referring to the same "person".
public override bool Equals(object Obj)
{
Person other = (Person)Obj;
return (this.SocialSecurityNumber == other.SocialSecurityNumber &&
this.FirstName == other.FirstName &&
this.LastName == other.LastName);
}
To keep things consistent, we override the == and != operators, too, for the developers on the team who don't use the .Equals method.
public static bool operator !=(Person person1, Person person2)
{
return ! person1.Equals(person2);
}
public static bool operator ==(Person person1, Person person2)
{
return person1.Equals(person2);
}
Fine and dandy, right?
However, what happens when a Person object is null?
You can't write:
if (person == null)
{
//fail!
}
Since this will cause the == operator override to run, and the code will fail on the:
person.Equals()
method call, since you can't call a method on a null instance.
On the other hand, you can't explicitly check for this condition inside the == override, since it would cause an infinite recursion (and a Stack Overflow [dot com])
public static bool operator ==(Person person1, Person person2)
{
if (person1 == null)
{
//any code here never gets executed! We first die a slow painful death.
}
return person1.Equals(person2);
}
So, how do you override the == and != operators for value equality and still account for null objects?
I hope that the answer is not painfully simple. :-)
Use object.ReferenceEquals(person1, null) or the new is operator instead of the == operator:
public static bool operator ==(Person person1, Person person2)
{
if (person1 is null)
{
return person2 is null;
}
return person1.Equals(person2);
}
I've always done it this way (for the == and != operators) and I reuse this code for every object I create:
public static bool operator ==(Person lhs, Person rhs)
{
// If left hand side is null...
if (System.Object.ReferenceEquals(lhs, null))
{
// ...and right hand side is null...
if (System.Object.ReferenceEquals(rhs, null))
{
//...both are null and are Equal.
return true;
}
// ...right hand side is not null, therefore not Equal.
return false;
}
// Return true if the fields match:
return lhs.Equals(rhs);
}
"!=" then goes like this:
public static bool operator !=(Person lhs, Person rhs)
{
return !(lhs == rhs);
}
Edit
I modified the == operator function to match Microsoft's suggested implementation here.
You could always cast to Object and write
(Object)(person1)==null
I'd imagine this would work, not sure though.
Easier than any of those approaches would be to just use
public static bool operator ==(Person person1, Person person2)
{
return EqualityComparer<Person>.Default.Equals(person1, person2);
}
This has the same null equality semantics as the approaches that everyone else is proposing, but it's the framework's problem to figure out the details :)
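A fuller sketch of this approach, with the caveat that it only stays free of recursion if the Equals methods themselves do not use the overloaded ==. The Equals and GetHashCode bodies here are assumptions added for the example (the hash uses only the social security number), and NullDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public class Person : IEquatable<Person>
{
    public string SocialSecurityNumber;
    public string FirstName;
    public string LastName;

    // Uses ReferenceEquals for the null test, so the overloaded == is never called here.
    public bool Equals(Person other)
    {
        return !ReferenceEquals(other, null)
            && SocialSecurityNumber == other.SocialSecurityNumber
            && FirstName == other.FirstName
            && LastName == other.LastName;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    public override int GetHashCode()
    {
        return SocialSecurityNumber == null ? 0 : SocialSecurityNumber.GetHashCode();
    }

    // The default comparer handles the null cases, then calls Equals(Person).
    public static bool operator ==(Person person1, Person person2)
    {
        return EqualityComparer<Person>.Default.Equals(person1, person2);
    }

    public static bool operator !=(Person person1, Person person2)
    {
        return !(person1 == person2);
    }
}

public static class NullDemo
{
    public static void Main()
    {
        Person none = null;
        var a = new Person { SocialSecurityNumber = "1", FirstName = "A", LastName = "B" };
        var b = new Person { SocialSecurityNumber = "1", FirstName = "A", LastName = "B" };
        Console.WriteLine(none == null); // True
        Console.WriteLine(a == null);    // False
        Console.WriteLine(a == b);       // True
    }
}
```

The string comparisons inside Equals(Person) use string's own == operator, not Person's, so they are unaffected by the overload.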
The final (hypothetical) routine is below. It is very similar to @cdhowie's first accepted response.
public static bool operator ==(Person person1, Person person2)
{
if (Person.ReferenceEquals(person1, person2)) return true;
if (Person.ReferenceEquals(person1, null)) return false; //*
return person1.Equals(person2);
}
Thanks for the great responses!
//* - .Equals() performs the null check on person2
Cast the Person instance to object:
public static bool operator ==(Person person1, Person person2)
{
if ((object)person1 == (object)person2) return true;
if ((object)person1 == null) return false;
if ((object)person2 == null) return false;
return person1.Equals(person2);
}
Cast the Person to an Object and then perform the comparison:
object o1 = (object)person1;
object o2 = (object)person2;
if(o1==o2) //compare instances.
return true;
if (o1 == null || o2 == null) //compare to null.
return false;
//continue with Person logic.
Overloading these operators consistently is pretty hard. My answer to a related question may serve as a template.
Basically, you first need to do a reference (object.ReferenceEquals) test to see if the object is null. Then you call Equals.
cdhowie is on the money with the use of ReferenceEquals, but it's worth noting that you can still get an exception if someone passes null directly to Equals. Also, if you are going to override Equals, it's almost always worth implementing IEquatable<T>, so I would instead have:
public class Person : IEquatable<Person>
{
/* more stuff elided */
public bool Equals(Person other)
{
return !ReferenceEquals(other, null) &&
SocialSecurityNumber == other.SocialSecurityNumber &&
FirstName == other.FirstName &&
LastName == other.LastName;
}
public override bool Equals(object obj)
{
return Equals(obj as Person);
}
public static bool operator !=(Person person1, Person person2)
{
return !(person1 == person2);
}
public static bool operator ==(Person person1, Person person2)
{
return ReferenceEquals(person1, person2)
|| (!ReferenceEquals(person1, null) && person1.Equals(person2));
}
}
And of course, you should never override Equals and not override GetHashCode()
public override int GetHashCode()
{
//I'm going to assume that different
//people with the same SocialSecurityNumber are extremely rare,
//so optimise by hashing on that alone. If this isn't the case, change this
return SocialSecurityNumber.GetHashCode();
}
It's also worth noting that identity entails equality (that is, for any valid concept of "equality" something is always equal to itself). Since equality tests can be expensive and occur in loops, and since comparing something with itself tends to be quite common in real code (esp. if objects are passed around in several places), it can be worth adding as a shortcut:
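public bool Equals(Person other)
{
// Identity shortcut first; the null check must guard the field comparisons,
// otherwise a null argument reaches other.SocialSecurityNumber and throws.
return ReferenceEquals(this, other) ||
(
!ReferenceEquals(other, null) &&
SocialSecurityNumber == other.SocialSecurityNumber &&
FirstName == other.FirstName &&
LastName == other.LastName
);
}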
Just how much of a benefit short-cutting on ReferenceEquals(this, other) is can vary considerably depending on the nature of the class, but whether it is worth while doing or not is something one should always consider, so I include the technique here.

Distinct() returns duplicates with a user-defined type

I'm trying to write a Linq query which returns an array of objects, with unique values in their constructors. For integer types, Distinct returns only one copy of each value, but when I try creating my list of objects, things fall apart. I suspect it's a problem with the equality operator for my class, but when I set a breakpoint, it's never hit.
Filtering out the duplicate int in a sub-expression solves the problem, and also saves me from constructing objects that will be immediately discarded, but I'm curious why this version doesn't work.
UPDATE: 11:04 PM Several folks have pointed out that MyType doesn't override GetHashCode(). I'm afraid I oversimplified the example. The original MyType does indeed implement it. I've added it below, modified only to put the hash code in a temp variable before returning it.
Running through the debugger, I see that all five invocations of GetHashCode return a different value. And since MyType only inherits from Object, this is presumably the same behavior Object would exhibit.
Would I be correct then to conclude that the hash should instead be based on the contents of Value? This was my first attempt at overriding operators, and at the time, it didn't appear that GetHashCode needed to be particularly fancy. (This is the first time one of my equality checks didn't seem to work properly.)
class Program
{
static void Main(string[] args)
{
int[] list = { 1, 3, 4, 4, 5 };
int[] list2 =
(from value in list
select value).Distinct().ToArray(); // One copy of each value.
MyType[] distinct =
(from value in list
select new MyType(value)).Distinct().ToArray(); // Two objects created with 4.
Array.ForEach(distinct, value => Console.WriteLine(value));
}
}
class MyType
{
public int Value { get; private set; }
public MyType(int arg)
{
Value = arg;
}
public override int GetHashCode()
{
int retval = base.GetHashCode();
return retval;
}
public override bool Equals(object obj)
{
if (obj == null)
return false;
MyType rhs = obj as MyType;
if ((Object)rhs == null)
return false;
return this == rhs;
}
public static bool operator ==(MyType lhs, MyType rhs)
{
bool result;
if ((Object)lhs != null && (Object)rhs != null)
result = lhs.Value == rhs.Value;
else
result = (Object)lhs == (Object)rhs;
return result;
}
public static bool operator !=(MyType lhs, MyType rhs)
{
return !(lhs == rhs);
}
}
You need to override GetHashCode() in your class. GetHashCode must be implemented in tandem with Equals overloads. It is common for code to check for hashcode equality before calling Equals. That's why your Equals implementation is not getting called.
Your suspicion is correct: the problem is the equality check, which currently just compares the object references. Your own implementation does not add anything beyond that, so change it to this:
public override bool Equals(object obj)
{
if (obj == null)
return false;
MyType rhs = obj as MyType;
if ((Object)rhs == null)
return false;
return this.Value == rhs.Value;
}
In your equality method you are still testing for reference equality, rather than semantic equality, e.g. on this line:
result = (Object)lhs == (Object)rhs
you are just comparing two object references which, even if they hold exactly the same data, are still not the same object. Instead, your test for equality needs to compare one or more properties of your object. For instance, if your object had an ID property, and objects with the same ID should be considered semantically equivalent, then you could do this:
result = lhs.ID == rhs.ID
Note that overriding Equals() means you should also override GetHashCode(), which is another kettle of fish, and can be quite difficult to do correctly.
You need to implement GetHashCode().
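Since the question's GetHashCode simply forwards to base.GetHashCode(), every instance gets its own hash. A minimal sketch of basing the hash on Value instead, with the operators left out for brevity and DistinctDemo as a made-up name:

```csharp
using System;
using System.Linq;

public class MyType
{
    public int Value { get; private set; }

    public MyType(int arg)
    {
        Value = arg;
    }

    // Hash on the same data that Equals compares.
    public override int GetHashCode()
    {
        return Value;
    }

    public override bool Equals(object obj)
    {
        var rhs = obj as MyType;
        return !ReferenceEquals(rhs, null) && Value == rhs.Value;
    }

    public override string ToString()
    {
        return Value.ToString();
    }
}

// Hypothetical demo class.
public static class DistinctDemo
{
    public static MyType[] BuildDistinct()
    {
        int[] list = { 1, 3, 4, 4, 5 };
        return (from value in list
                select new MyType(value)).Distinct().ToArray();
    }

    public static void Main()
    {
        MyType[] distinct = BuildDistinct();
        Console.WriteLine(distinct.Length); // 4
    }
}
```

Now the two objects created with 4 land under the same hash, Equals gets called on them, and only one survives.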
It seems that a simple Distinct operation can be implemented more elegantly as follows:
var distinct = items.GroupBy(x => x.ID).Select(x => x.First());
where ID is the property that determines if two objects are semantically equivalent. From the confusion here (including that of myself), the default implementation of Distinct() seems to be a little convoluted.
I think MyType needs to implement IEquatable for this to work.
The other answers have pretty much covered the fact that you need to implement Equals and GetHashCode correctly, but as a side note you may be interested to know that anonymous types have these values implemented automatically:
var distinct =
(from value in list
select new {Value = value}).Distinct().ToArray();
So without ever having to define this class, you automatically get the Equals and GetHashCode behavior you're looking for. Cool, eh?
