Overridding Equals and GetHash - c#

I have read that when you override Equals on an class/object you need to override GetHashCode.
public class Person : IEquatable<Person>
{
public int PersonId { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public Person(int personId, string firstName, string lastName)
{
PersonId = personId;
FirstName = firstName;
LastName = lastName;
}
public bool Equals(Person obj)
{
Person p = obj as Person;
if (ReferenceEquals(null, p))
return false;
if (ReferenceEquals(this, p))
return true;
return Equals(p.FirstName, FirstName) &&
Equals(p.LastName, LastName);
}
}
Now given the following:
public static Dictionary<Person, Person> ObjDic= new Dictionary<Person, Person>();
public static Dictionary<int, Person> PKDic = new Dictionary<int, Person>();
Will not overridding the GetHashCode affect both of the Dictionary's above? What I am basically asking is how is GetHashCode generated? IF I still look for an object in PKDic will I be able to find it just based of the PK. If I wanted to override the GetHashCode how would one go about doing that?

You should always override GetHashCode.
A Dictionary<int, Person> will function without GetHashCode, but as soon as you call LINQ methods like Distinct or GroupBy, it will stop working.
Note, by the way, that you haven't actually overridden Equals either.
The IEquatable.Equals method is not the same as the virtual bool Equals(object obj) inherited from Object. Although the default IEqualityComparer<T> will use the IEquatable<T> interface if the class implements it, you should still override Equals, because other code might not.
In your case, you should override Equals and GetHashCode like this:
public override bool Equals(object obj) { return Equals(obj as Person); }
public override int GetHashCode() {
return FirstName.GetHashCode() ^ LastName.GetHashCode();
}

In your scenario, not overriding GetHashCode on your type will affect only the first dictionary, as the key is what's used for hashing, not the value.
When looking for the presence of a key, the Dictionary<TKey,TValue> will use the hash code to find out if any keys could be equal. It's important to note that a hash is a value that can determine if two things could be equal or very likely are equal. A hash, strictly speaking cannot determine if two items are equal.
Two equal objects are required to return the same hash code. However, two non-equal objects are not required to return different hash codes. In other words, if the hash codes don't match, you're guaranteed that the objects are not equal. If the hash codes do match, then the objects could be equal.
Because of this, the Dictionary will only call Equals on two objects if their hash codes match.
As to "how to override GetHashCode", that's a complicated question. Clasically, a hashing algorithm should provide a balance between even distribution of the codes over the set of values with a low collision rate (a collision is when two non-equal objects produce the same code). This is a simple thing to describe and a very difficult thing to accomplish. It's easy to do one or the other, but hard to balance them.
From a practical perspective (meaning disregarding performance), you could just XOR all of the characters of the first and last names (or even use their respective hash codes, as Joel suggests) as your hash code. This will give a low degree of collision, but won't result in a terribly even distribution. Unless you're dealing with very large sets or very frequent lookups, it won't be an issue.

Your GetHashCode() and Equals() methods should look like this:
public int GetHashCode()
{
return (FirstName.GetHashCode()+1) ^ (LastName.GetHashCode()+2);
}
public bool Equals(Object obj)
{
Person p = obj as Person;
if (p == null)
return false;
return this.Firstname == p.FirstName && this.LastName == p.Lastname;
}
The rule is that GetHashCode() must use exactly the fields used in determining equality for the .Equals() method.
As for the dictionary part of your question, .GetHashCode() is used for determining the key in a dictionary. However, this has a different impact for each of the dictionarys in your question.
The dictionary with the int key (presumably your person ID) will use the GetHashCode() for the integer, while the other dictionary (ObjDic) will use the GetHashCode() from your Person object. Therefore PKDic will always differentiate between two people with different IDs, while ObjDic might treat two people with different IDs but the same first and last names as the same record.

Here is how I would do it. Since it is common for two different people to have exactly the same name it makes more sense to use a unique identifier (which you already have).
public class Person : IEquatable<Person>
{
public override int GetHashCode()
{
return PersonId.GetHashCode();
}
public override bool Equals(object obj)
{
var that = obj as Person;
if (that != null)
{
return Equals(that);
}
return false;
}
public bool Equals(Person that)
{
return this.PersonId == that.PersonId;
}
}
To answer your specific question: This only matters if you are using Person as a key in an IDictionary collection. For example, Dictionary<Person, string> or SortedDictionary<Person, Foo>, but not Dictionary<int, Person>.

Related

ExceptWith in HashSet for complex types

I have HashSet of my custom class:
public class Vertex
{
public string Name;
public override bool Equals(object obj)
{
var vert = obj as Vertex;
if (vert !=null)
{
return Name.Equals(vert.Name, StringComparison.InvariantCulture);
}
return false;
}
}
And now I have tow hashsets
HashSet<Vertex> hashSet1 = new HashSet<Vertex>();
HashSet<Vertex> hashSet1 = new HashSet<Vertex>();
And now I'd like to have in hashSet1 only Vertexes that are not in hashSet2
So I use ExceptWith method
hashSet1.ExceptWith(hashSet2);
But this doesn't work.
I suppose that this doesn't work because I have complex type.
So the question is: is there some interface required to be implemented in Vertex class to make this thing work?
I know that while creation of HashSet I can pass a EqualityComparer but it seems to me that it would be more elegant to implement some comparing interface method in Vertex class.
Is it possible or I just doesn't understand sth?
Thanks.
When overriding Equals you should also override GetHashCode. HashSet (and other hashing structures like Dictionary) will first calculate a hash code for your objects to locate them in tne structure before comparing elements with Equals.
public override int GetHashCode()
{
return StringComparer.InvariantCulture.GetHashCode(this.Name);
}
You don't have to implement any interface (although IEquatable<T>) is encouraged. When you create a hash-set without specifying an equality-comparer, it defaults to using EqualityComparer<T>.Default, which asks the objects themselves to compare themselves to each other (special-casing null references).
However, in your case, your equality contract is broken since you haven't overriden GetHashCode. Here's how I would fix your type:
public class Vertex : IEquatable<Vertex>
{
public string Name { get; private set; }
public Vertex(string name)
{
Name = name;
}
public override int GetHashCode()
{
return StringComparer.InvariantCulture.GetHashCode(Name);
}
public override bool Equals(object obj)
{
return Equals(obj as Vertex);
}
public bool Equals(Vertex obj)
{
return obj != null && StringComparer.InvariantCulture.Equals(Name, obj.Name);
}
}
Would you mind overriding the .GetHashCode()too?
Here's the reference.
You have to override GetHashCode with Equals overriding.
Object.Equals Method:
Types that override Equals(Object) must also override GetHashCode; otherwise, hash tables might not work correctly.

How to get the distinct data from a list?

I want to get distinct list from list of persons .
List<Person> plst = cl.PersonList;
How to do this through LINQ. I want to store the result in List<Person>
Distinct() will give you distinct values - but unless you've overridden Equals / GetHashCode() you'll just get distinct references. For example, if you want two Person objects to be equal if their names are equal, you need to override Equals/GetHashCode to indicate that. (Ideally, implement IEquatable<Person> as well as just overriding Equals(object).)
You'll then need to call ToList() to get the results back as a List<Person>:
var distinct = plst.Distinct().ToList();
If you want to get distinct people by some specific property but that's not a suitable candidate for "natural" equality, you'll either need to use GroupBy like this:
var people = plst.GroupBy(p => p.Name)
.Select(g => g.First())
.ToList();
or use the DistinctBy method from MoreLINQ:
var people = plst.DistinctBy(p => p.Name).ToList();
Using the Distinct extension method will return an IEnumerable which you can then do a ToList() on:
List<Person> plst = cl.PersonList.Distinct().ToList();
You can use Distinct method, you will need to Implement IEquatable and override equals and hashcode.
public class Person : IEquatable<Person>
{
public string Name { get; set; }
public int Code { get; set; }
public bool Equals(Person other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the person' properties are equal.
return Code.Equals(other.Code) && Name.Equals(other.Name);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the Name field if it is not null.
int hashPersonName = Name == null ? 0 : Name.GetHashCode();
//Get hash code for the Code field.
int hashPersonCode = Code.GetHashCode();
//Calculate the hash code for the person.
return hashPersonName ^ hashPersonCode;
}
}
var distinctPersons = plst.Distinct().ToList();

Generate hash of object consistently

I'm trying to get a hash (md5 or sha) of an object.
I've implemented this:
http://alexmg.com/post/2009/04/16/Compute-any-hash-for-any-object-in-C.aspx
I'm using nHibernate to retrieve my POCOs from a database.
When running GetHash on this, it's different each time it's selected and hydrated from the database. I guess this is expected, as the underlying proxies will change.
Anyway,
Is there a way to get a hash of all the properties on an object, consistently each time?
I've toyed with the idea of using a StringBuilder over this.GetType().GetProperties..... and creating a hash on that, but that seems inefficient?
As a side note, this is for change-tracking these entities from one database (RDBMS) to a NoSQL store
(comparing hash values to see if objects changed between rdbms and nosql)
If you're not overriding GetHashCode you just inherit Object.GetHashCode. Object.GetHashCode basically just returns the memory address of the instance, if it's a reference object. Of course, each time an object is loaded it will likely be loaded into a different part of memory and thus result in a different hash code.
It's debatable whether that's the correct thing to do; but that's what was implemented "back in the day" so it can't change now.
If you want something consistent then you have to override GetHashCode and create a code based on the "value" of the object (i.e. the properties and/or fields). This can be as simple as a distributed merging of the hash codes of all the properties/fields. Or, it could be as complicated as you need it to be. If all you're looking for is something to differentiate two different objects, then using a unique key on the object might work for you.If you're looking for change tracking, using the unique key for the hash probably isn't going to work
I simply use all the hash codes of the fields to create a reasonably distributed hash code for the parent object. For example:
public override int GetHashCode()
{
unchecked
{
int result = (Name != null ? Name.GetHashCode() : 0);
result = (result*397) ^ (Street != null ? Street.GetHashCode() : 0);
result = (result*397) ^ Age;
return result;
}
}
The use of the prime number 397 is to generate a unique number for a value to better distribute the hash code. See http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/ for more details on the use of primes in hash code calculations.
You could, of course, use reflection to get at all the properties to do this, but that would be slower. Alternatively you could use the CodeDOM to generate code dynamically to generate the hash based on reflecting on the properties and cache that code (i.e. generate it once and reload it next time). But, this of course, is very complex and might not be worth the effort.
An MD5 or SHA hash or CRC is generally based on a block of data. If you want that, then using the hash code of each property doesn't make sense. Possibly serializing the data to memory and calculating the hash that way would be more applicable, as Henk describes.
If this 'hash' is solely used to determine whether entities have changed then the following algorithm may help (NB it is untested and assumes that the same runtime will be used when generating hashes (otherwise the reliance on GetHashCode for 'simple' types is incorrect)):
public static byte[] Hash<T>(T entity)
{
var seen = new HashSet<object>();
var properties = GetAllSimpleProperties(entity, seen);
return properties.Select(p => BitConverter.GetBytes(p.GetHashCode()).AsEnumerable()).Aggregate((ag, next) => ag.Concat(next)).ToArray();
}
private static IEnumerable<object> GetAllSimpleProperties<T>(T entity, HashSet<object> seen)
{
foreach (var property in PropertiesOf<T>.All(entity))
{
if (property is int || property is long || property is string ...) yield return property;
else if (seen.Add(property)) // Handle cyclic references
{
foreach (var simple in GetAllSimpleProperties(property, seen)) yield return simple;
}
}
}
private static class PropertiesOf<T>
{
private static readonly List<Func<T, dynamic>> Properties = new List<Func<T, dynamic>>();
static PropertiesOf()
{
foreach (var property in typeof(T).GetProperties())
{
var getMethod = property.GetGetMethod();
var function = (Func<T, dynamic>)Delegate.CreateDelegate(typeof(Func<T, dynamic>), getMethod);
Properties.Add(function);
}
}
public static IEnumerable<dynamic> All(T entity)
{
return Properties.Select(p => p(entity)).Where(v => v != null);
}
}
This would then be useable like so:
var entity1 = LoadEntityFromRdbms();
var entity2 = LoadEntityFromNoSql();
var hash1 = Hash(entity1);
var hash2 = Hash(entity2);
Assert.IsTrue(hash1.SequenceEqual(hash2));
GetHashCode() returns an Int32 (not an MD5).
If you create two objects with all the same property values they will not have the same Hash if you use the base or system GetHashCode().
String is an object and an exception.
string s1 = "john";
string s2 = "john";
if (s1 == s2) returns true and will return the same GetHashCode()
If you want to control equality comparison of two objects then you should override the GetHash and Equality.
If two object are the same then they must also have the same GetHash(). But two objects with the same GetHash() are not necessarily the same. A comparison will first test the GetHash() and if it gets a match there it will test the Equals. OK there are some comparisons that go straight to Equals but you should still override both and make sure two identical objects produce the same GetHash.
I use this for syncing a client with the server. You could use all the Properties or you could have any Property change change the VerID. The advantage here is a simpler quicker GetHashCode(). In my case I was resetting the VerID with any Property change already.
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || !(obj is FTSdocWord)) return false;
FTSdocWord item = (FTSdocWord)obj;
return (OjbID == item.ObjID && VerID == item.VerID);
}
public override int GetHashCode()
{
return ObjID ^ VerID;
}
I ended up using ObjID alone so I could do the following
if (myClientObj == myServerObj && myClientObj.VerID <> myServerObj.VerID)
{
// need to synch
}
Object.GetHashCode Method
Two objects with the same property values. Are they equal? Do they produce the same GetHashCode()?
personDefault pd1 = new personDefault("John");
personDefault pd2 = new personDefault("John");
System.Diagnostics.Debug.WriteLine(po1.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(po2.GetHashCode().ToString());
// different GetHashCode
if (pd1.Equals(pd2)) // returns false
{
System.Diagnostics.Debug.WriteLine("pd1 == pd2");
}
List<personDefault> personsDefault = new List<personDefault>();
personsDefault.Add(pd1);
if (personsDefault.Contains(pd2)) // returns false
{
System.Diagnostics.Debug.WriteLine("Contains(pd2)");
}
personOverRide po1 = new personOverRide("John");
personOverRide po2 = new personOverRide("John");
System.Diagnostics.Debug.WriteLine(po1.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(po2.GetHashCode().ToString());
// same hash
if (po1.Equals(po2)) // returns true
{
System.Diagnostics.Debug.WriteLine("po1 == po2");
}
List<personOverRide> personsOverRide = new List<personOverRide>();
personsOverRide.Add(po1);
if (personsOverRide.Contains(po2)) // returns true
{
System.Diagnostics.Debug.WriteLine("Contains(p02)");
}
}
public class personDefault
{
public string Name { get; private set; }
public personDefault(string name) { Name = name; }
}
public class personOverRide: Object
{
public string Name { get; private set; }
public personOverRide(string name) { Name = name; }
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || !(obj is personOverRide)) return false;
personOverRide item = (personOverRide)obj;
return (Name == item.Name);
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}

GetHashCode() for OrdinalIgnoreCase-dependent string classes

public class Address{
public string ContactName {get; private set;}
public string Company {get; private set;}
//...
public string Zip {get; private set;}
}
I'd like to implement a notion of distint addresses, so I overrode Equals() to test for case-insensitive equality in all of the fields (as these are US addresses, I used Ordinal instead of InvariantCulture for maximum performance):
public override bool Equals(Object obj){
if (obj == null || this.GetType() != obj.GetType())
return false;
Address o = (Address)obj;
return
(string.Compare(this.ContactName, o.ContactName, StringComparison.OrdinalIgnoreCase) == 0) &&
(string.Compare(this.Company, o.Company, StringComparison.OrdinalIgnoreCase) == 0)
// ...
(string.Compare(this.Zip, o.Zip, StringComparison.OrdinalIgnoreCase) == 0)
}
I'd like to write a GetHashCode() similarly like so (ignore the concatenation inefficiency for the moment):
public override int GetHashCode(){
return (this.contactName + this.address1 + this.zip).ToLowerOrdinal().GetHashCode();
}
but that doesn't exist. What should I use instead? Or should I just use InvariantCulture in my Equals() method?
(I'm thinking .ToLowerInvariant().GetHashCode(), but I'm not 100% sure that InvariantCulture can't decide that an identical character (such as an accent) has a different meaning in another context.)
Whatever string comparison method you use in Equals(), it makes sense to use the same in GetHashCode().
There's no need to create temporary strings just to calculate hash codes. For StringComparison.OrdinalIgnoreCase, use StringComparer.OrdinalIgnoreCase.GetHashCode()
Then you need to combine multiple hash codes into one. XOR should be ok (because it's unlikely that one person's zip code is another's contact name). However purists might disagree.
public override int GetHashCode()
{
return StringComparer.OrdinalIgnoreCase.GetHashCode(ContactName) ^
StringComparer.OrdinalIgnoreCase.GetHashCode(Company) ^
// ...
StringComparer.OrdinalIgnoreCase.GetHashCode(Zip);
}
Having said all that, I'd question whether it's sensible to use a composite structure like Address as the key to a dictionary. But the principle holds for identity-type strings.
Two unequal objects can have the same hashcode. Though two equal objects should never have different hashcodes. If you use InvariantCulture for your hashcode it will still be correct as far as the contract for Equals goes if it's implemented in terms of OrdinalIgnoreCase.
From the documentation on StringComparer.OrdinalIgnoreCase (emphasis mine):
http://msdn.microsoft.com/en-us/library/system.stringcomparer.ordinalignorecase.aspx
The StringComparer returned by the OrdinalIgnoreCase property treats
the characters in the strings to compare as if they were converted to
uppercase using the conventions of the invariant culture, and
then performs a simple byte comparison that is independent of
language. This is most appropriate when comparing strings that are
generated programmatically or when comparing case-insensitive
resources such as paths and filenames.
Now you can use System.HashCode
public class Address
{
public string ContactName { get; private set; }
public string Company { get; private set; }
// ...
public string Zip { get; private set; }
public override bool Equals(object obj)
{
return
obj is Address address &&
string.Equals(ContactName, address.ContactName, StringComparison.OrdinalIgnoreCase) &&
string.Equals(Company, address.Company, StringComparison.OrdinalIgnoreCase) &&
// ...
string.Equals(Zip, address.Zip, StringComparison.OrdinalIgnoreCase);
}
public override int GetHashCode()
{
var hash = new HashCode();
hash.Add(ContactName, StringComparer.OrdinalIgnoreCase);
hash.Add(Company, StringComparer.OrdinalIgnoreCase);
// ...
hash.Add(Zip, StringComparer.OrdinalIgnoreCase);
return hash.ToHashCode();
}
}

Overriding the Equals and GetHashCode of a type, which has 'dibs'?

This question and Jon's answer made me aware this even existed, so I got curious and launched Visual Studio.
I followed along one example of the MSDN page, and then I created my own little example. It's as follows:
public class Person : IEquatable<Person>
{
public string IdNumber { get; set; }
public string Name { get; set; }
public bool Equals(Person otherPerson)
{
if (IdNumber == otherPerson.IdNumber)
return true;
else
return false;
}
public override bool Equals(object obj)
{
if (obj == null)
return base.Equals(obj);
if (!(obj is Person))
throw new InvalidCastException("The Object isn't of Type Person.");
else
return Equals(obj as Person);
}
public override int GetHashCode()
{
return IdNumber.GetHashCode();
}
public static bool operator ==(Person person1, Person person2)
{
return person1.Equals(person2);
}
public static bool operator !=(Person person1, Person person2)
{
return (!person1.Equals(person2));
}
}
So I have a couple of questions:
If the Equals method does a good job at handling my custom equality, why do I have to override the GetHashCode method as well?
When comparing something like below, which comparer is used, the Equals or the GetHashCode?
.
static void Main(string[] args)
{
Person sergio = new Person() { IdNumber = "1", Name = "Sergio" };
Person lucille = new Person() { IdNumber = "2", Name = "Lucille" };
List<Person> people = new List<Person>(){
sergio,
lucille
};
Person lucille2 = new Person() { IdNumber = "2", Name = "Lucille" };
if (people.Contains(lucille2))
{
Console.WriteLine("Already exists.");
}
Console.ReadKey();
}
What exactly do the operator method do? It looks like some sort of voodoo black magic going on there.
If the Equals method does a good job at handling my custom equality, why do I have to override the GetHashCode method as well?
This allows your type to be used in collections that work via hashing, such as being the key in a Dictionary<T, U>, or storing in a HashSet<T>.
When comparing something like below, which comparer is used, the Equals or the GetHashCode?
GetHashCode is not used for comparisons - only for hashing operations. Equals is always used.
What exactly do the operator method do? It looks like some sort of voodoo black magic going on there.
This allows you to directly use == on two instances of your type. Without this, you'll be comparing by reference if your type is a class, not by the values within your type.
The purpose of GetHashCode is to balance a hash table, not to determine equality. When looking up a member of a hash table the hash bucket checked is determined by the hash code, and then whether the object is in the bucket or not is determined by equality. That's why GetHashCode has to agree with equality.
For more details see my article on the subject:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
GetHashCode and Equals are two very different things. Equals determines equality. GetHashCode returns a hashcode suitable for a hash map, but does not guarantee equality. Therefore, in equality matters, Equals will be the method that determines equality.
GetHashCodeis intended for hash sets, such as a Dictionary. When looking up an item in a dictionary, you will match the entry on the hashcode, then on Equals.
GetHashCode is used by MSDN only when you use a hash table.
If you need equality you only care about Equals. MSDN suggests to also implement GetHashCode because sooner or later you might use your objects in a hash like object (hash table, hash map etc).
Imagine the objects have 1000 bytes and you need a fast way to determine equality between 2 objects - you calculate the hash key (via GetHashCode). If keys do not match, the objects are different. If they do match, you cannot say for sure if they are indeed equal, you need to verify with Equal() - which is more expensive.
Hash table collections use this ideea.

Categories

Resources