When do we do GetHashCode() for a Dictionary? - c#

I have used Dictionary(TKey, TValue) for many purposes, but I have never had to implement GetHashCode(), which I believe is because my keys were always built-in types such as int and string.
I am curious about the scenarios (real-world examples) in which one should use a custom object as the key and therefore implement GetHashCode(), Equals(), etc.
And does using a custom object as the key necessitate implementing these methods?

You should override Equals and GetHashCode whenever the default Object.Equals (tests for reference equality) will not suffice. This happens, for example, when the type of your key is a custom type and you want two keys to be considered equal even in cases when they are not the same instance of the custom type.
For example, if your key is as simple as
class Point {
public int X { get; set; }
public int Y { get; set; }
}
and you want two Points to be considered equal if their Xs are equal and their Ys are equal, then you will need to override Equals and GetHashCode.
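A minimal sketch of such overrides follows. The read-only properties, the constructor, and the PointDemo helper are assumptions made for illustration (the original Point has settable properties); the hash formula is one common way of combining two integers, not the only valid one.

```csharp
using System;
using System.Collections.Generic;

// Illustrative Point with value equality. The properties are read-only here
// (an assumption) so the hash code cannot change while the object is a key.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Two points are equal when both coordinates match.
    public override bool Equals(object obj)
    {
        var other = obj as Point;
        return other != null && X == other.X && Y == other.Y;
    }

    // Built from the same members that Equals compares.
    public override int GetHashCode()
    {
        unchecked // arithmetic overflow simply wraps around
        {
            return (17 * 23 + X) * 23 + Y;
        }
    }
}

// Hypothetical helper that shows the effect on a dictionary lookup.
public static class PointDemo
{
    // Stores a value under one instance and reads it back through another
    // instance that has the same coordinates.
    public static string Lookup()
    {
        var names = new Dictionary<Point, string>();
        names[new Point(1, 2)] = "corner";
        return names[new Point(1, 2)];
    }
}
```

Without the two overrides, the second indexer call would throw a KeyNotFoundException, because the two Point instances would only be compared by reference.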

Just to make it clear: there is one important thing about Dictionary<TKey, TValue> and GetHashCode(): Dictionary uses GetHashCode to locate the keys that might match and then Equals to confirm the match, so if TKey is a custom type you should implement both carefully and consistently. As Andrew Hare pointed out, this is easy if you have a simple type that identifies your custom object unambiguously. If you have a combined identifier, it gets a little more complicated.
As an example, consider a complex number as TKey. A complex number is determined by its real and its imaginary part, both of a simple type such as double. But how would you decide whether two complex numbers are equal? You override Equals() to compare both parts and implement GetHashCode() so that it combines the hash codes of both identifying parts.
You find further reading on the latter here.
UPDATE
Based on Ergwun's comment I checked the behavior of Dictionary<TKey, TValue>.Add with special respect to TKey's implementation of Equals(object) and GetHashCode(). I must confess that I was rather surprised by the results.
Given two objects k1 and k2 of type TKey, two arbitrary objects v1 and v2 of type TValue, and an empty dictionary d of type Dictionary<TKey, TValue>, this is what happens when adding v1 with key k1 to d first and v2 with key k2 second (depending on the implementation of TKey.Equals(object) and TKey.GetHashCode()):
k1.Equals(k2)   k1.GetHashCode() == k2.GetHashCode()   d.Add(k2, v2)
false           false                                  ok
false           true                                   ok
true            false                                  ok
true            true                                   System.ArgumentException
Conclusion: I was wrong, as I originally thought the second case (where Equals returns false but both key objects have the same hash code) would raise an ArgumentException. But as the third case shows, the dictionary does use GetHashCode() in some way: the keys are equal according to Equals, yet the duplicate goes undetected because their hash codes differ. So it is good advice that two objects that are of the same type and are equal must return the same hash code, to ensure that instances of Dictionary<TKey, TValue> work correctly.
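The four cases can be reproduced with a small probe. This is only a sketch: ProbeKey and ProbeDemo are hypothetical names, and the key deliberately lets Equals and GetHashCode be set independently, which means the third case intentionally violates the usual contract.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical probe key: equality and hash code are controlled separately
// so that each combination of Equals/GetHashCode results can be produced.
public sealed class ProbeKey
{
    private readonly int _equalityGroup;
    private readonly int _hash;

    public ProbeKey(int equalityGroup, int hash)
    {
        _equalityGroup = equalityGroup;
        _hash = hash;
    }

    // Keys are "equal" when they belong to the same group.
    public override bool Equals(object obj)
    {
        var other = obj as ProbeKey;
        return other != null && other._equalityGroup == _equalityGroup;
    }

    // Returns whatever hash the test asked for, regardless of the group.
    public override int GetHashCode() { return _hash; }
}

public static class ProbeDemo
{
    // Returns true if adding k2 after k1 succeeds, false if Add throws.
    public static bool SecondAddSucceeds(ProbeKey k1, ProbeKey k2)
    {
        var d = new Dictionary<ProbeKey, string>();
        d.Add(k1, "v1");
        try
        {
            d.Add(k2, "v2");
            return true;
        }
        catch (ArgumentException)
        {
            return false; // the dictionary considered k2 a duplicate of k1
        }
    }
}
```

Only the combination of equal hash codes and Equals returning true makes the second Add fail, which matches the last row of the table.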

You have two questions here:
1. When do you need to implement GetHashCode()?
2. Would you ever use an object for a dictionary key?
Let's start with 1. If you are writing a class that might be used by someone else, you will want to define GetHashCode() and Equals() whenever reference equality is not enough. If you're not planning on using it in a dictionary, and it's for your own usage, then I see no reason not to skip GetHashCode() etc.
For 2, you should use an object any time you need a constant-time lookup from an object to some other type. Since GetHashCode() returns a numeric value, and collections store references, there is no inherent penalty for using an object over an int or a string (remember, a string is an object too).

One example is when you need to create a composite key (that is, a key comprised of more than one piece of data). That composite key would be a custom type that would need to override those methods.
For example, let's say that you had an in-memory cache of address records and you wanted to check to see if an address was in cache to save an expensive trip to the database to retrieve it. Let's also say that addresses are unique in terms of their street 1 and zip code fields. You would implement your cache with something like this:
class AddressCacheKey
{
public String StreetOne { get; set; }
public String ZipCode { get; set; }
// overrides for Equals and GetHashCode
}
and
static Dictionary<AddressCacheKey,Address> cache;
Since your AddressCacheKey type overrides the Equals and GetHashCode methods, it would be a good candidate for a key in the dictionary, and you would be able to determine whether or not you needed to take a trip to the database to retrieve a record based on more than one piece of data.
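The elided overrides could look roughly like the sketch below. Several things here are assumptions rather than part of the original answer: the key is made immutable, strings are compared ordinally (case-sensitive), and the Address class, the AddressCache wrapper, and the loadFromDatabase delegate are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the composite key. Immutability and ordinal comparison are
// assumptions chosen for illustration.
public sealed class AddressCacheKey : IEquatable<AddressCacheKey>
{
    public string StreetOne { get; }
    public string ZipCode { get; }

    public AddressCacheKey(string streetOne, string zipCode)
    {
        StreetOne = streetOne ?? "";
        ZipCode = zipCode ?? "";
    }

    // Keys match when both parts match.
    public bool Equals(AddressCacheKey other)
    {
        return other != null
            && string.Equals(StreetOne, other.StreetOne, StringComparison.Ordinal)
            && string.Equals(ZipCode, other.ZipCode, StringComparison.Ordinal);
    }

    public override bool Equals(object obj) { return Equals(obj as AddressCacheKey); }

    // Combines the hash codes of the same two parts that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            return (17 * 23 + StreetOne.GetHashCode()) * 23 + ZipCode.GetHashCode();
        }
    }
}

// Hypothetical stand-in for the Address record mentioned above.
public sealed class Address
{
    public string Recipient { get; set; }
}

public static class AddressCache
{
    private static readonly Dictionary<AddressCacheKey, Address> cache =
        new Dictionary<AddressCacheKey, Address>();

    // Returns the cached address and calls loadFromDatabase only on a miss.
    public static Address GetOrLoad(AddressCacheKey key, Func<AddressCacheKey, Address> loadFromDatabase)
    {
        Address address;
        if (!cache.TryGetValue(key, out address))
        {
            address = loadFromDatabase(key);
            cache[key] = address;
        }
        return address;
    }
}
```

A second request with a freshly constructed but equal key then hits the cache instead of the database.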

Related

Comparing an Entity

I've got an entity that I build: I take an instantiated entity and a modified entity. This allows me to hold the initial data and compare it against the modified data. The question is, what would be the ideal approach? Should I implement IEquatable and override Object.Equals, or implement IComparable? My original implementation was:
var properties = typeof(TEntity).GetProperties();
foreach(var property in properties)
{
var initialEntity = original.GetType().GetProperty(property.Name).GetValue(original, null);
var modifiedEntity = userChange.GetType().GetProperty(property.Name).GetValue(userChange, null);
if(initialEntity.Equals(modifiedEntity) == false && !ignore.Contains(property.Name))
{
// Do Something
}
}
My understanding was that Equals would return a boolean and that in this instance it would compare by value equality; I now assume it is comparing by reference equality instead.
It never detects a difference: the values remain equal under all circumstances.
The simplest answer:
If you need to test equality, implement IEquatable<T> and override Equals() and GetHashCode()
If you need to sort objects, implement IComparable<T>
The default implementation of Object.Equals() for reference types determines whether two references point to the same object. This is essentially what Object.ReferenceEquals(obj1, obj2) does; .NET needs you to tell it how to determine whether two objects you create are equivalent.
Additionally, the default implementation of Object.GetHashCode() returns a value tied to the object's identity, not to its contents. Unless you override it to generate a hash code that is computed from the same members you compare in your Equals() method, you will get unexpected results when you attempt to store the object in a hash set or use it as a dictionary key.
You may need to implement both, but it looks like in your case IEquatable<T> is the most pressing need.
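As a rough illustration of implementing both interfaces, consider the sketch below. The Customer type, its members, and the CustomerDemo helper are hypothetical and not taken from the question; the ordering rule (name, then age) is likewise just an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity used only for illustration.
public sealed class Customer : IEquatable<Customer>, IComparable<Customer>
{
    public string Name { get; }
    public int Age { get; }

    public Customer(string name, int age) { Name = name ?? ""; Age = age; }

    // Value equality: every member compared here also feeds GetHashCode.
    public bool Equals(Customer other)
    {
        return other != null && Name == other.Name && Age == other.Age;
    }

    public override bool Equals(object obj) { return Equals(obj as Customer); }

    public override int GetHashCode()
    {
        unchecked { return (17 * 23 + Name.GetHashCode()) * 23 + Age; }
    }

    // Ordering used for sorting: by name first, then by age.
    public int CompareTo(Customer other)
    {
        if (other == null) return 1;
        int byName = string.CompareOrdinal(Name, other.Name);
        return byName != 0 ? byName : Age.CompareTo(other.Age);
    }
}

public static class CustomerDemo
{
    // Reports whether the modified copy differs from the original snapshot.
    public static bool HasChanged(Customer original, Customer modified)
    {
        return !original.Equals(modified);
    }
}
```

With this in place, comparing an original snapshot against a modified copy is a single Equals call, and List<Customer>.Sort() uses CompareTo without any extra comparer.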

C# .NET 4.5 how to get list of unique objects based on `GetHashCode`

I have an IEnumerable of objects that override the GetHashCode method. I assumed that if I add those objects to a HashSet<T>, it would hold only the unique objects. But it doesn't:
var set = new HashSet<SomeObject>();
Count = 0
set.Add(first);
true
set.Add(second);
true
set.Count
2
first.GetHashCode()
-927637658
second.GetHashCode()
-927637658
So how could I reduce my IEnumerable of objects to those that are unique based on their GetHashCode() value?
Although I don't know if this helps in any way:
public class SomeObject
{
...
public string GetAggregateKey()
{
var json = ToJson();
json.Property("id").Remove();
return json.ToString(); // without the `id`, the json string of two separate objects with same content could be the same
}
override public int GetHashCode()
{
// two equal strings have same hash code
return GetAggregateKey().GetHashCode();
}
...
}
It is not enough to only have a GetHashCode method.
The GetHashCode method is used to quickly figure out if there are potential candidates already in the hashset (or dictionary):
If no existing object in the hashset has the same hash code, the new one is not a duplicate
If any existing object(s) in the hashset has the same hash code, the new one is a potential duplicate
To figure out if it is just a potential duplicate or an actual duplicate, Equals is used.
If you haven't implemented that then the object.Equals method will be used, which is simply comparing references. Two distinct objects will thus never be equal, even though they may both have the same property values and the same hash code.
The solution: implement Equals with the same rules as GetHashCode, or provide an IEqualityComparer<T> implementation to your hashset.
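The second option can be sketched as follows. Item, ContentComparer, and UniqueDemo are hypothetical names, with Item.Content standing in for the string returned by GetAggregateKey() in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SomeObject: Id differs, Content may repeat.
public sealed class Item
{
    public int Id { get; set; }
    public string Content { get; set; }
}

// Compares items by Content only and applies the same rule in both methods.
public sealed class ContentComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Content, y.Content, StringComparison.Ordinal);
    }

    public int GetHashCode(Item obj)
    {
        return obj.Content == null ? 0 : obj.Content.GetHashCode();
    }
}

public static class UniqueDemo
{
    // Counts the items that remain after the set drops duplicates.
    public static int CountUnique(IEnumerable<Item> items)
    {
        return new HashSet<Item>(items, new ContentComparer()).Count;
    }
}
```

Because the comparer supplies both the hash code and the equality test, the item type itself does not need to override anything.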
Have a look at the Reference Source for HashSet:
This line (960, and those around it) is what you're looking for:
if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, value))
The hash of the object is only used to decide which bucket the object goes into. If Equals returns false for the two objects, the new one will still be inserted.

Why is Equals() not being called for all objects while adding to a collection

I have a type which I am using as the key in an IDictionary. The type is as follows:
public class Employee
{
public string Name { get; set; }
public int ID { get; set; }
public override bool Equals(object obj)
{
Employee emp = obj as Employee;
if (emp != null)
return emp.Name.Equals(this.Name);
return false;
}
public override int GetHashCode()
{
return this.Name.GetHashCode();
}
}
Now I have created a dictionary in my main method as follows:
IDictionary<Employee, int> empCollection = new Dictionary<Employee, int>();
Employee emp1 = new Employee() { Name = "abhi", ID = 1 };
Employee emp2 = new Employee() { Name = "vikram", ID = 2 };
Employee emp3 = new Employee() { Name = "vikram", ID = 3 };
empCollection.Add(emp1, 1);
empCollection.Add(emp2, 2);
empCollection.Add(emp3, 3);
While debugging I found that when emp1 is added to the collection, only the GetHashCode method of the key type is called; when emp2 is added, again only GetHashCode is called; but when emp3 is added, both GetHashCode and Equals are called.
This may be a naive question, but why isn't the Equals method called when the emp2 object is added to the collection? What is happening inside? Please explain.
The dictionary and all other similar containers use the hashcode as a quick-and-dirty check: different hashcodes mean that two objects are definitely not equal; identical hashcodes do not mean anything. The documentation of GetHashCode specifies this behavior by saying
"If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values."
Your emp1 and emp2 generate different hashcodes, so the dictionary does not need to run Equals; it already knows they are not equal. On the other hand, emp2 and emp3 generate the same hashcode, so the dictionary must call Equals to determine definitively whether they are indeed equal, or whether the identical hashcode was just the result of chance.
emp2 and emp3 have the same Name, and therefore the same hash code. This will cause a "key collision" in the dictionary. It first calls GetHashCode() and finds that the hash codes are the same. It then checks whether the keys really are equal by calling Equals(). The code from Dictionary is:
int num = this.comparer.GetHashCode(key) & 2147483647;
...
if (this.entries[i].hashCode == num && this.comparer.Equals(this.entries[i].key, key))
Obviously, if the hashcodes don't match, it never has to call Equals.
You should get a tool like ILSpy and then you can look at the code and find the answer yourself.
In your example GetHashCode returns the hash code of Name. emp3 has the same name as emp2 ("vikram"), so their hash codes are equal and the dictionary goes on to call Equals.
If you continue this experiment, you'll observe some behavior which is specific to the Dictionary<TKey, TValue> implementation, and some behavior that is required due to the way you implemented GetHashCode.
First, it's important to understand the role of GetHashCode and Equals when comparing objects for equality. Additional information is available on this question, but I'll repeat the basic rules here:
The Equals method establishes exactly which objects are equal and which objects are not. All necessary checks need to be performed in this method for a final determination before returning.
A hash code is a value calculated from the value of your object. Typically it is much smaller than the original object (in our case the hash code is a 4 byte integer) and not necessarily unique. However it is much faster to compute and compare to each other than the original objects themselves.
While hash codes do not need to be unique, different hash codes indicate different objects (i.e. Equals will definitely return false), but equal hash codes do not mean anything (i.e. Equals could return true or false).
Collections which associate values with a key object (e.g. IDictionary<TKey, TValue> in .NET, or Map<K, V> in Java) take advantage of the hash codes to improve implementation efficiency. However, since the documentation for Object.GetHashCode specifically does not require the results to be unique, these collections cannot rely on the hash codes alone for proper functionality. When two objects have the same hash code, only a call to Equals can distinguish them. The case you describe for the insertion of emp3 falls into this case: the IDictionary<TKey, TValue>.Add method needs to throw an ArgumentException if you are trying to insert a key that already exists, and only a call to Equals can determine if the new key emp3 is equal to the previously inserted emp2.
Additional implementation characteristics
The particular collection implementation may result in more calls to GetHashCode than you anticipate. For example, when the internal storage of a hash table is resized, an implementation might call GetHashCode for every object stored in the collection. Collections based on a binary- or B-tree might only call GetHashCode once (if the results are cached in the tree structure), or might need to call GetHashCode for multiple objects during every insertion or lookup operation (if the results are not cached).
Sometimes hash table implementations need to call GetHashCode for multiple objects, or perhaps even Equals for objects with different hash codes due to the way they use modulo arithmetic to place keys into "buckets". The specific characteristics of this vary from one implementation to the next.
That is because GetHashCode is a shortcut.
The dictionary first calls GetHashCode, which is supposed to execute quickly.
If two objects have different hash codes, there is no need to call the presumably more expensive Equals method.
Only if they have the same hash code will it call Equals, because hash codes are not guaranteed to be unique.
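The behavior described in these answers can be observed with a key that counts its own Equals calls. This is a sketch with hypothetical names (CountingKey, CountingDemo); it uses an integer id as the hash code so that the experiment does not depend on how strings are hashed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key that records how often Equals is invoked.
public sealed class CountingKey
{
    public static int EqualsCalls;
    private readonly int _id;

    public CountingKey(int id) { _id = id; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CountingKey;
        return other != null && other._id == _id;
    }

    // Deterministic hash code: simply the id.
    public override int GetHashCode() { return _id; }
}

public static class CountingDemo
{
    // Returns the Equals call count after two adds with distinct hash codes,
    // followed by the count after a third add whose hash code matches an
    // existing key.
    public static int[] Run()
    {
        CountingKey.EqualsCalls = 0;
        var d = new Dictionary<CountingKey, int>();
        d.Add(new CountingKey(1), 1);
        d.Add(new CountingKey(2), 2);
        int afterDistinct = CountingKey.EqualsCalls;
        try
        {
            d.Add(new CountingKey(2), 3);
        }
        catch (ArgumentException)
        {
            // duplicate key: Equals confirmed the match
        }
        return new[] { afterDistinct, CountingKey.EqualsCalls };
    }
}
```

The first count stays at zero because no stored hash code matches; the counter only moves once a key with an already-present hash code is added.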

C# dictionary equality requirement

Do the keys of a Dictionary need to be comparable with equality?
For example
class mytype
{
public bool Equals(mytype other)
{
return ...;
}
}
In my case they won't be equal unless they are the same instance.
If I need to implement equality should I have a large numeric value that increments with every new instance of mytype created?
If your classes are only equal if they are the same instance, then you don't need to do anything to use them in a Dictionary. By default, instances of classes (reference types) are considered equal if and only if they refer to the same object.
From the documentation of GetHashCode
For derived classes of Object, the GetHashCode method can delegate to the Object.GetHashCode implementation, if and only if that derived class defines value equality to be reference equality and the type is not a value type.
Which seems to be true in your case. As a rule of thumb, if you override Equals you need to override GetHashCode as well, but this is not necessary in your case as the default is what you are looking for.
By default, equality is based on the instance. Two separate instances are never equal. You can only change that by providing your own Equals method.
Only if they are being used as a key and you don't want to base equivalence on the instance of the object itself. If you only want references to the exact same instance to be equivalent, you are fine and need do nothing, but if you are using your type as a key, and you want "equivalent" instances to be considered equal, your class must implement Equals() and GetHashCode().
If your custom type is being stored as a value, and not used as a key, this is not necessary, of course. For example, in this case MyType does not need to override Equals() or GetHashCode() because it is only used as a value, and not as the storage key.
Dictionary<string, MyType> x;
However in this case:
Dictionary<MyType, string> x;
Your custom type is the key, and thus it would need to override Equals() and GetHashCode(). The GetHashCode() is used to determine which location it hashes to, and the Equals() is used to resolve collisions on the hash code (among other things).
You'd need to override the same two methods when dealing with many LINQ queries as well. Alternatively, you can provide a standalone IEqualityComparer apart from your class to determine if two instances are equivalent.
See the EqualityComparer<T>.Default property. This is how the dictionary obtains an equality comparer if you don't supply it with one.
This returns an equality comparer based on the type and capabilities of T.
For example, if T implements IEquatable<T>, EqualityComparer<T>.Default will return an equality comparer instance that uses the IEquatable<T> interface. Otherwise it will return an equality comparer instance that uses the Object.Equals method.
The Object.Equals method, by default for reference types, uses reference equality (Object.ReferenceEquals) unless you override it with a custom comparison.
The Object.Equals method, by default for value types, uses reflection to compare the fields of the struct for equality*. Reflection being slow, this is why it's always recommended to override Equals in value types.
* unless it's a blittable value type in which case the raw bits are compared.
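A short sketch of the difference, using two hypothetical types: Tagged implements IEquatable<Tagged>, while Plain overrides nothing and therefore falls back to reference equality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with value equality via IEquatable<T>.
public sealed class Tagged : IEquatable<Tagged>
{
    public string Tag { get; }

    public Tagged(string tag) { Tag = tag ?? ""; }

    public bool Equals(Tagged other)
    {
        return other != null && Tag == other.Tag;
    }

    public override bool Equals(object obj) { return Equals(obj as Tagged); }
    public override int GetHashCode() { return Tag.GetHashCode(); }
}

// Hypothetical type with no overrides: the default comparer ends up
// comparing references.
public sealed class Plain
{
    public string Tag { get; set; }
}

public static class DefaultComparerDemo
{
    // Two distinct instances with the same tag, compared by the default comparer.
    public static bool TaggedEqual()
    {
        return EqualityComparer<Tagged>.Default.Equals(new Tagged("a"), new Tagged("a"));
    }

    public static bool PlainEqual()
    {
        return EqualityComparer<Plain>.Default.Equals(new Plain { Tag = "a" }, new Plain { Tag = "a" });
    }
}
```

TaggedEqual reports equality for two distinct instances, whereas PlainEqual does not, which is the same distinction a Dictionary makes when it is constructed without an explicit comparer.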
No, there are no type constraints on Dictionary<TKey, TValue>

Overriding Equals() but not checking all fields - what will happen?

If I override Equals and GetHashCode, how do I decide which fields to compare? And what will happen if I have two objects with two fields each, but Equals only checks one field?
In other words, let's say I have this class:
class EqualsTestClass
{
public string MyDescription { get; set; }
public int MyId { get; set; }
public override bool Equals(object obj)
{
EqualsTestClass eq = obj as EqualsTestClass;
if(eq == null) {
return false;
} else {
return MyId.Equals(eq.MyId);
}
}
public override int GetHashCode()
{
int hashcode = 23;
return (hashcode * 17) + MyId.GetHashCode();
}
}
I consider two objects Equal if they have the same MyId. So if the Id is equal but the description is different, they are still considered equal.
I just wonder what the pitfalls of this approach are? Of course, a construct like this will behave as expected:
List<EqualsTestClass> test = new List<EqualsTestClass>();
EqualsTestClass eq1 = new EqualsTestClass();
eq1.MyId = 1;
eq1.MyDescription = "Des1";
EqualsTestClass eq2 = new EqualsTestClass();
eq2.MyId = 1;
eq2.MyDescription = "Des2";
test.Add(eq1);
if (!test.Contains(eq2))
{
// Will not be executed, as test.Contains is true
test.Add(eq2);
}
As eq2 is value-equal to eq1, it will not be added. But that is code that I control, but I wonder if there is code in the framework that could cause unexpected problems?
So, should I always add all public Fields in my Equals() Comparison, or what are the guidelines to avoid a nasty surprise because of some bad Framework-Mojo that was completely unexpected?
The reason for overriding Equals() is that you define what it means for two instances to be equal. In some cases that means that all fields must be equal, but it doesn't have to. You decide.
For more information see the documentation and this post.
I don't think you need to worry about the Framework in this instance.
If you, as the class designer, consider two instances of that class to be equal if they share the same MyId, then you only need to test MyId in your overridden Equals() and GetHashCode() methods.
You only need to check for the fields that are required to match, if all that needs to match is the ID then go with that.
A question: If I override Equals and GetHashCode, how do I decide which fields I compare?
It depends on what you are trying to accomplish. If you are trying to see if the objects are exactly the same you should compare all of them. If you have some 'key' and you only want to know if they are the same 'object', even if other data is different then just check the 'key' values.
And what will happen if I have two objects with two fields each, but Equals only checks one field?
Then you will have an equality method that just checks to see if the 'key' is the same, and could potentially have multiple 'equal' objects that have internal variances.
Others have said this is perfectly valid and expected, and it's exactly how Equals is supposed to operate. So there's no problem with it as a class.
I'd be very slightly wary of this as an API. Forgive me if it isn't what was intended: in that case this is just a note of caution to others.
The potential problem is that users of the API will naturally expect equal objects to "be the same". This isn't part of the contract of equality, but it is part of the common-sense meaning of the word. The class looks a bit like a binary tuple, but isn't one, so that should be for sensible reasons.
An example of such a sensible reason is that one field is a "visible implementation detail", like the max load factor on a hashtable-based container. An example of a risky (although tempting) reason is "because I added the description in afterwards and didn't want to change the Equals method in case it broke something".
So it's completely valid to do something a bit counter-intuitive, especially if you clearly document that the behaviour might be surprising. Such Equals methods have to be supported, because banning them would be crazy in cases where a field is obviously irrelevant. But it should be clear why it makes sense to create two ID-description pairs with the same ID and different descriptions, but it doesn't make sense to add them both to a container (like HashSet) which uses Equals/HashCode to prevent duplicate entries.
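The container behavior mentioned above can be made concrete with a short sketch. The class repeats the equality rule from the question (only MyId is compared); PartialEqualityDemo is a hypothetical helper added for illustration.

```csharp
using System;
using System.Collections.Generic;

// Same equality rule as the class in the question: only MyId is compared.
public class EqualsTestClass
{
    public string MyDescription { get; set; }
    public int MyId { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as EqualsTestClass;
        return other != null && MyId == other.MyId;
    }

    public override int GetHashCode() { return MyId.GetHashCode(); }
}

public static class PartialEqualityDemo
{
    // Adds two "equal" objects with different descriptions and returns the
    // description of the object that the set actually kept.
    public static string SurvivingDescription()
    {
        var set = new HashSet<EqualsTestClass>();
        set.Add(new EqualsTestClass { MyId = 1, MyDescription = "Des1" });
        set.Add(new EqualsTestClass { MyId = 1, MyDescription = "Des2" }); // rejected as a duplicate
        foreach (var item in set)
        {
            return item.MyDescription;
        }
        return null;
    }
}
```

The set keeps the first object and silently discards the second, so the "Des2" description is lost. Whether that is acceptable depends on what the class is meant to model.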
Your example is checking to see if a list(of EqualsTestClass) contains an object of the same type with the same property values. Another way to accomplish this task without overriding equals (and the traditional understanding of equals) is to use a custom comparer. It would look something like this (in VB):
Public Class EqualsTestComparer
Implements IEqualityComparer(Of EqualsTestClass)
' Two instances are equal when both MyId and MyDescription match.
Public Function Equals1(ByVal x As EqualsTestClass, ByVal y As EqualsTestClass) As Boolean Implements System.Collections.Generic.IEqualityComparer(Of EqualsTestClass).Equals
Return x.MyId = y.MyId AndAlso x.MyDescription = y.MyDescription
End Function
' Hash the same members that Equals1 compares; the default ToString would return the type name for every instance.
Public Function GetHashCode1(ByVal obj As EqualsTestClass) As Integer Implements System.Collections.Generic.IEqualityComparer(Of EqualsTestClass).GetHashCode
Return obj.MyId.GetHashCode() Xor If(obj.MyDescription, String.Empty).GetHashCode()
End Function
End Class
Then in your routine you simply use the custom comparer:
If Not test.Contains(eq2, New EqualsTestComparer) Then
' Do stuff
End If
