Firstly, I'm using the GetHashCode algorithm described here. Now, picture the following (contrived) example:
class Foo
{
    public Foo(int intValue, double doubleValue)
    {
        this.IntValue = intValue;
        this.DoubleValue = doubleValue;
    }

    public int IntValue { get; private set; }
    public double DoubleValue { get; private set; }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + IntValue.GetHashCode();
            hash = hash * 23 + DoubleValue.GetHashCode();
            return hash;
        }
    }
}
class DerivedFoo : Foo
{
    public DerivedFoo(int intValue, double doubleValue)
        : base(intValue, doubleValue)
    {
    }
}
If I have a Foo and a DerivedFoo with the same values for each of the properties, they're going to have the same hash code, which means I could have a HashSet<Foo>, or use the Distinct method in LINQ, and the two instances would be treated as if they were the same.
I'm probably just misunderstanding the use of GetHashCode, but I would expect the two instances to have different hash codes. Is this an invalid expectation, or should GetHashCode use the type in the calculation? (Or should DerivedFoo also override GetHashCode?)
P.S. I realize there are many, many questions on SO relating to this topic, but I haven't spotted one that directly answers this question.
GetHashCode() is not supposed to guarantee uniqueness (though it helps performance if the codes are as unique as possible).
The main rule with GetHashCode() is that equivalent objects must have the same hash code, but that doesn't mean non-equivalent objects can't have the same hash code.
If two objects have the same hash code, the Equals() method is then invoked to see if they are the same. Since the types are different (depending on how you coded your Equals override, of course), they will not be equal, and thus everything will be fine.
Even if you had a different hash code algorithm for each type, there's still always a chance of a collision, thus the need for the Equals() check as well.
Now, given your example above, you do not implement Equals(); this will make every object distinct regardless of the hash code, because the default implementation of Equals() from object is a reference equality check.
If you haven't, go ahead and override Equals() for each of your types as well (they can inherit your implementation of GetHashCode() if you like, or have new ones); there you can make sure that the type of the compared-to object is the same before declaring the objects equal (a sketch follows the rules below). And make sure Equals() and GetHashCode() are always implemented so that:
Objects that are Equals() must have same GetHashCode() results.
Objects with different GetHashCode() must not be Equals().
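For reference, a minimal sketch of such an Equals override for the Foo class from the question (the GetType() check is just one possible policy for treating Foo and DerivedFoo as unequal):

public override bool Equals(object obj)
{
    // Objects of different runtime types (e.g. Foo vs. DerivedFoo) are never equal under this policy.
    if (obj == null || obj.GetType() != this.GetType())
        return false;

    var other = (Foo)obj;
    return IntValue == other.IntValue && DoubleValue.Equals(other.DoubleValue);
}

With this in place, a HashSet<Foo> keeps a Foo and a DerivedFoo with identical property values as two distinct entries, even though their hash codes collide.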
The two instances do not need to have different hash codes. The results of GetHashCode are not assumed to be unique by HashSet or other framework classes, because there can be collisions even within a type. GetHashCode is simply used to determine the location within the hash table at which to store the item. If there is a collision within the HashSet, it then falls back on the result of the Equals method to determine the unique match. This means that whenever you implement GetHashCode, you should also implement Equals (and check that the types match). Similarly, whenever you implement Equals, you should also implement GetHashCode. See a good explanation by Eric Lippert here.
Related
Is the following code OK?
public override bool Equals(object obj)
{
    if (obj == null || !(obj is LicenseType))
        return false;

    return GetHashCode() == obj.GetHashCode();
}

public override int GetHashCode()
{
    return
        Vendor.GetHashCode() ^
        Version.GetHashCode() ^
        Modifiers.GetHashCode() ^
        Locale.GetHashCode();
}
All the properties are enums/numeric fields, and are the only properties that define the LicenseType objects.
No, the documentation states very clearly:
You should not assume that equal hash codes imply object equality.
Also:
Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality
And:
Caution:
Do not test for equality of hash codes to determine whether two objects are equal. (Unequal objects can have identical hash codes.) To test for equality, call the ReferenceEquals or Equals method.
What happens when two different objects return the same hash code?
It is, after all, just a hash, and so may not be distinct over the full range of values the objects can have.
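A safer pattern (a sketch only, assuming Vendor, Version, Modifiers and Locale are the enum/numeric properties mentioned above and support ==) is to compare the fields themselves in Equals and keep the hash code purely as a lookup hint:

public override bool Equals(object obj)
{
    if (!(obj is LicenseType))
        return false;

    var other = (LicenseType)obj;

    // Compare the actual fields; never rely on hash codes for equality.
    return Vendor == other.Vendor
        && Version == other.Version
        && Modifiers == other.Modifiers
        && Locale == other.Locale;
}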
It is ok (no negative consequences) only if GetHashCode is unique for each possible value. To give an example, the GetHashCode of a short (a 16-bit value) is always unique (let's hope so :-) ), so basing Equals on GetHashCode is ok.
Another example: for int, GetHashCode() is the value of the integer itself, so we have ((int)value).GetHashCode() == ((int)value). Note that this isn't true for, for example, short (but the hash codes of a short are still unique; they simply use a more complex formula).
Note that what Patrick wrote is wrong, because that is true for the "user" of an object/class. You are the "writer" of the object/class, so you define the concept of equality and the concept of hash code. If you define that two objects are always equal, whatever their value is, then it's ok.
public override int GetHashCode() { return 1; }
public override bool Equals(object obj) { return true; }
The only important rules for Equals are:
Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
The Equals method is reflexive, symmetric, and transitive...
Clearly your Equals() and GetHashCode() are ok with these rules, so they are ok.
Just out of curiosity, there is at least one exception for the equality operator (==) (normally you define the equality operator based on the Equals method):
bool v1 = double.NaN.Equals(double.NaN); // true
bool v2 = double.NaN == double.NaN; // false
This is because the NaN value is defined in the IEEE 754 standard as being different from all values, NaN included. For practical reasons, Equals returns true.
It must be noted that it is NOT a rule that if two objects have the same hash code, then they must be equal.
There are only four billion or so possible hash codes, but obviously there are more than four billion possible objects. There are far more than four billion ten-character strings alone. Therefore there must be at least two unequal objects that share the same hash code, by the Pigeonhole Principle.
Suppose you have a Customer object that has a bunch of fields like Name, Address, and so on. If you make two such objects with exactly the same data in two different processes, they do not have to return the same hash code. If you make such an object on Tuesday in one process, shut it down, and run the program again on Wednesday, the hash codes can be different.
This has bitten people in the past. The documentation for System.String.GetHashCode notes specifically that two identical strings can have different hash codes in different versions of the CLR, and in fact they do. Don't store string hashes in databases and expect them to be the same forever, because they won't be.
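If you do need a hash that stays stable across processes and runtime versions (for persisting in a database, say), compute one yourself from the data instead of using GetHashCode. A minimal sketch, assuming SHA-256 is acceptable for your scenario:

using System;
using System.Security.Cryptography;
using System.Text;

static class StableHashing
{
    // SHA-256 of the UTF-8 bytes is deterministic across processes, machines and CLR versions,
    // unlike string.GetHashCode.
    public static string StableHash(string value)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(value));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}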
While examining a project (my company's code), I saw this:
public override int GetHashCode()
{
    unchecked
    {
        int result = 17;
        result = result * 23 + ((connection != null) ? this.connection.GetHashCode() : 0);
        return result;
    }
}
Actually, this is the first time I have seen GetHashCode(). I read a little about it, but I can't understand why they used it on this line, and why for connection.
Is there a special reason? What is the logic in using GetHashCode for connection?
Thanks.
GetHashCode should be overridden if you know a good way to generate fast and well-distributed hash codes for an object.
This is useful when you want to store the object in dictionaries and the like. Better hashing means better efficiency in those data structures.
Otherwise the base GetHashCode() is used, which is suboptimal in a lot of cases. For some details on using the GetHashCode override combined with an Equals override, see:
http://msdn.microsoft.com/en-us/library/ms173147(v=vs.80).aspx
GetHashCode can be used to quickly identify whether two instances appear to be "the same" (different hash codes mean they are definitely not equal; the same hash code could mean they are equal).
Apparently that "connection" field is important in deciding the identity of your object.
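As a sketch only (the class name ConnectionHolder is hypothetical; the real class wasn't shown), that GetHashCode would typically be paired with an Equals override that compares the same connection field:

public override bool Equals(object obj)
{
    var other = obj as ConnectionHolder; // hypothetical name for the class in the snippet
    if (other == null)
        return false;

    // Identity is decided by the same field that feeds the hash code.
    return object.Equals(this.connection, other.connection);
}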
I have overridden the Equals(Object comparee) method, but when I add objects to my HashSet I still get duplicates. What did I miss? The MyType type contains two int fields (let's say that). Is HashSet the wrong collection type, perhaps?
I wish to add some MyType thingies, but so that the collection only stores the unique ones, where unique is defined by me (using the Equals method or any other way).
You should always override GetHashCode() when you override Equals(). I typically return some sort of primary key, if available, for that method. Otherwise, you can check out this thread for ideas for implementing it.
The key to understanding the relationship between those two methods is:
If two entries have different hash codes, they are definitely not equal.
If two entries have the same hash code, they might be equal, so call Equals() to find out for sure.
You need to override GetHashCode() as well; otherwise, your objects will have different hashcodes and will therefore automatically be assumed to be different. Take some unique-ish value from your object and use that if available, or just generate your own.
And don't be lazy and use the same hash code for all of them, either; that will defeat the purpose of a HashSet.
So for your example with two int fields, you might do something like:
public override int GetHashCode() {
    return field1 ^ field2;
}
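For completeness, a minimal sketch of the whole pairing (the field names field1 and field2 are assumed, matching the snippet above), together with the HashSet behaviour it produces:

using System;
using System.Collections.Generic;

class MyType
{
    private readonly int field1;
    private readonly int field2;

    public MyType(int a, int b) { field1 = a; field2 = b; }

    public override bool Equals(object obj)
    {
        var other = obj as MyType;
        return other != null && field1 == other.field1 && field2 == other.field2;
    }

    public override int GetHashCode()
    {
        // Same combiner as above; collisions are acceptable, Equals makes the final call.
        return field1 ^ field2;
    }
}

class Program
{
    static void Main()
    {
        var set = new HashSet<MyType> { new MyType(1, 2) };
        Console.WriteLine(set.Add(new MyType(1, 2))); // False: now treated as a duplicate
        Console.WriteLine(set.Count);                 // 1
    }
}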
Consider the following objects:
class Route
{
    public int Origin { get; set; }
    public int Destination { get; set; }
}
Route implements equality operators.
class Routing
{
    public List<Route> Paths { get; set; }
}
I used the code below to implement the GetHashCode method for the Routing object and it seems to work, but I wonder if that's the right way to do it? I rely on equality checks, and as I'm uncertain I thought I'd ask you guys. Can I just sum the hash codes, or do I need to do more magic in order to guarantee the desired effect?
public override int GetHashCode()
{
    return (Paths != null
        ? Paths.Select(p => p.GetHashCode()).Sum()
        : 0);
}
I checked several GetHashCode() questions here as well as MSDN and Eric Lippert's article on this topic but couldn't find what I'm looking for.
I think your solution is fine. (Much later remark: LINQ's Sum method operates in a checked context, so you can very easily get an OverflowException, which means it is not so fine after all.) But it is more usual to do XOR (addition without carry). So it could be something like:
public override int GetHashCode()
{
    int hc = 0;
    if (Paths != null)
        foreach (var p in Paths)
            hc ^= p.GetHashCode();
    return hc;
}
Addendum (after answer was accepted):
Remember that if you ever use this type Routing in a Dictionary<Routing, Whatever>, a HashSet<Routing> or another situation where a hash table is used, then your instance will be lost if someone alters (mutates) the Routing after it has been added to the collection.
If you're sure that will never happen, use my code above. Dictionary<,> and so on will still work if you make sure no-one alters the Routing that is referenced.
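A minimal sketch of that failure mode, assuming the Route/Routing types from the question and the Paths-based GetHashCode above:

var routing = new Routing { Paths = new List<Route> { new Route { Origin = 1, Destination = 2 } } };
var set = new HashSet<Routing> { routing };

// Mutating the object changes its hash code while it sits in the set...
routing.Paths.Add(new Route { Origin = 3, Destination = 4 });

// ...so the set now looks in the wrong bucket and can no longer find the instance.
Console.WriteLine(set.Contains(routing)); // false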
Another choice is to just write
public override int GetHashCode()
{
    return 0;
}
if you believe the hash code will never be used. If every instance returns 0 for its hash code, you will get very bad performance with hash tables, but your object will not be lost. A third option is to throw a NotSupportedException.
The code from Jeppe Stig Nielsen's answer works, but it could lead to a lot of repeated hash code values. Say you are hashing a list of ints in the range 0-100; then your hash code would be guaranteed to be between 0 and 127. This makes for a lot of collisions when used in a Dictionary. Here is an improved version:
public override int GetHashCode()
{
    int hc = 0;
    if (Paths != null)
    {
        foreach (var p in Paths)
        {
            hc ^= p.GetHashCode();
            hc = (hc << 7) | (hc >> (32 - 7)); // rotate hc to the left to sweep over all bits
        }
    }
    return hc;
}
This code will at least involve all bits over time as more and more items are hashed in.
As a guideline, the hash of an object should stay the same over the object's entire lifetime. I would leave the GetHashCode function alone and not override it. The hash code is only used if you want to put your objects in a hash table.
You should read Eric Lippert's great article about hash codes in .NET: Guidelines and rules for GetHashCode.
Quoted from that article:
Guideline: the integer returned by GetHashCode should never change
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
The GetHashCode function you implemented will not necessarily return the same hash code over the lifetime of the object: it changes whenever Paths changes. If you use this function, you will run into trouble if you add those objects to a hash table and later mutate them: the Contains method will not work.
I don't think it's the right way to do it, because the final hash code should be as distinctive as possible for the specified object. In your case you do a Sum(), which can produce the same result from different combinations of hash codes in the collection (in the end, hash codes are just integers).
If your intention is to determine equality based on the content of the collection, at that point just compare the collections of the two objects. That could be a time-consuming operation, by the way.
It always seems to just "work" without ever having to do anything.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses. (Also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in .NET Reflector.)
I have never overridden GetHashCode, but I was reading around and people say you only need to do so when overriding Equals and providing custom equality checking for your application, so I guess I'm fine?
I'd still like to know how the magic works, though =P
It always seems to just "work" without ever having to do anything.
You didn't tell us if you're using value types or reference types for your keys.
If you're using value types, the default implementation of Equals and GetHashCode are okay (Equals checks if the fields are equals, and GetHashCode is based on the fields (not necessarily all of them!)). If you're using reference types, the default implementation of Equals and GetHashCode use reference equality, which may or may not be okay; it depends on what you're doing.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses.
No. The default is a hash code based on the fields for a value type, and the reference for a reference type.
(also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
It's an implementation detail that you should never ever need to know, and never ever rely on; it could change on you at any moment.
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
Well, is default equality okay for you? If not, override Equals and GetHashCode or implement IEqualityComparer<T> for your T.
I'd still like to know how the magic works, though =P
Every object has Equals and GetHashCode. The default implementations are as follows:
For value types, Equals is value equality.
For reference types, Equals is reference equality.
For value types, GetHashCode is based on the fields (again, not necessarily all of them!).
For reference types, GetHashCode is based on the reference.
If you use an overload of the Dictionary constructor that doesn't take an IEqualityComparer<T> for your T, it will use EqualityComparer<T>.Default. This IEqualityComparer<T> just uses Equals and GetHashCode. So, if you haven't overridden them, you get the implementations as defined above. If you override Equals and GetHashCode, then that is what EqualityComparer<T>.Default will use.
Otherwise, pass a custom implementation of IEqualityComparer<T> to the constructor for Dictionary.
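A minimal sketch of that last option (the Person type and the case-insensitive policy are illustrative assumptions, not from the question):

using System;
using System.Collections.Generic;

class Person
{
    public string Name { get; set; }
}

// The equality policy lives outside the type: here, names compare case-insensitively.
class PersonNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(Person obj)
    {
        return obj.Name == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
}

class Program
{
    static void Main()
    {
        var ages = new Dictionary<Person, int>(new PersonNameComparer());
        ages[new Person { Name = "Joe" }] = 30;
        Console.WriteLine(ages[new Person { Name = "JOE" }]); // 30: the keys compare equal
    }
}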
Are you using your custom classes as keys or values? If you are using them only for values, then their GetHashCode doesn't matter.
If you are using them as keys, then the quality of the hash affects performance. The Dictionary stores a list of elements for each hash bucket, since hash codes don't need to be unique. In the worst-case scenario, if all of your keys end up having the same hash code, then the lookup time for the dictionary will be like a list's, O(n), instead of like a hash table's, O(1).
The documentation for Object.GetHashCode is quite clear:
The default implementation of the GetHashCode method does not guarantee unique return values for different objects... Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
Object's implementations of Equals() and GetHashCode() (which you're inheriting) compare by reference.
Object.GetHashCode is implemented in native code; you can see it in the SSCLI (Rotor).
Two different instances of a class will (usually) have different hashcodes, even if their properties are equal.
You only need to override them if you want to compare by value – if you want different instances with the same properties to compare equal.
It really depends on your definition of Equality.
class Person
{
    public string Name { get; set; }
}

void Test()
{
    var joe1 = new Person { Name = "Joe" };
    var joe2 = new Person { Name = "Joe" };
    Assert.AreNotEqual(joe1, joe2);
}
If you have a different definition for equality, you should override Equals & GetHashCode to get the appropriate behavior.
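For instance, a minimal sketch of Person with value equality based on Name (just one possible policy), under which the assert above would flip to AreEqual:

class Person
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Person;
        return other != null && Name == other.Name;
    }

    public override int GetHashCode()
    {
        // Must agree with Equals: equal names produce equal hash codes.
        // Note: mutating Name while the object is a key in a hash table causes the problems described earlier.
        return Name != null ? Name.GetHashCode() : 0;
    }
}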
Hash codes are for optimizing lookup performance in hash tables (dictionaries). While hash codes have a goal of colliding as little as possible between instances of objects, they are not guaranteed to be unique. The goal should be an even distribution over the int range, given a set of typical values of those objects.
The way hash tables work is that each object implements a function to compute a hash code, hopefully as well distributed over the int range as possible. Two different objects can produce the same hash code, but an instance of an object, given its data, should always produce the same hash code. Therefore hash codes are not unique and should not be used for equality. The hash table allocates an array of size n (much smaller than the int range), and when an object is added to the hash table, its GetHashCode is called and the result is mod'd (%) against the size of the allocated array. For collisions in the table, typically a list of objects is chained. Since computing hash codes should be very fast, a lookup is fast: jump to the array offset and walk the chain. The larger the array (more memory), the fewer collisions and the faster the lookup.
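A tiny sketch of that bucket selection (the numbers are illustrative only):

// Hypothetical illustration of how a hash table picks a bucket.
int n = 16;                            // size of the internal bucket array
object key = "some key";
int hash = key.GetHashCode();
int bucket = (hash & 0x7FFFFFFF) % n;  // mask off the sign bit, then mod by the array size
// Items whose hash codes land in the same bucket are chained and resolved with Equals().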
Object's GetHashCode cannot possibly produce a good hash code, because by definition it knows nothing about the concrete object that inherits from it. That's why, if you have custom objects that need to be placed in dictionaries and you want to optimize the lookups (create an even distribution with minimal collisions), you should override GetHashCode.
If you need to compare two items, then override Equals. If you need the object to be sortable (which is needed for sorted lists), then implement IComparable.
Hope that helps explain the difference.