Unexpected GetHashCode behavior in Vector3D

Unexpected GetHashCode behavior in Vector3D - c#

I have found some weird behavior with the Vector3D class.
Given two Vector3Ds with rearranged X/Y/Z values (e.g. [0,0,1], [0,1,0] or [3,1,4],[1,3,4]), calling GetHashCode results in the same value. If you check to see if the vectors are equal via .Equals or ==, as expected you see that the two vectors are in fact not equal.
If they are not equal, why would they have the same hash code?
var a = new System.Windows.Media.Media3D.Vector3D(0, 0, 1);
var b = new System.Windows.Media.Media3D.Vector3D(0, 1, 0);
var equal = a == b; // false
var equals = a.Equals(b); // false
var aHashCode = a.GetHashCode(); // 1072693248
var bHashCode = b.GetHashCode(); // 1072693248
var hashEqual = a.GetHashCode() == b.GetHashCode(); // true
PS: I have also observed this behavior in the Point3D class as well. Perhaps other classes in System.Windows.Media namespace are also affected.

Two different objects can have the same hash code. Two equal objects alwasy have the same hash code.
As it is stated in MSDN
Two objects that are equal return hash codes that are equal. However,
the reverse is not true: equal hash codes do not imply object
equality, because different (unequal) objects can have identical hash
codes. Furthermore, the .NET Framework does not guarantee the default
implementation of the GetHashCode method, and the value this method
returns may differ between .NET Framework versions and platforms, such
as 32-bit and 64-bit platforms. For these reasons, do not use the
default implementation of this method as a unique object identifier
for hashing purposes.

Related

Is there any negative consequence in having Equals based on GetHashCode?

Is the following code OK?
public override bool Equals(object obj)
{
if (obj == null || !(obj is LicenseType))
return false;
return GetHashCode() == obj.GetHashCode();
}
public override int GetHashCode()
{
return
Vendor.GetHashCode() ^
Version.GetHashCode() ^
Modifiers.GetHashCode() ^
Locale.GetHashCode();
}
All the properties are enums/numeric fields, and are the only properties that define the LicenseType objects.

No, the documentation states very clearly:
You should not assume that equal hash codes imply object equality.
Also:
Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality
And:
Caution:
Do not test for equality of hash codes to determine whether two objects are equal. (Unequal objects can have identical hash codes.) To test for equality, call the ReferenceEquals or Equals method.

What happens when two different objects are returning the same HashCodes?
It is, after all, just a hash, and so may not be distinct over the full range of values the objects can have.

It is ok (no negative consequences) only if GetHashCode is unique for each possible value. To give an example, the GetHashCode of a short (a 16 bit value) is always unique (let's hope it so :-) ), so basing the Equals to the GetHashCode is ok.
Another example, for int, the GetHashCode() is the value of the integer, so we have that ((int)value).GetHashCode() == ((int)value). Note that this isn't true for example for short (but still the hash codes of a short are unique, simply they use a more complex formula)
Note that what what Patrick wrote is wrong, because that is true for the "user" of an object/class. You are the "writer" of the object/class, so you define the concept of equality and the concept of hash code. If you define that two objects are always equal, whatever their value is, then it's ok.
public override int GetHashCode() { return 1; }
public override bool Equals(object obj) { return true; }
The only important rules for Equals are:
Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
The Equals method is reflexive, symmetric, and transitive...
Clearly your Equals() and GetHashCode() are ok with this rules, so they are ok.
Just out of curiosity, there is at least an exception for the equality operator (==) (normally you define the equality operator based on the Equals method)
bool v1 = double.NaN.Equals(double.NaN); // true
bool v2 = double.NaN == double.NaN; // false
This because the NaN value is defined in the IEEE 754 standard as being different from all the values, NaN include. For practical reasons, the Equals returns true.

It must be Noted that it is NOT a rule that if two objects have the same hash code, then they must be equal.
There are only four billion or so possible hash codes, but obviously there are more than four billion possible objects. There are far more than four billion ten-character strings alone. Therefore there must be at least two unequal objects that share the same hash code, by the Pigeonhole Principle.
Suppose you have a Customer object that has a bunch of fields like Name, Address, and so on. If you make two such objects with exactly the same data in two different processes, they do not have to return the same hash code. If you make such an object on Tuesday in one process, shut it down, and run the program again on Wednesday, the hash codes can be different.
This has bitten people in the past. The documentation for System.String.GetHashCode notes specifically that two identical strings can have different hash codes in different versions of the CLR, and in fact they do. Don't store string hashes in databases and expect them to be the same forever, because they won't be.

implement GetHashCode() for objects that contain collections

Consider the following objects:
class Route
{
public int Origin { get; set; }
public int Destination { get; set; }
}
Route implements equality operators.
class Routing
{
public List<Route> Paths { get; set; }
}
I used the code below to implement GetHashCode method for the Routing object and it seems to work but I wonder if that's the right way to do it? I rely on equality checks and as I'm uncertain I thought I'll ask you guys. Can I just sum the hash codes or do I need to do more magic in order to guarantee the desired effect?
public override int GetHashCode() =>
{
return (Paths != null
? (Paths.Select(p => p.GetHashCode())
.Sum())
: 0);
}
I checked several GetHashCode() questions here as well as MSDN and Eric Lippert's article on this topic but couldn't find what I'm looking for.

I think your solution is fine. (Much later remark: LINQ's Sum method will act in checked context, so you can very easily get an OverflowException which means it is not so fine, after all.) But it is more usual to do XOR (addition without carry). So it could be something like
public override int GetHashCode()
{
int hc = 0;
if (Paths != null)
foreach (var p in Paths)
hc ^= p.GetHashCode();
return hc;
}
Addendum (after answer was accepted):
Remember that if you ever use this type Routing in a Dictionary<Routing, Whatever>, a HashSet<Routing> or another situation where a hash table is used, then your instance will be lost if someone alters (mutates) the Routing after it has been added to the collection.
If you're sure that will never happen, use my code above. Dictionary<,> and so on will still work if you make sure no-one alters the Routing that is referenced.
Another choice is to just write
public override int GetHashCode()
{
return 0;
}
if you believe the hash code will never be used. If every instace returns 0 for hash code, you will get very bad performance with hash tables, but your object will not be lost. A third option is to throw a NotSupportedException.

The code from Jeppe Stig Nielsen's answer works but it could lead to a lot of repeating hash code values. Let's say you are hashing a list of ints in the range of 0-100, then your hash code would be guarnteed to be between 0 and 255. This makes for a lot of collisions when used in a Dictionary. Here is an improved version:
public override int GetHashCode()
{
int hc = 0;
if (Paths != null)
foreach (var p in Paths) {
hc ^= p.GetHashCode();
hc = (hc << 7) | (hc >> (32 - 7)); //rotale hc to the left to swipe over all bits
}
return hc;
}
This code will at least involve all bits over time as more and more items are hashed in.

As a guideline, the hash of an object must be the same over the object's entire lifetime. I would leave the GetHashCode function alone, and not overwrite it. The hash code is only used if you want to put your objects in a hash table.
You should read Eric Lippert's great article about hash codes in .NET: Guidelines and rules for GetHashCode.
Quoted from that article:
Guideline: the integer returned by GetHashCode should never change
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
The GetHashCode function you implemented will not return the same hash code over the lifetime of the object. If you use this function, you will run into trouble if you add those objects to a hash table: the Contains method will not work.

I don't think it's a right way to do, cause to dtermine the final hashcode it has to be unique for specifyed object. In your case you do a Sum(), which can produce the same result with different hashcodes in collection (at the end hashcodes are just integers).
If your intention is to determine equality based on the content of the collection, at this point just compare these cillections between two objects. It could be time consuming operation, by the way.

Should GetHashCode Depend on the Type?

Firstly, I'm using the GetHashCode algorithm described, here. Now, picture the following (contrived) example:
class Foo
{
public Foo(int intValue, double doubleValue)
{
this.IntValue = intValue;
this.DoubleValue = doubleValue;
}
public int IntValue { get; private set; }
public double DoubleValue { get; private set; }
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + IntValue.GetHashCode();
hash = hash * 23 + DoubleValue.GetHashCode();
return hash;
}
}
}
class DerivedFoo : Foo
{
public DerivedFoo(int intValue, double doubleValue)
: base(intValue, doubleValue)
{
}
}
If I have a Foo and a DerivedFoo with the same values for each of the properties they're going to have the same hash code. Which means I could have HashSet<Foo> or use the Distinct method in Linq and the two instances would be treated as if they were the same.
I'm probably just misunderstanding the use of GetHashCode but I would expect these the two instances to have different hash codes. Is this an invalid expectation or should GetHashCode use the type in the calculation? (Or should DerivedClass also override GetHashCode)?
P.S. I realize there are many, many questions on SO relating to this topic, but I've haven't spotted one that directly answers this question.

GetHashCode() is not supposed to guarantee uniqueness (though it helps for performance if as unique as possible).
The main rule with GetHashCode() is that equivalent objects must have the same hash code, but that doesn't mean non-equivalent objects can't have the same hash code.
If two objects have the same hash code, the Equals() method is then invoked to see if they are the same. Since the types are different (depending on how you coded your Equals overload of course) they will not be equal and thus it will be fine.
Even if you had a different hash code algorithm for each type, there's still always a chance of a collision, thus the need for the Equals() check as well.
Now given your example above, you do not implement Equals() this will make every object distinct regardless of the hash code because the default implementation of Equals() from object is a reference equality check.
If you haven't, go ahead and override Equals() for each of your types as well (they can inherit your implementation of GetHashCode() if you like, or have new ones) and there you can make sure that the type of the compare-to object are the same before declaring them equal. And make sure Equals() and GetHashCode() are always implemented so that:
Objects that are Equals() must have same GetHashCode() results.
Objects with different GetHashCode() must not be Equals().

The two instances do not need to have different hash codes. The results of GetHashCode are not assumed by the HashSet or other framework classes, because there can be collisions even within a type. GetHashCode is simply used to determine the location within the hash table to store the item. If there is a collision within the HashSet, it then falls back on the result of the Equals method to determine the unique match. This means that whever you implement GetHashCode, you should also implement Equals (and check that the types match). Similarly, whenever you implement Equals, you should also implement GetHashCode. See a good explanation by Eric Lippert here.

Using GetHashCode to test equality in Equals override

Is it ok to call GetHashCode as a method to test equality from inside the Equals override?
For example, is this code acceptable?
public class Class1
{
public string A
{
get;
set;
}
public string B
{
get;
set;
}
public override bool Equals(object obj)
{
Class1 other = obj as Class1;
return other != null && other.GetHashCode() == this.GetHashCode();
}
public override int GetHashCode()
{
int result = 0;
result = (result ^ 397) ^ (A == null ? 0 : A.GetHashCode());
result = (result ^ 397) ^ (B == null ? 0 : B.GetHashCode());
return result;
}
}

The others are right; your equality operation is broken. To illustrate:
public static void Main()
{
var c1 = new Class1() { A = "apahaa", B = null };
var c2 = new Class1() { A = "abacaz", B = null };
Console.WriteLine(c1.Equals(c2));
}
I imagine you want the output of that program to be "false" but with your definition of equality it is "true" on some implementations of the CLR.
Remember, there are only about four billion possible hash codes. There are way more than four billion possible six letter strings, and therefore at least two of them have the same hash code. I've shown you two; there are infinitely many more.
In general you can expect that if there are n possible hash codes then the odds of getting a collision rise dramatically once you have about the square root of n elements in play. This is the so-called "birthday paradox". For my article on why you shouldn't rely upon hash codes for equality, click here.

No, it is not ok, because it's not
equality <=> hashcode equality.
It's just
equality => hashcode equality.
or in the other direction:
hashcode inequality => inequality.
Quoting http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

I would say, unless you want for Equals to basically mean "has the same hash code as" for your type, then no, because two strings may be different but share the same hash code. The probability may be small, but it isn't zero.

No this is not an acceptable way to test for equality. It is very possible for 2 non-equal values to have the same hash code. This would cause your implementation of Equals to return true when it should return false

You can call GetHashCode to determine if the items are not equal, but if two objects return the same hash code, that doesn't mean they are equal. Two items can have the same hash code but not be equal.
If it's expensive to compare two items, then you can compare the hash codes. If they are unequal, then you can bail. Otherwise (the hash codes are equal), you have to do the full comparison.
For example:
public override bool Equals(object obj)
{
Class1 other = obj as Class1;
if (other == null || other.GetHashCode() != this.GetHashCode())
return false;
// the hash codes are the same so you have to do a full object compare.
}

You cannot say that just because the hash codes are equal then the objects must be equal.
The only time you would call GetHashCode inside of Equals was if it was much cheaper to compute a hash value for an object (say, because you cache it) than to check for equality. In that case you could say if (this.GetHashCode() != other.GetHashCode()) return false; so that you could quickly verify that the objects were not equal.
So when would you ever do this? I wrote some code that takes screenshots at periodic intervals and tries to find how long it's been since the screen changed. Since my screenshots are 8MB and have relatively few pixels that change within the screenshot interval it's fairly expensive to search a list of them to find which ones are the same. A hash value is small and only has to be computed once per screenshot, making it easy to eliminate known non-equal ones. In fact, in my application I decided that having identical hashes was close enough to being equal that I didn't even bother to implement the Equals overload, causing the C# compiler to warn me that I was overloading GetHashCode without overloading Equals.

There is one case where using hashcodes as a short-cut on equality comparisons makes sense.
Consider the case where you are building a hashtable or hashset. In fact, let's just consider hashsets (hashtables extend that by also holding a value, but that isn't relevant).
There are various different approaches one can take, but in all of them you have a small number of slots the hashed values can be placed in, and we take either the open or closed approach (which just for fun, some people use the opposite jargon for to others); if we collide on the same slot for two different objects we can either store them in the same slot (but having a linked list or such for where the objects are actually stored) or by re-probing to pick a different slot (there are various strategies for this).
Now, with either approach, we're moving away from the O(1) complexity we want with a hashtable, and towards an O(n) complexity. The risk of this is inversely proportional to the number of slots available, so after a certain size we resize the hashtable (even if everything was ideal, we'd eventually have to do this if the number of items stored were greater than the number of slots).
Re-inserting the items on a resize will obviously depend on the hash codes. Because of this, while it rarely makes sense to memoise GetHashCode() in an object (it just isn't called often enough on most objects), it certainly does make sense to memoise it within the hash table itself (or perhaps, to memoise a produced result, such as if you re-hashed with a Wang/Jenkins hash to reduce the damage caused by bad GetHashCode() implementations).
Now, when we come to insert our logic is going to be something like:
Get hash code for object.
Get slot for object.
If slot is empty, place object in it and return.
If slot contains equal object, we're done for a hashset and have the position to replace the value for a hashtable. Do this, and return.
Try next slot according to collision strategy, and return to item 3 (perhaps resizing if we loop this too often).
So, in this case we have to get the hash code before we compare for equality. We also have the hash code for existing objects already pre-computed to allow for resize. The combination of these two facts means that it makes sense to implement our comparison for item 4 as:
private bool IsMatch(KeyType newItem, KeyType storedItem, int newHash, int oldHash)
{
return ReferenceEquals(newItem, storedItem) // fast, false negatives, no false positives (only applicable to reference types)
||
(
newHash == oldHash // fast, false positives, no fast negatives
&&
_cmp.Equals(newItem, storedItem) // slow for some types, but always correct result.
);
}
Obviously, the advantage of this depends on the complexity of _cmp.Equals. If our key type was int then this would be a total waste. If our key type where string and we were using case-insensitive Unicode-normalised equality comparisons (so it can't even shortcut on length) then the saving could well be worth it.
Generally memoising hash codes doesn't make sense because they aren't used often enough to be a performance win, but storing them in the hashset or hashtable itself can make sense.

It's wrong implementation, as others have stated why.
You should short circuit the equality check using GetHashCode like:
if (other.GetHashCode() != this.GetHashCode()
return false;
in the Equals method only if you're certain the ensuing Equals implementation is much more expensive than GetHashCode which is not vast majority of cases.
In this one implementation you have shown (which is 99% of the cases) its not only broken, its also much slower. And the reason? Computing the hash of your properties would almost certainly be slower than comparing them, so you're not even gaining in performance terms. The advantage of implementing a proper GetHashCode is when your class can be the key type for hash tables where hash is computed only once (and that value is used for comparison). In your case GetHashCode will be called multiple times if it's in a collection. Even though GetHashCode itself should be fast, it's not mostly faster than equivalent Equals.
To benchmark, run your Equals (a proper implementation, taking out the current hash based implementation) and GetHashCode here
var watch = Stopwatch.StartNew();
for (int i = 0; i < 100000; i++)
{
action(); //Equals and GetHashCode called here to test for performance.
}
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalMilliseconds);

How to write a hashcode generator for this class?

I have a class which holds a position in three floats. I have overridden Equals like so:
return Math.Abs(this.X - that.X) < TOLERANCE
&& Math.Abs(this.Y - that.Y) < TOLERANCE
&& Math.Abs(this.Z - that.Z) < TOLERANCE;
This is all very well, but now I need to write a GetHashCode implementation for these vertices, and I'm stuck. simply taking the hashcode of the three values and xoring them together isn't good enough, because two objects with slightly different positions may be considered the same.
So, how can I build a GetHashCode implementation for this class which will always return the same value for instances which would be considered equal by the above method?

There's only one way to satisfy the requirements of GetHashCode with an Equals like this.
Say you have these objects (the arrows indicate the limits of the tolerance, and I'm simplifying this to 1-D):
a c
<----|----> <----|---->
<----|---->
b
By your implementation of Equals, we have:
a.Equals(b) == true
b.Equals(c) == true
a.Equals(c) == false
(This is the loss of transitivity mentioned...)
However, the requirements of GetHashCode are that Equals being true implies that the hash codes are the same. Thus, we have:
hash(a) = hash(b)
hash(b) = hash(c)
∴ hash(a) = hash(c)
By extension, we can cover any part of the 1-D space with this (imagine d, e, f, ...), and all the hashes will have to be the same!
int GetHashCode()
{
return some_constant_integer;
}
I would say don't bother with .NET's GetHashCode. It doesn't make sense for your application. ;)
If you needed some form of hash for quick lookup for your data type, you should start looking at some kind of spatial index.

I recommend that you rethink your implementation of Equals. It violates the transitive property, and that's going to give you headaches down the road. See How to: Define Value Equality for a Type, and specifically this line:
if (x.Equals(y) && y.Equals(z))
returns true, then x.Equals(z) returns
true. This is called the transitive
property.

This "Equals" implementation doesn't satisfy the transitive property of being equal (that if X equals Y, and Y equals Z, then X equals Z).
Given that you've already got a non-conforming implementation of Equals, I wouldn't worry too much about your hashing code.

Is this possible? In your equality implementation, there's effectively a sliding window within which equality is considered true, however if you have to "bucketize" (or quantize) for a hash, then it's likely that two items that are "equal" might lie on either side of the hash "boundary".

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.