I have a wrapper class that will only have a single field and I want to override the GetHashCode method for the wrapper class.
I have written hash code algorithms in situations where there are multiple fields (e.g. using xor, shift-and-wrap, etc.), but this is the first time I've ever had only a single field to work with.
1) Should a wrapper and its field return the same hash code? My initial thought is no; it just doesn't feel right that we would knowingly design a class that generates the same hash as another class.
2) If not, would taking the field's hashcode and adding 1 to it suffice or is there a better way to calculate the new hashcode? The field class itself has a good algorithm for generating its hash codes.
As you're likely aware, you should override GetHashCode() and Equals() when you want to customize the default equality behavior of a type. That behavior is used in various contexts, including hash-table-based data structures like HashSet<T> and Dictionary<TKey, TValue>.
There are some general rules to follow when doing this (the most important being that if two instances of your type are equal, they must have the same hash code value), but none of those rules involve worrying about whether a hash value may coincide with the hash value of an instance of some other type. Indeed, it's not even necessarily a problem if a hash value coincides with that of another instance of the same type. Since GetHashCode() returns a 32-bit integer, only types that have no more than 2^32 possible values could even guarantee unique hash values for instances of the same type (e.g. int, short, bool, etc.).
For example, for any given long value (64-bit integer), there are 2^32-1 other long values with the same hash code.
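The pigeonhole argument can be made concrete with a small sketch. It assumes the current .NET implementation of Int64.GetHashCode(), which XORs the low and high 32-bit halves of the value; that is an implementation detail rather than a documented guarantee, and the LongCollisions class is hypothetical.

```csharp
public static class LongCollisions
{
    // ASSUMPTION: Int64.GetHashCode() returns (low 32 bits) XOR (high 32 bits),
    // as current .NET does. This is an implementation detail.
    // Returns a different long that has the same hash code as `value`.
    public static long FindCollision(long value)
    {
        unchecked
        {
            int hash = value.GetHashCode();
            int high = (int)(value >> 32) + 1;  // any different high half works
            int low = hash ^ high;              // chosen so that low ^ high == hash
            return ((long)high << 32) | (uint)low;
        }
    }
}
```

Under that assumption, 1L and 1L << 32 both hash to 1, and FindCollision(1L) returns 1L << 32. Any high half other than the original one yields a distinct colliding value, which is where the 2^32-1 count comes from.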
Which is a long way of saying: simply returning the value of your single field's GetHashCode() method from your own type's GetHashCode() method is a perfectly acceptable and useful way to implement it.
It's true that if, for some reason, you had an instance of the contained object's type and an instance of your own object's type in the same data structure, there would be a collision (i.e. the two different instances would have the same hash code value). But since collisions happen anyway when dealing with hash codes, that's not a problem.
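A minimal sketch of that approach, assuming the wrapped field is a string; the names Wrapper and Value are hypothetical.

```csharp
using System;

// Hypothetical single-field wrapper: equality and hashing are delegated
// entirely to the wrapped value.
public sealed class Wrapper : IEquatable<Wrapper>
{
    public string Value { get; }

    public Wrapper(string value) =>
        Value = value ?? throw new ArgumentNullException(nameof(value));

    public bool Equals(Wrapper other) => other is not null && Value == other.Value;

    public override bool Equals(object obj) => Equals(obj as Wrapper);

    // Equal wrappers have equal Values, so they also get equal hash codes.
    public override int GetHashCode() => Value.GetHashCode();
}
```

A Wrapper and its raw string share a hash code, so they would collide in, say, a HashSet<object>, but Equals still tells them apart because a string is never a Wrapper.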
Related
Every object in C# has four public instance methods: Equals, GetType, ToString and GetHashCode.
What sort of useful thing could someone do with the hash-code?
Quickly find potentially equal objects.
In particular, this method is usually used by types such as Dictionary<TKey, TValue> (for the keys) and HashSet<T>.
You should not assume that objects with equal hash codes are equal, however. See Eric Lippert's blog post for more information, and the Wikipedia hash table page for a more general discussion on the uses of hash codes.
A hash code is a numeric value that is used to identify an object during equality testing. It can also serve as an index for an object in a collection.
The GetHashCode method is suitable for use in hashing algorithms and data structures such as a hash table.
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee that the default implementation of the GetHashCode method, and the value it returns, will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
The GetHashCode method can be overridden by a derived type. Value types must override this method to provide a hash function that is appropriate for that type and to provide a useful distribution in a hash table. For uniqueness, the hash code must be based on the value of an instance field or property instead of a static field or property.
Objects used as a key in a Hashtable object must also override the GetHashCode method because those objects must generate their own hash code. If an object used as a key does not provide a useful implementation of GetHashCode, you can specify a hash code provider when the Hashtable object is constructed. Prior to the .NET Framework version 2.0, the hash code provider was based on the System.Collections.IHashCodeProvider interface. Starting with version 2.0, the hash code provider is based on the System.Collections.IEqualityComparer interface.
- Sourced from MSDN
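As an illustration of specifying the provider at construction time, this sketch passes StringComparer.OrdinalIgnoreCase (which implements the non-generic IEqualityComparer) to the Hashtable(IEqualityComparer) constructor, so string keys that differ only in case hash and compare as equal. The CaseInsensitiveTable class is hypothetical.

```csharp
using System;
using System.Collections;

public static class CaseInsensitiveTable
{
    // The comparer supplies both GetHashCode and Equals for the keys,
    // so the key type itself does not need to override them.
    public static Hashtable Create() => new Hashtable(StringComparer.OrdinalIgnoreCase);
}
```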
The basic idea is that if two objects have a different hash code, they are different. If they have the same hash code, they could be different or equal.
To check if an object is present in a collection, you can first check against the hash codes, which is quick given you are comparing integers, and then do a more accurate test only on the objects with the same hash code.
This is used in the collection classes, for example.
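A sketch of that "compare hash codes first" idea, using a hypothetical HashedList<T> that caches each item's hash code when the item is added. A search then compares cheap integers and calls Equals only when the hash codes match. It is still a linear scan; real hash tables go further and use the hash code to jump straight to a small group of candidates.

```csharp
using System.Collections.Generic;

// Hypothetical list that stores each item's hash code next to the item.
public sealed class HashedList<T>
{
    private readonly List<(int Hash, T Item)> _entries = new List<(int Hash, T Item)>();
    private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default;

    public void Add(T item) => _entries.Add((_comparer.GetHashCode(item), item));

    public bool Contains(T target)
    {
        int hash = _comparer.GetHashCode(target);
        foreach (var (h, item) in _entries)
        {
            if (h != hash)
                continue;                    // different hash codes: cannot be equal
            if (_comparer.Equals(item, target))
                return true;                 // same hash code, confirmed by Equals
        }
        return false;
    }
}
```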
GetHashCode
GetHashCode exists mainly for the benefit of hash-based collection types such as these two:
->Hashtable
->the generic Dictionary<TKey, TValue>
GetHashCode gives you well-distributed keys for good hash table performance.
Equals
The static object.Equals method provides a null-safe equality comparison when types are unknown at compile time.
Its signature is
public static bool Equals(object objA, object objB)
You can't use operators like == or != on a value whose type is unknown at compile time (such as an unconstrained generic type parameter); you have to use Equals.
It's useful when writing generic types.
For Example:
class Test<T>
{
    T value;

    public void SetV(T newValue)
    {
        // We have to use object.Equals here: == and != cannot bind to an
        // unconstrained type parameter at compile time.
        if (object.Equals(newValue, value))
            return; // unchanged, nothing to do

        value = newValue;
    }
}
ToString
It returns the default textual representation of an instance. This method is overridden by all the built-in types.
GetType
GetType is evaluated at run time. It tells us the type's name, assembly, base type, and more.
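A small sketch of GetType and ToString working together; the ObjectMethodsDemo class is hypothetical. GetType reports the run-time type of the actual object even when it is held in an object variable, and string interpolation calls the type's ToString override.

```csharp
public static class ObjectMethodsDemo
{
    // GetType() inspects the actual object at run time; the interpolation
    // of `o` calls its ToString() override.
    public static string Describe(object o)
    {
        System.Type t = o.GetType();
        return $"{t.Name} (base: {t.BaseType?.Name}): {o}";
    }
}
```

For example, Describe(42) returns "Int32 (base: ValueType): 42" and Describe("hi") returns "String (base: Object): hi".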
I am stumped by a seemingly simple problem. I have two objects that I am comparing with a !=.
When I run the application, a != b is true.
When I put a breakpoint and do a Watch, a.GetHashCode() == b.GetHashCode() is true.
These two (reference type) objects are defined in a different assembly, but I cannot find an overload of the != operator (although GetHashCode is overridden). Is there another explanation for this? Could it be possible that GetHashCode returns the same value for two objects, but a non-overloaded != still returns true?
Thanks.
When two objects that are different return the same hash code, it is called a "collision". With only ~4 billion possible integer values, and more than 4 billion possible values of [your class name here], some collisions are inevitable. This is why a hash-based structure (e.g. Dictionary) can't rely entirely on GetHashCode; it also needs a sensible Equals implementation to be effective. The Equals method is what is used to resolve these collisions.
Of course, it's also possible that the creator of the class overrode either GetHashCode or Equals and made a mistake that violated the "contract" for generating hash codes. Here is one list of guidelines to keep in mind when creating your GetHashCode methods. Remember that there is a fairly small set of things that you have to do, and another set of things that can be done to make it work efficiently.
return 0; is actually a perfectly acceptable GetHashCode implementation. It conforms with all of the rules, it just has a 100% chance of causing collisions, so it will be extraordinarily inefficient and you shouldn't ever actually do that.
It is perfectly legal for two objects that are not equal to have the same hashcode, but it is not valid for two objects that are equal to have different hashcodes.
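The asker's situation can be reproduced with a hypothetical Token class that overrides Equals and GetHashCode (using the legal but terrible `return 0;` from above) without overloading == or !=. Because the operators are not overloaded, != compares references, so two distinct objects are != even when their hash codes match, and even when Equals says they are equal.

```csharp
// Hypothetical class that overrides Equals and GetHashCode
// but does not overload == or !=.
public sealed class Token
{
    public string Text { get; }
    public Token(string text) => Text = text;

    public override bool Equals(object obj) => obj is Token other && Text == other.Text;

    // Legal but maximally colliding: every Token gets the same hash code.
    public override int GetHashCode() => 0;
}
```

A Dictionary<Token, int> still works correctly with these keys, because Equals resolves every collision; it is just slow, since every key lands in the same bin.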
The Dictionary-style collection classes use the hash code value (the GetHashCode value returned from the object specified as the key) to put the key/value pair into a hashbin. All key/value pairs whose keys have the same hash code value go into the same hashbin. If the hash code generation is effective, there will be very few (hopefully just one) key/value pairs in each non-empty hashbin in the dictionary.
When you access contents in the dictionary by specifying an object as a key, the pseudo logic for finding the correct value to be returned is:
1) Get the hash code value for the object specified as the key in the request (GetHashCode()).
2) If there is a non-empty hashbin for that hash code, iterate over the key objects of all key/value pairs in that hashbin. For each key/value pair in the hashbin, check whether the key object Equals() the object that was passed in as the key to the request. If so, return the Value object in that key/value pair.
This is what makes dictionary lookups very effective compared to looking for an object in a List style collection (when the hashcode distribution is good).
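The two lookup steps can be sketched with a deliberately tiny chained hash table. TinyHashMap is hypothetical and is not how Dictionary<TKey, TValue> is actually implemented: it maps each hash code to a bin with a modulo (so keys with the same hash code always share a bin), and it has no resizing, no removal and no duplicate-key check.

```csharp
using System.Collections.Generic;

// Hypothetical minimal chained hash table ("hashbins").
public sealed class TinyHashMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _bins;
    private readonly IEqualityComparer<TKey> _comparer = EqualityComparer<TKey>.Default;

    public TinyHashMap(int binCount = 16) =>
        _bins = new List<KeyValuePair<TKey, TValue>>[binCount];

    // Clear the sign bit so a negative hash code can't give a negative index.
    private int BinIndex(TKey key) =>
        (_comparer.GetHashCode(key) & 0x7FFFFFFF) % _bins.Length;

    public void Add(TKey key, TValue value)
    {
        int i = BinIndex(key);
        _bins[i] ??= new List<KeyValuePair<TKey, TValue>>();
        _bins[i].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        // Step 1: hash the requested key to find its bin.
        var bin = _bins[BinIndex(key)];
        if (bin != null)
        {
            // Step 2: only the keys in this bin are compared with Equals.
            foreach (var pair in bin)
            {
                if (_comparer.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default;
        return false;
    }
}
```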
It always seems to just "work" without ever having to do anything.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses. (also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
I'd still like to know how the magic works, though =P
It always seems to just "work" without ever having to do anything.
You didn't tell us if you're using value types or reference types for your keys.
If you're using value types, the default implementations of Equals and GetHashCode are okay (Equals checks whether the fields are equal, and GetHashCode is based on the fields (not necessarily all of them!)). If you're using reference types, the default implementations of Equals and GetHashCode use reference equality, which may or may not be okay; it depends on what you're doing.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses.
No. The default is a hash code based on the fields for a value type, and the reference for a reference type.
(also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
It's an implementation detail that you should never ever need to know, and you should never ever rely on it. It could change on you at any moment.
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
Well, is default equality okay for you? If not, override Equals and GetHashCode, or implement IEqualityComparer<T> for your T.
I'd still like to know how the magic works, though =P
Every object has Equals and GetHashCode. The default implementations are as follows:
For value types, Equals is value equality.
For reference types, Equals is reference equality.
For value types, GetHashCode is based on the fields (again, not necessarily all of them!).
For reference types, GetHashCode is based on the reference.
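A quick way to see these defaults in action, using two hypothetical types that override nothing: the struct inherits field-based Equals and GetHashCode from System.ValueType, while the class keeps reference equality from System.Object.

```csharp
// Hypothetical types with no Equals/GetHashCode overrides.
public struct PointStruct
{
    public int X;
    public int Y;
    public PointStruct(int x, int y) { X = x; Y = y; }
}

public sealed class PointClass
{
    public int X;
    public int Y;
    public PointClass(int x, int y) { X = x; Y = y; }
}
```

Two PointStruct(1, 2) values are Equals and share a hash code; two PointClass(1, 2) instances are not Equals, because each is only equal to itself.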
If you use an overload of the Dictionary constructor that doesn't take an IEqualityComparer<T> for your key type T, it will use EqualityComparer<T>.Default. This IEqualityComparer<T> just uses Equals and GetHashCode. So, if you haven't overridden them, you get the implementations described above. If you do override Equals and GetHashCode, then those overrides are what EqualityComparer<T>.Default will use.
Otherwise, pass a custom implementation of IEqualityComparer<T> to the constructor for Dictionary.
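A sketch of that option, assuming a hypothetical Employee key type that keeps default (reference) equality and a hypothetical EmployeeIdComparer that treats two employees with the same Id as the same key.

```csharp
using System.Collections.Generic;

// Hypothetical key type that does not override Equals or GetHashCode.
public sealed class Employee
{
    public int Id { get; }
    public Employee(int id) => Id = id;
}

// Custom comparer: two Employee keys are the same if their Ids match.
public sealed class EmployeeIdComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.Id == y.Id);

    // Must agree with Equals: equal Ids give equal hash codes.
    public int GetHashCode(Employee obj) => obj.Id.GetHashCode();
}
```

Usage: new Dictionary<Employee, string>(new EmployeeIdComparer()) finds an entry through a different Employee instance with the same Id, whereas a dictionary built without the comparer falls back to reference equality and does not.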
Are you using your custom classes as keys or values? If you are using them only for values, then their GetHashCode doesn't matter.
If you are using them as keys, then the quality of the hash affects performance. The Dictionary stores a list of elements for each hash code, since the hash codes don't need to be unique. In the worst-case scenario, if all of your keys end up having the same hash code, then lookups in the dictionary will behave like a list, O(n), instead of like a hash table, O(1).
The documentation for Object.GetHashCode is quite clear:
The default implementation of the GetHashCode method does not guarantee unique return values for different objects... Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
Object's implementations of Equals() and GetHashCode() (which you're inheriting) compare by reference.
Object.GetHashCode is implemented in native code; you can see it in the SSCLI (Rotor).
Two different instances of a class will (usually) have different hashcodes, even if their properties are equal.
You only need to override them if you want to compare by value, i.e. if you want different instances with the same properties to compare equal.
It really depends on your definition of equality.
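class Person
{
    public string Name { get; set; }
}

void Test()
{
    var joe1 = new Person { Name = "Joe" };
    var joe2 = new Person { Name = "Joe" };
    // Assert comes from a unit-testing framework (e.g. MSTest or NUnit).
    // Default Equals on a class is reference equality, so these differ.
    Assert.AreNotEqual(joe1, joe2);
}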
If you have a different definition for equality, you should override Equals & GetHashCode to get the appropriate behavior.
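For example, if two people with the same name should count as equal, a sketch along these lines would do it. It assumes C# 9 or later for the init-only Name (so the hash code cannot change after the object is used as a key), and this Person variant is hypothetical, not the class above.

```csharp
using System;

// Hypothetical Person variant with value equality based on Name.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; init; }

    public bool Equals(Person other) =>
        other is not null && string.Equals(Name, other.Name, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Person);

    // Must agree with Equals: equal names give equal hash codes (null-safe).
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}
```

With this version, joe1.Equals(joe2) is true and both return the same hash code. joe1 == joe2 is still false, because == was not overloaded and compares references.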
Hash codes are for optimizing lookup performance in hash tables (dictionaries). While hash codes aim to collide as little as possible between instances of objects, they are not guaranteed to be unique. The goal should be an even distribution across the int range for a typical set of instances of those objects.
The way hash tables work is that each object implements a function to compute a hash code, hopefully distributed as evenly as possible across the int range. Two different objects can produce the same hash code, but an instance of an object, given its data, should always produce the same hash code. Therefore, hash codes are not unique and should not be used for equality. The hash table allocates an array of size n (much smaller than the int range), and when an object is added to the hash table, the table calls GetHashCode and reduces the result modulo (%) the size of the array. For collisions in the table, typically a list of objects is chained. Since computing hash codes should be very fast, a lookup is fast: jump to the array offset and walk the chain. The larger the array (more memory), the fewer the collisions and the faster the lookup.
Object's GetHashCode cannot possibly produce a good hash code, because by definition it knows nothing about the concrete object that's inheriting from it. That's why, if you have custom objects that need to be placed in dictionaries and you want to optimize the lookups (ensuring an even distribution with minimal collisions), you should override GetHashCode.
If you need to compare two items, then override Equals. If you need the object to be sortable (which is needed for sorted lists), then implement IComparable.
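The modulo step and the effect of the array size can be sketched as follows; BucketStats is hypothetical. Because C#'s % keeps the sign of the left operand (-7 % 4 is -3), the sketch clears the sign bit before reducing, which is one common way to keep indexes non-negative.

```csharp
using System.Collections.Generic;

public static class BucketStats
{
    // Clear the sign bit so a negative hash code can't produce a negative
    // index, then reduce modulo the array size.
    public static int BucketIndex(int hashCode, int arraySize) =>
        (hashCode & 0x7FFFFFFF) % arraySize;

    // Length of the longest chain when `keys` are hashed into `arraySize` slots.
    public static int LongestChain<T>(IEnumerable<T> keys, int arraySize)
    {
        var counts = new int[arraySize];
        int longest = 0;
        foreach (T key in keys)
        {
            int i = BucketIndex(EqualityComparer<T>.Default.GetHashCode(key), arraySize);
            counts[i]++;
            if (counts[i] > longest) longest = counts[i];
        }
        return longest;
    }
}
```

Hashing the ints 0 to 99 (an int's hash code is its own value) into 10 slots gives chains of length 10; with 100 slots every chain has length 1.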
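A minimal sketch of the sortable case, using a hypothetical Score type. It implements the generic IComparable<T>, which Comparer<T>.Default (and therefore List<T>.Sort(), SortedList<TKey, TValue> and SortedSet<T>) uses for ordering when no comparer is given.

```csharp
using System;

// Hypothetical type made sortable by implementing IComparable<T>.
public sealed class Score : IComparable<Score>
{
    public int Points { get; }
    public Score(int points) => Points = points;

    // Negative: this sorts before other; zero: same position; positive: after.
    // Null sorts first, following the usual CompareTo convention.
    public int CompareTo(Score other) =>
        other is null ? 1 : Points.CompareTo(other.Points);
}
```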
Hope that helps explain the difference.
In C# is there a way of defining the key size when you instantiate a new hash table?
Hashtable myHash = new Hashtable();
I want to use a long value for the key size but I seem to be exceeding the available key size as I am getting negative numbers. I am multiplying together some prime numbers, the largest returned value being 23*23*23*23*23*23*23*23*23 = 1801152661463.
Thanks.
First of all, you should use HashSet<T> if you're using .NET 3.5 or newer, and Dictionary<T,bool> if you're using .NET 2. Generic collections offer better compile-time checks, fewer casts and less boxing.
The int overflow most likely happens before insertion into the Hashtable in your current code. So your observed bug is most likely unrelated to Hashtable; it's a bug in your arithmetic code. You probably need to cast something to long. But unless you post the relevant code, I can't tell you exactly where the overflow happens.
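The asker's code wasn't posted, so the following is only a guess at the pattern: a hypothetical PrimeProduct class that multiplies the same factor repeatedly, first in int and then in long. With 23 as the factor, the int version already turns negative at 23^7 (3,404,825,447 exceeds int.MaxValue), while the long version reaches 23^9 = 1,801,152,661,463 correctly.

```csharp
public static class PrimeProduct
{
    // Bug pattern: the product is computed in int and silently wraps
    // around once it exceeds int.MaxValue (2,147,483,647).
    public static long ProductAsInt(int factor, int count)
    {
        int product = 1;
        for (int i = 0; i < count; i++)
            product = unchecked(product * factor);
        return product; // widened to long only after the damage is done
    }

    // Fix: do the arithmetic in long (64-bit) from the start.
    public static long ProductAsLong(int factor, int count)
    {
        long product = 1;
        for (int i = 0; i < count; i++)
            product *= factor; // factor is converted to long before multiplying
        return product;
    }
}
```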
Both the .NET Hashtable and HashSet<T> classes call object.GetHashCode() to retrieve the hash. Since GetHashCode() returns an int, that is the size of the hash key that is used.
If you'd like to provide your own hash function, you can either override GetHashCode() in the type you'll be inserting, or define a custom IEqualityComparer<T> and pass it to the HashSet constructor. However, the IEqualityComparer<T>.GetHashCode() method also returns an integer key, so I'm not sure this will meet your needs.
If your application needs a hash key larger than an int, you may need to create your own HashSet data structure.
Since the System.Object method GetHashCode() returns an int, I would say that your choice of hash code algorithm is restricted to those returning a 32-bit value (whether you want to call it signed or unsigned doesn't make a difference: the hash value is just an arbitrary 32-bit value).
And Hashtable doesn't care what your keys or values are: as far as it's concerned, they're both just objects. You might want to make sure that you're properly overriding and implementing GetHashCode() and Equals(). You might want to think about your implementation of IComparable as well.
The MSDN documentation on Object.GetHashCode() describes 3 contradicting rules for how the method should work.
1) If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
2) For the best performance, a hash function must generate a random distribution for all input.
3) The hash function must return exactly the same value regardless of any changes that are made to the object.
Rules 1 & 3 are contradictory to me.
Does Object.GetHashCode() return a unique number based on the value of an object, or on the reference to the object? If I override the method I can choose what to use, but I'd like to know what is used internally, if anyone knows.
Rules 1 & 3 are contradictory to me.
To a certain extent, they are. The reason is simple: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. It is important that while objects are stored in a hash table, they retain their hash value.
To achieve this, it is often simplest to make hashable objects immutable, thus evading the whole problem. It is, however, sufficient to make only those fields immutable that determine the hash value.
Consider the following example:
struct Person {
    public readonly string FirstName;
    public readonly string Name;
    public readonly DateTime Birthday;
    public int ShoeSize;
}
People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:
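// Member of struct Person; Equals should compare the same three fields.
public override int GetHashCode() {
    // ShoeSize is deliberately excluded; the null checks cover default(Person).
    return (FirstName?.GetHashCode() ?? 0) ^ (Name?.GetHashCode() ?? 0) ^ Birthday.GetHashCode();
}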
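What goes wrong when hash-relevant state changes can be shown with a hypothetical MutableKey class whose hash code depends on a public, writable field. After the key is mutated, the Dictionary can no longer find the entry, neither through the mutated key nor through a fresh key holding the old value, even though the entry is still stored.

```csharp
// Hypothetical mutable key: its hash code depends on a field that can change,
// which is exactly what the answer above warns against.
public sealed class MutableKey
{
    public int Id;

    public override bool Equals(object obj) => obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```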
Not sure what MSDN documentation you are referring to. Looking at the current documentation on Object.GetHashCode (http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) provides the following "rules":
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
For the best performance, a hash function must generate a random distribution for all input.
If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".
Also from the documentation,
A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [Emphasis added is mine.]
As for the actual implementation, it clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is going to be based on reference equality since there are no real instance fields to use and, therefore, does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, the implementation of String.GetHashCode returns identical hash codes for identical string values, so two String objects return the same hash code if they represent the same string value, and uses all the characters in the string to generate that hash value.
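A small check of the String.GetHashCode behavior described above: two distinct string objects with the same characters compare equal and return the same hash code. The Copy helper is hypothetical; it just forces a second, separate string instance. As the documentation quoted earlier notes, the actual hash value may differ between runs of the application, so only the equality within a run is checked.

```csharp
public static class StringHashDemo
{
    // Returns a new string instance with the same characters as `s`
    // (for non-empty s, a different object that is equal by value).
    public static string Copy(string s) => new string(s.ToCharArray());
}
```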
Rules 1 & 3 aren't really a contradiction.
For a reference type the hash code is derived from a reference to the object - change an object's property and the reference is the same.
For value types the hash code is derived from the value, change a property of a value type and you get a completely new instance of the value type.
A very good explanation of how to handle GetHashCode (beyond Microsoft's rules) is given in the blog of Eric Lippert (co-designer of C#), in the article "Guidelines and rules for GetHashCode". It is not good practice to add hyperlinks here (since they can become invalid), but this one is worth it, and with the information above you will probably still find the article if the hyperlink is lost.
By default, the hash is based on the reference to the object, which means that only the exact same object is guaranteed to return the same hash. But a hash should be based on the value, as in the case of the string class: "a" and "b" would have different hashes, but "a" and "a" would return the same hash.
I can't know for sure how Object.GetHashCode is implemented in the real .NET Framework, but in Rotor it uses the SyncBlock index of the object as the hash code. There are some blog posts about it on the web; however, most of them are from 2005.