Does the CompareTo() method use GetHashCode()? - c#

Does the CompareTo() method use GetHashCode() to derive a comparable number for the object (comparable in the everyday sense, not the interface)? For example, if I do
MyObject.CompareTo(MyOtherObject.GetHashCode())
what will happen if I don't want to override the CompareTo() method?

No, CompareTo does not / should not use GetHashCode to check for equality.
An implementation might (emphasis on might) use it to determine inequality, if the hash code is cached and thus cheaper to look at than comparing all the internal data, but equal hash codes do not necessarily mean equal objects.
If you implement Equals and GetHashCode (you need to implement both or none), then here are the rules you should follow:
If the two objects are equal (Equals returns true), they should produce the same hash code from GetHashCode. You can turn this rule on its head and say that if the two GetHashCode methods return different values, Equals should return false.
Note that the opposite does not hold. If Equals returns false, it is perfectly valid, though usually very unlikely, that GetHashCode returns the same value. Likewise, if GetHashCode returns the same value, it is perfectly valid, though again usually very unlikely, that Equals returns false. This is because of the Pigeonhole Principle (wikipedia link).
Always use the same fields to check for equality and calculate the hash code
Don't use mutable fields (if you can help it). If you do, make it very clear which fields will break the hash code and equality checks. Stuffing mutable objects in a hashset or dictionary and then modifying them will break everything.
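As a minimal sketch of the rules above, here is a hypothetical immutable Point type (the name and fields are invented for illustration). It assumes a runtime that provides System.HashCode (.NET Core 2.1 or later); on older frameworks a hand-rolled combination of the two fields would serve the same purpose.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable type, used only to illustrate the rules.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Equality and hashing read exactly the same fields: X and Y.
    public bool Equals(Point other) => other != null && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public static class Demo
{
    public static void Main()
    {
        var a = new Point(1, 2);
        var b = new Point(1, 2);

        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True

        var set = new HashSet<Point> { a };
        Console.WriteLine(set.Contains(b));                     // True
    }
}
```

Because the properties have no setters, an instance cannot change its hash code after it has been stored in a set or dictionary.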
If it's an object you're creating yourself then here are some rules:
Implement CompareTo and IComparable<T> if you need ordering support. Don't implement CompareTo only to get equality checks.
Implement Equals, GetHashCode, and IEquatable<T> if you need equality checks.
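To illustrate the ordering side, here is a small sketch of IComparable<T> on a hypothetical Temperature type (the type and its Degrees property are assumptions made for this example). Note that CompareTo works directly on the value and never consults GetHashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, used only to illustrate ordering support.
public sealed class Temperature : IComparable<Temperature>
{
    public double Degrees { get; }

    public Temperature(double degrees) { Degrees = degrees; }

    // Ordering comes from the value itself; hash codes play no part.
    // By convention, any instance sorts after null.
    public int CompareTo(Temperature other) =>
        other is null ? 1 : Degrees.CompareTo(other.Degrees);
}

public static class Demo
{
    public static void Main()
    {
        var readings = new List<Temperature>
        {
            new Temperature(30), new Temperature(10), new Temperature(20)
        };

        readings.Sort(); // uses Temperature.CompareTo

        foreach (var r in readings)
            Console.WriteLine(r.Degrees); // 10, then 20, then 30
    }
}
```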
If it's an object you cannot modify, create:
IComparer<T> to support ordering
IEqualityComparer<T> to support equality checks
Most collections or methods that do ordering or equality checks allow you to specify an extra object that determines the rules for the ordering or equality checks, on the assumption that the implementation built into the object is either wrong (possibly in just this single scenario) or missing.
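The external-comparer approach can be sketched with string, a type you cannot modify. Both comparer classes below are hypothetical and their rules (sort by length, compare after trimming) are chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ordering rule: shorter strings first, ties broken ordinally.
public sealed class LengthComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;
        int byLength = x.Length.CompareTo(y.Length);
        return byLength != 0 ? byLength : string.CompareOrdinal(x, y);
    }
}

// Hypothetical equality rule: strings are equal if they match after trimming.
public sealed class TrimmedEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x?.Trim(), y?.Trim(), StringComparison.Ordinal);

    // Hash the same trimmed form that Equals compares.
    public int GetHashCode(string s) => s is null ? 0 : s.Trim().GetHashCode();
}

public static class Demo
{
    public static void Main()
    {
        var words = new List<string> { "pear", "fig", "banana" };
        words.Sort(new LengthComparer());
        Console.WriteLine(string.Join(", ", words)); // fig, pear, banana

        var set = new HashSet<string>(new TrimmedEqualityComparer()) { "fig" };
        Console.WriteLine(set.Contains(" fig "));    // True
    }
}
```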
Links to all the types:
System.IComparable<T> and its external sibling System.IComparer<T>
System.IEquatable<T> and its external sibling System.IEqualityComparer<T>

Related

How IEnumerable.Except() works?

I'm trying to exclude entities from being added to the database if they already exist there. So I decided newBillInstances.Except(dbContext.BillInstances) would be the best approach for that. However, it doesn't work at all (no entities are excluded), though for List<string> it works perfectly.
I read this discussion and the actual description of .Except() in MSDN. It states the class to be used in .Except() should implement IEqualityComparer<T> to use the default comparer.
Actually the MSDN article doesn't fully describe the process of comparing two instances. I still don't get why both Equals() and GetHashCode() have to be overridden.
I have implemented the IEqualityComparer<BillInstance> interface and put breakpoints in both methods, but when calling .Except(IEnumerable) it's not used. Only when I changed to .Except(IEnumerable, new BillInstanceComparer()) did I catch a break in GetHashCode(), but there were no breaks in Equals().
Then I implemented IEqualityComparer<BillInstance> right in the BillInstance class and expected it would be used by .Except(IEnumerable), but the breakpoints weren't hit in either method.
So I've got two questions:
What should be done to use .Except(IEnumerable)?
Why isn't Equals() used at all? Is it used only when the hash codes of two instances are the same?
Because the Equals() is used only if two objects have the same GetHashCode(). If there are no objects that have the same GetHashCode() then there is no chance of using the Equals().
Internally the Except() uses a Set<> (you can see it here), that is an internal class that you should consider to be equivalent to HashSet<>. This class uses the hash of the object to "index" them, then uses the Equals() to check if two objects that have the same hash are the same or different-but-with-the-same-hash.
Link to other relevant answer: https://stackoverflow.com/a/371348/613130
Somewhere in the code a set or a map/dictionary is hidden.
These typically contain a number of buckets, which grows with the number of elements stored in the set. An element is placed into a bucket based on its hash code, and the actual identity comparison within the bucket is done using equals.
So the hash code is used to find the correct bucket (why GetHashCode is needed) whereupon equals is used to compare it to other elements in the buckets.
That's why you need to implement both.
Ok, from the IEnumerable source (thanks to m0sa) I've understood internals of calling Except(IEnumerable):
enumerable1.Except(enumerable2) calls ExceptIterator(enumerable1, enumerable2, null), where null stands in for an instance of IEqualityComparer<T>.
ExceptIterator() creates an instance of internal class Set passing null as comparer.
Since comparer is null the property EqualityComparer<TElement>.Default is used.
Default property creates a comparer for TElement unless it's already created by calling CreateComparer(). Specifically 2 points were interesting for me:
If TElement implements the IEquatable<T> interface, then as far as I understood a generic comparer for IEquatable<T> is created. I believe it then uses IEquatable<T>.Equals() together with the object's own GetHashCode() (IEquatable<T> itself declares only Equals).
For general cases (not type of byte, not implementing IEquatable, not Nullable, not enum) ObjectEqualityComparer instance is returned. ObjectEqualityComparer.GetHashCode() and ObjectEqualityComparer.Equals() generally call corresponding methods of the TElement.
This told me that in my case (each instance of BillInstance is generally immutable) it should be sufficient to override Object.GetHashCode() and Object.Equals().
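A minimal sketch of that conclusion follows. The BillId and DueDate properties are hypothetical (the question does not show the real members of BillInstance), and plain in-memory lists stand in for the database set. System.HashCode is assumed to be available (.NET Core 2.1 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class BillInstance
{
    public int BillId { get; }        // hypothetical member
    public DateTime DueDate { get; }  // hypothetical member

    public BillInstance(int billId, DateTime dueDate)
    {
        BillId = billId;
        DueDate = dueDate;
    }

    // The default comparer used by Except() ends up calling these two overrides.
    public override bool Equals(object obj) =>
        obj is BillInstance other && BillId == other.BillId && DueDate == other.DueDate;

    public override int GetHashCode() => HashCode.Combine(BillId, DueDate);
}

public static class Demo
{
    public static void Main()
    {
        var existing = new List<BillInstance>
        {
            new BillInstance(1, new DateTime(2024, 1, 1))
        };
        var incoming = new List<BillInstance>
        {
            new BillInstance(1, new DateTime(2024, 1, 1)), // already present
            new BillInstance(2, new DateTime(2024, 2, 1))  // new
        };

        var toAdd = incoming.Except(existing).ToList();
        Console.WriteLine(toAdd.Count);      // 1
        Console.WriteLine(toAdd[0].BillId);  // 2
    }
}
```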

Implement GetHashCode on a class that has wildcard Equatability

Suppose I want to be able to compare 2 lists of ints and treat one particular value as a wild card.
e.g.
If -1 is a wild card, then
{1,2,3,4} == {1,2,-1,4} //returns true
And I'm writing a class to wrap all this logic, so it implements IEquatable and has the relevant logic in public override bool Equals()
But I have always thought that you more-or-less had to implement GetHashCode if you were overriding .Equals(). Granted it's not enforced by the compiler, but I've been under the impression that if you don't then you're doing it wrong.
Except I don't see how I can implement .GetHashCode() without either breaking its contract (objects that are Equal have different hashes), or just having the implementation be return 1.
Thoughts?
This implementation of Equals is already invalid, as it is not transitive. You should probably leave Equals with the default implementation, and write a new method like WildcardEquals (as suggested in the other answers here).
In general, whenever you have changed Equals, you must implement GetHashCode if you want to be able to store the objects in a hashtable (e.g. a Dictionary<TKey, TValue>) and have it work correctly. If you know for certain that the objects will never end up in a hashtable, then it is in theory optional (but it would be safer and clearer in that case to override it to throw a "NotSupportedException" or always return 0).
The general contract is to always implement GetHashCode if you override Equals, as you can't always be sure in advance that later users won't put your objects in hashtables.
In this case, I would create a new or extension method, WildcardEquals(other), instead of using the operators.
I wouldn't recommend hiding this kind of complexity.
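One way such a method might look is sketched below. The WildcardExtensions class name and the choice of -1 as the wildcard value are assumptions taken from the question's example, not an established API. The demo also shows why this relation cannot back Equals: it is not transitive.

```csharp
using System;
using System.Collections.Generic;

public static class WildcardExtensions
{
    // Assumed wildcard value, as in the question's example.
    public const int Wildcard = -1;

    // True when both lists have the same length and every position either
    // matches exactly or holds the wildcard on at least one side.
    public static bool WildcardEquals(this IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        if (left is null || right is null) return ReferenceEquals(left, right);
        if (left.Count != right.Count) return false;

        for (int i = 0; i < left.Count; i++)
        {
            if (left[i] == Wildcard || right[i] == Wildcard) continue;
            if (left[i] != right[i]) return false;
        }
        return true;
    }
}

public static class Demo
{
    public static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4 };
        var b = new List<int> { 1, 2, -1, 4 };
        var c = new List<int> { 1, 2, 5, 4 };

        Console.WriteLine(a.WildcardEquals(b)); // True
        Console.WriteLine(b.WildcardEquals(c)); // True
        Console.WriteLine(a.WildcardEquals(c)); // False: the relation is not transitive
    }
}
```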
From a logical point of view, we break the concept of equality. It is not transitive any longer. So in case of wildcards, A==B and B==C does not mean that A==C.
From a technical point of view, returning the same value from GetHashCode() is not something unforgivable.
The only possible idea I see is to exploit at least the length, e.g.:
public override int GetHashCode()
{
    // Lists that can match each other always have the same length.
    return this.Length.GetHashCode();
}
It's recommended, but not mandatory at all. If you don't need that custom implementation of GetHashCode, just don't do it.
GetHashCode is generally only important if you're going to be storing elements of your class in some kind of collection, such as a set. If that's the case here then I don't think you're going to be able to achieve consistent semantics, since as @AlexD points out, equality is no longer transitive.
For example, (using string globs rather than integer lists) if you add the strings "A", "B", and "*" to a set, your set will end up with either one or two elements depending on the order you add them in.
If that's not what you want then I'd recommend putting the wildcard matching into a new method (e.g. EquivalentTo()) rather than overloading equality.
Having GetHashCode() always return a constant is the only 'legal' way of fulfilling the equals/hashcode constraint.
It'll potentially be inefficient if you put it in a hashmap, or similar, but that might be fine (non-equal hashcodes imply non-equality, but equal hashcodes imply nothing).
I think this is the only possible valid option there. Hashcodes essentially exist as keys to look things up by quickly, and since your wildcard must match every item, its key for lookup must equal every item's key, so they must all be the same.
As others have noted though, this isn't what equals is normally for, and breaks assumptions that many other things may use for equals (such as transitivity - EDIT: turns out this is actually contractual requirement, so no-go), so it's definitely worth at least considering comparing these manually, or with an explicitly separate equality comparer.
Since you've changed what "equals" means (adding in wildcards changes things dramatically) then you're already outside the scope of the normal use of Equals and GetHashCode. It's just a recommendation and in your case it seems like it doesn't fit. So don't worry about it.
That said, make sure you're not using your class in places that might use GetHashCode. That can get you in a load of trouble and be hard to debug if you're not watching for it.
It is generally expected that Equals(Object) and IEquatable<T>.Equals(T) should implement equivalence relations, such that if X is observed to be equal to Y, and Y is observed to be equal to Z, and none of the items have been modified, X may be assumed to be equal to Z; additionally, if X is equal to Y and Y does not equal Z, then X may be assumed not to equal Z either. Wild-card and fuzzy comparison methods do not implement equivalence relations, and thus Equals should generally not be implemented with such semantics.
Many collections will kinda-sorta work with objects that implement Equals in a way that doesn't implement an equivalence relation, provided that any two objects that might compare equal to each other always return the same hash code. Doing this will often require many things that would compare unequal to return the same hash code, though depending upon what types of wildcard are supported it may be possible to separate items to some degree.
For example, if the only wildcard which a particular string supports represents "arbitrary string of one or more digits", one could hash the string by converting all sequences of consecutive digits and/or string-of-digit wildcard characters into a single "string of digits" wildcard character. If # represents any digit, then the strings abc123, abc#, abc456, and abc#93#22#7 would all be hashed to the same value as abc#, but abc#b, abc123b, etc. could hash to a different value. Depending upon the distribution of strings, such distinctions may or may not yield better performance than returning a constant value.
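The normalization described above can be sketched as follows. The WildcardHash class and its method names are hypothetical; the regular expression simply collapses every run of digits and/or # characters into a single #.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardHash
{
    // Collapses each run of digits and/or '#' wildcards into a single '#'.
    public static string Normalize(string s) => Regex.Replace(s, "[0-9#]+", "#");

    // Strings that could match each other share a normalized form,
    // and therefore share a hash code.
    public static int Compute(string s) => Normalize(s).GetHashCode();
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(WildcardHash.Normalize("abc123"));       // abc#
        Console.WriteLine(WildcardHash.Normalize("abc#93#22#7"));  // abc#
        Console.WriteLine(WildcardHash.Normalize("abc123b"));      // abc#b

        Console.WriteLine(
            WildcardHash.Compute("abc456") == WildcardHash.Compute("abc#")); // True
    }
}
```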
Note that even if one implements GetHashCode in such a fashion that equal objects yield equal hashes, some collections may still behave oddly if the equality method doesn't implement an equivalence relation. For example, if a collection foo contains items with keys "abc1" and "abc2", attempts to access foo["abc#"] might arbitrarily return the first item or the second. Attempts to delete the key "abc#" may arbitrarily remove one or both items, or may fail after deleting one item (its expected post-condition wouldn't be met, since abc# would be in the collection even after deletion).
Rather than trying to jinx Equals to compare hash-code equality, an alternative approach is to have a dictionary which holds for each possible wildcard string that would match at least one main-collection string a list of the strings it might possibly match. Thus, if there are many strings which would match abc#, they could all have different hash codes; if a user enters "abc#" as a search request, the system would look up "abc#" in the wild-card dictionary and receive a list of all strings matching that pattern, which could then be looked up individually in the main dictionary.

What can you use as keys in a C# dictionary?

I come from a python world where only hashable objects may be used as keys to a dictionary. Is there a similar restriction in C#? Can you use custom types as dictionary keys?
The requirement for a dictionary key is that it is comparable and hashable. That's turtles all the way down in .NET, every type (other than pointer types) derives from System.Object and it is always comparable thanks to its Equals() method. And hashable thanks to its GetHashCode() method. So any .NET type automatically can be used as a key.
If you want to use your own type as the key then you only need to do something special if you want to re-define object identity. In other words, if you need the ability for two distinct objects to be equal. You'd then override the Equals() method, typically comparing fields of the object. And then you must also override GetHashCode(), equal objects must generate the same hash code.
If the type cannot be changed or you want to customize the behavior especially for the Dictionary then you can pass a custom IEqualityComparer<> to the constructor. Keep in mind that the quality of the hash code you generate with your own GetHashCode() determines the dictionary efficiency.
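As a small sketch of passing a comparer to the constructor, the example below uses the built-in StringComparer.OrdinalIgnoreCase; the CaseInsensitiveDemo class and the sample keys are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveDemo
{
    // Builds a dictionary whose keys are compared without regard to case.
    public static Dictionary<string, int> Build()
    {
        var ages = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        ages["Alice"] = 30;
        ages["ALICE"] = 31; // same key under this comparer, so the value is replaced
        return ages;
    }

    public static void Main()
    {
        var ages = Build();
        Console.WriteLine(ages.Count);     // 1
        Console.WriteLine(ages["alice"]);  // 31
    }
}
```

The string type itself is untouched; only this one dictionary sees the case-insensitive rules.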
Yes, the important thing with keys is that they implement (or have a good default implementation of) GetHashCode and Equals. The Dictionary<T, K> implementation can take advantage of generic IEqualityComparer<T>.
All custom types will come with a default implementation of GetHashCode and Equals because these are members of object, however, that default might not always be relevant to your type.
The dictionary first attempts to get the hash code to determine the bucket the value will land in. If there is a hash collision, it falls back onto equality (I think).
Do note that the type of key you use (class, struct, primitive type, etc) can yield different performance characteristics. In our code base we found that the default implementation of GetHashCode in struct wasn't as fast as overriding it ourselves. We also found that nested dictionaries perform better in terms of access time than a single dictionary with a compound key.
Yes you can, just implement interface IEqualityComparer, override GetHashCode and Equals.

Why don't I ever have to override GetHashCode when using Dictionaries on personal classes?

It always seems to just "work" without ever having to do anything.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses. (also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
I'd still like to know how the magic works, though =P
It always seems to just "work" without ever having to do anything.
You didn't tell us if you're using value types or reference types for your keys.
If you're using value types, the default implementations of Equals and GetHashCode are okay (Equals checks if the fields are equal, and GetHashCode is based on the fields (not necessarily all of them!)). If you're using reference types, the default implementations of Equals and GetHashCode use reference equality, which may or may not be okay; it depends on what you're doing.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses.
No. The default is a hash code based on the fields for a value type, and the reference for a reference type.
(also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
It's an implementation detail that you should never ever need to know, and never ever rely on it. It could change on you at any moment.
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
Well, is default equality okay for you? If not, override Equals and GetHashCode or implement IEqualityComparer<T> for your T.
I'd still like to know how the magic works, though =P
Every object has Equals and GetHashCode. The default implementations are as follows:
For value types, Equals is value equality.
For reference types, Equals is reference equality.
For value types, GetHashCode is based on the fields (again, not necessarily all of them!).
For reference types, GetHashCode is based on the reference.
If you use an overload of the Dictionary constructor that doesn't take an IEqualityComparer<T> for your T, it will use EqualityComparer<T>.Default. This IEqualityComparer<T> just uses Equals and GetHashCode. So, if you haven't overridden them, you get the implementations as defined above. If you override Equals and GetHashCode then this is what EqualityComparer<T>.Default will use.
Otherwise, pass a custom implementation of IEqualityComparer<T> to the constructor for Dictionary.
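The defaults listed above can be observed with a short sketch. PointStruct and PointClass are hypothetical types with identical fields; neither overrides Equals or GetHashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types with the same fields and no overrides.
public struct PointStruct { public int X; public int Y; }
public class PointClass { public int X; public int Y; }

public static class Demo
{
    public static void Main()
    {
        // Value type: default Equals compares field values.
        var s1 = new PointStruct { X = 1, Y = 2 };
        var s2 = new PointStruct { X = 1, Y = 2 };
        Console.WriteLine(s1.Equals(s2)); // True

        // Reference type: default Equals compares references.
        var c1 = new PointClass { X = 1, Y = 2 };
        var c2 = new PointClass { X = 1, Y = 2 };
        Console.WriteLine(c1.Equals(c2)); // False

        // The dictionary uses EqualityComparer<PointClass>.Default, so only
        // the very same instance finds the entry.
        var labels = new Dictionary<PointClass, string> { [c1] = "first" };
        Console.WriteLine(labels.ContainsKey(c1)); // True
        Console.WriteLine(labels.ContainsKey(c2)); // False
    }
}
```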
Are you using your custom classes as keys or values? If you are using them only for values, then their GetHashCode doesn't matter.
If you are using them as keys, then the quality of the hash affects performance. The Dictionary stores a list of elements for each hash code, since the hash codes don't need to be unique. In the worst case scenario, if all of your keys end up having the same hash code, then the lookup time for the dictionary will be like that of a list, O(n), instead of like that of a hash table, O(1).
The documentation for Object.GetHashCode is quite clear:
The default implementation of the GetHashCode method does not guarantee unique return values for different objects... Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
Object's implementations of Equals() and GetHashCode() (which you're inheriting) compare by reference.
Object.GetHashCode is implemented in native code; you can see it in the SSCLI (Rotor).
Two different instances of a class will (usually) have different hashcodes, even if their properties are equal.
You only need to override them if you want to compare by value, that is, if you want different instances with the same properties to compare equal.
It really depends on your definition of Equality.
class Person
{
    public string Name { get; set; }
}

void Test()
{
    var joe1 = new Person { Name = "Joe" };
    var joe2 = new Person { Name = "Joe" };
    // Distinct instances: the default Equals compares references.
    Assert.AreNotEqual(joe1, joe2);
}
If you have a different definition for equality, you should override Equals & GetHashCode to get the appropriate behavior.
Hash codes are for optimizing lookup performance in hash tables (dictionaries). While hash codes have a goal of colliding as little as possible between instances of objects they are not guaranteed to be unique. The goal should be equal distribution among the int range given a set of typical types of those objects.
The way hash tables work is that each object implements a function to compute a hash code, hopefully distributed as evenly as possible across the int range. Two different objects can produce the same hash code, but an instance of an object, given its data, should always produce the same hash code. Therefore, hash codes are not unique and should not be used for equality. The hash table allocates an array of size n (much smaller than the int range), and when an object is added to the hash table, it calls GetHashCode and the result is then mod'd (%) against the size of the allocated array. For collisions in the table, typically a list of objects is chained. Since computing hash codes should be very fast, a lookup is fast: jump to the array offset and walk the chain. The larger the array (more memory), the fewer collisions and the faster the lookup.
Object's GetHashCode cannot possibly produce a good hash code because by definition it knows nothing about the concrete object that's inheriting from it. That's why if you have custom objects that need to be placed in dictionaries and you want to optimize the lookups (control creating an even distribution with minimal collisions), you should override GetHashCode.
If you need to compare two items, then override Equals. If you need the object to be sortable (which is needed for sorted lists) then implement IComparable.
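The mechanism described above can be sketched as a toy chained hash set. ToyHashSet is a hypothetical teaching aid, not how the framework's collections are actually written: it has a fixed bucket count, never resizes, and is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Toy chained hash set for illustration only.
public sealed class ToyHashSet<T>
{
    private readonly List<T>[] buckets;

    public ToyHashSet(int bucketCount)
    {
        buckets = new List<T>[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = new List<T>();
    }

    // Hash code -> non-negative number -> array offset via modulo.
    private int IndexOf(T item)
    {
        int hash = item == null ? 0 : item.GetHashCode();
        return (hash & 0x7FFFFFFF) % buckets.Length;
    }

    // Returns false if an equal item is already stored.
    public bool Add(T item)
    {
        List<T> chain = buckets[IndexOf(item)];
        foreach (T existing in chain)
            if (EqualityComparer<T>.Default.Equals(existing, item)) return false;
        chain.Add(item);
        return true;
    }

    // Jumps to the bucket, then walks the chain using Equals.
    public bool Contains(T item)
    {
        foreach (T existing in buckets[IndexOf(item)])
            if (EqualityComparer<T>.Default.Equals(existing, item)) return true;
        return false;
    }
}

public static class Demo
{
    public static void Main()
    {
        var set = new ToyHashSet<string>(8);
        Console.WriteLine(set.Add("apple"));       // True
        Console.WriteLine(set.Add("apple"));       // False
        Console.WriteLine(set.Contains("apple"));  // True
        Console.WriteLine(set.Contains("pear"));   // False
    }
}
```

Equals is only ever called on items that landed in the same bucket, which is why GetHashCode and Equals have to agree.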
Hope that helps explain the difference.

When does Dictionary<TKey, TValue> call TKey.Equals()?

Just overriding Equals in TKey does not help.
public override bool Equals(object obj)
{ /* ... */ }
... Equals() will never be called ...
When you do a dictionary lookup, this is the order things happen:
The dictionary uses TKey.GetHashCode to compute a hash for the bucket.
It then checks all of the hashes using that bucket, and calls Equals on the individual objects, to determine a match.
If the buckets never match (because GetHashCode wasn't overridden), then you'll never call Equals. This is part of why you should always implement both if you implement either, and you should override both functions (more meaningfully than just calling base.GetHashCode()) if you want to use your object in a hashed collection.
If you're implementing a class, you should implement a GetHashCode routine that returns the same hash code for items that are Equal. Ideally, you want to return a different hash code for items that are not equal whenever possible, as this will make your dictionary lookups much faster.
You should also implement Equals in a way that checks for equal instances correctly.
The default implementation for classes (reference types) just compares the reference itself. Two instances with exactly the same values will return false on Equals (since they have different references), by default. Distinct instances will also usually return different hash codes, by default.
Dictionary is a hash table. It only calls Equals(object obj) if two objects produce the same hash values. Provide a good hash function for your objects to avoid calling Equals().
Keep in mind that the hashing step has complexity O(1), while the search step has complexity O(n) in the worst case and about n/2 comparisons on average, where n is the number of items sharing a bucket. You should avoid objects that generate the same hash value; otherwise these objects are searched linearly.
Assuming you've defined a custom reference type as a key, you must either:
always pass the same object instance into a dictionary as a key, or
implement a GetHashCode() that always returns the same value for different instances that should be treated as the same key, and an Equals() method that can compare different instances.
The base.GetHashCode() method builds the hash based on the instance identity of the object and so cannot be used when you pass in different instances of a type as a key.
The reason that returning 0 for your hash always works, is that the Dictionary class first uses the hash code to look for the bucket your key belongs to, and only then uses the Equals() method to distinguish instances. You should not return 0 as a hash code from a custom type if you intend to use it as a dictionary key because that will effectively degenerate the dictionary into a list with O(n) lookup performance instead of O(1).
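The difference between overriding only Equals and overriding both methods can be sketched as follows. EqualsOnlyKey and FullKey are hypothetical types written for this comparison; the compiler warning about the missing GetHashCode override is suppressed on purpose.

```csharp
using System;
using System.Collections.Generic;

#pragma warning disable CS0659 // Equals overridden without GetHashCode, deliberately

// Hypothetical key that overrides Equals only.
public sealed class EqualsOnlyKey
{
    public string Name { get; }
    public EqualsOnlyKey(string name) { Name = name; }
    public override bool Equals(object obj) => obj is EqualsOnlyKey k && k.Name == Name;
}

#pragma warning restore CS0659

// Hypothetical key that overrides both Equals and GetHashCode.
public sealed class FullKey
{
    public string Name { get; }
    public FullKey(string name) { Name = name; }
    public override bool Equals(object obj) => obj is FullKey k && k.Name == Name;
    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();
}

public static class Demo
{
    public static void Main()
    {
        var broken = new Dictionary<EqualsOnlyKey, int> { [new EqualsOnlyKey("Joe")] = 1 };
        // The probe instance gets an unrelated identity-based hash code, so the
        // lookup is expected to miss in practice and Equals is never consulted.
        Console.WriteLine(broken.ContainsKey(new EqualsOnlyKey("Joe")));

        var working = new Dictionary<FullKey, int> { [new FullKey("Joe")] = 1 };
        Console.WriteLine(working.ContainsKey(new FullKey("Joe"))); // True
    }
}
```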
You may also want to consider implementing IComparable and IEquatable.
Look at the following question for more details:
Using an object as a generic Dictionary key
