How does IEnumerable.Except() work?

I'm trying to avoid adding entities to the database if they already exist there, so I decided newBillInstances.Except(dbContext.BillInstances) would be the best approach. However, it doesn't work at all (no entities are excluded), though for List<string> it works perfectly.
I read this discussion and the actual description of .Except() on MSDN. It states that the class used in .Except() should implement IEqualityComparer<T> to use the default comparer.
The MSDN article doesn't fully describe how two instances are compared. I still don't get why both Equals() and GetHashCode() have to be overridden.
I implemented the IEqualityComparer<BillInstance> interface and put breakpoints in both methods, but when calling .Except(IEnumerable) it isn't used. Only when I changed to .Except(IEnumerable, new BillInstanceComparer()) did I hit a breakpoint in GetHashCode(), but none were hit in Equals().
Then I implemented IEqualityComparer<BillInstance> directly in the BillInstance class and expected it to be used by .Except(IEnumerable), but the breakpoints weren't hit in either method.
So I have two questions:
What should be done to use .Except(IEnumerable)?
Why isn't Equals() used at all? Is it used only when the hash codes of two instances are the same?

Because Equals() is used only if two objects return the same GetHashCode(). If no two objects share a hash code, Equals() is never called.
Internally, Except() uses a Set<> (you can see it here), an internal class that you should consider equivalent to HashSet<>. This class uses the hash of each object to "index" it, then uses Equals() to check whether two objects with the same hash are the same or different-but-with-the-same-hash.
Link to other relevant answer: https://stackoverflow.com/a/371348/613130
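To see this behaviour without a debugger, here is a minimal sketch that counts the calls made on a comparer passed to Except(). The BillInstance shape (a single Number property), CountingBillComparer and ExceptDemo are made-up names for illustration, not the asker's real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the Number property is an assumption for illustration.
public class BillInstance
{
    public int Number { get; set; }
}

// Comparer that counts how often each of its methods is called.
public class CountingBillComparer : IEqualityComparer<BillInstance>
{
    public int HashCalls;
    public int EqualsCalls;

    public bool Equals(BillInstance x, BillInstance y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Number == y.Number;
    }

    public int GetHashCode(BillInstance obj)
    {
        HashCalls++;
        return obj.Number; // equal Numbers always give equal hash codes
    }
}

public static class ExceptDemo
{
    // Runs Except() and returns the comparer so its counters can be inspected.
    public static CountingBillComparer Run(int[] newNumbers, int[] existingNumbers, out int remaining)
    {
        var comparer = new CountingBillComparer();
        var incoming = newNumbers.Select(n => new BillInstance { Number = n });
        var existing = existingNumbers.Select(n => new BillInstance { Number = n });
        remaining = incoming.Except(existing, comparer).Count();
        return comparer;
    }
}
```

With new numbers { 1, 2, 3 } and existing numbers { 4, 5, 6 } no two hash codes match, so EqualsCalls stays at 0 while HashCalls is non-zero. With existing numbers { 2, 3, 4 } the matching hash codes make Except() call Equals(), and only one element remains.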

Somewhere in the code a set or a map/dictionary is hidden.
These typically contain a number of buckets, which grows with the number of elements stored in the set. Elements are partitioned into buckets based on their hash code, and the actual identity comparison within a bucket is done using Equals.
So the hash code is used to find the correct bucket (which is why GetHashCode is needed), whereupon Equals is used to compare the element to the other elements in that bucket.
That's why you need to implement both.

OK, from the Enumerable source (thanks to m0sa) I've understood the internals of calling Except(IEnumerable):
enumerable1.Except(enumerable2) calls ExceptIterator(enumerable1, enumerable2, null), where null stands in for an instance of IEqualityComparer<T>.
ExceptIterator() creates an instance of the internal class Set, passing null as the comparer.
Since the comparer is null, the property EqualityComparer<TElement>.Default is used.
The Default property creates a comparer for TElement by calling CreateComparer(), unless one has already been created. Two points were of particular interest to me:
If TElement implements IEquatable<T>, a generic comparer for IEquatable<T> is created. As far as I understand, it then uses IEquatable<T>.Equals() together with the type's own GetHashCode() (IEquatable<T> itself declares only Equals()).
For general cases (not byte, not implementing IEquatable<T>, not Nullable<T>, not an enum) an ObjectEqualityComparer instance is returned. ObjectEqualityComparer.GetHashCode() and ObjectEqualityComparer.Equals() generally call the corresponding methods of TElement.
This made it clear that in my case (each BillInstance is generally immutable) it should be sufficient to override Object.GetHashCode() and Object.Equals().
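A minimal sketch of that conclusion, assuming BillInstance is identified by two made-up properties (Number and Period are illustrative guesses, not the real entity). It also implements IEquatable<BillInstance>, which is optional here but lets the default comparer call the typed Equals() directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the entity; Number and Period are assumed key properties.
public sealed class BillInstance : IEquatable<BillInstance>
{
    public BillInstance(int number, string period)
    {
        Number = number;
        Period = period;
    }

    public int Number { get; }
    public string Period { get; }

    // Typed equality used by EqualityComparer<BillInstance>.Default.
    public bool Equals(BillInstance other)
    {
        return other != null && Number == other.Number && Period == other.Period;
    }

    public override bool Equals(object obj) => Equals(obj as BillInstance);

    // Must be built from the same properties that Equals() compares.
    public override int GetHashCode()
    {
        unchecked
        {
            return (Number * 397) ^ (Period == null ? 0 : Period.GetHashCode());
        }
    }
}

public static class DefaultComparerDemo
{
    // No comparer argument: Except() falls back to EqualityComparer<T>.Default.
    public static List<BillInstance> NewOnly(
        IEnumerable<BillInstance> incoming, IEnumerable<BillInstance> existing)
    {
        return incoming.Except(existing).ToList();
    }
}
```

With these overrides in place, a freshly constructed BillInstance that carries the same Number and Period as an existing one is excluded by the plain .Except(IEnumerable) overload.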

Related

Interface for Equality-By-Value that doesn't change out of box behavior?

Okay, I'm reading up on all the advice on how to override object.Equals and == for value and reference types. In short, always override equality for structs and don't override equality for reference types unless you have some unusual circumstance, like a class that wraps a single string. (But don't make a struct unless it is small, even if in semantic or DDD terms it is a value type.)
But most of my types that hold data are DTOs: classes with lots of properties. They have more properties than is suitable for a struct (more than 16 bytes) and will be consumed by developers who will expect == and object.Equals to behave as usual. All three scenarios come up: needing to check for equality by reference, by value (especially in unit testing) and by key (especially when working with data that came from or is going to a relational database).
Is there a .NET Framework way to implement equality-by-value or equality-by-key without stomping on the default behavior of object.Equals? Or must I create my own ad hoc interface, like ISameByValue<T> or ISameByKey<T>?

Create IEqualityComparer types. This allows you to create any number of different types capable of comparing your object by any number of different definitions of equality, all without changing any behavior on the type itself.
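As a rough sketch of that idea, the two comparers below give one DTO both a by-key and a by-value notion of equality while leaving its own Equals() untouched. CustomerDto and its properties are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical DTO; the property names are assumptions for illustration.
public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

// Equality by key: two DTOs represent the same row if their Ids match.
public sealed class CustomerKeyComparer : IEqualityComparer<CustomerDto>
{
    public bool Equals(CustomerDto x, CustomerDto y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(CustomerDto obj) => obj.Id;
}

// Equality by value: every property has to match.
public sealed class CustomerValueComparer : IEqualityComparer<CustomerDto>
{
    public bool Equals(CustomerDto x, CustomerDto y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && x.Name == y.Name && x.Email == y.Email;
    }

    public int GetHashCode(CustomerDto obj)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + obj.Id;
            hash = hash * 31 + (obj.Name == null ? 0 : obj.Name.GetHashCode());
            hash = hash * 31 + (obj.Email == null ? 0 : obj.Email.GetHashCode());
            return hash;
        }
    }
}
```

Two DTOs with the same Id but different Email values are equal under CustomerKeyComparer, unequal under CustomerValueComparer, and still unequal under the untouched object.Equals, which keeps comparing references.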

I had a problem where I had implemented property-based comparisons in the overridden Equals method (to implement HasChanges-type functionality), but it caused all sorts of problems when I updated property values of items in a collection.
My solution (found by helpful users of this website) was to move the property-based comparisons into a new, custom method and to return the default object.Equals value instead. However, this meant that there were no longer any property-based comparisons when calling the Equals method.
The solution was then to provide custom implementations of the IEqualityComparer<T> interface and to pass the instances to any methods that require object comparisons, such as the IEnumerable Intersect or Except methods, for example:
if (digitalServiceProvider.PriceTiers[index].Territories.Count > 0 &&
    digitalServiceProvider.PriceTiers[index].Territories.Intersect(
        release.TerritorialRights, new CountryEqualityComparer()).Count() == 0) { ... }

What can you use as keys in a C# dictionary?

I come from a python world where only hashable objects may be used as keys to a dictionary. Is there a similar restriction in C#? Can you use custom types as dictionary keys?

The requirement for a dictionary key is that it is comparable and hashable. In .NET that's turtles all the way down: every type (other than pointer types) derives from System.Object, so it is always comparable thanks to its Equals() method and hashable thanks to its GetHashCode() method. So any .NET type can automatically be used as a key.
If you want to use your own type as the key, then you only need to do something special if you want to redefine object identity, in other words, if you need two distinct objects to be able to compare as equal. You'd then override the Equals() method, typically comparing fields of the object. You must then also override GetHashCode(), because equal objects must generate the same hash code.
If the type cannot be changed or you want to customize the behavior especially for the Dictionary then you can pass a custom IEqualityComparer<> to the constructor. Keep in mind that the quality of the hash code you generate with your own GetHashCode() determines the dictionary efficiency.
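A small sketch of the override route, using an invented GridCell key type whose identity is defined by its two fields:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: a grid coordinate compared by its field values.
public sealed class GridCell
{
    public GridCell(int row, int column)
    {
        Row = row;
        Column = column;
    }

    public int Row { get; }
    public int Column { get; }

    public override bool Equals(object obj)
    {
        var other = obj as GridCell;
        return other != null && Row == other.Row && Column == other.Column;
    }

    // Equal cells must produce equal hash codes, so hash the same fields.
    public override int GetHashCode()
    {
        unchecked
        {
            return (Row * 397) ^ Column;
        }
    }
}

public static class KeyDemo
{
    public static string Lookup()
    {
        var labels = new Dictionary<GridCell, string>();
        labels[new GridCell(2, 3)] = "treasure";
        // A different instance with the same field values finds the entry.
        return labels[new GridCell(2, 3)];
    }
}
```

Without the two overrides the second new GridCell(2, 3) would be a different key and the indexer would throw KeyNotFoundException. The key is immutable on purpose: changing a field after insertion would change the hash code and strand the entry.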

Yes, the important thing with keys is that they implement (or have a good default implementation of) GetHashCode and Equals. The Dictionary<TKey, TValue> implementation can also take advantage of a generic IEqualityComparer<TKey>.
All custom types come with a default implementation of GetHashCode and Equals because these are members of object; however, that default might not always be appropriate for your type.
The dictionary first gets the hash code to determine the bucket the key belongs in. It then uses equality to compare the key against the entries in that bucket whose hash codes match.
Do note that the type of key you use (class, struct, primitive type, etc) can yield different performance characteristics. In our code base we found that the default implementation of GetHashCode in struct wasn't as fast as overriding it ourselves. We also found that nested dictionaries perform better in terms of access time than a single dictionary with a compound key.

Yes, you can: either override GetHashCode and Equals on the type itself, or implement the IEqualityComparer<T> interface and pass an instance to the dictionary.
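For the comparer route, a minimal sketch using StringComparer.OrdinalIgnoreCase, a built-in comparer that implements IEqualityComparer<string>. A hand-written comparer would be passed to the constructor in the same way. ComparerKeyDemo is just an illustrative wrapper.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerKeyDemo
{
    public static Dictionary<string, int> Build()
    {
        // The comparer passed here replaces the key type's own Equals/GetHashCode.
        var stock = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        stock["Apple"] = 3;
        stock["APPLE"] = 5; // same key under this comparer, so the value is overwritten
        return stock;
    }
}
```

The resulting dictionary holds a single entry, and looking it up as "apple" returns 5.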

What would I do with a .NET object hashcode?

An object in C# has four methods - {Equals, GetType, ToString, GetHashCode}.
What sort of useful thing could someone do with the hash-code?

What sort of useful thing could someone do with the hashcode?
Quickly find potentially equal objects.
In particular, this method is usually used by types such as Dictionary<TKey, TValue> (for the keys) and HashSet<T>.
You should not assume that objects with equal hash codes are equal, however. See Eric Lippert's blog post for more information, and the Wikipedia hash table page for a more general discussion on the uses of hash codes.

A hash code is a numeric value that is used to identify an object
during equality testing. It can also serve as an index for an object
in a collection.
The GetHashCode method is suitable for use in hashing algorithms and
data structures such as a hash table.
The default implementation of the GetHashCode method does not
guarantee unique return values for different objects. Furthermore, the
.NET Framework does not guarantee the default implementation of the
GetHashCode method, and the value it returns will be the same between
different versions of the .NET Framework. Consequently, the default
implementation of this method must not be used as a unique object
identifier for hashing purposes.
The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.
Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the
System.Collections.IEqualityComparer interface.
- Sourced from MSDN:

The basic idea is that if two objects have a different hash code, they are different. If they have the same hash code, they could be different or equal.
To check if an object is present in a collection, you can first check against the hash codes, which is quick given you are comparing integers, and then do a more accurate test only on the objects with the same hash code.
This is used in the collection classes, for example.

GetHashCode
GetHashCode exists mainly for the benefit of hash-based collections such as
-> Hashtable
-> Dictionary<TKey, TValue>
GetHashCode gives you varied keys for good hash table performance.
Equals
The static object.Equals provides a null-safe equality comparison when types are unknown at compile time.
Its signature is
public static bool Equals(object A, object B).
You can't use operators like == or != if the type is unknown at compile time, so you have to use Equals.
It's useful when writing generic types.
For example:
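class Test<T>
{
    T value;

    public void SetV(T newValue)
    {
        // We have to use the static object.Equals here: == and != cannot be
        // used because they cannot bind to an unknown type at compile time.
        if (!object.Equals(newValue, value))
        {
            value = newValue; // store the value only when it actually changed
        }
    }
}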
ToString
It returns the default textual representation of a type instance. This method is overridden by all built-in types.
GetType
GetType is evaluated at runtime. It lets us find out the type's name, assembly, base type, and so on.

Why don't I ever have to override GetHashCode when using Dictionaries on personal classes?

It always seems to just "work" without ever having to do anything.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses. (also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
I'd still like to know how the magic works, though =P

It always seems to just "work" without ever having to do anything.
You didn't tell us if you're using value types or reference types for your keys.
If you're using value types, the default implementations of Equals and GetHashCode are okay (Equals checks whether the fields are equal, and GetHashCode is based on the fields (not necessarily all of them!)). If you're using reference types, the default implementations of Equals and GetHashCode use reference equality, which may or may not be okay; it depends on what you're doing.
The only thing I can think of is that each class has a hidden sort of static identifier that Object.GetHashCode uses.
No. The default is a hash code based on the fields for a value type, and the reference for a reference type.
(also, does anyone know how Object.GetHashCode is implemented? I couldn't find it in the .NET Reflector)
It's an implementation detail that you should never ever need to know, and never ever rely on it. It could change on you at any moment.
I have never overridden GetHashCode but I was reading around and people say you only need to when overriding Equals and providing custom equality checking to your application so I guess I'm fine?
Well, is default equality okay for you? If not, override Equals and GetHashCode or implement IEqualityComparer<T> for your T.
I'd still like to know how the magic works, though =P
Every object has Equals and GetHashCode. The default implementations are as follows:
For value types, Equals is value equality.
For reference types, Equals is reference equality.
For value types, GetHashCode is based on the fields (again, not necessarily all of them!).
For reference types, GetHashCode is based on the reference.
If you use an overload of the Dictionary constructor that doesn't take an IEqualityComparer<T> for your T, it will use EqualityComparer<T>.Default. This IEqualityComparer<T> just uses Equals and GetHashCode. So, if you haven't overridden them, you get the implementations as defined above. If you override Equals and GetHashCode, then those overrides are what EqualityComparer<T>.Default will use.
Otherwise, pass a custom implementation of IEqualityComparer<T> to the constructor for Dictionary.
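The defaults described above can be seen in a short sketch. PointValue and PointRef are made-up types that differ only in being a struct versus a class, and neither overrides anything.

```csharp
using System;
using System.Collections.Generic;

// Value type: inherits field-based Equals/GetHashCode from System.ValueType.
public struct PointValue
{
    public int X;
    public int Y;
    public PointValue(int x, int y) { X = x; Y = y; }
}

// Reference type: inherits reference-based Equals/GetHashCode from System.Object.
public class PointRef
{
    public int X;
    public int Y;
    public PointRef(int x, int y) { X = x; Y = y; }
}

public static class DefaultEqualityDemo
{
    // A second struct with the same field values is treated as the same key.
    public static bool StructKeyFound()
    {
        var map = new Dictionary<PointValue, string> { { new PointValue(1, 2), "a" } };
        return map.ContainsKey(new PointValue(1, 2));
    }

    // A second object with the same field values is a different key.
    public static bool ClassKeyFound()
    {
        var map = new Dictionary<PointRef, string> { { new PointRef(1, 2), "a" } };
        return map.ContainsKey(new PointRef(1, 2));
    }

    // The very same object reference is, of course, found.
    public static bool SameReferenceFound()
    {
        var key = new PointRef(1, 2);
        var map = new Dictionary<PointRef, string> { { key, "a" } };
        return map.ContainsKey(key);
    }
}
```

StructKeyFound() returns true, ClassKeyFound() returns false, and SameReferenceFound() returns true, which is why dictionaries keyed by your own classes "just work" as long as you look entries up with the same instances you stored.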

Are you using your custom classes as keys or values? If you are using them only for values, then their GetHashCode doesn't matter.
If you are using them as keys, then the quality of the hash affects performance. The Dictionary keeps a chain of entries for each bucket, since hash codes don't need to be unique. In the worst-case scenario, if all of your keys end up having the same hash code, the lookup time for the dictionary will be like that of a list, O(n), instead of like that of a hash table, O(1).
The documentation for Object.GetHashCode is quite clear:
The default implementation of the GetHashCode method does not guarantee unique return values for different objects... Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
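The worst case can be made visible with a sketch that counts Equals() calls during lookups. CountingComparer and CollisionDemo are invented for illustration, and the constant 42 is an arbitrary value that forces every key to collide.

```csharp
using System;
using System.Collections.Generic;

// Comparer for int keys that can be told to return one hash code for everything.
public sealed class CountingComparer : IEqualityComparer<int>
{
    private readonly bool _collide;
    public int EqualsCalls;

    public CountingComparer(bool collide)
    {
        _collide = collide;
    }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    // 42 is arbitrary: with collide == true every key shares this hash code.
    public int GetHashCode(int obj) => _collide ? 42 : obj;
}

public static class CollisionDemo
{
    // Fills a dictionary with keyCount keys, then counts Equals() calls
    // made while looking every key up once.
    public static int CountEqualsCalls(bool collide, int keyCount)
    {
        var comparer = new CountingComparer(collide);
        var map = new Dictionary<int, int>(comparer);
        for (int i = 0; i < keyCount; i++)
        {
            map[i] = i;
        }

        comparer.EqualsCalls = 0; // measure only the lookups below
        for (int i = 0; i < keyCount; i++)
        {
            map.ContainsKey(i);
        }
        return comparer.EqualsCalls;
    }
}
```

With distinct hash codes each lookup needs a single Equals() call. With the constant hash code the dictionary still returns correct results, but each lookup has to walk a chain of colliding entries, so the number of Equals() calls grows much faster than the number of keys.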

Object's implementations of Equals() and GetHashCode() (which you're inheriting) compare by reference.
Object.GetHashCode is implemented in native code; you can see it in the SSCLI (Rotor).
Two different instances of a class will (usually) have different hashcodes, even if their properties are equal.
You only need to override them if you want to compare by value, that is, if you want different instances with the same properties to compare equal.

It really depends on your definition of Equality.
class Person
{
    public string Name { get; set; }
}

void Test()
{
    var joe1 = new Person { Name = "Joe" };
    var joe2 = new Person { Name = "Joe" };
    Assert.AreNotEqual(joe1, joe2);
}
If you have a different definition for equality, you should override Equals & GetHashCode to get the appropriate behavior.
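For comparison, a sketch of the same Person class with name-based equality, under the assumption that two people with the same Name should count as equal:

```csharp
using System;

public class Person
{
    public string Name { get; set; }

    // Two Person objects are equal when their names match.
    public override bool Equals(object obj)
    {
        var other = obj as Person;
        return other != null && Name == other.Name;
    }

    // Derived from the same property that Equals() compares.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}
```

With these overrides the two "Joe" instances compare equal, so an AreNotEqual assertion on them would fail. Because Name is mutable, changing it while the object is used as a key in a dictionary or hash set would change its hash code, so a type like this is safer as a key when the property is read-only.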

Hash codes are for optimizing lookup performance in hash tables (dictionaries). While hash codes aim to collide as little as possible between instances of objects, they are not guaranteed to be unique. The goal should be an even distribution across the int range, given a set of typical instances of those objects.
The way hash tables work is that each object implements a function to compute a hash code, hopefully distributed as evenly as possible across the int range. Two different objects can produce the same hash code, but an instance of an object, given its data, should always produce the same hash code. Therefore, hash codes are not unique and should not be used for equality. The hash table allocates an array of size n (much smaller than the int range), and when an object is added to the hash table, it calls GetHashCode and takes the result modulo (%) the size of the allocated array. For collisions in the table, a list of objects is typically chained. Since computing hash codes should be very fast, a lookup is fast: jump to the array offset and walk the chain. The larger the array (more memory), the fewer the collisions and the faster the lookup.
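The mechanism described here can be sketched as a toy chained hash set. TinyHashSet is a teaching aid written for this explanation, with a fixed bucket count and no resizing. It is not how HashSet<T> or Dictionary<TKey, TValue> are actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Teaching sketch only: fixed number of buckets, one chained list per bucket.
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] _buckets;
    private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default;

    public TinyHashSet(int bucketCount = 8)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        _buckets = new List<T>[bucketCount];
    }

    // Hash code modulo the array size picks the bucket.
    private int BucketIndex(T item)
    {
        int hash = item == null ? 0 : _comparer.GetHashCode(item);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7FFFFFFF) % _buckets.Length;
    }

    // Returns false if an equal item is already stored.
    public bool Add(T item)
    {
        int index = BucketIndex(item);
        var chain = _buckets[index] ?? (_buckets[index] = new List<T>());
        foreach (T existing in chain)
        {
            if (_comparer.Equals(existing, item)) return false;
        }
        chain.Add(item);
        return true;
    }

    // Jump to the bucket, then walk its chain using Equals.
    public bool Contains(T item)
    {
        var chain = _buckets[BucketIndex(item)];
        if (chain == null) return false;
        foreach (T existing in chain)
        {
            if (_comparer.Equals(existing, item)) return true;
        }
        return false;
    }
}
```

Only the items in one bucket are ever compared with Equals, which is where the speed comes from. With a small bucket count and many items the chains get long, illustrating the trade-off between memory and lookup time mentioned above.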
Object's GetHashCode cannot possibly produce a good hash code because by definition it knows nothing about the concrete object that inherits from it. That's why, if you have custom objects that need to be placed in dictionaries and you want to optimize the lookups (aiming for an even distribution with minimal collisions), you should override GetHashCode.
If you need to compare two items, then override Equals. If you need the object to be sortable (which is needed for sorted lists), then implement IComparable.
Hope that helps explain the difference.

What can go wrong if one fails to override GetHashCode() when overriding Equals()? [duplicate]

In C#, what specifically can go wrong if one fails to override GetHashCode() when overriding Equals()?

The most visible way is for mapping structures.
Any class which does this will have unpredictable behavior when used as the key for a Dictionary or Hashtable. The reason is that the implementation uses both GetHashCode and Equals to find a value in the table. The short version of the algorithm is the following:
Take the hash code modulo the number of buckets; the result is the bucket index.
Call .Equals() for the specified key and every key in that particular bucket.
If there is a match, that is the value; no match means no value.
Failing to keep GetHashCode and Equals in sync will completely break this algorithm (and numerous others).
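A deterministic sketch of that failure. BrokenKey is an invented type whose GetHashCode() deliberately ignores the field that Equals() compares. The per-instance serial number stands in for the default, reference-based hash code you get when only Equals() is overridden.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose Equals and GetHashCode disagree on purpose.
public sealed class BrokenKey
{
    private static int _nextSerial;
    private readonly int _serial = ++_nextSerial; // different for every instance

    public BrokenKey(int id)
    {
        Id = id;
    }

    public int Id { get; }

    // Equality is defined by Id...
    public override bool Equals(object obj)
    {
        var other = obj as BrokenKey;
        return other != null && Id == other.Id;
    }

    // ...but the hash code ignores Id, so equal objects hash differently.
    public override int GetHashCode() => _serial;
}

public static class BrokenKeyDemo
{
    // True when the probe equals the stored key and yet the set cannot find it.
    public static bool EqualButNotFound()
    {
        var stored = new BrokenKey(7);
        var probe = new BrokenKey(7);
        var set = new HashSet<BrokenKey> { stored };
        return stored.Equals(probe) && !set.Contains(probe);
    }
}
```

EqualButNotFound() returns true: the two keys are equal according to Equals(), but the set looks for the probe under a different hash code and never gets as far as calling Equals() on the stored key.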

Think of a hash / dictionary structure as a collection of numbered buckets. If you always put things in the bucket corresponding to their GetHashCode(), then you only have to search one bucket (using Equals()) to see if something is there. This works, provided you're looking in the right bucket.
So the rule is: if Equals() says two objects are equal, they must have the same GetHashCode().

If you do not override GetHashCode, anything that compares your objects may get it wrong.
As it is documented that GetHashCode must return the same value if two instances are equal, then it is the prerogative of any code which wishes to test them for equality to use GetHashCode as a first pass to group objects which may be equal (as it knows that objects with different hash codes cannot be equal). If your GetHashCode method returns different values for equal objects then they may get into different groups in the first pass and never be compared using their Equals method.
This could affect any collection-type data structure, but would be particularly problematic in hash-code based ones such as dictionaries and hash-sets.
In short: Always override GetHashCode when you override Equals, and ensure their implementations are consistent.

Any algorithm that uses the Key will fail to work, assuming it relies on the intended behaviour of hash keys.
Two objects that are Equal should have the same hash key value, which is not remotely guaranteed by the default implementation.
