I build an entity and keep two instances of it: the originally instantiated entity and a modified one. This lets me hold the initial data and compare it against the modified data. The question is, what would be the ideal approach? Should I implement IEquatable<T> and override Object.Equals, or implement IComparable? My original implementation was:
var properties = typeof(TEntity).GetProperties();
foreach(var property in properties)
{
    var initialEntity = original.GetType().GetProperty(property.Name).GetValue(original, null);
    var modifiedEntity = userChange.GetType().GetProperty(property.Name).GetValue(userChange, null);
    if(initialEntity.Equals(modifiedEntity) == false && !ignore.Contains(property.Name))
    {
        // Do Something
    }
}
My understanding was that Equals would return a boolean and that, in this instance, it would compare by value equality. I'm now assuming it compares by reference equality instead, because it never distinguishes between the two: they remain equal under all circumstances.
The simplest answer:
If you need to test equality, implement IEquatable<T> and override Equals() and GetHashCode()
If you need to sort objects, implement IComparable<T>
The default implementation of Object.Equals() for a class determines whether two references point to the same object. This is essentially what Object.ReferenceEquals(obj1, obj2) does; .NET needs you to tell it how to determine whether two objects you create are equivalent.
Additionally, the default implementation of Object.GetHashCode() returns a value tied to the identity of the object instance, not to its contents. Unless you override it to generate a hash code that is a function of everything you compared in your Equals() method, you will get unexpected results when you attempt to store the object in a hash set or use it as a dictionary key.
You may need to implement both, but it looks like in your case IEquatable<T> is the most pressing need.
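As a minimal sketch of that advice (the Customer type and its two properties are hypothetical, made up purely for illustration), implementing IEquatable<T> together with the two overrides could look roughly like this:

```csharp
using System;

// Hypothetical entity used only for illustration; the names are assumptions.
public sealed class Customer : IEquatable<Customer>
{
    public string Name { get; set; }
    public int Age { get; set; }

    // Strongly typed value equality: equal when every compared member matches.
    public bool Equals(Customer other)
    {
        if (other is null) return false;
        return Name == other.Name && Age == other.Age;
    }

    // Route the untyped overload through the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Hash exactly the members that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 31 + Age.GetHashCode();
            return hash;
        }
    }
}
```

With something like this in place, a call such as initialEntity.Equals(modifiedEntity) on two Customer values compares their contents rather than their references.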
We have to implement the Object class method public virtual bool Equals(object obj)
without using the Equals or ReferenceEquals methods.
It has to work the same as the virtual Equals method.
I can use objA == objB.
I won't give you a code answer, since this is an assignment after all.
Things you want to check:
1. Null: are both objects null? Is one object null and the other not? object.ReferenceEquals(objA, null) is the older preferred way (since, unlike ==, it cannot end up calling a user-defined equality implementation). With C# 7+, you can also use if (objA is null).
2. You can now compare if (objA == objB). Note that this is where objA.Equals(objB) would normally be used, but since that's not allowed, I guess we can use ==.
3. There is also objA.GetHashCode(), which indicates the potential for equality. I say "potential" because it's possible for two very different objects to have the same hash code. If two objects are equal (and correctly implemented) then they should have the same hash code. In short: you can rely on GetHashCode() to indicate the possibility of equality, but you need to do another check (2) to be sure.
See here for more on the relationship between GetHashCode() and Equals().
And see here for more information about == vs Equals()
I have an IEnumerable of objects that override the GetHashCode method. I assumed that if I added those objects to a HashSet<T>, it would hold only the unique objects. But it doesn't:
var set = new HashSet<SomeObject>();
Count = 0
set.Add(first);
true
set.Add(second);
true
set.Count
2
first.GetHashCode()
-927637658
second.GetHashCode()
-927637658
So how could I reduce my IEnumerable of objects to those that are unique based on their GetHashCode() value?
Although I don't know if this helps in any way:
public class SomeObject
{
    ...
    public string GetAggregateKey()
    {
        var json = ToJson();
        json.Property("id").Remove();
        return json.ToString(); // without the `id`, the json string of two separate objects with same content could be the same
    }
    public override int GetHashCode()
    {
        // two equal strings have same hash code
        return GetAggregateKey().GetHashCode();
    }
    ...
}
It is not enough to only have a GetHashCode method.
The GetHashCode method is used to quickly figure out if there are potential candidates already in the hashset (or dictionary):
If no existing object in the hashset has the same hash code, the new one is not a duplicate
If any existing object(s) in the hashset has the same hash code, the new one is a potential duplicate
To figure out if it is just a potential duplicate or an actual duplicate, Equals is used.
If you haven't implemented that then the object.Equals method will be used, which is simply comparing references. Two distinct objects will thus never be equal, even though they may both have the same property values and the same hash code.
The solution: implement Equals with the same rules as GetHashCode, or provide an IEqualityComparer<T> implementation to your hashset.
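A rough sketch of the first option, assuming a hypothetical Item type standing in for SomeObject (its Content property plays the role of the aggregate key, and Id is deliberately ignored):

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for SomeObject: equality ignores Id and looks only at Content.
public class Item
{
    public int Id { get; set; }
    public string Content { get; set; }

    // Same rule as GetHashCode: only Content matters.
    public override bool Equals(object obj)
    {
        var other = obj as Item;
        return other != null && Content == other.Content;
    }

    public override int GetHashCode()
    {
        return Content == null ? 0 : Content.GetHashCode();
    }
}

public static class DistinctDemo
{
    // Returns how many items survive after loading them into a HashSet.
    public static int CountUnique(IEnumerable<Item> items)
    {
        return new HashSet<Item>(items).Count;
    }
}
```

Once Equals follows the same rule as GetHashCode, two items with different Id values but identical Content collapse into one entry.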
Have a look at the Reference Source for HashSet:
This line (960, and those around it) is what you're looking for:
if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, value))
The hash of the object is only used to decide which bucket the object goes into. If Equals returns false for the two objects, the new one will still be inserted.
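To see that a matching hash code alone never makes two objects duplicates, here is a small sketch with a deliberately bad, hypothetical key type whose hash code always collides:

```csharp
using System.Collections.Generic;

// Hypothetical key: every instance reports the same hash code, Equals compares Name.
public class CollidingKey
{
    public string Name { get; set; }

    public override int GetHashCode()
    {
        return 42; // every instance lands in the same bucket
    }

    public override bool Equals(object obj)
    {
        var other = obj as CollidingKey;
        return other != null && Name == other.Name;
    }
}

public static class CollisionDemo
{
    // Adds one key per name and reports how many the set kept.
    public static int CountAfterAdding(params string[] names)
    {
        var set = new HashSet<CollidingKey>();
        foreach (var name in names)
        {
            set.Add(new CollidingKey { Name = name });
        }
        return set.Count;
    }
}
```

CountAfterAdding("a", "b") keeps both keys even though their hash codes are identical, while CountAfterAdding("a", "a") keeps one, because only then does Equals return true.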
Failing to override GetHashCode and Equals when overloading the equality operator causes the compiler to produce warnings. Why would it be a good idea to change the implementation of either? After reading Eric Lippert's blog post on GetHashCode, it seems like there probably aren't many useful alternatives to GetHashCode's base implementation, so why does the compiler encourage you to change it?
Let's suppose you are implementing a class.
If you are overloading == then you are producing a type that has value equality as opposed to reference equality.
Given that, now the question is "how desirable is it to have a class that implements reference equality in .Equals() and value equality in ==?" and the answer is "not very desirable". That seems like a potential source of confusion. (And in fact, the company that I now work for, Coverity, produces a defect discovery tool that checks to see if you are confusing value equality with reference equality for precisely this reason. Coincidentally I was just reading the spec for it when I saw your question!)
Moreover, if you are going to have a class that implements both value and reference equality, the usual way to do it is to override Equals and leave == alone, not the other way around.
Therefore, given that you have overloaded ==, it is strongly suggested that you also override Equals.
If you are overriding Equals to produce value equality then you are required to override GetHashCode to match, as you know if you've read my article that you linked to.
If you don't override Equals() when you override == you will have some amazingly bad code.
How would you feel about this happening?
if (x == y)
{
    if (!x.Equals(y))
        throw new InvalidOperationException("Wut?");
}
Here's an example. Given this class:
class Test
{
    public int Value;
    public string Name;

    public static bool operator==(Test lhs, Test rhs)
    {
        if (ReferenceEquals(lhs, rhs))
            return true;
        if (ReferenceEquals(lhs, null) || ReferenceEquals(rhs, null))
            return false;
        return lhs.Value == rhs.Value;
    }

    public static bool operator!=(Test lhs, Test rhs)
    {
        return !(lhs == rhs);
    }
}
This code will behave oddly:
Test test1 = new Test { Value = 1, Name = "1" };
Test test2 = new Test { Value = 1, Name = "2" };

if (test1 == test2)
    Console.WriteLine("test1 == test2"); // This gets printed.
else
    Console.WriteLine("test1 != test2");

if (test1.Equals(test2))
    Console.WriteLine("test1.Equals(test2)");
else
    Console.WriteLine("NOT test1.Equals(test2)"); // This gets printed!
You do NOT want this!
My guess is that the compiler takes its cues from your actions, and decides that since you find it important to provide an alternative implementation of the equality operator, you probably want object equality to remain consistent with your new implementation of ==. After all, you do not want the two equality comparisons to mean drastically different things, otherwise your program would be hard to understand even on a very basic level. Therefore, the compiler thinks that you should redefine Equals as well.
Once you provide an alternative implementation of Equals, however, you need to modify GetHashCode to stay consistent with the equality implementation. Hence the compiler warns you that your implementation might be incomplete, and suggests overriding both Equals and GetHashCode.
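One way to keep all three in agreement, sketched here with a hypothetical ConsistentTest class that compares only Value (an assumption mirroring the == operator in the Test example), is to have the operators delegate to Equals:

```csharp
using System;

// Hypothetical variant of the Test class in which ==, Equals and GetHashCode agree.
public class ConsistentTest : IEquatable<ConsistentTest>
{
    public int Value;
    public string Name; // deliberately not part of equality in this sketch

    public bool Equals(ConsistentTest other)
    {
        return !ReferenceEquals(other, null) && Value == other.Value;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ConsistentTest);
    }

    // Hash only what Equals compares.
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }

    public static bool operator ==(ConsistentTest lhs, ConsistentTest rhs)
    {
        if (ReferenceEquals(lhs, rhs)) return true;   // same instance, or both null
        if (ReferenceEquals(lhs, null)) return false; // only lhs is null
        return lhs.Equals(rhs);                       // Equals handles a null rhs
    }

    public static bool operator !=(ConsistentTest lhs, ConsistentTest rhs)
    {
        return !(lhs == rhs);
    }
}
```

With this shape, x == y and x.Equals(y) cannot disagree, and equal instances also share a hash code, so the compiler has nothing left to warn about.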
If you don't override the Equals method too, then using it might give different results from the ones you'd get with the operator. For example, if you could overload == for integers...
int i = 1;
(1 == 1) == (i.Equals(1))
Could evaluate to false.
For the same reason, you should reimplement the GetHashCode method so you don't mess up with hashtables and such other structures that rely on hash comparisons.
Notice I'm saying "might" and "could", not "will". The warnings are there just as a reminder that unexpected things might happen if you don't follow its suggestions. Otherwise you'd get errors instead of warnings.
The documentation is pretty clear about this:
The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.
Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the System.Collections.IEqualityComparer interface.
I overrode the Equals() of my class to compare ID values of type Guid.
Then Visual Studio warned:
... overrides Object.Equals(object o) but
does not override Object.GetHashCode()
So I then also overrode its GetHashCode() like this:
public partial class SomeClass
{
    public override bool Equals(Object obj)
    {
        //Check for null and compare run-time types.
        if (obj == null || this.GetType() != obj.GetType()) return false;
        return this.Id == ((SomeClass)obj).Id;
    }

    public override int GetHashCode()
    {
        return this.Id.GetHashCode();
    }
}
It seems to work. Have I done this correctly? Remember Id is of type Guid. Does it matter that my class is an Entity Framework object?
As others have said, the use of Reflection in Equals seems dodgy. Leaving that aside, let's concentrate on GetHashCode.
The primary rule for GetHashCode that you must not violate is if two objects are equal then they must both have the same hash code. Or, an equivalent way of saying that is if two objects have different hash codes then they must be unequal. Your implementation looks good there.
You are free to violate the converse. That is, if two objects have the same hash code then they are permitted to be equal or unequal, as you see fit.
I am assuming that "Id" is an immutable property. If "Id" can change over the lifetime of the object then you can have problems when putting the object in a hash table. Consider ensuring that only immutable properties are used in computing equality and hash code.
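To illustrate the kind of problem meant here, consider this sketch with a hypothetical MutableKey type whose Id can be changed after the object has been stored:

```csharp
using System.Collections.Generic;

// Hypothetical type whose equality and hash code depend on a mutable Id.
public class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class MutationDemo
{
    // Returns whether the set still finds the object after its Id has changed.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // the set recorded the hash code computed while Id was 1

        return set.Contains(key); // looks the object up under its new hash code
    }
}
```

FoundAfterMutation() returns false: the object is still inside the set, but it can no longer be found, because the set filed it under its old hash code.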
Your implementation looks good but the fact that you are asking the question indicates that you might not have a solid grasp of all the subtle factors that go into building an implementation of GetHashCode. A good place to start is my article on the subject:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
It looks correct to me. Whenever I do something like this, I usually also implement IEquatable<T> so that comparisons between variables of the same compile-time type will be a little more efficient.
public partial class SomeClass : IEquatable<SomeClass>
{
    public override bool Equals(Object obj)
    {
        return Equals(obj as SomeClass);
    }

    public bool Equals(SomeClass obj)
    {
        if (obj == null)
            return false;
        return Id == obj.Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
This structure also allows a more derived object with the same Id to compare as equal to a less derived object. If this is not the desired behavior, then you will have to also compare the types as you do in the question.
if (obj.GetType() != typeof(SomeClass)) return false;
Since you're not dealing with a sealed class, I'd recommend against checking for class equality with this.GetType() != obj.GetType(). Any subclass of SomeClass should be able to participate in Equals as well, so you might want to use this instead:
if (obj as SomeClass == null) return false;
Traditionally Equals is implemented in such a way that two objects will only be "Equal" if they are exactly the same in every way. For example, if you have two objects that represent the same object in the database, but where one has a different Name property than the other, the objects aren't considered "Equal", and should avoid producing the same "Hashcode" if possible.
It is better to err on the side of "not equal" than to risk calling two objects equal that aren't. This is why the default implementation for objects uses the memory location of the object itself: no two objects will ever be considered "equal" unless they are exactly the same object. So I'd say unless you want to write both GetHashCode and Equals in such a way that they check for equality of all their properties, you're better off not overriding either method.
If you have a data structure (like a HashSet) where you specifically want to determine equality based on the ID value, you can provide a specific IEqualityComparer implementation to that data structure.
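A sketch of that approach, using a hypothetical Entity type with a Guid Id (both names are assumptions); the class itself keeps its default reference equality and only the collection is told to compare by Id:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity; it does not override Equals or GetHashCode.
public class Entity
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// Comparer that treats two entities as equal when their Ids match.
public sealed class EntityIdComparer : IEqualityComparer<Entity>
{
    public bool Equals(Entity x, Entity y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Entity obj)
    {
        return obj.Id.GetHashCode();
    }
}
```

It would be used as new HashSet<Entity>(new EntityIdComparer()). Two entities with the same Id then count as one element in that set, while everywhere else they remain distinct objects.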
You got excellent answers to your first question:
Have I done it correctly?
I will answer your second question
Does it matter that my class is an Entity Framework object?
Yes it matters a lot. Entity framework uses HashSet a lot internally. For example dynamic proxies use HashSet for representing collection navigation properties and EntityObjects use EntityCollection which in turn uses HashSet internally.
I have used Dictionary<TKey, TValue> for many purposes, but I haven't encountered any scenario where I had to implement GetHashCode(), which I believe is because my keys were of built-in types like int and string.
I am curious to know the scenarios (real-world examples) in which one should use a custom object as a key and thus implement methods such as GetHashCode() and Equals().
And, does using a custom object for key necessitate implementing these functions?
You should override Equals and GetHashCode whenever the default Object.Equals (tests for reference equality) will not suffice. This happens, for example, when the type of your key is a custom type and you want two keys to be considered equal even in cases when they are not the same instance of the custom type.
For example, if your key is as simple as
class Point {
    public int X { get; set; }
    public int Y { get; set; }
}
and you want two Points to be considered equal if their Xs are equal and their Ys are equal, then you will need to override Equals and GetHashCode.
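A minimal sketch of what those overrides might look like for Point (the hash combination shown is just one common choice, not the only valid one):

```csharp
using System;

public class Point : IEquatable<Point>
{
    public int X { get; set; }
    public int Y { get; set; }

    // Two points are equal when both coordinates match.
    public bool Equals(Point other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Point);
    }

    // Combine both coordinates so that equal points always hash alike.
    public override int GetHashCode()
    {
        unchecked
        {
            return (17 * 31 + X) * 31 + Y;
        }
    }
}
```

A Dictionary<Point, string> can then be queried with a freshly created Point that has the same coordinates as a stored key. Note that changing X or Y while a Point is in use as a key would break lookups.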
Just to make it clear: There is one important thing about Dictionary<TKey, TValue> and GetHashCode(): Dictionary uses GetHashCode to determine if two keys are equal i.e. if <TKey> is of custom type you should care about implementing GetHashCode() carefully. As Andrew Hare pointed out this is easy, if you have a simple type that identifies your custom object unambiguously. In case you have a combined identifier, it gets a little more complicated.
As an example, consider a complex number as TKey. A complex number is determined by its real and its imaginary part. Both are of a simple type, e.g. double. But how would you identify whether two complex numbers are equal? You implement GetHashCode() for your custom complex type and combine both identifying parts.
You find further reading on the latter here.
UPDATE
Based on Ergwun's comment I checked the behavior of Dictionary<TKey, TValue>.Add with special respect to TKey's implementation of Equals(object) and GetHashCode(). I must confess that I was rather surprised by the results.
Given two objects k1 and k2 of type TKey, two arbitrary objects v1 and v2 of type TValue, and an empty dictionary d of type Dictionary<TKey, TValue>, this is what happens when adding v1 with key k1 to d first and v2 with key k2 second (depending on the implementation of TKey.Equals(object) and TKey.GetHashCode()):
k1.Equals(k2)   k1.GetHashCode() == k2.GetHashCode()   d.Add(k2, v2)
false           false                                  ok
false           true                                   ok
true            false                                  ok
true            true                                   System.ArgumentException
Conclusion: I was wrong; I originally thought the second case (where Equals returns false but both key objects have the same hash code) would raise an ArgumentException. But as the third case shows, the dictionary does use GetHashCode() in some way. In any case, it is good advice that two objects of the same type that are equal must return the same hash code, to ensure that instances of Dictionary<TKey, TValue> work correctly.
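The four cases can be reproduced with a sketch along these lines; RiggedKey is a hypothetical key type whose Equals result and hash code are simply fixed at construction:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key whose Equals result and hash code are fixed when it is created.
public class RiggedKey
{
    private readonly bool _equalsResult;
    private readonly int _hash;

    public RiggedKey(bool equalsResult, int hash)
    {
        _equalsResult = equalsResult;
        _hash = hash;
    }

    public override bool Equals(object obj)
    {
        return _equalsResult;
    }

    public override int GetHashCode()
    {
        return _hash;
    }
}

public static class DictionaryAddDemo
{
    // Returns true when the second Add succeeds, false when it throws ArgumentException.
    public static bool SecondAddSucceeds(bool keysEqual, bool sameHash)
    {
        var d = new Dictionary<RiggedKey, string>();
        d.Add(new RiggedKey(keysEqual, 1), "v1");
        try
        {
            d.Add(new RiggedKey(keysEqual, sameHash ? 1 : 2), "v2");
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

Only SecondAddSucceeds(true, true) returns false. In the other three cases the second key is accepted, which matches the table: the dictionary only asks Equals about keys whose hash codes already match.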
You have two questions here.
1. When do you need to implement GetHashCode()?
2. Would you ever use an object for a dictionary key?
Let's start with 1. If you are writing a class that might possibly be used by someone else, you will want to define GetHashCode() and Equals() when reference equality is not enough. If you're not planning on using it in a dictionary, and it's for your own usage, then I see no need to implement GetHashCode() etc.
For 2), you should use an object anytime you have a need to have a constant time lookup from an object to some other type. Since GetHashCode() returns a numeric value, and collections store references, there is no penalty for using an Object over an Int or a string (remember a string is an object).
One example is when you need to create a composite key (that is, a key comprised of more than one piece of data). That composite key would be a custom type that would need to override those methods.
For example, let's say that you had an in-memory cache of address records and you wanted to check to see if an address was in cache to save an expensive trip to the database to retrieve it. Let's also say that addresses are unique in terms of their street 1 and zip code fields. You would implement your cache with something like this:
class AddressCacheKey
{
    public String StreetOne { get; set; }
    public String ZipCode { get; set; }

    // overrides for Equals and GetHashCode
}
and
static Dictionary<AddressCacheKey,Address> cache;
Since your AddressCacheKey type overrides the Equals and GetHashCode methods, it would be a good candidate for a key in the dictionary, and you would be able to determine whether or not you need to take a trip to the database to retrieve a record based on more than one piece of data.
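Filling in the elided overrides might look roughly like the sketch below. The Address type, the GetOrLoad helper and the loadFromDatabase delegate are hypothetical names for illustration, and the key is made immutable here (an assumption, not part of the example above) so that its hash code cannot change while it sits in the dictionary:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type with just enough members for the sketch.
public class Address
{
    public string StreetOne { get; set; }
    public string ZipCode { get; set; }
}

public sealed class AddressCacheKey : IEquatable<AddressCacheKey>
{
    public AddressCacheKey(string streetOne, string zipCode)
    {
        StreetOne = streetOne;
        ZipCode = zipCode;
    }

    public string StreetOne { get; }
    public string ZipCode { get; }

    // Keys are equal when both parts of the composite match.
    public bool Equals(AddressCacheKey other)
    {
        return other != null && StreetOne == other.StreetOne && ZipCode == other.ZipCode;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as AddressCacheKey);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (StreetOne == null ? 0 : StreetOne.GetHashCode());
            hash = hash * 31 + (ZipCode == null ? 0 : ZipCode.GetHashCode());
            return hash;
        }
    }
}

public static class AddressCache
{
    private static readonly Dictionary<AddressCacheKey, Address> cache =
        new Dictionary<AddressCacheKey, Address>();

    // Returns the cached address, calling loadFromDatabase only on a cache miss.
    public static Address GetOrLoad(AddressCacheKey key, Func<AddressCacheKey, Address> loadFromDatabase)
    {
        Address address;
        if (!cache.TryGetValue(key, out address))
        {
            address = loadFromDatabase(key);
            cache[key] = address;
        }
        return address;
    }
}
```

Two separately constructed keys with the same street and zip code then hit the same cache entry, so the expensive load runs only once.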