I have an entity that I build; I keep both the originally instantiated entity and a modified copy. This lets me hold the initial data and compare it against the modified data. The question is: what would be the ideal approach? Should I implement IEquatable<T> and override Object.Equals, or implement IComparable<T>? My original implementation was:
var properties = typeof(TEntity).GetProperties();
foreach(var property in properties)
{
var initialEntity = original.GetType().GetProperty(property.Name).GetValue(original, null);
var modifiedEntity = userChange.GetType().GetProperty(property.Name).GetValue(userChange, null);
if(initialEntity.Equals(modifiedEntity) == false && !ignore.Contains(property.Name))
{
// Do Something
}
}
My understanding was that Equals would return a boolean and that, in this instance, it would compare on value equality. I'm assuming, though, that it is actually comparing on reference equality.
It never distinguishes between the two entities; they remain equal under all circumstances.
The simplest answer:
If you need to test equality, implement IEquatable<T> and override Equals() and GetHashCode()
If you need to sort objects, implement IComparable<T>
For reference types, the default implementation of Object.Equals() determines whether two references point to the same object. This is essentially what Object.ReferenceEquals(obj1, obj2) does; .NET needs you to tell it how to determine whether two objects you create are equivalent.
Additionally, the default implementation of Object.GetHashCode() is derived from the object's identity, not from its contents. Unless you override it to generate a hash code that is a function of everything you compared in your Equals() method, you will get unexpected results when you attempt to store the object in a hash set or use it as a dictionary key.
You may need to implement both, but it looks like in your case IEquatable<T> is the most pressing need.
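A minimal sketch of what that could look like, assuming a hypothetical Customer entity whose equality is defined by its Id and Name (neither the class nor its properties come from the question):

```csharp
using System;

// Hypothetical entity; the class and its properties are assumptions for illustration.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Type-specific equality: compares the values, not the references.
    public bool Equals(Customer other)
    {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return Id == other.Id && Name == other.Name;
    }

    // Route the untyped overload through the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Built from the same members that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            return (Id * 397) ^ (Name?.GetHashCode() ?? 0);
        }
    }
}
```

With something like this in place, original.Equals(userChange) would answer "did anything change?", although unlike the reflection loop it would not say which property changed.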
Failing to override GetHashCode and Equals when overloading the equality operator causes the compiler to produce warnings. Why would it be a good idea to change the implementation of either? After reading Eric Lippert's blog post on GetHashCode, it seems like there probably aren't many useful alternatives to GetHashCode's base implementation, so why does the compiler encourage you to change it?
Let's suppose you are implementing a class.
If you are overloading == then you are producing a type that has value equality as opposed to reference equality.
Given that, now the question is "how desirable is it to have a class that implements reference equality in .Equals() and value equality in ==?" and the answer is "not very desirable". That seems like a potential source of confusion. (And in fact, the company that I now work for, Coverity, produces a defect discovery tool that checks to see if you are confusing value equality with reference equality for precisely this reason. Coincidentally I was just reading the spec for it when I saw your question!)
Moreover, if you are going to have a class that implements both value and reference equality, the usual way to do it is to override Equals and leave == alone, not the other way around.
Therefore, given that you have overloaded ==, it is strongly suggested that you also override Equals.
If you are overriding Equals to produce value equality then you are required to override GetHashCode to match, as you know if you've read my article that you linked to.
If you don't override Equals() when you override == you will have some amazingly bad code.
How would you feel about this happening?
if (x == y)
{
if (!x.Equals(y))
throw new InvalidOperationException("Wut?");
}
Here's an example. Given this class:
class Test
{
public int Value;
public string Name;
public static bool operator==(Test lhs, Test rhs)
{
if (ReferenceEquals(lhs, rhs))
return true;
if (ReferenceEquals(lhs, null) || ReferenceEquals(rhs, null))
return false;
return lhs.Value == rhs.Value;
}
public static bool operator!=(Test lhs, Test rhs)
{
return !(lhs == rhs);
}
}
This code will behave oddly:
Test test1 = new Test { Value = 1, Name = "1" };
Test test2 = new Test { Value = 1, Name = "2" };
if (test1 == test2)
Console.WriteLine("test1 == test2"); // This gets printed.
else
Console.WriteLine("test1 != test2");
if (test1.Equals(test2))
Console.WriteLine("test1.Equals(test2)");
else
Console.WriteLine("NOT test1.Equals(test2)"); // This gets printed!
You do NOT want this!
My guess is that the compiler takes its clues from your actions, and decides that since you find it important to provide an alternative implementation of the equality operator, then you probably want the object equality to remain consistent with your new implementation of ==. After all, you do not want the two equality comparisons to mean drastically different things, otherwise your program would be hard to understand even on a very basic level. Therefore, the compiler thinks that you should redefine Equals as well.
Once you provide an alternative implementation Equals, however, you need to modify GetHashCode to stay consistent with the equality implementation. Hence the compiler warns you that your implementation might be incomplete, and suggests overriding both Equals and GetHashCode.
If you don't override the Equals method too, then using it might give different results from the ones you'd get with the operator. Suppose, for instance, that you could overload == for integers...
int i = 1;
(i == 1) == (i.Equals(1))
Could evaluate to false.
For the same reason, you should reimplement the GetHashCode method so you don't mess up with hashtables and such other structures that rely on hash comparisons.
Notice I'm saying "might" and "could", not "will". The warnings are there just as a reminder that unexpected things might happen if you don't follow its suggestions. Otherwise you'd get errors instead of warnings.
The documentation is pretty clear about this:
The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.
Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the System.Collections.IEqualityComparer interface.
I overrode the Equals() of my class to compare ID values of type Guid.
Then Visual Studio warned:
... overrides Object.Equals(object o) but
does not override Object.GetHashCode()
So I then also overrode its GetHashCode() like this:
public partial class SomeClass
{
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || this.GetType() != obj.GetType()) return false;
return this.Id == ((SomeClass)obj).Id;
}
public override int GetHashCode()
{
return this.Id.GetHashCode();
}
}
It seems to work. Have I done this correctly? Remember Id is of type Guid. Does it matter that my class is an Entity Framework object?
As others have said, the use of Reflection in Equals seems dodgy. Leaving that aside, let's concentrate on GetHashCode.
The primary rule for GetHashCode that you must not violate is if two objects are equal then they must both have the same hash code. Or, an equivalent way of saying that is if two objects have different hash codes then they must be unequal. Your implementation looks good there.
You are free to violate the converse. That is, if two objects have the same hash code then they are permitted to be equal or unequal, as you see fit.
I am assuming that "Id" is an immutable property. If "Id" can change over the lifetime of the object then you can have problems when putting the object in a hash table. Consider ensuring that only immutable properties are used in computing equality and hash code.
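To make the mutability problem concrete, here is a small sketch. MutableKey and MutableKeyDemo are hypothetical names invented for this illustration; the point is only that the hash code changes while the object is already stored in a HashSet.

```csharp
using System.Collections.Generic;

// Hypothetical type whose equality and hash code depend on a mutable Id.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        return obj is MutableKey other && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class MutableKeyDemo
{
    // Returns whether the set can still find the object after its Id changes.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // The hash code changes while the object sits in the set.

        return set.Contains(key);
    }
}
```

MutableKeyDemo.FoundAfterMutation() returns false: the set filed the object under its old hash code, so a lookup with the new one misses it even though the very same instance is still in the set.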
Your implementation looks good but the fact that you are asking the question indicates that you might not have a solid grasp of all the subtle factors that go into building an implementation of GetHashCode. A good place to start is my article on the subject:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
It looks correct to me. Whenever I do something like this, I usually also implement IEquatable<T> so that comparisons between variables of the same compile-time type will be a little more efficient.
public partial class SomeClass : IEquatable<SomeClass>
{
public override bool Equals(Object obj)
{
return Equals(obj as SomeClass);
}
public bool Equals(SomeClass obj)
{
if (obj == null)
return false;
return Id == obj.Id;
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
This structure also allows a more derived object with the same Id to compare as equal to a less derived object. If this is not the desired behavior, then you will have to also compare the types as you do in the question.
if (obj.GetType() != typeof(SomeClass)) return false;
Since you're not dealing with a sealed class, I'd recommend against checking for class equality with this.GetType() != obj.GetType(). Any subclass of SomeClass should be able to participate in Equals too, so you might want to use this instead:
if (obj as SomeClass == null) return false;
Traditionally Equals is implemented in such a way that two objects will only be "Equal" if they are exactly the same in every way. For example, if you have two objects that represent the same object in the database, but where one has a different Name property than the other, the objects aren't considered "Equal", and should avoid producing the same "Hashcode" if possible.
It is better to err on the side of "not equal" than to risk calling two objects equal that aren't. This is why the default implementation for objects uses the memory location of the object itself: no two objects will ever be considered "equal" unless they are exactly the same object. So I'd say unless you want to write both GetHashCode and Equals in such a way that they check for equality of all their properties, you're better off not overriding either method.
If you have a data structure (like a HashSet) where you specifically want to determine equality based on the ID value, you can provide a specific IEqualityComparer implementation to that data structure.
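As a rough sketch of that last suggestion, assuming a hypothetical Item entity with a Guid Id (Item, ItemIdComparer and ItemDemo are illustrative names, not part of the question):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity that keeps the default reference-based Equals.
public class Item
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// Comparer that treats two items as equal when their Ids match.
public sealed class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Item obj)
    {
        return obj.Id.GetHashCode();
    }
}

public static class ItemDemo
{
    // Adds two distinct objects that share an Id and returns the resulting count.
    public static int CountAfterAddingDuplicateId()
    {
        var id = Guid.NewGuid();
        var set = new HashSet<Item>(new ItemIdComparer());
        set.Add(new Item { Id = id, Name = "first" });
        set.Add(new Item { Id = id, Name = "second" }); // Rejected: same Id.
        return set.Count;
    }
}
```

ItemDemo.CountAfterAddingDuplicateId() returns 1, while Item itself still uses reference equality everywhere else.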
You got excellent answers to your first question:
Have I done it correctly?
I will answer your second question
Does it matter that my class is an Entity Framework object?
Yes it matters a lot. Entity framework uses HashSet a lot internally. For example dynamic proxies use HashSet for representing collection navigation properties and EntityObjects use EntityCollection which in turn uses HashSet internally.
I ran into this situation today. I have an object which I'm testing for equality; the Create() method returns a subclass implementation of MyObject.
MyObject a = MyObject.Create();
MyObject b = MyObject.Create();
a == b; // is false
a.Equals(b); // is true
Note I have also over-ridden Equals() in the subclass implementation, which does a very basic check to see whether or not the passed-in object is null and is of the subclass's type. If both those conditions are met, the objects are deemed to be equal.
The other slightly odd thing is that my unit test suite does some tests similar to
Assert.AreEqual(MyObject.Create(), MyObject.Create()); // Green bar
and the expected result is observed. Therefore I guess that NUnit uses a.Equals(b) under the covers, rather than a == b as I had assumed.
Side note: I program in a mixture of .NET and Java, so I might be mixing up my expectations/assumptions here. I thought, however, that a == b worked more consistently in .NET than it did in Java where you often have to use equals() to test equality.
UPDATE Here's the implementation of Equals(), as requested:
public override bool Equals(object obj) {
return obj != null && obj is MyObjectSubclass;
}
The key difference between == and Equals is that == (like all operators) is not polymorphic, while Equals (like any virtual function) is.
By default, reference types will get identical results for == and Equals, because they both compare references. It's also certainly possible to code your operator logic and Equals logic entirely differently, though that seems nonsensical to do. The biggest gotcha comes when using the == (or any) operator at a higher level than the desired logic is declared (in other words, referencing the object as a parent class that either doesn't explicitly define the operator or defines it differently than the true class). In such cases the logic for the class that it's referenced as is used for operators, but the logic for Equals comes from whatever class the object actually is.
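A short sketch of that gotcha, using hypothetical Shape and Circle classes made up for this illustration. Circle defines value equality for both == and Equals, but only Equals survives being referenced through the base type.

```csharp
// Hypothetical types for illustration only.
public class Shape
{
}

public class Circle : Shape
{
    public int Radius { get; set; }

    public override bool Equals(object obj)
    {
        return obj is Circle other && Radius == other.Radius;
    }

    public override int GetHashCode()
    {
        return Radius.GetHashCode();
    }

    public static bool operator ==(Circle left, Circle right)
    {
        return left is null ? right is null : left.Equals(right);
    }

    public static bool operator !=(Circle left, Circle right)
    {
        return !(left == right);
    }
}

public static class OperatorDemo
{
    public static bool OperatorViaBase()
    {
        Shape a = new Circle { Radius = 1 };
        Shape b = new Circle { Radius = 1 };
        return a == b; // Bound at compile time for Shape: reference comparison.
    }

    public static bool EqualsViaBase()
    {
        Shape a = new Circle { Radius = 1 };
        Shape b = new Circle { Radius = 1 };
        return a.Equals(b); // Virtual call: Circle.Equals runs.
    }

    public static bool OperatorViaDerived()
    {
        var a = new Circle { Radius = 1 };
        var b = new Circle { Radius = 1 };
        return a == b; // Bound at compile time to Circle's operator.
    }
}
```

OperatorViaBase() returns false, while EqualsViaBase() and OperatorViaDerived() both return true.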
I want to state emphatically that, based solely upon the information in your question, there is absolutely no reason to think or assume that Equals compares values versus references. It's trivially easy to create such a class, but this is not a language specification.
Post-question-edit edit
Your implementation of Equals will return true for any non-null instance of your class. Though the syntax makes me think that you aren't, you may be confusing the is C# keyword (which confirms type) with the is keyword in VB.NET (which confirms referential equality). If that is indeed the case, then you can make an explicit reference comparison in C# by using Object.ReferenceEquals(this, obj).
In any case, this is why you are seeing true for Equals, since you're passing in a non-null instance of your class.
Incidentally, your comment about NUnit using Equals is true for the same reason; because operators are not polymorphic, there would be no way for a particular class to define custom equality behavior if the Assert function used ==.
a == b checks if they reference the same object.
a.Equals(b) compares the contents.
This is a link to a Jon Skeet article from 2004 that explains it better.
You pretty much answered your question yourself:
I have also over-ridden Equals() in the subclass implementation, which does a very basic check to see whether or not the passed-in object is null and is of the subclass's type. If both those conditions are met, the objects are deemed to be equal.
The == operator hasn't been overloaded - so it's returning false since a and b are different objects. But a.Equals is calling your override, which is presumably returning true because neither a nor b are null, and they're both of the subclass' type.
So your question was "When can a == b be false and a.Equals(b) true?" Your answer in this case is: when you explicitly code it to be so!
In Java, a == b checks whether the references of the two objects are equal (roughly, whether the two variables are aliases for the same object).
a.equals(b) compares the values represented by the two objects.
They both do the same unless they are specifically overloaded within the object to do something else.
A quote from the Jon Skeet Article mentioned elsewhere.
The Equals method is just a virtual
one defined in System.Object, and
overridden by whichever classes choose
to do so. The == operator is an
operator which can be overloaded by
classes, but which usually has
identity behaviour.
The keyword here is USUALLY. They can be written to do whatever the underlying class wishes and in no way do they have to do the same.
The "==" operator tests absolute equality (unless overloaded); that is, it tests whether two objects are the same object. That's only true if you assigned one to the other, i.e.
MyObject a = MyObject.Create();
MyObject b = a;
Just setting all the properties of two objects equal doesn't mean the objects themselves are. Under the hood, what the "==" operator is comparing is the addresses of the objects in memory. A practical effect of this is that if two objects are truly equal, changing a property on one of them will also change it on the other, whereas if they're only similar ("Equals" equal), it won't. This is perfectly consistent once you understand the principle.
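The aliasing effect described above can be sketched as follows. Box and AliasDemo are hypothetical names used only for this example.

```csharp
// Hypothetical class with a single mutable property.
public class Box
{
    public int Value { get; set; }
}

public static class AliasDemo
{
    // Two variables, one object: a change through b is visible through a.
    public static int ChangeThroughAlias()
    {
        var a = new Box { Value = 1 };
        var b = a;
        b.Value = 2;
        return a.Value;
    }

    // Two objects with the same contents: a change to b leaves a alone.
    public static int ChangeThroughCopy()
    {
        var a = new Box { Value = 1 };
        var b = new Box { Value = 1 };
        b.Value = 2;
        return a.Value;
    }
}
```

ChangeThroughAlias() returns 2 and ChangeThroughCopy() returns 1.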
I believe that a == b will check if the referenced object is the same.
Usually to see if the value is the same a.Equals(b) is used (this often needs to be overridden in order to work).
Why use one over the other?
== is the identity test. It will return true if the two objects being tested are in fact the same object. Equals() performs an equality test, and will return true if the two objects consider themselves equal.
Identity testing is faster, so you can use it when there's no need for more expensive equality tests. For example, comparing against null or the empty string.
It's possible to overload either of these to provide different behavior -- like identity testing for Equals() --, but for the sake of anybody reading your code, please don't.
Pointed out below: some types like String or DateTime provide overloads for the == operator that give it equality semantics. So the exact behavior will depend on the types of the objects you are comparing.
See also:
http://blogs.msdn.com/csharpfaq/archive/2004/03/29/102224.aspx
@John Millikin:
Pointed out below: some value types like DateTime provide overloads for the == operator that give it equality semantics. So the exact behavior will depend on the types of the objects you are comparing.
To elaborate:
DateTime is implemented as a struct. All structs are children of System.ValueType.
Since a variable of a value type holds the value itself rather than a reference to an object on the heap, there is no reference to check, and you must compare objects by value only.
System.ValueType overrides .Equals() to use a reflection-based equality check (in the general case) that compares each field's value. It does not define ==; a struct only supports == if it overloads the operator itself, as DateTime does.
Because reflection is somewhat slow, if you implement your own struct, it is important to override .Equals() and add your own value checking code, as this will be much faster. Don't just call base.Equals();
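A minimal sketch of such a struct, assuming a hypothetical GridPoint type with two int fields (the name and members are invented for illustration):

```csharp
using System;

// Hypothetical struct; the name and members are assumptions for illustration.
public readonly struct GridPoint : IEquatable<GridPoint>
{
    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    // Typed comparison: no boxing and no reflection.
    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    // Untyped overload defers to the typed one instead of base.Equals().
    public override bool Equals(object obj)
    {
        return obj is GridPoint other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }

    // A struct only supports == and != if it declares them.
    public static bool operator ==(GridPoint left, GridPoint right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(GridPoint left, GridPoint right)
    {
        return !left.Equals(right);
    }
}
```

Implementing IEquatable<GridPoint> also lets generic collections such as List<T> and Dictionary<TKey, TValue> call the typed Equals directly.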
Everyone else pretty much has you covered, but I have one more word of advice. Every now and again, you will get someone who swears on his life (and those of his loved ones) that .Equals is more efficient/better/best-practice or some other dogmatic line. I can't speak to efficiency (well, OK, in certain circumstances I can), but I can speak to a big issue which will crop up: .Equals requires an object to exist. (Sounds stupid, but it throws people off.)
You can't do the following:
StringBuilder sb = null;
if (sb.Equals(null))
{
// whatever
}
It seems obvious to me, and perhaps most people, that you will get a NullReferenceException. However, proponents of .Equals forget about that little factoid. Some are even "thrown" off (sorry, couldn't resist) when they see the NullRefs start to pop up.
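For comparison, here is a sketch of three ways to run the same check without dereferencing the null reference. NullCheckDemo is a hypothetical wrapper used only to keep the example self-contained; the calls themselves are ordinary framework members.

```csharp
using System.Text;

public static class NullCheckDemo
{
    // The == operator never dereferences its operands.
    public static bool WithOperator()
    {
        StringBuilder sb = null;
        return sb == null;
    }

    // The static Object.Equals(a, b) checks for null before calling a.Equals(b).
    public static bool WithStaticEquals()
    {
        StringBuilder sb = null;
        return object.Equals(sb, null);
    }

    // ReferenceEquals compares the references only.
    public static bool WithReferenceEquals()
    {
        StringBuilder sb = null;
        return object.ReferenceEquals(sb, null);
    }
}
```

All three methods return true, and none of them throws.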
(And years before the DailyWTF posting, I did actually work with someone who mandated that all equality checks be .Equals instead of ==. Even proving his inaccuracy didn't help. We just made damn sure to break all his other rules so that no reference returned from a method nor property was ever null, and it worked out in the end.)
== is generally the "identity" equals meaning "object a is in fact the exact same object in memory as object b".
equals() means that the objects logically equal (say, from a business point of view). So if you are comparing instances of a user-defined class, you would generally need to use and define equals() if you want things like a Hashtable to work properly.
If you had the proverbial Person class with properties "Name" and "Address" and you wanted to use this Person as a key into a Hashtable containing more information about them, you would need to implement equals() (and hash) so that you could create an instance of a Person and use it as a key into the Hashtable to get the information.
Using == alone, your new instance would not be the same.
According to MSDN:
In C#, there are two different kinds of equality: reference equality (also known as identity) and value equality. Value equality is the generally understood meaning of equality: it means that two objects contain the same values. For example, two integers with the value of 2 have value equality. Reference equality means that there are not two objects to compare. Instead, there are two object references and both of them refer to the same object.
...
By default, the operator == tests for reference equality by determining whether two references indicate the same object.
Both Equals and == can be overloaded, so the exact results of calling one or the other will vary. Note that == is determined at compile time, so while the actual implementation could change, which == is used is fixed at compile time, unlike Equals which could use a different implementation based on the run time type of the left side.
For instance string performs an equality test for ==.
Also note that the semantics of both can be complex.
Best practice is to implement equality like this example. Note that you can simplify or exclude all of this depending on how you plan on using your class, and that structs get most of this already.
class ClassName
{
    public bool Equals(ClassName other)
    {
        // ReferenceEquals avoids calling the overloaded == below, which would recurse.
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        else
        {
            //Do your equality test here.
            return true; // Placeholder: compare the relevant members instead.
        }
    }
    public override bool Equals(object obj)
    {
        ClassName other = obj as ClassName; //Null and non-ClassName objects will both become null
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        else
        {
            return Equals(other);
        }
    }
    // Operators must be declared public static.
    public static bool operator ==(ClassName left, ClassName right)
    {
        if (ReferenceEquals(left, null))
        {
            return ReferenceEquals(right, null);
        }
        else
        {
            return left.Equals(right);
        }
    }
    public static bool operator !=(ClassName left, ClassName right)
    {
        return !(left == right);
    }
    public override int GetHashCode()
    {
        //Return something useful here, typically all members shifted or XORed together works
        return 0; // Placeholder: must agree with the equality test above.
    }
}
Another thing to take into consideration: the == operator may not be callable or may have different meaning if you access the object from another language. Usually, it's better to have an alternative that can be called by name.
The example works because the DateTime struct implements the IEquatable<DateTime> interface, which provides a "type-specific method for determining equality of instances" according to MSDN.
Use Equals if you want to express that the contents of the compared objects should be equal. Use == for primitive values, or if you want to check that the objects being compared are one and the same object. For objects, == checks whether the references point to the same address.
I have seen Object.ReferenceEquals() used in cases where one wants to know if two references refer to the same object.
In most cases, they are the same, so you should use == for clarity. According to the Microsoft Framework Design Guidelines:
"DO ensure that Object.Equals and the equality operators have exactly the same semantics and similar performance characteristics."
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/equality-operators
But sometimes, someone will override Object.Equals without providing equality operators. In that case, you should use Equals to test for value equality, and Object.ReferenceEquals to test for reference equality.
If you disassemble Object (with dotPeek, for example), you will find that
public virtual bool Equals(Object obj)
is described as:
// Returns a boolean indicating if the passed in object obj is
// Equal to this. Equality is defined as object equality for reference
// types and bitwise equality for value types using a loader trick to
// replace Equals with EqualsValue for value types).
//
So it depends on the type.
For example:
Object o1 = "vvv";
Object o2 = "vvv";
bool b = o1.Equals(o2);
o1 = 555;
o2 = 555;
b = o1.Equals(o2);
o1 = new List<int> { 1, 2, 3 };
o2 = new List<int> { 1, 2, 3 };
b = o1.Equals(o2);
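The first time b is true (String is a reference type, but it overrides Equals to compare contents), the second time b is true (the boxed Int32 values are compared by value), and the third time b is false (List<T> does not override Equals, so the references are compared).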