Today I stumbled upon an interesting bug I wrote. I have a set of properties which can be set through a general setter. These properties can be value types or reference types.
public void SetValue( TEnum property, object value )
{
if ( _properties[ property ] != value )
{
// Only come here when the new value is different.
}
}
When writing a unit test for this method I found out the condition is always true for value types. It didn't take me long to figure out this is due to boxing/unboxing. It didn't take me long either to adjust the code to the following:
public void SetValue( TEnum property, object value )
{
if ( !_properties[ property ].Equals( value ) )
{
// Only come here when the new value is different.
}
}
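To see the boxing effect in isolation, here is a minimal sketch. The class and method names (BoxingDemo, ReferenceCompare, ValueCompare) are made up for illustration and are not part of the original code; it only assumes that both values reach the comparison typed as object, as they do in SetValue.

```csharp
using System;

static class BoxingDemo
{
    // == on two object-typed operands compares references only.
    public static bool ReferenceCompare(object a, object b) => a == b;

    // The static object.Equals is null-safe and dispatches to the virtual Equals.
    public static bool ValueCompare(object a, object b) => object.Equals(a, b);

    static void Main()
    {
        object x = 42; // boxed into one heap object
        object y = 42; // boxed into a second, distinct heap object

        Console.WriteLine(ReferenceCompare(x, y)); // False
        Console.WriteLine(ValueCompare(x, y));     // True
    }
}
```

Each assignment to an object variable creates its own box, so the reference comparison sees two different objects even though the values match.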
The thing is I'm not entirely satisfied with this solution. I'd like to keep a simple reference comparison, unless the value is boxed.
The current solution I am thinking of is to call Equals() only for boxed values. Checking whether a value is boxed seems a bit overkill. Isn't there an easier way?
If you need different behaviour when you're dealing with a value-type then you're obviously going to need to perform some kind of test. You don't need an explicit check for boxed value-types, since all value-types will be boxed** due to the parameter being typed as object.
This code should meet your stated criteria: If value is a (boxed) value-type then call the polymorphic Equals method, otherwise use == to test for reference equality.
public void SetValue(TEnum property, object value)
{
bool equal = ((value != null) && value.GetType().IsValueType)
? value.Equals(_properties[property])
: (value == _properties[property]);
if (!equal)
{
// Only come here when the new value is different.
}
}
( ** And, yes, I know that Nullable<T> is a value-type with its own special rules relating to boxing and unboxing, but that's pretty much irrelevant here.)
Equals() is generally the preferred approach.
The default implementation of .Equals() does a simple reference comparison for reference types, so in most cases that's what you'll be getting. Equals() might have been overridden to provide some other behavior, but if someone has overridden .Equals() in a class it's because they want to change the equality semantics for that type, and it's better to let that happen if you don't have a compelling reason not to. Bypassing it by using == can lead to confusion when your class sees two things as different when every other class agrees that they're the same.
Since the input parameter's type is object, you will always get a boxed value inside the method's context.
I think your only chance is to change the method's signature and to write different overloads.
How about this:
if(object.ReferenceEquals(first, second)) { return; }
if(first.Equals(second)) { return; }
// they must differ, right?
Update
I realized this doesn't work as expected for a certain case:
For value types, ReferenceEquals returns false so we fall back to Equals, which behaves as expected.
For reference types where ReferenceEquals returns true, we consider them "same" as expected.
For reference types where ReferenceEquals returns false and Equals returns false, we consider them "different" as expected.
For reference types where ReferenceEquals returns false and Equals returns true, we consider them "same" even though we want "different"
So the lesson is "don't get clever"
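The problematic last case can be reproduced with two distinct string instances that hold the same characters. This is only a sketch; SameOrDifferentDemo and ConsideredSame are hypothetical names that wrap the two checks shown above.

```csharp
using System;

static class SameOrDifferentDemo
{
    // Mirrors the two-step check: reference test first, then Equals.
    public static bool ConsideredSame(object first, object second)
    {
        if (ReferenceEquals(first, second)) return true;
        return first != null && first.Equals(second);
    }

    static void Main()
    {
        // new string(...) always allocates, so these are two separate objects.
        object a = new string('x', 3);
        object b = new string('x', 3);

        Console.WriteLine(ReferenceEquals(a, b));  // False
        Console.WriteLine(ConsideredSame(a, b));   // True
    }
}
```

If "different reference" is supposed to mean "different value" for reference types, the second line is the unwanted result.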
I suppose
I'd like to keep a simple reference comparison, unless the value is boxed.
is somewhat equivalent to
If the value is boxed, I'll do a non-"simple reference comparison".
This means the first thing you'll need to do is to check whether the value is boxed or not.
If there exists a method to check whether an object is a boxed value type or not, it should be at least as complex as that "overkill" method you provided the link to unless that is not the simplest way. Nonetheless, there should be a "simplest way" to determine if an object is a boxed value type or not. It's unlikely that this "simplest way" is simpler than simply using the object Equals() method, but I've bookmarked this question to find out just in case.
(not sure if I was logical)
Failing to override GetHashCode and Equals when overloading the equality operator causes the compiler to produce warnings. Why would it be a good idea to change the implementation of either? After reading Eric Lippert's blog post on GetHashCode, it seems like there probably aren't many useful alternatives to GetHashCode's base implementation, so why does the compiler encourage you to change it?
Let's suppose you are implementing a class.
If you are overloading == then you are producing a type that has value equality as opposed to reference equality.
Given that, now the question is "how desirable is it to have a class that implements reference equality in .Equals() and value equality in ==?" and the answer is "not very desirable". That seems like a potential source of confusion. (And in fact, the company that I now work for, Coverity, produces a defect discovery tool that checks to see if you are confusing value equality with reference equality for precisely this reason. Coincidentally I was just reading the spec for it when I saw your question!)
Moreover, if you are going to have a class that implements both value and reference equality, the usual way to do it is to override Equals and leave == alone, not the other way around.
Therefore, given that you have overloaded ==, it is strongly suggested that you also override Equals.
If you are overriding Equals to produce value equality then you are required to override GetHashCode to match, as you know if you've read my article that you linked to.
If you don't override Equals() when you overload == you will have some amazingly bad code.
How would you feel about this happening?
if (x == y)
{
if (!x.Equals(y))
throw new InvalidOperationException("Wut?");
}
Here's an example. Given this class:
class Test
{
public int Value;
public string Name;
public static bool operator==(Test lhs, Test rhs)
{
if (ReferenceEquals(lhs, rhs))
return true;
if (ReferenceEquals(lhs, null) || ReferenceEquals(rhs, null))
return false;
return lhs.Value == rhs.Value;
}
public static bool operator!=(Test lhs, Test rhs)
{
return !(lhs == rhs);
}
}
This code will behave oddly:
Test test1 = new Test { Value = 1, Name = "1" };
Test test2 = new Test { Value = 1, Name = "2" };
if (test1 == test2)
Console.WriteLine("test1 == test2"); // This gets printed.
else
Console.WriteLine("test1 != test2");
if (test1.Equals(test2))
Console.WriteLine("test1.Equals(test2)");
else
Console.WriteLine("NOT test1.Equals(test2)"); // This gets printed!
You do NOT want this!
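One way to remove the inconsistency is to route the operator, Equals and GetHashCode through the same comparison. The sketch below assumes, like the operator above, that Value alone defines equality; ConsistentTest and ConsistentTestDemo are hypothetical names, not part of the original example.

```csharp
using System;

sealed class ConsistentTest : IEquatable<ConsistentTest>
{
    public int Value;
    public string Name;

    // Single source of truth for equality: only Value is compared.
    public bool Equals(ConsistentTest other) =>
        !ReferenceEquals(other, null) && Value == other.Value;

    public override bool Equals(object obj) => Equals(obj as ConsistentTest);

    // Must agree with Equals. Value is mutable here only to mirror the
    // original class; do not change it while the object is in a hash table.
    public override int GetHashCode() => Value.GetHashCode();

    public static bool operator ==(ConsistentTest lhs, ConsistentTest rhs) =>
        ReferenceEquals(lhs, null) ? ReferenceEquals(rhs, null) : lhs.Equals(rhs);

    public static bool operator !=(ConsistentTest lhs, ConsistentTest rhs) =>
        !(lhs == rhs);
}

static class ConsistentTestDemo
{
    static void Main()
    {
        var test1 = new ConsistentTest { Value = 1, Name = "1" };
        var test2 = new ConsistentTest { Value = 1, Name = "2" };

        Console.WriteLine(test1 == test2);                               // True
        Console.WriteLine(test1.Equals(test2));                          // True
        Console.WriteLine(test1.GetHashCode() == test2.GetHashCode());   // True
    }
}
```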
My guess is that the compiler takes its clues from your actions, and decides that since you find it important to provide an alternative implementation of the equality operator, then you probably want the object equality to remain consistent with your new implementation of ==. After all, you do not want the two equality comparisons to mean drastically different things, otherwise your program would be hard to understand even on a very basic level. Therefore, the compiler thinks that you should redefine Equals as well.
Once you provide an alternative implementation Equals, however, you need to modify GetHashCode to stay consistent with the equality implementation. Hence the compiler warns you that your implementation might be incomplete, and suggests overriding both Equals and GetHashCode.
If you don't override the Equals method too, then using it might give different results from the ones you'd get with the operator. Like, if you overloaded == for integers...
int i = 1;
(1 == 1) == (i.Equals(1))
Could evaluate to false.
For the same reason, you should reimplement the GetHashCode method so you don't mess up with hashtables and such other structures that rely on hash comparisons.
Notice I'm saying "might" and "could", not "will". The warnings are there just as a reminder that unexpected things might happen if you don't follow its suggestions. Otherwise you'd get errors instead of warnings.
The documentation is pretty clear about this:
The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.
Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the System.Collections.IEqualityComparer interface.
I overrode the Equals() of my class to compare ID values of type Guid.
Then Visual Studio warned:
... overrides Object.Equals(object o) but
does not override Object.GetHashCode()
So I then also overrode its GetHashCode() like this:
public partial class SomeClass
{
public override bool Equals(Object obj)
{
//Check for null and compare run-time types.
if (obj == null || this.GetType() != obj.GetType()) return false;
return this.Id == ((SomeClass)obj).Id;
}
public override int GetHashCode()
{
return this.Id.GetHashCode();
}
}
It seems to work. Have I done this correctly? Remember Id is of type Guid. Does it matter that my class is an Entity Framework object?
As others have said, the use of Reflection in Equals seems dodgy. Leaving that aside, let's concentrate on GetHashCode.
The primary rule for GetHashCode that you must not violate is if two objects are equal then they must both have the same hash code. Or, an equivalent way of saying that is if two objects have different hash codes then they must be unequal. Your implementation looks good there.
You are free to violate the converse. That is, if two objects have the same hash code then they are permitted to be equal or unequal, as you see fit.
I am assuming that "Id" is an immutable property. If "Id" can change over the lifetime of the object then you can have problems when putting the object in a hash table. Consider ensuring that only immutable properties are used in computing equality and hash code.
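To illustrate the hash table problem, here is a small sketch with a deliberately mutable key. MutableKey and MutableKeyDemo are hypothetical, and an int Id is used instead of a Guid only to keep the hash codes predictable.

```csharp
using System;
using System.Collections.Generic;

sealed class MutableKey
{
    public int Id; // mutable on purpose, to show what goes wrong

    public override bool Equals(object obj) =>
        obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id.GetHashCode();
}

static class MutableKeyDemo
{
    // Adds a key, mutates it, then asks the set whether it still contains it.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // the set stored the entry under the old hash code

        return set.Contains(key);
    }

    static void Main()
    {
        Console.WriteLine(FoundAfterMutation()); // False
    }
}
```

The object is still in the set, but it can no longer be found through it, because the lookup uses the new hash code.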
Your implementation looks good but the fact that you are asking the question indicates that you might not have a solid grasp of all the subtle factors that go into building an implementation of GetHashCode. A good place to start is my article on the subject:
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
It looks correct to me. Whenever I do something like this, I usually also implement IEquatable<T> so that comparisons between variables of the same compile-time type will be a little more efficient.
public partial class SomeClass : IEquatable<SomeClass>
{
public override bool Equals(Object obj)
{
return Equals(obj as SomeClass);
}
public bool Equals(SomeClass obj)
{
if (obj == null)
return false;
return Id == obj.Id;
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
This structure also allows a more derived object with the same Id to compare as equal to a less derived object. If this is not the desired behavior, then you will have to also compare the types as you do in the question.
if (obj.GetType() != typeof(SomeClass)) return false;
Since you're not dealing with a sealed class, I'd recommend against checking for class equality with this.GetType() != obj.GetType(). Any sub-class of SomeClass should be able to participate in Equals also, so you might want to use this instead:
if (obj as SomeClass == null) return false;
Traditionally Equals is implemented in such a way that two objects will only be "Equal" if they are exactly the same in every way. For example, if you have two objects that represent the same object in the database, but where one has a different Name property than the other, the objects aren't considered "Equal", and should avoid producing the same "Hashcode" if possible.
It is better to err on the side of "not equal" than to risk calling two objects equal that aren't. This is why the default implementation for objects uses the memory location of the object itself: no two objects will ever be considered "equal" unless they are exactly the same object. So I'd say unless you want to write both GetHashCode and Equals in such a way that they check for equality of all their properties, you're better off not overriding either method.
If you have a data structure (like a HashSet) where you specifically want to determine equality based on the ID value, you can provide a specific IEqualityComparer implementation to that data structure.
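A comparer of that kind could look roughly like the sketch below. Entity and ByIdComparer are hypothetical names standing in for your own class; the only assumption is that the class exposes a Guid Id.

```csharp
using System;
using System.Collections.Generic;

sealed class Entity
{
    public Guid Id;
    public string Name;
}

// Compares entities by Id only, without touching Entity.Equals/GetHashCode.
sealed class ByIdComparer : IEqualityComparer<Entity>
{
    public bool Equals(Entity x, Entity y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Entity obj) => obj.Id.GetHashCode();
}

static class ByIdComparerDemo
{
    static void Main()
    {
        var id = Guid.NewGuid();
        var a = new Entity { Id = id, Name = "first" };
        var b = new Entity { Id = id, Name = "second" };

        var set = new HashSet<Entity>(new ByIdComparer());
        Console.WriteLine(set.Add(a));   // True
        Console.WriteLine(set.Add(b));   // False, same Id
        Console.WriteLine(a.Equals(b));  // False, default reference equality
    }
}
```

The class itself keeps its default equality, and only the collections that are given the comparer treat two objects with the same Id as one.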
You got excellent answers to your first question:
Have I done it correctly?
I will answer your second question
Does it matter that my class is an Entity Framework object?
Yes it matters a lot. Entity framework uses HashSet a lot internally. For example dynamic proxies use HashSet for representing collection navigation properties and EntityObjects use EntityCollection which in turn uses HashSet internally.
I was wondering why Nullable<T> is a value type, if it is designed to mimic the behavior of reference types? I understand things like GC pressure, but I don't feel convinced - if we want to have int acting like reference, we are probably OK with all the consequences of having real reference type. I can see no reason why Nullable<T> is not just boxed version of T struct.
As value type:
it still needs to be boxed and unboxed, and more, boxing must be a bit different than with "normal" structs (to treat null-valued nullable like real null)
it needs to be treated differently when checking for null (done simply in Equals, no real problem)
it is mutable, breaking the rule that structs should be immutable (ok, it is logically immutable)
it needs to have special restriction to disallow recursion like Nullable<Nullable<T>>
Doesn't making Nullable<T> a reference type solve those issues?
rephrased and updated:
I've modified my reason list a bit, but my general question is still open:
How will reference type Nullable<T> be worse than current value type implementation? Is it only GC pressure and "small, immutable" rule? It still feels strange for me...
The reason is that it was not designed to act like a reference type. It was designed to act like a value type, except in just one particular. Let's look at some ways value types and reference types differ.
The main difference between a value type and a reference type is that a value type is self-contained (the variable contains the actual value), while a reference type refers to another value.
Some other differences are entailed by this. The fact that we can alias reference types directly (which has both good and bad effects) comes from this. So too do differences in what equality means:
A value type has a concept of equality based on the value contained, which can optionally be redefined (there are logical restrictions on how this redefinition can happen*). A reference type has a concept of identity that is meaningless with value types (as they cannot be directly aliased, so two such values cannot be identical) and that cannot be redefined, which also gives the default for its concept of equality. By default, == deals with this value-based equality when it comes to value types†, but with identity when it comes to reference types. Also, even when a reference type is given a value-based concept of equality, and has it used for ==, it never loses the ability to be compared to another reference for identity.
Another difference entailed by this is that reference types can be null - a value that refers to another value allows for a value that doesn't refer to any value, which is what a null reference is.
Also, some of the advantages of keeping value-types small relate to this, since being based on value, they are copied by value when passed to functions.
Some other differences are implied but not entailed by this. That it's often a good idea to make value types immutable is implied but not entailed by the core difference because while there are advantages to be found without considering implementation matters, there are also advantages in doing so with reference types (indeed some relating to safety with aliases apply more immediately to reference types) and reasons why one may break this guideline - so it's not a hard and fast rule (with nested value types the risks involved are so heavily reduced that I would have few qualms in making a nested value type mutable, even though my style leans heavily to making even reference types immutable when at all practical).
Some further differences between value types and reference types are arguably implementation details. That a value type in a local variable has the value stored on the stack has been argued as an implementation detail; probably a pretty obvious one if your implementation has a stack, and certainly an important one in some cases, but not core to the definition. It's also often overstated (for a start, a reference type in a local variable also has the reference itself in the stack, for another there are plenty of times when a value type value is stored in the heap).
Some further advantages in value types being small relate to this.
Now, Nullable<T> is a type that behaves like a value type in all the ways described above, except that it can take a null value. Maybe the matter of local values being stored on the stack isn't all that important (being more an implementation detail than anything else), but the rest is inherent to how it is defined.
Nullable<T> is defined as
struct Nullable<T>
{
private bool hasValue;
internal T value;
/* methods and properties I won't go into here */
}
Most of the implementation from this point is obvious. Some special handling is needed to allow null to be assigned to it - treated as if default(Nullable<T>) had been assigned - and some special handling when boxed, and then the rest follows (including that it can be compared for equality with null).
If Nullable<T> was a reference type, then we'd have to have special handling to allow for all the rest to occur, along with special handling for features in how .NET helps the developer (such as we'd need special handling to make it descend from ValueType). I'm not even sure if it would be possible.
*There are some restrictions on how we are allowed to redefine equality. Combining those rules with those used in the defaults, then generally we can allow for two values to be considered equal that would be considered unequal by default, but it rarely makes sense to consider two values unequal that the default would consider equal. An exception is the case where a struct contains only value-types, but where said value-types redefine equality. This is the result of an optimisation, and generally considered a bug rather than by design.
†An exception is floating-point types. Because of the definition of value-types in the CLI standard, double.NaN.Equals(double.NaN) and float.NaN.Equals(float.NaN) return true. But because of the definition of NaN in IEC 60559 (IEEE 754), float.NaN == float.NaN and double.NaN == double.NaN both return false.
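The NaN footnote is easy to check with a few lines. This is just a sketch; NaNDemo and its two helper methods are hypothetical names.

```csharp
using System;

static class NaNDemo
{
    // The == operator follows IEEE 754: NaN is not equal to anything.
    public static bool OperatorEquals(double a, double b) => a == b;

    // Double.Equals treats two NaN values as equal.
    public static bool MethodEquals(double a, double b) => a.Equals(b);

    static void Main()
    {
        Console.WriteLine(OperatorEquals(double.NaN, double.NaN)); // False
        Console.WriteLine(MethodEquals(double.NaN, double.NaN));   // True
    }
}
```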
Edited to address the updated question...
You can box and unbox objects if you want to use a struct as a reference.
However, the Nullable<> type basically lets you enhance any value type with an additional state flag which tells whether the value shall be used as null or if the struct is "valid".
So to address your questions:
This is an advantage when used in collections, or because of the different semantics (copying instead of referencing)
No it doesn't. The CLR does respect this when boxing and unboxing, so that you actually never box a Nullable<> instance. Boxing a Nullable<> which "has" no value will return a null reference, and unboxing does the opposite.
Nope.
Again, this isn't the case. In fact generic constraints for a struct do not allow nullable structs to be used. This makes sense due to the special boxing/unboxing behavior. Therefore, if you have a where T: struct to constrain a generic type, nullable types will be disallowed. Since this constraint is defined on the Nullable<T> type as well, you cannot nest them, without any special treatment to prevent this.
Why not use references? I already mentioned the important semantic differences. But apart from this, reference types use much more memory space: each reference, especially in 64-bit environments, uses up not only heap memory for the instance, but also memory for the reference, the instance type information, locking bits etc. So, apart from the semantic and performance differences (indirection via reference), you end up using a multiple of the memory used for the entity itself for most common entities. And the GC gets more objects to handle, which will make the total performance compared to structs even worse.
It is not mutable; check again.
The boxing is different too; an empty Nullable<T> "boxes" to null.
But; it is small (barely bigger than T), immutable, and encapsulates only structs - ideal as a struct. Perhaps more importantly, so long as T is truly a "value", then so is T? a logical "value".
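The special boxing behaviour can be observed directly. A minimal sketch, with NullableBoxingDemo and Box as hypothetical names:

```csharp
using System;

static class NullableBoxingDemo
{
    // Converting int? to object boxes it using the special Nullable<T> rules.
    public static object Box(int? value) => value;

    static void Main()
    {
        // An empty Nullable<int> boxes to a plain null reference.
        Console.WriteLine(Box(null) == null);                 // True

        // A Nullable<int> with a value boxes as the underlying int.
        Console.WriteLine(Box(5).GetType() == typeof(int));   // True

        // Unboxing a null reference to int? gives an empty Nullable<int>.
        int? roundTrip = (int?)Box(null);
        Console.WriteLine(roundTrip.HasValue);                // False
    }
}
```

So a boxed Nullable<int> never exists as such: the box is either null or a boxed int.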
I coded MyNullable as a class.
I can't really understand why it cannot be a class, besides avoiding heap memory pressure.
namespace ClassLibrary1
{
using NFluent;
using NUnit.Framework;
[TestFixture]
class MyNullableShould
{
[Test]
public void operator_equals_btw_nullable_and_value_works()
{
var myNullable = new MyNullable<int>(1);
Check.That(myNullable == 1).IsEqualTo(true);
Check.That(myNullable == 2).IsEqualTo(false);
}
[Test]
public void Can_be_comparedi_with_operator_equal_equals()
{
var myNullable = new MyNullable<int>(1);
var myNullable2 = new MyNullable<int>(1);
Check.That(myNullable == myNullable2).IsTrue();
Check.That(myNullable == myNullable2).IsTrue();
var myNullable3 = new MyNullable<int>(2);
Check.That(myNullable == myNullable3).IsFalse();
}
}
}
namespace ClassLibrary1
{
using System;
public class MyNullable<T> where T : struct
{
internal T value;
public MyNullable(T value)
{
this.value = value;
this.HasValue = true;
}
public bool HasValue { get; }
public T Value
{
get
{
if (!this.HasValue) throw new Exception("Cannot grab value when has no value");
return this.value;
}
}
public static explicit operator T(MyNullable<T> value)
{
return value.Value;
}
public static implicit operator MyNullable<T>(T value)
{
return new MyNullable<T>(value);
}
public static bool operator ==(MyNullable<T> n1, MyNullable<T> n2)
{
if (!n1.HasValue) return !n2.HasValue;
if (!n2.HasValue) return false;
return Equals(n1.value, n2.value);
}
public static bool operator !=(MyNullable<T> n1, MyNullable<T> n2)
{
return !(n1 == n2);
}
public override bool Equals(object other)
{
if (!this.HasValue) return other == null;
if (other == null) return false;
return this.value.Equals(other);
}
public override int GetHashCode()
{
return this.HasValue ? this.value.GetHashCode() : 0;
}
public T GetValueOrDefault()
{
return this.value;
}
public T GetValueOrDefault(T defaultValue)
{
return this.HasValue ? this.value : defaultValue;
}
public override string ToString()
{
return this.HasValue ? this.value.ToString() : string.Empty;
}
}
}
Why use one over the other?
== is the identity test. It will return true if the two objects being tested are in fact the same object. Equals() performs an equality test, and will return true if the two objects consider themselves equal.
Identity testing is faster, so you can use it when there's no need for more expensive equality tests. For example, comparing against null or the empty string.
It's possible to overload either of these to provide different behavior -- like identity testing for Equals() --, but for the sake of anybody reading your code, please don't.
Pointed out below: some types like String or DateTime provide overloads for the == operator that give it equality semantics. So the exact behavior will depend on the types of the objects you are comparing.
See also:
http://blogs.msdn.com/csharpfaq/archive/2004/03/29/102224.aspx
#John Millikin:
Pointed out below: some value types like DateTime provide overloads for the == operator that give it equality semantics. So the exact behavior will depend on the types of the objects you are comparing.
To elaborate:
DateTime is implemented as a struct. All structs are children of System.ValueType.
Since a variable of a type derived from System.ValueType holds the value itself rather than a reference to an object on the heap, there is no reference to check, so you must compare such values by value only.
System.ValueType overrides .Equals() to use a reflection-based equality check that compares each field's value. It does not define ==; that operator is only available on a struct that overloads it, as DateTime does.
Because reflection is somewhat slow, if you implement your own struct, it is important to override .Equals() and add your own value checking code, as this will be much faster. Don't just call base.Equals();
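A hand-written struct comparison might look like the following sketch. Point is a hypothetical two-field struct used only for illustration, and the hash combination is one common choice rather than the only one.

```csharp
using System;

readonly struct Point : IEquatable<Point>
{
    public Point(int x, int y) { X = x; Y = y; }

    public int X { get; }
    public int Y { get; }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    // Replaces the ValueType.Equals default instead of calling base.Equals.
    public override bool Equals(object obj) => obj is Point other && Equals(other);

    public override int GetHashCode() => unchecked((X * 397) ^ Y);

    // Structs get no == by default, so it is defined explicitly.
    public static bool operator ==(Point left, Point right) => left.Equals(right);
    public static bool operator !=(Point left, Point right) => !left.Equals(right);
}

static class PointDemo
{
    static void Main()
    {
        var a = new Point(1, 2);
        var b = new Point(1, 2);

        Console.WriteLine(a.Equals(b));          // True
        Console.WriteLine(a == b);               // True
        Console.WriteLine(a == new Point(2, 1)); // False
    }
}
```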
Everyone else pretty much has you covered, but I have one more word of advice. Every now and again, you will get someone who swears on his life (and those of his loved ones) that .Equals is more efficient/better/best-practice or some other dogmatic line. I can't speak to efficiency (well, OK, in certain circumstances I can), but I can speak to a big issue which will crop up: .Equals requires an object to exist. (Sounds stupid, but it throws people off.)
You can't do the following:
StringBuilder sb = null;
if (sb.Equals(null))
{
// whatever
}
It seems obvious to me, and perhaps most people, that you will get a NullReferenceException. However, proponents of .Equals forget about that little factoid. Some are even "thrown" off (sorry, couldn't resist) when they see the NullRefs start to pop up.
(And years before the DailyWTF posting, I did actually work with someone who mandated that all equality checks be .Equals instead of ==. Even proving his inaccuracy didn't help. We just made damn sure to break all his other rules so that no reference returned from a method nor property was ever null, and it worked out in the end.)
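If you want Equals semantics without the risk of a NullReferenceException, the static object.Equals(a, b) checks for null before calling the instance method. A small sketch, with NullSafeEqualsDemo and SafeEquals as hypothetical names:

```csharp
using System;
using System.Text;

static class NullSafeEqualsDemo
{
    // object.Equals(a, b) returns true when both are null, false when only
    // one is null, and otherwise calls a.Equals(b).
    public static bool SafeEquals(object a, object b) => object.Equals(a, b);

    static void Main()
    {
        StringBuilder sb = null;

        Console.WriteLine(SafeEquals(sb, null));                // True
        Console.WriteLine(SafeEquals(sb, new StringBuilder())); // False
        Console.WriteLine(sb == null);                          // True
    }
}
```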
== is generally the "identity" equals meaning "object a is in fact the exact same object in memory as object b".
equals() means that the objects are logically equal (say, from a business point of view). So if you are comparing instances of a user-defined class, you would generally need to use and define equals() if you want things like a Hashtable to work properly.
If you had the proverbial Person class with properties "Name" and "Address" and you wanted to use this Person as a key into a Hashtable containing more information about them, you would need to implement equals() (and hash) so that you could create an instance of a Person and use it as a key into the Hashtable to get the information.
Using == alone, your new instance would not be the same.
According to MSDN:
In C#, there are two different kinds of equality: reference equality (also known as identity) and value equality. Value equality is the generally understood meaning of equality: it means that two objects contain the same values. For example, two integers with the value of 2 have value equality. Reference equality means that there are not two objects to compare. Instead, there are two object references and both of them refer to the same object.
...
By default, the operator == tests for reference equality by determining whether two references indicate the same object.
Both Equals and == can be overloaded, so the exact results of calling one or the other will vary. Note that == is determined at compile time, so while the actual implementation could change, which == is used is fixed at compile time, unlike Equals which could use a different implementation based on the run time type of the left side.
For instance string performs an equality test for ==.
Also note that the semantics of both can be complex.
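The compile-time binding of == can be seen by comparing the same two strings through different static types. This is a sketch; OperatorBindingDemo and its methods are hypothetical names.

```csharp
using System;

static class OperatorBindingDemo
{
    // Static type is string, so the compiler picks string's overloaded ==.
    public static bool AsStrings(string a, string b) => a == b;

    // Static type is object, so the compiler picks reference comparison,
    // even when both arguments are strings at run time.
    public static bool AsObjects(object a, object b) => a == b;

    static void Main()
    {
        // Two distinct instances with the same contents.
        string s1 = new string('a', 3);
        string s2 = new string('a', 3);

        Console.WriteLine(AsStrings(s1, s2));      // True
        Console.WriteLine(AsObjects(s1, s2));      // False
        Console.WriteLine(s1.Equals((object)s2));  // True, virtual dispatch
    }
}
```

Equals gives the same answer regardless of the static type because the override is chosen at run time.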
Best practice is to implement equality like this example. Note that you can simplify or exclude all of this depending on how you plan on using your class, and that structs get most of this already.
class ClassName
{
    public bool Equals(ClassName other)
    {
        // ReferenceEquals avoids calling the overloaded == operator below.
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        else
        {
            // Do your equality test here. This placeholder treats all instances as equal.
            return true;
        }
    }
    public override bool Equals(object obj)
    {
        ClassName other = obj as ClassName; // Null and non-ClassName objects will both become null
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        else
        {
            return Equals(other);
        }
    }
    // Operators must be declared public static.
    public static bool operator ==(ClassName left, ClassName right)
    {
        // Using left == null here would call this operator again and recurse forever.
        if (ReferenceEquals(left, null))
        {
            return ReferenceEquals(right, null);
        }
        else
        {
            return left.Equals(right);
        }
    }
    public static bool operator !=(ClassName left, ClassName right)
    {
        return !(left == right);
    }
    public override int GetHashCode()
    {
        // Return something useful here, typically all members shifted or XORed together works.
        // This placeholder is consistent with the placeholder Equals above.
        return 0;
    }
}
Another thing to take into consideration: the == operator may not be callable or may have different meaning if you access the object from another language. Usually, it's better to have an alternative that can be called by name.
The example works because the DateTime struct implements the IEquatable<T> interface, which defines a "type-specific method for determining equality of instances." according to MSDN.
Use Equals if you want to express that the contents of the objects compared should be equal. Use == for primitive values or if you want to check that the objects being compared are one and the same object. For objects, == checks whether the address pointer of the objects is the same.
I have seen Object.ReferenceEquals() used in cases where one wants to know if two references refer to the same object
In most cases, they are the same, so you should use == for clarity. According to the Microsoft Framework Design Guidelines:
"DO ensure that Object.Equals and the equality operators have exactly the same semantics and similar performance characteristics."
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/equality-operators
But sometimes, someone will override Object.Equals without providing equality operators. In that case, you should use Equals to test for value equality, and Object.ReferenceEquals to test for reference equality.
If you disassemble Object (with dotPeek, for example), you will see that
public virtual bool Equals(Object obj)
is described as:
// Returns a boolean indicating if the passed in object obj is
// Equal to this. Equality is defined as object equality for reference
// types and bitwise equality for value types using a loader trick to
// replace Equals with EqualsValue for value types).
//
So it depends on the type.
For example:
Object o1 = "vvv";
Object o2 = "vvv";
bool b = o1.Equals(o2);
o1 = 555;
o2 = 555;
b = o1.Equals(o2);
o1 = new List<int> { 1, 2, 3 };
o2 = new List<int> { 1, 2, 3 };
b = o1.Equals(o2);
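The first time b is true (string is a reference type, but it overrides Equals to compare contents), the second time b is true (Equals is performed on boxed value types and compares the values), and the third time b is false (List<int> does not override Equals, so the default reference comparison is used).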