GetHashCode() for OrdinalIgnoreCase-dependent string classes - c#

public class Address{
public string ContactName {get; private set;}
public string Company {get; private set;}
//...
public string Zip {get; private set;}
}
I'd like to implement a notion of distinct addresses, so I overrode Equals() to test for case-insensitive equality in all of the fields (as these are US addresses, I used Ordinal instead of InvariantCulture for maximum performance):
public override bool Equals(Object obj){
if (obj == null || this.GetType() != obj.GetType())
return false;
Address o = (Address)obj;
return
(string.Compare(this.ContactName, o.ContactName, StringComparison.OrdinalIgnoreCase) == 0) &&
(string.Compare(this.Company, o.Company, StringComparison.OrdinalIgnoreCase) == 0) &&
// ...
(string.Compare(this.Zip, o.Zip, StringComparison.OrdinalIgnoreCase) == 0);
}
I'd like to write a GetHashCode() similarly like so (ignore the concatenation inefficiency for the moment):
public override int GetHashCode(){
return (this.contactName + this.address1 + this.zip).ToLowerOrdinal().GetHashCode();
}
but that doesn't exist. What should I use instead? Or should I just use InvariantCulture in my Equals() method?
(I'm thinking .ToLowerInvariant().GetHashCode(), but I'm not 100% sure that InvariantCulture can't decide that an identical character (such as an accent) has a different meaning in another context.)

Whatever string comparison method you use in Equals(), it makes sense to use the same in GetHashCode().
There's no need to create temporary strings just to calculate hash codes. For StringComparison.OrdinalIgnoreCase, use StringComparer.OrdinalIgnoreCase.GetHashCode()
Then you need to combine multiple hash codes into one. XOR should be ok (because it's unlikely that one person's zip code is another's contact name). However purists might disagree.
public override int GetHashCode()
{
return StringComparer.OrdinalIgnoreCase.GetHashCode(ContactName) ^
StringComparer.OrdinalIgnoreCase.GetHashCode(Company) ^
// ...
StringComparer.OrdinalIgnoreCase.GetHashCode(Zip);
}
Having said all that, I'd question whether it's sensible to use a composite structure like Address as the key to a dictionary. But the principle holds for identity-type strings.
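As a minimal sketch of the approach above, the following reduces Address to two fields and checks that two addresses differing only in case are equal and share a hash code. The constructor, the two-field shape, and the HashOrZero helper are assumptions made for illustration; they are not part of the original class. The helper exists because StringComparer.GetHashCode(string) throws ArgumentNullException for a null argument.

```csharp
using System;

public sealed class Address
{
    public string ContactName { get; }
    public string Zip { get; }

    // Hypothetical constructor, added only so the sketch can create instances.
    public Address(string contactName, string zip)
    {
        ContactName = contactName;
        Zip = zip;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Address;
        return other != null
            && string.Equals(ContactName, other.ContactName, StringComparison.OrdinalIgnoreCase)
            && string.Equals(Zip, other.Zip, StringComparison.OrdinalIgnoreCase);
    }

    public override int GetHashCode()
    {
        // Same comparison rule as Equals, combined with XOR as in the answer.
        return HashOrZero(ContactName) ^ HashOrZero(Zip);
    }

    // Hypothetical helper: maps null to 0 (an arbitrary choice for this sketch)
    // because the comparer rejects null input.
    private static int HashOrZero(string s)
    {
        return s == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(s);
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new Address("Jane Doe", "90210");
        var b = new Address("JANE DOE", "90210");
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

Mapping null to 0 keeps GetHashCode consistent with string.Equals, which treats two null strings as equal.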

Two unequal objects can have the same hash code, but two equal objects must never have different hash codes. If you use InvariantCulture for your hash code, it will still satisfy that contract when Equals is implemented in terms of OrdinalIgnoreCase.
From the documentation on StringComparer.OrdinalIgnoreCase (emphasis mine):
http://msdn.microsoft.com/en-us/library/system.stringcomparer.ordinalignorecase.aspx
The StringComparer returned by the OrdinalIgnoreCase property treats
the characters in the strings to compare as if they were converted to
uppercase using the conventions of the invariant culture, and
then performs a simple byte comparison that is independent of
language. This is most appropriate when comparing strings that are
generated programmatically or when comparing case-insensitive
resources such as paths and filenames.

Since .NET Core 2.1 you can use System.HashCode, whose Add method accepts an equality comparer for each field:
public class Address
{
public string ContactName { get; private set; }
public string Company { get; private set; }
// ...
public string Zip { get; private set; }
public override bool Equals(object obj)
{
return
obj is Address address &&
string.Equals(ContactName, address.ContactName, StringComparison.OrdinalIgnoreCase) &&
string.Equals(Company, address.Company, StringComparison.OrdinalIgnoreCase) &&
// ...
string.Equals(Zip, address.Zip, StringComparison.OrdinalIgnoreCase);
}
public override int GetHashCode()
{
var hash = new HashCode();
hash.Add(ContactName, StringComparer.OrdinalIgnoreCase);
hash.Add(Company, StringComparer.OrdinalIgnoreCase);
// ...
hash.Add(Zip, StringComparer.OrdinalIgnoreCase);
return hash.ToHashCode();
}
}

Related

How do the `EqualOperator()` and `NotEqualOperator()` methods work in this `ValueObject` implementation (Microsoft Docs)?

In Domain Driven Design, we're introduced to the concept of a ValueObject, where objects don't carry an identity.
Microsoft have provided an implementation of their ValueObject in their Microservices series, where they override Equals() so that two ValueObjects with the same values are considered identical.
I've included their implementation below, but my question relates to the EqualOperator() and NotEqualOperator() methods: how do they work, and when are they called?
I'm familiar with operator overloads, but this is an implementation I've not seen before, and I can't find any documentation about it.
Here is the implementation:
public abstract class ValueObject
{
protected static bool EqualOperator(ValueObject left, ValueObject right)
{
if (ReferenceEquals(left, null) ^ ReferenceEquals(right, null))
{
return false;
}
return ReferenceEquals(left, null) || left.Equals(right);
}
protected static bool NotEqualOperator(ValueObject left,
ValueObject right)
{
return !(EqualOperator(left, right));
}
protected abstract IEnumerable<object> GetAtomicValues();
public override bool Equals(object obj)
{
if (obj == null || obj.GetType() != GetType())
{
return false;
}
ValueObject other = (ValueObject)obj;
IEnumerator<object> thisValues = GetAtomicValues().GetEnumerator();
IEnumerator<object> otherValues =
other.GetAtomicValues().GetEnumerator();
while (thisValues.MoveNext() && otherValues.MoveNext())
{
if (ReferenceEquals(thisValues.Current, null) ^
ReferenceEquals(otherValues.Current, null))
{
return false;
}
if (thisValues.Current != null &&
!thisValues.Current.Equals(otherValues.Current))
{
return false;
}
}
return !thisValues.MoveNext() && !otherValues.MoveNext();
}
// Other utility methods
}
Here's an example of their object in use:
public class Address : ValueObject
{
public String Street { get; private set; }
public String City { get; private set; }
public String State { get; private set; }
public String Country { get; private set; }
public String ZipCode { get; private set; }
private Address() { }
public Address(string street, string city, string state, string country,
string zipcode)
{
Street = street;
City = city;
State = state;
Country = country;
ZipCode = zipcode;
}
protected override IEnumerable<object> GetAtomicValues()
{
// Using a yield return statement to return
// each element one at a time
yield return Street;
yield return City;
yield return State;
yield return Country;
yield return ZipCode;
}
}
Actually, I find it astonishing that Microsoft implemented a value type using a class. Usually, structs are much better for this purpose, unless your value objects get very large. For most usages of value types such as coordinates or colors, this is not the case.
Leaving this discussion aside, what happens here is the following: If you implement a value object, you need to implement Equals and GetHashCode correctly, which includes keeping them consistent with each other. These two methods are not difficult to implement, but they are verbose: you need to cast the object and then check each of its properties. If you are using classes, there is additional boilerplate because you typically want to speed up the equality check with a reference equality check first. That is, two objects need not be the same instance to be equal, but if they are the same instance, they are certainly equal.
The class you depicted is an attempt to solve this consistency problem and boilerplate problem for class-based value objects by abstracting away quite a few commonalities. All you need to provide are the fields that make up the identity. In most cases, these are simply all fields. You enumerate them using an iterator method (one that uses yield return).
Now, for the actual question of when EqualOperator and NotEqualOperator are called: I would guess they are simply helper functions that make the operators easier to implement. You would provide an overloaded == operator that simply returns EqualOperator and a != operator that simply returns NotEqualOperator. You might ask why the value object base class does not define these operators itself. I guess this is because the compiler would then allow you to apply == and != to different types of value objects using the overloaded operator.
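To make that guess concrete, here is a sketch of a derived type forwarding its operators to the protected helpers. Money is a hypothetical value object invented for this example, and the base class Equals and GetHashCode are condensed stand-ins (using LINQ) rather than the Microsoft code quoted in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class ValueObject
{
    protected static bool EqualOperator(ValueObject left, ValueObject right)
    {
        // Exactly one side is null: not equal.
        if (ReferenceEquals(left, null) ^ ReferenceEquals(right, null))
            return false;
        // Both null, or both non-null and value-equal.
        return ReferenceEquals(left, null) || left.Equals(right);
    }

    protected static bool NotEqualOperator(ValueObject left, ValueObject right)
    {
        return !EqualOperator(left, right);
    }

    protected abstract IEnumerable<object> GetAtomicValues();

    // Condensed stand-in for the Equals shown in the question.
    public override bool Equals(object obj)
    {
        if (obj == null || obj.GetType() != GetType())
            return false;
        return GetAtomicValues().SequenceEqual(((ValueObject)obj).GetAtomicValues());
    }

    public override int GetHashCode()
    {
        return GetAtomicValues()
            .Select(v => v != null ? v.GetHashCode() : 0)
            .Aggregate(0, (x, y) => x ^ y);
    }
}

// Hypothetical value object used only for this illustration.
public sealed class Money : ValueObject
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    protected override IEnumerable<object> GetAtomicValues()
    {
        yield return Amount;
        yield return Currency;
    }

    // The operators are declared per type and simply forward to the helpers.
    public static bool operator ==(Money left, Money right)
    {
        return EqualOperator(left, right);
    }

    public static bool operator !=(Money left, Money right)
    {
        return NotEqualOperator(left, right);
    }

    // Forwarded explicitly because the C# compiler may warn (CS0660/CS0661)
    // when a type declares == and != without overriding these itself.
    public override bool Equals(object obj) { return base.Equals(obj); }
    public override int GetHashCode() { return base.GetHashCode(); }
}

public static class Program
{
    public static void Main()
    {
        var a = new Money(5m, "USD");
        var b = new Money(5m, "USD");
        Money none = null;
        Console.WriteLine(a == b);                    // True
        Console.WriteLine(a != new Money(6m, "USD")); // True
        Console.WriteLine(none == null);              // True
        Console.WriteLine(a == none);                 // False
    }
}
```

Because the operators take two Money parameters, an expression comparing a Money with some other value object type does not bind to this overload, which matches the reasoning given above for keeping the operators out of the base class.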

Comparing/Equals two IList<T> objects

EDIT:
What I'm trying to do is find out whether db.Id is equal to xml.Id, db.SubTitle is equal to xml.SubTitle, and so on for all my properties.
I also tried
bool result = db.SequenceEqual(xml); // returns false all the time
END EDIT
I searched before asking for help, and I'm not sure what the best way to approach my problem is.
I have two IList objects whose items have exactly the same properties, but the data might be different.
One list is populated from the db and the other is read from xml; I want to compare them to check that both sources are in sync.
Here is what my object looks like:
public class EmployeeObject
{
public Int32 Id { get; set; }
public string SubTitle { get; set; }
public string Desc { get; set; }
public bool Active { get; set; }
public string ActiveDateTime { get; set; }
}
here is what I have tried:
IList<EmployeeObject> db = Db.EmployeeRepository.PopulateFromDb();
IList<EmployeeObject> xml = Xml.EmployeeRepository.PopulateFromXml();
//both object populated with data so far so good....
Time to compare now:
I have tried some thing like this:
if ((object)xml == null || ((object)db) == null)
return Object.Equals(xml, db);
return xml.Equals(db); // returning false all the time
I have checked that both objects contain exactly the same data, but it still returns false.
The Equals method that you are using is going to determine if the two references refer to the same list, not if the contents are the same. You can use SequenceEqual to actually verify that two sequences have the same items in the same order.
Next you'll run into the issue that each item in the list will be compared to see whether it refers to the same object, rather than whether it contains the same field values or the same ID values, which seems to be what you want here. One option is a custom comparer, but another is to pull out the "identity" value in question:
bool areEqual = db.Select(item => item.Id)
.SequenceEqual(xml.Select(item => item.Id));
You should override Equals and GetHashCode in your class like this:
public class EmployeeObject {
public Int32 Id { get; set; }
public string SubTitle { get; set; }
public string Desc { get; set; }
public bool Active { get; set; }
public string ActiveDateTime { get; set; }
public override bool Equals(object o){
EmployeeObject e = o as EmployeeObject;
if(e == null) return false;
return Id == e.Id && SubTitle == e.SubTitle && Desc == e.Desc
&& Active == e.Active && ActiveDateTime == e.ActiveDateTime;
}
public override int GetHashCode(){
return Id.GetHashCode() ^ SubTitle.GetHashCode() ^ Desc.GetHashCode()
^ Active.GetHashCode() ^ ActiveDateTime.GetHashCode();
}
}
Then use the SequenceEqual method:
return db.OrderBy(e=>e.Id).SequenceEqual(xml.OrderBy(e=>e.Id));
IList does not define its own Equals method. What you're calling is the standard Object.Equals, which checks whether two variables point to the same object or not.
If you want to check that the lists are semantically equivalent, you will need to check that each object in the list is equivalent. If the EmployeeObject class has an appropriate Equals method, then you can use SequenceEqual to compare the lists.
You can implement an IEqualityComparer and use the overload of SequenceEqual that takes an IEqualityComparer. Here is sample code for an IEqualityComparer from msdn:
class BoxEqualityComparer : IEqualityComparer<Box>
{
public bool Equals(Box b1, Box b2)
{
if (b1.Height == b2.Height && b1.Length == b2.Length && b1.Width == b2.Width)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(Box bx)
{
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
You can then use SequenceEqual like this:
if (db.SequenceEqual(xml, new MyEqualityComparer()))
{ /* Logic here */ }
Note that this will only return true if the items are also in the same order in both lists. If appropriate, you can pre-order the items like this:
if (db.OrderBy(item => item.Id).SequenceEqual(xml.OrderBy(item => item.Id), new MyEqualityComparer()))
{ /* Logic here */ }
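For the EmployeeObject type from the question, a comparer along those lines might look like the sketch below. EmployeeComparer is a hypothetical name, and the class is reduced to three properties to keep the example short; a real comparer would include every property that should count towards equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class EmployeeObject
{
    public int Id { get; set; }
    public string SubTitle { get; set; }
    public bool Active { get; set; }
}

// Hypothetical comparer: two employees are equal when all compared properties match.
public class EmployeeComparer : IEqualityComparer<EmployeeObject>
{
    public bool Equals(EmployeeObject x, EmployeeObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && x.SubTitle == y.SubTitle && x.Active == y.Active;
    }

    public int GetHashCode(EmployeeObject e)
    {
        // Equal employees always share an Id, so hashing the Id alone is consistent.
        return e == null ? 0 : e.Id;
    }
}

public static class Program
{
    public static void Main()
    {
        IList<EmployeeObject> db = new List<EmployeeObject>
        {
            new EmployeeObject { Id = 1, SubTitle = "A", Active = true },
            new EmployeeObject { Id = 2, SubTitle = "B", Active = false }
        };
        IList<EmployeeObject> xml = new List<EmployeeObject>
        {
            new EmployeeObject { Id = 2, SubTitle = "B", Active = false },
            new EmployeeObject { Id = 1, SubTitle = "A", Active = true }
        };
        var comparer = new EmployeeComparer();

        // Same items, different order: a plain SequenceEqual reports a mismatch.
        Console.WriteLine(db.SequenceEqual(xml, comparer)); // False

        // Ordering both sides by Id first makes the comparison order-independent.
        Console.WriteLine(db.OrderBy(e => e.Id)
            .SequenceEqual(xml.OrderBy(e => e.Id), comparer)); // True
    }
}
```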
Obviously, xml.Equals(db) will always return false if you are comparing two different list instances.
The only way for this to make sense is to be more specific about what it means for those two lists to be equal. That is, you need to go through the elements in the two lists and ensure that both lists contain the same items. Even that is ambiguous, but assuming that the elements in your lists provide a proper override for Equals() and GetHashCode(), you can proceed to implement the actual list comparison.
Generally, the most efficient method to compare two lists that don't contain duplicates will be to use a hash set constructed from elements of one of the lists and then iterate through the elements of the second, testing whether each element is found in the hash set.
If the lists contain duplicates your best bet is going to be to sort them both and then walk the lists in tandem ensuring that the elements at each point match up.
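The two strategies just described can be sketched as follows. ListComparison and its method names are hypothetical, and plain int lists stand in for EmployeeObject so the example does not depend on any particular Equals override.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListComparison
{
    // Hash-set approach: only valid when neither list contains duplicates.
    public static bool SameItemsNoDuplicates<T>(IList<T> first, IList<T> second)
    {
        if (first.Count != second.Count) return false;
        var seen = new HashSet<T>(first);
        return second.All(item => seen.Contains(item));
    }

    // Sort-and-walk approach: also correct when duplicates are present.
    public static bool SameItemsWithDuplicates<T>(IEnumerable<T> first, IEnumerable<T> second)
        where T : IComparable<T>
    {
        return first.OrderBy(x => x).SequenceEqual(second.OrderBy(x => x));
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 3, 1, 2 };
        Console.WriteLine(ListComparison.SameItemsNoDuplicates(a, b));   // True
        Console.WriteLine(ListComparison.SameItemsWithDuplicates(a, b)); // True

        // With duplicates, the hash-set check gives the wrong answer.
        var c = new List<int> { 1, 1, 2 };
        var d = new List<int> { 1, 2, 2 };
        Console.WriteLine(ListComparison.SameItemsNoDuplicates(c, d));   // True (misleading)
        Console.WriteLine(ListComparison.SameItemsWithDuplicates(c, d)); // False
    }
}
```

The last two lines show why the hash-set shortcut is restricted to lists without duplicates: the set only records which values occur, not how often.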
You can use SequenceEqual provided you can actually compare instances of EmployeeObject. You probably have to override Equals on EmployeeObject:
public override bool Equals(object o)
{
EmployeeObject obj = o as EmployeeObject;
if(obj == null) return false;
// Return true if all the properties match
return (Id == obj.Id &&
SubTitle == obj.SubTitle &&
Desc == obj.Desc &&
Active == obj.Active &&
ActiveDateTime == obj.ActiveDateTime);
}
Then you can do:
var same = db.SequenceEqual(xml);
You can also pass in a class that implements IEqualityComparer which instructs SequenceEqual how to compare each instance:
var same = db.SequenceEqual(xml, someComparer);
Another quick way, though not as fast, would be to build two enumerations of the value you want to compare, probably the id property in your case:
var ids1 = db.Select(i => i.Id); // List of all Ids in db
var ids2 = xml.Select(i => i.Id); // List of all Ids in xml
var same = ids1.SequenceEqual(ids2); // Both lists are the same

XOR for Object Identification

In the code below, I was wondering why XOR (^) is being used to combine the hash codes of the constituent members of the composition (this is source from MonoCross 1.3)?
Is the bitwise XOR of an MXViewPerspective object's Perspective and ModelType members used to uniquely identify the instance?
If so, is there a name for this property of the XOR operation (how XOR-ing two values, i.e. hash codes, guarantees uniqueness)?
public class MXViewPerspective : IComparable
{
public MXViewPerspective(Type modelType, string perspective)
{
this.Perspective = perspective;
this.ModelType = modelType;
}
public string Perspective { get; set; }
public Type ModelType { get; set; }
public int CompareTo(object obj)
{
MXViewPerspective p =(MXViewPerspective)obj;
return this.GetHashCode() == p.GetHashCode() ? 0 : -1;
}
public static bool operator ==(MXViewPerspective a, MXViewPerspective b)
{
return a.CompareTo(b) == 0;
}
public static bool operator !=(MXViewPerspective a, MXViewPerspective b)
{
return a.CompareTo(b) != 0;
}
public override bool Equals(object obj)
{
return this == (MXViewPerspective)obj;
}
public override int GetHashCode()
{
return this.ModelType.GetHashCode() ^ this.Perspective.GetHashCode();
}
public override string ToString()
{
return string.Format("Model \"{0}\" with perspective \"{1}\"", ModelType, Perspective);
}
}
Thank you.
XOR-ing hash codes doesn't guarantee uniqueness, but it is commonly used to improve the distribution over a table without complicating the hashing.
You want two different values to map to different hash keys if they differ in any of the fields (i.e. same ModelType but different Perspective, or vice versa). So you need to incorporate both values into your hash key. You could have used + for example, or shift and concatenate them (the latter would be better in fact, as it would guarantee uniqueness, but it also extends the key length, which might complicate hashing).
XOR won't guarantee this uniqueness, since if you flip the same bit in both ModelType and Perspective you get the same hash key, for example 5 ^ 7 = 1 ^ 3 = 2, but it's usually good enough. Eventually it all depends on the ranges and distributions of the values you provide.
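The collision mentioned above, and the shift-and-concatenate alternative, can be checked with a few lines. HashCombining is a hypothetical helper class written for this illustration; note that concatenating two 32-bit values yields a 64-bit key, which is the longer key the answer refers to.

```csharp
using System;

public static class HashCombining
{
    // XOR keeps the result at 32 bits but cannot tell some pairs apart.
    public static int Xor(int a, int b)
    {
        return a ^ b;
    }

    // Shift and concatenate: a goes in the high 32 bits, b in the low 32 bits,
    // so every (a, b) pair maps to a distinct 64-bit value.
    public static long Concatenate(int a, int b)
    {
        unchecked
        {
            return ((long)a << 32) | (uint)b;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Different pairs, same XOR result.
        Console.WriteLine(HashCombining.Xor(5, 7) == HashCombining.Xor(1, 3)); // True
        // XOR is symmetric, so swapping the fields also collides.
        Console.WriteLine(HashCombining.Xor(5, 7) == HashCombining.Xor(7, 5)); // True
        // Identical field hashes cancel out completely.
        Console.WriteLine(HashCombining.Xor(9, 9));                            // 0

        // The concatenated keys stay distinct in all of these cases.
        Console.WriteLine(HashCombining.Concatenate(5, 7) == HashCombining.Concatenate(1, 3)); // False
        Console.WriteLine(HashCombining.Concatenate(5, 7) == HashCombining.Concatenate(7, 5)); // False
    }
}
```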

Mono implementation of Dictionary<T,T> using .Equals(obj o) instead of .GetHashCode()

By searching through the MSDN C# documentation and Stack Overflow, I get the clear impression that Dictionary<T,T> is supposed to use GetHashCode() for checking key uniqueness and for look-up.
The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
...
The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
I Use mono (in Unity3D), and after getting some weird results in my work, I conducted this experiment:
public class DictionaryTest
{
public static void TestKeyUniqueness()
{
//Test a dictionary of type1
Dictionary<KeyType1, string> dictionaryType1 = new Dictionary<KeyType1, string>();
dictionaryType1[new KeyType1(1)] = "Val1";
if(dictionaryType1.ContainsKey(new KeyType1(1)))
{
Debug.Log ("Key in dicType1 was already present"); //This line does NOT print
}
//Test a dictionary of type2
Dictionary<KeyType2, string> dictionaryType2 = new Dictionary<KeyType2, string>();
dictionaryType2[new KeyType2(1)] = "Val1";
if(dictionaryType2.ContainsKey(new KeyType2(1)))
{
Debug.Log ("Key in dicType2 was already present"); // Only this line prints
}
}
}
//This type implements only GetHashCode()
public class KeyType1
{
private int var1;
public KeyType1(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
}
//This type implements both GetHashCode() and Equals(obj), where Equals uses the hashcode.
public class KeyType2
{
private int var1;
public KeyType2(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
}
Only when using type KeyType2 are the keys considered equal. To me this demonstrates that Dictionary uses Equals(obj) and not GetHashCode().
Can someone reproduce this, and help me interpret what it means? Is it an incorrect implementation in mono? Or have I misunderstood something?
i get the clear impression that Dictionary is supposed to use
.GetHashCode() for checking key-uniqueness
What made you think that? GetHashCode doesn't return unique values.
And MSDN clearly says:
Dictionary requires an equality implementation to
determine whether keys are equal. You can specify an implementation of
the IEqualityComparer generic interface by using a constructor that
accepts a comparer parameter; if you do not specify an implementation,
the default generic equality comparer EqualityComparer.Default is
used. If type TKey implements the System.IEquatable generic
interface, the default equality comparer uses that implementation.
Doing this:
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
is wrong in the general case because you might end up with KeyType2 instances that are equal to StringBuilder, SomeOtherClass, AnythingYouCanImagine and what not instances.
You should totally do it like so:
public override bool Equals (object obj)
{
if (obj is KeyType2) {
return (obj as KeyType2).var1 == this.var1;
} else
return false;
}
When you are trying to override Equals and inherently GetHashCode you must ensure the following points (given the class MyObject) in this order (you were doing it the other way around):
1) When are 2 instances of MyObject equal ? Say you have:
public class MyObject {
public string Name { get; set; }
public string Address { get; set; }
public int Age { get; set; }
public DateTime TimeWhenIBroughtThisInstanceFromTheDatabase { get; set; }
}
And you have 1 record in some database that you need to be mapped to an instance of this class.
And you make the convention that the time you read the record from the database will be stored
in the TimeWhenIBroughtThisInstanceFromTheDatabase:
MyObject obj1 = DbHelper.ReadFromDatabase( ...some params...);
// you do that at 14:05 and thusly the TimeWhenIBroughtThisInstanceFromTheDatabase
// will be assigned accordingly
// later.. at 14:07 you read the same record into a different instance of MyObject
MyObject obj2 = DbHelper.ReadFromDatabase( ...some params...);
// (the same)
// At 14:09 you ask yourself if the 2 instances are the same
bool theyAre = obj1.Equals(obj2);
Do you want the result to be true? I would say you do.
Therefore the override of Equals should look like so:
public class MyObject {
...
public override bool Equals(object obj) {
if (obj is MyObject) {
var that = obj as MyObject;
return (this.Name == that.Name) &&
(this.Address == that.Address) &&
(this.Age == that.Age);
// without the syntactically possible but logically challenged:
// && (this.TimeWhenIBroughtThisInstanceFromTheDatabase ==
// that.TimeWhenIBroughtThisInstanceFromTheDatabase)
} else
return false;
}
...
}
2) ENSURE THAT whenever 2 instances are equal (as indicated by the Equals method you implement)
their GetHashCode results will be identical.
int hash1 = obj1.GetHashCode();
int hash2 = obj2.GetHashCode();
bool theseMustBeAlso = hash1 == hash2;
The easiest way to do that is (in the sample scenario):
public class MyObject {
...
public override int GetHashCode() {
return ((this.Name != null) ? this.Name.GetHashCode() : 0) ^
((this.Address != null) ? this.Address.GetHashCode() : 0) ^
this.Age.GetHashCode();
// without the syntactically possible but logically challenged:
// ^ this.TimeWhenIBroughtThisInstanceFromTheDatabase.GetHashCode()
}
...
}
Note that:
- Strings can be null and that .GetHashCode() might fail with NullReferenceException.
- I used ^ (XOR). You can use whatever you want as long as the golden rule (number 2) is respected.
- x ^ 0 == x (for whatever x)
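The experiment from the question can be reduced to the sketch below, which contrasts a key that only overrides GetHashCode with one that overrides both methods consistently. HashOnlyKey and FullKey are hypothetical names, and Console.WriteLine replaces Unity's Debug.Log so the code runs outside Unity.

```csharp
using System;
using System.Collections.Generic;

// Overrides only GetHashCode: Equals still falls back to reference equality.
public sealed class HashOnlyKey
{
    private readonly int value;
    public HashOnlyKey(int value) { this.value = value; }
    public override int GetHashCode() { return value; }
}

// Overrides both, with Equals comparing the field itself rather than hash codes.
public sealed class FullKey
{
    private readonly int value;
    public FullKey(int value) { this.value = value; }
    public override int GetHashCode() { return value; }
    public override bool Equals(object obj)
    {
        var other = obj as FullKey;
        return other != null && other.value == value;
    }
}

public static class Program
{
    public static void Main()
    {
        var hashOnly = new Dictionary<HashOnlyKey, string>();
        hashOnly[new HashOnlyKey(1)] = "Val1";
        // Same hash code, but a different instance, so the key is not found.
        Console.WriteLine(hashOnly.ContainsKey(new HashOnlyKey(1))); // False

        var full = new Dictionary<FullKey, string>();
        full[new FullKey(1)] = "Val1";
        // Same hash code and Equals returns true, so the key is found.
        Console.WriteLine(full.ContainsKey(new FullKey(1))); // True
    }
}
```

This matches the observation in the question: a matching hash code alone is not enough, because the dictionary still asks Equals to confirm the match.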

Overriding Equals and GetHashCode

I have read that when you override Equals on a class/object you need to override GetHashCode.
public class Person : IEquatable<Person>
{
public int PersonId { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public Person(int personId, string firstName, string lastName)
{
PersonId = personId;
FirstName = firstName;
LastName = lastName;
}
public bool Equals(Person obj)
{
Person p = obj as Person;
if (ReferenceEquals(null, p))
return false;
if (ReferenceEquals(this, p))
return true;
return Equals(p.FirstName, FirstName) &&
Equals(p.LastName, LastName);
}
}
Now given the following:
public static Dictionary<Person, Person> ObjDic= new Dictionary<Person, Person>();
public static Dictionary<int, Person> PKDic = new Dictionary<int, Person>();
Will not overriding GetHashCode affect both of the dictionaries above? What I am basically asking is: how is GetHashCode generated? If I look for an object in PKDic, will I still be able to find it based on the PK alone? If I wanted to override GetHashCode, how would I go about doing that?
You should always override GetHashCode.
A Dictionary<int, Person> will function without GetHashCode, but as soon as you call LINQ methods like Distinct or GroupBy, it will stop working.
Note, by the way, that you haven't actually overridden Equals either.
The IEquatable.Equals method is not the same as the virtual bool Equals(object obj) inherited from Object. Although the default IEqualityComparer<T> will use the IEquatable<T> interface if the class implements it, you should still override Equals, because other code might not.
In your case, you should override Equals and GetHashCode like this:
public override bool Equals(object obj) { return Equals(obj as Person); }
public override int GetHashCode() {
return FirstName.GetHashCode() ^ LastName.GetHashCode();
}
In your scenario, not overriding GetHashCode on your type will affect only the first dictionary, as the key is what's used for hashing, not the value.
When looking for the presence of a key, the Dictionary<TKey,TValue> will use the hash code to find out if any keys could be equal. It's important to note that a hash is a value that can determine if two things could be equal or very likely are equal. A hash, strictly speaking cannot determine if two items are equal.
Two equal objects are required to return the same hash code. However, two non-equal objects are not required to return different hash codes. In other words, if the hash codes don't match, you're guaranteed that the objects are not equal. If the hash codes do match, then the objects could be equal.
Because of this, the Dictionary will only call Equals on two objects if their hash codes match.
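A small sketch can illustrate that last point. CollidingKey is a hypothetical type whose GetHashCode deliberately returns the same value for every instance, so every lookup has to fall through to Equals; the static counter exists only to show that Equals was consulted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with a deliberately terrible (constant) hash code.
public sealed class CollidingKey
{
    public static int EqualsCalls;

    public string Name { get; }
    public CollidingKey(string name) { Name = name; }

    // Legal but slow: every key lands in the same bucket.
    public override int GetHashCode() { return 1; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as CollidingKey;
        return other != null && other.Name == Name;
    }
}

public static class Program
{
    public static void Main()
    {
        var map = new Dictionary<CollidingKey, int>();
        map[new CollidingKey("a")] = 1;
        map[new CollidingKey("b")] = 2;

        // Matching hash codes do not merge the keys: Equals keeps them apart.
        Console.WriteLine(map.Count);                              // 2
        Console.WriteLine(map.ContainsKey(new CollidingKey("a"))); // True
        Console.WriteLine(map.ContainsKey(new CollidingKey("c"))); // False

        // Equals had to be consulted because the hash codes always collide.
        Console.WriteLine(CollidingKey.EqualsCalls > 0);           // True
    }
}
```

The dictionary stays correct with a constant hash code, but it loses its near O(1) lookups, which is why distribution matters even though the contract does not require it.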
As to "how to override GetHashCode", that's a complicated question. Classically, a hashing algorithm should balance an even distribution of the codes over the set of values against a low collision rate (a collision is when two non-equal objects produce the same code). This is a simple thing to describe and a very difficult thing to accomplish. It's easy to do one or the other, but hard to balance them.
From a practical perspective (meaning disregarding performance), you could just XOR all of the characters of the first and last names (or even use their respective hash codes, as Joel suggests) as your hash code. This will give a low degree of collision, but won't result in a terribly even distribution. Unless you're dealing with very large sets or very frequent lookups, it won't be an issue.
Your GetHashCode() and Equals() methods should look like this:
public override int GetHashCode()
{
return (FirstName.GetHashCode()+1) ^ (LastName.GetHashCode()+2);
}
public override bool Equals(Object obj)
{
Person p = obj as Person;
if (p == null)
return false;
return this.FirstName == p.FirstName && this.LastName == p.LastName;
}
The rule is that GetHashCode() must use exactly the fields used in determining equality for the .Equals() method.
As for the dictionary part of your question, .GetHashCode() is used when locating a key in a dictionary. However, this has a different impact for each of the dictionaries in your question.
The dictionary with the int key (presumably your person ID) will use the GetHashCode() for the integer, while the other dictionary (ObjDic) will use the GetHashCode() from your Person object. Therefore PKDic will always differentiate between two people with different IDs, while ObjDic might treat two people with different IDs but the same first and last names as the same record.
Here is how I would do it. Since it is common for two different people to have exactly the same name it makes more sense to use a unique identifier (which you already have).
public class Person : IEquatable<Person>
{
public override int GetHashCode()
{
return PersonId.GetHashCode();
}
public override bool Equals(object obj)
{
var that = obj as Person;
if (that != null)
{
return Equals(that);
}
return false;
}
public bool Equals(Person that)
{
return that != null && this.PersonId == that.PersonId;
}
}
To answer your specific question: This only matters if you are using Person as a key in an IDictionary collection. For example, Dictionary<Person, string> or SortedDictionary<Person, Foo>, but not Dictionary<int, Person>.
