I have two lists that I am trying to compare. So I have created a class that implements the IEqualityComparer interface, please see below in the bottom section of code.
When I step through my code, the code goes through my GetHashCode implementation but not the Equals? I do not really understand the GetHashCode method, despite reading around on the internet and what exactly it is doing.
List<FactorPayoffs> missingfactorPayoffList =
factorPayoffList.Except(
factorPayoffListOrg,
new FactorPayoffs.Comparer()).ToList();
List<FactorPayoffs> missingfactorPayoffListOrg =
factorPayoffListOrg.Except(
factorPayoffList,
new FactorPayoffs.Comparer()).ToList();
So in the two lines of code above the two lists return me every item, telling me that the two lists do not contain any items that are the same. This is not true, there is only row that is different. I'm guessing this is happening because the Equals method is not getting called which in turns makes me wonder if my GetHashCode method is working as its supposed to?
class FactorPayoffs
{
public string FactorGroup { get; set; }
public string Factor { get; set; }
public DateTime dtPrice { get; set; }
public DateTime dtPrice_e { get; set; }
public double Ret_USD { get; set; }
public class Comparer : IEqualityComparer<FactorPayoffs>
{
public bool Equals(FactorPayoffs x, FactorPayoffs y)
{
return x.dtPrice == y.dtPrice &&
x.dtPrice_e == y.dtPrice_e &&
x.Factor == y.Factor &&
x.FactorGroup == y.FactorGroup;
}
public int GetHashCode(FactorPayoffs obj)
{
int hash = 17;
hash = hash * 23 + (obj.dtPrice).GetHashCode();
hash = hash * 23 + (obj.dtPrice_e).GetHashCode();
hash = hash * 23 + (obj.Factor ?? "").GetHashCode();
hash = hash * 23 + (obj.FactorGroup ?? "").GetHashCode();
hash = hash * 23 + (obj.Ret_USD).GetHashCode();
return hash;
}
}
}
Your Equals and GetHashCode implementations should involve the exact same set of properties; they do not.
In more formal terms, GetHashCode must always return the same value for two objects that compare equal. With your current code, two objects that differ only in the Ret_USD value will always compare equal but are not guaranteed to have the same hash code.
So what happens is that LINQ calls GetHashCode on two objects you consider equal, gets back different values, concludes that since the values were different the objects cannot be equal so there's no point at all in calling Equals and moves on.
To fix the problem, either remove the Ret_USD factor from GetHashCode or introduce it also inside Equals (whatever makes sense for your semantics of equality).
GetHashCode is intended as a fast but rough estimate of equality, so many operations potentially involving large numbers of comparisons start by checking this result instead of Equals, and only use Equals when necessary. In particular, if x.GetHashCode()!=y.GetHashCode(), then we already know x.Equals(y) is false, so there is no reason to call Equals. Had x.GetHashCode()==y.GetHashCode(), then x might equal y, but only a call to Equals will give a definite answer.
If you implement GetHashCode in a way that causes GetHashCode to be different for two objects where Equals returns true, then you have a bug in your code and many collection classes and algorithms relying on these methods will silently fail.
If you want to force the execution of the Equals you can implement it as follows
public int GetHashCode(FactorPayoffs obj) {
return 1;
}
Rewrite you GetHashCode implementation like this, to match the semantics of your Equals implementation.
public int GetHashCode(FactorPayoffs obj)
{
unchecked
{
int hash = 17;
hash = hash * 23 + obj.dtPrice.GetHashCode();
hash = hash * 23 + obj.dtPrice_e.GetHashCode();
if (obj.Factor != null)
{
hash = hash * 23 + obj.Factor.GetHashCode();
}
if (obj.FactorGroup != null)
{
hash = hash * 23 + obj.FactorGroup.GetHashCode();
}
return hash;
}
}
Note, you should use unchecked because you don't care about overflows. Additionaly, coalescing to string.Empty is pointlessy wasteful, just exclude from the hash.
See here for the best generic answer I know,
Related
Moving from Java to C# developer. In Java, I used a lot of Objects.hash(array) and Objects.hashCode(object) to build hash code of an object in hashCode function. I can't find any equivalent for those functions in C#. Any ideas?
Indeed I can call GetHashCode and combine them as shown in Concise way to combine field hashcodes?, but that requires a lot of null checks for reference types and that is making code much longer than comparable Java code.
In Java:
public class MyObject {
private Integer type;
private String[] attrs;
#Override
public int hashCode() {
int hash = 1, p = 31;
hash = p * hash + Objects.hashCode(type); // Handle Null case
hash = p * hash + Objects.hash(attrs); // Handle Null case
return hash;
}
}
In C#:
public class MyObject {
private int? type;
private string[] attrs;
public override int GetHashCode()
int hash = 1, p = 31;
hash = p * hash + hash_of(type); // Any utilities?
hash = p * hash + hash_of_array(attrs); // Any utilities?
return hash;
}
}
Note: I'm not looking for compatible results (as I know that hash codes would be different for similar objects, and can be different even between versions/platform for the same framework) but rather code size & concise way.
The .NET equivalent of this is the new System.HashCode, in particular the HashCode.Combine<> method which you can use to create a hash code from multiple values:
public class MyObject
{
private int? type;
private string[] attrs;
public override int GetHashCode()
=> HashCode.Combine(type, attrs);
// remember to also override Equals
}
Note that this type is only available in .NET Core. If you are running on .NET Framework, you will have to implement the hash code calculation yourself. Good ways to do that are documented here.
It seems that this problem has already been encountered by quite a few people:
List not working as expected
Contains always giving false
So I saw the answers and tried to implement the override of Equals and of GetHashCode but there seems that I am coding something wrong.
This is the situation: I have a list of Users(Class), each user has a List and a Name property, the list property contains licenses. I am trying to do a
if (!users.Contains(currentUser))
but it is not working as expected. And this is the code I did to override the Equals and GetHashCode:
public override bool Equals(object obj)
{
return Equals(obj as User);
}
public bool Equals(User otherUser)
{
if (ReferenceEquals(otherUser, null))
return false;
if (ReferenceEquals(this, otherUser))
return true;
return this._userName.Equals(otherUser.UserName) &&
this._licenses.SequenceEqual<string>(otherUser.Licenses);
}
public override int GetHashCode()
{
int hash = 13;
if (!_licenses.Any() && !_userName.Equals(""))
{
unchecked
{
foreach (string str in Licenses)
{
hash *= 7;
if (str != null) hash = hash + str.GetHashCode();
}
hash = (hash * 7) + _userName.GetHashCode();
}
}
return hash;
}
thank you for your suggestions and help in advance!
EDIT 1:
this is the code where I am doing the List.Contains, I am trying to see if the list already contains certain user, if not then add the user that isn't there. The Contains only works the first time, when currentUser changes then the User inside the list changes to the current user maybe this is a problem that is unrelated to the equals, any ideas?
if (isIn)
{
if (!listOfLicenses.Contains(items[3]))
listOfLicenses.Add(items[3]);
if (!users.Contains(currentUser))
{
User user2Add = new User();
user2Add.UserName = currentUser.UserName;
users.Add(user2Add);
userIndexer++;
}
if (users[userIndexer - 1].UserName.Equals(currentUser.UserName))
{
users[userIndexer - 1].Licenses.Add(items[3]);
}
result.Rows.Add();
}
Well, one problem with your hash code - if either there are no licences or the username is empty, you're ignoring the other component. I'd rewrite it as:
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 31 + _userName.GetHashCode();
foreach (string licence in Licences)
{
hash = hash * 31 + licences.GetHashCode();
}
return hash;
}
}
Shorter and simpler. It doesn't matter if you use the hash code of the empty string, or if you iterate over an empty collection.
That said, I'd have expected the previous code to work anyway. Note that it's order sensitive for the licences... oh, and List<T> won't use GetHashCode anyway. (You should absolutely override it appropriately, but it won't be the cause of the error.)
It would really help if you could show a short but complete program demonstrating the problem - I strongly suspect that you'll find it's actually a problem with your test data.
After users[userIndexer - 1].Licenses.Add(items[3]) , users[userIndexer - 1] is not the same user anymore. You have changed the Licences which is used in equality comparison(in User.Equals).
--EDIT
See below code
public class Class
{
static void Main(string[] args)
{
User u1 = new User("1");
User u2 = new User("1");
Console.WriteLine(u1.Equals(u2));
u2.Lic = "2";
Console.WriteLine(u1.Equals(u2));
}
}
public class User
{
public string Lic;
public User(string lic)
{
this.Lic = lic;
}
public override bool Equals(object obj)
{
return (obj as User).Lic == Lic;
}
}
You need you implement Equals and GetHashcode for the License class, otherwise SequenceEqual will not work.
Does your class implement IEquatable<User>? From your equality methods it appears it does but just checking.
The documentation for List.Contains states that:
This method determines equality by using the default equality
comparer, as defined by the object's implementation of the
IEquatable(T).Equals method for T (the type of values in the list)
It is very important to make sure that the value returned by GetHashCode never ever ever changes for a specific instance of an object. If the value changes then lists and dictionaries won't work correctly.
Think of GetHashCode as "GetPrimaryKey". You would not change the primary key of a user record in a database if someone added a new license to the user. Likewise you mustn't change the GetHashCode.
It appears from your code that you are changing the licenses collection and you're using that to calculate your hash code. So that is probably causing your issue.
Now, it is perfectly legitimate to use a constant value for every hash code you produce - you could just return 42 for every instance, for example. This will force calling Equals to determine if two objects are equal or not. All that having distinct hash codes does is short circuits the need to call Equals.
If the _userName field doesn't change then just return its hash code and see it that works.
I'm trying to understand the role of the GetHashCode method of the interface IEqualityComparer.
The following example is taken from MSDN:
using System;
using System.Collections.Generic;
class Example {
static void Main() {
try {
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box,
string>(boxEqC);
Box redBox = new Box(4, 3, 4);
Box blueBox = new Box(4, 3, 4);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Console.WriteLine(redBox.GetHashCode());
Console.WriteLine(blueBox.GetHashCode());
}
catch (ArgumentException argEx) {
Console.WriteLine(argEx.Message);
}
}
}
public class Box {
public Box(int h, int l, int w) {
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
}
class BoxEqualityComparer : IEqualityComparer<Box> {
public bool Equals(Box b1, Box b2) {
if (b1.Height == b2.Height & b1.Length == b2.Length
& b1.Width == b2.Width) {
return true;
}
else {
return false;
}
}
public int GetHashCode(Box bx) {
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
Shouldn't the Equals method implementation be enough to compare two Box objects? That is where we tell the framework the rule used to compare the objects. Why is the GetHashCode needed?
Thanks.
Lucian
A bit of background first...
Every object in .NET has an Equals method and a GetHashCode method.
The Equals method is used to compare one object with another object - to see if the two objects are equivalent.
The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.
A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.
When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.
When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.
Now to explain why both methods are necessary, consider this example:
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC);
Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);
boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");
Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.
When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.
In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.
TLDR
So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.
GetHashCode is used in Dictionary colections and it creates hash for storing objects in it. Here is a nice article why and how to use IEqualtyComparer and GetHashCode http://dotnetperls.com/iequalitycomparer
While it would be possible for a Dictionary<TKey,TValue> to have its GetValue and similar methods call Equals on every single stored key to see whether it matches the one being sought, that would be very slow. Instead, like many hash-based collections, it relies upon GetHashCode to quickly exclude most non-matching values from consideration. If calling GetHashCode on an item being sought yields 42, and a collection has 53,917 items, but calling GetHashCode on 53,914 of the items yielded a value other than 42, then only 3 items will have to be compared to the ones being sought. The other 53,914 may safely be ignored.
The reason a GetHashCode is included in an IEqualityComparer<T> is to allow for the possibility that a dictionary's consumer might want to regard as equal objects that would normally not regard each other as equal. The most common example would be a caller that wants to use strings as keys but use case-insensitive comparisons. In order to make that work efficiently, the dictionary will need to have some form of hash function that will yield the same value for "Fox" and "FOX", but hopefully yield something else for "box" or "zebra". Since the GetHashCode method built into String doesn't work that way, the dictionary will need to get such a method from somewhere else, and IEqualityComparer<T> is the most logical place since the need for such a hash code would be very strongly associated with an Equals method that considers "Fox" and "FOX" identical to each other, but not to "box" or "zebra".
I have a class with fields ColDescriptionOne(string), ColDescriptionTwo(string) and ColCodelist(int). I want to get the Intersect of two lists of this class where the desc are equal but the codelist is different.
I can use the Where clause and get what I need. However I can't seem to make it work using a custom Comparer like this:
internal class CodeListComparer: EqualityComparer<SheetRow>
{
public override bool Equals(SheetRow x, SheetRow y)
{
return Equals(x.ColDescriptionOne, y.ColDescriptionOne) &&
Equals(x.ColDescriptionSecond, y.ColDescriptionOne)
&& !Equals(x.ColCodelist, y.ColCodelist);
}
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397) + (obj.ColDescriptionSecond.GetHashCode()*397)
+ obj.ColCodelist.GetHashCode());
}
}
And then use it like this:
var onylByCodeList = firstSheet.Entries.Intersect(otherSheet.Entries, new CodeListComparer());
Any ideas what I'm doing wrong here ?
thanks
Sunit
You have a typo in the Equals method. The second line is comparing ColDescriptionOne to ColDescriptionSecond. They should both be ColDescriptionSecond.
return Equals(x.ColDescriptionOne, y.ColDescriptionOne)
&& Equals(x.ColDescriptionSecond, y.ColDescriptionSecond)
&& !Equals(x.ColCodelist, y.ColCodelist);
The second problem you have is you are including the ColCodeList in the GetHashCode method. The GetHashCode method must return the same value for objects that are equal. In this case though ColCodeList is supposed to be different when values are equal. This means that in cases where you want 2 objects to be considered equal they are more likely to have different hash codes which is incorrect.
Take that out of the GetHashCode method and everything should work.
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397)
+ (obj.ColDescriptionSecond.GetHashCode()*397));
}
Basically, I have the following so far:
class Foo {
public override bool Equals(object obj)
{
Foo d = obj as Foo ;
if (d == null)
return false;
return this.Equals(d);
}
#region IEquatable<Foo> Members
public bool Equals(Foo other)
{
if (this.Guid != String.Empty && this.Guid == other.Guid)
return true;
else if (this.Guid != String.Empty || other.Guid != String.Empty)
return false;
if (this.Title == other.Title &&
this.PublishDate == other.PublishDate &&
this.Description == other.Description)
return true;
return false;
}
}
So, the problem is this: I have a non-required field Guid, which is a unique identifier. If this isn't set, then I need to try to determine equality based on less accurate metrics as an attempt at determining if two objects are equal. This works fine, but it make GetHashCode() messy... How should I go about it? A naive implementation would be something like:
public override int GetHashCode() {
if (this.Guid != String.Empty)
return this.Guid.GetHashCode();
int hash = 37;
hash = hash * 23 + this.Title.GetHashCode();
hash = hash * 23 + this.PublishDate.GetHashCode();
hash = hash * 23 + this.Description.GetHashCode();
return hash;
}
But what are the chances of the two types of hash colliding? Certainly, I wouldn't expect it to be 1 in 2 ** 32. Is this a bad idea, and if so, how should I be doing it?
A very easy hash code method for custom classes is to bitwise XOR each of the fields' hash codes together. It can be as simple as this:
int hash = 0;
hash ^= this.Title.GetHashCode();
hash ^= this.PublishDate.GetHashCode();
hash ^= this.Description.GetHashCode();
return hash;
From the link above:
XOR has the following nice properties:
It does not depend on order of computation.
It does not “waste” bits. If you change even one bit in one of the components, the final value will change.
It is quick, a single cycle on even the most primitive computer.
It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.
XOR doesn't work well if you expect to have duplicate values in your fields as duplicate values will cancel each other out when XORed. Since you're hashing together three unrelated fields that should not be a problem in this case.
I don't think there is a problem with the approach you have chosen to use. Worrying 'too much' about hash collisions is almost always an indication of over-thinking the problem; as long as the hash is highly likely to be different you should be fine.
Ultimately you may even want to consider leaving out the Description from your hash anyway if it is reasonable to expect that most of the time objects can be distinguished based on their title and publication date (books?).
You could even consider disregarding the GUID in your hash function altogether, and only use it in the Equals implementation to disambiguate the unlikely(?) case of hash clashes.