It seems that this problem has already been encountered by quite a few people:
List not working as expected
Contains always giving false
So I read the answers and tried to override Equals and GetHashCode, but it seems I am coding something wrong.
This is the situation: I have a list of User objects (User is a class). Each user has a Name property and a List property that contains licenses. I am trying to do a
if (!users.Contains(currentUser))
but it is not working as expected. And this is the code I did to override the Equals and GetHashCode:
public override bool Equals(object obj)
{
return Equals(obj as User);
}
public bool Equals(User otherUser)
{
if (ReferenceEquals(otherUser, null))
return false;
if (ReferenceEquals(this, otherUser))
return true;
return this._userName.Equals(otherUser.UserName) &&
this._licenses.SequenceEqual<string>(otherUser.Licenses);
}
public override int GetHashCode()
{
int hash = 13;
if (!_licenses.Any() && !_userName.Equals(""))
{
unchecked
{
foreach (string str in Licenses)
{
hash *= 7;
if (str != null) hash = hash + str.GetHashCode();
}
hash = (hash * 7) + _userName.GetHashCode();
}
}
return hash;
}
thank you for your suggestions and help in advance!
EDIT 1:
This is the code where I call List.Contains. I am trying to see whether the list already contains a certain user and, if not, add that user. Contains only works the first time: when currentUser changes, the User inside the list also changes to the current user. Maybe this problem is unrelated to Equals. Any ideas?
if (isIn)
{
if (!listOfLicenses.Contains(items[3]))
listOfLicenses.Add(items[3]);
if (!users.Contains(currentUser))
{
User user2Add = new User();
user2Add.UserName = currentUser.UserName;
users.Add(user2Add);
userIndexer++;
}
if (users[userIndexer - 1].UserName.Equals(currentUser.UserName))
{
users[userIndexer - 1].Licenses.Add(items[3]);
}
result.Rows.Add();
}
Well, one problem with your hash code - if either there are no licences or the username is empty, you're ignoring the other component. I'd rewrite it as:
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 31 + _userName.GetHashCode();
foreach (string licence in Licenses)
{
hash = hash * 31 + licence.GetHashCode();
}
return hash;
}
}
Shorter and simpler. It doesn't matter if you use the hash code of the empty string, or if you iterate over an empty collection.
That said, I'd have expected the previous code to work anyway. Note that it's order sensitive for the licences... oh, and List<T> won't use GetHashCode anyway. (You should absolutely override it appropriately, but it won't be the cause of the error.)
It would really help if you could show a short but complete program demonstrating the problem - I strongly suspect that you'll find it's actually a problem with your test data.
After users[userIndexer - 1].Licenses.Add(items[3]), users[userIndexer - 1] is no longer equal to the user it was before. You have changed Licenses, which is used in the equality comparison (in User.Equals).
--EDIT
See below code
public class Class
{
static void Main(string[] args)
{
User u1 = new User("1");
User u2 = new User("1");
Console.WriteLine(u1.Equals(u2));
u2.Lic = "2";
Console.WriteLine(u1.Equals(u2));
}
}
public class User
{
public string Lic;
public User(string lic)
{
this.Lic = lic;
}
public override bool Equals(object obj)
{
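// Null-safe: returns false instead of throwing when obj is not a User.
User other = obj as User;
return other != null && other.Lic == Lic;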
}
}
You need to implement Equals and GetHashCode for the License class, otherwise SequenceEqual will not work.
Does your class implement IEquatable<User>? From your equality methods it appears it does but just checking.
The documentation for List.Contains states that:
This method determines equality by using the default equality comparer, as defined by the object's implementation of the IEquatable(T).Equals method for T (the type of values in the list)
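To see this for yourself, here is a minimal sketch. ProbeUser, its call counters and ContainsProbe are hypothetical names made up for illustration, not part of the question's code. It counts how often each method is called during List<T>.Contains:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: counts calls to Equals and GetHashCode.
public class ProbeUser : IEquatable<ProbeUser>
{
    public static int EqualsCalls;
    public static int HashCalls;

    public string Name { get; }

    public ProbeUser(string name) { Name = name; }

    // Called by the default equality comparer because the type is IEquatable<ProbeUser>.
    public bool Equals(ProbeUser other)
    {
        EqualsCalls++;
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj) => Equals(obj as ProbeUser);

    public override int GetHashCode()
    {
        HashCalls++;
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class ContainsProbe
{
    // Returns true when an equal (but not identical) user is found in the list.
    public static bool Run()
    {
        var users = new List<ProbeUser> { new ProbeUser("alice"), new ProbeUser("bob") };
        return users.Contains(new ProbeUser("bob"));
    }
}
```

After ContainsProbe.Run(), EqualsCalls should be greater than zero while HashCalls stays at zero, since List<T>.Contains is a linear scan that only calls Equals.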
It is very important to make sure that the value returned by GetHashCode never ever ever changes for a specific instance of an object. If the value changes then lists and dictionaries won't work correctly.
Think of GetHashCode as "GetPrimaryKey". You would not change the primary key of a user record in a database if someone added a new license to the user. Likewise you mustn't change the GetHashCode.
It appears from your code that you are changing the licenses collection and you're using that to calculate your hash code. So that is probably causing your issue.
Now, it is perfectly legitimate to use a constant value for every hash code you produce - you could just return 42 for every instance, for example. This will force calling Equals to determine if two objects are equal or not. All that having distinct hash codes does is short circuits the need to call Equals.
If the _userName field doesn't change then just return its hash code and see if that works.
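As a rough illustration of why a changing hash code hurts, here is a sketch using a hash-based collection. MutableKey and MutableKeyDemo are hypothetical names; the point only applies to HashSet<T>, Dictionary<TKey, TValue> and similar types, not to List<T>, which does a linear scan:

```csharp
using System.Collections.Generic;

// Hypothetical type for illustration: its hash code depends on a mutable field.
public class MutableKey
{
    public int Value;

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Value == Value;
    }

    // int hash codes are the value itself, so changing Value changes the hash.
    public override int GetHashCode() => Value;
}

public static class MutableKeyDemo
{
    // Adds a key to a HashSet, mutates it, then looks it up again.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Value = 2; // the set still remembers the hash computed for Value == 1

        return set.Contains(key);
    }
}
```

FoundAfterMutation() returns false even though the very same instance is still in the set, because the lookup uses the new hash code while the set stored the old one.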
Related
I have two lists that I am trying to compare. So I have created a class that implements the IEqualityComparer interface, please see below in the bottom section of code.
When I step through my code, the code goes through my GetHashCode implementation but not the Equals? I do not really understand the GetHashCode method, despite reading around on the internet and what exactly it is doing.
List<FactorPayoffs> missingfactorPayoffList =
factorPayoffList.Except(
factorPayoffListOrg,
new FactorPayoffs.Comparer()).ToList();
List<FactorPayoffs> missingfactorPayoffListOrg =
factorPayoffListOrg.Except(
factorPayoffList,
new FactorPayoffs.Comparer()).ToList();
So in the two lines of code above, both lists return every item, telling me that the two lists do not contain any items that are the same. This is not true: there is only one row that is different. I'm guessing this is happening because the Equals method is not getting called, which in turn makes me wonder whether my GetHashCode method is working as it's supposed to.
class FactorPayoffs
{
public string FactorGroup { get; set; }
public string Factor { get; set; }
public DateTime dtPrice { get; set; }
public DateTime dtPrice_e { get; set; }
public double Ret_USD { get; set; }
public class Comparer : IEqualityComparer<FactorPayoffs>
{
public bool Equals(FactorPayoffs x, FactorPayoffs y)
{
return x.dtPrice == y.dtPrice &&
x.dtPrice_e == y.dtPrice_e &&
x.Factor == y.Factor &&
x.FactorGroup == y.FactorGroup;
}
public int GetHashCode(FactorPayoffs obj)
{
int hash = 17;
hash = hash * 23 + (obj.dtPrice).GetHashCode();
hash = hash * 23 + (obj.dtPrice_e).GetHashCode();
hash = hash * 23 + (obj.Factor ?? "").GetHashCode();
hash = hash * 23 + (obj.FactorGroup ?? "").GetHashCode();
hash = hash * 23 + (obj.Ret_USD).GetHashCode();
return hash;
}
}
}
Your Equals and GetHashCode implementations should involve the exact same set of properties; they do not.
In more formal terms, GetHashCode must always return the same value for two objects that compare equal. With your current code, two objects that differ only in the Ret_USD value will always compare equal but are not guaranteed to have the same hash code.
So what happens is that LINQ calls GetHashCode on two objects you consider equal, gets back different values, concludes that since the values were different the objects cannot be equal so there's no point at all in calling Equals and moves on.
To fix the problem, either remove the Ret_USD factor from GetHashCode or introduce it also inside Equals (whatever makes sense for your semantics of equality).
GetHashCode is intended as a fast but rough estimate of equality, so many operations potentially involving large numbers of comparisons start by checking this result instead of Equals, and only use Equals when necessary. In particular, if x.GetHashCode()!=y.GetHashCode(), then we already know x.Equals(y) is false, so there is no reason to call Equals. Had x.GetHashCode()==y.GetHashCode(), then x might equal y, but only a call to Equals will give a definite answer.
If you implement GetHashCode in a way that causes GetHashCode to be different for two objects where Equals returns true, then you have a bug in your code and many collection classes and algorithms relying on these methods will silently fail.
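A minimal sketch of a comparer where both methods use the same set of properties might look like the following. Payoff, PayoffKeyComparer and PayoffDemo are simplified, hypothetical stand-ins for the question's FactorPayoffs class, and the sample data is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's FactorPayoffs class.
public class Payoff
{
    public string Factor { get; set; }
    public DateTime Date { get; set; }
    public double Ret { get; set; }
}

// Equals and GetHashCode both look at Date and Factor only; Ret is ignored by both.
public class PayoffKeyComparer : IEqualityComparer<Payoff>
{
    public bool Equals(Payoff x, Payoff y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Date == y.Date && x.Factor == y.Factor;
    }

    public int GetHashCode(Payoff obj)
    {
        if (ReferenceEquals(obj, null)) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + obj.Date.GetHashCode();
            hash = hash * 23 + (obj.Factor == null ? 0 : obj.Factor.GetHashCode());
            return hash;
        }
    }
}

public static class PayoffDemo
{
    // Counts the items of the first list that have no key match in the second.
    public static int CountMissing()
    {
        var day = new DateTime(2020, 1, 1);
        var first = new List<Payoff>
        {
            new Payoff { Factor = "Value", Date = day, Ret = 0.1 },
            new Payoff { Factor = "Size", Date = day, Ret = 0.2 }
        };
        var second = new List<Payoff>
        {
            new Payoff { Factor = "Value", Date = day, Ret = 0.9 } // same key, different Ret
        };
        return first.Except(second, new PayoffKeyComparer()).Count();
    }
}
```

With this comparer CountMissing() returns 1: the "Value" rows match despite their different Ret values, and only the "Size" row is reported as missing.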
If you want to force the execution of the Equals you can implement it as follows
public int GetHashCode(FactorPayoffs obj) {
return 1;
}
Rewrite your GetHashCode implementation like this, to match the semantics of your Equals implementation.
public int GetHashCode(FactorPayoffs obj)
{
unchecked
{
int hash = 17;
hash = hash * 23 + obj.dtPrice.GetHashCode();
hash = hash * 23 + obj.dtPrice_e.GetHashCode();
if (obj.Factor != null)
{
hash = hash * 23 + obj.Factor.GetHashCode();
}
if (obj.FactorGroup != null)
{
hash = hash * 23 + obj.FactorGroup.GetHashCode();
}
return hash;
}
}
Note that you should use unchecked because you don't care about overflows. Additionally, coalescing to string.Empty is pointlessly wasteful; just exclude null values from the hash.
See here for the best generic answer I know,
Take the following:
var x = new Action(() => { Console.Write("") ; });
var y = new Action(() => { });
var a = x.GetHashCode();
var b = y.GetHashCode();
Console.WriteLine(a == b);
Console.WriteLine(x == y);
This will print:
True
False
Why is the hashcode the same?
It is kinda surprising, and will make using delegates in a Dictionary as slow as a List (aka O(n) for lookups).
Update:
The question is why. IOW who made such a (silly) decision?
A better hashcode implementation would have been:
return Method.GetHashCode() ^ (Target == null ? 0 : Target.GetHashCode());
// where Method is IntPtr
Easy! Since here is the implementation of the GetHashCode (sitting on the base class Delegate):
public override int GetHashCode()
{
return base.GetType().GetHashCode();
}
(sitting on the base class MulticastDelegate which will call above):
public sealed override int GetHashCode()
{
if (this.IsUnmanagedFunctionPtr())
{
return ValueType.GetHashCodeOfPtr(base._methodPtr);
}
object[] objArray = this._invocationList as object[];
if (objArray == null)
{
return base.GetHashCode();
}
int num = 0;
for (int i = 0; i < ((int) this._invocationCount); i++)
{
num = (num * 0x21) + objArray[i].GetHashCode();
}
return num;
}
Using tools such as Reflector, we can see the code and it seems like the default implementation is as strange as we see above.
The type value here will be Action. Hence the result above is correct.
UPDATE
My first attempt of a better implementation:
public class DelegateEqualityComparer:IEqualityComparer<Delegate>
{
public bool Equals(Delegate del1,Delegate del2)
{
return (del1 != null) && del1.Equals(del2);
}
public int GetHashCode(Delegate obj)
{
if(obj==null)
return 0;
int result = obj.Method.GetHashCode() ^ obj.GetType().GetHashCode();
if(obj.Target != null)
result ^= RuntimeHelpers.GetHashCode(obj.Target);
return result;
}
}
The quality of this should be good for single cast delegates, but not so much for multicast delegates (If I recall correctly Target/Method return the values of the last element delegate).
But I'm not really sure if it fulfills the contract in all corner cases.
Hmm, it looks like equality requires referential equality of the targets.
This smells like some of the cases mentioned in this thread, maybe it will give you some pointers on this behaviour. else, you could log it there :-)
What's the strangest corner case you've seen in C# or .NET?
Rgds GJ
From MSDN :
The default implementation of GetHashCode does not guarantee uniqueness or consistency; therefore, it must not be used as a unique object identifier for hashing purposes. Derived classes must override GetHashCode with an implementation that returns a unique hash code. For best results, the hash code must be based on the value of an instance field or property, instead of a static field or property.
So if you have not overridden the GetHashCode method, it may return the same value. I suspect this is because it generates it from the definition, not the instance.
I'm working with a large dataset of points of interest (POIs), which all have Lat/Long values.
I want to filter out POIs that are in close proximity to each other. I think that to achieve this I can round the Lat/Long down to X decimal places and group by the results (or call Distinct() or whatever)...
I have written a little LINQ statement which doesn't seem to do what I want,
var l1 = (from p in PointsOfInterest where p.IsVisibleOnMap select p).Distinct(new EqualityComparer()).ToList();
where EqualityComparer is
public class EqualityComparer : IEqualityComparer<PointOfInterest>
{
public bool Equals(PointOfInterest x, PointOfInterest y)
{
return Math.Round(x.Latitude.Value, 4) == Math.Round(y.Latitude.Value, 4) &&
Math.Round(x.Longitude.Value, 4) == Math.Round(y.Longitude.Value, 4);
}
public int GetHashCode(PointOfInterest obj)
{
return obj.GetHashCode();
}
}
but the Equals method never seems to get called?!?
Any thoughts on the best way to do this?
Equals() never gets called because GetHashCode() returns a different value for (almost) any two objects: you are using the GetHashCode() defined in the System.Object class.
You'll need to implement GetHashCode() a little differently.
Try something like this, hashing the same rounded values that Equals compares:
public int GetHashCode(PointOfInterest obj)
{
return Math.Round(obj.Longitude.Value, 4).GetHashCode() ^ Math.Round(obj.Latitude.Value, 4).GetHashCode();
}
This is the problem:
public int GetHashCode(PointOfInterest obj)
{
return obj.GetHashCode();
}
You'll have to implement GetHashCode() appropriately as well. Currently all items are probably considered different because their hash codes don't match.
From MSDN for IEqualityComparer.GetHashCode():
Implement this method to provide customized hash codes for objects, corresponding to the customized equality comparison provided by the Equals method.
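As a sketch of what "corresponding" means here, the comparer below rounds in both methods, so two points that compare equal always hash the same. The PointOfInterest shape (nullable double coordinates that are assumed to be set), RoundedLocationComparer and PoiDemo are assumptions made for illustration, and the coordinates are made-up sample data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's class: nullable coordinates that are always set.
public class PointOfInterest
{
    public double? Latitude { get; set; }
    public double? Longitude { get; set; }
}

// Equals and GetHashCode both work on coordinates rounded to the same precision.
public class RoundedLocationComparer : IEqualityComparer<PointOfInterest>
{
    private const int Digits = 4;

    public bool Equals(PointOfInterest x, PointOfInterest y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return Math.Round(x.Latitude.Value, Digits) == Math.Round(y.Latitude.Value, Digits)
            && Math.Round(x.Longitude.Value, Digits) == Math.Round(y.Longitude.Value, Digits);
    }

    public int GetHashCode(PointOfInterest obj)
    {
        if (ReferenceEquals(obj, null)) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Math.Round(obj.Latitude.Value, Digits).GetHashCode();
            hash = hash * 31 + Math.Round(obj.Longitude.Value, Digits).GetHashCode();
            return hash;
        }
    }
}

public static class PoiDemo
{
    // Two of the three sample points round to the same location.
    public static int CountDistinct()
    {
        var points = new List<PointOfInterest>
        {
            new PointOfInterest { Latitude = 51.50001, Longitude = -0.12001 },
            new PointOfInterest { Latitude = 51.50002, Longitude = -0.12002 },
            new PointOfInterest { Latitude = 48.8566, Longitude = 2.3522 }
        };
        return points.Distinct(new RoundedLocationComparer()).Count();
    }
}
```

Keep in mind that rounding only merges points that fall into the same rounding cell; two points that are very close but sit on opposite sides of a rounding boundary will still be treated as different.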
Any thoughts on the best way to do this?
Simply:
IEnumerable<PointOfInterest> result =
from p in PointsOfInterest
where p.IsVisibleOnMap
group p by new
{
Latitude = Math.Round(p.Latitude.Value, 4),
Longitude = Math.Round(p.Longitude.Value, 4)
} into g
let winner = g.First()
select winner;
I have a class with fields ColDescriptionOne(string), ColDescriptionTwo(string) and ColCodelist(int). I want to get the Intersect of two lists of this class where the desc are equal but the codelist is different.
I can use the Where clause and get what I need. However I can't seem to make it work using a custom Comparer like this:
internal class CodeListComparer: EqualityComparer<SheetRow>
{
public override bool Equals(SheetRow x, SheetRow y)
{
return Equals(x.ColDescriptionOne, y.ColDescriptionOne) &&
Equals(x.ColDescriptionSecond, y.ColDescriptionOne)
&& !Equals(x.ColCodelist, y.ColCodelist);
}
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397) + (obj.ColDescriptionSecond.GetHashCode()*397)
+ obj.ColCodelist.GetHashCode());
}
}
And then use it like this:
var onylByCodeList = firstSheet.Entries.Intersect(otherSheet.Entries, new CodeListComparer());
Any ideas what I'm doing wrong here ?
thanks
Sunit
You have a typo in the Equals method. The second line is comparing ColDescriptionOne to ColDescriptionSecond. They should both be ColDescriptionSecond.
return Equals(x.ColDescriptionOne, y.ColDescriptionOne)
&& Equals(x.ColDescriptionSecond, y.ColDescriptionSecond)
&& !Equals(x.ColCodelist, y.ColCodelist);
The second problem is that you are including ColCodelist in the GetHashCode method. GetHashCode must return the same value for objects that are equal. In this case, though, ColCodelist is supposed to be different when the objects are considered equal. This means that two objects you want to be considered equal will most likely have different hash codes, which is incorrect.
Take that out of the GetHashCode method and everything should work.
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397)
+ (obj.ColDescriptionSecond.GetHashCode()*397));
}
Basically, I have the following so far:
class Foo : IEquatable<Foo> {
public override bool Equals(object obj)
{
Foo d = obj as Foo ;
if (d == null)
return false;
return this.Equals(d);
}
#region IEquatable<Foo> Members
public bool Equals(Foo other)
{
if (this.Guid != String.Empty && this.Guid == other.Guid)
return true;
else if (this.Guid != String.Empty || other.Guid != String.Empty)
return false;
if (this.Title == other.Title &&
this.PublishDate == other.PublishDate &&
this.Description == other.Description)
return true;
return false;
}
#endregion
}
So, the problem is this: I have a non-required field Guid, which is a unique identifier. If this isn't set, then I need to try to determine equality based on less accurate metrics as an attempt at determining if two objects are equal. This works fine, but it make GetHashCode() messy... How should I go about it? A naive implementation would be something like:
public override int GetHashCode() {
if (this.Guid != String.Empty)
return this.Guid.GetHashCode();
int hash = 37;
hash = hash * 23 + this.Title.GetHashCode();
hash = hash * 23 + this.PublishDate.GetHashCode();
hash = hash * 23 + this.Description.GetHashCode();
return hash;
}
But what are the chances of the two types of hash colliding? Certainly, I wouldn't expect it to be 1 in 2 ** 32. Is this a bad idea, and if so, how should I be doing it?
A very easy hash code method for custom classes is to bitwise XOR each of the fields' hash codes together. It can be as simple as this:
int hash = 0;
hash ^= this.Title.GetHashCode();
hash ^= this.PublishDate.GetHashCode();
hash ^= this.Description.GetHashCode();
return hash;
From the link above:
XOR has the following nice properties:
It does not depend on order of computation.
It does not “waste” bits. If you change even one bit in one of the components, the final value will change.
It is quick, a single cycle on even the most primitive computer.
It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.
XOR doesn't work well if you expect to have duplicate values in your fields as duplicate values will cancel each other out when XORed. Since you're hashing together three unrelated fields that should not be a problem in this case.
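To make the trade-off concrete, here is a small sketch comparing plain XOR with the multiply-and-add combination used elsewhere in this thread. HashCombine and its method names are hypothetical helpers, and the inputs stand for field hash codes that have already been computed:

```csharp
// Hypothetical helpers for illustration; the arguments are already-computed field hash codes.
public static class HashCombine
{
    // Plain XOR: order-independent, and two equal inputs cancel out to 0.
    public static int Xor(int a, int b) => a ^ b;

    // Multiply-and-add: order-sensitive, equal inputs do not cancel.
    public static int MultiplyAdd(int a, int b)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + a;
            hash = hash * 23 + b;
            return hash;
        }
    }
}
```

For example, HashCombine.Xor(5, 5) is 0 and HashCombine.Xor(1, 2) equals HashCombine.Xor(2, 1), whereas HashCombine.MultiplyAdd(1, 2) and HashCombine.MultiplyAdd(2, 1) give different results. Which behaviour you want depends on whether field order and repeated values should influence the hash.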
I don't think there is a problem with the approach you have chosen to use. Worrying 'too much' about hash collisions is almost always an indication of over-thinking the problem; as long as the hash is highly likely to be different you should be fine.
Ultimately you may even want to consider leaving out the Description from your hash anyway if it is reasonable to expect that most of the time objects can be distinguished based on their title and publication date (books?).
You could even consider disregarding the GUID in your hash function altogether, and only use it in the Equals implementation to disambiguate the unlikely(?) case of hash clashes.