C# HashSet.Contains with custom EqualityComparer never calls GetHashCode()

I have a very large HashSet (hundreds of thousands) of Customer objects in my database. Then I get a newly imported HashSet of Customer objects and have to check, for every new object, whether it is contained in the existing HashSet. Performance is very important.
I cannot use the default EqualityComparer, as customers need to be compared on only three properties. Also, I can't override the Equals and GetHashCode methods of the Customer class for other reasons. So I went for a custom EqualityComparer. I tried both implementing IEqualityComparer and inheriting from EqualityComparer and overriding, as you see below; both gave the same end result.
using System.Collections.Generic;

public class CustomerComparer : EqualityComparer<Customer>
{
    public CustomerComparer() { }

    // Two customers are equal when Name, Description and AdditionalInfo all match.
    public override bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true; // also covers both being null
        return x != null &&
               y != null &&
               x.Name == y.Name &&
               x.Description == y.Description &&
               x.AdditionalInfo == y.AdditionalInfo;
    }

    // Combines the hash codes of the same three properties used by Equals.
    public override int GetHashCode(Customer obj)
    {
        unchecked // overflow is expected while mixing hash codes
        {
            var hashCode = -1885141022;
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.Name);
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.Description);
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.AdditionalInfo);
            return hashCode;
        }
    }
}
Now to my problem: when I use the default EqualityComparer, generally only the GetHashCode method of Customer is called, and the performance for my use case is very good (1-2 seconds). When I use my custom EqualityComparer, GetHashCode is never called, only Equals. The performance for my use case is then horrible (hours). See the code below:
public void FilterImportedCustomers(ISet<Customer> dataBase, IEnumerable<Customer> imported)
{
    var equalityComparer = new CustomerComparer();
    foreach (var obj in imported)
    {
        // Great performance, always calls Customer.GetHashCode
        if (!dataBase.Contains(obj))
        {
            //...
        }
        // Awful performance, only calls CustomerComparer.Equals.
        // (This two-argument call binds to the LINQ extension Enumerable.Contains.)
        if (!dataBase.Contains(obj, equalityComparer))
        {
            //...
        }
    }
}
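A small sketch to make the difference between the two calls visible. It assumes a minimal Customer class with only the three compared properties; CountingCustomerComparer, ContainsDemo and LinqContainsWithComparer are made-up names for illustration, and HashCode.Combine needs .NET Core 2.1 or later. Because ISet<T> has no Contains overload that takes a comparer, the two-argument call resolves to LINQ's Enumerable.Contains, which, when given a comparer, enumerates the whole set and calls Equals on each element without ever using GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal Customer: only the three properties the comparer looks at.
public class Customer
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string AdditionalInfo { get; set; }
}

// Hypothetical comparer that counts its own calls, for illustration only.
public class CountingCustomerComparer : IEqualityComparer<Customer>
{
    public int EqualsCalls;
    public int HashCalls;

    public bool Equals(Customer x, Customer y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name
            && x.Description == y.Description
            && x.AdditionalInfo == y.AdditionalInfo;
    }

    public int GetHashCode(Customer obj)
    {
        HashCalls++;
        return HashCode.Combine(obj.Name, obj.Description, obj.AdditionalInfo);
    }
}

public static class ContainsDemo
{
    // Looks up a customer that is not in a set of n customers and reports the calls made.
    public static (bool Found, int EqualsCalls, int HashCalls) LinqContainsWithComparer(int n)
    {
        // Set built with the default comparer, as in the question.
        ISet<Customer> dataBase = new HashSet<Customer>(
            Enumerable.Range(0, n).Select(i => new Customer { Name = "c" + i }));
        var comparer = new CountingCustomerComparer();
        var missing = new Customer { Name = "not there" };

        // ISet<T> has no Contains(T, IEqualityComparer<T>), so this binds to
        // Enumerable.Contains, which walks every element and calls comparer.Equals.
        bool found = dataBase.Contains(missing, comparer);
        return (found, comparer.EqualsCalls, comparer.HashCalls);
    }
}
```

For 1,000 existing customers and a missing item you should see 1,000 Equals calls and no GetHashCode calls, so the loop in FilterImportedCustomers does work proportional to existing × imported.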
Does anyone have an idea how I can solve this? That would be amazing; I'm really stuck on this huge performance problem.
EDIT:
I solved it by passing my EqualityComparer when initializing the HashSet, using the constructor overload that takes an IEqualityComparer: var database = new HashSet<Customer>(new CustomerComparer());
Thank you, guys!

I solved it by passing my EqualityComparer when initializing the HashSet! I used the constructor overload that takes an IEqualityComparer: var database = new HashSet<Customer>(new CustomerComparer());
Thanks to Lee and NetMage, who commented under my original post.
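A minimal sketch of that fix, assuming a Customer class with the three string properties from the question; ImportFilter and NewCustomers are hypothetical names. Because the comparer is handed to the HashSet<Customer> constructor, Contains uses CustomerComparer.GetHashCode to locate the bucket and calls Equals only for the candidates in it. This version uses HashCode.Combine (.NET Core 2.1+) instead of the hand-rolled hash from the question; either is fine as long as it uses the same properties as Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed Customer shape; only the three compared properties are modeled.
public class Customer
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string AdditionalInfo { get; set; }
}

// Compares customers on Name, Description and AdditionalInfo only.
public class CustomerComparer : EqualityComparer<Customer>
{
    public override bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name
            && x.Description == y.Description
            && x.AdditionalInfo == y.AdditionalInfo;
    }

    // Must agree with Equals: equal customers produce equal hash codes.
    public override int GetHashCode(Customer obj) =>
        HashCode.Combine(obj.Name, obj.Description, obj.AdditionalInfo);
}

public static class ImportFilter
{
    // Returns the imported customers that are not already in the existing data.
    public static List<Customer> NewCustomers(IEnumerable<Customer> existing, IEnumerable<Customer> imported)
    {
        // The comparer belongs to the set itself, so Contains hashes first
        // and only calls Equals for items that land in the same bucket.
        var dataBase = new HashSet<Customer>(existing, new CustomerComparer());
        return imported.Where(c => !dataBase.Contains(c)).ToList();
    }
}
```

Building the set once and then calling Contains per imported item keeps each lookup close to constant time on average, instead of a scan over the whole set.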

Related

Linq to SQL: Unable to Successively insert 2 objects that compare equal

I have a table Size, and I have overloaded GetHashCode and Equals on the corresponding object generated by Linq to SQL.
I am executing the following code:
Size s = new Size();
data_context.Sizes.InsertOnSubmit(s);
data_context.SubmitChanges();
s = new Size
{
    Diameter = 1
};
data_context.Sizes.InsertOnSubmit(s);
data_context.SubmitChanges();
s = new Size
{
    Diameter = 1
};
data_context.Sizes.InsertOnSubmit(s);
data_context.SubmitChanges();
On the third SubmitChanges, I get an InvalidOperationException with the message
"Cannot add an entity that already exists."
If I rerun the program I can add the first two again, but not the third. I have no clue what's going on; could someone give me a pointer?
If I only override Size.Equals or Size.GetHashCode, this problem doesn't arise, but it does when I override both.
The Equals and GetHashCode overrides are as follows (but really any method that implements value semantics leads to the same behavior):
public override bool Equals(Object obj)
{
    if (obj == null)
    {
        return false;
    }
    Size p = obj as Size;
    if ((System.Object)p == null)
    {
        return false;
    }
    return p.Diameter == Diameter;
}
public override int GetHashCode()
{
    return Diameter?.GetHashCode() ?? 0;
}
You said:
I have overloaded GetHashCode and Equals on the corresponding object generated by Linq to SQL.
If you overloaded those so that two Size entities with the same properties are considered equal, then this is working as expected; LINQ to SQL uses equality checks to determine if two objects refer to the same record to keep things in sync.
Why did you override Equals and GetHashCode in the first place? Simply removing those overrides will avoid this issue if you're wanting to be able to insert similar Size objects.
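If value comparisons on Size are still needed elsewhere (for example for Distinct), one option, sketched here as an assumption rather than anything LINQ to SQL requires, is to leave the entity with its default reference equality and move the value logic into a separate IEqualityComparer<Size>. The Size class below is a stand-in with just a nullable Diameter; SizeDiameterComparer and SizeDemo are hypothetical names, and no DataContext is involved.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the generated entity: no Equals/GetHashCode overrides,
// so each new instance keeps its own reference identity.
public class Size
{
    public int? Diameter { get; set; }
}

// Value comparison lives outside the entity and is used only where it is wanted.
public class SizeDiameterComparer : IEqualityComparer<Size>
{
    public bool Equals(Size x, Size y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Diameter == y.Diameter;
    }

    public int GetHashCode(Size obj) => obj.Diameter?.GetHashCode() ?? 0;
}

public static class SizeDemo
{
    // A null comparer makes Distinct fall back to the default (reference) equality.
    public static int CountDistinct(IEnumerable<Size> sizes, IEqualityComparer<Size> comparer = null) =>
        sizes.Distinct(comparer).Count();
}
```

Two Size objects with Diameter = 1 stay distinct under the default equality but collapse into one when SizeDiameterComparer is passed explicitly.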

How to compare two list and get the differences from first list?

I am stuck in a situation where I have two lists. What is the correct way to compare the two and get the result in a third list? There is a small catch, though; see the example below:
ListOld = {
[Name=Amit, Class=V, Roll=3],
[Name=Naveen, Class=V, Roll=3],
[Name=Sammy, Class=V, Roll=3],
[Name=Neil, Class=X, Roll=21],
[Name=John, Class=VI, Roll=63]};
ListNew = {
[Name=Amit, Class=VI, Roll=13],
[Name=Naveen, Class=VII, Roll=3],
[Name=Sammy, Class=V, Roll=3],
[Name=Sanjay, Class=VIII, Roll=2]};
ResultantList = {
[Name=Amit, Class=VI, Roll=13],
[Name=Naveen, Class=VII, Roll=3],
[Name=Sanjay, Class=VIII, Roll=2]};
In the above example, ListNew has 3 changes: updates to Amit and Naveen, and Sanjay as a new member.
So my query needs to compare both lists and pick the items that were either updated or added.
I tried Except(), Intersect() and Union() with equality interfaces, but with no success. Kindly help.
You can do it by writing an IEqualityComparer:
public class SomeClassComparer : IEqualityComparer<SomeClass>
{
    public bool Equals(SomeClass x, SomeClass y)
    {
        return x.Name == y.Name && x.Class == y.Class && x.Roll == y.Roll;
    }
    public int GetHashCode(SomeClass obj)
    {
        return (obj.Name + obj.Class).GetHashCode();
    }
}
Now the LINQ is simple:
var result = ListNew.Except(ListOld, new SomeClassComparer()).ToList();
You can also do the same thing by overriding the Equals and GetHashCode methods, but IEqualityComparer is good especially when you have no control over the class you use.
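A runnable sketch of that answer applied to the data from the question. SomeClass with string Name and Class and an int Roll is an assumed shape, and ListDiff.ChangedOrAdded is a hypothetical helper. The hash here combines all three fields with HashCode.Combine (.NET Core 2.1+); the answer's (obj.Name + obj.Class).GetHashCode() is also valid, since items that are equal always share Name and Class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the items in the question.
public class SomeClass
{
    public string Name { get; set; }
    public string Class { get; set; }
    public int Roll { get; set; }
}

// Two items are equal only if all three fields match.
public class SomeClassComparer : IEqualityComparer<SomeClass>
{
    public bool Equals(SomeClass x, SomeClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name && x.Class == y.Class && x.Roll == y.Roll;
    }

    // Uses every field compared in Equals, so equal items always hash alike.
    public int GetHashCode(SomeClass obj) => HashCode.Combine(obj.Name, obj.Class, obj.Roll);
}

public static class ListDiff
{
    // Items of newList that were added or changed relative to oldList.
    public static List<SomeClass> ChangedOrAdded(List<SomeClass> oldList, List<SomeClass> newList) =>
        newList.Except(oldList, new SomeClassComparer()).ToList();
}
```

With ListOld and ListNew as in the question, the result holds Amit, Naveen and Sanjay, in ListNew order; Sammy is dropped because all three of its fields match an old entry.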

Remove one list from another mvc

I have two lists of the same type and I am trying to subtract the information in one list from the other and then save the result into the model.
I have tried two ways of doing it and so far I can't get either to work:
These are the two lists:
List<ApplicationsDetailsModel> AppList = ctx.Database.SqlQuery<ApplicationsDetailsModel>("exec get_applications_r").ToList();
var AppExceptionList = new List<ApplicationsDetailsModel>();
foreach (var g in AnIrrelevantList)
{
    AppExceptionList.Add(new ApplicationsDetailsModel()
    {
        AppNum = g.AppNum,
        AppName = g.AppName
    });
}
So they now both have different data in the same format.
model.AppList = AppList.Except(AppExceptionList).ToList();
This doesn't bring up any errors but it also doesn't subtract the second list from the first.
var onlyInFirst = AppList.RemoveAll(a => AppExceptionList.Any(b => AppList == AppExceptionList));
I got this idea from this question.
Anyone know where I am going wrong?
The instances are not the same and are therefore not found to be equal by Except, since it checks for reference equality (which is never going to hold here). For your situation you need to write a custom equality comparer; I've taken a stab at it here:
public class ApplicationsDetailsModelEqualityComparer : IEqualityComparer<ApplicationsDetailsModel>
{
    public bool Equals(ApplicationsDetailsModel x, ApplicationsDetailsModel y)
    {
        return x.AppNum == y.AppNum && x.AppName == y.AppName;
    }
    public int GetHashCode(ApplicationsDetailsModel obj)
    {
        int hashCode = (obj.AppName != null ? obj.AppName.GetHashCode() : 0);
        hashCode = (hashCode * 397) ^ obj.AppNum.GetHashCode();
        return hashCode;
    }
}
Usage...
model.AppList = AppList.Except(AppExceptionList, new ApplicationsDetailsModelEqualityComparer()).ToList();
Note that I'm assuming your AppNum and AppName together uniquely identify your objects in your list.
The Except method doesn't know how to compare two objects of type ApplicationsDetailsModel. You need to tell it explicitly, using an IEqualityComparer:
public class ApplicationsDetailsModelComparer : IEqualityComparer<ApplicationsDetailsModel> {
    public bool Equals(ApplicationsDetailsModel first, ApplicationsDetailsModel second) {
        return first.AppNum == second.AppNum;
    }
    public int GetHashCode(ApplicationsDetailsModel applicationsDetailsModel) {
        return applicationsDetailsModel.AppNum.GetHashCode();
    }
}
Then, you use it like this :
model.AppList = AppList.Except(AppExceptionList, new ApplicationsDetailsModelComparer ()).ToList();
If AppNum isn't a unique value in your collection (like a primary key), feel free to adapt the comparer class to your needs.
jgauffin's answer on the question that you linked to sums it up:
http://stackoverflow.com/a/13361682/89092
Except requires that Equals and GetHashCode are implemented in the traversed class.
The problem is that the Except method does not know how to compare instances of ApplicationsDetailsModel.
You should override GetHashCode in ApplicationsDetailsModel so that instances you consider equal always return the same hash code (the hash code does not have to be unique).
You should override Equals in ApplicationsDetailsModel to return whether the instances should be considered "Equal", by comparing their identifying properties rather than only their hash codes. It is probably best to do this by implementing the IEquatable interface: http://msdn.microsoft.com/en-us/library/ms131187(v=vs.110).aspx
When you perform these steps, the Except method will work as expected
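A sketch of those steps, assuming AppNum is an int and AppName a string (the question doesn't show the model's types); AppListDemo.Subtract is a hypothetical helper. Implementing IEquatable<ApplicationsDetailsModel> and overriding Equals(object) and GetHashCode lets EqualityComparer<T>.Default, and therefore Except, use value equality without a comparer argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed model: AppNum as int and AppName as string are guesses.
public class ApplicationsDetailsModel : IEquatable<ApplicationsDetailsModel>
{
    public int AppNum { get; set; }
    public string AppName { get; set; }

    // Value equality on the two identifying fields.
    public bool Equals(ApplicationsDetailsModel other)
    {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return AppNum == other.AppNum && AppName == other.AppName;
    }

    public override bool Equals(object obj) => Equals(obj as ApplicationsDetailsModel);

    // Built from the same fields as Equals, so equal instances share a hash code.
    public override int GetHashCode() => HashCode.Combine(AppNum, AppName);
}

public static class AppListDemo
{
    // With the overrides in place, Except needs no comparer argument.
    public static List<ApplicationsDetailsModel> Subtract(
        List<ApplicationsDetailsModel> all, List<ApplicationsDetailsModel> exceptions) =>
        all.Except(exceptions).ToList();
}
```

The trade-off compared with a separate comparer is that every equality check on this type, everywhere in the program, now uses value semantics.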

Retrieved Dictionary Key Not Found

I have a SortedDictionary declared like such:
SortedDictionary<MyObject,IMyInterface> dict = new SortedDictionary<MyObject,IMyInterface>();
When its populated with values, if I grab any key from the dictionary and then try to reference it immediately after, I get a KeyNotFoundException:
MyObject myObj = dict.Keys.First();
var value = dict[myObj]; // This line throws a KeyNotFoundException
As I hover over the dictionary (after the error) with the debugger, I can clearly see the same key that I tried to reference is in fact contained in the dictionary. I'm populating the dictionary using a ReadOnlyCollection of MyObjects. Perhaps something odd is happening there? I tried overriding the == operator and Equals methods to get the explicit comparison I wanted, but no such luck. That really shouldn't matter since I'm actually getting a key directly from the Dictionary then querying the Dictionary using that same key. I can't figure out what's causing this. Has anyone ever seen this behavior?
EDIT 1
In overriding Equals I also overrode GetHashCode (as MS recommends). Here's the implementation of MyObject for anyone interested:
public class MyObject
{
    public string UserName { get; set; }
    public UInt64 UserID { get; set; }
    public override bool Equals(object obj)
    {
        if (obj == null || GetType() != obj.GetType())
        {
            return false;
        }
        // Return true if the fields match:
        return this.Equals((MyObject)obj);
    }
    public bool Equals(MyObject other)
    {
        if ((object)other == null)
        {
            return false;
        }
        // Return true if the fields match
        return this.UserID == other.UserID;
    }
    public override int GetHashCode()
    {
        return (int)this.UserID;
    }
    public static bool operator ==(MyObject a, MyObject b)
    {
        // If both are null, or both are same instance, return true.
        if (System.Object.ReferenceEquals(a, b))
        {
            return true;
        }
        // If one is null, but not both, return false.
        if (((object)a == null) || ((object)b == null))
        {
            return false;
        }
        // Return true if the fields match:
        return a.UserID == b.UserID;
    }
    public static bool operator !=(MyObject a, MyObject b)
    {
        return !(a == b);
    }
}
What I noticed from debugging is that if I add a quick watch (after the KeyNotFoundException is thrown) for the expression:
dict.ElementAt(0).Key == value;
it returns true. How can this be?
EDIT 2
So the problem ended up being that SortedDictionary (and Dictionary as well) is not thread-safe. A background thread was performing some operations on the dictionary that seemed to trigger a re-sort of the collection (adding items to the collection would do this). At the same time, while the dictionary was looking for my key, the collection was being changed, and it did not find my key even though it was there.
Sorry for all of you who asked for code on this one, I'm currently debugging an application that I inherited and I didn't realize this was going on on a timed, background thread. As such, I thought I copied and pasted all the relevant code, but I didn't realize there was another thread running behind everything manipulating the collection.
It appears that the problem was that SortedDictionary is not thread-safe. A background thread was performing some operations on the dictionary (adding items to the collection), which seemed to trigger a re-sort of the collection. At the same time, when the dictionary was attempting to iterate through the values to find my key, the collection was being changed and re-sorted, rendering the enumerator invalid, so it did not find my key even though it was there.
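A minimal sketch of one way to avoid this, assuming all access to the collection can be routed through a single object: a hypothetical LockedSortedDictionary wrapper that takes the same lock for writes and reads, and hands out a copy of the keys instead of the live Keys collection. It is not a drop-in replacement for SortedDictionary; it only exposes the few members shown.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: every access to the SortedDictionary goes through one lock,
// so a background thread cannot modify the tree while another thread reads it.
public class LockedSortedDictionary<TKey, TValue>
{
    private readonly SortedDictionary<TKey, TValue> _inner = new SortedDictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public void Set(TKey key, TValue value)
    {
        lock (_gate) { _inner[key] = value; }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_gate) { return _inner.TryGetValue(key, out value); }
    }

    public int Count
    {
        get { lock (_gate) { return _inner.Count; } }
    }

    // Copies the keys under the lock so callers never enumerate the live collection.
    public List<TKey> KeysSnapshot()
    {
        lock (_gate) { return new List<TKey>(_inner.Keys); }
    }
}
```

A single lock is the simplest correct choice here; it serializes all access, which is usually acceptable when a background timer only adds items occasionally.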
I have a suspicion - it's possible that you're changing the UserID of the key after insertion. For example, this would demonstrate the problem:
var key = new MyObject { UserID = 10 };
var dictionary = new Dictionary<MyObject, string>();
dictionary[key] = "foo";
key.UserID = 20; // This will change the hash code
var value = dictionary[key]; // Bang!
You shouldn't change properties involved in equality/hash-code considerations for an object which is being used as the key in a hash-based collection. Ideally, change your code so that this can't happen: make UserID read-only, initialized on construction.
The above definitely would cause a problem - but it's possible that it's not the same as the problem you're seeing, of course.
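A small sketch of that failure and of the immutable alternative suggested above. MutableKey and ImmutableKey are hypothetical stand-ins for MyObject that keep only UserID. After the mutation, the dictionary looks for the key under its new hash code, so the very same instance is no longer found.

```csharp
using System.Collections.Generic;

// Mutable key: UserID can change after the key is stored (the problem case).
public class MutableKey
{
    public ulong UserID { get; set; }
    public override bool Equals(object obj) => obj is MutableKey other && other.UserID == UserID;
    public override int GetHashCode() => UserID.GetHashCode();
}

// Immutable key: UserID is fixed at construction, so its hash code cannot drift.
public sealed class ImmutableKey
{
    public ImmutableKey(ulong userId) { UserID = userId; }
    public ulong UserID { get; }
    public override bool Equals(object obj) => obj is ImmutableKey other && other.UserID == UserID;
    public override int GetHashCode() => UserID.GetHashCode();
}

public static class KeyDemo
{
    // Stores a key, mutates it, then looks it up with the same instance.
    public static bool LookupAfterMutation()
    {
        var key = new MutableKey { UserID = 10 };
        var dictionary = new Dictionary<MutableKey, string> { [key] = "foo" };
        key.UserID = 20; // changes the hash code of a key already in the dictionary
        return dictionary.ContainsKey(key); // false: the entry was stored under the old hash code
    }
}
```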
In addition to overloading == and Equals, make sure you override GetHashCode with a suitable hash function. In particular, see this specification from the documentation:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
For the best performance, a hash function should generate an even distribution for all input, including input that is heavily clustered. An implication is that small modifications to object state should result in large modifications to the resulting hash code for best hash table performance.
Hash functions should be inexpensive to compute.
The GetHashCode method should not throw exceptions.
I agree with Jon Skeet's suspicion that you're somehow unintentionally modifying UserID property after it's added as a key. But since the only property that's important for testing equality in MyObject is UserID (and therefore that's the only property that the Dictionary cares about), I'd recommend refactoring your code to use a simple Dictionary<ulong, IMyInterface> instead:
Dictionary<ulong, IMyInterface> dict = new Dictionary<ulong, IMyInterface>();
ulong userID = dict.Keys.First();
var value = dict[userID];

LINQ - Distinct is ignored?

So I have a problem with my LINQ code, where I have to select a distinct data set. I implemented the following IEqualityComparer:
public class ProjectRoleComparer : IEqualityComparer<ProjectUserRoleMap>
{
    public bool Equals(ProjectUserRoleMap x, ProjectUserRoleMap y)
    {
        return x.RoleID.Equals(y.RoleID);
    }
    public int GetHashCode(ProjectUserRoleMap obj)
    {
        return obj.GetHashCode();
    }
}
In this context, I wish to retrieve the ProjectUserRoleMap objects related to a given Project, identified by its ID. I only want one ProjectUserRoleMap per unique RoleID, but my explicit instruction to perform a distinct select on the RoleID is ignored. I am totally clueless as to why this is the case, and do not understand LINQ well enough to think of a workaround. Here is the calling code:
ProjectRoleComparer prCom = new ProjectRoleComparer();
IEnumerable<ProjectUserRoleMap> roleList = ProjectData.AllProjectUserRoleMap.Where(x => x.ProjectID == id).Distinct(prCom);
This code gives me 6 entries, when the number of entries I know I want is just 4. Am I doing something wrong with my usage of LINQ?
For reference, the ProjectUserRoleMap object has a RoleID property (int).
Your implementation of GetHashCode is wrong. Return obj.RoleID.GetHashCode();
Background:
Code that consumes an IEqualityComparer<T> usually first compares the hash codes of two objects. Only if those hash codes are the same is Equals called.
It is implemented like this because two unequal objects can have the same hash code, but two equal objects can never have different hash codes, provided GetHashCode() is implemented correctly.
This knowledge is used to improve the efficiency and performance of the comparison as implementations of GetHashCode are supposed to be fast, cheap operations.
Try:
public int GetHashCode(ProjectUserRoleMap obj)
{
    return obj.RoleID.GetHashCode();
}
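A runnable sketch of the corrected comparer, assuming ProjectUserRoleMap has int ProjectID and RoleID properties (the real class is not shown); RoleQuery.DistinctRoles is a hypothetical helper mirroring the calling code. Because GetHashCode now uses the same field as Equals, Distinct puts entries with the same RoleID in the same bucket and then confirms them with Equals.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the mapping entity from the question.
public class ProjectUserRoleMap
{
    public int ProjectID { get; set; }
    public int RoleID { get; set; }
}

// Equals and GetHashCode both look only at RoleID, so they agree.
public class ProjectRoleComparer : IEqualityComparer<ProjectUserRoleMap>
{
    public bool Equals(ProjectUserRoleMap x, ProjectUserRoleMap y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.RoleID == y.RoleID;
    }

    public int GetHashCode(ProjectUserRoleMap obj) => obj.RoleID.GetHashCode();
}

public static class RoleQuery
{
    // One ProjectUserRoleMap per distinct RoleID for the given project.
    public static List<ProjectUserRoleMap> DistinctRoles(IEnumerable<ProjectUserRoleMap> all, int projectId) =>
        all.Where(x => x.ProjectID == projectId).Distinct(new ProjectRoleComparer()).ToList();
}
```

With six entries for one project whose RoleIDs are 1, 1, 2, 3, 3, 4, this returns four items, keeping the first entry seen for each RoleID.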
