HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative? - c#

In the internal source there is such a constructor public HashSetEqualityComparer(IEqualityComparer<T> comparer) but it's internal so I can't use it.
By default, HashSet<T>.CreateSetComparer() just uses the parameterless constructor which will apply EqualityComparer<T>.Default.
Is there a way to get a HashSetEqualityComparer<T> with a IEqualityComparer<T> of choice, without copying out the code from the source?

I think best solution is using SetEquals. It does the job you need and exactly in the same way that HashSetEqualityComparer does but it will account for any custom comparers defined in the sets its comparing.
So, in your specific scenario where you want to use a HashSet<T> as a key of a dictionary, you need to implement an IEqualityComparer<HashSet<T>> that makes use of SetEquals and "borrows" the reference source of HashSetEqualityComparer.GetHashCode():
public class CustomHashSetEqualityComparer<T>
: IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (ReferenceEquals(x, null))
return false;
return x.SetEquals(y);
}
public int GetHashCode(HashSet<T> set)
{
int hashCode = 0;
if (set != null)
{
foreach (T t in set)
{
hashCode = hashCode ^
(set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
}
}
return hashCode;
}
}
But yes, its a small pain that there is not way to directly create a SetEqualityComparer that leverages custom comparers but this unfortunate behavior is due, IMHO, more to a bug of the existing implementation than a lack of the needed overload; there is no reason why CreateSetComparer() can't return an IEqualityComparer that actually uses the comparers of the sets its comparing as the code above demonstrates.
If I had a voice in it, CreateSetComparer() wouldn't be static method at all. It would then be obvious, or at least predictable, that whatever comparer was returned would be created with the current set's comparer.

I agree #InBetween, using SetEquals is the best way. Even if add the constructor still can not achieve what you want.
please see this code:
http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
Here is I try to do:
class HashSetEqualityComparerWrapper<T> : IEqualityComparer<HashSet<T>>
{
static private Type HashSetEqualityComparerType = HashSet<T>.CreateSetComparer().GetType();
private IEqualityComparer<HashSet<T>> _comparer;
public HashSetEqualityComparerWrapper()
{
_comparer = HashSet<T>.CreateSetComparer();
}
public HashSetEqualityComparerWrapper(IEqualityComparer<T> comparer)
{
_comparer = HashSet<T>.CreateSetComparer();
if (comparer != null)
{
FieldInfo m_comparer_field = HashSetEqualityComparerType.GetField("m_comparer", BindingFlags.NonPublic | BindingFlags.Instance);
m_comparer_field.SetValue(_comparer, comparer);
}
}
public bool Equals(HashSet<T> x, HashSet<T> y)
{
return _comparer.Equals(x, y);
}
public int GetHashCode(HashSet<T> obj)
{
return _comparer.GetHashCode(obj);
}
}
UPDATE
I took 5 mins to implement another version form HashSetEqualityComparer<T> source code. And rewrite the bool Equals(HashSet<T> x, HashSet<T> y) method. It is not complex. All code just copy and paste from source, I just revise a bit.
class CustomHashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
{
private IEqualityComparer<T> m_comparer;
public CustomHashSetEqualityComparer()
{
m_comparer = EqualityComparer<T>.Default;
}
public CustomHashSetEqualityComparer(IEqualityComparer<T> comparer)
{
if (comparer == null)
{
m_comparer = EqualityComparer<T>.Default;
}
else
{
m_comparer = comparer;
}
}
// using m_comparer to keep equals properties in tact; don't want to choose one of the comparers
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
// handle null cases first
if (x == null)
{
return (y == null);
}
else if (y == null)
{
// set1 != null
return false;
}
// all comparers are the same; this is faster
if (AreEqualityComparersEqual(x, y))
{
if (x.Count != y.Count)
{
return false;
}
}
// n^2 search because items are hashed according to their respective ECs
foreach (T set2Item in y)
{
bool found = false;
foreach (T set1Item in x)
{
if (m_comparer.Equals(set2Item, set1Item))
{
found = true;
break;
}
}
if (!found)
{
return false;
}
}
return true;
}
public int GetHashCode(HashSet<T> obj)
{
int hashCode = 0;
if (obj != null)
{
foreach (T t in obj)
{
hashCode = hashCode ^ (m_comparer.GetHashCode(t) & 0x7FFFFFFF);
}
} // else returns hashcode of 0 for null hashsets
return hashCode;
}
// Equals method for the comparer itself.
public override bool Equals(Object obj)
{
CustomHashSetEqualityComparer<T> comparer = obj as CustomHashSetEqualityComparer<T>;
if (comparer == null)
{
return false;
}
return (this.m_comparer == comparer.m_comparer);
}
public override int GetHashCode()
{
return m_comparer.GetHashCode();
}
static private bool AreEqualityComparersEqual(HashSet<T> set1, HashSet<T> set2)
{
return set1.Comparer.Equals(set2.Comparer);
}
}

Avoid this class if you use custom comparers. It uses its own equality comparer to perform GetHashCode, but when performing Equals(Set1, Set2) if Set1 and Set2 have the same equality comparer, the the HashSetEqualityComparer will use the comparer of the sets. HashsetEqualityComparer will only use its own comparer for equals if Set1 and Set2 have different comparers
It gets worse. It calls HashSet.HashSetEquals, which has a bug in it (See https://referencesource.microsoft.com/#system.core/System/Collections/Generic/HashSet.cs line 1489, which is missing a if (set1.Count != set2.Count) return false before performing the subset check.
The bug is illustrated by the following program:
class Program
{
private class MyEqualityComparer : EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int obj)
{
return obj.GetHashCode();
}
}
static void Main(string[] args)
{
var comparer = HashSet<int>.CreateSetComparer();
var set1 = new HashSet<int>(new MyEqualityComparer()) { 1 };
var set2 = new HashSet<int> { 1, 2 };
Console.WriteLine(comparer.Equals(set1, set2));
Console.WriteLine(comparer.Equals(set2, set1)); //True!
Console.ReadKey();
}
}
Regarding other answers to this question (I don't have the rep to comment):
Wilhelm Liao: His answer also contains the bug because it's copied from the reference source
InBetween: The solution is not symmetric. CustomHashSetEqualityComparer.Equals(A, B) does not always equals CustomHashSetEqualityComparer.Equals(B, A). I would be scared of that.
I think a robust implementation should throw an exception if it encounters a set which has a different comparer to its own. It could always use its own comparer and ignore the set comparer, but that would give strange and unintuitive behaviour.

Additional to the original solution, we can simplify GetHashCode with HashCode.Combine function:
public int GetHashCode(HashSet<T> set) {
int hashCode = 0;
foreach (var item in set) {
hashCode ^= HashCode.Combine(item);
}
return hashCode;
}

Related

Prevent adding two items with the same byte[] key in from being added to a KeyedCollection<byte[], MyObject>

Code:
public class Coll : KeyedCollection<byte[], MyObject>
{
protected override byte[] GetKeyForItem(MyObject item) => item.Key;
}
public class EquComparer : IEqualityComparer<byte[]>
{
public bool Equals(byte[]? x, byte[]? y)
{
if (x is null && y is null) return true;
if (x is null) return false;
if (y is null) return false;
return x.SequenceEqual(y);
}
public int GetHashCode([DisallowNull] byte[] obj)
{
int result = Int32.MinValue;
foreach (var b in obj)
{
result += b;
}
return result;
}
}
My key is byte[]. I want to set the default equality comparer to compare keys with to something using byte[]::SequenceEqual() to keep two items with the same keys from being added.
Is there a way to do this?
Edit: As other have pointed out I could use the constructor to specify a non default equality comparer. I am certain it will be forgotten at some point giving rise to a bug that will be difficult to find. That is why I want to add some code to the class that makes my custom equality comparer the default for that class.
The KeyedCollection<TKey,TItem> class has a constructor that accepts an IEqualityComparer<TKey>. You could invoke this constructor when instantiating the derived class:
public class Coll : KeyedCollection<byte[], MyObject>
{
public Coll() : base(new EquComparer()) { }
protected override byte[] GetKeyForItem(MyObject item) => item.Key;
}

UnitTesting List<T> of custom objects with List<S> of custom objects for equality

I'm writing some UnitTests for a parser and I'm stuck at comparing two List<T> where T is a class of my own, that contains another List<S>.
My UnitTest compares two lists and fails. The code in the UnitTest looks like this:
CollectionAssert.AreEqual(list1, list2, "failed");
I've written a test scenario that should clarify my question:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ComparerTest
{
class Program
{
static void Main(string[] args)
{
List<SimplifiedClass> persons = new List<SimplifiedClass>()
{
new SimplifiedClass()
{
FooBar = "Foo1",
Persons = new List<Person>()
{
new Person(){ ValueA = "Hello", ValueB="Hello"},
new Person(){ ValueA = "Hello2", ValueB="Hello2"},
}
}
};
List<SimplifiedClass> otherPersons = new List<SimplifiedClass>()
{
new SimplifiedClass()
{
FooBar = "Foo1",
Persons = new List<Person>()
{
new Person(){ ValueA = "Hello2", ValueB="Hello2"},
new Person(){ ValueA = "Hello", ValueB="Hello"},
}
}
};
// The goal is to ignore the order of both lists and their sub-lists.. just check if both lists contain the exact items (in the same amount). Basically ignore the order
// This is how I try to compare in my UnitTest:
//CollectionAssert.AreEqual(persons, otherPersons, "failed");
}
}
public class SimplifiedClass
{
public String FooBar { get; set; }
public List<Person> Persons { get; set; }
public override bool Equals(object obj)
{
if (obj == null) { return false;}
PersonComparer personComparer = new PersonComparer();
SimplifiedClass obj2 = (SimplifiedClass)obj;
return this.FooBar == obj2.FooBar && Enumerable.SequenceEqual(this.Persons, obj2.Persons, personComparer); // I think here is my problem
}
public override int GetHashCode()
{
return this.FooBar.GetHashCode() * 117 + this.Persons.GetHashCode();
}
}
public class Person
{
public String ValueA { get; set; }
public String ValueB { get; set; }
public override bool Equals(object obj)
{
if (obj == null)
{
return false;
}
Person obj2 = (Person)obj;
return this.ValueA == obj2.ValueA && this.ValueB == obj2.ValueB;
}
public override int GetHashCode()
{
if (!String.IsNullOrEmpty(this.ValueA))
{
//return this.ValueA.GetHashCode() ^ this.ValueB.GetHashCode();
return this.ValueA.GetHashCode() * 117 + this.ValueB.GetHashCode();
}
else
{
return this.ValueB.GetHashCode();
}
}
}
public class PersonComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
if (x != null)
{
return x.Equals(y);
}
else
{
return y == null;
}
}
public int GetHashCode(Person obj)
{
return obj.GetHashCode();
}
}
}
The question is strongly related to C# Compare Lists with custom object but ignore order, but I can't find the difference, other than I wrap a list into another object and use the UnitTest one level above.
I've tried to use an IEqualityComparer:
public class PersonComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
if (x != null)
{
return x.Equals(y);
}
else
{
return y == null;
}
}
public int GetHashCode(Person obj)
{
return obj.GetHashCode();
}
}
Afterwards I've tried to implement the ''IComparable'' interface thats allows the objects to be ordered. (Basically like this: https://stackoverflow.com/a/4188041/225808)
However, I don't think my object can be brought into a natural order. Therefore I consider this a hack, if I come up with random ways to sort my class.
public class Person : IComparable<Person>
public int CompareTo(Person other)
{
if (this.GetHashCode() > other.GetHashCode()) return -1;
if (this.GetHashCode() == other.GetHashCode()) return 0;
return 1;
}
I hope I've made no mistakes while simplifying my problem. I think the main problems are:
How can I allow my custom objects to be comparable and define the equality in SimplifiedClass, that relies on the comparision of subclasses (e.g. Person in a list, like List<Person>). I assume Enumerable.SequenceEqual should be replaced with something else, but I don't know with what.
Is CollectionAssert.AreEqual the correct method in my UnitTest?
Equals on a List<T> will only check reference equality between the lists themselves, it does not attempt to look at the items in the list. And as you said you don't want to use SequenceEqual because you don't care about the ordering. In that case you should use CollectionAssert.AreEquivalent, it acts just like Enumerable.SequenceEqual however it does not care about the order of the two collections.
For a more general method that can be used in code it will be a little more complicated, here is a re-implemented version of what Microsoft is doing in their assert method.
public static class Helpers
{
public static bool IsEquivalent(this ICollection source, ICollection target)
{
//These 4 checks are just "shortcuts" so we may be able to return early with a result
// without having to do all the work of comparing every member.
if (source == null != (target == null))
return false; //If one is null and one is not, return false immediately.
if (object.ReferenceEquals((object)source, (object)target) || source == null)
return true; //If both point to the same reference or both are null (We validated that both are true or both are false last if statement) return true;
if (source.Count != target.Count)
return false; //If the counts are different return false;
if (source.Count == 0)
return true; //If the count is 0 there is nothing to compare, return true. (We validated both counts are the same last if statement).
int nullCount1;
int nullCount2;
//Count up the duplicates we see of each element.
Dictionary<object, int> elementCounts1 = GetElementCounts(source, out nullCount1);
Dictionary<object, int> elementCounts2 = GetElementCounts(target, out nullCount2);
//It checks the total number of null items in the collection.
if (nullCount2 != nullCount1)
{
//The count of nulls was different, return false.
return false;
}
else
{
//Go through each key and check that the duplicate count is the same for
// both dictionaries.
foreach (object key in elementCounts1.Keys)
{
int sourceCount;
int targetCount;
elementCounts1.TryGetValue(key, out sourceCount);
elementCounts2.TryGetValue(key, out targetCount);
if (sourceCount != targetCount)
{
//Count of duplicates for a element where different, return false.
return false;
}
}
//All elements matched, return true.
return true;
}
}
//Builds the dictionary out of the collection, this may be re-writeable to a ".GroupBy(" but I did not take the time to do it.
private static Dictionary<object, int> GetElementCounts(ICollection collection, out int nullCount)
{
Dictionary<object, int> dictionary = new Dictionary<object, int>();
nullCount = 0;
foreach (object key in (IEnumerable)collection)
{
if (key == null)
{
++nullCount;
}
else
{
int num;
dictionary.TryGetValue(key, out num);
++num;
dictionary[key] = num;
}
}
return dictionary;
}
}
What it does is it makes a dictionary out of the two collections, counting the duplicates and storing it as the value. It then compares the two dictionaries to make sure that the duplicate count matches for both sides. This lets you know that {1, 2, 2, 3} and {1, 2, 3, 3} are not equal where Enumerable.Execpt would tell you that they where.

How do I properly use distinct and compare?

I am using distinct which says
Returns distinct elements from a sequence by using the default
equality comparer to compare values.
Yet when I run this code, I get multiple same id's
var ls = ls2.Distinct().OrderByDescending(s => s.id);
foreach (var v in ls)
{
Console.WriteLine(v.id);
}
I implemented these in my class yet this still doesnt work
class Post : IComparable<Post>, IEqualityComparer<Post>, IComparer<Post>
This is how I implemented it
int IComparable<Post>.CompareTo(Post other)
{
return (int)(id - other.id);
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.id == y.id;
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
throw new NotImplementedException();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return (int)(x.id - y.id);
}
You should implement GetHashCode().
Since you're delegating to the - and the == why not just delegate to the appropriate functions in id. ie. id.Compare(other.id), and obj.id.GetHashCode(), and delegate the Comparer to Compare. And also implement IEquatable
int IComparable<Post>.CompareTo(Post other)
{
return id.Compare(other.id);
}
bool IEquatable<Post>.Equals(Post x)
{
return id == y.id;
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.Equals(y.id);
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return x.Compare(y);
}
This assumes that id is an int, if not then you may have implement these for IEquatable for id too.
You need to properly implement GetHashCode() in your comparer - in your case you can just return the hash code of the id:
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
Also as pointed out by #dash in a comment you need to implement IEquatable<T> in Post if you choose to go that route (option 1).
A comparer should be implemented in a separate class that you can then pass in in one of the Distinct() overloads (option 2), i.e. in your case could be class MyPostComparer:
var ls = ls2.Distinct(new MyPostComparer()).OrderByDescending(s => s.id);
A third option would be to use the DistinctBy() method of the MoreLinq project.

why doesn't .Except() and Intersect() work here using LINQ?

i have the following code which doesnt seem to be working:
Context:
I have two lists of objects:
* listOne has 100 records
* listTwo has 70 records
many of them have the same Id property (in both lists);
var listOneOnlyItems = listOne.Except(listTwo, new ItemComparer ());
here is the comparer
public class ItemComparer : IEqualityComparer<Item>
{
public bool Equals(Item x, Item y)
{
if (x.Id == y.Id)
return true;
return false;
}
public int GetHashCode(Item obj)
{
return obj.GetHashCode();
}
}
after i run this code and look into the results
listOneOnlyItems
still has 100 records (should only have 30). Can anyone help me?
also, running
IEnumerable<Item> sharedItems = listOne.Intersect(listTwo, new ItemComparer());
returns zero reesults in the sharedItems collection
public int GetHashCode(Item obj)
{
return obj.Id.GetHashCode();
}
Worth a check at least -- IIRC GetHashCode() is tested first before equality, and if they don't have the same hash it won't bother checking equality. I'm not sure what to expect from obj.GetHashCode() -- it depends on what you've implemented on the Item class.
Consider making GetHashCode() return obj.Id.GetHashCode()
This code works fine:
static void TestLinqExcept()
{
var seqA = Enumerable.Range(1, 10);
var seqB = Enumerable.Range(1, 7);
var seqAexceptB = seqA.Except(seqB, new IntComparer());
foreach (var x in seqAexceptB)
{
Console.WriteLine(x);
}
}
class IntComparer: EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int x)
{
return x;
}
}
You need to add 'override' keywords to your EqualityComparer methods. (I think not having 'override' as implicit was a mistake on the part of the C# designers).

IEnumerable.Except() and a custom comparer

I'm having troubles with the Except() method.
Instead of returning the difference, it returns the original set.
I've tried implementing the IEquatable and IEqualityComparer in the Account class.
I've also tried creating a separate IEqualityComparer class for Account.
When the Except() method is called from main, it doesn't seem to call my custom Equals() method, but when I tried Count(), it did call the custom GetHashCode() method!
I'm sure I made a trivial mistake somewhere and I hope a fresh pair of eyes can help me.
main:
IEnumerable<Account> everyPartnerID =
from partner in dataContext.Partners
select new Account { IDPartner = partner.ID, Name = partner.Name };
IEnumerable<Account> hasAccountPartnerID =
from partner in dataContext.Partners
from account in dataContext.Accounts
where
!partner.ID.Equals(Guid.Empty) &&
account.IDPartner.Equals(partner.ID) &&
account.Username.Equals("Special")
select new Account { IDPartner = partner.ID, Name = partner.Name };
IEnumerable<Account> noAccountPartnerID =
everyPartnerID.Except(
hasAccountPartnerID,
new LambdaComparer<Account>((x, y) => x.IDPartner.Equals(y.IDPartner)));
Account:
public class Account : IEquatable<Account>
{
public Guid IDPartner{ get; set; }
public string Name{ get; set; }
/* #region IEquatable<Account> Members
public bool Equals(Account other)
{
return this.IDPartner.Equals(other.IDPartner);
}
#endregion*/
}
LambdaComparer:
public class LambdaComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> _lambdaComparer;
private readonly Func<T, int> _lambdaHash;
public LambdaComparer(Func<T, T, bool> lambdaComparer) :
this(lambdaComparer, o => o.GetHashCode())
{
}
public LambdaComparer(Func<T, T, bool> lambdaComparer, Func<T, int> lambdaHash)
{
if (lambdaComparer == null)
throw new ArgumentNullException("lambdaComparer");
if (lambdaHash == null)
throw new ArgumentNullException("lambdaHash");
_lambdaComparer = lambdaComparer;
_lambdaHash = lambdaHash;
}
public bool Equals(T x, T y)
{
return _lambdaComparer(x, y);
}
public int GetHashCode(T obj)
{
return _lambdaHash(obj);
}
}
Basically your LambdaComparer class is broken when you pass in just a single function, because it uses the "identity hash code" provider if you don't provide anything else. The hash code is used by Except, and that's what's causing the problem.
Three options here:
Implement your own ExceptBy method and then preferably contribute it to MoreLINQ which contains that sort of thing.
Use a different implementation of IEqualityComparer<T>. I have a ProjectionEqualityComparer class you can use in MiscUtil - or you can use the code as posted in another question.
Pass a lambda expression into your LambdaComparer code to use for the hash:
new LambdaComparer<Account>((x, y) => x.IDPartner.Equals(y.IDPartner)),
x => x.IDPartner.GetHashCode());
You could also quickly fix your LambdaComparer to work when only the equality parameters are supplied like this:
public LambdaComparer(Func<T, T, bool> lambdaComparer) :
this(lambdaComparer, o => 1)
{
}
Look here, how to use and implementing IEqualityComparer in way with linq.Except and beyond.
https://www.dreamincode.net/forums/topic/352582-linq-by-example-3-methods-using-iequalitycomparer/
public class Department {
public string Code { get; set; }
public string Name { get; set; }
}
public class DepartmentComparer : IEqualityComparer {
// equal if their Codes are equal
public bool Equals(Department x, Department y) {
// reference the same objects?
if (Object.ReferenceEquals(x, y)) return true;
// is either null?
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
return x.Code == y.Code;
}
public int GetHashCode(Department dept) {
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
// if null default to 0
if (Object.ReferenceEquals(dept, null)) return 0;
return dept.Code.GetHashCode();
}
}
IEnumerable<Department> deptExcept = departments.Except(departments2,
new DepartmentComparer());
foreach (Department dept in deptExcept) {
Console.WriteLine("{0} {1}", dept.Code, dept.Name);
}
// departments not in departments2: AC, Accounts.
IMO, this answer above is the simplest solution compared to other solutions for this problem. I tweaked it such that I use the same logic for the Object class's Equals() and GetHasCode(). The benefit is that this solution is completely transparent to the client linq expression.
public class Ericsson4GCell
{
public string CellName { get; set; }
public string OtherDependantProperty { get; set; }
public override bool Equals(Object y)
{
var rhsCell = y as Ericsson4GCell;
// reference the same objects?
if (Object.ReferenceEquals(this, rhsCell)) return true;
// is either null?
if (Object.ReferenceEquals(this, null) || Object.ReferenceEquals(rhsCell, null))
return false;
return this.CellName == rhsCell.CellName;
}
public override int GetHashCode()
{
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
// if null default to 0
if (Object.ReferenceEquals(this, null)) return 0;
return this.CellName.GetHashCode();
}
}

Categories

Resources