I have a List of paths of files stored on my computer. My aim is to first filter out the files which have the same name and then filter out those which have the same size.
To do so, I have made two classes implementing IEqualityComparer<string>, and implemented Equals and GetHashCode methods.
var query = FilesList.Distinct(new CustomTextComparer())
.Distinct(new CustomSizeComparer());
The code for both classes is given below:
public class CustomTextComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (Path.GetFileName(x) == Path.GetFileName(y))
{
return true;
}
return false;
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
public class CustomSizeComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (new FileInfo(x).Length == new FileInfo(y).Length)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
But the code does not work: there is no compiler error and no exception is thrown, yet the duplicate files are not excluded.
How can I correct this so that the code works?
Change your GetHashCode to work on the compared value. I.e. for your size comparer:
public int GetHashCode(string obj)
{
return new FileInfo(obj).Length.GetHashCode();
}
And for the other:
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
According to the answer to "What's the role of GetHashCode in the IEqualityComparer<T> in .NET?", the hash code is evaluated first. Equals is called only when two hash codes match.
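The effect is easy to reproduce on in-memory strings. The following is only a sketch: FullPathHashComparer, FileNameHashComparer and HashFirstDemo are made-up names and the paths are invented. Both comparers share the same Equals; only GetHashCode differs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Same Equals as in the question, but the hash comes from the whole path,
// so two paths with the same file name almost never share a hash code.
public class FullPathHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => Path.GetFileName(x) == Path.GetFileName(y);
    public int GetHashCode(string obj) => obj.GetHashCode();
}

// Hash derived from the same value that Equals compares.
public class FileNameHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => Path.GetFileName(x) == Path.GetFileName(y);
    public int GetHashCode(string obj) => Path.GetFileName(obj).GetHashCode();
}

public static class HashFirstDemo
{
    // Invented sample paths; nothing is read from disk.
    public static readonly string[] Paths =
    {
        "docs/report.txt", "backup/report.txt", "docs/notes.txt"
    };

    public static int CountDistinct(IEqualityComparer<string> comparer)
        => Paths.Distinct(comparer).Count();
}
```

HashFirstDemo.CountDistinct(new FileNameHashComparer()) returns 2. With FullPathHashComparer all three paths normally survive, because Equals is never reached for paths whose hash codes differ.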
Obviously it would be sensible to work on FileInfos, not on strings.
So maybe:
FilesList.Select(x => new FileInfo(x))
.Distinct(new CustomTextComparer())
.Distinct(new CustomSizeComparer());
Of course, then you have to change your comparers to work on the correct type.
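A rough sketch of what comparers over FileInfo could look like. The class names FileInfoNameComparer and FileInfoSizeComparer are my own, not from the question, and the null handling is an assumption about what you want:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Two files are "equal" when their names (without directory) match.
public class FileInfoNameComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Name == y.Name;
    }

    // Hash the same value that Equals compares.
    public int GetHashCode(FileInfo obj) => obj.Name.GetHashCode();
}

// Two files are "equal" when their sizes match.
public class FileInfoSizeComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Length throws FileNotFoundException if the file does not exist.
        return x.Length == y.Length;
    }

    public int GetHashCode(FileInfo obj) => obj.Length.GetHashCode();
}
```

These would be passed to the two Distinct calls in place of the string-based comparers.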
Your GetHashCode must return the same value for any objects that are of equal value:
// Try this
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
// And this
public int GetHashCode(string obj)
{
return new FileInfo(obj).Length.GetHashCode();
}
But this is a much easier way for the whole problem without the extra classes:
var query = FilesList
.GroupBy(f => Path.GetFileName(f)).Select(g => g.First())
.GroupBy(f => new FileInfo(f).Length).Select(g => g.First())
.ToList();
The hash code is used before Equals is ever called. Since your code gives different hash codes for items that are equal, you're not getting the desired result. Instead, you have to make sure the hash code returned is equal when the items are equal, so for example:
public class CustomTextComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (Path.GetFileName(x) == Path.GetFileName(y))
{
return true;
}
return false;
}
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
}
However, as Piotr pointed out, this isn't exactly a good way to go about your goal, since you're going to be doing a lot of Path.GetFileName and new FileInfo calls respectively, which is going to be a significant performance hit, especially since you're dealing with the file system, which is not exactly known for its speed of response.
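One way to limit the file system traffic is to look each value up at most once per path. This is a minimal sketch, assuming the same "first by name, then by size" semantics as the chained Distinct calls; FileDeduplicator and its sizeOf parameter are hypothetical names, and sizeOf exists only so the logic can be tried without real files:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileDeduplicator
{
    // sizeOf defaults to asking the file system; pass a stub to run without real files.
    public static List<string> Deduplicate(IEnumerable<string> paths, Func<string, long> sizeOf = null)
    {
        sizeOf = sizeOf ?? (p => new FileInfo(p).Length);

        var seenNames = new HashSet<string>();
        var seenSizes = new HashSet<long>();
        var result = new List<string>();

        foreach (var path in paths)
        {
            // Step 1: drop paths whose file name was already seen.
            if (!seenNames.Add(Path.GetFileName(path))) continue;

            // Step 2: among the survivors, drop files whose size was already seen.
            // The size is read at most once per path.
            if (!seenSizes.Add(sizeOf(path))) continue;

            result.Add(path);
        }
        return result;
    }
}
```

Calling FileDeduplicator.Deduplicate(FilesList) would read each file's size at most once, instead of once per GetHashCode and twice per Equals call.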
In the internal source there is such a constructor public HashSetEqualityComparer(IEqualityComparer<T> comparer) but it's internal so I can't use it.
By default, HashSet<T>.CreateSetComparer() just uses the parameterless constructor which will apply EqualityComparer<T>.Default.
Is there a way to get a HashSetEqualityComparer<T> with a IEqualityComparer<T> of choice, without copying out the code from the source?
I think the best solution is using SetEquals. It does the job you need, in exactly the same way that HashSetEqualityComparer does, but it accounts for any custom comparers defined in the sets it's comparing.
So, in your specific scenario where you want to use a HashSet<T> as a key of a dictionary, you need to implement an IEqualityComparer<HashSet<T>> that makes use of SetEquals and "borrows" the reference source of HashSetEqualityComparer.GetHashCode():
public class CustomHashSetEqualityComparer<T>
: IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (ReferenceEquals(x, null))
return false;
return x.SetEquals(y);
}
public int GetHashCode(HashSet<T> set)
{
int hashCode = 0;
if (set != null)
{
foreach (T t in set)
{
hashCode = hashCode ^
(set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
}
}
return hashCode;
}
}
But yes, it's a small pain that there is no way to directly create a SetEqualityComparer that leverages custom comparers. This unfortunate behavior is due, IMHO, more to a bug in the existing implementation than to the lack of the needed overload; there is no reason why CreateSetComparer() can't return an IEqualityComparer that actually uses the comparers of the sets it's comparing, as the code above demonstrates.
If I had a voice in it, CreateSetComparer() wouldn't be a static method at all. It would then be obvious, or at least predictable, that whatever comparer was returned would be created with the current set's comparer.
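For the dictionary-key scenario, usage could look roughly like this. SetKeyComparer and SetKeyDemo are hypothetical names; the comparer is a compact variant of the idea above (SetEquals plus a hash built from each set's own Comparer), with the null handling made symmetric as my own choice:

```csharp
using System;
using System.Collections.Generic;

public class SetKeyComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> set)
    {
        int hashCode = 0;
        if (set != null)
        {
            foreach (T item in set)
            {
                // XOR is order-independent, which matters because sets are unordered.
                hashCode ^= set.Comparer.GetHashCode(item) & 0x7FFFFFFF;
            }
        }
        return hashCode;
    }
}

public static class SetKeyDemo
{
    // Stores a value under one case-insensitive set and looks it up with another
    // set that contains the same strings in different casing.
    public static bool LookupIgnoringCase()
    {
        var tags = new Dictionary<HashSet<string>, int>(new SetKeyComparer<string>());
        tags[new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "red", "Blue" }] = 1;

        var probe = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "RED", "blue" };
        return tags.ContainsKey(probe);
    }
}
```

Because both the hash and the equality check go through the sets' own comparers, the lookup finds the entry even though the casing differs.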
I agree with @InBetween: using SetEquals is the best way. Even adding the constructor would not achieve what you want.
Please see this code:
http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
Here is what I tried:
class HashSetEqualityComparerWrapper<T> : IEqualityComparer<HashSet<T>>
{
static private Type HashSetEqualityComparerType = HashSet<T>.CreateSetComparer().GetType();
private IEqualityComparer<HashSet<T>> _comparer;
public HashSetEqualityComparerWrapper()
{
_comparer = HashSet<T>.CreateSetComparer();
}
public HashSetEqualityComparerWrapper(IEqualityComparer<T> comparer)
{
_comparer = HashSet<T>.CreateSetComparer();
if (comparer != null)
{
FieldInfo m_comparer_field = HashSetEqualityComparerType.GetField("m_comparer", BindingFlags.NonPublic | BindingFlags.Instance);
m_comparer_field.SetValue(_comparer, comparer);
}
}
public bool Equals(HashSet<T> x, HashSet<T> y)
{
return _comparer.Equals(x, y);
}
public int GetHashCode(HashSet<T> obj)
{
return _comparer.GetHashCode(obj);
}
}
UPDATE
I took 5 minutes to implement another version based on the HashSetEqualityComparer<T> source code, rewriting the bool Equals(HashSet<T> x, HashSet<T> y) method. It is not complex: all the code is copied and pasted from the source, and I just revised it a bit.
class CustomHashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
{
private IEqualityComparer<T> m_comparer;
public CustomHashSetEqualityComparer()
{
m_comparer = EqualityComparer<T>.Default;
}
public CustomHashSetEqualityComparer(IEqualityComparer<T> comparer)
{
if (comparer == null)
{
m_comparer = EqualityComparer<T>.Default;
}
else
{
m_comparer = comparer;
}
}
// using m_comparer to keep equals properties intact; don't want to choose one of the comparers
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
// handle null cases first
if (x == null)
{
return (y == null);
}
else if (y == null)
{
// set1 != null
return false;
}
// all comparers are the same; this is faster
if (AreEqualityComparersEqual(x, y))
{
if (x.Count != y.Count)
{
return false;
}
}
// n^2 search because items are hashed according to their respective ECs
foreach (T set2Item in y)
{
bool found = false;
foreach (T set1Item in x)
{
if (m_comparer.Equals(set2Item, set1Item))
{
found = true;
break;
}
}
if (!found)
{
return false;
}
}
return true;
}
public int GetHashCode(HashSet<T> obj)
{
int hashCode = 0;
if (obj != null)
{
foreach (T t in obj)
{
hashCode = hashCode ^ (m_comparer.GetHashCode(t) & 0x7FFFFFFF);
}
} // else returns hashcode of 0 for null hashsets
return hashCode;
}
// Equals method for the comparer itself.
public override bool Equals(Object obj)
{
CustomHashSetEqualityComparer<T> comparer = obj as CustomHashSetEqualityComparer<T>;
if (comparer == null)
{
return false;
}
return (this.m_comparer == comparer.m_comparer);
}
public override int GetHashCode()
{
return m_comparer.GetHashCode();
}
static private bool AreEqualityComparersEqual(HashSet<T> set1, HashSet<T> set2)
{
return set1.Comparer.Equals(set2.Comparer);
}
}
Avoid this class if you use custom comparers. It uses its own equality comparer to perform GetHashCode, but when performing Equals(Set1, Set2), if Set1 and Set2 have the same equality comparer, the HashSetEqualityComparer will use the comparer of the sets. HashSetEqualityComparer will only use its own comparer for Equals if Set1 and Set2 have different comparers.
It gets worse. It calls HashSet.HashSetEquals, which has a bug in it (see https://referencesource.microsoft.com/#system.core/System/Collections/Generic/HashSet.cs, line 1489): it is missing an if (set1.Count != set2.Count) return false before performing the subset check.
The bug is illustrated by the following program:
class Program
{
private class MyEqualityComparer : EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int obj)
{
return obj.GetHashCode();
}
}
static void Main(string[] args)
{
var comparer = HashSet<int>.CreateSetComparer();
var set1 = new HashSet<int>(new MyEqualityComparer()) { 1 };
var set2 = new HashSet<int> { 1, 2 };
Console.WriteLine(comparer.Equals(set1, set2));
Console.WriteLine(comparer.Equals(set2, set1)); //True!
Console.ReadKey();
}
}
Regarding other answers to this question (I don't have the rep to comment):
Wilhelm Liao: His answer also contains the bug because it's copied from the reference source
InBetween: The solution is not symmetric. CustomHashSetEqualityComparer.Equals(A, B) does not always equal CustomHashSetEqualityComparer.Equals(B, A). I would be scared of that.
I think a robust implementation should throw an exception if it encounters a set which has a different comparer to its own. It could always use its own comparer and ignore the set comparer, but that would give strange and unintuitive behaviour.
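A minimal sketch of that idea, under my own assumptions: StrictSetComparer is a hypothetical name, InvalidOperationException is an arbitrary choice of exception, and comparers are matched with Equals:

```csharp
using System;
using System.Collections.Generic;

public class StrictSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    private readonly IEqualityComparer<T> _elementComparer;

    public StrictSetComparer(IEqualityComparer<T> elementComparer = null)
    {
        _elementComparer = elementComparer ?? EqualityComparer<T>.Default;
    }

    // Refuse to work with sets that were built with another element comparer.
    private void EnsureSameComparer(HashSet<T> set)
    {
        if (set != null && !set.Comparer.Equals(_elementComparer))
            throw new InvalidOperationException("Set uses a different element comparer.");
    }

    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        EnsureSameComparer(x);
        EnsureSameComparer(y);
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Comparing Count first rules out the "subset counts as equal" case.
        return x.Count == y.Count && x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> set)
    {
        EnsureSameComparer(set);
        int hashCode = 0;
        if (set != null)
        {
            foreach (T item in set)
                hashCode ^= _elementComparer.GetHashCode(item) & 0x7FFFFFFF;
        }
        return hashCode;
    }
}
```

Since both sets are guaranteed to share one comparer, Equals(A, B) and Equals(B, A) cannot disagree, and GetHashCode hashes with the same comparer that Equals relies on.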
In addition to the original solution, we can simplify GetHashCode with the HashCode.Combine function (note that this hashes each item with its own GetHashCode rather than with set.Comparer):
public int GetHashCode(HashSet<T> set) {
int hashCode = 0;
foreach (var item in set) {
hashCode ^= HashCode.Combine(item);
}
return hashCode;
}
I am using distinct which says
Returns distinct elements from a sequence by using the default
equality comparer to compare values.
Yet when I run this code, I get the same id multiple times:
var ls = ls2.Distinct().OrderByDescending(s => s.id);
foreach (var v in ls)
{
Console.WriteLine(v.id);
}
I implemented these interfaces in my class, yet this still doesn't work:
class Post : IComparable<Post>, IEqualityComparer<Post>, IComparer<Post>
This is how I implemented it
int IComparable<Post>.CompareTo(Post other)
{
return (int)(id - other.id);
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.id == y.id;
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
throw new NotImplementedException();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return (int)(x.id - y.id);
}
You should implement GetHashCode().
Since you're delegating to - and ==, why not delegate to the appropriate methods on id instead, i.e. id.CompareTo(other.id) and obj.id.GetHashCode(), and have the IComparer implementation delegate to CompareTo as well? Also implement IEquatable<Post>:
int IComparable<Post>.CompareTo(Post other)
{
return id.CompareTo(other.id);
}
bool IEquatable<Post>.Equals(Post other)
{
return id == other.id;
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.id == y.id;
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return x.id.CompareTo(y.id);
}
This assumes that id is an int; if not, then you may have to implement IEquatable for the type of id too.
You need to properly implement GetHashCode() in your comparer - in your case you can just return the hash code of the id:
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
Also, as pointed out by @dash in a comment, you need to implement IEquatable<T> in Post if you choose to go that route (option 1).
A comparer should be implemented in a separate class that you can then pass to one of the Distinct() overloads (option 2), i.e. in your case it could be a class MyPostComparer:
var ls = ls2.Distinct(new MyPostComparer()).OrderByDescending(s => s.id);
A third option would be to use the DistinctBy() method of the MoreLinq project.
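For option 1, here is a sketch of what Post could look like so that the parameterless Distinct() works. It assumes id is a long (the casts in the question suggest something wider than int), and PostDemo is just a made-up helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post : IEquatable<Post>
{
    public long id { get; set; } // assumption: numeric id

    // Used by the default equality comparer that Distinct() picks up.
    public bool Equals(Post other) => other != null && id == other.id;

    // Keep Object.Equals and GetHashCode consistent with the typed Equals.
    public override bool Equals(object obj) => Equals(obj as Post);
    public override int GetHashCode() => id.GetHashCode();
}

public static class PostDemo
{
    public static List<long> DistinctIds(IEnumerable<Post> posts)
        => posts.Distinct()
                .OrderByDescending(p => p.id)
                .Select(p => p.id)
                .ToList();
}
```

With this in place, Distinct() treats two Post instances with the same id as duplicates without any extra comparer argument.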
I am wondering if it is possible to use a HashSet and make the Contains method return true if any one of the fields of a given object is already in the set.
This is an example of what I would like
static void Main(string[] args)
{
HashSet<Product> hash = new HashSet<Product>();
// Since the Id is the same, both products are considered to be the same even if the URI is not the same
// The opposite is also true. If the URI is the same, both products are considered to be the same even if the Id is not the same
Product product1 = new Product("123", "www.test.com/123.html");
Product product2 = new Product("123", "www.test.com/123.html?lang=en");
hash.Add(product1);
if (hash.Contains(product2))
{
// I want the method "Contains" to return TRUE because one of the field is in the hash
}
}
Here is the definition of the class Product
public class Product
{
public string WebId { get; set; }
public string Uri { get; set; }
public Product(string Id, string uri)
{
WebId = Id;
Uri = uri;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof(Product)) return false;
return Equals((Product)obj);
}
public bool Equals(Product obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (String.Equals(WebId, obj.WebId) || String.Equals(Uri, obj.Uri))
return true;
else
return false;
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + WebId.GetHashCode();
hash = hash * 23 + Uri.GetHashCode();
return hash;
}
}
}
When I run my program, Contains only calls GetHashCode and never Equals. Hence, Contains returns FALSE.
How can I make my HashSet return TRUE for the example above? Should I be using a Dictionary instead and add each field to the dictionary?
Your GetHashCode() implementation isn't guaranteed to return the same value for two objects that are equal. You only require a match on, say, WebId, but then the Uri screws up the hash code, or the other way around. You cannot fix this, other than by returning 0. That's going to kill the HashSet<> perf: lookup will be O(n) instead of O(1).
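To illustrate the constant-hash workaround, here is a small sketch. It uses a trimmed-down Product (no IEquatable, get-only properties) and a made-up ConstantHashDemo helper, so treat it as an illustration rather than a drop-in replacement:

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public string WebId { get; }
    public string Uri { get; }

    public Product(string webId, string uri)
    {
        WebId = webId;
        Uri = uri;
    }

    // Equal when either field matches, as in the question.
    public override bool Equals(object obj)
    {
        var other = obj as Product;
        return other != null && (WebId == other.WebId || Uri == other.Uri);
    }

    // Constant hash: every product lands in the same bucket, so Equals is
    // always consulted, at the price of a linear scan per lookup.
    public override int GetHashCode() => 0;
}

public static class ConstantHashDemo
{
    public static bool ContainsByEitherField()
    {
        var set = new HashSet<Product> { new Product("123", "www.test.com/123.html") };
        return set.Contains(new Product("123", "www.test.com/123.html?lang=en"));
    }
}
```

ContainsByEitherField() returns true here. Keep in mind that "equal on either field" is not transitive (A can match B on WebId and B can match C on Uri while A and C share nothing), so what ends up in such a set can depend on insertion order.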
In a recent project we had the same problem, where the class's Equals() implementation was logical ORing properties to determine equality. To do a quick Contains() we built a number of IEqualityComparer with each one checking ONE property. You need one for each property that is ORed in your equality check.
class WebIdComparer : IEqualityComparer<Product>
{
public bool Equals(Product x, Product y)
{
return Equals(x.WebId, y.WebId);
}
public int GetHashCode(Product obj)
{
unchecked
{
return obj.WebId.GetHashCode();
}
}
}
class UriComparer : IEqualityComparer<Product>
{
public bool Equals(Product x, Product y)
{
return Equals(x.Uri, y.Uri);
}
public int GetHashCode(Product obj)
{
unchecked
{
return obj.Uri.GetHashCode();
}
}
}
Then create one hash set per IEqualityComparer, passing the comparer to the constructor. Insert your collection into each hash set; then, for each item you want to test, call Contains() on each hash set and OR the results. For example:
var uriHashTable = new HashSet<Product>(existingProducts, new UriComparer());
var webIdHashTable = new HashSet<Product>(existingProducts, new WebIdComparer());
foreach (var newProduct in newProducts)
{
if (uriHashTable.Contains(newProduct) || webIdHashTable.Contains(newProduct))
{
// then it is equal to an existing product according to your Equals implementation
}
}
Obviously this method uses quite a bit more memory than the IEnumerable.Contains() method: it needs another hash set for every property that is ORed in your Equals implementation.
Does it fit your program design to use a lambda? HashSet<T>.Contains does not accept a predicate, but the LINQ Any() extension method does; note that it scans the set linearly rather than using the hash. It is the most straightforward way I can think of to achieve what you want.
if (hash.Any(p => p.WebId == product2.WebId))
{
// Any() returns TRUE because the WebId matches
}
I have the following code, which doesn't seem to be working:
Context:
I have two lists of objects:
* listOne has 100 records
* listTwo has 70 records
many of them have the same Id property (in both lists);
var listOneOnlyItems = listOne.Except(listTwo, new ItemComparer ());
here is the comparer
public class ItemComparer : IEqualityComparer<Item>
{
public bool Equals(Item x, Item y)
{
if (x.Id == y.Id)
return true;
return false;
}
public int GetHashCode(Item obj)
{
return obj.GetHashCode();
}
}
After I run this code and look into the results, listOneOnlyItems still has 100 records (it should only have 30). Can anyone help me?
Also, running
IEnumerable<Item> sharedItems = listOne.Intersect(listTwo, new ItemComparer());
returns zero results in the sharedItems collection.
public int GetHashCode(Item obj)
{
return obj.Id.GetHashCode();
}
Worth a check at least -- IIRC GetHashCode() is tested first before equality, and if they don't have the same hash it won't bother checking equality. I'm not sure what to expect from obj.GetHashCode() -- it depends on what you've implemented on the Item class.
Consider making GetHashCode() return obj.Id.GetHashCode()
This code works fine:
static void TestLinqExcept()
{
var seqA = Enumerable.Range(1, 10);
var seqB = Enumerable.Range(1, 7);
var seqAexceptB = seqA.Except(seqB, new IntComparer());
foreach (var x in seqAexceptB)
{
Console.WriteLine(x);
}
}
class IntComparer: EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int x)
{
return x;
}
}
You need to add 'override' keywords to your EqualityComparer methods. (I think not having 'override' as implicit was a mistake on the part of the C# designers).
I'm having troubles with the Except() method.
Instead of returning the difference, it returns the original set.
I've tried implementing the IEquatable and IEqualityComparer in the Account class.
I've also tried creating a separate IEqualityComparer class for Account.
When the Except() method is called from main, it doesn't seem to call my custom Equals() method, but when I tried Count(), it did call the custom GetHashCode() method!
I'm sure I made a trivial mistake somewhere and I hope a fresh pair of eyes can help me.
main:
IEnumerable<Account> everyPartnerID =
from partner in dataContext.Partners
select new Account { IDPartner = partner.ID, Name = partner.Name };
IEnumerable<Account> hasAccountPartnerID =
from partner in dataContext.Partners
from account in dataContext.Accounts
where
!partner.ID.Equals(Guid.Empty) &&
account.IDPartner.Equals(partner.ID) &&
account.Username.Equals("Special")
select new Account { IDPartner = partner.ID, Name = partner.Name };
IEnumerable<Account> noAccountPartnerID =
everyPartnerID.Except(
hasAccountPartnerID,
new LambdaComparer<Account>((x, y) => x.IDPartner.Equals(y.IDPartner)));
Account:
public class Account : IEquatable<Account>
{
public Guid IDPartner{ get; set; }
public string Name{ get; set; }
/* #region IEquatable<Account> Members
public bool Equals(Account other)
{
return this.IDPartner.Equals(other.IDPartner);
}
#endregion*/
}
LambdaComparer:
public class LambdaComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> _lambdaComparer;
private readonly Func<T, int> _lambdaHash;
public LambdaComparer(Func<T, T, bool> lambdaComparer) :
this(lambdaComparer, o => o.GetHashCode())
{
}
public LambdaComparer(Func<T, T, bool> lambdaComparer, Func<T, int> lambdaHash)
{
if (lambdaComparer == null)
throw new ArgumentNullException("lambdaComparer");
if (lambdaHash == null)
throw new ArgumentNullException("lambdaHash");
_lambdaComparer = lambdaComparer;
_lambdaHash = lambdaHash;
}
public bool Equals(T x, T y)
{
return _lambdaComparer(x, y);
}
public int GetHashCode(T obj)
{
return _lambdaHash(obj);
}
}
Basically your LambdaComparer class is broken when you pass in just a single function, because it uses the "identity hash code" provider if you don't provide anything else. The hash code is used by Except, and that's what's causing the problem.
Three options here:
Implement your own ExceptBy method and then preferably contribute it to MoreLINQ which contains that sort of thing.
Use a different implementation of IEqualityComparer<T>. I have a ProjectionEqualityComparer class you can use in MiscUtil - or you can use the code as posted in another question.
Pass a lambda expression into your LambdaComparer code to use for the hash:
new LambdaComparer<Account>((x, y) => x.IDPartner.Equals(y.IDPartner),
x => x.IDPartner.GetHashCode());
You could also quickly fix your LambdaComparer to work when only the equality parameters are supplied like this:
public LambdaComparer(Func<T, T, bool> lambdaComparer) :
this(lambdaComparer, o => 1)
{
}
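Returning a constant hash works but turns every lookup into a linear scan. An alternative is a comparer built from a key selector, so equality and hashing can never drift apart. This is only a sketch: KeyEqualityComparer and ExceptDemo are names I made up, and this is not the MiscUtil or MoreLINQ API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compares and hashes items by a projected key.
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer;

    public KeyEqualityComparer(Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer = null)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
        _keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

public class Account
{
    public Guid IDPartner { get; set; }
    public string Name { get; set; }
}

public static class ExceptDemo
{
    public static List<Account> WithoutAccounts(IEnumerable<Account> all, IEnumerable<Account> withAccount)
        => all.Except(withAccount, new KeyEqualityComparer<Account, Guid>(a => a.IDPartner)).ToList();
}
```

Because the hash is computed from the same key that Equals compares, two Account instances with the same IDPartner always land in the same bucket, and Except removes them as expected.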
Look here for how to use and implement IEqualityComparer with LINQ's Except and beyond:
https://www.dreamincode.net/forums/topic/352582-linq-by-example-3-methods-using-iequalitycomparer/
public class Department {
public string Code { get; set; }
public string Name { get; set; }
}
public class DepartmentComparer : IEqualityComparer<Department> {
// equal if their Codes are equal
public bool Equals(Department x, Department y) {
// reference the same objects?
if (Object.ReferenceEquals(x, y)) return true;
// is either null?
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
return x.Code == y.Code;
}
public int GetHashCode(Department dept) {
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
// if null default to 0
if (Object.ReferenceEquals(dept, null)) return 0;
return dept.Code.GetHashCode();
}
}
IEnumerable<Department> deptExcept = departments.Except(departments2,
new DepartmentComparer());
foreach (Department dept in deptExcept) {
Console.WriteLine("{0} {1}", dept.Code, dept.Name);
}
// departments not in departments2: AC, Accounts.
IMO, the answer above is the simplest solution to this problem. I tweaked it so that the same logic lives in the class's overrides of Object.Equals() and GetHashCode(). The benefit is that this solution is completely transparent to the client LINQ expression.
public class Ericsson4GCell
{
public string CellName { get; set; }
public string OtherDependantProperty { get; set; }
public override bool Equals(Object y)
{
var rhsCell = y as Ericsson4GCell;
// reference the same objects?
if (Object.ReferenceEquals(this, rhsCell)) return true;
// is either null?
if (Object.ReferenceEquals(this, null) || Object.ReferenceEquals(rhsCell, null))
return false;
return this.CellName == rhsCell.CellName;
}
public override int GetHashCode()
{
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
// if null default to 0
if (Object.ReferenceEquals(this, null)) return 0;
return this.CellName.GetHashCode();
}
}