Why don't .Except() and .Intersect() work here using LINQ? - C#

I have the following code, which doesn't seem to be working.
Context:
I have two lists of objects:
* listOne has 100 records
* listTwo has 70 records
Many of them have the same Id property (in both lists).
var listOneOnlyItems = listOne.Except(listTwo, new ItemComparer ());
Here is the comparer:
public class ItemComparer : IEqualityComparer<Item>
{
public bool Equals(Item x, Item y)
{
if (x.Id == y.Id)
return true;
return false;
}
public int GetHashCode(Item obj)
{
return obj.GetHashCode();
}
}
After I run this code and look at the results, listOneOnlyItems still has 100 records (it should only have 30). Can anyone help me?
Also, running
IEnumerable<Item> sharedItems = listOne.Intersect(listTwo, new ItemComparer());
returns zero results in the sharedItems collection.

public int GetHashCode(Item obj)
{
return obj.Id.GetHashCode();
}
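Worth a check at least. Except() and Intersect() look items up by GetHashCode() first, and Equals() is only called for items whose hash codes match. What obj.GetHashCode() returns depends on the Item class: if Item doesn't override it, you get the default reference-based hash, so two different Item instances with the same Id will almost never match. That explains both the 100 remaining records and the empty intersection.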
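A minimal sketch of that fix, assuming Item is a plain class with an int Id property (the question doesn't show Item, so that class and the ExceptDemo helper are hypothetical). GetHashCode hashes the same Id that Equals compares, so items with equal Ids get equal hash codes and Except/Intersect can match them:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Item class.
public class Item
{
    public int Id { get; set; }
}

public class ItemComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    // Hash the same key that Equals compares: equal items must get equal hash codes.
    public int GetHashCode(Item obj)
    {
        return obj.Id.GetHashCode();
    }
}

public static class ExceptDemo
{
    // Lists shaped like the question's: Ids 1..100 and Ids 1..70,
    // built as separate Item instances so reference equality can't help.
    public static (int onlyInOne, int shared) Run()
    {
        var listOne = Enumerable.Range(1, 100).Select(i => new Item { Id = i }).ToList();
        var listTwo = Enumerable.Range(1, 70).Select(i => new Item { Id = i }).ToList();

        var listOneOnlyItems = listOne.Except(listTwo, new ItemComparer()).ToList();
        var sharedItems = listOne.Intersect(listTwo, new ItemComparer()).ToList();

        return (listOneOnlyItems.Count, sharedItems.Count);
    }
}
```

With this data, ExceptDemo.Run() reports 30 items only in listOne and 70 shared items.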

Consider making GetHashCode() return obj.Id.GetHashCode()

This code works fine:
static void TestLinqExcept()
{
var seqA = Enumerable.Range(1, 10);
var seqB = Enumerable.Range(1, 7);
var seqAexceptB = seqA.Except(seqB, new IntComparer());
foreach (var x in seqAexceptB)
{
Console.WriteLine(x);
}
}
class IntComparer: EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int x)
{
return x;
}
}
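If you derive from the EqualityComparer<T> base class, as IntComparer does, you need to add 'override' keywords to its Equals and GetHashCode methods. (I think not having 'override' as implicit was a mistake on the part of the C# designers.) A class that implements IEqualityComparer<T> directly, like ItemComparer, doesn't need them; what makes IntComparer work is that its GetHashCode returns the same value for equal items.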

Related

How to get a distinct result for list of array?

I have a list of ulong arrays.
List<ulong[]> TestList = new List<ulong[]>();
and the list has the following items:
{1,2,3,4,5,6},
{2,3,4,5,6,7},
{3,4,5,6,7,8},
{1,2,3,4,5,6}
and the expected distinct result is:
{1,2,3,4,5,6},
{2,3,4,5,6,7},
{3,4,5,6,7,8}
So I tried the following, but it doesn't work:
TestList = TestList.Distinct().ToList();
Do I need a special comparer to get a distinct list?
Distinct() uses the default equality check, which for arrays is reference equality. It does not check the contents of the array for equality.
If you want to do that, you'll need the overload of Distinct() that takes an IEqualityComparer<T>. This allows you to customize the behaviour to determine if two items are equal or not.
For comparing arrays, IStructuralEquatable and friends already do the heavy lifting. You can wrap it simply, like so:
sealed class StructuralComparer<T> : IEqualityComparer<T>
{
public static IEqualityComparer<T> Instance { get; } = new StructuralComparer<T>();
public bool Equals(T x, T y)
=> StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
public int GetHashCode(T obj)
=> StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}
Then, use it in the Distinct() call like this:
TestList = TestList.Distinct(StructuralComparer<ulong[]>.Instance).ToList();
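To see the difference, here is a small self-contained sketch that repeats the StructuralComparer<T> wrapper and runs both Distinct() overloads on the question's data (DistinctArraysDemo is a hypothetical helper):

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// The wrapper from the answer: forwards to the framework's structural
// comparer, which compares arrays element by element.
sealed class StructuralComparer<T> : IEqualityComparer<T>
{
    public static IEqualityComparer<T> Instance { get; } = new StructuralComparer<T>();
    public bool Equals(T x, T y)
        => StructuralComparisons.StructuralEqualityComparer.Equals(x, y);
    public int GetHashCode(T obj)
        => StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj);
}

static class DistinctArraysDemo
{
    public static List<ulong[]> Build() => new List<ulong[]>
    {
        new ulong[] { 1, 2, 3, 4, 5, 6 },
        new ulong[] { 2, 3, 4, 5, 6, 7 },
        new ulong[] { 3, 4, 5, 6, 7, 8 },
        new ulong[] { 1, 2, 3, 4, 5, 6 },
    };

    // Default Distinct(): arrays are compared by reference, so nothing is removed.
    public static int DefaultCount() => Build().Distinct().Count();

    // Structural Distinct(): the repeated {1,2,3,4,5,6} array is dropped.
    public static List<ulong[]> Structural() =>
        Build().Distinct(StructuralComparer<ulong[]>.Instance).ToList();
}
```

The parameterless Distinct() keeps all four arrays, because the duplicate {1,2,3,4,5,6} is a different array instance; the structural version keeps three, in their original order.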
You need to provide an equality comparer; the default implementation does not know how to compare arrays of ulong (it uses reference equality):
using System.Collections.Generic;
using System.Linq;
class ULongArrayComparer : EqualityComparer<ulong[]>
{
public override bool Equals(ulong[] a1, ulong[] a2)
{
if (a1 == null && a2 == null)
return true;
else if (a1 == null || a2 == null)
return false;
// element-by-element, order-sensitive comparison
return a1.SequenceEqual(a2);
}
public override int GetHashCode(ulong[] arr)
{
if (arr == null)
return 0;
// XOR all elements; seed with 0UL so the accumulator is a ulong, not an int
ulong hCode = arr.Aggregate(0UL, (acc, it) => acc ^ it);
return hCode.GetHashCode();
}
}
Then use it:
TestList = TestList.Distinct(new ULongArrayComparer()).ToList();
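Another option, without a custom comparer, is to group the arrays by a string key built from their elements:
List<ulong[]> TestList = new List<ulong[]>() {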
new ulong[]{ 1,2,3,4,5,6},
new ulong[]{ 2,3,4,5,6,7},
new ulong[]{ 3,4,5,6,7,8},
new ulong[]{ 1,2,3,4,5,6}
};
var result = TestList.GroupBy(x => String.Join(",", x))
.Select(x => x.First().ToArray())
.ToList();
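You can implement an IEqualityComparer<T> for arrays:
using System.Collections.Generic;
using System.Linq;
public class ArrayComparer<T> : IEqualityComparer<T[]>
{
public bool Equals(T[] x, T[] y)
{
if (ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
// compare element by element, in order, so {1,1,2} != {1,2,2}
return x.SequenceEqual(y);
}
public int GetHashCode(T[] obj)
{
if (obj == null)
return 0;
int hashCode = obj.Length;
for (int i = 0; i < obj.Length; ++i)
{
// order-sensitive hash, consistent with SequenceEqual above
hashCode = unchecked(hashCode * 314159 + EqualityComparer<T>.Default.GetHashCode(obj[i]));
}
return hashCode;
}
}
Then you can use it:
TestList = TestList.Distinct(new ArrayComparer<ulong>()).ToList();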

How to sort List<T> in C#

I've got a List<Card>, and I want to sort these cards.
So I'm looking for a way to sort them by different criteria, like their ID, their Name, ...
public class Card : IComparer
{
public string ID;
public string Name;
public int CompareId(object firstCard, object secondCard)
{
Card c1 = (Card)firstCard;
Card c2 = (Card)secondCard;
return c1.Id.CompareTo(c2.Id);
}
}
But then Visual Studio gave me an error:
'Card' does not implement interface member 'IComparer<Card>.Compare(Card, Card)'
You probably want your class to be comparable (IComparable<Card>), not a comparer (IComparer<Card>):
public class Card : IComparable<Card>
{
public string ID;
public string Name;
public int CompareTo(Card other)
{
if (null == other)
return 1;
// string.Compare is safe when ID is null
return string.Compare(this.ID, other.ID);
}
}
then
List<Card> myList = ...
myList.Sort();
Edit: If you want to have several criteria to choose from, you have to implement several comparers as separate classes, e.g.
public sealed class CardByIdComparer : IComparer<Card>
{
public int Compare(Card x, Card y)
{
if (object.ReferenceEquals(x, y))
return 0;
else if (null == x)
return -1;
else if (null == y)
return 1;
else
return string.Compare(x.ID, y.ID);
}
}
and when sorting provide the required:
List<Card> myList = ...
myList.Sort(new CardByIdComparer());
Edit 2 (inspired by spender's library): if you want to combine several comparers into one (i.e. use comparer1 and, on a tie, comparer2, etc.):
public sealed class ComparerCombined<T> : IComparer<T> {
private IComparer<T>[] m_Comparers;
public ComparerCombined(params IComparer<T>[] comparers) {
if (null == comparers)
throw new ArgumentNullException(nameof(comparers));
m_Comparers = comparers
.Select(item => item == null ? Comparer<T>.Default : item)
.Where(item => item != null)
.Distinct()
.ToArray();
}
public int Compare(T x, T y) {
if (object.ReferenceEquals(x, y))
return 0;
else if (null == x)
return -1;
else if (null == y)
return 1;
foreach (var comparer in m_Comparers) {
int result = comparer.Compare(x, y);
if (result != 0)
return result;
}
return 0;
}
}
usage:
myList.Sort(new ComparerCombined<Card>( // constructors can't infer <T>, so it is spelled out
new CardByIdComparer(), // Sort By Id
new CardByNameComparer() // On tie (equal Id's) sort by name
));
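The usage above relies on a CardByNameComparer that the answer doesn't define. A self-contained sketch, assuming CardByNameComparer mirrors CardByIdComparer but compares Name (CardByNameComparer and CardSortDemo are hypothetical, and ComparerCombined<T> is trimmed down). Note the explicit <Card> type argument: C# doesn't infer generic type arguments for constructors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Card
{
    public string ID;
    public string Name;
}

public sealed class CardByIdComparer : IComparer<Card>
{
    public int Compare(Card x, Card y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;
        return string.Compare(x.ID, y.ID);
    }
}

// Hypothetical: not defined in the answer; assumed to mirror CardByIdComparer.
public sealed class CardByNameComparer : IComparer<Card>
{
    public int Compare(Card x, Card y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;
        return string.Compare(x.Name, y.Name);
    }
}

// Tries each comparer in turn; the first non-zero result wins.
public sealed class ComparerCombined<T> : IComparer<T>
{
    private readonly IComparer<T>[] m_Comparers;

    public ComparerCombined(params IComparer<T>[] comparers)
    {
        if (comparers == null)
            throw new ArgumentNullException(nameof(comparers));
        // null entries fall back to the default comparer for T
        m_Comparers = comparers.Select(c => c ?? Comparer<T>.Default).ToArray();
    }

    public int Compare(T x, T y)
    {
        foreach (var comparer in m_Comparers)
        {
            int result = comparer.Compare(x, y);
            if (result != 0)
                return result;
        }
        return 0;
    }
}

public static class CardSortDemo
{
    public static List<Card> Run()
    {
        var cards = new List<Card>
        {
            new Card { ID = "2", Name = "Queen" },
            new Card { ID = "1", Name = "King" },
            new Card { ID = "2", Name = "Jack" },
        };
        // Sort by ID; on a tie, by Name.
        cards.Sort(new ComparerCombined<Card>(new CardByIdComparer(), new CardByNameComparer()));
        return cards;
    }
}
```

For the cards (2, Queen), (1, King), (2, Jack), this sorts to (1, King), (2, Jack), (2, Queen): by ID first, then by Name on the tie.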
The easiest way is to use LINQ:
List<Card> objSortedList = objListObject.OrderBy(o=>o.ID).ToList();
or
List<Card> objSortedList = objListObject.OrderByDescending(o=>o.ID).ToList();
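For several criteria, the LINQ route chains ThenBy (or ThenByDescending) after OrderBy, which plays the same role as a combined comparer. A sketch assuming the question's Card class with public ID and Name fields (LinqSortDemo is a hypothetical helper):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Card
{
    public string ID;
    public string Name;
}

public static class LinqSortDemo
{
    // Sort by ID, then by Name when IDs are equal.
    // OrderBy builds a new sequence and leaves the source list untouched.
    public static List<Card> SortByIdThenName(List<Card> cards) =>
        cards.OrderBy(c => c.ID).ThenBy(c => c.Name).ToList();
}
```

Unlike List<T>.Sort, OrderBy doesn't modify the original list. It is also a stable sort, so cards that tie on both keys keep their original order.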
Here is a good example demonstrating the List<T>.Sort(IComparer<T>) method. In this example the IComparer<T> compares strings, but you can use the same approach for the ID too.
using System;
using System.Collections.Generic;
class GFG : IComparer<string>
{
public int Compare(string x, string y)
{
// string.Compare handles nulls consistently (null sorts first),
// unlike returning 0, which would make null "equal" to every string
return string.Compare(x, y);
}
}
public class geek
{
public static void Main()
{
List<string> list1 = new List<string>();
// list elements
list1.Add("C++");
list1.Add("Java");
list1.Add("C");
list1.Add("Python");
list1.Add("HTML");
list1.Add("CSS");
list1.Add("Scala");
list1.Add("Ruby");
list1.Add("Perl");
int range = 4;
GFG gg = new GFG();
Console.WriteLine("\nSort a range with comparer:");
// sort the list within a
// range of index 1 to 4
// where range = 4
list1.Sort(1, range, gg);
Console.WriteLine("\nBinarySearch and Insert Dart");
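// Binary search the same range that was sorted (index 1 to 4);
// on an unsorted range the result is meaningless
int index = list1.BinarySearch(1, range,
"Dart", gg);
if (index < 0)
{
// ~index is the position that keeps the range sorted
list1.Insert(~index, "Dart");
range++;
}
foreach (string s in list1)
Console.WriteLine(s);
}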
}
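You need to implement IComparer<Card>. Since ID is a string, the > and < operators aren't defined for it, so compare with string.Compare:
public int Compare(Card card1, Card card2)
{
int order = string.Compare(card1.ID, card2.ID);
if (order > 0)
return 1; // card1 sorts after card2
if (order < 0)
return -1; // card1 sorts before card2
return 0; // same ID
}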

HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?

In the internal source there is such a constructor public HashSetEqualityComparer(IEqualityComparer<T> comparer) but it's internal so I can't use it.
By default, HashSet<T>.CreateSetComparer() just uses the parameterless constructor which will apply EqualityComparer<T>.Default.
Is there a way to get a HashSetEqualityComparer<T> with an IEqualityComparer<T> of choice, without copying out the code from the source?
I think the best solution is to use SetEquals. It does the job you need, in exactly the same way HashSetEqualityComparer does, but it takes into account the custom comparer of the set it is called on.
So, in your specific scenario where you want to use a HashSet<T> as a key of a dictionary, you need to implement an IEqualityComparer<HashSet<T>> that makes use of SetEquals and "borrows" the reference source of HashSetEqualityComparer.GetHashCode():
public class CustomHashSetEqualityComparer<T>
: IEqualityComparer<HashSet<T>>
{
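public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (ReferenceEquals(x, y))
return true;
// SetEquals throws on a null argument, so handle nulls first
if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
return false;
return x.SetEquals(y);
}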
public int GetHashCode(HashSet<T> set)
{
int hashCode = 0;
if (set != null)
{
foreach (T t in set)
{
hashCode = hashCode ^
(set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
}
}
return hashCode;
}
}
But yes, it's a small pain that there is no way to directly create a set equality comparer that leverages custom comparers. IMHO this unfortunate behavior is due more to a bug in the existing implementation than to a missing overload: there is no reason why CreateSetComparer() can't return an IEqualityComparer that actually uses the comparers of the sets it's comparing, as the code above demonstrates.
If I had a say in it, CreateSetComparer() wouldn't be a static method at all. It would then be obvious, or at least predictable, that whatever comparer was returned would be created with the current set's comparer.
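As an illustration of the dictionary-key scenario, here is a sketch using the comparer from this answer (with null handling added) on sets built with StringComparer.OrdinalIgnoreCase; SetKeyDemo is a hypothetical helper. Because both SetEquals and GetHashCode go through the sets' own comparer, a set with different casing finds the same entry:

```csharp
using System;
using System.Collections.Generic;

public class CustomHashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // SetEquals looks up y's items using x.Comparer
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> set)
    {
        int hashCode = 0;
        if (set != null)
        {
            // Order-independent XOR of element hashes, using the set's own comparer
            foreach (T t in set)
                hashCode ^= set.Comparer.GetHashCode(t) & 0x7FFFFFFF;
        }
        return hashCode;
    }
}

public static class SetKeyDemo
{
    public static string Lookup()
    {
        var dict = new Dictionary<HashSet<string>, string>(new CustomHashSetEqualityComparer<string>());
        var key = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Apple", "Banana" };
        dict[key] = "fruit";

        // Different casing, same case-insensitive comparer: found as the same key
        var probe = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "apple", "BANANA" };
        return dict.TryGetValue(probe, out var value) ? value : null;
    }
}
```

This relies on both sets sharing the same comparer; the symmetry concerns raised further down apply when they don't.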
I agree with @InBetween: using SetEquals is the best way. Even if you add the constructor, you still can't achieve what you want.
Please see this code:
http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
Here is what I tried:
class HashSetEqualityComparerWrapper<T> : IEqualityComparer<HashSet<T>>
{
static private Type HashSetEqualityComparerType = HashSet<T>.CreateSetComparer().GetType();
private IEqualityComparer<HashSet<T>> _comparer;
public HashSetEqualityComparerWrapper()
{
_comparer = HashSet<T>.CreateSetComparer();
}
public HashSetEqualityComparerWrapper(IEqualityComparer<T> comparer)
{
_comparer = HashSet<T>.CreateSetComparer();
if (comparer != null)
{
FieldInfo m_comparer_field = HashSetEqualityComparerType.GetField("m_comparer", BindingFlags.NonPublic | BindingFlags.Instance);
m_comparer_field.SetValue(_comparer, comparer);
}
}
public bool Equals(HashSet<T> x, HashSet<T> y)
{
return _comparer.Equals(x, y);
}
public int GetHashCode(HashSet<T> obj)
{
return _comparer.GetHashCode(obj);
}
}
UPDATE
I took five minutes to implement another version from the HashSetEqualityComparer<T> source code, rewriting the bool Equals(HashSet<T> x, HashSet<T> y) method. It isn't complex: all the code is copied from the source, and I just revised it a bit.
class CustomHashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
{
private IEqualityComparer<T> m_comparer;
public CustomHashSetEqualityComparer()
{
m_comparer = EqualityComparer<T>.Default;
}
public CustomHashSetEqualityComparer(IEqualityComparer<T> comparer)
{
if (comparer == null)
{
m_comparer = EqualityComparer<T>.Default;
}
else
{
m_comparer = comparer;
}
}
// using m_comparer to keep equals properties in tact; don't want to choose one of the comparers
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1360
// handle null cases first
if (x == null)
{
return (y == null);
}
else if (y == null)
{
// set1 != null
return false;
}
// all comparers are the same; this is faster
if (AreEqualityComparersEqual(x, y))
{
if (x.Count != y.Count)
{
return false;
}
}
// n^2 search because items are hashed according to their respective ECs
foreach (T set2Item in y)
{
bool found = false;
foreach (T set1Item in x)
{
if (m_comparer.Equals(set2Item, set1Item))
{
found = true;
break;
}
}
if (!found)
{
return false;
}
}
return true;
}
public int GetHashCode(HashSet<T> obj)
{
int hashCode = 0;
if (obj != null)
{
foreach (T t in obj)
{
hashCode = hashCode ^ (m_comparer.GetHashCode(t) & 0x7FFFFFFF);
}
} // else returns hashcode of 0 for null hashsets
return hashCode;
}
// Equals method for the comparer itself.
public override bool Equals(Object obj)
{
CustomHashSetEqualityComparer<T> comparer = obj as CustomHashSetEqualityComparer<T>;
if (comparer == null)
{
return false;
}
return (this.m_comparer == comparer.m_comparer);
}
public override int GetHashCode()
{
return m_comparer.GetHashCode();
}
static private bool AreEqualityComparersEqual(HashSet<T> set1, HashSet<T> set2)
{
return set1.Comparer.Equals(set2.Comparer);
}
}
Avoid this class if you use custom comparers. It uses its own equality comparer to perform GetHashCode, but when performing Equals(Set1, Set2), if Set1 and Set2 have the same equality comparer, HashSetEqualityComparer uses the comparer of the sets. HashSetEqualityComparer only uses its own comparer for Equals if Set1 and Set2 have different comparers.
It gets worse. It calls HashSet.HashSetEquals, which has a bug in it (see https://referencesource.microsoft.com/#system.core/System/Collections/Generic/HashSet.cs, line 1489): the branch for different comparers is missing an if (set1.Count != set2.Count) return false; check before performing the subset check, so it only verifies that set2 is a subset of set1.
The bug is illustrated by the following program:
class Program
{
private class MyEqualityComparer : EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int obj)
{
return obj.GetHashCode();
}
}
static void Main(string[] args)
{
var comparer = HashSet<int>.CreateSetComparer();
var set1 = new HashSet<int>(new MyEqualityComparer()) { 1 };
var set2 = new HashSet<int> { 1, 2 };
Console.WriteLine(comparer.Equals(set1, set2));
Console.WriteLine(comparer.Equals(set2, set1)); //True!
Console.ReadKey();
}
}
Regarding other answers to this question (I don't have the rep to comment):
Wilhelm Liao: His answer also contains the bug because it's copied from the reference source
InBetween: The solution is not symmetric. CustomHashSetEqualityComparer.Equals(A, B) does not always equal CustomHashSetEqualityComparer.Equals(B, A). I would be scared of that.
I think a robust implementation should throw an exception if it encounters a set which has a different comparer to its own. It could always use its own comparer and ignore the set comparer, but that would give strange and unintuitive behaviour.
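A sketch of that stricter approach (StrictSetComparer<T> is a hypothetical name, not a framework type): the comparer owns one element comparer and throws ArgumentException for a set built with a different one. With a shared comparer, a count check plus one-way containment makes Equals symmetric.

```csharp
using System;
using System.Collections.Generic;

public sealed class StrictSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    private readonly IEqualityComparer<T> _comparer;

    public StrictSetComparer(IEqualityComparer<T> comparer = null)
    {
        _comparer = comparer ?? EqualityComparer<T>.Default;
    }

    // Refuse sets whose element comparer differs from ours.
    private void CheckComparer(HashSet<T> set)
    {
        if (!_comparer.Equals(set.Comparer))
            throw new ArgumentException("Set uses a different element comparer.", nameof(set));
    }

    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        CheckComparer(x);
        CheckComparer(y);
        // Same comparer on both sides: equal counts plus y ⊆ x means x == y,
        // and the result doesn't depend on argument order.
        if (x.Count != y.Count) return false;
        foreach (T item in y)
        {
            if (!x.Contains(item)) return false;
        }
        return true;
    }

    public int GetHashCode(HashSet<T> set)
    {
        if (set is null) return 0;
        CheckComparer(set);
        int hash = 0;
        foreach (T item in set)
            hash ^= _comparer.GetHashCode(item) & 0x7FFFFFFF;
        return hash;
    }
}
```

Comparers are matched with their own Equals(object), so two different instances of a custom comparer class count as different unless that class overrides Equals.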
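In addition to the original solution, we can simplify GetHashCode with the HashCode.Combine function. Note that HashCode.Combine(item) calls the item's own GetHashCode() rather than set.Comparer, so this shortcut is only consistent with SetEquals when the sets use the default comparer: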
public int GetHashCode(HashSet<T> set) {
int hashCode = 0;
foreach (var item in set) {
hashCode ^= HashCode.Combine(item);
}
return hashCode;
}

IEqualityComparer not working as intended

I have a List of paths of files stored on my computer. My aim is to first filter out the files which have the same name and then filter out those which have the same size.
To do so, I have made two classes implementing IEqualityComparer<string>, and implemented Equals and GetHashCode methods.
var query = FilesList.Distinct(new CustomTextComparer())
.Distinct(new CustomSizeComparer());
The code for both of the classes is given below:
public class CustomTextComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (Path.GetFileName(x) == Path.GetFileName(y))
{
return true;
}
return false;
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
public class CustomSizeComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (new FileInfo(x).Length == new FileInfo(y).Length)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
But the code does not work.
It doesn't throw any exceptions, nor is there any compiler error; the problem is that it doesn't exclude duplicate files.
So, how can I correct this problem? Is there anything I can do to make the code work correctly?
Change your GetHashCode to work on the compared value, i.e. for your size comparer:
public int GetHashCode(string obj)
{
return new FileInfo(obj).Length.GetHashCode();
}
And for the other:
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
According to this answer (What's the role of GetHashCode in the IEqualityComparer<T> in .NET?), the hash code is evaluated first; Equals is only called when hash codes collide.
Obviously it would be sensible to work on FileInfos, not on strings.
So maybe:
FilesList.Select(x => new FileInfo(x))
.Distinct(new CustomTextComparer())
.Distinct(new CustomSizeComparer());
Of course, then you have to change your comparers to work on the correct type.
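A small sketch that shows the effect without touching the disk (the comparer names, HashDemo and the paths are hypothetical; Path.GetFileName only parses the string, so the files don't need to exist). The first comparer hashes the whole path while Equals compares only the file name; the second hashes the file name:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Equals compares file names, but GetHashCode hashes the whole path,
// so paths with the same file name usually land in different buckets.
public class BrokenNameComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => Path.GetFileName(x) == Path.GetFileName(y);
    public int GetHashCode(string obj) => obj.GetHashCode();
}

// Hashes the same key that Equals compares.
public class FileNameComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => Path.GetFileName(x) == Path.GetFileName(y);
    public int GetHashCode(string obj) => Path.GetFileName(obj).GetHashCode();
}

public static class HashDemo
{
    public static readonly string[] Paths = { "a/report.txt", "b/report.txt", "c/notes.txt" };

    public static int BrokenCount() => Paths.Distinct(new BrokenNameComparer()).Count();
    public static int FixedCount() => Paths.Distinct(new FileNameComparer()).Count();
}
```

FixedCount() returns 2. BrokenCount() returns 3 (barring an astronomically unlikely hash collision), because "a/report.txt" and "b/report.txt" get different hash codes and Equals is never asked to compare them.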
Your GetHashCode must return the same value for any objects that are of equal value:
// Try this
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
// And this
public int GetHashCode(string obj)
{
return new FileInfo(obj).Length.GetHashCode();
}
But here is a much easier way to solve the whole problem, without the extra classes:
var query = FilesList
.GroupBy(f => Path.GetFileName(f)).Select(g => g.First())
.GroupBy(f => new FileInfo(f).Length).Select(g => g.First())
.ToList();
The hash code is used before Equals is ever called. Since your code gives different hash codes for items that are equal, you're not getting the desired result. Instead, you have to make sure the hash code returned is equal when the items are equal, so for example:
public class CustomTextComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (Path.GetFileName(x) == Path.GetFileName(y))
{
return true;
}
return false;
}
public int GetHashCode(string obj)
{
return Path.GetFileName(obj).GetHashCode();
}
}
However, as Piotr pointed out, this isn't a particularly good way to go about your goal, since you're going to call Path.GetFileName and new FileInfo(...).Length repeatedly. The FileInfo calls in particular are a significant performance hit, because they go to the file system, which is not exactly known for its speed of response.

How do I properly use distinct and compare?

I am using Distinct(), whose documentation says:
Returns distinct elements from a sequence by using the default equality comparer to compare values.
Yet when I run this code, I get duplicate ids:
var ls = ls2.Distinct().OrderByDescending(s => s.id);
foreach (var v in ls)
{
Console.WriteLine(v.id);
}
I implemented these interfaces in my class, yet it still doesn't work:
class Post : IComparable<Post>, IEqualityComparer<Post>, IComparer<Post>
This is how I implemented them:
int IComparable<Post>.CompareTo(Post other)
{
return (int)(id - other.id);
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.id == y.id;
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
throw new NotImplementedException();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return (int)(x.id - y.id);
}
You should implement GetHashCode().
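Since you're delegating to - and ==, why not just delegate to the appropriate methods on id, i.e. id.CompareTo(other.id) and obj.id.GetHashCode(), and reuse them in the comparer methods? Also implement IEquatable<Post> (add it to the class declaration) and override GetHashCode() and Equals(object) on Post itself: the parameterless Distinct() uses EqualityComparer<Post>.Default, which calls IEquatable<Post>.Equals together with the object's own GetHashCode().
int IComparable<Post>.CompareTo(Post other)
{
return id.CompareTo(other.id);
}
bool IEquatable<Post>.Equals(Post other)
{
return other != null && id == other.id;
}
public override bool Equals(object obj)
{
return obj is Post other && id == other.id;
}
public override int GetHashCode()
{
return id.GetHashCode();
}
bool IEqualityComparer<Post>.Equals(Post x, Post y)
{
return x.id == y.id;
}
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
int IComparer<Post>.Compare(Post x, Post y)
{
return x.id.CompareTo(y.id);
}
This assumes that id is an int; if it isn't, you may have to implement IEquatable<T> (and IComparable<T>) for id's type too.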
You need to properly implement GetHashCode() in your comparer - in your case you can just return the hash code of the id:
int IEqualityComparer<Post>.GetHashCode(Post obj)
{
return obj.id.GetHashCode();
}
Also, as pointed out by @dash in a comment, you need to implement IEquatable<T> in Post (and override GetHashCode() to match) if you choose to go that route (option 1).
A comparer should be implemented in a separate class that you can then pass to one of the Distinct() overloads (option 2); in your case it could be a class MyPostComparer:
var ls = ls2.Distinct(new MyPostComparer()).OrderByDescending(s => s.id);
A third option would be to use the DistinctBy() method of the MoreLinq project.
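A sketch of option 2, assuming a minimal Post with an int id field (this Post and the PostDemo helper are hypothetical stand-ins for the question's class):

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Post; only id matters here.
public class Post
{
    public int id;
}

// Option 2: a separate comparer class passed to Distinct().
public class MyPostComparer : IEqualityComparer<Post>
{
    public bool Equals(Post x, Post y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.id == y.id;
    }

    // Equal ids give equal hash codes, so Distinct can find the duplicates.
    public int GetHashCode(Post obj) => obj.id.GetHashCode();
}

public static class PostDemo
{
    public static List<int> DistinctIds(IEnumerable<Post> posts) =>
        posts.Distinct(new MyPostComparer())
             .OrderByDescending(s => s.id)
             .Select(s => s.id)
             .ToList();
}
```

On .NET 6 and later, System.Linq also has a built-in Enumerable.DistinctBy, so ls2.DistinctBy(s => s.id) does the same without a comparer class.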
