Sortedset not using custom equals - c#

My class implements IEquatable and IComparable. It then gets added to a sortedset.
The goal is to have it be sorted by its "Date" property and be equal if both "ID1" and "ID2" are the same.
The implemented methods:
public int CompareTo(MyClass other)
{
return other.Date.CompareTo(Date);
}
public override int GetHashCode()
{
unchecked
{
var hashCode = ID1;
hashCode = (hashCode * 397) ^ ID2;
return hashCode;
}
}
public bool Equals(MyClass other)
{
if (ReferenceEquals(null, other))
return false;
if (ReferenceEquals(this, other))
return true;
return ID1 == other.ID1
&& ID2 == other.ID2;
}
The resulting sortedset is correctly sorted, but there are still elements that should be equal and therefore not in the set. Using breakpoints it seems neither GetHashCode nor Equals get called.
Any tips on how to solve this?

SortedSet uses CompareTo for both sorting and equality comparison.
There isn't an in-built ordered collection which will allow you to specify a different method to compare for equality in ordering than to compare for equality in maintaining distinctness. I'm not completely sure why this is, but it may be to do with the asssumptions behind the algorithms used for sorting.
If you want this behaviour, probably the simplest way to do it is wrap the underlying collection in its own class which will check for distinctness before adding new items to the collection.
You also need to be careful that items which are equal for sorting but not equal by your Equals method can both be added to the underlying collection. In your case, where the sorting is done by Date and the equality is done by ID1 and ID2 this might look something like:
public int CompareTo(MyClass other)
{
var result = other.Date.CompareTo(Date);
if(result != 0)
return result;
result = other.ID1.CompareTo(ID1);
if(result != 0)
return result;
return other.ID2.CompareTo(ID2);
}
This imposes extra ordering if the dates are the same, but I wouldn't expect that to be a problem.
Alternatively, you can "cheat", by forcing items which are equal to compare as equal in sort position. This would probably best be moved to a custom IComparer, because it's not the normal sorting behaviour you want, outside a SortedSet:
public int CompareTo(MyClass other)
{
if(other.Equals(this))
return 0;
var result = other.Date.CompareTo(Date);
if(result != 0)
return result;
result = other.ID1.CompareTo(ID1);
if(result != 0)
return result;
return other.ID2.CompareTo(ID2);
}
This would allow you to avoid having to create your own wrapper class around the collection, and is safer. It is rather hacky, though.

Related

Using Linq Except not Working as I Thought

List1 contains items { A, B } and List2 contains items { A, B, C }.
What I need is to be returned { C } when I use Except Linq extension. Instead I get returned { A, B } and if I flip the lists around in my expression the result is { A, B, C }.
Am I misunderstanding the point of Except? Is there another extension I am not seeing to use?
I have looked through and tried a number of different posts on this matter with no success thus far.
var except = List1.Except(List2); //This is the line I have thus far
EDIT: Yes I was comparing simple objects. I have never used IEqualityComparer, it was interesting to learn about.
Thanks all for the help. The problem was not implementing the comparer. The linked blog post and example below where helpful.
If you are storing reference types in your list, you have to make sure there is a way to compare the objects for equality. Otherwise they will be checked by comparing if they refer to same address.
You can implement IEqualityComparer<T> and send it as a parameter to Except() function. Here's a blog post you may find helpful.
edit: the original blog post link was broken and has been replaced above
So just for completeness...
// Except gives you the items in the first set but not the second
var InList1ButNotList2 = List1.Except(List2);
var InList2ButNotList1 = List2.Except(List1);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
Edit: Since your lists contain objects you need to pass in an IEqualityComparer for your class... Here is what your except will look like with a sample IEqualityComparer based on made up objects... :)
// Except gives you the items in the first set but not the second
var equalityComparer = new MyClassEqualityComparer();
var InList1ButNotList2 = List1.Except(List2, equalityComparer);
var InList2ButNotList1 = List2.Except(List1, equalityComparer);
// Intersect gives you the items that are common to both lists
var InBothLists = List1.Intersect(List2);
public class MyClass
{
public int i;
public int j;
}
class MyClassEqualityComparer : IEqualityComparer<MyClass>
{
public bool Equals(MyClass x, MyClass y)
{
return x.i == y.i &&
x.j == y.j;
}
public int GetHashCode(MyClass obj)
{
unchecked
{
if (obj == null)
return 0;
int hashCode = obj.i.GetHashCode();
hashCode = (hashCode * 397) ^ obj.i.GetHashCode();
return hashCode;
}
}
}
You simply confused the order of arguments. I can see where this confusion arose, because the official documentation isn't as helpful as it could be:
Produces the set difference of two sequences by using the default equality comparer to compare values.
Unless you're versed in set theory, it may not be clear what a set difference actually is—it's not simply what's different between the sets. In reality, Except returns the list of elements in the first set that are not in the second set.
Try this:
var except = List2.Except(List1); // { C }
Writing a custom comparer does seem to solve the problem, but I think https://stackoverflow.com/a/12988312/10042740 is a much more simple and elegant solution.
It overwrites the GetHashCode() and Equals() methods in your object defining class, then the default comparer does its magic without extra code cluttering up the place.
Just for Ref:
I wanted to compare USB Drives connected and available to the system.
So this is the class which implements interface IEqualityComparer
public class DriveInfoEqualityComparer : IEqualityComparer<DriveInfo>
{
public bool Equals(DriveInfo x, DriveInfo y)
{
if (object.ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
// compare with Drive Level
return x.VolumeLabel.Equals(y.VolumeLabel);
}
public int GetHashCode(DriveInfo obj)
{
return obj.VolumeLabel.GetHashCode();
}
}
and you can use it like this
var newDeviceLst = DriveInfo.GetDrives()
.ToList()
.Except(inMemoryDrives, new DriveInfoEqualityComparer())
.ToList();

C# Dictionary throws KeyNotFoundException when key exists

I am storing a 2D array that represents a distance matrix between vectors as a Dictionary<DistanceCell, double>. My implementation of DistanceCell has two string fields representing the vectors being compared.
class DistanceCell
{
public string Group1 { get; private set; }
public string Group2 { get; private set; }
public DistanceCell(string group1, string group2)
{
if (group1 == null)
{
throw new ArgumentNullException("group1");
}
if (group2 == null)
{
throw new ArgumentNullException("group2");
}
this.Group1 = group1;
this.Group2 = group2;
}
}
Since I'm using this class as a key, I overrode Equals() and GetHashCode():
public override bool Equals(object obj)
{
// False if the object is null
if (obj == null)
{
return false;
}
// Try casting to a DistanceCell. If it fails, return false;
DistanceCell cell = obj as DistanceCell;
if (cell == null)
{
return false;
}
return (this.Group1 == cell.Group1 && this.Group2 == cell.Group2)
|| (this.Group1 == cell.Group2 && this.Group2 == cell.Group1);
}
public bool Equals(DistanceCell cell)
{
if (cell == null)
{
return false;
}
return (this.Group1 == cell.Group1 && this.Group2 == cell.Group2)
|| (this.Group1 == cell.Group2 && this.Group2 == cell.Group1);
}
public static bool operator ==(DistanceCell a, DistanceCell b)
{
// If both are null, or both are same instance, return true.
if (System.Object.ReferenceEquals(a, b))
{
return true;
}
// If either is null, return false.
// Cast a and b to objects to check for null to avoid calling this operator method
// and causing an infinite loop.
if ((object)a == null || (object)b == null)
{
return false;
}
return (a.Group1 == b.Group1 && a.Group2 == b.Group2)
|| (a.Group1 == b.Group2 && a.Group2 == b.Group1);
}
public static bool operator !=(DistanceCell a, DistanceCell b)
{
return !(a == b);
}
public override int GetHashCode()
{
int hash;
unchecked
{
hash = Group1.GetHashCode() * Group2.GetHashCode();
}
return hash;
}
As you can see, one of the requirements of DistanceCell is that Group1 and Group2 are interchangeable. So for two strings x and y, DistanceCell("x", "y") must equal DistanceCell("y", "x"). This is why I implemented GetHashCode() with multiplication because DistanceCell("x", "y").GetHashCode() must equal DistanceCell("y", "x").GetHashCode().
The problem that I'm having is that it works correctly roughly 90% of the time, but it throws a KeyNotFoundException or a NullReferenceException the remainder of the time. The former is thrown when getting a key from the dictionary, and the latter when I iterate over the dictionary with a foreach loop and it retrieves a key that is null that it then tries to call Equals() on. I suspect this has something to do with an error in my GetHashCode() implementation, but I'm not positive. Also note that there should never be a case where the key wouldn't exist in the dictionary when I check it because of the nature of my algorithm. The algorithm takes the same path every execution.
Update
I just wanted to update everyone that the problem is fixed. It turns out that it had nothing to do with my Equals() or GetHashCode() implementation. I did some extensive debugging and found that the reason I was getting a KeyNotFoundException was because the key didn't exist in the dictionary in the first place, which was strange because I was positive that it was being added. The problem was that I was adding the keys to the dictionary with multiple threads, and according to this, the c# Dictionary class isn't thread-safe. So the timing must have been perfect that an Add() failed and thus the key was never added to the dictionary. I believe this can also explain how the foreach loop was bringing up a null key on occasion. The Add()'s from multiple threads must have messed up some internal structures in the dictionary and introduced a null key.
Thanks to every one for your help! I'm sorry that it ended up being a fault entirely on my end.
Take a look at this blog post by Eric Lippert
It says that the result of GetHashCode should never change even if you change the content of the object. The reason is that Dictionary is using buckets for faster indexing. Once you change the GetHasCode result, the Dictionary will not be able to find the right bucket for your object and this can lead to "Key not found"
I might be wrong, but it is worth testing.
I believe what is missing for you is this answer
Using an object as a generic Dictionary key
You probably not stating you implement IEquatable interface
To me your code looks correct. (It can possibly be (micro-)optimized, but it looks as if it will be correct in all cases.) Can you provide some code where you create and use the Dictionary<DistanceCell, double>, and things go bad? The problem may lie in code you haven't shown us.

Why do 2 delegate instances return the same hashcode?

Take the following:
var x = new Action(() => { Console.Write("") ; });
var y = new Action(() => { });
var a = x.GetHashCode();
var b = y.GetHashCode();
Console.WriteLine(a == b);
Console.WriteLine(x == y);
This will print:
True
False
Why is the hashcode the same?
It is kinda surprising, and will make using delegates in a Dictionary as slow as a List (aka O(n) for lookups).
Update:
The question is why. IOW who made such a (silly) decision?
A better hashcode implementation would have been:
return Method ^ Target == null ? 0 : Target.GetHashcode();
// where Method is IntPtr
Easy! Since here is the implementation of the GetHashCode (sitting on the base class Delegate):
public override int GetHashCode()
{
return base.GetType().GetHashCode();
}
(sitting on the base class MulticastDelegate which will call above):
public sealed override int GetHashCode()
{
if (this.IsUnmanagedFunctionPtr())
{
return ValueType.GetHashCodeOfPtr(base._methodPtr);
}
object[] objArray = this._invocationList as object[];
if (objArray == null)
{
return base.GetHashCode();
}
int num = 0;
for (int i = 0; i < ((int) this._invocationCount); i++)
{
num = (num * 0x21) + objArray[i].GetHashCode();
}
return num;
}
Using tools such as Reflector, we can see the code and it seems like the default implementation is as strange as we see above.
The type value here will be Action. Hence the result above is correct.
UPDATE
My first attempt of a better implementation:
public class DelegateEqualityComparer:IEqualityComparer<Delegate>
{
public bool Equals(Delegate del1,Delegate del2)
{
return (del1 != null) && del1.Equals(del2);
}
public int GetHashCode(Delegate obj)
{
if(obj==null)
return 0;
int result = obj.Method.GetHashCode() ^ obj.GetType().GetHashCode();
if(obj.Target != null)
result ^= RuntimeHelpers.GetHashCode(obj);
return result;
}
}
The quality of this should be good for single cast delegates, but not so much for multicast delegates (If I recall correctly Target/Method return the values of the last element delegate).
But I'm not really sure if it fulfills the contract in all corner cases.
Hmm it looks like quality requires referential equality of the targets.
This smells like some of the cases mentioned in this thread, maybe it will give you some pointers on this behaviour. else, you could log it there :-)
What's the strangest corner case you've seen in C# or .NET?
Rgds GJ
From MSDN :
The default implementation of
GetHashCode does not guarantee
uniqueness or consistency; therefore,
it must not be used as a unique object
identifier for hashing purposes.
Derived classes must override
GetHashCode with an implementation
that returns a unique hash code. For
best results, the hash code must be
based on the value of an instance
field or property, instead of a static
field or property.
So if you have not overwritten the GetHashCode method, it may return the same. I suspect this is because it generates it from the definition, not the instance.

How can you do custom sorting in LINQ with null always on the end?

I need to sort in memory lists of strings or numbers in ascending or descending order. However, the list can contain null values and all null values must appear after the numbers or strings.
That is the input data might be:
1, 100, null, 5, 32.3
The ascending result would be
1, 5, 32.3, 100, null
The descending list would be
100, 32.3, 5, 1, null
Any ideas on how to make this work?
I don't have a compiler in front of me to check, but I'm thinking something like:
x.OrderBy(i => i == null).ThenBy(i => i)
You can write your own comparer which delegates to an existing one for non-nulls, but always sorts nulls at the end. Something like this:
public class NullsLastComparer<T> : IComparer<T>
{
private readonly IComparer<T> proxy;
public NullsLastComparer(IComparer<T> proxy)
{
this.proxy = proxy;
}
public override int Compare(T first, T second)
{
if (first == null && second == null)
{
return 0;
}
if (first == null)
{
return 1;
}
if (second == null)
{
return -1;
}
return proxy.Compare(first, second);
}
}
EDIT: A few issues with this approach:
Firstly, it doesn't play nicely with anonymous types; you might need a separate extension method to make that work well. Or use Ken's answer :)
More importantly, it violates the IComparer<T> contract, which specifies that nulls should be first. Now personally, I think this is a mistake in the IComparer<T> specification - it should perhaps define that the comparer should handle nulls, but it should not specify whether they come first or last... it makes requirements like this (which are perfectly reasonable) impossible to fulfil as cleanly as we might want, and has all kinds of awkward ramifications for things like a reversing comparer. You'd expect such a thing to completely reverse the ordering, but according to the spec it should still maintain nulls at the start :(
I don't think I've seen any .NET sorting implementations which actually rely on this, but it's definitely worth being aware of.
As Jon said, you need to define your custom Comparer, implementing IComparer. Here's how your Compare method in the custom comparer would like that can keep null at the end.
public int Compare(Object x, Object y)
{
int retVal = 0;
IComparable valX = x as IComparable;
IComparable valY = y as IComparable;
if (valX == null && valY == null)
{
return 0;
}
if (valX == null)
{
return 1;
}
else if (valY == null)
{
return -1;
}
return valX.CompareTo(valY);
}

Issue with custom EqualityComparer

I have a class with fields ColDescriptionOne(string), ColDescriptionTwo(string) and ColCodelist(int). I want to get the Intersect of two lists of this class where the desc are equal but the codelist is different.
I can use the Where clause and get what I need. However I can't seem to make it work using a custom Comparer like this:
internal class CodeListComparer: EqualityComparer<SheetRow>
{
public override bool Equals(SheetRow x, SheetRow y)
{
return Equals(x.ColDescriptionOne, y.ColDescriptionOne) &&
Equals(x.ColDescriptionSecond, y.ColDescriptionOne)
&& !Equals(x.ColCodelist, y.ColCodelist);
}
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397) + (obj.ColDescriptionSecond.GetHashCode()*397)
+ obj.ColCodelist.GetHashCode());
}
}
And then use it like this:
var onylByCodeList = firstSheet.Entries.Intersect(otherSheet.Entries, new CodeListComparer());
Any ideas what I'm doing wrong here ?
thanks
Sunit
You have a typo in the Equals method. The second line is comparing ColDescriptionOne to ColDescriptionSecond. They should both be ColDescriptionSecond.
return Equals(x.ColDescriptionOne, y.ColDescriptionOne)
&& Equals(x.ColDescriptionSecond, y.ColDescriptionSecond)
&& !Equals(x.ColCodelist, y.ColCodelist);
The second problem you have is you are including the ColCodeList in the GetHashCode method. The GetHashCode method must return the same value for objects that are equal. In this case though ColCodeList is supposed to be different when values are equal. This means that in cases where you want 2 objects to be considered equal they are more likely to have different hash codes which is incorrect.
Take that out of the GetHashCode method and everything should work.
public override int GetHashCode(SheetRow obj)
{
return ((obj.ColDescriptionOne.GetHashCode()*397)
+ (obj.ColDescriptionSecond.GetHashCode()*397));
}

Categories

Resources