This question is similar to the one here.
We all know what PointF is, don't we? This is the data structure:
public struct PointF
{
public float X;
public float Y;
}
How to implement IEqualityComparer<PointF> with tolerance? Let's say my Equals code is like this
public const float Epsilon = 0.01f; // say
public bool Equals(PointF pt1, PointF pt2)
{
return Math.Abs(pt1.X-pt2.X)<Epsilon && Math.Abs(pt1.Y-pt2.Y)<Epsilon;
}
Question: How to implement the correct GetHashCode so that for a dictionary of PointF, I will access the element correctly?
I have been racking my brain for a few days but still can't find a satisfactory solution.
Instead of defining the tolerance by the distance, you could place the points in a grid.
If two points are in the same cell, they're considered equal and have the same hash code.
public bool Equals(PointF pt1, PointF pt2)
{
return GetCell(pt1.X) == GetCell(pt2.X)
&& GetCell(pt1.Y) == GetCell(pt2.Y);
}
public int GetHashCode(PointF pt)
{
return GetCell(pt.X) ^ GetCell(pt.Y);
}
private static int GetCell(float f)
{
return (int)Math.Floor(f / 10); // cell size is 10 pixels; Floor keeps the cell around zero the same width as the others
}
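To make the grid idea concrete, here is a minimal sketch of a complete comparer that can be handed to a Dictionary. The PointF struct below is only a stand-in for System.Drawing.PointF, and the class name GridPointComparer and the configurable cellSize are my own assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for System.Drawing.PointF so the sketch has no extra dependencies.
public struct PointF
{
    public float X;
    public float Y;
    public PointF(float x, float y) { X = x; Y = y; }
}

// Hypothetical comparer: two points are equal when they fall in the same grid cell.
public sealed class GridPointComparer : IEqualityComparer<PointF>
{
    private readonly float cellSize;

    public GridPointComparer(float cellSize) { this.cellSize = cellSize; }

    public bool Equals(PointF a, PointF b)
    {
        return GetCell(a.X) == GetCell(b.X) && GetCell(a.Y) == GetCell(b.Y);
    }

    public int GetHashCode(PointF p)
    {
        // Equal points share both cell indices, so they share this hash.
        return GetCell(p.X) ^ GetCell(p.Y);
    }

    private int GetCell(float f)
    {
        // Math.Floor keeps cells the same width on both sides of zero.
        return (int)Math.Floor(f / cellSize);
    }
}
```

With a cell size of 10, new Dictionary<PointF, string>(new GridPointComparer(10f)) treats (1, 1) and (9, 9) as the same key. The price is that (9.9, 0) and (10.1, 0) are different keys even though they are only 0.2 apart, because they sit on opposite sides of a cell boundary.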
Thesis: There is no implementation of Equals and GetHashCode that meets your requirements.
Proof: Consider the following three points, A, B, and C:
As per your requirements,
Equals(A, B) == true // (i)
Equals(B, C) == true // (ii)
Equals(A, C) == false // (iii)
GetHashCode(A) == GetHashCode(B) // (iv)
GetHashCode(B) == GetHashCode(C) // (v)
GetHashCode(A) != GetHashCode(C) // (vi)
But from (iv) and (v) follows
GetHashCode(A) == GetHashCode(C)
and thereby
Equals(A, C) == true
which contradicts (iii) and (vi).
Since Equals and GetHashCode cannot return different values for the same arguments, there is no implementation that meets your requirements.
q.e.d.
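For a concrete instance of (i) to (iii), here is a small sketch that uses the tolerance check from the question on a single coordinate. The values 0, 0.006 and 0.012 are hypothetical ones I picked for illustration; they are not the points from the original answer.

```csharp
using System;

public static class ToleranceDemo
{
    public const float Epsilon = 0.01f;

    // The tolerance-based Equals from the question, reduced to one coordinate.
    public static bool NearlyEqual(float a, float b)
    {
        return Math.Abs(a - b) < Epsilon;
    }
}
```

With A = 0, B = 0.006 and C = 0.012, NearlyEqual(A, B) and NearlyEqual(B, C) are both true, but NearlyEqual(A, C) is false. A dictionary would need A and B to share a hash code, and B and C as well, which forces A and C into the same bucket even though they are not supposed to be equal.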
I don't think it's possible, because you could have an infinite sequence of values in which each value is equal (within tolerance) to the previous and next value but to no other value in the sequence, and GetHashCode would need to return an identical value for all of them.
Well, the answer based on grids is good, but sometimes you need to group the close points anyway, even if they are not in the same grid cell. My approach is to implement this with a grouping: two points are in the same group if either they are close or there is a sequence of close points connecting them. This semantics cannot be done with a proper IEqualityComparer, because it needs to know all the items in advance before producing the groups. So I've done a simple LINQ-style operator GroupByCluster, which basically achieves this.
The code is here: http://ideone.com/8l0LH. It compiles on my VS 2010, but fails to compile on Mono because HashSet<> cannot be implicitly converted to IEnumerable<> (why?).
The approach is generic and thus not very efficient: it's quadratic in the input size. For concrete types it can be made more efficient: for example, for T = double we can just sort the input array and get O(n log n) performance. An analogous, though more complicated, trick is applicable to 2D points as well.
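The sorting trick for T = double could look roughly like this. It is only a sketch of the idea, not the code behind the ideone link; the names Clustering1D and ClusterSorted are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Clustering1D
{
    // Groups values into chains in which consecutive sorted values differ by less than tolerance.
    public static List<List<double>> ClusterSorted(IEnumerable<double> values, double tolerance)
    {
        var sorted = values.OrderBy(v => v).ToList(); // the O(n log n) step
        var clusters = new List<List<double>>();
        foreach (double v in sorted)
        {
            List<double> last = clusters.Count > 0 ? clusters[clusters.Count - 1] : null;
            if (last != null && v - last[last.Count - 1] < tolerance)
                last.Add(v); // close to the previous value: extend the current chain
            else
                clusters.Add(new List<double> { v }); // gap found: start a new cluster
        }
        return clusters;
    }
}
```

After sorting, a value can only be close to its immediate neighbours, so a single pass is enough to find the chains. For the input 5.0, 1.0, 1.4, 1.8, 5.3 with a tolerance of 0.5 this yields two groups: {1.0, 1.4, 1.8} and {5.0, 5.3}.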
Note aside: your initial proposition is impossible to implement with IEqualityComparer, since your "approximate equality" is not transitive (but the equality in IEqualityComparer has to be so).
Related
I'm currently following a Unity course. In one of the lessons, the lecturer uses an anonymous function that sorts the results by actionValue.
Here's the relevant code:
public EnemyAIAction GetBestEnemyAIAction()
{
List<EnemyAIAction> enemyAIActionList = new List<EnemyAIAction>();
List<GridPosition> validActionGridPositionList = GetValidActionGridPositionList();
foreach(GridPosition gridPosition in validActionGridPositionList)
{
EnemyAIAction enemyAIAction = GetEnemyAIAction(gridPosition);
enemyAIActionList.Add(enemyAIAction);
}
enemyAIActionList.Sort(
(EnemyAIAction a, EnemyAIAction b) => b.actionValue - a.actionValue
);
}
The lecturer doesn't bother explaining why this approach sorts the list by actionValue. I'm having trouble understanding how, exactly, subtracting the inputs from each other sorts the list by that input value.
The Sort method is declared as
public void Sort (Comparison<T> comparison);
Comparison<T> is declared as
public delegate int Comparison<in T>(T x, T y);
According to the documentation, it returns "a signed integer that indicates the relative values of x and y, as shown in the following table."
Value | Meaning
Less than 0 | x is less than y.
0 | x equals y.
Greater than 0 | x is greater than y.
I.e., the Sort method expects a delegate. You can think of a delegate as the address of a function. In this specific case the function accepts two items of the list as input parameters. The return value is a negative int when x is less than y, 0 when both items are considered as equal, and a positive int when x is greater than y.
Now you could declare your own method like this to sort in ascending order:
int EnemyAIActionComparison(EnemyAIAction x, EnemyAIAction y)
{
if (x.actionValue > y.actionValue) return +1;
if (x.actionValue < y.actionValue) return -1;
return 0; // both are equal
}
Since it does not matter how large the result is (only the sign matters), you could simply write the following, provided actionValue is an int and the difference cannot overflow:
int EnemyAIActionComparison(EnemyAIAction x, EnemyAIAction y)
{
return x.actionValue - y.actionValue;
}
Then call the Sort method like this:
enemyAIActionList.Sort(EnemyAIActionComparison);
Note that no parentheses follow EnemyAIActionComparison, because we are not calling the method here; we are passing the method itself as a parameter to Sort. Sort then calls this method on many pairs of list items according to a sorting algorithm (e.g., Quick Sort) until the list is sorted.
Now, there is a shortcut in defining this method: you can use a lambda expression. A lambda expression is a very concise syntax for declaring an anonymous method on the fly.
So (x, y) => x.actionValue - y.actionValue is equivalent to the method above. The type of the parameters is inferred from the declaration int Comparison<in T>(T x, T y), and T is given in the declaration of the list. So you do not need to specify it as in the example you have given. (Note that the names you give to the parameters do not matter. Specifically, they do not need to be the same as in the declaration of Comparison.)
If you want to sort in descending order, just swap the operands of the subtraction, which flips the sign of the result. That is exactly what b.actionValue - a.actionValue in your lecturer's code does.
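If you are worried about the subtraction overflowing for extreme values, a variant based on int.CompareTo avoids the arithmetic altogether. This is a sketch under assumptions: the EnemyAIAction class below is a hypothetical stand-in for the course's class, and I am assuming actionValue is an int.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the course's EnemyAIAction; actionValue is assumed to be an int.
public class EnemyAIAction
{
    public int actionValue;
}

public static class EnemyAIActionSorting
{
    // Descending order: compare b to a instead of a to b.
    // CompareTo only reports the ordering, so there is no subtraction that could overflow.
    public static int ByActionValueDescending(EnemyAIAction a, EnemyAIAction b)
    {
        return b.actionValue.CompareTo(a.actionValue);
    }
}
```

It is used the same way as before: enemyAIActionList.Sort(EnemyAIActionSorting.ByActionValueDescending); or, as a lambda, enemyAIActionList.Sort((a, b) => b.actionValue.CompareTo(a.actionValue));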
Here is the equality comparer I just wrote because I wanted a distinct set of items from a list containing entities.
class InvoiceComparer : IEqualityComparer<Invoice>
{
public bool Equals(Invoice x, Invoice y)
{
// A
if (Object.ReferenceEquals(x, y)) return true;
// B
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) return false;
// C
return x.TxnID == y.TxnID;
}
public int GetHashCode(Invoice obj)
{
if (Object.ReferenceEquals(obj, null)) return 0;
return obj.TxnID.GetHashCode(); // hash the same member that Equals compares
}
}
Why does Distinct require a comparer as opposed to a Func<T,T,bool>?
Are (A) and (B) anything other than optimizations, and are there scenarios when they would not act the expected way, due to subtleness in comparing references?
If I wanted to, could I replace (C) with
return GetHashCode(x) == GetHashCode(y)
Distinct takes a comparer so that it can use hash codes and run in O(n) rather than O(n²).
(A) is an optimization.
(B) is necessary; otherwise, it would throw a NullReferenceException.
If Invoice is a struct, however, they're both unnecessary and slower.
No. Hash codes are not unique.
A is a simple and quick way to check whether both references point to the same object, i.e. the same memory address.
B - if exactly one of the references is null, the objects obviously cannot be equal, so there is no point in comparing any further.
C - no, sometimes GetHashCode() can return the same value for different objects (hash collision), so you should still do an equality comparison.
Regarding the same hash code value for different objects, MSDN:
If two objects compare as equal, the GetHashCode method for each
object must return the same value. However, if two objects do not
compare as equal, the GetHashCode methods for the two object do not
have to return different values.
Distinct() basically works on the notion of "not equal". Therefore, if your list contains non-primitive types and you want value-based comparison, you must supply your own equality comparer.
At A, you check whether the objects are identical or not. If two objects are equal, they don't have to be identical, but if they are identical, you can be sure that they are equal. So the A part can improve the method's efficiency in some cases.
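Putting the answers together, a comparer for Distinct could be sketched as follows. The Invoice class here is a hypothetical minimal version (only TxnID comes from the question; Amount is made up), and the comparer hashes the same member that Equals compares.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Invoice; only TxnID is taken from the question.
public class Invoice
{
    public string TxnID;
    public decimal Amount;
}

public class InvoiceByTxnIdComparer : IEqualityComparer<Invoice>
{
    public bool Equals(Invoice x, Invoice y)
    {
        if (object.ReferenceEquals(x, y)) return true;   // (A) same instance, or both null
        if (x is null || y is null) return false;        // (B) exactly one is null
        return x.TxnID == y.TxnID;                       // (C) compare the actual values
    }

    public int GetHashCode(Invoice obj)
    {
        // Equal invoices must produce equal hashes, so hash what Equals compares.
        if (obj is null || obj.TxnID is null) return 0;
        return obj.TxnID.GetHashCode();
    }
}
```

Usage would be something like invoices.Distinct(new InvoiceByTxnIdComparer()). Distinct can use the hash code to find candidate duplicates quickly and then calls Equals to confirm them, which is why both methods are needed.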
I'm writing an implementation of IComparable<T>.CompareTo(T) for a struct. I'm doing a member-wise comparison (i.e. d = a.CompareTo(other.a); if (d != 0) { return d; } etc.), but one of the members is of a class (let's call it Y) that doesn't implement IComparable or have any other reasonable way of comparing so I just want to compare the references (in this case I know that all instances of Y are unique). Is there a way of doing that that will yield an int that is suitable for use in a comparison method?
It's not meaningful to compare references looking for order relationships. It's only meaningful to look for equality.
Your statement that the class "doesn't implement IComparable or have any other reasonable way of comparing" seems to be contra-indicative of finding "a way of doing that that will yield an int that is suitable for use in a comparison method".
If the objects cannot be reasonably compared for ordering, it is best to exclude them from the comparison logic entirely.
Comparing the results of Object.GetHashCode() works well enough for my purposes. I just care about reference equality, and it doesn't matter where anything unequal is sorted: I want instances of T that have the same instances of Y as members to be next to each other in the sorted result. Note that two different objects can still have the same hash code, in which case they compare as 0. My T.CompareTo() uses the following helper:
static int Compare(object a, object b)
{
if (a == null)
{
return b == null ? 0 : 1;
}
if (b == null)
{
return -1;
}
return a.GetHashCode().CompareTo(b.GetHashCode());
}
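If Y (or a subclass) ever overrides GetHashCode, the helper above would stop ordering by identity. A possible variation, sketched here under a made-up class name, uses RuntimeHelpers.GetHashCode, which returns the default identity-based hash regardless of any override.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReferenceOrder
{
    // Orders objects by their identity hash. The same reference always compares as 0.
    // Two distinct objects can also compare as 0 if their identity hashes collide.
    public static int Compare(object a, object b)
    {
        if (object.ReferenceEquals(a, b)) return 0;
        if (a == null) return 1;   // nulls sort last, as in the helper above
        if (b == null) return -1;
        return RuntimeHelpers.GetHashCode(a).CompareTo(RuntimeHelpers.GetHashCode(b));
    }
}
```

This keeps the same caveat as the original helper: the order is arbitrary and only stable for the lifetime of the objects, so it is good for grouping identical references together and not much else.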
My understanding is that you're typically supposed to use xor with GetHashCode() to produce an int to identify your data by its value (as opposed to by its reference). Here's a simple example:
class Foo
{
int m_a;
int m_b;
public int A
{
get { return m_a; }
set { m_a = value; }
}
public int B
{
get { return m_b; }
set { m_b = value; }
}
public Foo(int a, int b)
{
m_a = a;
m_b = b;
}
public override int GetHashCode()
{
return A ^ B;
}
public override bool Equals(object obj)
{
return this.GetHashCode() == obj.GetHashCode();
}
}
The idea being, I want to compare one instance of Foo to another based on the value of properties A and B. If Foo1.A == Foo2.A and Foo1.B == Foo2.B, then we have equality.
Here's the problem:
Foo one = new Foo(1, 2);
Foo two = new Foo(2, 1);
if (one.Equals(two)) { ... } // This is true!
These both produce a value of 3 for GetHashCode(), causing Equals() to return true. Obviously, this is a trivial example, and with only two properties I could simply compare the individual properties in the Equals() method. However, with a more complex class this would get out of hand quickly.
I know that sometimes it makes good sense to set the hash code only once, and always return the same value. However, for mutable objects where an evaluation of equality is necessary, I don't think this is reasonable.
What's the best way to handle property values that could easily be interchanged when implementing GetHashCode()?
See Also
What is the best algorithm for an overridden System.Object.GetHashCode?
First off - Do not implement Equals() only in terms of GetHashCode() - hashcodes will sometimes collide even when objects are not equal.
The contract for GetHashCode() includes the following:
different hashcodes means that objects are definitely not equal
same hashcodes means objects might be equal (but possibly might not)
Andrew Hare suggested I incorporate his answer:
I would recommend that you read this solution (by our very own Jon Skeet, by the way) for a "better" way to calculate a hashcode.
No, the above is relatively slow and
doesn't help a lot. Some people use
XOR (eg a ^ b ^ c) but I prefer the
kind of method shown in Josh Bloch's
"Effective Java":
public override int GetHashCode()
{
int hash = 23;
hash = hash*37 + craneCounterweightID;
hash = hash*37 + trailerID;
hash = hash*37 + craneConfigurationTypeCode.GetHashCode();
return hash;
}
The 23 and 37 are arbitrary numbers
which are co-prime.
The benefit of the above over the XOR
method is that if you have a type
which has two values which are
frequently the same, XORing those
values will always give the same
result (0) whereas the above will
differentiate between them unless
you're very unlucky.
As mentioned in the above snippet, you might also want to look at Joshua Bloch's book, Effective Java, which contains a nice treatment of the subject (the hashcode discussion applies to .NET as well).
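Applied to the Foo class from the question, the multiply-and-add approach might look like the sketch below. I made the properties read-only here to keep it short, which is a simplification and not how the question's class is written; Equals compares the values directly and never consults the hash.

```csharp
public sealed class Foo
{
    public int A { get; }
    public int B { get; }

    public Foo(int a, int b) { A = a; B = b; }

    public override bool Equals(object obj)
    {
        // Compare the values themselves; the hash is never used to decide equality.
        var other = obj as Foo;
        return other != null && A == other.A && B == other.B;
    }

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 23;
            hash = hash * 37 + A;
            hash = hash * 37 + B;
            return hash;
        }
    }
}
```

Because each field is multiplied in at a different position, the order matters: new Foo(1, 2) hashes to 31526 and new Foo(2, 1) hashes to 31562, whereas A ^ B gives 3 for both.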
Andrew has posted a good example for generating a better hash code, but also bear in mind that you shouldn't use hash codes as an equality check, since they are not guaranteed to be unique.
For a trivial example of why, consider a double. It has more possible values than an int, so it is impossible to have a unique int for each double. Hashes are really just a first pass, used in situations like a dictionary where you need to find the key quickly: by first comparing hashes, a large percentage of the possible keys can be ruled out, and only the keys with matching hashes need the expense of a full equality check (or other collision resolution methods).
Hashing always involves collisions and you have to deal with them (e.g., compare hash values and, if they are equal, compare the values inside the objects exactly to be sure the objects are equal).
Using a simple XOR, you'll get many collisions. If you want fewer, use some mathematical functions that distribute values across different bits (bit shifts, multiplying with primes etc.).
Read "Overriding GetHashCode for mutable objects? C#" and think about implementing IEquatable<T>.
There are several better hash implementations. FNV hash for example.
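As a rough illustration, here is a sketch of 32-bit FNV-1a, one member of the FNV family. The offset basis and prime are the standard 32-bit constants; the Combine helper that feeds two ints through the hash is my own addition for this example.

```csharp
using System;

public static class Fnv1a
{
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;

    public static uint Hash(byte[] data)
    {
        uint hash = OffsetBasis;
        unchecked
        {
            foreach (byte b in data)
            {
                hash ^= b;      // mix in one byte
                hash *= Prime;  // then multiply by the FNV prime, wrapping on overflow
            }
        }
        return hash;
    }

    // Hypothetical helper: hashes two ints by feeding their bytes through FNV-1a.
    public static int Combine(int a, int b)
    {
        byte[] bytes = new byte[8];
        BitConverter.GetBytes(a).CopyTo(bytes, 0);
        BitConverter.GetBytes(b).CopyTo(bytes, 4);
        return unchecked((int)Hash(bytes));
    }
}
```

A GetHashCode override could then return Fnv1a.Combine(A, B). It allocates a small array on every call, so for hot paths the multiply-and-add approach is probably the more practical choice.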
Out of curiosity since hashcodes are typically a bad idea for comparison, wouldn't it be better to just do the following code, or am I missing something?
public override bool Equals(object obj)
{
bool isEqual = false;
Foo otherFoo = obj as Foo;
if (otherFoo != null)
{
isEqual = (this.A == otherFoo.A) && (this.B == otherFoo.B);
}
return isEqual;
}
A hash that is quick to generate and has good distribution:
public override int GetHashCode()
{
return A.GetHashCode() ^ B.GetHashCode(); // XOR
}
After overloading the comparison operator, how do I check whether two variables point to the same object (i.e. compare by reference, not by value)?
public static bool operator ==(Landscape a, Landscape b)
{
return a.Width == b.Width && a.Height == b.Height;
}
public static bool operator !=(Landscape a, Landscape b)
{
return !(a.Width == b.Width && a.Height == b.Height);
}
Use the Object.ReferenceEquals static method.
Of course, in order for the == and != operators to retain their full functionality, you should also be overriding Equals and GetHashCode so that they return a consistent set of responses to callers.
Try Object.ReferenceEquals(a, b);
To check whether both point to the same object, use the Object.ReferenceEquals method. It returns true if both refer to the same object or if both are null. Otherwise it returns false.
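Object.ReferenceEquals is also handy inside the overloaded operator itself, because it never calls the overload and so cannot recurse. A sketch of how the pieces could fit together, assuming Width and Height are ints (the question does not show their types):

```csharp
public sealed class Landscape
{
    public int Width { get; }
    public int Height { get; }

    public Landscape(int width, int height) { Width = width; Height = height; }

    public static bool operator ==(Landscape a, Landscape b)
    {
        if (object.ReferenceEquals(a, b)) return true;   // same instance, or both null
        if (a is null || b is null) return false;        // exactly one is null
        return a.Width == b.Width && a.Height == b.Height;
    }

    public static bool operator !=(Landscape a, Landscape b) { return !(a == b); }

    // Keep Equals and GetHashCode consistent with the operators.
    public override bool Equals(object obj) { return this == (obj as Landscape); }

    public override int GetHashCode() { unchecked { return Width * 397 + Height; } }
}
```

With this in place, a == b answers "same value?" and Object.ReferenceEquals(a, b) answers "same object?". Two separately constructed Landscape(3, 4) instances are == but not reference-equal.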
I know it's an old question, but if you're going to overload the == operator or override Object.Equals, you should also overload the reverse operator !=.
And in this case, since you're comparing internal numbers, you should overload the other comparison operators <, >, <=, >=.
People who consume your class in the future, whether it be third-party consumers, or developers who take over your code, might use something like CodeRush or Refactor, that'll automatically "flip" the logic (also called reversing the conditional) and then flatten it, to break out of the 25 nested if's syndrome. If their code does that, and you've overloaded the == operator without overloading the != operator, it could change the intended meaning of your code.