I'm working on implementing GetHashCode() based on the HashCode struct in this answer here. Since my Equals method will consider collections using Enumerable.SequenceEqual(), I need to include the collections in my GetHashCode() implementation.
As a starting point, I'm using Jon Skeet's embedded GetHashCode() implementation to check the output of the HashCode struct implementation. This works as expected with the following test:
private class MyObjectEmbeddedGetHashCode
{
public int x;
public string y;
public DateTimeOffset z;
public List<string> collection;
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 31 + x.GetHashCode();
hash = hash * 31 + y.GetHashCode();
hash = hash * 31 + z.GetHashCode();
return hash;
}
}
}
private class MyObjectUsingHashCodeStruct
{
public int x;
public string y;
public DateTimeOffset z;
public List<string> collection;
public override int GetHashCode()
{
return HashCode.Start
.Hash(x)
.Hash(y)
.Hash(z);
}
}
[Test]
public void GetHashCode_CollectionExcluded()
{
DateTimeOffset now = DateTimeOffset.Now;
MyObjectEmbeddedGetHashCode a = new MyObjectEmbeddedGetHashCode()
{
x = 1,
y = "Fizz",
z = now,
collection = new List<string>()
{
"Foo",
"Bar",
"Baz"
}
};
MyObjectUsingHashCodeStruct b = new MyObjectUsingHashCodeStruct()
{
x = 1,
y = "Fizz",
z = now,
collection = new List<string>()
{
"Foo",
"Bar",
"Baz"
}
};
Console.WriteLine("MyObject::GetHashCode(): {0}", a.GetHashCode());
Console.WriteLine("MyObjectEx::GetHashCode(): {0}", b.GetHashCode());
Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
}
The next step is to consider the collection in the GetHashCode() calculation. This requires a small addition to the GetHashCode() implementation in MyObjectEmbeddedGetHashCode.
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 31 + x.GetHashCode();
hash = hash * 31 + y.GetHashCode();
hash = hash * 31 + z.GetHashCode();
int collectionHash = 17;
foreach (var item in collection)
{
collectionHash = collectionHash * 31 + item.GetHashCode();
}
hash = hash * 31 + collectionHash;
return hash;
}
}
However, this is a little more difficult in the HashCode struct. In this example, when a collection of type List<string> is passed into the Hash method, T is List<string>, so trying to cast obj to ICollection<T> or IEnumerable<T> doesn't work. I can successfully cast to the non-generic IEnumerable, but that causes boxing, and I found I have to worry about excluding types like string that also implement IEnumerable.
Is there a way to reliably cast obj to ICollection<T> or IEnumerable<T> in this scenario?
public struct HashCode
{
private readonly int hashCode;
public HashCode(int hashCode)
{
this.hashCode = hashCode;
}
public static HashCode Start
{
get { return new HashCode(17); }
}
public static implicit operator int(HashCode hashCode)
{
return hashCode.GetHashCode();
}
public HashCode Hash<T>(T obj)
{
// I am able to detect if obj implements one of the lower level
// collection interfaces. However, I am not able to cast obj to
// one of them since T in this case is defined as List<string>,
// so using as to cast obj to ICollection<T> or IEnumerable<T>
// doesn't work.
var isGenericICollection = obj.GetType().GetInterfaces().Any(
x => x.IsGenericType &&
x.GetGenericTypeDefinition() == typeof(ICollection<>));
var c = EqualityComparer<T>.Default;
// This works but using IEnumerable causes boxing.
// var h = c.Equals(obj, default(T)) ? 0 : ( !(obj is string) && (obj is IEnumerable) ? GetCollectionHashCode(obj as IEnumerable) : obj.GetHashCode());
var h = c.Equals(obj, default(T)) ? 0 : obj.GetHashCode();
unchecked { h += this.hashCode * 31; }
return new HashCode(h);
}
public override int GetHashCode()
{
return this.hashCode;
}
}
You can address the collection issue in a couple of ways:
1. Use a non-generic interface, e.g. ICollection or IEnumerable.
2. Add an overload for the Hash() method, e.g. Hash<T>(IEnumerable<T> list) { ... }
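As a rough illustration of the second option, here is a minimal sketch of a struct with a dedicated method for sequences. The names SequenceHashCode, HashAll and Value are hypothetical and not part of the original struct. The sequence method deliberately has a different name: if it were also called Hash, a call such as Hash(collection) with a List<string> argument would still bind to the generic Hash<T>(T obj), because T = List<string> is an exact match.

```csharp
using System.Collections.Generic;

// Hypothetical variant of the HashCode struct from the question.
public struct SequenceHashCode
{
    private readonly int hash;

    private SequenceHashCode(int hash)
    {
        this.hash = hash;
    }

    public static SequenceHashCode Start
    {
        get { return new SequenceHashCode(17); }
    }

    public int Value
    {
        get { return hash; }
    }

    // Same combining step as the question's Hash<T>: previous * 31 + item hash.
    public SequenceHashCode Hash<T>(T obj)
    {
        int h = EqualityComparer<T>.Default.Equals(obj, default(T)) ? 0 : obj.GetHashCode();
        unchecked { h += hash * 31; }
        return new SequenceHashCode(h);
    }

    // Folds every element into an inner hash (seeded with 17), then combines
    // that inner hash into the running value. A null sequence contributes 0.
    public SequenceHashCode HashAll<T>(IEnumerable<T> items)
    {
        if (items == null) return Hash(0);
        SequenceHashCode inner = Start;
        foreach (T item in items)
        {
            inner = inner.Hash(item);
        }
        return Hash(inner.hash);
    }
}
```

With this sketch, SequenceHashCode.Start.Hash(x).Hash(y).Hash(z).HashAll(collection).Value should follow the same arithmetic as the embedded implementation that folds the collection into a separate collectionHash.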
That said, IMHO it would be better to just leave the struct HashCode alone and put the collection-specific code in your actual GetHashCode() method. E.g.:
public override int GetHashCode()
{
HashCode hash = HashCode.Start
.Hash(x)
.Hash(y)
.Hash(z);
foreach (var item in collection)
{
hash = hash.Hash(item);
}
return hash;
}
If you do want a full-featured version of the struct HashCode type, it looks to me as though that same page you referenced has one: https://stackoverflow.com/a/2575444/3538012
The naming of the members is different, but it's basically the same idea as the struct HashCode type, with overloads for other complex types (as in my suggestion #2 above). You could use that, or just apply the techniques there to your implementation of struct HashCode, preserving the naming conventions used in it.
Related
I'm looking at how to build the best hash code for a class, and I've seen several algorithms. I saw this one: Hash Code implementation. The GetHashCode methods of the .NET classes seem to be similar (judging by the reflected code).
So the question is: why not create the static class below to build a hash code automatically, just by passing in the fields we consider to be the "key"?
// Old version, see edit
public static class HashCodeBuilder
{
public static int Hash(params object[] keys)
{
if (object.ReferenceEquals(keys, null))
{
return 0;
}
int num = 42;
checked
{
for (int i = 0, length = keys.Length; i < length; i++)
{
num += 37;
if (object.ReferenceEquals(keys[i], null))
{ }
else if (keys[i].GetType().IsArray)
{
foreach (var item in (IEnumerable)keys[i])
{
num += Hash(item);
}
}
else
{
num += keys[i].GetHashCode();
}
}
}
return num;
}
}
And use it like this:
// Old version, see edit
public sealed class A : IEquatable<A>
{
public A()
{ }
public string Key1 { get; set; }
public string Key2 { get; set; }
public string Value { get; set; }
public override bool Equals(object obj)
{
return this.Equals(obj as A);
}
public bool Equals(A other)
{
return object.ReferenceEquals(other, null)
? false
: Key1 == other.Key1 && Key2 == other.Key2;
}
public override int GetHashCode()
{
return HashCodeBuilder.Hash(Key1, Key2);
}
}
That would be much simpler than always writing a separate method for each class, no? Am I missing something?
EDIT
Following all the remarks, I ended up with the following code:
public static class HashCodeBuilder
{
public static int Hash(params object[] args)
{
if (args == null)
{
return 0;
}
int num = 42;
unchecked
{
foreach(var item in args)
{
if (ReferenceEquals(item, null))
{ }
else if (item.GetType().IsArray)
{
foreach (var subItem in (IEnumerable)item)
{
num = num * 37 + Hash(subItem);
}
}
else
{
num = num * 37 + item.GetHashCode();
}
}
}
return num;
}
}
public sealed class A : IEquatable<A>
{
public A()
{ }
public string Key1 { get; set; }
public string Key2 { get; set; }
public string Value { get; set; }
public override bool Equals(object obj)
{
return this.Equals(obj as A);
}
public bool Equals(A other)
{
if(ReferenceEquals(other, null))
{
return false;
}
else if(ReferenceEquals(this, other))
{
return true;
}
return Key1 == other.Key1
&& Key2 == other.Key2;
}
public override int GetHashCode()
{
return HashCodeBuilder.Hash(Key1, Key2);
}
}
Your Equals method is broken - it's assuming that two objects with the same hash code are necessarily equal. That's simply not the case.
Your hash code method looked okay at a quick glance, but could actually do with some work - see below. It means boxing any value type values and creating an array any time you call it, but other than that it's okay (as SLaks pointed out, there are some issues around the collection handling). You might want to consider writing some generic overloads which would avoid those performance penalties for common cases (1, 2, 3 or 4 arguments, perhaps). You might also want to use a foreach loop instead of a plain for loop, just to be idiomatic.
You could do the same sort of thing for equality, but it would be slightly harder and messier.
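The generic overloads mentioned above could look roughly like the sketch below. The class name GenericHashCodeBuilder and the helper HashOf are made up for illustration. The seed 42 and multiplier 37 are borrowed from the question's edited code, but unlike that code a null argument here still advances the hash (it contributes 0), so the two are not meant to produce identical values.

```csharp
// Hypothetical generic overloads: no params array is allocated, and value-type
// arguments are passed as their own type rather than as object.
public static class GenericHashCodeBuilder
{
    public static int Hash<T1>(T1 a)
    {
        unchecked
        {
            return 42 * 37 + HashOf(a);
        }
    }

    public static int Hash<T1, T2>(T1 a, T2 b)
    {
        unchecked
        {
            return Hash(a) * 37 + HashOf(b);
        }
    }

    public static int Hash<T1, T2, T3>(T1 a, T2 b, T3 c)
    {
        unchecked
        {
            return Hash(a, b) * 37 + HashOf(c);
        }
    }

    // Null contributes 0; everything else contributes its own hash code.
    private static int HashOf<T>(T value)
    {
        return value == null ? 0 : value.GetHashCode();
    }
}
```

A class with two string keys would then call GenericHashCodeBuilder.Hash(Key1, Key2) from its GetHashCode() override.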
EDIT: For the hash code itself, you're only ever adding values. I suspect you were trying to do this sort of thing:
int hash = 17;
hash = hash * 31 + firstValue.GetHashCode();
hash = hash * 31 + secondValue.GetHashCode();
hash = hash * 31 + thirdValue.GetHashCode();
return hash;
But that multiplies the hash by 31, it doesn't add 31. Currently your hash code will always return the same for the same values, whether or not they're in the same order, which isn't ideal.
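To make the ordering point concrete, here is a small sketch comparing the two approaches on plain int values. HashOrderDemo and its method names are invented for this illustration; AdditiveHash imitates the add-only loop from the question's first version, and MultiplyAddHash uses the 17/31 pattern shown above.

```csharp
public static class HashOrderDemo
{
    // Only ever adds: the result does not depend on the order of the values.
    public static int AdditiveHash(params int[] values)
    {
        unchecked
        {
            int num = 42;
            foreach (int v in values)
            {
                num += 37 + v;
            }
            return num;
        }
    }

    // Multiplies the running hash before adding: order changes the result.
    public static int MultiplyAddHash(params int[] values)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in values)
            {
                hash = hash * 31 + v;
            }
            return hash;
        }
    }
}
```

AdditiveHash(1, 2) and AdditiveHash(2, 1) both give 119, whereas MultiplyAddHash(1, 2) gives 16370 and MultiplyAddHash(2, 1) gives 16400.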
EDIT: It seems there's some confusion over what hash codes are used for. I suggest that anyone who isn't sure reads the documentation for Object.GetHashCode and then Eric Lippert's blog post about hashing and equality.
This is what I'm using:
public static class ObjectExtensions
{
/// <summary>
/// Simplifies correctly calculating hash codes based upon
/// Jon Skeet's answer here
/// http://stackoverflow.com/a/263416
/// </summary>
/// <param name="obj"></param>
/// <param name="memberThunks">Thunks that return all the members upon which
/// the hash code should depend.</param>
/// <returns></returns>
public static int CalculateHashCode(this object obj, params Func<object>[] memberThunks)
{
// Overflow is okay; just wrap around
unchecked
{
int hash = 5;
foreach (var member in memberThunks)
hash = hash * 29 + member().GetHashCode();
return hash;
}
}
}
Example usage:
public class Exhibit
{
public virtual Document Document { get; set; }
public virtual ExhibitType ExhibitType { get; set; }
#region System.Object
public override bool Equals(object obj)
{
return Equals(obj as Exhibit);
}
public bool Equals(Exhibit other)
{
return other != null &&
Document.Equals(other.Document) &&
ExhibitType.Equals(other.ExhibitType);
}
public override int GetHashCode()
{
return this.CalculateHashCode(
() => Document,
() => ExhibitType);
}
#endregion
}
Instead of calling keys[i].GetType().IsArray, you should try to cast it to IEnumerable (using the as keyword).
You can fix the Equals method without repeating the field list by registering a static list of fields, like I do here using a collection of delegates.
This also avoids the array allocation per-call.
Note, however, that my code doesn't handle collection properties.
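The `as IEnumerable` suggestion from the start of this answer might be sketched as follows. EnumerableAwareHash is a hypothetical name, and treating string as a scalar is an assumption added here (string implements IEnumerable, so without the check it would be hashed character by character).

```csharp
using System.Collections;

public static class EnumerableAwareHash
{
    // Hashes scalars via GetHashCode() and any non-string IEnumerable
    // (arrays, lists, ...) element by element, recursively.
    public static int Hash(object value)
    {
        if (value == null) return 0;

        IEnumerable sequence = value as IEnumerable;
        if (sequence == null || value is string)
        {
            return value.GetHashCode();
        }

        unchecked
        {
            int num = 42;
            foreach (object item in sequence)
            {
                num = num * 37 + Hash(item);
            }
            return num;
        }
    }
}
```

With this version an int[] and a List<int> holding the same elements in the same order hash to the same value, which the IsArray check would not give you.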
I have two objects with these definitions:
public static Dictionary<string, Container> cont1 = new Dictionary<string, Container>();
public static Dictionary<string, Container> cont2 = new Dictionary<string, Container>();
The schema of Container class is as following:
public class Container
{
public string IDx { get; set; }
public string IDy { get; set; }
public string Name { get; set; }
public Dictionary<string, Sub> Subs = new Dictionary<string, Sub>();
}
public class Sub
{
public string Namex { get; set; }
public string Namey { get; set; }
public string Value { get; set; }
public Dictionary<string, string> Paths { get; set; }
}
My question is: how can I deep-check the equality of cont1 and cont2? I mean the equality of every member and value, even deep down within the Subs objects.
Is there any functionality in C# for such situations, or do I have to write a custom method that checks equality based on the structure of the objects myself?
Second question: I can avoid the equality problem if I can create two different copies of Products. That is, say we have a base Container object with all the members and values; I then want to create two separate copies of it, cont1 and cont2, such that changing a value in cont1 won't change the same value in cont2.
Note1: this method for cloning is not working:
cont2 = new Dictionary<string, Container>(cont1);
Note2: most of the methods proposed in other answers are based on a one-level dictionary (using loops or LINQ for checking), not a case like this one, where the object contains properties and dictionary objects (which have their own properties).
A Dictionary is a sequence, so in general what you're probably looking for is Enumerable.SequenceEqual, which allows passing in an IEqualityComparer<T>.
Your sequence (Dictionary) is an IEnumerable<KeyValuePair<string,Container>>, so you need a comparer which implements IEqualityComparer<KeyValuePair<string,Container>> (that's a lot of angle brackets!).
var equal = cont1.SequenceEqual(cont2, new StringContainerPairEqualityComparer());
Note that the order of elements in a dictionary is not guaranteed, so to use the method properly you should probably use OrderBy before comparing the sequences - however, this adds to the inefficiency of this method.
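A minimal sketch of the OrderBy-then-SequenceEqual idea, assuming string keys compared ordinally and a caller-supplied comparer for the values. DictionaryComparison, SameEntries and PairComparer are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryComparison
{
    // Sorts both dictionaries by key so that enumeration order no longer
    // matters, then compares the two sequences pair by pair.
    public static bool SameEntries<TValue>(
        IDictionary<string, TValue> left,
        IDictionary<string, TValue> right,
        IEqualityComparer<TValue> valueComparer)
    {
        if (left.Count != right.Count) return false;

        return left.OrderBy(kv => kv.Key, StringComparer.Ordinal)
                   .SequenceEqual(
                       right.OrderBy(kv => kv.Key, StringComparer.Ordinal),
                       new PairComparer<TValue>(valueComparer));
    }

    // Compares key/value pairs: keys ordinally, values with the given comparer.
    private sealed class PairComparer<TValue> : IEqualityComparer<KeyValuePair<string, TValue>>
    {
        private readonly IEqualityComparer<TValue> valueComparer;

        public PairComparer(IEqualityComparer<TValue> valueComparer)
        {
            this.valueComparer = valueComparer;
        }

        public bool Equals(KeyValuePair<string, TValue> x, KeyValuePair<string, TValue> y)
        {
            return string.Equals(x.Key, y.Key, StringComparison.Ordinal)
                && valueComparer.Equals(x.Value, y.Value);
        }

        public int GetHashCode(KeyValuePair<string, TValue> pair)
        {
            return pair.Key == null ? 0 : pair.Key.GetHashCode();
        }
    }
}
```

For the question's data, the value comparer would be an IEqualityComparer<Container> that itself compares the nested Subs dictionaries.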
For your second question, what you're trying to do is clone the dictionary. In general your Container should implement the ICloneable interface, which you can then use to create a copy:
var cont2 = cont1.ToDictionary(k => k.Key, v => (Container)v.Value.Clone());
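For the copy to be independent at every level, the clone has to recurse into the nested dictionaries as well. The sketch below repeats the Container and Sub classes from the question and adds hypothetical DeepCopy methods (not part of the original classes, and used here instead of ICloneable.Clone() so that no cast is needed).

```csharp
using System.Collections.Generic;
using System.Linq;

public class Sub
{
    public string Namex { get; set; }
    public string Namey { get; set; }
    public string Value { get; set; }
    public Dictionary<string, string> Paths { get; set; }

    // Strings are immutable, so copying the dictionary itself is enough here.
    public Sub DeepCopy()
    {
        return new Sub
        {
            Namex = Namex,
            Namey = Namey,
            Value = Value,
            Paths = Paths == null ? null : new Dictionary<string, string>(Paths)
        };
    }
}

public class Container
{
    public string IDx { get; set; }
    public string IDy { get; set; }
    public string Name { get; set; }
    public Dictionary<string, Sub> Subs = new Dictionary<string, Sub>();

    // Copies the scalar members and deep-copies every Sub.
    public Container DeepCopy()
    {
        return new Container
        {
            IDx = IDx,
            IDy = IDy,
            Name = Name,
            Subs = Subs == null
                ? null
                : Subs.ToDictionary(kv => kv.Key, kv => kv.Value == null ? null : kv.Value.DeepCopy())
        };
    }
}

public static class ContainerCloning
{
    // Builds a new dictionary whose values share nothing mutable with the source.
    public static Dictionary<string, Container> DeepCopy(Dictionary<string, Container> source)
    {
        return source.ToDictionary(kv => kv.Key, kv => kv.Value == null ? null : kv.Value.DeepCopy());
    }
}
```

With this in place, cont2 = ContainerCloning.DeepCopy(cont1) should give a dictionary in which changes to a Container, a Sub or a Paths entry do not show up in cont1.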
Yes, you have to write a custom method for checking equality based on the structure of the objects yourself. I would provide a custom IEqualityComparer<Container> and an IEqualityComparer<Sub> like here (GetHashCode implementation based on this):
public class ContainerCheck : IEqualityComparer<Container>
{
private SubCheck subChecker = new SubCheck();
public bool Equals(Container x, Container y)
{
if (ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
if (x.IDx != y.IDx || x.IDy != y.IDy || x.Name != y.Name)
return false;
// check dictionary
if (ReferenceEquals(x.Subs, y.Subs))
return true;
if (x.Subs == null || y.Subs == null || x.Subs.Count != y.Subs.Count)
return false;
foreach (var kv in x.Subs)
if (!y.Subs.ContainsKey(kv.Key) || !subChecker.Equals(y.Subs[kv.Key], kv.Value))
return false;
return true;
}
public int GetHashCode(Container obj)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + obj.IDx.GetHashCode();
hash = hash * 23 + obj.IDy.GetHashCode();
hash = hash * 23 + obj.Name.GetHashCode();
foreach (var kv in obj.Subs)
{
hash = hash * 23 + kv.Key.GetHashCode();
hash = hash * 23 + subChecker.GetHashCode(kv.Value);
}
return hash;
}
}
}
public class SubCheck : IEqualityComparer<Sub>
{
public bool Equals(Sub x, Sub y)
{
if (ReferenceEquals(x, y))
return true;
if (x == null || y == null)
return false;
if (x.Namex != y.Namex || x.Namey != y.Namey || x.Value != y.Value)
return false;
// check dictionary
if (ReferenceEquals(x.Paths, y.Paths))
return true;
if (x.Paths == null || y.Paths == null || x.Paths.Count != y.Paths.Count)
return false;
foreach(var kv in x.Paths)
if (!y.Paths.ContainsKey(kv.Key) || y.Paths[kv.Key] != kv.Value)
return false;
return true;
}
public int GetHashCode(Sub obj)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + obj.Namex.GetHashCode();
hash = hash * 23 + obj.Namey.GetHashCode();
hash = hash * 23 + obj.Value.GetHashCode();
foreach (var kv in obj.Paths)
{
hash = hash * 23 + kv.Key.GetHashCode();
hash = hash*23 + kv.Value.GetHashCode();
}
return hash;
}
}
}
This should deep check all properties and the dictionaries. Then you could use following loop to compare both dictionaries with each other:
bool equal = true;
var allKeys = cont1.Keys.Concat(cont2.Keys).ToList();
var containerChecker = new ContainerCheck();
foreach (string key in allKeys)
{
Container c1;
Container c2;
if (!cont1.TryGetValue(key, out c1) || !cont2.TryGetValue(key, out c2))
{
equal = false;
}
else
{
// deep check both containers
if (!containerChecker.Equals(c1, c2))
equal = false;
}
if(!equal)
break; // or collect differences
}
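One caveat worth considering: the GetHashCode methods above fold the dictionary entries in enumeration order, while the Equals methods ignore order. Since Dictionary does not guarantee an enumeration order, two containers that compare as equal could, in principle, end up with different hash codes. A possible way around this, sketched here under that assumption, is to combine the per-entry hashes with a commutative operation such as addition. DictionaryHashing and OrderIndependentHash are hypothetical names.

```csharp
using System.Collections.Generic;

public static class DictionaryHashing
{
    // Hashes each key/value pair on its own, then sums the pair hashes.
    // Addition is commutative, so enumeration order does not affect the result.
    public static int OrderIndependentHash<TKey, TValue>(
        IEnumerable<KeyValuePair<TKey, TValue>> entries,
        IEqualityComparer<TValue> valueComparer)
    {
        if (entries == null) return 0;

        unchecked // Overflow is fine, just wrap
        {
            int sum = 0;
            foreach (KeyValuePair<TKey, TValue> kv in entries)
            {
                int entryHash = 17;
                entryHash = entryHash * 23 + (kv.Key == null ? 0 : kv.Key.GetHashCode());
                entryHash = entryHash * 23 + (kv.Value == null ? 0 : valueComparer.GetHashCode(kv.Value));
                sum += entryHash;
            }
            return sum;
        }
    }
}
```

Inside ContainerCheck.GetHashCode, the foreach over obj.Subs could then be replaced by a single hash = hash * 23 + DictionaryHashing.OrderIndependentHash(obj.Subs, subChecker) step.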
I've got a class which consists of two strings and an enum. I'm trying to use instances of this class as keys in a dictionary. Unfortunately I don't seem to be implementing IEquatable properly. Here's how I've done it:
public enum CoinSide
{
Heads,
Tails
}
public class CoinDetails : IComparable, IEquatable<CoinDetails>
{
private string denomination;
private string design;
private CoinSide side;
//...
public int GetHashCode(CoinDetails obj)
{
return string.Concat(obj.Denomination, obj.Design, obj.Side.ToString()).GetHashCode();
}
public bool Equals(CoinDetails other)
{
return (this.Denomination == other.Denomination && this.Design == other.Design && this.Side == other.Side);
}
}
However, I still can't seem to look up items in my dictionary. Additionally, the following tests fail:
[TestMethod]
public void CoinDetailsHashCode()
{
CoinDetails a = new CoinDetails("1POUND", "1997", CoinSide.Heads);
CoinDetails b = new CoinDetails("1POUND", "1997", CoinSide.Heads);
Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
}
[TestMethod]
public void CoinDetailsCompareForEquality()
{
CoinDetails a = new CoinDetails("1POUND", "1997", CoinSide.Heads);
CoinDetails b = new CoinDetails("1POUND", "1997", CoinSide.Heads);
Assert.AreEqual<CoinDetails>(a, b);
}
Would someone be able to point out where I'm going wrong? I'm sure I'm missing something rather simple, but I'm not sure what.
Your class has to override Equals and GetHashCode:
public class CoinDetails
{
private string Denomination;
private string Design;
private CoinSide Side;
public override bool Equals(object obj)
{
CoinDetails c2 = obj as CoinDetails;
if (c2 == null)
return false;
return Denomination == c2.Denomination && Design == c2.Design;
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + (Denomination ?? "").GetHashCode();
hash = hash * 23 + (Design ?? "").GetHashCode();
return hash;
}
}
}
Note that I've also improved your GetHashCode algorithm according to: What is the best algorithm for an overridden System.Object.GetHashCode?
You could also pass a custom IEqualityComparer<CoinDetails> to the dictionary:
public class CoinComparer : IEqualityComparer<CoinDetails>
{
public bool Equals(CoinDetails x, CoinDetails y)
{
if (object.ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.Denomination == y.Denomination && x.Design == y.Design;
}
public int GetHashCode(CoinDetails obj)
{
unchecked
{
int hash = 17;
hash = hash * 23 + (obj.Denomination ?? "").GetHashCode();
hash = hash * 23 + (obj.Design ?? "").GetHashCode();
return hash;
}
}
}
Now this works and does not require CoinDetails to override Equals+GetHashCode:
var dict = new Dictionary<CoinDetails, string>(new CoinComparer());
dict.Add(new CoinDetails("1POUND", "1997"), "");
dict.Add(new CoinDetails("1POUND", "1997"), ""); // FAIL: throws ArgumentException, an equal key is already present
I have the following code, which doesn't seem to be working:
Context:
I have two lists of objects:
* listOne has 100 records
* listTwo has 70 records
Many of them have the same Id property (in both lists).
var listOneOnlyItems = listOne.Except(listTwo, new ItemComparer ());
Here is the comparer:
public class ItemComparer : IEqualityComparer<Item>
{
public bool Equals(Item x, Item y)
{
if (x.Id == y.Id)
return true;
return false;
}
public int GetHashCode(Item obj)
{
return obj.GetHashCode();
}
}
After I run this code and look at the results,
listOneOnlyItems
still has 100 records (it should only have 30). Can anyone help me?
Also, running
IEnumerable<Item> sharedItems = listOne.Intersect(listTwo, new ItemComparer());
returns zero results in the sharedItems collection.
public int GetHashCode(Item obj)
{
return obj.Id.GetHashCode();
}
Worth a check at least -- IIRC GetHashCode() is tested first before equality, and if they don't have the same hash it won't bother checking equality. I'm not sure what to expect from obj.GetHashCode() -- it depends on what you've implemented on the Item class.
Consider making GetHashCode() return obj.Id.GetHashCode()
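Putting that together, a complete comparer might look like the sketch below. The Item class with an int Id is an assumption, since the question does not show it, and ItemIdComparer is just an illustrative name.

```csharp
using System.Collections.Generic;

// Assumed shape of the question's Item class.
public class Item
{
    public int Id { get; set; }
}

public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    // Must agree with Equals: items with the same Id return the same hash,
    // so Except and Intersect place them in the same bucket and go on to call Equals.
    public int GetHashCode(Item obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}
```

With 100 items in listOne and 70 matching Ids in listTwo, listOne.Except(listTwo, new ItemIdComparer()) should then yield the expected 30 items.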
This code works fine:
static void TestLinqExcept()
{
var seqA = Enumerable.Range(1, 10);
var seqB = Enumerable.Range(1, 7);
var seqAexceptB = seqA.Except(seqB, new IntComparer());
foreach (var x in seqAexceptB)
{
Console.WriteLine(x);
}
}
class IntComparer: EqualityComparer<int>
{
public override bool Equals(int x, int y)
{
return x == y;
}
public override int GetHashCode(int x)
{
return x;
}
}
You need to add 'override' keywords to your EqualityComparer methods. (I think not having 'override' as implicit was a mistake on the part of the C# designers).
I used to use the Apache HashCodeBuilder a lot.
Does this exist for C#?
This is my homemade builder.
Usage:
hash = new HashCodeBuilder().
Add(a).
Add(b).
Add(c).
Add(d).
GetHashCode();
It does not matter what type fields a,b,c and d are, easy to extend, no need to create array.
Source:
public sealed class HashCodeBuilder
{
private int hash = 17;
public HashCodeBuilder Add(int value)
{
unchecked
{
hash = hash * 31 + value; // see Effective Java for reasoning
// can be any prime, but hash * 31 can be optimised by the VM to (hash << 5) - hash
}
return this;
}
public HashCodeBuilder Add(object value)
{
return Add(value != null ? value.GetHashCode() : 0);
}
public HashCodeBuilder Add(float value)
{
return Add(value.GetHashCode());
}
public HashCodeBuilder Add(double value)
{
return Add(value.GetHashCode());
}
public override int GetHashCode()
{
return hash;
}
}
Sample usage:
public sealed class Point
{
private readonly int _x;
private readonly int _y;
private readonly int _hash;
public Point(int x, int y)
{
_x = x;
_y = y;
_hash = new HashCodeBuilder().
Add(_x).
Add(_y).
GetHashCode();
}
public int X
{
get { return _x; }
}
public int Y
{
get { return _y; }
}
public override bool Equals(object obj)
{
return Equals(obj as Point);
}
public bool Equals(Point other)
{
if (other == null) return false;
return (other._x == _x) && (other._y == _y);
}
public override int GetHashCode()
{
return _hash;
}
}
I use the following:
public static int ComputeHashFrom(params object[] obj) {
ulong res = 0;
for(uint i=0;i<obj.Length;i++) {
object val = obj[i];
res += val == null ? i : (ulong)val.GetHashCode() * (1 + 2 * i);
}
return (int)(uint)(res ^ (res >> 32));
}
Using such a helper is quick, easy and reliable, but it has two potential downsides (which you aren't likely to encounter frequently, but are good to be aware of):
It can generate poor hash codes for some distributions of params. For instance, for any int x, ComputeHashFrom(x*-3, x) == 0 - so if your objects have certain pathological properties you may get many hash code collisions, resulting in poorly performing Dictionaries and HashSets. It's not likely to happen, but a type-aware hash code computation can avoid such problems more easily.
The computation of the hash code is slower than a specialized computation could be. In particular, it involves the allocation of the params array and a loop - which is quite a bit of unnecessary overhead if you've just got two members to process.
Neither of the drawbacks causes any errors, merely inefficiency; and both will show up in a profiler as blips in either this method or in the internals of the hash-code consumer.
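The pathological case can be reproduced with a few small values. The sketch below is a copy of the helper above placed in a hypothetical HashHelperDemo class, with an explicit unchecked block added so that the wrap-around does not depend on compiler settings.

```csharp
public static class HashHelperDemo
{
    // Same computation as the helper above; unchecked makes the overflow
    // behaviour explicit.
    public static int ComputeHashFrom(params object[] obj)
    {
        unchecked
        {
            ulong res = 0;
            for (uint i = 0; i < obj.Length; i++)
            {
                object val = obj[i];
                // Element i is weighted by (1 + 2 * i): weights 1, 3, 5, ...
                res += val == null ? i : (ulong)val.GetHashCode() * (1 + 2 * i);
            }
            return (int)(uint)(res ^ (res >> 32));
        }
    }
}
```

The first argument has weight 1 and the second weight 3, so a pair (x * -3, x) sums to zero: ComputeHashFrom(-3, 1), ComputeHashFrom(-30, 10) and ComputeHashFrom(-300, 100) all return 0.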
C# doesn't have a built-in HashCode builder, but you can roll your own. I recently had this precise problem and created this hashcode generator that doesn't use boxing, by using generics, and implements a modified FNV algorithm for generating the specific hash. But you could use any algorithm you'd like, like one of those in System.Security.Cryptography.
public static int GetHashCode<T>(params T[] args)
{
return args.GetArrayHashCode();
}
public static int GetArrayHashCode<T>(this T[] objects)
{
int[] data = new int[objects.Length];
for (int i = 0; i < objects.Length; i++)
{
T obj = objects[i];
data[i] = obj == null ? 1 : obj.GetHashCode();
}
return GetFnvHash(data);
}
private static int GetFnvHash(int[] data)
{
unchecked
{
const int p = 16777619;
long hash = 2166136261;
for (int i = 0; i < data.Length; i++)
{
hash = (hash ^ data[i]) * p;
}
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
return (int)hash;
}
}
Microsoft has released a struct for computing hash codes, System.HashCode. Please see https://learn.microsoft.com/en-us/dotnet/api/system.hashcode. It is built into .NET Core 2.1 and later; on older targets such as .NET Framework you need to include the NuGet package Microsoft.Bcl.HashCode in your project to use it.
Usage example:
using System;
public class MyClass {
public int MyVar { get; }
public string AnotherVar { get; }
public object MoreVars;
public override int GetHashCode()
=> HashCode.Combine(MyVar, AnotherVar, MoreVars);
}
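HashCode.Combine takes a fixed number of values, so for a class with a collection member the instance methods Add and ToHashCode are one way to fold in each element. The sketch below assumes a hypothetical class MyObjectWithCollection whose Equals uses Enumerable.SequenceEqual. Note that System.HashCode values are only stable within a single process run, so they should not be persisted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with two scalar members and one collection member.
public sealed class MyObjectWithCollection
{
    public int X { get; set; }
    public string Y { get; set; }
    public List<string> Collection { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MyObjectWithCollection;
        if (other == null) return false;
        // A null collection is treated like an empty one, matching GetHashCode below.
        IEnumerable<string> mine = Collection ?? Enumerable.Empty<string>();
        IEnumerable<string> theirs = other.Collection ?? Enumerable.Empty<string>();
        return X == other.X && Y == other.Y && mine.SequenceEqual(theirs);
    }

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(X);
        hash.Add(Y);
        if (Collection != null)
        {
            // Element order matters, consistent with SequenceEqual in Equals.
            foreach (string item in Collection)
            {
                hash.Add(item);
            }
        }
        return hash.ToHashCode();
    }
}
```

Two instances with the same X, Y and element sequence then compare as equal and produce the same hash code within a given process.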
Nowadays I leverage ValueTuples, ref Tuples or anonymous types:
var hash = (1, "seven").GetHashCode();
var hash2 = Tuple.Create(1, "seven").GetHashCode();
var hash3 = new { Number = 1, String = "seven" }.GetHashCode();
I believe value tuples will be fastest.