Slow dictionary with custom class key - c#

I have a custom class that I was trying to use as a key for a dictionary:
// I tried setting more than enough capacity also...
var dict = new Dictionary<MyPoint, MyPoint>(capacity);
Now let me be clear, the goal here is to compare two SIMILAR but DIFFERENT lists, using X, Y, and Date as a composite key. The values will vary between these two lists, and I'm trying to quickly compare them and compute their differences.
Here is the class code:
public class MyPoint : IEquatable<MyPoint>
{
public short X { get; set; }
public short Y { get; set; }
public DateTime Date { get; set; }
public double MyValue { get; set; }
public override bool Equals(object obj)
{
return base.Equals(obj as MyPoint);
}
public bool Equals(MyPoint other)
{
if (other == null)
{
return false;
}
return (Date == other.Date)
&& (X == other.X)
&& (Y == other.Y);
}
public override int GetHashCode()
{
return Date.GetHashCode()
| X.GetHashCode()
| Y.GetHashCode();
}
}
I also tried keying with a struct:
public struct MyPointKey
{
public short X;
public short Y;
public DateTime Date;
// The value is not on these, because the struct is only used as key
}
In both cases dictionary writing was very, very slow (reading was quick).
I changed the key to a string, with the format:
var dict = new Dictionary<string, MyPoint>(capacity);
var key = string.Format("{0}_{1}", item.X, item.Y);
I was amazed at how much quicker this is -- it's at least 10 times faster. I tried Release mode, no debugger, and every scenario I could think of.
This dictionary will contain 350,000 or more items, so performance does matter.
Any thoughts or suggestions? Thanks!
Another edit...
I'm trying to compare two lists of things in the fastest way I can. This is what I'm working with. The Dictionary is important for fast lookups against the source list.
IList<MyThing> sourceList;
IDictionary<MyThing, MyThing> comparisonDict;
Parallel.ForEach(sourceList,
sourceItem =>
{
double compareValue = 0;
MyThing compareMatch = null;
if (comparisonDict.TryGetValue(sourceItem, out compareMatch))
{
compareValue = compareMatch.MyValue;
}
// Do a delta check on the item
double difference = sourceItem.MyValue- compareValue;
if (Math.Abs(difference) > 1)
{
// Record the difference...
}
});

As others have said in the comments, the problem is your GetHashCode() implementation. Using your code, 10,000,000 iterations with the string key took 11-12 seconds. With your existing hash code I stopped the run after more than a minute. With the following GetHashCode implementation it took under 5 seconds.
public override int GetHashCode()
{
var hashCode = Date.GetHashCode();
hashCode = (hashCode * 37) ^ X.GetHashCode();
hashCode = (hashCode * 37) ^ Y.GetHashCode();
return hashCode;
}
The problem is that OR can only set bits, never clear them, so once you get into large numbers the combined hash codes converge on the same few bit patterns and the items all collide in the same buckets. A dictionary where everything is in the same bucket is just a list.
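To make the collision effect concrete, here is a minimal sketch that counts how many distinct hash codes each combiner produces over a grid of keys sharing one date. The class name HashCollisionDemo, the grid size, and the date are illustrative assumptions, not part of the original benchmark.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; only the two combiners mirror code from this thread.
public static class HashCollisionDemo
{
    // The question's combiner: OR can only set bits, never clear them.
    public static int OrHash(DateTime date, short x, short y)
    {
        return date.GetHashCode() | x.GetHashCode() | y.GetHashCode();
    }

    // The multiply-and-XOR combiner from the answer above.
    public static int MixHash(DateTime date, short x, short y)
    {
        unchecked
        {
            int hashCode = date.GetHashCode();
            hashCode = (hashCode * 37) ^ x.GetHashCode();
            hashCode = (hashCode * 37) ^ y.GetHashCode();
            return hashCode;
        }
    }

    // Counts distinct hash codes over a size-by-size grid of (x, y) pairs on one fixed date.
    public static int CountDistinct(Func<DateTime, short, short, int> hash, int size)
    {
        var date = new DateTime(2015, 6, 3); // arbitrary date, chosen for illustration
        var seen = new HashSet<int>();
        for (short x = 0; x < size; x++)
        {
            for (short y = 0; y < size; y++)
            {
                seen.Add(hash(date, x, y));
            }
        }
        return seen.Count;
    }

    public static void Main()
    {
        Console.WriteLine("OR:  " + CountDistinct(OrHash, 200) + " distinct codes for 40000 keys");
        Console.WriteLine("Mix: " + CountDistinct(MixHash, 200) + " distinct codes for 40000 keys");
    }
}
```

The exact counts depend on the runtime's GetHashCode implementations, but the OR version should report far fewer distinct codes than keys, which is what turns the dictionary into long collision chains.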

If I understand you correctly, you would like to use a set while still keeping the keys in order. In that case, use SortedSet<T> instead.
Code:
class Program {
static void Main(string[] args) {
SortedSet<MyKey> list = new SortedSet<MyKey>() {
new MyKey(0, 0, new DateTime(2015, 6, 4)),
new MyKey(0, 1, new DateTime(2015, 6, 3)),
new MyKey(1, 1, new DateTime(2015, 6, 3)),
new MyKey(0, 0, new DateTime(2015, 6, 3)),
new MyKey(1, 0, new DateTime(2015, 6, 3)),
};
foreach(var entry in list) {
Console.WriteLine(string.Join(", ", entry.X, entry.Y, entry.Date));
}
Console.ReadKey();
}
}
I changed your MyPoint class as follows:
public sealed class MyKey : IEquatable<MyKey>, IComparable<MyKey> {
public readonly short X;
public readonly short Y;
public readonly DateTime Date;
public MyKey(short x, short y, DateTime date) {
this.X = x;
this.Y = y;
this.Date = date;
}
public override bool Equals(object that) {
return this.Equals(that as MyKey);
}
public bool Equals(MyKey that) {
// ReferenceEquals avoids the overloaded == operator, which would otherwise recurse forever
if(ReferenceEquals(that, null)) {
return false;
}
return this.Date == that.Date
&& this.X == that.X
&& this.Y == that.Y;
}
public static bool operator ==(MyKey lhs, MyKey rhs) {
return ReferenceEquals(lhs, null) ? ReferenceEquals(rhs, null) : lhs.Equals(rhs);
}
public static bool operator !=(MyKey lhs, MyKey rhs) {
return !(lhs == rhs);
}
public override int GetHashCode() {
int result;
unchecked {
result = (int)X;
result = 31 * result + (int)Y;
result = 31 * result + Date.GetHashCode();
}
return result;
}
public int CompareTo(MyKey that) {
int result = this.X.CompareTo(that.X);
if(result != 0) {
return result;
}
result = this.Y.CompareTo(that.Y);
if(result != 0) {
return result;
}
result = this.Date.CompareTo(that.Date);
return result;
}
}
Output:
0, 0, 03.06.2015 00:00:00
0, 0, 04.06.2015 00:00:00
0, 1, 03.06.2015 00:00:00
1, 0, 03.06.2015 00:00:00
1, 1, 03.06.2015 00:00:00

Related

Unable to debug GetHashCode method

I have implemented an equality comparer as follows.
class BoxEqualityComparer : IEqualityComparer<Box>
{
public bool Equals(Box b1, Box b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null | b2 == null)
return false;
else if(b1.Height == b2.Height && b1.Length == b2.Length
&& b1.Width == b2.Width)
return true;
else
return false;
}
public int GetHashCode(Box bx)
{
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
Then I created a Dictionary and added some values to it, so that it compares objects based on their properties (height, width, length). I am getting the expected output, but I am wondering about the execution of the GetHashCode method. I put a breakpoint in it, but the breakpoint is never hit. My question is: when is the GetHashCode method executed, and how many times?
class Example
{
static void Main()
{
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
var boxes = new Dictionary<Box, string>(boxEqC);
var redBox = new Box(4, 3, 4);
AddBox(boxes, redBox, "red");
var blueBox = new Box(4, 3, 4);
AddBox(boxes, blueBox, "blue");
var greenBox = new Box(3, 4, 3);
AddBox(boxes, greenBox, "green");
Console.WriteLine();
Console.WriteLine("The dictionary contains {0} Box objects.",
boxes.Count);
}
private static void AddBox(Dictionary<Box, String> dict, Box box, String name)
{
try {
dict.Add(box, name);
}
catch (ArgumentException e) {
Console.WriteLine("Unable to add {0}: {1}", box, e.Message);
}
}
}
public class Box
{
public Box(int h, int l, int w)
{
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public override String ToString()
{
return String.Format("({0}, {1}, {2})", Height, Length, Width);
}
}
See https://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,fd1acf96113fbda9.
Add(key, value) calls the insert method, which in turn will always calculate a hash code via
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
So in other words, each call to Dictionary.Add should always trigger a calculation of the key's hash via the IEqualityComparer you provided.
As for your example code, it works fine for me: VS 2015 does break at BoxEqualityComparer.GetHashCode().
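If the debugger is not cooperating, counting the calls is another way to see when the comparer is used. The following is a minimal sketch; CountingComparer is a hypothetical wrapper that is not part of the question or of the framework, and it uses string keys rather than Box to stay self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: forwards to another comparer and counts how often each method is called.
public sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> _inner;

    public int HashCalls { get; private set; }
    public int EqualsCalls { get; private set; }

    public CountingComparer(IEqualityComparer<T> inner)
    {
        _inner = inner;
    }

    public bool Equals(T a, T b)
    {
        EqualsCalls++;
        return _inner.Equals(a, b);
    }

    public int GetHashCode(T obj)
    {
        HashCalls++;
        return _inner.GetHashCode(obj);
    }
}

public static class CountingDemo
{
    public static void Main()
    {
        var comparer = new CountingComparer<string>(StringComparer.OrdinalIgnoreCase);
        var dict = new Dictionary<string, int>(comparer);

        dict.Add("red", 1);
        dict.Add("green", 2);
        Console.WriteLine("GetHashCode calls after two Adds: " + comparer.HashCalls);

        int value;
        dict.TryGetValue("RED", out value);
        Console.WriteLine("GetHashCode calls after one lookup: " + comparer.HashCalls);
        Console.WriteLine("Equals calls so far: " + comparer.EqualsCalls);
    }
}
```

Each Add and each lookup hashes the key once; Equals only runs when a stored entry has the same hash code as the key being looked up or inserted.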

How to check if key exists in a dictionary when the key itself is a dictionary (C#)?

I have a dictionary (mapData) as shown below, whose key is itself a Dictionary<string, DateTime>. I want to check whether such a key already exists in mapData, but the check does not work:
var mapData = new Dictionary<Dictionary<string, DateTime>, List<Student>>();
foreach (var st in listStudent)
{
// Create Key
var dicKey = new Dictionary<string, DateTime>();
dicKey.Add(st.Name, st.Birthday);
// Get mapData
if (!mapData.ContainsKey(dicKey)) // ===> Can not check key exists
{
mapData.Add(dicKey, new List<Student>());
}
mapData[dicKey].Add(st);
}
I also tried an extension method like the one below, but it does not work either:
public static bool Contains<Tkey>(this Dictionary<Tkey, List<Student>> dic, Tkey key)
{
if (dic.ContainsKey(key))
return true;
return false;
}
Any tips on this would be a great help. Thanks in advance.
This is because the dictionary looks the key up by object reference, not by the key's content.
Your dictionary key is itself a dictionary holding a (string, DateTime) pair, and two separate dictionary instances with the same content are still two different references.
This won't work unless you write an IEqualityComparer.
var objectA = new ObjectA();
var objectB = new ObjectA();
// objectA and objectB are not equal unless you override Equals.
EDIT
This sample comes from MSDN: https://msdn.microsoft.com/en-us/library/ms132151(v=vs.110).aspx
using System;
using System.Collections.Generic;
class Example
{
static void Main()
{
BoxEqualityComparer boxEqC = new BoxEqualityComparer();
var boxes = new Dictionary<Box, string>(boxEqC);
var redBox = new Box(4, 3, 4);
AddBox(boxes, redBox, "red");
var blueBox = new Box(4, 3, 4);
AddBox(boxes, blueBox, "blue");
var greenBox = new Box(3, 4, 3);
AddBox(boxes, greenBox, "green");
Console.WriteLine();
Console.WriteLine("The dictionary contains {0} Box objects.",
boxes.Count);
}
private static void AddBox(Dictionary<Box, String> dict, Box box, String name)
{
try {
dict.Add(box, name);
}
catch (ArgumentException e) {
Console.WriteLine("Unable to add {0}: {1}", box, e.Message);
}
}
}
public class Box
{
public Box(int h, int l, int w)
{
this.Height = h;
this.Length = l;
this.Width = w;
}
public int Height { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public override String ToString()
{
return String.Format("({0}, {1}, {2})", Height, Length, Width);
}
}
class BoxEqualityComparer : IEqualityComparer<Box>
{
public bool Equals(Box b1, Box b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null | b2 == null)
return false;
else if(b1.Height == b2.Height && b1.Length == b2.Length
&& b1.Width == b2.Width)
return true;
else
return false;
}
public int GetHashCode(Box bx)
{
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
// The example displays the following output:
// Unable to add (4, 3, 4): An item with the same key has already been added.
//
// The dictionary contains 2 Box objects.
You don't want to go there. Complex objects are not meant to be keys in a dictionary. I would suggest to move the dictionary to the value and create a more tree-like structure.
There is one way to get this working, but I would advise against it for the reasons above: implementing a custom IEqualityComparer that compares the dictionary keys by their contents.
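For completeness, a content-based comparer along those lines might look like the sketch below. DictionaryContentComparer is a hypothetical name, and the sketch assumes that key dictionaries are never modified after being added to the outer dictionary and that they all use the default string comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two dictionaries are equal when they hold the same key/value pairs.
public sealed class DictionaryContentComparer : IEqualityComparer<Dictionary<string, DateTime>>
{
    public bool Equals(Dictionary<string, DateTime> a, Dictionary<string, DateTime> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Count != b.Count) return false;
        foreach (var pair in a)
        {
            DateTime other;
            if (!b.TryGetValue(pair.Key, out other) || other != pair.Value) return false;
        }
        return true;
    }

    public int GetHashCode(Dictionary<string, DateTime> d)
    {
        if (d == null) return 0;
        int hash = 0;
        unchecked
        {
            // XOR is order-independent, which matters because enumeration order is not guaranteed.
            foreach (var pair in d)
            {
                hash ^= pair.Key.GetHashCode() * 397 + pair.Value.GetHashCode();
            }
        }
        return hash;
    }
}

public static class DictionaryKeyDemo
{
    public static void Main()
    {
        var map = new Dictionary<Dictionary<string, DateTime>, int>(new DictionaryContentComparer());
        var first = new Dictionary<string, DateTime> { { "Ann", new DateTime(2000, 1, 2) } };
        var second = new Dictionary<string, DateTime> { { "Ann", new DateTime(2000, 1, 2) } };

        map.Add(first, 1);
        Console.WriteLine(map.ContainsKey(second)); // True: same content, different instance
    }
}
```

Every lookup now walks the whole key dictionary to hash it, which is one more reason to prefer a simpler key type when the key is really just a name and a date.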

create dictionary key from 3 integer values

I've got an object with 3 integer values; combined, the 3 integers are always unique. I want a quick way to find a specific object among thousands.
My idea was to combine the 3 integers into a string, so 1, 2533 and 9 would become the unique string 1-2533-9. But is this the most efficient way? The numbers cannot be bigger than 2^16, so I could also use bit shifting to create a long, which I think would be faster than building a string from them. Are there other options? What should I do?
The main thing I want to achieve is finding the object quickly even with a collection of thousands of objects.
public class SomeClass
{
private readonly IDictionary<CompositeIntegralTriplet, object> _dictionary = new Dictionary<CompositeIntegralTriplet, object>();
}
public sealed class CompositeIntegralTriplet : IEquatable<CompositeIntegralTriplet>
{
public CompositeIntegralTriplet(int first, int second, int third)
{
First = first;
Second = second;
Third = third;
}
public int First { get; }
public int Second { get; }
public int Third { get; }
public override bool Equals(object other)
{
var otherAsTriplet = other as CompositeIntegralTriplet;
return Equals(otherAsTriplet);
}
public override int GetHashCode()
{
unchecked
{
var hashCode = First;
hashCode = (hashCode*397) ^ Second;
hashCode = (hashCode*397) ^ Third;
return hashCode;
}
}
public bool Equals(CompositeIntegralTriplet other) => other != null && First == other.First && Second == other.Second && Third == other.Third;
}
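The bit-shifting idea from the question can also be sketched directly. The example below is an illustration under stated assumptions: PackedKey is a hypothetical helper, and each part is assumed to be non-negative and to fit in 20 bits, which leaves headroom above the 2^16 limit mentioned in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: packs three small non-negative integers into one long key.
public static class PackedKey
{
    private const int BitsPerPart = 20; // assumption: every part is in [0, 2^20)

    public static long Pack(int first, int second, int third)
    {
        if (!InRange(first) || !InRange(second) || !InRange(third))
        {
            throw new ArgumentOutOfRangeException("first", "Each part must be in [0, 2^20).");
        }
        long a = first, b = second, c = third;
        // The three 20-bit fields do not overlap, so distinct triplets give distinct keys.
        return (a << (2 * BitsPerPart)) | (b << BitsPerPart) | c;
    }

    private static bool InRange(int value)
    {
        return value >= 0 && value < (1 << BitsPerPart);
    }

    public static void Main()
    {
        var lookup = new Dictionary<long, string>();
        lookup[Pack(1, 2533, 9)] = "some object";

        Console.WriteLine(lookup.ContainsKey(Pack(1, 2533, 9))); // True
        Console.WriteLine(lookup.ContainsKey(Pack(9, 2533, 1))); // False
    }
}
```

A long key avoids allocating a string or a key object per lookup; the trade-off is that the range check becomes part of the contract, whereas the triplet class above works for any int values.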

Class implementation of IEquatable for use as a key in a dictionary

I've got a class which consists of two strings and an enum. I'm trying to use instances of this class as keys in a dictionary. Unfortunately I don't seem to be implementing IEquatable properly. Here's how I've done it:
public enum CoinSide
{
Heads,
Tails
}
public class CoinDetails : IComparable, IEquatable<CoinDetails>
{
private string denomination;
private string design;
private CoinSide side;
//...
public int GetHashCode(CoinDetails obj)
{
return string.Concat(obj.Denomination, obj.Design, obj.Side.ToString()).GetHashCode();
}
public bool Equals(CoinDetails other)
{
return (this.Denomination == other.Denomination && this.Design == other.Design && this.Side == other.Side);
}
}
However, I still can't seem to look up items in my dictionary. Additionally, the following tests fail:
[TestMethod]
public void CoinDetailsHashCode()
{
CoinDetails a = new CoinDetails("1POUND", "1997", CoinSide.Heads);
CoinDetails b = new CoinDetails("1POUND", "1997", CoinSide.Heads);
Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
}
[TestMethod]
public void CoinDetailsCompareForEquality()
{
CoinDetails a = new CoinDetails("1POUND", "1997", CoinSide.Heads);
CoinDetails b = new CoinDetails("1POUND", "1997", CoinSide.Heads);
Assert.AreEqual<CoinDetails>(a, b);
}
Would someone be able to point out where I'm going wrong? I'm sure I'm missing something rather simple, but I'm not sure what.
Your class has to override Equals and GetHashCode:
public class CoinDetails
{
private string Denomination;
private string Design;
private CoinSide Side;
public override bool Equals(object obj)
{
CoinDetails c2 = obj as CoinDetails;
if (c2 == null)
return false;
return Denomination == c2.Denomination && Design == c2.Design;
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + (Denomination ?? "").GetHashCode();
hash = hash * 23 + (Design ?? "").GetHashCode();
return hash;
}
}
}
Note that I've also improved your GetHashCode algorithm according to: What is the best algorithm for an overridden System.Object.GetHashCode?
You could also pass a custom IEqualityComparer<CoinDetails> to the dictionary:
public class CoinComparer : IEqualityComparer<CoinDetails>
{
public bool Equals(CoinDetails x, CoinDetails y)
{
if (object.ReferenceEquals(x, y)) return true; // also covers the case where both are null
if (x == null || y == null) return false;
return x.Denomination == y.Denomination && x.Design == y.Design;
}
public int GetHashCode(CoinDetails obj)
{
unchecked
{
int hash = 17;
hash = hash * 23 + (obj.Denomination ?? "").GetHashCode();
hash = hash * 23 + (obj.Design ?? "").GetHashCode();
return hash;
}
}
}
Now this works and does not require CoinDetails to override Equals+GetHashCode:
var dict = new Dictionary<CoinDetails, string>(new CoinComparer());
dict.Add(new CoinDetails("1POUND", "1997"), "");
dict.Add(new CoinDetails("1POUND", "1997"), ""); // throws ArgumentException: the comparer treats this as a duplicate key
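Putting the pieces together, a version of the class that keeps IEquatable<CoinDetails> and also includes Side might look like the sketch below. The constructor and property shapes are assumptions inferred from the tests in the question. The key difference from the question's code is that GetHashCode takes no parameter and is marked override; the question's GetHashCode(CoinDetails obj) is a separate overload that nothing ever calls.

```csharp
using System;
using System.Collections.Generic;

public enum CoinSide { Heads, Tails }

// Sketch only: the constructor and property shapes are assumed from the question's tests.
public sealed class CoinDetails : IEquatable<CoinDetails>
{
    public string Denomination { get; private set; }
    public string Design { get; private set; }
    public CoinSide Side { get; private set; }

    public CoinDetails(string denomination, string design, CoinSide side)
    {
        Denomination = denomination;
        Design = design;
        Side = side;
    }

    public bool Equals(CoinDetails other)
    {
        if (ReferenceEquals(other, null)) return false;
        return Denomination == other.Denomination
            && Design == other.Design
            && Side == other.Side;
    }

    // Equals(object) is what Assert.AreEqual and non-generic callers use.
    public override bool Equals(object obj)
    {
        return Equals(obj as CoinDetails);
    }

    // The parameterless override is what Dictionary's default comparer calls.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Denomination ?? "").GetHashCode();
            hash = hash * 23 + (Design ?? "").GetHashCode();
            hash = hash * 23 + Side.GetHashCode();
            return hash;
        }
    }
}

public static class CoinDemo
{
    public static void Main()
    {
        var dict = new Dictionary<CoinDetails, string>();
        dict.Add(new CoinDetails("1POUND", "1997", CoinSide.Heads), "found");

        // A different instance with the same field values finds the entry.
        Console.WriteLine(dict[new CoinDetails("1POUND", "1997", CoinSide.Heads)]); // found
    }
}
```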

C# HashCode Builder

I used to use the Apache HashCodeBuilder a lot.
Does something like this exist for C#?
This is my homemade builder.
Usage:
hash = new HashCodeBuilder().
Add(a).
Add(b).
Add(c).
Add(d).
GetHashCode();
It does not matter what type fields a,b,c and d are, easy to extend, no need to create array.
Source:
public sealed class HashCodeBuilder
{
private int hash = 17;
public HashCodeBuilder Add(int value)
{
unchecked
{
hash = hash * 31 + value; // see Effective Java for reasoning
// can be any prime, but hash * 31 can be optimised by the VM to (hash << 5) - hash
}
return this;
}
public HashCodeBuilder Add(object value)
{
return Add(value != null ? value.GetHashCode() : 0);
}
public HashCodeBuilder Add(float value)
{
return Add(value.GetHashCode());
}
public HashCodeBuilder Add(double value)
{
return Add(value.GetHashCode());
}
public override int GetHashCode()
{
return hash;
}
}
Sample usage:
public sealed class Point
{
private readonly int _x;
private readonly int _y;
private readonly int _hash;
public Point(int x, int y)
{
_x = x;
_y = y;
_hash = new HashCodeBuilder().
Add(_x).
Add(_y).
GetHashCode();
}
public int X
{
get { return _x; }
}
public int Y
{
get { return _y; }
}
public override bool Equals(object obj)
{
return Equals(obj as Point);
}
public bool Equals(Point other)
{
if (other == null) return false;
return (other._x == _x) && (other._y == _y);
}
public override int GetHashCode()
{
return _hash;
}
}
I use the following:
public static int ComputeHashFrom(params object[] obj) {
ulong res = 0;
for(uint i=0;i<obj.Length;i++) {
object val = obj[i];
res += val == null ? i : (ulong)val.GetHashCode() * (1 + 2 * i);
}
return (int)(uint)(res ^ (res >> 32));
}
Using such a helper is quick, easy and reliable, but it has two potential downsides (which you aren't likely to encounter frequently, but are good to be aware of):
It can generate poor hash codes for some distributions of params. For instance, for any int x where x*-3 does not overflow, ComputeHashFrom(x*-3, x) == 0, so if your objects have certain pathological properties you may get many hash code collisions, resulting in poorly performing Dictionaries and HashSets. It's not likely to happen, but a type-aware hash code computation can avoid such problems more easily.
The computation of the hash code is slower than a specialized computation could be. In particular, it involves allocating the params array (and boxing any value-type arguments) and a loop, which is quite a bit of unnecessary overhead if you've just got two members to process.
Neither drawback causes any errors, merely inefficiency; and both will show up in a profiler as blips in either this method or in the internals of the hash-code consumer.
C# doesn't have a built-in HashCode builder, but you can roll your own. I recently had this precise problem and created this hashcode generator that doesn't use boxing, by using generics, and implements a modified FNV algorithm for generating the specific hash. But you could use any algorithm you'd like, like one of those in System.Security.Cryptography.
public static int GetHashCode<T>(params T[] args)
{
return args.GetArrayHashCode();
}
public static int GetArrayHashCode<T>(this T[] objects)
{
int[] data = new int[objects.Length];
for (int i = 0; i < objects.Length; i++)
{
T obj = objects[i];
data[i] = obj == null ? 1 : obj.GetHashCode();
}
return GetFnvHash(data);
}
private static int GetFnvHash(int[] data)
{
unchecked
{
const int p = 16777619;
long hash = 2166136261;
for (int i = 0; i < data.Length; i++)
{
hash = (hash ^ data[i]) * p;
}
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
return (int)hash;
}
}
Microsoft has since released a type for computing hash codes: System.HashCode. Please see https://learn.microsoft.com/en-us/dotnet/api/system.hashcode. It is built into .NET Core 2.1 and later; on older targets such as .NET Framework you need to add the NuGet package Microsoft.Bcl.HashCode to your project to use it.
Usage example:
using System;
public class MyClass {
public int MyVar { get; }
public string AnotherVar { get; }
public object MoreVars;
public override int GetHashCode()
=> HashCode.Combine(MyVar, AnotherVar, MoreVars);
}
Nowadays I leverage ValueTuples, ref Tuples or anonymous types:
var hash = (1, "seven").GetHashCode();
var hash2 = Tuple.Create(1, "seven").GetHashCode();
var hash3 = new { Number = 1, String = "seven" }.GetHashCode();
I believe value tuples will be fastest.
