By "increasingly" what I mean is that Add is fast at the beginning when there is a low number of keys. After inserting 20% of the keys, it gets very slow. After 50% it gets unbearably slow.
I get that the lower the number of keys, the faster the "key collision search" when adding new elements to the dictionary. But is there any possible way to skip this downside while keeping the Dictionary? I know beforehand that keys don't collide so no check is needed, but I don't know if there is any way to successfully use this info in the code.
BTW I am forced to use the dictionary structure because of architecture restrictions (this structure is swallowed later by a db exporter).
What my code does:
var keyList = GetKeyList();
var resultDict = new Dictionary<T,T>();
foreach (var key in keyList)
{
resultDict.Add(key,someResult);
}
Edit: since people are asking how the hash code is generated, I will try to clarify this.
Theoretically I have no control over the hash code generation, because unfortunately it uses a convention between multiple systems that are connected through the same db.
In practice, the piece of code that generates the hash code is indeed my code (disclaimer: it wasn't me choosing the convention that is used in the generation).
The key generation is way more complicated than that, but it all boils down to this:
private List<ResultKey> GetKeyList(string prefix, List<float> xCoordList, List<float> yCoordList)
{
var keyList = new List<ResultKey>();
var constantSensorName = "xxx";
foreach (float xCoord in xCoordList)
{
foreach (float yCoord in yCoordList)
{
string stationName = string.Format("{0}_E{1}N{2}", prefix, xCoord, yCoord);
keyList.Add(new ResultKey(constantSensorName, stationName));
}
}
return keyList;
}
public struct ResultKey
{
public string SensorName { get; set; }
public string StationName { get; set; }
public ResultKey(string sensorName, string stationName)
{
this.SensorName = sensorName;
this.StationName = stationName;
}
}
The first thing that comes to mind is to create your own hashing function. The Add method of the dictionary is going to call the default implementation of the GetHashCode() method when it adds an item to the structure. If you put a wrapper class around your keys and overrode the GetHashCode() method, then you could supply your own, presumably less collision-prone, hash function.
You are using the default hash code generation for your struct ResultKey. The default hash code generation for structs is disappointingly bad. You can't rely on that here because your struct contains two strings which trigger a bad case (see the linked answer). Essentially, only your SensorName field makes it into the hash code, nothing else. That causes all keys with the same SensorName to collide.
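For illustration, here is a quick way to observe that collision (a sketch using the ResultKey struct from the question without a custom GetHashCode; the exact behaviour depends on the runtime, but on the CLR/Mono versions discussed here the two hash codes typically come out equal):
var a = new ResultKey("xxx", "prefix_E1N1");
var b = new ResultKey("xxx", "prefix_E2N2");
// same SensorName, different StationName: with default struct hashing these usually collide,
// which is exactly what makes Dictionary.Add degrade as the dictionary fills up
Console.WriteLine(a.GetHashCode() == b.GetHashCode());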
Write your own function. I quickly generated one using Resharper:
public struct ResultKey : IEquatable<ResultKey>
{
public string SensorName { get; set; }
public string StationName { get; set; }
public ResultKey(string sensorName, string stationName)
{
this.SensorName = sensorName;
this.StationName = stationName;
}
public bool Equals(ResultKey other)
{
return string.Equals(SensorName, other.SensorName) && string.Equals(StationName, other.StationName);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
return obj is ResultKey && Equals((ResultKey)obj);
}
public override int GetHashCode()
{
unchecked
{
return ((SensorName != null ? SensorName.GetHashCode() : 0)*397) ^ (StationName != null ? StationName.GetHashCode() : 0);
}
}
public static bool operator ==(ResultKey left, ResultKey right)
{
return left.Equals(right);
}
public static bool operator !=(ResultKey left, ResultKey right)
{
return !left.Equals(right);
}
}
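On newer frameworks the same combination can be written more simply (System.HashCode ships with .NET Core 2.1 / .NET Standard 2.1 and later, so this is only an option if your target supports it):
public override int GetHashCode()
{
// HashCode.Combine handles null fields and bit mixing internally
return HashCode.Combine(SensorName, StationName);
}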
Your ResultKey contains two strings, so you need a hash code that combines them.
"How do I calculate a good hash code for a list of strings?" contains some answers showing how to do this.
However, you could do a lot worse than
public override int GetHashCode()
{
return (SensorName + StationName).GetHashCode();
}
If you just want to fulfill API requirements and need a dirty solution, you could implement your own Dictionary.
public class FakeFastDictionary<TKey, TValue> : Dictionary<TKey, TValue>
{
protected IList<KeyValuePair<TKey, TValue>> _list
= new List<KeyValuePair<TKey, TValue>>();
public new void Add(TKey key, TValue value)
{
_list.Add(new KeyValuePair<TKey, TValue>(key, value));
}
public new ICollection<TValue> Values
{
get
{
// there may be faster ways to do it:
return _list.Select(x => x.Value).ToArray();
}
}
public new ICollection<TKey> Keys
{
get
{
// there may be faster ways to do it:
return _list.Select(x => x.Key).ToArray();
}
}
}
This is a running sample:
https://dotnetfiddle.net/BDyks0
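Note that the new modifier only hides Dictionary<TKey, TValue>.Add, so the fast path is only taken when the instance is used through the derived type. A small usage sketch:
var fast = new FakeFastDictionary<string, int>();
fast.Add("a", 1); // list append, no hashing at all
Dictionary<string, int> asDict = fast;
asDict.Add("b", 2); // base Dictionary.Add runs here, hashing again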
I am checking if a total group of edges already contains the connection between 2 points.
I want to use HashSets that contain 2 vectors as Dictionary keys. Then I want to be able to call a performant Dictionary.ContainsKey(hashSet). I want the contains/equality check to depend on the Vectors in the set.
E.g. if I add HashSet [V000 V001] to the Dict, I want Dictionary.ContainsKey(HashSet [V001 V000]) to return true. (HashSet, so the order can vary; just the same elements.)
The problem seems to be that the Dictionary.ContainsKey() method sees separately created HashSets as different objects, even though they contain the same elements.
Dictionary<HashSet<Vector3>, Vector3> d = new Dictionary<HashSet<Vector3>, Vector3>();
HashSet<Vector3> s = new HashSet<Vector3>();
s.Add(Vector3.one);
s.Add(Vector3.zero);
d.Add(s, Vector3.one);
HashSet<Vector3> s2 = new HashSet<Vector3>();
s2.Add(Vector3.zero);
s2.Add(Vector3.one);
bool doesContain = d.ContainsKey(s2); // should be true
You also may suggest a better way of doing this 'Contains()' check efficiently.
The HashSet type doesn't do the equality comparison you want out of the box. It only has reference equality.
To get what you want, you'll need a new type to use as the Dictionary key. The new type will have a HashSet property, override Equals() and GetHashCode(), and may as well implement IEquatable<T> while it's at it.
I'll get you started:
public class HashKey<T> : IEquatable<HashKey<T>>
{
private HashSet<T> _items;
public HashSet<T> Items
{
get {return _items;}
private set {_items = value;}
}
public HashKey()
{
_items = new HashSet<T>();
}
public HashKey(HashSet<T> initialSet)
{
_items = initialSet ?? new HashSet<T>();
}
public override int GetHashCode()
{
// I'm leaving this for you to do
}
public override bool Equals(Object obj)
{
if (!(obj is HashKey<T>)) return false;
return this.GetHashCode().Equals(obj.GetHashCode());
}
public bool Equals(HashKey<T> obj)
{
if (obj is null) return false;
return this.GetHashCode().Equals(obj.GetHashCode());
}
}
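If you want a starting point for the part left as an exercise, one possible order-independent implementation (a sketch, not the only valid choice) XORs the element hashes so the result does not depend on enumeration order:
public override int GetHashCode()
{
int hash = 0;
foreach (T item in _items)
hash ^= item == null ? 0 : item.GetHashCode();
return hash;
}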
You want to use a HashSet as the key. The keys are then references, where one key is one HashSet instance, and ContainsKey compares those references.
For what you want to do, you can create a class that implements IEqualityComparer to pass it to the dictionary constructor.
https://learn.microsoft.com/dotnet/api/system.collections.generic.iequalitycomparer-1
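As a minimal sketch of that approach: the BCL already ships a suitable comparer via HashSet<T>.CreateSetComparer(), which implements order-independent set equality and hashing (Vector3 is assumed here to be Unity's struct, which already provides value equality):
var d = new Dictionary<HashSet<Vector3>, Vector3>(HashSet<Vector3>.CreateSetComparer());
var s = new HashSet<Vector3> { Vector3.one, Vector3.zero };
d.Add(s, Vector3.one);
var s2 = new HashSet<Vector3> { Vector3.zero, Vector3.one };
bool doesContain = d.ContainsKey(s2); // true: same elements, different insertion order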
If you want full management, you should create a new class embedding the dictionary and implement your own public operations wrapping those of the dictionary: ContainsKey and all other methods you need.
public class MyDictionary : IEnumerable<KeyValuePair<HashSet<Vector3>, Vector3>>
{
private Dictionary<HashSet<Vector3>, Vector3> d
= new Dictionary<HashSet<Vector3>, Vector3>();
public int Count { get { return d.Count; } }
public Vector3 this[HashSet<Vector3> key] ...
public bool ContainsKey(HashSet<Vector3> key)
{
// implement your own comparison algorithm here
}
public void Add(HashSet<Vector3> key, Vector3 value);
public bool Remove(HashSet<Vector3> key);
...
}
So you will have a strongly typed dictionary for your intended usage.
I am having class below
class Group
{
public Collection<int> UserIds { get; set; }
public int CreateByUserId { get; set; }
public int HashKey { get; set; }
}
I want to generate some unique hash key based on UserIds[] and CreateByUserId, store it to Mongo, and search on it.
Conditions:
each time the hash key should be the same for the same UserIds[] and CreateByUserId
the hash key should be different when the number of users in UserIds[] changes
As a solution for this I am overriding the GetHashCode() function:
public override int GetHashCode()
{
unchecked
{
var hash = (int)2166136261;
const int fnvPrime = 16777619;
List<int> users = new List<int>() { CreateByUserId };
UserIds.ToList().ForEach(x => users.Add(x));
users.Sort();
users.ForEach(x => hash = (hash * fnvPrime) ^ x.GetHashCode());
return hash;
}
}
Is this a good solution, or can you suggest a better one?
If the intention is to save the hash value in the database, don't override GetHashCode on the object; that method is meant for use with hash tables (Dictionary, HashSet, ...) in conjunction with Equals, and it is not unique enough for your purpose. Instead, use an established hash function such as SHA1.
public string Hash(IEnumerable<int> values)
{
using (var hasher = new SHA1Managed())
{
var hash = hasher.ComputeHash(Encoding.UTF8.GetBytes(string.Join("-", values)));
return BitConverter.ToString(hash).Replace("-", "");
}
}
Usage:
var hashKey = Hash(UserIds.Concat(new[] { CreateByUserId }));
Sort UserIds first if so desired.
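If the order of the ids should not matter, a small sketch of that suggestion: sort before hashing, so {1, 2} and {2, 1} produce the same key.
var hashKey = Hash(UserIds.OrderBy(id => id).Concat(new[] { CreateByUserId }));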
A hash code is a value calculated to check whether a call to Equals() may yield true.
The hash code is used to make a fast decision about whether an element might be the right one or is definitely the wrong one.
First thing is, replace the wording HashKey with Unique Id.
If you want a unique Id, I'd recommend using the database with an Id column (you store the data there anyway) and then fetching the Id together with the other data. In MongoDB, each entry also already has its own Id:
See here
Each object in mongo already has an id, and they are sortable in
insertion order. What is wrong with getting collection of user
objects, iterating over it and use this as incremented ID?[...]
That way: use the DB for the unique Id and calculate your HashKey (if you still need it) with simple, cheap math like adding up the user Ids.
To do it programmatically:
If you want to check it programmatically, and we ignore Ids from the DB, you need to implement the GetHashCode() function and the Equals() function of the given objects.
class Group
{
public Collection<int> UserIds { get; set; }
public int CreateByUserId { get; set; }
public override bool Equals(object obj)
{
Group objectToCompare = (Group)obj;
if (this.UserIds.Count != objectToCompare.UserIds.Count)
return false;
if (this.CreateByUserId != objectToCompare.CreateByUserId)
return false;
foreach (int ownUserId in this.UserIds)
if (!objectToCompare.UserIds.Contains(ownUserId))
return false;
//some elements might be double, i.e. 1: 1,2,2 vs 2: 1,2,3 => not equal. cross check to avoid this error
foreach (int foreignUserId in objectToCompare.UserIds)
if (!this.UserIds.Contains(foreignUserId))
return false;
return true;
}
public override int GetHashCode()
{
int sum = CreateByUserId;
foreach (int userId in UserIds)
sum += userId;
return sum;
}
}
Usage:
Group group1 = new Group() { UserIds = ..., CreateByUserId = ...};
Group group2 = new Group() { UserIds = ..., CreateByUserId = ...};
group1.Equals(group2);
Here is the answer to "Why do we need the GetHashCode-Function when we use Equals?"
Note: This is for sure not the most performant solution for the Equals()-Method here. Adjust as needed.
In general, without some extra information about the data, you cannot create a unique integer from a whole bunch of other integers. You cannot create a unique int key even from a single long value if there are no constraints on its range of allowed values.
The GetHashCode function does not guarantee that you get a unique integer hash key for every possible Group object. However, a good hash function tries to minimize collisions - cases when the same hash code is generated for different objects. There are good examples of hash functions in this SO answer:
What is the best algorithm for an overridden System.Object.GetHashCode?
Usually you need GetHashCode to store an object as a key in dictionaries and hash sets. As the previous answer said, you also need to override the Equals method for that case, because hash tables such as dictionaries and hash sets resolve collisions by storing items with the same hash code in lists called buckets, and they use the Equals method to identify the item within a bucket. It is recommended practice to override Equals whenever you override GetHashCode.
It was not specified what type of equality you should expect from 'Group' objects. Imagine two objects with the same CreateByUserId and the following UserIds: {1, 2} and {2, 1}. Are they equal, or does the order matter?
It's not a good idea to allow changes to Group fields from just any place. I would implement it with read-only fields like this:
class Group : IEquatable<Group>
{
private readonly Collection<int> userIds;
public ReadOnlyCollection<int> UserIds { get; }
public int CreateByUserId { get; }
public int HashKey { get; private set; } // private setter so AddUserID can refresh the value
public Group(int createByUserId, IList<int> createdByUserIDs)
{
CreateByUserId = createByUserId;
userIds = createdByUserIDs != null
? new Collection<int>(createdByUserIDs)
: new Collection<int>();
UserIds = new ReadOnlyCollection<int>(userIds);
HashKey = GetHashCode();
}
public void AddUserID(int userID)
{
userIds.Add(userID);
HashKey = GetHashCode();
}
//IEquatable<T> implementation is generally a good practice in such cases, especially for value types
public override bool Equals(object obj) => Equals(obj as Group);
public bool Equals(Group objectToCompare)
{
if (objectToCompare == null)
return false;
if (ReferenceEquals(this, objectToCompare))
return true;
if (UserIds.Count != objectToCompare.UserIds.Count || CreateByUserId != objectToCompare.CreateByUserId)
return false;
//If you need equality when order matters - use this
//return UserIds.SequenceEqual(objectToCompare.UserIds);
//This is for set equality. If this is your case and you don't allow duplicates then I would suggest to use HashSet<int> or ISet<int> instead of Collection<int>
//and use their methods for more concise and effective comparison
return UserIds.All(id => objectToCompare.UserIds.Contains(id)) && objectToCompare.UserIds.All(id => UserIds.Contains(id));
}
public override int GetHashCode()
{
unchecked // to suppress overflow exceptions
{
int hash = 17;
hash = hash * 23 + CreateByUserId.GetHashCode();
foreach (int userId in UserIds)
hash = hash * 23 + userId.GetHashCode();
return hash;
}
}
}
TL;DR I'm looking for a way to obtain IEqualityComparer<T> from IComparer<T>, no matter which datatype is T, including case-insensitive options if T is string. Or I need a different solution for this problem.
Here's the full story: I'm implementing a simple, generic cache with an LFU policy. The requirement is that it must be possible to select whether the cache will be case sensitive or case insensitive -- if string happens to be the datatype for cache keys (which is not necessarily the case). In the solution I primarily develop the cache for, I expect hundreds of billions of cache lookups and cache sizes of max 100.000 entries. Because of those numbers I immediately ruled out any string manipulation that causes allocations (such as .ToLower().GetHashCode() etc.) and instead opted to use IComparer and IEqualityComparer, as they are standard BCL features. The user of this cache can pass the comparers to the constructor. Here are the relevant fragments of the code:
public class LFUCache<TKey,TValue>
{
private readonly Dictionary<TKey,CacheItem> entries;
private readonly SortedSet<CacheItem> lfuList;
private class CacheItem
{
public TKey Key;
public TValue Value;
public int UseCount;
}
private class CacheItemComparer : IComparer<CacheItem>
{
private readonly IComparer<TKey> cacheKeyComparer;
public CacheItemComparer(IComparer<TKey> cacheKeyComparer)
{
this.cacheKeyComparer = cacheKeyComparer;
if (cacheKeyComparer == null)
this.cacheKeyComparer = Comparer<TKey>.Default;
}
public int Compare(CacheItem x, CacheItem y)
{
int UseCount = x.UseCount - y.UseCount;
if (UseCount != 0) return UseCount;
return cacheKeyComparer.Compare(x.Key, y.Key);
}
}
public LFUCache(int capacity, IEqualityComparer<TKey> keyEqualityComparer,
IComparer<TKey> keyComparer) // <- here's my problem
{
// ...
entries = new Dictionary<TKey, CacheItem>(keyEqualityComparer);
lfuList = new SortedSet<CacheItem>(new CacheItemComparer(keyComparer));
}
// ...
}
The keyEqualityComparer is used to manage cache entries (so e.g. the key "ABC" and "abc" are equal if user wants to). The keyComparer is used to manage cache entries sorted by UseCount so that it's easy to select the least frequently used one (implemented in CacheItemComparer class).
Example correct usage with custom comparison:
var cache = new LFUCache<string, int>(10000,
StringComparer.InvariantCultureIgnoreCase,
StringComparer.InvariantCultureIgnoreCase);
(That looks stupid, but StringComparer implements both IComparer<string> and IEqualityComparer<string>.) The problem is that if user gives incompatible comparers (i.e. case insensitive keyEqualityComparer and case sensitive keyComparer), then the most likely outcome is invalid LFU statistics, and thus lower cache hits at best. The other scenario is also less than desired. Also if the key is more sophisticated (I'll have something resembling Tuple<string,DateTime,DateTime>), it's possible to mess it up more severely.
That's why I'd like to only have a single comparer argument in constructor, but that doesn't seem to work. I'm able to create IEqualityComparer<T>.Equals() with help of IComparer<T>.Compare(), but I'm stuck at IEqualityComparer<T>.GetHashCode() -- which is very important, as you know. If I had got access to private properties of the comparer to check if it's case sensitive or not, I would have used CompareInfo to get hash code.
I like this approach with 2 different data structures, because it gives me acceptable performance and controllable memory consumption -- on my laptop around 500.000 cache additions/sec with cache size 10.000 elements. Dictionary<TKey,TValue> is just used to find data in O(1), and SortedSet<CacheItem> inserts data in O(log n), find element to remove by calling lfuList.Min in O(log n), and find the entry to increment use count also in O(log n).
Any suggestions on how to solve this are welcome. I'll appreciate any ideas, including different designs.
It's not possible to implement an IComparer from an IEqualityComparer as you have no way of knowing whether an unequal item is greater than or less than the other item.
It's not possible to implement an IEqualityComparer from an IComparer as there's no way for you to generate a hash code that is in line with the IComparer's identity.
That said, there's no need for you to have both types of comparers in your case. When computing the LFU order you're comparing the use count as the primary key and then comparing based on a passed-in comparer as a tiebreaker. Just remove that last part; don't have a tiebreaker. Let it be undefined which item leaves the cache when there is a tie for the least frequently used. When you do that you only need to accept an IEqualityComparer, not an IComparer.
As I alluded to in my comment, you could add a helper method that might make things a little simpler for a basic use case:
public class LFUCache<TKey,TValue>
{
public static LFUCache<TKey, TValue> Create<TComp>(int capacity, TComp comparer) where TComp : IEqualityComparer<TKey>, IComparer<TKey>
{
return new LFUCache<TKey, TValue>(capacity, comparer, comparer);
}
}
and you'd use it like this:
var cache = LFUCache<string, int>.Create(10000, StringComparer.InvariantCultureIgnoreCase);
Okay next try. Here is an implementation for Add and Touch for LFU:
public class LfuCache<TKey, TValue>
{
private readonly Dictionary<TKey, LfuItem> _items;
private readonly int _limit;
private LfuItem _first, _last;
public LfuCache(int limit, IEqualityComparer<TKey> keyComparer = null)
{
this._limit = limit;
this._items = new Dictionary<TKey,LfuItem>(keyComparer);
}
public void Add(TKey key, TValue value)
{
if (this._items.Count == this._limit)
{
this.RemoveLast();
}
var lfuItem = new LfuItem { Key = key, Value = value, Prev = this._last };
this._items.Add(key, lfuItem);
if (this._last != null)
{
this._last.Next = lfuItem;
lfuItem.Prev = this._last;
}
this._last = lfuItem;
if (this._first == null)
{
this._first = lfuItem;
}
}
public TValue this[TKey key]
{
get
{
var lfuItem = this._items[key];
++lfuItem.UseCount;
this.TryMoveUp(lfuItem);
return lfuItem.Value;
}
}
private void TryMoveUp(LfuItem lfuItem)
{
if (lfuItem.Prev == null || lfuItem.Prev.UseCount >= lfuItem.UseCount) // maybe > if you want LRU and LFU
{
return;
}
var prev = lfuItem.Prev;
var prevPrev = prev.Prev;
var next = lfuItem.Next;
// relink so lfuItem swaps places with prev: prevPrev <-> lfuItem <-> prev <-> next
prev.Next = next;
if (next != null) next.Prev = prev;
lfuItem.Prev = prevPrev;
if (prevPrev != null) prevPrev.Next = lfuItem;
lfuItem.Next = prev;
prev.Prev = lfuItem;
if (lfuItem.Prev == null) this._first = lfuItem;
if (prev.Next == null) this._last = prev;
}
private void RemoveLast()
{
if (this._items.Remove(this._last.Key))
{
this._last = this._last.Prev;
if (this._last != null)
{
this._last.Next = null;
}
}
}
private class LfuItem
{
public TKey Key { get; set; }
public TValue Value { get; set; }
public long UseCount { get; set; }
public LfuItem Prev { get; set; }
public LfuItem Next { get; set; }
}
}
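A short usage sketch for the class above (the values are illustrative only):
var cache = new LfuCache<string, int>(3, StringComparer.OrdinalIgnoreCase);
cache.Add("a", 1);
cache.Add("b", 2);
cache.Add("c", 3);
int a = cache["a"]; // the indexer bumps the use count and moves "a" towards the front
cache.Add("d", 4); // at the limit: the tail item (least used) is evicted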
In my opinion, it looks like Add and Touch are O(1), aren't they?
Currently I don't see any use case for _first, but maybe someone else needs it. To remove an item, _last should be enough.
EDIT
A singly linked list will also do if you don't need a MoveDown operation.
EDIT: No, a singly linked list will not work, because MoveUp needs the Next pointer in order to change its Prev pointer.
Instead of taking an IEqualityComparer and an IComparer in your constructor, you could try taking an IComparer and a lambda which defines GetHashCode(). Then build an IEqualityComparer that treats Compare(x, y) == 0 as equality and uses the lambda for GetHashCode().
Although I would say it is small, you still have the risk of getting HashCode mismatches when IComparer returns 0. If you want to make it super clear to the user of your code, you could always extend the strategy by taking two lambdas in the constructor: Func<T,T,int> used for both IComparer and IEqualityComparer, and Func<T,int> for GetHashCode.
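A minimal sketch of that idea (the class name is made up for illustration): wrap an IComparer<T> plus a hash lambda into an IEqualityComparer<T>.
public sealed class ComparerBackedEqualityComparer<T> : IEqualityComparer<T>
{
private readonly IComparer<T> comparer;
private readonly Func<T, int> hashFunc;
public ComparerBackedEqualityComparer(IComparer<T> comparer, Func<T, int> hashFunc)
{
this.comparer = comparer ?? Comparer<T>.Default;
this.hashFunc = hashFunc;
}
// equality is defined as "the comparer says the items are tied"
public bool Equals(T x, T y) { return comparer.Compare(x, y) == 0; }
// the caller-supplied lambda must be consistent with that notion of equality
public int GetHashCode(T obj) { return hashFunc(obj); }
}
For case-insensitive strings, for example: new ComparerBackedEqualityComparer<string>(StringComparer.OrdinalIgnoreCase, s => StringComparer.OrdinalIgnoreCase.GetHashCode(s)).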
By searching through the MSDN C# documentation and Stack Overflow, I get the clear impression that Dictionary<T,T> is supposed to use GetHashCode() for checking key uniqueness and to do look-ups.
The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
...
The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
I use Mono (in Unity3D), and after getting some weird results in my work, I conducted this experiment:
public class DictionaryTest
{
public static void TestKeyUniqueness()
{
//Test a dictionary of type1
Dictionary<KeyType1, string> dictionaryType1 = new Dictionary<KeyType1, string>();
dictionaryType1[new KeyType1(1)] = "Val1";
if(dictionaryType1.ContainsKey(new KeyType1(1)))
{
Debug.Log ("Key in dicType1 was already present"); //This line does NOT print
}
//Test a dictionary of type2
Dictionary<KeyType2, string> dictionaryType2 = new Dictionary<KeyType2, string>();
dictionaryType2[new KeyType2(1)] = "Val1";
if(dictionaryType2.ContainsKey(new KeyType2(1)))
{
Debug.Log ("Key in dicType2 was already present"); // Only this line prints
}
}
}
//This type implements only GetHashCode()
public class KeyType1
{
private int var1;
public KeyType1(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
}
//This type implements both GetHashCode() and Equals(obj), where Equals uses the hashcode.
public class KeyType2
{
private int var1;
public KeyType2(int v1)
{
var1 = v1;
}
public override int GetHashCode ()
{
return var1;
}
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
}
Only when using type KeyType2 are the keys considered equal. To me this demonstrates that Dictionary uses Equals(obj) - and not GetHashCode().
Can someone reproduce this and help me interpret what it means? Is it an incorrect implementation in Mono, or have I misunderstood something?
i get the clear impression that Dictionary is supposed to use
.GetHashCode() for checking key-uniqueness
What made you think that? GetHashCode doesn't return unique values.
And MSDN clearly says:
Dictionary<TKey, TValue> requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer<T> generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer<T>.Default is used. If type TKey implements the System.IEquatable<T> generic interface, the default equality comparer uses that implementation.
Doing this:
public override bool Equals (object obj)
{
return GetHashCode() == obj.GetHashCode();
}
is wrong in the general case because you might end up with KeyType2 instances that are equal to StringBuilder, SomeOtherClass, AnythingYouCanImagine and what not instances.
You should totally do it like so:
public override bool Equals (object obj)
{
if (obj is KeyType2) {
return (obj as KeyType2).var1 == this.var1;
} else
return false;
}
When you are trying to override Equals, and with it GetHashCode, you must ensure the following points (given the class MyObject) in this order (you were doing it the other way around):
1) When are 2 instances of MyObject equal ? Say you have:
public class MyObject {
public string Name { get; set; }
public string Address { get; set; }
public int Age { get; set; }
public DateTime TimeWhenIBroughtThisInstanceFromTheDatabase { get; set; }
}
And you have 1 record in some database that you need to be mapped to an instance of this class.
And you make the convention that the time you read the record from the database will be stored
in the TimeWhenIBroughtThisInstanceFromTheDatabase:
MyObject obj1 = DbHelper.ReadFromDatabase( ...some params...);
// you do that at 14:05 and thusly the TimeWhenIBroughtThisInstanceFromTheDatabase
// will be assigned accordingly
// later.. at 14:07 you read the same record into a different instance of MyClass
MyObject obj2 = DbHelper.ReadFromDatabase( ...some params...);
// (the same)
// At 14:09 you ask yourself if the 2 instances are the same
bool theyAre = obj1.Equals(obj2)
Do you want the result to be true ? I would say you do.
Therefore the overriding of Equals should like so:
public class MyObject {
...
public override bool Equals(object obj) {
if (obj is MyObject) {
var that = obj as MyObject;
return (this.Name == that.Name) &&
(this.Address == that.Address) &&
(this.Age == that.Age);
// without the syntactically possible but logically challenged:
// && (this.TimeWhenIBroughtThisInstanceFromTheDatabase ==
// that.TimeWhenIBroughtThisInstanceFromTheDatabase)
} else
return false;
}
...
}
2) ENSURE THAT whenever 2 instances are equal (as indicated by the Equals method you implement)
their GetHashCode results will be identical.
int hash1 = obj1.GetHashCode();
int hash2 = obj2.GetHashCode();
bool theseMustBeAlso = hash1 == hash2;
The easiest way to do that is (in the sample scenario):
public class MyObject {
...
public override int GetHashCode() {
int result;
result = ((this.Name != null) ? this.Name.GetHashCode() : 0) ^
((this.Address != null) ? this.Address.GetHashCode() : 0) ^
this.Age.GetHashCode();
// without the syntactically possible but logically challenged:
// ^ this.TimeWhenIBroughtThisInstanceFromTheDatabase.GetHashCode()
return result;
}
...
}
Note that:
- Strings can be null, and .GetHashCode() might then fail with a NullReferenceException.
- I used ^ (XOR). You can use whatever you want as long as the golden rule (number 2) is respected.
- x ^ 0 == x (for whatever x)
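Finally, as the MSDN excerpt above notes, you don't have to override anything on the key type at all: you can hand the Dictionary an IEqualityComparer<TKey>. A minimal sketch for the question's KeyType1 (Var1 is a hypothetical public accessor; the original class keeps var1 private, so you would need to expose it or nest the comparer inside the class):
public class KeyType1Comparer : IEqualityComparer<KeyType1>
{
public bool Equals(KeyType1 x, KeyType1 y)
{
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
return x.Var1 == y.Var1; // Var1 is assumed to expose the private var1 field
}
public int GetHashCode(KeyType1 obj)
{
return obj.Var1;
}
}
// usage:
var dictionaryType1 = new Dictionary<KeyType1, string>(new KeyType1Comparer());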
This question comes out of the discussion on tuples.
I started thinking about the hash code that a tuple should have.
What if we accept the KeyValuePair class as a tuple? It doesn't override the GetHashCode() method, so presumably it won't be aware of the hash codes of its "children"... So the runtime will call Object.GetHashCode(), which is not aware of the real object structure.
Then we can make two instances of some reference type which are actually equal, because of the overridden GetHashCode() and Equals(), and use them as "children" in tuples to "cheat" the dictionary.
But it doesn't work! The runtime somehow figures out the structure of our tuple and calls the overridden GetHashCode of our class!
How does it work? What's the analysis made by Object.GetHashCode()?
Can it affect the performance in some bad scenario, when we use some complicated keys? (probably, impossible scenario... but still)
Consider this code as an example:
namespace csharp_tricks
{
class Program
{
class MyClass
{
int keyValue;
int someInfo;
public MyClass(int key, int info)
{
keyValue = key;
someInfo = info;
}
public override bool Equals(object obj)
{
MyClass other = obj as MyClass;
if (other == null) return false;
return keyValue.Equals(other.keyValue);
}
public override int GetHashCode()
{
return keyValue.GetHashCode();
}
}
static void Main(string[] args)
{
Dictionary<object, object> dict = new Dictionary<object, object>();
dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 1), 1), 1);
//here we get the exception -- an item with the same key was already added
//but how did it figure out the hash code?
dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 2), 1), 1);
return;
}
}
}
Update I think I've found an explanation for this as stated below in my answer. The main outcomes of it are:
Be careful with your keys and their hash codes :-)
For complicated dictionary keys you must override Equals() and GetHashCode() correctly.
Don't override GetHashCode() and Equals() on mutable classes; only override them on immutable classes or structures. Otherwise, if you modify an object used as a key, the hash table won't function properly anymore (you won't be able to retrieve the value associated with the key after the key object was modified).
Also, hash tables don't use hash codes to identify objects; they use the key objects themselves as identifiers. It's not required that all keys used to add entries to a hash table return different hash codes, but it is recommended that they do, otherwise performance suffers greatly.
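A small sketch of that first warning (MutableKey is a hypothetical class used only for illustration):
class MutableKey
{
public int Id { get; set; }
public override int GetHashCode() { return Id; }
public override bool Equals(object obj)
{
var other = obj as MutableKey;
return other != null && other.Id == Id;
}
}
var key = new MutableKey { Id = 1 };
var dict = new Dictionary<MutableKey, string> { { key, "value" } };
key.Id = 2; // the stored key's hash code changes after insertion
bool found = dict.ContainsKey(key); // false: the lookup now probes a different bucket
int count = dict.Count; // still 1: the entry is effectively orphaned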
Here are the proper Hash and equality implementations for the Quad tuple (contains 4 tuple components inside). This code ensures proper usage of this specific tuple in HashSets and the dictionaries.
More on the subject (including the source code) here.
Note the usage of the unchecked keyword (to suppress overflow exceptions) and the throwing of NullReferenceException if obj is null (as required by the base method).
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
throw new NullReferenceException("obj is null");
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof (Quad<T1, T2, T3, T4>)) return false;
return Equals((Quad<T1, T2, T3, T4>) obj);
}
public bool Equals(Quad<T1, T2, T3, T4> obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
return Equals(obj.Item1, Item1)
&& Equals(obj.Item2, Item2)
&& Equals(obj.Item3, Item3)
&& Equals(obj.Item4, Item4);
}
public override int GetHashCode()
{
unchecked
{
int result = Item1.GetHashCode();
result = (result*397) ^ Item2.GetHashCode();
result = (result*397) ^ Item3.GetHashCode();
result = (result*397) ^ Item4.GetHashCode();
return result;
}
}
public static bool operator ==(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
return Equals(left, right);
}
public static bool operator !=(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
return !Equals(left, right);
}
Check out this post by Brad Abrams and also the comment by Brian Grunkemeyer for some more information on how object.GetHashCode works. Also, take a look at the first comment on Ayande's blog post. I don't know if the current releases of the Framework still follow these rules or if they have actually changed it like Brad implied.
It seems that I have a clue now.
I thought KeyValuePair is a reference type, but it is not, it is a struct. And so it uses ValueType.GetHashCode() method. MSDN for it says: "One or more fields of the derived type is used to calculate the return value".
If you take a real reference type as a "tuple provider", you'll cheat the dictionary (or yourself...).
using System.Collections.Generic;
namespace csharp_tricks
{
class Program
{
class MyClass
{
int keyValue;
int someInfo;
public MyClass(int key, int info)
{
keyValue = key;
someInfo = info;
}
public override bool Equals(object obj)
{
MyClass other = obj as MyClass;
if (other == null) return false;
return keyValue.Equals(other.keyValue);
}
public override int GetHashCode()
{
return keyValue.GetHashCode();
}
}
class Pair<T, R>
{
public T First { get; set; }
public R Second { get; set; }
}
static void Main(string[] args)
{
var dict = new Dictionary<Pair<int, MyClass>, object>();
dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 2) }, 1);
//this is a pair of the same values as previous! but... no exception this time...
dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 3) }, 1);
return;
}
}
}
I don't have the book reference anymore, and I'll have to find it just to confirm, but I thought the default base hash just hashed together all of the members of your object. It got access to them because of the way the CLR worked, so it wasn't something that you could write as well as they had.
That is completely from memory of something I briefly read so take it for what you will.
Edit: The book was Inside C# from MS Press. The one with the saw blade on the cover. The author spent a good deal of time explaining how things were implemented in the CLR, how the language translated down to MSIL, etc. If you can find the book it's not a bad read.
Edit: From the link provided it looks like
Object.GetHashCode() uses an
internal field in the System.Object class to generate the hash value. Each
object created is assigned a unique object key, stored as an integer,when it
is created. These keys start at 1 and increment every time a new object of
any type gets created.
Hmm I guess I need to write a few of my own hash codes, if I expect to use objects as hash keys.
so presumably it won't be aware of the hash codes of its "children".
Your example seems to prove otherwise :-) The hash code for the key MyClass and the value 1 is the same for both KeyValuePairs. The KeyValuePair implementation must be using both its Key and Value for its own hash code.
Moving up, the dictionary class wants unique keys. It is using the hashcode provided by each key to figure things out. Remember that the runtime isn't calling Object.GetHashCode(), but it is calling the GetHashCode() implementation provided by the instance you give it.
Consider a more complex case:
public class HappyClass
{
enum TheUnit
{
Points,
Picas,
Inches
}
class MyDistanceClass
{
int distance;
TheUnit units;
public MyDistanceClass(int theDistance, TheUnit unit)
{
distance = theDistance;
units = unit;
}
public static int ConvertDistance(int oldDistance, TheUnit oldUnit, TheUnit newUnit)
{
// insert real unit conversion code here :-)
return oldDistance * 100;
}
/// <summary>
/// Figure out if we are equal distance, converting into the same units of measurement if we have to
/// </summary>
/// <param name="obj">the other guy</param>
/// <returns>true if we are the same distance</returns>
public override bool Equals(object obj)
{
MyDistanceClass other = obj as MyDistanceClass;
if (other == null) return false;
if (other.units != this.units)
{
int newDistance = MyDistanceClass.ConvertDistance(other.distance, other.units, this.units);
return distance.Equals(newDistance);
}
else
{
return distance.Equals(other.distance);
}
}
public override int GetHashCode()
{
// even if the distance is equal in spite of the different units, the objects are not
return distance.GetHashCode() * units.GetHashCode();
}
}
static void Main(string[] args)
{
// these are the same distance... 72 points = 1 inch
MyDistanceClass distPoint = new MyDistanceClass(72, TheUnit.Points);
MyDistanceClass distInch = new MyDistanceClass(1, TheUnit.Inches);
Debug.Assert(distPoint.Equals(distInch), "these should be true!");
Debug.Assert(distPoint.GetHashCode() != distInch.GetHashCode(), "But yet they are fundamentally different values");
Dictionary<object, object> dict = new Dictionary<object, object>();
dict.Add(new KeyValuePair<MyDistanceClass, object>(distPoint, 1), 1);
//this should not barf
dict.Add(new KeyValuePair<MyDistanceClass, object>(distInch, 1), 1);
return;
}
}
Basically... in the case of my example, you'd want two objects that are the same distance to return "true" for Equals, and yet return different hash codes.