Use HashSet as Dictionary Key - Compare all elements - c#

I am checking whether a collection of edges already contains the connection between two points.
I want to use HashSets that each contain two vectors as Dictionary keys, and then call a performant Dictionary.ContainsKey(hashSet). The contains/equality check should depend on the vectors in the set.
For example, if I add HashSet [V000 V001] to the dictionary, I want Dictionary.ContainsKey(HashSet [V001 V000]) to return true. (It is a HashSet, so the order can vary; only the elements must match.)
The problem seems to be that Dictionary.ContainsKey() treats separately created HashSets as different objects, even though they contain the same elements.
Dictionary<HashSet<Vector3>, Vector3> d = new Dictionary<HashSet<Vector3>, Vector3>();
HashSet<Vector3> s = new HashSet<Vector3>();
s.Add(Vector3.one);
s.Add(Vector3.zero);
d.Add(s, Vector3.zero); // Add requires a value as well as a key; this value is arbitrary
HashSet<Vector3> s2 = new HashSet<Vector3>();
s2.Add(Vector3.zero);
s2.Add(Vector3.one);
bool doesContain = d.ContainsKey(s2); // should be true
You may also suggest a better way of doing this 'Contains()' check efficiently.

The HashSet type doesn't do the equality comparison you want out of the box. Its Equals() is plain reference equality.
To get what you want, you'll need a new type to use as the Dictionary key. The new type will have a HashSet property, override Equals() and GetHashCode(), and may as well implement IEquatable<T> while you're at it.
I'll get you started:
public class HashKey<T> : IEquatable<HashKey<T>>
{
private HashSet<T> _items;
public HashSet<T> Items
{
get {return _items;}
private set {_items = value;}
}
public HashKey()
{
_items = new HashSet<T>();
}
public HashKey(HashSet<T> initialSet)
{
_items = initialSet ?? new HashSet<T>();
}
public override int GetHashCode()
{
// Order-independent: XOR the element hashes so {a, b} and {b, a} match.
int hash = 0;
foreach (T item in _items) hash ^= (item == null ? 0 : item.GetHashCode());
return hash;
}
public override bool Equals(object obj)
{
return Equals(obj as HashKey<T>);
}
public bool Equals(HashKey<T> other)
{
if (other is null) return false;
// Equal hash codes do not imply equal sets, so compare the contents.
return _items.SetEquals(other._items);
}
}

You want to use a HashSet as the key.
The keys are then references, where each key is one HashSet reference.
ContainsKey compares those references.
To get the behavior you want, you can create a class that implements IEqualityComparer<T> and pass it to the dictionary constructor.
https://learn.microsoft.com/dotnet/api/system.collections.generic.iequalitycomparer-1
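For the specific case of HashSet keys, the framework already ships such a comparer: HashSet<T>.CreateSetComparer() returns an IEqualityComparer<HashSet<T>> that compares sets by their elements. The following is a minimal sketch of that idea, not the code from the question: the class name SetKeyDemo is hypothetical, and string stands in for Vector3 so that the example runs outside Unity.

```csharp
using System.Collections.Generic;

public static class SetKeyDemo
{
    // Creates a dictionary whose keys are compared by set contents,
    // not by reference. (string is a stand-in for Vector3.)
    public static Dictionary<HashSet<string>, int> NewEdgeMap()
    {
        return new Dictionary<HashSet<string>, int>(
            HashSet<string>.CreateSetComparer());
    }

    // Adds one edge, then looks it up through a separately created set
    // that lists the same elements in the opposite order.
    public static bool Demo()
    {
        var edges = NewEdgeMap();
        edges.Add(new HashSet<string> { "V000", "V001" }, 1);
        return edges.ContainsKey(new HashSet<string> { "V001", "V000" });
    }
}
```

Because the dictionary hashes the key on insertion, the sets should not be modified while they are in use as keys.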
If you want full control, you can create a new class that embeds the dictionary and exposes its own public operations wrapping those of the dictionary: ContainsKey and any other methods you need.
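public class MyDictionary
{
// One option: let the built-in set comparer do the key comparison;
// pass your own IEqualityComparer<HashSet<Vector3>> here if you need different rules.
private readonly Dictionary<HashSet<Vector3>, Vector3> d
= new Dictionary<HashSet<Vector3>, Vector3>(HashSet<Vector3>.CreateSetComparer());
public int Count { get { return d.Count; } }
public bool ContainsKey(HashSet<Vector3> key) { return d.ContainsKey(key); }
public void Add(HashSet<Vector3> key, Vector3 value) { d.Add(key, value); }
public bool Remove(HashSet<Vector3> key) { return d.Remove(key); }
// ... wrap any other operations you need
}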
So you will have a strongly typed dictionary for your intended usage.

Related

C# Dictionary with a class as key

My question is basically the opposite of Dictionary.ContainsKey return False, but a want True and of "the given key was not present in the dictionary" error when using a self-defined class as key:
I want to use a medium-sized class as the dictionary's key, and the dictionary must compare the keys by reference, not by value equality. The problem is that the class already implements Equals() (which performs value equality, and that is not what I want here).
Here's a small test class for reproduction:
class CTest
{
public int m_iValue;
public CTest (int i_iValue)
{
m_iValue = i_iValue;
}
public override bool Equals (object i_value)
{
if (ReferenceEquals (null, i_value))
return false;
if (ReferenceEquals (this, i_value))
return true;
if (i_value.GetType () != GetType ())
return false;
return m_iValue == ((CTest)i_value).m_iValue;
}
}
I have NOT yet implemented GetHashCode() (actually I have, but it only returns base.GetHashCode() so far).
Now I created a test program with a dictionary that uses instances of this class as keys. I can add multiple identical instances to the dictionary without problems, but this only works because GetHashCode() returns different values:
private static void Main ()
{
var oTest1 = new CTest (1);
var oTest2 = new CTest (1);
bool bEquals = Equals (oTest1, oTest2); // true
var dict = new Dictionary<CTest, int> ();
dict.Add (oTest1, 1);
dict.Add (oTest2, 2); // works
var iValue1 = dict[oTest1]; // correctly returns 1
var iValue2 = dict[oTest2]; // correctly returns 2
int iH1 = oTest1.GetHashCode (); // values different on each execution
int iH2 = oTest2.GetHashCode (); // values different on each execution, but never equals iH1
}
And the hash values are different every time, maybe because the calculation in object.GetHashCode() uses some randomization or some numbers that come from the reference handle (which is different for each object).
However, this answer on Why is it important to override GetHashCode when Equals method is overridden? says that GetHashCode() must return the same values for equal objects, so I added
public override int GetHashCode ()
{
return m_iValue;
}
After that, I could not add multiple equal objects to the dictionary any more.
Now, there are two conclusions:
If I remove my own GetHashCode() again, the hash values will be different again and the dictionary can be used. But there may be situations that accidentally give the same hash code for two equal objects, which will cause an exception at runtime whose cause will almost certainly never be found. Because of that (small, but nonzero) risk, I cannot use a dictionary.
If I correctly implement GetHashCode() like I am supposed to do, I cannot use a dictionary anyway.
What possibilities exist to still use a dictionary?
Like many times before, I had the idea for a solution when writing this question.
You can specify an IEqualityComparer<TKey> in the constructor of the dictionary. There is one in the .NET Framework, but it's internal sealed, so you need to implement your own:
Is there any kind of "ReferenceComparer" in .NET?
internal class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
static ReferenceComparer ()
{
Instance = new ReferenceComparer<T> ();
}
public static ReferenceComparer<T> Instance { get; }
public bool Equals (T x, T y)
{
return ReferenceEquals (x, y);
}
public int GetHashCode (T obj)
{
return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode (obj);
}
}
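As a usage illustration, here is a minimal, self-contained sketch of a dictionary keyed by reference even though the key type defines value equality. The names Point, ByReference and ReferenceKeyDemo are hypothetical stand-ins for CTest and ReferenceComparer above. On .NET 5 and later, the public System.Collections.Generic.ReferenceEqualityComparer.Instance can be passed instead of a hand-written comparer.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical key type with value equality, standing in for CTest.
public sealed class Point
{
    public int X;
    public Point(int x) { X = x; }
    public override bool Equals(object obj) { return obj is Point p && p.X == X; }
    public override int GetHashCode() { return X; }
}

// Compares keys by identity and ignores the type's own Equals/GetHashCode.
public sealed class ByReference<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T a, T b) { return ReferenceEquals(a, b); }
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

public static class ReferenceKeyDemo
{
    // Adds two value-equal but distinct instances and returns the entry count.
    public static int Run()
    {
        var a = new Point(1);
        var b = new Point(1);   // Equals(a, b) is true, but b is another instance
        var dict = new Dictionary<Point, int>(new ByReference<Point>());
        dict.Add(a, 1);
        dict.Add(b, 2);         // accepted: the keys differ by reference
        return dict.Count;
    }
}
```

With the default comparer, the second Add would throw an ArgumentException because the two keys are equal by value.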

HashSet item can be changed into same item in Set

I have a Node class :
public class Node : INode
{
public object Value { get; set; }
}
And I have EqualityComparer for this Node class like this :
public class INodeEqualityComparer : EqualityComparer<INode>
{
private INodeEqualityComparer()
{
}
private static readonly INodeEqualityComparer _instance =
new INodeEqualityComparer();
public static INodeEqualityComparer Instance
{
get { return _instance; }
}
public override bool Equals(INode x, INode y)
{
return (int)(x.Value) == (int)(y.Value);
}
public override int GetHashCode(INode obj)
{
return ((int)(obj.Value)).GetHashCode();
}
}
I create my HashSet by passing the INodeEqualityComparer.
I have 4 Node instances:
Node n1 = new Node { Value = 1 };
Node n2 = new Node { Value = 2 };
Node n3 = new Node { Value = 3 };
Node n4 = new Node { Value = 1 };
When I add n1, n2, n3 and n4 to my HashSet, n4 is ignored.
HashSet<INode> nodes = new HashSet<INode>(INodeEqualityComparer.Instance);
nodes.Add(n1);
nodes.Add(n2);
nodes.Add(n3);
nodes.Add(n4);
BUT after I make this change:
nodes.Where(n => (int)(n.Value) == 3).FirstOrDefault().Value = 1;
there are 2 elements in the set that are equal (Value = 1) according to INodeEqualityComparer: n1 and n3.
Why does the HashSet not prevent the update, or remove the node?
This is by design: hashing collections (whether it's dictionaries or hash sets or any other) assume that an object's hash code does not change after it's been inserted in the collection. And since two objects that are considered equal must also have the same hash code, then it also means that whatever its Equals implementation returns must not change for the same parameter.
This applies to what is hashed: in dictionaries, that's the key. In sets, that's the entire object.
The .NET documentation states:
In general, for mutable reference types, you should override GetHashCode() only if:
You can compute the hash code from fields that are not mutable; or
You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
In your Node class, you use a mutable property (Value) to compute the hash code. That's usually a bad idea, and it's actually something that ReSharper will warn against. Then again, overriding Equals and GetHashCode normally means that you're treating the type as a "value" rather than an "entity", and values should be treated as immutable when possible.
If you can't make your object immutable, don't store it in a hash collection.
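To make the consequence concrete, here is a small sketch of what goes wrong when a hashed field is mutated while the object sits in a HashSet. Box and MutatedKeyDemo are hypothetical names, not types from the question. The set records the element's hash code at insertion, so after the mutation a lookup with the new hash code no longer finds the element even though it is still stored.

```csharp
using System.Collections.Generic;

// Hypothetical element type whose hash code depends on a mutable field.
public sealed class Box
{
    public int Value;
    public override bool Equals(object obj) { return obj is Box other && other.Value == Value; }
    public override int GetHashCode() { return Value; }
}

public static class MutatedKeyDemo
{
    // Returns true when the set still holds the element but cannot find it.
    public static bool ElementIsLost()
    {
        var box = new Box { Value = 3 };
        var set = new HashSet<Box> { box };
        box.Value = 1;   // the hash code changes while the object is in the set
        return set.Count == 1 && !set.Contains(box);
    }
}
```

The set never sees the mutation, which is also why it cannot reject or remove the changed node.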

Is there a way to derive IEqualityComparer from IComparer?

TL;DR I'm looking for a way to obtain IEqualityComparer<T> from IComparer<T>, no matter which datatype is T, including case-insensitive options if T is string. Or I need a different solution for this problem.
Here's the full story: I'm implementing a simple, generic cache with an LFU policy. A requirement is that it must be possible to select whether the cache is case sensitive or case insensitive, if string happens to be the data type of the cache keys (which is not necessarily the case). In the solution I primarily develop the cache for, I expect hundreds of billions of cache lookups, and cache sizes of at most 100.000 entries. Because of those numbers I immediately gave up on any string manipulation that causes allocations (such as .ToLower().GetHashCode() etc.) and instead opted to use IComparer and IEqualityComparer, as they are standard BCL features. The user of this cache can pass the comparers to the constructor. Here are the relevant fragments of the code:
public class LFUCache<TKey,TValue>
{
private readonly Dictionary<TKey,CacheItem> entries;
private readonly SortedSet<CacheItem> lfuList;
private class CacheItem
{
public TKey Key;
public TValue Value;
public int UseCount;
}
private class CacheItemComparer : IComparer<CacheItem>
{
private readonly IComparer<TKey> cacheKeyComparer;
public CacheItemComparer(IComparer<TKey> cacheKeyComparer)
{
this.cacheKeyComparer = cacheKeyComparer;
if (cacheKeyComparer == null)
this.cacheKeyComparer = Comparer<TKey>.Default;
}
public int Compare(CacheItem x, CacheItem y)
{
int UseCount = x.UseCount - y.UseCount;
if (UseCount != 0) return UseCount;
return cacheKeyComparer.Compare(x.Key, y.Key);
}
}
public LFUCache(int capacity, IEqualityComparer<TKey> keyEqualityComparer,
IComparer<TKey> keyComparer) // <- here's my problem
{
// ...
entries = new Dictionary<TKey, CacheItem>(keyEqualityComparer);
lfuList = new SortedSet<CacheItem>(new CacheItemComparer(keyComparer));
}
// ...
}
The keyEqualityComparer is used to manage cache entries (so e.g. the key "ABC" and "abc" are equal if user wants to). The keyComparer is used to manage cache entries sorted by UseCount so that it's easy to select the least frequently used one (implemented in CacheItemComparer class).
Example correct usage with custom comparison:
var cache = new LFUCache<string, int>(10000,
StringComparer.InvariantCultureIgnoreCase,
StringComparer.InvariantCultureIgnoreCase);
(That looks stupid, but StringComparer implements both IComparer<string> and IEqualityComparer<string>.) The problem is that if the user passes incompatible comparers (i.e. a case-insensitive keyEqualityComparer and a case-sensitive keyComparer), then the most likely outcome is invalid LFU statistics, and thus a lower cache hit rate at best. The other scenario is also less than desirable. Also, if the key is more sophisticated (I'll have something resembling Tuple<string,DateTime,DateTime>), it's possible to mess it up more severely.
That's why I'd like to have only a single comparer argument in the constructor, but that doesn't seem to work. I'm able to implement IEqualityComparer<T>.Equals() with the help of IComparer<T>.Compare(), but I'm stuck at IEqualityComparer<T>.GetHashCode(), which is very important, as you know. If I had access to the private properties of the comparer to check whether it's case sensitive or not, I would have used CompareInfo to get the hash code.
I like this approach with 2 different data structures, because it gives me acceptable performance and controllable memory consumption -- on my laptop around 500.000 cache additions/sec with cache size 10.000 elements. Dictionary<TKey,TValue> is just used to find data in O(1), and SortedSet<CacheItem> inserts data in O(log n), find element to remove by calling lfuList.Min in O(log n), and find the entry to increment use count also in O(log n).
Any suggestions on how to solve this are welcome. I'll appreciate any ideas, including different designs.
It's not possible to implement an IComparer from an IEqualityComparer as you have no way of knowing whether an unequal item is greater than or less than the other item.
It's not possible to implement an IEqualityComparer from an IComparer as there's no way for you to generate a hash code that is in line with the IComparer's identity.
That said, there's no need for you to have both types of comparers in your case. When computing LFU you're comparing the use count of an item as the primary comparison and then comparing based on a passed-in comparer as a tiebreaker. Just remove that last part; don't have a tiebreaker. Let it be undefined which item leaves the cache when there is a tie for the least frequently used. When you do that you only need to accept an IEqualityComparer, not an IComparer.
As I alluded to in my comment, you could add a helper method that might make things a little simpler for a basic use case:
public class LFUCache<TKey,TValue>
{
public static LFUCache<TKey, TValue> Create<TComp>(int capacity, TComp comparer) where TComp : IEqualityComparer<TKey>, IComparer<TKey>
{
return new LFUCache<TKey, TValue>(capacity, comparer, comparer);
}
}
and you'd use it like this:
var cache = LFUCache<string, int>.Create(10000, StringComparer.InvariantCultureIgnoreCase);
Okay next try. Here is an implementation for Add and Touch for LFU:
public class LfuCache<TKey, TValue>
{
private readonly Dictionary<TKey, LfuItem> _items;
private readonly int _limit;
private LfuItem _first, _last;
public LfuCache(int limit, IEqualityComparer<TKey> keyComparer = null)
{
this._limit = limit;
this._items = new Dictionary<TKey,LfuItem>(keyComparer);
}
public void Add(TKey key, TValue value)
{
if (this._items.Count == this._limit)
{
this.RemoveLast();
}
var lfuItem = new LfuItem { Key = key, Value = value, Prev = this._last };
this._items.Add(key, lfuItem);
if (this._last != null)
{
this._last.Next = lfuItem;
lfuItem.Prev = this._last;
}
this._last = lfuItem;
if (this._first == null)
{
this._first = lfuItem;
}
}
public TValue this[TKey key]
{
get
{
var lfuItem = this._items[key];
++lfuItem.UseCount;
this.TryMoveUp(lfuItem);
return lfuItem.Value;
}
}
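private void TryMoveUp(LfuItem lfuItem)
{
var prev = lfuItem.Prev;
if (prev == null || prev.UseCount >= lfuItem.UseCount) // maybe > if you want LRU and LFU
{
return;
}
// Swap lfuItem with its predecessor and relink both outer neighbours.
var before = prev.Prev;
var after = lfuItem.Next;
lfuItem.Prev = before;
lfuItem.Next = prev;
prev.Prev = lfuItem;
prev.Next = after;
if (before != null) before.Next = lfuItem; else this._first = lfuItem;
if (after != null) after.Prev = prev; else this._last = prev;
}
private void RemoveLast()
{
if (this._items.Remove(this._last.Key))
{
this._last = this._last.Prev;
if (this._last != null)
{
this._last.Next = null;
}
else
{
this._first = null; // the list is now empty
}
}
}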
private class LfuItem
{
public TKey Key { get; set; }
public TValue Value { get; set; }
public long UseCount { get; set; }
public LfuItem Prev { get; set; }
public LfuItem Next { get; set; }
}
}
As far as I can tell, Add and Touch are both O(1), aren't they?
Currently I don't see any use case for _first, but maybe someone else needs it. To remove an item, _last should be enough.
EDIT
A singly linked list will also do if you don't need a MoveDown operation.
EDIT No, a singly linked list will not work, because MoveUp needs the Next pointer to change its Prev pointer.
Instead of taking an IEqualityComparer and an IComparer in your constructor, you could try taking an IComparer and a lambda that defines GetHashCode(). Then build an IEqualityComparer whose Equals() checks Compare(x, y) == 0 and whose GetHashCode() calls the lambda.
Although I would say it is small, you still run the risk of hash code mismatches when the IComparer returns 0. If you want to make it super clear to the user of your code, you could always extend the strategy by taking two lambdas in the constructor: a Func<T,T,int> used for both IComparer and IEqualityComparer, and a Func<T,int> for GetHashCode.
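A minimal sketch of that adapter might look as follows. The class names ComparerBackedEquality and ComparerBackedDemo are hypothetical, and the caller remains responsible for supplying a hash function that agrees with the comparer: whenever Compare returns 0, both hash codes must match.

```csharp
using System;
using System.Collections.Generic;

// Builds an IEqualityComparer<T> from an ordering plus a caller-supplied hash function.
public sealed class ComparerBackedEquality<T> : IEqualityComparer<T>
{
    private readonly IComparer<T> _order;
    private readonly Func<T, int> _hash;

    public ComparerBackedEquality(IComparer<T> order, Func<T, int> hash)
    {
        if (order == null) throw new ArgumentNullException("order");
        if (hash == null) throw new ArgumentNullException("hash");
        _order = order;
        _hash = hash;
    }

    // Two keys are equal exactly when the ordering says they tie.
    public bool Equals(T x, T y) { return _order.Compare(x, y) == 0; }

    public int GetHashCode(T obj) { return _hash(obj); }
}

public static class ComparerBackedDemo
{
    // Uses a case-insensitive ordering with a matching case-insensitive hash.
    public static bool Run()
    {
        var equality = new ComparerBackedEquality<string>(
            StringComparer.OrdinalIgnoreCase,
            s => StringComparer.OrdinalIgnoreCase.GetHashCode(s));
        var entries = new Dictionary<string, int>(equality);
        entries["ABC"] = 1;
        return entries.ContainsKey("abc");
    }
}
```

The same IComparer instance can then be handed to the SortedSet, so both data structures agree on when two keys are the same.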

Accessor for formatted sub-list of dictionary possible without creating a new object every time?

first off - yes, I had a look at this question: Is object creation in getters bad practice?.
I am also not talking about initializing an object in the accessors / mutators, it is about a specific part of the object I want to be returned in a specific way.
My question is more specific; It does not necessarily only apply to C#, however I am currently looking for a solution to implement in my C# project.
I have a class with a dictionary that maps date objects to a decimal value. In one accessor, I want to return a list of all the keys of the dictionary, another accessors returns the values.
What I also want to have is an accessor that gives me the decimal values in a specific format. It would look something like this:
class Class
{
// Some other properties...
// ....
private Dictionary<DateTime, decimal> dict;
public Class(Dictionary<DateTime, decimal> dict)
{
this.dict = dict;
}
private string FormatTheWayIWant(decimal dt)
{
// Format decimal value.
string s = String.Format("{0:F}", dt);
return s;
}
public ReadOnlyCollection<DateTime> DateTimes
{
get { return new ReadOnlyCollection<DateTime>(this.dict.Keys.ToList()); }
}
public ReadOnlyCollection<decimal> Values
{
get { return new ReadOnlyCollection<decimal>(this.dict.Values.ToList()); }
}
public ReadOnlyCollection<string> FormattedStrings
{
get
{
// Format each decimal value they way I want.
List<string> list = new List<string>();
foreach (decimal dt in dict.Values) // the decimals are the values; the keys are DateTime
{
list.Add(FormatTheWayIWant(dt));
}
return new ReadOnlyCollection<string>(list);
}
}
}
This way I can make the following calls (which is my goal!):
DateTime dateTime = DateTimes[0];
decimal s = Values[0];
string formattedS = FormattedStrings[0];
The problem with this approach is that I create a new list every time I invoke the FormattedStrings accessor, even if I only need one of the formatted strings. I know this is not good practice and can introduce unnecessary performance issues...
The alternatives I thought of are:
I could extend the decimal class and implement a custom ToString()-method.
Or overwrite the KeyValuePair<DateTime, decimal> class and use an indexer in my class.
Or I create a method with a parameter for the index and return just the one formatted string.
Or I could keep a separate list for the accessor, which gets updated in the set method every time I update the dictionary.
The question I have is, is there a way to make this work with an accessor instead of a method, creating custom classes or having strange side effects on other objects when assigning a value?
Thank you in advance.
Of course this can be done with an accessor. You just have to create a separate class for each of the 3 processed collections you want to expose. Those classes should have their own indexers, so you can access the elements as you would in a list. The difference is that they compute each element on demand (which is called lazy evaluation). So it would go like this (example for your FormattedStrings):
class Class
{
// ...
public MyFormattedStringsIndexer FormattedStrings
{
get { return new MyFormattedStringsIndexer(this.dict.Values.ToList()); }
}
}
class MyFormattedStringsIndexer
{
private readonly IList<decimal> list; // only the reference is stored here
public MyFormattedStringsIndexer(IList<decimal> list)
{
this.list = list;
}
// the indexer (read-only):
public string this[int i]
{
get
{
// this is where the lazy stuff happens:
// format only the requested element and return it
return String.Format("{0:F}", list[i]);
}
}
}
Now you can use your Class like this:
string formattedS = FormattedStrings[5];
and each element will be computed when you access it. This solution also has the advantage of separating concerns, so should you ever have to implement different logic for one of your 3 accessors, it would just be a matter of extending one of the indexers.
You can read more about indexers here: http://msdn.microsoft.com/en-us/library/6x16t2tx.aspx
This is VB, but you get the idea...
Public Class Something
Public Property Items As Dictionary(Of DateTime, String)
Public ReadOnly Property FormattedItem(ByVal index As Int32) As String
Get
' add error checking/handling as appropriate
Return Me.Items.Keys(index).ToString("custom format") ' or whatever your formatting function looks like.
End Get
End Property
End Class
It looks like a good candidate for a new class
public class MyObject
{
public DateTime Key {get;set;}
public String Name {get;set;}
public String FormattedString {get;}
}
And then it can be used in any container (List<MyObject>, Dictionary<MyObject>, etc).
Your Dates and Strings property getters are returning a new list on each call. Therefore if a caller does the following:
Class myClass = ...
for(i=0; i<myClass.Strings.Count; i++)
{
var s = myClass.Strings[i];
...
}
then each iteration of the loop will create a new list.
I'm not clear on what you're really trying to achieve here. You are wrapping the dictionary's Keys and Values properties in ReadOnlyCollections. This gives you an indexer, which doesn't have much meaning as the order of the Keys in a Dictionary<TKey, TValue> is unspecified.
Coming (at last!) to your question, if you want to do the formatting in a "lazy" manner, you could create a custom class that implements a read-only IList<string> and wraps your list of values (IList<decimal>). Most of the implementation is boilerplate, and your indexer will do the formatting. You could also cache the formatted values so that each one is formatted only once even if it is accessed multiple times. Something like:
public class MyFormattingCollection : IList<string>
{
private IList<decimal> _values;
private IList<string> _formattedValues;
public MyFormattingCollection(IList<decimal> values)
{
_values = values;
_formattedValues = new string[_values.Count];
}
public string this[int index]
{
get
{
var result = _formattedValues[index];
if (result == null)
{
result = FormatTheWayIWant(_values[index]);
_formattedValues[index] = result;
}
return result;
}
set
{
throw new NotSupportedException(); // it's a read-only collection
}
}
// Boilerplate implementation of readonly IList<string> ...
}
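If a full IList<string> is more than you need, a smaller variant can implement IReadOnlyList<string>, which asks only for Count, an indexer and an enumerator. The following is a sketch under assumptions: the class name LazyFormattedList is hypothetical, and the fixed "F" format with the invariant culture stands in for FormatTheWayIWant.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Read-only view that formats each decimal on first access and caches the result.
public sealed class LazyFormattedList : IReadOnlyList<string>
{
    private readonly IReadOnlyList<decimal> _values;
    private readonly string[] _cache;

    public LazyFormattedList(IReadOnlyList<decimal> values)
    {
        _values = values;
        _cache = new string[values.Count];
    }

    public int Count { get { return _values.Count; } }

    public string this[int index]
    {
        get
        {
            // Format only the requested element, and only once.
            if (_cache[index] == null)
            {
                _cache[index] = _values[index].ToString("F", CultureInfo.InvariantCulture);
            }
            return _cache[index];
        }
    }

    public IEnumerator<string> GetEnumerator()
    {
        for (int i = 0; i < Count; i++) yield return this[i];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Usage would look like new LazyFormattedList(new List<decimal> { 1.5m, 2m })[0], which formats just that one element. The cache assumes the wrapped list does not change after the wrapper is created.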

Sorting Hashtable by Order in Which It Was Created

This is similar to How to keep the order of elements in hashtable, except for .NET.
Is there any Hashtable or Dictionary in .NET that allows you to access its .Index property for the entry in the order in which it was added to the collection?
A NameValueCollection can retrieve elements by index (but you cannot ask for the index of a specific key or element). So,
var coll = new NameValueCollection();
coll.Add("Z", "1");
coll.Add("A", "2");
Console.WriteLine("{0} = {1}", coll.GetKey(0), coll[0]); // prints "Z = 1"
However, it behaves oddly (compared to an IDictionary) when you add a key multiple times:
var coll = new NameValueCollection();
coll.Add("Z", "1");
coll.Add("A", "2");
coll.Add("Z", "3");
Console.WriteLine(coll[0]); // prints "1,3"
The behaviour is well documented, however.
Caution: NameValueCollection does not implement IDictionary.
As an aside: Dictionary<K,V> does not have any index you can use, but as long as you only add elements, and never remove any, the order of the elements is the insertion order. Note that this is a detail of Microsoft's current implementation: the documentation explicitly states that the order is undefined, so this behavior can change in future versions of the .NET Framework or Mono.
If this is something that you need to keep track of efficiently, then you are using the wrong data structure. Instead, you should use a SortedDictionary where the key is tagged with the index of when it was added (or a timestamp) and a custom IComparer that compares two keys based on the index (or the timestamp).
You can use a separate list to store the elements in the order they are added. Something along the lines of the following sample:
public class ListedDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
List<TValue> _list = new List<TValue>();
Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey,TValue>();
public IEnumerable<TValue> ListedValues
{
get { return _list; }
}
public void Add(TKey key, TValue value)
{
_dictionary.Add(key, value);
_list.Add(value);
}
public bool ContainsKey(TKey key)
{
return _dictionary.ContainsKey(key);
}
public ICollection<TKey> Keys { get { return _dictionary.Keys; } }
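public bool Remove(TKey key)
{
TValue value;
if (!_dictionary.TryGetValue(key, out value)) return false; // unknown key: nothing to remove
_list.Remove(value); // removes the first equal value; O(n)
return _dictionary.Remove(key);
}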
// further interface methods...
}
Is there any Hashtable or Dictionary in .NET that allows you to access its .Index property for the entry in the order in which it was added to the collection?
No. You can enumerate over all the items in a Hashtable or Dictionary, but these are not guaranteed to be in any sort of order (most likely they are not).
You would have to either use a different data structure altogether (such as SortedDictionary or SortedList) or use a separate list to store the order in which they were added. You would want to wrap the ordered list and your dictionary/hashtable in another class to keep them in sync.
Take a look at the OrderedDictionary class. Not only can you access it via keys, but also via an index (position).
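A short sketch of that class in use, assuming the non-generic System.Collections.Specialized.OrderedDictionary; the OrderedDemo wrapper and its method names are hypothetical. Entries keep their insertion order and can be read either by key or by position.

```csharp
using System.Collections.Specialized;

public static class OrderedDemo
{
    private static OrderedDictionary Build()
    {
        var entries = new OrderedDictionary();
        entries.Add("Z", "1");
        entries.Add("A", "2");
        return entries;
    }

    // Reads by position: index 0 is the entry that was added first.
    public static string FirstValue()
    {
        return (string)Build()[0];
    }

    // Reads by key, like a regular dictionary.
    public static string ValueForA()
    {
        return (string)Build()["A"];
    }
}
```

Both indexers return object, hence the casts. If the keys themselves are integers, the int indexer takes precedence and key lookups need an explicit cast to object.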
An alternative is to create a list of structures, so instead of using
dictionary.Add("key1", "value1");
you create a structure with the key/value like:
public struct myStruct{
private string _sKey;
public string sKey{
get { return _sKey; }
set { _sKey = value; }
}
private string _sValue;
public string sValue {
get { return _sValue; }
set { _sValue = value; }
}
}
// create list here
List<myStruct> myList = new List<myStruct>();
// create an instance of the structure to add to the list
myStruct item = new myStruct();
item.sKey = "key1";
item.sValue = "value1";
// then add the structure to the list
myList.Add(item);
Using this method you can add extra dimensions to the list without too much effort, just add a new member in the struct.
Note, if you need to modify items in the list after they have been added you will have to change the struct into a class. See this page for more info on this issue: error changing value of structure in a list
