Get a key equal to an item from SortedDictionary? - c#

Is there any way to retrieve a key from a SortedDictionary that is equal to a given object? To illustrate, let's say I create a dictionary that has a fairly memory-heavy, immutable key type:
var dictionary = new SortedDictionary<MyHugeType, int>();
var myEnormousKey = new MyHugeType();
dictionary[myEnormousKey] = 123;
Then later on, I do something like this:
// This is a new instance, but it's identical to the previous key
var myIdenticalKey = new MyHugeType();
if(dictionary.ContainsKey(myIdenticalKey)) {
myIdenticalKey = dictionary.GetKeyEqualTo(myIdenticalKey);
}
// Use myIdenticalKey reference...
Obviously, SortedDictionary does not have a "GetKeyEqualTo" method. But is there some way I could achieve a similar effect? This would basically have the effect of intern-ing the heavy key objects so that identical instances could be discarded. I know I can do this using the SortedList class by retrieving the key's index and subsequently its matching object instance, but SortedDictionary's consistent insertion performance would be better for my uses.
Short of iterating through all the dictionary's keys to search for a match, or writing my own BST class, is there any way to achieve this end with .NET's built in collections?

You could change your value object from int to a struct or class containing both the value and the original key. Then to access the original key you can do:
dictionary[myIdenticalKey].OriginalKey
and for the value something like:
dictionary[myIdenticalKey].Value
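A minimal sketch of this suggestion, under some assumptions: HugeKey is a hypothetical stand-in for MyHugeType (compared by a single string payload), and Entry and KeyInterning.Intern are illustrative names that are not part of .NET. The value stored in the dictionary carries a reference to the key instance that was inserted first, so an equal key can be swapped for the stored one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MyHugeType; keys are ordered by their payload.
public sealed class HugeKey : IComparable<HugeKey>
{
    public string Payload { get; }
    public HugeKey(string payload) { Payload = payload; }

    public int CompareTo(HugeKey other)
    {
        return string.CompareOrdinal(Payload, other?.Payload);
    }
}

// Value wrapper that remembers the key instance it was stored under.
public sealed class Entry<TKey, TValue>
{
    public TKey OriginalKey { get; }
    public TValue Value { get; }

    public Entry(TKey originalKey, TValue value)
    {
        OriginalKey = originalKey;
        Value = value;
    }
}

public static class KeyInterning
{
    // Returns the stored key instance if an equal key exists, otherwise the candidate itself.
    public static TKey Intern<TKey, TValue>(
        SortedDictionary<TKey, Entry<TKey, TValue>> dictionary, TKey candidate)
    {
        Entry<TKey, TValue> entry;
        return dictionary.TryGetValue(candidate, out entry) ? entry.OriginalKey : candidate;
    }
}
```

The cost is one extra reference per entry, and the lookup stays a single TryGetValue call rather than ContainsKey followed by the indexer.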

If you override Equals() and GetHashCode() in MyHugeType with code that determines if two instances are the same, then you won't get duplicate keys in the dictionary. Is this what you mean?

You could implement the IEquatable interface in your key class. There you specify when two objects of the class are equal to each other. After that you simply test the existence of an entry using ContainsKey and when that returns true you can obtain it using the [] operator.
You could also provide an IComparer implementation to achieve the same result.

Related

C# .NET 4.5 how to get list of unique objects based on `GetHashCode`

I have an IEnumerable of objects that override the GetHashCode method. I assumed that if I added those objects to a HashSet<T>, it would hold only the unique objects. But it doesn't:
var set = new HashSet<SomeObject>();
Count = 0
set.Add(first);
true
set.Add(second);
true
set.Count
2
first.GetHashCode()
-927637658
second.GetHashCode()
-927637658
So how can I reduce my IEnumerable of objects to those that are unique based on their GetHashCode() value?
Although I don't know if this helps in any way:
public class SomeObject
{
...
public string GetAggregateKey()
{
var json = ToJson();
json.Property("id").Remove();
return json.ToString(); // without the `id`, the json string of two separate objects with same content could be the same
}
override public int GetHashCode()
{
// two equal strings have same hash code
return GetAggregateKey().GetHashCode();
}
...
}
It is not enough to only have a GetHashCode method.
The GetHashCode method is used to quickly figure out if there are potential candidates already in the hashset (or dictionary):
If no existing object in the hashset has the same hash code, the new one is not a duplicate
If any existing object(s) in the hashset has the same hash code, the new one is a potential duplicate
To figure out if it is just a potential duplicate or an actual duplicate, Equals is used.
If you haven't implemented that then the object.Equals method will be used, which is simply comparing references. Two distinct objects will thus never be equal, even though they may both have the same property values and the same hash code.
The solution: Implement Equals with the same rules as GetHashCode, or provide an IEqualityComparer<T> implementation to your hashset.
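As a rough illustration of the comparer option, here is a sketch that assumes a simplified, hypothetical Item class in place of SomeObject, with GetAggregateKey() reduced to returning a plain string instead of JSON. AggregateKeyComparer is an illustrative name, not an existing type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SomeObject: Id is ignored when comparing content.
public sealed class Item
{
    public int Id { get; }
    public string Content { get; }

    public Item(int id, string content)
    {
        Id = id;
        Content = content;
    }

    // Simplified version of the question's GetAggregateKey().
    public string GetAggregateKey()
    {
        return Content;
    }
}

// Equals and GetHashCode follow the same rule: compare the aggregate key only.
public sealed class AggregateKeyComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.GetAggregateKey() == y.GetAggregateKey();
    }

    public int GetHashCode(Item obj)
    {
        return obj.GetAggregateKey().GetHashCode();
    }
}
```

Passing the comparer to the constructor, as in new HashSet<Item>(new AggregateKeyComparer()), makes the set treat two items with the same aggregate key as duplicates, so the second Add returns false.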
Have a look at the Reference Source for HashSet:
This line (960, and those around it) is what you're looking for:
if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, value))
The hash of the object is only used to decide which bucket the object goes into. If Equals returns false for the two objects, the new one will still be inserted.

make generic types in code variable

I want to create a container class which works like a kind of Dictionary, but with multiple keys and the possibility to assign several values to one key. The number of possible keys should be variable. I want to do that by creating, for every key type in inputTypes, a new Dictionary which contains the key and the index of the value in the values list.
class SampleContainer<Tvalue>
{
public SampleContainer(params Type[]inputTypes)
{
foreach(Type t in inputTypes)
{
ls.Add(new Dictionary(t,int));//won't compile
}
values=new List<Tvalue>();
}
List<Dictionary< ???,int>> ls;/*object as ??? doesn't work,
what to fill in to keep it "generic"*/
List<Tvalue>values;
}
Try
List<IDictionary> ls = new List<IDictionary>();
ls.Add(typeof(Dictionary<,>).MakeGenericType(typeof(int), typeof(bool))
.GetConstructor(Type.EmptyTypes)
.Invoke(null) as IDictionary);
There are side effects, though: the non-generic IDictionary interface works with object keys and values. It may also cause boxing when you store or retrieve a value (because the interface deals in object types).
Maybe somebody can think of a really neat way to do it and keep the strong typing, but this is the closest I can think to come to what you are asking to do.
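Applied to the question's constructor, the same idea could look like the sketch below. DictionaryFactory and CreateIndexDictionary are hypothetical names, and Activator.CreateInstance is used here as an alternative to looking up the constructor explicitly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class DictionaryFactory
{
    // Builds a Dictionary<keyType, int> at run time and exposes it as a non-generic IDictionary.
    public static IDictionary CreateIndexDictionary(Type keyType)
    {
        Type closedType = typeof(Dictionary<,>).MakeGenericType(keyType, typeof(int));
        return (IDictionary)Activator.CreateInstance(closedType);
    }

    // One dictionary per requested key type, as in the question's constructor.
    public static List<IDictionary> CreateAll(params Type[] inputTypes)
    {
        var result = new List<IDictionary>();
        foreach (Type t in inputTypes)
        {
            result.Add(CreateIndexDictionary(t));
        }
        return result;
    }
}
```

The underlying dictionaries are still strongly typed, so adding a key of the wrong type through the IDictionary interface throws an ArgumentException at run time rather than failing at compile time.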

Sorting IDictionary Generic

I need to know if there is any way of ordering an IDictionary without knowing what type it is exactly ...
For example, I have a method that receives an object, and within it there is a Dictionary object ... all I know is that it is a Dictionary,
so I can do this:
public void MyMethod(PropertyInfo propriedadeParametro, object parameters){
IDictionary dictionary = ((IDictionary) propriedadeParametro.GetValue (parameters, null));
}
but I need to sort the items of this Dictionary by their EnumPersonalizado key, regardless of what the other type ("something?") is.
You can't sort a dictionary. A dictionary, by definition, doesn't have an "order" of the items within it. The items are stored in some mechanism that is beyond your control that is designed to make it as efficient as possible for adding, removing, and searching.
The best that you can do is take all of the items out of the dictionary, put them in some other sort of collection, and then sort that.
As to your particular case, it's not clear to us what the type of the key or the value in the dictionary is, and that would need to be known in order to be able to try to sort the data.
see this question.
Dictionaries by themselves don't have an index order. Consider inheriting from the KeyedCollection class instead. It's a merge of a dictionary and an ordinary list, and it's designed to use a member of your items as the key, and have an index order.
There are plenty of legitimate reasons to want to apply an ordering to dictionaries based on key. It isn't a fundamental quality that keys be unordered, only that a given key will yield a given value.
That being said, if you find yourself with a non-generic IDictionary, it can actually be quite troublesome to 'sort' by key without knowledge of the key type. In my specific scenario, I wanted a function which would transform an IDictionary into another IDictionary where the entries could be enumerated by the ordered keys.
IDictionary ToSortedDictionary(IDictionary dictionary) {
return new SortedList(dictionary);
}
This will construct a new dictionary instance, such that traversals (foreach) will visit the entries based on the sort order of the keys.
The oddly named SortedList can be found in System.Collections and orders keys using the IComparable interface.
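A small usage sketch of the function above, with the using directives it needs. SortedDictionaryDemo and KeysInOrder are hypothetical names added for illustration. It assumes the keys implement IComparable, which enum types and the built-in numeric and string types do.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class SortedDictionaryDemo
{
    // Copies the entries into a non-generic SortedList, which keeps them ordered by key.
    public static IDictionary ToSortedDictionary(IDictionary dictionary)
    {
        return new SortedList(dictionary);
    }

    // Enumerates the sorted copy and collects the keys in traversal order.
    public static List<object> KeysInOrder(IDictionary dictionary)
    {
        var keys = new List<object>();
        foreach (DictionaryEntry entry in ToSortedDictionary(dictionary))
        {
            keys.Add(entry.Key);
        }
        return keys;
    }
}
```

If the keys do not implement IComparable, the SortedList constructor throws while copying, so a SortedList overload that takes an IComparer would be needed in that case.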
IDictionary is IEnumerable, so you can try to do something like new ArrayList(dictionary).Sort(), but it will try to cast members to IComparable, or you can use a Sort overload which accepts an IComparer object.
Another way is to use reflection: first you find the actual key and value types, and then you create a call to the generic OrderBy.

C# unique index generic collection

I need a collection that exposes the [] operator, contains only unique objects, and is generic. Can anyone help?
Dictionary(Of TKey, TValue) Class represents a collection of keys and values.
HashSet<T>
It depends what you mean by "exposes the [] operator."
If you want to be able to access objects in a unique collection by some arbitrary key, then use a Dictionary<string key, object value>.
If you want to be able to create a list of unique objects which permits access by an ordinal index, in the order in which objects were added, you will need to roll something of your own. I am not aware of any framework class that offers both uniqueness like a HashSet<T> and also allows access to objects in the order in which they were added, like a List<T>. SortedSet<T> almost does it, but does not have indexer access, so while it does maintain order, it does not allow access using that order except through enumeration. You could use the LINQ extension method ElementAt to access the element at a particular ordinal index, but performance would be very bad since this method works by iteration.
You could use also Dictionary<int key, object value> but you will still have to maintain the index yourself, and if anything is ever removed, you'd have a hole in your list. This would be a good solution if you never had to remove elements.
To have both uniqueness and access by index, and also be able to remove elements, you need a combination of a hash table and an ordered list. I created such a class recently. I don't think this is necessarily the most efficient implementation since it does its work by keeping two copies of the lists (one as a List<T> and one as a HashSet<T>).
In my situation, I valued speed over storage efficiency, since the amount of data wasn't large. This class offers the speed of a List<T> for indexed access and the speed of a HashSet<T> for element access (e.g. ensuring uniqueness when adding) at the expense of twice the storage requirements.
An alternative would be to use just a List<T> as your basis, and verify uniqueness before any add/insert operation. This would be more memory efficient, but much slower for add/insert operations because it doesn't take advantage of a hash table.
Here's the class I used.
http://snipt.org/xlRl
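The following is not the linked class, only a minimal sketch of the approach described above: a List<T> for ordinal access paired with a HashSet<T> for uniqueness checks. UniqueList is a hypothetical name.

```csharp
using System.Collections.Generic;

// Keeps items unique (via HashSet<T>) and indexable in insertion order (via List<T>).
public class UniqueList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly HashSet<T> _set = new HashSet<T>();

    public int Count
    {
        get { return _items.Count; }
    }

    // Indexed access in insertion order.
    public T this[int index]
    {
        get { return _items[index]; }
    }

    // Returns false and leaves the collection unchanged if the item is already present.
    public bool Add(T item)
    {
        if (!_set.Add(item)) return false;
        _items.Add(item);
        return true;
    }

    // The hash set removal is fast; the list removal is a linear scan.
    public bool Remove(T item)
    {
        if (!_set.Remove(item)) return false;
        _items.Remove(item);
        return true;
    }

    public bool Contains(T item)
    {
        return _set.Contains(item);
    }
}
```

Removing an element shifts the indexes of everything after it, which is the usual List<T> behavior and avoids the holes mentioned for the Dictionary<int, ...> variant.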
The HashSet class should do the trick. See HashSet(Of T) for more information. If you need them to maintain a sorted order, the SortedSet should do the trick. See SortedSet(Of T) for more information about that class.
If you're looking to store unique objects (entities, for example) while exposing a [], then you want to use the KeyedCollection class.
MSDN KeyedCollection
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
// This class represents a very simple keyed list of OrderItems,
// inheriting most of its behavior from the KeyedCollection and
// Collection classes. The immediate base class is the constructed
// type KeyedCollection<int, OrderItem>. When you inherit
// from KeyedCollection, the second generic type argument is the
// type that you want to store in the collection -- in this case
// OrderItem. The first type argument is the type that you want
// to use as a key. Its values must be calculated from OrderItem;
// in this case it is the int field PartNumber, so SimpleOrder
// inherits KeyedCollection<int, OrderItem>.
//
public class SimpleOrder : KeyedCollection<int, OrderItem>
{
// The parameterless constructor of the base class creates a
// KeyedCollection with an internal dictionary. For this code
// example, no other constructors are exposed.
//
public SimpleOrder() : base() {}
// This is the only method that absolutely must be overridden,
// because without it the KeyedCollection cannot extract the
// keys from the items. The input parameter type is the
// second generic type argument, in this case OrderItem, and
// the return value type is the first generic type argument,
// in this case int.
//
protected override int GetKeyForItem(OrderItem item)
{
// In this example, the key is the part number.
return item.PartNumber;
}
}

When do we do GetHashCode() for a Dictionary?

I have used Dictionary(TKey, TValue) for many purposes. But I haven't encountered any scenario where I had to implement GetHashCode(), which I believe is because my keys were of built-in types like int and string.
I am curious to know the scenarios (real world examples) when one should use a custom object for a key and thus implement the methods GetHashCode(), Equals(), etc.
And, does using a custom object for key necessitate implementing these functions?
You should override Equals and GetHashCode whenever the default Object.Equals (tests for reference equality) will not suffice. This happens, for example, when the type of your key is a custom type and you want two keys to be considered equal even in cases when they are not the same instance of the custom type.
For example, if your key is as simple as
class Point {
public int X { get; set; }
public int Y { get; set; }
}
and you want two Points to be considered equal if their Xs are equal and their Ys are equal, then you will need to override Equals and GetHashCode.
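One possible way to write those overrides for Point is sketched below. The hash combination (multiply by 397, then XOR) is just one common choice and is an assumption here, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;

public sealed class Point : IEquatable<Point>
{
    public int X { get; set; }
    public int Y { get; set; }

    // Two points are equal when both coordinates match.
    public bool Equals(Point other)
    {
        return !(other is null) && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Point);
    }

    // Equal points must produce equal hash codes, so the hash uses the same fields as Equals.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}
```

With these overrides, a lookup such as dictionary.ContainsKey(new Point { X = 1, Y = 2 }) finds an entry stored under a different Point instance with the same coordinates. Because X and Y have setters, changing them while the point is in use as a key would make the entry unreachable, so keys should be left unmodified once added.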
Just to make it clear: There is one important thing about Dictionary<TKey, TValue> and GetHashCode(): Dictionary uses GetHashCode to determine if two keys are equal i.e. if <TKey> is of custom type you should care about implementing GetHashCode() carefully. As Andrew Hare pointed out this is easy, if you have a simple type that identifies your custom object unambiguously. In case you have a combined identifier, it gets a little more complicated.
As example consider a complex number as TKey. A complex number is determined by its real and its imaginary part. Both are of simple type e.g. double. But how would you identify if two complex numbers are equal? You implement GetHashCode() for your custom complex type and combine both identifying parts.
You find further reading on the latter here.
UPDATE
Based on Ergwun's comment I checked the behavior of Dictionary<TKey, TValue>.Add with special respect to TKey's implementation of Equals(object) and GetHashCode(). I must confess that I was rather surprised by the results.
Given two objects k1 and k2 of type TKey, two arbitrary objects v1 and v2 of type TValue, and an empty dictionary d of type Dictionary<TKey, TValue>, this is what happens when adding v1 with key k1 to d first and v2 with key k2 second (depending on the implementation of TKey.Equals(object) and TKey.GetHashCode()):
k1.Equals(k2)   k1.GetHashCode() == k2.GetHashCode()   d.Add(k2, v2)
false           false                                  ok
false           true                                   ok
true            false                                  ok
true            true                                   System.ArgumentException
Conclusion: I was wrong. I originally thought the second case (where Equals returns false but both key objects have the same hash code) would raise an ArgumentException. But as the third case shows, the dictionary does use GetHashCode() in some way. In any case, it is good advice that two objects of the same type that are equal must return the same hash code, to ensure that instances of Dictionary<TKey, TValue> work correctly.
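The four cases can be reproduced with a sketch like the following. TestKey and AddExperiment are hypothetical names. The key type lets equality and hash code be chosen independently, which is deliberately inconsistent and only useful for this kind of experiment.

```csharp
using System;
using System.Collections.Generic;

// Test-only key: equality is decided by equalityGroup, the hash code is set separately.
public sealed class TestKey
{
    private readonly int _equalityGroup;
    private readonly int _hash;

    public TestKey(int equalityGroup, int hash)
    {
        _equalityGroup = equalityGroup;
        _hash = hash;
    }

    public override bool Equals(object obj)
    {
        var other = obj as TestKey;
        return other != null && other._equalityGroup == _equalityGroup;
    }

    public override int GetHashCode()
    {
        return _hash;
    }
}

public static class AddExperiment
{
    // Adds k1, then k2, and reports whether the second Add went through.
    public static bool SecondAddSucceeds(TestKey k1, TestKey k2)
    {
        var d = new Dictionary<TestKey, string>();
        d.Add(k1, "v1");
        try
        {
            d.Add(k2, "v2");
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

For example, SecondAddSucceeds(new TestKey(1, 1), new TestKey(1, 2)) corresponds to the third row (Equals is true, hash codes differ) and the second Add is accepted, while two TestKey(1, 1) instances correspond to the fourth row and trigger the exception.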
You have two questions here.
1. When do you need to implement GetHashCode()?
2. Would you ever use an object for a dictionary key?
Let's start with 1. If you are writing a class that might possibly be used by someone else, you will want to define GetHashCode() and Equals() when reference equality is not enough. If you're not planning on using it in a dictionary, and it's for your own usage, then I see no reason not to skip GetHashCode() etc.
For 2), you should use an object anytime you have a need to have a constant time lookup from an object to some other type. Since GetHashCode() returns a numeric value, and collections store references, there is no penalty for using an Object over an Int or a string (remember a string is an object).
One example is when you need to create a composite key (that is, a key comprised of more than one piece of data). That composite key would be a custom type that would need to override those methods.
For example, let's say that you had an in-memory cache of address records and you wanted to check to see if an address was in cache to save an expensive trip to the database to retrieve it. Let's also say that addresses are unique in terms of their street 1 and zip code fields. You would implement your cache with something like this:
class AddressCacheKey
{
public String StreetOne { get; set; }
public String ZipCode { get; set; }
// overrides for Equals and GetHashCode
}
and
static Dictionary<AddressCacheKey,Address> cache;
Since your AddressCacheKey type overrides the Equals and GetHashCode methods, it would be a good candidate for a key in the dictionary, and you would be able to determine whether or not you needed to take a trip to the database to retrieve a record based on more than one piece of data.
