Referencing an object using its hashcode? - c#

I have created an object, say details.
I then assign: int x = details.GetHashCode();
Later in the program, I would like to access this object using the integer x. Is there a way to do this in C#?
Many thanks
Paul

No:
It may have been garbage collected, unless you've got something in place to stop that.
Hash codes aren't unique - what if there are two objects with the same hash code? (See Eric Lippert's post about hash codes for more information.)
You could create (say) a Dictionary<int, Details> and use the hash code as the key - but I'd strongly recommend that you didn't do that.
Any reason you don't want to just keep a reference to the object instead of the hash code?
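Here is a concrete version of the collision problem. This is a minimal sketch that assumes a hypothetical `Details` class whose `GetHashCode` override XORs two fields, which is a common but weak choice. Keyed by hash code, two different objects end up under the same `int`, and the second one silently replaces the first:

```csharp
using System.Collections.Generic;

// Hypothetical Details type with a deliberately weak hash: X ^ Y.
public sealed class Details
{
    public int X { get; }
    public int Y { get; }
    public Details(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) => obj is Details d && d.X == X && d.Y == Y;
    public override int GetHashCode() => X ^ Y;
}

public static class HashKeyDemo
{
    // Stores objects keyed by their hash code, as the question proposes.
    public static Dictionary<int, Details> StoreByHash(params Details[] items)
    {
        var byHash = new Dictionary<int, Details>();
        foreach (var item in items)
        {
            byHash[item.GetHashCode()] = item; // a colliding item overwrites the earlier one
        }
        return byHash;
    }
}
```

`new Details(1, 2)` and `new Details(2, 1)` both hash to 3. Looking up the first object's hash code therefore returns the second object. Keeping the reference itself avoids the problem.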

No.
A hash code is the value of some hashing function applied to the object.
You can't recreate a reference to the original object from it, and, more importantly, there is no guarantee that the object still exists.

If there were a way, it would go against object orientation. You should expose the details reference to the consuming code instead.

Yes, you could create a Dictionary<int, Details>, store the object in it using details.GetHashCode() as the key, and later pull the object out of the dictionary using x.
But it's not something I would suggest doing! What are you trying to achieve?

Store the integer and just check the hash code again.
Note that in C# the hash code is not guaranteed to be unique. If you are dealing with a few million objects, you can and will run into duplicate hashes with the default implementation very easily.
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx
"The default implementation of the GetHashCode method does not guarantee unique return values for different objects."

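using System.Collections.Generic;
public static class DetailExtensions
{
    // Linear search: returns the first element whose hash code equals x, or null if none matches.
    public static Detail GetDetailsFromHash(this List<Detail> detailsList, int x)
    {
        foreach (var details in detailsList)
        {
            if (details.GetHashCode() == x)
            {
                return details;
            }
        }
        return null;
    }
}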
However, hash codes are not guaranteed to be unique, so this can return a different object that happens to have the same hash code.

A hash code isn't really meant to be consumed directly. Its main purpose is to let the item be used as a key in a hash-based collection. It is then the collection's responsibility to map the hash back to the object, using Equals to pick the right one among items that share a hash. Basically, it lets you do something like Dictionary<details, myClass2>, which would be much harder if GetHashCode weren't implemented... but the method isn't of much use on its own unless you are implementing your own collection or equality operator.
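The type and the collection split the work like this. This is a minimal sketch that assumes a hypothetical `Key` type. The type supplies only `Equals` and `GetHashCode`. `Dictionary<TKey, TValue>` uses them to find the entry even when the lookup passes a different, but equal, instance. `HashCode.Combine` assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: equality and hashing are both based on (Name, Version).
public sealed class Key : IEquatable<Key>
{
    public string Name { get; }
    public int Version { get; }
    public Key(string name, int version) { Name = name; Version = version; }

    public bool Equals(Key other) => other != null && other.Name == Name && other.Version == Version;
    public override bool Equals(object obj) => Equals(obj as Key);
    // Equal keys must produce equal hash codes; the dictionary relies on this.
    public override int GetHashCode() => HashCode.Combine(Name, Version);
}

public static class KeyDemo
{
    public static string Lookup()
    {
        var map = new Dictionary<Key, string> { [new Key("details", 1)] = "first" };
        // A different instance with the same field values finds the same entry.
        return map[new Key("details", 1)];
    }
}
```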

Related

Are hash codes of System.Type objects of types from the same assembly guaranteed to be unique?

Clarifying edit: The keys in the dictionary are actual instances of System.Type. More specifically every value is stored with its type as the key.
In a specific part of my program the usage of Dictionary<System.Type, SomeThing> takes a large chunk of CPU time, as per Visual Studio 2017 performance profiler.
Changing the type of the dictionary to Dictionary<int, SomeThing> and passing type.GetHashCode() instead of the type object itself seems to be about 20%-25% faster.
The above optimization will result in a nasty bug if two types have the same hash code, but it seems plausible to me that types can have unique hash codes, at least when it comes to types from the same assembly - which all the types used in this dictionary are.
Possibly relevant information - As per this answer the number of possible types in an assembly is far smaller than the number of values represented by System.Int32.
No. The documentation on object.GetHashCode() makes no guarantees, and states:
A hash code is intended for efficient insertion and lookup in collections that are based on a hash table. A hash code is not a permanent value. For this reason:
...
Do not use the hash code as the key to retrieve an object from a keyed collection.
Because equal hash codes are necessary, but not sufficient, for two objects to be equal.
If you're wondering if Type.GetHashCode() follows a more restrictive definition, its documentation makes no mention of such a change, so it still does not guarantee uniqueness. The reference source does not show any attempt to make this guarantee, either.
A hash code is never guaranteed to be unique for different values, so you should not use it the way you are doing.
The same value should however generate the same hashcode.
This is also stated in MSDN:
Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.
and somewhat further:
Do not use the hash code as the key to retrieve an object from a keyed collection.
Therefore, I would also not rely on GetHashCode being unique for different types, but at least you can verify it:
using System;
using System.Collections.Generic;

// Maps each hash code seen so far to the name of the first type that produced it.
var seen = new Dictionary<int, string>();
// Replace typeof(int) with a type from the assembly you want to inspect.
var types = typeof(int).Assembly.GetTypes();
Console.WriteLine($"Inspecting {types.Length} types...");
foreach (var t in types)
{
    int hash = t.GetHashCode();
    if (seen.TryGetValue(hash, out var existing))
    {
        Console.WriteLine($"{t.Name} has the same hashcode as {existing}");
    }
    else
    {
        seen.Add(hash, t.Name);
    }
}
Console.WriteLine("done!");
But even if the above test found no collisions, I wouldn't do it, since the implementation of GetHashCode can change over time, which means that collisions might be possible in the future.
A hash code isn't meant to be unique. Instead, it is used in hash-based collections such as Dictionary to limit the number of possible ambiguities. A hash code is essentially an index: instead of searching the entire collection for a match, only the few items that share a common value (the hash code) have to be considered.
In fact, you could even have a hash implementation that always returns the same number for every item. However, that leads to O(n) lookups for a key in your dictionary, as every key has to be compared.
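Here is that worst case in a small sketch. It assumes a hypothetical comparer that returns 0 for every string. The dictionary still gives correct answers, because every candidate in the single bucket is checked with `Equals`. Each lookup, however, degrades to a linear scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical worst-case comparer: every key lands in the same bucket.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string obj) => 0; // legal, but forces O(n) lookups
}

public static class ConstantHashDemo
{
    public static Dictionary<string, int> Build(int count)
    {
        var map = new Dictionary<string, int>(new ConstantHashComparer());
        for (int i = 0; i < count; i++)
        {
            map.Add("key" + i, i);
        }
        return map;
    }
}
```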
Anyway, you shouldn't strive for micro-optimizations that gain you a few nanoseconds at the expense of maintainability and understandability. You should instead use a data structure that gets the job done and is easy to understand.

Get original value from HashSet

UPDATE:
Starting with .NET 4.7.2, HashSet.TryGetValue is available (see its docs and the HashSet.TryGetValue SO post).
I have a problem with HashSet because it does not provide any method similar to the TryGetValue known from Dictionary. And I need such a method: I pass the element to find, and the set returns the matching element from its collection (when found).
Sidenote -- "why do you need element from the set, you already have that element?". No, I don't, equality and identity are two different things.
HashSet is not sealed but all its fields are private, so deriving from it is pointless. I cannot use Dictionary instead because I need SetEquals method. I was thinking about grabbing a source for HashSet and adding desired method, but the license is not truly open source (I can look, but I cannot distribute/modify). I could use reflection but the arrays in HashSet are not readonly meaning I cannot bind to those fields once per instance lifetime.
And I don't want to use a full-blown library for just a single class.
So far I am stuck with LINQ SingleOrDefault. So the question is how to fix this: how can I have a HashSet with TryGetValue?
Probably you should switch from a HashSet to a SortedSet
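Here is a simple TryGetValue() for a SortedSet, written as an extension method:
using System.Collections.Generic;
using System.Linq;
public static class SortedSetExtensions
{
    // On success, replaces element with the equal instance stored in the set.
    public static bool TryGetValue<T>(this SortedSet<T> sortedSet, ref T element)
    {
        var foundSet = sortedSet.GetViewBetween(element, element);
        if (foundSet.Count == 1)
        {
            element = foundSet.First();
            return true;
        }
        return false;
    }
}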
When you call it, the element only needs to have the properties used by the comparer set. On success, element is replaced by the instance found in the set.
I agree this is something which is basically missing. While it's only useful in rare cases, I think they're significant rare cases - most notable, key canonicalization.
I can only think of one suggestion at the moment, and it's truly foul.
You can specify your own IEqualityComparer<T> when creating a HashSet<T> - so create one which remembers the arguments to the last positive (i.e. true-returning) Equals comparison it has performed. You can then call Contains, and see what the equality comparer was asked to compare.
Caveats:
This holds on to references unnecessarily, so could end up preventing objects being garbage collected
You'd potentially want to do this on a per-thread basis (if you've got a set that isn't modified after initialization, but is then read by multiple threads, for example)
It assumes that HashSet<T> doesn't use any optimization such as "if the references are equal, don't bother consulting the equality comparer"
It's fundamentally a horrible abuse
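Here is a rough sketch of that trick. It assumes a hypothetical `RecordingComparer<T>` that wraps an existing comparer. It carries every caveat listed above, in particular the assumption that `Contains` really calls `Equals` on the stored element. Treat it as an illustration of the idea, not something to ship:

```csharp
using System.Collections.Generic;

// Hypothetical comparer that remembers the arguments of its last true-returning Equals call.
public sealed class RecordingComparer<T> : IEqualityComparer<T> where T : class
{
    private readonly IEqualityComparer<T> _inner;
    public T LastX { get; private set; }
    public T LastY { get; private set; }

    public RecordingComparer(IEqualityComparer<T> inner) => _inner = inner;

    public bool Equals(T x, T y)
    {
        bool result = _inner.Equals(x, y);
        if (result) { LastX = x; LastY = y; }
        return result;
    }

    public int GetHashCode(T obj) => _inner.GetHashCode(obj);

    public void Clear() { LastX = null; LastY = null; }
}

public static class CanonicalLookup
{
    // Returns the stored element equal to probe, or null. Assumes set was built with comparer.
    public static T FindStored<T>(HashSet<T> set, RecordingComparer<T> comparer, T probe) where T : class
    {
        comparer.Clear();
        if (!set.Contains(probe)) return null;
        // Whichever recorded argument is not the probe itself is the stored element.
        T stored = ReferenceEquals(comparer.LastX, probe) ? comparer.LastY : comparer.LastX;
        comparer.Clear(); // drop the references again (the first caveat above)
        return stored;
    }
}
```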
I've been trying to think of other alternatives in terms of finding intersections, but I haven't got anywhere yet...
As noted in comments, it would be worth encapsulating this as far as possible - I suspect you only need a very limited set of operations, so I'd wrap a HashSet<T> in your own class and only expose the operations you really need - that way you get to clear the "cache" after each operation, removing my first objection above.
It still feels like a horrible abuse to me, but...
As others have suggested, an alternative would be to use a Dictionary<TKey, TValue> and implement SetEquals yourself. That would be simple enough to do - and again, you'd want to encapsulate this in your own type. Either way, you should probably design the type itself first, and then implement it using either a HashSet<> or a Dictionary<,> as an implementation detail.
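Here is a minimal sketch of the Dictionary-based alternative, wrapped in its own type. It assumes a hypothetical `CanonicalSet<T>` that stores each element as both key and value. Its `SetEquals` assumes both sets were created with the same comparer:

```csharp
using System.Collections.Generic;

// Hypothetical set wrapper backed by Dictionary<T, T>: each element is stored as its own value.
public sealed class CanonicalSet<T>
{
    private readonly Dictionary<T, T> _items;

    public CanonicalSet(IEqualityComparer<T> comparer = null)
    {
        _items = new Dictionary<T, T>(comparer ?? EqualityComparer<T>.Default);
    }

    public int Count => _items.Count;

    // Adds item unless an equal element is already present.
    public bool Add(T item)
    {
        if (_items.ContainsKey(item)) return false;
        _items.Add(item, item);
        return true;
    }

    // Returns the stored instance that is equal to probe.
    public bool TryGetValue(T probe, out T actual) => _items.TryGetValue(probe, out actual);

    // Same size and every element of other is present; assumes both sets use the same comparer.
    public bool SetEquals(CanonicalSet<T> other)
    {
        if (other.Count != Count) return false;
        foreach (var key in other._items.Keys)
        {
            if (!_items.ContainsKey(key)) return false;
        }
        return true;
    }
}
```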
It sounds like you are trying to use the wrong tool. True, you can save some memory using a HashSet, but it seems to me that you are trying to achieve a different goal: getting the actual element that is merely equal to a representation.
So in reality they are two different elements. Just the memento (a unique representation) is equal.
Therefore you'd be better off using a Dictionary where you add your elements as both Key and Value. That way you can get the identical element back, but you lose SetEquals...
I suppose SetEquals, in its implementation, does little more than sequentially compare two HashSets in their bucket order and fail on the first non-equality.
So you should be equally well off using a simple SequenceEqual() (LINQ) to compare the two Keys collections.
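So this extension method could do:
using System.Collections.Generic;
using System.Linq;
public static class DictionaryExtensions
{
    // Compares keys in enumeration order: the same keys in a different order compare unequal.
    public static bool SetEqual<T, G>(this IDictionary<T, G> d, IDictionary<T, G> e) => d.Keys.SequenceEqual(e.Keys);
}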
This should work, because a Dictionary basically is a HashSet with an associated value. And more appropriate to your problem. (OK, to be correct, the code should go for Dictionary<> instead of IDictionary<> because Key order matters)
If you need an IEnumerable<> on the second parameter try sorting to get a defined order (not so efficient).
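`SequenceEqual` depends on enumeration order, so two dictionaries holding the same keys can still compare unequal. The sketch below swaps in an order-independent check: compare the counts, then test membership. This is a different technique from the answer's code, and it assumes both dictionaries use the same key comparer. The method name `KeySetEquals` is hypothetical.

```csharp
using System.Collections.Generic;

public static class KeySetExtensions
{
    // Order-independent key-set comparison; assumes d and e use the same key comparer.
    public static bool KeySetEquals<TKey, TValue>(this IDictionary<TKey, TValue> d, IDictionary<TKey, TValue> e)
    {
        if (d.Count != e.Count) return false;
        foreach (var key in e.Keys)
        {
            if (!d.ContainsKey(key)) return false;
        }
        return true;
    }
}
```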
Finally added in .NET 4.7.2:
HashSet.TryGetValue(T, T) Method
An SO post with more details
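Here is a minimal sketch with a case-insensitive set of strings. It assumes a runtime where `HashSet<T>.TryGetValue(T equalValue, out T actualValue)` exists (.NET 4.7.2 or later, as stated above). `TryGetValue` hands back the instance stored in the set, not the probe you passed in:

```csharp
using System;
using System.Collections.Generic;

public static class TryGetValueDemo
{
    // Returns the instance stored in the set that is equal (here: case-insensitively) to probe.
    public static string Canonicalize(HashSet<string> set, string probe)
        => set.TryGetValue(probe, out var actual) ? actual : null;

    public static HashSet<string> CreateSet()
        => new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Hello", "World" };
}
```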
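Hopefully I'm not blind, but I haven't seen this answer anywhere. If you want Dictionary's TryGetValue, you can just steal it:
theHashset.ToDictionary(item => item.ID).TryGetValue(key, out value)
All you need is a quick lambda for determining unique keys. Note that ToDictionary builds a new dictionary (an O(n) operation) on every call and throws if two items produce the same key.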

How to optimize updating a C# dictionary with a single key lookup?

Say for example I have
Dictionary<string, double> foo;
I can do
foo["hello"] = foo["hello"] + 2.0
Or I could do
foo["hello"] += 2.0
but the compiler just expands this to the code above. I verified that by using JetBrains dotPeek to look at the assemblies.
This seems wasteful as two key lookups are required to update. Is there a dictionary implementation that can do this in one lookup? Note I'm using a dictionary to store 100k items of geometry information from a mesh and the lookups are in an inner loop. Please no "premature optimization is the root of all evil" answers. :)
Yes I have profiled.
Using a class would probably be faster as the comments mention because:
With a struct, you must do a double look-up as mentioned in the comments.
With a class, you simply go to the memory of the class reference and can update it there.
Each Lookup:
GetHashCode
Get the bucket
Iterate through to find the right one
(This all involves reading multiple ref object values)
However, if you use a class and update its value:
Change the value at the correct position relative to that ref.
It's a single change in memory.
@George Duckett's solution should be much faster: change to a class, get the reference, and update the object's value:
var hello = foo["hello"];
hello.howAreYou += 2.0;
By the way, this is an example case where a mutable class will win in performance over the immutable struct.
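Here is a minimal sketch of the class-based approach. It assumes a hypothetical mutable `Box` class in place of the `double` values. After one lookup, repeated updates go through the reference without touching the dictionary again:

```csharp
using System.Collections.Generic;

// Hypothetical mutable holder; the dictionary stores a reference to it.
public sealed class Box
{
    public double Value;
}

public static class BoxDemo
{
    public static double Accumulate()
    {
        var foo = new Dictionary<string, Box> { ["hello"] = new Box() };
        var hello = foo["hello"]; // one lookup to get the reference
        for (int i = 0; i < 3; i++)
        {
            hello.Value += 2.0; // updates go straight to the object, no further lookups
        }
        return foo["hello"].Value;
    }
}
```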
There's a method in ConcurrentDictionary, ConcurrentDictionary.AddOrUpdate, that does what you want. You can update an existing value in the dictionary based on its previous value in one go.
However, the concurrent dictionary is supposed to be used in multiple thread situations, so I can imagine it does some locking which might defeat your optimization goal. But then again, you can always benchmark and see how it goes.
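Here is a minimal sketch of `AddOrUpdate(key, addValue, updateValueFactory)` for the running example. It is a single call at the API level. Whether that amounts to a single hash lookup internally is an implementation detail, so benchmarking, as suggested, is the way to find out:

```csharp
using System.Collections.Concurrent;

public static class AddOrUpdateDemo
{
    public static double Run()
    {
        var foo = new ConcurrentDictionary<string, double>();
        // Adds 2.0 if "hello" is missing, otherwise applies the delegate to the old value.
        foo.AddOrUpdate("hello", 2.0, (key, old) => old + 2.0);
        foo.AddOrUpdate("hello", 2.0, (key, old) => old + 2.0);
        return foo["hello"];
    }
}
```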
No, it is not. As noted in the comment by bradgonesurfing, the language lacks a way to return reference to the stored value, so when it has to change that value, it needs to find it again.
Also, you said you are storing pairs of integers. Have you thought about using an array? Even a 100k-element array is not even 1 MB. And I'm sure it would be the fastest you can get.
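Here is a minimal sketch of the array idea. It assumes the keys can be mapped to dense integer indices, for example mesh vertex indices; that mapping is hypothetical here. `values[i] += delta` reads and writes one array slot without any hashing, and 100,000 doubles take about 800 KB:

```csharp
public static class ArrayDemo
{
    // Assumes keys have already been mapped to dense indices 0..count-1 (hypothetical mapping).
    public static double[] Accumulate(int count, int[] indices, double delta)
    {
        var values = new double[count];
        foreach (var i in indices)
        {
            values[i] += delta; // one bounds-checked array access, no hashing
        }
        return values;
    }
}
```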

HashSet limit - how to proceed?

My program creates custom objects, I want to get a distinct list of. So I want to use a set and add object by object. The set would prevent duplicates. And at last I have a set of unique objects.
I would usually use a HashSet, because I don't need a sorted set. Only, there are so many different potential objects. More than 2^32. The GetHashCode function returns an int, so this cannot work as a unique key for my objects.
I assume that I cannot use the HashSet hence and must use the slower SortedSet and have my object implement IComparable / CompareTo. Is this correct? Or is there a way to have a HashSet with long hash codes?
GetHashCode does return an int, but when two items' hash codes are the same, HashSet follows up by calling the Equals method (which you should override).
So, no, you don't have to switch. You can keep using the same old lovable HashSet (as long as you don't run out of memory).
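The `long` type shows this directly, because it has far more than 2^32 values. This sketch relies on the current `long.GetHashCode()` implementation, which XORs the upper and lower 32 bits. Under it, 0 and 0x100000001 share a hash code, yet a `HashSet<long>` keeps both values, because `Equals` tells them apart:

```csharp
using System.Collections.Generic;

public static class CollisionDemo
{
    // Two distinct values whose 64-bit contents fold to the same 32-bit hash code.
    public const long A = 0L;
    public const long B = 0x100000001L;

    // Adding A twice is a true duplicate; B collides on hash but is a different value.
    public static HashSet<long> Build() => new HashSet<long> { A, B, A };
}
```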

How to print object ID?

I need to know if two references from completely different parts of the program refer to the same object.
I cannot compare the references programmatically because they come from different contexts (neither reference is visible from the other).
Then I want to print unique identifier for each object using Console.WriteLine(). But ToString() method doesn't return "unique" identifier, it just returns "classname".
Is it possible to print unique identifier in C# (like in Java)?
The closest you can easily get (which won't be affected by the GC moving objects around etc) is probably RuntimeHelpers.GetHashCode(Object). This gives the hash code which would be returned by calling Object.GetHashCode() non-virtually on the object. This is still not a unique identifier though. It's probably good enough for diagnostic purposes, but you shouldn't rely on it for production comparisons.
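Here is a minimal diagnostic sketch along those lines. The `Print` helper and its output format are hypothetical. `RuntimeHelpers.GetHashCode` ignores any `GetHashCode` override on the type and is stable for a given object. As noted, it can still collide between different objects:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class IdentityHashDemo
{
    // Prints the identity-based hash code, ignoring any GetHashCode override on the type.
    public static int Print(string label, object o)
    {
        int h = RuntimeHelpers.GetHashCode(o);
        Console.WriteLine($"{label}: {o.GetType().Name}@{h:X8}");
        return h;
    }
}
```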
EDIT: If this is just for diagnostics, you could add a sort of "canonicalizing ID generator" which was just a List<object>... when you ask for an object's "ID" you'd check whether it already existed in the list (by comparing references) and then add it to the end if it didn't. The ID would be the index into the list. Of course, doing this without introducing a memory leak would involve weak references etc, but as a simple hack this might work for you.
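Here is a variant of that ID generator, sketched under stated assumptions. It swaps `ConditionalWeakTable` in for `List<object>`. The table compares keys by reference and holds them weakly, which avoids both the memory leak and the linear search. `ObjectIds` is a hypothetical name, and the IDs are only meaningful within one process run:

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical diagnostic helper: assigns sequential IDs based on object identity.
public static class ObjectIds
{
    // Keys are compared by reference and held weakly, so the table does not keep objects alive.
    private static readonly ConditionalWeakTable<object, StrongBox<long>> Ids =
        new ConditionalWeakTable<object, StrongBox<long>>();
    private static long _next;

    public static long Get(object o) =>
        Ids.GetValue(o, _ => new StrongBox<long>(Interlocked.Increment(ref _next))).Value;
}
```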
one reference is not visible from another and vice versa
I don't buy that. If you couldn't even get the handles, how would you get their IDs?
In C# you can always get handles to objects, and you can always compare them. Even if you have to use reflection to do it.
If you need to know whether two references point to the same object, I'll just quote this:
"By default, the operator == tests for reference equality. This is done by determining if two references indicate the same object. Therefore reference types do not need to implement operator == in order to gain this functionality."
So, == operator will do the trick without doing the Id workaround.
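The quoted default has one caveat: it applies only when the type doesn't overload `==`. `string`, for example, overloads it to compare contents, so `object.ReferenceEquals` is the safer test for identity. Here is a minimal sketch with a hypothetical `Node` class:

```csharp
// Hypothetical reference type that does not overload ==.
public sealed class Node { }

public static class ReferenceDemo
{
    // Node does not overload ==, so this is reference equality.
    public static bool SameNode(Node a, Node b) => a == b;

    // Always reference equality, whatever the type does with ==.
    public static bool SameObject(object a, object b) => ReferenceEquals(a, b);
}
```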
I presume you're calling ToString on your object reference, but I'm not entirely clear on this or on the situation you described, TBH, so just bear with me.
Does the type expose an ID property? If so, try this:
var idAsString = yourObjectInstance.ID.ToString();
Or, print directly:
Console.WriteLine(yourObjectInstance.ID);
EDIT:
I see Jon has seen right through this problem, which makes my answer look rather naive. Regardless, I'm leaving it in, if for nothing else than to emphasise the lack of clarity in the question. It may also provide an avenue to go down, based on Jon's statement that 'This [GetHashCode] is still not a unique identifier', should you decide to expose your own uniqueness by way of an identifier.
