Is there a way of getting a unique identifier of an instance?
GetHashCode() is the same for the two references pointing to the same instance. However, two different instances can (quite easily) get the same hash code:
Hashtable hashCodesSeen = new Hashtable();
LinkedList<object> l = new LinkedList<object>();
int n = 0;
while (true)
{
object o = new object();
// Remember objects so that they don't get collected.
// This does not make any difference though :(
l.AddFirst(o);
int hashCode = o.GetHashCode();
n++;
if (hashCodesSeen.ContainsKey(hashCode))
{
// Same hashCode seen twice for DIFFERENT objects (n is as low as 5322).
Console.WriteLine("Hashcode seen twice: " + n + " (" + hashCode + ")");
break;
}
hashCodesSeen.Add(hashCode, null);
}
I'm writing a debugging addin, and I need to get some kind of ID for a reference which is unique during the run of the program.
I already managed to get internal ADDRESS of the instance, which is unique until the garbage collector (GC) compacts the heap (= moves the objects = changes the addresses).
Stack Overflow question Default implementation for Object.GetHashCode() might be related.
The objects are not under my control as I am accessing objects in a program being debugged using the debugger API. If I was in control of the objects, adding my own unique identifiers would be trivial.
I wanted the unique ID for building a hashtable ID -> object, to be able to lookup already seen objects. For now I solved it like this:
Build a hashtable: 'hashCode' -> (list of objects with hash code == 'hashCode')

IsObjectSeen(o) {
    candidates = hashtable[o.GetHashCode()]   // Objects with the same hash code.
    If there are no candidates, o is new.
    If there are candidates, compare their addresses to o.Address:
        If no address is equal (the hash code was just a coincidence) -> o is new.
        If some address is equal -> o has already been seen.
}
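In C#, that workaround looks roughly like the sketch below; GetAddress() is a placeholder for whatever the debugger API exposes as an instance's current address, not a real call:
using System.Collections.Generic;

class SeenObjects
{
    private readonly Dictionary<int, List<object>> seen = new Dictionary<int, List<object>>();

    public bool AlreadySeen(object o)
    {
        int hash = o.GetHashCode();
        List<object> candidates;
        if (!seen.TryGetValue(hash, out candidates))
        {
            seen[hash] = new List<object> { o };
            return false;
        }
        foreach (object candidate in candidates)
        {
            // Same hash code: decide "collision" vs. "same instance" by comparing addresses.
            if (GetAddress(candidate) == GetAddress(o))
                return true;
        }
        candidates.Add(o);
        return false;
    }

    // Placeholder: stands in for the debugger API call that returns the
    // current heap address of an instance; not a real method.
    private long GetAddress(object o) { throw new System.NotImplementedException(); }
}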
.NET 4 and later only
Good news, everyone!
The perfect tool for this job is built in .NET 4 and it's called ConditionalWeakTable<TKey, TValue>. This class:
can be used to associate arbitrary data with managed object instances much like a dictionary (although it is not a dictionary)
does not depend on memory addresses, so is immune to the GC compacting the heap
does not keep objects alive just because they have been entered as keys into the table, so it can be used without making every object in your process live forever
uses reference equality to determine object identity; moreover, class authors cannot modify this behavior, so it can be used consistently on objects of any type
can be populated on the fly, so does not require that you inject code inside object constructors
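For illustration, a minimal sketch of handing out per-instance IDs with it (the small Id wrapper class is only there because TValue must be a reference type):
using System;
using System.Runtime.CompilerServices;

public static class ObjectIdExtensions
{
    private sealed class Id { public readonly Guid Value = Guid.NewGuid(); }

    // Entries vanish automatically once the key object becomes unreachable.
    private static readonly ConditionalWeakTable<object, Id> Table =
        new ConditionalWeakTable<object, Id>();

    public static Guid GetUniqueId(this object obj)
    {
        // GetValue is thread safe and creates the Id the first time it sees obj.
        return Table.GetValue(obj, _ => new Id()).Value;
    }
}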
Have you checked out the ObjectIDGenerator class? It does what you're attempting to do, and what Marc Gravell describes.
The ObjectIDGenerator keeps track of previously identified objects. When you ask for the ID of an object, the ObjectIDGenerator knows whether to return the existing ID, or generate and remember a new ID.
The IDs are unique for the life of the ObjectIDGenerator instance. Generally, an ObjectIDGenerator's lifetime lasts as long as the Formatter that created it. Object IDs have meaning only within a given serialized stream, and are used for tracking which objects have references to others within the serialized object graph.
Using a hash table, the ObjectIDGenerator retains which ID is assigned to which object. The object references, which uniquely identify each object, are addresses in the runtime garbage-collected heap. Object reference values can change during serialization, but the table is updated automatically so the information is correct.
Object IDs are 64-bit numbers. Allocation starts from one, so zero is never a valid object ID. A formatter can choose a zero value to represent an object reference whose value is a null reference (Nothing in Visual Basic).
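A quick example of the API (IDs start at 1, and the out parameter tells you whether the object had been seen before):
using System;
using System.Runtime.Serialization;

class Demo
{
    static void Main()
    {
        var generator = new ObjectIDGenerator();
        var a = new object();
        var b = new object();
        bool firstTime;

        Console.WriteLine(generator.GetId(a, out firstTime)); // 1  (firstTime == true)
        Console.WriteLine(generator.GetId(b, out firstTime)); // 2  (firstTime == true)
        Console.WriteLine(generator.GetId(a, out firstTime)); // 1 again  (firstTime == false)
    }
}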
The reference is the unique identifier for the object. I don't know of any way of converting this into anything like a string etc. The value of the reference will change during compaction (as you've seen), but every previous value A will be changed to value B, so as far as safe code is concerned it's still a unique ID.
If the objects involved are under your control, you could create a mapping using weak references (to avoid preventing garbage collection) from a reference to an ID of your choosing (GUID, integer, whatever). That would add a certain amount of overhead and complexity, however.
RuntimeHelpers.GetHashCode() may help (MSDN).
You can develop your own thing in a second. For instance:
class Program
{
static void Main(string[] args)
{
var a = new object();
var b = new object();
Console.WriteLine("", a.GetId(), b.GetId());
}
}
public static class MyExtensions
{
//this dictionary should use weak key references
static Dictionary<object, int> d = new Dictionary<object,int>();
static int gid = 0;
public static int GetId(this object o)
{
if (d.ContainsKey(o)) return d[o];
return d[o] = gid++;
}
}
You can choose whatever you would like to use as the unique ID, for instance System.Guid.NewGuid(), or simply an integer for fastest access.
How about this method:
Set a field in the first object to a new value. If the same field in the second object has the same value, it's probably the same instance. Otherwise, exit as different.
Now set the field in the first object to a different new value. If the same field in the second object has changed to the different value, it's definitely the same instance.
Don't forget to set the field in the first object back to its original value on exit.
Problems?
It is possible to make a unique object identifier in Visual Studio: In the watch window, right-click the object variable and choose Make Object ID from the context menu.
Unfortunately, this is a manual step, and I don't believe the identifier can be accessed via code.
You would have to assign such an identifier yourself, manually - either inside the instance, or externally.
For records related to a database, the primary key may be useful (but you can still get duplicates). Alternatively, either use a Guid, or keep your own counter, allocating with Interlocked.Increment (and make it large enough that it isn't likely to overflow).
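For instance, a sketch of the counter approach, assuming you can store the resulting ID on or alongside your own objects:
public static class IdAllocator
{
    private static long nextId;

    // 64-bit counter: Interlocked.Increment is atomic, and overflow is not a
    // practical concern at this size.
    public static long Next()
    {
        return System.Threading.Interlocked.Increment(ref nextId);
    }
}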
I know that this has been answered, but it's at least useful to note that you can use Object.ReferenceEquals:
http://msdn.microsoft.com/en-us/library/system.object.referenceequals.aspx
It will not give you a "unique id" directly, but combined with WeakReferences (and a hash set?) it could give you a pretty easy way of tracking various instances.
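For example, a rough sketch of that idea; it is just a linear scan over weak references, so it only suits modest numbers of tracked instances:
using System;
using System.Collections.Generic;

class InstanceTracker
{
    private readonly List<WeakReference> seen = new List<WeakReference>();

    public bool HasSeen(object candidate)
    {
        foreach (WeakReference wr in seen)
        {
            object target = wr.Target;               // null once the object has been collected
            if (target != null && ReferenceEquals(target, candidate))
                return true;
        }
        return false;
    }

    public void Remember(object candidate)
    {
        if (!HasSeen(candidate))
            seen.Add(new WeakReference(candidate));  // does not keep the object alive
    }
}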
If you are writing a module in your own code for a specific usage, majkinetor's method MIGHT have worked. But there are some problems.
First, the official documentation does NOT guarantee that GetHashCode() returns a unique identifier (see Object.GetHashCode Method):
You should not assume that equal hash codes imply object equality.
Second, even if you have a small enough number of objects that GetHashCode() would work in most cases, the method can be overridden by some types.
For example, you are using some class C and it overrides GetHashCode() to always return 0. Then every object of C will get the same hash code.
Unfortunately, Dictionary, HashTable and some other associative containers make use of this method:
A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary<TKey, TValue> class, the Hashtable class, or a type derived from the DictionaryBase class. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.
So, this approach has great limitations.
And even more, what if you want to build a general purpose library?
Not only are you not able to modify the source code of the used classes, but their behavior is also unpredictable.
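To make the override problem concrete, a toy example:
using System;

class C
{
    // A perfectly legal override: every instance of C reports the same hash code,
    // so the hash code cannot serve as an identity.
    public override int GetHashCode() { return 0; }
}

class HashCodeDemo
{
    static void Main()
    {
        var x = new C();
        var y = new C();
        // Prints True even though x and y are different instances.
        Console.WriteLine(x.GetHashCode() == y.GetHashCode());
    }
}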
I appreciate that Jon and Simon have posted their answers, and I will post a code example and a suggestion on performance below.
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.Serialization;
using System.Collections.Generic;
namespace ObjectSet
{
public interface IObjectSet
{
/// <summary> check the existence of an object. </summary>
/// <returns> true if the object exists, false otherwise. </returns>
bool IsExist(object obj);
/// <summary> if the object is not in the set, add it; otherwise do nothing. </summary>
/// <returns> true if successfully added, false otherwise. </returns>
bool Add(object obj);
}
public sealed class ObjectSetUsingConditionalWeakTable : IObjectSet
{
/// <summary> unit test on object set. </summary>
internal static void Main() {
Stopwatch sw = new Stopwatch();
sw.Start();
ObjectSetUsingConditionalWeakTable objSet = new ObjectSetUsingConditionalWeakTable();
for (int i = 0; i < 10000000; ++i) {
object obj = new object();
if (objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
if (!objSet.Add(obj)) { Console.WriteLine("bug!!!"); }
if (!objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
public bool IsExist(object obj) {
return objectSet.TryGetValue(obj, out tryGetValue_out0);
}
public bool Add(object obj) {
if (IsExist(obj)) {
return false;
} else {
objectSet.Add(obj, null);
return true;
}
}
/// <summary> internal representation of the set. (only use the key) </summary>
private ConditionalWeakTable<object, object> objectSet = new ConditionalWeakTable<object, object>();
/// <summary> used to fill the out parameter of ConditionalWeakTable.TryGetValue(). </summary>
private static object tryGetValue_out0 = null;
}
[Obsolete("It will crash if there are too many objects and ObjectSetUsingConditionalWeakTable get a better performance.")]
public sealed class ObjectSetUsingObjectIDGenerator : IObjectSet
{
/// <summary> unit test on object set. </summary>
internal static void Main() {
Stopwatch sw = new Stopwatch();
sw.Start();
ObjectSetUsingObjectIDGenerator objSet = new ObjectSetUsingObjectIDGenerator();
for (int i = 0; i < 10000000; ++i) {
object obj = new object();
if (objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
if (!objSet.Add(obj)) { Console.WriteLine("bug!!!"); }
if (!objSet.IsExist(obj)) { Console.WriteLine("bug!!!"); }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
public bool IsExist(object obj) {
bool firstTime;
idGenerator.HasId(obj, out firstTime);
return !firstTime;
}
public bool Add(object obj) {
bool firstTime;
idGenerator.GetId(obj, out firstTime);
return firstTime;
}
/// <summary> internal representation of the set. </summary>
private ObjectIDGenerator idGenerator = new ObjectIDGenerator();
}
}
In my test, the ObjectIDGenerator threw an exception complaining that there are too many objects when creating 10,000,000 objects (10x the count in the code above) in the for loop.
Also, the benchmark result is that the ConditionalWeakTable implementation is 1.8x faster than the ObjectIDGenerator implementation.
The information I give here is not new, I just added this for completeness.
The idea of this code is quite simple:
Objects need a unique ID, which isn't there by default. Instead, we have to rely on the next best thing, which is RuntimeHelpers.GetHashCode to get us a sort-of unique ID
To check uniqueness, this implies we need to use object.ReferenceEquals
However, we would still like to have a unique ID, so I added a GUID, which is by definition unique.
Because I don't like locking everything if I don't have to, I don't use ConditionalWeakTable.
Combined, that will give you the following code:
public class UniqueIdMapper
{
private class ObjectEqualityComparer : IEqualityComparer<object>
{
public bool Equals(object x, object y)
{
return object.ReferenceEquals(x, y);
}
public int GetHashCode(object obj)
{
return RuntimeHelpers.GetHashCode(obj);
}
}
private Dictionary<object, Guid> dict = new Dictionary<object, Guid>(new ObjectEqualityComparer());
public Guid GetUniqueId(object o)
{
Guid id;
if (!dict.TryGetValue(o, out id))
{
id = Guid.NewGuid();
dict.Add(o, id);
}
return id;
}
}
To use it, create an instance of the UniqueIdMapper and use the GUIDs it returns for the objects.
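For example, a quick usage sketch:
var mapper = new UniqueIdMapper();
var a = new object();
var b = new object();

Console.WriteLine(mapper.GetUniqueId(a)); // some GUID
Console.WriteLine(mapper.GetUniqueId(a)); // the same GUID again
Console.WriteLine(mapper.GetUniqueId(b)); // a different GUID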
Addendum
So, there's a bit more going on here; let me write a bit down about ConditionalWeakTable.
ConditionalWeakTable does a couple of things. The most important thing is that it doesn't care about the garbage collector, that is: the objects that you reference in this table will still be collected regardless. If you look up an object, it basically works the same as the dictionary above.
Curious, no? After all, when the GC considers an object for collection, it checks whether there are references to the object, and if there are none, it collects it. So if the ConditionalWeakTable holds something for the object, why does the referenced object still get collected?
ConditionalWeakTable uses a small trick, which some other .NET structures also use: instead of storing a reference to the object, it actually stores an IntPtr. Because that's not a real reference, the object can be collected.
So, at this point there are 2 problems to address. First, objects can be moved on the heap, so what will we use as IntPtr? And second, how do we know that objects have an active reference?
The object can be pinned on the heap, and its real pointer can be stored. When the GC hits the object for removal, it unpins it and collects it. However, that would mean we get a pinned resource, which isn't a good idea if you have a lot of objects (due to memory fragmentation issues). This is probably not how it works.
When the GC moves an object, it calls back into the table, which can then update its references. This might be how it's implemented, judging by the external calls in DependentHandle - but I believe it's slightly more sophisticated.
What is stored is not a pointer to the object itself, but a pointer into the GC's list of all objects. The IntPtr is either an index or a pointer into this list. The list only changes when an object changes generations, at which point a simple callback can update the pointers. If you remember how Mark & Sweep works, this makes more sense. There's no pinning, and removal works as it did before. I believe this is how it works in DependentHandle.
This last solution does require that the runtime doesn't re-use the list buckets until they are explicitly freed, and it also requires that all objects are retrieved by a call to the runtime.
If we assume they use this last solution, we can also address the second problem. The Mark & Sweep algorithm keeps track of which objects have been collected, so the table knows as soon as an object is gone. When the table checks whether the object is still there and finds it has been collected, it calls 'Free', which removes the pointer and the list entry. The object is really gone then.
One important thing to note at this point is that things would go horribly wrong if ConditionalWeakTable were updated from multiple threads without being thread safe - the result would be a memory leak. This is why all calls in ConditionalWeakTable take a simple 'lock', which ensures this doesn't happen.
Another thing to note is that cleaning up entries has to happen once in a while. While the actual objects will be cleaned up by the GC, the entries are not. This is why ConditionalWeakTable only grows in size. Once it hits a certain limit (determined by collision chance in the hash), it triggers a Resize, which checks if objects have to be cleaned up -- if they do, free is called in the GC process, removing the IntPtr handle.
I believe this is also why DependentHandle is not exposed directly - you don't want to mess with things and get a memory leak as a result. The next best thing for that is a WeakReference (which also stores an IntPtr instead of an object) - but unfortunately doesn't include the 'dependency' aspect.
What remains is for you to toy around with the mechanics, so that you can see the dependency in action. Be sure to start it multiple times and watch the results:
class DependentObject
{
public class MyKey : IDisposable
{
public MyKey(bool iskey)
{
this.iskey = iskey;
}
private bool disposed = false;
private bool iskey;
public void Dispose()
{
if (!disposed)
{
disposed = true;
Console.WriteLine("Cleanup {0}", iskey);
}
}
~MyKey()
{
Dispose();
}
}
static void Main(string[] args)
{
var dep = new MyKey(true); // also try passing this to cwt.Add
ConditionalWeakTable<MyKey, MyKey> cwt = new ConditionalWeakTable<MyKey, MyKey>();
cwt.Add(new MyKey(true), dep); // try doing this 5 times f.ex.
GC.Collect(GC.MaxGeneration);
GC.WaitForFullGCComplete();
Console.WriteLine("Wait");
Console.ReadLine(); // Put a breakpoint here and inspect cwt to see that the IntPtr is still there
}
}
I have some methods that take 20 or more strings as parameters. I was wondering what works better: passing 20 string parameters to the method or putting them all in a dictionary and passing it as only parameter.
Multiple strings:
public Boolean isNice(string aux, string aux2, string aux3, string aux4, string aux5, string aux6, string aux7,
string aux8, string aux9, string aux10, string aux11, string aux12, string aux13, string aux14, string aux15, string aux16)
{
string foo1 = aux;
string foo2 = aux2;
// etc
return true;
}
public void yeah()
{
string aux = "whatever";
string aux2 = "whatever2";
// etc
isNice(aux, aux2, ..., ..., ...);
}
Dictionary of strings
public Boolean isNice(Dictionary<string, string> aux)
{
string foo1 = aux["aux1"];
string foo2 = aux["aux2"];
// etc
return true;
}
public void yeah()
{
string aux = "whatever";
string aux2 = "whatever2";
// etc
Dictionary<string, string> auxDict = new Dictionary<string,string>();
auxDict.Add("key1", aux);
auxDict.Add("key2", aux2);
// etc
isNice(auxDict);
}
My question is regarding performance, readability and code simplicity.
Right now I'm using multiple strings: should I use dictionaries instead?
This depends. Are all 20 parameters required for the function to work?
If so, create a data type that can communicate all 20 values and pass in an instance of that data type. You could create helper classes to easily initialize that type of object. You can easily pass in a new instance of that data type, and provide flexible ways to initialize the type:
isNice(new niceParams
{
aux1 = "Foo",
aux2 = "Bar"
// ...
}
);
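The parameter type itself (niceParams is just an illustrative name here) can be a plain class with one auto-property per value:
public class niceParams
{
    public string aux1 { get; set; }
    public string aux2 { get; set; }
    // ... one auto-property per value the method needs
}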
If not, put the optional parameters at the end of the signature, and give them default values.
public Boolean isNice(string req1, string req2, string optional1 = null)
This way, you have overloads to specify exactly which values you want to provide.
Another benefit of this is that you can use named parameters to call the function:
isNice(req1, req2, optional1: "Foo", optional15: "Bar");
With that said, I would not use a dictionary. It forces the caller to understand the signature, and completely breaks any compiler type safety. What if required values aren't provided? What if a key is misspelled? All this checking now has to be done at runtime, causing errors that can only be caught at runtime. To me, it seems to be asking for trouble.
The main difference is that in case when you have 20 string parameters the compiler will ensure that all of them are explicitly set, even if they are set to null. In case of passing a collection the compiler will not be able to detect that somebody has forgotten to set the aux17 parameter: the code that uses a dictionary-based API would continue to compile, so you would be forced to add an extra check at run-time.
If it is OK with your code to not have a compiler check, for example, because all your string values are optional, then a collection-based approach is easier to maintain.
The difference in speed cannot be predicted until you implement the change. A collection-based approach would perform an additional memory allocation, so it would consume more CPU cycles. On the other hand, the difference may be too small to have a real impact on the overall performance of your program.
Note that since your parameters are named uniformly, it appears that they may be placed in a "flat" collection rather than a dictionary. For example, you could make an API taking a list or an array of strings. In the case of an array, you could also make your method take a variable number of parameters, so that callers could use the old syntax to call your method:
public bool isNice(params string[] args)
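Callers can then pass either an explicit array or a plain comma-separated argument list:
isNice(new[] { "whatever", "whatever2", "whatever3" });
isNice("whatever", "whatever2", "whatever3");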
I have a class named ACTIVITY. This class contains a list of Laps, and each Lap has a collection of TRACPOINTS.
ACTIVITY --many--> LAPS --many--> TRACPOINTS.
Whenever I flatten the TRACPOINTS collection I get the list of all the TRACPOINTS. But when I modify those, of course, the originals don't get modified since it's a copy.
Is there any way that whatever change I made to the flattened tracpoints gets changed in the Tracpoints list for each lap?
As long as TRACPOINT is a struct, it is not possible in any reasonable way.
Whenever you assign a value of struct variable or field to another variable or field, its contents are copied. The same holds for passing it as a method argument or returning it from a method, its value is copied. This is value semantics [1]. Compare this to atomic types like int, which have value semantics too. You would probably expect the following code to print 2, not 3.
static void Change(int j) { j = 3; }
static void Main(string[] args) {
int i = 2;
Change(i);
System.Console.WriteLine(i);
}
If you use SelectMany, each value from the collection is assigned to some temporary local variable and then returned from the iterator, so it is copied, possibly several times, before it comes out of the iterator. So what you are updating is a copy of the struct. Like in the example, you're not changing variable i, but its copy stored in variable j.
This is why structs should be immutable. Instead of having properties with getters and setters in your struct, they should have only getters. To change the value of a property of a struct, you can implement a method that copies the whole original struct, changes the value of the desired property and returns the new struct instance. In fact, again, a copy of it will be returned. Example:
struct S {
int f;
public int F { get { return this.f; } }
public S SetF(int newVal) {
var s = new S();
s.f = newVal;
return s;
}
}
var x = new S();
x = x.SetF(30);
That said, it could be possible to achieve what you want with pointers and unsafe C#, but believe me, it will be way easier to change your structs to classes, so that they have reference semantics instead of value semantics; or keep them structs, but make them immutable and use old school loops instead of LINQ. If you want to use LINQ for something like SelectMany in such a scenario, you probably don't care that much about the performance difference between structs and classes anyway...
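As a rough sketch of the "old school loops" route (the activity/Laps/Tracpoints names and the SetF-style copy method are assumptions here, mirroring the S example above):
foreach (var lap in activity.Laps)
{
    for (int i = 0; i < lap.Tracpoints.Count; i++)
    {
        // Read the copy, build a modified copy, and write it back by index so the
        // element stored in the list is actually replaced.
        lap.Tracpoints[i] = lap.Tracpoints[i].SetF(42);
    }
}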
[1] http://msdn.microsoft.com/en-us/library/aa664472(v=vs.71).aspx
I have a list of Action<object, object> and I end up getting this back into an Action<string, int> at some point, or any other two types. It represents a dynamic referential mapping from one type to another. For various reasons, I cannot use ref or Func<..>.
Basically the issue is that inside the callback code for that Action<string, int> I need a way to set the value of the int that is passed in, say after I convert it from a string. Though since it is not ref, it's not obvious how to do it.
Does anyone happen to know if there is a way to rewrite the method dynamically or otherwise get at the value that is passed into it (up the stack perhaps) and set the int value... one step up the CLR call stack?
To head off anyone saying "Why not change your whole program" or "When would you ever need this?", I am simply experimenting with the idea of a new kind of object mapping library.
Interesting question, I would very much like to know the root answer!
So far, my best approach is to create your own delegate type that takes the first parameter by ref, and make it generic:
delegate void MyAction<T,T1>(ref T a, T1 b);
static void Main(string[] args)
{
MyAction<string, int> action = Foo;
var arr = new object[] { "", 5 };
action.DynamicInvoke(arr);
}
private static void Foo(ref string a, int b)
{
a = b.ToString();
}
Not possible with value types. They are always copied on the stack, and in the case of boxing (for every boxing operation that occurs) they are moved to a new location on the heap and a reference is passed back.
You can try to wrap your int in a reference type and take advantage of side effects (since this is exactly what you are trying to do). Or you can keep the int parameter and store the result in a closed-over variable, e.g.: (string str, int i) => { myDictionary.Add(str, i); }.
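A minimal sketch of the "wrap the value type" idea (Box<T> is a made-up helper here):
using System;

class Box<T>
{
    public T Value;
}

class BoxDemo
{
    static void Main()
    {
        // The callback writes into the box; because Box<T> is a reference type,
        // the caller sees the change even though Action<,> passes its arguments by value.
        Action<string, Box<int>> parse = (s, box) => box.Value = int.Parse(s);

        var target = new Box<int>();
        parse("42", target);
        Console.WriteLine(target.Value); // 42
    }
}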
I guess this could also be asked as: how long does the generated name stay attached to an anonymous type? Here's the issue:
A blog had something like this:
var anonymousMagic = new {test.UserName};
lblShowText.Text = lblShowText
.Text
.Format("{UserName}", test);
It was presented as sort of a wish list, with a couple of ways to go at it. Being bored and adventurous, I took to creating a string extension method that could handle this:
var anonymousMagic = new {test.UserName, test.UserID};
lblShowText.Text = "{UserName} is user number {UserID}"
.FormatAdvanced(anonymousMagic);
With the idea that I would get the property info from the anonymous type and match that to the bracketted strings. Now with property info comes reflection, so I would want to save the property info the first time the type came through so that I wouldn't have to get it again. So I did something like this:
public static String FormatAdvanced(this String stringToFormat, Object source)
{
Dictionary<String, PropertyInfo> info;
Type currentType;
String typeName;
//
currentType = source.GetType();
typeName = currentType.Name;
//
//infoList is a static dictionary on the class holding this method
if (infoList == null)
{
infoList = new Dictionary<String, Dictionary<String, PropertyInfo>>();
}
//
if (infoList.ContainsKey(typeName))
{
info = infoList[typeName];
}
else
{
info = currentType.GetProperties()
.ToDictionary(item => item.Name);
infoList.Add(typeName, info);
}
//
foreach (var propertyInfoPair in info)
{
String currentKey;
String replacement;
replacement = propertyInfoPair.Value.GetValue(source, null).ToString();
currentKey = propertyInfoPair.Key;
if (stringToFormat.Contains("{" + currentKey + "}"))
{
stringToFormat = stringToFormat
.Replace("{" + currentKey + "}", replacement);
}
}
//
return stringToFormat;
}
Now in testing, it seems to keep the name it created for the anonymous type so that the second time through it doesn't get the property info off the type but off the dictionary.
If multiple people are hitting this method at the same time, is it pretty much going to work in a session-like fashion, i.e. the names of the types will be specific to each instance of the program? Or would it be even worse than that? At what point does that name get chucked and overwritten?
It never does. The type is generated at compile-time and you can consider it constant and unique throughout the life of the app-domain.
I question the value of this function though. The obvious first reason is because you don't have much of the functionality of the Format method on the String class (no escape for brackets, no formatting of values in the brackets, etc, etc).
The second is that it basically links the format string to the type being passed in, so they are not swapped out easily. If I had two classes which had the same conceptual value, but different properties naming it, I have to change my format string to display it with your method to compensate for the fact that the property name is embedded in the format string.
Anonymous types are generated at compile time, and so the reflection names should be static as long as you don't re-compile the assembly.
There is a detailed post here that describes the names, but I believe what you are doing is safe.
I have two things to say, but this isn't really an answer.
First of all, the Dictionary doesn't have to have a string key; the key could be the actual type. i.e. it could be a Dictionary<Type, Dictionary<String, PropertyInfo>>. That would be faster because you don't have to get the type name, and less error prone - what if I send that method two non-anonymous types with the same name but different namespaces?
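A sketch of that change (usings, thread safety and the rest of the method omitted):
static readonly Dictionary<Type, Dictionary<string, PropertyInfo>> infoList =
    new Dictionary<Type, Dictionary<string, PropertyInfo>>();

static Dictionary<string, PropertyInfo> GetInfo(object source)
{
    Type currentType = source.GetType();
    Dictionary<string, PropertyInfo> info;
    if (!infoList.TryGetValue(currentType, out info))
    {
        // Cache miss: reflect once per concrete type, keyed by the Type itself.
        info = currentType.GetProperties().ToDictionary(p => p.Name);
        infoList[currentType] = info;
    }
    return info;
}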
Second, you should read Phil Haack's recent blog post about this subject. It contains a full implementation of such a method.