What is the best way to deep clone an interconnected set of objects? Example:
class A {
B theB; // optional
// ...
}
class B {
A theA; // optional
// ...
}
class Container {
A[] a;
B[] b;
}
The obvious thing to do is walk the objects and deep clone everything as I come to it. This creates a problem, however: if I clone an A that contains a B, and that B is also in the Container, that B will end up cloned twice once I also clone the Container.
The next logical step is to create a Dictionary and look up every object before I clone it. This seems like it could be a slow and ungraceful solution, however.
Any thoughts?
It's not an elegant solution for sure, but it isn't uncommon to use a dictionary (or hashmap). One of the benefits is that a hashmap has constant lookup time on average, so speed does not really suffer here.
I'm not that familiar with C#, but typically any crawl of a graph for some sort of processing requires a lookup table to stop reprocessing objects reached through cyclic references. So I would think you will need the same here.
The dictionary solution you suggested is the best I know of. To optimize further, you could use object.GetHashCode() to get a hash for the object, and use that as the dictionary key. Should be fast unless you're talking about huge object trees (10s to 100s of thousands of objects).
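For illustration, a minimal sketch of that approach using the A/B shapes from the question (and assuming their fields are accessible, or exposed through properties):
using System.Collections.Generic;

class GraphCloner
{
    // Keyed on the original instances; with no Equals override the default
    // comparer behaves as reference identity, so each object is cloned once.
    private readonly Dictionary<object, object> clones = new Dictionary<object, object>();

    public A Clone(A original)
    {
        if (original == null) return null;
        object existing;
        if (clones.TryGetValue(original, out existing)) return (A)existing;
        var copy = new A();
        clones[original] = copy;            // register before descending so cycles terminate
        copy.theB = Clone(original.theB);
        // ... copy A's other members here
        return copy;
    }

    public B Clone(B original)
    {
        if (original == null) return null;
        object existing;
        if (clones.TryGetValue(original, out existing)) return (B)existing;
        var copy = new B();
        clones[original] = copy;
        copy.theA = Clone(original.theA);
        // ... copy B's other members here
        return copy;
    }
}
Cloning the Container is then just a matter of running its a and b arrays through the same cloner, so a B reached via an A and the same B sitting in the b array come out as a single copy.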
Maybe create a bit flag on each object to indicate whether it has been cloned before.
Another possible solution you could investigate is serializing the objects into a stream, and then reconstructing them from that same stream into new instances. This often works wonders when everything else seems awfully convoluted and messy.
Marc
One of the practical ways to do deep cloning is serializing and then deserializing a source graph. Some serializers in .NET like DataContractSerializer are even capable of processing cycles within graphs. You can choose which serializer is the best choice for your scenario by looking at the feature comparison chart.
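For example, a sketch with DataContractSerializer (assuming the types are decorated with [DataContract]/[DataMember]); PreserveObjectReferences keeps shared instances and cycles intact in the copy:
using System.IO;
using System.Runtime.Serialization;

static T DeepClone<T>(T source)
{
    var serializer = new DataContractSerializer(
        typeof(T),
        new DataContractSerializerSettings { PreserveObjectReferences = true });

    using (var stream = new MemoryStream())
    {
        serializer.WriteObject(stream, source);
        stream.Position = 0;
        return (T)serializer.ReadObject(stream);
    }
}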
Background: I have a list of objects that are directly linked to the UI in WPF. I need to do some work with those objects, but while I am working with them (asynchronously) I do not want any refreshes on the UI (performance and aesthetic reasons).
So I thought I might copy the items (using Collection<T>.CopyTo(T[] array, int index)), work on the copy, then overwrite my original list with the copied one. The problem is that even then the references are kept and the UI is continuously refreshed.
Code example of what I did:
MyUIObject[] myCopiedList = new MyUIObject[MyObjectsLinkedToTheUI.Count];
MyObjectsLinkedToTheUI.CopyTo(myCopiedList, 0);
foreach (MyUIObject myCopiedItem in myCopiedList)
{
//while I do this, the UI is still updated
PerformLongAndWearyOperationAsync(myCopiedItem);
}
MyObjectsLinkedToTheUI.Clear();
foreach (var myCopiedItem in myCopiedList)
{
MyObjectsLinkedToTheUI.Add(myCopiedItem);
}
Is there a possibility to copy my items without keeping a reference to the original object?
UPDATE 1
Thank you for your contributions so far. One thing I forgot to mention: This is for Windows Phone 8.1, so ICloneable is not available.
You need to clone them somehow. Either implement the ICloneable interface and do all the copying manually, or use some hacks/tricks like serializing and deserializing the object.
The flow with serialization is something like this:
Take your object
Serialize it to, for example, JSON, binary format etc.
Now deserialize what you got in step 2 into new object
You'll have a copy of your object that way, but it costs more processing power and is prone to some hard-to-catch errors. Still, it's an easy way to go. Implementing ICloneable is more reliable, but you have to write all that mapping yourself.
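As a sketch of the serialization route, assuming a JSON serializer such as Json.NET is available on Windows Phone 8.1 (any serializer that can round-trip MyUIObject would do):
using Newtonsoft.Json;

static T DeepCopy<T>(T source)
{
    // Steps 2 and 3 from above: serialize, then deserialize into a fresh graph.
    string json = JsonConvert.SerializeObject(source);
    return JsonConvert.DeserializeObject<T>(json);
}
Copying the list then becomes something like myCopiedList[i] = DeepCopy(MyObjectsLinkedToTheUI[i]), and the UI-bound originals are no longer touched while you work.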
Also, consider using structs instead of classes. Structs are always copied by value, not by reference. That has some drawbacks, so it's up to you whether they suit your usage scenario.
OK, I want to do the following. To me it seems like a good idea, so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to create a new MatrixNode instance every time a node is requested. This makes sure that only the needed nodes are kept in memory and the rest are collected. What I don't like about it is that a new object has to be created every time; I'm worried that will be too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if a node no longer has any live references, instead of letting it be collected I want to store it in a pool. Later, when a new node is needed, I first check whether the pool has one available and only make a new node if it doesn't; a pooled node can just have its data swapped out.
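Roughly, the lookup part would be something like this (the MatrixNode constructor and the key packing are just placeholders for whatever the real class uses):
using System;
using System.Collections.Generic;

class NodeCache
{
    private readonly Dictionary<long, WeakReference> nodes = new Dictionary<long, WeakReference>();

    public MatrixNode GetNode(int row, int col)
    {
        long key = ((long)row << 32) | (uint)col;   // pack (row, col) into one dictionary key

        WeakReference weak;
        if (nodes.TryGetValue(key, out weak))
        {
            var alive = weak.Target as MatrixNode;
            if (alive != null) return alive;        // still referenced somewhere: reuse it
        }

        var node = new MatrixNode(row, col);        // or take one from the pool and re-point it
        nodes[key] = new WeakReference(node);
        return node;
    }
}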
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations for C# already, such as Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably help you avoid the issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix returns many related questions, including answers pointing to commercial libraries such as ILNumerics (which has a community edition), NMath, and Extreme Optimization's libraries.
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start using COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
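For instance, a lookup along those lines (assuming the items are kept in a List<SparseMatrixItem> sorted by row, then column, as suggested above):
using System.Collections.Generic;

class RowColComparer : IComparer<SparseMatrixItem>
{
    public int Compare(SparseMatrixItem a, SparseMatrixItem b)
    {
        int byRow = a.Row.CompareTo(b.Row);
        return byRow != 0 ? byRow : a.Col.CompareTo(b.Col);
    }
}

static bool TryGetValue(List<SparseMatrixItem> items, int row, int col, out double value)
{
    var probe = new SparseMatrixItem { Row = row, Col = col };
    int index = items.BinarySearch(probe, new RowColComparer());

    if (index >= 0)
    {
        value = items[index].Value;
        return true;
    }

    value = 0.0;   // the cell is empty
    return false;
}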
The idea of having a pool of objects that people take out and then return to the pool is used for really expensive objects: objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance by pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.
I need to transfer .NET objects (with hierarchy) over the network (multiplayer game). To save bandwidth, I'd like to transfer only the fields (and/or properties) that change, so fields that don't change aren't transferred.
I also need some mechanism to match proper objects on the other client side (global object identifier...something like object ID?)
I need some suggestions how to do it.
Would you use reflection? (performance is critical)
I also need mechanism to transfer IList deltas (added objects, removed objects).
How is MMO networking done, do they transfer whole objects?
(maybe my idea of per field transfer is stupid)
EDIT:
To make it clear: I've already got a mechanism to track changes (let's say every field has a property whose setter adds the field to some sort of list or dictionary of changes; the structure is not final yet).
I don't know how to serialize this list and then deserialize it on the other client, and above all how to do it efficiently and how to update the proper objects.
There are about one hundred objects, so I'm trying to avoid a situation where I'd have to write a special function for each object. Decorating fields or properties with attributes would be OK (for example, to specify the serializer, a field ID, or something similar).
More about the objects: each object has 5 fields on average. Some objects inherit from others.
Thank you for all answers.
Another approach; don't try to serialize complex data changes: instead, send just the actual commands to apply (in a terse form), for example:
move 12432 134, 146
remove 25727
(which would move 1 object and remove another).
You would then apply the commands at the receiver, allowing for a full resync if they get out of sync.
I don't propose you would actually use text for this - that is just to make the example clearer.
One nice thing about this: it also provides "replay" functionality for free.
The cheapest way to track dirty fields is to make it a key feature of your object model, i.e. with a "fooDirty" field for every data field "foo", which you set to true in the setter (if the value differs). This could also be twinned with conditional serialization, perhaps the "ShouldSerializeFoo()" pattern observed by a few serializers. I'm not aware of any libraries that match exactly what you describe (unless we include DataTable, but ... think of the kittens!)
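As a sketch of that pattern (Player and Foo are just example names here; ShouldSerializeFoo is the naming convention some serializers look for):
public class Player
{
    private int foo;
    private bool fooDirty;

    public int Foo
    {
        get { return foo; }
        set
        {
            if (foo != value)
            {
                foo = value;
                fooDirty = true;    // only mark dirty when the value really changes
            }
        }
    }

    // Consulted by serializers that honour the ShouldSerialize* convention.
    public bool ShouldSerializeFoo() { return fooDirty; }

    // Call after a successful send to reset the tracking.
    public void MarkClean() { fooDirty = false; }
}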
Perhaps another issue is the need to track all the objects for merge during deserialization; that by itself doesn't come for free.
All things considered, though, I think you could do something along the above lines (fooDirty/ShouldSerializeFoo) and use protobuf-net as the serializer, because (importantly) it supports both conditional serialization and merge. I would also suggest an interface like:
public interface ISomeName
{
    int Key { get; }
    bool IsDirty { get; }
}
The IsDirty property would allow you to quickly check all your objects for changes, then write the key to a stream followed by the (conditional) serialization. The caller would read the key, obtain the object it needs (or allocate a new one with that key), and then use the merge-enabled deserialize (passing in the existing/new object).
Not a full walk-through, but if it was me, that is the approach I would be looking at. Note: the addition/removal/ordering of objects in child-collections is a tricky area, that might need thought.
I'll just say up front that Marc Gravell's suggestion is really the correct approach. He glosses over some minor details, like conflict resolution (you might want to read up on Leslie Lamport's work. He's basically spent his whole career describing different approaches to dealing with conflict resolution in distributed systems), but the idea is sound.
If you do want to transmit state snapshots, instead of procedural descriptions of state changes, then I suggest you look into building snapshot diffs as prefix trees. The basic idea is that you construct a hierarchy of objects and fields. When you change a group of fields, any common prefix they have is only included once. This might look like:
world -> player 1 -> lives: 1
... -> points: 1337
... -> location -> X: 100
... -> Y: 32
... -> player 2 -> lives: 3
(everything in a "..." is only transmitted once).
It is not logical to transfer only changed fields, because you would waste time detecting which fields changed and which didn't, and on reconstructing them on the receiver's side, which will add a lot of latency to your game and make it unplayable online.
My proposed solution is to decompose your objects into the smallest pieces possible and send those small objects, which is fast. Also, you can use compression to reduce bandwidth usage.
For the object ID, you can use a static counter that is incremented each time you construct a new object.
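For example (NetworkObject is just a placeholder base class; Interlocked keeps the counter safe if objects can be constructed from multiple threads):
using System.Threading;

public abstract class NetworkObject
{
    private static int nextId;

    public int Id { get; private set; }

    protected NetworkObject()
    {
        Id = Interlocked.Increment(ref nextId);   // every new object gets the next ID
    }
}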
Hope this answer helps.
You will need to do this by hand. Automatically keeping track of property and instance changes in a hierarchy of objects is going to be very slow compared to anything crafted by hand.
If you decide to try it out anyway, I would try to map your objects to a DataSet and use its built in modification tracking mechanisms.
I still think you should do this by hand, though.
Can we avoid casting T to Object when placing it in a Cache?
WeakReference necessitates the use of object. System.Runtime.Caching.MemoryCache is locked to type object.
Custom dictionaries/collections cause issues with the garbage collector, or you have to run a garbage collector of your own (a separate thread)?
Is it possible to have the best of both worlds?
I know I accepted an answer already, but using WeakReference<T> is now possible! Looks like they snuck it into .NET 4.5.
http://msdn.microsoft.com/en-us/library/gg712911(v=VS.96).aspx
Here's an old feature request for the same:
http://connect.microsoft.com/VisualStudio/feedback/details/98270/make-a-generic-form-of-weakreference-weakreference-t-where-t-class
There's nothing to stop you writing a generic wrapper around MemoryCache - probably with a constraint to require reference types:
public class Cache<T> where T : class
{
private readonly MemoryCache cache = new MemoryCache("cache");
public T this[string key]
{
get { return (T) cache[key]; }
set { cache[key] = value; }
}
// etc
}
Obviously it's only worth delegating the parts of MemoryCache you're really interested in.
So you basically want to dependency-inject a cache provider that only returns certain types?
Isn't that kind of against everything OOP?
The idea of the "object" type is that anything and everything is an object so by using a cache that caches instances of "objects" of type object you are saying you can cache anything.
By building a cache that only caches objects of some predetermined type you are limiting the functionality of your cache, however ...
There is nothing stopping you implementing a custom cache provider that has a generic constraint so it only allows you to cache certain object types, and in theory this would save you about 2 "ticks" (not even a millisecond) per retrieval.
The way to look at this is ...
What's more important to me:
Good OOP based on best practice
about 20 milliseconds over the lifetime of my cache provider
The other thing is ... .NET is already geared to optimise the boxing and unboxing process to the extreme, and at the end of the day, when you "cache" something you are simply putting it somewhere it can be quickly retrieved later and storing a pointer to its location for that retrieval.
I've seen solutions that involve streaming 4 GB XML files through a business process using objects that are destroyed and recreated on every call ... the point is that the process flow was what mattered, not so much the initialisation and prep work, if that makes sense.
How important is this casting time loss to you?
I would be interested to know more about scenario that requires such speed.
As a side note:
Another thing I've noticed about newer technologies like LINQ and Entity Framework is that the result of a query is something worth caching when the query takes a long time, but not so much the side effects on the result.
What this means is that (for example):
If I were to cache a basic "default instance" of an object that takes a complex set of entity queries to create, I wouldn't cache the resulting object but the queries.
With Microsoft already doing the groundwork, I'd ask: what am I caching, and why?
OK, I have a set of very large, identical trees cached in memory (to be populated with non-identical data [they contain information about the contents of each node]).
I want to copy a single instance of the tree and populate each copy with a separate set of data.
However, at the moment, the cached 'blank' copy of the tree is not being copied, but simply referenced and filled with every single set of data.
How can I force the method that gets the cached blank tree to return a copy of the object, instead of a reference?
An alternative to Clone(): serialize it into an in-memory binary stream and then deserialize it as a new instance.
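For example, a sketch with BinaryFormatter (assuming the tree types are marked [Serializable]); because the formatter tracks object references, nodes shared within one tree survive the round trip:
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static T DeepClone<T>(T source)
{
    var formatter = new BinaryFormatter();
    using (var stream = new MemoryStream())
    {
        formatter.Serialize(stream, source);
        stream.Position = 0;
        return (T)formatter.Deserialize(stream);
    }
}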
EDIT
Also, if you do consider serialization, and if performance is your primary concern, please take into account the following performance test: Manual Serialization 200%+ Faster than BinaryFormatter.
There are several ways, but I recommend implementing ICloneable on the tree object, and then call Clone() to create a deep copy.
I would suggest looking closely at your tree classes: if you are going to be enforcing copy semantics, use a struct instead of a class. Otherwise, use the ICloneable interface to provide a Clone() method, as chris166 suggested.
With such a large tree, having multiple copies of it will incur a lot of memory overhead. Why not just organise the data at each node (with a Dictionary, for example) so that it holds all the different data sets (as you're getting at the moment), but organised in a way that is convenient for your actual needs?