Get a copy of a large (160,000+ node internal object tree) object - C#

Ok, I have a set of very large, identical trees cached in memory (to be populated with non-identical data [they contain information about stuff inside each node]).
I want to copy a single instance of the tree and populate each copy with a separate set of data.
However, at the moment, the cached 'blank' copy of the tree is not being copied, but simply referenced and filled with every single set of data.
How can I force the method that gets the cached blank tree to return a copy of the object, instead of a reference?

An alternative to Clone(): serialize it to an in-memory binary stream and then deserialize it as a new instance.
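For example, a minimal sketch of that approach, assuming the tree's node type is marked [Serializable] (BinaryFormatter is one binary serializer that works this way):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static T DeepClone<T>(T source)
{
    // Serialize the whole graph into an in-memory stream, then read it
    // back; the deserialized graph is a completely independent copy.
    BinaryFormatter formatter = new BinaryFormatter();
    using (MemoryStream stream = new MemoryStream())
    {
        formatter.Serialize(stream, source);
        stream.Position = 0;
        return (T)formatter.Deserialize(stream);
    }
}

Usage would then be something like BlankTree copy = DeepClone(cachedBlankTree); where BlankTree is whatever your cached tree's type is.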
EDIT
Also, if you will consider serialization, and if performance is your primary concern, please also take into account the following performance test: Manual Serialization 200% + Faster than BinaryFormatter.

There are several ways, but I recommend implementing ICloneable on the tree object and then calling Clone() to create a deep copy.
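A rough sketch of what that could look like (TreeNode here is a hypothetical node type standing in for whatever your tree actually uses):

using System;
using System.Collections.Generic;

class TreeNode : ICloneable
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();

    public object Clone()
    {
        // Copy this node's own data, then recursively clone every child,
        // so the new tree shares nothing with the cached blank one.
        TreeNode copy = new TreeNode { Name = this.Name };
        foreach (TreeNode child in Children)
            copy.Children.Add((TreeNode)child.Clone());
        return copy;
    }
}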

I would suggest looking closely at your tree classes: if you are going to enforce copy semantics, use a struct instead of a class. Otherwise, use the ICloneable interface to provide a Clone() method, as chris166 suggested.

With such a large tree, having multiple copies of it will incur a lot of memory overhead. Why not just organise the data at each node (with a Dictionary, for example) so that it holds all the different data (as you're getting at the moment), but organised in a way which is convenient to your actual need?
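For instance (just a sketch; the node type and the per-data-set string keys are assumptions):

using System.Collections.Generic;

class Node
{
    // One shared tree; each data set lives under its own key per node,
    // e.g. node.Data["dataset42"], instead of a separate tree copy per set.
    public Dictionary<string, object> Data = new Dictionary<string, object>();
    public List<Node> Children = new List<Node>();
}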

Related

Copying objects without keeping a reference

Background: I have a list of objects that are directly linked to the UI in WPF. I need to do some work with those objects, but while I am working with them (asynchronously) I do not want any refreshes on the UI (performance and aesthetic reasons).
So I thought I might copy the items (using Collection<T>.CopyTo(T[] array, int index)), work on the copy, then overwrite my original list with the copied one. The problem is that even then the references are kept and the UI is continuously refreshed.
Code example of what I did:
MyUIObject[] myCopiedList = new MyUIObject[MyObjectsLinkedToTheUI.Count];
MyObjectsLinkedToTheUI.CopyTo(myCopiedList, 0);
foreach (MyUIObject myCopiedItem in myCopiedList)
{
    // while I do this, the UI is still updated
    PerformLongAndWearyOperationAsync(myCopiedItem);
}
MyObjectsLinkedToTheUI.Clear();
foreach (var myCopiedItem in myCopiedList)
{
    MyObjectsLinkedToTheUI.Add(myCopiedItem);
}
Is there a possibility to copy my items without keeping a reference to the original object?
UPDATE 1
Thank you for your contributions so far. One thing I forgot to mention: This is for Windows Phone 8.1, so ICloneable is not available.
You need to clone them somehow. Either implement the ICloneable interface and do all the rewriting manually, or use some hacks/tricks like serializing and deserializing the object.
The flow with serialization is something like this:
Take your object
Serialize it to, for example, JSON or a binary format
Now deserialize what you got in step 2 into a new object
You'll have a copy of your object that way, but it costs more processing power and is prone to some hard-to-catch errors. Still, it's an easy way to go. Implementing ICloneable is more reliable, but you need to write all that mapping yourself.
Also, consider using structs instead of classes. Structs are always copied by value, not by reference. That has some drawbacks, so it's up to you whether they suit your usage scenario.
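If you go the serialization route, a minimal sketch could look like this, assuming the Json.NET (Newtonsoft.Json) package, which works on Windows Phone 8.1 where ICloneable is unavailable:

using Newtonsoft.Json;

static T DeepClone<T>(T source)
{
    // Round-trip through a JSON string; the result is a fresh object
    // graph holding no references to the original instance.
    string json = JsonConvert.SerializeObject(source);
    return JsonConvert.DeserializeObject<T>(json);
}

You could then build the detached copy with myCopiedList = MyObjectsLinkedToTheUI.Select(x => DeepClone(x)).ToArray(); (note that UI-bound types may need serialization attributes to round-trip cleanly).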

Transfer objects on per field basis over network

I need to transfer .NET objects (with hierarchy) over the network (multiplayer game). To save bandwidth, I'd like to transfer only the fields (and/or properties) that change, so fields that won't change won't be transferred.
I also need some mechanism to match proper objects on the other client side (global object identifier...something like object ID?)
I need some suggestions how to do it.
Would you use reflection? (performance is critical)
I also need mechanism to transfer IList deltas (added objects, removed objects).
How is MMO networking done? Do they transfer whole objects?
(Maybe my idea of per-field transfer is stupid.)
EDIT:
To make it clear: I've already got a mechanism to track changes (let's say every field has a property whose setter adds the field to some sort of list or dictionary of changes - the structure is not final yet).
I don't know how to serialize this list and then deserialize it on the other client. And mostly, how to do it efficiently and how to update the proper objects.
There are about one hundred objects, so I'm trying to avoid a situation where I would write a special function for each object. Decorating fields or properties with attributes would be fine (for example, to specify the serializer, field ID or something similar).
More about the objects: each object has 5 fields on average. Some objects inherit from others.
Thank you for all answers.
Another approach: don't try to serialize complex data changes; instead, send just the actual commands to apply (in a terse form), for example:
move 12432 134, 146
remove 25727
(which would move 1 object and remove another).
You would then apply the commands at the receiver, allowing for a full resync if they get out of sync.
I don't propose you would actually use text for this - that is just to make the example clearer.
One nice thing about this: it also provides "replay" functionality for free.
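In code, that might look something like the sketch below (all names here are hypothetical, and in practice you would encode the fields in a compact binary form rather than text):

using System.Collections.Generic;

// A tiny command vocabulary for replicating world changes.
enum CommandKind : byte { Move, Remove }

struct Command
{
    public CommandKind Kind;
    public int ObjectId;   // global identifier matched on both sides
    public int X, Y;       // only meaningful for Move

    // Applied on the receiver against its own object table.
    public void Apply(Dictionary<int, GameObject> world)
    {
        switch (Kind)
        {
            case CommandKind.Move:
                world[ObjectId].MoveTo(X, Y);
                break;
            case CommandKind.Remove:
                world.Remove(ObjectId);
                break;
        }
    }
}

Because the commands themselves are the unit of transfer, logging them also gives you the "replay" functionality mentioned above.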
The cheapest way to track dirty fields is to have it as a key feature of your object model, i.e. with a "fooDirty" field for every data field "foo", which you set to true in the setter (if the value differs). This could also be twinned with conditional serialization, perhaps the "ShouldSerializeFoo()" pattern observed by a few serializers. I'm not aware of any libraries that match exactly what you describe (unless we include DataTable, but ... think of the kittens!)
Perhaps another issue is the need to track all the objects for merge during deserialization; that by itself doesn't come for free.
All things considered, though, I think you could do something along the above lines (fooDirty/ShouldSerializeFoo) and use protobuf-net as the serializer, because (importantly) it supports both conditional serialization and merge. I would also suggest an interface like:
interface ISomeName
{
    int Key { get; }
    bool IsDirty { get; }
}
The IsDirty property would allow you to quickly check all your objects for those with changes, then add the key to a stream, then the (conditional) serialization. The caller would read the key, obtain the object needed (or allocate a new one with that key), and then use the merge-enabled deserialize (passing in the existing/new object).
Not a full walk-through, but if it was me, that is the approach I would be looking at. Note: the addition/removal/ordering of objects in child-collections is a tricky area, that might need thought.
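As a sketch of how those pieces could fit together (the Player class and its Score field are made up for illustration; protobuf-net picks up the ShouldSerialize* conditional pattern, and Serializer.Merge deserializes into an existing instance):

using ProtoBuf;

[ProtoContract]
class Player : ISomeName
{
    [ProtoMember(1)]
    public int Key { get; set; }

    private int score;
    private bool scoreDirty;

    [ProtoMember(2)]
    public int Score
    {
        get { return score; }
        set { if (score != value) { score = value; scoreDirty = true; } }
    }

    // Conditional serialization: only write Score when it has changed.
    public bool ShouldSerializeScore() { return scoreDirty; }

    public bool IsDirty { get { return scoreDirty; } }
}

On the receiving side you would look the object up by Key and call Serializer.Merge(stream, existingPlayer), so fields absent from the stream keep their old values.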
I'll just say up front that Marc Gravell's suggestion is really the correct approach. He glosses over some minor details, like conflict resolution (you might want to read up on Leslie Lamport's work. He's basically spent his whole career describing different approaches to dealing with conflict resolution in distributed systems), but the idea is sound.
If you do want to transmit state snapshots, instead of procedural descriptions of state changes, then I suggest you look into building snapshot diffs as prefix trees. The basic idea is that you construct a hierarchy of objects and fields. When you change a group of fields, any common prefix they have is only included once. This might look like:
world -> player 1 -> lives: 1
              ... -> points: 1337
              ... -> location -> X: 100
                           ... -> Y: 32
      ... -> player 2 -> lives: 3
(everything in a "..." is only transmitted once).
It is not practical to transfer only the changed fields, because you would waste time detecting which fields changed and which didn't, and reconstructing them on the receiver's side, which adds a lot of latency to your game and makes it unplayable online.
My proposed solution is to decompose your objects to the minimum and send these small objects, which is fast. Also, you can use compression to reduce bandwidth usage.
For the object ID, you can use a static counter which increases each time you construct a new object.
Hope this answer helps.
You will need to do this by hand. Automatically keeping track of property and instance changes in a hierarchy of objects is going to be very slow compared to anything crafted by hand.
If you decide to try it out anyway, I would try to map your objects to a DataSet and use its built-in modification tracking mechanisms.
I still think you should do this by hand, though.

C# Garbage Collector question

I have a library which returns a hierarchical list composed of IDictionary, IList and primitive types (strings and ints). At present I cannot change how this data is returned.
I have another strongly typed class which is consuming this data and converting it into business objects. There is a list of "properties" in the data returned, which I want to import into my strongly typed class. I then can dispose of the hierarchy.
My question is this: If I do this:
MyCustomClass.Properties = HierarchicalData["some_name"];
where MyCustomClass is my strongly typed class and HierarchicalData is the IDictionary data, what happens when I later call:
HierarchicalData = null;
Can HierarchicalData be disposed and released? "some_name" in this case is another Dictionary, and so technically that is all that needs to be kept. Do I need to do an explicit copy instead of importing, such as:
MyCustomClass.Properties = HierarchicalData["some_name"].ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
Clarification: I am not worried about the dictionary containing the properties being garbage collected. I want to make sure that HierarchicalData itself CAN be deleted, as it is quite large and I need to work with several of them.
Yes. Once there are no references to HierarchicalData, it will be a candidate for collection.
Since you have a reference to the data stored for the "some_name" key, that specific element (the other dictionary) will not be collected. However, the other, unreferenced portions will become unrooted as far as the GC is concerned and will be collected at some point.
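To illustrate (names follow the question; LoadHierarchicalData is a made-up stand-in for the library call):

IDictionary<string, object> hierarchicalData = LoadHierarchicalData();
myCustomClass.Properties = hierarchicalData["some_name"];
hierarchicalData = null;
// The outer dictionary is now unrooted and eligible for collection;
// the inner "some_name" dictionary survives because Properties references it.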
This will work as you expect. Because you will have created a reference to the dictionary referenced by HierarchicalData["some_name"] in another place, the garbage collector will keep it around for you.
You definitely do not need to copy the dictionary.
Assuming that the class returns a standard Dictionary<TKey, TValue>, you probably don't need to do anything else.
The objects probably don't hold references to the dictionary that contains them, so they probably won't prevent the dictionary from being collected.
However, there's no way to be sure without checking; you should inspect the objects in the Visual Studio debugger's Watch window (or look at the source) and see whether they reference the dictionary.
You do not need to perform a copy.
The line:
MyCustomClass.Properties = HierarchicalData["some_name"];
assigns a reference, and while a reference to the object is alive, it will not be garbage collected.
Can HierarchicalData be disposed and released?
You mean by the GC? Not in this case, as it's referenced by your object(s). GC is not going to mess it up.

How to deep clone interconnected objects in C#?

What is the best way to deep clone an interconnected set of objects? Example:
class A {
    B theB; // optional
    // ...
}
class B {
    A theA; // optional
    // ...
}
class Container {
    A[] a;
    B[] b;
}
The obvious thing to do is walk the objects and deep clone everything as I come to it. This creates a problem however -- if I clone an A that contains a B, and that B is also in the Container, that B will be cloned twice after I clone the Container.
The next logical step is to create a Dictionary and look up every object before I clone it. This seems like it could be a slow and ungraceful solution, however.
Any thoughts?
It's not an elegant solution for sure, but it isn't uncommon to use a dictionary (or hashmap). One of the benefits is that a hashmap has constant lookup time, so speed does not really suffer here.
Not that I am familiar with C#, but typically any type of crawling of a graph for some sort of processing will require a lookup table to stop processing an object due to cyclic references. So I would think you will need to do the same here.
The dictionary solution you suggested is the best I know of. To optimize further, you could use object.GetHashCode() to get a hash for the object, and use that as the dictionary key. Should be fast unless you're talking about huge object trees (10s to 100s of thousands of objects).
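As a sketch of the dictionary-tracking approach, using the A/B/Container classes from the question (and assuming the fields are accessible to the cloning code): the 'visited' map records each original against its clone, so shared and cyclic references are cloned exactly once.

using System.Collections.Generic;
using System.Linq;

static class DeepCloner
{
    public static A Clone(A source, Dictionary<object, object> visited)
    {
        if (source == null) return null;
        object found;
        if (visited.TryGetValue(source, out found)) return (A)found;
        A copy = new A();
        visited[source] = copy;   // register before recursing to survive cycles
        copy.theB = Clone(source.theB, visited);
        return copy;
    }

    public static B Clone(B source, Dictionary<object, object> visited)
    {
        if (source == null) return null;
        object found;
        if (visited.TryGetValue(source, out found)) return (B)found;
        B copy = new B();
        visited[source] = copy;
        copy.theA = Clone(source.theA, visited);
        return copy;
    }

    public static Container Clone(Container source)
    {
        // One shared map per clone operation keeps A/B identities consistent.
        var visited = new Dictionary<object, object>();
        return new Container
        {
            a = source.a.Select(x => Clone(x, visited)).ToArray(),
            b = source.b.Select(x => Clone(x, visited)).ToArray()
        };
    }
}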
Maybe create a flag on each object to indicate whether it has been cloned before.
Another possible solution you could investigate is serializing the objects into a stream, and then reconstructing them from that same stream into new instances. This often works wonders when everything else seems awfully convoluted and messy.
Marc
One of the practical ways to do deep cloning is serializing and then deserializing a source graph. Some serializers in .NET like DataContractSerializer are even capable of processing cycles within graphs. You can choose which serializer is the best choice for your scenario by looking at the feature comparison chart.
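For example, a sketch with DataContractSerializer, marking the types with IsReference = true so the A/B cycle round-trips correctly (attributes added to the question's classes for illustration):

using System.IO;
using System.Runtime.Serialization;

[DataContract(IsReference = true)]
class A
{
    [DataMember] public B theB;
}

[DataContract(IsReference = true)]
class B
{
    [DataMember] public A theA;
}

static T DeepClone<T>(T source)
{
    var serializer = new DataContractSerializer(typeof(T));
    using (var stream = new MemoryStream())
    {
        // Write the graph out (object identity preserved via IsReference),
        // then read it back as an independent copy.
        serializer.WriteObject(stream, source);
        stream.Position = 0;
        return (T)serializer.ReadObject(stream);
    }
}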

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?
If your underlying data is stored as a list, you can use the List(T).AsReadOnly method.
If your data can be enumerated, you can use the Enumerable.ToList method to convert your collection to a List and call AsReadOnly on it.
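Applied to the property from the question, that could look like this (the backing field name is an assumption):

using System.Collections.Generic;
using System.Collections.ObjectModel;

class DataHolder
{
    private List<string> data = new List<string>();

    public ReadOnlyCollection<string> Data
    {
        get
        {
            // Wraps the live list; callers can read but not Add/Remove.
            return data.AsReadOnly();
        }
    }
}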
I voted for your accepted answer and agree with it. However, might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes from the fact that you can't add code to collections, so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading it, all that is necessary is to return an interface which doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which they are unlikely to do without knowing the internals of the property they are accessing.
If, however, your intent is to prevent the calling code from observing updates from other threads, you'll have to fall back to the solutions already mentioned and perform a deep or shallow copy depending on your need.
I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't care about the caller (Class B) from seeing any further changes to the collection then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
One should note that aku's answer will only protect the list itself as read-only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read-only list.
You can use a copy of the collection instead.
public IList<string> Data {
    get {
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated.
You want to use the yield keyword. You loop through the IEnumerable list and return the results with yield return. This allows the consumer to use a foreach loop without being able to modify the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}
