I need to perform deep cloning on my complex object model. What do you think is the best way to do that in .NET?
I thought about serializing/deserializing.
No need to mention that MemberwiseClone is not good enough.
If you control the object model, then you can write code to do it, but it is a lot of maintenance. There are lots of pitfalls in hand-rolled cloning, which mean that unless you absolutely need the fastest performance, serialization is often the most manageable answer.
This is one of the cases where BinaryFormatter works acceptably; normally I'm not a fan (due to the issues with versioning etc) - but since the serialized data is for immediate consumption this isn't an issue.
If you want it a bit faster (but without your own code), then protobuf-net may help, but requires code changes (to add the necessary metadata etc). And it is tree-based (not graph-based).
Other serializers (XmlSerializer, DataContractSerializer) are also fine, but if it is just for clone, they may not offer much over BinaryFormatter (except perhaps that XmlSerializer doesn't need [Serializable]).
So really, it depends on your exact classes and the scenario.
If you are running code in a Partial Trust environment such as the Rackspace Cloud you will likely be restricted from using the BinaryFormatter. The XmlSerializer can be used instead.
public static T DeepClone<T>(T obj)
{
    using (var ms = new MemoryStream())
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        xs.Serialize(ms, obj);
        ms.Position = 0;
        return (T)xs.Deserialize(ms);
    }
}
Example of deep cloning from MSDN Magazine:
Object DeepClone(Object original)
{
    // Construct a temporary memory stream
    MemoryStream stream = new MemoryStream();

    // Construct a serialization formatter that does all the hard work
    BinaryFormatter formatter = new BinaryFormatter();

    // This line is explained in the "Streaming Contexts" section
    formatter.Context = new StreamingContext(StreamingContextStates.Clone);

    // Serialize the object graph into the memory stream
    formatter.Serialize(stream, original);

    // Seek back to the start of the memory stream before deserializing
    stream.Position = 0;

    // Deserialize the graph into a new set of objects
    // and return the root of the graph (deep copy) to the caller
    return (formatter.Deserialize(stream));
}
Please take a look at the really good article C# Object Clone Wars. I found a very interesting solution there: Copyable: A framework for copying or cloning .NET objects.
The best way is probably to implement the System.ICloneable interface in your object and in all of its fields that also need custom deep-cloning capabilities. Then you implement the Clone method to return a deep copy of your object and its members.
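As a minimal sketch of that pattern (the Person/Address types here are hypothetical, just for illustration):

public class Address : ICloneable
{
    public string City;

    public object Clone()
    {
        return new Address { City = this.City };
    }
}

public class Person : ICloneable
{
    public string Name;
    public Address Home;

    public object Clone()
    {
        // Shallow-copy value and string fields first...
        Person copy = (Person)MemberwiseClone();
        // ...then deep-copy each reference-type member explicitly.
        copy.Home = (Home == null) ? null : (Address)Home.Clone();
        return copy;
    }
}

The downside is the maintenance mentioned above: every new reference-type field has to be remembered in Clone.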
You could try AltSerialize, which in many cases is faster than the .NET serializer. It also provides caching and custom attributes to speed up serialization.
The best way is to implement this manually; it will be much faster than any generic method. There are also a lot of libraries for this operation (you can see a list with performance benchmarks here).
By the way, BinaryFormatter is very slow for this task and is good only for testing.
In short, I've got an application that converts a flat data file into an XML file. It does this by populating objects and then serializing them to XML.
The problem I'm running into is that the garbage collector does not seem to be taking care of the serialized strings. Files of 3500 records are running into OutOfMemoryExceptions before they finish. Something is fishy, indeed.
When I take the serialization out of the mix and simply pass an empty string, the memory consumption remains as expected, so I've ruled out the possibility that my intermediate objects (between flat file and xml) are the problem here. They seem to be collected as expected.
Can anyone help? How do I make sure these strings are disposed of properly?
Update: Some sample code
// myObj.Serialize invokes an XmlSerializer instance to handle its work
string serialized = myObj.Serialize();
myXmlWriter.WriteRaw(serialized);
This is basically where the problem is occurring--if I take the serialized string out of play, the memory problems go away too, even though I'm still transforming the flat file into objects, one at a time.
Update 2: Serialize method
public virtual string Serialize()
{
    System.IO.StreamReader streamReader = null;
    System.IO.MemoryStream memoryStream = null;
    using (memoryStream = new MemoryStream())
    {
        memoryStream = new System.IO.MemoryStream();
        Serializer.Serialize(memoryStream, this);
        memoryStream.Seek(0, System.IO.SeekOrigin.Begin);
        using (streamReader = new System.IO.StreamReader(memoryStream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
You need to make sure they aren't referenced anywhere. Before an OutOfMemoryException is thrown, the GC is run. If it isn't recovering that memory, that means something is still holding on to it. Like others said, if you post some code, we might be able to help. Otherwise you can use a profiler or WinDbg/SOS to help figure out what is holding onto your strings.
Very curious indeed. I added the following dandy after each serialized record writes to the XmlWriter:
if (GC.GetTotalMemory(false) > 104857600)   // 100 MB threshold
{
    GC.WaitForPendingFinalizers();
}
and wouldn't you know it, it's keeping it in check and it's processing without incident, never getting too far above the threshold I set. I feel like there should be a better way, but it almost seems like the code was executing too fast for the garbage collector to reclaim the strings in time.
Do you have an example of your code - how you're creating these strings? Are you breaking out into unmanaged code anywhere (which would mean you are required to clean up after yourself)?
Another thought is how you are converting flat data file into XML. XML can be somewhat heavy depending on how you are building the file. If you are trying to hold the entire object in memory, it is very likely (easy to do, in fact) that you are running out of memory.
It sure looks like your method could be cleaned up to be just:
public virtual string Serialize()
{
    StringBuilder sb = new StringBuilder();
    using (StringWriter writer = new StringWriter(sb))
    {
        this.serializer.Serialize(writer, this);
    }
    return sb.ToString();
}
You are creating an extra MemoryStream for no reason.
But if you are writing the string to a file, then why don't you just send a FileStream to the Serialize() method?
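A sketch of that stream-based overload, assuming Serializer is the same XmlSerializer field used in the original method:

public virtual void Serialize(System.IO.Stream output)
{
    // Writing directly to the destination stream (e.g. a FileStream)
    // means no intermediate string is ever materialized on the managed heap.
    Serializer.Serialize(output, this);
}

Usage would then be something like:

using (var fs = System.IO.File.Create("output.xml"))
{
    myObj.Serialize(fs);
}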
I am trying to optimize a class that serializes objects in binary format and writes them in a file. I am currently using a FileStream (in sync mode because of the size of my objects) and a BinaryWriter. Here's what my class looks like:
public class MyClass
{
    private readonly BinaryWriter m_binaryWriter;
    private readonly Stream m_stream;

    public MyClass()
    {
        // Leave FileStream in synchronous mode; it is faster in sync mode for objects of this size
        FileStream fileStream = new FileStream(FilePath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite, maxSize, false);
        m_stream = fileStream;
        m_binaryWriter = new BinaryWriter(m_stream);
    }

    public void SerializeObject(IMySerializableObject serializableObject)
    {
        serializableObject.Serialize(m_binaryWriter);
        m_stream.Flush();
    }
}
A profiler run on this code shows good performance but I was wondering if there are other objects (or techniques) that I could use to improve the performance of this class.
Yes - you could use a different serialization format. The built-in serialization format is rich, but has downsides too - it's quite verbose compared with some other custom formats.
The format I'm most familiar with is Protocol Buffers, which is an efficient and portable binary format from Google. It does, however, require you to design the types that you want to serialize in a different way. There are always pros and cons :)
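For a sense of what that looks like in C# via the protobuf-net library, a minimal sketch (the Measurement type is made up for illustration; the attributes are protobuf-net's standard markup):

[ProtoContract]
public class Measurement
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

// Serializer.Serialize writes a compact binary encoding to any Stream.
using (var file = File.Create("data.bin"))
{
    ProtoBuf.Serializer.Serialize(file, new Measurement { Id = 1, Value = 42.5 });
}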
There are other binary serialization formats too, such as Thrift.
You may want to stick to the built-in serialization, but it's worth knowing that other options are available.
Before you go too far, however, you should determine what you care about and how much you actually need to worry about the performance anyway. You could waste a lot of time investigating options when what you've got may be fine as it is :)
I am trying to use the EndianBinaryReader and EndianBinaryWriter that Jon Skeet wrote as part of his MiscUtil library. It works great for the two uses I have made of it.
The first is reading from a NetworkStream (TcpClient), where I sit in a loop reading the data as it comes in. I can create a single EndianBinaryReader and then just dispose of it on application shutdown. I construct the EndianBinaryReader by passing in TcpClient.GetStream().
I am now trying to do the same thing when reading from a UdpClient, but this does not have a stream, as it is connectionless. So I get the data like so:
byte[] data = udpClientSnapShot.Receive(ref endpoint);
I could put this data into a memory stream
var memoryStream = new MemoryStream(data);
and then create the EndianBinaryReader
var endianbinaryReader = new EndianBinaryReader(
new BigEndianBitConverter(), memoryStream,Encoding.ASCII);
but this means I have to create a new endian reader every time I do a read. Is there a way I can just create a single stream that I can keep updating with the data from the UDP client?
I can't remember whether EndianBinaryReader buffers - you could overwrite a single MemoryStream? But to be honest there is very little overhead from an extra object here. How big are the packets? (putting it into a MemoryStream will clone the byte[]).
I'd be tempted to use the simplest thing that works and see if there is a real problem. Probably the one change I would make is to introduce using (since they are IDisposable):
using (var memoryStream = new MemoryStream(data))
using (var endianbinaryReader = ..blah..)
{
    // use it
}
Your best option is probably to subclass the .NET Stream class to provide your custom functionality. The class is designed to be overridden with custom behavior.
It may look daunting because of the number of members, but it is easier than it looks. There are a number of boolean properties like "CanWrite", etc. Override them and have them all return false except for the functionality that your reader needs (probably CanRead is the only one you need to be true).
Then just override all of the methods that start with the phrase "When overridden in a derived class" in the help for Stream, and have the unsupported methods throw a NotSupportedException (rather than the default NotImplementedException).
Implement the Read method to return data from your buffered UDP packets using perhaps a linked list of buffers, setting used buffers to "null" as you read past them so that the memory footprint doesn't grow unbounded.
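A rough sketch of that idea, assuming the reader consumes whole datagrams in order (the class name and the queueing policy are mine, not from the original post):

public class UdpPacketStream : Stream
{
    private readonly Queue<byte[]> m_packets = new Queue<byte[]>();
    private byte[] m_current;
    private int m_offset;

    // Call this whenever UdpClient.Receive returns a new datagram.
    public void Append(byte[] packet)
    {
        lock (m_packets) m_packets.Enqueue(packet);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (m_current == null || m_offset == m_current.Length)
        {
            lock (m_packets)
            {
                // A real implementation would probably block here until data
                // arrives; returning 0 signals end-of-stream to the reader.
                if (m_packets.Count == 0) return 0;
                m_current = m_packets.Dequeue();   // the old buffer becomes garbage
                m_offset = 0;
            }
        }
        int n = Math.Min(count, m_current.Length - m_offset);
        Array.Copy(m_current, m_offset, buffer, offset, n);
        m_offset += n;
        return n;
    }

    // Only reading is supported; everything else is switched off.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}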
Aloha,
I have an 8 MB XML file that I wish to deserialize.
I'm using this code:
public static T Deserialize<T>(string xml)
{
    using (TextReader reader = new StringReader(xml))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        return (T)serializer.Deserialize(reader);
    }
}
This code runs in about a minute, which seems rather slow to me. I've tried to use sgen.exe to precompile the serialization dll, but this didn't change the performance.
What other options do I have to improve performance?
[edit] I need the object that is created by the deserialization to perform (basic) transformations on. The XML is received from an external webservice.
The XmlSerializer uses reflection and is therefore not the best choice if performance is an issue.
You could build up a DOM of your XML document using the XmlDocument or XDocument classes and work with that, or, even faster use an XmlReader. The XmlReader however requires you to write any object mapping - if needed - yourself.
Which approach is best depends strongly on what you want to do with the XML data. Do you simply need to extract certain values, or do you have to work with and edit the whole document object model?
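For the XmlReader route, a small sketch (the file name and the element name "item" are made up):

var values = new List<string>();
using (XmlReader reader = XmlReader.Create("data.xml"))
{
    // ReadToFollowing skips forward to each <item> element without
    // ever building the whole document in memory.
    while (reader.ReadToFollowing("item"))
    {
        values.Add(reader.ReadElementContentAsString());
    }
}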
Yes, it does use reflection, but performance is a gray area. When talking about an 8 MB file... yes, it will be much slower. But if you are dealing with a small file, it will not be.
I would NOT say reading the file via XmlReader or XPath would be easier or really any faster. What is easier than telling something to turn your XML into an object, or your object into XML...? Not much.
Now if you need fine grain control then maybe you need to do it by hand.
Personally the choice is like this. I am willing to give up a bit of speed to save a TON of ugly nasty code.
Like everything else in software development there are trade offs.
You can try implementing IXmlSerializable in your "T" class and writing custom logic to process the XML.
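A minimal sketch of that (the Record shape and its XML layout are invented for illustration):

public class Record : IXmlSerializable
{
    public int Id;
    public string Name;

    // Returning null is the documented convention for this method.
    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        // Hand-rolled parsing avoids XmlSerializer's reflection-driven default path.
        Id = int.Parse(reader.GetAttribute("id"));
        Name = reader.ReadElementContentAsString();
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("id", Id.ToString());
        writer.WriteString(Name);
    }
}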
What is the best way to deep clone an interconnected set of objects? Example:
class A {
    B theB; // optional
    // ...
}

class B {
    A theA; // optional
    // ...
}

class Container {
    A[] a;
    B[] b;
}
The obvious thing to do is walk the objects and deep clone everything as I come to it. This creates a problem however -- if I clone an A that contains a B, and that B is also in the Container, that B will be cloned twice after I clone the Container.
The next logical step is to create a Dictionary and look up every object before I clone it. This seems like it could be a slow and ungraceful solution, however.
Any thoughts?
It's not an elegant solution for sure, but it isn't uncommon to use a dictionary (or hashmap). One of the benefits is that a hashmap has constant lookup time, so speed does not really suffer here.
Not that I am familiar with C#, but typically any kind of graph traversal for processing requires a lookup table to avoid reprocessing an object due to cyclic references. So I would think you will need to do the same here.
The dictionary solution you suggested is the best I know of. To optimize further, you could use object.GetHashCode() to get a hash for the object, and use that as the dictionary key. Should be fast unless you're talking about huge object trees (10s to 100s of thousands of objects).
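To make the dictionary approach concrete, a sketch against the A/B/Container model from the question (assuming the fields are accessible; the helper names are mine):

static A CloneA(A source, Dictionary<object, object> seen)
{
    if (source == null) return null;
    object existing;
    if (seen.TryGetValue(source, out existing)) return (A)existing;

    A copy = new A();
    seen[source] = copy;   // register before recursing, so cycles terminate
    copy.theB = CloneB(source.theB, seen);
    return copy;
}

static B CloneB(B source, Dictionary<object, object> seen)
{
    if (source == null) return null;
    object existing;
    if (seen.TryGetValue(source, out existing)) return (B)existing;

    B copy = new B();
    seen[source] = copy;
    copy.theA = CloneA(source.theA, seen);
    return copy;
}

static Container CloneContainer(Container source)
{
    var seen = new Dictionary<object, object>();   // one map per clone operation
    return new Container
    {
        a = Array.ConvertAll(source.a, x => CloneA(x, seen)),
        b = Array.ConvertAll(source.b, x => CloneB(x, seen)),
    };
}

Note that if the classes override Equals/GetHashCode, the dictionary should be given a reference-equality comparer so that distinct-but-equal objects are not merged into one clone.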
Maybe create a bit flag to indicate whether this object has been cloned before.
Another possible solution you could investigate is serializing the objects into a stream, and then reconstructing them from that same stream into new instances. This often works wonders when everything else seems awfully convoluted and messy.
Marc
One of the practical ways to do deep cloning is serializing and then deserializing a source graph. Some serializers in .NET like DataContractSerializer are even capable of processing cycles within graphs. You can choose which serializer is the best choice for your scenario by looking at the feature comparison chart.
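As a sketch of that approach (PreserveObjectReferences makes DataContractSerializer track object identity, so shared references and cycles survive the round trip; the settings overload used here is available from .NET 4.5):

public static T DeepClone<T>(T source)
{
    var serializer = new DataContractSerializer(
        typeof(T),
        new DataContractSerializerSettings { PreserveObjectReferences = true });

    using (var ms = new MemoryStream())
    {
        serializer.WriteObject(ms, source);
        ms.Position = 0;
        return (T)serializer.ReadObject(ms);
    }
}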