What is the most efficient way to Deserialze an XML file - c#

Aloha,
I have a 8MB XML file that I wish to deserialize.
I'm using this code:
public static T Deserialize<T>(string xml)
{
TextReader reader = new StringReader(xml);
Type type = typeof(T);
XmlSerializer serializer = new XmlSerializer(type);
T obj = (T)serializer.Deserialize(reader);
return obj;
}
This code runs in about a minute, which seems rather slow to me. I've tried to use sgen.exe to precompile the serialization dll, but this didn't change the performance.
What other options do I have to improve performance?
[edit] I need the object that is created by the deserialization to perform (basic) transformations on. The XML is received from an external webservice.

The XmlSerializer uses reflection and is therefore not the best choice if performance is an issue.
You could build up a DOM of your XML document using the XmlDocument or XDocument classes and work with that, or, even faster use an XmlReader. The XmlReader however requires you to write any object mapping - if needed - yourself.
What approach is the best depends stronly on what you want to do with the XML data. Do you simply need to extract certain values or do you have to work and edit the whole document object model?

Yes it does use reflection, but performance is a gray area. When talking an 8mb file... yes it will be much slower. But if dealing with a small file it will not be.
I would NOT saying reading the file vial XmlReader or XPath would be easier or really any faster. What is easier then telling something to turn your xml to an object or your object to XML...? not much.
Now if you need fine grain control then maybe you need to do it by hand.
Personally the choice is like this. I am willing to give up a bit of speed to save a TON of ugly nasty code.
Like everything else in software development there are trade offs.

You can try implementing IXmlSerializable in your "T" class write custom logic to process the XML.

Related

C# XElement - Parent property

Does the XElement Parent property wrap a weak or a strong reference?
My code currently uses XmlElement, which holds a strong reference (ParentNode), and I'm considering the benefits of replacing it with XDocument.
Thanks.
You won't save any memory be switching from XmlDocument to XDocument. All references are strong. If you have a refernce to any element, you force the whole document to remain in memory.
The choice between XmlDocument and XDocument is about horrible vs. nice API, not about memory.
If you need to work with only small part of the original XML, and must discard the rest, consider making a clone of the elements you are interested in.
See - http://msdn.microsoft.com/en-us/library/bb297950(v=vs.110).aspx
public XElement(XElement other)
This constructor makes a deep copy of an element.

How to check name of element with WriteEndElement

I'm writing xml with XmlWriter. My code has lots of sections like this:
xml.WriteStartElement("payload");
ThirdPartyLibrary.Serialise(results, xml);
xml.WriteEndElement(); // </payload>
The problem is that the ThirdPartyLibrary.Serialise method is unreliable. It can happen (depending on the variable results) that it doesn't close all the tags it opens. As a consequence, my WriteEndElement line is perverted, consumed closing the library's hanging tags, rather than writing </payload>.
Thus I'd like to make a checked call to WriteEndElement that checks the element name, and throws an exception unless the cursor is at the expected element.
xml.WriteEndElement("payload");
You can think of this like XmlReader.ReadStartElement(name) which throws unless the cursor is at the expected place in the document.
How can I achieve this?
Edit: A second use case for this extension method would be to make my own code more readable and reliable.
XMLWriter is just writes the given xml information in the stream with out any validation. If it does any validation while writing the xml tags, the performance problem will arise while creating the big xml file.
Creating the XML file using XMLWriter is up to developer risk. If you want to do any such kind of validation, you can use XMLDocument.
If you really want to do this validation in XMLWriter, you have to create the writer by using String or StringBuilder. Because, if you use Stream or TextWriter you can't read the information which is written into the stream in the middle of writing. In Every update of the XML you have to read the string and write your own method to validate the written information.
I Suggest you to use XMLDocument for creating these type of xml.
In the end, I wrote an extention method WriteSubtree that gives this usable API:
using (var resultsXml = xml.WriteSubtree("Results"))
{
ThirdPartyLibrary.Serialise(results, resultsXml);
}
The extension method XmlWriter.WriteSubtree is analogous to .NET's XmlReader.ReadSubtree. It returns a special XmlWriter that checks against funny business. Its dispose method closes any tags left open.

Best datatype to return for XML serialized data

I am creating a generic Android to C# tcp stack. On the C# side I would like to implement an interface called ITcpSerializable.
The result of this serialization would then be sent over my tcp connection as raw xml.
The current definition is as such.
public interface ITcpSerializable
{
StringBuilder Serialize();
}
However what I am wondering is what is the best return type for a method like Serialize() when you expect that on some occasions the dataset may be rather large.
It seems like there are so many alternatives for return types for such a method. Stream, textreader, xmldocument, perhaps even byte[] etc... What would be the best one??
PS: I realize this is somewhat subjective but am genuinely in need of some advice.
If you are developing a component to be used by third-party then you better stick to the serialization pattern of .NET, google for an article of that.
I believe the answer will be to return byte[] but maybe you will learn more about the pattern if you will read an article.
If you are developing for yourself stick to whats working for you.
Since you are talking TCP, I would use a Stream pattern, not a return type - i.e.
void Read(Stream source);
void Write(Stream destination);
This is then format independent. However, for "separation of concerns" I would favour a separate serializer than objects that also know about serialization... Either it's job is to be an object model, or it's job is to know how to serialize things; rarely both.
Personally, on a socket I'd use something denser than XML though; I have very strong tendencies towards protobuf (but as an implementor (protobuf-net) I'm biased).

C# Optimizing a function by pre-loading it

I have a function that is very small, but is called so many times that my profiler marks it as time consuming. It is the following one:
private static XmlElement SerializeElement(XmlDocument doc, String nodeName, String nodeValue)
{
XmlElement newElement = doc.CreateElement(nodeName);
newElement.InnerXml = nodeValue;
return newElement;
}
The second line (where it enters the nodeValue) is the one takes some time.
The thing is, I don't think it can be optimized code-wise, I'm still open to suggestions on that part though.
However, I remember reading or hearing somewhere that you could tell the compiler to flag this function, so that it is loaded in memory when the program starts and it runs faster.
Is this just my imagination or such a flag exists?
Thanks,
FB.
There are ways you can cause it to be jitted early, but it's not the jit time that's hurting you here.
If you're having performance problems related to Xml serialization, you might consider using XmlWriter rather than XmlDocument, which is fairly heavy. Also, most automatic serialization systems (including the built-in .NET XML Serialization) will emit code dynamically to perform the serialization, which can then be cached and re-used. Most of this has to do with avoiding the overhead of reflection, however, rather than the overhead of the actual XML writing/parsing.
I dont think this can be solved using any kind of catching or inlining. And I believe its your imagination. Mainly the part about performance. What you have in mind is pre-JIT-ing your code. This technique will remove the wait time for JITer when your function is first called. But this is only first time this function is called. It has no performance effect for subsequent calls.
As documentation states, setting InnterXml parses set string as XML. And parsing XML string can be expensive operation, especialy if set xml in string format is complex. And documentation even has this line:
InnerXml is not an efficient way to modify the DOM. There may be performance issues when replacing complex nodes. It is more efficient to construct nodes and use methods such as InsertBefore, InsertAfter, AppendChild, and RemoveChild to modify the Xml document.
So, if you are creating complex XML structure this way it would be wise to do it by hand.

.Net Deep cloning - what is the best way to do that?

I need to perform deep cloning on my complex object model. What do you think is the best way to do that in .Net?
I thought about serializing / Deserializing
no need to mention that MemberwiseClone is not good enough.
If you control the object model, then you can write code to do it, but it is a lot of maintenance. There are lots of problems, though, which mean that unless you need absolutely the fastest performance, then serialization is often the most manageable answer.
This is one of the cases where BinaryFormatter works acceptably; normally I'm not a fan (due to the issues with versioning etc) - but since the serialized data is for immediate consumption this isn't an issue.
If you want it a bit faster (but without your own code), then protobuf-net may help, but requires code changes (to add the necessary metadata etc). And it is tree-based (not graph-based).
Other serializers (XmlSerializer, DataContractSerializer) are also fine, but if it is just for clone, they may not offer much over BinaryFormatter (except perhaps that XmlSerializer doesn't need [Serializable].
So really, it depends on your exact classes and the scenario.
If you are running code in a Partial Trust environment such as the Rackspace Cloud you will likely be restricted from using the BinaryFormatter. The XmlSerializer can be used instead.
public static T DeepClone<T>(T obj)
{
using (var ms = new MemoryStream())
{
XmlSerializer xs = new XmlSerializer(typeof(T));
xs.Serialize(ms, obj);
ms.Position = 0;
return (T)xs.Deserialize(ms);
}
}
Example of deep cloning from msdn magazine:
Object DeepClone(Object original)
{
// Construct a temporary memory stream
MemoryStream stream = new MemoryStream();
// Construct a serialization formatter that does all the hard work
BinaryFormatter formatter = new BinaryFormatter();
// This line is explained in the "Streaming Contexts" section
formatter.Context = new StreamingContext(StreamingContextStates.Clone);
// Serialize the object graph into the memory stream
formatter.Serialize(stream, original);
// Seek back to the start of the memory stream before deserializing
stream.Position = 0;
// Deserialize the graph into a new set of objects
// and return the root of the graph (deep copy) to the caller
return (formatter.Deserialize(stream));
}
Please take a look at the really good article C# Object Clone Wars. I found a very interest solution there: Copyable: A framework for copying or cloning .NET objects
The best way is probably to implement the System.IClonable interface in your object and all its fields that also needs custom deep cloning capabilities. Then you implement the Clone method to return a deep copy of your object and its members.
You could try AltSerialize which in many cases is faster than the .Net serializer. It also provides caching and custom attributes to speed up serialization.
Best way to implement this manually. It will be really faster than any other generic methods. Also, there are a lot of libraries for this operation (You can see some list with performance benchmarks here).
By the way, BinaryFormatter is very slow for this task and can be good only for testing.

Categories

Resources