C# fastest way to serialise object to xml string

Background: I have been tasked with serialising C# objects into an XML string. The XML string is then passed to a web service and written to disk as an XML file. Serialisation must complete within the 5-minute window that the process gets. The consuming web service only accepts XML as a string.
I have been researching various ways of creating the XML string (XmlSerializer, XmlWriter, XDocument, StringBuilder, object to JSON to XML, LINQ to XML), but I wanted to know if anyone had experience of doing something similar. Main aim is to have a high performant xml string that is not so verbose and error prone like creating xml in string.
My object is called Employee and has 18 string/date properties. The objects are created in memory, and we get around 4000k objects in total once the process boots up. The process runs for 1 hour a day, loads data from a data file and creates person objects. A number of functions are performed on the objects. Once the objects are ready, they need to be serialised, and the XML data is sent to the web service and written to an XML file. In short, these objects need to be serialised, saved to disk and sent to the web service.
Can anyone recommend a high-performance yet easy-to-maintain approach? Apologies for not posting any code: I can create a class and add XmlSerializer code etc., but I don't think it will add any value at the moment, as I am currently looking for past experiences, and I want to make sure I don't go on a wild goose chase and that I implement the right solution.
I have tried the following serialiser code, but it takes 10+ minutes to serialise all 4000k objects.
public static bool Serialize<T>(T value, ref string serializeXml)
{
if (value == null)
{
return false;
}
try
{
XmlSerializer xmlserializer = new XmlSerializer(typeof(T));
StringWriter stringWriter = new StringWriter();
XmlWriter writer = XmlWriter.Create(stringWriter);
xmlserializer.Serialize(writer, value);
serializeXml = stringWriter.ToString();
writer.Close();
return true;
}
catch (Exception ex)
{
return false;
}
}
I have also tried caching the serialiser, but that doesn't give any performance improvement.

According to your requirements, speed is the most important factor, so we need to write a benchmark. As mentioned in the comments, besides XmlSerializer we can use DataContractSerializer for this purpose. There are several Q&As on the differences between the two, e.g.:
DataContractSerializer vs XmlSerializer: Pros and Cons of each serializer
Linq to Xml VS XmlSerializer VS DataContractSerializer
Difference between DataContractSerializer vs XmlSerializer
Other options are to write the XML manually, using either StringBuilder or XmlWriter. Although in the requirement you mentioned:
Main aim is to have a high performant xml string that is not so verbose and error prone like creating xml in string
these manual approaches are included for comparison. Of course, in the case of StringBuilder, the text must be escaped; here I used System.Security.SecurityElement.Escape. The object to be serialized looks like:
//Simple POCO with 11 String properties, 7 DateTime properties
[DataContractAttribute()]
public class Employee
{
[DataMember()]
public string FirstName { set; get; }
[DataMember()]
public string LastName { set; get; }
//...omitted for clarity
[DataMember()]
public DateTime Date03 { set; get; }
[DataMember()]
public DateTime Date04 { set; get; }
}
and all the properties have values (not null), assigned before the serializer is called. The serializer code looks like:
//Serialize using XmlSerializer
public static bool Serialize<T>(T value, ref StringBuilder sb)
{
if (value == null)
return false;
try
{
XmlSerializer xmlserializer = new XmlSerializer(typeof(T));
using (XmlWriter writer = XmlWriter.Create(sb))
{
xmlserializer.Serialize(writer, value);
writer.Close();
}
return true;
}
catch (Exception ex)
{
Console.WriteLine(ex);
return false;
}
}
//Serialize using DataContractSerializer
public static bool SerializeDataContract<T>(T value, ref StringBuilder sb)
{
if (value == null)
return false;
try
{
DataContractSerializer xmlserializer = new DataContractSerializer(typeof(T));
using (XmlWriter writer = XmlWriter.Create(sb))
{
xmlserializer.WriteObject(writer, value);
writer.Close();
}
return true;
}
catch (Exception ex)
{
Console.WriteLine(ex);
return false;
}
}
//Serialize using StringBuilder
public static bool SerializeStringBuilder(Employee obj, ref StringBuilder sb)
{
if (obj == null)
return false;
sb.Append(@"<?xml version=""1.0"" encoding=""utf-16""?>");
sb.Append("<Employee>");
sb.Append("<FirstName>");
sb.Append(SecurityElement.Escape(obj.FirstName));
sb.Append("</FirstName>");
//... Omitted for clarity
sb.Append("</Employee>");
return true;
}
//Serialize using XmlWriter (manually add elements)
public static bool SerializeManual(Employee obj, ref StringBuilder sb)
{
if (obj == null)
return false;
try
{
using (var xtw = XmlWriter.Create(sb))
{
xtw.WriteStartDocument();
xtw.WriteStartElement("Employee");
xtw.WriteStartElement("FirstName");
xtw.WriteString(obj.FirstName);
xtw.WriteEndElement();
//...Omitted for clarity
xtw.WriteEndElement();
xtw.WriteEndDocument();
xtw.Close();
}
return true;
}
catch(Exception ex)
{
Console.WriteLine(ex);
return false;
}
}
In the benchmark, 4M Employee objects are passed as arguments and the XML is written to a preallocated StringBuilder (parameter ref StringBuilder sb). For DataContractSerializer and the manual XmlWriter, a benchmark using Parallel.Invoke (3 parallel tasks) was also performed. Required processing time for each serializer:
//Simple POCO with 11 String properties, 7 DateTime properties
XmlSerializer          = 00:02:37.8151125 = 157 sec: 100% (reference)
DataContractSerializer = 00:01:10.3384361 =  70 sec:  45% (3-Parallel: 47 sec = 30%)
StringBuilder          = 00:01:22.5742122 =  82 sec:  52%
Manual XmlWriter       = 00:00:57.8436860 =  58 sec:  37% (3-Parallel: 40 sec = 25%)
Environment: .NET Framework 4.5.2, Intel(R) Core(TM) i5-3337U @ 1.80GHz 1.80GHz, Windows 10, 6.0GB memory. I expected StringBuilder to be the fastest, but it wasn't. Perhaps the bottleneck is in System.Security.SecurityElement.Escape().
The conclusion: DataContractSerializer meets the requirement, with a processing time of 30-45% of that of XmlSerializer. The results may differ depending on the environment, so you should run your own benchmark.
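For anyone wanting to repeat the measurement, a timing harness along these lines may be a reasonable starting point. It is only a sketch and not the code that produced the numbers above: the Employee here is cut down to two properties, and the names BenchmarkSketch, WriteDataContract, WriteManual and Measure are made up for illustration. Unlike the listings above, it creates the DataContractSerializer once and reuses it.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Cut-down stand-in for the 18-property Employee above (assumption).
[DataContract]
public class Employee
{
    [DataMember] public string FirstName { get; set; }
    [DataMember] public DateTime Date01 { get; set; }
}

public static class BenchmarkSketch
{
    // Created once and reused for every object.
    private static readonly DataContractSerializer Contract =
        new DataContractSerializer(typeof(Employee));

    public static void WriteDataContract(Employee e, StringBuilder sb)
    {
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            Contract.WriteObject(writer, e);
        }
    }

    public static void WriteManual(Employee e, StringBuilder sb)
    {
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            writer.WriteStartElement("Employee");
            writer.WriteElementString("FirstName", e.FirstName);
            writer.WriteElementString("Date01",
                XmlConvert.ToString(e.Date01, XmlDateTimeSerializationMode.RoundtripKind));
            writer.WriteEndElement();
        }
    }

    // Serializes every item into one reused buffer and returns the elapsed time.
    public static TimeSpan Measure(Employee[] items, Action<Employee, StringBuilder> serialize)
    {
        var sb = new StringBuilder(1024);
        Stopwatch watch = Stopwatch.StartNew();
        foreach (Employee item in items)
        {
            sb.Clear();
            serialize(item, sb);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call such as BenchmarkSketch.Measure(items, BenchmarkSketch.WriteManual) returns the elapsed time for one approach. Running each approach once before timing it keeps one-off start-up costs out of the comparison.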

Related

How to make an XML tag mandatory with a self-closing tag, using the serialiser?

I'm working on a C# program and I'm trying to serialise XML.
I have the following tag:
using System.Xml.Serialization;
...
[XmlElement("MV")]
public MultiVerse MultiVerse { get; set; }
When I don't fill in this value, the tag <MV> is not present, but I would like to get a tag <MV/> in that case:
Currently I have <HM><ID>Some_ID</ID></HM>.
I'd like to have <HM><ID>Some_ID</ID><MV/></HM>.
I already tried preceding the line with [Required], but that didn't work, and I think that setting the IsNullable attribute is the right approach.
Edit1, after some investigation on the internet
There is quite a lot of advice on the internet about modifying the XmlWriter, but in my project the whole serialisation is done as follows:
public override string ToString()
{
...
using (var stream = new StringWriter())
using (var writer = XmlWriter.Create(stream, settings))
{
var serializer = new XmlSerializer(base.GetType());
serializer.Serialize(writer, this, ns);
return stream.ToString();
}
...
}
As you can see, this is so general that I prefer not to do any modifications in here, hence I'm looking for a way to customise the [XmlElement] directive.
Edit2: XmlWriter settings:
The XmlWriter settings look as follows:
// Remove Declaration
var settings = new XmlWriterSettings
{
Indent = false,
OmitXmlDeclaration = true,
NewLineHandling = NewLineHandling.None,
NewLineOnAttributes = false,
};
Does anybody have an idea?
Thanks in advance
There is the IsNullable property, see https://learn.microsoft.com/en-us/dotnet/api/system.xml.serialization.xmlelementattribute.isnullable?view=net-6.0 so e.g.
[XmlElement("MV", IsNullable=true)]
public MultiVerse MultiVerse { get; set; }
would give you, for a null value, a serialization as <MV xsi:nil="true" /> (or possibly <MV xsi:nil="true"></MV>: the standard writers give you no control over whether the short tag notation is used, but in my experience .NET usually uses it for empty elements, so you may be lucky and find that the format you want is the default one .NET outputs).
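To see what IsNullable produces without touching a larger project, a small stand-alone sketch like the following could be used. The HM and MultiVerse classes are guesses reconstructed from the XML fragments in the question, and NilDemo is a hypothetical helper. The exact namespace declarations in the output depend on the serializer settings and on any XmlSerializerNamespaces passed in.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Empty placeholder; the real MultiVerse type is not shown in the question.
public class MultiVerse { }

[XmlRoot("HM")]
public class HM
{
    [XmlElement("ID")]
    public string ID { get; set; }

    // With IsNullable = true, a null value is written as an element
    // carrying xsi:nil="true" instead of being omitted.
    [XmlElement("MV", IsNullable = true)]
    public MultiVerse MultiVerse { get; set; }
}

public static class NilDemo
{
    public static string Serialize(HM value)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var stream = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                new XmlSerializer(typeof(HM)).Serialize(writer, value);
            }
            // Read the text only after the writer has been disposed (flushed).
            return stream.ToString();
        }
    }
}
```

Calling NilDemo.Serialize(new HM { ID = "Some_ID" }) yields an MV element marked as nil rather than a bare <MV/>, so this on its own does not remove the nil attribute.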
This is the way I'm currently solving my issue (don't laugh):
public override string ToString()
{
string temp = base.ToString();
temp = temp.Replace(" p2:nil=\"true\"", "");
temp = temp.Replace(" xmlns:p2=\"http://www.w3.org/2001/XMLSchema-instance\"", "");
temp = temp.Replace("MV />", "MV/>");
return temp;
}
This is hideous! Does anybody have a better solution?
Thanks

How to handle XmlWriter.Create in a subclass

I need to change the behaviour of XmlWriter for my project to change the way that empty xml elements are serialised. Currently, my code uses XmlWriter and XmlSerializer like so:
public string Serialize(object o)
{
XmlWriterSettings settings = new XmlWriterSettings();
...
StringWriter stringWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings);
XmlSerializer serializer = new XmlSerializer(o.GetType());
serializer.Serialize(xmlWriter, o);
return stringWriter.ToString();
}
When serializing my XML, empty elements are serialized as <emptyElement/>, but I need them to be serialized as <emptyElement></emptyElement>. The best solution I've found for this was posted in a Microsoft forum years ago: https://social.msdn.microsoft.com/Forums/en-US/979315cf-6727-4979-a554-316218ab8b24/xml-serialize-empty-elements?forum=xmlandnetfx
The faster and safer way of doing this is by writing your own subclass of the XmlWriter and give it to XmlSerializer.
YourXmlWriter would aggregate standard one and would translate all WriteEndElement() calls to WriteFullEndElement() calls.
I've tried writing my own subclass of XmlWriter, overriding the two methods I need to override:
public abstract class CustomXmlWriter : XmlWriter
{
public override void WriteEndElement()
{
WriteFullEndElement();
}
public override Task WriteEndElementAsync()
{
return WriteFullEndElementAsync();
}
}
In theory, I believe this should work. However, when trying to use the code, I'm coming up against a brick wall around XmlWriter.Create. I cannot cast the resulting XmlWriter to my CustomXmlWriter for obvious reasons, and I can't override the method as it's a static method.
How am I meant to deal with the static Create method? The only other way I can think of doing this is to scrap the idea of my own CustomXmlWriter, and to simply manipulate the string at the end of my method, but this feels very wrong. I don't know if what I'm trying to achieve is possible, or if there is a simple setting somewhere that I cannot seem to find anywhere.
Try the following Regex as a temporary fix. The variable input can be the entire XML string, and every occurrence will be replaced:
static void Main(string[] args)
{
string input = "<emptyElement/>";
string patternNullTag = @"\<(?'tagname'\w+)/\>";
string output = Regex.Replace(input, patternNullTag, ReplaceNullElement);
}
static string ReplaceNullElement(Match match)
{
string tagname = match.Value.Replace("<", "").Replace("/>", "");
string newElement = "<" + tagname + ">" + "</" + tagname + ">";
return newElement;
}
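As an alternative to post-processing the string, the forum suggestion quoted in the question can be sketched by deriving from the concrete XmlTextWriter class, which has public constructors, instead of going through XmlWriter.Create. The names FullEndTagXmlWriter, Sample and FullEndTagDemo are hypothetical. One trade-off to be aware of: XmlTextWriter does not accept XmlWriterSettings, so settings such as indentation have to be set through its own properties.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Writes <x></x> instead of <x /> by routing every end-element call
// through WriteFullEndElement.
public class FullEndTagXmlWriter : XmlTextWriter
{
    public FullEndTagXmlWriter(TextWriter output) : base(output) { }

    public override void WriteEndElement()
    {
        base.WriteFullEndElement();
    }
}

// Hypothetical type used only for the demonstration.
public class Sample
{
    public string emptyElement { get; set; } = "";
}

public static class FullEndTagDemo
{
    public static string Serialize(object o)
    {
        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = new FullEndTagXmlWriter(stringWriter))
            {
                new XmlSerializer(o.GetType()).Serialize(xmlWriter, o);
            }
            // Read the text only after the writer has been disposed (flushed).
            return stringWriter.ToString();
        }
    }
}
```

With this writer, FullEndTagDemo.Serialize(new Sample()) should contain <emptyElement></emptyElement> rather than the self-closing form.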

xml serialisation best practices

I have been using the traditional way of serializing content with the following code
private void SaveToXml(IdentifiableEntity IE)
{
try
{
XmlSerializer serializer = new XmlSerializer(IE.GetType());
TextWriter textWriter = new StreamWriter(IE.FilePath);
serializer.Serialize(textWriter, IE);
textWriter.Close();
}
catch (Exception e )
{
Console.WriteLine("erreur : "+ e);
}
}
private T LoadFromXml<T>(string path)
{
XmlSerializer deserializer = new XmlSerializer(typeof(T));
TextReader textReader = new StreamReader(path);
T entity = (T)deserializer.Deserialize(textReader);
textReader.Close();
return entity;
}
Though this approach does the trick, I find it a bit annoying that all my properties have to be public, that I sometimes need to tag the properties with [XmlAttribute], [XmlElement] or [XmlIgnore], and that it doesn't deal with dictionaries.
My question is: is there a better way of serializing objects in C#, one that is less hassle, more modern and easier to use?
First of all, I would suggest using "using" blocks in your code (see the sample code below).
If I understand correctly, you are looking for a fast way to build the model classes that you will use for your serialize/deserialize operations.
Every XML file is different, and I don't know of any generic way to serialize/deserialize them. At some point you have to know whether there will be attributes or elements, whether any element can be null, etc.
Assuming that you already have a sample XML file with a few lines that gives you a general view of what it will look like, I would suggest using xsd (a miracle tool):
xsd yourXMLFileName.xml
xsd yourXMLFileName.xsd /classes
This tool generates the model classes for whatever XML file you want to work with.
Then you can serialize and deserialize easily.
To deserialize (assuming that you get a class named XXXX representing the root node of your XML):
XmlSerializer ser = new XmlSerializer(typeof(XXXX));
XXXX yourVariable;
using (XmlReader reader = XmlReader.Create(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
{
yourVariable= (XXXX) ser.Deserialize(reader);
}
To serialize
var serializer = new XmlSerializer(typeof(XXXX));
using(var writer = new StreamWriter(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
{
serializer.Serialize(writer, yourVariable);
}
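On the two complaints in the question (members having to be public, and dictionaries not being supported), DataContractSerializer may be worth a look: it serializes private members marked with [DataMember] and handles Dictionary<TKey, TValue>. The following is a minimal sketch with a made-up Inventory model and a hypothetical ContractDemo helper, not a drop-in replacement for the SaveToXml/LoadFromXml methods above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical model, used only for illustration.
[DataContract]
public class Inventory
{
    // Private members can be serialized when marked with [DataMember].
    [DataMember] private string owner;

    // Dictionaries are supported by DataContractSerializer.
    [DataMember] public Dictionary<string, int> Stock { get; set; } = new Dictionary<string, int>();

    public Inventory(string owner) { this.owner = owner; }

    public string Owner => owner;
}

public static class ContractDemo
{
    public static string Save(Inventory value)
    {
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            new DataContractSerializer(typeof(Inventory)).WriteObject(writer, value);
        }
        return sb.ToString();
    }

    public static Inventory Load(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return (Inventory)new DataContractSerializer(typeof(Inventory)).ReadObject(reader);
        }
    }
}
```

The price is less control over the XML shape: DataContractSerializer writes members as elements and does not emit XML attributes, so it fits best when the format is yours to define.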

Why is XmlSerializer's Deserialize() spitting out a child object which is a XmlNode[]?

I'm using XmlSerializer to serialize and then deserialize a simple object. When I deserialize the object to my surprise I find a child object was not properly deserialized but instead turned into XmlNode[].
Here is very nearly the structure I've got:
// This line I put in here as a way of sneaking into the XML the
// root node's C# namespace, since it's not the same as the
// deserializing code and the deserializing code seemed unable to
// deserialize properly without knowing the Type (see my code below).
// So I basically just use this fake construct to get the namespace
// and make a Type of it to feed the XmlSerializer() instantiation.
[XmlRoot(Namespace = "http://foo.com/CSharpNamespace/Foo.Bar")]
// This is because QueuedFile can be given to the Argument array.
[XmlInclude(typeof(QueuedFile))]
// This class is Foo.Bar.CommandAndArguments
public class CommandAndArguments {
public String Command;
public object[] Arguments;
}
// I don't think this matters to XmlSerialize, but just in case...
[Serializable()]
// I added this line just thinking maybe it would help, but it doesn't
// do anything. I tried it without the XmlType first, and that
// didn't work.
[XmlType("Foo.Baz.Bat.QueuedFile")]
// This class is Foo.Baz.Bat.QueuedFile (in a different c#
// namespace than CommandAndArguments and the deserializing code)
public class QueuedFile {
public String FileName;
public String DirectoryName;
}
And the code which deserializes it looks like:
public static object DeserializeXml(String objectToDeserialize)
{
String rootNodeName = "";
String rootNodeNamespace = "";
using (XmlReader xmlReader = XmlReader.Create(new StringReader(objectToDeserialize)))
{
if (xmlReader.MoveToContent() == XmlNodeType.Element)
{
rootNodeName = xmlReader.Name;
rootNodeNamespace = xmlReader.NamespaceURI;
if (rootNodeNamespace.StartsWith("http://foo.com/CSharpNamespace/"))
{
rootNodeName = rootNodeNamespace.Substring("http://foo.com/CSharpNamespace/".Length) + "." +
rootNodeName;
}
}
}
//MessageBox.Show(rootNodeName);
try
{
Type t = DetermineTypeFromName(rootNodeName);
if (t == null)
{
throw new Exception("Could not determine type of serialized string. Type listed as: "+rootNodeName);
}
var s = new XmlSerializer(t);
return s.Deserialize(new StringReader(objectToDeserialize));
// object o = new object();
// MethodInfo castMethod = o.GetType().GetMethod("Cast").MakeGenericMethod(t);
// return castMethod.Invoke(null, new object[] { s.Deserialize(new StringReader(objectToDeserialize)) });
}
catch (InvalidOperationException)
{
return null;
}
}
And here is the XML when the CommandAndArguments is serialized:
<?xml version="1.0" encoding="utf-16"?>
<CommandAndArguments xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://foo.com/CSharpNamespace/Foo.Bar">
<Command>I am a command</Command>
<Arguments>
<anyType xsi:type="Foo.Baz.Bat.QueuedFile">
<FileName xmlns="">HelloWorld.txt</FileName>
<DirectoryName xmlns="">C:\foo\bar</DirectoryName>
</anyType>
</Arguments>
</CommandAndArguments>
But when I deserialize I am given a CommandAndArguments object where Arguments is XmlNode[] with the first item being the attribute giving the QueuedFile as the type and the other indices being elements of the properties. But why wasn't the QueuedFile object recreated?
I suspect this might somehow have to do with C# namespaces and the engine doing the deserializing not being able to find or work with QueuedFile... But I don't see why, since when I forgot the XmlInclude() it made sure to tell me it didn't expect QueuedFile, and now that I've added the XmlInclude() I get no error, just an incomplete deserialization.
Help? I've read everything I can find to read and Googled everything I know to Google and am stuck. I certainly have a lot to learn about XML serialization but I'm not sure how I'm failing at something which should be pretty simple (I actually did something almost exactly like this before without any problem, the only difference then was that everything was in the same C# namespace).
Are you able to change the XML format or is it fixed? I don't know what the problem you are having is, but I use the DataContractSerializer classes extensively with no problems.
http://msdn.microsoft.com/en-us/library/system.runtime.serialization.datacontractserializer.aspx
public static void WriteObject(string fileName)
{
Console.WriteLine(
"Creating a Person object and serializing it.");
Person p1 = new Person("Zighetti", "Barbara", 101);
FileStream writer = new FileStream(fileName, FileMode.Create);
DataContractSerializer ser =
new DataContractSerializer(typeof(Person));
ser.WriteObject(writer, p1);
writer.Close();
}
public static void ReadObject(string fileName)
{
Console.WriteLine("Deserializing an instance of the object.");
FileStream fs = new FileStream(fileName,
FileMode.Open);
XmlDictionaryReader reader =
XmlDictionaryReader.CreateTextReader(fs, new XmlDictionaryReaderQuotas());
DataContractSerializer ser = new DataContractSerializer(typeof(Person));
// Deserialize the data and read it from the instance.
Person deserializedPerson =
(Person)ser.ReadObject(reader, true);
reader.Close();
fs.Close();
Console.WriteLine(String.Format("{0} {1}, ID: {2}",
deserializedPerson.FirstName, deserializedPerson.LastName,
deserializedPerson.ID));
}
To anyone coming along with a similar problem, depending on your situation you're probably better off with NetDataContractSerializer. It is an alternative to DataContractSerializer which records the .Net types in the XML making deserialization a breeze, since it knows exactly what types are involved and thus you do not need to tell it what type the root object is with the deserialize command. And it can produce output in XML or binary form (I prefer XML for easier debugging).
Here is some sample code for easily serializing and deserializing an object to and from a string:
private static object Deserialize(string xml)
{
object toReturn = null;
using (Stream stream = new MemoryStream())
{
byte[] data = System.Text.Encoding.UTF8.GetBytes(xml);
stream.Write(data, 0, data.Length);
stream.Position = 0;
var netDataContractSerializer = new NetDataContractSerializer();
toReturn = netDataContractSerializer.Deserialize(stream);
}
return toReturn;
}
private static string Serialize(object obj)
{
using (var memoryStream = new MemoryStream())
using (var reader = new StreamReader(memoryStream))
{
var netDataContractSerializer = new NetDataContractSerializer();
netDataContractSerializer.Serialize(memoryStream, obj);
memoryStream.Position = 0;
return reader.ReadToEnd();
}
}
Easy as pie!

Does static XML Serializer in C# cause memory over grow?

I just can't find a simple answer to this simple question I have from Dr Google. I have the following serializing function which I put in a static module. It is called many times by my application to serialize lots of XML files. Will this cause memory to over grow? (Ignore the text write part of the code)
public static void SerializeToXML<T>(String inFilename,T t)
{
XmlSerializer serializer = new XmlSerializer(t.GetType());
string FullName = inFilename;
TextWriter textWriter = new StreamWriter(FullName);
serializer.Serialize(textWriter, t);
textWriter.Close();
textWriter.Dispose();
}
Will this cause memory to over grow?
No, memory will not grow excessively. static only lets you call the SerializeToXML method without creating an instance of the class, nothing else.
So if you're calling this method many times, you even reduce memory usage with a static method, since no instance of the containing class has to be allocated.
Although you wrote to ignore the text-writing part, you should use a using statement for unmanaged resources:
public static void SerializeToXML<T>(String inFilename,T t)
{
XmlSerializer serializer = new XmlSerializer(t.GetType());
string FullName = inFilename;
using (TextWriter textWriter = new StreamWriter(FullName))
{
serializer.Serialize(textWriter, t);
}
}
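A separate point from static: according to the XmlSerializer documentation, the XmlSerializer(Type) and XmlSerializer(Type, String) constructors reuse the serialization assemblies they generate, while the other constructor overloads generate a new assembly on every call, which can make memory grow. The code above uses the Type constructor, so it should not be affected. If construction cost matters, or another overload is needed, the serializer instance can be cached. This is a sketch with hypothetical names (XmlFileWriter, GetSerializer):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Serialization;

public static class XmlFileWriter
{
    // One serializer per type, created on first use and then reused.
    private static readonly ConcurrentDictionary<Type, XmlSerializer> Cache =
        new ConcurrentDictionary<Type, XmlSerializer>();

    public static XmlSerializer GetSerializer(Type type)
    {
        return Cache.GetOrAdd(type, t => new XmlSerializer(t));
    }

    public static void SerializeToXML<T>(string fileName, T value)
    {
        XmlSerializer serializer = GetSerializer(value.GetType());
        using (TextWriter textWriter = new StreamWriter(fileName))
        {
            serializer.Serialize(textWriter, value);
        }
    }
}
```

XmlSerializer is documented as thread-safe, so sharing one instance per type across calls should be fine.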
