LINQ to XML X-DOM internal implementation - c#

How is the LINQ to XML X-DOM from the System.Xml.Linq namespace internally implemented? (XNode, XElement, etc.)
Is it utilizing standard high-performing one-way XmlReader/XmlWriter from the other XML namespaces or something else?
The reason I'm asking is that I'm trying to figure out in which circumstances it could or should be used, as performance is always a concern.

Using Reflector (or, when that's no longer free, ILSpy :); no I'm not an employee - just spreading the word surreptitiously!) it appears all the load/save methods channel through to XmlReader and XmlWriter.
For example - XElement's implementation of Load(Stream, LoadOptions) does this:
    public static XElement Load(Stream stream, LoadOptions options)
    {
        XmlReaderSettings xmlReaderSettings = XNode.GetXmlReaderSettings(options);
        using (XmlReader reader = XmlReader.Create(stream, xmlReaderSettings))
        {
            return Load(reader, options);
        }
    }
And it's a similar story for all the other static methods - including Parse.
Then there is the XStreamingElement class, which I can't find much real usage of. Despite the name it is not a loading optimisation: it represents an element whose content is evaluated lazily when it is written out, so a large result can be streamed to an XmlWriter without first building the whole tree in memory.
Equally, the Save and WriteTo methods ultimately use an XmlWriter instance - e.g:
    public void Save(string fileName, SaveOptions options)
    {
        XmlWriterSettings xmlWriterSettings = XNode.GetXmlWriterSettings(options);
        using (XmlWriter writer = XmlWriter.Create(fileName, xmlWriterSettings))
        {
            this.Save(writer);
        }
    }
So at least from a performance point of view they started with the right types :)
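Since Load builds the whole tree in memory, a common middle ground for very large documents is to drive an XmlReader yourself and materialise only one element at a time as an XElement. The following is a minimal sketch of that pattern, not something taken from the framework source; the class name StreamingDemo, the method StreamElements and its parameters are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Hypothetical helper: lazily yields every element with the given local
    // name, so only one matching subtree is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string localName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement can be queried with LINQ to XML as usual; the trade-off is that you can no longer navigate to parents or siblings outside the current subtree.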

Related

XML Numeric Character References in Deserialization

I need to deserialize an XML response from an external service containing more than 100,000 rows, but I have a problem unescaping numeric character references in various places. Since the DOM is complex, I need a global solution that applies to the whole document, not to specific elements. I have the following situation:
<text>First &#38; Second</text>
I use the following XmlSerializer implementation:
    public T DeserializeXmlReader<T>(string path) where T : class
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (FileStream fileStream = new FileStream(path, FileMode.Open))
        {
            using (XmlReader xmlReader = XmlReader.Create(fileStream))
            {
                return (T)serializer.Deserialize(xmlReader);
            }
        }
    }
After deserialization, I get "First &#38 Second" instead of "First & Second". Is there an additional step I need to take to get the "&" deserialized correctly?
Note: After doing some research, I believe the problem might be similar to this one, but I'm not sure if it's applicable since this concerns php: How to deserialize a xml string along with NCR unescaping?

C# XML Serialization remove unwanted information [duplicate]

Given this generic serialization code:
    public virtual string Serialize(System.Text.Encoding encoding)
    {
        System.IO.StreamReader streamReader = null;
        System.IO.MemoryStream memoryStream = null;
        memoryStream = new System.IO.MemoryStream();
        System.Xml.XmlWriterSettings xmlWriterSettings = new System.Xml.XmlWriterSettings();
        xmlWriterSettings.Encoding = encoding;
        System.Xml.XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
        Serializer.Serialize(xmlWriter, this);
        memoryStream.Seek(0, System.IO.SeekOrigin.Begin);
        streamReader = new System.IO.StreamReader(memoryStream);
        return streamReader.ReadToEnd();
    }
and this object (gen'd from xsd2code):
    [System.CodeDom.Compiler.GeneratedCodeAttribute("System.Xml", "4.0.30319.225")]
    [System.SerializableAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true, Namespace = "Com.Foo.Request")]
    [System.Xml.Serialization.XmlRootAttribute(Namespace = "Com.Foo.Request", IsNullable = false)]
    public partial class REQUEST_GROUP
    {
        [EditorBrowsable(EditorBrowsableState.Never)]
        private List<REQUESTING_PARTY> rEQUESTING_PARTYField;
        [EditorBrowsable(EditorBrowsableState.Never)]
        private RECEIVING_PARTY rECEIVING_PARTYField;
        [EditorBrowsable(EditorBrowsableState.Never)]
        private SUBMITTING_PARTY sUBMITTING_PARTYField;
        [EditorBrowsable(EditorBrowsableState.Never)]
        private REQUEST rEQUESTField;
        [EditorBrowsable(EditorBrowsableState.Never)]
        private string iDField;

        public REQUEST_GROUP()
        {
            this.rEQUESTField = new REQUEST();
            this.sUBMITTING_PARTYField = new SUBMITTING_PARTY();
            this.rECEIVING_PARTYField = new RECEIVING_PARTY();
            this.rEQUESTING_PARTYField = new List<REQUESTING_PARTY>();
            this.IDField = "2.1";
        }
    }
Output from the Serialize with an encode of utf-8:
    <?xml version="1.0" encoding="utf-8"?>
    <REQUEST_GROUP xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ID="2.1" xmlns="Com.Foo.Request">
      <RECEIVING_PARTY />
      <SUBMITTING_PARTY />
      <REQUEST LoginAccountIdentifier="xxx" LoginAccountPassword="yyy" _RecordIdentifier="" _JobIdentifier="">
        <REQUESTDATA>
          <PROPERTY_INFORMATION_REQUEST _SpecialInstructionsDescription="" _ActionType="Submit">
            <_DATA_PRODUCT _ShortSubjectReport="Y" />
            <_PROPERTY_CRITERIA _City="Sunshine City" _StreetAddress2="" _StreetAddress="123 Main Street" _State="CA" _PostalCode="12345">
              <PARSED_STREET_ADDRESS />
            </_PROPERTY_CRITERIA>
            <_SEARCH_CRITERIA />
            <_RESPONSE_CRITERIA />
          </PROPERTY_INFORMATION_REQUEST>
        </REQUESTDATA>
      </REQUEST>
    </REQUEST_GROUP>
EDIT
Question 1: How do I decorate the class, or manipulate the serializer, to get rid of all the namespaces in the REQUEST_GROUP node during processing, NOT by post-processing with XSLT or regex?
Question 2: Bonus point if you could add the doc type too.
Thank you.
You can remove the namespaces like this:
    XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
    ns.Add(string.Empty, string.Empty);
    ns.Add(string.Empty, "Com.Foo.Request");
    Serializer.Serialize(xmlWriter, this, ns);
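To make the effect concrete, here is a small self-contained sketch. The RequestGroup class is a hypothetical, cut-down stand-in for the generated REQUEST_GROUP type, and NamespaceDemo is just an illustrative wrapper. Passing your own XmlSerializerNamespaces replaces the default xsi/xsd declarations, so only the default namespace is declared on the root.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated REQUEST_GROUP class.
[XmlRoot("REQUEST_GROUP", Namespace = "Com.Foo.Request")]
public class RequestGroup
{
    [XmlAttribute("ID")]
    public string Id { get; set; } = "2.1";
}

public static class NamespaceDemo
{
    public static string Serialize(RequestGroup value)
    {
        // Supplying any namespace table suppresses the default
        // xmlns:xsi / xmlns:xsd declarations on the root element.
        var ns = new XmlSerializerNamespaces();
        ns.Add(string.Empty, "Com.Foo.Request");

        var serializer = new XmlSerializer(typeof(RequestGroup));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value, ns);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Serialize(new RequestGroup()));
    }
}
```

The root element still carries xmlns="Com.Foo.Request", because that namespace is part of the element's name rather than an alias.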
As for adding the doctype, I know it's possible to make a custom XmlWriter and just override WriteStartDocument with a method that makes a call to WriteDocType, but I kind of hope someone else knows an easier way than that.
EDIT: Incidentally, I strongly recommend wrapping the writer in a using block:
    using (System.Xml.XmlWriter xmlWriter = XmlWriter.Create(etc.))
    {
        // use it here.
    }
It automatically handles tidying up of the streams by calling the Dispose method when the block ends.
If you just want to remove the namespace aliases, then as already shown you can use XmlSerializerNamespaces to force XmlSerializer to use the namespace explicitly (i.e. xmlns="blah") on each element, rather than declaring an alias and using the alias instead.
However, regardless of what you do with the aliases, the fundamental name of that element is REQUEST_GROUP in the Com.Foo.Request namespace. You can't remove the namespace completely without that representing a breaking change to the underlying data - i.e. somebody somewhere is going to get an exception (due to getting data it didn't expect - specifically REQUEST_GROUP in the root namespace). In C# terms, it is the difference between System.String and My.Custom.String - sure, they are both called String, but that is just their local name.
If you want to remove all traces of the namespace, then a pragmatic option would be to edit away the Namespace=... entries from [XmlRoot(...)] and [XmlType(...)] (plus anywhere else that isn't shown in the example).
If the types are outside of your control, you can also do this at runtime using XmlAttributeOverrides - but a caveat: if you create an XmlSerializer using XmlAttributeOverrides you must cache and re-use it - otherwise your AppDomain will leak (it creates assemblies on the fly per serializer in this mode, and assemblies cannot be unloaded).
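The caching caveat can be handled with a small lookup keyed by type. This is only a sketch under assumptions: SerializerCache, GetWithoutNamespace and the Sample class are hypothetical names, and the override shown only clears the namespace on the type passed in, so nested types would need their own entries in the XmlAttributeOverrides.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type standing in for a class you cannot edit.
[XmlRoot(Namespace = "Com.Foo.Request")]
[XmlType(Namespace = "Com.Foo.Request")]
public class Sample
{
    public string Name { get; set; } = "demo";
}

public static class SerializerCache
{
    // One serializer per type; serializers built with overrides are not
    // cached by the framework, so we keep them for the process lifetime.
    private static readonly ConcurrentDictionary<Type, XmlSerializer> Cache =
        new ConcurrentDictionary<Type, XmlSerializer>();

    public static XmlSerializer GetWithoutNamespace(Type type)
    {
        // Under contention the factory may run more than once, but only
        // one serializer is stored and handed out afterwards.
        return Cache.GetOrAdd(type, t =>
        {
            var attributes = new XmlAttributes
            {
                XmlRoot = new XmlRootAttribute(t.Name) { Namespace = string.Empty },
                XmlType = new XmlTypeAttribute { Namespace = string.Empty }
            };
            var overrides = new XmlAttributeOverrides();
            overrides.Add(t, attributes);
            return new XmlSerializer(t, overrides);
        });
    }

    public static string Serialize(object value)
    {
        using (var writer = new StringWriter())
        {
            GetWithoutNamespace(value.GetType()).Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Calling SerializerCache.Serialize(new Sample()) repeatedly reuses the same XmlSerializer instance instead of generating a new dynamic assembly each time.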

xml serialisation best practices

I have been using the traditional way of serializing content with the following code
    private void SaveToXml(IdentifiableEntity IE)
    {
        try
        {
            XmlSerializer serializer = new XmlSerializer(IE.GetType());
            TextWriter textWriter = new StreamWriter(IE.FilePath);
            serializer.Serialize(textWriter, IE);
            textWriter.Close();
        }
        catch (Exception e)
        {
            Console.WriteLine("erreur : " + e);
        }
    }

    private T LoadFromXml<T>(string path)
    {
        XmlSerializer deserializer = new XmlSerializer(typeof(T));
        TextReader textReader = new StreamReader(path);
        T entity = (T)deserializer.Deserialize(textReader);
        textReader.Close();
        return entity;
    }
Though this approach does the trick, I find it a bit annoying that all my properties have to be public, that I sometimes need to tag the properties with [XmlAttribute], [XmlElement] or [XmlIgnore], and that it doesn't deal with dictionaries.
My question is: is there a better way of serializing objects in C#, one that is less hassle, more modern and easier to use?
First of all, I would suggest using "using" blocks in your code.
If I understand correctly, you are looking for a fast way to build the model classes that you will use when serializing and deserializing.
Every XML file is different, and I don't know of any generic way to serialize or deserialize them. At some point you have to know whether there will be attributes or elements, whether an element can be null, and so on.
Assuming that you already have a sample XML file with a few lines that gives a general view of what the data will look like, I would suggest using xsd.exe (a miracle tool):
    xsd yourXMLFileName.xml
    xsd yourXMLFileName.xsd /classes
The first command infers a schema from the sample file; the second generates the model classes for that schema, and you can rerun both whenever the XML changes.
Then you can serialize and deserialize easily.
To deserialize (assuming that you get a class named XXXX representing the root node of your XML):
    XmlSerializer ser = new XmlSerializer(typeof(XXXX));
    XXXX yourVariable;
    using (XmlReader reader = XmlReader.Create(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
    {
        yourVariable = (XXXX)ser.Deserialize(reader);
    }
To serialize:
    var serializer = new XmlSerializer(typeof(XXXX));
    using (var writer = new StreamWriter(@"C:\yyyyyy\yyyyyy\YourXmlFile.xml"))
    {
        serializer.Serialize(writer, yourVariable);
    }

Does static XML Serializer in C# cause memory over grow?

I just can't find a simple answer to this simple question from Dr Google. I have the following serializing function, which I put in a static class. It is called many times by my application to serialize lots of XML files. Will this cause memory use to keep growing? (Ignore the text-writing part of the code.)
    public static void SerializeToXML<T>(String inFilename, T t)
    {
        XmlSerializer serializer = new XmlSerializer(t.GetType());
        string FullName = inFilename;
        TextWriter textWriter = new StreamWriter(FullName);
        serializer.Serialize(textWriter, t);
        textWriter.Close();
        textWriter.Dispose();
    }
Will this cause memory use to keep growing?
No. Making the method static only lets you call SerializeToXML without creating an instance of the containing class; it has no effect on how much memory each call uses.
The XmlSerializer(Type) constructor used here also reuses the serialization assembly it generates for a given type, so calling the method many times does not keep generating new assemblies.
Although you asked to ignore the text-writing part, you should wrap the writer in a using statement so that it is disposed, and the file handle released, even if Serialize throws:
    public static void SerializeToXML<T>(String inFilename, T t)
    {
        XmlSerializer serializer = new XmlSerializer(t.GetType());
        string FullName = inFilename;
        // Dispose (called at the end of the using block) also closes the writer.
        using (TextWriter textWriter = new StreamWriter(FullName))
        {
            serializer.Serialize(textWriter, t);
        }
    }
