Remove Namespace attribute from Xml Node in Serialized Value

I'm having to recreate a vendor's XML file. I don't have access to their code, schema, or anything, so I'm doing this using the XmlSerializer and attributes. I'm doing it this way because the system is using a generic XmlWriter I've built to write other system XML files, so I'm killing two birds with one stone. Everything has been working out great, with exception of one property value. The vendor XML looks like this:
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</span>
</p>
</TextOutlTxt>
Here's my property configuration:
private string _value;
[XmlElement("TextOutlTxt")]
public XmlNode Value
{
get
{
string text = _value;
text = Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";
XmlDocument document = new XmlDocument();
document.InnerXml = "<root>" + value + "</root>";
XmlNode innerNode = document.DocumentElement.FirstChild;
innerNode.InnerText = text;
return innerNode;
}
set
{ }
}
And this gives me:
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;</p>
</TextOutlTxt>
So I'm close, but no cigar. There is an unwanted xmlns="..." attribute; it must not be present. In my XmlWriter, I have done the following to remove the namespace unless found atop the object it is serializing:
protected override void OnWrite<T>(T sourceData, Stream outputStream)
{
IKnownTypesLocator knownTypesLocator = KnownTypesLocator.Instance;
//Let's see if we can get the default namespace
XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();
XmlSerializer serializer = null;
if (xmlRootAttribute != null)
{
string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
nameSpaces.Add(string.Empty, nameSpace);
serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypesLocator.XmlItems.ToArray(), xmlRootAttribute, nameSpace);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(outputStream))
{
serializer.Serialize(writer, sourceData, nameSpaces);
}
}
else
{
serializer = new XmlSerializer(typeof(T), knownTypesLocator.XmlItems.ToArray());
//Now we can serialize
using (StreamWriter writer = new StreamWriter(outputStream))
{
serializer.Serialize(writer, sourceData);
}
}
}
I'm sure I'm overlooking something. Any help would be greatly appreciated!
UPDATE 9/26/2017
So... I've been asked to provide more detail, specifically an explanation of the purpose of my code, and a reproducible example. So here's both:
The purpose for the XML. I am writing an interface UI between two systems. I read data from one, give users options to massage the data, and then give them the ability to export the data into files the second system can import. It's regarding a bill of material system where system one is the CAD drawings and the objects in those drawings, and system two is an enterprise estimating system that is also being configured to support electronic bills of material. I was given the XMLs from the vendor to recreate.
Fully functional example code.... I've tried generalizing the code in a reproducible form.
[XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
public class OutlineText
{
private string _value;
[XmlElement("TextOutlTxt")]
public XmlNode Value
{
get
{
string text = _value;
text = Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";
XmlDocument document = new XmlDocument();
document.InnerXml = "<root>" + value + "</root>";
XmlNode innerNode = document.DocumentElement.FirstChild;
innerNode.InnerText = text;
return innerNode;
}
set
{ }
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
}
public class XmlFileWriter
{
public void Write<T>(T sourceData, FileInfo targetFile) where T : class
{
//This is actually retrieved through a locator object, but surely no one will mind an empty
//collection for the sake of an example
Type[] knownTypes = new Type[] { };
using (FileStream targetStream = targetFile.OpenWrite())
{
//Let's see if we can get the default namespace
XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();
XmlSerializer serializer = null;
if (xmlRootAttribute != null)
{
string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
nameSpaces.Add(string.Empty, nameSpace);
serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypes, xmlRootAttribute, nameSpace);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(targetStream))
{
serializer.Serialize(writer, sourceData, nameSpaces);
}
}
else
{
serializer = new XmlSerializer(typeof(T), knownTypes);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(targetStream))
{
serializer.Serialize(writer, sourceData);
}
}
}
}
}
public static void Main()
{
OutlineText outlineText = new OutlineText(@"SUBSTA SF6 CIRCUIT BKR CONC FDN ""C""");
XmlFileWriter fileWriter = new XmlFileWriter();
fileWriter.Write<OutlineText>(outlineText, new FileInfo(@"C:\MyDirectory\MyXml.xml"));
Console.ReadLine();
}
The result produced:
<?xml version="1.0" encoding="utf-8"?>
<OutlTxt xmlns="http://www.mynamespace/09262017">
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;</p>
</TextOutlTxt>
</OutlTxt>
Edit 9/27/2017
Per the request in the solution below, a secondary issue I've run into is keeping the hexadecimal codes. To illustrate this issue based on the above example, let's say the value between the <span> tags is
SUBSTA SF6 CIRCUIT BKR CONC FDN "C"
The vendor file is expecting the literals to be in their hex code format like so
SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;
I've rearranged the sample code Value property to be like so:
private string _value;
[XmlAnyElement("TextOutlTxt", Namespace = "http://www.mynamespace/09262017")]
public XElement Value
{
get
{
string value = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", "http://www.mynamespace/09262017", _value);
string innerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", "http://www.mynamespace/09262017", value);
XElement element = XElement.Parse(innerXml);
//Remove redundant xmlns attributes
foreach (XElement descendant in element.DescendantsAndSelf())
{
descendant.Attributes().Where(att => att.IsNamespaceDeclaration && att.Value == "http://www.mynamespace/09262017").Remove();
}
return element;
}
set
{
_value = value == null ? null : value.ToString();
}
}
If I use the code
string text = Regex.Replace(element.Value, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
to create the hex code values ahead of the XElement.Parse(), the XElement converts them back to their literal values. If I try to set the XElement.Value directly after the XElement.Parse() (or through SetValue()), it changes the & to &amp;. Not only that, but it seems to mess with the element output and adds additional elements, throwing it all out of whack.
Edit 9/27/2017 #2: to clarify, the original implementation had a related problem, namely that the escaped text was re-escaped. I.e. I was getting
SUBSTA SF6 CIRCUIT BKR CONC FDN &amp;#x22;C&amp;#x22;
But wanted
SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;

The reason you are getting xmlns="" added to your embedded XML is that your container elements <OutlTxt> and <TextOutlTxt> are declared to be in the "http://www.mynamespace/09262017" namespace through the XmlRootAttribute.Namespace property, whereas the embedded literal XML elements are in the empty namespace. To fix this, your embedded XML literal must be in the same namespace as its parent elements.
Here is the XML literal. Notice there is no xmlns="..." declaration anywhere in the XML:
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>ReplaceMe</span>
</p>
Lacking such a declaration, the <p> element is in the empty namespace. Conversely, your OutlineText type is decorated with an [XmlRoot] attribute:
[XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
public class OutlineText
{
}
Thus the corresponding OutlTxt root element will be in the http://www.mynamespace/09262017 namespace. All its child elements will default to this namespace as well unless overridden. Placing the embedded XmlNode in the empty namespace counts as overriding the parent namespace, and so an xmlns="" attribute is required.
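The effect described above can be reproduced without XmlSerializer at all. The following is only a minimal sketch using LINQ to XML; the NamespaceDemo class and its Build method are hypothetical names made up for illustration. It builds the same element nesting twice, once with the <p> child in the empty namespace and once with it in the parent's namespace, so the two outputs can be compared for the xmlns="" attribute.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the question's code.
public static class NamespaceDemo
{
    // Builds <OutlTxt><TextOutlTxt><p>text</p></TextOutlTxt></OutlTxt> where the
    // containers are in the document namespace and <p> is in the given namespace.
    public static string Build(XNamespace childNamespace)
    {
        XNamespace ns = "http://www.mynamespace/09262017";
        var root = new XElement(ns + "OutlTxt",
            new XElement(ns + "TextOutlTxt",
                new XElement(childNamespace + "p", "text")));
        return root.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        XNamespace ns = "http://www.mynamespace/09262017";
        // Child in the empty namespace: the writer has to undeclare the default namespace.
        Console.WriteLine(Build(XNamespace.None));
        // Child in the parent's namespace: no extra declaration is needed.
        Console.WriteLine(Build(ns));
    }
}
```

Comparing the two lines printed should show the xmlns="" attribute only in the first case, which is the same situation the serializer runs into with the embedded XmlNode.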
The simplest way to avoid this problem is for your embedded XML string literal to place itself in the correct namespace as follows:
<p xmlns="http://www.mynamespace/09262017" style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>ReplaceMe</span>
</p>
Then, in your Value method, strip redundant namespace declarations. This is somewhat easier to do with the LINQ to XML API:
[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText
{
public const string Namespace = "http://www.mynamespace/09262017";
private string _value;
[XmlAnyElement("TextOutlTxt", Namespace = OutlineText.Namespace)]
public XElement Value
{
get
{
var escapedValue = EscapeTextValue(_value);
var nestedXml = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", Namespace, escapedValue);
var outerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", Namespace, nestedXml);
var element = XElement.Parse(outerXml);
//Remove redundant xmlns attributes
element.DescendantsAndSelf().SelectMany(e => e.Attributes()).Where(a => a.IsNamespaceDeclaration && a.Value == Namespace).Remove();
return element;
}
set
{
_value = value == null ? null : value.Value;
}
}
static string EscapeTextValue(string text)
{
return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
}
And the resulting XML will look like:
<OutlTxt xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.mynamespace/09262017">
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>SUBSTA SF6 CIRCUIT BKR CONC FDN "C"</span>
</p>
</TextOutlTxt>
</OutlTxt>
Note that I have changed the attribute on Value from [XmlElement] to [XmlAnyElement]. I did this because it appears your value XML might contain multiple mixed content nodes at the root level, e.g.:
Start Text <p>Middle Text</p> End Text
Using [XmlAnyElement] enables this by allowing a container node to be returned without causing an extra level of XML element nesting.
Sample working .Net fiddle.

Your question now has two requirements:
Suppress certain xmlns="..." attributes on an embedded XElement or XmlNode while serializing, AND
Force certain characters inside element text to be escaped (e.g. " => &#x22;). Even though this is not required by the XML standard, your legacy receiving system apparently needs this.
Issue #1 can be addressed as in this answer
For issue #2, however, there is no way to force certain characters to be unnecessarily escaped using XmlNode or XElement because escaping is handled at the level of XmlWriter during output. And Microsoft's built-in implementations of XmlWriter seem not to have any settings that can force certain characters that do not need to be escaped to nevertheless be escaped. You would need to try to subclass XmlWriter or XmlTextWriter (as described e.g. here and here) then intercept string values as they are written and escape quote characters as desired.
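For reference, the subclassing route might look roughly like the sketch below. This is an illustration only, not a tested production writer: QuoteEscapingXmlWriter and QuoteEscapingDemo are hypothetical names, and the sketch assumes that only the double quote needs forcing. It overrides WriteString so that ordinary text still goes through the base writer's normal escaping, while each double quote is emitted with WriteRaw as a character reference.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical writer: emits every double quote in string content as &#x22;.
public class QuoteEscapingXmlWriter : XmlTextWriter
{
    public QuoteEscapingXmlWriter(TextWriter output) : base(output) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != '"') continue;
            // Let the base class escape the run of text before the quote as usual.
            if (i > start) base.WriteString(text.Substring(start, i - start));
            // Write the quote itself as a raw character reference.
            base.WriteRaw("&#x22;");
            start = i + 1;
        }
        if (start < text.Length) base.WriteString(text.Substring(start));
    }
}

public static class QuoteEscapingDemo
{
    // Writes <span>text</span> through the custom writer and returns the markup.
    public static string Write(string text)
    {
        var output = new StringWriter();
        using (var writer = new QuoteEscapingXmlWriter(output))
        {
            new XElement("span", text).WriteTo(writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Write("SUBSTA SF6 CIRCUIT BKR CONC FDN \"C\""));
    }
}
```

Note that attribute values are also written through WriteString, so a writer like this would change quotes inside attributes as well. Whether that is acceptable depends on the receiving system.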
Thus, as an alternate approach that solves both #1 and #2, you could implement IXmlSerializable and write your desired XML directly with XmlWriter.WriteRaw():
[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText : IXmlSerializable
{
public const string Namespace = "http://www.mynamespace/09262017";
private string _value;
// For debugging purposes.
internal string InnerValue { get { return _value; } }
static string EscapeTextValue(string text)
{
return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
#region IXmlSerializable Members
XmlSchema IXmlSerializable.GetSchema()
{
return null;
}
void IXmlSerializable.ReadXml(XmlReader reader)
{
_value = ((XElement)XNode.ReadFrom(reader)).Value;
}
void IXmlSerializable.WriteXml(XmlWriter writer)
{
var escapedValue = EscapeTextValue(_value);
var nestedXml = string.Format("<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{0}</span></p>", escapedValue);
writer.WriteRaw(nestedXml);
}
#endregion
}
And the output will be
<OutlTxt xmlns="http://www.mynamespace/09262017"><p style="text-align:left;margin-top:0pt;margin-bottom:0pt;"><span>SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</span></p></OutlTxt>
Note that, if you use WriteRaw(), you can easily generate invalid XML simply by writing markup characters embedded in text values. You should be sure to add unit tests that verify that does not occur, e.g. that new OutlineText(@"<") does not cause problems. (A quick check seems to show your Regex is escaping < and > appropriately.)
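A check along those lines could be sketched as follows. EscapeCheck and RoundTrips are hypothetical names; the regular expression is the one from EscapeTextValue above. The idea is to wrap the escaped text in an element, parse it back, and confirm that the parsed value equals the original text, which would fail if any markup character slipped through unescaped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical test helper built around the answer's escaping expression.
public static class EscapeCheck
{
    public static string EscapeTextValue(string text)
    {
        return Regex.Replace(text, @"[\a\b\f\n\r\t\v\\""'&<>]",
            m => string.Join(string.Empty,
                m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
    }

    // True when the escaped text, wrapped in an element, parses back to the original text.
    public static bool RoundTrips(string text)
    {
        var xml = "<span>" + EscapeTextValue(text) + "</span>";
        return XElement.Parse(xml).Value == text;
    }

    static void Main()
    {
        foreach (var sample in new[] { "<", "a & b", "</span><evil/>", "FDN \"C\"" })
        {
            Console.WriteLine("{0} -> {1}", sample, RoundTrips(sample));
        }
    }
}
```

One caveat: the expression also matches control characters such as \a and \b, and character references to those code points are not legal in XML 1.0, so a conforming parser would reject text containing them even though they are "escaped".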
New sample .Net fiddle.

Related

Preserve whitespace-only element content when deserializing XML using XmlSerializer

I have a class InputConfig which contains a List<IncludeExcludeRule>:
public class InputConfig
{
// The rest of the class omitted
private List<IncludeExcludeRule> includeExcludeRules;
public List<IncludeExcludeRule> IncludeExcludeRules
{
get { return includeExcludeRules; }
set { includeExcludeRules = value; }
}
}
public class IncludeExcludeRule
{
// Other members omitted
private int idx;
private string function;
public int Idx
{
get { return idx; }
set { idx = value; }
}
public string Function
{
get { return function; }
set { function = value; }
}
}
Using ...
FileStream fs = new FileStream(path, FileMode.Create);
XmlSerializer xmlSerializer = new XmlSerializer(typeof(InputConfig));
xmlSerializer.Serialize(fs, this);
fs.Close();
... and ...
StreamReader sr = new StreamReader(path);
XmlSerializer reader = new XmlSerializer(typeof(InputConfig));
InputConfig inputConfig = (InputConfig)reader.Deserialize(sr);
It works like a champ! Easy stuff, except that I need to preserve whitespace in the member function when deserializing. The generated XML file demonstrates that the whitespace was preserved when serializing, but it is lost on deserializing.
<IncludeExcludeRules>
<IncludeExcludeRule>
<Idx>17</Idx>
<Name>LIEN</Name>
<Operation>E =</Operation>
<Function> </Function>
</IncludeExcludeRule>
</IncludeExcludeRules>
The MSDN documentation for XmlAttributeAttribute seems to address this very issue under the header Remarks, yet I don't understand how to put it to use. It provides this example:
// Set this to 'default' or 'preserve'.
[XmlAttribute("space",
Namespace = "http://www.w3.org/XML/1998/namespace")]
public string Space
Huh? Set what to 'default' or 'preserve'? I'm sure I'm close, but this just isn't making sense. I have to think there's just a single line XmlAttribute to insert in the class before the member to preserve whitespace on deserialize.
There are many instances of similar questions here and elsewhere, but they all seem to involve the use of XmlReader and XmlDocument, or mucking about with individual nodes and such. I'd like to avoid that depth.

To preserve all whitespace during XML deserialization, simply create and use an XmlReader:
StreamReader sr = new StreamReader(path);
XmlReader xr = XmlReader.Create(sr);
XmlSerializer reader = new XmlSerializer(typeof(InputConfig));
InputConfig inputConfig = (InputConfig)reader.Deserialize(xr);
Unlike XmlSerializer.Deserialize(XmlReader), XmlSerializer.Deserialize(TextReader) preserves only significant whitespace marked by the xml:space="preserve" attribute.
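To see the difference side by side, here is a small sketch that deserializes the same whitespace-only element through both overloads. The Rule class and WhitespaceDemo helper are hypothetical stand-ins for the question's IncludeExcludeRule, reduced to the one property that matters.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, reduced version of the question's rule class.
public class Rule
{
    public string Function { get; set; }
}

public static class WhitespaceDemo
{
    // The Function element contains a single space and nothing else.
    const string Xml = "<Rule><Function> </Function></Rule>";

    // Deserialize(TextReader): the serializer builds its own reader internally.
    public static Rule ViaTextReader()
    {
        var serializer = new XmlSerializer(typeof(Rule));
        using (var sr = new StringReader(Xml))
        {
            return (Rule)serializer.Deserialize(sr);
        }
    }

    // Deserialize(XmlReader): the caller supplies a reader created with default settings.
    public static Rule ViaXmlReader()
    {
        var serializer = new XmlSerializer(typeof(Rule));
        using (var sr = new StringReader(Xml))
        using (var xr = XmlReader.Create(sr))
        {
            return (Rule)serializer.Deserialize(xr);
        }
    }

    static void Main()
    {
        // Brackets make a lone space visible in the console output.
        Console.WriteLine("TextReader: [{0}]", ViaTextReader().Function);
        Console.WriteLine("XmlReader:  [{0}]", ViaXmlReader().Function);
    }
}
```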

The cryptic documentation means that you need to add an additional field or property decorated with [XmlAttribute("space", Namespace = "http://www.w3.org/XML/1998/namespace")] whose value is default or preserve. XmlAttribute controls the name of the generated attribute for a field or property. The attribute's value is the field's value.
For example, this class:
public class Group
{
[XmlAttribute (Namespace = "http://www.cpandl.com")]
public string GroupName;
[XmlAttribute(DataType = "base64Binary")]
public Byte [] GroupNumber;
[XmlAttribute(DataType = "date", AttributeName = "CreationDate")]
public DateTime Today;
[XmlAttribute("space", Namespace = "http://www.w3.org/XML/1998/namespace")]
public string Space ="preserve";
}
Will be serialized to:
<?xml version="1.0" encoding="utf-16"?>
<Group xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
d1p1:GroupName=".NET"
GroupNumber="ZDI="
CreationDate="2001-01-10"
xml:space="preserve"
xmlns:d1p1="http://www.cpandl.com" />

I believe the part you are missing is to add the xml:space="preserve" to the field, e.g.:
<Function xml:space="preserve"> </Function>
For more details, here is the relevant section in the XML Specification
With annotation in the class definition, according to the MSDN blog it should be:
[XmlAttribute("space=preserve")]
but I remember it being
[XmlAttribute("xml:space=preserve")]

Michael Liu's answer above worked for me, but with one caveat. I would have commented on his answer, but my "reputation" is not adequate enough.
I found that using XmlReader did not fully fix the issue, and the reason for this is that the .net property in question had the attribute:
XmlText(DataType="normalizedString")
To rectify this I found that adding the additional attribute worked:
[XmlAttribute("xml:space=preserve")]
Obviously, if you have no control over the .net class then you have a problem.

Unable to remove empty xmlns attribute from XElement using c#

Before posting this question I tried all the other solutions on Stack Overflow, but with no success.
I am unable to remove the empty xmlns attribute from an XElement using C#. I have tried the following code:
XElement.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
And another one, which was posted here:
foreach (var attr in objXMl.Descendants().Attributes())
{
var elem = attr.Parent;
attr.Remove();
elem.Add(new XAttribute(attr.Name.LocalName, attr.Value));
}

Imagine this is your XML file:
<Root xmlns="http://my.namespace">
<Firstelement xmlns="">
<RestOfTheDocument />
</Firstelement>
</Root>
And this is what you expect:
<Root xmlns="http://my.namespace">
<Firstelement>
<RestOfTheDocument />
</Firstelement>
</Root>
I think the code below is what you want. You need to put each element into the right namespace, and remove any xmlns='' attributes for the affected elements. The latter part is required as otherwise LINQ to XML basically tries to leave you with an element of
<!-- This would be invalid -->
<Firstelement xmlns="" xmlns="http://my.namespace">
Here's the code:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
XDocument doc = XDocument.Load("test.xml");
foreach (var node in doc.Root.Descendants())
{
// If we have an empty namespace...
if (node.Name.NamespaceName == "")
{
// Remove the xmlns='' attribute. Note the use of
// Attributes rather than Attribute, in case the
// attribute doesn't exist (which it might not if we'd
// created the document "manually" instead of loading
// it from a file.)
node.Attributes("xmlns").Remove();
// Inherit the parent namespace instead
node.Name = node.Parent.Name.Namespace + node.Name.LocalName;
}
}
Console.WriteLine(doc); // Or doc.Save(...)
}
}

If you add the namespace of the parent element to the element then the empty namespace tag disappears, as it isn't required because the element is in the same namespace.

Here's a simpler way to do this. I believe the problem happens when you create separate XML segments and then join them to your document.
xDoc.Root.SaveDocument(savePath);
private static void SaveDocument(this XElement doc, string filePath)
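{
// Assumption: ns was not defined in the original snippet; here it is taken from the root element.
XNamespace ns = doc.Name.Namespace;
foreach (var node in doc.Descendants())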
{
if (node.Name.NamespaceName == "")
{
node.Name = ns + node.Name.LocalName;
}
}
using (var xw = XmlWriter.Create(filePath, new XmlWriterSettings
{
//OmitXmlDeclaration = true,
//Indent = true,
NamespaceHandling = NamespaceHandling.OmitDuplicates
}))
{
doc.Save(xw);
}
}

Did you try getting the attribute with XElement.Attribute and checking its value, to see whether the element actually has an "xmlns" attribute before removing it?
Xelement.Attribute("xmlns").Value

How should text nodes with CDATA and whitespace be interpreted in XML?

The System.Xml parsing features had a few surprises for me in store, and I wonder how the following should be interpreted, or if this is "up to the implementation":
Version 1:
<root><elem>
<![CDATA[MyValue]]>
</elem></root>
Version 2:
<root><elem>
-<![CDATA[MyValue]]>-
</elem></root>
What should be the value of elem? Or is it okay that this depends on the implementation that parses it, and should I just deal with that?
I expected (at first) that in both cases all whitespace between the start/end node and the first non-whitespace character would be ignored. This is not the case, but failing that, I would've at least expected it to never be ignored, but this is also not the case. See full repro below for my expectations.
To elaborate...
Two cases had me stumped when I tested them:
XDocument.Parse will suddenly start to include the \n\t whitespace in example 2, whereas it ignored it in example 1.
XDocument.Load with new XmlReaderSettings {IgnoreWhitespace = true} will behave similarly.
What gives? Is this just the implementation being (to my taste) quirky, and/or is this specified behavior?
Here's a full repro of my expectations (fresh C# class library project with latest NUnit package from NuGet):
[TestFixture]
public class XmlTests
{
public static XDocument ParseDocument(string input)
{
return XDocument.Parse(input);
}
public static XDocument LoadDocument(Stream stream)
{
var xmlReader = XmlReader.Create(stream, new XmlReaderSettings() { IgnoreWhitespace = false }); // Default
return XDocument.Load(xmlReader);
}
public static XDocument LoadDocument_IgnoreWhitespace(Stream stream)
{
var xmlReader = XmlReader.Create(stream, new XmlReaderSettings() { IgnoreWhitespace = true });
return XDocument.Load(xmlReader);
}
const string example1 = "<root><elem>\n\t<![CDATA[MyValue]]>\n</elem></root>";
const string example2 = "<root><elem>\n\t-<![CDATA[MyValue]]>-\n</elem></root>";
[Test]
public void A_Parsing_Example1_WorksAsExpected()
{
var doc = ParseDocument(example1);
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
[Test]
public void B_Loading_Example1_WorksAsExpected()
{
var doc = LoadDocument(new MemoryStream(Encoding.UTF8.GetBytes(example1)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("\n\tMyValue\n"));
}
[Test]
public void C_LoadingWithIgnoreWhitespace_Example1_WorksAsExpected()
{
var doc = LoadDocument_IgnoreWhitespace(new MemoryStream(Encoding.UTF8.GetBytes(example1)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
[Test]
public void D_Parsing_Example2_WorksAsExpected()
{
var doc = ParseDocument(example2);
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("-MyValue-"));
}
[Test]
public void E_Loading_Example2_WorksAsExpected()
{
var doc = LoadDocument(new MemoryStream(Encoding.UTF8.GetBytes(example2)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("\n\t-MyValue-\n"));
}
[Test]
public void F_LoadingWithIgnoreWhitespace_Example2_WorksAsExpected()
{
var doc = LoadDocument_IgnoreWhitespace(new MemoryStream(Encoding.UTF8.GetBytes(example2)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
}

CDATAs are difficult. They are not changed by the parser (read). They are not allowed to include invalid characters or ]]>. However some implementations will change them to generate valid XML output (write).
The content of elem depends on the parser and if it ignores the whitespace nodes. elem has 3 child nodes.
whitespace text node with content "\n\t"
cdata section node with content "MyValue"
whitespace text node with content "\n"
So, as you noticed, if the whitespace nodes are ignored, only the CDATA remains. In your second example the result would be different (if repaired).
text node with content "\n\t-"
cdata section node with content "MyValue"
text node with content "-\n"
The first and third nodes now have non-whitespace content (the -). They are no longer whitespace nodes, so they are not ignored, regardless of the option.
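The node breakdown above can be checked directly with an XmlReader. The sketch below is only an illustration, and NodeLister is a hypothetical name: it lists the node types the reader reports for the children of elem in both examples from the question, using default reader settings (so whitespace is not ignored).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for inspecting what the parser reports inside <elem>.
public static class NodeLister
{
    // Returns the node types found at depth 2, i.e. the children of <root><elem>.
    public static List<XmlNodeType> ChildNodeTypes(string xml)
    {
        var types = new List<XmlNodeType>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // root is at depth 0, elem at depth 1, elem's children at depth 2.
                if (reader.Depth == 2) types.Add(reader.NodeType);
            }
        }
        return types;
    }

    static void Main()
    {
        const string example1 = "<root><elem>\n\t<![CDATA[MyValue]]>\n</elem></root>";
        const string example2 = "<root><elem>\n\t-<![CDATA[MyValue]]>-\n</elem></root>";
        Console.WriteLine(string.Join(", ", ChildNodeTypes(example1)));
        Console.WriteLine(string.Join(", ", ChildNodeTypes(example2)));
    }
}
```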

Using an XmlTextAttribute on an array of mixed types removes whitespace strings

I am writing a set of objects that must serialize to and from Xml, following a strict specification that I cannot change. One element in this specification can contain a mix of strings and elements in-line.
A simple example of this Xml output would be this:
<root>Leading text <tag>tag1</tag> <tag>tag2</tag></root>
Note the whitespace characters between the closing of the first tag, and the start of the second tag. Here are the objects that represents this structure:
[XmlRoot("root")]
public class Root
{
[XmlText(typeof(string))]
[XmlElement("tag", typeof(Tag))]
public List<object> Elements { get; set; }
//this is simply for the sake of example.
//gives us four objects in the elements array
public static Root Create()
{
Root root = new Root();
root.Elements.Add("Leading text ");
root.Elements.Add(new Tag() { Text = "tag1" });
root.Elements.Add(" ");
root.Elements.Add(new Tag() { Text = "tag2" });
return root;
}
public Root()
{
Elements = new List<object>();
}
}
public class Tag
{
[XmlText]
public string Text {get;set;}
}
Calling Root.Create(), and saving to a file using this method looks perfect:
public XDocument SerializeToXml(Root obj)
{
XmlSerializer serializer = new XmlSerializer(typeof(Root));
XDocument doc = new XDocument();
using (var writer = doc.CreateWriter())
{
serializer.Serialize(writer, obj);
}
return doc;
}
Serialization looks exactly like the xml structure at the beginning of this post.
Now when I want to deserialize an XML file back into a Root object, I call this:
public static Root FromFile(string file)
{
XmlSerializer serializer = new XmlSerializer(typeof(Root));
XmlReaderSettings settings = new XmlReaderSettings();
XmlReader reader = XmlTextReader.Create(file, settings);
//whitespace gone here
Root root = serializer.Deserialize(reader) as Root;
return root;
}
The problem is here. The whitespace string is eliminated. When I call Root.Create(), there are four objects in the Elements array. One of them is a space. This serializes just fine, but when deserializing, there are only 3 objects in Elements. The whitespace string gets eliminated.
Any ideas on what I'm doing wrong? I've tried using xml:space="preserve", as well as a host of XmlReader, XmlTextReader, etc. variations. Note that when I use a StringBuilder to read the XmlTextReader, the xml contains the spaces as I'd expect. Only when calling Deserialize(stream) do I lose the spaces.
Here's a link to an entire working example. It's LinqPad friendly, just copy/paste: http://pastebin.com/8MkUQviB The example opens two files, one a perfect serialized xml file, the second being a deserialized then reserialized version of the first file. Note you'll have to reference System.Xml.Serialization.
Thanks for reading this novel. I hope someone has some ideas. Thank you!

It looks like a bug. A workaround seems to be to replace all whitespace and CR/LF characters in XML text nodes with character entities such as &#x20;. Semantically equal entities do not work.
<root>Leading text <tag>tag1</tag>&#x20;<tag>tag2</tag></root>
is working for me.

C# newbie: reading repetitive XML to memory

I'm new to C#. I'm building an application that persists an XML file with a list of elements. The structure of my XML file is as follows:
<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
</Elements>
I have < 100 of those items, and it's a single list (so I'm considering a DB solution to be overkill, even SQLite). When my application loads, I want to read this list of elements to memory. At present, after browsing the web a bit, I'm using XmlTextReader.
However, and maybe I'm using it in the wrong way, I read the data tag-by-tag, and thus expect the tags to be in a certain order (otherwise the code will be messy). What I would like to do is read complete "Element" structures and extract tags from them by name. I'm sure it's possible, but how?
To clarify, the main difference is that the way I'm using XmlTextReader today, it's not tolerant to scenarios such as wrong order of tags (e.g. Type comes before Name in a certain Element).
What's the best practice for loading such structures to memory in C#?

It's really easy to do in LINQ to XML. Are you using .NET 3.5? Here's a sample:
using System;
using System.Xml.Linq;
using System.Linq;
class Test
{
[STAThread]
static void Main()
{
XDocument document = XDocument.Load("test.xml");
var items = document.Root
.Elements("Element")
.Select(element => new {
Name = (string)element.Element("Name"),
Type = (string)element.Element("Type"),
Color = (string)element.Element("Color")})
.ToList();
foreach (var x in items)
{
Console.WriteLine(x);
}
}
}
You probably want to create your own data structure to hold each element, but you just need to change the "Select" call to use that.
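As a rough illustration of that last point, the Select call can project into a named class instead of an anonymous type. ElementInfo and ElementLoader are hypothetical names chosen for this sketch, and the XML is parsed from a string so the example is self-contained. Because each child is looked up by name, the order of Name, Type and Color inside an Element does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical data structure for one <Element> entry.
public class ElementInfo
{
    public string Name { get; set; }
    public string Type { get; set; }
    public string Color { get; set; }
}

public static class ElementLoader
{
    public static List<ElementInfo> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("Element")
            .Select(e => new ElementInfo
            {
                // The string cast yields null instead of throwing when a tag is missing.
                Name = (string)e.Element("Name"),
                Type = (string)e.Element("Type"),
                Color = (string)e.Element("Color")
            })
            .ToList();
    }

    static void Main()
    {
        // The second element lists Type before Name on purpose.
        const string xml =
            "<Elements>" +
            "<Element><Name>A</Name><Type>T1</Type><Color>Red</Color></Element>" +
            "<Element><Type>T2</Type><Name>B</Name><Color>Blue</Color></Element>" +
            "</Elements>";
        foreach (var item in Parse(xml))
        {
            Console.WriteLine("{0} / {1} / {2}", item.Name, item.Type, item.Color);
        }
    }
}
```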

Any particular reason you're not using XmlDocument?
XmlDocument myDoc = new XmlDocument();
myDoc.Load(fileName);
foreach(XmlElement elem in myDoc.SelectNodes("Elements/Element"))
{
XmlNode nodeName = elem.SelectSingleNode("Name/text()");
XmlNode nodeType = elem.SelectSingleNode("Type/text()");
XmlNode nodeColor = elem.SelectSingleNode("Color/text()");
string name = nodeName!=null ? nodeName.Value : String.Empty;
string type = nodeType!=null ? nodeType.Value : String.Empty;
string color = nodeColor!=null ? nodeColor.Value : String.Empty;
// Here you use the values for something...
}

It sounds like XDocument, and XElement might be better suited for this task. They might not have the absolute speed of XmlTextReader, but for your cases they sound like they would be appropriate and it would make dealing with fixed structures a lot easier. Parsing out elements would work like so:
XDocument xml = XDocument.Load("test.xml"); // load the file first; the path is illustrative
foreach (XElement el in xml.Element("Elements").Elements("Element")) {
var name = el.Element("Name").Value;
// etc.
}
You can even get a bit fancier with Linq:
XDocument xml = XDocument.Load("test.xml"); // load the file first; the path is illustrative
var collection = from el in xml.Element("Elements").Elements("Element")
select new { Name = el.Element("Name").Value,
Color = el.Element("Color").Value,
Type = el.Element("Type").Value
};
foreach (var item in collection) {
// here you can use item.Color, item.Name, etc..
}

You could use XmlSerializer class (http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx)
public class Element
{
public string Name { get; set; }
public string Type { get; set; }
public string Color { get; set; }
}
class Program
{
static void Main(string[] args)
{
string xml =
@"<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>(...)</Elements>";
// Map the <Elements> root onto an array of Element objects.
XmlSerializer serializer = new XmlSerializer(typeof(Element[]), new XmlRootAttribute("Elements"));
Element[] result = (Element[])serializer.Deserialize(new StringReader(xml));
}
}

You should check out Linq2Xml, http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx
