How should text nodes with CDATA and whitespace be interpreted in XML? - c#

The System.Xml parsing features had a few surprises for me in store, and I wonder how the following should be interpreted, or if this is "up to the implementation":
Version 1:
<root><elem>
<![CDATA[MyValue]]>
</elem></root>
Version 2:
<root><elem>
-<![CDATA[MyValue]]>-
</elem></root>
What should be the value of elem? Or is it okay that this depends on the implementation that parses it, and should I just deal with that?
I expected (at first) that in both cases all whitespace between the start/end node and the first non-whitespace character would be ignored. This is not the case, but failing that, I would've at least expected it to never be ignored, but this is also not the case. See full repro below for my expectations.
To elaborate...
Two cases had me stumped when I tested them:
XDocument.Parse will suddenly start to include the \n\t whitespace in example 2, whereas it ignored it in example 1.
XDocument.Load with new XmlReaderSettings {IgnoreWhitespace = true} will behave similarly.
What gives? Is this just the implementation being (to my taste) quirky, and/or is this specified behavior?
Here's a full repro of my expectations (fresh C# class library project with latest NUnit package from NuGet):
[TestFixture]
public class XmlTests
{
public static XDocument ParseDocument(string input)
{
return XDocument.Parse(input);
}
public static XDocument LoadDocument(Stream stream)
{
var xmlReader = XmlReader.Create(stream, new XmlReaderSettings() { IgnoreWhitespace = false }); // Default
return XDocument.Load(xmlReader);
}
public static XDocument LoadDocument_IgnoreWhitespace(Stream stream)
{
var xmlReader = XmlReader.Create(stream, new XmlReaderSettings() { IgnoreWhitespace = true });
return XDocument.Load(xmlReader);
}
const string example1 = "<root><elem>\n\t<![CDATA[MyValue]]>\n</elem></root>";
const string example2 = "<root><elem>\n\t-<![CDATA[MyValue]]>-\n</elem></root>";
[Test]
public void A_Parsing_Example1_WorksAsExpected()
{
var doc = ParseDocument(example1);
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
[Test]
public void B_Loading_Example1_WorksAsExpected()
{
var doc = LoadDocument(new MemoryStream(Encoding.UTF8.GetBytes(example1)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("\n\tMyValue\n"));
}
[Test]
public void C_LoadingWithIgnoreWhitespace_Example1_WorksAsExpected()
{
var doc = LoadDocument_IgnoreWhitespace(new MemoryStream(Encoding.UTF8.GetBytes(example1)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
[Test]
public void D_Parsing_Example2_WorksAsExpected()
{
var doc = ParseDocument(example2);
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("-MyValue-"));
}
[Test]
public void E_Loading_Example2_WorksAsExpected()
{
var doc = LoadDocument(new MemoryStream(Encoding.UTF8.GetBytes(example2)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("\n\t-MyValue-\n"));
}
[Test]
public void F_LoadingWithIgnoreWhitespace_Example2_WorksAsExpected()
{
var doc = LoadDocument_IgnoreWhitespace(new MemoryStream(Encoding.UTF8.GetBytes(example2)));
var element = doc.Descendants("elem").Single();
Assert.That(element.Value, Is.EqualTo("MyValue"));
}
}

CDATAs are difficult. They are not changed by the parser (read). They are not allowed to include invalid characters or ]]>. However some implementations will change them to generate valid XML output (write).
The content of elem depends on the parser and if it ignores the whitespace nodes. elem has 3 child nodes.
whitespace text node with content "\n\t"
cdata section node with content "MyValue"
whitespace text node with content "\n"
So like you noticed if the whitespace nodes are ignored, only the cdata remains. In you second example the result would be different (If repaired).
text node with content "\n\t-"
cdata section node with content "MyValue"
text node with content "-\n"
The first and third node now have non whitespace content (the -). They are no whitespace nodes any more and not ignored depending on the option.

Related

how can i get node Element value From LINQ?

i have a collection of service users, i want to iterate over ServiceUsers and extract value from ServiceUser, (ID, USER_NAME, UN_ID, IP, NAME)
<ServiceUsers xmlns="">
<ServiceUser>
<ID>280334</ID>
<USER_NAME>YVELAMGAMOIYENET12:206322102</USER_NAME>
<UN_ID>731937</UN_ID>
<IP>91.151.136.178</IP>
<NAME>?????????????????????: 123456</NAME>
</ServiceUser>
<ServiceUser>
<ID>266070</ID>
<USER_NAME>ACHIBALANCE:206322102</USER_NAME>
<UN_ID>731937</UN_ID>
<IP>185.139.56.37</IP>
<NAME>123456</NAME>
</ServiceUser>
</ServiceUsers>
my Code looks like this, but i am getting null point exception.
XDocument doc = XDocument.Parse(xml)
List<XElement> xElementList = doc.Element("ServiceUsers").Descendants().ToList();
foreach (XElement element in xElementList)
{
string TEST= element.Element("Name").Value;
comboBoxServiceUser.Items.Add(element.Element("Name").Value);
}
I used the example from XmlSerializer.Deserialize Method as the base for the following snippet that reads the provided xml.
var serializer = new XmlSerializer(typeof(ServiceUsers));
ServiceUsers i;
using (TextReader reader = new StringReader(xml))
{
i = (ServiceUsers)serializer.Deserialize(reader);
}
[XmlRoot(ElementName = "ServiceUsers")]
public class ServiceUsers : List<ServiceUser>
{
}
public class ServiceUser
{
[XmlElement(ElementName = "ID")]
public string Id {get; set;}
}
I think the basis of the problem is your trailing 's' to put it short, You iterate ServiceUser not ServiceUsers
Anyway this runs through fine:
[Fact]
public void CheckIteratorTest()
{
var a = Assembly.GetExecutingAssembly();
string[] resourceNames = a.GetManifestResourceNames();
string nameOf = resourceNames.FirstOrDefault(x => x.Contains("SomeXml"));
Assert.NotNull(nameOf);
using var stream = a.GetManifestResourceStream(nameOf);
Assert.NotNull(stream);
var reader = new StreamReader(stream, Encoding.UTF8);
var serialized = reader.ReadToEnd();
var doc = XDocument.Parse(serialized);
var elemList = doc.Root.Elements("ServiceUser").ToList();
Assert.NotEqual(0, elemList.Count);
foreach(var serviceUser in elemList)
{
System.Diagnostics.Debug.WriteLine($"Name : {serviceUser.Name ?? "n.a."}");
}
}
As being said: XML is case-sensitive. Next issue is .Descendants() returns all the descendant nodes, nested ones, etc, 12 nodes in this case. So NullPointerException will happen even if you fix a "typo".
Here is your fixed code:
XDocument doc = XDocument.Parse(xml);
var xElementList = doc
.Element("ServiceUsers") // picking needed root node from document
.Elements("ServiceUser") // picking exact ServiceUser nodes
.Elements("NAME") // picking actual NAME nodes
.ToList();
foreach (XElement element in xElementList)
{
var TEST = element.Value;
Console.WriteLine(TEST); // do what you were doing instead of console
}
Use doc.Element("ServiceUsers").Elements() to get the<ServiceUser> elements. Then you can loop over the child values of those in a nested loop.
var doc = XDocument.Parse(xml);
foreach (XElement serviceUser in doc.Element("ServiceUsers").Elements()) {
foreach (XElement element in serviceUser.Elements()) {
Console.WriteLine($"{element.Name} = {element.Value}");
}
Console.WriteLine("---");
}
Prints:
ID = 280334
USER_NAME = YVELAMGAMOIYENET12:206322102
UN_ID = 731937
IP = 91.151.136.178
NAME = ?????????????????????: 123456
---
ID = 266070
USER_NAME = ACHIBALANCE:206322102
UN_ID = 731937
IP = 185.139.56.37
NAME = 123456
---
Note: Elements() gets the (immediate) child elements where as Descendants() returns all descendants. Using Elements() gives you a better control and allows you to get the properties grouped by user.
You can also get a specific property like this serviceUser.Element("USER_NAME").Value. Note that the tag names are case sensitive!

Remove Namespace attribute from Xml Node in Serialized Value

I'm having to recreate a vendor's XML file. I don't have access to their code, schema, or anything, so I'm doing this using the XmlSerializer and attributes. I'm doing it this way because the system is using a generic XmlWriter I've built to write other system XML files, so I'm killing two birds with one stone. Everything has been working out great, with exception of one property value. The vendor XML looks like this:
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>SUBSTA SF6 CIRCUIT BKR CONC FDN "C"</span>
</p>
</TextOutlTxt>
Here's my property configuration:
private string _value;
[XmlElement("TextOutlTxt")]
public XmlNode Value
{
get
{
string text = _value;
text = Regex.Replace(text, #"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";
XmlDocument document = new XmlDocument();
document.InnerXml = "<root>" + value + "</root>";
XmlNode innerNode = document.DocumentElement.FirstChild;
innerNode.InnerText = text;
return innerNode;
}
set
{ }
}
And this gives me:
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</p>
</TextOutlTxt>
So I'm close, but no cigar. There is an unwanted xmlns="..." attribute; it must not be present. In my XmlWriter, I have done the following to remove the namespace unless found atop the object it is serializing:
protected override void OnWrite<T>(T sourceData, Stream outputStream)
{
IKnownTypesLocator knownTypesLocator = KnownTypesLocator.Instance;
//Let's see if we can get the default namespace
XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();
XmlSerializer serializer = null;
if (xmlRootAttribute != null)
{
string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
nameSpaces.Add(string.Empty, nameSpace);
serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypesLocator.XmlItems.ToArray(), xmlRootAttribute, nameSpace);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(outputStream))
{
serializer.Serialize(writer, sourceData, nameSpaces);
}
}
else
{
serializer = new XmlSerializer(typeof(T), knownTypesLocator.XmlItems.ToArray());
//Now we can serialize
using (StreamWriter writer = new StreamWriter(outputStream))
{
serializer.Serialize(writer, sourceData);
}
}
}
I'm sure I'm overlooking something. Any help would be greatly appreciated!
UPDATE 9/26/2017
So... I've been asked to provide more detail, specifically an explanation of the purpose of my code, and a reproducible example. So here's both:
The purpose for the XML. I am writing an interface UI between two systems. I read data from one, give users options to massage the data, and then give the the ability to export the data into files the second system can import. It's regarding a bill of material system where system one are the CAD drawings and objects in those drawings and system two is an enterprise estimating system that is also being configured to support electronic bills of material. I was given the XMLs from the vendor to recreate.
Fully functional example code.... I've tried generalizing the code in a reproducible form.
[XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
public class OutlineText
{
private string _value;
[XmlElement("TextOutlTxt")]
public XmlNode Value
{
get
{
string text = _value;
text = Regex.Replace(text, #"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
string value = "\n<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\">\n<span>ReplaceMe</span>\n</p>\n";
XmlDocument document = new XmlDocument();
document.InnerXml = "<root>" + value + "</root>";
XmlNode innerNode = document.DocumentElement.FirstChild;
innerNode.InnerText = text;
return innerNode;
}
set
{ }
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
}
public class XmlFileWriter
{
public void Write<T>(T sourceData, FileInfo targetFile) where T : class
{
//This is actually retrieved through a locator object, but surely no one will mind an empty
//collection for the sake of an example
Type[] knownTypes = new Type[] { };
using (FileStream targetStream = targetFile.OpenWrite())
{
//Let's see if we can get the default namespace
XmlRootAttribute xmlRootAttribute = sourceData.GetType().GetCustomAttributes<XmlRootAttribute>().FirstOrDefault();
XmlSerializer serializer = null;
if (xmlRootAttribute != null)
{
string nameSpace = xmlRootAttribute.Namespace ?? string.Empty;
XmlSerializerNamespaces nameSpaces = new XmlSerializerNamespaces();
nameSpaces.Add(string.Empty, nameSpace);
serializer = new XmlSerializer(typeof(T), new XmlAttributeOverrides(), knownTypes, xmlRootAttribute, nameSpace);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(targetStream))
{
serializer.Serialize(writer, sourceData, nameSpaces);
}
}
else
{
serializer = new XmlSerializer(typeof(T), knownTypes);
//Now we can serialize
using (StreamWriter writer = new StreamWriter(targetStream))
{
serializer.Serialize(writer, sourceData);
}
}
}
}
}
public static void Main()
{
OutlineText outlineText = new OutlineText(#"SUBSTA SF6 CIRCUIT BKR CONC FDN ""C""");
XmlFileWriter fileWriter = new XmlFileWriter();
fileWriter.Write<OutlineText>(outlineText, new FileInfo(#"C:\MyDirectory\MyXml.xml"));
Console.ReadLine();
}
The result produced:
<?xml version="1.0" encoding="utf-8"?>
<OutlTxt xmlns="http://www.mynamespace/09262017">
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</p>
</TextOutlTxt>
</OutlTxt>
Edit 9/27/2017
Per the request in the solution below, a secondary issue I've ran into is keeping the hexadecimal codes. To illustrate this issue based on the above example, let's say the value between is
SUBSTA SF6 CIRCUIT BKR CONC FDN "C"
The vendor file is expecting the literals to be in their hex code format like so
SUBSTA SF6 CIRCUIT BKR CONC FDN "C"
I've rearranged the sample code Value property to be like so:
private string _value;
[XmlAnyElement("TextOutlTxt", Namespace = "http://www.mynamespace/09262017")]
public XElement Value
{
get
{
string value = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", "http://www.mynamespace/09262017", _value);
string innerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", "http://www.mynamespace/09262017", value);
XElement element = XElement.Parse(innerXml);
//Remove redundant xmlns attributes
foreach (XElement descendant in element.DescendantsAndSelf())
{
descendant.Attributes().Where(att => att.IsNamespaceDeclaration && att.Value == "http://www.mynamespace/09262017").Remove();
}
return element;
}
set
{
_value = value == null ? null : value.ToString();
}
}
if I use the code
string text = Regex.Replace(element.Value, #"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
to create the hex code values ahead of the XElement.Parse(), the XElement converts them back to their literal values. If I try to set the XElement.Value directly after the XElement.Parse()(or through SetValue()), it changes the " to " Not only that, but it seems to mess with the element output and adds additional elements throwing it all out of whack.
Edit 9/27/2017 #2 to clarify, the original implementation had a related problem, namely that the escaped text was re-escaped. I.e. I was getting
SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;
But wanted
SUBSTA SF6 CIRCUIT BKR CONC FDN "C"
The reason you are getting xmlns="" added to your embedded XML is that your container element(s) <OutlineText> and <TextOutlTxt> are declared to be in the "http://www.mynamespace/09262017" namespace by use of the [XmlRootAttribute.Namespace] attribute, whereas the embedded literal XML elements are in the empty namespace. To fix this, your embedded XML literal must be in the same namespace as its parent elements.
Here is the XML literal. Notice there is no xmlns="..." declaration anywhere in the XML:
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;" xmlns="">SUBSTA SF6 CIRCUIT BKR CONC FDN &#x22;C&#x22;</p>
Lacking such a declaration, the <p> element is in the empty namespace. Conversely, your OutlineText type is decorated with an [XmlRoot] attribute:
[XmlRoot("OutlTxt", Namespace = "http://www.mynamespace/09262017")]
public class OutlineText
{
}
Thus the corresponding OutlTxt root element will be in the http://www.mynamespace/09262017 namespace. All its child elements will default to this namespace as well unless overridden. Placing the embedded XmlNode in the empty namespace counts as overriding the parent namespace, and so an xmlns="" attribute is required.
The simplest way to avoid this problem is for your embedded XML string literal to place itself in the correct namespace as follows:
<p xmlns="http://www.mynamespace/09262017" style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>ReplaceMe</span>
</p>
Then, in your Value method, strip redundant namespace declarations. This is somewhat easier to do with the LINQ to XML API:
[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText
{
public const string Namespace = "http://www.mynamespace/09262017";
private string _value;
[XmlAnyElement("TextOutlTxt", Namespace = OutlineText.Namespace)]
public XElement Value
{
get
{
var escapedValue = EscapeTextValue(_value);
var nestedXml = string.Format("<p xmlns=\"{0}\" style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{1}</span></p>", Namespace, escapedValue);
var outerXml = string.Format("<TextOutlTxt xmlns=\"{0}\">{1}</TextOutlTxt>", Namespace, nestedXml);
var element = XElement.Parse(outerXml);
//Remove redundant xmlns attributes
element.DescendantsAndSelf().SelectMany(e => e.Attributes()).Where(a => a.IsNamespaceDeclaration && a.Value == Namespace).Remove();
return element;
}
set
{
_value = value == null ? null : value.Value;
}
}
static string EscapeTextValue(string text)
{
return Regex.Replace(text, #"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
}
And the resulting XML will look like:
<OutlTxt xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.mynamespace/09262017">
<TextOutlTxt>
<p style="text-align:left;margin-top:0pt;margin-bottom:0pt;">
<span>SUBSTA SF6 CIRCUIT BKR CONC FDN "C"</span>
</p>
</TextOutlTxt>
</OutlTxt>
Note that I have changed the attribute on Value from [XmlElement] to [XmlAnyElement]. I did this because it appears your value XML might contain multiple mixed content nodes at the root level, e.g.:
Start Text <p>Middle Text</p> End Text
Using [XmlAnyElement] enables this by allowing a container node to be returned without causing an extra level of XML element nesting.
Sample working .Net fiddle.
Your question now has two requirements:
Suppress certain xmlns="..." attributes on an embedded XElement or XmlNode while serializing, AND
Force certain characters inside element text to be escaped (e.g. " => "). Even though this is not required by the XML standard, your legacy receiving system apparently needs this.
Issue #1 can be addressed as in this answer
For issue #2, however, there is no way to force certain characters to be unnecessarily escaped using XmlNode or XElement because escaping is handled at the level of XmlWriter during output. And Microsoft's built-in implementations of XmlWriter seem not to have any settings that can force certain characters that do not need to be escaped to nevertheless be escaped. You would need to try to subclass XmlWriter or XmlTextWriter (as described e.g. here and here) then intercept string values as they are written and escape quote characters as desired.
Thus, as an alternate approach that solves both #1 and #2, you could implement IXmlSerializable and write your desired XML directly with XmlWriter.WriteRaw():
[XmlRoot("OutlTxt", Namespace = OutlineText.Namespace)]
public class OutlineText : IXmlSerializable
{
public const string Namespace = "http://www.mynamespace/09262017";
private string _value;
// For debugging purposes.
internal string InnerValue { get { return _value; } }
static string EscapeTextValue(string text)
{
return Regex.Replace(text, #"[\a\b\f\n\r\t\v\\""'&<>]", m => string.Join(string.Empty, m.Value.Select(c => string.Format("&#x{0:X};", Convert.ToInt32(c))).ToArray()));
}
private OutlineText()
{ }
public OutlineText(string text)
{
_value = text;
}
#region IXmlSerializable Members
XmlSchema IXmlSerializable.GetSchema()
{
return null;
}
void IXmlSerializable.ReadXml(XmlReader reader)
{
_value = ((XElement)XNode.ReadFrom(reader)).Value;
}
void IXmlSerializable.WriteXml(XmlWriter writer)
{
var escapedValue = EscapeTextValue(_value);
var nestedXml = string.Format("<p style=\"text-align:left;margin-top:0pt;margin-bottom:0pt;\"><span>{0}</span></p>", escapedValue);
writer.WriteRaw(nestedXml);
}
#endregion
}
And the output will be
<OutlTxt xmlns="http://www.mynamespace/09262017"><p style="text-align:left;margin-top:0pt;margin-bottom:0pt;"><span>SUBSTA SF6 CIRCUIT BKR CONC FDN "C"</span></p></OutlTxt>
Note that, if you use WriteRaw(), you can easily generate invalid XML simply by writing markup characters embedded in text values. You should be sure to add unit tests that verify that does not occur, e.g. that new OutlineText(#"<") does not cause problems. (A quick check seems to show your Regex is escaping < and > appropriately.)
New sample .Net fiddle.

XmlElement InnerText property

I'm delving into the world of XmlDocument building and thought I'd try to re-build (at least, in part) the Desktop tree given by Microsoft's program UISpy.
So far I am able to grab a child of the desktop and write that to a XML document, and then grab each child of that and write those to an XML document.
So far the code looks like this...
using System.Windows.Automation;
using System.Xml;
namespace MyTestApplication
{
internal class TestXmlStuff
{
public static void Main(string[] args)
{
XmlDocument xDocument = new XmlDocument();
AutomationElement rootElement = AutomationElement.RootElement;
TreeWalker treeWalker = TreeWalker.ContentViewWalker;
XmlNode rootXmlElement = xDocument.AppendChild(xDocument.CreateElement("Desktop"));
AutomationElement autoElement = rootElement.FindFirst(TreeScope.Children, new PropertyCondition(AutomationElement.NameProperty, "GitHub"));
string name = autoElement.Current.Name;
while (autoElement != null)
{
string lct = autoElement.Current.LocalizedControlType.Replace(" ", "");
lct = (lct.Equals("") ? "Cusotm" : lct);
XmlElement temp = (XmlElement)rootXmlElement.AppendChild(xDocument.CreateElement(lct));
//temp.InnerText = lct;
string outerXML = temp.OuterXml;
rootXmlElement = temp;
autoElement = treeWalker.GetNextSibling(autoElement);
}
}
}
}
...and the resulting XML file...
Now, when I add a line to change the InnerText Property of each XML element, like temp.InnerText = lct I get an oddly formated XML file.
What I expected from this was that each InnerText would be on the same line as the start and end tags of the XML element, but instead all but the last element's InnerText is located on a new line.
So my question is, why is that? Is there something else I could be doing with my XML elements to have their InnerText appear on the same line?
As I said in a comment, XML isn't a display format, so it gets formatted however IE chooses to do so.
To get closer to what you were expecting, you might want to consider using an attribute rather than innertext:
XmlElement temp = (XmlElement)rootXmlElement.AppendChild(xDocument.CreateElement(lct));
var attr = xDocument.CreateAttribute("type");
attr.Value = lct;
temp.Attributes.Append(attr);
IE displays the attributes within the opening element, which may be good enough for your purposes.
From the XML perspective, what you're currently creating is called Mixed Content - you have an element that contains both text and other elements. From a hierarchical perspective, those text nodes and other elements occupy the same position within the hierarchy - so I'd assume that this is why IE is displaying them as "equals" - both nested under their parent element and at the same indentation level.

Using an XmlTextAttribute on an array of mixed types removes whitespace strings

I am writing a set of objects that must serialize to and from Xml, following a strict specification that I cannot change. One element in this specification can contain a mix of strings and elements in-line.
A simple example of this Xml output would be this:
<root>Leading text <tag>tag1</tag> <tag>tag2</tag></root>
Note the whitespace characters between the closing of the first tag, and the start of the second tag. Here are the objects that represents this structure:
[XmlRoot("root")]
public class Root
{
[XmlText(typeof(string))]
[XmlElement("tag", typeof(Tag))]
public List<object> Elements { get; set; }
//this is simply for the sake of example.
//gives us four objects in the elements array
public static Root Create()
{
Root root = new Root();
root.Elements.Add("Leading text ");
root.Elements.Add(new Tag() { Text = "tag1" });
root.Elements.Add(" ");
root.Elements.Add(new Tag() { Text = "tag2" });
return root;
}
public Root()
{
Elements = new List<object>();
}
}
public class Tag
{
[XmlText]
public string Text {get;set;}
}
Calling Root.Create(), and saving to a file using this method looks perfect:
public XDocument SerializeToXml(Root obj)
{
XmlSerializer serializer = new XmlSerializer(typeof(Root));
XDocument doc = new XDocument();
using (var writer = doc.CreateWriter())
{
serializer.Serialize(writer, obj);
}
return doc;
}
Serialization looks exactly like the xml structure at the beginning of this post.
Now when I want to serialize an xml file back into a Root object, I call this:
public static Root FromFile(string file)
{
XmlSerializer serializer = new XmlSerializer(typeof(Root));
XmlReaderSettings settings = new XmlReaderSettings();
XmlReader reader = XmlTextReader.Create(file, settings);
//whitespace gone here
Root root = serializer.Deserialize(reader) as Root;
return root;
}
The problem is here. The whitespace string is eliminated. When I call Root.Create(), there are four objects in the Elements array. One of them is a space. This serializes just fine, but when deserializing, there are only 3 objects in Elements. The whitespace string gets eliminated.
Any ideas on what I'm doing wrong? I've tried using xml:space="preserve", as well as a host of XmlReader, XmlTextReader, etc. variations. Note that when I use a StringBuilder to read the XmlTextReader, the xml contains the spaces as I'd expect. Only when calling Deserialize(stream) do I lose the spaces.
Here's a link to an entire working example. It's LinqPad friendly, just copy/paste: http://pastebin.com/8MkUQviB The example opens two files, one a perfect serialized xml file, the second being a deserialized then reserialized version of the first file. Note you'll have to reference System.Xml.Serialization.
Thanks for reading this novel. I hope someone has some ideas. Thank you!
It looks like a bug. Workaround seems to be replace all whitespaces and crlf in XML text nodes by
entities. Semantic equal entities (
) does not work.
<root>Leading text <tag>tag1</tag> <tag>tag2</tag></root>
is working for me.

XML parser, multiple roots

This is part of the input string, i can't modify it, it will always come in this way(via shared memory), but i can modify after i have put it into a string of course:
<sys><id>SCPUCLK</id><label>CPU Clock</label><value>2930</value></sys><sys><id>SCPUMUL</id><label>CPU Multiplier</label><value>11.0</value></sys><sys><id>SCPUFSB</id><label>CPU FSB</label><value>266</value></sys>
i've read it with both:
String.Concat(
XElement.Parse(encoding.GetString(bytes))
.Descendants("value")
.Select(v => v.Value));
and:
XmlDocument document = new XmlDocument();
document.LoadXml(encoding.GetString(bytes));
XmlNode node = document.DocumentElement.SelectSingleNode("//value");
Console.WriteLine("node = " + node);
but they both have an error when run; that the input has multiple roots(There are multiple root elements quote), i don't want to have to split the string.
Is their any way to read the string take the value between <value> and </value> without spiting the string into multiple inputs?
That's not a well-formed XML document, so most XML tools won't be able to process it.
An exception is the XmlReader. Look up XmlReaderSettings.ConformanceLevel in MSDN. If you set it to ConformanceLevel.Fragment, you can create an XmlReader with those settings and use it to read elements from a stream that has no top-level element.
You have to write code that uses XmlReader.Read() to do this - you can't just feed it to an XmlDocument (which does require that there be a single top-level element).
e.g.,
var readerSettings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create(stream, readerSettings))
{
while (reader.Read())
{
using (var fragmentReader = reader.ReadSubtree())
{
if (fragmentReader.Read())
{
var fragment = XNode.ReadFrom(fragmentReader) as XElement;
// do something with fragment
}
}
}
}
XML elements must have ONE root element, with whatever child structure you want.
Your xml string looks like:
<sys>
...
</sys>
<sys>
...
</sys>
The valid version would be:
<someRootElement>
<sys>
...
</sys>
<sys>
...
</sys>
</someElement>
Try:
XmlDocument document = new XmlDocument();
document.LoadXml("<root>"+encoding.GetString(bytes)+"</root>");
XmlNode node = document.DocumentElement.SelectSingleNode("//value");
Console.WriteLine("node = " + node);
This solution parses successfully all node types, including text nodes:
var settings = new XmlReaderSettings{ConformanceLevel = ConformanceLevel.Fragment};
var reader = XmlReader.Create(stream, settings);
while(reader.Read())
{
while(reader.NodeType != XmlNodeType.None)
{
if(reader.NodeType == XmlNodeType.XmlDeclaration)
{
reader.Skip();
continue;
}
XNode node = XNode.ReadFrom(reader);
}
}
Skips the XML declaration because it isn't a node.

Categories

Resources