Avoid escaping ' entity with .NET XmlDocument class - c#

How can I avoid XmlDocument class replace ' entity with the ' character?
For example if I have:
string xml = "<a> &apos; </a>";
After doing
var doc = new XmlDocument();
doc.LoadXml(xml);
string output = doc.OutterXml;
The value of output is
"<a>'</a>"
I need to avoid this because I must load an XML, make some changes and sign it digitally so the signed XML must be the same loaded.

For your specific requirements, don't use XmlDocument or any other XML parser to parse the original document.
Do use XmlDocument or any other XML-specific classes to create your new document, except put a placeholder where the original document needs to go, like ORIGINAL_DOCUMENT_HERE. Then after you've generated the resulting text XML for your new document, replace ORIGINAL_DOCUMENT_HERE with your original received text, and then sign the result.
Not a normal way to work with XML, but should work for your specific use case.

Related

Making XmlReaderSettings CheckCharacters work for xml string

I have an xml string coming from Adobe PDF AcroForms, which apparently allows naming form fields starting with numeric characters. I'm trying to parse this string to an XDocument:
XDocument xDocument = XDocument.Parse(xmlString);
But whenever I encounter such a form field where the name starts with a numeric char, the xml parsing throws an XmlException:
Name cannot begin with the 'number' character
Other solutions I found were about using: XmlReaderSettings.CheckCharacters
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString), new XmlReaderSettings() { CheckCharacters = false }))
{
XDocument xDocument = XDocument.Load(xmlReader);
}
But this also didn't work. Some articles pointed out the reason as one of the points mentioned in MSDN article:
If the XmlReader is processing text data, it always checks that the
XML names and text content are valid, regardless of the property
setting. Setting CheckCharacters to false turns off character checking
for character entity references.
So I tried using:
using(MemoryStream memoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(xmlString)))
using (XmlReader xmlReader = XmlReader.Create(memoryStream, new XmlReaderSettings() { CheckCharacters = false }))
{
XDocument xDocument = XDocument.Load(xmlReader);
}
This also didn't work.
Can any one please help me in figuring out how to parse an xml string that contains xml elements whose name starts with numeric characters?
How is the flag XmlReaderSettings.CheckCharacters supposed to be used?
You can't make standard XML parser parse your format even if it "looks like" XML, stop trying. Standard-compliant XML parsers are disallowed to parse invalid XML. This was a design decision, based on all the problems quirks mode caused with HTML parsing.
Writing your own parser isn't that hard. XML is very strict and, unless you need advanced features, the syntax is simple.
LL parser can be written by hand. Both lexer and parser are simple.
LR parser can be generated using ANTLR and a simple grammar. Most likely, you'll even find example XML garmmars.
You can also just take either of .NET XML parsers' source code and remove validation you don't need. You can find both XmlDocument and XDocument in .NET Core's repository on GitHub.

C# Xml Encoding

I'm freaking out with C# and XmlDocuments right now.
I need to parse XML data into another XML but I can't get special characters to work.
I'm working with XmlDocument and XmlNode.
What I tried so far:
- XmlDocument.CreateXmlDeclaration("1.0", "UTF-8", "yes");
- XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
What I know for sure:
- The input XML is also UTF-8
- The "InnerText" value is encoded without replacing the characters
Here is some code (not all... way to much code):
XmlDocument newXml = new XmlDocument();
newXml = (XmlDocument)systemsTemplate.Clone();
newXml.CreateXmlDeclaration("1.0", "UTF-8", "yes");
newXml.SelectSingleNode("systems").RemoveAll();
foreach(XmlNode categories in exSystems.SelectNodes("root/Content/Systems/SystemLine"))
{
XmlNode categorieSystemNode = systemsTemplate.SelectSingleNode("systems/system").Clone();
categorieSystemNode.RemoveAll();
XmlNode importIdNode = systemsTemplate.SelectSingleNode("systems/system/import_id").Clone();
string import_id = categories.Attributes["nodeName"].Value;
importIdNode.InnerText = import_id;
categorieSystemNode.AppendChild(importIdNode);
[way more Nodes which I proceed like this]
}
newXml.SelectSingleNode("systems").AppendChild(newXml.ImportNode(categorieSystemNode, true));
XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
writer.Formatting = Formatting.Indented;
newXml.Save(writer);
writer.Flush();
writer.Close();
But what I get is this as an example:
<intro><p>Whether your project [...]</intro>
Instead of this:
<intro><p>Whether your project [...] </p></intro>
I do have other non-html tags in the XML so please don't provide HTML-parsing solutions :/
I know I could replace the characters with String.Replace() but that's dirty and unsafe (and slow with around 20K lines).
I hope there is a simpler way of doing this.
Kind regards,
Eriwas
The main propose of XmlDocument is to provide an easy way to work with XML documents while making sure the outcome is a well formed document.
So, using InnerText as in your example, you let the framework encode the string and properly insert it into that document. Whenever you read that same value, it will be decoded and returned to you exactly as your original string.
But, if you want to add an XML fragment anyways, you should stick with InnerXml or ImportNode. You must be aware that could lead to a more complex document structure, and you probably would like to avoid that.
As a third possibility, you can use the CreateCDataSection to add a CDATA and add your text there.
You definitely should be away from treating that XML document as a string by trying Replace things; stick with the framework and you'll be ok.

Get XML from XPathDocument

I am working on a stylesheet and have some initial XML. However the XML is being manipulated a bit before styling and i would like to get the final XML sent into .Transform(). For instance, ...
XslCompiledTransform.Transform( xpd, xslArg, output )
...i would like to get the Xml content of xpd (as a string), so i can work on the stylesheet in other tools.
Is there a quick-and-dirty way to get this? Either in the VS2010 immediate window or as a quick C# line or two before the call to .Transform()?
EDIT: The .Transform() i'm using is
public void Transform(IXPathNavigable input,
XsltArgumentList arguments, TextWriter results);
...and xpd is an XPathDocument.
Edit: I misunderstood the intent of your question. The simple answer is to get the XML for any IXPathNavigable (which includes XPathDocument), you can do this:
string xml = xpd.CreateNavigator().OuterXml;
Below is my original answer, which explains how you could modify the XML from an XPathDocument in code before feeding it into a transform:
If xpd is an XPathDocument, you might be able to just get an XPathNavigator from the XPathDocument:
XPathNavigator xpn = xpd.CreateNavigator();
and use that to modify the XML. When you're done modifying it, you can just pass either xpn or xpd into the Transform() method. On the other hand, MSDN says that XPathDocument's CreateNavigator() creates a readonly navigator, so that may be a bit of a hitch.
If it really is readonly, you should be able to do this:
XmlDocument doc = new XmlDocument();
doc.LoadXml(xpd.CreateNavigator().OuterXml);
then use doc to modify the XML and pass doc into the transform when you're done.

C# XMLDocument Encoding?

I'm trying to code a function that validates an XML settings file, so if a node does not exist on the file, it should create it.
I have this function
private void addMissingSettings() {
XmlDocument xmldocSettings = new XmlDocument();
xmldocSettings.Load("settings.xml");
XmlNode xmlMainNode = xmldocSettings.SelectSingleNode("settings");
XmlNode xmlChildNode = xmldocSettings.CreateElement("ExampleNode");
xmlChildNode.InnerText = "Hello World!";
//add to parent node
xmlMainNode.AppendChild(xmlChildNode);
xmldocSettings.Save("settings.xml");
}
But on my XML file, if I have
<rPortSuffix desc="Read Suffix">
</rPortSuffix>
<wPortSuffix desc="Write Suffix"></wPortSuffix>
When the I save the document, it saves those lines as
<rPortSuffix desc="Read Suffix">
</rPortSuffix>
<wPortSuffix desc="Sufijo en puerto de escritura"></wPortSuffix>
<ExampleNode>Hello World!</ExampleNode>
Is there a way to prevent this behaviour? Like setting a working charset or something like that?
The two files are equivalent, and should be treated as being equivalent by all XML parsers, I believe.
Additionally, Unicode character U+0003 isn't a valid XML character, so you've fundamentally got other problems if you're trying to represent it in your file. Even though that particular .NET XML parser doesn't seem to object, other parsers may well do so.
If you need to represent absolutely arbitrary characters in your XML, I suggest you do so in some other form - e.g.
<rPortSuffix desc="Read Suffix">\u000c\u000a</rPortSuffix>
<wPortSuffix desc="Write Suffix">\u0003</wPortSuffix>
Obviously you'll then need to parse that text appropriately, but at least the XML parser won't get in the way, and you'll be able to represent any UTF-16 code unit.

C# XML conversion

I have a string containing fully formatted XML data, created using a Perl script.
I now want to convert this string into an actual XML file in C#. Is there anyway to do this?
Thanks,
You can load a string into an in-memory representation, for example, using the LINQ to SQL XDocument type. Loading string can be done using Parse method and saving the document to a file is done using the Save method:
open System.Xml.Linq;
XDocument doc = XDocument.Parse(xmlContent);
doc.Save(fileName);
The question is why would you do that, if you already have correctly formatted XML document?
A good reasons that I can think of are:
To verify that the content is really valid XML
To generate XML with nice indentation and line breaks
If that's not what you need, then you should just write the data to a file (as others suggest).
Could be as simple as
File.WriteAllText(#"C:\Test.xml", "your-xml-string");
or
File.WriteAllText(#"C:\Test.xml", "your-xml-string", Encoding.UTF8);
XmlDocument doc = new XmlDocument();
doc.Load(... your string ...);
doc.Save(... your destination path...);
see also
http://msdn.microsoft.com/fr-fr/library/d5awd922%28v=VS.80%29.aspx

Categories

Resources