How to read xml string ignoring header? - c#

I want to read a xml string ignoring the header and the comments.
To ignore the comments it's simples and I found a solution here.
But I'm not finding any solution to ignore the header.
Let me give an example:
Consider this xml:
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Some comments -->
<Tag Attribute="3">
...
</Tag>
I want to read the xml to a string obtaining just the element "Tag" and others elements but withou the "xml version" and the comments.
The element "Tag" is only an example. Could exist many others.
So, I want only this:
<Tag Attribute="3">
...
</Tag>
The code that I've come so far:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreComments = true;
XmlReader reader = XmlReader.Create("...", settings);
xmlDoc.Load(reader);
And I'm not finding anything on XmlReaderSettings to do that.
Do I need to go node by node choosing only the ones I want? This setting does not exist?
EDIT 1:
Just to resume my problem. I need the contents of the xml to use in a CDATA of a WebService. When I'm sending comments or xml version, I'm getting an specific error of that part of xml. So I assume that when I read the xml without the version, header and comments I'll be good to go.

Here's a really simple solution.
using (var reader = XmlReader.Create(/*reader, stream, etc.*/)
{
reader.MoveToContent();
string content = reader.ReadOuterXml();
}

Well, it seems that there is no settings to ignore declaration, so I had to ignore it myself.
Here's the code I've written for those who might be interested:
private string _GetXmlWithoutHeadersAndComments(XmlDocument doc)
{
string xml = null;
// Loop through the child nodes and consider all but comments and declaration
if (doc.HasChildNodes)
{
StringBuilder builder = new StringBuilder();
foreach (XmlNode node in doc.ChildNodes)
if (node.NodeType != XmlNodeType.XmlDeclaration && node.NodeType != XmlNodeType.Comment)
builder.Append(node.OuterXml);
xml = builder.ToString();
}
return xml;
}

If you want to only get the Tag elements, you should just read the XML as normal, then find them using the XmlDocument's XPath capabilities.
For your xmlDoc object:
var nodes = xmlDoc.DocumentElement.SelectNodes("Tag");
You can then iterate through these like so:
foreach (XmlNode node in nodes) { }
Or, obviously, you could just put your SelectNodes query into the foreach loop, if you're never going to reuse the nodes object.
This will return all Tag elements within your XML document, and you can do whatever you see fit with them.
There's no need to ever encounter comments while using XmlDocument if you don't want to, and you're not going to end up getting results including either the header or the comments. Is there a particular reason you're trying to remove pieces of the XML before you begin parsing it?
Edit: Based on your edit, it seems like you're having a problem with the header giving an error when you try to pass it. You probably shouldn't straight-up remove the header, so your best option might be to change the header to one that you know works. You can change the header (declaration) like so:
XmlDeclaration xmlDeclaration;
xmlDeclaration = yourDocument.CreateXmlDeclaration(
yourVersion,
yourEncoding,
isStandalone);
yourDocument.ReplaceChild(xmlDeclaration, doc.FirstChild);

Related

How to select child node with XPath

I'm trying to get values from a XML document using the iXF format, but I'm having some issues with the XPath syntax.
I have the following XML document
<SOAP_ENV:Envelope xmlns:NS2="http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0" xmlns:NS1="CATIA/V5/Electrical/1.0" xmlns:tns="IXF_Schema.xsd" xmlns:ixf="http://www.ixfstd.org/std/ns/core/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP_ENV="http://schemas.xmlsoap.org/soap/envelope/" xsi:schemaLocation="IXF_Schema.xsd ElectricalSchema.xsd">
<SOAP_ENV:Body>
<ixf:object id="Electrical Physical System00000089.1" xsi:type="tns:Harness">
<tns:Name>Electrical Physical System00000089.1</tns:Name>
</ixf:object>
<ixf:object id="X10(1)//X11(1)" xsi:type="tns:Wire">
<tns:Name>X10(1)//X11(1)</tns:Name>
<NS1:Wire>
<NS1:Length>763,752mm</NS1:Length>
<NS1:Color>RD</NS1:Color>
<NS1:OuterDiameter>1,32mm</NS1:OuterDiameter>
</NS1:Wire>
</ixf:object>
</SOAP_ENV:Body>
</SOAP_ENV:Envelope>
And i'm trying to find all the Wire objects and get the Name and Length values with the following code.
XmlDocument xlDocument = new XmlDocument();
xlDocument.Load(importFile);
XmlNamespaceManager nsManager = new XmlNamespaceManager(xlDocument.NameTable);
nsManager.AddNamespace("tns", "IXF_Schema.xsd");
nsManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsManager.AddNamespace("ixf", "http://www.ixfstd.org/std/ns/core/1.0");
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0");
nsManager.AddNamespace("NS2", "http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0");
//Get all wire objects
XmlNodeList wires = xlDocument.SelectNodes("descendant::ixf:object[#xsi:type = \"tns:Wire\"]", nsManager);
foreach (XmlNode wire in wires)
{
string wireName;
string wireLength;
XmlNode node = wire.SelectSingleNode("./tns:Name", nsManager);
wireName = node.InnerText;
XmlNode node1 = wire.SelectSingleNode("./NS1:Wire/NS1:Length", nsManager);
wireLength = node1.InnerText;
}
I can get the wireName value without any problems but the Length element selection always returns 0 matches and I can not figure out why. I also tried to only select the Wire element using the same syntax as the Name element ./NS1:Wire but that also returns 0 matches.
Your XML declares
xmlns:NS1="CATIA/V5/Electrical/1.0"
^^
Your C# declares a different namespacem
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0")
^^
Make sure both namespaces match exactly.
Regarding your comment asking about the use of version numbers in namespaces...
It is an unfortunately common but certainly not widely accepted practice to include a version number in an XML namespace. Realize that by doing so, you're effectively saying that every namespaced XML component (element or attribute) should now be considered to differ from its counterpart in the old namespace. This is rarely what you want.
See also
Should I use a Namespace of an XML file to identify its version
What are the best practices for versioning XML schemas?

C# Xml Encoding

I'm freaking out with C# and XmlDocuments right now.
I need to parse XML data into another XML but I can't get special characters to work.
I'm working with XmlDocument and XmlNode.
What I tried so far:
- XmlDocument.CreateXmlDeclaration("1.0", "UTF-8", "yes");
- XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
What I know for sure:
- The input XML is also UTF-8
- The "InnerText" value is encoded without replacing the characters
Here is some code (not all... way to much code):
XmlDocument newXml = new XmlDocument();
newXml = (XmlDocument)systemsTemplate.Clone();
newXml.CreateXmlDeclaration("1.0", "UTF-8", "yes");
newXml.SelectSingleNode("systems").RemoveAll();
foreach(XmlNode categories in exSystems.SelectNodes("root/Content/Systems/SystemLine"))
{
XmlNode categorieSystemNode = systemsTemplate.SelectSingleNode("systems/system").Clone();
categorieSystemNode.RemoveAll();
XmlNode importIdNode = systemsTemplate.SelectSingleNode("systems/system/import_id").Clone();
string import_id = categories.Attributes["nodeName"].Value;
importIdNode.InnerText = import_id;
categorieSystemNode.AppendChild(importIdNode);
[way more Nodes which I proceed like this]
}
newXml.SelectSingleNode("systems").AppendChild(newXml.ImportNode(categorieSystemNode, true));
XmlTextWriter writer = new XmlTextWriter(outputDir + "systems.xml", Encoding.UTF8);
writer.Formatting = Formatting.Indented;
newXml.Save(writer);
writer.Flush();
writer.Close();
But what I get is this as an example:
<intro><p>Whether your project [...]</intro>
Instead of this:
<intro><p>Whether your project [...] </p></intro>
I do have other non-html tags in the XML so please don't provide HTML-parsing solutions :/
I know I could replace the characters with String.Replace() but that's dirty and unsafe (and slow with around 20K lines).
I hope there is a simpler way of doing this.
Kind regards,
Eriwas
The main propose of XmlDocument is to provide an easy way to work with XML documents while making sure the outcome is a well formed document.
So, using InnerText as in your example, you let the framework encode the string and properly insert it into that document. Whenever you read that same value, it will be decoded and returned to you exactly as your original string.
But, if you want to add an XML fragment anyways, you should stick with InnerXml or ImportNode. You must be aware that could lead to a more complex document structure, and you probably would like to avoid that.
As a third possibility, you can use the CreateCDataSection to add a CDATA and add your text there.
You definitely should be away from treating that XML document as a string by trying Replace things; stick with the framework and you'll be ok.

Best way to read, modify, and write XML

My plan is to read in an XML document using my C# program, search for particular entries which I'd like to change, and then write out the modified document. However, I've become unstuck because it's hard to differentiate between elements, whether they start or end using XmlTextReader which I'm using to read in the file. I could do with a bit of advice to put me on the right track.
The document is a HTML document, so as you can imagine, it's quite complicated.
I'd like to search for an element id within the HTML document, so for example look for this and change the src;
<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />
If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.
You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generalyl expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.
For your specific example:
XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
{
// src will be null if the attribute is missing
string src = (string) img.Attribute("src");
img.SetAttributeValue("src", src + "with-changes");
}
Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.
XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
using(XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8)) {
xtw.Formatting = Formatting.Indented; // optional, if you want it to look nice
doc.WriteContentTo(xtw);
}
Depending on the structure of the input XML, this could make your parsing code a bit simpler.
Here's a tool I wrote to modify an IAR EWARM project (ewp) file, adding a linker define to the project. From the command line, you run it with 2 arguments, the input and output file names (*.ewp).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
namespace ewp_tool
{
class Program
{
static void Main(string[] args)
{
XmlDocument doc = new XmlDocument();
doc.Load(args[0]);
XmlNodeList list = doc.SelectNodes("/project/configuration[name='Debug']/settings[name='ILINK']/data/option[name='IlinkConfigDefines']/state");
foreach(XmlElement x in list) {
x.InnerText = "MAIN_APP=1";
}
using (XmlTextWriter xtw = new XmlTextWriter(args[1], Encoding.UTF8))
{
//xtw.Formatting = Formatting.Indented; // leave this out, it breaks EWP!
doc.WriteContentTo(xtw);
}
}
}
}
The structure of the XML looks like this
<U+FEFF><?xml version="1.0" encoding="iso-8859-1"?>
<project>
<fileVersion>2</fileVersion>
<configuration>
<name>Debug</name>
<toolchain>
<name>ARM</name>
</toolchain>
<debug>1</debug>
...
<settings>
<name>ILINK</name>
<archiveVersion>0</archiveVersion>
<data>
...
<option>
<name>IlinkConfigDefines</name>
<state>MAIN_APP=0</state>
</option>
If you have smaller documents which fit in computers memory you can use XmlDocument.
Otherwise you can use XmlReader to iterate through the document.
Using XmlReader you can find out the elements type using:
while (xml.Read()) {
switch xml.NodeType {
case XmlNodeType.Element:
//Do something
case XmlNodeType.Text:
//Do something
case XmlNodeType.EndElement:
//Do something
}
}
For the task in hand - (read existing doc, write, and modify in a formalised way) I'd go with XPathDocument run through an XslCompiledTransform.
Where you can't formalise, don't have pre-existing docs or generally need more adaptive logic, I'd go with LINQ and XDocument like Skeet says.
Basically if the task is transformation then XSLT, if the task is manipulation then LINQ.
My favorite tool for this kind of thing is HtmlAgilityPack. I use it to parse complex HTML documents into LINQ-queryable collections. It is an extremely useful tool for querying and parsing HTML (which is often not valid XML).
For your problem, the code would look like:
var htmlDoc = HtmlAgilityPack.LoadDocument(stringOfHtml);
var images = htmlDoc.DocumentNode.SelectNodes("//img[id=lookforthis]");
if(images != null)
{
foreach (HtmlNode node in images)
{
node.Attributes.Append("alt", "added an alt to lookforthis images.");
}
}
htmlDoc.Save('output.html');
One fairly easy approach would be to create a new XmlDocument, then use the Load() method to populate it. Once you've got the document, you can use CreateNavigator() to get an XPathNavigator object that you can use to find and alter elements in the document. Finally, you can use the Save() method on the XmlDocument to write the changed document back out.
Just start by reading the documentation of the Xml namespace on the MSDN. Then if you have more specific questions, post them here...

How to merge two XmlDocuments in C#

I want to merge two XmlDocuments by inserting a second XML doc to the end of an existing Xmldocument in C#. How is this done?
Something like this:
foreach (XmlNode node in documentB.DocumentElement.ChildNodes)
{
XmlNode imported = documentA.ImportNode(node, true);
documentA.DocumentElement.AppendChild(imported);
}
Note that this ignores the document element itself of document B - so if that has a different element name, or attributes you want to copy over, you'll need to work out exactly what you want to do.
EDIT: If, as per your comment, you want to embed the whole of document B within document A, that's relatively easy:
XmlNode importedDocument = documentA.ImportNode(documentB.DocumentElement, true);
documentA.DocumentElement.AppendChild(importedDocument);
This will still ignore things like the XML declaration of document B if there is one - I don't know what would happen if you tried to import the document itself as a node of a different document, and it included an XML declaration... but I suspect this will do what you want.
Inserting an entire XML document at the end of another XML document is actually guaranteed to produce invalid XML. XML requires that there be one, and only one "document" element. So, assuming that your files were as follows:
A.xml
<document>
<element>value1</element>
<element>value2</element>
</document>
B.xml
<document>
<element>value3</element>
<element>value4</element>
</document>
The resultant document by just appending one at the end of the other:
<document>
<element>value1</element>
<element>value2</element>
</document>
<document>
<element>value3</element>
<element>value4</element>
</document>
Is invalid XML.
Assuming, instead, that the two documents share a common document element, and you want to insert the children of the document element from B into A's document element, you could use the following:
var docA = new XmlDocument();
var docB = new XmlDocument();
foreach (var childEl in docB.DocumentElement.ChildNodes) {
var newNode = docA.ImportNode(childEl, true);
docA.DocumentElement.AppendChild(newNode);
}
This will produce the following document given my examples above:
<document>
<element>value1</element>
<element>value2</element>
<element>value3</element>
<element>value4</element>
</document>
This is the fastest cleanest way to merge xml documents.
XElement xFileRoot = XElement.Load(file1.xml);
XElement xFileChild = XElement.Load(file2.xml);
xFileRoot.Add(xFileChild);
xFileRoot.Save(file1.xml);
Bad news. As long as the xml documents can have only one root element you cannot just put content of one document at the end of the second. Maybe this is what you are looking for? It shows how easily you can merge xml files using Linq-to-XML
Alternatively if you are using XmlDocuments you can try make it like this:
XmlDocument documentA;
XmlDocument documentB;
foreach(var childNode in documentA.DocumentElement.ChildNodes)
documentB.DocumentElement.AppendChild(childNode);

Change the node names in an XML file using C#

I have a huge bunch of XML files with the following structure:
<Stuff1>
<Content>someContent</name>
<type>someType</type>
</Stuff1>
<Stuff2>
<Content>someContent</name>
<type>someType</type>
</Stuff2>
<Stuff3>
<Content>someContent</name>
<type>someType</type>
</Stuff3>
...
...
I need to change the each of the "Content" node names to StuffxContent; basically prepend the parent node name to the content node's name.
I planned to use the XMLDocument class and figure out a way, but thought I would ask if there were any better ways to do this.
(1.) The [XmlElement / XmlNode].Name property is read-only.
(2.) The XML structure used in the question is crude and could be improved.
(3.) Regardless, here is a code solution to the given question:
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(sampleXml);
XmlNodeList stuffNodeList = xmlDoc.SelectNodes("//*[starts-with(name(), 'Stuff')]");
foreach (XmlNode stuffNode in stuffNodeList)
{
// get existing 'Content' node
XmlNode contentNode = stuffNode.SelectSingleNode("Content");
// create new (renamed) Content node
XmlNode newNode = xmlDoc.CreateElement(contentNode.Name + stuffNode.Name);
// [if needed] copy existing Content children
//newNode.InnerXml = stuffNode.InnerXml;
// replace existing Content node with newly renamed Content node
stuffNode.InsertBefore(newNode, contentNode);
stuffNode.RemoveChild(contentNode);
}
//xmlDoc.Save
PS: I came here looking for a nicer way of renaming a node/element; I'm still looking.
I used this method to rename the node:
/// <summary>
/// Rename Node
/// </summary>
/// <param name="parentnode"></param>
/// <param name="oldname"></param>
/// <param name="newname"></param>
private static void RenameNode(XmlNode parentnode, string oldChildName, string newChildName)
{
var newnode = parentnode.OwnerDocument.CreateNode(XmlNodeType.Element, newChildName, "");
var oldNode = parentnode.SelectSingleNode(oldChildName);
foreach (XmlAttribute att in oldNode.Attributes)
newnode.Attributes.Append(att);
foreach (XmlNode child in oldNode.ChildNodes)
newnode.AppendChild(child);
parentnode.ReplaceChild(newnode, oldNode);
}
The easiest way I found to rename a node is:
xmlNode.InnerXmL = newNode.InnerXml.Replace("OldName>", "NewName>")
Don't include the opening < to ensure that the closing </OldName> tag is renamed as well.
Perhaps a better solution would be to iterate through each node, and write the information out to a new document. Obviously, this will depend on how you will be using the data in future, but I'd recommend the same reformatting as FlySwat suggested...
<stuff id="1">
<content/>
</stuff>
I'd also suggest that using the XDocument that was recently added would be the best way to go about creating the new document.
I'll answer the higher question: why are you trying this using XmlDocument?
I Think the best way to accomplish what you aim is a simple XSLT file
that match the "CONTENTSTUFF" node and output a "CONTENT" node...
don't see a reason to get such heavy guns...
Either way, If you still wish to do it C# Style,
Use XmlReader + XmlWriter and not XmlDocument for memory and speed purposes.
XmlDocument store the entire XML in memory, and makes it very heavy for Traversing once...
XmlDocument is good if you access the element many times (not the situation here).
I am not an expert in XML, and in my case I just needed to make all tag names in a HTML file to upper case, for further manipulation in XmlDocument with GetElementsByTagName. The reason I needed upper case was that for XmlDocument the tag names are case sensitive (since it is XML), and I could not guarantee that my HTML-file had consistent case in the tag names.
So I solved it like this: I used XDocument as an intermediate step, where you can rename elements (i.e. the tag name), and then loaded that into a XmlDocument. Here is my VB.NET-code (the C#-coding will be very similar).
Dim x As XDocument = XDocument.Load("myFile.html")
For Each element In x.Descendants()
element.Name = element.Name.LocalName.ToUpper()
Next
Dim x2 As XmlDocument = New XmlDocument()
x2.LoadXml(x.ToString())
For my purpose it worked fine, though I understand that in certain cases this might not be a solution if you are dealing with a pure XML-file.
Load it in as a string and do a replace on the whole lot..
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
sampleXml = sampleXml.Replace("Content","StuffxContent")
The XML you have provided shows that someone completely misses the point of XML.
Instead of having
<stuff1>
<content/>
</stuff1>
You should have:/
<stuff id="1">
<content/>
</stuff>
Now you would be able to traverse the document using Xpath (ie, //stuff[id='1']/content/) The names of nodes should not be used to establish identity, you use attributes for that.
To do what you asked, load the XML into an xml document, and simply iterate through the first level of child nodes renaming them.
PseudoCode:
foreach (XmlNode n in YourDoc.ChildNodes)
{
n.ChildNode[0].Name = n.Name + n.ChildNode[0].Name;
}
YourDoc.Save();
However, I'd strongly recommend you actually fix the XML so that it is useful, instead of wreck it further.

Categories

Resources