I'm trying to get values from a XML document using the iXF format, but I'm having some issues with the XPath syntax.
I have the following XML document
<SOAP_ENV:Envelope xmlns:NS2="http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0" xmlns:NS1="CATIA/V5/Electrical/1.0" xmlns:tns="IXF_Schema.xsd" xmlns:ixf="http://www.ixfstd.org/std/ns/core/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP_ENV="http://schemas.xmlsoap.org/soap/envelope/" xsi:schemaLocation="IXF_Schema.xsd ElectricalSchema.xsd">
<SOAP_ENV:Body>
<ixf:object id="Electrical Physical System00000089.1" xsi:type="tns:Harness">
<tns:Name>Electrical Physical System00000089.1</tns:Name>
</ixf:object>
<ixf:object id="X10(1)//X11(1)" xsi:type="tns:Wire">
<tns:Name>X10(1)//X11(1)</tns:Name>
<NS1:Wire>
<NS1:Length>763,752mm</NS1:Length>
<NS1:Color>RD</NS1:Color>
<NS1:OuterDiameter>1,32mm</NS1:OuterDiameter>
</NS1:Wire>
</ixf:object>
</SOAP_ENV:Body>
</SOAP_ENV:Envelope>
And i'm trying to find all the Wire objects and get the Name and Length values with the following code.
XmlDocument xlDocument = new XmlDocument();
xlDocument.Load(importFile);
XmlNamespaceManager nsManager = new XmlNamespaceManager(xlDocument.NameTable);
nsManager.AddNamespace("tns", "IXF_Schema.xsd");
nsManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsManager.AddNamespace("ixf", "http://www.ixfstd.org/std/ns/core/1.0");
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0");
nsManager.AddNamespace("NS2", "http://www.ixfstd.org/std/ns/core/classBehaviors/links/1.0");
//Get all wire objects
XmlNodeList wires = xlDocument.SelectNodes("descendant::ixf:object[#xsi:type = \"tns:Wire\"]", nsManager);
foreach (XmlNode wire in wires)
{
string wireName;
string wireLength;
XmlNode node = wire.SelectSingleNode("./tns:Name", nsManager);
wireName = node.InnerText;
XmlNode node1 = wire.SelectSingleNode("./NS1:Wire/NS1:Length", nsManager);
wireLength = node1.InnerText;
}
I can get the wireName value without any problems but the Length element selection always returns 0 matches and I can not figure out why. I also tried to only select the Wire element using the same syntax as the Name element ./NS1:Wire but that also returns 0 matches.
Your XML declares
xmlns:NS1="CATIA/V5/Electrical/1.0"
^^
Your C# declares a different namespacem
nsManager.AddNamespace("NS1", "CATIA/V6/Electrical/1.0")
^^
Make sure both namespaces match exactly.
Regarding your comment asking about the use of version numbers in namespaces...
It is an unfortunately common but certainly not widely accepted practice to include a version number in an XML namespace. Realize that by doing so, you're effectively saying that every namespaced XML component (element or attribute) should now be considered to differ from its counterpart in the old namespace. This is rarely what you want.
See also
Should I use a Namespace of an XML file to identify its version
What are the best practices for versioning XML schemas?
Related
I have the following xml received from a web service
<GRID xmlns="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result">
<DATA>
<R>
<D>2645</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:17:36.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
<R>
<D>2649</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:33:24.0</D>
<D>114043</D>
<D>Awaiting Release</D>
<D>Awaiting Planning</D>
</R>
<R>
<D>2652</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:36:53.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
</DATA>
</GRID>
I wrote the following piece of .NET code to extract the R nodes
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr); // I can successfully get the three R elements
foreach (XPathNavigator nav in xpNIter)
{
/*
Now I want to iterate through each R element and use XPATH to select each of the six D nodes by its index position.
The order of the D nodes are a known dataset and I want to build a comma separated string by concatenating the value of each D node,
which will later be appended to a CSV file along with a pre-defined header row.
*/
/* I attempted the following XPATH */
// XPathNodeIterator xpDi = nav.Select("(//D)[1]"); -- This does not work and yields a null result
}
Now I want to iterate through each R element and use XPATH to select each of the six D nodes by its index position. The order of the D nodes are a known dataset and I want to build a comma separated string by concatenating the value of each D node, which will later be appended to a CSV file along with a pre-defined header row.
I didn't want to use anything like LINQ to XML as this is part of read-only data extraction program which needs to be as lite and as performant as possible.
What is the correct way to get the D elements by index with XPATH using the XPathNavigator ?
You have a few problems here:
xpNav.Select("//g2:R", nsmgr) does not work for the XML shown in your question.
This expression selects for nodes with local name R in the http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result namespace -- however in your actual XML none of the nodes are in this namespace. There's a namespace declaration xmlns:dstm="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result" but it's not the default namespace, so none of the nodes are actually in it, as they aren't using the dstm: prefix.
Instead, you should do xpNav.Select("//R", nsmgr) (or better yet xpNav.Select("/*/DATA/R", nsmgr)).
In your question you wrote I can successfully get the three R elements so maybe this is a typo in the question.
nav.Select("(//D)[1]"); -- This does not work and yields a null result.
I cannot reproduce this exact problem -- XPathNavigator.Select()never returns null. It will throw an exception on a malformed query, but not return null.
What I can reproduce is that this always returns the same result for every <R>, specifically the value of the first <D> element, <D>2645</D>. Demo fiddle #1 here.
The problem here is that the recursive descent operator //D selects for all nodes named R in the entire document. To select only the nodes in the current <R> element you need to restrict the scope by prefacing the XPath query with .: nav.Select("(.//D)[1]") (or better yet, nav.Select("(./D)[1]")).
Incidentally, since you expect 6 child <D> nodes of <R> it will be more performant to run one single XPath query and collect all 6 into a list, rather than running 6 queries for each specific node:
var nodes = nav.Select("./D").Cast<XPathNavigator>().ToList();
You indicated that performance is important, but you are using the recursive descent operator // which can have bad performance.
From Effective Xml Part 2: How to kill the performance of an app with XPath…:
// (descendant-or-self axis)
This is a very common pattern that very often leads to serious performance problems. The way it works is that it flattens the whole subtree (the most common usage I saw is flattening the whole xml document) and then it looks for the specified elements. Now in the .NET Framework there aren’t any specific optimizations for this patterns and using it is costly...
Instead, it's better to specify the path directly.
Pulling all of the above together, your code should look something like:
//xpNav and nsmgr set up as in the question
var csvLines = xpNav.Select("/*/DATA/R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./D").Cast<XPathNavigator>()))
.ToList();
Demo fiddle #2 here.
Notes:
If the XML in your question has been incorrectly edited and the nodes <R> and <D> are really in the dstm: namespace after all, add the g2: prefix to the node names in the XPath queries like so:
var csvLines = xpNav.Select("/*/g2:DATA/g2:R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./g2:D", nsmgr).Cast<XPathNavigator>()))
.ToList();
Demo fiddle #3 here.
As an aside, you might want to check your assumption that XPathDocument will be more performant than LINQ to XML. I am not sure this will be the case.
I was on the right path, just needed to use the right method which allows to specify the namespace as seen below:
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr);
foreach (XPathNavigator nav in xpNIter)
{
string r =
$"{nav.SelectSingleNode("./g2:D[1]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[2]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[3]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[4]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[5]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[6]", nsmgr).Value}";
Console.WriteLine(r);
}
// Start writing to a file stream;
First I'm a newbee to C# and XML parsing so please forgive me if my terminology is incorrect. I'm trying to find a node by searching for a particular attribute and value. It will work if the value is exact. I need to be able to search using a substring of the entire attribute value. Here is what I have so far...
XmlDocument doc1 = new XmlDocument();
doc1.Load("DVDCollectionExample.xml");
// Create an XmlNamespaceManager to resolve the default namespace.
XmlNamespaceManager nsmgr1 = new XmlNamespaceManager(doc1.NameTable);
nsmgr1.AddNamespace("bk1", "urn:dvd-schema");
// Select the DVD
XmlNode DVD;
XmlElement root1 = doc1.DocumentElement;
//DVD = root1.SelectSingleNode("descendant::bk1:DVD[bk1:Notes='[filepath disc=2]movies\\Godzilla\\video_ts\\video_ts.ifo[/filepath]'])", nsmgr1);
DVD = root1.SelectSingleNode("descendant::bk1:DVD[bk1:Notes=Contains(name(),'Godzilla')]", nsmgr1);
The last commented out line will work because it has the exact text of the Notes attribute.
Thanks in advance
I have some XML data (similar to the sample below) and I want to read the values in code.
Why am I forced to specify the default namespace to access each element? I would have expected the default namespace to be used for all elements.
Is there a more logical way to achieve my goal?
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<ReceiptsBatch xmlns="http://www.secretsonline.gov.uk/secrets">
<MessageHeader>
<MessageID>00000173</MessageID>
<Timestamp>2009-10-28T16:50:01</Timestamp>
<MessageCheck>BX4f+RmNCVCsT5g</MessageCheck>
</MessageHeader>
<Receipts>
<Receipt>
<Status>OK</Status>
</Receipt>
</Receipts>
</ReceiptsBatch>
Code to read xml elements I'm after:
XDocument xDoc = XDocument.Load( FileInPath );
XNamespace ns = "http://www.secretsonline.gov.uk/secrets";
XElement MessageCheck = xDoc.Element(ns+ "MessageHeader").Element(ns+"MessageCheck");
XElement MessageBody = xDoc.Element("Receipts");
As suggested by this answer, you can do this by removing all namespaces from the in-memory copy of the document. I suppose this should only be done if you know you won't have name collisions in the resulting document.
/// <summary>
/// Makes parsing easier by removing the need to specify namespaces for every element.
/// </summary>
private static void RemoveNamespaces(XDocument document)
{
var elements = document.Descendants();
elements.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
foreach (var element in elements)
{
element.Name = element.Name.LocalName;
var strippedAttributes =
from originalAttribute in element.Attributes().ToArray()
select (object)new XAttribute(originalAttribute.Name.LocalName, originalAttribute.Value);
//Note that this also strips the attributes' line number information
element.ReplaceAttributes(strippedAttributes.ToArray());
}
}
You can use XmlTextReader.Namespaces property to disable namespaces while reading XML file.
string filePath;
XmlTextReader xReader = new XmlTextReader(filePath);
xReader.Namespaces = false;
XDocument xDoc = XDocument.Load(xReader);
This is how the Linq-To-Xml works. You can't find any element, if it is not in default namespace, and the same is true about its descendants. The fastest way to get rid from namespace is to remove link to the namespace from your initial XML.
The theory is that the meaning of the document is not affected by the user's choice of namespace prefixes. So long as the data is in the namespace http://www.secretsonline.gov.uk/secrets, it doesn't matter whether the author chooses to use the prefix "s", "secrets", "_x.cafe.babe", or the "null" prefix (that is, making it the default namespace). Your application shouldn't care: it's only the URI that matters. That's why your application has to specify the URI.
Note that the element Receipts is also in namespace http://www.secretsonline.gov.uk/secrets, so the XNamespace would also be required for the access to the element:
XElement MessageBody = xDoc.Element(ns + "Receipts");
As an alternative to using namespaces note that you can use "namespace agnostic" xpath using local-name() and namespace-uri(), e.g.
/*[local-name()='SomeElement' and namespace-uri()='somexmlns']
If you omit the namespace-uri predicate:
/*[local-name()='SomeElement']
Would match ns1:SomeElement and ns2:SomeElement etc. IMO I would always prefer XNamespace where possible, and the use-cases for namespace-agnostic xpath are quite limited, e.g. for parsing of specific elements in documents with unknown schemas (e.g. within a service bus), or best-effort parsing of documents where the namespace can change (e.g. future proofing, where the xmlns changes to match a new version of the document schema)
So I'm trying to parse the following XML document with C#, using System.XML:
<root xmlns:n="http://www.w3.org/TR/html4/">
<n:node>
<n:node>
data
</n:node>
</n:node>
<n:node>
<n:node>
data
</n:node>
</n:node>
</root>
Every treatise of XPath with namespaces tells me to do the following:
XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
mgr.AddNamespace("n", "http://www.w3.org/1999/XSL/Transform");
And after I add the code above, the query
xmlDoc.SelectNodes("/root/n:node", mgr);
Runs fine, but returns nothing. The following:
xmlDoc.SelectNodes("/root/node", mgr);
returns two nodes if I modify the XML file and remove the namespaces, so it seems everything else is set up correctly. Any idea why it work doesn't with namespaces?
Thanks alot!
As stated, it's the URI of the namespace that's important, not the prefix.
Given your xml you could use the following:
mgr.AddNamespace( "someOtherPrefix", "http://www.w3.org/TR/html4/" );
var nodes = xmlDoc.SelectNodes( "/root/someOtherPrefix:node", mgr );
This will give you the data you want. Once you grasp this concept it becomes easier, especially when you get to default namespaces (no prefix in source xml), since you instantly know you can assign a prefix to each URI and strongly reference any part of the document you like.
The URI you specified in your AddNamespace method doesn't match the one in the xmlns declaration.
If you declare prefix "n" to represent the namespace "http://www.w3.org/1999/XSL/Transform", then the nodes won't match when you do your query. This is because, in your document, the prefix "n" refers to the namespace "http://www.w3.org/TR/html4/".
Try doing mgr.AddNamespace("n", "http://www.w3.org/TR/html4/"); instead.
I have a huge bunch of XML files with the following structure:
<Stuff1>
<Content>someContent</name>
<type>someType</type>
</Stuff1>
<Stuff2>
<Content>someContent</name>
<type>someType</type>
</Stuff2>
<Stuff3>
<Content>someContent</name>
<type>someType</type>
</Stuff3>
...
...
I need to change the each of the "Content" node names to StuffxContent; basically prepend the parent node name to the content node's name.
I planned to use the XMLDocument class and figure out a way, but thought I would ask if there were any better ways to do this.
(1.) The [XmlElement / XmlNode].Name property is read-only.
(2.) The XML structure used in the question is crude and could be improved.
(3.) Regardless, here is a code solution to the given question:
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(sampleXml);
XmlNodeList stuffNodeList = xmlDoc.SelectNodes("//*[starts-with(name(), 'Stuff')]");
foreach (XmlNode stuffNode in stuffNodeList)
{
// get existing 'Content' node
XmlNode contentNode = stuffNode.SelectSingleNode("Content");
// create new (renamed) Content node
XmlNode newNode = xmlDoc.CreateElement(contentNode.Name + stuffNode.Name);
// [if needed] copy existing Content children
//newNode.InnerXml = stuffNode.InnerXml;
// replace existing Content node with newly renamed Content node
stuffNode.InsertBefore(newNode, contentNode);
stuffNode.RemoveChild(contentNode);
}
//xmlDoc.Save
PS: I came here looking for a nicer way of renaming a node/element; I'm still looking.
I used this method to rename the node:
/// <summary>
/// Rename Node
/// </summary>
/// <param name="parentnode"></param>
/// <param name="oldname"></param>
/// <param name="newname"></param>
private static void RenameNode(XmlNode parentnode, string oldChildName, string newChildName)
{
var newnode = parentnode.OwnerDocument.CreateNode(XmlNodeType.Element, newChildName, "");
var oldNode = parentnode.SelectSingleNode(oldChildName);
foreach (XmlAttribute att in oldNode.Attributes)
newnode.Attributes.Append(att);
foreach (XmlNode child in oldNode.ChildNodes)
newnode.AppendChild(child);
parentnode.ReplaceChild(newnode, oldNode);
}
The easiest way I found to rename a node is:
xmlNode.InnerXmL = newNode.InnerXml.Replace("OldName>", "NewName>")
Don't include the opening < to ensure that the closing </OldName> tag is renamed as well.
Perhaps a better solution would be to iterate through each node, and write the information out to a new document. Obviously, this will depend on how you will be using the data in future, but I'd recommend the same reformatting as FlySwat suggested...
<stuff id="1">
<content/>
</stuff>
I'd also suggest that using the XDocument that was recently added would be the best way to go about creating the new document.
I'll answer the higher question: why are you trying this using XmlDocument?
I Think the best way to accomplish what you aim is a simple XSLT file
that match the "CONTENTSTUFF" node and output a "CONTENT" node...
don't see a reason to get such heavy guns...
Either way, If you still wish to do it C# Style,
Use XmlReader + XmlWriter and not XmlDocument for memory and speed purposes.
XmlDocument store the entire XML in memory, and makes it very heavy for Traversing once...
XmlDocument is good if you access the element many times (not the situation here).
I am not an expert in XML, and in my case I just needed to make all tag names in a HTML file to upper case, for further manipulation in XmlDocument with GetElementsByTagName. The reason I needed upper case was that for XmlDocument the tag names are case sensitive (since it is XML), and I could not guarantee that my HTML-file had consistent case in the tag names.
So I solved it like this: I used XDocument as an intermediate step, where you can rename elements (i.e. the tag name), and then loaded that into a XmlDocument. Here is my VB.NET-code (the C#-coding will be very similar).
Dim x As XDocument = XDocument.Load("myFile.html")
For Each element In x.Descendants()
element.Name = element.Name.LocalName.ToUpper()
Next
Dim x2 As XmlDocument = New XmlDocument()
x2.LoadXml(x.ToString())
For my purpose it worked fine, though I understand that in certain cases this might not be a solution if you are dealing with a pure XML-file.
Load it in as a string and do a replace on the whole lot..
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
sampleXml = sampleXml.Replace("Content","StuffxContent")
The XML you have provided shows that someone completely misses the point of XML.
Instead of having
<stuff1>
<content/>
</stuff1>
You should have:/
<stuff id="1">
<content/>
</stuff>
Now you would be able to traverse the document using Xpath (ie, //stuff[id='1']/content/) The names of nodes should not be used to establish identity, you use attributes for that.
To do what you asked, load the XML into an xml document, and simply iterate through the first level of child nodes renaming them.
PseudoCode:
foreach (XmlNode n in YourDoc.ChildNodes)
{
n.ChildNode[0].Name = n.Name + n.ChildNode[0].Name;
}
YourDoc.Save();
However, I'd strongly recommend you actually fix the XML so that it is useful, instead of wreck it further.