dot vs. slash in XPath expression - c#

I came across an unusual problem. I have this XML:
<T durationMs="400">
<foo durationMs="407">
<foo-child durationMs="307" />
</foo>
<bar durationMs="208">
<bar-child durationMs="108" />
</bar>
</T>
I am using XPathExtentions to get an XElement out of this XML:
var xe = XElement.Parse(s);
var foo = xe.XPathSelectElement("/T/foo")
It returns nothing. However if I use:
var foo = xe.XPathSelectElement("./foo")
It gets an elements. So what's the difference between dot and slash in this case?

/ selects from the root node.
So with /T/foo it's trying to match T->T->foo which for sure won't match
. depicts the current node in which case it would be the root node
/foo would work

. selects the current node.
/ Selects from the root node.
// Selects nodes in the document from the current node that match the selection no matter where they are.
XPath Syntax gives you a brief idea of how selections are done
In your case ./foo denotes selection from root node i.e. T

Related

XmlDocument Searching for namespace returns children

I am trying to convert nodes that have a namespace declaration over to use a prefix instead. My first stab at it was to just use xslt to transform the xml, but I started looking at doing it with the XmlDocument class and using the SelectNodes() method. The issue I am seeing is when I try to select nodes that have a namespace, it selects that node AND its children. I assume this is because it is selecting the node which contains children.
<foo xmlns="some url">
<child>child</child>
</foo>
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(xmlstring);
var query = xdoc.SelectNodes("//*[namespace-uri()='some url']");
the query variable will return <foo> and <child> nodes, so when I loop through the nodes and change it to use the prefix, I get the following result.
<prefix:foo>
<prefix:child></prefix:child>
</prefix:foo>
Is there a way to just return just the <foo> node in this case? Is it better to also use xslt to transform it?
I didnt think you could change a namespace or prefix when using XDocument and XElement, so thats why I used XmlDocument.
Update
The result id want would be the prefix only on the node where the declaration was. This is valid xml correct or does the prefix need to be on the children as well to be valid?
<prefix:foo>
<child>child</child>
</prefix:foo>
In the XDM data model used by XPath and XSLT, there is no distinction between
<foo xmlns="some url">
<child>child</child>
</foo>
and
<foo xmlns="some url">
<child xmlns="some url">child</child>
</foo>
Logically the namespace is present on both element nodes, and its omission from the child in the lexical serialization is treated as a convenient abbreviation.
So yes, if you search for things having this namespace, you will get both elements.
Now, what are your requirements? I'm not convinced you fully understand them yourself, because the desired output you have shown is not actually well-formed (the namespace prefix is not declared). In your input, the two elements are in the same namespace; in the output, you seem to want them to be in different namespaces. If you want to process them differently, then you're going to have to use something other than the namespace to discriminate between them.
Remember that in XDM, it's the name of the node that matters, not the namespace declarations or prefixes; those are just ornamental. The name of the node is the combination of its local name and its namespace URI. You've described your requirement in terms of prefixes, but it's namespaces that actually matter.

How to select element by index in XPath .net application

I have the following xml received from a web service
<GRID xmlns="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result">
<DATA>
<R>
<D>2645</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:17:36.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
<R>
<D>2649</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:33:24.0</D>
<D>114043</D>
<D>Awaiting Release</D>
<D>Awaiting Planning</D>
</R>
<R>
<D>2652</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:36:53.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
</DATA>
</GRID>
I wrote the following piece of .NET code to extract the R nodes
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr); // I can successfully get the three R elements
foreach (XPathNavigator nav in xpNIter)
{
/*
Now I want to iterate through each R element and use XPATH to select each of the six D nodes by its index position.
The order of the D nodes are a known dataset and I want to build a comma separated string by concatenating the value of each D node,
which will later be appended to a CSV file along with a pre-defined header row.
*/
/* I attempted the following XPATH */
// XPathNodeIterator xpDi = nav.Select("(//D)[1]"); -- This does not work and yields a null result
}
Now I want to iterate through each R element and use XPATH to select each of the six D nodes by its index position. The order of the D nodes are a known dataset and I want to build a comma separated string by concatenating the value of each D node, which will later be appended to a CSV file along with a pre-defined header row.
I didn't want to use anything like LINQ to XML as this is part of read-only data extraction program which needs to be as lite and as performant as possible.
What is the correct way to get the D elements by index with XPATH using the XPathNavigator ?
You have a few problems here:
xpNav.Select("//g2:R", nsmgr) does not work for the XML shown in your question.
This expression selects for nodes with local name R in the http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result namespace -- however in your actual XML none of the nodes are in this namespace. There's a namespace declaration xmlns:dstm="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result" but it's not the default namespace, so none of the nodes are actually in it, as they aren't using the dstm: prefix.
Instead, you should do xpNav.Select("//R", nsmgr) (or better yet xpNav.Select("/*/DATA/R", nsmgr)).
In your question you wrote I can successfully get the three R elements so maybe this is a typo in the question.
nav.Select("(//D)[1]"); -- This does not work and yields a null result.
I cannot reproduce this exact problem -- XPathNavigator.Select()never returns null. It will throw an exception on a malformed query, but not return null.
What I can reproduce is that this always returns the same result for every <R>, specifically the value of the first <D> element, <D>2645</D>. Demo fiddle #1 here.
The problem here is that the recursive descent operator //D selects for all nodes named R in the entire document. To select only the nodes in the current <R> element you need to restrict the scope by prefacing the XPath query with .: nav.Select("(.//D)[1]") (or better yet, nav.Select("(./D)[1]")).
Incidentally, since you expect 6 child <D> nodes of <R> it will be more performant to run one single XPath query and collect all 6 into a list, rather than running 6 queries for each specific node:
var nodes = nav.Select("./D").Cast<XPathNavigator>().ToList();
You indicated that performance is important, but you are using the recursive descent operator // which can have bad performance.
From Effective Xml Part 2: How to kill the performance of an app with XPath…:
// (descendant-or-self axis)
This is a very common pattern that very often leads to serious performance problems. The way it works is that it flattens the whole subtree (the most common usage I saw is flattening the whole xml document) and then it looks for the specified elements. Now in the .NET Framework there aren’t any specific optimizations for this patterns and using it is costly...
Instead, it's better to specify the path directly.
Pulling all of the above together, your code should look something like:
//xpNav and nsmgr set up as in the question
var csvLines = xpNav.Select("/*/DATA/R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./D").Cast<XPathNavigator>()))
.ToList();
Demo fiddle #2 here.
Notes:
If the XML in your question has been incorrectly edited and the nodes <R> and <D> are really in the dstm: namespace after all, add the g2: prefix to the node names in the XPath queries like so:
var csvLines = xpNav.Select("/*/g2:DATA/g2:R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./g2:D", nsmgr).Cast<XPathNavigator>()))
.ToList();
Demo fiddle #3 here.
As an aside, you might want to check your assumption that XPathDocument will be more performant than LINQ to XML. I am not sure this will be the case.
I was on the right path, just needed to use the right method which allows to specify the namespace as seen below:
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr);
foreach (XPathNavigator nav in xpNIter)
{
string r =
$"{nav.SelectSingleNode("./g2:D[1]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[2]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[3]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[4]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[5]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[6]", nsmgr).Value}";
Console.WriteLine(r);
}
// Start writing to a file stream;

What is the difference XElement Nodes() vs Elements()?

Documentation says:
XContainer.Nodes Method ()
Returns a collection of the child nodes of this element or document, in document order.
Remarks
Note that the content does not include attributes. In LINQ to XML, attributes are not considered to be nodes of the tree. They are name/value pairs associated with an element.
XContainer.Elements Method ()
Returns a collection of the child elements of this element or document, in document order.
So it looks like Nodes() has a limitation, but then why does it exist? Are there any possible reasons or advantages of using Nodes()?
The reason is simple: XNode is a base (abstract) class for all xml "parts", and XElement is just one such part (so XElement is subclass of XNode). Consider this code:
XDocument doc = XDocument.Parse("<root><el1 />some text<!-- comment --></root>");
foreach (var node in doc.Root.Nodes()) {
Console.WriteLine(node);
}
foreach (var element in doc.Root.Elements()) {
Console.WriteLine(element);
}
Second loop (over Elements()) will only return one item: <el />
First loop however will return also text node (some text) and comment node (<!-- comment -->), so you see the difference.
You can see what other descendants of XNode there are in documentaiton of XNode class.
It's not the case that Nodes "have a limitation". Nodes are the fundamental building block on which most other things (including Elements) are built.
The XML document is represented as a hierarchy (tree), and the nodes are used to represent the fundamental structure of the hierarchy.
If we consider the following XML document:
<root>
<element>
<child>
Text
</child>
</element>
<!-- comment -->
<element>
<child>
Text
<child>
</element>
</root>
Clearly the whole document cannot be represented as elements, since the comment and the text within the "child" elements are not elements. Instead, it's represented as a hierarchy of nodes.
In this document, there are 5 elements (the root element, two "element" elements and two "child" elements). All of these are nodes, but there are also 3 other nodes: the text within "child" elements, and the comment.
It's misleading to say that nodes have a "limitation" because they don't have attributes. Only elements have attributes, and elements are nodes! But there are other nodes (e.g. the comment) that can't have attributes. So not all types of node have attributes.
In coding terms, Node is the base class on which higher-level types such as Element are built. If you want to enumerate the elements in the document, then using XContainer.Elements() is a nice shortcut to do that - but you could also use XContainer.Nodes() and get all the nodes, including both the elements and the other stuff. (You can check the type of the node to see whether you have an element node, a text node, or whatever; if it's an element, you can up-cast it).

Difference between XPathEvaluate on XElement or XDocument?

Somewhere in a C# program, I need to get an attribute value from an xml structure. I can reach this xml structure directly as an XElement and have a simple xpath string to get the attribute. However, using XPathEvaluate, I get an empty array most of the time. (Yes, sometimes, the attribute is returned, but mostly it isn't... for the exact same XElement and xpath string...)
However, if I first convert the xml to string and reparse it as an XDocument, I do always get the attribute back. Can somebody explain this behavior ? (Am using .NET 3.5)
Code that mostly returns an empty IEnumerable:
string xpath = "/exampleRoot/exampleSection[#name='test']/#value";
XElement myXelement = RetrieveXElement();
((IEnumerable)myXElement.XPathEvaluate(xpath)).Cast<XAttribute>().FirstOrDefault().Value;
Code that does always work (I get my attribute value):
string xpath = "/exampleRoot/exampleSection[#name='test']/#value";
string myXml = RetrieveXElement().ToString();
XDocument xdoc = XDocument.Parse(myXml);
((IEnumerable)xdoc.XPathEvaluate(xpath)).Cast<XAttribute>().FirstOrDefault().Value;
With the test xml:
<exampleRoot>
<exampleSection name="test" value="2" />
<exampleSection name="test2" value="2" />
</exampleRoot>
By suggestion related to a surrounding root, I did some 'dry tests' in a test program, using the same xml structure (txtbxXml and txtbxXpath representing the xml and xpath expression described above):
// 1. XDocument Trial:
((IEnumerable)XDocument.Parse(txtbxXml.Text).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
// 2. XElement trial:
((IEnumerable)XElement.Parse(txtbxXml.Text).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
// 3. XElement originating from other root:
((IEnumerable)(new XElement("otherRoot", XElement.Parse(txtbxXml.Text)).Element("exampleRoot")).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
Result : case 1 and 3 produce the correct result, while case 2 throws a nullref exception.
If case 3 would fail and case 2 succeed, it would have made some sense to me, but now I don't get it...
The problem is that the XPath expression is starting with the children of the specified node. If you start with an XDocument, the root element is the child node. If you start with an XElement representing your exampleRoot node, then the children are the two exampleSection nodes.
If you change your XPath expression to "/exampleSection[#name='test']/#value", it will work from the element. If you change it to "//exampleSection[#name='test']/#value", it will work from both the XElement and the XDocument.

Get node parent of defined type using xpath

i will give an example of the problem i have. My XML is like this
<root>
<child Name = "child1">
<node>
<element1>Value1</element1>
<element2>Value2</element2>
</node>
</child>
<child Name = "child2">
<element1>Value1</element1>
<element2>Value2</element2>
<element3>Value3</element3>
</child>
</root>
I have xpath expression that returns all "element2" nodes. Then i want to for every node of type "element2" to find the node of type "child" that contains it. The problem is that between these two nodes there can be from 1 to n different nodes so i can't just use "..". Is there something like "//" that will look up instead of down
I have some basic knowledge of xpath and according http://www.w3schools.com/xpath/xpath_syntax.asp "..' returns current node parent the problem is that the current node is "element2" and the problem is that the xml is dynamic comes from third party library so i can have xml like this
<child Name = "child1">
<node>
<element1>Value1</element1>
</node>
</child>
or like this
<child Name = "child1">
<node1>
<node2>
<node3>
<element1>Value1</element1>
<element2>Value2</element2>
</node3>
</node2>
</node1>
</child>
There can be more then 3 nodes between "element" and child an the number of nodes vary form 1 to 20. Please give me an example of just one xpath query to get the "child" node in both cases using just ".."
Best Regards,
Iordan
The ancestor xpath axis is what you are looking for:
element2/ancestor::child
The ancestor axis returns all the elements that contain the context element, going upward.
So your current context is the element2 element? Use the parent axis:
parent::child/#Name
That will get the parent of the current element, named child, and get its attribute Name.
If you're not in the context of element2 and you want to find all child elements with element2 children, you need this one instead:
child[element2]
Use the .. operator.

Categories

Resources