I have a large XML document. Its nodes have an id attribute with values like this: "1_32434", "2_45656". With this code:
var node = myXml.XPathSelectElement(string.Format("//*[starts-with(@id,\"{0}_\"))", someValue));
I am trying to find all nodes whose id attribute starts with "someValue_", but I get an error saying that there is an invalid token.
There is a mismatch between the opening and closing brackets. Try replacing the last ')' inside the XPath with ']':
string.Format("//*[starts-with(@id,\"{0}_\")]", someValue)
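To see the corrected expression in context, here is a minimal sketch written as C# top-level statements. The sample document and its element names are made up for illustration; only the XPath pattern comes from the answer above. It uses XPathSelectElements rather than XPathSelectElement, on the assumption that all matching nodes are wanted and not just the first one.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical sample document; the element names are invented for this sketch.
var myXml = XDocument.Parse(
    "<root><item id='1_32434'/><item id='2_45656'/><item id='1_777'/><item id='11_5'/></root>");

string someValue = "1";

// Single quotes inside the XPath avoid escaping double quotes in the C# string.
string xpath = string.Format("//*[starts-with(@id,'{0}_')]", someValue);

// XPathSelectElements returns every match; XPathSelectElement returns only the first.
var ids = myXml.XPathSelectElements(xpath)
               .Select(e => (string)e.Attribute("id"))
               .ToList();

Console.WriteLine(string.Join(", ", ids)); // prints "1_32434, 1_777"
```

Note that "11_5" is not matched, because the underscore is part of the prefix being tested.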
I am not proud of this XPath, but it should give you all the nodes regardless of the starting id. If you only need one id at a time, you should just add the closing bracket to your current XPath.
"//*[number(substring-before(@id,'_'))<10 and number(substring-after(@id,'_'))]"
Some example XML would be greatly appreciated.
Related
I can see a way of searching for an element within XML by just going:
if(doc.SelectSingleNode("//mynode")==null)
But what I'm more interested in is finding an element whose name matches only in part. Something like:
doc.SelectSingleNode ...that contains "table" in it.
So if I had a node called "AlinasTable", I want it to find that. Why it matters is because my node can inconsistently contain anything that comes before "table", like "JohnsTable" - in which case I'd want that to be returned. So something more generic.
Cheers.
You can use the contains function, as in the following XPath expression:
doc.SelectSingleNode("//*[contains(name(), 'Table')]")
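As a rough illustration of the contains() approach, the sketch below runs it against a made-up document (the element names other than AlinasTable and JohnsTable are assumptions). Since contains() is case-sensitive, it also shows one common XPath 1.0 workaround, translate(), for matching "table" regardless of case.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// Hypothetical sample document for this sketch.
doc.LoadXml("<root><AlinasTable/><JohnsTable/><Other/></root>");

// contains() is case-sensitive: 'Table' matches AlinasTable, 'table' would not.
var first = doc.SelectSingleNode("//*[contains(name(), 'Table')]");
var all = doc.SelectNodes("//*[contains(name(), 'Table')]");

// Case-insensitive variant: lower-case the element name with translate() first.
var anyCase = doc.SelectNodes(
    "//*[contains(translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', " +
    "'abcdefghijklmnopqrstuvwxyz'), 'table')]");

Console.WriteLine(first.Name);    // prints "AlinasTable"
Console.WriteLine(all.Count);     // prints 2
Console.WriteLine(anyCase.Count); // prints 2
```

SelectSingleNode returns null when nothing matches, so the null check from the question still applies.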
I have been struggling with this problem for the past couple of days. Say I want to get all the text() nodes from an HTML document, but I only want to retrieve the XPath of the element that contains the text data. Example:
foreach (var textNode in node.SelectNodes(".//text()"))
//do stuff here
However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:
/html[1]/body[1]/div[1]/a[1]/#text
Yet I only want the containing node of the text, for example:
/html[1]/body[1]/div[1]/a[1]
Could anyone point me toward a better XPath solution that retrieves all nodes that contain text, but only returns the XPath up to the containing node?
Instead of:
.//text()
use:
.//*[normalize-space(text())]
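This selects all element descendants of the context (current) node whose first text-node child is not whitespace-only. In XPath 1.0, normalize-space(text()) only looks at the first text child; to match elements that have at least one non-whitespace-only text node child anywhere among their children, use .//*[text()[normalize-space()]] instead.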
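The idea of selecting the containing elements instead of the text nodes can be sketched with the built-in System.Xml classes. This is an illustration under assumptions: the sample markup is invented, and XmlDocument stands in for HtmlAgilityPack so the snippet has no external dependency. With HtmlAgilityPack, the same expression would be passed to node.SelectNodes and the XPath property read from each result.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
// Hypothetical, well-formed sample markup for this sketch.
doc.LoadXml("<html><body><div>hello <a>link</a></div><p>text</p><span/></body></html>");

// Select elements that have at least one non-whitespace text-node child,
// rather than selecting the text nodes themselves.
var names = new List<string>();
foreach (XmlNode element in doc.DocumentElement.SelectNodes(".//*[text()[normalize-space()]]"))
{
    names.Add(element.Name);
}

Console.WriteLine(string.Join(", ", names)); // prints "div, a, p"
```

Because each result is already the containing element, there is no trailing #text segment to strip from its path.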
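Why don't you just strip the last segment from the path?
string[] elements = getXPath(textNode).Split(new char[1] { '/' });
// Keep every segment except the last one (the #text node).
return String.Join("/", elements, 0, elements.Length - 1);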
I would like to search an HTML file for a certain string and then extract the tags. Given:
<div_outer><div_inner>Happy birthday<div><div>
I would like to search the HTML for "Happy birthday" then have a function return some sort of tag structure: this is the innermost tag, this is the tag outside that one, etc. So, <div_inner></div> then <div_outer></div>.
Any ideas? I am thinking HTMLAgilityPack but I haven't been able to figure out how to do it.
Thanks as always, guys.
The HAP is a good place indeed for this.
You can use the OuterHtml and Parent properties of a Node to get the enclosing elements and markup.
You could use XPath for this. I use the expression //*[text()='Happy birthday'][1]/ancestor-or-self::*, which finds the first (for simplicity) node whose text content is Happy birthday, and then returns all the ancestors (parent, grandparent, etc.) of this node along with the node itself:
var doc = new HtmlDocument();
doc.LoadHtml("<div_outer><div_inner>Happy birthday<div><div>");
var ancestors = doc.DocumentNode
.SelectNodes("//*[text()='Happy birthday'][1]/ancestor-or-self::*")
.Reverse()
.ToList();
It seems that the nodes are returned in document order, so I used the Enumerable.Reverse method to reverse them.
This will return 2 nodes: div_inner and div_outer.
I am trying to select all nodes with an attribute equal to something, but I get the error in the title.
My XPath string looks like //@[id=****]. Does anyone know what's wrong?
Your XPath expression probably should be:
//*[@id='something']
Which means match all elements whose id attributes are equal to something, anywhere in the document.
EDIT: If you want the id attribute nodes themselves and not their parent elements, you can use:
//*[@id='something']/@id
Or even better, as @Dimitre Novatchev suggested:
//@id[. = 'something']
I am trying to select all nodes with attribute equal to something, I got the error in title.
My Xpath string looks like //@[id=****], anyone know what's wrong?
There are several issues with this expression:
1. //@[some-condition]: A predicate can only be applied to selected nodes, but //@ doesn't select any node. @ is an abbreviation for attribute::, and on its own it is an unfinished node test. The node type or node name is missing.
What would be correct is: //@*[some-condition] or //@attrName[some-condition]
2. id=**** is syntactically invalid, unless **** is a valid XPath expression itself. My guess is that you want to get all attributes with a value equal to some known, literal value. In any such case the syntax to use is id='someLiteral'. Do note the single quotes (they can also be double quotes) surrounding the literal value.
Solution:
//*[@id='something']
This selects all elements in the XML document that have attribute id with value 'something'.
//@id[. = 'something']
This selects all attributes named id in the XML document, whose value is 'something'.
//@*[. = 'something']
This selects all attributes in the XML document (regardless of their name), whose value is 'something'.
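The difference between the three expressions is easier to see on a small document. The sketch below uses invented sample XML (the element names and the ref attribute are assumptions) and counts what each expression selects.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// Hypothetical sample: two id attributes and one ref attribute hold 'something'.
doc.LoadXml("<root><a id='something'/><b id='other' ref='something'/><c id='something'/></root>");

// Elements whose id attribute equals 'something' (here: a and c).
int elements = doc.SelectNodes("//*[@id='something']").Count;

// The id attribute nodes themselves, not their parent elements.
int idAttributes = doc.SelectNodes("//@id[. = 'something']").Count;

// Attributes of any name with that value (adds the ref attribute on b).
int anyAttributes = doc.SelectNodes("//@*[. = 'something']").Count;

Console.WriteLine(elements);      // prints 2
Console.WriteLine(idAttributes);  // prints 2
Console.WriteLine(anyAttributes); // prints 3
```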
I have an XML document that I load in and try to search with XPath. The root node in this file is <t:Transmission xmlns:t='urn:InboundShipment'> and the file end is properly closed with </t:Transmission>.
My problem is that I cannot walk the tree without using a descendant axis. In other words, I can do: SelectSingleNode("//TransactionHeader[SHIPPERSTATE='CA']") and get a node in return. But I cannot do what should be the equivalent: SelectSingleNode("/Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']")
If I remove the t: I can do an XPath search on /Transmission and get the whole file. With the t: in there I just get null. Or if I try SelectSingleNode("t:Transmission") I get an error with my XPath statement.
I generally do not need to query the root element, so I should be able to make do with just using the descendant axis for my searches. But the XML looks valid to me and so I'd like to know how to address this. Plus I don't want to ask the client to remove "t:" just because I don't know how to deal with it.
The "t:" is a namespace prefix, which is bound to the namespace 'urn:InboundShipment'. In order to handle it properly, you have to tell C# what the prefix is bound to. This page should explain how to use System.Xml.XmlNamespaceManager to handle the namespace.
Edit: See this answer, as well.
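A minimal sketch of the XmlNamespaceManager approach follows. The document body is an assumption reconstructed from the question: only the root element carries the t: prefix and the child elements are in no namespace, which is consistent with //TransactionHeader working without a prefix.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// Assumed structure based on the paths in the question; the real file may differ.
doc.LoadXml(
    "<t:Transmission xmlns:t='urn:InboundShipment'>" +
    "<TransmissionBody><Transaction><TransactionHeader>" +
    "<SHIPPERSTATE>CA</SHIPPERSTATE>" +
    "</TransactionHeader></Transaction></TransmissionBody>" +
    "</t:Transmission>");

// Bind the prefix used in the XPath to the namespace URI.
// The prefix only has to match the XPath, not the one used in the file.
var ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("t", "urn:InboundShipment");

// Only the root is in the namespace, so only the first step needs the prefix.
var header = doc.SelectSingleNode(
    "/t:Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']", ns);

// Without a prefix, the root step looks for an element in no namespace and finds nothing.
var unprefixed = doc.SelectSingleNode("/Transmission");

Console.WriteLine(header != null);     // prints "True"
Console.WriteLine(unprefixed == null); // prints "True"
```

If the child elements in the real file inherit a default namespace, each step in the path would need a prefix bound to that namespace as well.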