Finding a string inside an XmlDocument - c#

I need to find an inner text of an element inside an XmlDocument and return it's Xpath.
for example, searching for "ThisText" inside :
<xml>
<xml2>ThisText</xml2>
</xml>
should return the Xpath of xml2
what's the most efficient way of doing this in c#?

What do you think the "xpath" of an element is? An xpath is a querying language in order to find a node/nodes, not to describe where a node is.
You can use an xpath to find the element in question. e.g.
xmlDocument.SelectNodes("//*[contains(text(), 'ThisText')]");
Then you can loop through the returned nodes and look at their name / parent, etc.

Related

XmlDocument SelectSingleNode

On Stack Overflow there is a document explaining the use of XmlDocument and how to select a node.
C# XmlDocument SelectSingleNode without attribute
The code presented is the code I am using as follows.
XmlDocument doc = new XmlDocument();
doc.Load("C:\\FileXml.xml")
string Version = doc.DocumentElement.SelectSingleNode("/Version").InnerText;
Console.Write(Version); //I want to see 3
The Xml file is shown below "in its entirety".
<CharacterObject xmlns="http://www.w3.org/2005/Atom">
<Version>3</Version>
<Path>C:\\FilePath\FileName.xml</Path>
</CharacterObject>
The error that I am receiving is that SelectSingleNode above returns null. It returned null when I searched for CharacterObject as well. No matter what XML node I search for the function SelectSingleNode returns null. This means I must be using SingleSelectNode incorrectly just not sure how.
I would like SelectSingleNode to return the node so that InnerText will return the Version information in the XML Node. I'm just having a problem in usage of reading the information from the nodes.
According to documentation on XmlDocument.DocumentElement - it is a root xml element. So in your case it is CharacterObject already.
When you call .SelectSingleNode('/CharacterObject') for it - you are requesting an CharacterObject element inside the root CharacterObject - which is not there at all.
You can simply use XmlDocument.DocumentElement.InnerText to achieve the result you want.
This particular problem has a solution. This might be due to the namespace attribute in the XML root node itself. Eliminating this attribute solves my issue.

How to access an XML element in a single go?

I have an XML string like below:
<root>
<Test1>
<Result time="2">ProperEnding</Result>
</Test1>
<Test2></Test2>
I have to operate on these elements. Most of the time the elements are unique within their parent element. I am using XDocument. I can remember that there is a way to access an element like this.
XNode resultTest1 = GetNodes("/root//Test1//result")
But I forgot it. It is possible to access the same using linq:
doc.root.Elements.etc.etc.
But I want it using a single string as shown above. Can anybody say how to make it?
Descendants() will skip any number level of intermediate nodes, e.g. this will skip over root and Test1:
doc.Decendants("Result")
Also note that you can use XPath with Linq2Xml as well, e.g. XPathSelectElements
doc.XPathSelectElements("/root/Test1/Result");
You can skip intermediate levels of the hierarchy with // (or use // at the start of the xpath string to skip the root)
"/root//Result"
One caveat - Xml is case sensitive , so Result and result are not the same element.
The string you're referring to ("/root//Test1//result") is an XPath expression.
You can use it with LINQ to XML classes (like XDocument) using XPathEvaluate, XPathSelectElement, and XPathSelectElements extension methods.
You can find more info about these methods on MSDN: http://msdn.microsoft.com/en-us/library/vstudio/system.xml.xpath.extensions_methods(v=vs.90).aspx
To make them work, you need using System.Xml.XPath at the top of your file and System.Xml.Linq.dll assembly referenced (which is probably already there).
You can try to load your xml using XDocument:
// loads xml file with root element
XDocument xml = XDocument.Load("filename.xml");
Now you can append LINQ statements to your xml variable like this:
var retrieveSomeSpecificDataLikeListOfElementsAsAnonymousObjects = xml.Descendants("parentNodeName").Select(node => new { SomeSpecialValueYouWant = node.Element("elementNameUnderParentNode").Value }).ToList();
You can mix and do whatever you want - above is just an example.
Is this what you looking?
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml("YourXML");
XmlNodeList xmlNodes = xmlDocument.SelectNodes("/root/Test1/result");

Difference between XPathEvaluate on XElement or XDocument?

Somewhere in a C# program, I need to get an attribute value from an xml structure. I can reach this xml structure directly as an XElement and have a simple xpath string to get the attribute. However, using XPathEvaluate, I get an empty array most of the time. (Yes, sometimes, the attribute is returned, but mostly it isn't... for the exact same XElement and xpath string...)
However, if I first convert the xml to string and reparse it as an XDocument, I do always get the attribute back. Can somebody explain this behavior ? (Am using .NET 3.5)
Code that mostly returns an empty IEnumerable:
string xpath = "/exampleRoot/exampleSection[#name='test']/#value";
XElement myXelement = RetrieveXElement();
((IEnumerable)myXElement.XPathEvaluate(xpath)).Cast<XAttribute>().FirstOrDefault().Value;
Code that does always work (I get my attribute value):
string xpath = "/exampleRoot/exampleSection[#name='test']/#value";
string myXml = RetrieveXElement().ToString();
XDocument xdoc = XDocument.Parse(myXml);
((IEnumerable)xdoc.XPathEvaluate(xpath)).Cast<XAttribute>().FirstOrDefault().Value;
With the test xml:
<exampleRoot>
<exampleSection name="test" value="2" />
<exampleSection name="test2" value="2" />
</exampleRoot>
By suggestion related to a surrounding root, I did some 'dry tests' in a test program, using the same xml structure (txtbxXml and txtbxXpath representing the xml and xpath expression described above):
// 1. XDocument Trial:
((IEnumerable)XDocument.Parse(txtbxXml.Text).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
// 2. XElement trial:
((IEnumerable)XElement.Parse(txtbxXml.Text).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
// 3. XElement originating from other root:
((IEnumerable)(new XElement("otherRoot", XElement.Parse(txtbxXml.Text)).Element("exampleRoot")).XPathEvaluate(txtbxXPath.Text)).Cast<XAttribute>().FirstOrDefault().Value.ToString();
Result : case 1 and 3 produce the correct result, while case 2 throws a nullref exception.
If case 3 would fail and case 2 succeed, it would have made some sense to me, but now I don't get it...
The problem is that the XPath expression is starting with the children of the specified node. If you start with an XDocument, the root element is the child node. If you start with an XElement representing your exampleRoot node, then the children are the two exampleSection nodes.
If you change your XPath expression to "/exampleSection[#name='test']/#value", it will work from the element. If you change it to "//exampleSection[#name='test']/#value", it will work from both the XElement and the XDocument.

Get tags around text in HTML document using C#

I would like to search an HTML file for a certain string and then extract the tags. Given:
<div_outer><div_inner>Happy birthday<div><div>
I would like to search the HTML for "Happy birthday" then have a function return some sort of tag structure: this is the innermost tag, this is the tag outside that one, etc. So, <div_inner></div> then <div_outer></div>.
Any ideas? I am thinking HTMLAgilityPack but I haven't been able to figure out how to do it.
Thanks as always, guys.
The HAP is a good place indeed for this.
You can use the OuterHtml and Parent properties of a Node to get the enclosing elements and markup.
You could use xpath for this. I use //*[text()='Happy birthday'][1]/ancestor-or-self::* expression which finds a first (for simplicity) node which text content is Happy birthday, and then returns all the ancestors (parent, grandparent, etc.) of this node and the node itself:
var doc = new HtmlDocument();
doc.LoadHtml("<div_outer><div_inner>Happy birthday<div><div>");
var ancestors = doc.DocumentNode
.SelectNodes("//*[text()='Happy birthday'][1]/ancestor-or-self::*")
.Reverse()
.ToList();
It seems that the order of the nodes returned is the order the nodes found in the document, so I used Enumerable.Reverse method to reverse it.
This will return 2 nodes: div_inner and div_outer.

How to find a XML tag with a specific attribute (in C#)

I need to get a list of tags that contain a specific attribute. I am using DITA xml and I need to find out all tags that has a href attribute.
The problem here is that the attribute may be inside any tag so XPath will not work in this case. For example, an image tag may contain a href, a topicref tag may contain a href, and so on.
So I need to get a XmlNodeList (as returned by the getElementByTagName method). Ideally I need a method getElementByAttributeName that should return XmlNodeList.
I might have misunderstood your problem here, but I think you could possibly use an XPath expression.
var nodes = doc.SelectNodes("//*[#href='pic1.jpg']");
The above should return all elements with href='pic1.jpg', where doc is the XmlDocument
If you're on C#, then the following approach might work for you:
XDocument document = XDocument.Load(xmlReader);
XAttribute xa = new XAttribute("href", "pic1.jpg");
var attrList = document.Descendants().Where (d => d.Attributes().Contains(xa));

Categories

Resources