How to find the root-to-node paths in XML with C#?

I want to load all of the elements into memory and find the list of root-to-node paths for them. For example, for this XML:
<SigmodRecord>
<issue>
<volume>11</volume>
<number>1</number>
<articles>
<article>
<title>Annotated Bibliography on Data Design.</title>
<initPage>45</initPage>
<endPage>77</endPage>
<authors>
<author position="00">Anthony I. Wasserman</author>
<author position="01">Karen Botnich</author>
</authors>
</article>
<article>
<title>Architecture of Future Data Base Systems.</title>
<initPage>30</initPage>
<endPage>44</endPage>
<authors>
<author position="00">Lawrence A. Rowe</author>
<author position="01">Michael Stonebraker</author>
</authors>
</article>
<article>
<title>Database Directions III Workshop Review.</title>
<initPage>8</initPage>
<endPage>8</endPage>
<authors>
<author position="00">Tom Cook</author>
</authors>
</article>
<article>
<title>Errors in 'Process Synchronization in Database Systems'.</title>
<initPage>9</initPage>
<endPage>29</endPage>
<authors>
<author position="00">Philip A. Bernstein</author>
<author position="01">Marco A. Casanova</author>
<author position="02">Nathan Goodman</author>
</authors>
</article>
</articles>
</issue>
</SigmodRecord>
The answer must be something like this:
1 /SigmodRecord
2 /SigmodRecord/issue
3 /SigmodRecord/issue/volume
4 /SigmodRecord/issue/number
5 /SigmodRecord/issue/articles
6 /SigmodRecord/issue/articles/article
7 /SigmodRecord/issue/articles/article/title
8 /SigmodRecord/issue/articles/article/authors
9 /SigmodRecord/issue/articles/article/initPage
10 /SigmodRecord/issue/articles/article/endPage
11 /SigmodRecord/issue/articles/article/authors/author

You can use LINQ to XML (XLinq) to query the XML document and fetch the root node and its descendants.
XDocument xDoc = XDocument.Load("myXml.xml");
XElement element = null;
if(xDoc!=null)
{
element=xDoc.Root;
}
var descendants=element.DescendantsAndSelf(); //Returns the element and all of its descendants
var filtered=element.DescendantsAndSelf("nodeName");//Returns only the elements with the specified name
Hope it helps!!!

One possible way, by recursively extracting path for each XML element * :
public static List<string> GetXpaths(XDocument doc)
{
var xpathList = new List<string>();
var xpath = "";
foreach(var child in doc.Elements())
{
GetXPaths(child, ref xpathList, xpath);
}
return xpathList;
}
public static void GetXPaths(XElement node, ref List<string> xpathList, string xpath)
{
xpath += "/" + node.Name.LocalName;
if (!xpathList.Contains(xpath))
xpathList.Add(xpath);
foreach(XElement child in node.Elements())
{
GetXPaths(child, ref xpathList, xpath);
}
}
Usage example in console application :
var doc = XDocument.Load("path_to_your_file.xml");
var result = GetXpaths(doc);
foreach(var path in result)
Console.WriteLine(path);
.NET Fiddle demo
*) Adapted from my old answer to another question. Note that this only worked for simple XML without namespace.
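A non-recursive variant of the same idea can be sketched with AncestorsAndSelf(). The class and method names here (XmlPaths, PathOf, DistinctPaths) are made up for illustration, and like the code above it uses local names only, so it ignores namespaces:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlPaths
{
    // Builds the root-to-node path of one element by walking its ancestors.
    public static string PathOf(XElement element)
    {
        // AncestorsAndSelf() yields the element first and the root last, so reverse it.
        var names = element.AncestorsAndSelf().Reverse().Select(e => e.Name.LocalName);
        return "/" + string.Join("/", names);
    }

    // Returns each distinct path once, in the order the elements appear in the document.
    public static List<string> DistinctPaths(XDocument doc)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (var element in doc.Descendants())
        {
            var path = PathOf(element);
            if (seen.Add(path)) // Add returns false when the path was already recorded
                result.Add(path);
        }
        return result;
    }
}
```

Calling XmlPaths.DistinctPaths(XDocument.Load("path_to_your_file.xml")) should give the same set of paths as the recursive version; the HashSet only replaces the repeated List.Contains lookups.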

Related

XpathNavigator couldn't get the inner text of xml

This is my xml
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography" publicationdate="1981-03-22" ISBN="1-861003-11-0">
<author>
<title>The Autobiography of Benjamin Franklin</title>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel" publicationdate="1967-11-17" ISBN="0-201-63361-2">
<author>
<title>The Confidence Man</title>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
here is my code
XPathNavigator nav;
XPathNodeIterator nodesList = nav.Select("//bookstore//book");
foreach (XPathNavigator node in nodesList)
{
var price = node.Select("price");
string currentPrice = price.Current.Value;
var title = node.Select("author//title");
string text = title.Current.Value;
}
I am getting the same output for both:
The Autobiography of Benjamin FranklinBenjaminFranklin8.99
I will have a condition like if (price > 10) then get the title. How do I fix this?
The method XPathNavigator.Select() that you are calling here:
var price = node.Select("price");
Returns an XPathNodeIterator, so as shown in the docs you need to actually iterate through it, through either the old (c# 1.0!) style:
var price = node.Select("price");
while (price.MoveNext())
{
string currentPriceValue = price.Current.Value;
Console.WriteLine(currentPriceValue); // Prints 8.99
}
Or the newer foreach style, which does the same thing:
var price = node.Select("price");
foreach (XPathNavigator currentPrice in price)
{
string currentPriceValue = currentPrice.Value;
Console.WriteLine(currentPriceValue); // 8.99
}
In both examples above, the enumerator's current value is used after the first call to MoveNext(). In your code, you are using IEnumerator.Current before the first call to MoveNext(). And as explained in the docs:
Initially, the enumerator is positioned before the first element in the collection. You must call the MoveNext method to advance the enumerator to the first element of the collection before reading the value of Current; otherwise, Current is undefined.
The odd behavior you are seeing is as a result of using Current when the value is undefined. (I would sort of expect an exception to be thrown in such a situation, but all these classes are very old -- dating from c# 1.1 I believe -- and coding standards were less stringent then.)
If you are sure there will be only one <price> node and don't want to have to iterate through multiple returned nodes, you could use LINQ syntax to pick out that single node:
var currentPriceValue = node.Select("price").Cast<XPathNavigator>().Select(p => p.Value).SingleOrDefault();
Console.WriteLine(currentPriceValue); // 8.99
Or switch to SelectSingleNode():
var currentPrice = node.SelectSingleNode("price");
var currentPriceValue = (currentPrice == null ? null : currentPrice.Value);
Console.WriteLine(currentPriceValue); // 8.99
Finally, consider switching to LINQ to XML for loading and querying arbitrary XML. It's just much simpler than the old XmlDocument API.
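As a rough sketch of what the LINQ to XML version could look like for the "price > 10" condition from the question (BookQueries and TitlesPricedAbove are names invented for this example, and the XML is assumed to be available as a string):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookQueries
{
    // Returns the titles of all <book> elements whose <price> is greater than minPrice.
    public static List<string> TitlesPricedAbove(string xml, decimal minPrice)
    {
        var doc = XDocument.Parse(xml);
        return doc.Descendants("book")
            // The (decimal?) cast yields null when <price> is missing, and null > x is false.
            .Where(book => (decimal?)book.Element("price") > minPrice)
            // <title> sits under <author> in the sample document.
            .Select(book => (string)book.Element("author")?.Element("title"))
            .Where(title => title != null)
            .ToList();
    }
}
```

The explicit casts on XElement parse the text in a culture-independent way, so "11.99" is read the same regardless of the machine's regional settings.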
You can use condition directly in an xpath expression.
XPathNodeIterator titleNodes = nav.Select("/bookstore/book[price>10]/author/title");
foreach (XPathNavigator titleNode in titleNodes)
{
var title = titleNode.Value;
Console.WriteLine(title);
}

How to go back to the root element in XML using C#?

I am new to XML Programming using C# and have been trying to grasp the concepts. I have a 2books.xml file which looks like
<!--sample XML fragment-->
<bookstore>
<book genre='novel' ISBN='10-861003-324'>
<title>The Handmaid's Tale</title>
<price>19.95</price>
</book>
<book genre='novel' ISBN='1-861001-57-5'>
<title>Pride And Prejudice</title>
<price>24.95</price>
</book>
<book genre='novel' ISBN='1-861991-57-9'>
<title>The Honor</title>
<price>20.12</price>
</book>
</bookstore>
Now using XmlReader when I try this following section of code
using (XmlReader xReader = XmlReader.Create(@"C:\Users\Chiranjib\Desktop\2books.xml"))
{
xReader.MoveToContent();
Console.WriteLine("-----------> Now "+xReader.Name);
Console.WriteLine("------Inner XML -----> "+xReader.ReadInnerXml()); //Positions the reader to the next root element type after the call
Console.WriteLine("------OuterXML XML -----> " + xReader.ReadOuterXml()); //Positions the reader to the next root element type after the call -- for a leaf node it reacts the same way as Read()
while (xReader.Read())
{
Console.WriteLine("In Loop");
if ((xReader.NodeType == XmlNodeType.Element) && (xReader.Name == "book"))
{
xReader.ReadToFollowing("price");
Console.WriteLine("---------- In Loop -------- Price "+xReader.GetAttribute("price"));
}
}
}
Console.ReadKey();
}
Obviously, xReader.ReadInnerXml() leaves the reader at the end of the file after the call, and as a result xReader.ReadOuterXml() prints nothing.
Now I want xReader.ReadOuterXml() to be called successfully. How can I get back to my previous root node?
I tried xReader.MoveToElement(), but I guess it does not do that.
You can't really do that, as it's not what XmlReader was designed for. What you probably want is a much higher level API like LINQ to XML.
For example, you could loop through your books like this:
var doc = XDocument.Parse(xml);
foreach (var book in doc.Descendants("book"))
{
Console.WriteLine("Title: {0}", (string) book.Element("title"));
Console.WriteLine("ISBN: {0}", (string) book.Attribute("ISBN"));
Console.WriteLine("Price: {0}", (decimal) book.Element("price"));
Console.WriteLine("---");
}
See a working demo here: https://dotnetfiddle.net/m99eCl
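If the data still has to come from an XmlReader, one possible workaround is to materialize the root element once and then ask the in-memory copy for both its inner and outer XML as often as needed. This is only a sketch; ReaderSnapshot and its methods are hypothetical helper names, not part of the framework:

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ReaderSnapshot
{
    // Consumes the root element from the forward-only reader and returns it as an XElement.
    public static XElement LoadRoot(XmlReader reader)
    {
        reader.MoveToContent(); // skip the declaration, comments and whitespace
        return (XElement)XNode.ReadFrom(reader);
    }

    // The element including its own start and end tags.
    public static string OuterXml(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // Only the child nodes of the element, concatenated.
    public static string InnerXml(XElement element)
    {
        return string.Concat(element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

The reader is still read only once; the difference is that the XElement keeps the content around, so the order in which the inner and outer XML are requested no longer matters.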

Removing CDATA tag from XmlNode

I have an XmlNode which represents the following xml for example:
XmlNode xml.innerText =
<book>
<name><![CDATA[Harry Potter]]></name>
<author><![CDATA[J.K. Rolling]]></author>
</book>
I want to change this node so that it'll contain the following:
XmlNode xml.innerText =
<book>
<name>Harry Potter</name>
<author>J.K. Rolling</author>
</book>
Any ideas?Thanks!
Well, if it's exactly how you put it, then it's easy:
xml.InnerXml = xml.InnerXml.Replace("<![CDATA[","").Replace("]]>","");
xmlDoc.Save(fileName);// xmlDoc is your XML document; fileName (assumed here) is the path to write to
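I suggest you read your entire XML and rewrite it. You can read the values without the CDATA wrapper like this:
foreach (var child in doc.Root.Elements())
{
string name = child.Name.LocalName; // Name is an XName, so take its local part as a string
string value = child.Value; // Value returns the text content with the CDATA wrapper already removed
}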
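For the rewriting part, a minimal LINQ to XML sketch could swap every CDATA node for a plain text node in place. It assumes the document is loaded as an XDocument (or XElement) rather than an XmlNode, and CDataStripper is a name chosen just for this example:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CDataStripper
{
    // Replaces every CDATA section below the container with an ordinary text node.
    public static void ReplaceCDataWithText(XContainer container)
    {
        // Materialize the list first: the tree must not change while it is being enumerated.
        var cdataNodes = container.DescendantNodes().OfType<XCData>().ToList();
        foreach (var cdata in cdataNodes)
        {
            // XText escapes characters such as < and & when the document is saved.
            cdata.ReplaceWith(new XText(cdata.Value));
        }
    }
}
```

Unlike the string replacement, this keeps the document well-formed even when the CDATA content contains markup characters.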

Select Parent XML(Entire Hierarchy) Elements based on Child element values LINQ

I have the following XML and query it by Id. How do I get the parent hierarchy?
<Child>
<Child1 Id="1">
<Child2 Id="2">
<Child3 Id="3">
<Child4 Id="4">
<Child5 Id="5"/>
<Child6 Id="6"/>
</Child4>
</Child3>
</Child2>
</Child1>
</Child>
If I query for Id = 4 and find that element's parent elements using LINQ, how can I get the following output, with the hierarchy preserved?
<Child>
<Child1 Id="1">
<Child2 Id="2">
<Child3 Id="3">
<Child4 Id="4"/>
</Child3>
</Child2>
</Child1>
</Child>
Thanks In Advance.
Assume you want just one node parent tree:
string xml = @"<Child>
<Child1 Id='1'>
<Child2 Id='2'>
<Child3 Id='3'>
<Child4 Id='4'>
<Child5 Id='5'/>
<Child6 Id='6'/>
</Child4>
</Child3>
</Child2>
</Child1>
</Child>";
TextReader tr = new StringReader(xml);
XDocument doc = XDocument.Load(tr);
IEnumerable<XElement> myList =
from el in doc.Descendants()
where (string)el.Attribute("Id") == "4" // here whatever you want
select el;
// select your hero element in some way
XElement hero = myList.FirstOrDefault();
foreach (XElement ancestor in hero.Ancestors())
{
Console.WriteLine(ancestor.Name); // rebuild your tree in a separate document, I print ;)
}
To do this for every element of your tree, run the query without the where clause and run the foreach over Ancestors() for each element it returns.
Based on the sample XML provided, you could walk up the tree to find the parent node once you've found the node in question:
string xml =
@"<Child>
<Child1 Id='1'>
<Child2 Id='2'>
<Child3 Id='3'>
<Child4 Id='4'>
<Child5 Id='5'/>
<Child6 Id='6'/>
</Child4>
</Child3>
</Child2>
</Child1>
</Child>";
var doc = XDocument.Parse( xml );
// assumes there will always be an Id attribute for each node
// and there will be an Id with a value of 4
// otherwise an exception will be thrown.
XElement el = doc.Root.Descendants().First( x => x.Attribute( "Id" ).Value == "4" );
// discard all child nodes
el.RemoveNodes();
// walk up the tree to find the parent; when the
// parent is null, then the current node is the
// top most parent.
while( true )
{
if( el.Parent == null )
{
break;
}
el = el.Parent;
}
In Linq to XML there is a method called AncestorsAndSelf on XElement that
Returns a collection of elements that contain this element, and the
ancestors of this element.
But it will not transform your XML tree the way you want it.
What you want is:
For a given element, find the parent
Remove all elements from parent but the given element
Remove all elements from the given element
Something like this in Linq (no error handling):
XDocument doc = XDocument.Parse("<xml content>");
//finding element having 4 as ID for example
XElement el = doc.Descendants().First(e => (string)e.Attribute("Id") == "4"); // the cast is null-safe for elements without an Id, such as the root
el.RemoveNodes();
XElement parent = el.Parent;
parent.RemoveNodes();
parent.Add(el);
[Edit]
doc.ToString() must give you what you want as a string.
[Edit]
Using RemoveNodes instead of RemoveAll, the last one also removes attributes.
Removing nodes from the chosen element too.
I found the following way
XElement elementNode = element.Descendants()
.FirstOrDefault(id => (string)id.Attribute("Id") == "4"); // attribute names are case-sensitive: "Id", not "id"
elementNode.RemoveNodes();
while (elementNode.Parent != null)
{
XElement lastNode = new XElement(elementNode);
elementNode = elementNode.Parent;
elementNode.RemoveNodes();
elementNode.DescendantsAndSelf().Last().AddFirst(lastNode);
}
return or Print elementNode.
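The same result can also be built as a fresh tree, which leaves the source document untouched. The following is a sketch under that assumption; AncestorChain and Build are illustrative names only:

```csharp
using System.Xml.Linq;

public static class AncestorChain
{
    // Returns a new tree that contains only the target element and its chain of ancestors.
    public static XElement Build(XElement target)
    {
        // Copy the target with its attributes but without any child nodes.
        var current = new XElement(target.Name, target.Attributes());

        // Ancestors() starts at the direct parent and ends at the root,
        // so each step wraps the tree built so far in a copy of the next ancestor.
        foreach (var ancestor in target.Ancestors())
        {
            current = new XElement(ancestor.Name, ancestor.Attributes(), current);
        }
        return current;
    }
}
```

Attributes passed to the XElement constructor are copied when they already belong to another element, so the original nodes keep theirs.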

Get XML content from XmlNodeList

I have a question that may seem very simple, but it's giving me a headache. I have this XML file that has multiple entries, like:
<books>
<book>
<id>1</id>
<firstCover>
<author name="**" age="**" />
<title name="zz" font="yyy" size="uuu"/>
</firstCover>
<lastCover>
</lastCover>
</book>
<book>
<id>2</id>
<firstCover>
<author name="**" age="**" />
<title name="zz" font="yyy" size="uuu"/>
</firstCover>
<lastCover>
</lastCover>
</book>
</books>
Now, in order to get the XML content for first cover of book with id=1, I do this:
XmlNodeList b = root.SelectNodes("/books/book[contains(id,1)]/firstCover");
Then I would really need to take the whole content of what's inside the firstCover for that book :
<author name="**" age="**" />
<title name="zz" font="yyy" size="uuu"/>
and insert it into an XmlElement. This is where I'm stuck. I know I can do it with a foreach loop over the XmlNodeList, but is there a simpler way?
I'm guessing you want to actually insert it into an XMLElement in another XMLDocument.
Is this what you are looking for?
XmlDocument sourceDoc = new XmlDocument();
//This is loading the XML you present in your Question.
sourceDoc.LoadXml(xmlcopier.Properties.Resources.data);
XmlElement root = sourceDoc.DocumentElement;
XmlElement b = (XmlElement)root.SelectSingleNode("/books/book[id=1]/firstCover"); // id=1 matches exactly; contains(id,1) would also match ids such as 10 or 21
XmlDocument destDoc = new XmlDocument();
XmlElement destRoot = destDoc.CreateElement("base");
destDoc.AppendChild(destRoot);
XmlElement result = destDoc.CreateElement("firstCover");
result.InnerXml = b.InnerXml;
destRoot.AppendChild(result);
destDoc.Save("c:\\test.xml");
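An alternative to copying InnerXml as a string is XmlDocument.ImportNode, which clones a node so that it can be attached to a different document. A small sketch follows; NodeCopier and CopyInto are hypothetical names for this example:

```csharp
using System.Xml;

public static class NodeCopier
{
    // Deep-copies source (which may belong to another document) under destinationParent.
    public static XmlElement CopyInto(XmlElement source, XmlElement destinationParent)
    {
        // A node can only be appended to the document that owns it,
        // so import a deep clone into the destination document first.
        XmlDocument destinationDoc = destinationParent.OwnerDocument;
        var imported = (XmlElement)destinationDoc.ImportNode(source, true);
        destinationParent.AppendChild(imported);
        return imported;
    }
}
```

With the variables from the code above, NodeCopier.CopyInto(b, destRoot) would stand in for the CreateElement and InnerXml lines, and it avoids re-parsing the markup.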
