XPath accessing a non-uniform XML file

I have an XML file that's modeled something like the following:
<data>
  <customer>
    <id></id>
    <name></name>
    <model>
      <id></id>
      <name></name>
      <item>
        <id></id>
        <history>
          <date></date>
          <location></location>
        </history>
      </item>
      <item>
        <id></id>
        <history>
          <date></date>
          <location></location>
        </history>
      </item>
    </model>
    <model>
      <id></id>
      <name></name>
      <item>
        <id></id>
        <history>
          <date></date>
          <location></location>
        </history>
      </item>
    </model>
  </customer>
  <customer>
    <id></id>
    <name></name>
    <model>
      <id></id>
      <name></name>
      <item>
        <id></id>
        <history>
          <date></date>
          <location></location>
        </history>
      </item>
      <item>
        <id></id>
        <history>
          <date></date>
          <location></location>
        </history>
      </item>
    </model>
  </customer>
  <customer>
    <id></id>
    <name></name>
  </customer>
</data>
Using XPath in C#, I need to access the following for each customer:
customer/id
customer/name
customer/model/id
customer/model/name
customer/model/item/id
customer/model/item/history/date
customer/model/item/history/location
When data does not exist for a given customer, the stored result should be null, since all fields of my customer object must be populated. If the XML file were uniform, this would be easy. My problem is accessing each customer's data when each customer may have a different number of model and item nodes. Any ideas?

Assuming that the result will consist of objects of these types:
Customer, Model, Item and History
the pseudo-code for populating them is:
1. Select all /data/customer elements.
2. For each of the nodes selected in 1. do:
   3. Select ./id and ./name and populate the corresponding properties of the object.
   4. Populate a List<Model> property from all model children of the current customer element:
      5. Select all ./model children of the current element.
      6. For each selected model element in 5. create a Model object and populate its properties:
         7. For the current Model object populate its Id and Name properties by selecting the ./id and ./name children of the current model element.
         8. Populate a List<Item> property from all item children of the current model element:
            9. Select all ./item children of the current element.
            10. For the current Item object populate its Id property by selecting the ./id child of the current item element.
            11. In a similar way populate the History property of the Item object with a History object, that you create and populate from the ./history child of the current item element.
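The pseudo-code above could be sketched in C# roughly as follows. This is only a minimal illustration, not the answerer's own code: the class shapes (Customer, Model, Item, History with public fields) and the helper names CustomerReader and Text are assumptions made for the example. Note that a node that is missing yields null, whereas an element that is present but empty, such as <id></id>, yields an empty string.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical result types; the question does not show the real ones.
public class History { public string Date; public string Location; }
public class Item { public string Id; public History History; }
public class Model
{
    public string Id;
    public string Name;
    public List<Item> Items = new List<Item>();
}
public class Customer
{
    public string Id;
    public string Name;
    public List<Model> Models = new List<Model>();
}

public static class CustomerReader
{
    // Text of the node selected by a relative XPath, or null when no such node exists.
    static string Text(XmlNode node, string xpath)
    {
        XmlNode child = node.SelectSingleNode(xpath);
        return child == null ? null : child.InnerText;
    }

    public static List<Customer> Read(XmlDocument doc)
    {
        var customers = new List<Customer>();
        foreach (XmlNode c in doc.SelectNodes("/data/customer"))
        {
            var customer = new Customer { Id = Text(c, "id"), Name = Text(c, "name") };

            // Zero, one or many model children: the loop handles every case.
            foreach (XmlNode m in c.SelectNodes("model"))
            {
                var model = new Model { Id = Text(m, "id"), Name = Text(m, "name") };
                foreach (XmlNode i in m.SelectNodes("item"))
                {
                    XmlNode h = i.SelectSingleNode("history");
                    model.Items.Add(new Item
                    {
                        Id = Text(i, "id"),
                        History = h == null ? null : new History
                        {
                            Date = Text(h, "date"),
                            Location = Text(h, "location")
                        }
                    });
                }
                customer.Models.Add(model);
            }
            customers.Add(customer);
        }
        return customers;
    }
}
```

Because every level is read with a relative XPath from the current node, a customer without any model elements simply ends up with an empty Models list.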
Of course, you can skip all this if you use XML Serialization -- read about this here: http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx
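For comparison, here is a hedged sketch of what the XmlSerializer route might look like for the same file. The class names (DataFile, CustomerDto, ModelDto, ItemDto, HistoryDto, DataFileLoader) are hypothetical; only the element names come from the question. Placing [XmlElement] on a list maps it to repeated child elements without a wrapper element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("data")]
public class DataFile
{
    [XmlElement("customer")] public List<CustomerDto> Customers = new List<CustomerDto>();
}

public class CustomerDto
{
    [XmlElement("id")] public string Id;
    [XmlElement("name")] public string Name;
    [XmlElement("model")] public List<ModelDto> Models = new List<ModelDto>();
}

public class ModelDto
{
    [XmlElement("id")] public string Id;
    [XmlElement("name")] public string Name;
    [XmlElement("item")] public List<ItemDto> Items = new List<ItemDto>();
}

public class ItemDto
{
    [XmlElement("id")] public string Id;
    [XmlElement("history")] public HistoryDto History;
}

public class HistoryDto
{
    [XmlElement("date")] public string Date;
    [XmlElement("location")] public string Location;
}

public static class DataFileLoader
{
    // Deserializes the whole document in one call; elements that are absent leave their fields null.
    public static DataFile Load(TextReader reader)
    {
        var serializer = new XmlSerializer(typeof(DataFile));
        return (DataFile)serializer.Deserialize(reader);
    }
}
```

A call such as DataFileLoader.Load(new StreamReader("customers.xml")) would return the object graph; the file name here is a placeholder.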

Related

Iterating through linq results results in more items than query count

I'm fairly new to LINQ but this seemed pretty straightforward.
I have an XML doc which contains a structure like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<folders>
  <folder id="-1" parent="-100">
    <name><![CDATA[Root]]></name>
    <children>
      <folder id="2" parent="-1">
        <name><![CDATA[Contribution]]></name>
        <documents />
        <children>
          <folder id="775" parent="2">
            <name><![CDATA[category1]]></name>
            <documents />
            <children>
              <folder id="2319" parent="775">
                <name><![CDATA[Acad_Depts1]]></name>
                <documents />
                <children>
                  <folder id="26965" parent="2319">
                    <name><![CDATA[Student1]]></name>
                    <documents>
                      <document>
                      </document>
                    </documents>
                  </folder>
                </children>
              </folder>
              <folder id="2319" parent="775">
                <name><![CDATA[Acad_Depts2]]></name>
                <documents />
                <children>
                  <folder id="26965" parent="2319">
                    <name><![CDATA[Student1]]></name>
                    <documents>
                      <document>
                      </document>
                    </documents>
                  </folder>
                </children>
              </folder>
              etc...
            </children>
          </folder>
        </children>
      </folder>
    </children>
  </folder>
</folders>
What I'm trying to do is to select all the elements with an attribute 'parent="775"'.
XElement xelement = XElement.Load("folders_only_registrar_folder.xml");
IEnumerable<XElement> folders = xelement.Elements();
var query = from node in folders.Descendants("folder")
            where node.Attribute("parent").Value == registrarNodeID
            select node;
Console.WriteLine(query.Count());
Console.ReadKey();
foreach (XElement departmentNode in query.Descendants("name"))
{
    Console.WriteLine(departmentNode.Value.ToString());
}
When I run the query and test the count, I get 48 results (which is good), but when I try to write out those same nodes, I get hundreds of results. For some reason it's giving me almost ALL of the elements named "folder", including child folders.
Any thoughts as to what I'm doing wrong?
UPDATE: OK, so now I know why I'm getting all the folders, but any thoughts on how to create a collection of each grouping of nodes and sub-nodes?
Can the LINQ selection put each 775 folder node (plus its sub-nodes) into some sort of collection of nodes, so that I could then walk through them in a foreach, one group of nodes at a time?

Replace query.Descendants() with just query. query.Descendants() returns every descendant of every node in query, so the loop also visits the name elements of all nested folders, not only those of the 48 matched folders.
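As a rough sketch of the follow-up question (handling each matched folder together with its own sub-folders), something along these lines might work. The class and method names (FolderQuery, WithParent, NameOf, NestedNames) are made up for the example. Element("name") reads only the folder's direct child, which avoids printing the concatenated text of every nested node.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FolderQuery
{
    // Folders whose parent attribute equals parentId.
    // Casting the attribute to string yields null when it is missing, so nothing throws.
    public static List<XElement> WithParent(XElement root, string parentId)
    {
        return root.Descendants("folder")
                   .Where(f => (string)f.Attribute("parent") == parentId)
                   .ToList();
    }

    // The folder's own name: Element() looks at direct children only.
    public static string NameOf(XElement folder)
    {
        return (string)folder.Element("name");
    }

    // Names of every folder nested below the given one, at any depth.
    public static List<string> NestedNames(XElement folder)
    {
        return folder.Descendants("folder").Select(f => NameOf(f)).ToList();
    }
}
```

Each element returned by WithParent(root, "775") is still attached to its subtree, so a foreach over that list can call NameOf and NestedNames on one department at a time.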

How to get nodes whose child node has a given value

I have the XML structure below:
<?xml version="1.0" encoding="UTF-8" ?>
<rss>
  <channel>
    <item>
      <title>Some Title</title>
      <wp:comment>
        <wp:comment_approved><![CDATA[1]]></wp:comment_approved>
      </wp:comment>
      <wp:comment>
        <wp:comment_approved><![CDATA[1]]></wp:comment_approved>
      </wp:comment>
    </item>
  </channel>
</rss>
I can easily get all wp:comments by:
xmlNode.SelectNodes("*[name()='wp:comment']")
But how can I get all wp:comments where wp:comment_approved has value 1?

This is an updated version of Stefan Hegny's answer, since you need the wp:comment element rather than the comment_approved element:
xmlNode.SelectNodes("//*[name()='wp:comment'][./*[local-name() = 'comment_approved' and . = '1']]")
I'm not sure whether the usual locators work here, but in plain XPath I would use the following pattern. The logic is simple: you search for an element that contains a child element with a particular value, so you can adjust it to your needs:
//someTag[./innerTag[text() = '1']]
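A small sketch of the same predicate wrapped in a helper is shown below; CommentFilter and Approved are hypothetical names. It matches on local-name() only, so no XmlNamespaceManager is needed. One assumption to flag: for XmlDocument to load the file at all, the wp prefix has to be declared somewhere in the document (the excerpt in the question omits that declaration).

```csharp
using System.Xml;

public static class CommentFilter
{
    // Selects every "comment" element below root that has a "comment_approved"
    // child whose string value is "1". CDATA content counts as ordinary text here.
    public static XmlNodeList Approved(XmlNode root)
    {
        return root.SelectNodes(
            ".//*[local-name()='comment'][./*[local-name()='comment_approved' and .='1']]");
    }
}
```

The outer step picks the element to return and the bracketed inner step only filters it, which is why the result is the wp:comment node and not its comment_approved child.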

Edit an XML file without rewriting using XmlSerializer in C#

I have an XML file, for example:
<?xml version="1.0" encoding="utf-8"?>
<items>
  <item>
    <id>1</id>
    <details></details>
    <description></description>
  </item>
  <item>
    <id>2</id>
    <details>
    </details>
    <description></description>
  </item>
</items>
Now suppose I want to modify the XML file by adding some data to the details tag of the item with id=2. Using XmlSerializer, would I have to read the whole XML file, select the item with id 2, modify that class object and write the whole file again? So for every update I would have to read the whole XML file into memory, edit it in memory and then rewrite it to disk as a whole?
Is there any other way to achieve this? For example, could I have logic that simply updates the node in the existing XML file?

Linq XML add new parent

With LINQ to XML, is it possible to add a new parent to existing nodes?
Take this XML excerpt:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<items>
  <book>
    <title>Title 1</title>
    <author>Author 1</author>
  </book>
  <book>
    <title>Title 2</title>
    <author>Author 2</author>
  </book>
  <car>
    <model>Tesla</model>
  </car>
</items>
Is it possible to add a new parent "books" to the book elements, like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<items>
  <books>
    <book>
      <title>Title 1</title>
      <author>Author 1</author>
    </book>
    <book>
      <title>Title 2</title>
      <author>Author 2</author>
    </book>
  </books>
  <car>
    <model>Tesla</model>
  </car>
</items>
This does not work because it clones the nodes (the originals stay where they are):
doc.Element("items").Add(new XElement("books", doc.Element("items").Elements("book")));

You can remove the existing <book> elements from the <items> node after adding them under the new <books> parent node:
var books = doc.Element("items").Elements("book");
doc.Element("items").Add(new XElement("books", books));
books.Remove();
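An alternative sketch, in case moving the original nodes rather than cloning them matters (for instance when other code holds references to them): detach each book first and then add it to the wrapper. An XElement without a parent is attached as-is instead of being copied. BookWrapper and Wrap are hypothetical names, and placing the wrapper where the first book used to be is a design choice of this example, not something the answer specifies.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookWrapper
{
    public static void Wrap(XDocument doc)
    {
        XElement items = doc.Element("items");

        // Snapshot the books first so the tree can be modified while looping.
        List<XElement> books = items.Elements("book").ToList();
        if (books.Count == 0) return;

        // Insert the new parent at the position of the first book.
        var wrapper = new XElement("books");
        books[0].AddBeforeSelf(wrapper);

        foreach (XElement book in books)
        {
            book.Remove();      // detach from <items>
            wrapper.Add(book);  // parentless, so it is moved rather than cloned
        }
    }
}
```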

How to use LINQ to get data from an XML file?

I have xml files which look like this:
<?xml version="1.0" encoding="utf-8"?>
<record id="177" restricted="false">
  <type>record type</type>
  <startdate>2000-10-10</startdate>
  <enddate>2014-02-01</enddate>
  <titles>
    <title xml:lang="en" type="main">Main title</title>
    <!-- only one title element with type main -->
    <title xml:lang="de" type="official">German title</title>
    <!-- can have more titles of type official -->
  </titles>
  <description>description of the record</description>
  <categories>
    <category id="122">
      <name>category name</name>
      <description>category description</description>
    </category>
    <!-- can have more categories -->
  </categories>
  <tags>
    <tag id="5434">
      <name>tag name</name>
      <description>tag description</description>
    </tag>
    <!-- can have more tags -->
  </tags>
</record>
How do I select the data from these xml files using LINQ, or should I use something else?

You can load xml into XDocument objects using either the Load() method for files, or the Parse() method for strings:
var doc = XDocument.Load("your-file.xml");
// OR
var doc = XDocument.Parse(yourXmlString);
Then you can access the data using LINQ:
var titles =
    from title in doc.XPathSelectElements("//title")
    where title.Attribute("type").Value == "official"
    select title.Value;
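Note that XPathSelectElements is an extension method from the System.Xml.XPath namespace, so the snippet above needs that using directive. The same data can also be reached with LINQ to XML axis methods alone, as in this tentative sketch. RecordReader and its method names are invented for the example; the element and attribute names are taken from the sample file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RecordReader
{
    // Text of every <title> whose type attribute is "official".
    public static List<string> OfficialTitles(XDocument doc)
    {
        return doc.Root.Elements("titles").Elements("title")
                  .Where(t => (string)t.Attribute("type") == "official")
                  .Select(t => t.Value)
                  .ToList();
    }

    // xml:lang lives in the predefined XML namespace, exposed as XNamespace.Xml.
    public static string LanguageOf(XElement title)
    {
        return (string)title.Attribute(XNamespace.Xml + "lang");
    }

    // Category id mapped to category name; assumes ids are unique within a record.
    public static Dictionary<int, string> Categories(XDocument doc)
    {
        return doc.Root.Elements("categories").Elements("category")
                  .ToDictionary(c => (int)c.Attribute("id"),
                                c => (string)c.Element("name"));
    }
}
```

Casting an attribute or element to string returns null when it is missing, which tends to be safer than reading .Value directly on files where optional parts may be absent.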

While searching for XmlSerializer examples I found this: How to Deserialize XML document
So why not try it? I copied the XML (Ctrl+C) and chose Edit -> Paste Special -> Paste XML As Classes in Visual Studio 2013 and, whoa, all the classes were generated. One condition: the target framework must be 4.5, and the feature is available from Visual Studio 2012 onwards (as stated in that post).
