Reading an XML File and Selecting Nodes in .NET - c#

I have a large XML file with a great many nested nodes. I need to pick out one particular node (say, Diet), under which there are multiple sections.
The Diet node can occur anywhere in the XML, so I need to find every node named Diet, get its child elements, and save them to a database.
Assume that Diet is not a single line: it has 10-12 entries underneath it. (Maybe I can get its contents using InnerXml, but I can't work out how to get the child nodes one by one.)

Make sure you have added a reference to "System.Xml.Linq".
Suck out all the Diet elements:
XElement wholeFile = XElement.Load(@"C:\DietSampleXML.xml");
IEnumerable<XElement> dietElements = wholeFile.Descendants("Diet");
If you set a breakpoint and hover the mouse over "dietElements" and click "Results View", you will see all the Diet elements and their inner xml.
Now iterate through dietElements to add each element and/or children to your database: "foreach (XElement x in dietElements) { ... }"
I tested this with the following xml:
<?xml version="1.0" encoding="utf-8" ?>
<TestElement>
<Diet>
<Name>Atkins</Name>
<Calories>1000</Calories>
</Diet>
<TestElement2>
<Diet>
<Name>Donuts Only</Name>
<Calories>1500</Calories>
</Diet>
</TestElement2>
<TestElement3>
<TestElement4>
<Diet>
<Name>Vegetarian</Name>
<Calories>500</Calories>
</Diet>
</TestElement4>
</TestElement3>
</TestElement>
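As a minimal sketch of the iteration step, the following reads the direct children of each Diet element one by one instead of going through InnerXml. The XML is an inline, trimmed copy of the test document above, and the Console.WriteLine call is only a placeholder for the database insert, which the answer does not show. It assumes child names are unique within a single Diet and is written with top-level statements (C# 9 or later); wrap it in a Main method for older compilers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Inline sample mirroring the test XML above; in practice use XElement.Load(path).
XElement wholeFile = XElement.Parse(@"
<TestElement>
  <Diet><Name>Atkins</Name><Calories>1000</Calories></Diet>
  <TestElement2>
    <Diet><Name>Donuts Only</Name><Calories>1500</Calories></Diet>
  </TestElement2>
</TestElement>");

// One dictionary per Diet element: child element name -> child text.
// ToDictionary throws if a Diet has two children with the same name (assumed not to happen here).
List<Dictionary<string, string>> rows = wholeFile
    .Descendants("Diet")
    .Select(diet => diet.Elements()
        .ToDictionary(child => child.Name.LocalName, child => child.Value))
    .ToList();

foreach (Dictionary<string, string> row in rows)
{
    // Placeholder for the database insert (hypothetical; not part of the original answer).
    Console.WriteLine(string.Join(", ", row.Select(kv => kv.Key + "=" + kv.Value)));
}
```

Collecting the rows first and saving them afterwards also makes it easier to write them to the database in one batch.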

Depending on the structure of your XML file, you might try loading it into a DataSet (DataSet.ReadXml()) and see which DataTable it puts your Diet nodes into. If it parses OK, it is pretty simple to loop through the DataTable and get all your Diet node values.
I wrote a little toy app that opens XML like that, listing all the DataTables in a tree view then showing the table content in a grid. The VS project file for it is here or just an MSI to install it is here, if you want to see how a DataSet parses your XML file.

In XPath, it's just //Diet
To say more, I'd need to know more about your environment.
var doc = XDocument.Load("yourfile.xml");
var nodes = from d in doc.Descendants("Diet")
select d;
foreach(var node in nodes)
{ // do stuff with node
}

The pseudocode below contains the XPath statement that gets you all elements that have a 'Diet' element as their parent. Since it produces an XmlNodeList, you can walk every node and save it to the DB. For performance, I would consider consolidating what you want to save and then saving it in one go rather than per line (a round trip for every entry is sub-optimal).
XmlNodeList list = xDoc.DocumentElement.SelectNodes("//*[parent::Diet]");
foreach (XmlNode entry in list)
{
DAL.SaveToDatabase(entry);
}
Hope this helps,

Related

Iterate through XDocument when you don't know the structure

Is there any way to iterate through an XDocument when you don't know what the XML structure is (using C#)?
There are plenty of examples for when you know the structure, like the answers to these questions: C# - Select XML Descendants with Linq and C# Foreach XML Node
I've tried Descendants("A"), where A is the root in the example below. In my foreach this returns one element with the root's name, whose value is all of the values concatenated into one string.
The reason I'm doing this is to anonymize certain nodes whose names I know.
The XDocuments I'm loading can be of any shape, so I've decided to just create a list that users can add to, containing the names of these sensitive elements.
A solution I want to avoid is users writing XPath expressions for sensitive fields.
The XML is also sensitive, so I can't share it online verbatim, but one example (out of 5) would look like this:
<A>
<B>
<C>
<D>
<dee>value1</dee>
<doo>value2</doo>
<date>value3</date>
<time>value4</time>
</D>
</C>
</B>
<E>
...ommited..this doc is 5000 lines long with 500~ unique node names
</E>
............
</A>
So is there a way to iterate without using Descendants?
Use .Descendants() to iterate every element.
xmlDoc.Root.Descendants()
.ToList()
.ForEach(e => Console.WriteLine(e.Name));
This is the way I went about it.
Descendants("name") requires you to know the element names beforehand, and even a parameterless call to Descendants() (which should return everything below the root) wasn't giving me what I expected.
The code below should work for any XML document without knowing its structure.
XmlDocument doc = new XmlDocument();
doc.Load(file);
using (XmlReader reader = new XmlNodeReader(doc))
{
while (reader.Read())
{
// Name of the node the reader is currently positioned on.
string currentNodeName = reader.Name;
// ... handle the node here
}
}

Check for Duplicate Values in Each Child in XML via LINQ

I'm new to LINQ, in particular LINQ to XML, and am having trouble iterating through the results. My XML document has multiple nodes of the same name nested in a single parent node.
Sample XML is :
<commercial>
<listingAgent>1</listingAgent>
<listingAgent>2</listingAgent>
<listingAgent>1</listingAgent>
</commercial>
<commercial>
<listingAgent>1</listingAgent>
<listingAgent>2</listingAgent>
<listingAgent>3</listingAgent>
</commercial>
So within each commercial tag the listing agent values should be unique. If they are not, I need to raise an error.
The real XML is extremely complicated and these tags are nowhere near the root, so I need to traverse down to them and then search for duplicates.
I tried the following code
foreach (XElement e in root.Descendants("listingAgent"))
{
listerror.Add(e.Value);
}
if (listerror.Count != listerror.Distinct().Count())
{
// show error
}
But i need this looping to be done for each commercial.
First, select all the commercial nodes. Then, for each node, get the list of agent values using a Select; this gives you a list of lists. Finally, apply the same condition you tried before, but now to each list of agents:
var result = xdoc.Descendants("commercial")
    .Select(c => c.Descendants("listingAgent").Select(e => e.Value));
// True if any single <commercial> contains a repeated listingAgent value.
if (result.Any(e => e.Count() != e.Distinct().Count()))
{
    // error
}
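If the error message should say which commercial is affected and which values repeat, a GroupBy per commercial is one way to get there. The sketch below is an illustration rather than part of the answer: the <root> wrapper is a hypothetical element added so that the sample parses as a single document, and the zero-based index is just a stand-in for whatever identifies a commercial in the real file. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sample from the question, wrapped in a hypothetical <root> so it is well-formed.
XDocument xdoc = XDocument.Parse(@"
<root>
  <commercial>
    <listingAgent>1</listingAgent>
    <listingAgent>2</listingAgent>
    <listingAgent>1</listingAgent>
  </commercial>
  <commercial>
    <listingAgent>1</listingAgent>
    <listingAgent>2</listingAgent>
    <listingAgent>3</listingAgent>
  </commercial>
</root>");

// For every <commercial>, group its agents by value and keep the values seen more than once.
var problems = xdoc.Descendants("commercial")
    .Select((commercial, index) => new
    {
        Index = index,
        Duplicates = commercial.Descendants("listingAgent")
            .GroupBy(agent => agent.Value)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList()
    })
    .Where(entry => entry.Duplicates.Count > 0)
    .ToList();

foreach (var entry in problems)
{
    Console.WriteLine("commercial #" + entry.Index + " repeats: " + string.Join(", ", entry.Duplicates));
}
```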

Query XDocument based on the depth of the node

I have the following XML and I want to be able to query the XML based on the depth of it. I am aware of the depth before hand.
UPDATED QUESTION:
I have the following XML and I want to be able to query the XML based on if the nodes are repetitive.
So, this is my XML
<Books>
<BookID>12345</BookID>
<BookName>BookName</BookName>
<Authors>
<Author>
<Name>AuthorNameOne</Name>
<City>New York</City>
</Author>
<Author>
<Name>AuthorNameTwo</Name>
<City>New York</City>
</Author>
</Authors>
</Books>
Via XDocument I want to be able to query this XML and get node names for the elements where there is repetitive data such as Authors. Or I want to be able to query it based on the Depth of the Node.
UPDATED QUESTION:
Via XDocument I want to be able to query this XML and get node names for the elements where there is repetitive data such as Authors.
Any help will be much appreciated.
Your XML still doesn't really make sense, but I'm putting together this answer hoping that it will at least point you in the right direction. I'm going to completely ignore the portion of your question that references node depth because I'm not really sure how it applies to the following question that you posted:
Via XDocument I want to be able to query this XML and get node names for the elements where there is repetitive data such as Authors.
Here's how to do just that simple type query assuming that your XDocument is named xml:
List<XElement> repeatedNodes = new List<XElement>();
foreach (XElement node in xml.Descendants())
{
    // The root element has no parent element, so skip it.
    if (node.Parent != null && node.Parent.Elements(node.Name).Count() > 1)
    {
        repeatedNodes.Add(node);
    }
}
Here's the same code compressed into a lambda that will provide you with an IEnumerable<XElement> containing all of the elements that would go into the List in my first example:
var dupes = xml.Descendants().Where(n => n.Parent != null && n.Parent.Elements(n.Name).Count() > 1);
This algorithm will look at every node in the xml tree and then from the parent of the current node it will count how many nodes with that same name exist. If that number is greater than one it will add it to our list of repeated nodes. This does not care what depth the current node is and it will only count duplicate named nodes at the same depth. Additionally this algorithm will put dupes in the List structure, but you can add in your own logic to prevent it from doing that or use a different structure that doesn't allow duplicates.
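Since the question asks for node names rather than the elements themselves, one possible follow-up is to project each repeated element to its name and apply Distinct. The sketch below runs on a trimmed copy of the Books sample and includes a guard for the root element, whose Parent is null; the variable name repeatedNames is made up for illustration. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Trimmed version of the Books sample from the question.
XDocument xml = XDocument.Parse(@"
<Books>
  <BookID>12345</BookID>
  <Authors>
    <Author><Name>AuthorNameOne</Name></Author>
    <Author><Name>AuthorNameTwo</Name></Author>
  </Authors>
</Books>");

// Keep elements that share their name with at least one sibling, then reduce to unique names.
List<string> repeatedNames = xml.Descendants()
    .Where(n => n.Parent != null && n.Parent.Elements(n.Name).Count() > 1)
    .Select(n => n.Name.LocalName)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", repeatedNames)); // prints "Author"
```

Name is not reported here because each Author contains only one Name element; the comparison is always among siblings under the same parent.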

Saving "skipped" nodes in xml into array

In my code, I am downloading an xml file, and because one of the nodes is variable (both name and count of them), I use code like this:
XmlNodeList arrivals = airplanes.SelectNodes("/myXml/flights/*/arrivals");
What I need to do now is save the names of the nodes matched by "*" into an array, ArrayList, or something similar. Later I will need a foreach to do something with each of those nodes, now saved as strings. I have tried
foreach(* in MyArrayList)
and that doesn't work. I get a number of errors there, so I assume I can't use the "*" here.
Each XmlNode in the XmlNodeList has a ParentNode property, you should be able to use that to navigate back up from the arrivals node in the xml to the * node.
The following Linq query should get the names:
var names = arrivals.Cast<XmlNode>().Select(x => x.ParentNode.Name).ToList();
The Cast<XmlNode> is needed because XmlNodeList doesn't implement the generic IEnumerable interface.
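Put together, the approach might look like the sketch below. The document is a made-up stand-in shaped like the path in the question; the element names prague, vienna and berlin are purely hypothetical. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical document matching the /myXml/flights/*/arrivals path from the question.
XmlDocument airplanes = new XmlDocument();
airplanes.LoadXml(@"
<myXml>
  <flights>
    <prague><arrivals /></prague>
    <vienna><arrivals /></vienna>
    <berlin><departures /></berlin>
  </flights>
</myXml>");

XmlNodeList arrivals = airplanes.SelectNodes("/myXml/flights/*/arrivals");

// The parent of each matched <arrivals> node is the element the "*" stood for.
List<string> names = arrivals.Cast<XmlNode>()
    .Select(x => x.ParentNode.Name)
    .ToList();

foreach (string name in names)
{
    Console.WriteLine(name);
}
```

Only parents that actually contain an arrivals child are listed, so berlin does not appear in names.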

Get specific data from XML document

I have xml document like this:
<level1>
<level2>
<level3>
<attribute1>...</attribute1>
<attribute2>false</attribute2>
<attribute3>...</attribute3>
</level3>
<level3>
<attribute1>...</attribute1>
<attribute2>true</attribute2>
<attribute3>...</attribute3>
</level3>
</level2>
<level2>
<level3>
<attribute1>...</attribute1>
<attribute2>false</attribute2>
...
...
...
I'm using C#, and I want to go through all the "level3" elements. For every "level3", I want to read attribute2 and, if it says "true", print the corresponding attribute3 (there can be "level3" elements without these children).
I keep the xml in XmlDocument.
Then I keep all the "level3" nodes like this:
XmlNodeList xnList = document.SelectNodes(String.Format("/level1/level2/level3"));
(document is the XmlDocument).
But from here on, I don't know exactly how to continue. I tried going through xnList with foreach, but nothing works for me.
How can I do it?
Thanks a lot
Well I'd use LINQ to XML:
var results = from level3 in doc.Descendants("level3")
              // (bool?) yields null instead of throwing when attribute2 is missing
              where (bool?) level3.Element("attribute2") == true
              select level3.Element("attribute3").Value;
foreach (string result in results)
{
Console.WriteLine(result);
}
LINQ to XML makes all kinds of things much simpler than the XmlDocument API. Of course, the downside is that it requires .NET 3.5...
(By the way, naming elements attributeN is a bit confusing... one would expect attribute to refer to an actual XML attribute...)
You can use LINQ to XML and reading this is a good start.
You can use an XPath query. This will give you a XmlNodeList that contains all <attribute3> elements that match your requirement:
var list = document.SelectNodes("//level3[attribute2 = 'true']/attribute3");
foreach(XmlNode node in list)
{
Console.WriteLine(node.InnerText);
}
You can split the above XPath query into three parts:
1. "//level3" queries for all descendant elements named <level3>.
2. "[attribute2 = 'true']" filters the result from (1) and only keeps the elements where the child element <attribute2> contains the text true.
3. "/attribute3" takes the <attribute3> child node of each element in the result of (2).
