XML parsing a subtree in C#

XML parsing a subtree in C# - c#

I have many xml-files which I need to parse. The xml-files are loaded from elsewhere. I can examine these files to get the paths I need to extract my desired data. The paths aren't the same.
So I added the paths in an ini-file for each xml-file. This works fine for 5 of 6 files.
WebClient client = new WebClient();
data = client.DownloadData("ftp://some.site/my.xml");
MemoryStream stream = new MemoryStream(data);
XmlDocument xml_doc = new XmlDocument();
xml_doc.Load(stream);
var prod_ids = xml_doc.DocumentElement.SelectNodes("/Catalog/Products/Product/Product_Id/text()");
foreach (XmlNode node in prod_ids) {
[...]
}
In the last file I need to get 2 information from one subtree at once, because I have to combine them in one string, therefore reading all nodes seperatly doesn't work. See Example-XML:
<Catalog>
<Created><![CDATA[2020-11-16T00:22:11+01:00]]></Created>
<Products>
<Product>
<Product_Id><![CDATA[ABC]]></Product_Id>
<Color_Code><![CDATA[123]]></Color_Code>
<Size><![CDATA[]]></Size>
<Length>210</Length>
<Width>0</Width>
</Product>
<Product>
<Product_Id><![CDATA[ABC]]></Product_Id>
<Color_Code><![CDATA[456]]></Color_Code>
<Size><![CDATA[]]></Size>
<Length>44</Length>
<Width>55</Width>
</Product>
<Product>
<Product_Id><![CDATA[XYZ]]></Product_Id>
<Color_Code><![CDATA[123]]></Color_Code>
<Size><![CDATA[]]></Size>
<Length>150</Length>
<Width>11</Width>
</Product>
</Products>
</Catalog>
I'm lookig for some code which parses each subtree (/Catalog/Products/Product) in which I can read the innerText from Product_Id and Color_Code to combine them to one string.
Any ideas?

You're really close, but you're going too low in the DOM tree. Instead of looping through each Product/ProductID, start your loop at each Product, then in the loop get each ProductID / ColorCode.
foreach( XmlElement ndProduct in xml.SelectNodes( "//Product") ) {
XmlElement ndProductID = (XmlElement)ndProduct.SelectSingleNode("Product_Id");
string strProductID = ndProductID.InnerText;
XmlElement ndColorCode = (XmlElement)ndProduct.SelectSingleNode("Color_Code");
string strColorCode = ndColorCode.InnerText;
string strReturn = strProductID + " - " + strColorCode;
}

Use a more modern linq to xml.
var doc = XDocument.Load(stream);
var values = doc.Root
.Element("Products")
.Elements("Product")
.Select(p => p.Element("Product_Id").Value + p.Element("Color_Code").Value);
foreach (var value in values)
Console.WriteLine(value);
I can offer the following solution.
Get the values of different nodes using the OR operation |.
Then we go through the collection with an increment of two and combine the values.
var prod_ids = xml_doc.DocumentElement.SelectNodes(
"/Catalog/Products/Product/Product_Id | /Catalog/Products/Product/Color_Code");
for (int i = 0; i < prod_ids.Count; i += 2)
Console.WriteLine(prod_ids[i].InnerText + prod_ids[i + 1].InnerText);

Related

how to take a specific child node value in xml using C#

I have an xml schema like below
<library>
<book>
<id>1</id>
<name>abc</name>
<read>
<data>yes</data>
<num>20</num>
</read>
</book>
<book>
<id>20</id>
<name>xyz</name>
<read>
<data>yes</data>
<num>32</num>
</read>
</book>
</library>
Now if the id is 20 i need to take the value of tag <num> under <read>
I done the code as below
var xmlStr = File.ReadAllText("e_test.xml");
var str = XElement.Parse(xmlStr);
var result = str.Elements("book").Where(x => x.Element("id").Value.Equals("20")).ToList();
this give the whole <book> tag with id 20. From this how can I extract only the value of tag <num>.
ie is i need to get the value 32 in to a variable

Before you try to extract the num value, you need to fix your Where clause - at the moment you're comparing a string with an integer. The simplest fix - if you know that your XML will always have an id element which has a textual value which is an integer - is to cast the element to int.
Next, I'd use SingleOrDefault to make sure there's at most one such element, assuming that's what your XML document should have.
Then you just need to use Element twice to navigate down via read and then num, and cast the result to int again:
// Or use XDocument doc = ...; XElement book = doc.Root.Elements("book")...
XElement root = XElement.Load("e_test.xml")
XElement book = root.Elements("book")
.Where(x => (int) x.Element("id") == 20)
.SingleOrDefault();
if (book == null)
{
// No book with that ID
}
int num = (int) book.Element("read").Element("num");

If you're not dead set on using Linq, how about this XPath? XPath is probably more widely understood, and is really simple. The XPath to find your node would be:
/library/book[id=20]/read/num
Which you could use in C# thus:
var doc = new XmlDocument();
doc.LoadXml(myString);
var id = 20;
var myPath = "/library/book[id=" + id + "]/read/num";
var myNode = doc.SelectSingleNode(myPath);
Then you can do whatever you like with myNode to get its value etc..
Helpful reference:
http://www.w3schools.com/xsl/xpath_syntax.asp

XML - where node has same name but different values

I am trying to read an xml file (and later import the data in to a sql data base) which contains employees names address' etc.
The issue I am having is that in the xml the information for the address for the employee the node names are all the same.
<Employee>
<EmployeeDetails>
<Name>
<Ttl>Mr</Ttl>
<Fore>Baxter</Fore>
<Fore>Loki</Fore>
<Sur>Kelly</Sur>
</Name>
<Address>
<Line>Woof Road</Line>
<Line>Woof Lane</Line>
<Line>Woofington</Line>
<Line>London</Line>
</Address>
<BirthDate>1985-09-08</BirthDate>
<Gender>M</Gender>
<PassportNumber>123756rt</PassportNumber>
</EmployeeDetails>
</Employee>
I all other items are fine to extract and I have tried to use Linq to iterate through each "Line" node but it always just gives be the first Line and not the others.
var xAddreesLines = xEmployeeDetails.Descendants("Address").Select(x => new
{
address = (string)x.Element("Line").Value
});
foreach (var item in xAddreesLines)
{
Console.WriteLine(item.address);
}
I need to able to when I'm importing to my sql db that address line is separate variable
eg
var addressline1 = first <line> node
var addressline2 = second <line> node etc etc.
Any advice would be most welcome.

This should give you the expected output:-
var xAddreesLines = xdoc.Descendants("Address")
.Elements("Line")
.Select(x => new { address = (string)x });
You need to simply fetch the Line elements present inside Address node and you can project them. Also note there is no need to call the Value property on node when you use explicit conversion.

You can do it like this:
using System.Xml;
.
.
.
XmlDocument doc = new XmlDocument();
doc.Load("source.xml");
// if you have the xml in a string use doc.LoadXml(stringvar)
XmlNamespaceManager nsmngr = new XmlNamespaceManager(doc.NameTable);
XmlNodeList results = doc.DocumentElement.SelectNodes("child::Employee", nsmngr);
foreach (XmlNode result in results)
{
XmlNode namenode = result.SelectSingleNode("Address");
XmlNodeList types = result.SelectNodes("line");
foreach (XmlNode type in types)
{
Console.WriteLine(type.InnerText);
}
XmlNode fmtaddress = result.SelectSingleNode("formatted_address");
}
Refer to this question for the original source.

Extract part of a big XML

I have to extract a part of an XML. My XML file can contain thousands of nodes and I would like to get only a part of it and have this part as an xml string.
My XML structure:
<ResponseMessage xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<ErrorResponse>
<Code>SUCCESS</Code>
<Message>Success</Message>
</ErrorResponse>
<OutputXml>
<Response>
<Product>
<child1>xxx</child1>
<child2>xxx</child2>
...
</Product>
<Product>
<child1>xxx</child1>
<child2>xxx</child2>
...
</Product>
...
</Response>
</OutputXML>
</ResponseMessage>
I'm getting the XML from a webservice like that:
...
System.Net.WebResponse wResponse = req.GetResponse();
reqstream = wResponse.GetResponseStream();
System.IO.StreamReader reader = new System.IO.StreamReader(reqstream);
System.Xml.Linq.XDocument xmlResponse = System.Xml.Linq.XDocument.Parse(reader.ReadToEnd());
Then I tried to put the XML in a generic collection to process it using linq:
int startIndex = 0;
int nbItem = 25;
System.Text.StringBuilder outputXml = new System.Text.StringBuilder();
System.Collections.Generic.IEnumerable<System.Xml.Linq.XElement> partialList =
xmlResponse.Elements("Response").Skip(startIndex).Take(nbItem);
foreach (System.Xml.Linq.XElement x in partialList)
{
outputXml.Append(x.ToString());
}
My problem is that my list is always empty.

You can use an LINQ To Xml by using the following code:
IEnumerable<XElement> elements = xmlResponse.Root.Element("OutputXml").Element("Response").Elements("Product");
foreach(XElement element in elements)
{
// Do Work Here
}
This will filter the list down to just products and it will select them correctly without using an index. Using indexes with xml is not the greatest idea because the xml can change.

You can use XPathEvaluate to read a subtree.
If your list is empty, chances are it is namespace problem, so you did not account for this namespace in your code xmlns:i="http://www.w3.org/2001/XMLSchema-instance". XDocument/XElement cannot resolve namespaces automatically.
See this topic on how to use namespaces with LINQ-to-XML.

Get certain xml node and save the value

Considering the following XML:
<Stations>
<Station>
<Code>HT</Code>
<Type>123</Type>
<Names>
<Short>H'bosch</Short>
<Middle>Den Bosch</Middle>
<Long>'s-Hertogenbosch</Long>
</Names>
<Country>NL</Country>
</Station>
</Stations>
There are multiple nodes. I need the value of each node.
I've got the XML from a webpage (http://webservices.ns.nl/ns-api-stations-v2)
Login (--) Pass (--)
Currently i take the XML as a string and parse it to a XDocument.
var xml = XDocument.Parse(xmlString);
foreach (var e in xml.Elements("Long"))
{
var stationName = e.ToString();
}

You can retrieve "Station" nodes using XPath, then get each subsequent child node using more XPath. This example isn't using Linq, which it looks like you possibly are trying to do from your question, but here it is:
XmlDocument xml = new XmlDocument();
xml.Load(xmlStream);
XmlNodeList stations = xml.SelectNodes("//Station");
foreach (XmlNode station in stations)
{
var code = station.SelectSingleNode("Code").InnerXml;
var type = station.SelectSingleNode("Type").InnerXml;
var longName = station.SelectSingleNode("Names/Long").InnerXml;
var blah = "you should get the point by now";
}
NOTE: If your xmlStream variable is a String, rather than a Stream, use xml.LoadXml(xmlStream); for line 2, instead of xml.Load(xmlStream). If this is the case, I would also encourage you to name your variable to be more accurately descriptive of the object you're working with (aka. xmlString).

This will give you all the values of "Long" for every Station element.
var xml = XDocument.Parse(xmlStream);
var longStationNames = xml.Elements("Long").Select(e => e.Value);

Split XML document apart creating multiple output files from repeating elements

I need to take an XML file and create multiple output xml files from the repeating nodes of the input file. The source file "AnimalBatch.xml" looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<Animals>
<Animal id="1001">
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
<Animal id="1002">
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
<Animal id="1003">
<Quantity>Three</Quantity>
<Color>Blind</Color>
<Name>Mice</Name>
</Animal>
</Animals>
The program needs to split the repeating "Animal" and produce 3 files named: Animal_1001.xml, Animal_1002.xml, and Animal_1003.xml
Each output file should contain just their respective element (which will be the root). The id attribute from AnimalsBatch.xml will supply the sequence number for the Animal_xxxx.xml filenames. The id attribute does not need to be in the output files.
Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
Animal_1002.xml
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
Animal_1003.xml>
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
I want to do this with XmlDocument, since it needs to be able to run on .Net 2.0.
My program looks like this:
static void Main(string[] args)
{
string strFileName;
string strSeq;
XmlDocument doc = new XmlDocument();
doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");
XmlNodeList nl = doc.DocumentElement.SelectNodes("Animal");
foreach (XmlNode n in nl)
{
strSeq = n.Attributes["id"].Value;
XmlDocument outdoc = new XmlDocument();
XmlNode rootnode = outdoc.CreateNode("element", "Animal", "");
outdoc.AppendChild(rootnode); // Put the wrapper element into outdoc
outdoc.ImportNode(n, true); // place the node n into outdoc
outdoc.AppendChild(n); // This statement errors:
// "The node to be inserted is from a different document context."
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save(Console.Out);
Console.WriteLine();
}
Console.WriteLine("END OF PROGRAM: Press <ENTER>");
Console.ReadLine();
}
I think I have 2 problems.
A) After doing the ImportNode on node n into outdoc, I call outdoc.AppendChild(n) which complains: "The node to be inserted is from a different document context." I do not know if this is a scope issue referencing node n within the ForEach loop - or if I am somehow not using ImportNode() or AppendChild properly. 2nd argument on ImportNode() is set to true, because I want the child elements of Animal (3 fields arbitrarily named Quantity, Adjective, and Name) to end up in the destination file.
B) Second problem is getting the Animal element into outdoc. I'm getting '' but I need ' ' so I can place node n inside it. I think my problem is how I am doing: outdoc.AppendChild(rootnode);
To show the xml, I'm doing: outdoc.Save(Console.Out); I do have the code to save() to an output file - which does work, as long as I can get outdoc assembled properly.
There is a similar question at: Split XML in Multiple XML files, but I don't understand the solution code yet. I think I'm pretty close on this approach, and will appreciate any help you can provide.
I'm going to be doing this same task using XmlReader, since I'm going to need to be able to handle large input files, and I understand that XmlDocument reads the whole thing in and can cause memory issues.

That's a simple method that seems what you are looking for
public void test_xml_split()
{
XmlDocument doc = new XmlDocument();
doc.Load("C:\\animals.xml");
XmlDocument newXmlDoc = null;
foreach (XmlNode animalNode in doc.SelectNodes("//Animals/Animal"))
{
newXmlDoc = new XmlDocument();
var targetNode = newXmlDoc.ImportNode(animalNode, true);
newXmlDoc.AppendChild(targetNode);
newXmlDoc.Save(Console.Out);
Console.WriteLine();
}
}

This approach seems to work without using the "var targetnode" statement. It creates an XmlNode object called targetNode from outdoc's "Animal" element in the ForEach loop. I think the main things that were problems in my original code were: A) I was getting nodelist nl incorrectly. And B) I couldn't "Import" node n, I think because it was associated specifically with doc. It had to be created as its own Node.
The problem with the prior proposed solution was the use of the "var" keyword. My program has to assume 2.0 and that came in with v3.0. I like Rogers solution, in that it is concise. For me - I wanted to do each thing as a separate statement.
static void SplitXMLDocument()
{
string strFileName;
string strSeq;
XmlDocument doc = new XmlDocument(); // The input file
doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");
XmlNodeList nl = doc.DocumentElement.SelectNodes("//Animals/Animal");
foreach (XmlNode n in nl)
{
strSeq = n.Attributes["id"].Value; // Animal nodes have an id attribute
XmlDocument outdoc = new XmlDocument(); // Create the outdoc xml document
XmlNode targetNode = outdoc.CreateElement("Animal"); // Create a separate node to hold the Animal element
targetNode = outdoc.ImportNode(n, true); // Bring over that Animal
targetNode.Attributes.RemoveAll(); // Remove the id attribute in <Animal id="1001">
outdoc.ImportNode(targetNode, true); // place the node n into outdoc
outdoc.AppendChild(targetNode); // AppendChild to make it stick
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save(Console.Out); Console.WriteLine();
outdoc.Save("D:\\Rick\\Computer\\XML\\" + strFileName);
Console.WriteLine();
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

XML parsing a subtree in C# - c#

Related

how to take a specific child node value in xml using C#

XML - where node has same name but different values

Extract part of a big XML

Get certain xml node and save the value

Split XML document apart creating multiple output files from repeating elements

Categories

Resources