Select specific nodes in XML with LINQ

I'm writing a function that loads an XML document and converts it to a CSV. Since I need only some values from the XML file, my goal is to select only the nodes I'm interested in.
Here's my code:
XDocument csvDocument = XDocument.Load(tempOutput);
StringBuilder csvBuilder = new StringBuilder(1000);
foreach (XElement node in csvDocument.Descendants("Sample"))
{
    foreach (XElement innerNode in node.Elements())
    {
        csvBuilder.AppendFormat("{0},", innerNode.Value);
    }
    csvBuilder.Remove(csvBuilder.Length - 1, 1);
    csvBuilder.AppendLine();
}
csvOut = csvBuilder.ToString();
But this way I'm selecting ALL the child nodes inside the "Sample" node.
In the XML, "Sample" tree is:
<Sample Type="Object" Class="Sample">
    <ID>1</ID>
    <Name>10096</Name>
    <Type>2</Type>
    <Rep>0</Rep>
    <Selected>True</Selected>
    <Position>1</Position>
    <Pattern>0</Pattern>
</Sample>
Code works flawlessly, but I need only "ID" and "Selected" to be selected and their values written inside the CSV file.
Could anyone point me in the right direction, please?
Thanks.

Learn more about LINQ to XML here. You're not really taking advantage of the 'linq-edness' of XObjects:
var samples = csvDocument.Descendants("Sample")
    .Select(el => new {
        Id = el.Element("ID").Value,
        Selected = el.Element("Selected").Value
    });
This creates for you an IEnumerable<T> where 'T' is an anonymous type with the properties Id and Selected.
You can parse (int.Parse or bool.Parse) the Id and Selected values for type safety. But since you are simply writing to a StringBuilder object you may not care ...just an FYI.
The StringBuilder object can then be filled as follows, where myFormattedString is your own format string (for example "{0},{1}"):
foreach (var sample in samples) {
    csvBuilder.AppendFormat(myFormattedString, sample.Id, sample.Selected);
}
The caveat to this is that your anonymous object and the for-each loop should be within the same scope. But there are ways around that if necessary.
As always, there is more than one way to skin a cat.
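Putting the pieces above together, a minimal sketch might look like the following. The class and method names (SampleCsv, ToCsv) are made up for illustration, and the (string) casts are used instead of .Value on the assumption that a missing element should give an empty field rather than an exception.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class SampleCsv
{
    // Builds one CSV line per <Sample> element, keeping only ID and Selected.
    public static string ToCsv(XDocument doc)
    {
        var csvBuilder = new StringBuilder();
        var samples = doc.Descendants("Sample")
            .Select(el => new
            {
                // The (string) cast yields null instead of throwing
                // when the child element is missing.
                Id = (string)el.Element("ID"),
                Selected = (string)el.Element("Selected")
            });

        foreach (var sample in samples)
        {
            // A null argument is formatted as an empty field.
            csvBuilder.AppendFormat("{0},{1}", sample.Id, sample.Selected);
            csvBuilder.AppendLine();
        }
        return csvBuilder.ToString();
    }
}
```

Calling SampleCsv.ToCsv(csvDocument) on a document containing the Sample element from the question would give a single line, 1,True. Because the line is built from exactly two fields, there is no trailing comma to remove afterwards.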
Update, in reference to a comment:
foreach (XElement node in csvDocument.Descendants("Sample"))
{
    foreach (XElement innerNode in node.Elements())
    {
        // this logic assumes different formatting for values
        // otherwise, change if statement to || each comparison
        if (innerNode.Name == "ID") {
            // append/format stringBuilder
            continue;
        }
        if (innerNode.Name == "Selected") {
            // append/format stringBuilder
            continue;
        }
    }
    csvBuilder.Remove(csvBuilder.Length - 1, 1);
    csvBuilder.AppendLine();
}

Related

How to check XML nodes contained in different XML files for equality?

I have two XML files (file A and file B, where file A is a subset of file B) which I read using the System.Xml.XmlDocument.LoadXml(fileName) method.
I then select nodes within these files using System.Xml.XmlNode.SelectNodes(nodeName). I need to check that each selected XML node in file A is either equal to or a subset of the same node in file B. I also need to check that the order of the subnodes within any node in file A matches the order of those same subnodes within that node in file B.
For example,
fileA
<rootNodeA>
    <elementA>
        <subElementA>content</subElementA>
        <subElementB>content</subElementB>
        <subElementC>content</subElementC>
        <subElementD>content</subElementD>
    </elementA>
    <elementB>
        <subElementA>content</subElementA>
        <subElementB>content</subElementB>
    </elementB>
</rootNodeA>
fileB
<rootNodeB>
    <elementA>
        <subElementB>content</subElementB>
        <subElementD>content</subElementD>
    </elementA>
    <elementB>
        <subElementA>content</subElementA>
    </elementB>
</rootNodeB>
As you see, fileB is a subset of fileA. I need to check that elementA node of file B is equal or a subset of that same elementA node in file A. This should be true for the subnodes (subElementA, etc.) as well and the content of the nodes/subnodes.
Also, if you see elementA in fileA, there are 4 subelements in the order A,B,C,D. For that same elementA in fileB, there are 2 subelements in the order A,D. This order i.e A comes before D is same as the order in file A, need to check this as well.
My idea is to compute Hashes of the nodes and then compare them but unsure of how or if this would satisfy the purpose.
EDIT: Code I have so far,
HashSet<XmlElement> hashA = new HashSet<XmlElement>();
HashSet<XmlElement> hashB = new HashSet<XmlElement>();
foreach (XmlElement node in nodeList)
{
    hashA.Add(node);
}
foreach (XmlElement node in masterNodeList)
{
    hashB.Add(node);
}
isSubset = new HashSet<XmlElement>(hashA).IsSubsetOf(hashB);
return isSubset;

This sounds like a simple recursive function.
I didn't check whether it actually works, but something like this should do it:
public static bool isSubset(XmlElement source, XmlElement target)
{
    var targetChildren = target.ChildNodes.OfType<XmlElement>().ToArray();
    var sourceChildren = source.ChildNodes.OfType<XmlElement>().ToArray(); // list all child tags in source (in order)
    if (targetChildren.Length == 0)
    {
        if (sourceChildren.Length > 0) // surely not the same.
            return false;
        // XmlElement.Value is always null, so compare the text content instead.
        return string.Equals(source.InnerText, target.InnerText);
    }
    var currentSearchIndex = 0; // where we are searching from (just past our last match)
    foreach (var targetChild in targetChildren)
    {
        var findIndex = Array.FindIndex(sourceChildren, currentSearchIndex, el => el.Name == targetChild.Name);
        if (findIndex == -1)
            return false; // not found in source, therefore not a subset.
        if (!isSubset(sourceChildren[findIndex], targetChild))
            return false; // if the child is not a subset, then the parent isn't either.
        currentSearchIndex = findIndex + 1; // move past the match so we won't match nodes that already passed.
    }
    return true; // every child of target was found, in order, in source.
}

Removing invalid child nodes but keep its contents intact..?

I have some xml files that look like sample file
I want to remove invalid xref nodes from it but keep the contents of those nodes as it is.
The way to know whether a xref node is valid is to check its attribute rid's value exactly matches any of the attributes id of any node present in the entire file, so the output file of the above sample should be something like sample output file
The code I've written thus far is below
XDocument doc = XDocument.Load(@"D:\sample\sample.xml", LoadOptions.None);
var ids = from a in doc.Descendants()
          where a.Attribute("id") != null
          select a.Attribute("id").Value;
var xrefs = from x in doc.Descendants("xref")
            where x.Attribute("rid") != null
            select x.Attribute("rid").Value;
if (ids.Any() && xrefs.Any())
{
    foreach (var xref in xrefs)
    {
        if (!ids.Contains(xref))
        {
            string content = File.ReadAllText(@"D:\sample\sample.xml");
            string result = Regex.Replace(content, "<xref ref-type=\"[^\"]+\" rid=\"" + xref + "\">(.*?)</xref>", "$1");
            File.WriteAllText(@"D:\sample\sample.xml", result);
        }
    }
    Console.WriteLine("complete");
}
else
{
    Console.WriteLine("No value found");
}
Console.ReadLine();
The problem is that the rid values can contain characters like ., *, ( etc., which need to be escaped properly in a regex replace, or the replace can mess up the file.
Does anyone have a better solution to the problem?

You don't need regex to do this. Instead, use element.ReplaceWith(element.Nodes()) to replace a node with its children. Sample code:
XDocument doc = XDocument.Load(@"D:\sample\sample.xml", LoadOptions.None);
// use HashSet, since you only use it for lookups
var ids = new HashSet<string>(from a in doc.Descendants()
                              where a.Attribute("id") != null
                              select a.Attribute("id").Value);
// select both element itself (for update), and value of "rid"
var xrefs = from x in doc.Descendants("xref")
            where x.Attribute("rid") != null
            select new { element = x, rid = x.Attribute("rid").Value };
if (ids.Any()) {
    var toUpdate = new List<XElement>();
    foreach (var xref in xrefs) {
        if (!ids.Contains(xref.rid)) {
            toUpdate.Add(xref.element);
        }
    }
    if (toUpdate.Count > 0) {
        foreach (var xref in toUpdate) {
            // replace with contents
            xref.ReplaceWith(xref.Nodes());
        }
        doc.Save(@"D:\sample\sample.xml");
    }
}
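To see the ReplaceWith(Nodes()) idea in isolation, here is a small sketch that works on an in-memory document instead of a file. The names XrefCleaner and UnwrapInvalidXrefs are invented for this example, and processing the matches in reverse document order is my own assumption about how to cope with an invalid xref nested inside another one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class XrefCleaner
{
    // Replaces every <xref> whose rid matches no id in the document
    // with its own child nodes, keeping the content intact.
    public static void UnwrapInvalidXrefs(XDocument doc)
    {
        var ids = new HashSet<string>(
            doc.Descendants().Attributes("id").Select(a => a.Value));

        // ToList() snapshots the matches so the tree can be modified
        // while looping; Reverse() visits inner elements before outer ones.
        var invalid = doc.Descendants("xref")
            .Where(x => x.Attribute("rid") != null
                        && !ids.Contains(x.Attribute("rid").Value))
            .Reverse()
            .ToList();

        foreach (var xref in invalid)
        {
            xref.ReplaceWith(xref.Nodes());
        }
    }
}
```

Since no regular expression is involved, rid values containing characters such as ., * or ( need no escaping. For a paragraph like See <xref rid="s1">one</xref> and <xref rid="a.b*">two</xref>. where only s1 exists as an id, the second xref is unwrapped and the first is left alone.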

Cannot find item name C# XML

I'm having a problem with my XML document.
I want my program to find all values of the items in my XML file, but only if the handlingType has a certain value.
Code (C#) :
string path = "//files//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");
var query = from i in items
            select new
            {
                HandlingName = (string)i.Element("handlingName"),
                HandlingType = (string)i.Element("HandlingType"),
                Mass = (decimal?)i.Element("fMass")
            };
foreach (var HandlingType in items)
{
    if (HandlingType.ToString() == "HANDLING_TYPE_FLYING")
    {
        MessageBox.Show(HandlingType.ToString());
    }
}
The above code demonstrates a short version of what I want to happen, but it fails to find this handlingType (it does not show the message box).
Here's the XML :
<CHandlingDataMgr>
    <HandlingData>
        <Item type="CHandlingData">
            <handlingName>Plane</handlingName>
            <fMass value="380000.000000"/>
            <handlingType>HANDLING_TYPE_FLYING</handlingType>
        </Item>
        <Item type="CHandlingData">
            <handlingName>Car1</handlingName>
            <fMass value="150000.000000"/>
            <handlingType>HANDLING_TYPE_DRIVING</handlingType>
        </Item>
    </HandlingData>
</CHandlingDataMgr>
I would like the output to show the handlingName if the item has a certain handlingType.
For example:
if (handlingType == "HANDLING_TYPE_FLYING")
{
    messageBox.Show(this.HandlingName);
}
My problem in short: the program does not find the item's handling type. It does find the tag, but when asked to display it, it returns empty/shows nothing.
Edit: Also, in the XML, items of type HANDLING_TYPE_FLYING contain extra elements such as thrust that are not present in every item (such as the car). I would like the program to find these elements as well. (This is a second problem I'm facing; maybe I should ask a second question?)

Several things need fixing.
You are not using your query in your foreach loop: foreach (var item in query)
Your element name has an uppercase "H" but should be lowercase "handlingType": HandlingType = (string)i.Element("handlingType"),
You are not pulling the attribute value of your fMass element: Mass = i.Element("fMass").Attribute("value").Value
Once you use your query in the foreach loop, you then need to adjust the loop body to work with your newly made object.
NOTE that I removed the (decimal?) cast from Mass = i.Element("fMass").Attribute("value").Value, so Mass is now a string.
Here is the code with all the fixes.
using System;
using System.Linq;
using System.Windows.Forms;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        const string path = "//files//handling.meta";
        var doc = XDocument.Load(path);
        var items = doc.Descendants("HandlingData").Elements("Item");
        var query = from i in items
                    select new
                    {
                        HandlingName = (string)i.Element("handlingName"),
                        HandlingType = (string)i.Element("handlingType"),
                        Mass = i.Element("fMass").Attribute("value").Value
                    };
        foreach (var item in query)
        {
            if (item.HandlingType == "HANDLING_TYPE_FLYING")
            {
                // Remove the MessageBox calls if this is a console app
                MessageBox.Show(item.HandlingType);
                MessageBox.Show(item.HandlingName);
                Console.WriteLine(item.HandlingType);
                Console.WriteLine(item.HandlingName);
            }
        }
    }
}
I would also recommend looking into deserializing your XML into an object.
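The question's edit also asks about elements that only some items have. A rough sketch of both parts, filtering by handlingType and reading an optional element, could look like this. HandlingQuery, NamesOfType and OptionalValue are names invented here, and the sketch assumes that optional elements carry a numeric value attribute the way fMass does; the real file may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class HandlingQuery
{
    // Returns the handlingName of every Item with the given handlingType.
    public static List<string> NamesOfType(XDocument doc, string handlingType)
    {
        return doc.Descendants("HandlingData").Elements("Item")
            .Where(i => (string)i.Element("handlingType") == handlingType)
            .Select(i => (string)i.Element("handlingName"))
            .ToList();
    }

    // Reads the "value" attribute of a child element that may be absent.
    // Returns null when the element or the attribute is missing.
    public static decimal? OptionalValue(XElement item, string elementName)
    {
        var element = item.Element(elementName);
        return element == null ? null : (decimal?)element.Attribute("value");
    }
}
```

With the XML from the question, NamesOfType(doc, "HANDLING_TYPE_FLYING") would contain only Plane, and OptionalValue(item, "fMass") would give 380000 for that item. Asking for an element the item does not have simply returns null, so the same code can be run over planes and cars alike.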

If you look at http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement(v=vs.110).aspx the ToString() method doesn't return the name of the tag, but the indented XML.
You should use the Value property instead. In C#, == on strings already compares their contents, so either == or .Equals("...") works:
if (handlingType.Value.Equals("HANDLING_TYPE_FLYING"))
{
    messageBox.Show(this.handlingname);
}

Create XML based on text tree

I need to go from a list like this:
/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/bigroom
/home/room2/hugeroom
/home/room3
to an xml file. I've tried using LINQ to XML to do this but I just end up getting confused and not sure what to do from there. Any help is much appreciated!
Edit:
I want the XML file to look something like this:
<home>
<room1>
<subroom>This is a subroom</subroom>
</room1>
<room2>
<miniroom>This is a miniroom</miniroom>
<bigroom>This is a bigroom</bigroom>
<hugeroom>This is a hugeroom</hugeroom>
</room2>
<room3></room3>
</home>
The text inside the tags ("This is a subroom", etc.) is optional, but would be really nice to have!

Ok buddy, here's a solution.
A couple of notes and some explanation.
Your text structure can be split up into lines and then again by the slashes into the names of the XML nodes. If you think of the text in this way, you get a list of "lines", each broken into a list of names.
/home
First of all, the first line, /home, is the root of the XML; we can get rid of it and just create an XDocument object with that name as the root element:
var xDoc = new XDocument(new XElement("home"));
Of course we don't want to hard-code things, but this is just an example. Now, on to the real work. If we store lines such as
/home/room1/
/home/room1/bigroom
etc...
as a List<T>, then it will look like this:
myList = new List<List<string>>();
... [ add the items ]
myList[0][0] = home
myList[0][1] = room1
myList[1][0] = home
myList[1][1] = room1
myList[1][2] = bigroom
So what we can do to get the above structure is use string.Split() multiple times to break your text first into lines, then into parts of each line, and end up with a multidimensional array-style List<T> that contains List<T> objects, in this case, List<List<string>>.
First let's create the container object:
var possibleNodes = new List<List<string>>();
Next, we should split the lines. Let's call the variable that holds the text, "text".
var splitLines = text
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
This gives us a List but our lines are still not broken up. Let's split them again by the slash (/) character. This is where we build our node names. We can do this in a ForEach and just add to our list of possible nodes:
splitLines.ForEach(l =>
possibleNodes.Add(l
.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.ToList()
)
);
Now, we need to know the DEPTH of the XML. Your text shows that there will be 3 nodes of depth. The node depth is the maximum depth of any one given line of nodes, now stored in the List<List<string>>; we can use the .Max() method to get this:
var nodeDepth = possibleNodes.Max(n => n.Count);
A final setup step: We don't need the first line, because it's just "home" and it will be our root node. We can just create an XDocument object and give it this first line to use as the name of Root:
// Create the root node
XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
// We don't need it anymore
possibleNodes.RemoveAt(0);
Ok, here is where the real work happens, let me explain the rules:
We need to loop through the outer list, and through each inner list.
We can use the list indexes to understand which node to add to or which names to ignore
We need to keep hierarchy proper and not duplicate nodes, and some XLinq helps here
The loops - see the comments for a detailed explanation:
// This gets us looping through the outer nodes
for (var i = 0; i < possibleNodes.Count; i++)
{
    // Here we go "sideways" by going through each inner list (each broken down line of the text)
    for (var ii = 1; ii < nodeDepth; ii++)
    {
        // Some lines have more depth than others, so we have to check this here since we are looping on the maximum
        if (ii < possibleNodes[i].Count)
        {
            // Let's see if this node already exists
            var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
            // Let's also see if a parent node was created in the previous loop iteration.
            // This will tell us whether to add the current node at the root level, or under another node
            var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
            // If the current node has already been added, we do nothing (this if statement is not entered into)
            // Otherwise, existingNode will be null and that means we need to add the current node
            if (null == existingNode)
            {
                // Now, use parentNode to decide where to add the current node
                if (null == parentNode)
                {
                    // The parent node does not exist; therefore, the current node will be added to the root node.
                    xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
                }
                else
                {
                    // There IS a parent node for this node!
                    // Therefore, we must add the current node to the parent node
                    // (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
                    var newNode = new XElement(possibleNodes[i][ii]);
                    parentNode.Add(newNode);
                    // Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
                    if (possibleNodes[i].Count - 1 == ii)
                    {
                        newNode.Add(new XText("This is a " + newNode.Name.LocalName));
                    }
                }
            }
        }
    }
}
The bonus here is this code will work with any number of nodes and build your XML.
To check it, XDocument has a nifty overridden .ToString() implementation that just spits out all of the XML it is holding, so all you do is this:
Console.Write(xDoc.ToString());
And, you'll get this result:
(Note I added a test node to make sure it works with more than 3 levels)
Below, you will find the entire program with your test text, etc, as a working solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace XmlFromTextString
{
    class Program
    {
        static void Main(string[] args)
        {
            // This simulates text from a file; note that it must be flush to the left of the screen or else the extra spaces
            // add unneeded nodes to the lists that are generated; for simplicity of code, I chose not to implement clean-up of that and just
            // ensure that the string literal is not indented from the left of the Visual Studio screen.
            string text =
@"/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/test/thetest
/home/room2/bigroom
/home/room2/hugeroom
/home/room3";
            var possibleNodes = new List<List<string>>();
            var splitLines = text
                .Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
                .ToList();
            splitLines.ForEach(l =>
                possibleNodes.Add(l
                    .Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
                    .ToList()
                )
            );
            var nodeDepth = possibleNodes.Max(n => n.Count);
            // Create the root node
            XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
            // We don't need it anymore
            possibleNodes.RemoveAt(0);
            // This gets us looping through the outer nodes
            for (var i = 0; i < possibleNodes.Count; i++)
            {
                // Here we go "sideways" by going through each inner list (each broken down line of the text)
                for (var ii = 1; ii < nodeDepth; ii++)
                {
                    // Some lines have more depth than others, so we have to check this here since we are looping on the maximum
                    if (ii < possibleNodes[i].Count)
                    {
                        // Let's see if this node already exists
                        var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
                        // Let's also see if a parent node was created in the previous loop iteration.
                        // This will tell us whether to add the current node at the root level, or under another node
                        var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
                        // If the current node has already been added, we do nothing (this if statement is not entered into)
                        // Otherwise, existingNode will be null and that means we need to add the current node
                        if (null == existingNode)
                        {
                            // Now, use parentNode to decide where to add the current node
                            if (null == parentNode)
                            {
                                // The parent node does not exist; therefore, the current node will be added to the root node.
                                xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
                            }
                            else
                            {
                                // There IS a parent node for this node!
                                // Therefore, we must add the current node to the parent node
                                // (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
                                var newNode = new XElement(possibleNodes[i][ii]);
                                parentNode.Add(newNode);
                                // Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
                                if (possibleNodes[i].Count - 1 == ii)
                                {
                                    newNode.Add(new XText("This is a " + newNode.Name.LocalName));
                                    // For the same default text on all child-less nodes, use this:
                                    // newNode.Add(new XText("This is default text"));
                                }
                            }
                        }
                    }
                }
            }
            Console.Write(xDoc.ToString());
            Console.ReadKey();
        }
    }
}

Time for LINQ magic?
// load file into string[]
var input = File.ReadAllLines("TextFile1.txt");
// in case you have more than one home in your file
var homes =
    new XDocument(
        new XElement("root",
            from line in input
            let items = line.Split(new[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
            group items by items[0] into g
            select new XElement(g.Key,
                from rooms in g.OrderBy(x => x.Length).Skip(1)
                group rooms by rooms[1] into g2
                select new XElement(g2.Key,
                    from name in g2.OrderBy(x => x.Length).Skip(1)
                    select new XElement(name[2], string.Format("This is a {0}", name[2]))))));
// get the right home
var home = new XDocument(homes.Root.Element("home"));
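For comparison, here is a sketch of a third way to do it, which is not taken from either answer: walk each path segment by segment and get or create the matching child under the current element. PathTree and Build are invented names. The sketch assumes that every path starts with the same first segment and that every segment is a valid XML element name, and it leaves out the optional "This is a ..." text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answers.
public static class PathTree
{
    // Builds nested elements from paths such as "/home/room1/subroom".
    // Returns null when no usable path is supplied.
    public static XElement Build(IEnumerable<string> paths)
    {
        XElement root = null;
        foreach (var path in paths)
        {
            var parts = path.Trim()
                .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0) continue;

            // Assumption: all paths share the first segment, which becomes the root.
            if (root == null) root = new XElement(parts[0]);

            var current = root;
            foreach (var name in parts.Skip(1))
            {
                // Look only among the direct children of the current element,
                // so equal names under different parents do not collide.
                var child = current.Element(name);
                if (child == null)
                {
                    child = new XElement(name);
                    current.Add(child);
                }
                current = child;
            }
        }
        return root;
    }
}
```

Because the lookup is always relative to the current parent, this works to any depth and does not depend on the order of the input lines. PathTree.Build(File.ReadAllLines("TextFile1.txt")) would give a home element with room1, room2 and room3 as children for the list in the question.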

C# Treeview checking if node exists

I'm trying to populate a treeview from an XmlDocument.
The Root of the tree is set as 'Scripts' and from the root the next level should be 'Departments' which is within the XML script. I can get data from the XML document no problem. My question is when looping through the XmlDocument and adding nodes to the root node, I want to ensure that if a department is already within the treeview then it is not added again. I should also add that each Department also has a list of scripts that need to be child nodes of the department.
My code so far is:
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(scriptInformation);
TreeNode t1;
TreeNode rootNode = new TreeNode("Script View");
treeView1.Nodes.Add(rootNode);
foreach (XmlNode node in xDoc.SelectNodes("//row"))
{
t1 = new TreeNode(node["DEPARTMENT"].InnerXml);
//How to check if node already exists in treeview?
}
Thanks.

// ContainsKey only matches nodes that were added with a key (their Name)
if (treeView1.Nodes.ContainsKey("DEPARTMENT")) {
    //...
}
EDIT: Recursive method:
bool exists = false;
foreach (TreeNode node in treeView1.Nodes) {
    if (NodeExists(node, "DEPARTMENT")) {
        exists = true;
        break;
    }
}
private bool NodeExists(TreeNode node, string key) {
    // check the node itself, then search its children recursively
    if (node.Text == key) {
        return true;
    }
    foreach (TreeNode subNode in node.Nodes) {
        if (NodeExists(subNode, key)) {
            return true;
        }
    }
    return false;
}

Depending upon the size of your XML file, you could consider using an associated List for fast lookup. As you add each node to the TreeView also add it to the List.
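One way to sketch that idea is with a HashSet<string> rather than a List, since only membership tests are needed. The class and method names below are made up, and the XML shape (row elements with a DEPARTMENT child) is taken from the code in the question. The TreeView calls are left out so the sketch stays independent of Windows Forms.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class DepartmentFilter
{
    // Returns each department name once, in order of first appearance.
    public static List<string> DistinctDepartments(XmlDocument xDoc)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (XmlNode row in xDoc.SelectNodes("//row"))
        {
            var departmentElement = row["DEPARTMENT"];
            if (departmentElement == null) continue; // row without a department

            var department = departmentElement.InnerText;
            // HashSet.Add returns false when the value is already present.
            if (seen.Add(department))
            {
                result.Add(department);
            }
        }
        return result;
    }
}
```

Each name returned would then get exactly one TreeNode under the root, so no lookup in the TreeView itself is needed while parsing.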

If your XML document has a set structure where 'Departments' will always be indexed at 1;
ie:
index:[0] Scripts
index:[1] Department
index:[2] Script
index:[1] Department2
index:[2] Script
Then you could encapsulate the following code into a method where 'name' is a string parameter and the return type is boolean.
foreach (TreeNode node in uxTreeView.Nodes[0].Nodes) {
    if (node.Name.ToLower() == name.ToLower()) {
        return true;
    }
}
return false;
The idea is that you would call that function each time you encounter a 'Department' node in your XML, before creating the TreeNode.
Full example:
private bool DepartmentNodeExists(string name) {
    foreach (TreeNode node in uxTreeView.Nodes[0].Nodes) {
        if (node.Name.ToLower() == name.ToLower()) {
            return true;
        }
    }
    return false;
}
Lastly, the easy way:
private bool DepartmentNodeExists(string name) {
    // ContainsKey lives on the node collection, not on TreeNode itself
    return uxTreeView.Nodes[0].Nodes.ContainsKey(name);
}
These are all just refactored and encapsulated into their own named methods; you could of course just call:
if (uxTreeView.Nodes[0].Nodes.ContainsKey(name)) {
    // do not create TreeNode
}
...during your parsing of your XML. PS. These examples all assume that you have the first root node in the TreeView already created and added to the TreeView.

http://www.vbdotnetforums.com/listviews-treeviews/13278-treeview-search.html#post39625
http://forums.asp.net/t/1645725.aspx/1?Check+if+child+Node+exists+on+treeview

You can do something like this:
TreeNode parentNode = t1.Parent;
if (parentNode != null)
{
    if (parentNode.Nodes.Cast<TreeNode>().ToList().Find(t => t.Text.Equals(node["DEPARTMENT"].InnerXml)) == null)
    {
        //Add node
    }
}
else
{
    bool isFound = true;
    if (treeView1.Nodes.Cast<TreeNode>().ToList().Find(t => t.Text.Equals(node["DEPARTMENT"].InnerXml)) == null)
    {
        isFound = false;
    }
    if (!isFound)
    {
        //Add node
    }
}

Not sure about the document structure...
Couldn't you use LINQ to XML, load the document, get the distinct rows (row = department?) and consider only those elements when creating TreeNodes? That is more efficient than trying to find out whether a node with such a text has already been added.
For example:
var rows = (from row in XDocument.Load(document).Root.Elements("row")
            select row
           ).Distinct(new SampleElementComparerOnNameAttribute());
Here the EqualityComparer compares the "name" attribute value, assuming the document structure to be
<rows><row name='dep1'><script>script1</script><script>script2</script></row><row name='dep1'><script>script3</script><script>script4</script></row></rows>
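The answer does not show SampleElementComparerOnNameAttribute itself. A minimal sketch of what such a comparer could look like, assuming that two rows count as equal exactly when their name attributes match, is:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// A guess at the comparer named in the answer; the original
// implementation is not shown there.
public class SampleElementComparerOnNameAttribute : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // The (string) cast yields null when the attribute is missing.
        return (string)x.Attribute("name") == (string)y.Attribute("name");
    }

    public int GetHashCode(XElement obj)
    {
        // Elements that compare equal must produce the same hash code.
        var name = obj == null ? null : (string)obj.Attribute("name");
        return name == null ? 0 : name.GetHashCode();
    }
}
```

With the sample document above, Distinct would then yield a single row for dep1. Note that Distinct keeps only the first row of each department, so the scripts of later rows with the same name would still have to be collected separately, for example with GroupBy.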

I use:
string department = node["DEPARTMENT"].InnerXml;
TreeNode departmentNode = parentNode.Nodes[department] ?? parentNode.Nodes.Add(department, department);
That line guarantees that a lookup of the value department is done first; if it is not found, the node is created. You have to pass department twice to Add() so that the new node gets a key you can look up with .Nodes[department].

It depends on the structure of your input. Since you don't show how exactly you add your subnodes I can only point you towards either the Contains or the ContainsKey method of the Nodes property, either of the treeView1 itself, or of any subnodes you add. You should use an overload of the Add method to specify a key name to simplify lookup.
