Extracting XElement children and grandchildren by name

I have an XElement (myParent) containing multiple levels of children that I wish to extract data from. The elements of interest are at known locations in the parent.
I understand that I am able to get a child element by:
myParent.Element(childName);
or
myParent.Element(level1).Element(childName);
I am having trouble figuring out how to do this when I want to loop through a list of elements that sit at different levels. For instance, I am interested in getting the following set of elements:
myParent.Element("FieldOutputs").Element("Capacity");
myParent.Element("EngOutputs").Element("Performance")
myParent.Element("EngOutputs").Element("Unit").Element("Efficiency")
How can I define these locations in an array so that I can simply loop through the array?
i.e.
string[] myStringArray = {"FieldOutputs.Capacity", "EngOutputs.Performance", "EngOutputs.Unit.Efficiency"};
for (int i = 0; i < myStringArray.Length; i++)
{
    XElement myElement = myParent.Element(myStringArray[i]);
}
I understand that the method above does not work, but just wanted to show effectively what I am trying to achieve.
Any feedback is appreciated.
Thank you,
Justin

While normally I'm reluctant to suggest using XPath, it's probably the most appropriate approach here, using XPathSelectElement:
// Requires: using System.Xml.XPath; (XPathSelectElement is an extension method)
string[] paths = { "FieldOutputs/Capacity", "EngOutputs/Performance",
                   "EngOutputs/Unit/Efficiency" };
foreach (string path in paths)
{
    XElement element = myParent.XPathSelectElement(path);
    if (element != null)
    {
        // ...
    }
}
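If you would rather keep the dot-separated strings from the question and avoid XPath, one option is a small extension method that walks the path one Element() call at a time. This is only a sketch: ElementByPath and the sample XML are made up for illustration and are not part of LINQ to XML.

```csharp
using System;
using System.Xml.Linq;

static class XElementPathExtensions
{
    // Hypothetical helper (not part of LINQ to XML): follows a dot-separated
    // path such as "EngOutputs.Unit.Efficiency" from the given parent.
    // Returns null as soon as any step of the path is missing.
    public static XElement ElementByPath(this XElement parent, string dottedPath)
    {
        XElement current = parent;
        foreach (string name in dottedPath.Split('.'))
        {
            if (current == null)
            {
                return null;
            }
            current = current.Element(name);
        }
        return current;
    }
}

static class Program
{
    static void Main()
    {
        // Sample data invented for this example.
        XElement myParent = XElement.Parse(
            "<Root>" +
            "<FieldOutputs><Capacity>10</Capacity></FieldOutputs>" +
            "<EngOutputs><Performance>0.9</Performance>" +
            "<Unit><Efficiency>0.8</Efficiency></Unit></EngOutputs>" +
            "</Root>");

        string[] paths = { "FieldOutputs.Capacity", "EngOutputs.Performance",
                           "EngOutputs.Unit.Efficiency", "EngOutputs.Missing.Value" };

        foreach (string path in paths)
        {
            XElement element = myParent.ElementByPath(path);
            Console.WriteLine(path + " = " + (element == null ? "(not found)" : element.Value));
        }
    }
}
```

Unlike a chain of Element() calls, this returns null for a missing intermediate element instead of throwing a NullReferenceException, so the caller only needs one null check per path.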

The Descendants() method is what you're looking for, I believe. For example:
var descendants = myParent.Descendants();
foreach (var e in descendants) {
...
}
http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement.descendants.aspx
Edit:
Looking at your question more closely, it looks like you may want to use XPathSelectElements():
var descendants = myParent.XPathSelectElements("./FieldOutputs/Capacity | ./EngOutputs/Performance | ./EngOutputs/Unit/Efficiency");
http://msdn.microsoft.com/en-us/library/bb351355.aspx

Related

A special C# Tree algorithm in Umbraco CMS

I'm creating a special tree algorithm and I need a bit of help with the code that I currently have, but before you take a look at it, please let me explain what it is meant to do.
I have a tree structure and I'm interacting with a node (any of the nodes in the tree; these nodes are Umbraco CMS classes). Upon interaction I traverse the tree up to the top (to the root) and store the visited nodes in a global collection (List<Node> in this particular case). So far, so good. But when another node is interacted with, I must check whether the list already contains the parents of the clicked node. If it contains every parent and doesn't contain this node, then the interaction is on the lowest level (I hope you are still with me?).
Unfortunately, calling Contains() in Umbraco CMS doesn't detect that the list already contains the values, so the same values are added all over again even though I added the Contains() check.
Can anyone give me a hand here if they have already met such a problem? I exchanged the Contains() function for the Except and Union functions, and they yield the same result: the list still contains duplicates.
var currentValue = (string)CurrentPage.technologies;
List<Node> globalNodeList = new List<Node>();
string[] result = currentValue.Split(',');
foreach (var item in result)
{
var node = new Node(int.Parse(item));
if (globalNodeList.Count > 0)
{
List<Node> nodeParents = new List<Node>();
if (node.Parent != null)
{
while (node != null)
{
if (!nodeParents.Contains(node))
{
nodeParents.Add(node);
}
node = (Node)node.Parent;
}
}
else { globalNodeList.Add(node); }
if (nodeParents.Count > 0)
{
var differences = globalNodeList.Except<Node>(globalNodeList);
globalNodeList = globalNodeList.Union<Node>(differences).ToList<Node>();
}
}
else
{
if (node.Parent != null)
{
while (node != null)
{
globalNodeList.Add(node);
node = (Node)node.Parent;
}
}
else
{
globalNodeList.Add(node);
}
}
}
}

If I understand your question, you only want to see if a particular node is an ancestor of another node. If so, just do a string check on the Path property of the node. The Path property is a comma-separated string of IDs, so there is no need to build the list yourself.
Just myNode.Path.Contains(",1001") will work.
Some small remarks:
If you are using Umbraco 6, use IPublishedContent instead of Node.
If you do want to build a list like this, I would rather pass multiple IDs to the Umbraco helper and let Umbraco build the list (from cache).
For the second remark, you are able to do this:
var myList = Umbraco.Content(1001,1002,1003);
or with an array/list:
var myList = Umbraco.Content(someNode.Path.Split(','));
and because you are crawling up to the root, you might need to add a .Reverse()
More information about the UmbracoHelper can be found in the documentation: http://our.umbraco.org/documentation/Reference/Querying/UmbracoHelper/
If you are using Umbraco 4 you can use @Library.NodesById(...)
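One caveat with a plain substring check on the path: ",1001" also matches a longer ID such as 10015. A minimal sketch of a stricter check, using only plain strings, could look like the following. PathContainsId and the sample path are hypothetical names and data for illustration, not Umbraco APIs, and the sketch assumes the path is a simple comma-separated list of IDs.

```csharp
using System;
using System.Linq;

static class PathCheck
{
    // Hypothetical helper: true if id appears as a whole entry in a
    // comma-separated path string such as "-1,1000,1001,1050".
    public static bool PathContainsId(string path, int id)
    {
        return path.Split(',').Contains(id.ToString());
    }
}

static class Program
{
    static void Main()
    {
        string path = "-1,1000,10015"; // example path made up for illustration

        Console.WriteLine(path.Contains(",1001"));               // True (false positive: matches 10015)
        Console.WriteLine(PathCheck.PathContainsId(path, 1001)); // False
        Console.WriteLine(PathCheck.PathContainsId(path, 1000)); // True
    }
}
```

If the path also includes the node's own ID as its last entry, a strict "is ancestor" test would need to exclude that last entry.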

Better way to use LINQ To XML for an HTML Page

I am looking for specific items on a web page.
What I did (to test, so far) works just fine, but it looks really ugly to me. I would like suggestions for doing this more concisely, that is, with ONE LINQ query instead of the two I have now.
document.GetXDocument();
string xmlns = "{http://www.w3.org/1999/xhtml}";
var AllElements = from AnyElement in document.fullPage.Descendants(xmlns + "div")
where AnyElement.Attribute("id") != null && AnyElement.Attribute("id").Value == "maincolumn"
select AnyElement;
// this first query bring only one LARGE Element.
XDocument subdocument = new XDocument(AllElements);
var myElements = from item in subdocument.Descendants(xmlns + "img")
where String.IsNullOrEmpty(item.Attribute("src").Value.Trim()) != true
select item;
foreach (var element in myElements)
{
Console.WriteLine(element.Attribute("src").Value.Trim());
}
Assert.IsNotNull(myElements.Count());
I know I could directly look for "img", but I want to be able to get other types of items in those pages, like links and some text.
I strongly doubt this is the best way!

The same logic in a single query:
var myElements = from element in document.fullPage.Descendants(xmlns + "div")
where element.Attribute("id") != null
&& element.Attribute("id").Value == "maincolumn"
from item in new XDocument(element).Descendants(xmlns + "img")
where !String.IsNullOrEmpty(item.Attribute("src").Value.Trim())
select item;

If you insist on parsing the web page as XML, try this:
var elements =
    from element in document.Descendants(xmlns + "div")
    where (string)element.Attribute("id") == "maincolumn"
    from element2 in element.Descendants(xmlns + "img")
    let src = ((string)element2.Attribute("src")).Trim()
    where !String.IsNullOrEmpty(src) // keep only images with a non-empty src
    select new {
        element2,
        src
    };
foreach (var item in elements) {
    Console.WriteLine(item.src);
}
Notes:
What is the type of document? I am assuming it's an XDocument. If that is the case, you can use Descendants directly on XDocument. (OTOH, if document is an XDocument, where does that fullPage property come from?)
Cast the XAttribute to a string. If the attribute is missing, the result of the cast will be null. This saves the double check. (This doesn't offer any performance benefits.)
Use let to "save" a value for later reuse, in this case for use in the foreach. Unless all you need is that final Assert, in which case it might be more efficient to use Any instead of Count. Any only has to iterate over the first result in order to return a value; Count has to iterate over all of them.
Why is subdocument of type XDocument? Wouldn't XElement be the appropriate type?
You can also use String.IsNullOrWhiteSpace to check for whitespace in src, instead of trimming and calling String.IsNullOrEmpty, assuming you want to process the src as is, with any whitespace it might have.

Removing default namespace attributes in XML with C# - can't pass object by ref and then iterate

I'm currently working on a buggy bit of code that's designed to strip out all the namespaces from an XML document and re-add them in the header. We use it because we ingest very large xml documents and then re-serve them in small fragments, so each item needs to replicate the namespaces in the parent document.
The XML is first loaded as an XmlDocument and then passed to a function that removes the namespaces:
_fullXml = new XmlDocument();
_fullXml.LoadXml(itemXml);
RemoveNamespaceAttributes(_fullXml.DocumentElement);
The remove function iterates through the whole document looking for namespaces and removing them. It looks like this:
private void RemoveNamespaceAttributes(XmlNode node){
if (node.Attributes != null)
{
for (int i = node.Attributes.Count - 1; i >= 0; i--)
{
if (node.Attributes[i].Name.Contains(':') || node.Attributes[i].Name == "xmlns")
node.Attributes.Remove(node.Attributes[i]);
}
}
foreach (XmlNode n in node.ChildNodes)
{
RemoveNamespaceAttributes(n);
}
}
However, I've discovered that it doesn't work - it leaves all the namespaces intact.
If you iterate through the code with the debugger then it looks to be doing what it's supposed to - the nodes objects have their namespace attributes removed. But the original _fullXml document remains untouched. I assume this is because the function is looking at a clone of the data passed to it, rather than the original data.
So my first thought was to pass it by ref. But I can't do that because the iterative part of the function inside the foreach loop has a compile error - you can't pass the object n by reference.
Second thought was to pass the whole _fullXml document but that doesn't work either, guessing because it's still a clone.
So it looks like I need to solve the problem of passing the document by ref and then iterating through the nodes to remove all namespaces. This will require re-designing this code fragment obviously, but I can't see a good way to do it. Can anyone help?
Cheers,
Matt

Stripping the namespaces could be done like this:
void StripNamespaces(XElement input, XElement output)
{
foreach (XElement child in input.Elements())
{
XElement clone = new XElement(child.Name.LocalName);
output.Add(clone);
StripNamespaces(child, clone);
}
foreach (XAttribute attr in input.Attributes())
{
try
{
output.Add(new XAttribute(attr.Name.LocalName, attr.Value));
}
catch (Exception e)
{
// Decide how to handle duplicate attributes
//if(e.Message.StartsWith("Duplicate attribute"))
//output.Add(new XAttribute(attr.Name.LocalName, attr.Value));
}
}
}
You can call it like so:
XElement result = new XElement("root");
StripNamespaces(NamespaceXml, result);
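Note that the version above copies elements and attributes only, so text content is not carried over, and xmlns declarations are copied as ordinary attributes. A variant that also keeps text and skips the declarations might look like the sketch below. NamespaceStripper and the sample XML are made up for illustration; XAttribute.IsNamespaceDeclaration and the cloning of already-parented nodes are standard LINQ to XML behaviour.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NamespaceStripper
{
    // Returns a namespace-free copy of source. Element and attribute names are
    // reduced to their local names, xmlns declarations are dropped, and text,
    // comments and other non-element nodes are copied as they are.
    public static XElement StripNamespaces(XElement source)
    {
        return new XElement(
            source.Name.LocalName,
            source.Attributes()
                  .Where(a => !a.IsNamespaceDeclaration)
                  .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            source.Nodes()
                  .Select(n => n is XElement child ? (XNode)StripNamespaces(child) : n));
    }
}

static class Program
{
    static void Main()
    {
        // Sample input invented for this example.
        XElement input = XElement.Parse(
            "<a:root xmlns:a=\"urn:a\" xmlns=\"urn:default\" a:id=\"1\">" +
            "<item>text</item></a:root>");

        XElement result = NamespaceStripper.StripNamespaces(input);

        // prints: <root id="1"><item>text</item></root>
        Console.WriteLine(result.ToString(SaveOptions.DisableFormatting));
    }
}
```

As with the original, two attributes that differ only by namespace prefix collapse to the same local name, and the XElement constructor will then throw, so that case still needs an explicit decision.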

I'm not 100% sure there aren't failure cases with this but it occurs to me that you can do
string x = Regex.Replace(xml, @"(xmlns:?|xsi:?)(.*?)=""(.*?)""", "");
on the raw xml to get rid of namespaces.
It's probably not the best way to solve this but I thought I'd put it out there.

linq looping through tags with the same name

I'm somewhat new to LINQ and could use some help.
I have an xml file that looks like this:
<InputPath>
<path isRename="Off" isRouter="Off" pattern="pattern-1">d:\temp1</path>
<path isRename="Off" isRouter="pattern-1">d:\temp2</path>
</InputPath>
I need to loop through and get the key values of the tag "path".
What I have so far is
var results = from c in rootElement.Descendants("InputPath") select c;
foreach (XElement _path in results)
{
string value = _path.Element("path").Value;
}
But I only get the last <path> value. Any help would be great.

Have you tried just enumerating the path items?
foreach (var element in rootElement.Descendants("path"))
{
var value = element.Value;
}

You'll only get the first element that way, because that's what the Element method gives you: the first child element with the given name.
If you want multiple elements you can just use Elements instead:
// Note: the query expression here is pointless.
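var results = from c in rootElement.Descendants("InputPath") select c;
foreach (XElement _path in results)
{
    // Elements returns a sequence, so loop over it to reach each Value.
    foreach (XElement path in _path.Elements("path"))
    {
        string value = path.Value;
        // Use value here...
    }
}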
Alternatively, use the Elements extension method and do it all in one go:
foreach (var path in rootElement.Descendants("InputPath").Elements("path"))
{
string value = path.Value;
// Use value here
}
If that doesn't help, please give more information about what you're trying to do and what the problem is.
If by "last" you mean "the element contents" that's because you're using the Value property. If you want the attributes within the path element, you need the Attribute method, as shown by IamStalker, although personally I'd usually cast the XAttribute to string (or whatever) rather than using the Value property, in case the attribute is missing. (It depends on what you want the behaviour to be in that case.)

What you need is to loop through the path elements and read their attributes, like so:
foreach (XElement xElem in rootElement.Descendants("path"))
{
    string isRename = xElem.Attribute("isRename").Value;
}
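Putting the two suggestions together (enumerate every path element, and cast attributes to string so that a missing attribute yields null instead of an exception), a self-contained sketch might look like this. The PathReader class, the Config root element and the "(none)" placeholder are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class PathReader
{
    // Hypothetical helper: describes every <path> under <InputPath>.
    public static IEnumerable<string> Describe(XElement rootElement)
    {
        foreach (XElement path in rootElement.Descendants("InputPath").Elements("path"))
        {
            // The string cast returns null when the attribute is absent,
            // whereas .Value would throw a NullReferenceException.
            string isRename = (string)path.Attribute("isRename");
            string pattern = (string)path.Attribute("pattern") ?? "(none)";
            yield return path.Value + ": isRename=" + isRename + ", pattern=" + pattern;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Sample document; the second path deliberately has no pattern attribute.
        XElement rootElement = XElement.Parse(
            "<Config><InputPath>" +
            "<path isRename=\"Off\" isRouter=\"Off\" pattern=\"pattern-1\">d:\\temp1</path>" +
            "<path isRename=\"Off\">d:\\temp2</path>" +
            "</InputPath></Config>");

        foreach (string line in PathReader.Describe(rootElement))
        {
            Console.WriteLine(line);
        }
    }
}
```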

TreeNodeCollection reference problem

First off we have the almighty code!
List<string> nodes = new List<string>();
TreeNode Temp = new TreeNode();
TreeNodeCollection nodeList = treeViewTab4DirectoryTree.Nodes;
while (nodeList.Count != 0)
{
Temp = nodeList[0];
while (Temp.FirstNode != null)
{
Temp = Temp.FirstNode;
}
if (!nodes.Contains(Temp.FullPath))
{
nodes.Add(Temp.Text);
}
nodeList.Remove(Temp);
}
Now the problem: I have written the code above with the intent of creating a List containing the text from all the nodes in the tree. That works perfectly. The problem I am having is when I remove the nodes from my variable they are also being removed from the actual list. The question would be how can I make a copy of the list of nodes so I can play with them without messing with the ACTUAL list. How do I make a copy of it without just making a reference to it? Any help will be greatly appreciated!

Your problem arises because "nodeList" is a reference to treeViewTab4DirectoryTree.Nodes, rather than a copy of it.
The solution depends entirely on what type of TreeNodeCollection you're using (WinForms, ASP.NET, something else?), as you'll need to look for a .Copy(), .Clone(), .ToArray() method or similar to take a copy of the contents of the collection, rather than a reference to the existing collection.
If, for example, you're using ASP.NET and thus the System.Web.UI.WebControls.TreeNodeCollection, you could use the .CopyTo method in a way similar to this:
// CopyTo needs an existing array that is large enough; passing null throws.
TreeNode[] x = new TreeNode[treeViewTab4DirectoryTree.Nodes.Count];
treeViewTab4DirectoryTree.Nodes.CopyTo(x, 0);

Updated to show stack based approach:
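// Requires: using System.Collections; using System.Collections.Generic;
List<string> result = new List<string>();
Stack<IEnumerator> nodeColls = new Stack<IEnumerator>();
IEnumerator nodes = treeViewTab4DirectoryTree.Nodes.GetEnumerator();
nodeColls.Push(null); // sentinel: popping it ends the outer loop
while (nodes != null)
{
    while (nodes.MoveNext())
    {
        // The non-generic enumerator returns object, so cast to TreeNode.
        TreeNode current = (TreeNode)nodes.Current;
        result.Add(current.FullPath);
        if (current.FirstNode != null)
        {
            // Remember where we were, then descend into the children.
            nodeColls.Push(nodes);
            nodes = current.Nodes.GetEnumerator();
        }
    }
    nodes = nodeColls.Pop();
}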
The code below does not work as was mentioned in comments, because it doesn't traverse the entire tree, but only takes the first leaf node of each top-level branch.
I actually thought the original code (in the question) did so too, because I thought the Remove would actually remove the top-level node after finding the first leaf node under it; but instead, it tries to remove the leaf node from the collection of top-level nodes, and just ignores it if it can't find it.
Original post, non-functioning code
First of all, why do you need to remove the items from your list?
List<string> nodes = new List<string>();
foreach (TreeNode tn in treeViewTab4DirectoryTree.Nodes)
{
TreeNode Temp = tn;
while (Temp.FirstNode != null)
{
Temp = Temp.FirstNode;
}
if (!nodes.Contains(Temp.FullPath))
{
nodes.Add(Temp.Text);
}
}
To answer your concrete question, assuming the Nodes collection implements IEnumerable (TreeNodeCollection only implements the non-generic one, hence the Cast from System.Linq), use:
List<TreeNode> nodeList = treeViewTab4DirectoryTree.Nodes.Cast<TreeNode>().ToList();
If you do decide to stick with your while loop, you can save an instantiation by changing
TreeNode Temp = new TreeNode();
to
TreeNode Temp = null;
... you're never actually using the object you create, at least in the part of the code you've shown.
