Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?
Thanks!
Okay, I couldn't resist having a go at it. It'll only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise there may very well be a cleaner way of doing it.
It is superfluous to include the index on every element (particularly the root one!) but it's easier than trying to work out whether there's any ambiguity otherwise.
using System;
using System.Text;
using System.Xml;
class Test
{
static void Main()
{
string xml = @"
<root>
<foo />
<foo>
<bar attr='value'/>
<bar other='va' />
</foo>
<foo><bar /></foo>
</root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
XmlNode node = doc.SelectSingleNode("//@attr");
Console.WriteLine(FindXPath(node));
Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
}
static string FindXPath(XmlNode node)
{
StringBuilder builder = new StringBuilder();
while (node != null)
{
switch (node.NodeType)
{
case XmlNodeType.Attribute:
builder.Insert(0, "/@" + node.Name);
node = ((XmlAttribute) node).OwnerElement;
break;
case XmlNodeType.Element:
int index = FindElementIndex((XmlElement) node);
builder.Insert(0, "/" + node.Name + "[" + index + "]");
node = node.ParentNode;
break;
case XmlNodeType.Document:
return builder.ToString();
default:
throw new ArgumentException("Only elements and attributes are supported");
}
}
throw new ArgumentException("Node was not in a document");
}
static int FindElementIndex(XmlElement element)
{
XmlNode parentNode = element.ParentNode;
if (parentNode is XmlDocument)
{
return 1;
}
XmlElement parent = (XmlElement) parentNode;
int index = 1;
foreach (XmlNode candidate in parent.ChildNodes)
{
if (candidate is XmlElement && candidate.Name == element.Name)
{
if (candidate == element)
{
return index;
}
index++;
}
}
throw new ArgumentException("Couldn't find element within parent");
}
}
Jon's correct that there are any number of XPath expressions that will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:
/node()[1]/node()[2]/node()[6]/node()[1]/node()[2]
Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes aren't child nodes of their element and don't have a position; you can only find them by name), but it will find all other node types.
To build this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode doesn't expose that as a property:
static int GetNodePosition(XmlNode child)
{
for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
{
if (child.ParentNode.ChildNodes[i] == child)
{
// tricksy XPath, not starting its positions at 0 like a normal language
return i + 1;
}
}
throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}
(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
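For what it's worth, here is a rough sketch of what the LINQ route might look like. The class and method names (NodePositionSketch, GetNodePositionLinq) are made up for illustration and are not part of the answer above. It assumes the node has a parent, so it is not meant for attributes.

```csharp
using System.Linq;
using System.Xml;

static class NodePositionSketch
{
    // Hypothetical LINQ counterpart of GetNodePosition (illustration only).
    // Assumes child.ParentNode is not null, so attributes are out of scope.
    public static int GetNodePositionLinq(XmlNode child)
    {
        // XmlNodeList only implements the non-generic IEnumerable,
        // so Cast<XmlNode>() is needed before any other LINQ operator.
        int precedingSiblings = child.ParentNode.ChildNodes
            .Cast<XmlNode>()
            .TakeWhile(sibling => !ReferenceEquals(sibling, child))
            .Count();

        // XPath positions are 1-based.
        return precedingSiblings + 1;
    }
}
```

Whether this is actually more elegant than the plain for loop is a matter of taste; it does the same amount of work.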
Then you can write a recursive method like this:
static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
GetNodePosition(node)
);
}
As you can see, I hacked in a way for it to find attributes as well.
Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.
I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they're using the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)
This feels like it's a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types generically.
There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements are, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
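To make the point about node() concrete, here is a small sketch against a made-up mixed-content document. The class name, the TypeAt helper and the sample XML are assumptions for illustration only.

```csharp
using System.Xml;

static class GenericNodeTestSketch
{
    // Returns the type of the node found at the given path in a small
    // mixed-content sample: text, a comment, an element, then more text.
    public static XmlNodeType TypeAt(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root>hello<!--note--><child/>world</root>");
        return doc.SelectSingleNode(xpath).NodeType;
    }
}
```

With this sample, TypeAt("/node()[1]/node()[1]") should come back as Text, position 2 as Comment, position 3 as Element and position 4 as Text again. A path built only from element names can reach just one of those four children.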
I know, old post but the version I liked the most (the one with names) was flawed:
When a parent node has nodes with different names, it stopped counting the index after it found the first non-matching node-name.
Here is my fixed version of it:
/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
// Get the Index
int indexInParent = 1;
XmlNode siblingNode = node.PreviousSibling;
// Loop thru all Siblings
while (siblingNode != null)
{
// Increase the Index if the Sibling has the same Name
if (siblingNode.Name == node.Name)
{
indexInParent++;
}
siblingNode = siblingNode.PreviousSibling;
}
// the path to a node is the path to its parent, plus "/name[n]", where n is its position among its siblings with the same name.
return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}
Here's a simple method that I've used, worked for me.
static string GetXpath(XmlNode node)
{
if (node.Name == "#document")
return String.Empty;
return GetXpath(node.SelectSingleNode("..")) + "/" + (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
}
My 10p worth is a hybrid of Robert and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.
private static string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format(
"{0}/@{1}",
GetXPathToNode(((XmlAttribute)node).OwnerElement),
node.Name
);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
//get the index
int iIndex = 1;
XmlNode xnIndex = node;
while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
// the path to a node is the path to its parent, plus "/node()[n]", where
// n is its position among its siblings.
return String.Format(
"{0}/node()[{1}]",
GetXPathToNode(node.ParentNode),
iIndex
);
}
There's no such thing as "the" xpath of a node. For any given node there may well be many xpath expressions which will match it.
You can probably work up the tree to build up an expression which will match it, taking into account the index of particular elements etc, but it's not going to be terribly nice code.
Why do you need this? There may be a better solution.
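As a quick illustration of "many expressions, one node", here is a sketch with a made-up document and a handful of candidate paths. The class name and the sample XML are assumptions, not anything from the question.

```csharp
using System.Linq;
using System.Xml;

static class ManyPathsSketch
{
    // Candidate expressions that all address the same <bar> element below.
    public static readonly string[] Candidates =
    {
        "/root/foo[2]/bar",
        "/root/foo[2]/bar[1]",
        "//bar",
        "//bar[@attr='value']",
        "/*/*[2]/*[1]"
    };

    // True when every candidate selects the identical node instance.
    public static bool AllSelectSameNode()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");
        XmlNode target = doc.DocumentElement.ChildNodes[1].FirstChild;
        return Candidates.All(xpath => ReferenceEquals(doc.SelectSingleNode(xpath), target));
    }
}
```

AllSelectSameNode() should return true for this sample, which is why any "get the XPath" routine has to pick one convention (names plus indexes, positions only, and so on).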
If you do this, you will get a path with the names of the nodes AND their positions, which helps if you have nodes with the same name, like this:
"/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"
public string GetXPathToNode(XmlNode node)
{
if (node.NodeType == XmlNodeType.Attribute)
{
// attributes have an OwnerElement, not a ParentNode; also they have
// to be matched by name, not found by position
return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
}
if (node.ParentNode == null)
{
// the only node with no parent is the root node, which has no path
return "";
}
//get the index
int iIndex = 1;
XmlNode xnIndex = node;
while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
{
iIndex++;
xnIndex = xnIndex.PreviousSibling;
}
// the path to a node is the path to its parent, plus "/name[n]", where
// n is its position among its siblings with the same name.
return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}
I found that none of the above worked with XDocument, so I wrote my own code to support XDocument and used recursion. I think this code handles multiple identical nodes better than some of the other code here because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed those; it would be easy to add support for them.
Private Sub NodeItterate(XDoc As XElement, XPath As String)
'get the deepest path
Dim nodes As IEnumerable(Of XElement)
nodes = XDoc.XPathSelectElements(XPath)
'if it doesn't exist, try the next shallow path
If nodes.Count = 0 Then
NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
'by this time all the required parent elements will have been constructed
Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
ParentNode.Add(New XElement(NewElementName))
End If
'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
If nodes.Count > 1 Then
Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
End If
'if there is just one element, we can proceed
If nodes.Count = 1 Then
'just proceed
End If
End Sub
Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)
If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
End If
If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
Throw New ArgumentException("Can't create a path based on predicates.")
End If
'we will process this recursively.
NodeItterate(XDoc, XPath)
End Sub
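For the other direction on the XDocument side (getting a path for an existing XElement, as in the original question), a rough C# sketch could look like the following. The names XElementPathSketch and GetXPath are invented for illustration, and the sketch assumes the document uses no namespaces, since it emits local names only.

```csharp
using System.Linq;
using System.Xml.Linq;

static class XElementPathSketch
{
    // Builds a /name[n]/name[n]/... path for an element.
    // Assumption: no XML namespaces are in play (local names only).
    public static string GetXPath(XElement element)
    {
        var steps = element
            .AncestorsAndSelf()   // the element first, then its ancestors up to the root
            .Reverse()            // root first
            .Select(e => string.Format(
                "{0}[{1}]",
                e.Name.LocalName,
                // 1-based position among preceding siblings with the same name
                e.ElementsBeforeSelf(e.Name).Count() + 1));

        return "/" + string.Join("/", steps);
    }
}
```

For a document like <root><foo/><foo><bar/><bar/></foo></root>, the second bar should come out as /root[1]/foo[2]/bar[2], which XPathSelectElement from System.Xml.XPath can resolve again.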
What about using extension methods? ;)
My version (building on others' work) uses the syntax name[index], with the index omitted if the element has no "brothers".
The loop to get the element index lives in a separate routine (also an extension method).
Just paste the following into any static utility class (or into the main Program class, if it is static):
static public int GetRank( this XmlNode node )
{
// return 0 if unique, else return position 1...n in siblings with same name
try
{
if( node is XmlElement )
{
int rank = 1;
bool alone = true, found = false;
foreach( XmlNode n in node.ParentNode.ChildNodes )
if( n.Name == node.Name ) // sibling with same name
{
if( n.Equals(node) )
{
if( ! alone ) return rank; // no need to continue
found = true;
}
else
{
if( found ) return rank; // no need to continue
alone = false;
rank++;
}
}
}
}
catch{}
return 0;
}
static public string GetXPath( this XmlNode node )
{
try
{
if( node is XmlAttribute )
return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );
if( node is XmlText || node is XmlCDataSection )
return node.ParentNode.GetXPath();
if( node.ParentNode == null ) // the only node with no parent is the root node, which has no path
return "";
int rank = node.GetRank();
if( rank == 0 ) return String.Format( "{0}/{1}", node.ParentNode.GetXPath(), node.Name );
else return String.Format( "{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank );
}
catch{}
return "";
}
I produced VBA for Excel to do this for a work project. It outputs tuples of an XPath and the associated text from an element or attribute. The purpose was to allow business analysts to identify and map some XML. I appreciate that this is a C# forum, but thought this might be of interest.
Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)
Dim chnode As IXMLDOMNode
Dim attr As IXMLDOMAttribute
Dim oXString As String
Dim chld As Long
Dim idx As Variant
Dim addindex As Boolean
chld = 0
idx = 0
addindex = False
'determine the node type:
Select Case inode.NodeType
Case NODE_ELEMENT
If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes
oXString = iXstring & "//" & fp(inode.nodename)
Else
'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules
For Each chnode In inode.ParentNode.ChildNodes
If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
Next chnode
If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
'Lookup the index from the indexes array
idx = getIndex(inode.nodename, indexes)
addindex = True
Else
End If
'build the XString
oXString = iXstring & "/" & fp(inode.nodename)
If addindex Then oXString = oXString & "[" & idx & "]"
'If type is element then check for attributes
For Each attr In inode.Attributes
'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
Next attr
End If
Case NODE_TEXT
'build the XString
oXString = iXstring
Call oSheet(oSh, oXString, inode.NodeValue)
Case NODE_ATTRIBUTE
'Do nothing
Case NODE_CDATA_SECTION
'Do nothing
Case NODE_COMMENT
'Do nothing
Case NODE_DOCUMENT
'Do nothing
Case NODE_DOCUMENT_FRAGMENT
'Do nothing
Case NODE_DOCUMENT_TYPE
'Do nothing
Case NODE_ENTITY
'Do nothing
Case NODE_ENTITY_REFERENCE
'Do nothing
Case NODE_INVALID
'do nothing
Case NODE_NOTATION
'do nothing
Case NODE_PROCESSING_INSTRUCTION
'do nothing
End Select
'Now call Parse2 on each of inode's children.
If inode.HasChildNodes Then
For Each chnode In inode.ChildNodes
Call Parse2(oSh, chnode, oXString, indexes)
Next chnode
Set chnode = Nothing
Else
End If
End Sub
Manages the counting of elements using:
Function getIndex(tag As Variant, indexes) As Variant
'Function to get the latest index for an xml tag from the indexes array
'indexes array is passed from one parser function to the next up and down the tree
Dim i As Integer
Dim n As Integer
If IsArrayEmpty(indexes) Then
ReDim indexes(1, 0)
indexes(0, 0) = "Tag"
indexes(1, 0) = "Index"
Else
End If
For i = 0 To UBound(indexes, 2)
If indexes(0, i) = tag Then
'tag found, increment and return the index then exit
'also destroy all recorded tag names BELOW that level
indexes(1, i) = indexes(1, i) + 1
getIndex = indexes(1, i)
ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
Exit Function
Else
End If
Next i
'tag not found so add the tag with index 1 at the end of the array
n = UBound(indexes, 2)
ReDim Preserve indexes(1, n + 1)
indexes(0, n + 1) = tag
indexes(1, n + 1) = 1
getIndex = 1
End Function
Another solution to your problem might be to 'mark' the xmlnodes which you will want to later identify with a custom attribute:
var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
id.Value = Guid.NewGuid().ToString();
_currentNode.Attributes.Append(id);
which you can store in a Dictionary for example.
And you can later identify the node with an xpath query:
newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id.Value));
I know this is not a direct answer to your question, but it can help if the reason you wish to know the xpath of a node is to have a way of 'reaching' the node later after you have lost the reference to it in code.
This also overcomes problems when the document gets elements added/moved, which can mess up the xpath (or indexes, as suggested in other answers).
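Put together, the marking idea could be sketched roughly like this. NodeMarkerSketch, Mark and Find are hypothetical names, "some_id" is just the placeholder attribute name from the snippet above, and the sketch uses an exact match instead of contains().

```csharp
using System;
using System.Xml;

static class NodeMarkerSketch
{
    // Placeholder attribute name taken from the snippet above.
    const string MarkerName = "some_id";

    // Tags the element with a fresh GUID and returns the GUID text,
    // which the caller can keep (for example in a Dictionary).
    public static string Mark(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute(MarkerName, id);
        return id;
    }

    // Looks the element up again by its marker, wherever it now sits in the document.
    // GUID text contains no quote characters, so it can be embedded directly.
    public static XmlNode Find(XmlDocument document, string id)
    {
        return document.SelectSingleNode(
            string.Format("//*[@{0}='{1}']", MarkerName, id));
    }
}
```

The obvious trade-off is that the document itself gets modified, so this only suits cases where an extra attribute is acceptable.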
This is even easier
''' <summary>
''' Gets the full XPath of a single node.
''' </summary>
''' <param name="node"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function GetXPath(ByVal node As Xml.XmlNode) As String
Dim temp As String
Dim sibling As Xml.XmlNode
Dim previousSiblings As Integer = 1
'I dont want to know that it was a generic document
If node.Name = "#document" Then Return ""
'Prime it
sibling = node.PreviousSibling
'Percolate up, getting the count of all of this node's siblings before it.
While sibling IsNot Nothing
'Only count if the sibling has the same name as this node
If sibling.Name = node.Name Then
previousSiblings += 1
End If
sibling = sibling.PreviousSibling
End While
'Mark this node's index, if it has one
' Also mark the index to 1 or the default if it does have a sibling just no previous.
temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()
If node.ParentNode IsNot Nothing Then
Return GetXPath(node.ParentNode) + "/" + temp
End If
Return temp
End Function
I had to do this recently. Only elements needed to be considered. This is what I came up with:
private string GetPath(XmlElement el)
{
List<string> pathList = new List<string>();
XmlNode node = el;
while (node is XmlElement)
{
pathList.Add(node.Name);
node = node.ParentNode;
}
pathList.Reverse();
string[] nodeNames = pathList.ToArray();
return String.Join("/", nodeNames);
}
public static string GetFullPath(this XmlNode node)
{
if (node.ParentNode == null)
{
return "";
}
else
{
return $"{GetFullPath(node.ParentNode)}\\{node.ParentNode.Name}";
}
}
Related
My logic goes as follows: I want to find the first element that is missing a given attribute, add the attribute, then find the next element that is missing it, add it, and so forth.
I find the first element missing the amount attribute in the following way:
private XmlNode GetFirstElementWithoutAmount()
{
string productXPathQuery = "//XML/Products";
XmlNodeList productList = ParentXmlDocument.SelectNodes(productXPathQuery);
foreach (XmlNode element in productList)
{
string passengerXPathQuery = "//XML/Products[ID=" + element.FirstChild.InnerText + "]/Amount";
var amount = element.SelectSingleNode(passengerXPathQuery);
if (amount == null)
{
return element;
}
}
return null;
}
When I've found the first element missing the attribute, the amount is added in the following way:
private XmlNode GetOrCreateChildXMLNode(string NewNodeName, XmlNode ParentXMLNode)
{
if (ParentXMLNode == null)
{
return null;
}
XmlNode NewXMLNode = ParentXMLNode.SelectSingleNode("//" + NewNodeName);
if (NewXMLNode == null)
{
NewXMLNode = ParentXmlDocument.CreateNode(XmlNodeType.Element, NewNodeName, string.Empty);
ParentXMLNode.AppendChild(NewXMLNode);
}
return NewXMLNode;
}
The problem is that it only adds to the first element, and after that the first function always returns the second element, even though there are more elements to come. Any ideas why this is?
You are already inside //XML/Products during your foreach loop. Point directly to subnode.
string passengerXPathQuery = "./Amount";
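The reason this matters: an expression starting with // is evaluated from the document root no matter which node you call SelectSingleNode on, so it can match an Amount belonging to a different Products element. A small sketch, with made-up sample XML and a hypothetical helper name:

```csharp
using System.Xml;

static class ContextPathSketch
{
    // Two products; only the first one has an <Amount>.
    const string Sample =
        "<XML><Products><ID>1</ID><Amount>5</Amount></Products>" +
        "<Products><ID>2</ID></Products></XML>";

    // Runs the query with the SECOND <Products> element as the context node
    // and reports whether anything was found.
    public static bool SecondProductMatches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode second = doc.SelectNodes("/XML/Products")[1];
        return second.SelectSingleNode(xpath) != null;
    }
}
```

Here SecondProductMatches("//Amount") should be true (it finds the first product's Amount), while SecondProductMatches("./Amount") should be false, which is the answer you actually want. The same applies to the "//" + NewNodeName query in GetOrCreateChildXMLNode.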
I need to go from a list like this:
/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/bigroom
/home/room2/hugeroom
/home/room3
to an xml file. I've tried using LINQ to XML to do this but I just end up getting confused and not sure what to do from there. Any help is much appreciated!
Edit:
I want the XML file to look something like this:
<home>
<room1>
<subroom>This is a subroom</subroom>
</room1>
<room2>
<miniroom>This is a miniroom</miniroom>
<bigroom>This is a bigroom</bigroom>
<hugeroom>This is a hugeroom</hugeroom>
</room2>
<room3></room3>
</home>
The text inside if the tags ("this is a subroom", etc) is optional, but would be really nice to have!
Ok buddy, here's a solution.
Couple of notes and explanation.
Your text structure can be split up into lines and then again by the slashes into the names of the XML nodes. If you think of the text in this way, you get a list of "lines" broken into a list of names.
/home
First of all, the first line /home is the root of the XML; we can get rid of it and just create an XDocument object with that name as the root element:
var xDoc = new XDocument(new XElement("home"));
Of course we don't want to hard code things but this is just an example. Now, on to the real work:
/home/room1/
/home/room1/bigroom
etc...
as a List<T> then it will look like this
myList = new List<List<string>>();
... [ add the items ]
myList[0][0] = home
myList[0][1] = room1
myList[1][0] = home
myList[1][1] = room1
myList[1][2] = bigroom
So what we can do to get the above structure is use string.Split() multiple times to break your text first into lines, then into parts of each line, and end up with a multidimensional array-style List<T> that contains List<T> objects, in this case, List<List<string>>.
First let's create the container object:
var possibleNodes = new List<List<string>>();
Next, we should split the lines. Let's call the variable that holds the text, "text".
var splitLines = text
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
This gives us a List but our lines are still not broken up. Let's split them again by the slash (/) character. This is where we build our node names. We can do this in a ForEach and just add to our list of possible nodes:
splitLines.ForEach(l =>
possibleNodes.Add(l
.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.ToList()
)
);
Now, we need to know the DEPTH of the XML. Your text shows that there will be 3 nodes of depth. The node depth is the maximum depth of any one given line of nodes, now stored in the List<List<string>>; we can use the .Max() method to get this:
var nodeDepth = possibleNodes.Max(n => n.Count);
A final setup step: We don't need the first line, because it's just "home" and it will be our root node. We can just create an XDocument object and give it this first line to use as the name of Root:
// Create the root node
XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
// We don't need it anymore
possibleNodes.RemoveAt(0);
Ok, here is where the real work happens, let me explain the rules:
We need to loop through the outer list, and through each inner list.
We can use the list indexes to understand which node to add to or which names to ignore
We need to keep hierarchy proper and not duplicate nodes, and some XLinq helps here
The loops - see the comments for a detailed explanation:
// This gets us looping through the outer nodes
for (var i = 0; i < possibleNodes.Count; i++)
{
// Here we go "sideways" by going through each inner list (each broken down line of the text)
for (var ii = 1; ii < nodeDepth; ii++)
{
// Some lines have more depth than others, so we have to check this here since we are looping on the maximum
if (ii < possibleNodes[i].Count)
{
// Let's see if this node already exists
var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
// Let's also see if a parent node was created in the previous loop iteration.
// This will tell us whether to add the current node at the root level, or under another node
var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
// If the current node has already been added, we do nothing (this if statement is not entered into)
// Otherwise, existingNode will be null and that means we need to add the current node
if (null == existingNode)
{
// Now, use parentNode to decide where to add the current node
if (null == parentNode)
{
// The parent node does not exist; therefore, the current node will be added to the root node.
xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
}
else
{
// There IS a parent node for this node!
// Therefore, we must add the current node to the parent node
// (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
var newNode = new XElement(possibleNodes[i][ii]);
parentNode.Add(newNode);
// Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
if (possibleNodes[i].Count -1 == ii)
{
newNode.Add(new XText("This is a " + newNode.Name.LocalName));
}
}
}
}
}
}
The bonus here is this code will work with any number of nodes and build your XML.
To check it, XDocument has a nifty overridden .ToString() implementation that just spits out all of the XML it is holding, so all you do is this:
Console.Write(xDoc.ToString());
And, you'll get this result:
(Note I added a test node to make sure it works with more than 3 levels)
Below, you will find the entire program with your test text, etc, as a working solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
namespace XmlFromTextString
{
class Program
{
static void Main(string[] args)
{
// This simulates text from a file; note that it must be flush to the left of the screen or else the extra spaces
// add unneeded nodes to the lists that are generated; for simplicity of code, I chose not to implement clean-up of that and just
// ensure that the string literal is not indented from the left of the Visual Studio screen.
string text =
@"/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/test/thetest
/home/room2/bigroom
/home/room2/hugeroom
/home/room3";
var possibleNodes = new List<List<string>>();
var splitLines = text
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
splitLines.ForEach(l =>
possibleNodes.Add(l
.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.ToList()
)
);
var nodeDepth = possibleNodes.Max(n => n.Count);
// Create the root node
XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
// We don't need it anymore
possibleNodes.RemoveAt(0);
// This gets us looping through the outer nodes
for (var i = 0; i < possibleNodes.Count; i++)
{
// Here we go "sideways" by going through each inner list (each broken down line of the text)
for (var ii = 1; ii < nodeDepth; ii++)
{
// Some lines have more depth than others, so we have to check this here since we are looping on the maximum
if (ii < possibleNodes[i].Count)
{
// Let's see if this node already exists
var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
// Let's also see if a parent node was created in the previous loop iteration.
// This will tell us whether to add the current node at the root level, or under another node
var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
// If the current node has already been added, we do nothing (this if statement is not entered into)
// Otherwise, existingNode will be null and that means we need to add the current node
if (null == existingNode)
{
// Now, use parentNode to decide where to add the current node
if (null == parentNode)
{
// The parent node does not exist; therefore, the current node will be added to the root node.
xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
}
else
{
// There IS a parent node for this node!
// Therefore, we must add the current node to the parent node
// (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
var newNode = new XElement(possibleNodes[i][ii]);
parentNode.Add(newNode);
// Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
if (possibleNodes[i].Count -1 == ii)
{
newNode.Add(new XText("This is a " + newNode.Name.LocalName));
// For the same default text on all child-less nodes, use this:
// newNode.Add(new XText("This is default text"));
}
}
}
}
}
}
Console.Write(xDoc.ToString());
Console.ReadKey();
}
}
}
Time for LINQ magic?
// load file into string[]
var input = File.ReadAllLines("TextFile1.txt");
// in case you have more than one home in your file
var homes =
new XDocument(
new XElement("root",
from line in input
let items = line.Split(new[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
group items by items[0] into g
select new XElement(g.Key,
from rooms in g.OrderBy(x => x.Length).Skip(1)
group rooms by rooms[1] into g2
select new XElement(g2.Key,
from name in g2.OrderBy(x => x.Length).Skip(1)
select new XElement(name[2], string.Format("This is a {0}", name[2]))))));
// get the right home
var home = new XDocument(homes.Root.Element("home"));
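If the nesting depth isn't fixed at three levels, another option is to walk each path and get-or-create one child per segment. The following is only a sketch under a few assumptions: all paths share the same first segment, every segment is a valid XML element name, and the optional "This is a ..." text is left out. PathListSketch and Build are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class PathListSketch
{
    // Builds one element tree from slash-separated paths of any depth.
    public static XElement Build(IEnumerable<string> paths)
    {
        XElement root = null;
        foreach (string path in paths)
        {
            string[] names = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
            if (names.Length == 0) continue;   // skip blank lines

            if (root == null)
                root = new XElement(names[0]);
            else if (root.Name.LocalName != names[0])
                throw new ArgumentException("All paths must share one root: " + path);

            // Walk down from the root, creating only the children that are missing.
            XElement current = root;
            for (int i = 1; i < names.Length; i++)
            {
                XElement child = current.Element(names[i]);
                if (child == null)
                {
                    child = new XElement(names[i]);
                    current.Add(child);
                }
                current = child;
            }
        }
        return root;   // null if no usable path was supplied
    }
}
```

Because each lookup is scoped to the current parent, two rooms with the same name under different parents stay separate.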
UPDATED: I still have this problem, better explanation.
I have a list of XElements and I'm iterating through them to check whether each one matches a regex pattern. If there's a match, I need to replace the value of the current element without affecting its child elements.
For example,
<root>{REGEX:#Here}<child>Element</child> more content</root>
In that case, I need to replace {REGEX:#Here}, which is directly under the root element but is not a child element. If I use:
string newValue = xElement.ToString();
if(ReplaceRegex(ref newValue))
xElement.ReplaceAll(newValue);
I lose the child elements, and the tags get escaped to &lt;child&gt;Element in the value.
If I use:
xElement.SetValue(newValue);
The value of the xElement will be,
"{REGEX:Replaced} Element more content"
thus losing child elements as well.
What can I do to replace the value that will keep the child elements and work if the regex pattern is under the root element or child elements.
PS: I will add the regex function here for understanding purpose
private bool ReplaceRegex(ref string text)
{
bool match = false;
Regex linkRegex = new Regex(@"\{XPath:.*?\}", System.Text.RegularExpressions.RegexOptions.Multiline);
Match m = linkRegex.Match(text);
while (m.Success)
{
match = true;
string substring = m.Value;
string xpath = substring.Replace("{XPath:", string.Empty).Replace("}", string.Empty);
object temp = this.Container.Data.XPathEvaluate(xpath);
text = text.Replace(substring, Utility.XPathResultToString(temp));
m = m.NextMatch();
}
return match;
}
private void ReplaceRegex(XElement xElement)
{
if(xElement.HasElements)
{
foreach (XElement subElement in xElement.Elements())
this.ReplaceRegex(subElement);
}
foreach(var node in xElement.Nodes().OfType<XText>())
{
string value = node.Value;
if(this.ReplaceRegex(ref value))
node.Value = value;
}
}
EDIT :
Regarding your mixed-content comment, edited the code to take care of text nodes. See if it works.
I'm trying to generate CSS selectors for random elements on a webpage by means of C#. Some background:
I use a form with a WebBrowser control. While navigating one can ask for the CSS selector of the element under the cursor. Getting the html-element is trivial, of course, by means of:
WebBrowser.Document.GetElementFromPoint(<Point>);
The ambition is to create a 'strict' css selector leading up to the element under the cursor, a-la:
html > body > span:eq(2) > li:eq(5) > div > div:eq(3) > span > a
This selector is based on :eq operators since it's meant to be handled by jQuery and/or SizzleJS (these two support :eq; original CSS selectors don't. Thumbs up @BoltClock for helping me clarify this). So, you get the picture. In order to achieve this goal, we supply the retrieved HtmlElement to the method below and start ascending the DOM tree by asking for the Parent of each element we come across:
private static List<String> GetStrictCssForHtmlElement(HtmlElement element)
{
List<String> familyTree;
for (familyTree = new List<String>(); element != null; element = element.Parent)
{
string ordinalString = CalculateOrdinalPositionAmongSameTagSimblings(element);
if (ordinalString == null) return null;
familyTree.Add(element.TagName.ToLower() + ordinalString);
}
familyTree.Reverse();
return familyTree;
}
private static string CalculateOrdinalPositionAmongSameTagSimblings(HtmlElement element, bool simplifyEq0 = true)
{
int count = 0;
int positionAmongSameTagSimblings = -1;
if (element.Parent != null)
{
foreach (HtmlElement child in element.Parent.Children)
{
if (element.TagName.ToLower() == child.TagName.ToLower())
{
count++;
if (element == child)
{
positionAmongSameTagSimblings = count - 1;
}
}
}
if (positionAmongSameTagSimblings == -1) return null; // Couldn't find child in parent's offsprings!?
}
return ((count > 1) ? (":eq(" + positionAmongSameTagSimblings + ")") : ((simplifyEq0) ? ("") : (":eq(0)")));
}
This method has worked reliably for a variety of pages. However, there's one particular page which does my head in:
http://www.delicious.com/recent
Trying to retrieve the CSS selector of any element in the list (at the center of the page) fails for one very simple reason:
After the ascent hits the first SPAN element on its way up (you can spot it by inspecting the page with IE9's web-dev tools for verification), it tries to process it by calculating its ordinal position among its same-tag siblings. To do that we need to ask its Parent node for the siblings. This is where things get weird. The SPAN element reports that its Parent is a DIV element with id="recent-index". However, that's not the immediate parent of the SPAN (the immediate parent is LI class="wrap isAdv"). This causes the method to fail because, unsurprisingly, it fails to spot the SPAN among the children.
But it gets even weirder. I retrieved and isolated the HtmlElement of the SPAN itself. Then I got its Parent and used it to descend back down to the SPAN element using:
HtmlElement regetSpanElement = spanElement.Parent.Children[0].Children[1].Children[1].Children[0].Children[2].Children[0];
This led us back to the SPAN node we began with, with one twist however:
regetSpanElement.Parent.TagName;
This now reports LI as the parent X-X. How can this be? Any insight?
Thank you again in advance.
Notes:
I saved the HTML code (as it's presented inside WebBrowser.Document.Html) and inspected it myself to be 100% sure that nothing funny is taking place (i.e. different code being served to the WebBrowser control than the one I see in IE9). That's not happening: the structure matches 100% for the path concerned.
I am running WebBrowser control in IE9-mode using the instructions outlined here:
http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
Trying to get WebBrowser control and IE9 to run as similarly as possible.
I suspect that the effects observed might be due to some script running behind my back. However, my web-programming knowledge isn't deep enough to pin it down.
Edit: Typos
Relying on :eq() is tough! It is difficult to reliably re-select an element from a DOM that is dynamic. Sure, it may work on very static pages, but pages are only getting more dynamic every day. You might consider changing strategy a little bit. Try using a smarter, more flexible selector. Perhaps pop in some JavaScript like so:
predictCss = function(s, noid, noclass, noarrow) {
var path, node = s;
var psep = noarrow ? ' ' : ' > ';
if (s.length != 1) return path; //throw 'Requires one element.';
while (node.length) {
var realNode = node[0];
var name = (realNode.localName || realNode.tagName || realNode.nodeName);
if (!name || name == '#document') break;
name = name.toLowerCase();
if(node.parent().children(name).length > 1){
if (realNode.id && !noid) {
try {
var idtest = $(name + '#' + realNode.id);
if (idtest.length == 1) return name + '#' + realNode.id + (path ? '>' + path : '');
} catch (ex) {} // just ignore the exception, it was a bad ID
} else if (realNode.className && !noclass) {
name += '.' + realNode.className.split(/\s+/).join('.');
}
}
var parent = node.parent();
if (name[name.length - 1] == '.') {
name = name.substring(0, name.length - 1);
}
var siblings = parent.children(name);
//// If you really want to use eq:
//if (siblings.length > 1) name += ':eq(' + siblings.index(node) + ')';
path = name + (path ? psep + path : '');
node = parent;
}
return path;
}
And use it to generate a variety of selectors:
var elem = $('#someelement');
var epath = self.model.util.predictCss(elem, true, true, false);
var epathclass = self.model.util.predictCss(elem, true, false, false);
var epathclassid = self.model.util.predictCss(elem, false, false, false);
Then use each:
var relem= $(epathclassid);
if(relem.length === 0){
relem = $(epathclass);
if(relem.length === 0){
relem = $(epath);
}
}
And if your best selector still matches more than one element, you'll have to get creative in how you match a DOM element: perhaps by Levenshtein distance, perhaps by some specific text, or you can fall back to :eq. Hope that helps!
Btw, I assumed you have jQuery, due to the Sizzle reference. You could inject the above in a self-executing anonymous function in a script tag appended as the last child of body, for example.
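To make the Levenshtein idea a bit more concrete, here is a minimal sketch. It assumes you remembered the text of the original element when you generated the selector, and that you can collect the text of each element the selector now matches. The names levenshtein and closestTextIndex are made up for illustration, and this is only one way to rank candidates, not a definitive approach:

```javascript
// Classic dynamic-programming Levenshtein (edit) distance between two strings.
function levenshtein(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr[k] = Math.min(
        prev[k] + 1,        // deletion
        curr[k - 1] + 1,    // insertion
        prev[k - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: given the text remembered from the original element
// and the texts of all candidates the selector matched, return the index of
// the closest candidate, or -1 if there are no candidates.
function closestTextIndex(rememberedText, candidateTexts) {
  var best = -1;
  var bestDistance = Infinity;
  for (var i = 0; i < candidateTexts.length; i++) {
    var d = levenshtein(rememberedText, candidateTexts[i]);
    if (d < bestDistance) {
      bestDistance = d;
      best = i;
    }
  }
  return best;
}
```

With jQuery you could build the candidate list with something like relem.map(function () { return $(this).text(); }).get() and then pick relem.eq(index). On ties the first candidate wins, so you may still want :eq as a last resort.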
I have this code:
public void AddNode(string Node)
{
try
{
treeView.Nodes.Add(Node);
treeView.Refresh();
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
Very simple, as you can see: this method gets a file path, like C:\Windows\notepad.exe
Now I want the TreeView to show it like a file system:
-C:\
+Windows
And if i click the '+' it gets like this:
-C:\
-Windows
notepad.exe
Here is what I get now from sending these paths to the method above:
How can I make it arrange the nodes like that?
If I were you, I would split the input string into substrings using the string.Split method, and then search for the right node under which to insert each part. That is, before adding a node, you should check whether the node C:\ and its child node (Windows) already exist.
Here is my code:
...
AddString(@"C:\Windows\Notepad.exe");
AddString(@"C:\Windows\TestFolder\test.exe");
AddString(@"C:\Program Files");
AddString(@"C:\Program Files\Microsoft");
AddString(@"C:\test.exe");
...
private void AddString(string name) {
string[] names = name.Split(new char[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
TreeNode node = null;
for(int i = 0; i < names.Length; i++) {
TreeNodeCollection nodes = node == null? treeView1.Nodes: node.Nodes;
node = FindNode(nodes, names[i]);
if(node == null)
node = nodes.Add(names[i]);
}
}
private TreeNode FindNode(TreeNodeCollection nodes, string p) {
for(int i = 0; i < nodes.Count; i++)
if(string.Equals(nodes[i].Text, p, StringComparison.CurrentCultureIgnoreCase)) // case-insensitive match, no System.Globalization import needed
return nodes[i];
return null;
}
If you are using Windows Forms (and I guess you are), you can implement the IComparer interface and use the TreeView.TreeViewNodeSorter property:
public class NodeSorter : IComparer
{
// Compare the length of the strings, or the strings
// themselves, if they are the same length.
public int Compare(object x, object y)
{
TreeNode tx = x as TreeNode;
TreeNode ty = y as TreeNode;
// Compare the length of the strings, returning the difference.
if (tx.Text.Length != ty.Text.Length)
return tx.Text.Length - ty.Text.Length;
// If they are the same length, call Compare.
return string.Compare(tx.Text, ty.Text);
}
}
Is the issue that the parents and children aren't being differentiated?
Each node in the tree also has a Nodes property, which represents the collection of its children. Your AddNode routine needs to be changed so you can specify the parent node to which you want to add a child node. Like:
TreeNode parent = //some node
parent.Nodes.Add(newChildNode);
If you want it to just populate the paths and figure out the parent-child relationships itself, you're going to have to write some code to parse the paths, and identify the parent node based on the path segments.
Try taking a look at this Filesystem TreeView. It should do exactly what you are looking for.