I want to modify all the text nodes using some functions in C#.
I want to insert another xml subtree created from some string.
For example, I want to change this
<root>
this is a test
</root>
to
<root>
this is <subtree>another</subtree> test
</root>
I have this piece of code, but it inserts a plain text node. I want to create an XML subtree and insert that instead of a plain text node.
List<XText> textNodes = element.DescendantNodes().OfType<XText>().ToList();
foreach (XText textNode in textNodes)
{
String node = System.Text.RegularExpressions.Regex.Replace(textNode.Value, "a", "<subtree>another</subtree>");
textNode.ReplaceWith(new XText(node));
}
You can split the original XText node into several, and add an XElement in between. Then you replace the original node with the three new nodes.
List<XNode> newNodes = Regex.Split(textNode.Value, "a").Select(p => (XNode) new XText(p)).ToList();
newNodes.Insert(1, new XElement("subtree", "another")); // substitute this with something better
textNode.ReplaceWith(newNodes);
I guess CreateDocumentFragment would be much easier, though it is not LINQ; the only reason to use LINQ here is convenience.
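The split-and-insert idea above handles a single occurrence. A minimal sketch that generalizes it to every match in every text node might look like the following. The class name TextNodeSplitter, the method ReplaceMatches and the makeElement callback are hypothetical names chosen for illustration, not part of LINQ to XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TextNodeSplitter
{
    // Replaces every regex match inside each text node with the element built by makeElement.
    public static void ReplaceMatches(XElement root, Regex pattern, Func<Match, XElement> makeElement)
    {
        // Materialize the list first, because the tree is modified inside the loop.
        foreach (XText textNode in root.DescendantNodes().OfType<XText>().ToList())
        {
            string text = textNode.Value;
            var newNodes = new List<XNode>();
            int last = 0;
            foreach (Match m in pattern.Matches(text))
            {
                // Keep the text between the previous match and this one.
                if (m.Index > last)
                    newNodes.Add(new XText(text.Substring(last, m.Index - last)));
                newNodes.Add(makeElement(m));
                last = m.Index + m.Length;
            }
            if (newNodes.Count == 0) continue;   // no match in this text node
            if (last < text.Length)
                newNodes.Add(new XText(text.Substring(last)));
            textNode.ReplaceWith(newNodes);
        }
    }

    public static void Main()
    {
        XElement root = XElement.Parse("<root>this is a test</root>");
        // \ba\b matches only the standalone word "a", not the letter inside other words.
        ReplaceMatches(root, new Regex(@"\ba\b"), m => new XElement("subtree", "another"));
        Console.WriteLine(root.ToString(SaveOptions.DisableFormatting));
        // prints <root>this is <subtree>another</subtree> test</root>
    }
}
```

Building the XElement in a callback keeps the matching logic separate from whatever subtree you want to insert; the callback could just as well call XElement.Parse on a string.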
I have the following xml received from a web service
<GRID xmlns="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result">
<DATA>
<R>
<D>2645</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:17:36.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
<R>
<D>2649</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:33:24.0</D>
<D>114043</D>
<D>Awaiting Release</D>
<D>Awaiting Planning</D>
</R>
<R>
<D>2652</D>
<D>HJIT.HRE#RGW.COM</D>
<D>2019-09-27 10:36:53.0</D>
<D>114041</D>
<D>Awaiting Planning</D>
<D>Work Planned</D>
</R>
</DATA>
</GRID>
I wrote the following piece of .NET code to extract the R nodes
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr); // I can successfully get the three R elements
foreach (XPathNavigator nav in xpNIter)
{
/*
Now I want to iterate through each R element and use XPATH to select each of the six D nodes by its index position.
The order of the D nodes are a known dataset and I want to build a comma separated string by concatenating the value of each D node,
which will later be appended to a CSV file along with a pre-defined header row.
*/
/* I attempted the following XPATH */
// XPathNodeIterator xpDi = nav.Select("(//D)[1]"); -- This does not work and yields a null result
}
Now I want to iterate through each R element and use XPath to select each of the six D nodes by its index position. The order of the D nodes is known, and I want to build a comma-separated string by concatenating the value of each D node, which will later be appended to a CSV file along with a pre-defined header row.
I didn't want to use anything like LINQ to XML, as this is part of a read-only data extraction program which needs to be as light and as performant as possible.
What is the correct way to get the D elements by index with XPath using the XPathNavigator?
You have a few problems here:
xpNav.Select("//g2:R", nsmgr) does not work for the XML shown in your question.
This expression selects for nodes with local name R in the http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result namespace -- however in your actual XML none of the nodes are in this namespace. There's a namespace declaration xmlns:dstm="http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result" but it's not the default namespace, so none of the nodes are actually in it, as they aren't using the dstm: prefix.
Instead, you should do xpNav.Select("//R", nsmgr) (or better yet xpNav.Select("/*/DATA/R", nsmgr)).
In your question you wrote I can successfully get the three R elements so maybe this is a typo in the question.
nav.Select("(//D)[1]"); -- This does not work and yields a null result.
I cannot reproduce this exact problem -- XPathNavigator.Select() never returns null. It will throw an exception on a malformed query, but it will not return null.
What I can reproduce is that this always returns the same result for every <R>, specifically the value of the first <D> element, <D>2645</D>. Demo fiddle #1 here.
The problem here is that the recursive descent operator //D selects for all nodes named D in the entire document, because a path starting with // is evaluated from the document root regardless of the current node. To select only the nodes in the current <R> element you need to restrict the scope by prefacing the XPath query with .: nav.Select("(.//D)[1]") (or better yet, nav.Select("(./D)[1]")).
Incidentally, since you expect 6 child <D> nodes of <R> it will be more performant to run one single XPath query and collect all 6 into a list, rather than running 6 queries for each specific node:
var nodes = nav.Select("./D").Cast<XPathNavigator>().ToList();
You indicated that performance is important, but you are using the recursive descent operator // which can have bad performance.
From Effective Xml Part 2: How to kill the performance of an app with XPath…:
// (descendant-or-self axis)
This is a very common pattern that very often leads to serious performance problems. The way it works is that it flattens the whole subtree (the most common usage I saw is flattening the whole xml document) and then it looks for the specified elements. Now in the .NET Framework there aren’t any specific optimizations for this patterns and using it is costly...
Instead, it's better to specify the path directly.
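The difference between the absolute and the relative query can be checked with a small, self-contained sketch. The class name RelativeXPathDemo, the helper FirstD and the shortened sample XML (no namespace, two D values per row) are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.XPath;

public static class RelativeXPathDemo
{
    // Shortened, namespace-free variant of the XML in the question (assumption for the demo).
    const string Xml =
        "<GRID><DATA>" +
        "<R><D>2645</D><D>114041</D></R>" +
        "<R><D>2649</D><D>114043</D></R>" +
        "</DATA></GRID>";

    // Evaluates the given XPath once per <R>, with that <R> as the context node.
    public static string[] FirstD(string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        return nav.Select("/*/DATA/R").Cast<XPathNavigator>()
                  .Select(r => r.SelectSingleNode(xpath).Value)
                  .ToArray();
    }

    public static void Main()
    {
        // Absolute path: always the first D of the whole document.
        Console.WriteLine(string.Join(",", FirstD("(//D)[1]")));  // prints 2645,2645
        // Relative path: the first D child of the current R.
        Console.WriteLine(string.Join(",", FirstD("./D[1]")));    // prints 2645,2649
    }
}
```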
Pulling all of the above together, your code should look something like:
//xpNav and nsmgr set up as in the question
var csvLines = xpNav.Select("/*/DATA/R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./D").Cast<XPathNavigator>()))
.ToList();
Demo fiddle #2 here.
Notes:
If the XML in your question has been incorrectly edited and the nodes <R> and <D> are really in the dstm: namespace after all, add the g2: prefix to the node names in the XPath queries like so:
var csvLines = xpNav.Select("/*/g2:DATA/g2:R", nsmgr).Cast<XPathNavigator>()
.Select(nav => string.Join(",", nav.Select("./g2:D", nsmgr).Cast<XPathNavigator>()))
.ToList();
Demo fiddle #3 here.
As an aside, you might want to check your assumption that XPathDocument will be more performant than LINQ to XML. I am not sure this will be the case.
I was on the right path; I just needed to use the overload that accepts the namespace manager, as seen below:
HttpWebResponse resp = (HttpWebResponse)Req.GetResponse();
XPathDocument xpResDoc = new XPathDocument(resp.GetResponseStream());
XPathNavigator xpNav = xpResDoc.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xpNav.NameTable);
nsmgr.AddNamespace("g2", "http://schemas.datastream.net/MP_functions/MP0118_GetGridHeaderData_001_Result");
XPathNodeIterator xpNIter = xpNav.Select("//g2:R", nsmgr);
foreach (XPathNavigator nav in xpNIter)
{
string r =
$"{nav.SelectSingleNode("./g2:D[1]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[2]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[3]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[4]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[5]", nsmgr).Value}," +
$"{nav.SelectSingleNode("./g2:D[6]", nsmgr).Value}";
Console.WriteLine(r);
}
// Start writing to a file stream;
I have an XML string like below:
<root>
<Test1>
<Result time="2">ProperEnding</Result>
</Test1>
<Test2></Test2>
</root>
I have to operate on these elements. Most of the time the elements are unique within their parent element. I am using XDocument. I can remember that there is a way to access an element like this.
XNode resultTest1 = GetNodes("/root//Test1//result")
But I forgot it. It is possible to access the same using linq:
doc.root.Elements.etc.etc.
But I want it using a single string as shown above. Can anybody say how to make it?
Descendants() will skip any number of intermediate levels, e.g. this will skip over root and Test1:
doc.Descendants("Result")
Also note that you can use XPath with Linq2Xml as well, e.g. XPathSelectElements
doc.XPathSelectElements("/root/Test1/Result");
You can skip intermediate levels of the hierarchy with // (or use // at the start of the xpath string to skip the root)
"/root//Result"
One caveat: XML is case sensitive, so Result and result are not the same element.
The string you're referring to ("/root//Test1//result") is an XPath expression.
You can use it with LINQ to XML classes (like XDocument) using XPathEvaluate, XPathSelectElement, and XPathSelectElements extension methods.
You can find more info about these methods on MSDN: http://msdn.microsoft.com/en-us/library/vstudio/system.xml.xpath.extensions_methods(v=vs.90).aspx
To make them work, you need using System.Xml.XPath at the top of your file and System.Xml.Linq.dll assembly referenced (which is probably already there).
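A minimal sketch of the XPath extension methods on XDocument, assuming the sample XML from the question with its root element closed. The class name XPathOnXDocument and the helper Find are hypothetical.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // brings XPathSelectElement(s) into scope as extension methods

public static class XPathOnXDocument
{
    const string Sample =
        "<root><Test1><Result time=\"2\">ProperEnding</Result></Test1><Test2></Test2></root>";

    // Returns the text of the first element matching the XPath, or null when nothing matches.
    public static string Find(string xml, string xpath)
    {
        XElement hit = XDocument.Parse(xml).XPathSelectElement(xpath);
        return hit == null ? null : hit.Value;
    }

    public static void Main()
    {
        Console.WriteLine(Find(Sample, "/root//Test1//Result"));          // prints ProperEnding
        // Element names are case sensitive, so the lowercase variant finds nothing.
        Console.WriteLine(Find(Sample, "/root//Test1//result") == null);  // prints True
    }
}
```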
You can try to load your xml using XDocument:
// loads xml file with root element
XDocument xml = XDocument.Load("filename.xml");
Now you can append LINQ statements to your xml variable like this:
var retrieveSomeSpecificDataLikeListOfElementsAsAnonymousObjects = xml.Descendants("parentNodeName").Select(node => new { SomeSpecialValueYouWant = node.Element("elementNameUnderParentNode").Value }).ToList();
You can mix and do whatever you want - above is just an example.
Is this what you are looking for?
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml("YourXML");
// XPath is case sensitive: the element is named Result, not result
XmlNodeList xmlNodes = xmlDocument.SelectNodes("/root/Test1/Result");
I have been struggling with this problem for the past couple of days. Say I want to get all the text() nodes from an HTML document, but I only want to retrieve the XPath of the node that contains the text. Example:
foreach (var textNode in node.SelectNodes(".//text()"))
//do stuff here
However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:
/html[1]/body[1]/div[1]/a[1]/#text
Yet I only want the containing node of the text, for example:
/html[1]/body[1]/div[1]/a[1]
Could anyone point me toward a better XPath solution to retrieve all nodes that contains text but only retrieve the XPath up until the containing node?
Instead of:
.//text()
use:
.//*[normalize-space(text())]
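This selects all element descendants of the context (current) node whose first text node child contains non-whitespace characters. In XPath 1.0, normalize-space(text()) looks only at the first text node child; to match elements that have at least one non-whitespace-only text node child anywhere among their children, use .//*[text()[normalize-space()]] instead.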
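Why don't you just strip the last path segment?
string[] elements = getXPath(textNode).Split(new char[1] { '/' });
// For a path like /html[1]/body[1]/div[1]/a[1]/#text, drop only the trailing "#text" segment
return String.Join("/", elements, 0, elements.Length - 1);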
I want to merge two XmlDocuments by inserting a second XML doc to the end of an existing Xmldocument in C#. How is this done?
Something like this:
foreach (XmlNode node in documentB.DocumentElement.ChildNodes)
{
XmlNode imported = documentA.ImportNode(node, true);
documentA.DocumentElement.AppendChild(imported);
}
Note that this ignores the document element itself of document B - so if that has a different element name, or attributes you want to copy over, you'll need to work out exactly what you want to do.
EDIT: If, as per your comment, you want to embed the whole of document B within document A, that's relatively easy:
XmlNode importedDocument = documentA.ImportNode(documentB.DocumentElement, true);
documentA.DocumentElement.AppendChild(importedDocument);
This will still ignore things like the XML declaration of document B if there is one - I don't know what would happen if you tried to import the document itself as a node of a different document, and it included an XML declaration... but I suspect this will do what you want.
Inserting an entire XML document at the end of another XML document is actually guaranteed to produce invalid XML. XML requires that there be one, and only one "document" element. So, assuming that your files were as follows:
A.xml
<document>
<element>value1</element>
<element>value2</element>
</document>
B.xml
<document>
<element>value3</element>
<element>value4</element>
</document>
The resultant document by just appending one at the end of the other:
<document>
<element>value1</element>
<element>value2</element>
</document>
<document>
<element>value3</element>
<element>value4</element>
</document>
Is invalid XML.
Assuming, instead, that the two documents share a common document element, and you want to insert the children of the document element from B into A's document element, you could use the following:
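var docA = new XmlDocument();
var docB = new XmlDocument();
docA.Load("A.xml");
docB.Load("B.xml");
// XmlNodeList is non-generic, so the loop variable must be typed explicitly
foreach (XmlNode childEl in docB.DocumentElement.ChildNodes) {
    var newNode = docA.ImportNode(childEl, true);
    docA.DocumentElement.AppendChild(newNode);
}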
This will produce the following document given my examples above:
<document>
<element>value1</element>
<element>value2</element>
<element>value3</element>
<element>value4</element>
</document>
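For a version that runs without any files on disk, here is a minimal sketch of the same ImportNode approach using in-memory strings. The class name XmlDocumentMerge and the method Merge are hypothetical; ImportNode and AppendChild are the XmlDocument calls used in the answers above.

```csharp
using System;
using System.Xml;

public static class XmlDocumentMerge
{
    // Appends deep copies of B's top-level children to A's document element.
    public static XmlDocument Merge(string xmlA, string xmlB)
    {
        var docA = new XmlDocument();
        var docB = new XmlDocument();
        docA.LoadXml(xmlA);
        docB.LoadXml(xmlB);
        foreach (XmlNode child in docB.DocumentElement.ChildNodes)
        {
            // ImportNode creates a copy owned by docA; docB is left untouched.
            docA.DocumentElement.AppendChild(docA.ImportNode(child, true));
        }
        return docA;
    }

    public static void Main()
    {
        XmlDocument merged = Merge(
            "<document><element>value1</element><element>value2</element></document>",
            "<document><element>value3</element><element>value4</element></document>");
        Console.WriteLine(merged.OuterXml);
    }
}
```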
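This is a short, clean way to merge XML documents with LINQ to XML. Note that it nests the root element of file2.xml inside the root element of file1.xml.
XElement xFileRoot = XElement.Load("file1.xml");
XElement xFileChild = XElement.Load("file2.xml");
xFileRoot.Add(xFileChild);
xFileRoot.Save("file1.xml");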
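With LINQ to XML there are two things "merge" can mean: nesting the second root inside the first, or appending only the second root's children. A minimal sketch of both, using in-memory strings; the class name XElementMerge and its two methods are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class XElementMerge
{
    // Adds B's whole root element as a new child of A's root.
    public static XElement NestRoot(string xmlA, string xmlB)
    {
        XElement root = XElement.Parse(xmlA);
        root.Add(XElement.Parse(xmlB));
        return root;
    }

    // Adds only the child elements of B's root to A's root.
    public static XElement MergeChildren(string xmlA, string xmlB)
    {
        XElement root = XElement.Parse(xmlA);
        root.Add(XElement.Parse(xmlB).Elements());
        return root;
    }

    public static void Main()
    {
        string a = "<document><element>value1</element></document>";
        string b = "<document><element>value3</element></document>";
        Console.WriteLine(NestRoot(a, b).ToString(SaveOptions.DisableFormatting));
        Console.WriteLine(MergeChildren(a, b).ToString(SaveOptions.DisableFormatting));
    }
}
```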
Bad news. Since an XML document can have only one root element, you cannot just put the content of one document at the end of the other. Maybe this is what you are looking for? It shows how easily you can merge XML files using LINQ to XML.
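Alternatively, if you are using XmlDocuments, you can try something like this:
XmlDocument documentA = new XmlDocument(); // load or populate before merging
XmlDocument documentB = new XmlDocument(); // load or populate before merging
foreach (XmlNode childNode in documentA.DocumentElement.ChildNodes)
{
    // A node belongs to the document that created it, so import a copy into documentB first
    documentB.DocumentElement.AppendChild(documentB.ImportNode(childNode, true));
}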
I have a huge bunch of XML files with the following structure:
<Stuff1>
<Content>someContent</Content>
<type>someType</type>
</Stuff1>
<Stuff2>
<Content>someContent</Content>
<type>someType</type>
</Stuff2>
<Stuff3>
<Content>someContent</Content>
<type>someType</type>
</Stuff3>
...
...
I need to change each of the "Content" node names to StuffxContent; basically, prepend the parent node name to the content node's name.
I planned to use the XMLDocument class and figure out a way, but thought I would ask if there were any better ways to do this.
(1.) The [XmlElement / XmlNode].Name property is read-only.
(2.) The XML structure used in the question is crude and could be improved.
(3.) Regardless, here is a code solution to the given question:
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(sampleXml);
XmlNodeList stuffNodeList = xmlDoc.SelectNodes("//*[starts-with(name(), 'Stuff')]");
foreach (XmlNode stuffNode in stuffNodeList)
{
    // get existing 'Content' node
    XmlNode contentNode = stuffNode.SelectSingleNode("Content");
    // create new (renamed) Content node, prefixed with the parent name, e.g. Stuff1Content
    XmlNode newNode = xmlDoc.CreateElement(stuffNode.Name + contentNode.Name);
    // copy existing Content children (attributes, if any, are not copied)
    newNode.InnerXml = contentNode.InnerXml;
    // replace existing Content node with newly renamed Content node
    stuffNode.InsertBefore(newNode, contentNode);
    stuffNode.RemoveChild(contentNode);
}
//xmlDoc.Save
PS: I came here looking for a nicer way of renaming a node/element; I'm still looking.
I used this method to rename the node:
/// <summary>
/// Rename Node
/// </summary>
/// <param name="parentnode">Parent of the node to rename</param>
/// <param name="oldChildName">Current name of the child node</param>
/// <param name="newChildName">New name for the child node</param>
private static void RenameNode(XmlNode parentnode, string oldChildName, string newChildName)
{
    var newnode = parentnode.OwnerDocument.CreateNode(XmlNodeType.Element, newChildName, "");
    var oldNode = parentnode.SelectSingleNode(oldChildName);
    // Moving a node removes it from oldNode, so loop until empty instead of using foreach
    while (oldNode.Attributes.Count > 0)
    {
        XmlAttribute att = oldNode.Attributes[0];
        oldNode.Attributes.Remove(att);
        newnode.Attributes.Append(att);
    }
    while (oldNode.HasChildNodes)
        newnode.AppendChild(oldNode.FirstChild);
    parentnode.ReplaceChild(newnode, oldNode);
}
The easiest way I found to rename a node is:
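xmlNode.InnerXml = xmlNode.InnerXml.Replace("OldName>", "NewName>");
Here xmlNode must be an ancestor of the elements to rename, since InnerXml does not include the node's own tags. Don't include the opening < to ensure that the closing </OldName> tag is renamed as well. Note that this misses opening tags that carry attributes, and it also renames any element whose name merely ends in OldName.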
Perhaps a better solution would be to iterate through each node, and write the information out to a new document. Obviously, this will depend on how you will be using the data in future, but I'd recommend the same reformatting as FlySwat suggested...
<stuff id="1">
<content/>
</stuff>
I'd also suggest that using the XDocument that was recently added would be the best way to go about creating the new document.
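Unlike XmlNode.Name, XElement.Name is settable, so with LINQ to XML the rename can be done in place. A minimal sketch, assuming the files are wrapped in a single root element as in the sample above; the class name ContentRenamer and the method PrefixContentNames are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class ContentRenamer
{
    // Renames each <Content> child to <ParentName>Content, e.g. Stuff1Content.
    public static XElement PrefixContentNames(string xml)
    {
        XElement root = XElement.Parse(xml);
        foreach (XElement stuff in root.Elements())
        {
            XElement content = stuff.Element("Content");
            if (content != null)
                content.Name = stuff.Name.LocalName + "Content"; // children and attributes are kept
        }
        return root;
    }

    public static void Main()
    {
        XElement result = PrefixContentNames(
            "<doc><Stuff1><Content>someContent</Content><type>someType</type></Stuff1></doc>");
        Console.WriteLine(result.ToString(SaveOptions.DisableFormatting));
    }
}
```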
I'll answer the higher-level question: why are you trying to do this with XmlDocument?
I think the best way to accomplish what you are aiming for is a simple XSLT file that matches the "CONTENTSTUFF" node and outputs a "CONTENT" node. I don't see a reason to bring out such heavy guns.
Either way, if you still wish to do it C# style, use XmlReader + XmlWriter rather than XmlDocument, for memory and speed.
XmlDocument stores the entire XML in memory, which makes it very heavy for a single traversal.
XmlDocument is good if you access the elements many times (not the situation here).
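A minimal sketch of the XmlReader + XmlWriter approach, under several stated assumptions: the input has a single root element, no namespaces are used, and whitespace, comments and processing instructions may be dropped. The class name StreamingRename and the method PrefixContentNames are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamingRename
{
    // Copies the XML node by node, renaming <Content> to <ParentName>Content on the way.
    public static string PrefixContentNames(string xml)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            var openElements = new Stack<string>(); // original names of the currently open elements
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        string original = reader.LocalName;
                        bool isEmpty = reader.IsEmptyElement; // read before touching attributes
                        string name = (original == "Content" && openElements.Count > 0)
                            ? openElements.Peek() + original
                            : original;
                        writer.WriteStartElement(name);
                        writer.WriteAttributes(reader, false);
                        if (isEmpty) writer.WriteEndElement();
                        else openElements.Push(original);
                        break;
                    case XmlNodeType.EndElement:
                        openElements.Pop();
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        writer.WriteString(reader.Value);
                        break;
                    // Other node types (whitespace, comments, ...) are skipped in this sketch.
                }
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(PrefixContentNames(
            "<doc><Stuff1><Content>someContent</Content><type>someType</type></Stuff1></doc>"));
    }
}
```

For real files you would pass file paths or streams to XmlReader.Create and XmlWriter.Create instead of strings, so that neither the input nor the output is ever held in memory as a whole.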
I am not an expert in XML, and in my case I just needed to make all tag names in a HTML file to upper case, for further manipulation in XmlDocument with GetElementsByTagName. The reason I needed upper case was that for XmlDocument the tag names are case sensitive (since it is XML), and I could not guarantee that my HTML-file had consistent case in the tag names.
So I solved it like this: I used XDocument as an intermediate step, where you can rename elements (i.e. the tag name), and then loaded that into a XmlDocument. Here is my VB.NET-code (the C#-coding will be very similar).
Dim x As XDocument = XDocument.Load("myFile.html")
For Each element In x.Descendants()
element.Name = element.Name.LocalName.ToUpper()
Next
Dim x2 As XmlDocument = New XmlDocument()
x2.LoadXml(x.ToString())
For my purpose it worked fine, though I understand that in certain cases this might not be a solution if you are dealing with a pure XML-file.
Load it in as a string and do a replace on the whole lot..
String sampleXml =
"<doc>"+
"<Stuff1>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff1>"+
"<Stuff2>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff2>"+
"<Stuff3>"+
"<Content>someContent</Content>"+
"<type>someType</type>"+
"</Stuff3>"+
"</doc>";
sampleXml = sampleXml.Replace("Content", "StuffxContent");
The XML you have provided shows that someone completely misses the point of XML.
Instead of having
<stuff1>
<content/>
</stuff1>
You should have:
<stuff id="1">
<content/>
</stuff>
Now you would be able to traverse the document using XPath (i.e., //stuff[@id='1']/content). The names of nodes should not be used to establish identity; you use attributes for that.
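A minimal sketch of looking up an element by its id attribute once the XML is restructured this way. The class name AttributeIdLookup and the method FindContent are hypothetical, and the id value is assumed not to contain quote characters, since it is concatenated into the XPath string.

```csharp
using System;
using System.Xml;

public static class AttributeIdLookup
{
    // Returns the <content> child of the <stuff> element with the given id, or null.
    public static XmlNode FindContent(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // @id tests the attribute; a bare id would look for a child element named id.
        return doc.SelectSingleNode("//stuff[@id='" + id + "']/content");
    }

    public static void Main()
    {
        string xml =
            "<doc><stuff id=\"1\"><content>a</content></stuff>" +
            "<stuff id=\"2\"><content>b</content></stuff></doc>";
        Console.WriteLine(FindContent(xml, "2").InnerText); // prints b
    }
}
```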
To do what you asked, load the XML into an xml document, and simply iterate through the first level of child nodes renaming them.
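Sketch (Name is read-only on XmlNode, so each child is replaced by a renamed copy; a single wrapping root element is assumed):
foreach (XmlNode n in YourDoc.DocumentElement.SelectNodes("*"))
{
    XmlNode old = n.SelectSingleNode("*[1]"); // first child element
    if (old == null) continue;
    XmlElement renamed = YourDoc.CreateElement(n.Name + old.Name);
    renamed.InnerXml = old.InnerXml;
    n.ReplaceChild(renamed, old);
}
YourDoc.Save("renamed.xml"); // hypothetical output path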
However, I'd strongly recommend you actually fix the XML so that it is useful, instead of wreck it further.