Unsuccessfully extracting InnerText from child nodes of XML document (C#)

The XML I'm working with is as follows:
<?xml version="1.0" encoding="utf-8"?><entry_list version="1.0"><entry
id="commode"><ew>commode</ew><subj>HH-2#CL-1#FU-2a,b,c#BD-2d</subj><art>
<artref id="commode" /><capt>commode 1</capt><dim>54,18</dim></art>
<hw>com*mode</hw><sound><wav>commod01.wav</wav><wpr>ku-!mOd</wpr></sound>
<pr>kə-ˈmōd</pr><fl>noun</fl><et>French, from <it>commode,</it> adjective,
suitable, convenient, from Latin <it>commodus,</it> from <it>com-</it> +
<it>modus</it> measure <ma>mete</ma></et><def><date>circa 1688</date>
<sn>1</sn><dt>:a woman's ornate cap popular in the late 17th and early 18th
centuries</dt><sn>2 a</sn><dt>:a low chest of drawers</dt><sn>b</sn><dt>:a
movable washstand with a cupboard underneath</dt><sn>c</sn><dt>:a boxlike
structure holding a chamber pot under an open seat</dt><sd>also</sd><dt>:
<sx>chamber pot</sx></dt><sn>d</sn><dt>:<sx>toilet <sxn>3b</sxn></sx></dt>
</def><art><bmp>commode.bmp</bmp><cap>commode
1</cap></art></entry></entry_list>
The code I'm using, which I cobbled together from various related questions:
System.Xml.XmlNodeList elemList = doc.GetElementsByTagName("dt");
List<string> defs = new List<string>();
for (int count = 0; count < elemList.Count; count++)
{
string contents = string.Empty;
foreach (System.Xml.XmlNode child in elemList[count])
{
if (child.NodeType == System.Xml.XmlNodeType.Element)
{
contents += child.InnerText;
}
}
defs.Insert(count, contents);
}
The resulting List of "defs" is empty for any number of reasons, all of which are unknown to me.

This uses LINQ to XML. Pass "dt" as the elementName parameter.
static List<string> GetInnerText(XDocument xDoc, string elementName)
{
var children = from node in xDoc.Descendants(elementName).DescendantNodes()
where node.NodeType == XmlNodeType.Text
select ((XText)node).Value;
return children.ToList();
}
I'm not sure if the above is exactly what you want, so here's an alternative that uses XmlDocument.
static List<string> GetInnerText(XmlDocument xDoc, string elementName)
{
List<string> innerText = new List<string>();
var children = xDoc.GetElementsByTagName(elementName);
foreach (XmlNode child in children)
innerText.Add(child.InnerText);
return innerText;
}

doc.GetElementsByTagName("dt") returns an XmlNodeList, which you can iterate directly.
Change System.Xml.XmlNode child in elemList[count] to System.Xml.XmlNode child in elemList and inspect the value of child in the debugger.
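As a hedged illustration of why the original loop produced empty strings: most <dt> elements in the sample contain only a text node, so a filter on XmlNodeType.Element skips their content. The sketch below reads InnerText on each <dt> directly instead. The class and method names (DtDemo, ReadDefinitions) are made up for this example and are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DtDemo
{
    // Hypothetical helper: returns the text of every <dt> element in the XML.
    public static List<string> ReadDefinitions(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var defs = new List<string>();
        foreach (XmlNode dt in doc.GetElementsByTagName("dt"))
        {
            // InnerText concatenates the text of all descendants, so it covers
            // both plain text children and nested elements such as <sx>.
            defs.Add(dt.InnerText.Trim());
        }
        return defs;
    }
}
```

Called on <def><dt>:a low chest of drawers</dt><dt>:<sx>chamber pot</sx></dt></def>, this should yield two entries, one per <dt>.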

Related

Get all child element values of specific node using XPath

I'm using XPath to read elements from an XML document. Specifically, I want to return the values of any element that is a child of a specified element (here the specified element is <SceneryType>, and these elements have single-digit values). So I want to return all of the children of <SceneryType> 1, for example.
Here is the XML:
<MissionObjectives>
<Theme themeName="Gothic">
<SceneryType>
1
<Objective>
Do a river thing.
</Objective>
<Objective>
Get all men to the other side of the river.
</Objective>
</SceneryType>
<SceneryType>
2
<Objective>
Climb some trees!
</Objective>
<Objective>
Shoot the tree!
</Objective>
</SceneryType>
</Theme>
</MissionObjectives>
I've tried various ways of getting these child elements, but I can't figure it out. The //Objective part of my expression seems to return everything from the root, yet the iterator never runs, which seems odd. Shouldn't it loop through every element if the expression returns a node list of all the elements?
XPathDocument missionDoc = new XPathDocument(objectivesPath + "MissionObjectives" + chosenTheme + ".xml");
XPathNavigator nav = missionDoc.CreateNavigator();
foreach (Scenery scenery in world.currentWorld)
{
int sceneryType = scenery.type;
XPathExpression expr = nav.Compile($"MissionObjectives/Theme/SceneryType[text()='{sceneryType}']//Objective");
XPathNodeIterator iterator = nav.Select(expr);
while (iterator.MoveNext())
{
XPathNavigator nav2 = iterator.Current.Clone();
compatibleObjectivesList.Add(nav2.Value);
}
}
I've tried looking through Stack Overflow for similar questions but I can't seem to find anything which applies to XPath. I can't use LINQ to XML for this. Any idea how I can return all the values of the various 'Objective' nodes?
Cheers for any help!
It's much simpler to use XDocument:
var doc = XDocument.Load(objectivesPath + "MissionObjectives" + chosenTheme + ".xml");
To get the first SceneryType element (with all of its child nodes):
var node = doc.XPathSelectElement("//MissionObjectives/Theme/SceneryType[1]");
To get the second Objective node:
var node = doc.XPathSelectElement("//MissionObjectives/Theme/SceneryType/Objective[2]");
more xpath samples
For one, your XML data has carriage returns, line feeds, and white space in the search element's text node, so a comparison such as text()='1' never matches. Keep in mind that an XML node can be an element, attribute, or text (among other node types). The solution below is a bit on the "long-handed" side and perhaps a little "hacky", but it should work. I wasn't certain if you wanted the child element text data or the entire child element, so I return just the child text node data (without carriage returns and line feeds). Also, while this solution DOES NOT use LINQ to XML in the strictest sense, it does use one LINQ expression.
private List<string> getSceneryTypeObjectiveTextList(string xml, int sceneryTypeId, string xpath = "/MissionObjectives/Theme/SceneryType")
{
List<string> result = null;
XmlDocument doc = null;
XmlNodeList sceneryTypeNodes = null;
try
{
doc = new XmlDocument();
doc.LoadXml(xml);
sceneryTypeNodes = doc.SelectNodes(xpath);
if (sceneryTypeNodes != null)
{
if (sceneryTypeNodes.Count > 0)
{
foreach (XmlNode sceneryTypeNode in sceneryTypeNodes)
{
if (sceneryTypeNode.HasChildNodes)
{
var textNode = from XmlNode n in sceneryTypeNode.ChildNodes
where (n.NodeType == XmlNodeType.Text && n.Value.Replace("\r", "").Replace("\n", "").Replace(" ", "") == sceneryTypeId.ToString())
select n;
if (textNode.Count() > 0)
{
XmlNodeList objectiveNodes = sceneryTypeNode.SelectNodes("Objective");
if (objectiveNodes != null)
{
result = new List<string>(objectiveNodes.Count);
foreach (XmlNode objectiveNode in objectiveNodes)
{
result.Add(objectiveNode.InnerText.Replace("\r", "").Replace("\n", "").Trim());
}
// Could break out of the iteration, here, if we know that SceneryType is always unique (i.e. - no duplicates in Element text node)
}
}
}
}
}
}
}
catch (Exception ex)
{
// Handle error
}
finally
{
}
return result;
}
private void sampleCall(string filePath, int sceneryTypeId)
{
List<string> compatibleObjectivesList = null;
try
{
compatibleObjectivesList = getSceneryTypeObjectiveTextList(File.ReadAllText(filePath), sceneryTypeId);
}
catch (Exception ex)
{
// Handle error
}
finally
{
}
}
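The whitespace problem can also be handled inside the XPath expression itself, which keeps the XPathNavigator approach from the question. The following is only a sketch under assumptions: it uses the XPath 1.0 normalize-space() function on the first text node of each SceneryType, and the class and method names (ObjectiveLookup, GetObjectives) are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class ObjectiveLookup
{
    // Hypothetical helper: returns the trimmed text of each Objective under
    // the SceneryType whose leading text equals sceneryType.
    public static List<string> GetObjectives(string xml, int sceneryType)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // normalize-space strips the line breaks and indentation around the
        // type number before comparing it.
        string expr = "/MissionObjectives/Theme/SceneryType"
                    + $"[normalize-space(text()[1])='{sceneryType}']/Objective";

        var result = new List<string>();
        XPathNodeIterator iterator = nav.Select(expr);
        while (iterator.MoveNext())
        {
            // The Objective text also carries surrounding line breaks.
            result.Add(iterator.Current.Value.Trim());
        }
        return result;
    }
}
```

This assumes the type number is always the first text node inside SceneryType, as in the sample XML.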

XmlNode check if list of child nodes exists

I am trying to make a function that will take an XmlNode and check if each subsequent child exists and am having issues.
The function should have a signature similar to
private string GetValueForNodeIfExists(XmlNode node, List<string> childNodes){...}
An example illustrating what I would like to accomplish:
I need to know if the child (and possibly a child of a child) of a node exists.
If I have a node which has a child node named "child" and the "child" node has a node named "grandchild" and that grandchild node has a node named "greatGrandchild" then I would like to check if each sequence gives null or not, so checking the following:
node["child"] != null
node["child"]["grandchild"] != null
node["child"]["grandchild"]["greatGrandchild"] != null
The node names I am checking are passed into the function as a List<string>, where the index correlates to the depth of the node I am checking. For example, in the above example, the List I would pass in is List<string> checkedasd = new List<String> {"child", "grandchild", "greatGrandchild" };
I am not sure how I can programmatically append each ["nodeName"] expression and then execute the expression. If I could figure that out, my strategy would be to throw everything in a try block, and if I caught a null reference exception I would know the node doesn't exist.
All help is appreciated
I would use LINQ to XML and XPath:
var childNodes = new List<string>() { "child", "grandchild", "greatGrandchild" };
var xpath = "//" + string.Join("/", childNodes);
var xDoc = XDocument.Load(filename);
var xElem = xDoc.XPathSelectElement(xpath);
if(xElem!=null) //<--- No need for try- catch block
Console.WriteLine(xElem.Value);
PS: I tested the code above with the following XML:
<root>
<child>
<grandchild>
<greatGrandchild>
a
</greatGrandchild>
</grandchild>
</child>
</root>
If you aren't married to XmlDocument and can use Linq2Xml (or want to learn something new) another alternative might be:
using System;
using System.Xml;
using System.Linq;
using System.Xml.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
//var xDoc = XDocument.Load(filename);
var XDoc = XDocument.Parse(@"<root><a><b><c>value</c></b></a><b><c>no</c></b><a><c>no</c></a></root>");
Console.WriteLine("Params a b c ");
foreach(var nodeValue in XDoc.Root.GetValueForNodeIfExists("a", "b", "c"))
{
Console.WriteLine(nodeValue);
}
Console.WriteLine("List a b c ");
foreach(var nodeValue in XDoc.Root.GetValueForNodeIfExists(new List<string> { "a", "b", "c" }))
{
Console.WriteLine(nodeValue);
}
}
}
internal static class XElementExtensions
{
public static IEnumerable<string> GetValueForNodeIfExists(this XElement node, params string[] childNodesNames)
{
return GetValueForNodeIfExists(node, childNodesNames.ToList());
}
public static IEnumerable<string> GetValueForNodeIfExists(this XElement node, IEnumerable<string> childNodesNames)
{
IEnumerable<XElement> nodes = new List<XElement> { node };
foreach(var name in childNodesNames)
{
nodes = FilterChildrenByName(nodes, name);
}
var result = nodes.Select(n => n.Value);
return result;
}
private static IEnumerable<XElement> FilterChildrenByName(IEnumerable<XElement> nodes, string filterName)
{
var result = nodes
.SelectMany(n => n.Elements(filterName));
Console.WriteLine("Filtering by {0}, found {1} elements", filterName, result.Count());
return result;
}
}
Results:
Params a b c
Filtering by a, found 2 elements
Filtering by b, found 1 elements
Filtering by c, found 1 elements
value
List a b c
Filtering by a, found 2 elements
Filtering by b, found 1 elements
Filtering by c, found 1 elements
value
All you need to do is use XPath:
private string GetValueForNodeIfExists(XmlNode node, List<string> childNodes)
{
var xpath = string.Join("/", childNodes.ToArray());
var foundNode = node.SelectSingleNode(xpath);
return foundNode != null ? foundNode.InnerText : null;
}
You could also expand on what you already have and just loop through the values until either you get a null value or reach the end:
private string GetValueForNodeIfExists(XmlNode node, List<string> childNodes)
{
foreach (var nodeName in childNodes)
{
if (node != null)
{
node = node[nodeName];
}
}
return node != null ? node.InnerText : null;
}

Select a child node in XML with specific name using C#

I am trying to find a child element with the tag name Reason.
I have an XML document that basically contains a bunch of elements named Entity.
The Reason tag is somewhere inside each Entity (along with other elements).
void IParseResponse.ParseResponseData(XmlDocument responseDocument)
{
List<string> reasons = new List<string>();
var reasonValue = "";
var entityList = responseDocument.GetElementsByTagName("Entity");
if (entityList != null)
{
foreach (XmlNode reason in entityList)
{
reasonValue = //look into current Entity element, find Reason in it and get it's inner text.
reasons.Add(reasonValue);
}
}
}
This is location of Reason element.
<Entity>
<WatchList>
<Match ID="1">
<MatchDetails>
<Reason>
Does anybody have experience with this?
Here's how you can get all the Reason elements.
var xml = "<Entity> <WatchList><Match ID=\"1\"><MatchDetails><Reason>asdasd</Reason></MatchDetails></Match></WatchList></Entity>";
var x = XDocument.Parse(xml);
var reasons = x.Descendants("Reason").ToList();
foreach (var reason in reasons)
{
Console.WriteLine(reason.Value);
}
If you give us a more complete example of your XML I can improve the answer.
Edit:
If you want to use XmlDocument instead you could do this:
XmlNodeList nodes = responseDocument.GetElementsByTagName("Reason");
for (int i = 0; i < nodes.Count; i++)
{
Console.WriteLine(nodes[i].InnerText);
}
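If the Reason needs to be read per Entity, as in the loop from the question, one option is a relative XPath on each Entity node. This is a minimal sketch, assuming the document uses no XML namespaces; the ReasonReader and ReadReasons names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ReasonReader
{
    // Hypothetical helper: collects the first Reason found inside each Entity.
    public static List<string> ReadReasons(XmlDocument responseDocument)
    {
        var reasons = new List<string>();
        foreach (XmlNode entity in responseDocument.GetElementsByTagName("Entity"))
        {
            // ".//Reason" searches the descendants of the current Entity only.
            XmlNode reason = entity.SelectSingleNode(".//Reason");
            if (reason != null)
            {
                reasons.Add(reason.InnerText.Trim());
            }
        }
        return reasons;
    }
}
```

Entities without a Reason element are skipped here rather than contributing an empty string; whether that is the desired behaviour depends on the caller.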

How to compare two big XML files item by item efficiently?

I plan to implement a method to compare two big XML files (each has fewer than 10,000 element lines).
The method below works, but it does not perform well once a file has more than 100 lines; it becomes very slow. How can I find a more efficient solution? Maybe it needs a better C# design or a better algorithm for handling the XML.
Thanks in advance for your comments.
//Remove the item which not in Event Xml and ConfAddition Xml files
XmlDocument doc = new XmlDocument();
doc.Load(xmlFile_AlarmSettingUp);
bool isNewAlid_Event = false;
bool isNewAlid_ConfAddition = false;
int alid = 0;
XmlNodeList xnList = doc.SelectNodes("/Equipment/AlarmSettingUp/EnabledALIDs/ALID");
foreach (XmlNode xn in xnList)
{
XmlAttributeCollection attCol = xn.Attributes;
for (int i = 0; i < attCol.Count; ++i)
{
if (attCol[i].Name == "alid")
{
alid = int.Parse(attCol[i].Value.ToString());
break;
}
}
//alid = int.Parse(attCol[1].Value.ToString());
XmlDocument docEvent_Alarm = new XmlDocument();
docEvent_Alarm.Load(xmlFile_Event);
XmlNodeList xnListEvent_Alarm = docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID");
foreach (XmlNode xnEvent_Alarm in xnListEvent_Alarm)
{
XmlAttributeCollection attColEvent_Alarm = xnEvent_Alarm.Attributes;
int alidEvent_Alarm = int.Parse(attColEvent_Alarm[1].Value.ToString());
if (alid == alidEvent_Alarm)
{
isNewAlid_Event = false;
break;
}
else
{
isNewAlid_Event = true;
//break;
}
}
XmlDocument docConfAddition_Alarm = new XmlDocument();
docConfAddition_Alarm.Load(xmlFile_ConfAddition);
XmlNodeList xnListConfAddition_Alarm = docConfAddition_Alarm.SelectNodes("/Equipment/Alarms/ALID");
foreach (XmlNode xnConfAddition_Alarm in xnListConfAddition_Alarm)
{
XmlAttributeCollection attColConfAddition_Alarm = xnConfAddition_Alarm.Attributes;
int alidConfAddition_Alarm = int.Parse(attColConfAddition_Alarm[1].Value.ToString());
if (alid == alidConfAddition_Alarm)
{
isNewAlid_ConfAddition = false;
break;
}
else
{
isNewAlid_ConfAddition = true;
//break;
}
}
if ( isNewAlid_Event && isNewAlid_ConfAddition )
{
// Store the root node of the destination document into an XmlNode
XmlNode rootDest = doc.SelectSingleNode("/Equipment/AlarmSettingUp/EnabledALIDs");
rootDest.RemoveChild(xn);
}
}
doc.Save(xmlFile_AlarmSettingUp);
My XML file looks like this. The two XML files have the same structure, except that one of them may sometimes be modified by my app. That's why I need to compare them to see whether anything was modified.
<?xml version="1.0" encoding="utf-8"?>
<Equipment xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Licence LicenseId="" LicensePath="" />
<!--Alarm Setting Up XML File-->
<AlarmSettingUp>
<EnabledALIDs>
<ALID logicalName="Misc_EV_RM_STATION_ALREADY_RESERVED" alid="536870915" alcd="7" altx="Misc_Station 1 UnitName 2 SlotId already reserved" ceon="Misc_AlarmOn_EV_RM_STATION_ALREADY_RESERVED" ceoff="Misc_AlarmOff_EV_RM_STATION_ALREADY_RESERVED" />
<ALID logicalName="Misc_EV_RM_SEQ_READ_ERROR" alid="536870916" alcd="7" altx="Misc_Sequence ID 1 d step 2 d read error for wafer in 3 UnitName 4 SlotId" ceon="Misc_AlarmOn_EV_RM_SEQ_READ_ERROR" ceoff="Misc_AlarmOff_EV_RM_SEQ_READ_ERROR" />
...
...
...
</EnabledALIDs>
</AlarmSettingUp>
</Equipment>
The "ALID/@alid" seems to be your key, so the first thing I would do (before foreach (XmlNode xn in xnList)) is build a dictionary (assuming the key is unique) over the docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID") @alid values. Then you can do most of the work without O(n*m) performance; it will be closer to O(n+m), which is a big difference.
var lookup = new Dictionary<string, XmlElement>();
foreach(XmlElement el in docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID")) {
lookup.Add(el.GetAttribute("alid"), el);
}
then you can use:
XmlElement other;
if(lookup.TryGetValue(otherKey, out other)) {
// exists; element now in "other"
} else {
// doesn't exist
}
XmlDocument and related classes (XmlNode, ...) are not particularly fast at XML processing. Try XmlTextReader instead.
Also, you call docEvent_Alarm.Load(xmlFile_Event); and docConfAddition_Alarm.Load(xmlFile_ConfAddition); on every iteration of the outer loop, which is wasteful. If xmlFile_Event and xmlFile_ConfAddition do not change during processing, load them once before the main loop.
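Putting the two suggestions together (load each document once, then look keys up in a hash-based collection), the pruning step might look roughly like the sketch below. It is an illustration under assumptions: alid values are compared as strings rather than parsed to int, the XPath expressions are taken from the question, and the AlidPruner, CollectAlids and Prune names are invented here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class AlidPruner
{
    // Collects the alid attribute of every element matched by xpath.
    private static HashSet<string> CollectAlids(XmlDocument doc, string xpath)
    {
        var set = new HashSet<string>();
        foreach (XmlElement el in doc.SelectNodes(xpath))
        {
            set.Add(el.GetAttribute("alid"));
        }
        return set;
    }

    // Removes enabled ALIDs that appear in neither reference document and
    // returns how many were removed. The documents are loaded by the caller,
    // once, before this method runs.
    public static int Prune(XmlDocument settings, XmlDocument events, XmlDocument confAddition)
    {
        HashSet<string> known = CollectAlids(events, "/Equipment/Alarms/ALID");
        known.UnionWith(CollectAlids(confAddition, "/Equipment/Alarms/ALID"));

        // Take a snapshot first so nodes are not removed while enumerating.
        List<XmlElement> candidates = settings
            .SelectNodes("/Equipment/AlarmSettingUp/EnabledALIDs/ALID")
            .Cast<XmlElement>()
            .ToList();

        int removed = 0;
        foreach (XmlElement el in candidates)
        {
            if (!known.Contains(el.GetAttribute("alid")))
            {
                el.ParentNode.RemoveChild(el);
                removed++;
            }
        }
        return removed;
    }
}
```

Each alid is then checked in roughly constant time, so the total work grows with the sum of the file sizes rather than their product.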
Have you tried using Microsoft's XmlDiff class? See http://msdn.microsoft.com/en-us/library/aa302294.aspx

Update or inserting a node in an XML doc

I am a beginner to XML and XPath in C#. Here is an example of my XML doc:
<root>
<folder1>
...
<folderN>
...
<nodeMustExist>...
<nodeToBeUpdated>some value</nodeToBeUpdated>
....
</root>
What I need is to update the value of nodeToBeUpdated if the node exists, or add this node after nodeMustExist if nodeToBeUpdated is not there. The prototype of the function is something like this:
void UpdateNode(
XmlDocument xml,
string nodeMustExist,
string nodeToBeUpdte,
string newVal
)
{
/*
search for XMLNode with name = nodeToBeUpdate in xml
to XmlNodeToBeUpdated (XmlNode type?)
if (xmlNodeToBeUpdated != null)
{
xmlNodeToBeUpdated.value(?) = newVal;
}
else
{
search for nodeMustExist in xml to xmlNodeMustExist obj
if ( xmlNodeMustExist != null )
{
add xmlNodeToBeUpdated as next node
xmlNodeToBeUpdte.value = newVal;
}
}
*/
}
Maybe there is a better, simpler way to do this. Any advice?
By the way, if nodeToBeUpdated appears more than once in other places, I just want to update the first one.
This is to update all nodes in folder:
public void UpdateNodes(XmlDocument doc, string newVal)
{
XmlNodeList folderNodes = doc.SelectNodes("folder");
if (folderNodes.Count > 0)
foreach (XmlNode folderNode in folderNodes)
{
XmlNode updateNode = folderNode.SelectSingleNode("nodeToBeUpdated");
XmlNode mustExistNode = folderNode.SelectSingleNode("nodeMustExist");
if (updateNode != null)
{
updateNode.InnerText = newVal;
}
else if (mustExistNode != null)
{
XmlNode node = folderNode.OwnerDocument.CreateNode(XmlNodeType.Element, "nodeToBeUpdated", null);
node.InnerText = newVal;
folderNode.InsertAfter(node, mustExistNode);
}
}
}
If you want to update a particular node, you cannot pass string nodeToBeUpdte, but you will have to pass the XmlNode of the XmlDocument.
I have omitted the passing of node names in the function since nodes names are unlikely to change and can be hardcoded. However, you can pass these to the functions and use the strings instead of hardcoded node names.
The XPath expression that selects all instances of <nodeToBeUpdated> would be this:
/root/folder[nodeMustExist]/nodeToBeUpdated
or, in a more generic form:
/root/folder[*[name() = 'nodeMustExist']]/*[name() = 'nodeToBeUpdated']
suitable for:
void UpdateNode(XmlDocument xml,
string nodeMustExist,
string nodeToBeUpdte,
string newVal)
{
string xPath = "/root/folder[*[name() = '{0}']]/*[name() = '{1}']";
xPath = String.Format(xPath, nodeMustExist, nodeToBeUpdte);
foreach (XmlNode n in xml.SelectNodes(xPath))
{
n.InnerText = newVal;
}
}
Have a look at the SelectSingleNode method MSDN Doc
Your XPath should be something like "//YourNodeNameHere".
Once you have found that node, you can traverse back up the tree to get to the 'nodeMustExist' node:
XmlNode nodeMustExistNode = yourNode.ParentNode["nodeMustExist"];
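A compact update-or-insert along the lines of the question's pseudocode could be sketched as follows. It assumes both names are plain element names without namespaces, treats "first" as first in document order, and uses invented names (NodeUpdater, UpdateOrInsert) that do not come from any answer above.

```csharp
using System;
using System.Xml;

public static class NodeUpdater
{
    // Hypothetical helper: updates the first nodeToBeUpdated element, or adds
    // it right after the first nodeMustExist element. Returns false when
    // neither element is found.
    public static bool UpdateOrInsert(XmlDocument xml, string nodeMustExist,
                                      string nodeToBeUpdated, string newVal)
    {
        // SelectSingleNode returns the first match in document order.
        XmlNode target = xml.SelectSingleNode("//" + nodeToBeUpdated);
        if (target != null)
        {
            target.InnerText = newVal;
            return true;
        }

        XmlNode anchor = xml.SelectSingleNode("//" + nodeMustExist);
        if (anchor == null)
        {
            return false;
        }

        XmlElement created = xml.CreateElement(nodeToBeUpdated);
        created.InnerText = newVal;
        // InsertAfter places the new element as the next sibling of the anchor.
        anchor.ParentNode.InsertAfter(created, anchor);
        return true;
    }
}
```

Setting InnerText (rather than Value) is deliberate, because Value cannot be assigned on element nodes.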
