How to compare two big XML files item by item efficiently? - c#

I plan to implement a method to compare two big XML files (each has fewer than 10,000 element lines).
The method below works, but it does not perform well once a file has more than 100 lines; it becomes very slow. How can I find a more efficient solution? Maybe it needs a better C# design or a better algorithm for handling XML.
Thanks in advance for your comments.
//Remove the item which not in Event Xml and ConfAddition Xml files
XmlDocument doc = new XmlDocument();
doc.Load(xmlFile_AlarmSettingUp);
bool isNewAlid_Event = false;
bool isNewAlid_ConfAddition = false;
int alid = 0;
XmlNodeList xnList = doc.SelectNodes("/Equipment/AlarmSettingUp/EnabledALIDs/ALID");
foreach (XmlNode xn in xnList)
{
XmlAttributeCollection attCol = xn.Attributes;
for (int i = 0; i < attCol.Count; ++i)
{
if (attCol[i].Name == "alid")
{
alid = int.Parse(attCol[i].Value.ToString());
break;
}
}
//alid = int.Parse(attCol[1].Value.ToString());
XmlDocument docEvent_Alarm = new XmlDocument();
docEvent_Alarm.Load(xmlFile_Event);
XmlNodeList xnListEvent_Alarm = docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID");
foreach (XmlNode xnEvent_Alarm in xnListEvent_Alarm)
{
XmlAttributeCollection attColEvent_Alarm = xnEvent_Alarm.Attributes;
int alidEvent_Alarm = int.Parse(attColEvent_Alarm[1].Value.ToString());
if (alid == alidEvent_Alarm)
{
isNewAlid_Event = false;
break;
}
else
{
isNewAlid_Event = true;
//break;
}
}
XmlDocument docConfAddition_Alarm = new XmlDocument();
docConfAddition_Alarm.Load(xmlFile_ConfAddition);
XmlNodeList xnListConfAddition_Alarm = docConfAddition_Alarm.SelectNodes("/Equipment/Alarms/ALID");
foreach (XmlNode xnConfAddition_Alarm in xnListConfAddition_Alarm)
{
XmlAttributeCollection attColConfAddition_Alarm = xnConfAddition_Alarm.Attributes;
int alidConfAddition_Alarm = int.Parse(attColConfAddition_Alarm[1].Value.ToString());
if (alid == alidConfAddition_Alarm)
{
isNewAlid_ConfAddition = false;
break;
}
else
{
isNewAlid_ConfAddition = true;
//break;
}
}
if ( isNewAlid_Event && isNewAlid_ConfAddition )
{
// Store the root node of the destination document into an XmlNode
XmlNode rootDest = doc.SelectSingleNode("/Equipment/AlarmSettingUp/EnabledALIDs");
rootDest.RemoveChild(xn);
}
}
doc.Save(xmlFile_AlarmSettingUp);
My XML file looks like this. The two XML files have the same structure, except that one of them may sometimes be modified by my app. That's why I need to compare them.
<?xml version="1.0" encoding="utf-8"?>
<Equipment xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Licence LicenseId="" LicensePath="" />
<!--Alarm Setting Up XML File-->
<AlarmSettingUp>
<EnabledALIDs>
<ALID logicalName="Misc_EV_RM_STATION_ALREADY_RESERVED" alid="536870915" alcd="7" altx="Misc_Station 1 UnitName 2 SlotId already reserved" ceon="Misc_AlarmOn_EV_RM_STATION_ALREADY_RESERVED" ceoff="Misc_AlarmOff_EV_RM_STATION_ALREADY_RESERVED" />
<ALID logicalName="Misc_EV_RM_SEQ_READ_ERROR" alid="536870916" alcd="7" altx="Misc_Sequence ID 1 d step 2 d read error for wafer in 3 UnitName 4 SlotId" ceon="Misc_AlarmOn_EV_RM_SEQ_READ_ERROR" ceoff="Misc_AlarmOff_EV_RM_SEQ_READ_ERROR" />
...
...
...
</EnabledALIDs>
</AlarmSettingUp>
</Equipment>

The "ALID/@alid" seems to be your key, so the first thing I would do (before foreach (XmlNode xn in xnList)) is build a dictionary (assuming the key is unique) over the @alid values from docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID"). Then you can do most of the work without O(n*m) performance; it will be closer to O(n+m), which is a big difference.
var lookup = new Dictionary<string, XmlElement>();
foreach(XmlElement el in docEvent_Alarm.SelectNodes("/Equipment/Alarms/ALID")) {
lookup.Add(el.GetAttribute("alid"), el);
}
then you can use:
XmlElement other;
if(lookup.TryGetValue(otherKey, out other)) {
// exists; element now in "other"
} else {
// doesn't exist
}
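Putting the lookup idea together for the whole task, here is a minimal sketch rather than a drop-in replacement. The class and method names (AlidFilter, LoadAlids, RemoveUnknown) are made up for illustration, and it assumes alid values can be compared as strings (the question parses them as int, so "007" and "7" would differ here). A HashSet is used instead of a Dictionary because only membership matters and duplicate keys should not throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper, not part of the original question or answer.
static class AlidFilter
{
    // Collects every alid attribute value found under /Equipment/Alarms/ALID.
    public static HashSet<string> LoadAlids(XmlDocument doc)
    {
        var ids = new HashSet<string>();
        foreach (XmlElement el in doc.SelectNodes("/Equipment/Alarms/ALID"))
        {
            ids.Add(el.GetAttribute("alid"));
        }
        return ids;
    }

    // Removes enabled ALIDs that appear in neither set; returns the number removed.
    public static int RemoveUnknown(XmlDocument settings,
                                    HashSet<string> eventIds,
                                    HashSet<string> confIds)
    {
        // Copy the matches to a list first so removal cannot disturb the enumeration.
        List<XmlElement> enabled = settings
            .SelectNodes("/Equipment/AlarmSettingUp/EnabledALIDs/ALID")
            .Cast<XmlElement>()
            .ToList();

        int removed = 0;
        foreach (XmlElement el in enabled)
        {
            string alid = el.GetAttribute("alid");
            if (!eventIds.Contains(alid) && !confIds.Contains(alid))
            {
                el.ParentNode.RemoveChild(el);
                removed++;
            }
        }
        return removed;
    }
}
```

Each of the three files is loaded once, outside any loop, and every element is visited once, so the work grows roughly linearly with the total number of ALID elements.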

XmlDocument and related classes (XmlNode, ...) are not particularly fast at XML processing. Try XmlTextReader instead.
Also, you call docEvent_Alarm.Load(xmlFile_Event); and docConfAddition_Alarm.Load(xmlFile_ConfAddition); on every iteration of the outer loop, which is wasteful. If xmlFile_Event and xmlFile_ConfAddition do not change during processing, it is better to load them once before the main loop.

Have you tried using Microsoft's XmlDiff class? See http://msdn.microsoft.com/en-us/library/aa302294.aspx

Related

Unsuccessfully extracting InnerText from child nodes of XML document (C#)

The XML I'm working with is as follows:
<?xml version="1.0" encoding="utf-8"?><entry_list version="1.0"><entry
id="commode"><ew>commode</ew><subj>HH-2#CL-1#FU-2a,b,c#BD-2d</subj><art>
<artref id="commode" /><capt>commode 1</capt><dim>54,18</dim></art>
<hw>com*mode</hw><sound><wav>commod01.wav</wav><wpr>ku-!mOd</wpr></sound>
<pr>kə-ˈmōd</pr><fl>noun</fl><et>French, from <it>commode,</it> adjective,
suitable, convenient, from Latin <it>commodus,</it> from <it>com-</it> +
<it>modus</it> measure <ma>mete</ma></et><def><date>circa 1688</date>
<sn>1</sn><dt>:a woman's ornate cap popular in the late 17th and early 18th
centuries</dt><sn>2 a</sn><dt>:a low chest of drawers</dt><sn>b</sn><dt>:a
movable washstand with a cupboard underneath</dt><sn>c</sn><dt>:a boxlike
structure holding a chamber pot under an open seat</dt><sd>also</sd><dt>:
<sx>chamber pot</sx></dt><sn>d</sn><dt>:<sx>toilet <sxn>3b</sxn></sx></dt>
</def><art><bmp>commode.bmp</bmp><cap>commode
1</cap></art></entry></entry_list>
The code I'm using, which I cobbled together from various related questions:
System.Xml.XmlNodeList elemList = doc.GetElementsByTagName("dt");
List<string> defs = new List<string>();
for (int count = 0; count < elemList.Count; count++)
{
string contents = string.Empty;
foreach (System.Xml.XmlNode child in elemList[count])
{
if (child.NodeType == System.Xml.XmlNodeType.Element)
{
contents += child.InnerText;
}
}
defs.Insert(count, contents);
}
The resulting List of "defs" is empty for any number of reasons, all of which are unknown to me.
This is using LINQ. Pass "dt" for the elementName parameter.
static List<string> GetInnerText(XDocument xDoc, string elementName)
{
var children = from node in xDoc.Descendants(elementName).DescendantNodes()
where node.NodeType == XmlNodeType.Text
select ((XText)node).Value;
return children.ToList();
}
I'm not sure if above is exactly what you want, so here's an alternative solution.
static List<string> GetInnerText(XmlDocument xDoc, string elementName)
{
List<string> innerText = new List<string>();
var children = xDoc.GetElementsByTagName(elementName);
foreach (XmlNode child in children)
innerText.Add(child.InnerText);
return innerText;
}
elemList = doc.GetElementsByTagName("dt"); returns an XmlNodeList, which you can iterate directly.
Change System.Xml.XmlNode child in elemList[count] to System.Xml.XmlNode child in elemList and look at the value of child in the debugger.

Get all child element values of specific node using XPath

I'm using XPath to read elements from an XML document. Specifically, I want to return the values of any element which is the child of a specified element (here the specified element is <SceneryType>, and these elements have single-digit values). So I want to return all of the children of <SceneryType> 1, for example.
Here is the XML:
<MissionObjectives>
<Theme themeName="Gothic">
<SceneryType>
1
<Objective>
Do a river thing.
</Objective>
<Objective>
Get all men to the other side of the river.
</Objective>
</SceneryType>
<SceneryType>
2
<Objective>
Climb some trees!
</Objective>
<Objective>
Shoot the tree!
</Objective>
</SceneryType>
</Theme>
</MissionObjectives>
I've tried various ways of getting these child elements, but I can't figure it out. The //Objective part of my expression seems to return everything from the root, yet the iterator never runs, which seems odd. Shouldn't it loop through every element if the expression returns a node list of all the elements?
XPathDocument missionDoc = new XPathDocument(objectivesPath + "MissionObjectives" + chosenTheme + ".xml");
XPathNavigator nav = missionDoc.CreateNavigator();
foreach (Scenery scenery in world.currentWorld)
{
int sceneryType = scenery.type;
XPathExpression expr = nav.Compile($"MissionObjectives/Theme/SceneryType[text()='{sceneryType}']//Objective");
XPathNodeIterator iterator = nav.Select(expr);
while (iterator.MoveNext())
{
XPathNavigator nav2 = iterator.Current.Clone();
compatibleObjectivesList.Add(nav2.Value);
}
}
I've tried looking through Stack Overflow for similar questions but I can't seem to find anything which applies to XPath. I can't use LINQ to XML for this. Any idea how I can return all the values of the various 'Objective' nodes?
Cheers for any help!
It's much simpler to use XDocument (XPathSelectElement is an extension method, so add using System.Xml.XPath;):
var doc = XDocument.Load(objectivesPath + "MissionObjectives" + chosenTheme + ".xml");
To get the first SceneryType element, including its child nodes:
var node = doc.XPathSelectElement("//MissionObjectives/Theme/SceneryType[1]");
To get the second Objective node:
var node = doc.XPathSelectElement("//MissionObjectives/Theme/SceneryType/Objective[2]");
For one, your xml data has carriage returns, line feeds, and white spaces in the search element's text node. Keep in mind, that an XML node can be an element, attribute, or text (among other node types). The solution below is a bit on the "long-handed" side and perhaps a little "hacky", but it should work. I wasn't certain if you wanted the child element text data or the entire child element, but I return just the child text node data (without carriage returns and line feeds). Also, while this solution DOES NOT use LINQ to XML in the strictest sense, it does use one LINQ expression.
private List<string> getSceneryTypeObjectiveTextList(string xml, int sceneryTypeId, string xpath = "/MissionObjectives/Theme/SceneryType")
{
List<string> result = null;
XmlDocument doc = null;
XmlNodeList sceneryTypeNodes = null;
try
{
doc = new XmlDocument();
doc.LoadXml(xml);
sceneryTypeNodes = doc.SelectNodes(xpath);
if (sceneryTypeNodes != null)
{
if (sceneryTypeNodes.Count > 0)
{
foreach (XmlNode sceneryTypeNode in sceneryTypeNodes)
{
if (sceneryTypeNode.HasChildNodes)
{
var textNode = from XmlNode n in sceneryTypeNode.ChildNodes
where (n.NodeType == XmlNodeType.Text && n.Value.Replace("\r", "").Replace("\n", "").Replace(" ", "") == sceneryTypeId.ToString())
select n;
if (textNode.Count() > 0)
{
XmlNodeList objectiveNodes = sceneryTypeNode.SelectNodes("Objective");
if (objectiveNodes != null)
{
result = new List<string>(objectiveNodes.Count);
foreach (XmlNode objectiveNode in objectiveNodes)
{
result.Add(objectiveNode.InnerText.Replace("\r", "").Replace("\n", "").Trim());
}
// Could break out of the iteration, here, if we know that SceneryType is always unique (i.e. - no duplicates in Element text node)
}
}
}
}
}
}
}
catch (Exception ex)
{
// Handle error
}
finally
{
}
return result;
}
private void sampleCall(string filePath, int sceneryTypeId)
{
List<string> compatibleObjectivesList = null;
try
{
compatibleObjectivesList = getSceneryTypeObjectiveTextList(File.ReadAllText(filePath), sceneryTypeId);
}
catch (Exception ex)
{
// Handle error
}
finally
{
}
}
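A shorter alternative that stays within XPath 1.0 is to let normalize-space() absorb the surrounding line breaks and spaces, so the predicate compares against the trimmed text. This is only a sketch: the ObjectiveLookup class and GetObjectives method are hypothetical names, and it assumes the scenery number is always the first text node inside each SceneryType, as in the sample XML.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper illustrating the normalize-space() approach.
static class ObjectiveLookup
{
    public static List<string> GetObjectives(string xml, int sceneryType)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // text()[1] is the first text node of SceneryType (e.g. "\n1\n");
        // normalize-space() strips the surrounding whitespace before comparing.
        string xpath = "/MissionObjectives/Theme/SceneryType" +
                       "[normalize-space(text()[1])='" + sceneryType + "']/Objective";

        var result = new List<string>();
        foreach (XmlNode objective in doc.SelectNodes(xpath))
        {
            // The objective text carries its own line breaks, so trim it too.
            result.Add(objective.InnerText.Trim());
        }
        return result;
    }
}
```

The same expression should also work with XPathNavigator.Select, since normalize-space() is a standard XPath 1.0 function.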

Select a child node in XML with specific name using C#

I am trying to find a child element with the tag name Reason.
I have an XML document that basically contains a bunch of elements named Entity.
The Reason tag is somewhere inside each Entity (along with other elements).
void IParseResponse.ParseResponseData(XmlDocument responseDocument)
{
List<string> reasons = new List<string>();
var reasonValue = "";
var entityList = responseDocument.GetElementsByTagName("Entity");
if (entityList != null)
{
foreach (XmlNode reason in entityList)
{
reasonValue = //look into current Entity element, find Reason in it and get it's inner text.
reasons.Add(reasonValue);
}
}
}
This is location of Reason element.
<Entity>
<WatchList>
<Match ID="1">
<MatchDetails>
<Reason>
Does anybody have experience with this?
Here's how you can get all the Reason elements.
var xml = "<Entity> <WatchList><Match ID=\"1\"><MatchDetails><Reason>asdasd</Reason></MatchDetails></Match></WatchList></Entity>";
var x = XDocument.Parse(xml);
var reasons = x.Descendants("Reason").ToList();
foreach (var reason in reasons)
{
Console.WriteLine(reason.Value);
}
If you give us a more complete example of your XML I can improve the answer.
Edit:
If you want to use XmlDocument instead you could do this:
XmlNodeList nodes = responseDocument.GetElementsByTagName("Reason");
for (int i = 0; i < nodes.Count; i++)
{
Console.WriteLine(nodes[i].InnerText);
}

How to iterate a xml file with XmlReader class

My XML is stored in a file that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<metroStyleManager>
<Style>Blue</Style>
<Theme>Dark</Theme>
<Owner>CSRAssistant.Form1, Text: CSR Assistant</Owner>
<Site>System.ComponentModel.Container+Site</Site>
<Container>System.ComponentModel.Container</Container>
</metroStyleManager>
This is how I am iterating, but there is a glitch:
XmlReader rdr = XmlReader.Create(System.IO.Path.GetDirectoryName(System.Windows.Forms.Application.ExecutablePath) + @"\Products.xml");
while (rdr.Read())
{
if (rdr.NodeType == XmlNodeType.Element)
{
string xx1= rdr.LocalName;
string xx = rdr.Value;
}
}
string xx = rdr.Value; always returns an empty string.
When the element is Style, the value should be Blue as in the file, but I always get an empty string. Can you say why?
Another requirement is that I only want to iterate within <metroStyleManager></metroStyleManager>.
Can anyone help with these two points? Thanks.
Blue is the value of the Text node, not of the Element node. You either need to add another if to get the values of text nodes, or you can read the inner XML of the current element node:
rdr.MoveToContent();
while (rdr.Read())
{
if (rdr.NodeType == XmlNodeType.Element)
{
string name = rdr.LocalName;
string value = rdr.ReadInnerXml();
}
}
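One caveat with the loop above: ReadInnerXml already moves the reader past the element's end tag, so the following Read() can step over the next element when there is no whitespace between elements. A sketch that avoids the extra Read() and stays inside the root element could look like this. StyleReader and ReadChildren are hypothetical names, and it assumes every child of metroStyleManager holds text only (ReadElementContentAsString throws if a child contains nested elements).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: reads the direct children of <metroStyleManager> into a dictionary.
static class StyleReader
{
    public static Dictionary<string, string> ReadChildren(TextReader input)
    {
        var values = new Dictionary<string, string>();
        using (XmlReader rdr = XmlReader.Create(input))
        {
            rdr.MoveToContent();                        // skip the XML declaration
            rdr.ReadStartElement("metroStyleManager");  // step inside the root element

            // MoveToContent skips whitespace and comments; the loop ends at the
            // root's end tag, so nothing outside the root is visited.
            while (rdr.MoveToContent() == XmlNodeType.Element)
            {
                string name = rdr.LocalName;
                // Returns the text and advances past the end tag, so no extra Read().
                string value = rdr.ReadElementContentAsString();
                values[name] = value;
            }
        }
        return values;
    }
}
```

To read from the file in the question, pass a StreamReader opened on the same path that the question hands to XmlReader.Create.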
You can also use Linq to Xml to get names and values of root children:
var xdoc = XDocument.Load(path_to_xml);
var query = from e in xdoc.Root.Elements()
select new {
e.Name.LocalName,
Value = (string)e
};
You can use the XmlDocument class for this.
XmlDocument doc = new XmlDocument();
doc.Load(filename);
foreach (XmlNode node in doc.ChildNodes)
{
    if (node.Name == "metroStyleManager")
    {
        foreach (XmlNode subNode in node.ChildNodes)
        {
            string key = subNode.LocalName;   // Style, Theme, etc.
            string value = subNode.InnerText; // Blue, Dark, etc. (Value is null for element nodes)
        }
    }
    else
    {
        ...
    }
}
You can use XDocument xDoc = XDocument.Load(strFilePath) to load the XML file.
Then you can use:
foreach (XElement xeNode in xDoc.Element("metroStyleManager").Elements())
{
    // Check whether this child is the Style element
    if (xeNode.Name.LocalName == "Style")
    {
        // If yes, read its text
        string value = xeNode.Value; // Blue
    }
}
Hope it helps...
BTW, XDocument is in the System.Xml.Linq namespace.

A better way to handle XML updation

I have a DataGridView control where some values are populated.
I also have an XML file. The user can change the value in the Warning column of the DataGridView, and that change needs to be saved to the XML file.
The program below does the job:
XDocument xdoc = XDocument.Load(filePath);
//match the record
foreach (var rule in xdoc.Descendants("Rule"))
{
foreach (var row in dgRulesMaster.Rows.Cast<DataGridViewRow>())
{
if (rule.Attribute("id").Value == row.Cells[0].Value.ToString())
{
rule.Attribute("action").Value = row.Cells[3].Value.ToString();
}
}
}
//save the record
xdoc.Save(filePath);
Matching the grid values with the XML document and for the matched values, updating the needed XML attribute.
Is there a better way to code this?
Thanks
You could do something like this:
var rules = dgRulesMaster.Rows.Cast<DataGridViewRow>()
.Select(x => new {
RuleId = x.Cells[0].Value.ToString(),
IsWarning = x.Cells[3].Value.ToString() });
var tuples = from n in xdoc.Descendants("Rule")
from r in rules
where n.Attribute("id").Value == r.RuleId
select new { Node = n, Rule = r };
foreach(var tuple in tuples)
tuple.Node.Attribute("action").Value = tuple.Rule.IsWarning;
This is basically the same, just a bit more LINQ-y. Whether or not this is "better" is debatable. One thing I removed is the conversion of IsWarning first to string, then to int and finally back to string. It now is converted to string once and left that way.
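If the number of rules grows, the nested matching (in both the loop and the LINQ version) compares every rule with every row. One possible variation, sketched here under assumptions, is to index the grid values by rule id first and then make a single pass over the XML. RuleUpdater and ApplyActions are hypothetical names, and the dictionary stands in for the grid's id column (Cells[0]) and warning column (Cells[3]).

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: applies id -> action values to matching Rule elements.
static class RuleUpdater
{
    public static int ApplyActions(XDocument xdoc, IDictionary<string, string> actionsById)
    {
        int updated = 0;
        foreach (XElement rule in xdoc.Descendants("Rule"))
        {
            // The cast yields null instead of throwing when the attribute is missing.
            string id = (string)rule.Attribute("id");
            string action;
            if (id != null && actionsById.TryGetValue(id, out action))
            {
                // SetAttributeValue creates the attribute if it does not exist yet.
                rule.SetAttributeValue("action", action);
                updated++;
            }
        }
        return updated;
    }
}
```

The dictionary could be built from dgRulesMaster.Rows with ToDictionary, keeping in mind that ToDictionary throws on duplicate ids and that an empty new-row placeholder in the grid has null cell values.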
XPath lets you target nodes in the XML with a lot of power. Microsoft's example of using the XPathNavigator to modify an XML file is as follows:
XmlDocument document = new XmlDocument();
document.Load("contosoBooks.xml");
XPathNavigator navigator = document.CreateNavigator();
XmlNamespaceManager manager = new XmlNamespaceManager(navigator.NameTable);
manager.AddNamespace("bk", "http://www.contoso.com/books");
foreach (XPathNavigator nav in navigator.Select("//bk:price", manager))
{
if (nav.Value == "11.99")
{
nav.SetValue("12.99");
}
}
Console.WriteLine(navigator.OuterXml);
Source: http://msdn.microsoft.com/en-us/library/zx28tfx1(v=vs.80).aspx
