How can I merge XML files? - c#

I have two xml files that both have the same schema and I would like to merge into a single xml file. Is there an easy way to do this?
For example,
<Root>
<LeafA>
<Item1 />
<Item2 />
</LeafA>
<LeafB>
<Item1 />
<Item2 />
</LeafB>
</Root>
+
<Root>
<LeafA>
<Item3 />
<Item4 />
</LeafA>
<LeafB>
<Item3 />
<Item4 />
</LeafB>
</Root>
= new file containing
<Root>
<LeafA>
<Item1 />
<Item2 />
<Item3 />
<Item4 />
</LeafA>
<LeafB>
<Item1 />
<Item2 />
<Item3 />
<Item4 />
</LeafB>
</Root>

"Automatic XML merge" sounds like a relatively simple requirement, but once you get into the details it becomes complex quite fast. Merging with C# or XSLT is much easier for a more specific task, as in the answer for the EF model. Using tools to assist with a manual merge can also be an option (see this SO question).
For reference (and to give an idea of the complexity), here's an open-source example from the Java world: XML merging made easy
Back to the original question. There are a few big gray areas in the task specification: when two elements should be considered equivalent (same name, matching selected or all attributes, or also the same position in the parent element); how to handle the situation when the original or the merged XML has multiple equivalent elements; and so on.
The code below assumes that:
we only care about elements at the moment
elements are equivalent if element names, attribute names, and attribute values match
an element doesn't have multiple attributes with the same name
all equivalent elements from the merged document will be combined with the first equivalent element in the source XML document.
// determine which elements we consider the same
//
private static bool AreEquivalent(XElement a, XElement b)
{
if(a.Name != b.Name) return false;
if(!a.HasAttributes && !b.HasAttributes) return true;
if(!a.HasAttributes || !b.HasAttributes) return false;
if(a.Attributes().Count() != b.Attributes().Count()) return false;
return a.Attributes().All(attA => b.Attributes(attA.Name)
.Count(attB => attB.Value == attA.Value) != 0);
}
// Merge "merged" document B into "source" A
//
private static void MergeElements(XElement parentA, XElement parentB)
{
// merge per-element content from parentB into parentA
//
// only direct children are compared; deeper levels are handled by the recursive call
foreach (XElement childB in parentB.Elements())
{
// merge childB with first equivalent childA
// equivalent childB1, childB2,.. will be combined
//
bool isMatchFound = false;
foreach (XElement childA in parentA.Elements())
{
if (AreEquivalent(childA, childB))
{
MergeElements(childA, childB);
isMatchFound = true;
break;
}
}
// if there is no equivalent childA, add childB into parentA
//
if (!isMatchFound) parentA.Add(childB);
}
}
It produces the desired result with the original XML snippets, but if the input XML is more complex and has duplicate elements, the result will be more... interesting:
public static void Test()
{
var a = XDocument.Parse(@"
<Root>
<LeafA>
<Item1 />
<Item2 />
<SubLeaf><X/></SubLeaf>
</LeafA>
<LeafB>
<Item1 />
<Item2 />
</LeafB>
</Root>");
var b = XDocument.Parse(@"
<Root>
<LeafB>
<Item5 />
<Item1 />
<Item6 />
</LeafB>
<LeafA Name=""X"">
<Item3 />
</LeafA>
<LeafA>
<Item3 />
</LeafA>
<LeafA>
<SubLeaf><Y/></SubLeaf>
</LeafA>
</Root>");
MergeElements(a.Root, b.Root);
Console.WriteLine("Merged document:\n{0}", a.Root);
}
Here's the merged document, showing how equivalent elements from document B were combined:
<Root>
<LeafA>
<Item1 />
<Item2 />
<SubLeaf>
<X />
<Y />
</SubLeaf>
<Item3 />
</LeafA>
<LeafB>
<Item1 />
<Item2 />
<Item5 />
<Item6 />
</LeafB>
<LeafA Name="X">
<Item3 />
</LeafA>
</Root>

If the format is always exactly like this, there is nothing wrong with this method:
Remove the last two lines from the first file and append the second file with its first two lines removed.
Have a look at the Linux commands head and tail, which can drop the first and last two lines.

It's a simple XSLT transformation something like this (which you apply to document a.xml):
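<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="docB" select="document('b.xml')"/>
<xsl:template match="Root">
<Root><xsl:apply-templates/></Root>
</xsl:template>
<!-- re-create each leaf element, then copy its children from both documents -->
<xsl:template match="Root/LeafA">
<LeafA>
<xsl:copy-of select="*"/>
<xsl:copy-of select="$docB/Root/LeafA/*"/>
</LeafA>
</xsl:template>
<xsl:template match="Root/LeafB">
<LeafB>
<xsl:copy-of select="*"/>
<xsl:copy-of select="$docB/Root/LeafB/*"/>
</LeafB>
</xsl:template>
</xsl:stylesheet>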

vimdiff file_a file_b is just one example.
BeyondCompare is a favorite when I'm on Windows: http://www.scootersoftware.com/

I ended up using C# and created myself a script. I knew I could do it when I asked the question, but I wanted to know if there was a faster way to do this since I've never really worked with XML.
The script went along the lines of this:
var a = new XmlDocument();
a.Load(PathToFile1);
var b = new XmlDocument();
b.Load(PathToFile2);
MergeNodes(
a.SelectSingleNode(nodePath),
b.SelectSingleNode(nodePath).ChildNodes,
a);
a.Save(PathToFile1);
And MergeNodes() looked something like this:
private void MergeNodes(XmlNode parentNodeA, XmlNodeList childNodesB, XmlDocument parentA)
{
foreach (XmlNode oNode in childNodesB)
{
// Skip comment nodes
if (oNode.Name == "#comment") continue;
bool isFound = false;
string name = oNode.Attributes["Name"].Value;
foreach (XmlNode child in parentNodeA.ChildNodes)
{
if (child.Name == "#comment") continue;
// If node already exists and is unchanged, exit loop
if (child.OuterXml == oNode.OuterXml && child.InnerXml == oNode.InnerXml)
{
isFound = true;
Console.WriteLine("Found::NoChanges::" + oNode.Name + "::" + name);
break;
}
// If node already exists but has been changed, replace it
if (child.Attributes["Name"].Value == name)
{
isFound = true;
Console.WriteLine("Found::Replaced::" + oNode.Name + "::" + name);
parentNodeA.ReplaceChild(parentA.ImportNode(oNode, true), child);
// Stop here: the child list was just modified and the match is handled
break;
}
}
// If node does not exist, add it
if (!isFound)
{
Console.WriteLine("NotFound::Adding::" + oNode.Name + "::" + name);
parentNodeA.AppendChild(parentA.ImportNode(oNode, true));
}
}
}
It's not perfect: I have to manually specify the nodes I want merged. But it was quick and easy for me to put together, and since I have almost no knowledge of XML, I'm happy :)
It actually works out better that it only merges the specified nodes, since I'm using it to merge Entity Framework's edmx files and I only really want to merge the SSDL, CSDL, and MSL nodes.

One way to do it is to load each XML file into a DataSet and merge the DataSets.
Dim dsFirst As New DataSet()
Dim dsMerge As New DataSet()
' Create new FileStream with which to read the schema.
Dim fsReadXmlFirst As New System.IO.FileStream(myXMLfileFirst, System.IO.FileMode.Open)
Dim fsReadXmlMerge As New System.IO.FileStream(myXMLfileMerge, System.IO.FileMode.Open)
Try
dsFirst.ReadXml(fsReadXmlFirst)
dsMerge.ReadXml(fsReadXmlMerge)
Dim str As String = "Merge Table(0) Row Count = " & dsMerge.Tables(0).Rows.Count
str = str & Chr(13) & "Merge Table(1) Row Count = " & dsMerge.Tables(1).Rows.Count
str = str & Chr(13) & "Merge Table(2) Row Count = " & dsMerge.Tables(2).Rows.Count
MsgBox(str)
dsMerge.Merge(dsFirst, True)
DataGridParent.DataSource = dsMerge
DataGridParent.DataMember = "rulefile"
DataGridChild.DataSource = dsMerge
DataGridChild.DataMember = "rule"
str = ""
str = "Merge Table(0) Row Count = " & dsMerge.Tables(0).Rows.Count
str = str & Chr(13) & "Merge Table(1) Row Count = " & dsMerge.Tables(1).Rows.Count
str = str & Chr(13) & "Merge Table(2) Row Count = " & dsMerge.Tables(2).Rows.Count
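MsgBox(str)
Finally
' Release the file handles whether or not reading succeeded
fsReadXmlFirst.Close()
fsReadXmlMerge.Close()
End Try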

Reposting an answer from https://www.perlmonks.org/?node_id=127848
Paste the following into a Perl script:
use strict;
require 5.000;
use Data::Dumper;
use XML::Simple;
use Hash::Merge;
my $xmlFile1 = shift || die "XmlFile1\n";
my $xmlFile2 = shift || die "XmlFile2\n";
my %config1 = %{XMLin ($xmlFile1)};
my %config2 = %{XMLin ($xmlFile2)};
my $merger = Hash::Merge->new ('RIGHT_PRECEDENT');
my %newhash = %{ $merger->merge (\%config1, \%config2) };
# XMLout (\%newhash, outputfile => "newfile", xmldecl => 1, rootname => 'config');
print XMLout (\%newhash);


Getting attributes from xml using Linq

I have an xml document that I want to obtain attributes from
Here is the XML:
<Translations>
<Product Name="Room" ID="16">
<Terms>
<Term Generic="Brand" Product="Sub Category" />
<Term Generic="Range" Product="Brand" />
</Terms>
</Product>
<Product Name="House" ID="29">
<Terms>
<Term Generic="Category" Product="Product Brand" />
<Term Generic="Brand" Product="Category Description" />
<Term Generic="Range" Product="Group Description" />
<Term Generic="Product" Product="Product Description" />
</Terms>
</Product>
</Translations>
Here is my current Linq query
public static string clsTranslationTesting(string GenericTerm, int ProductID)
{
const string xmlFilePath = "C:\\Dev\\XMLTrial\\XMLFile1.xml";
var xmlDocument = XDocument.Load(xmlFilePath);
var genericValue =
from gen in xmlDocument.Descendants("Product")
where gen.Attribute("ID").Value == ProductID.ToString()
select gen.Value.ToString();
}
The problem is this: when I pass data into the method, it loads the XML from the file into the xmlDocument variable successfully. However, when it executes the query, it returns null. I want to obtain the ID value.
I'm a little lost with your question, but here's my attempt.
First thing is you need to change "Customer" to "Product". Your XML contains not a single instance of the word "Customer" so I think you have a typo there.
I don't know exactly what you want returned from the query (I assume just the entire matched node?). Try this:
var genericValue = xmlDocument.Descendants("Product")
.FirstOrDefault(x => x.Attribute("ID").Value == "16");
I made a fiddle here that shows it in action
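If the goal is a translated term rather than the whole Product element, a minimal sketch along the following lines may help. It assumes the XML shown in the question; the class and method names (TranslationLookup, FindProductTerm) are made up for illustration. Casting an XAttribute to string yields null for a missing attribute instead of throwing, which avoids the NullReferenceException that .Value can cause.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TranslationLookup
{
    // Hypothetical helper: returns the product-specific term for a generic term,
    // or null when the product or the term is not present in the document.
    public static string FindProductTerm(XDocument doc, string genericTerm, int productId)
    {
        return doc.Descendants("Product")
            // the (string) cast gives null for a missing attribute instead of throwing
            .Where(p => (string)p.Attribute("ID") == productId.ToString())
            .Descendants("Term")
            .Where(t => (string)t.Attribute("Generic") == genericTerm)
            .Select(t => (string)t.Attribute("Product"))
            .FirstOrDefault();
    }
}
```

With the sample file, TranslationLookup.FindProductTerm(XDocument.Load(xmlFilePath), "Brand", 16) should give "Sub Category".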

Unsure of how to use LINQ to access this particular element

* I'm completely new to this, and this is a personal project I am doing. *
So I have an XML document structured like this
<Licensing key="20325">
<Organization Org="500">
<Constraints>
<MaximumOrgsInSecurity>2</MaximumOrgsInSecurity>
<MaximumUsersInSecurity>999</MaximumUsersInSecurity>
<MaximumLoggedInUsers>999</MaximumLoggedInUsers>
<MaximumLenders>1</MaximumLenders>
<OptOutofPasswordPolicy>FALSE</OptOutofPasswordPolicy>
</Constraints>
<Modules>
<Module registered="true" name="DV" id="1" />
<Module registered="true" name="DP" id="2" />
<Module registered="true" name="DCC" id="3" />
<Module registered="false" name="DRE" id="4" />
</Modules>
</Organization>
</Licensing>
and I am trying to read it using LINQ in my C# code. Although I am attempting to follow this tutorial on LINQ (http://www.dotnetcurry.com/linq/564/linq-to-xml-tutorials-examples), I just can't seem to access the elements I would like. For example, how would I use LINQ to get the key number of 20325, the Org number of 500, the id/name/registered of each module, and so on? The XML document has to be in this format. Any help or walkthroughs would be appreciated, thank you!
EDIT:
For example, I've tried doing
IEnumerable<XElement> Licensing = xelement.Elements();
foreach (var Organization in Licensing)
{
System.Diagnostics.Debug.Write(Organization.Element("Constraints").Value);
}
to see what this would give me, and it gives 29999991FALSE, when I was hoping it would give something along the lines of
MaximumOrgsInSecurity
MaximumUsersInSecurity
MaximumLoggedInUsers
MaximumLenders
OptOutofPasswordPolicy
or at least
2
999
999
1
False
I've also tried doing
IEnumerable<XElement> Licensing = xelement.Elements();
foreach (var Organization in Licensing)
{
System.Diagnostics.Debug.Write(Organization.Element("Modules").Value);
}
to see what this would give, and it gives absolutely nothing.
If there is a better way than LINQ to do this, then I am all ears. The only reason I am saying LINQ is because based on what I've found so far, LINQ would be my best bet to achieve what I am attempting to do.
Those key values are called Attributes and here's a few different ways to access them:
Debug.WriteLine(xelement.Attribute("key").Value);
Debug.WriteLine(xelement.Element("Organization").Attribute("Org").Value);
Debug.WriteLine(((XElement)xelement.FirstNode).Attribute("Org").Value);
For the constraints, you're selecting one level too high; you need to select the child nodes with .Elements():
foreach (var constraint in xelement.Descendants("Constraints").Elements())
{
Debug.WriteLine(constraint.Name + ": " + constraint.Value);
}
foreach (var constraint in xelement.Element("Organization").Element("Constraints").Elements())
{
Debug.WriteLine(constraint.Name + ": " + constraint.Value);
}
You can also add using System.Diagnostics; to the top of the file so you don't need to add it before every Debug too.
So based on what @mattmanser said, and after looking further into LINQ and XElement/XDocument, I figured out how to do what I'm looking to do.
For example, say I want to know all of the "registered" booleans within the Modules element and store them in an array of booleans. I'd do this:
string Name = FileUpload1.FileName;
bool[] ModuleBools = new bool[4];
// Load the document once instead of once per module
var doc = XDocument.Load("C:/Users/.../Created XMLs/" + Name);
for (int moduleID = 1; moduleID < 5; moduleID++)
{
var quotes = doc
.Descendants("Module")
.Where(x => (string)x.Attribute("id") == moduleID.ToString())
.Select(x => (string)x.Attribute("registered"))
.ToList();
ModuleBools[moduleID-1] = bool.Parse(quotes.First());
}
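The same information can probably be collected in a single pass, without assuming there are exactly four modules. The following is only a sketch; ModuleReader and ReadRegisteredFlags are hypothetical names, and it relies on the explicit XAttribute conversions to int and bool that LINQ to XML provides. It throws if a Module element lacks an id or registered attribute, or if two modules share an id.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ModuleReader
{
    // Hypothetical helper: maps each Module's id attribute to its registered flag.
    public static Dictionary<int, bool> ReadRegisteredFlags(XElement licensing)
    {
        return licensing
            .Descendants("Module")
            .ToDictionary(
                m => (int)m.Attribute("id"),            // explicit XAttribute -> int conversion
                m => (bool)m.Attribute("registered"));  // explicit XAttribute -> bool conversion
    }
}
```

A call such as ModuleReader.ReadRegisteredFlags(XElement.Load(path)) would then give a dictionary keyed by module id, where path is whatever file location applies.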
Excuse the VB. With the given example, here is how you would access each item. The variable names, though the same as the elements, are just names.
Dim someXE As XElement
' someXE = XElement.Load("path here") 'to load from file / uri
' for testing we can do this
someXE = <Licensing key="20325">
<Organization Org="500">
<Constraints>
<MaximumOrgsInSecurity>2</MaximumOrgsInSecurity>
<MaximumUsersInSecurity>999</MaximumUsersInSecurity>
<MaximumLoggedInUsers>999</MaximumLoggedInUsers>
<MaximumLenders>1</MaximumLenders>
<OptOutofPasswordPolicy>FALSE</OptOutofPasswordPolicy>
</Constraints>
<Modules>
<Module registered="true" name="DV" id="1"/>
<Module registered="true" name="DP" id="2"/>
<Module registered="true" name="DCC" id="3"/>
<Module registered="false" name="DRE" id="4"/>
</Modules>
</Organization>
</Licensing>
Dim key As String = someXE.@key
Dim MaximumOrgsInSecurity As String = someXE.<Organization>.<Constraints>.<MaximumOrgsInSecurity>.Value
Dim MaximumUsersInSecurity As String = someXE.<Organization>.<Constraints>.<MaximumUsersInSecurity>.Value
Dim MaximumLoggedInUsers As String = someXE.<Organization>.<Constraints>.<MaximumLoggedInUsers>.Value
Dim MaximumLenders As String = someXE.<Organization>.<Constraints>.<MaximumLenders>.Value
Dim OptOutofPasswordPolicy As String = someXE.<Organization>.<Constraints>.<OptOutofPasswordPolicy>.Value

XML full file reading C#

So I have code that reads part of an XML document, presenting me with the first block of results, which is great. But I have a file containing multiple blocks of the same structure, and my program seems to quit after the first.
Here's the code:
string path = "data//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");//.ToArray();
var query = from i in items
select new
{
HandlingName = (string)i.Element("handlingName"),
Mass = (decimal?)i.Element("fMass").Attribute("value"),
InitialDragCoeff = (decimal?)i.Element("fInitialDragCoeff").Attribute("value"),
PercentSubmerged = (decimal?)i.Element("fPercentSubmerged").Attribute("value"),
DriveBiasFront = (decimal?)i.Element("fDriveBiasFront").Attribute("value"),
InitialDriveGears = i.Element("nInitialDriveGears").Attribute("value")
};
string test = ("{0} - {1}" + query.First().HandlingName + query.First().Mass + query.First().InitialDragCoeff);
richTextBox1.Text = test;
Here's the XML Document :
<?xml version="1.0" encoding="UTF-8"?>
<CHandlingDataMgr>
<HandlingData>
<Item type="CHandlingData">
<handlingName>Car1</handlingName>
<fMass value="140000.000000" />
<fInitialDragCoeff value="30.000000" />
<fPercentSubmerged value="85.000000" />
<vecCentreOfMassOffset x="0.000000" y="0.000000" z="0.000000" />
<vecInertiaMultiplier x="1.000000" y="1.000000" z="1.000000" />
<fDriveBiasFront value="1.000000" />
<nInitialDriveGears value="1" />
</Item>
<Item type="CHandlingData">
<handlingName>Car2</handlingName>
<fMass value="180000.000000" />
<fInitialDragCoeff value="7.800000" />
<fPercentSubmerged value="85.000000" />
<vecCentreOfMassOffset x="0.000000" y="0.000000" z="0.000000" />
<vecInertiaMultiplier x="1.000000" y="1.300000" z="1.500000" />
<fDriveBiasFront value="0.200000" />
<nInitialDriveGears value="6" />
</Item>
</HandlingData>
</CHandlingDataMgr>
As shown, there are multiple handlingName elements. The C# code above does work, but only for the first block, and I'm wondering how to make it read the same values for the other handling names.
I have tried :
if (query.First().HandlingName == "Car2")
{
MessageBox.Show("Car 2 found");
}
but since the message box never appeared, I assume this code doesn't read the whole file?
I'm hoping for output like this:
Name: Car 1
Mass: 140000.000000
InitialDragCoeff: 30.000000
Name: Car 2
Mass: 180000.000000
InitialDragCoeff: 7.800000
My problem in a nutshell: the program does not see Car 2.
Any help would be really appreciated, as I've tried many solutions and read many pages regarding XML today.
You have:
string test = ("{0} - {1}" + query.First().HandlingName + query.First().Mass
+ query.First().InitialDragCoeff);
that's only ever going to get you the first element, because that's what you asked for.
I think you probably want to loop:
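foreach (var item in query) {
// pass the values as format arguments instead of concatenating them onto the format string
var s = string.Format("{0} - {1} - {2}", item.HandlingName, item.Mass, item.InitialDragCoeff);
// …
}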
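To get output shaped like the one asked for in the question, the loop can append one block of text per Item. This is a sketch under the assumption that the file looks like the sample above; HandlingReport and Build are made-up names, and the attribute values are read as strings so the numbers keep the formatting used in the file.

```csharp
using System.Text;
using System.Xml.Linq;

public static class HandlingReport
{
    // Hypothetical helper: builds one "Name / Mass / InitialDragCoeff" block per <Item>.
    public static string Build(XDocument doc)
    {
        var sb = new StringBuilder();
        foreach (var item in doc.Descendants("HandlingData").Elements("Item"))
        {
            // ?. keeps a missing child element from throwing; the cast then yields null
            sb.AppendLine("Name: " + (string)item.Element("handlingName"));
            sb.AppendLine("Mass: " + (string)item.Element("fMass")?.Attribute("value"));
            sb.AppendLine("InitialDragCoeff: " + (string)item.Element("fInitialDragCoeff")?.Attribute("value"));
        }
        return sb.ToString();
    }
}
```

In the question's form code this could be used as richTextBox1.Text = HandlingReport.Build(XDocument.Load(path));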

Adding a class instance as value to a dictionary

I've an xml file "Sample.xml"
<RootElement>
<Children>
<Child Name="FirstChild" Start="0" End="2">
<Sibling Name="Test1" />
<Sibling Name="Test2" />
<AdditionalSibling Name="Add_Test_1" />
<AdditionalSibling Name="Add_Test_2" />
<MissingSibling Name="Miss_Test_1" />
<MissingSibling Name="Miss_Test_2" />
</Child>
<Child Name="SecondChild" Start="0" End="2">
<Sibling Name="Test3" />
<Sibling Name="Test4" />
</Child>
<Child Name="ThirdChild" Start="0" End="2">
<Sibling Name="Test5" />
<Sibling Name="Test6" />
</Child>
<Child Name="FourthChild" Start="0" End="2">
<Sibling Name="Test7" />
<Sibling Name="Test8" />
</Child>
<Child Name="FifthChild" Start="0" End="2">
<Sibling Name="Test9" />
<Sibling Name="Test10" />
</Child>
<Child Name="SixthChild" Start="0" End="2">
<Sibling Name="Test11" />
<Sibling Name="Test12" />
</Child>
<MatchedChilds>
<Child Name="FirstChild" />
<Child Name="SecondChild" />
<Child Name="ThirdChild" />
<Child Name="FourthChild" />
<Child Name="FifthChild" />
<Child Name="SixthChild" />
</MatchedChilds>
</Children>
</RootElement>
And a Class "SampleClass"
public class SampleClass
{
string Start;
string End;
List<string> Siblings;
List<string> AdditionalSiblings;
List<string> MissingSiblings;
public SampleClass()
{
Start= "";
End = "";
Siblings = new List<string>();
AdditionalSiblings = new List<string>();
MissingSiblings = new List<string>();
}
public SampleClass( string St, string En,List<string> S, List<string> AS, List<string> MS)
{
Start= St;
End = En;
Siblings = S;
AdditionalSiblings = AS;
MissingSiblings = MS;
}
}
and in another class I've declared a Dictionary like this:
Dictionary<string, SampleClass> m_dictSample = new Dictionary<string, SampleClass>();
I need to fill this dictionary with the contents of the file.
I'm using LINQ to XML for this:
XDocument l_XDOC = XDocument.Load(Application.StartupPath + "\\Sample.xml");
m_dictSample = (from element in l_XDOC.Descendants("Child")
group element by element.Attribute("Name").Value into KeyGroup
select KeyGroup )
.ToDictionary(grp => grp.Key,
grp => new
SampleClass(grp.Attributes("Start").ToList()[0].Value.ToString(),
grp.Attributes("End").ToList()[0].Value.ToString(),
grp.Descendants("Sibling").Attributes("Name").Select(l_Temp => l_Temp.Value).ToList(),
grp.Descendants("AdditionalSibling").Attributes("Name").Select(l_Temp => l_Temp.Value).ToList(),
grp.Descendants("MissingSibling").Attributes("Name").Select(l_Temp => l_Temp.Value).ToList()));
This query works properly for the file described above.
But if the file has more than one element with the same name, or an element with no "Start" and "End" attributes, the query throws an exception.
I have a problem with the following lines:
grp.Attributes("Start").ToList()[0].Value.ToString(),
grp.Attributes("End").ToList()[0].Value.ToString()
Please suggest a better way to do this.
I also need to fill a ListView with the contents of the dictionary, like this:
S.no  Child        Siblings     Additional Siblings    Missing Siblings
1     FirstChild   Test1,Test2  Add_Test_1,Add_Test_2  Miss_Test_1,Miss_Test_2
2     SecondChild  Test3,Test4
3     ThirdChild   Test5,Test6
Right now I'm using a for loop for this.
Please suggest a better way to do this.
You can't use a Dictionary here; you have to use a Lookup collection.
A Dictionary requires each key to map to exactly one value, whereas a Lookup allows several values per key.
See this for more info.
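As a rough illustration of the Lookup idea, the sketch below groups Sibling names under the Name of their parent Child element using Enumerable.ToLookup. SiblingLookup and Build are hypothetical names, and the sketch covers only the Sibling elements, not the full SampleClass.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SiblingLookup
{
    // Hypothetical helper: one key per Child name, with every Sibling name found under it.
    public static ILookup<string, string> Build(XDocument doc)
    {
        return doc.Descendants("Sibling")
            .ToLookup(
                s => (string)s.Parent.Attribute("Name"),  // key: Name of the enclosing Child
                s => (string)s.Attribute("Name"));        // value: Name of the Sibling
    }
}
```

Unlike a Dictionary, a Lookup accepts repeated keys, and indexing it with a key that is not present gives an empty sequence rather than throwing.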
Given the way that you have written your query I don't see why you'd be getting an exception if you had two Child elements with the same name. If anything their data would just get combined into a single key instance in your dictionary. To fix the exception you receive when a Start attribute doesn't exist just do a conditional operator test ? : to see if results were returned from your attribute query. That said the below code should work for you with the disclaimer that just because this works doesn't mean it's best practice. In many ways I am a LINQ neophyte myself.
// Note: the object initializer below requires Start, End and the three lists
// to be public members of SampleClass (they are private in the question).
Dictionary<string, SampleClass> dict =
(from element in xDoc.Descendants("Child")
group element by element.Attribute("Name").Value
into kGrp
select kGrp)
.ToDictionary(grp => grp.Key,
grp => new SampleClass
{
// fall back to an empty string when no element in the group has the attribute
Start = grp.Attributes("Start").Any()
? grp.Attributes("Start").First().Value
: String.Empty,
End = grp.Attributes("End").Any()
? grp.Attributes("End").First().Value
: String.Empty,
Siblings =
grp.Descendants("Sibling")
.Attributes("Name")
.Select(l_Temp => l_Temp.Value).ToList(),
AdditionalSiblings =
grp.Descendants("AdditionalSibling")
.Attributes("Name")
.Select(l_Temp => l_Temp.Value).ToList(),
MissingSiblings =
grp.Descendants("MissingSibling")
.Attributes("Name")
.Select(l_Temp => l_Temp.Value).ToList()
});

XML : how to remove all nodes which have no attributes nor child elements

I have a xml document like this :
<Node1 attrib1="abc">
<node1_1>
<node1_1_1 attrib2 = "xyz" />
</node1_1>
</Node1>
<Node2 />
Here <Node2 /> is the node I want to remove, since it has neither child elements nor attributes.
Using an XPath expression, it is possible to find all nodes that have no attributes or children. These can then be removed from the XML. As Sani points out, you might have to do this recursively, because node1_1 becomes empty if you remove its inner node.
var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(
@"<Node1 attrib1=""abc"">
<node1_1>
<node1_1_1 />
</node1_1>
</Node1>
");
// select all nodes without attributes and without children
var nodes = xmlDocument.SelectNodes("//*[count(@*) = 0 and count(child::*) = 0]");
Console.WriteLine("Found {0} empty nodes", nodes.Count);
// now remove matched nodes from their parent
foreach(XmlNode node in nodes)
node.ParentNode.RemoveChild(node);
Console.WriteLine(xmlDocument.OuterXml);
Console.ReadLine();
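Something like this should do it:
// Requires "using System.Linq;". Snapshot the live node lists so nodes can be removed while iterating.
var nodes = xmlDocument.GetElementsByTagName("Node1").Cast<XmlNode>().ToList();
foreach(XmlNode node in nodes)
{
if(node.ChildNodes.Count == 0)
node.ParentNode.RemoveChild(node);
else
{
foreach (XmlNode n in node.ChildNodes.Cast<XmlNode>().ToList())
{
// Attributes is null for non-element nodes such as text
if(n.InnerText==String.Empty && (n.Attributes == null || n.Attributes.Count == 0))
{
node.RemoveChild(n);
}
}
}
}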
To do this for all empty child nodes, use a for loop (not foreach) and iterate in reverse order. I resolved it as:
var xmlDocument = new XmlDocument();
// wrapped in a single <root> element so that the XML is well-formed
xmlDocument.LoadXml(@"<root>
<node1 attrib1=""abc"">
<node1_1>
<node1_1_1 />
</node1_1>
<node1_2 />
<node1_3 />
</node1>
<node2 />
</root>
");
RemoveEmptyNodes(xmlDocument);
private static bool RemoveEmptyNodes(XmlNode node)
{
if (node.HasChildNodes)
{
for(int I = node.ChildNodes.Count-1;I >= 0;I--)
if (RemoveEmptyNodes(node.ChildNodes[I]))
node.RemoveChild(node.ChildNodes[I]);
}
// a node counts as empty only if nothing is left under it after pruning
return
!node.HasChildNodes &&
(node.Attributes == null ||
node.Attributes.Count == 0) &&
node.InnerText.Trim() == string.Empty;
}
The recursive calls (similarly to other solutions) eliminate the duplicated document processing of the xPath approach. More importantly the code is more readable and more readily editable. Win-Win.
So, this solution will remove <node2>, but also correctly removes <node1_2> and <node1_3>.
Update: Found a notable performance increase by using the following Linq implementation.
// wrapped in a single <root> element so that the XML is well-formed
string myXml = @"<root>
<node1 attrib1=""abc"">
<node1_1>
<node1_1_1 />
</node1_1>
<node1_2 />
<node1_3 />
</node1>
<node2 />
</root>
";
XElement xElem = XElement.Parse(myXml);
RemoveEmptyNodes2(xElem);
private static void RemoveEmptyNodes2(XElement elem)
{
int cntElems = elem.Descendants().Count();
int cntPrev;
do
{
cntPrev = cntElems;
// HasElements keeps parents whose children still carry attributes
elem.Descendants()
.Where(e =>
string.IsNullOrEmpty(e.Value.Trim()) &&
!e.HasAttributes &&
!e.HasElements).Remove();
cntElems = elem.Descendants().Count();
} while (cntPrev != cntElems);
}
The loop handles cases where a parent needs to be removed because its only child was removed. Using the XContainer or derivatives tends to have similar performance increases due to the IEnumerable implementations behind the scenes. It's my new favorite thing.
On an arbitrary 68MB xml file RemoveEmptyNodes tends to take about 90sec, while RemoveEmptyNodes2 tends to take about 1sec.
This stylesheet uses an identity transform with an empty template matching elements without nodes or attributes, which will prevent them from being copied to the output:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!--Empty template to match on elements without attributes or child nodes to prevent it from being copied to output -->
<xsl:template match="*[not(child::node() | @*)]"/>
</xsl:stylesheet>
