Split XML document apart creating multiple output files from repeating elements - c#

I need to take an XML file and create multiple output xml files from the repeating nodes of the input file. The source file "AnimalBatch.xml" looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<Animals>
<Animal id="1001">
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
<Animal id="1002">
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
<Animal id="1003">
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
</Animals>
The program needs to split the repeating "Animal" and produce 3 files named: Animal_1001.xml, Animal_1002.xml, and Animal_1003.xml
Each output file should contain just its respective element (which will be the root). The id attribute from AnimalBatch.xml will supply the sequence number for the Animal_xxxx.xml filenames. The id attribute does not need to be in the output files.
Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
I want to do this with XmlDocument, since it needs to be able to run on .Net 2.0.
My program looks like this:
static void Main(string[] args)
{
string strFileName;
string strSeq;
XmlDocument doc = new XmlDocument();
doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");
XmlNodeList nl = doc.DocumentElement.SelectNodes("Animal");
foreach (XmlNode n in nl)
{
strSeq = n.Attributes["id"].Value;
XmlDocument outdoc = new XmlDocument();
XmlNode rootnode = outdoc.CreateNode("element", "Animal", "");
outdoc.AppendChild(rootnode); // Put the wrapper element into outdoc
outdoc.ImportNode(n, true); // place the node n into outdoc
outdoc.AppendChild(n); // This statement errors:
// "The node to be inserted is from a different document context."
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save(Console.Out);
Console.WriteLine();
}
Console.WriteLine("END OF PROGRAM: Press <ENTER>");
Console.ReadLine();
}
I think I have 2 problems.
A) After doing the ImportNode on node n into outdoc, I call outdoc.AppendChild(n), which complains: "The node to be inserted is from a different document context." I do not know if this is a scope issue referencing node n within the foreach loop, or if I am somehow not using ImportNode() or AppendChild() properly. The 2nd argument to ImportNode() is set to true because I want the child elements of Animal (3 fields arbitrarily named Quantity, Adjective, and Name) to end up in the destination file.
B) The second problem is getting the Animal element into outdoc. I'm getting '<Animal />' but I need '<Animal> </Animal>' so I can place node n inside it. I think my problem is how I am doing: outdoc.AppendChild(rootnode);
To show the xml, I'm doing: outdoc.Save(Console.Out); I do have the code to save() to an output file - which does work, as long as I can get outdoc assembled properly.
There is a similar question at: Split XML in Multiple XML files, but I don't understand the solution code yet. I think I'm pretty close on this approach, and will appreciate any help you can provide.
I'm going to be doing this same task using XmlReader, since I'm going to need to be able to handle large input files, and I understand that XmlDocument reads the whole thing in and can cause memory issues.

Here is a simple method that seems to do what you are looking for:
public void test_xml_split()
{
XmlDocument doc = new XmlDocument();
doc.Load("C:\\animals.xml");
XmlDocument newXmlDoc = null;
foreach (XmlNode animalNode in doc.SelectNodes("//Animals/Animal"))
{
newXmlDoc = new XmlDocument();
var targetNode = newXmlDoc.ImportNode(animalNode, true);
newXmlDoc.AppendChild(targetNode);
newXmlDoc.Save(Console.Out);
Console.WriteLine();
}
}

This approach also works without the "var targetNode" statement. I think the main problem in my original code was that I ignored the return value of ImportNode(): it returns a copy of node n that belongs to outdoc, and that copy is what has to be passed to AppendChild(). Node n itself still belongs to doc, which is why appending it raised the "different document context" error.
The problem with the previously proposed solution was its use of the "var" keyword. My program has to assume .NET 2.0, and "var" arrived with C# 3.0. I like Roger's solution because it is concise; I just wanted each step to be a separate statement.
static void SplitXMLDocument()
{
string strFileName;
string strSeq;
XmlDocument doc = new XmlDocument(); // The input file
doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");
XmlNodeList nl = doc.DocumentElement.SelectNodes("//Animals/Animal");
foreach (XmlNode n in nl)
{
strSeq = n.Attributes["id"].Value; // Animal nodes have an id attribute
XmlDocument outdoc = new XmlDocument(); // Create the outdoc xml document
XmlNode targetNode = outdoc.ImportNode(n, true); // Copy the Animal and its children into outdoc's context
targetNode.Attributes.RemoveAll(); // Remove the id attribute in <Animal id="1001">
outdoc.AppendChild(targetNode); // Attach the imported copy as outdoc's root element
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save(Console.Out); Console.WriteLine();
outdoc.Save("D:\\Rick\\Computer\\XML\\" + strFileName);
Console.WriteLine();
}
}

Related

Selecting Particular Node List in XML

<Report xmlns="Microsoft.SystemCenter.DataWarehouse.Report.Alert" xmlns:p1="http://www.w3.org/2001/XMLSchema-instance" Name="Microsoft.SystemCenter.DataWarehouse.Report.Alert" p1:schemaLocation="Microsoft.SystemCenter.DataWarehou?Schema=True">
<Title>Alert Report</Title>
<Created>6/27/2013 9:32 PM</Created>
<StartDate>6/1/2013 9:29 PM</StartDate>
<EndDate>6/27/2013 9:29 PM</EndDate>
<TimeZone>(UTC)</TimeZone>
<Severity>Warning, Critical</Severity>
<Priority>Low, Medium, High</Priority>
<AlertTable>
<Alerts>
<Alert>
<AlertName></AlertName>
<Priority></Priority>
</Alert>
</Alerts>
</AlertTable>
</Report>
So I'm trying to pull the list of nodes that appear under the Alerts element, i.e. /Report/AlertTable/Alerts.
I've done something very similar before, but with this format it is not working for some reason. Can someone point me in the right direction?
XmlDocument Log = new XmlDocument();
Log.Load("test.xml");
XmlNodeList myLog = Log.DocumentElement.SelectNodes("//Report/AlertTable/Alerts");
foreach (XmlNode alert in myLog)
{
Console.Write("HERE");
Console.WriteLine(alert.SelectNodes("AlertName").ToString());
Console.WriteLine(alert.SelectNodes("Priority").ToString());
Console.Read();
}
EDIT:
One of the responses had me try a namespace manager with the p1 namespace, but I had no luck.
EDIT:
Did not work either:
var name = new XmlNamespaceManager(log.NameTable);
name.AddNamespace("Report", "http://www.w3.org/2001/XMLSchema-instance");
XmlNodeList xml = log.SelectNodes("//Report:Alerts", name);
From a site:
nodename : Selects all nodes with the name "nodename"
/ : Selects from the root node
// : Selects nodes in the document from the current node that match the selection no matter where they are
So I believe
"/AlertTable/Alerts"
would work, as that would be 'from the root node' as well as
"Report/AlertTable/Alerts"
XPath Site
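Figured this one out.
It had to do with the namespace "Microsoft.SystemCenter.DataWarehouse.Report.Alert". The xmlns attribute on <Report> puts every element in that default namespace, so the XPath has to use a prefix mapped to exactly that URI; registering any other URI matches nothing.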
XmlDocument log = new XmlDocument();
log.Load(@"C:\Users\barranca\Desktop\test.xml");
// XmlNodeList xml = log.SelectNodes("//ns1:Alerts");
var name = new XmlNamespaceManager(log.NameTable);
name.AddNamespace("ns1", "Microsoft.SystemCenter.DataWarehouse.Report.Alert");
XmlNodeList xml = log.SelectNodes("//ns1:Alert", name);
foreach (XmlNode alert in xml)
{
Console.Write("HERE");
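XmlNode test = alert.SelectSingleNode("ns1:AlertName", name); // relative path; a leading // would search the whole document and always return the first AlertName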
string testing = test.InnerText;
Console.Write(testing);
}

Renaming Child Nodes in XML file using C#?

I am having a problem renaming child nodes in xml files using c#.
This is my xml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ZACAC01>
<IDOC BEGIN="1">
<ZACGPIAD SEGMENT="1">
<IDENTIFIER>D000</IDENTIFIER>
<CUST_DEL_NO/>
<CUST_DEL_DATE/>
<TRUCKNO/>
<DRIVERNAME/>
<DRIVERID/>
<RESPONS_OFF/>
<CONFIRM_DATE>20/01/13</CONFIRM_DATE>
<SERIAL_NO>2</SERIAL_NO>
<SERIAL_CHAR/>
<DEL_INFO1/>
<QTY>0</QTY>
<DEL_INFO2/>
<QTY>0</QTY>
<DEL_INFO3/>
<QTY>0</QTY>
<TRANS_COMPANY>0</TRANS_COMPANY>
</ZACGPIAD>
</IDOC>
</ZACAC01>
And below is my requirement:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ZACAC01>
<IDOC BEGIN="1">
<ZACGPIADD SEGMENT="1">
<IDENTIFIER>D000</IDENTIFIER>
<CUST_DEL_NO/>
<CUST_DEL_DATE/>
<TRUCKNO/>
<DRIVERNAME/>
<DRIVERID/>
<RESPONS_OFF/>
<CONFIRM_DATE>20/01/13</CONFIRM_DATE>
<SERIAL_NO>2</SERIAL_NO>
<SERIAL_CHAR/>
<DEL_INFO1/>
<QTY1>0</QTY1>
<DEL_INFO2/>
<QTY2>0</QTY2>
<DEL_INFO3/>
<QTY3>0</QTY3>
<TRANS_COMPANY>0</TRANS_COMPANY>
</ZACGPIADD>
</IDOC>
</ZACAC01>
I am able to change the segment tag <ZACGPIAD> to this <ZACGPIADD> using the following code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(srcfile);
var root = xmlDoc.GetElementsByTagName("IDOC")[0];
var oldElem = root.SelectSingleNode("ZACGPIAD");
var newElem = xmlDoc.CreateElement("ZACGPIADD");
root.ReplaceChild(newElem, oldElem);
while (oldElem.ChildNodes.Count != 0)
{
newElem.AppendChild(oldElem.ChildNodes[0]);
}
while (oldElem.Attributes.Count != 0)
{
newElem.Attributes.Append(oldElem.Attributes[0]);
}
xmlDoc.Save(desfile);
But I can't change the <QTY> tag to <QTY1>, <QTY2>, <QTY3>
How can I do this?
I think you have the answer right in your code. You can use SelectSingleNode to pull the first <QTY> element with this:
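var qtyNode = xmlDoc.SelectSingleNode("/ZACAC01/IDOC/ZACGPIADD/QTY[1]");
then use ReplaceChild on its parent node. Then do the same for the second and third <QTY> nodes. Note that once the first <QTY> has been replaced, the next remaining one becomes QTY[1], so either keep selecting QTY[1] or select all three nodes before replacing any of them.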
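The replace-each-QTY idea can be sketched with XmlDocument as follows. This is only an illustration under some assumptions: the class and method names (QtyRenamer, RenameQtyChildren) are hypothetical, the <QTY> elements are assumed to carry no attributes (as in the sample), and the matches are copied into a list first so that replacing nodes does not disturb the selection.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class QtyRenamer
{
    // Hypothetical helper: renames each <QTY> child of 'parent' to
    // QTY1, QTY2, ... in document order.
    public static void RenameQtyChildren(XmlDocument xmlDoc, XmlNode parent)
    {
        // Take a snapshot of the matches before changing the tree.
        List<XmlNode> qtyNodes = new List<XmlNode>();
        foreach (XmlNode node in parent.SelectNodes("QTY"))
        {
            qtyNodes.Add(node);
        }

        for (int i = 0; i < qtyNodes.Count; i++)
        {
            XmlNode oldNode = qtyNodes[i];
            XmlElement newNode = xmlDoc.CreateElement("QTY" + (i + 1));
            newNode.InnerXml = oldNode.InnerXml;   // carry over the content
            parent.ReplaceChild(newNode, oldNode); // swap in place, keeping position
        }
    }
}
```

With the code from the question, this would be called as QtyRenamer.RenameQtyChildren(xmlDoc, newElem) after the children have been moved into the new ZACGPIADD element.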
You can use XDocument and operate on XElements which expose setter for node name (so you can simply set new name instead of doing node replacements):
var doc = XDocument.Load(srcfile);
var zacgpidNode = doc.Descendants("ZACGPIAD").First();
zacgpidNode.Name = "ZACGPIADD";
// now rename all QTY nodes
var qtyNodes = zacgpidNode.Elements("QTY").ToArray();
for (int i = 0; i < qtyNodes.Length; i++)
{
qtyNodes[i].Name = string.Format("{0}{1}", qtyNodes[i].Name, i+1);
}
doc.Save(desfile);
Having Descendants("ZACGPIAD").First() might not be suitable if your document structure is different than what you've shown in example. You can use XPathSelectElement method to have more control over what you'll be extracting:
var node = doc.XPathSelectElement("//IDOC[@BEGIN='1']/ZACGPIAD[@SEGMENT='1']"); // requires: using System.Xml.XPath;

Read first root node from XML

I work with three kinds of XML files :
Type A:
<?xml version="1.0" encoding="UTF-8"?>
<nfeProc versao="2.00" xmlns="http://www.portalfiscal.inf.br/nfe">
</nfeProc>
Type B:
<?xml version="1.0" encoding="UTF-8"?>
<cancCTe xmlns="http://www.portalfiscal.inf.br/cte" versao="1.04">
</cancCTe>
Type C:
<?xml version="1.0" encoding="UTF-8"?>
<cteProc xmlns="http://www.portalfiscal.inf.br/cte" versao="1.04">
</cteProc>
I tried this code to read the first node:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(@"C:\crruopto\135120068964590_v01.04-procCTe.xml");
XmlNodeList ml = xmlDoc.GetElementsByTagName("*");
XmlElement root = xmlDoc.DocumentElement;
exti = root.ToString();
but it doesn't return what I expect. I want to read the first node; I need to know whether the file is nfeProc, cancCTe, or cteProc.
The second question is: how do I get the value from "value" in the same tag?
Thanks
From this post:
// The root element is the DocumentElement property of XmlDocument
XmlElement root = xmlDoc.DocumentElement;
// Alternatively, if you only have a node, you can reach the root element through its owner document
XmlElement root = xmlNode.OwnerDocument.DocumentElement;
I would suggest using XPath. Here's an example where I read in the XML content from a locally stored string and select whatever the first node under the root is:
XmlDocument doc = new XmlDocument();
doc.Load(new StringReader(xml));
XmlNode node = doc.SelectSingleNode("(/*)");
If you aren't required to use the XmlDocument stuff, then Linq is your friend.
XDocument doc = XDocument.Load(@"C:\crruopto\135120068964590_v01.04-procCTe.xml");
XElement first = doc.Descendants().FirstOrDefault(); // requires: using System.Linq;
if(first != null)
{
//first.Name.LocalName will be either nfeProc, cancCTe or cteProc.
}
Working with Linq to XML is the newest and most powerful way of working with XML in .NET and offers you a lot more power and flexibility than things like XmlDocument and XmlNode.
Getting the root node is very simple:
XDocument doc = XDocument.Load(@"C:\crruopto\135120068964590_v01.04-procCTe.xml");
Console.WriteLine(doc.Root.Name.LocalName); // LocalName omits the namespace; Name.ToString() would print "{namespace-uri}name"
Once you have constructed an XDocument you don't need to use any LINQ querying or special checking. You simply pull the Root property from the XDocument.
Thanks, I solved the first part this way:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(nomear);
XmlNodeList ml = xmlDoc.GetElementsByTagName("*");
XmlNode primer = xmlDoc.DocumentElement;
exti = primer.Name;
First, to be clear, you're asking about the root element, not the root node.
You can use an XmlReader to avoid having to load large documents completely into memory. See my answer on how to find the root element at https://stackoverflow.com/a/60642354/1307074.
Second, once the reader is positioned on the element, the reader's Name property gives the qualified tag name of the element. An element itself has no Value; to read an attribute as a string, call GetAttribute() with the attribute name, or move to the attribute and read the Value property.
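The XmlReader approach described above might look like the following minimal sketch. The class and method names (RootElementProbe, GetRootName) are hypothetical, and the sketch assumes the input is well-formed XML supplied through a TextReader; it reads only as far as the root element's start tag.

```csharp
using System.IO;
using System.Xml;

public static class RootElementProbe
{
    // Hypothetical helper: returns the local name of the root element and,
    // through the out parameter, the value of one of its attributes
    // (null if the attribute is absent).
    public static string GetRootName(TextReader input, string attributeName, out string attributeValue)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Skips the XML declaration, comments and whitespace,
            // stopping on the root element's start tag.
            reader.MoveToContent();
            attributeValue = reader.GetAttribute(attributeName);
            return reader.LocalName; // name without any namespace prefix
        }
    }
}
```

For the Type A file in the question, calling this with a StreamReader over the file and "versao" as the attribute name should yield the root name and the version string without loading the rest of the document.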

Keeping Redundant Namespace Prefixes On XML Elements in C#

I'm trying to write an XML file that will be picked up and parsed by another service. In order for this to happen the XML must be formatted in a very specific way, namely:
<?xml version="1.0"?>
<Feedbacks:Feedbacks xmlns:Feedbacks="Feedbacks">
<Feedbacks:Elements>
<Feedback:XMLFeedback xmlns:Feedback="Feedback">
<Feedback:MfgUnitID></Feedback:MfgUnitID>
<Feedback:MachineId></Feedback:MachineId>
<Feedback:OperationCode></Feedback:OperationCode>
<Feedback:ItemSeqNum></Feedback:ItemSeqNum>
<Feedback:OperDispositionCd></Feedback:OperDispositionCd>
<Feedback:ItemId></Feedback:ItemId>
<Feedback:ParentItemId></Feedback:ParentItemId>
<Feedback:ItemEndSize>1821</Feedback:ItemEndSize>
<Feedback:ItemDispositionCd></Feedback:ItemDispositionCd>
<Feedback:OperStartDate></Feedback:OperStartDate>
<Feedback:OperEndDate></Feedback:OperEndDate>
</Feedback:XMLFeedback>
</Feedbacks:Elements>
</Feedbacks:Feedbacks>
with data, of course, between the innermost elements. Here's the issue though: no matter what I do, I can't get any of the C# classes to keep the namespace prefixes (the part before the colon) on the innermost nodes. As far as I know these need to stay, so is there a way in C# to force it to format the nodes this way? I've tried all of the create methods that I could find in the XmlDocument class. I can get the outer nodes formatted fine, but the inner ones just keep creating problems.
Edit, sorry here's the code that makes the inner nodes.
private void AppendFile(string filename, string[] headers, Dictionary<string, string> values)
{
XmlDocument doc = new XmlDocument();
doc.Load(filename);
XmlNode node = doc.GetElementsByTagName(headers[headers.Length - 2]).Item(0);
string[] hPieces = headers[headers.Length - 1].Split(':');
XmlElement appendee = doc.CreateElement(hPieces[0].Trim(), hPieces[1].Trim(), hPieces[0].Trim());
node.AppendChild(appendee);
foreach (KeyValuePair<string, string> pair in values)
{
string[] ePieces = pair.Key.Split(':');
//XmlElement element = doc.CreateElement(ePieces[0].Trim(), string.Empty, ePieces[1].Trim());
//XmlText text = doc.CreateTextNode(pair.Value);
XmlNode innerNode = doc.CreateNode(XmlNodeType.Element, ePieces[1].Trim(), ePieces[0].Trim());
node.InnerText = pair.Value;
// element.AppendChild(text);
appendee.AppendChild(innerNode);
}
doc.Save(filename);
}
The data for the inner nodes comes in as key value pairs in the dictionary. Where the keys contain the intended name.
Edit2: This is what the file output looks like
<?xml version="1.0" encoding="utf-8"?>
<Feedbacks:Feedbacks xmlns:Feedbacks="Feedbacks">
<Feedbacks:Elements>
<Feedback:XMLFeedback xmlns:Feedback="Feedback">
<MfgUnitID></MfgUnitID>
<MachineId></MachineId>
<OperationCode></OperationCode>
<ItemSeqNum></ItemSeqNum>
<OperDispositionCd></OperDispositionCd>
<ItemId></ItemId>
<ParentItemId></ParentItemId>
<ItemEndSize></ItemEndSize>
<ItemDispositionCd></ItemDispositionCd>
<OperStartDate></OperStartDate>
<OperEndDate></OperEndDate>
</Feedback:XMLFeedback>
</Feedbacks:Elements>
</Feedbacks:Feedbacks>
You can accomplish this easily with XLinq:
using System.Xml.Linq;
XNamespace ns1 = "Feedbacks";
XNamespace ns2 = "Feedback";
var doc = new XElement(ns1 + "Feedbacks",
new XAttribute(XNamespace.Xmlns+"Feedbacks", ns1));
doc.Add(new XElement(ns1 + "Elements",
new XElement(ns2 + "Feedback",
new XAttribute(XNamespace.Xmlns+"Feedback", ns2),
new XElement(ns2 + "Unit"))));
Gives
<Feedbacks:Feedbacks xmlns:Feedbacks="Feedbacks">
<Feedbacks:Elements>
<Feedback:Feedback xmlns:Feedback="Feedback">
<Feedback:Unit />
</Feedback:Feedback>
</Feedbacks:Elements>
</Feedbacks:Feedbacks>
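Note that your own output is well-formed XML, but the unprefixed inner elements are not in the Feedback namespace: a prefix declaration such as xmlns:Feedback applies only to names that use the prefix, so a namespace-aware consumer would treat them as different elements.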
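For readers who need to stay with XmlDocument, the three-argument CreateElement(prefix, localName, namespaceURI) overload keeps the prefix on each inner element. The sketch below is an assumption-laden illustration rather than a drop-in fix: the helper name (PrefixedElements.AppendPrefixed) is hypothetical, and the parent is assumed to be an element that already belongs to a document.

```csharp
using System.Xml;

public static class PrefixedElements
{
    // Hypothetical helper: appends <prefix:localName>text</prefix:localName>
    // to 'parent', placing the new element in 'namespaceUri'.
    public static XmlElement AppendPrefixed(XmlNode parent, string prefix,
        string localName, string namespaceUri, string text)
    {
        XmlDocument doc = parent.OwnerDocument;
        // The three-argument overload records the prefix along with the namespace.
        XmlElement child = doc.CreateElement(prefix, localName, namespaceUri);
        child.InnerText = text;
        parent.AppendChild(child);
        return child;
    }
}
```

In the question's loop this would replace the CreateNode call, for example AppendPrefixed(appendee, ePieces[0].Trim(), ePieces[1].Trim(), ePieces[0].Trim(), pair.Value), which also sets the text on the new inner element rather than on the outer node.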

Parsing a large XML file to multiple output xmls, using XmlReader - getting every other element

I need to take a very large XML file and create multiple output xml files from what could be thousands of repeating nodes of the input file. There is no whitespace in the source file "AnimalBatch.xml" which looks like this:
<?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003"><Quantity>Three</Quantity><Adjective>Blind</Adjective><Name>Mice</Name></Animal><Animal id="1004"><Quantity>Four</Quantity><Adjective>Purple</Adjective><Name>Horses</Name></Animal><Animal id="1005"><Quantity>Five</Quantity><Adjective>Long</Adjective><Name>Centipedes</Name></Animal><Animal id="1006"><Quantity>Six</Quantity><Adjective>Dark</Adjective><Name>Owls</Name></Animal></Animals>
The program needs to split the repeating "Animal" and produce the appropriate number of files named: Animal_1001.xml, Animal_1002.xml, Animal_1003.xml, etc.
Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
The code below works, but only if the input file has CR/LF after the <Animal id="xxxx"> elements. If it has no "whitespace" (I don't, and can't get it like that), I get every other one (the odd numbered animals)
static void SplitXMLReader()
{
string strFileName;
string strSeq = "";
XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");
while (doc.Read())
{
if ( doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element )
{
strSeq = doc.GetAttribute("id");
XmlDocument outdoc = new XmlDocument();
XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
XmlElement rootNode = outdoc.CreateElement(doc.Name);
rootNode.InnerXml = doc.ReadInnerXml();
// This seems to be advancing the cursor in doc too far.
outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
outdoc.AppendChild(rootNode);
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save("C:\\" + strFileName);
}
}
}
My understanding is that "whitespace" or formatting in XML should make no difference to XmlReader - but I've tried this both ways, with and without CR/LF's after the <Animal id="xxxx">, and can confirm there is a difference. If it has CR/LFs (possibly even just a space, which I'll try next) - it gets each <Animal> node processed fully, and saved under the right filename that comes from the id attribute.
Can someone let me know what's going on here - and a possible workaround?
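Yes, when using doc.ReadInnerXml(), whitespace matters.
From the documentation of the function: it returns the content as a string and leaves the reader positioned on the node after the end tag. Without whitespace, that node is the next <Animal> start tag, which the doc.Read() at the top of the loop then skips; with whitespace, the reader lands on the whitespace node instead, and Read() moves on to the next <Animal>. If you want the inner content as a node rather than a string, you should use something like ReadSubtree().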
Thanks for the guidance on using the ReadSubtree() method:
This code works for the XML input file with no linefeeds:
static void SplitXMLReaderSubTree()
{
string strFileName;
string strSeq = "";
XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");
while (!doc.EOF)
{
if ( doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element )
{
strSeq = doc.GetAttribute("id");
XmlReader inner = doc.ReadSubtree();
inner.Read();
XmlDocument outdoc = new XmlDocument();
XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
XmlElement myElement;
myElement = outdoc.CreateElement(doc.Name);
myElement.InnerXml = inner.ReadInnerXml();
inner.Close();
myElement.Attributes.RemoveAll();
outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
outdoc.ImportNode(myElement, true);
outdoc.AppendChild(myElement);
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save("C:\\" + strFileName);
}
else
{
doc.Read();
}
}
}
