split xml document into chunks - c#

I have a large XML document that needs to be processed 100 records at a time.
The processing is done within a Windows Service written in C#.
The structure is as follows:
<docket xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="docket.xsd">
<order>
<Date>2008-10-13</Date>
<orderNumber>050758023</orderNumber>
<ParcelID/>
<CustomerName>sddsf</CustomerName>
<DeliveryName>dsfd</DeliveryName>
<Address1>sdf</Address1>
<Address2>sdfsdd</Address2>
<Address3>sdfdsfdf</Address3>
<Address4>dffddf</Address4>
<PostCode/>
</order>
<order>
<Date>2008-10-13</Date>
<orderNumber>050758023</orderNumber>
<ParcelID/>
<CustomerName>sddsf</CustomerName>
<DeliveryName>dsfd</DeliveryName>
<Address1>sdf</Address1>
<Address2>sdfsdd</Address2>
<Address3>sdfdsfdf</Address3>
<Address4>dffddf</Address4>
<PostCode/>
</order>
.....
.....
</docket>
There could be thousands of orders in a docket.
I need to chop this into chunks of 100 order elements.
However, each chunk of 100 orders still needs to be wrapped in the parent "docket" node and keep the same namespace declarations, etc.
Is this possible?

Another naive solution, this time for .NET 2.0. It should give you an idea of how to go about what you want. It uses XPath expressions instead of LINQ to XML. It chunks a 100-order docket into 10 dockets in under a second on my devbox.
public List<XmlDocument> ChunkDocket(XmlDocument docket, int chunkSize)
{
List<XmlDocument> newDockets = new List<XmlDocument>();
//
int orderCount = docket.SelectNodes("//docket/order").Count;
int chunkStart = 0;
XmlDocument newDocket = null;
XmlElement root = null;
XmlNodeList chunk = null;
while (chunkStart < orderCount)
{
newDocket = new XmlDocument();
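root = newDocket.CreateElement("docket");
// Copy the source root's attributes (including xmlns declarations) so each chunk keeps them.
foreach (XmlAttribute attribute in docket.DocumentElement.Attributes)
{
root.Attributes.Append((XmlAttribute)newDocket.ImportNode(attribute, true));
}
newDocket.AppendChild(root);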
chunk = docket.SelectNodes(String.Format("//docket/order[position() > {0} and position() <= {1}]", chunkStart, chunkStart + chunkSize));
chunkStart += chunkSize;
XmlNode targetNode = null;
foreach (XmlNode c in chunk)
{
targetNode = newDocket.ImportNode(c, true);
root.AppendChild(targetNode);
}
newDockets.Add(newDocket);
}
return newDockets;
}

Naive, iterative, but works [EDIT: requires .NET 3.5 or later]
public List<XDocument> ChunkDocket(XDocument docket, int chunkSize)
{
var newDockets = new List<XDocument>();
var d = new XDocument(docket);
var orders = d.Root.Elements("order");
XDocument newDocket = null;
do
{
newDocket = new XDocument(new XElement("docket"));
var chunk = orders.Take(chunkSize);
newDocket.Root.Add(chunk);
chunk.Remove();
newDockets.Add(newDocket);
} while (orders.Any());
return newDockets;
}
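The question also asks that each chunk keep the parent "docket" node with the same namespace declarations. A minimal sketch of one way to do that with LINQ to XML follows; the class and method names (DocketChunker, Chunk) are hypothetical, and it assumes the orders are direct children of the root. It copies the root's name and attributes onto each new root and leaves the source document unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DocketChunker
{
    // Splits the <order> children of the root into documents holding at most
    // chunkSize orders each. Every new root copies the source root's name and
    // attributes, which includes xmlns:* namespace declarations.
    public static List<XDocument> Chunk(XDocument docket, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        XElement sourceRoot = docket.Root;
        List<XElement> orders = sourceRoot.Elements("order").ToList();
        var result = new List<XDocument>();

        for (int start = 0; start < orders.Count; start += chunkSize)
        {
            // Attributes and elements that already have a parent are cloned
            // when added, so the source document is not modified.
            var root = new XElement(sourceRoot.Name,
                sourceRoot.Attributes(),
                orders.Skip(start).Take(chunkSize));
            result.Add(new XDocument(root));
        }
        return result;
    }
}
```

Calling DocketChunker.Chunk(docket, 100) would then return one XDocument per block of 100 orders, with the last one holding the remainder.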

If the reason for processing 100 orders at a time is performance, e.g. opening a big file takes too much time and memory, you can use XmlReader to process the order elements one at a time without degrading performance.
using (XmlReader reader = XmlReader.Create(@"c:\foo\Doket.xml"))
{
while (reader.Read())
{
// Match start tags only; the end tag </order> reports the same LocalName.
if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "order")
{
// read each child element and its value from the reader,
// or deserialize the order element by using an XmlSerializer and an Order class
}
}
}
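If the orders still have to be handed over in groups of 100, the streaming approach can be combined with batching. The following is a sketch under assumptions rather than a definitive implementation: OrderStreamer and ReadBatches are hypothetical names, batchSize is assumed to be positive, and each order is materialized as an XElement with XNode.ReadFrom so only one batch is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class OrderStreamer
{
    // Lazily yields lists of up to batchSize <order> elements read from the reader.
    public static IEnumerable<List<XElement>> ReadBatches(XmlReader reader, int batchSize)
    {
        var batch = new List<XElement>();
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "order")
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so Read() must not be called here.
                batch.Add((XElement)XNode.ReadFrom(reader));
                if (batch.Count == batchSize)
                {
                    yield return batch;
                    batch = new List<XElement>();
                }
            }
            else
            {
                reader.Read();
            }
        }
        // Hand over the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each yielded batch can then be wrapped in its own docket root before further processing.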

Related

Edit item in a dynamically created list of objects

I have a C# winforms application. In this application I am creating a list of CanMessage objects dynamically from an xml file.
The xml file is constructed as below
<message id="0x641" ecu="BCM" name="BODY9" dlc="8" cyclicrate="500">
<bytes>
<byte0>0x0</byte0>
<byte1>0x0</byte1>
<byte2>0x0</byte2>
<byte3>0x0</byte3>
<byte4>0x0</byte4>
<byte5>0x0</byte5>
<byte6>0x0</byte6>
<byte7>0x0</byte7>
</bytes>
</message>
The CanMessage object is constructed like this:
CanMessage(String name,String _EcuName, ulong id, ushort dlc,int[] bytearray , int cyclic)
{
this.Name = name;
this.EcuName = _EcuName;
this.Id = id;
this.Dlc = dlc;
this.Bytes = new int[dlc];
this.CyclicRate = cyclic;
int i = 0;
for(i = 0; i < dlc; i++)
{
this.Bytes[i] = bytearray[i];
}
}
Below is how I am building my Canmessage list:
public void BuildCanList()
{
try
{
XmlDocument xd = new XmlDocument();
xd.Load(XmlFile);
XmlNodeList nodelist = xd.SelectNodes("/messages/message");
foreach (XmlNode n in nodelist)
{
String name, ecuname;
ulong id;
ushort dlc;
int[] bytes = new int[Convert.ToInt32(n.Attributes.GetNamedItem("dlc").Value)];
int cyclic;
name = n.Attributes.GetNamedItem("name").Value.ToString();
id = (ulong)Convert.ToInt32(n.Attributes.GetNamedItem("id").Value, 16);
ecuname = n.Attributes.GetNamedItem("ecu").Value.ToString();
dlc = (ushort)Convert.ToByte(n.Attributes.GetNamedItem("dlc").Value);
cyclic = Convert.ToInt32(n.Attributes.GetNamedItem("cyclicrate").Value);
XmlNode sn = n.SelectSingleNode("bytes");
for (ushort i = 0; i < dlc; i++)
{
try
{
bytes[i] = Convert.ToInt32(sn.ChildNodes.Item(i).InnerText, 16);
}
catch(Exception e)
{
bytes[i] = 0x0;
Console.WriteLine(String.Format("Error Building can Message: {0}", e.ToString()));
}
}
CanMessage cm = new CanMessage(name, ecuname, id, dlc, bytes, cyclic);
CanList.Add(cm);
}
}
catch (Exception e)
{
// The handler was not shown in the original post; logging here is a placeholder.
Console.WriteLine(String.Format("Error building CAN list: {0}", e.ToString()));
}
}
My list is being created with no issues. My question is: after the list has been created, I will sometimes need to do some bit manipulation on certain bytes of a certain CanMessage. How can I select a message from the list based on its Name property and then edit certain bytes of that message? I know how to select a message from the list using a lambda expression and LINQ, but I don't know how to combine that select with an edit and a save, or whether this is even the best way to go about it.
If I understand your problem statement correctly, you need to find a specific CanMessage in the List<CanMessage> and edit its properties to your liking. Since CanMessage is a class (a reference type), the list holds references to the objects, so your edits are visible everywhere the same object is referenced.
Consider the following:
{
var CanList = new List<CanMessage>(); // I am assuming this is what it is
BuildCanList(CanList);
var messagetoEdit = CanList.First(m => m.Name == "BODY9");
messagetoEdit.Bytes[1]= 0xff;
messagetoEdit.Bytes[2]= 0xff;
messagetoEdit.Bytes[3]= 0xff;
messagetoEdit.Bytes[4]= 0xff;
var newMessagetoEdit = CanList.First(m => m.Name == "BODY9"); // you will see that values have changed here
// In case you want to serialise the list back, here is a snippet showing how you could do it; for more details see https://stackoverflow.com/questions/6161159/converting-xml-to-string-using-c-sharp
// This is just to prove the point that changing the bytes has an effect.
StringWriter sw = new StringWriter();
var serialiser = new XmlSerializer(typeof(List<CanMessage>));
serialiser.Serialize(sw, CanList);
sw.ToString();
}
I hope this clarifies
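Since the question mentions bit manipulation specifically, here is a minimal sketch of setting or clearing a single bit on a message selected by name. The trimmed-down CanMessage class and the helper CanListEditor.SetBit are hypothetical stand-ins, not the asker's real types; only the Name and Bytes members from the question are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the CanMessage class in the question.
public class CanMessage
{
    public string Name;
    public int[] Bytes;
}

public static class CanListEditor
{
    // Finds the first message with the given name and sets or clears one bit
    // of one byte in place. Returns false if no message has that name.
    public static bool SetBit(List<CanMessage> list, string name, int byteIndex, int bit, bool value)
    {
        CanMessage message = list.FirstOrDefault(m => m.Name == name);
        if (message == null) return false;

        int mask = 1 << bit;
        if (value)
            message.Bytes[byteIndex] |= mask;   // set the bit
        else
            message.Bytes[byteIndex] &= ~mask;  // clear the bit
        return true;
    }
}
```

Because the list stores references, no separate "save" step is needed: the object in the list is the one that was modified. FirstOrDefault is used instead of First so that a missing name yields false rather than an exception.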

Getting: "Document would result in an invalid XML document" error

I'm trying to split up an XML document into multiple smaller documents. I want to predefine a batch size (maximum number of nodes per document) and then insert the data into each document. There are 2 possible structures of my XML data:
<?xml version='1.0' encoding='UTF-8' ?>
<V2:EndInvoices">
<V2:EndInvoice>
</V2:EndInvoice>
...
</V2:EndInvoices>
<?xml version='1.0' encoding='UTF-8' ?>
<tls:AkontoGroup">
<tls:AkontoMember>
</tls:AkontoMember>
...
</tls:AkontoGroup>
Right now I'm focusing on only one case. Each rechnungen.ToArray()[i] element contains one of these EndInvoice elements. With an input file of 20 invoices split by 5 (batchSize = 5), I was able to create 4 files, each containing one EndInvoice. Then I moved the line batchRechnung.Add(rechnungen.ToArray()[i]); out of the if block, which now causes the error.
public List<XDocument> createTemporaryXMLFiles(string pathToData, int batchSize)
{
List<XDocument> batchRechnungen = new List<XDocument>();
XDocument batchRechnung = new XDocument();
XElement dataSource = XElement.Load(pathToData);
IEnumerable<XElement> rechnungen = dataSource.Elements();
for(int i = 0; i < rechnungen.ToArray().Length; i++)
{
if (i == 0 || (i % batchSize) == 0)
{
batchRechnung = new XDocument();
batchRechnungen.Add(batchRechnung);
}
batchRechnung.Add(rechnungen.ToArray()[i]);
}
return batchRechnungen;
}
How can I get correct XML files, each containing
<V2:EndInvoices">
batchSize x (<V2:EndInvoice></V2:EndInvoice>)
</V2:EndInvoices>
You cannot add multiple root elements to an XDocument, and that is exactly what happens from the second call to batchRechnung.Add onwards. Therefore, you must create the root element first and then add the elements to it.
public List<XDocument> createTemporaryXMLFiles(string pathToData, int batchSize)
{
List<XDocument> batchRechnungen = new List<XDocument>();
XElement dataSource = XElement.Load(pathToData);
XDocument batchRechnung = new XDocument(new XElement(dataSource.Name));
var rechnungen = dataSource.Elements().ToArray();
for (int i = 0; i < rechnungen.Length; i++)
{
if (i == 0 || (i % batchSize) == 0)
{
batchRechnung = new XDocument(new XElement(dataSource.Name));
batchRechnungen.Add(batchRechnung);
}
batchRechnung.Root.Add(rechnungen[i]);
}
return batchRechnungen;
}
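The restriction described above can be reproduced in isolation. The sketch below uses a hypothetical helper (RootElementDemo.SecondRootThrows) and an unprefixed element name for brevity; it shows that the second Add on the document itself is rejected with an InvalidOperationException.

```csharp
using System;
using System.Xml.Linq;

public static class RootElementDemo
{
    // Returns true if adding a second top-level element to an XDocument throws.
    public static bool SecondRootThrows()
    {
        var doc = new XDocument();
        doc.Add(new XElement("EndInvoice"));      // the first element becomes the root
        try
        {
            doc.Add(new XElement("EndInvoice"));  // a second root is not allowed
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Adding to doc.Root instead of doc avoids the problem, because an element may have any number of child elements.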

Displaying the name of each module stored in the array in a ListBox

I'm trying to display the module names from the array in the ListBox, but I'm getting a "NullReferenceException was unhandled" error.
modules.xml
<?xml version="1.0" encoding="utf-8" ?>
<Modules>
<Module>
<MCode>3SFE504</MCode>
<MName>Algorithms and Data Structures</MName>
<MCapacity>5</MCapacity>
<MSemester>1</MSemester>
<MPrerequisite>None</MPrerequisite>
<MLectureSlot>0</MLectureSlot>
<MTutorialSlot>1</MTutorialSlot>
</Module>
</Modules>
form1.cs
Modules[] modules = new Modules[16];
Modules[] pickedModules = new Modules[8];
int modulecounter = 0, moduleDetailCounter = 0;
while (textReader.Read())
{
XmlNodeType nType1 = textReader.NodeType;
if ((nType1 != XmlNodeType.EndElement) && (textReader.Name == "ModuleList"))
{
// ls_modules_list.Items.Add("MODULE");
Modules m = new Modules();
while (textReader2.Read()) //While reader 2 reads the next 7 TEXT items
{
XmlNodeType nType2 = textReader2.NodeType;
if (nType2 == XmlNodeType.Text)
{
if (moduleDetailCounter == 0)
m.MCode = textReader2.Value;
if (moduleDetailCounter == 1)
m.MName = textReader2.Value;
if (moduleDetailCounter == 2)
m.MCapacity = textReader2.Value;
if (moduleDetailCounter == 3)
m.MSemester = textReader2.Value;
if (moduleDetailCounter == 4)
m.MPrerequisite = textReader2.Value;
if (moduleDetailCounter == 5)
m.MLectureSlot = textReader2.Value;
if (moduleDetailCounter == 6)
m.MTutorialSlot = textReader2.Value;
// ls_modules_list.Items.Add(reader2.Value);
moduleDetailCounter++;
}
if (moduleDetailCounter == 7) { moduleDetailCounter = 0; break; }
}
modules[modulecounter] = m;
modulecounter++;
}
}
for (int i = 0; i < modules.Length; i++)
{
ModulesListBox.Items.Add(modules[i].MName); // THE ERROR APPEARS HERE
}
}
I'm getting that error on the line which is marked with // THE ERROR APPEARS HERE.
Either ModulesListBox is null because you're accessing it before it is initialized or the modules array contains empty elements.
Like one of the commenters said, you're probably better off using XmlSerializer to handle loading the XML into the collection of modules. If that's not possible, change modules to a List<Modules> instead.
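A minimal sketch of the List-based suggestion, assuming the XML has the Modules/Module/MName shape shown in the question. ModuleNames.Load is a hypothetical helper that takes the XML as a string and uses LINQ to XML rather than the asker's readers.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ModuleNames
{
    // Collects the text of every <MName> found under <Modules>/<Module>.
    // The list grows as needed, so it never contains unassigned (null) slots.
    public static List<string> Load(string xml)
    {
        var names = new List<string>();
        XDocument doc = XDocument.Parse(xml);
        foreach (XElement module in doc.Root.Elements("Module"))
        {
            XElement name = module.Element("MName");
            if (name != null) names.Add(name.Value);
        }
        return names;
    }
}
```

The resulting list can be iterated with foreach to fill the ListBox, which removes the need to track a separate counter.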
You initialize your modules array to be 16 in length and you load it with the modulecounter, but in the loop use the array length. Instead use the modulecounter variable to limit the loop, like this:
for (int i = 0; i < modulecounter; i++)
{
ModulesListBox.Items.Add(modules[i].MName);
}
Every slot of your array from index modulecounter upwards is still null. That is why you get the error.
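Note that the loop condition i < modules.Length already stops at index 15, so the loop stays within the bounds of the 16-element array; changing it to modules.Length - 1 would only skip the last slot. The exception comes from the slots that were never assigned, which are still null.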
Almost positive the issue is somewhere in your deserialization logic. One could debug it, but why reinvent the wheel?
var serializer = new XmlSerializer(typeof(List<Module>), new XmlRootAttribute("Modules"));
List<Module> modules;
using (var reader = new StreamReader(workingDir + @"\ModuleList.xml"))
{
modules = (List<Module>)serializer.Deserialize(reader);
}
This gives a complete collection of modules, assuming Module is defined as
public class Module
{
public string MCode;
public string MName;
public int MCapacity;
public int MSemester;
public string MPrerequisite;
public int MLectureSlot;
public int MTutorialSlot;
}
If you have no problems with memory (i.e. the file is usually not too large), then I suggest using XmlDocument instead of XmlTextReader:
XmlDocument d = new XmlDocument();
d.Load(@"FileNameAndDirectory");
XmlNodeList list = d.SelectNodes("/Modules/Module/MName");
foreach (XmlNode node in list)
{
// Whatsoever
}
The code above should extract every MName node for you and put them all in list, use it for good :)

Which is the best for performance wise: XPathNavigator with XPath vs Linq to Xml with query?

I have an application in which I am using XPathNavigator to iterate over nodes. It is working fine.
But I want to know what would change if I used LINQ to XML instead.
What benefits (performance, maintainability) would I get?
What is the performance hit with XPath versus LINQ to XML?
I am using C#/.NET, VS 2010, and my XML file is mid-size.
Just to add to what has already been stated here, overall performance seems to depend on what you are actually doing with the document in question. This is what I have concluded from a simple experimental run comparing parsing performance between XElement and XPathNavigator.
If you are selecting nodes, traversing these nodes and reading some attribute values:
XElement.Element is faster than XElement.CreateNavigator.Select by an approximate factor of 1.5.
XElement.CreateNavigator.Select is faster than XPathNavigator.Select by an approximate factor of 0.5.
XPathNavigator.Select is faster than XElement.XPathSelectElement by an approximate factor of 0.5.
On the other hand, if you are also reading the value of each node's children it gets a little interesting:
XElement.Element is faster than XElement.XPathSelectElements by an approximate factor of 0.5.
XElement.XPathSelectElement is faster than XPathNavigator.Select by an approximate factor of 3.
XPathNavigator.Select is faster than XElement.CreateNavigator.Select by an approximate factor of 0.5.
These conclusions are based on the following code:
[Test]
public void CompareXPathNavigatorToXPathSelectElement()
{
var max = 100000;
Stopwatch watch = new Stopwatch();
watch.Start();
bool parseChildNodeValues = true;
ParseUsingXPathNavigatorSelect(max, watch, parseChildNodeValues);
ParseUsingXElementElements(watch, max, parseChildNodeValues);
ParseUsingXElementXPathSelect(watch, max, parseChildNodeValues);
ParseUsingXPathNavigatorFromXElement(watch, max, parseChildNodeValues);
}
private static void ParseUsingXPathNavigatorSelect(int max, Stopwatch watch, bool parseChildNodeValues)
{
var document = new XPathDocument(@"data\books.xml");
var navigator = document.CreateNavigator();
for (var i = 0; i < max; i++)
{
var books = navigator.Select("/catalog/book");
while (books.MoveNext())
{
var location = books.Current;
var book = new Book();
book.Id = location.GetAttribute("id", "");
if (!parseChildNodeValues) continue;
book.Title = location.SelectSingleNode("title").Value;
book.Genre = location.SelectSingleNode("genre").Value;
book.Price = location.SelectSingleNode("price").Value;
book.PublishDate = location.SelectSingleNode("publish_date").Value;
book.Author = location.SelectSingleNode("author").Value;
}
}
watch.Stop();
Console.WriteLine("Time using XPathNavigator.Select = " + watch.ElapsedMilliseconds);
}
private static void ParseUsingXElementElements(Stopwatch watch, int max, bool parseChildNodeValues)
{
watch.Restart();
var element = XElement.Load(@"data\books.xml");
for (var i = 0; i < max; i++)
{
var books = element.Elements("book");
foreach (var xElement in books)
{
var book = new Book();
book.Id = xElement.Attribute("id").Value;
if (!parseChildNodeValues) continue;
book.Title = xElement.Element("title").Value;
book.Genre = xElement.Element("genre").Value;
book.Price = xElement.Element("price").Value;
book.PublishDate = xElement.Element("publish_date").Value;
book.Author = xElement.Element("author").Value;
}
}
watch.Stop();
Console.WriteLine("Time using XElement.Elements = " + watch.ElapsedMilliseconds);
}
private static void ParseUsingXElementXPathSelect(Stopwatch watch, int max, bool parseChildNodeValues)
{
XElement element;
watch.Restart();
element = XElement.Load(@"data\books.xml");
for (var i = 0; i < max; i++)
{
var books = element.XPathSelectElements("book");
foreach (var xElement in books)
{
var book = new Book();
book.Id = xElement.Attribute("id").Value;
if (!parseChildNodeValues) continue;
book.Title = xElement.Element("title").Value;
book.Genre = xElement.Element("genre").Value;
book.Price = xElement.Element("price").Value;
book.PublishDate = xElement.Element("publish_date").Value;
book.Author = xElement.Element("author").Value;
}
}
watch.Stop();
Console.WriteLine("Time using XElement.XpathSelectElement = " + watch.ElapsedMilliseconds);
}
private static void ParseUsingXPathNavigatorFromXElement(Stopwatch watch, int max, bool parseChildNodeValues)
{
XElement element;
watch.Restart();
element = XElement.Load(@"data\books.xml");
for (var i = 0; i < max; i++)
{
// now we can use an XPath expression
var books = element.CreateNavigator().Select("book");
while (books.MoveNext())
{
var location = books.Current;
var book = new Book();
book.Id = location.GetAttribute("id", "");
if (!parseChildNodeValues) continue;
book.Title = location.SelectSingleNode("title").Value;
book.Genre = location.SelectSingleNode("genre").Value;
book.Price = location.SelectSingleNode("price").Value;
book.PublishDate = location.SelectSingleNode("publish_date").Value;
book.Author = location.SelectSingleNode("author").Value;
}
}
watch.Stop();
Console.WriteLine("Time using XElement.Createnavigator.Select = " + watch.ElapsedMilliseconds);
}
with books.xml downloaded from here
Overall, it looks like the XElement parsing API, excluding the XPath extensions, gives you the best performance while also being easier to use, provided your document is somewhat flat. If you have deeply nested structures where you have to do something like
XElement.Element("book").Element("author").Element("firstname").SomethingElse()
then XElement.XPathSelectElement may provide the best compromise between performance and code maintainability.
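To make the maintainability trade-off concrete, here is a small sketch of the same nested lookup written both ways. The book/author/firstname nesting follows the illustrative chain above and is an assumption (it is not claimed to be the layout of the books.xml sample); NestedLookup and its methods are hypothetical names.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NestedLookup
{
    // Chained Element calls: throws NullReferenceException if any level is missing.
    public static string ViaElements(XElement catalog)
    {
        return catalog.Element("book").Element("author").Element("firstname").Value;
    }

    // One XPath expression: returns null if the path matches nothing.
    public static string ViaXPath(XElement catalog)
    {
        XElement hit = catalog.XPathSelectElement("book/author/firstname");
        return hit == null ? null : hit.Value;
    }
}
```

The XPath form keeps the whole path in one string and needs a single null check, while the chained form needs a check at every level to be equally safe.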
Well, XPathNavigator will generally be faster than Linq to XML queries. But there's always 'but'.
LINQ to XML will definitely make your code more readable and maintainable. It's easier (at least for me) to read a LINQ query than to analyze XPath. You also get IntelliSense when writing the query, which helps to make your code correct. LINQ to XML also lets you easily modify the data, if that's what you need, whereas XPathNavigator gives you read-only access.
On the other hand, if you really need top performance, XPathNavigator is probably the way to go. It simply depends on your current scenario and what you're trying to accomplish. If performance is not an issue (XML file is rather small, you won't make many requests to this file and so on) you can easily go with Linq to XML. Otherwise stick close to XPathNavigator.

Converting XML to String[] help

I'm having trouble converting the contents of an XML document to an int[] or string[].
I'm saving the x and y coordinates of 20 different picture boxes on the screen (for a jigsaw puzzle program) to an XML file, and am now trying to load the saved coordinates and move the jigsaw puzzle pieces to those saved locations.
Here's my code:
XmlWriter XmlWriter1;
XmlReader XmlReader1;
private void Form1_Load(object sender, EventArgs e)
{
//-------------------------------------------------
//Load Events
//-------------------------------------------------
SavedPositions = new int[40];
}
//-------------------------------------------------------
//Saves The Current Tile Locations To A Hidden XML File
//-------------------------------------------------------
public void SavePicPositionsXML()
{
using (XmlWriter1 = XmlWriter.Create("SavedPuzzle.xml"))
{
XmlWriter1.WriteStartDocument();
XmlWriter1.WriteStartElement("MTiles");
for (int i = 0; i < JigsawImgCount; i++)
{
XmlWriter1.WriteStartElement("Tile");
XmlWriter1.WriteElementString("X",Convert.ToString(MTiles[i].Pic.Location.X));
XmlWriter1.WriteElementString("Y",Convert.ToString(MTiles[i].Pic.Location.Y));
XmlWriter1.WriteEndElement();
}
XmlWriter1.WriteEndElement();
XmlWriter1.WriteEndDocument();
}
}
//---------------------------------------------------------------
//Reads Text From A Hidden Xml File & Adds It To A String Array
//---------------------------------------------------------------
private int ReadXmlFile(int Z)
{
XmlReader1 = XmlReader.Create("SavedPuzzle.xml");
XmlReader1.MoveToContent();
while (XmlReader1.Read())
{
}
// SavedPositions[B] = Convert.ToInt32(XmlReader1.Value.ToString());
return SavedPositions[Z];
}
//-------------------------------------------------
//Loads Saved Tile Positions From A Hidden Xml File
//-------------------------------------------------
private void LoadPositionsXML()
{
G = 0;
for (int i = 0; i < JigsawImgCount; i++)
{
LineX = ReadXmlFile(G);
LineY = ReadXmlFile(G + 1);
MTiles[i].Pic.Location = new Point(LineX, LineY);
G = G + 2;
}
}
What am I doing wrong?
Your ReadXmlFile method isn't doing anything really.
Consider using XmlDocument or XDocument instead of XmlWriter and XmlReader. They are a lot easier to handle.
try this:
XmlDocument document = new XmlDocument();
document.Load(@"D:/SavedPuzzle.xml");
XmlNode topNode = document.GetElementsByTagName("MTiles")[0];
foreach (XmlNode node in topNode.ChildNodes)
{
int X = Int32.Parse(node.ChildNodes[0].InnerText);
int Y = Int32.Parse(node.ChildNodes[1].InnerText);
}
The following LinqToXML statement will extract all tiles into a list in the order they are stored in the document.
I'm assuming an XML file that looks like this:
<xml>
<MTiles>
<Tile>
<X>1</X>
<Y>10</Y>
</Tile>
<Tile>
<X>2</X>
<Y>20</Y>
</Tile>
<Tile>
<X>3</X>
<Y>30</Y>
</Tile>
<Tile>
<X>4</X>
<Y>40</Y>
</Tile>
</MTiles>
</xml>
And this code will load it, and extract all the tiles into an enumerable list. Remember to put a using System.Xml.Linq at the top of the file and build against a recent enough framework (IIRC, it was introduced in .NET 3.5)
XDocument doc = XDocument.Load(/* path to the file, or use an existing reader */);
var tiles = from tile in doc.Descendants("Tile")
select new
{
X = (int)tile.Element("X"),
Y = (int)tile.Element("Y"),
};
foreach (var tile in tiles)
{
Console.WriteLine("Tile: x={0}, y={1}", tile.X, tile.Y);
}
The output from the code above using the XML file I specified is:
Tile: x=1, y=10
Tile: x=2, y=20
Tile: x=3, y=30
Tile: x=4, y=40
EDIT:
If you just want all the X-values as an array of integers, the following LINQ query would work:
int[] allXValues = (from tile in doc.Descendants("Tile")
select (int)tile.Element("X")).ToArray();
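The question stores positions in a single int[] with x and y interleaved (SavedPositions of length 40 for 20 tiles). A minimal sketch that builds such an array in one pass is shown below; TilePositions.Load is a hypothetical helper, and it assumes every Tile has both an X and a Y child holding an integer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TilePositions
{
    // Returns [x0, y0, x1, y1, ...] in document order.
    public static int[] Load(XDocument doc)
    {
        return doc.Descendants("Tile")
                  .SelectMany(t => new[] { (int)t.Element("X"), (int)t.Element("Y") })
                  .ToArray();
    }
}
```

With this layout, the location of tile i is given by the entries at index 2 * i and 2 * i + 1, and the file is read only once instead of once per coordinate.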
