I'm trying to split an XML document into multiple smaller documents. I want to predefine a batch size (the maximum number of nodes per document) and then fill each document with that many nodes. My XML data can have one of two structures:
<?xml version='1.0' encoding='UTF-8' ?>
<V2:EndInvoices">
<V2:EndInvoice>
</V2:EndInvoice>
...
</V2:EndInvoices>
<?xml version='1.0' encoding='UTF-8' ?>
<tls:AkontoGroup">
<tls:AkontoMember>
</tls:AkontoMember>
...
</tls:AkontoGroup>
Right now I'm focusing on only one case. Each rechnungen.ToArray()[i] element contains one of these EndInvoice elements. With an input file of 20 invoices and batchSize = 5, I was able to create 4 files, but each file contained only one EndInvoice. Then I moved the line batchRechnung.Add(rechnungen.ToArray()[i]); out of the if block, which now causes an error.
public List<XDocument> createTemporaryXMLFiles(string pathToData, int batchSize)
{
List<XDocument> batchRechnungen = new List<XDocument>();
XDocument batchRechnung = new XDocument();
XElement dataSource = XElement.Load(pathToData);
IEnumerable<XElement> rechnungen = dataSource.Elements();
for(int i = 0; i < rechnungen.ToArray().Length; i++)
{
if (i == 0 || (i % batchSize) == 0)
{
batchRechnung = new XDocument();
batchRechnungen.Add(batchRechnung);
}
batchRechnung.Add(rechnungen.ToArray()[i]);
}
return batchRechnungen;
}
How can I get correct XML files, each containing
<V2:EndInvoices">
batchSize x (<V2:EndInvoice></V2:EndInvoice>)
</V2:EndInvoices>
You cannot add multiple root elements to an XDocument, and that is exactly what happens when you call batchRechnung.Add for every invoice. You must therefore add the root element first and then add the invoices to that root element.
public List<XDocument> createTemporaryXMLFiles(string pathToData, int batchSize)
{
List<XDocument> batchRechnungen = new List<XDocument>();
XElement dataSource = XElement.Load(pathToData);
XDocument batchRechnung = new XDocument(new XElement(dataSource.Name));
var rechnungen = dataSource.Elements().ToArray();
for (int i = 0; i < rechnungen.Length; i++)
{
if (i == 0 || (i % batchSize) == 0)
{
batchRechnung = new XDocument(new XElement(dataSource.Name));
batchRechnungen.Add(batchRechnung);
}
batchRechnung.Root.Add(rechnungen[i]);
}
return batchRechnungen;
}
Related
I've got XML files that look like this:
<?xml version="1.0" encoding="UTF-8"?>
<books>
<book id="101">3.1256
<auth-name>Idris Polk</auth-name>
<auth id="a1">The author is a Professor of Physics at MIT</auth>
<ph ll="p1">336451234</ph> <ph ll="p2">336051294</ph> <mail>IP.00@yandex.com</mail> <ph ll="p3">336133291</ph>
</book>
<book id="105">4.2250
<auth-name>Andre Olo</auth-name>
<auth id="a2">The research fellow at NSF</auth>
<ph ll="p101">336200316</ph>, <ph ll="p102">336151093</ph>, <ph ll="p103">336151094</ph>, <mail>An.olo@yandex.com</mail> <ph ll="p111">336900336</ph>, <ph ll="p112">336154094</ph>, <ph ll="p113">336151098</ph>, <mail>ano_ano@yandex.com</mail>
</book>
<ebook id="1">4.2350
<auth-name>John Bart</auth-name>
<auth id="ae1">The research fellow at Caltech</auth>
<ph ll="p50">336200313</ph>, <ph ll="p51">336151090</ph>, <ph ll="p52">336851091</ph>, <ph ll="p53">336151097</ph>, <mail>bart.j@yandex.com</mail> <ph ll="p111">336000311</ph>, <ph ll="p112">336224094</ph>
</ebook>
...
</books>
How do I collect the ph nodes (those with an ll attribute) of a particular parent node when more than 2 of them appear in a row, separated only by whitespace or by a comma and whitespace? If any other character, node, or string falls between one ph node and the next, the run is broken and must not be collected. E.g., if a <book id="..."> node contains ph nodes in the order <ph ll="1">...</ph> <ph ll="2">...</ph> <mail>...</mail> <ph ll="3">...</ph>, nothing should be added to the collection. However, if they are in the order <ph ll="1">...</ph> <ph ll="2">...</ph> <ph ll="3">...</ph> <mail>...</mail>, then <ph ll="1">...</ph> <ph ll="2">...</ph> <ph ll="3">...</ph> should be added to the collection as a single element, because more than 2 ph nodes are separated only by whitespace within the parent node.
Obviously a simple
var cls=doc.Descendants("ph")
.Where(Attribute("ll"));
won't do. Can anyone help?
Try the code below. I used LINQ to XML along with a helper method:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XDocument doc = XDocument.Load(FILENAME);
var books = doc.Descendants("books").Elements().Select(x => new { book = x, sequence = TestChildren(x) }).Where(x => x.sequence != null).ToList();
string results = string.Join("\n", books.SelectMany(x => x.sequence).Select((x, i) => (i + 1).ToString() + ") " + string.Join("", x.Select(y => y.ToString()))));
Console.WriteLine(results);
Console.ReadLine();
}
static List<List<XElement>> TestChildren(XElement book)
{
List<List<XElement>> results = null;
List<XElement> children = book.Elements().ToList();
// get lls, make -1 if not ph
List<int> lls = children.Select(x => x.Name.LocalName != "ph" ? -1 : int.Parse(((string)x.Attribute("ll")).Substring(1))).ToList();
//check for 3 in a row incrementing
int startIndex = -1;
int numberInSequence = 0;
for (int i = 0; i < lls.Count - 2; i++) // last valid start index is Count - 3, so a run ending on the final child is still tested
{
//test for 3 in a row
if ((lls[i] + 1 == lls[i + 1]) && (lls[i] + 2 == lls[i + 2]))
{
//if this is the first match of a sequence, set the start index and the length to 3
if (startIndex == -1)
{
startIndex = i;
numberInSequence = 3;
}
else
{
//increase length if more than 3
numberInSequence++;
}
}
else
{
//if a sequence has been found add to results
if (numberInSequence >= 3)
{
List<XElement> sequence = new List<XElement>(children.Skip(startIndex).Take(numberInSequence).ToList());
if (results == null) results = new List<List<XElement>>();
results.Add(sequence);
startIndex = -1;
numberInSequence = 0;
}
}
}
if (numberInSequence >= 3)
{
List<XElement> sequence = new List<XElement>(children.Skip(startIndex).Take(numberInSequence).ToList());
if (results == null) results = new List<List<XElement>>();
results.Add(sequence);
}
return results;
}
}
}
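The helper above looks only at child elements and at the numeric part of ll, so it never sees the text that sits between two ph nodes. As a complementary sketch, assuming the only allowed separators are whitespace or a comma plus whitespace (as described in the question), one could walk all child nodes, text nodes included, and collect runs of ph elements. The names PhRuns, FindRuns and IsSeparator are made up for this illustration and the sample XML is shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PhRuns
{
    // Hypothetical helper: returns every run of at least minLength <ph ll="..."> elements
    // that are separated only by whitespace or by a comma plus whitespace.
    public static List<List<XElement>> FindRuns(XElement parent, int minLength)
    {
        var runs = new List<List<XElement>>();
        var current = new List<XElement>();

        foreach (XNode node in parent.Nodes())
        {
            XElement el = node as XElement;
            XText text = node as XText;

            if (el != null && el.Name.LocalName == "ph" && el.Attribute("ll") != null)
            {
                current.Add(el);                       // extend the current run
            }
            else if (text != null && IsSeparator(text.Value))
            {
                // allowed separator: the run stays alive
            }
            else
            {
                // any other element, text or comment ends the run
                if (current.Count >= minLength) runs.Add(current);
                current = new List<XElement>();
            }
        }

        if (current.Count >= minLength) runs.Add(current);
        return runs;
    }

    // Assumption: a separator is empty/whitespace text or a single comma with optional whitespace.
    private static bool IsSeparator(string value)
    {
        string trimmed = value.Trim();
        return trimmed.Length == 0 || trimmed == ",";
    }

    public static void Main()
    {
        XElement book = XElement.Parse(
            "<book id=\"105\"><ph ll=\"p101\">1</ph>, <ph ll=\"p102\">2</ph>, <ph ll=\"p103\">3</ph>, " +
            "<mail>x</mail> <ph ll=\"p111\">4</ph> <ph ll=\"p112\">5</ph></book>");

        foreach (List<XElement> run in FindRuns(book, 3))
            Console.WriteLine(string.Join(" ", run.Select(e => (string)e.Attribute("ll"))));
        // prints: p101 p102 p103
    }
}
```

The second group (p111, p112) is dropped because it has only two members. Unlike the helper above, this sketch does not require the ll numbers to be consecutive.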
I'm trying to display the module names from the array in the listBox, but I'm getting a "NullReferenceException was unhandled" error.
modules.xml
<?xml version="1.0" encoding="utf-8" ?>
<Modules>
<Module>
<MCode>3SFE504</MCode>
<MName>Algorithms and Data Structures</MName>
<MCapacity>5</MCapacity>
<MSemester>1</MSemester>
<MPrerequisite>None</MPrerequisite>
<MLectureSlot>0</MLectureSlot>
<MTutorialSlot>1</MTutorialSlot>
</Module>
</Modules>
form1.cs
Modules[] modules = new Modules[16];
Modules[] pickedModules = new Modules[8];
int modulecounter = 0, moduleDetailCounter = 0;
while (textReader.Read())
{
XmlNodeType nType1 = textReader.NodeType;
if ((nType1 != XmlNodeType.EndElement) && (textReader.Name == "ModuleList"))
{
// ls_modules_list.Items.Add("MODULE");
Modules m = new Modules();
while (textReader2.Read()) //While reader 2 reads the next 7 TEXT items
{
XmlNodeType nType2 = textReader2.NodeType;
if (nType2 == XmlNodeType.Text)
{
if (moduleDetailCounter == 0)
m.MCode = textReader2.Value;
if (moduleDetailCounter == 1)
m.MName = textReader2.Value;
if (moduleDetailCounter == 2)
m.MCapacity = textReader2.Value;
if (moduleDetailCounter == 3)
m.MSemester = textReader2.Value;
if (moduleDetailCounter == 4)
m.MPrerequisite = textReader2.Value;
if (moduleDetailCounter == 5)
m.MLectureSlot = textReader2.Value;
if (moduleDetailCounter == 6)
m.MTutorialSlot = textReader2.Value;
// ls_modules_list.Items.Add(reader2.Value);
moduleDetailCounter++;
}
if (moduleDetailCounter == 7) { moduleDetailCounter = 0; break; }
}
modules[modulecounter] = m;
modulecounter++;
}
}
for (int i = 0; i < modules.Length; i++)
{
ModulesListBox.Items.Add(modules[i].MName); // THE ERROR APPEARS HERE
}
}
I'm getting that error on the line which is marked with // THE ERROR APPEARS HERE.
Either ModulesListBox is null because you're accessing it before it is initialized or the modules array contains empty elements.
Like one of the commenters said, you're probably better off using XmlSerializer to handle loading the XML into the collection of modules. If that's not possible, change modules to a List<Modules> instead.
You initialize the modules array with a length of 16 and fill it using modulecounter as the index, but the display loop runs over the full array length. Use the modulecounter variable to limit the loop instead, like this:
for (int i = 0; i < modulecounter; i++)
{
ModulesListBox.Items.Add(modules[i].MName);
}
Every array slot from index modulecounter upwards is still null. That is what causes the error.
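To make the null-slot point concrete, here is a small sketch. The Modules class below is a stand-in with a single property, not the asker's real class, and the List<Modules> variant is just one possible way to avoid empty slots.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the asker's class (assumption: MName is a string property).
public class Modules
{
    public string MName { get; set; }
}

public static class NullSlotDemo
{
    public static void Main()
    {
        // A new array of a reference type starts out with every slot set to null.
        var fixedArray = new Modules[16];
        fixedArray[0] = new Modules { MName = "Algorithms and Data Structures" };
        Console.WriteLine(fixedArray[1] == null); // True

        // A list only contains what was actually added, so there are no empty slots to trip over.
        var list = new List<Modules>();
        list.Add(new Modules { MName = "Algorithms and Data Structures" });
        foreach (Modules m in list)
            Console.WriteLine(m.MName);
    }
}
```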
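Note that the for loop itself does not run out of range: with i < modules.Length it covers indices 0 to 15, which matches the 16-element array. Changing modules.Length to (modules.Length - 1) would only skip the last slot; the exception comes from the slots that were never assigned.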
I'm almost positive the issue is somewhere in your deserialization logic. One could debug it, but why reinvent the wheel?
var serializer = new XmlSerializer(typeof(List<Module>), new XmlRootAttribute("Modules"));
List<Module> modules;
using (var reader = new StreamReader(workingDir + @"\ModuleList.xml"))
modules = (List<Module>)serializer.Deserialize(reader);
This would give a nice, complete collection of modules, assuming Module is defined as
public class Module
{
public string MCode;
public string MName;
public int MCapacity;
public int MSemester;
public string MPrerequisite;
public int MLectureSlot;
public int MTutorialSlot;
}
If memory is not a concern (i.e., the file is usually not too large), I suggest using XmlDocument instead of XmlTextReader:
XmlDocument d = new XmlDocument();
d.Load(@"FileNameAndDirectory");
XmlNodeList list = d.SelectNodes("/Modules/Module/MName");
foreach (XmlNode node in list)
{
// Whatsoever
}
The code above should extract every MName node for you and put them all in list. Use it for good :)
I'm having trouble converting the contents of an XML document to an int[] or string[].
I'm saving the x and y coordinates of 20 different picture boxes on the screen (for a jigsaw puzzle program) to an XML file, and am now trying to load the saved coordinates and move the jigsaw puzzle pieces to those saved locations.
Here's my code:
XmlWriter XmlWriter1;
XmlReader XmlReader1;
private void Form1_Load(object sender, EventArgs e)
{
//-------------------------------------------------
//Load Events
//-------------------------------------------------
SavedPositions = new int[40];
}
//-------------------------------------------------------
//Saves The Current Tile Locations To A Hidden XML File
//-------------------------------------------------------
public void SavePicPositionsXML()
{
using (XmlWriter1 = XmlWriter.Create("SavedPuzzle.xml"))
{
XmlWriter1.WriteStartDocument();
XmlWriter1.WriteStartElement("MTiles");
for (int i = 0; i < JigsawImgCount; i++)
{
XmlWriter1.WriteStartElement("Tile");
XmlWriter1.WriteElementString("X",Convert.ToString(MTiles[i].Pic.Location.X));
XmlWriter1.WriteElementString("Y",Convert.ToString(MTiles[i].Pic.Location.Y));
XmlWriter1.WriteEndElement();
}
XmlWriter1.WriteEndElement();
XmlWriter1.WriteEndDocument();
}
}
//---------------------------------------------------------------
//Reads Text From A Hidden Xml File & Adds It To A String Array
//---------------------------------------------------------------
private int ReadXmlFile(int Z)
{
XmlReader1 = XmlReader.Create("SavedPuzzle.xml");
XmlReader1.MoveToContent();
while (XmlReader1.Read())
{
}
// SavedPositions[B] = Convert.ToInt32(XmlReader1.Value.ToString());
return SavedPositions[Z];
}
//-------------------------------------------------
//Loads Saved Tile Positions From A Hidden Xml File
//-------------------------------------------------
private void LoadPositionsXML()
{
G = 0;
for (int i = 0; i < JigsawImgCount; i++)
{
LineX = ReadXmlFile(G);
LineY = ReadXmlFile(G + 1);
MTiles[i].Pic.Location = new Point(LineX, LineY);
G = G + 2;
}
}
What am I doing wrong?
Your ReadXmlFile method isn't doing anything really.
Consider using XmlDocument or XDocument instead of XmlWriter and XmlReader. They are a lot easier to handle.
Try this:
XmlDocument document = new XmlDocument();
document.Load(@"D:/SavedPuzzle.xml");
XmlNode topNode = document.GetElementsByTagName("MTiles")[0];
foreach (XmlNode node in topNode.ChildNodes)
{
int X = Int32.Parse(node.ChildNodes[0].InnerText);
int Y = Int32.Parse(node.ChildNodes[1].InnerText);
}
The following LinqToXML statement will extract all tiles into a list in the order they are stored in the document.
I'm assuming an XML file that looks like this:
<xml>
<MTiles>
<Tile>
<X>1</X>
<Y>10</Y>
</Tile>
<Tile>
<X>2</X>
<Y>20</Y>
</Tile>
<Tile>
<X>3</X>
<Y>30</Y>
</Tile>
<Tile>
<X>4</X>
<Y>40</Y>
</Tile>
</MTiles>
</xml>
And this code will load it, and extract all the tiles into an enumerable list. Remember to put a using System.Xml.Linq at the top of the file and build against a recent enough framework (IIRC, it was introduced in .NET 3.5)
XDocument doc = XDocument.Load(/* path to the file, or use an existing reader */);
var tiles = from tile in doc.Descendants("Tile")
select new
{
X = (int)tile.Element("X"),
Y = (int)tile.Element("Y"),
};
foreach (var tile in tiles)
{
Console.WriteLine("Tile: x={0}, y={1}", tile.X, tile.Y);
}
The output from the code above using the XML file I specified is:
Tile: x=1, y=10
Tile: x=2, y=20
Tile: x=3, y=30
Tile: x=4, y=40
EDIT:
If you just want all the X-values as an array of integers, the following LINQ query would work:
int[] allXValues = (from tile in doc.Descendants("Tile")
select (int)tile.Element("X")).ToArray();
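If the goal is the flat layout that the question uses for SavedPositions (x at an even index, y right after it), the same query style can produce it directly. This is a sketch under that assumption; TilePositions and Flatten are illustrative names, and the XML is inlined only to keep the example self-contained.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TilePositions
{
    // Hypothetical helper: returns x0, y0, x1, y1, ... in document order.
    public static int[] Flatten(XDocument doc)
    {
        return doc.Descendants("Tile")
                  .SelectMany(t => new[] { (int)t.Element("X"), (int)t.Element("Y") })
                  .ToArray();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<MTiles><Tile><X>1</X><Y>10</Y></Tile><Tile><X>2</X><Y>20</Y></Tile></MTiles>");

        int[] saved = Flatten(doc);
        Console.WriteLine(string.Join(",", saved)); // 1,10,2,20
    }
}
```

With such an array, tile i would take its position from saved[2 * i] and saved[2 * i + 1], which mirrors the G and G + 1 indexing in LoadPositionsXML.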
I believe I have found a weird bug, as follows:
I want to delete the first two nodes in an XmlNodeList.
I know that there may be other ways of doing this (there surely are) but it is the reason why one of the code segments works and one doesn't (the difference being the Count line) that I am interested in.
var strXml = @"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(strXml);
XmlNodeList nlFruit = doc.SelectNodes("food/fruit");
for(int i = 0; i < 2; i++)
{
// This produces a null reference exception:
nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
However, if I count the number of nodes in the XmlNodeList it works and I am left with the desired outcome:
var strXml = @"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(strXml);
XmlNodeList nlFruit = doc.SelectNodes("food/fruit");
// Count the nodes..
Debug.WriteLine(nlFruit.Count);
for(int i = 0; i < 2; i++)
{
nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
// doc is now: <food><fruit type="banana" /></food>
Both are wrong; you should delete from the end:
for(int i = 1; i >= 0; i--)
{
nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
This is because once you remove the 0th element, the 1st element becomes the 0th, and you then try to remove the 1st element, which is null.
Maybe this will help:
Halloween Problem : http://blogs.msdn.com/mikechampion/archive/2006/07/20/672208.aspx
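Another way to sidestep the problem, sketched here as an alternative rather than as an explanation of the framework's internals, is to copy the nodes you want to delete into an ordinary list before modifying the tree. The fact that calling Count first makes the original loop work suggests that the node list is filled on demand, so taking a snapshot up front avoids depending on that behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class RemoveFirstTwo
{
    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(@"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>");

        // Snapshot the first two matches while the tree is still untouched.
        List<XmlNode> toRemove = doc.SelectNodes("food/fruit")
                                    .Cast<XmlNode>()
                                    .Take(2)
                                    .ToList();

        // Removing from the snapshot no longer interacts with the node list itself.
        foreach (XmlNode node in toRemove)
            node.ParentNode.RemoveChild(node);

        Console.WriteLine(doc.DocumentElement.ChildNodes.Count); // 1
    }
}
```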
I have a large XML document that needs to be processed 100 records at a time.
This is done within a Windows Service written in C#.
The structure is as follows:
<docket xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="docket.xsd">
<order>
<Date>2008-10-13</Date>
<orderNumber>050758023</orderNumber>
<ParcelID/>
<CustomerName>sddsf</CustomerName>
<DeliveryName>dsfd</DeliveryName>
<Address1>sdf</Address1>
<Address2>sdfsdd</Address2>
<Address3>sdfdsfdf</Address3>
<Address4>dffddf</Address4>
<PostCode/>
</order>
<order>
<Date>2008-10-13</Date>
<orderNumber>050758023</orderNumber>
<ParcelID/>
<CustomerName>sddsf</CustomerName>
<DeliveryName>dsfd</DeliveryName>
<Address1>sdf</Address1>
<Address2>sdfsdd</Address2>
<Address3>sdfdsfdf</Address3>
<Address4>dffddf</Address4>
<PostCode/>
</order>
.....
.....
</docket>
There could be thousands of orders in a docket.
I need to chop this into chunks of 100 elements.
However, each chunk of 100 orders still needs to be wrapped in the parent "docket" node and keep the same namespace, etc.
Is this possible?
Another naive solution; this time for .NET 2.0. It should give you an idea of how to go about what you want. Uses Xpath expressions instead of Linq to XML. Chunks a 100 order docket into 10 dockets in under a second on my devbox.
public List<XmlDocument> ChunkDocket(XmlDocument docket, int chunkSize)
{
List<XmlDocument> newDockets = new List<XmlDocument>();
//
int orderCount = docket.SelectNodes("//docket/order").Count;
int chunkStart = 0;
XmlDocument newDocket = null;
XmlElement root = null;
XmlNodeList chunk = null;
while (chunkStart < orderCount)
{
newDocket = new XmlDocument();
root = newDocket.CreateElement("docket");
newDocket.AppendChild(root);
chunk = docket.SelectNodes(String.Format("//docket/order[position() > {0} and position() <= {1}]", chunkStart, chunkStart + chunkSize));
chunkStart += chunkSize;
XmlNode targetNode = null;
foreach (XmlNode c in chunk)
{
targetNode = newDocket.ImportNode(c, true);
root.AppendChild(targetNode);
}
newDockets.Add(newDocket);
}
return newDockets;
}
Naive, iterative, but works [EDIT: in .NET 3.5 only]
public List<XDocument> ChunkDocket(XDocument docket, int chunkSize)
{
var newDockets = new List<XDocument>();
var d = new XDocument(docket);
var orders = d.Root.Elements("order");
XDocument newDocket = null;
do
{
newDocket = new XDocument(new XElement("docket"));
var chunk = orders.Take(chunkSize);
newDocket.Root.Add(chunk);
chunk.Remove();
newDockets.Add(newDocket);
} while (orders.Any());
return newDockets;
}
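The question also asks for each chunk to keep the root's namespace information, which new XElement("docket") alone does not carry over. A possible variation, offered as a sketch, copies the root's name and attributes (including xmlns declarations) into every chunk and slices by index instead of removing nodes. DocketChunker and Chunk are illustrative names, and the sample docket is shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DocketChunker
{
    // Hypothetical helper: splits the root's child elements into documents of up to chunkSize children.
    public static List<XDocument> Chunk(XDocument docket, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        List<XElement> orders = docket.Root.Elements().ToList();
        var chunks = new List<XDocument>();

        for (int start = 0; start < orders.Count; start += chunkSize)
        {
            // Attributes and elements that already have a parent are copied when added,
            // so the source document is left unchanged.
            var root = new XElement(docket.Root.Name,
                                    docket.Root.Attributes(),
                                    orders.Skip(start).Take(chunkSize));
            chunks.Add(new XDocument(root));
        }
        return chunks;
    }

    public static void Main()
    {
        XDocument docket = XDocument.Parse(
            "<docket xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><order/><order/><order/></docket>");

        List<XDocument> chunks = Chunk(docket, 2);
        Console.WriteLine(chunks.Count);                      // 2
        Console.WriteLine(chunks[1].Root.Elements().Count()); // 1
    }
}
```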
If the reason for processing 100 orders at a time is performance (e.g., opening a big file takes too much time and memory), you can use XmlReader to process one order element at a time without degrading performance.
XmlReader reader = XmlReader.Create(@"c:\foo\Doket.xml");
while( reader.Read())
{
if(reader.NodeType == XmlNodeType.Element && reader.LocalName == "order") // skip the matching end tags
{
// read each child element and its value from the reader.
// or you can deserialize the order element by using a XmlSerializer and Order class
}
}
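One way to fill in the body of that loop, sketched under the assumption that each order is small enough to hold in memory on its own, is to combine XmlReader with XNode.ReadFrom so that only one order is materialized at a time. OrderStreamer and ReadOrders are illustrative names, and the XML is inlined only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class OrderStreamer
{
    // Hypothetical helper: yields each <order> element as it is reached by the reader.
    public static IEnumerable<XElement> ReadOrders(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "order")
            {
                // ReadFrom consumes the whole element and leaves the reader on the node after it,
                // so Read() must not be called again here or adjacent orders would be skipped.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    public static void Main()
    {
        string xml = "<docket><order><orderNumber>1</orderNumber></order>" +
                     "<order><orderNumber>2</orderNumber></order></docket>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement order in ReadOrders(reader))
                Console.WriteLine((string)order.Element("orderNumber"));
            // prints 1, then 2
        }
    }
}
```

For a file on disk, XmlReader.Create also accepts a path, as in the snippet above.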