C# Dividing XML into parts


I am trying to divide an XML file into parts.
I have an XML file like this:
<?xml version="1.0" encoding="utf-8"?>
<RegistrationOpenData xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.gov">
<Description>Registration data is collected by ABC XYZ</Description>
<InformationURL>http://www.example.com/html/hpd/property-reg-unit.shtml</InformationURL>
<SourceAgency>ABC Department of Housing</SourceAgency>
<SourceSystem>PREMISYS</SourceSystem>
<StartDate>2016-02-29T00:03:06.642772-05:00</StartDate>
<EndDate i:nil="true" />
<Registrations>
<Registration xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationID>1</RegistrationID>
<BuildingID>1A</BuildingID>
<element1>E11</element1>
<element2>E21</element2>
<element3>E31</element3>
<element4>E41</element4>
</Registration>
<Registration xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<RegistrationID>2</RegistrationID>
<BuildingID>2A</BuildingID>
<element1>E21</element1>
<element2>E22</element2>
<element3>E32</element3>
<element4>E42</element4>
</Registration>
</Registrations>
</RegistrationOpenData>
And I am trying to fetch the number of nodes through this code:
XmlDocument doc = new XmlDocument();
doc.Load(@"D:\Registrations20160229.xml");
XmlElement root = doc.DocumentElement;
XmlNodeList elemList = root.GetElementsByTagName("Registration");
int totalnode = elemList.Count;
int nodehalf = totalnode / 2;
MessageBox.Show(nodehalf.ToString());
But I'm stuck after this. The code counts the Registration nodes and halves that count, but I don't know how to go on and actually split the file. There are 158,718 Registration entries in the file (sometimes even more), and I'd like to break them into parts, maybe 3 to 4.

Try this; it does not load the whole XML file into memory:
int counter = 0;
using (XmlReader reader = XmlReader.Create(@"D:\Registrations20160229.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Registration")
            counter++;
    }
}
Console.WriteLine(counter);
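Counting is only half the job. The following is a minimal sketch of the split itself, assuming it is acceptable for each part to repeat only the RegistrationOpenData/Registrations wrappers (the Description, SourceAgency and other header elements are not copied) and to hold a fixed number of registrations. The class name XmlSplitter, the partN.xml file names and the perPart parameter are hypothetical. It streams with XmlReader and copies each Registration with XmlWriter.WriteNode, so memory use does not grow with the file size.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlSplitter
{
    // Hypothetical helper: streams inputPath and writes its <Registration>
    // elements into part1.xml, part2.xml, ... with at most perPart per file.
    public static List<string> Split(string inputPath, string outputDir, int perPart)
    {
        const string ns = "http://example.gov"; // default namespace of the sample
        var outputs = new List<string>();
        var settings = new XmlWriterSettings { Indent = true };
        XmlWriter writer = null;
        int inCurrent = 0;

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            while (!reader.EOF)
            {
                bool isRegistration = reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "Registration"
                    && reader.NamespaceURI == ns;
                if (!isRegistration)
                {
                    reader.Read();
                    continue;
                }
                if (writer == null)
                {
                    // Start a new part with the same wrapper elements as the source.
                    string path = Path.Combine(outputDir, "part" + (outputs.Count + 1) + ".xml");
                    outputs.Add(path);
                    writer = XmlWriter.Create(path, settings);
                    writer.WriteStartDocument();
                    writer.WriteStartElement("RegistrationOpenData", ns);
                    writer.WriteStartElement("Registrations", ns);
                }
                // WriteNode copies the whole element and leaves the reader on the
                // node after its end tag, so Read() must not be called here.
                writer.WriteNode(reader, true);
                if (++inCurrent == perPart)
                {
                    writer.WriteEndDocument(); // closes the open wrapper elements
                    writer.Dispose();
                    writer = null;
                    inCurrent = 0;
                }
            }
        }
        if (writer != null)
        {
            writer.WriteEndDocument();
            writer.Dispose();
        }
        return outputs;
    }
}
```

For 3 to 4 parts, one option is to run the counting loop first and pass a ceiling division such as (counter + 3) / 4 as perPart. Splitting by a fixed size instead of a fixed number of parts keeps the sketch to a single pass over the data.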

Related

foreach loop with SelectSingleNode always returns the first node

I am trying to get the inner text of each domain node from an HTTP XML response.
I save the response to a string and load it into an XmlDocument. But with the code below, or variations of it, I get either "CorpDomainMyDomain aaaa" or just "CorpDomain aaaa".
I have tried various combinations of domain and domains and cannot get the domains individually.
I would have thought that
XmlNodeList elemList = xmlDoc.SelectNodes("//DAV:domains", nsmgr);
would have created a list of each of the Domain elements but it doesn't seem that way.
The xml:
<?xml version="1.0" encoding="UTF-8" ?>
<multistatus xmlns="DAV:">
<response>
<propstat>
<prop>
<domains>
<domain logindefault="1" type="internal"><![CDATA[CorpDomain]]></domain>
<domain type="internal"><![CDATA[MyDomain]]></domain>
</domains>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
</multistatus>
My code snippet:
var nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
nsmgr.AddNamespace("DAV", "DAV:");
XmlNodeList elemList = xmlDoc.SelectNodes("//DAV:domains", nsmgr);
foreach (XmlNode node in elemList)
{
strNode = node.SelectSingleNode("//DAV:domains", nsmgr).InnerText;
responseString = strNode+" aaaa ";
}
return responseString;

Even after you've fixed the responseString += strNode issue that Ian picked up on, there is a second problem: when you loop through the domain elements under the domains parent, you shouldn't use // again, because that resets the context to the root of the document.
For example, if your XmlDocument looks like:
<?xml version="1.0" encoding="UTF-8" ?>
<multistatus xmlns="DAV:">
<response>
<propstat>
<prop>
<domains>
<domain logindefault="1" type="internal">domains1-domain1</domain>
<domain type="internal">domains1-domain2</domain>
</domains>
</prop>
<prop>
<domains>
<domain logindefault="1" type="internal">domains2-domain1</domain>
<domain type="internal">domains2-domain2</domain>
</domains>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
</multistatus>
your code would, on every iteration, scrape the combined text of all the child nodes of just the first domains element in the document, i.e. something like this (where aaaa is your delimiter):
domains1-domain1domains1-domain2 aaaa domains1-domain1domains1-domain2 aaaa
Instead, you should provide only the relative path from parent to child, i.e. just DAV:domain in your case. Assuming there are N domains parent elements, each with M domain child elements, if you stop at the parent nodes you will need a second level of iteration through the child nodes:
var nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
nsmgr.AddNamespace("DAV", "DAV:");
XmlNodeList elemList = xmlDoc.SelectNodes("//DAV:domains", nsmgr);
foreach (XmlNode domains in elemList)
{
    // Relative path: only the domain children of this domains element
    foreach (XmlNode domain in domains.SelectNodes("DAV:domain", nsmgr))
    {
        strNode = domain.InnerText;
        responseString += strNode + " aaaa ";
    }
}
return responseString;
But if you don't need to retain a reference to the parent for other purposes, you could also do this in one step by selecting the child nodes directly. For large XML documents with many nodes it is also a good idea to avoid repeated string concatenation (each += copies the whole string built so far), e.g. by using a StringBuilder from System.Text:
var sb = new StringBuilder();
foreach (XmlNode node in xmlDoc.SelectNodes("//DAV:domains/DAV:domain", nsmgr))
{
    sb.Append(node.InnerText).Append(" aaaa "); // value, then delimiter
}
responseString = sb.ToString();
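Putting the pieces together, here is a small self-contained sketch that returns each domain separately instead of concatenating them. The DavDomains class and GetDomains method are illustrative names, not part of any API. It also shows that the prefix registered with XmlNamespaceManager is arbitrary; only the namespace URI "DAV:" has to match the document.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class DavDomains
{
    // Illustrative helper: returns the text of every <domain> element
    // in a DAV: multistatus response, in document order.
    public static List<string> GetDomains(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        // The prefix "d" is arbitrary; the URI must equal the document's xmlns.
        nsmgr.AddNamespace("d", "DAV:");

        var result = new List<string>();
        foreach (XmlNode domain in doc.SelectNodes("//d:domains/d:domain", nsmgr))
        {
            result.Add(domain.InnerText); // InnerText includes CDATA content
        }
        return result;
    }
}
```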

How to read data from an XML string in C# [duplicate]

This question already has answers here:
Deserializing XML from String
I receive this XML string in my web page. How can I retrieve data from it and assign the values to labels on the page?
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<things>
<bat>201400000586</bat>
<status>Y</status>
<totalAmount>3090</totalAmount>
<billno>P2355</billno>
<ReceiveDate>27/04/2015 06:22:18 PM</ReceiveDate>
</things>

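First, load the XML document using XDocument (for an XML string rather than a file, use XDocument.Parse(xmlString) instead of Load):
XDocument doc = XDocument.Load(filePath);
XElement rootElm = doc.Element("things");
Now, using LINQ, you can fetch an IEnumerable<XElement> of its children:
IEnumerable<XElement> childList = from y in rootElm.Elements()
                                  select y;
Now you can loop through the list items:
foreach (XElement elm in childList)
{
    // Here you can access each element: elm.Name is the tag (bat, status, ...), elm.Value its text
    Console.WriteLine(elm.Name + ": " + elm.Value);
    // ...
}
You can also read one value directly, e.g. rootElm.Element("status").Value.
You can even edit the contents of the XML file and save them: assign new values to the XElement objects in the loop (e.g. elm.Value = "..."), then call
doc.Save(filePath);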

There are different ways to do this. Here is one.
You'll need to add "using System.Xml.XPath;"
XPathDocument doc = new XPathDocument(Server.MapPath("~/XMLFile1.xml"));
XPathNavigator nav = doc.CreateNavigator();
XPathExpression exp = nav.Compile(@"/things");
foreach (XPathNavigator item in nav.Select(exp))
{
    label1.Text = item.SelectSingleNode("bat").Value;
    label2.Text = item.SelectSingleNode("totalAmount").Value;
}
Or you can load it as a string, then use either XmlElement or XmlNode with such a simple XML structure:
XmlDocument m_xml = new XmlDocument();
m_xml.LoadXml(@"<?xml version=""1.0"" encoding=""utf-8"" standalone=""yes"" ?><things><bat>201400000586</bat><status>Y</status><totalAmount>3090</totalAmount><billno>P2355</billno><ReceiveDate>27/04/2015 06:22:18 PM</ReceiveDate></things>");
XmlNode node_bat = m_xml.SelectSingleNode("//things/bat");
XmlNode node_totalAmount = m_xml.SelectSingleNode("//things/totalAmount");
XmlElement node_bat1 = m_xml.DocumentElement["bat"];
XmlElement node_totalAmount1 = m_xml.DocumentElement["totalAmount"];
label1.Text = node_bat1.InnerText;
label2.Text = node_totalAmount1.InnerText;
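Because the question starts from an XML string rather than a file, here is a hedged sketch using XDocument.Parse that also converts the values to proper types instead of leaving everything as text. It assumes ReceiveDate always uses day/month/year with a 12-hour clock and an AM/PM marker, as in the sample; the ThingsParser name and the tuple shape are illustrative.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class ThingsParser
{
    // Illustrative helper: reads the <things> values from an XML string.
    public static (string Bat, string Status, decimal TotalAmount, string BillNo, DateTime ReceiveDate) Parse(string xml)
    {
        XElement root = XDocument.Parse(xml).Root; // the <things> element

        // Casting a missing element to string yields null instead of throwing.
        string bat = (string)root.Element("bat");
        string status = (string)root.Element("status");
        string billNo = (string)root.Element("billno");

        // The decimal cast uses XML number rules and throws if the element is missing.
        decimal total = (decimal)root.Element("totalAmount");

        // Assumed format: day/month/year, 12-hour clock with AM/PM, as in the sample.
        DateTime received = DateTime.ParseExact(
            (string)root.Element("ReceiveDate"),
            "dd/MM/yyyy hh:mm:ss tt",
            CultureInfo.InvariantCulture);

        return (bat, status, total, billNo, received);
    }
}
```

On the page this could be used as var t = ThingsParser.Parse(xmlString); label1.Text = t.Bat; label2.Text = t.TotalAmount.ToString();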

How to parse nested XML in C#

I'm working with an API and retrieving the data in XML. Here's my XML:
<RTT>
<AgencyList>
<Agency Name="Caltrain" HasDirection="True" Mode="Rail">
<RouteList>
<Route Name="BABY BULLET" Code="BABY BULLET">
<RouteDirectionList>
<RouteDirection Code="SB2" Name="SOUTHBOUND TO TAMIEN">
<StopList>
<Stop name="Sunnyvale Caltrain Station" StopCode="70222">
<DepartureTimeList/>
</Stop>
</StopList>
</RouteDirection>
<RouteDirection Code="NB" Name="NORTHBOUND TO SAN FRANCISCO">
<StopList>
<Stop name="Sunnyvale Caltrain Station" StopCode="70221">
<DepartureTimeList>
<DepartureTime>69</DepartureTime>
</DepartureTimeList>
</Stop>
</StopList>
</RouteDirection>
</RouteDirectionList>
</Route>
<Route Name="LIMITED" Code="LIMITED">...</Route>
<Route Name="LOCAL" Code="LOCAL">...</Route>
</RouteList>
</Agency>
</AgencyList>
</RTT>
Not every DepartureTimeList will have a DepartureTime child node. Here's what I got so far, which retrieves the Route name:
List<string> trainType = new List<string>();
XDocument doc = XDocument.Load("http://services.my511.org/Transit2.0/GetNextDeparturesByStopName.aspx?token=0f01ac4a-bc16-46a5-8527-5abc79fee435&agencyName=Caltrain&stopName=" + DropDownList1.SelectedItem.Text.ToString());
doc.Save("times.xml");
string feed = doc.ToString();
XmlReader r = XmlReader.Create(new StringReader(feed));
r.ReadToFollowing("RouteList");
if (r.ReadToDescendant("Route"))
{
do
{
trainType.Add(r.GetAttribute("Name"));
} while (r.ReadToNextSibling("Route"));
}
I'm mostly interested in the departure time (if it exists) and I've been struggling all afternoon to try and parse it.

Try this... Hopefully this will do it.
XmlDocument doc = new XmlDocument();
doc.Load("xml path");
XmlNode node = doc.SelectSingleNode("/RTT");
// XPath steps are element names only; "Agency Name" is not a valid step
foreach (XmlNode route in node.SelectNodes("AgencyList/Agency/RouteList/Route"))
{
    trainType.Add(route.Attributes["Name"].Value);
    foreach (XmlNode stop in route.SelectNodes("RouteDirectionList/RouteDirection/StopList/Stop"))
    {
        XmlElement times = stop["DepartureTimeList"];
        if (times != null && times.HasChildNodes)
        {
            // do stuff here, e.g. read times["DepartureTime"].InnerText
        }
    }
}
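Since the question already uses XDocument, a LINQ to XML query can flatten the nesting directly. This sketch (class and tuple names are illustrative) yields one row per DepartureTime element, so stops with an empty DepartureTimeList produce no rows and need no null checks. It assumes DepartureTime always holds an integer; reading it as minutes is a guess based on the sample value 69.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DepartureParser
{
    // Illustrative helper: one row per <DepartureTime>; stops whose
    // <DepartureTimeList/> is empty simply contribute no rows.
    public static List<(string Route, string Direction, string StopCode, int Minutes)> Parse(XDocument doc)
    {
        var rows =
            from route in doc.Descendants("Route")
            from dir in route.Descendants("RouteDirection")
            from stop in dir.Descendants("Stop")
            from time in stop.Descendants("DepartureTime")
            select (Route: (string)route.Attribute("Name"),
                    Direction: (string)dir.Attribute("Name"),
                    StopCode: (string)stop.Attribute("StopCode"),
                    Minutes: (int)time); // assumed to be an integer, e.g. 69
        return rows.ToList();
    }
}
```

With the document loaded as in the question, foreach (var row in DepartureParser.Parse(doc)) gives row.Route, row.Direction, row.StopCode and row.Minutes without any manual XmlReader navigation.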

Get specific values from Xml

I don't know how to extract values from an XML document and am looking for some help, as I'm new to C#.
I am using XmlDocument and then an XmlNodeList to fetch the relevant nodes.
Here is my code:
XmlNodeList XMLList = doc.SelectNodes("/response/result/doc");
And my XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<result>
<doc>
<long name="LoadID">494</long>
<long name="EventID">5557</long>
<str name="XMLData"><TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>tblQM2222Components</SubType><IntegerValue title="ComponentID">11111</IntegerValue></str></doc>
<doc>
<long name="LoadID">774</long>
<long name="EventID">5558</long>
<str name="XMLData"><TransactionDate>2014-05-28T14:17:31.2186777-06:00</TransactionDate><SubType>tblQM2222Components</SubType><IntegerValue title="ComponentID">11111</IntegerValue></str></doc>
</result>
</response>
From this I need to fetch the XMLData content under every doc tag, and the EventID of the last doc tag.

var xml = XDocument.Parse(xmlString);
var docs = xml.Root.Elements("doc");
var lastDocEventID = docs.Last()
.Elements("long")
.First(l => (string)l.Attribute("name") == "EventID")
.Value;
Console.WriteLine ("Last doc EventId: " +lastDocEventID);
foreach (var doc in docs)
{
Console.WriteLine (doc.Element("str").Element("TransactionDate").Value);
}
prints:
Last doc EventId: 5558
2014-05-28T14:17:31.2186777-06:00
2014-05-28T14:17:31.2186777-06:00

You can use two XPath expressions to select the nodes you want. To answer each part of your question in turn:
To select all of the XMLData nodes:
XmlNodeList XMLList
    = doc.SelectNodes("/response/result/doc/str[@name='XMLData']");
To select the last EventID:
XmlNode lastEventIdNode =
    doc.SelectSingleNode("/response/result/doc[position() = last()]/long[@name='EventID']");
If not all doc nodes are guaranteed to have an EventID child node, select only the doc nodes that do have one and take the last of them:
XmlNodeList eventIdNodes =
    doc.SelectNodes("/response/result/doc[long[@name='EventID']]");
XmlNode lastNode = eventIdNodes[eventIdNodes.Count - 1].SelectSingleNode("long[@name='EventID']");
That should give you what you've asked for.
Update:
If you want the XML data inside each str element, you can use the InnerXml property:
XmlNodeList xmlList
    = doc.SelectNodes("/response/result/doc/str[@name='XMLData']");
foreach(XmlNode xmlStrNode in xmlList)
{
string xmlInner = xmlStrNode.InnerXml;
}
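The two XPath queries above can be combined into one runnable sketch. DocReader and its tuple result are hypothetical names; the queries are the ones from this answer, with the position() = last() predicate written in its shorter form, last().

```csharp
using System.Collections.Generic;
using System.Xml;

public static class DocReader
{
    // Illustrative helper: returns the inner markup of every XMLData <str>
    // and the EventID text of the last <doc> (null if it has none).
    public static (List<string> XmlData, string LastEventId) Read(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var data = new List<string>();
        foreach (XmlNode str in doc.SelectNodes("/response/result/doc/str[@name='XMLData']"))
        {
            data.Add(str.InnerXml); // child elements as an XML string
        }

        XmlNode last = doc.SelectSingleNode("/response/result/doc[last()]/long[@name='EventID']");
        return (data, last?.InnerText);
    }
}
```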

There's one result tag short in your xml.
Try this; it's cleaner too, IMHO:
XmlNodeList docs = doc.SelectSingleNode("response").SelectSingleNode("result").SelectNodes("doc");
Then you can use a combination of SelectSingleNode, InnerText, Value to get the data from each XmlNode in your list.
For example if you want the EventID from the first doc tag:
int eventID = int.Parse(docs[0].SelectSingleNode("long[@name='EventID']").InnerText);

Parsing a large XML file to multiple output xmls, using XmlReader - getting every other element

I need to take a very large XML file and create multiple output xml files from what could be thousands of repeating nodes of the input file. There is no whitespace in the source file "AnimalBatch.xml" which looks like this:
<?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003"><Quantity>Three</Quantity><Adjective>Blind</Adjective><Name>Mice</Name></Animal><Animal id="1004"><Quantity>Four</Quantity><Adjective>Purple</Adjective><Name>Horses</Name></Animal><Animal id="1005"><Quantity>Five</Quantity><Adjective>Long</Adjective><Name>Centipedes</Name></Animal><Animal id="1006"><Quantity>Six</Quantity><Adjective>Dark</Adjective><Name>Owls</Name></Animal></Animals>
The program needs to split the repeating "Animal" and produce the appropriate number of files named: Animal_1001.xml, Animal_1002.xml, Animal_1003.xml, etc.
Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
The code below works, but only if the input file has a CR/LF after the <Animal id="xxxx"> elements. If it has no whitespace (I don't have it and can't get it that way), I get only every other element (the odd-numbered animals).
static void SplitXMLReader()
{
string strFileName;
string strSeq = "";
XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");
while (doc.Read())
{
if ( doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element )
{
strSeq = doc.GetAttribute("id");
XmlDocument outdoc = new XmlDocument();
XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
XmlElement rootNode = outdoc.CreateElement(doc.Name);
rootNode.InnerXml = doc.ReadInnerXml();
// This seems to be advancing the cursor in doc too far.
outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
outdoc.AppendChild(rootNode);
strFileName = "Animal_" + strSeq + ".xml";
outdoc.Save("C:\\" + strFileName);
}
}
}
My understanding is that "whitespace" or formatting in XML should make no difference to XmlReader - but I've tried this both ways, with and without CR/LF's after the <Animal id="xxxx">, and can confirm there is a difference. If it has CR/LFs (possibly even just a space, which I'll try next) - it gets each <Animal> node processed fully, and saved under the right filename that comes from the id attribute.
Can someone let me know what's going on here - and a possible workaround?

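Yes: when you use doc.ReadInnerXml(), whitespace matters, but not because it returns a string. According to the documentation, when the reader is on an element, ReadInnerXml() returns the element's content and leaves the reader positioned on the node after the element's end tag. Without whitespace, that node is the next <Animal> start tag, and the doc.Read() in your while condition immediately moves past it. With a line break there, the reader lands on the whitespace node instead, so nothing is lost. If you want each element as a separate unit, use something like ReadSubtree(), so the outer reader is not advanced twice.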
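The skipping comes from ReadInnerXml() leaving the reader on the node after </Animal>, after which the loop's own Read() steps over the next <Animal>. A small demonstration follows; the class and method names are hypothetical. It counts <Animal> elements in whitespace-free input once with the question's loop shape and once with a loop that only calls Read() when the current node was not already consumed.

```csharp
using System.IO;
using System.Xml;

public static class ReadInnerXmlDemo
{
    // Same shape as the question's loop: ReadInnerXml() already moves past
    // </Animal>, then the while condition calls Read() again and skips a start tag.
    public static int CountBuggy(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                {
                    reader.ReadInnerXml();
                    count++;
                }
            }
        }
        return count;
    }

    // Fixed: Read() is called only when the current node was not consumed.
    public static int CountFixed(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                {
                    reader.ReadInnerXml(); // already advances past </Animal>
                    count++;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return count;
    }
}
```

With four back-to-back <Animal> elements, CountBuggy finds only the first and third, which matches the "every other element" symptom in the question.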

Thanks for the guidance on using the ReadSubtree() method.
This code works for the XML input file with no line feeds:
static void SplitXMLReaderSubTree()
{
    string strFileName;
    string strSeq = "";
    using (XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml"))
    {
        while (!doc.EOF)
        {
            if (doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element)
            {
                strSeq = doc.GetAttribute("id");
                // ReadSubtree returns a reader scoped to this <Animal>; closing it
                // leaves doc on the </Animal> end tag, so no sibling is skipped.
                XmlReader inner = doc.ReadSubtree();
                inner.Read();
                XmlDocument outdoc = new XmlDocument();
                XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
                XmlElement myElement = outdoc.CreateElement(doc.Name);
                myElement.InnerXml = inner.ReadInnerXml();
                inner.Close();
                outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                outdoc.AppendChild(myElement);
                strFileName = "Animal_" + strSeq + ".xml";
                outdoc.Save("C:\\" + strFileName);
            }
            else
            {
                doc.Read();
            }
        }
    }
}
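An alternative that avoids building an XmlDocument for each animal is XNode.ReadFrom from LINQ to XML, which materializes one <Animal> at a time as an XElement. Like ReadInnerXml, it leaves the reader on the node after the element, so Read() is only called in the other branch. This sketch also drops the id attribute to match the desired output files; AnimalSplitter and its parameters are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class AnimalSplitter
{
    // Hypothetical helper: writes each <Animal> to outputDir/Animal_<id>.xml
    // and returns the paths written.
    public static List<string> Split(string inputPath, string outputDir)
    {
        var written = new List<string>();
        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // the <Animals> root element
            reader.Read();          // its first child
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                {
                    // ReadFrom consumes the whole <Animal> and leaves the reader on
                    // the following node, so Read() must not be called here.
                    var animal = (XElement)XNode.ReadFrom(reader);
                    string id = (string)animal.Attribute("id");
                    animal.Attribute("id")?.Remove(); // desired output has no id attribute

                    string path = Path.Combine(outputDir, "Animal_" + id + ".xml");
                    new XDocument(new XDeclaration("1.0", "utf-8", null), animal).Save(path);
                    written.Add(path);
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return written;
    }
}
```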
