Update XLSX file changes whilst reading the file with XmlReader

Update XLSX file changes whilst reading the file with XmlReader - c#

We had a code which was loading the Excel XLSX document into the memory, doing some modifications with it and saving it back.
XmlDocument doc = new XmlDocument();
doc.Load(pp.GetStream());
XmlNode rootNode = doc.DocumentElement;
if (rootNode == null) return;
ProcessNode(rootNode);
if (this.fileModified)
{
doc.Save(pp.GetStream(FileMode.Create, FileAccess.Write));
}
This was working good with small files, but throwing OutOfMemory exceptions with some large Excel files. So we decided to change the approach and use XmlReader class to not load the file into the memory at once.
PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;
using (XmlReader reader = XmlReader.Create(pp.GetStream()))
{
reader.MoveToContent();
while (reader.EOF == false)
{
XmlDocument doc;
XmlNode rootNode;
if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
{
doc = new XmlDocument();
rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
doc.AppendChild(rootNode);
ProcessNode(rootNode); // how can I save updated changes back to the file?
}
}
else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
{
doc = new XmlDocument();
rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
doc.AppendChild(rootNode);
ProcessNode(rootNode); // how can I save updated changes back to the file?
}
}
else
{
reader.Read();
}
}
}
}
This reads the file node by node and processes nodes we need (and changes some values there). However, I'm not sure how we can update those values back to the original Excel file.
I tried to use XmlWriter together with the XmlReader, but was not able to make it work. Any ideas?
UPDATE:
I tried to use #dbc's suggestions from the comments section, but it seems too slow to me. It probably will not throw OutOfMemory exceptions for huge files, but processing will take forever.
PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;
StringBuilder strBuilder = new StringBuilder();
using (XmlReader reader = XmlReader.Create(pp.GetStream()))
{
using (XmlWriter writer = this.Package.FileOpenAccess == FileAccess.ReadWrite ? XmlWriter.Create(strBuilder) : null)
{
reader.MoveToContent();
while (reader.EOF == false)
{
XmlDocument doc;
XmlNode rootNode;
if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
{
doc = new XmlDocument();
rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
doc.AppendChild(rootNode);
ProcessNode(rootNode);
writer?.WriteRaw(rootNode.OuterXml);
}
}
else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
{
doc = new XmlDocument();
rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
doc.AppendChild(rootNode);
ProcessNode(rootNode);
writer?.WriteRaw(rootNode.OuterXml);
}
}
else
{
WriteShallowNode(writer, reader); // Used from the #dbc's suggested stackoverflow answers
reader.Read();
}
}
writer?.Flush();
}
}
}
NOTE 1: I'm using StringBuilder for the test, but was planning to switch to a temp file in the end.
NOTE 2: I tried flushing the XmlWriter after every 100 elements, but it's still slow.
Any ideas?

Try following. I've been use for a long time with huge xml files that give Out of Memory
using (XmlReader reader = XmlReader.Create("File Stream", readerSettings))
{
while (!reader.EOF)
{
if (reader.Name != "row")
{
reader.ReadToFollowing("row");
}
if (!reader.EOF)
{
XElement row = (XElement)XElement.ReadFrom(reader);
}
}
}
}

I did some more modifications with #dbc's help and now it works as I wanted.
PackagePartCollection ppc = this.Package.GetParts();
foreach (PackagePart pp in ppc)
{
try
{
if (!this.xmlContentTypesXlsx.Contains(pp.ContentType)) continue;
string tempFilePath = GetTempFilePath();
using (XmlReader reader = XmlReader.Create(pp.GetStream()))
{
using (XmlWriter writer = this.Package.FileOpenAccess == FileAccess.ReadWrite ? XmlWriter.Create(tempFilePath) : null)
{
while (reader.EOF == false)
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == "hyperlinks")
{
XmlDocument doc = new XmlDocument();
XmlNode rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
ProcessNode(rootNode);
if (writer != null)
{
rootNode.WriteTo(writer);
}
}
}
else if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
{
XmlDocument doc = new XmlDocument();
XmlNode rootNode = doc.ReadNode(reader);
if (rootNode != null)
{
ProcessNode(rootNode);
if (writer != null)
{
rootNode.WriteTo(writer);
}
}
}
else
{
WriteShallowNode(writer, reader); // Used from the #dbc's suggested StackOverflow answers
reader.Read();
}
}
}
}
if (this.packageChanged) // is being set in ProcessNode method
{
this.packageChanged = false;
using (var tempFile = File.OpenRead(tempFilePath))
{
tempFile.CopyTo(pp.GetStream(FileMode.Create, FileAccess.Write));
}
}
}
catch (OutOfMemoryException)
{
throw;
}
catch (Exception ex)
{
Log.Exception(ex, #"Failed to process a file."); // our inner log method
}
finally
{
if (!string.IsNullOrWhiteSpace(tempFilePath))
{
// Delete temp file
}
}
}

Related

C# XmlReader confused about assigning values from read() function

Have the following code, new to this so be kind, it looks clunky and isn't returning what I expect it to return. Basically I'm trying to read nodes for Operator, Password, and Group values into vars and return via a tuple.
public static Tuple<string, string, string> ReadSecurity()
{
XmlReader reader = XmlReader.Create("Operator.xml");
string sOperator = "";
string sPassword = "";
string sGroup = "";
while (reader.Read())
{
if(reader.NodeType == XmlNodeType.Element && reader.Name == "Security")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
sOperator = reader.Value;
}
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
sPassword = reader.Value;
}
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
sGroup = reader.Value;
}
}
}
}
return Tuple.Create(sOperator, sPassword, sGroup);
}
Seem to be missing the first value each time but have no idea how to change this, online tutorials are assuming a lot more knowledge than I currently have.
For example:
See below for the current iteration (yes, I know the password should be encrypted).
<?xml version="1.0" encoding="utf-8"?>
<Security ver="beta">
<Operator>Ted</Operator>
<Password>password</Password>
<Group>op</Group>
</Security>

Put a breakpoint at the beginning of your method and run it step by step pressing F11. You will see that the XmlReader reads, among others, the Whitespace nodes. See Debug > Windows > Autos or Locals window.
You must ignore these Whitespace nodes and correctly navigate to nodes of type Text, to read their values. You also need to correctly handle start and end tags.
As a result, the code might look like this:
using (var reader = XmlReader.Create("Operator.xml"))
{
string sOperator = "";
string sPassword = "";
string sGroup = "";
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Operator")
{
reader.Read(); // move to Text node
sOperator = reader.Value;
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Password")
{
reader.Read(); // move to Text node
sPassword = reader.Value;
}
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Group")
{
reader.Read(); // move to Text node
sGroup = reader.Value;
}
}
return Tuple.Create(sOperator, sPassword, sGroup);
}
However the XmlReader class has many useful methods. If you use them correctly, you can make its use simple and enjoyable.
using (var reader = XmlReader.Create("Operator.xml"))
{
reader.ReadToFollowing("Operator");
var sOperator = reader.ReadElementContentAsString();
reader.ReadToFollowing("Password");
var sPassword = reader.ReadElementContentAsString();
reader.ReadToFollowing("Group");
var sGroup = reader.ReadElementContentAsString();
return Tuple.Create(sOperator, sPassword, sGroup);
}

ReadOuterXml is throwing OutOfMemoryException reading part of large (1 GB) XML file

I am working on a large XML file and while running the application, XmlTextReader.ReadOuterXml() method is throwing memory exception.
Lines of codes are like,
XmlTextReader xr = null;
try
{
xr = new XmlTextReader(fileName);
while (xr.Read() && success)
{
if (xr.NodeType != XmlNodeType.Element)
continue;
switch (xr.Name)
{
case "A":
var xml = xr.ReadOuterXml();
var n = GetDetails(xml);
break;
}
}
}
catch (Exception ex)
{
//Do stuff
}
Using:
private int GetDetails (string xml)
{
var rootNode = XDocument.Parse(xml);
var xnodes = rootNode.XPathSelectElements("//A/B").ToList();
//Then working on list of nodes
}
Now while loading the XML files, the application throwing exception on the xr.ReadOuterXml() line. What can be done to avoid this? The size of XML is almost 1 GB.

The most likely reason you are getting a OutOfMemoryException in ReadOuterXml() is that you are trying to read in a substantial portion of the 1 GB XML document into a string, and are hitting the Maximum string length in .Net.
So, don't do that. Instead load directly from the XmlReader using XDocument.Load() with XmlReader.ReadSubtree():
using (var xr = XmlReader.Create(fileName))
{
while (xr.Read() && success)
{
if (xr.NodeType != XmlNodeType.Element)
continue;
switch (xr.Name)
{
case "A":
{
// ReadSubtree() positions the reader at the EndElement of the element read, so the
// next call to Read() moves to the next node.
using (var subReader = xr.ReadSubtree())
{
var doc = XDocument.Load(subReader);
GetDetails(doc);
}
}
break;
}
}
}
And then in GetDetails() do:
private int GetDetails(XDocument rootDocument)
{
var xnodes = rootDocument.XPathSelectElements("//A/B").ToList();
//Then working on list of nodes
return xnodes.Count;
}
Not only will this use less memory, it will also be more performant. ReadOuterXml() uses a temporary XmlWriter to copy the XML in the input stream to an output StringWriter (which you then parse a second time). This version of the algorithm completely skips this extra work. It also avoids creating strings large enough to go on the large object heap which can cause additional performance issues.
If this is still using too much memory you will need to implement SAX-like parsing for your XML where you only load one element <B> at a time. First, introduce the following extension method:
public static partial class XmlReaderExtensions
{
public static IEnumerable<XElement> WalkXmlElements(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
{
Stack<XName> names = new Stack<XName>();
while (xmlReader.Read())
{
if (xmlReader.NodeType == XmlNodeType.Element)
{
names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
if (filter(names))
{
using (var subReader = xmlReader.ReadSubtree())
{
yield return XElement.Load(subReader);
}
}
}
if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
|| xmlReader.NodeType == XmlNodeType.EndElement)
{
names.Pop();
}
}
}
}
Then, use it as follows:
using (var xr = XmlReader.Create(fileName))
{
Predicate<Stack<XName>> filter =
(stack) => stack.Peek().LocalName == "B" && stack.Count > 1 && stack.ElementAt(1).LocalName == "A";
foreach (var element in xr.WalkXmlElements(filter))
{
//Then working on the specific node.
}
}

using (var reader = XmlReader.Create(fileName))
{
XmlDocument oXml = new XmlDocument();
while (reader.Read())
{
oXml.Load(reader);
}
}
For me above code resolved the issue when we return it to XmlDocument through XmlDocument Load method

using XmlReader to get child nodes without knowing their names(in .net)

How do I get the the top level child nodes(unknownA) of root node with XmlReader in .net? Because their names are unknown, ReadToDescendant(string) and ReadToNextSibling(string)won't work.
<root>
<unknownA/>
<unknownA/>
<unknownA>
<unknownB/>
<unknownB/>
</unknownA>
<unknownA/>
<unknownA>
<unknownB/>
<unknownB>
<unknownC/>
<unknownC/>
</unknownB>
</unknownA>
<unknownA/>
</root>

You can walk through the file using XmlReader.Read(), checking the current Depth against the initial depth, until reaching an element end at the initial depth, using the following extension method:
public static class XmlReaderExtensions
{
public static IEnumerable<string> ReadChildElementNames(this XmlReader xmlReader)
{
if (xmlReader == null)
throw new ArgumentNullException();
if (xmlReader.NodeType == XmlNodeType.Element && !xmlReader.IsEmptyElement)
{
var depth = xmlReader.Depth;
while (xmlReader.Read())
{
if (xmlReader.Depth == depth + 1 && xmlReader.NodeType == XmlNodeType.Element)
yield return xmlReader.Name;
else if (xmlReader.Depth == depth && xmlReader.NodeType == XmlNodeType.EndElement)
break;
}
}
}
public static bool ReadToFirstElement(this XmlReader xmlReader)
{
if (xmlReader == null)
throw new ArgumentNullException();
while (xmlReader.NodeType != XmlNodeType.Element)
if (!xmlReader.Read())
return false;
return true;
}
}
Then it could be used as follows:
var xml = GetXml(); // Your XML string
using (var textReader = new StringReader(xml))
using (var xmlReader = XmlReader.Create(textReader))
{
xmlReader.ReadToFirstElement();
var names = xmlReader.ReadChildElementNames().ToArray();
Console.WriteLine(string.Join("\n", names));
}

treeview AfterLabelEdit MessageBox displays twice

I am playing with Treeview and the AfterLabelEdit function and IM having a problem where after validation it displays the MessageBox Twice before it goes back to Editing. Anyone see what I might be doing wrong here.
private void treeView1_AfterLabelEdit(object sender, System.Windows.Forms.NodeLabelEditEventArgs e)
{
var HostsXML = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), "Hosts.xml");
XmlDocument doc = new XmlDocument();
doc.Load(HostsXML);
foreach (TreeNode pChild in e.Node.Parent.Nodes)
{
if (pChild.Text == e.Label)
{
// same name found, cancel the edit operation
MessageBox.Show("That Name Cannot be Used. Please Select a Different Name");
e.CancelEdit = true;
e.Node.BeginEdit();
//treeView1.Nodes.Remove(treeView1.SelectedNode);
return;
}
}
if (e.Label != null)
{
if (e.Label.Length > 0)
{
if (String.IsNullOrEmpty(selectedNode))
{
XmlNode rootNode = doc.SelectSingleNode("Servers");
XmlNode recordNode = rootNode.AppendChild(doc.CreateNode(XmlNodeType.Element, "Server", ""));
recordNode.AppendChild(doc.CreateNode(XmlNodeType.Element, "Name", "")).InnerText = e.Label;
}
else
{
XmlElement root = doc.DocumentElement;
XmlNodeList xnList = doc.SelectNodes("/Servers/Server[Name ='" + selectedNode + "']");
foreach (XmlNode xn in xnList)
{
xn["Name"].InnerText = e.Label;
}
}
}
else
{
MessageBox.Show("You Did Not Enter a Valid Name:1");
e.CancelEdit = true;
e.Node.BeginEdit();
//treeView1.Nodes.Remove(treeView1.SelectedNode);
return;
}
}
else
{
e.CancelEdit = true;
MessageBox.Show("You Did Not Enter a Valid Name: 2");
e.Node.BeginEdit();
//treeView1.Nodes.Remove(treeView1.SelectedNode);
return;
}
selectedNode = null;
doc.Save(HostsXML);
}

how to read xml files in C# ?

I have a code which reads an xml file. There are some parts I dont understand.
From my understanding , the code will create an xml file with 2 elements,
"Product" and "OtherDetails" . How come we only have to use writer.WriteEndElement();
once when we used writer.WriteStartElement twice ? , shouldn't we close each
writer.WriteStartElement statement with a writer.WriteEndElement() statement ?
using System.Xml;
public class Program
{
public static void Main()
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
XmlWriter writer = XmlWriter.Create("Products.xml", settings);
writer.WriteStartDocument();
writer.WriteComment("This file is generated by the program.");
writer.WriteStartElement("Product"); // first s
writer.WriteAttributeString("ID", "001");
writer.WriteAttributeString("Name", "Soap");
writer.WriteElementString("Price", "10.00")
// Second Element
writer.WriteStartElement("OtherDetails");
writer.WriteElementString("BrandName", "X Soap");
writer.WriteElementString("Manufacturer", "X Company");
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Flush();
writer.Close();
}
}
using System;
using System.Xml;
public class Program
{
public static void Main()
{
XmlReader reader = XmlReader.Create("Products.xml");
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element
&& reader.Name == "Product")
{
Console.WriteLine("ID = " + reader.GetAttribute(0));
Console.WriteLine("Name = " + reader.GetAttribute(1));
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.Name == "Price")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
Console.WriteLine("Price = {0:C}", Double.Parse(reader.Value));
}
}
reader.Read();
} //end if
if (reader.Name == "OtherDetails")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.Name == "BrandName")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
Console.WriteLine("Brand Name = " + reader.Value);
}
}
reader.Read();
} //end if
if (reader.Name == "Manufacturer")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
Console.WriteLine("Manufacturer = " + reader.Value);
}
}
} //end if
}
} //end if
} //end while
} //end if
} //end while
}
}
I don't get this part:
if (reader.Name == "OtherDetails")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.Name == "BrandName")
{
while (reader.NodeType != XmlNodeType.EndElement)
{
reader.Read();
if (reader.NodeType == XmlNodeType.Text)
{
Console.WriteLine("Brand Name = " + reader.Value);
}
}
notice how the condition while (reader.NodeType != XmlNodeType.EndElement) has been used twice ?
why is that we don't have to specify
if (reader.NodeType == XmlNodeType.Element for OtherDetails) as we did for Product,
like this
if (reader.NodeType == XmlNodeType.Element
&& reader.Name == "OtherDetails")
{}

To answer your first question:
As the MSDN documentation for XmlWriter.WriteEndDocument() says:
Closes any open elements or attributes and puts the writer back in the Start state.
So it will automatically close any open elements for you. In fact, you can remove the call to WriteEndElement() altogether and it will still work ok.
And as people are saying in the comments above, you should perhaps consider using Linq-to-XML.
It can make things much easier. For example, to create the XML structure from your program using Linq-to-XML you can do this:
var doc = new XDocument(
new XElement("Product",
new XAttribute("ID", "001"), new XAttribute("Name", "Soap"),
new XElement("Price", 10.01),
new XElement("OtherDetails",
new XElement("BrandName", "X Soap"),
new XElement("Manufacturer", "X Company"))));
File.WriteAllText("Products.xml", doc.ToString());
If you were reading data from the XML, you can use var doc = XDocument.Load("Filename.xml") to load the XML from a file, and then getting the data out is as simple as:
double price = double.Parse(doc.Descendants("Price").Single().Value);
string brandName = doc.Descendants("BrandName").Single().Value;
Or alternatively (casting):
double price = (double) doc.Descendants("Price").Single();
string brandName = (string) doc.Descendants("BrandName").Single();
(In case you're wondering how on earth we can cast an object of type XElement like that: It's because a load of explict conversion operators are defined for XElement.)

If you need anything strait forward (no reading or research), here is what I did:
I recently wrote a custom XML parsing method for my MenuStrip for WinForms (it had hundreds of items and XML was my best bet).
// load the document
// I loaded mine from my C# resource file called TempResources
XDocument doc = XDocument.Load(new MemoryStream(Encoding.UTF8.GetBytes(TempResources.Menu)));
// get the root element
// (var is an auto token, it becomes what ever you assign it)
var elements = doc.Root.Elements();
// iterate through the child elements
foreach (XElement node in elements)
{
// if you know the name of the attribute, you can call it
// mine was 'name'
// (if you don't know, you can call node.Attributes() - this has the name and value)
Console.WriteLine("Loading list: {0}", node.Attribute("name").Value);
// in my case, every child had additional children, and them the same
// *.Cast<XElement>() would give me the array in a datatype I can work with
// menu_recurse(...) is just a resursive helper method of mine
menu_recurse(node.Elements().Cast<XElement>().ToArray()));
}
(My answer can also be found here: Reading an XML File With Linq - though it unfortunately is not Linq)

Suppose if you want to read an xml file we need to use a dataset,because xml file internally converts into datatables using a dataset.Use the following line of code to access the file and to bind the dataset with the xml data.
DataSet ds=new DataSet();
ds.ReadXml(HttpContext.Current.Server.MapPath("~/Labels.xml");
DataSet comprises of many datatables,count of those datatables depends on number of Parent Child Tags in an xml file

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Update XLSX file changes whilst reading the file with XmlReader - c#

Related

C# XmlReader confused about assigning values from read() function

ReadOuterXml is throwing OutOfMemoryException reading part of large (1 GB) XML file

using XmlReader to get child nodes without knowing their names(in .net)

treeview AfterLabelEdit MessageBox displays twice

how to read xml files in C# ?

Categories

Resources