Transforming large Xml files

Transforming large Xml files - c#

I was using this extension method to transform very large xml files with an xslt.
Unfortunately, I get an OutOfMemoryException on the source.ToString() line.
I realize there must be a better way, I'm just not sure what that would be?
public static XElement Transform(this XElement source, string xslPath, XsltArgumentList arguments)
{
var doc = new XmlDocument();
doc.LoadXml(source.ToString());
var xsl = new XslCompiledTransform();
xsl.Load(xslPath);
using (var swDocument = new StringWriter(System.Globalization.CultureInfo.InvariantCulture))
{
using (var xtw = new XmlTextWriter(swDocument))
{
xsl.Transform((doc.CreateNavigator()), arguments, xtw);
xtw.Flush();
return XElement.Parse(swDocument.ToString());
}
}
}
Thoughts? Solutions? Etc.
UPDATE:
Now that this is solved, I have issues with validating the schema!
Validating large Xml files

Try this:
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Xsl;
static class Extensions
{
public static XElement Transform(
this XElement source, string xslPath, XsltArgumentList arguments)
{
var xsl = new XslCompiledTransform();
xsl.Load(xslPath);
var result = new XDocument();
using (var writer = result.CreateWriter())
{
xsl.Transform(source.CreateNavigator(), arguments, writer);
}
return result.Root;
}
}
BTW, new XmlTextWriter() is deprecated as of .NET 2.0. Use XmlWriter.Create() instead. Same with new XmlTextReader() and XmlReader.Create().

For large XML files you can try to use XPathDocument as suggested in Microsoft Knowledge Base article.
XPathDocument srcDoc = new XPathDocument(srcFile);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
myXslTransform.Load(xslFile);
using (XmlWriter destDoc = XmlWriter.Create(destFile))
{
myXslTransform.Transform(srcDoc, destDoc);
}

Related

Append XML to file without writing it from scratch? [duplicate]

This question already has answers here:
Fastest way to add new node to end of an xml?
(10 answers)
Closed 5 years ago.
The standard way to append an XML file in LINQ-to-XML is to read it in, modify the in-memory document, and write the whole file out from scratch. For example:
XDocument doc = XDocument.Load("pathToDoc.xml");
doc.Root.Add(new XElement(namespace + "anotherChild", new XAttribute("child-id", childId)));
doc.Save("pathToDoc.xml");
Or, with a FileStream:
using (FileStream fs = new FileStream("pathToDoc.xml", FileMode.Open, FileAccess.ReadWrite)) {
XDocument doc = XDocument.Load(fs);
doc.Root.Add(new XElement(namespace + "anotherChild", new XAttribute("child-id", childId)));
fs.SetLength(0);
using (var writer = new StreamWriter(fs, new UTF8Encoding(false))) {
doc.Save(writer);
}
}
However, in both cases, the existing XML document is being loaded into memory, modified in memory, and then written from scratch to the XML file. For small XML files this is OK, but for large files with hundreds or thousands of nodes this seems like a very inefficient process. Is there any way to make XDocument (or perhaps something like an XmlWriter) just append the necessary additional nodes to the existing XML document rather than blanking it out and starting from scratch?

This totally depends on the position where you need to add the additional elements. Of course you can implement something that removes the closing "</root>" tag, writes additional elements and then adds the "</root>" again. However, such code is highly optimized for your purpose and you'll probably not find a library for it.
Your code could look like this (quick and dirty, without input checking, assuming that <root/> cannot exist):
using System.IO;
using System.Xml.Linq;
namespace XmlAddElementWithoutLoading
{
class Program
{
static void Main()
{
var rootelement = "root";
var doc = GetDocumentWithNewNodes(rootelement);
var newNodes = GetXmlOfNewNodes(doc);
using (var fs = new FileStream("pathToDoc.xml", FileMode.Open, FileAccess.ReadWrite))
{
using (var writer = new StreamWriter(fs))
{
RemoveClosingRootNode(fs, rootelement);
writer.Write(newNodes);
writer.Write("</"+rootelement+">");
}
}
}
private static void RemoveClosingRootNode(FileStream fs, string rootelement)
{
fs.SetLength(fs.Length - ("</" + rootelement + ">").Length);
fs.Seek(0, SeekOrigin.End);
}
private static string GetXmlOfNewNodes(XDocument doc)
{
var reader = doc.Root.CreateReader();
reader.MoveToContent();
return reader.ReadInnerXml();
}
private static XDocument GetDocumentWithNewNodes(string rootelement)
{
var doc = XDocument.Parse("<" + rootelement + "/>");
var childId = "2";
XNamespace ns = "namespace";
doc.Root.Add(new XElement(ns + "anotherChild", new XAttribute("child-id", childId)));
return doc;
}
}
}

Advantages of linq-to-xml compared to OfficeOpenXML.WordProcessing namespace

I'm working on an application that needs to generate Word documents based on user input, database values and a template. I've looked online for examples, and found many different approaches to generate word documents but I've made up my mind and decided to stick with the official Office Open XML SDK 2.5. Now I've just written a simple program that inserts a table (stored in a .xml file) into a word document:
Edit: Question down at the bottom if not interested in the code
static void Main(string[] args)
{
XNamespace ns = XNamespace
.Get(#"http://schemas.openxmlformats.org/wordprocessingml/2006/main");
byte[] byteArray = File.ReadAllBytes(#"C:/Users/Alexander/Downloads/WordTest.docx");
using (var stream = new MemoryStream())
{
XDocument xdoc;
stream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
{
Then I can do 2 different things which will generate the same output.
1) Using OfficeOpenXml.Wordprocessing namespace methods:
#region Openxml.WordProcessing
var paragraphs = doc.MainDocumentPart.Document.Body.ToList();
Table tbl = new Table(File.ReadAllText(#"C:/users/alexander/downloads/tablecontent.xml"));
var bookmark = paragraphs.SelectMany(p => p.Descendants<BookmarkStart>()
.Where(bm => bm.Id == "0")).FirstOrDefault();
doc.MainDocumentPart.Document.Body.ReplaceChild(tbl, bookmark.Parent);
#endregion Openxml.WordProcessing
2) Using Linq-To-XML:
#region LINQ TO XML
XElement xtbl = XElement.Load(
new FileStream(#"C:/users/alexander/downloads/tablecontent.xml", FileMode.Open));
using (StreamReader sr = new StreamReader(doc.MainDocumentPart.GetStream()))
using (XmlReader xr = XmlReader.Create(sr))
xdoc = XDocument.Load(xr);
//Document - Body - Paragraphs - Runs/Bookmarks/etc.
//any way to write this more clearly in linq-to-xml?
var test = xdoc.Elements().First().Elements().First().Elements()
.SelectMany(e => e.Elements()).ToList();
var startBookmark = test.Where(p => p.Name == XName.Get("bookmarkStart", ns.NamespaceName)
&& p.Attribute(XName.Get("id", ns.NamespaceName)).Value == "0").First();
startBookmark.Parent.ReplaceWith(xtbl);
using (XmlWriter xw = XmlWriter.Create(doc.MainDocumentPart.GetStream()))
xdoc.Save(xw);
#endregion LINQ TO XML
And finally I write the document to a new file:
using (FileStream fs =
new FileStream(#"C:/users/alexander/downloads/WordTestModified.docx", FileMode.Create))
{
stream.WriteTo(fs);
}
As far as I see it, the first option is easier and the code is more clear to read, (no use of XName and no need for extra StreamReader/XmlReader/Writer) but are there any distinct advantages Linq-to-xml has over this approach? This is going to be a big application and I don't want to be limited later on.

c# StringReader, XmlReader, XSLT - Unexpected XML Declaration

I've been using this function to read XML from a string and apply an XSLT style sheets, it has been working very well for small portions of XML:
private static string TransformXML(String XML, String XSLT)
{
string output = String.Empty;
using (StringReader srt = new StringReader(XSLT))
{
using (StringReader sri = new StringReader(XML))
{
using (XmlReader xrt = XmlReader.Create(srt))
using (XmlReader xri = XmlReader.Create(sri))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xrt);
using (StringWriter sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(sw, xslt.OutputSettings)) // use OutputSettings of xsl, so it can be output as HTML
{
xslt.Transform(xri, xwo);
output = sw.ToString();
}
}
}
}
return output;
}
However, with large portions of XML, I'm getting errors, even though I know it is correctly formatted.
Here is an example error: Unexpected end of file while parsing Name has occurred. Line 1, position 30001.
I'm guessing there is a limit on the buffering, but I can't quite work it out - the code is within an SSIS package and different script tasks produce and translate the XML.
I appreciate any help!

How to use XmlDocument object instead of reading XML file from drive?

I didn't know that I can use XSD schema to serialize received XML file. I used xsd.exe to generate cs class from XSD file and now I need to use that class to get data in class properties but I miss one thing and I need help.
This is the code:
private void ParseDataFromXmlDocument_UsingSerializerClass(XmlDocument doc)
{
XmlSerializer ser = new XmlSerializer(typeof(ClassFromXsd));
string filename = Path.Combine("C:\\myxmls\\test", "xmlname.xml");
ClassFromXsdmyClass = ser.Deserialize(new FileStream(filename, FileMode.Open)) as ClassFromXsd;
if (myClass != null)
{
// to do
}
...
Here I use XML file from drive. And I want to use this XmlDocument from parameter that I passed in. So how to adapt this code to use doc instead XML from drive?

You could write the XmlDocument to a MemoryStream, and then Deserialize it like you already did.
XmlDocument doc = new XmlDocument();
ClassFromXsd obj = null;
using (var s = new MemoryStream())
{
doc.Save(s);
var ser = new XmlSerializer(typeof (ClassFromXsd));
s.Seek(0, SeekOrigin.Begin);
obj = (ClassFromXsd)ser.Deserialize(s);
}

HTML File with xml and xmln declaration cannot be transformed

I've generated a HTML file and the top html declaration looks like this:
<html xml:lang="de-CH" lang="de-CH" xmlns="http://www.w3.org/1999/xhtml">
And then I try to convert it into a different format with this .Net 4 code:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
XslCompiledTransform proc = new XslCompiledTransform();
proc.Load("Html_to_Sql.xslt");
fsHtmlXml = new FileStream(file.Name, FileMode.Create);
html = XmlReader.Create(file.FullName, settings);
proc.Transform(html, null, fsHtmlXml);
Unfortunately nothing happens as long as I have the xml, lang and xmlns attributes in the HTML.
Why is that?

Your XSLT will need to refer to elements in the http://www.w3.org/1999/xhtml namespace. You haven't posted your XSLT code yet, the the problem most likely lies in that file.

Will this work via XML and XPath
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Xml.XPath;
public class TransformXML
{
//This will transform xml document using xslt and produce result xml document
//and display it
public static void Main(string[] args)
{
try
{
XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
XslTransform myXslTransform = new XslTransform();
XmlTextWriter writer = new XmlTextWriter(resultDoc, null);
myXslTransform.Load(xsltDoc);
myXslTransform.Transform(myXPathDocument, null, writer);
writer.Close();
StreamReader stream = new StreamReader (resultDoc);
Console.Write("**This is result document**\n\n");
Console.Write(stream.ReadToEnd());
}
catch (Exception e)
{
Console.WriteLine ("Exception: {0}", e.ToString());
}
}
}

The xmlns attribute specifies the namespace of the XML document. This works in much the same way as namespaces within C#, where two classes with the same name but different namespaces are considered to be completely different classes. Changing the XML namespaces means that your XSLT templates / XPath will not match.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Transforming large Xml files - c#

Related

Append XML to file without writing it from scratch? [duplicate]

Advantages of linq-to-xml compared to OfficeOpenXML.WordProcessing namespace

c# StringReader, XmlReader, XSLT - Unexpected XML Declaration

How to use XmlDocument object instead of reading XML file from drive?

HTML File with xml and xmln declaration cannot be transformed

Categories

Resources