How to parse mathML in output of WordOpenXML? - c#

I want to read only the XML used to generate an equation, which I obtained using Paragraph.Range.WordOpenXML. However, the section describing the equation is not MathML, although I had understood that Microsoft's equation format is MathML.
Do I need a special converter to get the desired XML, or are there other methods?

You can use the OMML2MML.XSL file (located under %ProgramFiles%\Microsoft Office\Office15; the folder name varies with the installed Office version)
to transform the Office MathML (OMML) equations included in a Word document into MathML.
The code below shows how to transform the equations in a Word document into MathML
using the following steps:
1. Open the Word document using the Open XML SDK (version 2.5).
2. Create an XslCompiledTransform and load the OMML2MML.XSL file.
3. Transform the Word document by calling the Transform() method on the created XslCompiledTransform instance.
4. Output the result of the transform (e.g. print it to the console or write it to a file).
I've tested the code below with a simple Word document containing two equations, text and pictures.
    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Xml.Xsl;
    using DocumentFormat.OpenXml.Packaging;

    public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
    {
        string officeML = string.Empty;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
        {
            string wordDocXml = doc.MainDocumentPart.Document.OuterXml;
            XslCompiledTransform xslTransform = new XslCompiledTransform();
            // The OMML2MML.XSL file is located under
            // %ProgramFiles%\Microsoft Office\Office<officeVersion>\ (e.g. Office15).
            xslTransform.Load(@"c:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");
            using (TextReader tr = new StringReader(wordDocXml))
            {
                // Load the XML of your main document part.
                using (XmlReader reader = XmlReader.Create(tr))
                {
                    using (MemoryStream ms = new MemoryStream())
                    {
                        XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
                        // Configure the XML writer to omit the XML declaration.
                        settings.ConformanceLevel = ConformanceLevel.Fragment;
                        settings.OmitXmlDeclaration = true;
                        XmlWriter xw = XmlWriter.Create(ms, settings);
                        // Transform our OfficeMathML to MathML.
                        xslTransform.Transform(reader, xw);
                        // Make sure all buffered output reaches the stream before reading it back.
                        xw.Flush();
                        ms.Seek(0, SeekOrigin.Begin);
                        using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                        {
                            officeML = sr.ReadToEnd();
                            // Console.Out.WriteLine(officeML);
                        }
                    }
                }
            }
        }
        return officeML;
    }
To convert a single equation (rather than the whole Word document), query for the desired Office Math paragraph (m:oMathPara) and use the OuterXml property of that node.
The code below shows how to query for the first math paragraph (First() requires using System.Linq;):
    string mathParagraphXml =
        doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;
Use the returned XML to feed the TextReader.
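Feeding one fragment's XML into an already loaded transform could look roughly like the sketch below. The class and method names (FragmentTransformSketch, TransformFragment) are hypothetical and not part of the answer's code; it writes to a StringWriter instead of a MemoryStream and assumes the stylesheet has already been loaded by the caller.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class FragmentTransformSketch
{
    // Hypothetical helper: transforms one XML fragment (for example the
    // OuterXml of a single math paragraph) with a stylesheet that the
    // caller has already loaded into xslTransform.
    public static string TransformFragment(string fragmentXml, XslCompiledTransform xslTransform)
    {
        // Start from the stylesheet's own output settings, then omit the
        // XML declaration so the result can be embedded elsewhere.
        XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        using (StringReader sr = new StringReader(fragmentXml))
        using (XmlReader reader = XmlReader.Create(sr))
        using (StringWriter sw = new StringWriter())
        {
            using (XmlWriter xw = XmlWriter.Create(sw, settings))
            {
                xslTransform.Transform(reader, xw);
            } // Disposing the writer flushes its output into sw.
            return sw.ToString();
        }
    }
}
```

With the answer's setup, this would be called as TransformFragment(mathParagraphXml, xslTransform) after xslTransform.Load(...) has been given the OMML2MML.XSL path.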

Related

Advantages of linq-to-xml compared to OfficeOpenXML.WordProcessing namespace

I'm working on an application that needs to generate Word documents based on user input, database values and a template. I've looked online for examples, and found many different approaches to generate word documents but I've made up my mind and decided to stick with the official Office Open XML SDK 2.5. Now I've just written a simple program that inserts a table (stored in a .xml file) into a word document:
Edit: Question down at the bottom if not interested in the code
    static void Main(string[] args)
    {
        XNamespace ns = XNamespace
            .Get(@"http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        byte[] byteArray = File.ReadAllBytes(@"C:/Users/Alexander/Downloads/WordTest.docx");
        using (var stream = new MemoryStream())
        {
            XDocument xdoc;
            stream.Write(byteArray, 0, byteArray.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
Then I can do 2 different things which will generate the same output.
1) Using OfficeOpenXml.Wordprocessing namespace methods:
                #region Openxml.WordProcessing
                var paragraphs = doc.MainDocumentPart.Document.Body.ToList();
                Table tbl = new Table(File.ReadAllText(@"C:/users/alexander/downloads/tablecontent.xml"));
                var bookmark = paragraphs.SelectMany(p => p.Descendants<BookmarkStart>()
                    .Where(bm => bm.Id == "0")).FirstOrDefault();
                doc.MainDocumentPart.Document.Body.ReplaceChild(tbl, bookmark.Parent);
                #endregion Openxml.WordProcessing
2) Using Linq-To-XML:
                #region LINQ TO XML
                XElement xtbl = XElement.Load(
                    new FileStream(@"C:/users/alexander/downloads/tablecontent.xml", FileMode.Open));
                using (StreamReader sr = new StreamReader(doc.MainDocumentPart.GetStream()))
                using (XmlReader xr = XmlReader.Create(sr))
                    xdoc = XDocument.Load(xr);
                //Document - Body - Paragraphs - Runs/Bookmarks/etc.
                //any way to write this more clearly in linq-to-xml?
                var test = xdoc.Elements().First().Elements().First().Elements()
                    .SelectMany(e => e.Elements()).ToList();
                var startBookmark = test.Where(p => p.Name == XName.Get("bookmarkStart", ns.NamespaceName)
                    && p.Attribute(XName.Get("id", ns.NamespaceName)).Value == "0").First();
                startBookmark.Parent.ReplaceWith(xtbl);
                using (XmlWriter xw = XmlWriter.Create(doc.MainDocumentPart.GetStream()))
                    xdoc.Save(xw);
                #endregion LINQ TO XML
And finally I write the document to a new file:
            using (FileStream fs =
                new FileStream(@"C:/users/alexander/downloads/WordTestModified.docx", FileMode.Create))
            {
                stream.WriteTo(fs);
            }
As far as I can see, the first option is easier and the code is clearer to read (no use of XName and no need for an extra StreamReader/XmlReader/XmlWriter), but does LINQ to XML have any distinct advantages over this approach? This is going to be a big application and I don't want to be limited later on.
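On the inline comment asking whether the bookmark lookup can be written more clearly in LINQ to XML, one possible sketch uses Descendants with a namespace-qualified name. The class and method names here are hypothetical. Unlike the fixed-depth query in the question, Descendants searches the whole tree, so this assumes a matching bookmark at any depth is acceptable.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BookmarkLookupSketch
{
    // WordprocessingML main namespace, as used in the question.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the first w:bookmarkStart element whose w:id attribute
    // equals id, or null when the document contains no such bookmark.
    public static XElement FindBookmarkStart(XDocument xdoc, string id)
    {
        return xdoc.Descendants(W + "bookmarkStart")
                   .FirstOrDefault(b => (string)b.Attribute(W + "id") == id);
    }
}
```

The explicit cast (string)b.Attribute(...) yields null when the attribute is missing, so elements without a w:id are skipped instead of throwing as .Value would.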

c# StringReader, XmlReader, XSLT - Unexpected XML Declaration

I've been using this function to read XML from a string and apply an XSLT style sheet. It has been working very well for small portions of XML:
    private static string TransformXML(String XML, String XSLT)
    {
        string output = String.Empty;
        using (StringReader srt = new StringReader(XSLT))
        {
            using (StringReader sri = new StringReader(XML))
            {
                using (XmlReader xrt = XmlReader.Create(srt))
                using (XmlReader xri = XmlReader.Create(sri))
                {
                    XslCompiledTransform xslt = new XslCompiledTransform();
                    xslt.Load(xrt);
                    using (StringWriter sw = new StringWriter())
                    using (XmlWriter xwo = XmlWriter.Create(sw, xslt.OutputSettings)) // use OutputSettings of xsl, so it can be output as HTML
                    {
                        xslt.Transform(xri, xwo);
                        output = sw.ToString();
                    }
                }
            }
        }
        return output;
    }
However, with large portions of XML, I'm getting errors, even though I know it is correctly formatted.
Here is an example error: Unexpected end of file while parsing Name has occurred. Line 1, position 30001.
I'm guessing there is a limit on the buffering, but I can't quite work it out - the code is within an SSIS package and different script tasks produce and translate the XML.
I appreciate any help!

Microsoft Open XML SDK 2.0 append document to template document asp.net c#

My ASP.NET C# web application creates Word documents by filling an existing template Word document with data. Now I need to append another existing document to that document as the next page.
For example: My template has two pages. The document I need to append has one page. As a result I want to get one Word document with 3 pages.
How do I append documents to an existing Word document in ASP.NET/C# with the Microsoft Open XML SDK 2.0?
Use this code to merge two documents:
    using System.Linq;
    using System.IO;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    namespace altChunk
    {
        class Program
        {
            static void Main(string[] args)
            {
                string fileName1 = @"c:\Users\Public\Documents\Destination.docx";
                string fileName2 = @"c:\Users\Public\Documents\Source.docx";
                string testFile = @"c:\Users\Public\Documents\Test.docx";
                File.Delete(fileName1);
                File.Copy(testFile, fileName1);
                using (WordprocessingDocument myDoc =
                    WordprocessingDocument.Open(fileName1, true))
                {
                    string altChunkId = "AltChunkId1";
                    MainDocumentPart mainPart = myDoc.MainDocumentPart;
                    AlternativeFormatImportPart chunk =
                        mainPart.AddAlternativeFormatImportPart(
                            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
                    using (FileStream fileStream = File.Open(fileName2, FileMode.Open))
                        chunk.FeedData(fileStream);
                    AltChunk altChunk = new AltChunk();
                    altChunk.Id = altChunkId;
                    mainPart.Document
                        .Body
                        .InsertAfter(altChunk, mainPart.Document.Body
                        .Elements<Paragraph>().Last());
                    mainPart.Document.Save();
                }
            }
        }
    }
This works flawlessly.
There is another approach that uses Open XML PowerTools.

HTML file with xml and xmlns declarations cannot be transformed

I've generated an HTML file and the opening html tag looks like this:
    <html xml:lang="de-CH" lang="de-CH" xmlns="http://www.w3.org/1999/xhtml">
And then I try to convert it into a different format with this .NET 4 code:
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;
    XslCompiledTransform proc = new XslCompiledTransform();
    proc.Load("Html_to_Sql.xslt");
    fsHtmlXml = new FileStream(file.Name, FileMode.Create);
    html = XmlReader.Create(file.FullName, settings);
    proc.Transform(html, null, fsHtmlXml);
Unfortunately nothing happens as long as I have the xml, lang and xmlns attributes in the HTML.
Why is that?
Your XSLT will need to refer to elements in the http://www.w3.org/1999/xhtml namespace. You haven't posted your XSLT code yet, but the problem most likely lies in that file.
Will this work via XML and XPath?
    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Xsl;
    using System.Xml.XPath;

    public class TransformXML
    {
        // Transforms an XML document using XSLT, writes the result document
        // and displays it.
        public static void Main(string[] args)
        {
            // Assumption: the three file paths are passed on the command line.
            string sourceDoc = args[0];
            string xsltDoc = args[1];
            string resultDoc = args[2];
            try
            {
                XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
                // XslCompiledTransform replaces the obsolete XslTransform class.
                XslCompiledTransform myXslTransform = new XslCompiledTransform();
                myXslTransform.Load(xsltDoc);
                // Dispose the writer so the result file is complete before it is read back.
                using (XmlTextWriter writer = new XmlTextWriter(resultDoc, null))
                {
                    myXslTransform.Transform(myXPathDocument, null, writer);
                }
                using (StreamReader stream = new StreamReader(resultDoc))
                {
                    Console.Write("**This is result document**\n\n");
                    Console.Write(stream.ReadToEnd());
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: {0}", e.ToString());
            }
        }
    }
The xmlns attribute specifies the namespace of the XML document. This works in much the same way as namespaces within C#, where two classes with the same name but different namespaces are considered to be completely different classes. Changing the XML namespaces means that your XSLT templates / XPath will not match.
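The effect can be reproduced with a few lines of XPath. In this minimal sketch the class name, method name and the prefix "h" are arbitrary choices for illustration: an unprefixed path finds nothing in a document that declares the XHTML namespace, while the same path with a bound prefix does.

```csharp
using System;
using System.Xml;

public static class NamespaceMatchSketch
{
    // Returns { matches for "/html", matches for "/h:html" } where the
    // prefix h is bound to the XHTML namespace.
    public static int[] CountMatches(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        // "h" is an arbitrary prefix; only the namespace URI has to match.
        nsmgr.AddNamespace("h", "http://www.w3.org/1999/xhtml");

        // An unprefixed name in XPath 1.0 means "no namespace".
        int unqualified = doc.SelectNodes("/html").Count;
        int qualified = doc.SelectNodes("/h:html", nsmgr).Count;
        return new[] { unqualified, qualified };
    }
}
```

For the html element shown in the question, the unprefixed query matches nothing and the prefixed one matches the root element. The XSLT equivalent is to declare xmlns:h="http://www.w3.org/1999/xhtml" on the stylesheet element and write match patterns such as h:html.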

save xslt output transformation in a file

I have a "book.xml" and a "book.xslt" whose output method is set to text. I don't want to load the output in a browser because it is very large, so I need some code that saves the output text file to the hard drive. How can I implement this kind of transformation in C#?
This should work:
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(@"c:\book.xslt");
    xslt.Transform(@"c:\book.xml", @"c:\output.txt");
Obviously your paths will need to be updated to match your particular scenario, for example:
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(Server.MapPath("~/book.xslt"));
    xslt.Transform(Server.MapPath("~/book.xml"), Server.MapPath("~/output.txt"));
This will read your XSL file from the root of the site and transform /book.xml and save it to /output.txt.
You can find out more about the System.Xml.Xsl.XslCompiledTransform class here:
System.Xml.Xsl.XslCompiledTransform
Use the System.Xml.Xsl.XslCompiledTransform class.
    XslCompiledTransform transform = new XslCompiledTransform();
    transform.Load(Server.MapPath("~/book.xslt"));
    transform.Transform(Server.MapPath("~/book.xml"), Server.MapPath("~/output.xml"));
(Note: this assumed all the documents are stored in the root of the web application)
By using XmlWriter and XPathDocument like so:
    using System.Xml;
    using System.Xml.XPath;
    using System.Xml.Xsl;

    public void xmltest(string xmlFilePath, string xslFilePath, string outFilePath)
    {
        var doc = new XPathDocument(xmlFilePath);
        var transform = new XslCompiledTransform();
        // The following two lines are only needed if you need scripting.
        // Because of security considerations read up on that topic on MSDN first.
        var settings = new XsltSettings();
        settings.EnableScript = true;
        transform.Load(xslFilePath, settings, null);
        // Use the stylesheet's output settings (required for text output) and
        // dispose the writer so that the output is flushed to the file.
        using (var writer = XmlWriter.Create(outFilePath, transform.OutputSettings))
        {
            transform.Transform(doc, writer);
        }
    }
More info here:
http://msdn.microsoft.com/en-us/library/14689742.aspx
regards
