I'm trying to embed an OpenXML Word document inside another OpenXML Word document, and it's creating corrupted files. I've already reviewed this post
http://blogs.msdn.com/b/brian_jones/archive/2009/06/30/embedding-an-open-xml-file-in-another-open-xml-file.aspx and the Stack Overflow post "Embedding an OpenXML document within another OpenXml document". I think I'm following the instructions, but I'm not getting the same results.
My code looks something like this.
foreach (KeyValuePair<string, byte[]> docKV in files)
{
    // One embedded package part per file; the dictionary key becomes part of the relationship id.
    var embeddedPackagePart = document.MainDocumentPart.AddNewPart<EmbeddedPackagePart>("application/vnd.openxmlformats-officedocument.wordprocessingml.document", "rfp" + docKV.Key);
    var stream = new MemoryStream(docKV.Value);
    embeddedPackagePart.FeedData(stream);
    stream.Close();
}
So essentially I'm looping through a list of files stored in a dictionary and embedding each file. I've removed the code that creates the actual OpenXML markup to link to the embedded object to try and narrow down the error. As far as I can tell the actual embedding is what creates the issue.
I've also opened the generated document using the OpenXML SDK Productivity tool, and it is missing the embeddings section that I see on documents where I manually embedded a file.
Any thoughts on what I'm doing wrong?
You can read Merging word processing documents. Sample code is available too.
I figured out the issue. I was not calling WordprocessingDocument.Close before disposing of the WordprocessingDocument. If you are adding new document parts you have to invoke Close() for those to get written to the underlying stream.
If you take a .docx file, rename it to .zip, and unzip it, you can view its .xml files. I'm building a program to programmatically inspect these XML properties (no existing API seems to suffice as our company is using a 3rd party program that attaches custom XML to files, and that program does not have an API).
Is there a clean way to access this XML without programmatically saving copies of files as .zip files, opening them, taking out only the XML and then deleting the rest?
Use the Open XML SDK to fetch all the XML elements:
WordprocessingDocument document = WordprocessingDocument.Open(this.FilePath, true);
MainDocumentPart mainPart = document.MainDocumentPart;
List<OpenXmlElement> ParagraphElements = new List<OpenXmlElement>();
foreach (var i in mainPart.Document.ChildElements.FirstOrDefault().ChildElements)
{
    ParagraphElements.Add(i);
}
That is the complete solution.
All the XML elements can then be retrieved from ParagraphElements.
This is an easy way to access the XML elements present in the document.
Have you tried the Open XML SDK for Office?
It allows you to access the XML files inside .docx files.
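If the SDK is not an option, the "rename to .zip" idea from the question can also be done entirely in memory. The following is a minimal sketch, assuming only the standard System.IO.Compression and System.Xml.Linq libraries; the class and method names (DocxXmlReader, ReadXmlParts) are hypothetical and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper: reads the XML parts of a .docx without unpacking it to disk.
static class DocxXmlReader
{
    // Returns every entry whose name ends in ".xml", keyed by its path inside the package.
    public static Dictionary<string, XDocument> ReadXmlParts(Stream docxStream)
    {
        var parts = new Dictionary<string, XDocument>();

        // A .docx file is a ZIP archive, so ZipArchive can open it directly.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (!entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                {
                    continue; // skip images, .rels files, and other non-.xml entries
                }

                using (Stream entryStream = entry.Open())
                {
                    parts[entry.FullName] = XDocument.Load(entryStream);
                }
            }
        }

        return parts;
    }
}
```

Typical use would be to open the file with File.OpenRead(path), pass the stream to ReadXmlParts, and look up entries such as "word/document.xml". Custom XML attached by another program is usually stored under a customXml/ folder in the package, but that location is an assumption worth checking against one of your own files.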
I'm using Novacode DocX to take some data and convert it to a docx document, which is then put into a memory stream. Now I want to take this memory stream and convert it to a PDF document using ABCpdf.NET. Is this possible? I can't figure out whether ABCpdf.NET can do it.
I've tried something simple like this, just to get started:
var doc = new Doc();
doc.Read(docxMemoryStream);
This just reads the stream into the document. But I can't figure out whether I can take this document and convert it to PDF.
In the end this has to be deployed on a CRM server where we cannot control what is installed. So having Word installed is out of the question, which sadly rules out Interop.
I created a PDF file with the PDFsharp library using this code. When I open the file in Adobe Reader, the bookmarks are there. Then I wrote another program that reads this PDF file, and I want to view my bookmarks, but the Outlines collection Count is 0 and HasOutline is false. Is it a bug?
Have you tried to display the bookmarks of a PDF?
Sorry for my English
It's a known limitation that the Outlines collection remains empty when you open an existing document.
You can access the outlines using GetObject() if you have to.
See also:
http://forum.pdfsharp.net/viewtopic.php?p=705#p705
http://forum.pdfsharp.net/viewtopic.php?p=1008#p1008
So, first off, here's my code to open the dotx and create a new docx copy (the copy is then modified). It is cut for brevity, but essentially it takes three parameters: a data table (to make it usable by legacy systems), the UNC path to a template as a string, and the UNC path to the output document as a string:
using (WordprocessingDocument docGenerated = WordprocessingDocument.Open(outputPath, true))
{
    docGenerated.ChangeDocumentType(WordprocessingDocumentType.Document);
    foreach (SimpleField field in docGenerated.MainDocumentPart.Document.Descendants<SimpleField>())
    {
        string mergeFieldName = GetFieldName(field).Trim();
        DataRow[] dr = dtSchema.Select("FieldName = '" + mergeFieldName + "'");
        if (dr.Length > 0)
        {
            string runProperties = string.Empty;
            foreach (RunProperties property in field.Descendants<RunProperties>())
            {
                runProperties = property.OuterXml;
                break;
            }
            Run run = new Run();
            run.Append(new RunProperties(runProperties));
            run.Append(new Text(dr[0]["FieldDataValue"].ToString()));
            field.Parent.ReplaceChild<SimpleField>(run, field);
        }
    }
    docGenerated.MainDocumentPart.Document.Save();
}
What I did initially was take a .dot template and re-save it as a .dotx and crossed my fingers, didn't work. So instead I tried deleting all merge fields in the .dotx and adding them again. This worked - but it would only find one merge field (as a SimpleField), specifically the last one added before saving the .dotx. Looking further at the template using the open XML productivity tool I can see that all other merge fields are of type w:instrText which is why they're being ignored.
I'm literally just starting out with OpenXML as we're looking to replace our current office automation with it so I know very little at this point. Could someone please instruct me a bit further or point me to a good resource? I've Google'd around a bit but I can't find my specific problem. I am trying to put off reading through the whole SDK documentation (I know, I know!) as I need to get a solution put together quickly so am focusing on a single task which is to take our existing .dot templates, convert them to .dotx and just replace merge fields with data to derive a .docx.
Thanks in advance!
When working with OpenXML you don't strictly need to use .dotx for your templates; you can build your templates as .docx files straight away. A good resource for learning OpenXML is http://openxmldeveloper.org/, and you can find a good PDF to read there.
It is also worth looking at the third-party API docx.codeplex.com, which I am now using to develop a server-side document automation solution for my company. See the example at http://cathalscorner.blogspot.co.uk/2009/08/docx-v1007-released.html, which is similar to your scenario: merging fields with data.
Hope this helps.
Here is the link to "C# OpenXML Mail Merge Complete Example" http://www.jarredcapellman.com/2012/10/22/c-openxml-mail-merge-complete-example/
I used it to update my web application code that used to work with mergefields in .DOC (.DOT) files (and required MS Word to be installed and the corresponding DCOM to be configured on the web server). Now the solution requires just OpenXML SDK 2.0 to be installed on the web server and the .DOC (.DOT) templates to be saved as .DOCX (.DOTX).
If you want to use content controls you can also try WordDocumentGenerator. WordDocumentGenerator is a utility to generate Word documents from templates using Visual Studio 2010 and the Open XML 2.0 SDK, available at http://worddocgenerator.codeplex.com
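On the observation in the question that most merge fields show up as w:instrText rather than as SimpleField: in the Open XML SDK, w:instrText is exposed as the FieldCode class, so both forms can be scanned. The following is a rough sketch, assuming the DocumentFormat.OpenXml package is referenced; MergeFieldScanner, ParseFieldName and FindMergeFieldNames are hypothetical names, and the regular expression is my own guess at typical MERGEFIELD instructions. It only lists field names and does not replace anything, and it will miss instructions that Word has split across several instrText runs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: lists merge field names from simple and complex fields.
static class MergeFieldScanner
{
    // Matches: MERGEFIELD Name   or   MERGEFIELD "Name with spaces"
    static readonly Regex MergeFieldPattern = new Regex(
        @"MERGEFIELD\s+(?:""([^""]+)""|([^\s\\]+))",
        RegexOptions.IgnoreCase);

    // Returns the field name, or null if the instruction is not a MERGEFIELD.
    public static string ParseFieldName(string instruction)
    {
        if (string.IsNullOrEmpty(instruction))
        {
            return null;
        }
        Match m = MergeFieldPattern.Match(instruction);
        if (!m.Success)
        {
            return null;
        }
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    public static List<string> FindMergeFieldNames(WordprocessingDocument doc)
    {
        var document = doc.MainDocumentPart.Document;

        // Simple fields carry the instruction in an attribute (w:fldSimple/@w:instr).
        IEnumerable<string> simple = document.Descendants<SimpleField>()
            .Select(f => f.Instruction == null ? null : f.Instruction.Value);

        // Complex fields carry it as text inside w:instrText (FieldCode) runs.
        IEnumerable<string> complex = document.Descendants<FieldCode>()
            .Select(c => c.Text);

        return simple.Concat(complex)
            .Select(ParseFieldName)
            .Where(name => name != null)
            .ToList();
    }
}
```

Replacing the value of a complex field takes more work than replacing a SimpleField, because the displayed result lives in the runs between the "separate" and "end" field characters (FieldChar elements) that follow the instruction run. The linked mail merge example and the third-party libraries mentioned above are probably the quicker route for that part.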
I know there are a lot of other questions on SO about this topic, but I need some more information. It's a two-part question to my requirement: dynamically generate an MS Word document from HTML and prompt for download.
Q1) From what I'm reading it seems that Microsoft.Office.Interop is not designed to be used for server automation since this is just a wrapper around the application and would require Office to be installed on the web server. Is this correct?
I have gotten some of this to work, I get prompted to download, the Word doc saves properly, but the doc shows my markup as the content of the document, not the rendered HTML as the content. From what I've read, it's supposedly possible to export HTML to MS Word simply like this without the need for 3rd party tools or components. I'd also like to avoid the Open XML format as I can't guarantee which version of Word my users have.
Q2) What am I missing here to get my HTML to appear rendered in the MS Word output file? doc.DocumentBody is a string type that contains the entire HTML document.
public FileStreamResult DownloadDocument(string id)
{
    /* pseudo-code here to fetch my custom "Document" object from DB */
    Document doc = DocumentService.FindById(id);
    var fileName = string.Format("{0}.doc", doc.Title);
    Response.AddHeader("Content-Disposition", "inline;filename=" + fileName);
    return new FileStreamResult(WordStream(doc.DocumentBody), "application/msword");
}
private static Stream WordStream(string body)
{
    var ms = new MemoryStream();
    byte[] byteInfo = Encoding.ASCII.GetBytes(body);
    ms.Write(byteInfo, 0, byteInfo.Length);
    ms.Position = 0;
    return ms;
}
I have used essentially the same code as you to download html as word documents, and it works fine. I modified my code so that it was the same as yours to test, and it still worked OK, so I wonder if the issue is actually with your HTML.
Have a look at doc.DocumentBody in your debugger, and see if it is valid html.
Is it wrapped in <html><body></body></html>?
I had a test - I think if you leave out the body tags, you'll end up seeing raw html.
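If missing html/body tags turn out to be the cause, one way to guard against it is to wrap the markup before streaming it. This is a minimal sketch under that assumption; HtmlWordStream, EnsureHtmlDocument and ToStream are hypothetical names. The switch to UTF-8 and the meta charset tag are my own choices, not something confirmed in this thread: Encoding.ASCII, as used in the question, replaces every non-ASCII character with '?'.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: makes sure the markup is a full HTML document before it is sent as .doc.
static class HtmlWordStream
{
    // Wraps a fragment in <html><body>...</body></html>; leaves full documents untouched.
    public static string EnsureHtmlDocument(string body)
    {
        if (body.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return body;
        }
        return "<html><head><meta charset=\"utf-8\"></head><body>" + body + "</body></html>";
    }

    // Encodes the document as UTF-8 and returns a stream positioned at the start.
    public static MemoryStream ToStream(string body)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(EnsureHtmlDocument(body));
        return new MemoryStream(bytes);
    }
}
```

In the controller from the question, the call would become WordStream-style usage such as new FileStreamResult(HtmlWordStream.ToStream(doc.DocumentBody), "application/msword").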
Yes, and running Office applications on a server without a UI is not supported. (Note: "not supported" does not mean it will not work, only that no guarantees of any kind are made.)
Use the File method to return the file (http://msdn.microsoft.com/en-us/library/dd505200.aspx). Also check out this popular answer: "How can I present a file for download from an MVC controller?".
Microsoft.Office.Interop is not designed to be used for server automation since this is just a wrapper around the application and would require Office to be installed on the web server. Is this correct?
Yes.
What am I missing here to get my HTML to appear rendered in the MS Word output file?
Well, you need to create a Word document, of course! Word's file format and the HTML file format are different.
There are some very good commercial libraries out there that provide a nice API for generating Office documents programmatically. With Office XML, this is not quite as necessary - it's now much more feasible to generate the XML that Word knows how to read.