Saving an OpenXML Document (Word) generated from a template - c#

I have a bit of code that opens a Word 2007 (docx) document and updates the appropriate CustomXmlPart (thus updating the Content Controls in the document itself, as they are mapped to the CustomXmlPart), but I can't work out how to save the result as a new file. Surely it can't be that hard!
My current thinking is that I need to open the template and copy the content into a new, blank document - file by file, updating the CustomXmlPart when I encounter it.
Call me old fashioned but that sounds a little bit clunky to me!
Why can't I just do a WordprocessingDocument.SaveAs(filename); ...?
Please tell me I am missing something simple here.
Thanks in advance

Are you referring to the OpenXml SDK? Unfortunately, as of OpenXml SDK 2.0, there's no SaveAs method. You'll need to:
Make a temporary copy of your template file, naming it whatever you want.
Perform your OpenXml changes on the above file.
Save the appropriate sections (i.e. using the myWordDocument.MainDocumentPart.Document.Save() method for the main content, or the someHeaderPart.Header.Save() method for a particular header).
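The steps above can be sketched roughly as follows. This is a minimal sketch, assuming the DocumentFormat.OpenXml package (SDK 2.x) is referenced; the class name TemplateCopy, the method CreateFromTemplate and the modify callback are hypothetical names introduced for illustration only.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class TemplateCopy
{
    // Hypothetical helper: copies the template to outputPath, then edits and
    // saves the copy, so the template file itself is never modified.
    public static void CreateFromTemplate(string templatePath, string outputPath,
                                          Action<WordprocessingDocument> modify)
    {
        // Step 1: make a copy of the template under the new name.
        File.Copy(templatePath, outputPath, true);

        // Step 2: open the copy for editing and apply the changes.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            modify(doc);

            // Step 3: write the main document part back into the package.
            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

A call might look like TemplateCopy.CreateFromTemplate("template.docx", "output.docx", doc => { /* update the CustomXmlPart here */ });, where both file names are placeholders.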

OpenXml 2.8.1 has a SaveAs method that seems to do the trick.
using (var document = WordprocessingDocument.Open(specificationPath, true))
{
    document.SaveAs("filePath/documentCopy.docx").Dispose(); // SaveAs returns the new package; dispose it to release the file
}

Indeed you can, at least in OpenXml SDK 2.5. However, take care to work with a copy of the original file, because changes to the XML are written back to the file that is open. Here are the Load and Save methods of my custom class (after removing some validation code):
public void Load(string pathToDocx)
{
    _tempFilePath = CloneFileInTemp(pathToDocx);
    _document = WordprocessingDocument.Open(_tempFilePath, true);
    _documentElement = _document.MainDocumentPart.Document;
}
public void Save(string pathToDocx)
{
    using (FileStream fileStream = new FileStream(pathToDocx, FileMode.Create))
    {
        // Note: Document.Save(Stream) writes only the main document part's XML to the stream, not the whole .docx package.
        _document.MainDocumentPart.Document.Save(fileStream);
    }
}
Here "_document" is a WordprocessingDocument instance.

You can write the changes to a MemoryStream rather than to the original file, and then save that MemoryStream to a new file:
byte[] byteArray = File.ReadAllBytes("c:\\temp\\mytemplate.docx");
using (var stream = new MemoryStream())
{
    stream.Write(byteArray, 0, byteArray.Length);
    using (var wordDoc = WordprocessingDocument.Open(stream, true))
    {
        // Do work here
        // ...
        wordDoc.MainDocumentPart.Document.Save(); // won't update the original file
    }
    // Save the file with the new name
    stream.Position = 0;
    File.WriteAllBytes("C:\\temp\\newFile.docx", stream.ToArray());
}

In Open XML SDK 2.5 Close saves changes when AutoSave is true.
See my answer here:
https://stackoverflow.com/a/36335092/3285954

Related

Creating a new Document with a FileStream with Aspose.Pdf

I'm trying to create a new Aspose.Pdf.Document using a FileStream to a new File, but it always throws an "Incorrect File Header" Exception. I need to work with the FileStream so that I can incrementally save the Document when merging other Pdf documents without keeping all of the Streams in scope.
According to the documentation, the following is the code to create a Document with a FileStream (I changed FileMode.Open to FileMode.OpenOrCreate since I don't have an existing Pdf file and want to start with a blank Document).
await using var fileStream = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
var document = new Document(fileStream);
This code throws an "Incorrect File Header" Exception unless the FileStream points to an existing valid Pdf file.
The following code works, but it's kind of silly to create and dispose a Document just so that we can work with the Document through the FileStream.
var fileName = Path.GetTempFileName();
var doc = new Document();
doc.Save(fileName);
doc.Dispose();
await using var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.ReadWrite);
var document = new Document(fileStream);
I have to be missing something painfully obvious, because this is an incredibly simple use case and I don't see anything about it when searching online.
You cannot initialize the Document object with an empty Stream or invalid PDF file. File or Stream should be a valid PDF document. In order to use the incremental saving approach, you can initialize the FileStream with a new file and keep saving the Document into it. For example, please check the below sample code snippet:
using var fileStream = new FileStream(dataDir + "output.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite);
{
    var document = new Document();
    document.Pages.Add();
    document.Save(fileStream);
    document.Pages.Add();
    document.Save(fileStream);
}
Please note that the FileStream needs to remain open during the whole process of PDF generation. You can also use the parameterless Document.Save() method to implement incremental saving.
We believe that you have also posted a similar inquiry in Aspose.PDF official support forum and we have responded to you there as well. You can please follow up on it there and carry on the discussion in case you need more information.
This is Asad Ali and I work as Developer Evangelist at Aspose.

C# iTextSharp: The process cannot access the file because it is being used by another process

I'm generating a pdf file from a template with iTextSharp, filling each field in this code portion:
PdfReader pdfReader = new PdfReader(templatePath);
try
{
    using (FileStream newFileStream = new FileStream(newFilePath, FileMode.Create))
    {
        using (PdfStamper stamper = new PdfStamper(pdfReader, newFileStream))
        {
            // fill each field
            AcroFields pdfFormFields = stamper.AcroFields;
            foreach (KeyValuePair<string, string> entry in content)
            {
                if (!String.IsNullOrEmpty(entry.Value))
                    pdfFormFields.SetField(entry.Key, entry.Value);
            }
            //The below will make sure the fields are not editable in
            //the output PDF.
            stamper.FormFlattening = true;
            stamper.Close();
        }
    }
}
finally
{
    pdfReader.Close();
}
Everything goes fine and the file looks OK, but when I try to reopen the file to merge it with some other files I've generated into a single document, I get this error:
2015-11-23 09:46:54,651||ERROR|UrbeWeb|System.IO.IOException: The process cannot access the file 'D:\Sviluppo\communitygov\MaxiAnagrafeImmobiliare\MaxiAnagrafeImmobiliare\cache\IMU\E124\admin\Stampe\Provvedimento_00223850306_2015_11_23_094654.pdf' because it is being used by another process.
The error occurs at this point:
foreach (Documento item in docs)
{
    string fileName = item.FilePath;
    pdfReader = new PdfReader(fileName); // IOException
    // some other operations ...
}
Edit: Using Process Monitor as suggested, I can see there is no CloseFile operation where I would expect one. Can this be the source of the issue?
I've been stuck on this for hours; any help is really appreciated.
I had the same issue. This helped a lot:
"Your problem is that you are writing to a file while you are also reading from it. Unlike some file types (JPG, PNG, etc) that "load" all of the data into memory, iTextSharp reads the data as a stream. You either need to use two files and swap them at the end or you can force iTextSharp to "load" the first file by binding your PdfReader to a byte array of the file."
PdfReader reader = new PdfReader(System.IO.File.ReadAllBytes(filePath));
Ref: Chris Haas's answer to "Cannot access the file because it is being used by another process"
I had a similar problem when opening PDF files (read-only) with the iTextSharp PdfReader. The first file gave no problem; the second one threw that exception (cannot access the file, etc.).
After hours of googling, searching for complicated solutions and twisting my brain, only the following simple code resolved it fully:
iTextSharp_pdf.PdfReader pdfReader = null;
pdfReader = new iTextSharp_pdf.PdfReader(fileName);

Create ZipArchive from XML with base64 encoded content

I am creating an XML file on the fly.
One of its nodes contains a ZIP file encoded as a BASE64 string.
I then create another ZIP file.
I add this XML file and a few other JPEG files.
I output the file to the browser.
I am unable to open the FINAL ZIP file.
I get: "Windows cannot open the folder. The Compressed (zipped) Folder 'c:\path\file.zip' is invalid."
I am able to save my original XML file to the file system.
I can open that XML file, decode the ZIP node and save to the file system.
I am then able to open that Zip file with no problems.
I can create the final ZIP file, OMIT my XML file, and the ZIP file opens no problem.
I only seem to have an issue when I attempt to ZIP an XML file that has a node with ZIP content encoded as a BASE64 string.
Any ideas? Code snippets are below, heavily edited.
XDocument xDoc = new XDocument();
XDocument xDocReport = new XDocument();
XElement xNodeReport;
using (FileStream fsData = new FileStream(strFullFilePath, FileMode.Open, FileAccess.Read)) {
xDoc = XDocument.Load(fsData);
xNodeReport = xDoc.Element("Data").Element("Reports").Element("Report");
//SNIP
//create XDocument xDocReport
//SNIP
using (MemoryStream zipInMemoryReport = new MemoryStream()) {
using (ZipArchive zipFile = new ZipArchive(zipInMemoryReport, ZipArchiveMode.Update)) {
//Add REPORT to ZIP file
ZipArchiveEntry entryReport = zipFile.CreateEntry("data.xml");
using (StreamWriter writer = new StreamWriter(entryReport.Open())) {
writer.Write(xDocReport.ToString());
} //END USING report entry
}
xNodeReport.Value = System.Convert.ToBase64String(zipInMemoryReport.GetBuffer());
//I am able to write this file to disk and manipulate it no problem.
//File.WriteAllText("c:\\users\\snip\\desktop\\Report.xml",xDoc.ToString());
}
//create ZIP for response
using (MemoryStream zipInMemory = new MemoryStream()) {
using (ZipArchive zipFile = new ZipArchive(zipInMemory, ZipArchiveMode.Update)) {
//Add REPORT to ZIP file
ZipArchiveEntry entryReportWrapper = zipFile.CreateEntry("Report.xml");
//THIS IS THE STEP THAT makes the Zip "invalid". Although i can open and manipulate this source file no problem.
//********
using (StreamWriter writer = new StreamWriter(entryReportWrapper.Open())) {
xDoc.Save(writer);
}
//Add JPEG(s) to report
//Create Charts
if (chkDLSalesPrice.Checked) {chartDownloadSP.SaveImage(entryChartSP.Open(), ChartImageFormat.Jpeg);}
if (chkDLSalesDOM.Checked) {chartDownloadDOM.SaveImage(entryChartDOM.Open(), ChartImageFormat.Jpeg);}
if (chkDLSPLP.Checked) {chartDownloadSPLP.SaveImage(entryChartSPLP.Open(), ChartImageFormat.Jpeg);}
if (chkDLSPLP.Checked) {chartDownloadLP.SaveImage(entryChartLP.Open(), ChartImageFormat.Jpeg);}
} // END USING ziparchive
Response.Clear();
Response.AppendHeader("content-disposition", "attachment; filename=file.zip");
Response.ContentType = "application/zip";
Response.BinaryWrite(zipInMemory.GetBuffer());
Response.End();
Without a good, minimal, complete code example, it's impossible to know for sure what bugs are in the code. But there are at least two apparent errors in the code snippet you posted, one of which could easily be responsible for the "invalid .zip" error:
In the statement writer.Write(xDocReport.ToString());, the variable xDocReport has not been initialized to anything useful, at least not in the code you posted. So you'll get an empty XML document in the archive.
Since the code example is incomplete, it's possible you just omitted from the code example in your question the initialization of that variable to something else. In any case, even if you didn't that would just lead to an empty XML document in the archive, not an invalid archive.
More problematic though…
You are calling GetBuffer() on your MemoryStream objects, instead of ToArray(). You want the latter. The former gets the entire backing buffer for the MemoryStream object, including the uninitialized bytes past the end of the valid stream. Since a valid .zip file includes a CRC value at the end of the file, adding extra data beyond that causes anything trying to read the file as a .zip archive to miss the correct CRC, reading the uninitialized data instead.
Replace your calls to GetBuffer() with calls to ToArray() instead.
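To make the difference concrete, here is a minimal sketch that builds a small archive in memory and compares the two calls. It uses only System.IO and System.IO.Compression; the class name ZipBytesDemo, the helper BuildArchive and the entry content are hypothetical and are not taken from the question's code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipBytesDemo
{
    // Builds a one-entry archive in memory and returns the stream so that
    // ToArray() and GetBuffer() can be compared by the caller.
    public static MemoryStream BuildArchive(string entryName, string content)
    {
        var zipInMemory = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
        using (var zip = new ZipArchive(zipInMemory, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.CreateEntry(entryName);
            using (var writer = new StreamWriter(entry.Open(), Encoding.UTF8))
            {
                writer.Write(content);
            }
        } // disposing the archive finishes writing it to the stream
        return zipInMemory;
    }

    public static void Main()
    {
        using (MemoryStream zipInMemory = BuildArchive("Report.xml", "<Data />"))
        {
            byte[] exact = zipInMemory.ToArray();     // only the bytes actually written
            byte[] backing = zipInMemory.GetBuffer(); // whole internal buffer, may include unused trailing bytes
            Console.WriteLine("Stream length:    " + zipInMemory.Length);
            Console.WriteLine("ToArray length:   " + exact.Length);
            Console.WriteLine("GetBuffer length: " + backing.Length);
        }
    }
}
```

The ToArray() length always equals the stream length, while the GetBuffer() length is the buffer's capacity and can be larger; those extra bytes are what end up appended to the .zip data when GetBuffer() is passed to Response.BinaryWrite or Convert.ToBase64String.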
If the above does not lead to a solution for your problem, you should edit your post, to provide a better code example.
One last comment: there is no point in initializing a variable like xDoc to an empty XDocument object when you're going to just replace that object with a different one (e.g. by calling XDocument.Load()).

How to set custom properties in the currently active Word document via OpenXML

So far I've been able to set custom properties to a Word doc by using VSTO and by adding a package stream to the active document as it follows
public static void SetCustomProperty(Microsoft.Office.Interop.Word.Document doc, string propertyName, object propertyValue)
{
    using (MemoryStream stream = new MemoryStream())
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document, true))
    {
        SetProperty(wordDoc, propertyName, propertyValue);
        // Flush the contents of the package.
        wordDoc.Package.Flush();
        // Convert back to flat OPC by using this in-memory package.
        XDocument xDoc = OpcHelper.OpcToFlatOpc(wordDoc.Package);
        // Return the xml string.
        string openxml = xDoc.ToString();
        // Add to Word doc
        doc.CustomXMLParts.Add(openxml);
    }
}
The SetProperty method works as explained here and the OpcHelper can be found here and is explained here.
The problem is that my custom property is inserted into an XML file (e.g. item1.xml) located in the folder document.zip\customXml of the OpenXML file format. Later on, when I want to read my custom property, I use WordprocessingDocument.CustomFilePropertiesPart, which is empty. In fact, I found that CustomFilePropertiesPart references the document.zip\docProps\custom.xml file.
So instead of using doc.CustomXMLParts.Add(openxml); what should I use to populate the right xml file, i.e. document.zip\docProps\custom.xml?
EDIT
I had already tried the solution proposed by Mishra without success, i.e. custom properties were not always saved. However, since he posted this solution I tried again, and I found here that you first need to mark the document as unsaved:
doc.CustomDocumentProperties.Add("MyProp", false, MsoDocProperties.msoPropertyTypeNumber, 123);
doc.Saved = false;
doc.Save();
You can't set custom properties using the CustomXMLParts collection. If you have the document open, it is better to keep it simple and use the CustomDocumentProperties collection; it's quite fast and easy. I would use Open XML on an open document only if the data to insert is very large.
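For comparison, writing a property into docProps\custom.xml with the Open XML SDK might look like the sketch below. It assumes the DocumentFormat.OpenXml package (SDK 2.x) is referenced and that the file is on disk and not currently open in Word, so it does not cover the active-document case discussed above. The class name CustomPropertySketch and the method SetTextProperty are hypothetical, and only string values are handled.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.CustomProperties;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.VariantTypes;

public static class CustomPropertySketch
{
    // Hypothetical helper: adds or replaces a text custom property in a .docx on disk.
    public static void SetTextProperty(string docxPath, string name, string value)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            // CustomFilePropertiesPart is the part stored as docProps/custom.xml.
            CustomFilePropertiesPart part = doc.CustomFilePropertiesPart;
            if (part == null)
            {
                part = doc.AddCustomFilePropertiesPart();
                part.Properties = new Properties();
            }
            Properties props = part.Properties;

            // Remove an existing property with the same name, if any.
            CustomDocumentProperty existing = props.Elements<CustomDocumentProperty>()
                .FirstOrDefault(p => p.Name != null && p.Name.Value == name);
            if (existing != null)
            {
                existing.Remove();
            }

            props.AppendChild(new CustomDocumentProperty
            {
                FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}",
                Name = name,
                VTLPWSTR = new VTLPWSTR(value)
            });

            // Renumber the property ids; they start at 2.
            int pid = 2;
            foreach (CustomDocumentProperty p in props.Elements<CustomDocumentProperty>())
            {
                p.PropertyId = pid++;
            }

            props.Save();
        }
    }
}
```

A property written this way is what WordprocessingDocument.CustomFilePropertiesPart reads back, unlike data added through doc.CustomXMLParts.Add, which lands under customXml.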

How to store formatted snippets of Microsoft Word documents in sql server

I need to extract formatted text snippets from a Word document and store them in an SQL Server table, for later processing and then reinsertion into the Word document using C#.
I've had a look at the Word DOM, and it seems that I need to use a combination of the Document.Load(), Document.Save(), Range.Copy() and Range.Paste() methods to create a file for each snippet, which I then load into the DB.
Isn't there an easier (more efficient) way?
By the way, the snippets can be hidden text, and I was thinking about storing them as RTF.
In the end I used Aspose.Words for .NET to extract the snippets I'm interested in from the Word file and store them as RTF:
// Get interesting snippets (in this case text runs with
// style "tw4winMark")
Document sourceDocument = new Document(fileName);
var runs = sourceDocument.GetChildNodes(NodeType.Run, true)
    .Cast<Run>() // requires using System.Linq;
    .Where(r => r.Font.StyleName == "tw4winMark").ToList();
// Store snippets into temporary document
// Read Aspose documentation for details
Document document = new Document();
if (runs.Count > 0) {
    NodeImporter nodeImporter = new NodeImporter(
        runs[0].Document,
        document,
        ImportFormatMode.KeepSourceFormatting
    );
    foreach (Run run in runs) {
        Run importedRun = nodeImporter.ImportNode(run, true) as Run;
        importedRun.Font.Hidden = false;
        document.Sections[0].Body.Paragraphs[0].AppendChild(importedRun);
    }
}
// save temporary document in MemoryStream as RTF
RtfSaveOptions saveOptions = new RtfSaveOptions();
MemoryStream ms = new MemoryStream();
document.Save(ms, saveOptions);
// retrieve RTF from MemoryStream
ms.Seek(0, SeekOrigin.Begin);
StreamReader sr = new StreamReader(ms);
string rtf = sr.ReadToEnd();
One can then store the RTF in a text field of the database as usual and edit it in an RTF text control.
Load the document, select the range via a Range object, then use the XML property of the Range object to get the XML of that range and store it.
You can later insert the XML into another document using the reverse process.
Editing the snippets might prove interesting though, because I'm not aware of any web-based Word-compatible editors.
