I am trying to learn how to use Microsoft's Open XML SDK. I followed their steps on how to create a Word document using a FileStream and it worked perfectly. Now I want to create a Word document but only in memory, and wait for the user to specify whether they would like to save the file or not.
This document by Microsoft explains how to work with in-memory documents using a MemoryStream; however, the document is first loaded from an existing file and "dumped" into a MemoryStream. What I want is to create a document entirely in memory (not based on a file on a drive). What I thought would achieve that was this code:
// This is almost the same as Microsoft's code except I don't
// dump any files into the MemoryStream
using (var mem = new MemoryStream())
{
using (var doc = WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
doc.AddMainDocumentPart().Document = new Document();
var body = doc.MainDocumentPart.Document.AppendChild(new Body());
var paragraph = body.AppendChild(new Paragraph());
var run = paragraph.AppendChild(new Run());
run.AppendChild(new Text("Hello docx"));
using (var file = new FileStream(destination, FileMode.CreateNew))
{
mem.WriteTo(file);
}
}
}
But the result is a 0KB file that can't be read by Word. At first I thought it was because of the size of the MemoryStream, so I gave it an initial capacity of 1024, but the results were the same. On the other hand, if I replace the MemoryStream with a FileStream, it works perfectly.
My question is whether what I want to do is possible, and if so, how? I guess it must be possible, just not how I'm doing it. If it isn't possible what alternative do I have?
There are a couple of things going on here:
First, unlike Microsoft's sample, I was nesting the using block that writes the file to disk inside the block that creates and modifies the document. The WordprocessingDocument isn't saved to the stream until it is disposed or its Save() method is called, and it is disposed automatically when the end of its using block is reached. Had I not nested the third using statement, the end of the second using statement would have been reached before writing the file, and the document would have been saved to the MemoryStream. Instead, I was writing a still-empty stream to disk (hence the 0KB file).
I suppose calling Save() might have helped, but it is not supported on .NET Core (which is what I'm using). You can check whether Save() is supported on your system by checking CanSave:
/// <summary>
/// Gets a value indicating whether saving the package is supported by calling <see cref="Save"/>. Some platforms (such as .NET Core), have limited support for saving.
/// If <c>false</c>, in order to save, the document and/or package needs to be fully closed and disposed and then reopened.
/// </summary>
public static bool CanSave { get; }
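A .docx file is a ZIP-based package, so the "nothing usable is in the MemoryStream until the document is disposed" effect can be reproduced with the BCL alone. The sketch below uses System.IO.Compression.ZipArchive purely as a stand-in for WordprocessingDocument (it is not the Open XML SDK; the entry name and XML content are made up). It snapshots the MemoryStream while the archive is still open and again after it has been disposed; only the second snapshot contains the ZIP end-of-central-directory record that ZIP readers rely on.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Stand-in for WordprocessingDocument: a plain ZipArchive over a MemoryStream.
static class ZipFinalizationDemo
{
    // Signature of the ZIP "end of central directory" record: "PK\x05\x06".
    private static readonly byte[] EndOfCentralDirectory = { 0x50, 0x4B, 0x05, 0x06 };

    public static bool ContainsEndRecord(byte[] data)
    {
        for (int i = 0; i + EndOfCentralDirectory.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < EndOfCentralDirectory.Length && data[i + j] == EndOfCentralDirectory[j])
                j++;
            if (j == EndOfCentralDirectory.Length)
                return true;
        }
        return false;
    }

    // Returns the MemoryStream contents before and after the archive is disposed.
    public static (byte[] Before, byte[] After) Build()
    {
        using var mem = new MemoryStream();
        byte[] before;

        // leaveOpen: true so that 'mem' can still be read once the archive is disposed.
        using (var zip = new ZipArchive(mem, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.CreateEntry("word/document.xml"); // hypothetical part name
            using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            {
                writer.Write("<document>Hello docx</document>");
            }

            // Like calling mem.WriteTo(file) inside the using block: not finalized yet.
            before = mem.ToArray();
        }

        // Like calling mem.WriteTo(file) after the using block: Dispose wrote the rest.
        byte[] after = mem.ToArray();
        return (before, after);
    }
}
```

The same ordering applies to the Open XML SDK code: write mem out only after the WordprocessingDocument's using block has ended.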
So the code ended up being almost identical to Microsoft's code except I don't read any files beforehand, rather I just begin with an empty MemoryStream:
using (var mem = new MemoryStream())
{
using (var doc = WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
doc.AddMainDocumentPart().Document = new Document();
var body = doc.MainDocumentPart.Document.AppendChild(new Body());
var paragraph = body.AppendChild(new Paragraph());
var run = paragraph.AppendChild(new Run());
run.AppendChild(new Text("Hello docx"));
}
using (var file = new FileStream(destination, FileMode.CreateNew))
{
mem.WriteTo(file);
}
}
Also, you don't need to reopen the document before saving it, but if you do, remember to use Open() instead of Create(), because Create() will empty the MemoryStream and you'll again end up with a 0KB file.
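The "dispose first, then reopen the existing data" order can be sketched with the same BCL stand-in (again ZipArchive rather than the Open XML SDK, with a made-up entry name). The package is finalized by its using block, the MemoryStream is rewound, and the data is then opened in read mode, the counterpart of calling Open() on existing data rather than Create().

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class ReopenDemo
{
    public static string WriteThenReopen()
    {
        using var mem = new MemoryStream();

        // 1. Create the package and let the using block dispose it, so it is fully written to 'mem'.
        using (var writeZip = new ZipArchive(mem, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = writeZip.CreateEntry("greeting.txt"); // hypothetical entry
            using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            {
                writer.Write("Hello docx");
            }
        }

        // 2. Rewind, then open the existing data instead of creating a new package over it.
        mem.Position = 0;
        using (var readZip = new ZipArchive(mem, ZipArchiveMode.Read, leaveOpen: true))
        using (var reader = new StreamReader(readZip.GetEntry("greeting.txt").Open()))
        {
            return reader.ReadToEnd();
        }
    }
}
```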
You're passing mem to WordprocessingDocument.Create(), which is creating the document from the (now-empty) MemoryStream, however, I don't think that is associating the MemoryStream as the backing store of the document. That is, mem is only the input of the document, not the output as well. Therefore, when you call mem.WriteTo(file);, mem is still empty (the debugger would confirm this).
Then again, the linked document does say "you must supply a resizable memory stream to [Open()]", which implies that the stream will be written to, so maybe mem does become the backing store but nothing has been written to it yet because the AutoSave property (for which you specified true in Create()) hasn't had a chance to take effect yet (emphasis mine)...
Gets a flag that indicates whether the parts should be saved when disposed.
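On the "resizable memory stream" requirement: a MemoryStream created over an existing byte array has a fixed capacity, whereas one created with the parameterless constructor grows as needed. That is a plausible reason to copy existing file bytes into a new MemoryStream() rather than wrapping the array directly. A small BCL-only sketch; the helper names are made up:

```csharp
using System;
using System.IO;

static class ResizableStreamDemo
{
    // Wrapping an existing array: the stream cannot grow beyond that array.
    public static bool CanGrowWhenWrappingArray(byte[] fileBytes)
    {
        using var fixedSize = new MemoryStream(fileBytes);
        fixedSize.Position = fixedSize.Length;
        try
        {
            fixedSize.WriteByte(0x42);
            return true;
        }
        catch (NotSupportedException)
        {
            return false; // the stream is not expandable
        }
    }

    // Copying into a stream from the parameterless constructor: it grows as needed.
    public static long LengthAfterGrowing(byte[] fileBytes)
    {
        using var resizable = new MemoryStream();
        resizable.Write(fileBytes, 0, fileBytes.Length);
        resizable.WriteByte(0x42);
        return resizable.Length;
    }
}
```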
I see that WordprocessingDocument has a SaveAs() method, and substituting that for the FileStream in the original code...
using (var mem = new MemoryStream())
using (var doc = WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
doc.AddMainDocumentPart().Document = new Document();
var body = doc.MainDocumentPart.Document.AppendChild(new Body());
var paragraph = body.AppendChild(new Paragraph());
var run = paragraph.AppendChild(new Run());
run.AppendChild(new Text("Hello docx"));
// Explicitly close the OpenXmlPackage returned by SaveAs() so destination doesn't stay locked
doc.SaveAs(destination).Close();
}
...produces the expected file for me. Interestingly, after the call to doc.SaveAs(), and even if I insert a call to doc.Save(), mem.Length and mem.Position are both still 0, which does suggest that mem is only used for initialization.
One other thing I would note is that the sample code is calling Open(), whereas you are calling Create(). The documentation is pretty sparse as far as how those two methods differ, but I would have suggested you try creating your document with Open() instead...
using (MemoryStream mem = new MemoryStream())
using (WordprocessingDocument doc = WordprocessingDocument.Open(mem, true))
{
// ...
}
...however when I do that Open() throws an exception, presumably because mem has no data. So, it seems the names are somewhat self-explanatory in that Create() initializes new document data whereas Open() expects existing data. I did find that if I feed Create() a MemoryStream filled with random garbage...
using (var mem = new MemoryStream())
{
// Fill mem with garbage
byte[] buffer = new byte[1024];
new Random().NextBytes(buffer);
mem.Write(buffer, 0, buffer.Length);
mem.Position = 0;
using (var doc = WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
// ...
}
}
...it still produces the exact same document XML as the first code snippet above, which makes me wonder why Create() even needs an input Stream at all.
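I was facing the same problem today. In the end, the solution is to close the document so that it is written to the MemoryStream. Lance U. Matthews's example helped me a lot, and after checking the exports for other document types I finally realized that each one calls Close() after filling the document, but Microsoft's example doesn't show that. Here is my example:
// Required at the top of the file:
using System.Data;
using System.IO;
using OpenXML = DocumentFormat.OpenXml;
using OpenXMLPackaging = DocumentFormat.OpenXml.Packaging;
using OpenXMLWordprocessing = DocumentFormat.OpenXml.Wordprocessing;
private MemoryStream GenerateWord(DataTable dt)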
{
MemoryStream mStream = new MemoryStream();
// Create Document
OpenXMLPackaging.WordprocessingDocument wordDocument =
OpenXMLPackaging.WordprocessingDocument.Create(mStream, OpenXML.WordprocessingDocumentType.Document, true);
// Add a main document part.
OpenXMLPackaging.MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
mainPart.Document = new OpenXMLWordprocessing.Document();
OpenXMLWordprocessing.Body body = mainPart.Document.AppendChild(new OpenXMLWordprocessing.Body());
OpenXMLWordprocessing.Table table = new OpenXMLWordprocessing.Table();
body.AppendChild(table);
OpenXMLWordprocessing.TableRow tr = new OpenXMLWordprocessing.TableRow();
foreach (DataColumn c in dt.Columns)
{
tr.Append(new OpenXMLWordprocessing.TableCell(new OpenXMLWordprocessing.Paragraph(new OpenXMLWordprocessing.Run(new OpenXMLWordprocessing.Text(c.ColumnName.ToString())))));
}
table.Append(tr);
foreach (DataRow r in dt.Rows)
{
if (dt.Rows.Count > 0)
{
OpenXMLWordprocessing.TableRow dataRow = new OpenXMLWordprocessing.TableRow();
for (int h = 0; h < dt.Columns.Count; h++)
{
dataRow.Append(new OpenXMLWordprocessing.TableCell(new OpenXMLWordprocessing.Paragraph(new OpenXMLWordprocessing.Run(new OpenXMLWordprocessing.Text(r[h].ToString())))));
}
table.Append(dataRow);
}
}
mainPart.Document.Save();
wordDocument.Close();
mStream.Position = 0;
return mStream;
}
Related
I'm trying to merge two (or more) Crystal Reports in an ASP.net MVC project and I downloaded the itext7 NuGet package to do so. I'm trying to put together a simple proof-of-concept in which I concatenate a pdf with itself in a single method:
var rpt1 = new CrystalDecisions.CrystalReports.Engine.ReportDocument();
var rpt2 = new CrystalDecisions.CrystalReports.Engine.ReportDocument();
rpt1.Load(Server.MapPath("~/Reports/MyReport.rpt"));
rpt2.Load(Server.MapPath("~/Reports/MyReport.rpt"));
DataTable table = GetDataMethod();
rpt1.SetDataSource(table);
rpt2.SetDataSource(table);
Stream stream = rpt.ExportToStream(ExportFormatType.PortableDocFormat);
var write = new PdfWriter(stream);
var doc = new PdfDocument(write);
var merger = new PdfMerger(doc);
var doc1 = new PdfDocument(new PdfReader(rpt1.ExportToStream(ExportFormatType.PortableDocFormat)));
var doc2 = new PdfDocument(new PdfReader(rpt2.ExportToStream(ExportFormatType.PortableDocFormat)));
merger.Merge(doc1, 1, doc1.GetNumberOfPages());
merger.Merge(doc2, 1, doc2.GetNumberOfPages());
doc.CopyPagesTo(1, doc2.GetNumberOfPages(), doc2);
stream.Flush();
stream.Position = 0;
return this.File(stream, "application/pdf", "DownloadName.pdf");
You can see I'm sort of throwing everything at the wall and seeing what sticks, insofar as I'm using both PdfMerger.Merge() and PdfDocument.CopyPagesTo(), and I think either of those should be sufficient to do the job by itself? (And, of course, I ran the code trying each of those by themselves as well as together.) But when I run the above code, the PDF that downloads is unmerged, i.e. the report only appears once. (If I run it with two different reports, only the first report appears.)
Now, I'm returning the stream while I'm doing all the interesting stuff with the PdfMerger and PdfDocument objects, so it makes sense to me that the stream would be unchanged. But all the examples of using iText 7 I've found return either the stream or a byte array (e.g., this StackOverflow question), so that seems to be the way it is supposed to work.
Any changes I've made to the code either have no effect, throw an error, or result in the downloaded file being unreadable by the browser (i.e. not recognized as a PDF). For example, I tried converting the stream to a byte array and returning that:
using (var ms = new MemoryStream()) {
stream.CopyTo(ms);
byte[] bytes = ms.ToArray();
return new FileContentResult(bytes, "application/pdf");
}
but then the browser couldn't open the download. The same thing happened when I tried closing the PdfDocument before returning the stream (trying to force it to write the merge to the stream).
There is a lot of confusion with streams in your code. Normally a stream is used either for input or for output. A MemoryStream can be used for both, but you need to make sure not to close it if you want to reuse it. It's often simpler and cleaner to create a new instance from the underlying bytes than to reuse an existing one, especially since this does not affect performance much: new MemoryStream(byte[]) wraps the given array instead of copying it (although ToArray() itself makes one copy).
Here is an example of how to keep the streams apart. ExportToStream returns a stream from which you can obtain a byte array with the bytes of your PDF file. You then load those documents into iText and also create a third document into which you merge the two source documents. Then make sure to call PdfDocument#Close() to tell iText to finalize your documents; after that you can fetch the resulting bytes of the merged document and pass them along, wrapping them in a stream if necessary:
var rpt1 = new CrystalDecisions.CrystalReports.Engine.ReportDocument();
var rpt2 = new CrystalDecisions.CrystalReports.Engine.ReportDocument();
rpt1.Load(Server.MapPath("~/Reports/MyReport.rpt"));
rpt2.Load(Server.MapPath("~/Reports/MyReport.rpt"));
DataTable table = GetDataMethod();
rpt1.SetDataSource(table);
rpt2.SetDataSource(table);
var report1Stream = (MemoryStream)rpt1.ExportToStream(ExportFormatType.PortableDocFormat);
var report2Stream = (MemoryStream)rpt2.ExportToStream(ExportFormatType.PortableDocFormat);
var doc1 = new PdfDocument(new PdfReader(new MemoryStream(report1Stream.ToArray())));
var doc2 = new PdfDocument(new PdfReader(new MemoryStream(report2Stream.ToArray())));
var outStream = new MemoryStream();
var write = new PdfWriter(outStream);
var doc = new PdfDocument(write);
var merger = new PdfMerger(doc);
merger.Merge(doc1, 1, doc1.GetNumberOfPages());
merger.Merge(doc2, 1, doc2.GetNumberOfPages());
doc.Close();
doc1.Close();
doc2.Close();
return this.File(new MemoryStream(outStream.ToArray()), "application/pdf", "DownloadName.pdf");
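One concrete reason to prefer a fresh outStream over the stream returned by ExportToStream: writing into a MemoryStream that already holds data neither clears nor truncates it. Writing starts at the current position, so depending on where that position is, new output either lands after the old bytes or overwrites only their beginning and leaves a stale tail. A BCL-only sketch with made-up ASCII content standing in for PDF bytes:

```csharp
using System.IO;
using System.Text;

static class ReusedStreamDemo
{
    public static string OverwriteFromStart()
    {
        using var stream = new MemoryStream();

        // Previous output already in the stream (e.g. an exported report).
        byte[] previous = Encoding.ASCII.GetBytes("OLD-CONTENT-1234");
        stream.Write(previous, 0, previous.Length);

        // New, shorter output written from position 0 into the same stream...
        stream.Position = 0;
        byte[] next = Encoding.ASCII.GetBytes("NEW");
        stream.Write(next, 0, next.Length);

        // ...replaces only the first three bytes; Length is unchanged.
        return Encoding.ASCII.GetString(stream.ToArray());
    }
}
```

If a stream really has to be reused, stream.SetLength(0) empties it first; allocating a new MemoryStream, as the answer does, avoids the issue entirely.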
I'm using iText 7, specifically the HtmlConverter.ConvertToDocument method, to convert HTML to PDF. The problem is, I would really rather not create a PDF file on my server; I'd rather do everything in memory and just send it to the user's browser so they can download it.
Could anyone show me an example of how to use this library but instead of writing to file write to a MemoryStream so I can send it directly to the browser?
I've been looking for examples and all I can seem to find are those which refer to file output.
I've tried the following, but I keep getting an error saying that a closed memory stream cannot be accessed.
public FileStreamResult pdf() {
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream)) {
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter)) {
//Returns the written-to MemoryStream containing the PDF.
byte[] byteInfo = workStream.ToArray();
workStream.Write(byteInfo, 0, byteInfo.Length);
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
//return new FileStreamResult(workStream, "application/pdf");
}
}
You meddle with the workStream before the document and pdfWriter have finished creating the result in it. Furthermore, the intent of your meddling is unclear, first you retrieve the bytes from the memory stream, then you write them back into it...?
public FileStreamResult pdf()
{
var workStream = new MemoryStream();
using (var pdfWriter = new PdfWriter(workStream))
{
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
}
}
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
By the way, as you are essentially doing nothing special with the document returned by HtmlConverter.ConvertToDocument, you probably could use a different HtmlConverter method with less overhead in your code.
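Both the "meddling" problem and the role of SetCloseStream(false) can be reproduced with a buffered StreamWriter, which here stands in for PdfWriter (it is not iText). While the writer is open, its output is still buffered and the MemoryStream is empty; passing leaveOpen: true (the analogue of SetCloseStream(false)) keeps the stream usable after the writer is disposed, so it can be rewound and returned. The method name and content are hypothetical.

```csharp
using System.IO;
using System.Text;

static class LeaveOpenDemo
{
    // Returns a rewound, still-open stream; the caller is responsible for disposing it.
    public static MemoryStream Build(out long lengthWhileWriterOpen)
    {
        var workStream = new MemoryStream();

        // leaveOpen: true plays the role of pdfWriter.SetCloseStream(false).
        using (var writer = new StreamWriter(workStream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write("%PDF-sketch");

            // Still buffered inside the writer: reading workStream here sees nothing.
            lengthWhileWriterOpen = workStream.Length;
        }

        // The writer flushed on Dispose but did not close workStream, so it can be rewound.
        workStream.Position = 0;
        return workStream;
    }
}
```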
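Generally this approach works: rewind the source stream if it is not at its start, copy it into a new MemoryStream, and use the copy before the using block disposes it:
using (var ms = new MemoryStream())
{
// Uncomment if yourStream's position is not already at the beginning:
//yourStream.Seek(0, SeekOrigin.Begin);
yourStream.CopyTo(ms);
byte[] bytes = ms.ToArray(); // use the copy here, before ms is disposed
}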
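The commented-out Seek matters because Stream.CopyTo copies from the source's current position to its end; if the source has just been written to, its position is at the end and nothing is copied. A BCL-only sketch comparing the two cases (method names are made up):

```csharp
using System.IO;

static class CopyToDemo
{
    // Copies without rewinding: starts wherever the source's position currently is.
    public static long CopyWithoutRewind(Stream yourStream)
    {
        using (var ms = new MemoryStream())
        {
            yourStream.CopyTo(ms);
            return ms.Length; // read the result before ms is disposed
        }
    }

    // Rewinds first (when the stream supports seeking), then copies everything.
    public static byte[] CopyFromStart(Stream yourStream)
    {
        using (var ms = new MemoryStream())
        {
            if (yourStream.CanSeek)
                yourStream.Seek(0, SeekOrigin.Begin);
            yourStream.CopyTo(ms);
            return ms.ToArray();
        }
    }
}
```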
I am trying to use itext7 and itext7.pdfhtml to generate a PDF from some HTML on a server and I then return the written-to MemoryStream as a FileContentResult to the client. However, when the client receives the PDF all they get is an unopenable PDF file which, if the file extension is changed to a .txt, can be seen to contain nothing more than "%PDF-1.7%âãÏÓ".
Having experimented with HtmlConverter.ConvertToPdf I was able to get the simple content in the example below to work (at least the body of it anyway); however, I believe I need HtmlConverter.ConvertToDocument instead now since I need the ability to add a footer and set the page size and margins on the resultant PDF with settings not held within the HTML passed in (in other words I need the iText Document object to manipulate).
Here is the code I am using...
public static byte[] GeneratePdfFromHtml(Action<Document> pdfModifier)
{
//Gives the converter some very simple HTML for it to create something with!
var html = "<html><head><title>Extremely Basic Title</title></head><body>Extremely Basic Content</body></html>";
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
}
This was the version I had working but it lacks the object I need to pass to my delegate.
public static byte[] GeneratePdfFromHtml(Action<Document> pdfModifier)
{
//Gives the converter some very simple HTML for it to create something with!
var html = "<html><head><title>Extremely Basic Title</title></head><body>Extremely Basic Content</body></html>";
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
HtmlConverter.ConvertToPdf(html, pdfWriter);
//No longer able to call this delegate as there is no Document object to use.
//pdfModifier(document);
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
}
In the version you had working you used HtmlConverter.ConvertToPdf. This call internally also creates a Document object but closes it before returning.
Closing the Document object causes all data of the generated PDF still in memory to be flushed to the result stream which then gets finalized with a PDF trailer.
Thus, your working version returns a finished, complete PDF file.
In your new code, though, you use HtmlConverter.ConvertToDocument. This call returns the used Document object but does not close it: You after all still want to use it for some manipulations.
As you don't close the Document object before calling return workStream.ToArray(), you return an incomplete PDF, in your case only a PDF header section.
Thus, you have to close that Document object before retrieving the bytes from your MemoryStream, e.g. explicitly, like this:
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
document.Close();
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
or implicitly like this:
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
}
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
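The difference between the question's code and the implicit fix comes down to when C# runs Dispose(). With stacked using statements, the shared body (including a return expression such as workStream.ToArray()) runs before any of the resources are disposed; with a nested using, the inner resource is disposed as soon as its block ends, before the code that follows. The sketch records the order with a hypothetical Tracked class instead of real iText objects:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an object whose Dispose() finishes the output
// (such as the iText Document); it only records when it is disposed.
sealed class Tracked : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public Tracked(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public void Dispose() => _log.Add("dispose " + _name);
}

static class UsingOrderDemo
{
    // Stacked usings (the question's version): the body runs before any Dispose().
    public static List<string> Stacked()
    {
        var log = new List<string>();
        using (new Tracked("writer", log))
        using (new Tracked("document", log))
        {
            log.Add("read bytes");
        }
        return log;
    }

    // Nested using (the implicit fix): the document is disposed before the bytes are read.
    public static List<string> Nested()
    {
        var log = new List<string>();
        using (new Tracked("writer", log))
        {
            using (new Tracked("document", log))
            {
                log.Add("modify document");
            }
            log.Add("read bytes");
        }
        return log;
    }
}
```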
I need to merge N PDF files into one. I create a blank document first:
byte[] pdfBytes = null;
var ms = new MemoryStream();
var doc = new iTextSharp.text.Document();
var cWriter = new PdfCopy(doc, ms);
Later I cycle through an array of HTML strings:
foreach (NBElement htmlString in someElement.Children())
{
byte[] msTempDoc = getPdfDocFrom(htmlString.GetString(), cssString.GetString());
addPagesToPdf(cWriter, msTempDoc);
}
In getPdfDocFrom I create a PDF file using XMLWorkerHelper and return it as a byte array:
private byte[] getPdfDocFrom(string htmlString, string cssString)
{
var tempMs = new MemoryStream();
byte[] tempMsBytes;
var tempDoc = new iTextSharp.text.Document();
var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
tempDoc.Open();
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssString)))
{
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(htmlString)))
{
//Parse the HTML
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
tempMsBytes = tempMs.ToArray();
}
}
tempDoc.Close();
return tempMsBytes;
}
Later on I try to add pages from this PDF file to the blank one.
private static void addPagesToPdf(PdfCopy mainDocWriter, byte[] sourceDocBytes)
{
using (var msOut = new MemoryStream())
{
PdfReader reader = new PdfReader(new MemoryStream(sourceDocBytes));
int n = reader.NumberOfPages;
PdfImportedPage page;
for (int i = 1; i <= n; i++)
{
page = mainDocWriter.GetImportedPage(reader, i);
mainDocWriter.AddPage(page);
}
}
}
It breaks when it tries to create a PdfReader from the byte array I pass to the function. "Rebuild failed: trailer not found.; Original message: PDF startxref not found."
I used another library to work with PDFs before. I passed two PdfDocuments as objects and just added pages from one to the other in a loop. It didn't support CSS though, so I had to switch to iTextSharp.
I don't quite get the difference between PdfWriter and PdfCopy.
There is a logical error in your code. When you create a document from scratch, as is done in the getPdfDocFrom() method, the document isn't complete until you've triggered the Close() method. In this Close() method, a trailer is created as well as a cross-reference (xref) table. The error tells you that those are missing.
Indeed, you do call the Close() method:
tempDoc.Close();
But by the time you Close() the document, it's too late: you have already created the tempMsBytes array. You need to create that array after you close the document.
Edit: I don't know anything about C#, but if MemoryStream clears its buffer after closing it, you could use mainDocWriter.CloseStream = false; so that the MemoryStream isn't closed when you close the document.
In Java, it would be a bad idea to set the "close stream" parameter to false. When I read the answers to the question Create PDF in memory instead of physical file I see that C# probably doesn't always require this extra line.
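On whether MemoryStream clears its buffer when it is closed: it does not. MemoryStream.ToArray() is documented to work on a closed stream, while ordinary reads on a closed MemoryStream throw ObjectDisposedException. So for getPdfDocFrom() the essential fix is the order (Close() first, then ToArray()); keeping the stream open is only needed if the stream object itself has to be read or rewound afterwards. A BCL-only sketch (method names are made up):

```csharp
using System;
using System.IO;

static class ClosedMemoryStreamDemo
{
    // ToArray() still returns the data after the stream has been closed/disposed.
    public static byte[] BytesAfterClose()
    {
        var ms = new MemoryStream();
        byte[] header = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF"
        ms.Write(header, 0, header.Length);
        ms.Dispose(); // e.g. a writer closing the stream it was given
        return ms.ToArray();
    }

    // Reading from a closed MemoryStream, by contrast, throws.
    public static bool ReadAfterCloseThrows()
    {
        var ms = new MemoryStream(new byte[] { 1, 2, 3 });
        ms.Dispose();
        try
        {
            ms.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```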
Remark: merging files by adding PdfImportedPage instances to a PdfWriter is an example of bad taste. If you are using iTextSharp 5 or earlier, you should use PdfCopy or PdfSmartCopy to do that. If you use PdfWriter, you throw away a lot of information (e.g. link annotations).
My goal is to open an existing PDF, add or remove some pages, and preserve the metadata (Author, Subject, ...) in a Windows Forms C# application.
I use iTextSharp and found examples how to add or remove pages by using the PdfConcatenate class. To keep the metadata I use a PdfStamper afterwards. To speed things up I want to do the modifications in memory before storing the result to disk.
The problem is NOT adding or removing the pages but to keep the metadata in the same step.
So can anybody tell me, or give an example of, how to achieve this (better), or am I on completely the wrong track?
Here my current code (see comments for problem related lines):
public void RemovePagesInFile(string documentLocation, int pageIndexFrom, int pageCount)
{
// TB: open the pdf
using (PdfReader sourcePdfReader = new PdfReader(documentLocation))
using (MemoryStream concatenatedTargetStream = new MemoryStream((int)sourcePdfReader.FileLength))
{
// TB: use a concatenator to create a new pdf containing only the desired pages
PdfConcatenate concatenator = new PdfConcatenate(concatenatedTargetStream);
// TB: create a list with the page numbers to keep
List<int> pagesToKeep = new List<int>();
for (int i = 1; i <= pageIndexFrom; i++)
{
pagesToKeep.Add(i);
}
for (int i = pageIndexFrom + pageCount + 1; i <= sourcePdfReader.NumberOfPages; i++)
{
pagesToKeep.Add(i);
}
// TB: execute the page copy
sourcePdfReader.SelectPages(pagesToKeep);
concatenator.AddPages(sourcePdfReader);
// TB: problem(s) here:
// 1. when calling concatenator.Close() the memory stream gets disposed as expected.
// concatenator.Close();
// 2. even when calling concatenator.Writer.Flush() the memory stream seems to be missing content (error when creating targetReader (see below)).
// concatenator.Writer.Flush();
// 3. when keeping concatenator open, the same error as above occurs (I assume not all bytes have been written to the memory stream)
// TB: preserve the meta data from the source document
// => ERROR here: "Rebuild trailer not found. Original Error: PDF startxref not found"
using (PdfReader targetReader = new PdfReader(concatenatedTargetStream))
using (MemoryStream targetStream = new MemoryStream((int)concatenatedTargetStream.Length))
{
using (PdfStamper stamper = new PdfStamper(targetReader, targetStream))
{
stamper.MoreInfo = sourcePdfReader.Info;
// TB: same problem as above with stamper ?
stamper.Close();
}
// TB: close the reader to be able to access the source pdf
sourcePdfReader.Close();
// TB: write the modified pdf to the disk
File.WriteAllBytes(documentLocation, targetStream.ToArray());
}
}
}
Two changes need to be made. Call
concatenator.Writer.CloseStream = false;
before calling
concatenator.Close();
Do the same thing for the PdfStamper (stamper.Writer.CloseStream = false; before stamper.Close()) and you're set.
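The "finish the first stage without closing its output, then hand that output to the next stage" pattern can be sketched with the BCL alone. Here GZipStream stands in for PdfConcatenate and its leaveOpen argument for Writer.CloseStream = false; this is an analogy, not iTextSharp code, and the method name is made up. The sketch also rewinds the intermediate stream before the second stage reads it, which is an assumption of this analogy rather than part of the answer.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class TwoStageDemo
{
    public static string CompressThenRead(string text)
    {
        using var intermediate = new MemoryStream();

        // Stage 1 (stands in for PdfConcatenate): leaveOpen: true ~ Writer.CloseStream = false.
        using (var gzip = new GZipStream(intermediate, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] data = Encoding.UTF8.GetBytes(text);
            gzip.Write(data, 0, data.Length);
        } // Dispose finalizes the compressed data but leaves 'intermediate' open

        // Stage 2 (stands in for PdfReader/PdfStamper): read the finished stage-1 output.
        intermediate.Position = 0;
        using (var gunzip = new GZipStream(intermediate, CompressionMode.Decompress, leaveOpen: true))
        using (var reader = new StreamReader(gunzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```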