iText 7 - HTML to PDF write to MemoryStream instead of file - c#

I'm using iText 7, specifically the HtmlConverter.ConvertToDocument method, to convert HTML to PDF. The problem is that I would rather not create a PDF file on my server; I'd rather do everything in memory and just send it to the user's browser so they can download it.
Could anyone show me an example of how to use this library but instead of writing to file write to a MemoryStream so I can send it directly to the browser?
I've been looking for examples and all I can seem to find are those which refer to file output.
I've tried the following, but I keep getting an error saying it cannot access a closed memory stream.
public FileStreamResult pdf() {
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream)) {
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter)) {
//Returns the written-to MemoryStream containing the PDF.
byte[] byteInfo = workStream.ToArray();
workStream.Write(byteInfo, 0, byteInfo.Length);
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
//return new FileStreamResult(workStream, "application/pdf");
}
}

You meddle with the workStream before the document and pdfWriter have finished creating the result in it. Furthermore, the intent of your meddling is unclear: first you retrieve the bytes from the memory stream, then you write them back into it...?
public FileStreamResult pdf()
{
var workStream = new MemoryStream();
using (var pdfWriter = new PdfWriter(workStream))
{
pdfWriter.SetCloseStream(false);
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
}
}
workStream.Position = 0;
return new FileStreamResult(workStream, "application/pdf");
}
By the way, as you are essentially doing nothing special with the document returned by HtmlConverter.ConvertToDocument, you probably could use a different HtmlConverter method with less overhead in your code.
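The "closed memory stream" error from the question can be reproduced without iText. The following is a minimal sketch using only the .NET base class library; the class and method names (ClosedStreamDemo, BuildDisposed, BuildOpen, ReadThrows) are hypothetical and exist only for illustration. It assumes the consumer of the stream, such as a FileStreamResult, reads from it after the method has returned.

```csharp
using System;
using System.IO;

public static class ClosedStreamDemo
{
    // Broken pattern: the using block disposes the stream on return,
    // so the caller receives a stream that is already closed.
    public static MemoryStream BuildDisposed()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
            ms.Position = 0;
            return ms;
        }
    }

    // Working pattern: hand over an open, rewound stream and let
    // the consumer dispose it once it has finished reading.
    public static MemoryStream BuildOpen()
    {
        var ms = new MemoryStream();
        ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
        ms.Position = 0;
        return ms;
    }

    // Returns true if reading from the stream fails because it is closed.
    public static bool ReadThrows(Stream s)
    {
        try { s.ReadByte(); return false; }
        catch (ObjectDisposedException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(ReadThrows(BuildDisposed())); // True
        Console.WriteLine(ReadThrows(BuildOpen()));     // False
    }
}
```

This is why the corrected method above creates the MemoryStream outside of any using block and only resets its Position before returning it.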

Generally this approach works
using (var ms = new MemoryStream())
{
//yourStream.Seek(0, SeekOrigin.Begin)
yourStream.CopyTo(ms);
}

Related

Unable to generate readable PDF using iText 7's HtmlConverter.ConvertToDocument method

I am trying to use itext7 and itext7.pdfhtml to generate a PDF from some HTML on a server and I then return the written-to MemoryStream as a FileContentResult to the client. However, when the client receives the PDF all they get is an unopenable PDF file which, if the file extension is changed to a .txt, can be seen to contain nothing more than "%PDF-1.7%âãÏÓ".
Having experimented with HtmlConverter.ConvertToPdf I was able to get the simple content in the example below to work (at least the body of it anyway); however, I believe I need HtmlConverter.ConvertToDocument instead now since I need the ability to add a footer and set the page size and margins on the resultant PDF with settings not held within the HTML passed in (in other words I need the iText Document object to manipulate).
Here is the code I am using...
public static byte[] GeneratePdfFromHtml(Action<Document> pdfModifier)
{
//Gives the converter some very simple HTML for it to create something with!
var html = "<html><head><title>Extremely Basic Title</title></head><body>Extremely Basic Content</body></html>";
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
}
This was the version I had working but it lacks the object I need to pass to my delegate.
public static byte[] GeneratePdfFromHtml(Action<Document> pdfModifier)
{
//Gives the converter some very simple HTML for it to create something with!
var html = "<html><head><title>Extremely Basic Title</title></head><body>Extremely Basic Content</body></html>";
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
HtmlConverter.ConvertToPdf(html, pdfWriter);
//No longer able to call this delegate as there is no Document object to use.
//pdfModifier(document);
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
}
In the version you had working you used HtmlConverter.ConvertToPdf. This call internally also creates a Document object but closes it before returning.
Closing the Document object causes all data of the generated PDF still in memory to be flushed to the result stream which then gets finalized with a PDF trailer.
Thus, your working version returns a finished, complete PDF file.
In your new code, though, you use HtmlConverter.ConvertToDocument. This call returns the Document object it used but does not close it; after all, you still want to use it for some manipulations.
As you don't close the Document object before calling return workStream.ToArray(), you return an incomplete PDF, in your case only a PDF header section.
Thus, you have to close that Document object before retrieving the bytes from your MemoryStream, e.g. explicitly like this
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
document.Close();
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}
or implicitly like this:
using (var workStream = new MemoryStream())
using (var pdfWriter = new PdfWriter(workStream))
{
using (var document = HtmlConverter.ConvertToDocument(html, pdfWriter))
{
//Passes the document to a delegated function to perform some content, margin or page size manipulation
pdfModifier(document);
}
//Returns the written-to MemoryStream containing the PDF.
return workStream.ToArray();
}

Merging N pdf files, created from html using ITextSharp, to another blank pdf file

I need to merge N PDF files into one. I create a blank file first:
byte[] pdfBytes = null;
var ms = new MemoryStream();
var doc = new iTextSharp.text.Document();
var cWriter = new PdfCopy(doc, ms);
Later I loop through an array of HTML strings:
foreach (NBElement htmlString in someElement.Children())
{
byte[] msTempDoc = getPdfDocFrom(htmlString.GetString(), cssString.GetString());
addPagesToPdf(cWriter, msTempDoc);
}
In getPdfDocFrom I create a PDF file using XMLWorkerHelper and return it as a byte array:
private byte[] getPdfDocFrom(string htmlString, string cssString)
{
var tempMs = new MemoryStream();
byte[] tempMsBytes;
var tempDoc = new iTextSharp.text.Document();
var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
tempDoc.Open();
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssString)))
{
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(htmlString)))
{
//Parse the HTML
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
tempMsBytes = tempMs.ToArray();
}
}
tempDoc.Close();
return tempMsBytes;
}
Later on I try to add pages from this PDF file to the blank one.
private static void addPagesToPdf(PdfCopy mainDocWriter, byte[] sourceDocBytes)
{
using (var msOut = new MemoryStream())
{
PdfReader reader = new PdfReader(new MemoryStream(sourceDocBytes));
int n = reader.NumberOfPages;
PdfImportedPage page;
for (int i = 1; i <= n; i++)
{
page = mainDocWriter.GetImportedPage(reader, i);
mainDocWriter.AddPage(page);
}
}}
It breaks when it tries to create a PdfReader from the byte array I pass to the function. "Rebuild failed: trailer not found.; Original message: PDF startxref not found."
I used another library to work with PDF before. I passed two PdfDocuments as objects and just added pages from one to the other in a loop. It didn't support CSS though, so I had to switch to iTextSharp.
I don't quite get the difference between PdfWriter and PdfCopy.
There is a logical error in your code. When you create a document from scratch, as is done in the getPdfDocFrom() method, the document isn't complete until you've triggered the Close() method. In this Close() method, a trailer is created as well as a cross-reference (xref) table. The error tells you that those are missing.
Indeed, you do call the Close() method:
tempDoc.Close();
But by the time you Close() the document, it's too late: you have already created the tempMsBytes array. You need to create that array after you close the document.
Edit: I don't know anything about C#, but if MemoryStream clears its buffer after closing it, you could use mainDocWriter.CloseStream = false; so that the MemoryStream isn't closed when you close the document.
In Java, it would be a bad idea to set the "close stream" parameter to false. When I read the answers to the question Create PDF in memory instead of physical file I see that C# probably doesn't always require this extra line.
Remark: merging files by adding PdfImportedPage instances to a PdfWriter is an example of bad taste. If you are using iTextSharp 5 or earlier, you should use PdfCopy or PdfSmartCopy to do that. If you use PdfWriter, you throw away a lot of information (e.g. link annotations).
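The ordering problem described above (taking the bytes before Close() has written the trailer) can be illustrated without iTextSharp. The sketch below is only an analogy under stated assumptions: a StreamWriter stands in for the PDF writer, because it also buffers data until it is closed, and the class name FlushOrderDemo and method Measure are hypothetical. It also shows that, in .NET, MemoryStream.ToArray() still works after the stream has been closed, which addresses the doubt raised in the edit above.

```csharp
using System;
using System.IO;
using System.Text;

public static class FlushOrderDemo
{
    // Returns { byte count seen before the writer is closed,
    //           byte count seen after the writer is closed }.
    public static int[] Measure()
    {
        var ms = new MemoryStream();
        byte[] early;
        using (var writer = new StreamWriter(ms, new UTF8Encoding(false)))
        {
            writer.Write("hello");
            // Too early: the writer still holds the data in its own buffer.
            early = ms.ToArray();
        }
        // Disposing the writer flushed it and closed ms;
        // ToArray() remains usable on a closed MemoryStream.
        byte[] late = ms.ToArray();
        return new[] { early.Length, late.Length };
    }

    public static void Main()
    {
        int[] counts = Measure();
        Console.WriteLine(counts[0]); // 0
        Console.WriteLine(counts[1]); // 5
    }
}
```

Applied to getPdfDocFrom(), the same reasoning suggests moving tempMs.ToArray() to after tempDoc.Close().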

Loading an iTextSharp Document into MemoryStream

I'm developing an ASP.NET application where I have to send a PDF, based on a table created dynamically on the page, as an attachment to an email. So I have a function that creates the PDF as an iTextSharp Document and returns it. If I just try to save this document, it works fine, but I'm having a hard time turning it into a Stream. I have tried several things already, but I always get stuck at some point.
I tried to serialize it, but it appears that Document is not serializable. Then I tried to work with PdfCopy, but I couldn't figure out how to apply it to my specific problem.
The code right now is like this:
//Table,string,string,Stream
//This document returns fine
Document document = Utils.GeneratePDF(table, lastBook, lastDate, Response.OutputStream);
using (MemoryStream ms = new MemoryStream())
{
PdfCopy copy = new PdfCopy(document, ms);
//Need something here to copy from one to another! OR to make document as Stream
ms.Position = 0;
//Email, Subject, Stream
Utils.SendMail(email, lastBook + " - " + lastDate, ms);
}
Try to avoid passing the native iTextSharp objects around. Either pass streams, files or bytes. I don't have an IDE in front of me right now but you should be able to do something like this:
byte[] Bytes;
using(MemoryStream ms = new MemoryStream()){
Utils.GeneratePDF(table, lastBook, lastDate, ms);
Bytes = ms.ToArray();
}
Then you can either change your Utils.SendMail() to accept a byte array or just wrap it in another stream.
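Wrapping the byte array in another stream, as mentioned above, could look like the following minimal sketch. The class name ByteWrapDemo and method Wrap are hypothetical; only the MemoryStream constructor taking a byte array and a writable flag comes from the .NET base class library.

```csharp
using System;
using System.IO;

public static class ByteWrapDemo
{
    // Wraps existing bytes in a read-only stream that starts at position 0.
    public static Stream Wrap(byte[] bytes)
    {
        return new MemoryStream(bytes, writable: false);
    }

    public static void Main()
    {
        using (Stream s = Wrap(new byte[] { 10, 20, 30 }))
        {
            Console.WriteLine(s.Position); // 0
            Console.WriteLine(s.Length);   // 3
            Console.WriteLine(s.CanWrite); // False
        }
    }
}
```

A stream created this way needs no rewinding before it is handed to a consumer that expects a Stream.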
EDIT
You might also be able to just do something like this in your code:
using(MemoryStream ms = new MemoryStream()){
Utils.GeneratePDF(table, lastBook, lastDate, ms);
ms.Position = 0;
Utils.SendMail(email, lastBook + " - " + lastDate, ms);
}
I did this in the past by doing something like the following:
// Declared outside the using block so the data is still reachable afterwards.
MemoryStream msPDFData = new MemoryStream();
using (Document doc = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(doc, msPDFData);
doc.Open();
doc.Add(new Paragraph("I'm a pdf!"));
}
If you need access to the raw data you can also do
byte[] pdfData = msPDFData.ToArray();

How do I return a MemoryStream docx file MVC?

I have a docx file that I would like to return after I make edits. I have the following code...
object useFile = Server.MapPath("~/Documents/File.docx");
object saveFile = Server.MapPath("~/Documents/savedFile.docx");
MemoryStream newDoc = repo.ChangeFile(useFile, saveFile);
return File(newDoc.GetBuffer().ToArray(), "application/docx", Server.UrlEncode("NewFile.docx"));
The file seems fine, but I am getting error messages ("the file being corrupt" and another stating "Word found unreadable content. If you trust the source click Yes"). Any ideas?
Thanks in advance
EDIT
This is the ChangeFile in my Model...
public MemoryStream ChangeFile(object useFile, object saveFile)
{
byte[] byteArray = File.ReadAllBytes(useFile.ToString());
using (MemoryStream ms = new MemoryStream())
{
ms.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(ms, true))
{
string documentText;
using (StreamReader reader = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
documentText = reader.ReadToEnd();
}
documentText = documentText.Replace("##date##", DateTime.Today.ToShortDateString());
using (StreamWriter writer = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
writer.Write(documentText);
}
}
File.WriteAllBytes(saveFile.ToString(), ms.ToArray());
return ms;
}
}
I use a FileStreamResult:
var cd = new System.Net.Mime.ContentDisposition
{
FileName = fileName,
// always prompt the user for downloading, set to true if you want
// the browser to try to show the file inline
Inline = false,
};
Response.AppendHeader("Content-Disposition", cd.ToString());
return new FileStreamResult(documentStream, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
Don't use MemoryStream.GetBuffer().ToArray(); use MemoryStream.ToArray() instead.
The reason is that GetBuffer() returns the underlying array that backs the memory stream, not just the data actually written to it. That underlying array can be larger than the data.
Hidden on MSDN:
Note that the buffer contains allocated bytes which might be unused.
For example, if the string "test" is written into the MemoryStream
object, the length of the buffer returned from GetBuffer is 256, not
4, with 252 bytes unused. To obtain only the data in the buffer, use
the ToArray method; however, ToArray creates a copy of the data in
memory.
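The difference between the two calls can be checked with a few lines. This is a minimal sketch using only the .NET base class library; the class name BufferDemo and its methods are hypothetical. The exact buffer capacity is an implementation detail, so the sketch only checks that the buffer is at least as long as the data.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferDemo
{
    // Writes "test" into a fresh MemoryStream and returns it, still open.
    public static MemoryStream WriteTest()
    {
        var ms = new MemoryStream();
        byte[] data = Encoding.ASCII.GetBytes("test");
        ms.Write(data, 0, data.Length);
        return ms;
    }

    public static void Main()
    {
        using (MemoryStream ms = WriteTest())
        {
            // ToArray() copies exactly the bytes that were written.
            Console.WriteLine(ms.ToArray().Length); // 4
            // GetBuffer() exposes the whole allocated array, unused tail included.
            Console.WriteLine(ms.GetBuffer().Length >= ms.Length); // True
        }
    }
}
```

Sending the GetBuffer() array to a client therefore appends unused bytes to the document, which is a plausible cause of the "unreadable content" warning in Word.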

MemoryStream to virtual file

I have the following:
using(var memoryStream = new MemoryStream())
{
gc.CreatePackage(memoryStream);
}
The MemoryStream contains an Excel file. From the memoryStream, how do I go about actually showing the file that is produced? Note that I do not want to save the file to disk but merely display it.
So far I have the following but doesn't seem to work:
using (var memoryStream = new MemoryStream())
{
gc.CreatePackage(memoryStream);
using (var fileStream = File.OpenWrite("Test.xlsx"))
{
memoryStream.WriteTo(fileStream);
}
}
But I'm not sure if I am going in the right direction. I get an error saying:
System.Web.Mvc.Controller.File(byte[], string)' is a 'method', which
is not valid in the given context
I am not sure if I am going about this in the right direction.
Making the assumption that you are using ASP.NET MVC, you probably want the File response helper:
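// No using block here: the stream must still be open when the result is written.
// The FileStreamResult returned by File() disposes the stream after sending it.
var memoryStream = new MemoryStream();
gc.CreatePackage(memoryStream);
memoryStream.Seek(0, SeekOrigin.Begin);
return File(memoryStream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");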
If you want to have the file source actually appear in the browser, you can also lie about the MIME type and use text/plain instead. The browser will most likely render this as plain text.
You can also pass a third parameter to File to specify the filename that the end user sees for the download.
Since the error you post seems to indicate that you're trying to return a generated file from an MVC controller, I think this may be what you're looking for.
public ActionResult MyAction()
{
//Not wrapped in a using block: the stream has to stay open until the result has been written
var memoryStream = new MemoryStream();
gc.CreatePackage(memoryStream);
//Make sure the position of the stream is at 0
memoryStream.Position = 0;
//Return the contents of the stream with the appropriate MIME type
return File(memoryStream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
}
You could use a MemoryMappedFile to create a virtual file.
example code:
Write:
using (var mmf = MemoryMappedFile.CreateNew("MappedFileName", size, MemoryMappedFileAccess.ReadWriteExecute))
{
using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
{
// Write your data to the mapped file here, using the accessor's Write methods.
}
}
Read:
using (var mmf = MemoryMappedFile.OpenExisting("MappedFileName"))
{
using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
{
accessor.Read......
}
}
