itextsharp PDF concatenation not working

I have a Web API with a route that concatenates PDFs and returns the resulting byte array using a MemoryStream:
public HttpResponseMessage ConcatPDFs(string id, ICollection<int> pdfs) {
using (var db = new ApplicationDbContext())
using (MemoryStream stream = new MemoryStream())
using (Document doc = new Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
db.PDFForms.Where(p => pdfs.Contains(p.Id)).ToList().ForEach(file =>
{
var filePath = Path.Combine(System.Web.Hosting.HostingEnvironment.MapPath("~/" + string.Format("Content/Uploads/PDFForms/")), file.FileName);
reader = new PdfReader(filePath);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
});
HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
result.Content = new ByteArrayContent(stream.ToArray());
result.Content.Headers.ContentType =
new MediaTypeHeaderValue(string.Format("{0}", "application/pdf"));
return result;
}
}
This is my code. However, when I stream the data back to the client, the browser gives me the error "Failed to load PDF document". Any idea what I might be doing wrong? Thank you.
Edit:
This works if I create a physical file instead of using a MemoryStream.

Perhaps you can debug your application using Wireshark?
Get the bytes from the response, paste them into a file, and see whether that file can be read with something like Adobe Reader.
From the looks of it, though, this does not seem to be an iText issue, since you have confirmed that it works when you create a physical file.
So the problem is either in how the MemoryStream is used or in some other step between creating the document and sending the response.
At any rate, I think the first step in solving this problem is to store the bytes you do get back and compare them against the physical file.
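The comparison suggested above could be sketched roughly as follows. This is only an illustration that uses plain .NET types: the class name PdfByteCheck and both method names are made up for this example, and the check for an "%%EOF" marker is a heuristic for spotting a truncated PDF, not a full validity test.

```csharp
using System;
using System.Text;

public static class PdfByteCheck
{
    // Returns the index of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, the length of the shorter one is returned.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : common;
    }

    // A complete PDF normally ends with an "%%EOF" marker. If the marker is
    // missing from the tail, the bytes were probably captured too early.
    public static bool EndsWithEofMarker(byte[] pdf)
    {
        int tailLength = Math.Min(pdf.Length, 32);
        string tail = Encoding.ASCII.GetString(pdf, pdf.Length - tailLength, tailLength);
        return tail.Contains("%%EOF");
    }
}
```

For example, load the saved response and the physical file with System.IO.File.ReadAllBytes and pass both arrays to PdfByteCheck.FirstDifference. If the response is shorter and lacks the end marker, the stream was most likely read before the writer had finished.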

Related

itext7 CopyPagesTo not opening PDF

I am trying to add a cover page PDF file to another PDF file using the CopyPagesTo method. The pages from CoverPageFilePath should go before any pages in pdfDocumentFile. I then need to write the new file back to the same location. When I run the code and open the new PDF file, I get an error saying it is damaged.
public static void iText7MergePDF()
{
byte[] modifiedPdfInBytes = null;
string pdfCoverPageFilePath = @"PathtoCoverPage\Cover Page.pdf";
PdfDocument pdfDocumentCover = new PdfDocument(new iText.Kernel.Pdf.PdfReader(pdfCoverPageFilePath));
string pdfDocumentFile = @"PathtoFullDocument.pdf";
var buffer = File.ReadAllBytes(pdfDocumentFile);
using (var originalPdfStream = new MemoryStream(buffer))
using (var modifiedPdfStream = new MemoryStream())
{
var pdfReader = new iText.Kernel.Pdf.PdfReader(originalPdfStream);
var pdfDocument = new PdfDocument(pdfReader, new PdfWriter(modifiedPdfStream));
int numberOfPages = pdfDocumentCover.GetNumberOfPages();
pdfDocumentCover.CopyPagesTo(1, numberOfPages, pdfDocument);
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
}
System.IO.File.WriteAllBytes(pdfGL, modifiedPdfInBytes);
}

Whenever some other type, like a StreamWriter or here a PdfWriter, writes to a Stream, it may not write all the data to the Stream immediately.
Here you need to Close the pdfDocument first, so that all the data is written to the MemoryStream before you read it.
That is, this:
modifiedPdfInBytes = modifiedPdfStream.ToArray();
pdfDocument.Close();
should be:
pdfDocument.Close();
modifiedPdfInBytes = modifiedPdfStream.ToArray();
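The ordering issue can be reproduced without iText. The sketch below uses a StreamWriter as a stand-in for the PdfWriter (an assumption made only to keep the example self-contained), and the class and method names are hypothetical.

```csharp
using System.IO;

public static class FlushOrderDemo
{
    // Snapshot taken while the writer is still open: its internal buffer
    // has not been flushed yet, so the MemoryStream is incomplete.
    public static int BytesBeforeClose()
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms);
        writer.Write("hello");
        int length = ms.ToArray().Length;
        writer.Dispose();
        return length;
    }

    // Snapshot taken after the writer is closed: everything has been flushed.
    // MemoryStream.ToArray still works after the stream itself has been closed.
    public static int BytesAfterClose()
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms);
        writer.Write("hello");
        writer.Dispose();
        return ms.ToArray().Length;
    }
}
```

BytesBeforeClose returns fewer bytes than BytesAfterClose, which is the same effect as calling ToArray before pdfDocument.Close().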

Generate one pdf document with multiple pages converting from html using IText 7

I'm working with iText 7. I've been able to take one HTML page and generate a PDF for it, but I need to generate one PDF document from multiple HTML pages, with each one on its own page. For example: I have Page1.html, Page2.html and Page3.html. I need a PDF document with 3 pages: the first page with the content of Page1.html, the second page with the content of Page2.html, and so on.
This is the code I have, and it works for one HTML page:
ConverterProperties properties = new ConverterProperties();
PdfWriter writer = new PdfWriter(pdfRoot, new WriterProperties().SetFullCompressionMode(true));
PdfDocument pdfDocument = new PdfDocument(writer);
pdfDocument.AddEventHandler(PdfDocumentEvent.END_PAGE, new HeaderPdfEventHandler());
HtmlConverter.ConvertToPdf(htmlContent, pdfDocument, properties);
Is it possible to loop over the multiple HTML pages, add a new page to the PdfDocument for every HTML page, and then generate only one PDF with one page per HTML page?
UPDATE
I've been following this example and trying to translate it from Java to C#. I'm trying to use PdfMerger and loop over the HTML pages, but I'm receiving the exception "Cannot access a closed stream" on this line:
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
It looks like it is related to the ByteArrayOutputStream baos instance. Any suggestions? This is my current code:
foreach (var html in htmlList)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
temp = new PdfDocument(
new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
merger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfDocument.Close();

You are using RandomAccessSourceFactory and passing it a closed stream that you wrote a PDF document into. RandomAccessSourceFactory expects an input stream that is ready to be read.
First of all, you should use MemoryStream, which is native to the .NET world. ByteArrayOutputStream is a class that was ported from Java for internal purposes (although it extends MemoryStream as well). Secondly, you don't have to use RandomAccessSourceFactory; there is a simpler way.
You can create a new MemoryStream instance from the bytes of the MemoryStream that you used to create the temporary PDF with the following line:
baos = new MemoryStream(baos.ToArray());
As an additional remark, it's better to close the PdfMerger instance directly instead of closing the document; closing the PdfMerger closes the underlying document as well.
All in all, we get the following code, which works:
foreach (var html in htmlList)
{
MemoryStream baos = new MemoryStream();
PdfDocument temp = new PdfDocument(new PdfWriter(baos));
HtmlConverter.ConvertToPdf(html, temp, properties);
ReaderProperties rp = new ReaderProperties();
baos = new MemoryStream(baos.ToArray());
temp = new PdfDocument(new PdfReader(baos, rp));
pdfMerger.Merge(temp, 1, temp.GetNumberOfPages());
temp.Close();
}
pdfMerger.Close();
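Why the re-wrapping step helps can be shown with plain .NET streams. This is a minimal sketch with hypothetical names (ClosedStreamDemo, Reopen). It assumes only the documented MemoryStream behaviour: reading a closed stream throws, while ToArray remains usable.

```csharp
using System;
using System.IO;

public static class ClosedStreamDemo
{
    // Reading from a closed MemoryStream throws ObjectDisposedException
    // ("Cannot access a closed Stream").
    public static bool ReadingClosedStreamThrows()
    {
        var ms = new MemoryStream(new byte[] { 1, 2, 3 });
        ms.Close();
        try
        {
            ms.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    // ToArray is still allowed on a closed MemoryStream, so its bytes can be
    // wrapped in a fresh, readable stream that starts at position 0.
    public static MemoryStream Reopen(MemoryStream closed)
    {
        return new MemoryStream(closed.ToArray());
    }
}
```

Reopen does the same thing as the baos = new MemoryStream(baos.ToArray()) line in the answer: the old stream stays closed, and the reader gets a new stream over a copy of its bytes.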

Maybe not so succinct, but I use "using" blocks. This is similar to the other answer:
private byte[] CreatePDF(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
//Create one pdf document
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
//Create one pdf merger
var pdfMerger = new PdfMerger(pdfDoc);
//Create two identical pdfs
for (int i = 0; i < 2; i++)
{
using (var newStream = new MemoryStream(CreateDocument(html)))
{
ReaderProperties rp = new ReaderProperties();
using (var newPdf = new PdfDocument(new PdfReader(newStream, rp)))
{
pdfMerger.Merge(newPdf, 1, newPdf.GetNumberOfPages());
}
}
}
}
binData = workStream.ToArray();
}
}
return binData;
}
Create the PDF:
private byte[] CreateDocument(string html)
{
byte[] binData;
using (var workStream = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(workStream))
{
using (var pdfDoc = new PdfDocument(pdfWriter))
{
pdfDoc.SetDefaultPageSize(iText.Kernel.Geom.PageSize.A4.Rotate());
ConverterProperties props = new ConverterProperties();
using (var document = HtmlConverter.ConvertToDocument(html, pdfDoc, props))
{
}
}
binData = workStream.ToArray();
}
}
return binData;
}

C# Spire Document.SaveToStream not working

I have the following code but it is just creating a 0kb empty file.
using (var stream1 = new MemoryStream())
{
MemoryStream txtStream = new MemoryStream();
Document document = new Document();
fileInformation.Stream.CopyTo(stream1);
document.LoadFromStream(stream1, FileFormat.Auto);
document.SaveToStream(txtStream, FileFormat.Txt);
StreamReader reader = new StreamReader(txtStream);
string text = reader.ReadToEnd();
System.IO.File.WriteAllText(fileName + ".txt", text);
}
I know the data is successfully loaded into the document, because if I call document.SaveToTxt("test.txt", Encoding.UTF8);
instead of the SaveToStream line, it exports the file properly.
What am I doing wrong?

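After writing to or copying into a stream, its position is left at the end, so you need to reset the position to 0 before reading from it. As seen in the answer here, you can do something like this to your streams:
stream1.Position = 0;
txtStream.Position = 0;
Reset stream1 before the LoadFromStream call and txtStream before creating the StreamReader.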
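The effect of the position reset can be seen without Spire. The following is a small sketch using only .NET streams. RewindDemo and CopyThenRead are names invented for this illustration.

```csharp
using System.IO;
using System.Text;

public static class RewindDemo
{
    // Copies source into a MemoryStream and reads it back as text.
    // If rewind is false, the read starts at the end of the copied data
    // and returns an empty string.
    public static string CopyThenRead(Stream source, bool rewind)
    {
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);            // leaves copy.Position at the end
            if (rewind) copy.Position = 0;  // move back to the start before reading
            using (var reader = new StreamReader(copy, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Without the rewind the reader finds nothing to read, which matches the empty 0kb file described in the question.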

iTextSharp System.OutOfMemoryException

I have an issue with trying to create a large PDF file. Basically, I have a list of byte arrays, each containing a PDF. I want to merge the byte arrays into a single PDF. This works great for smaller files (under 2000 pages), but when I tried creating a 12,00 page file it bombed. Originally I was using a MemoryStream, but after some research I found that a common solution is to use a FileStream instead. So I tried the FileStream approach, but I get similar results. The List contains 3,800 records, each containing 4 pages. MemoryStream bombs after around 570 records, FileStream after about 680. The file size when the code crashed was 60MB. What am I doing wrong? Here is the code I have; it crashes on the "copy.AddPage(curPg);" statement inside the "for" loop.
private byte[] MergePDFs(List<byte[]> PDFs)
{
iTextSharp.text.Document doc = new iTextSharp.text.Document();
byte[] completePDF;
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");
//using (MemoryStream ms = new MemoryStream())
using(FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
}
// Close the reader
reader.Close();
}
// Close the document
doc.Close();
// Close the copier
copy.Close();
// Convert the memorystream to a byte array
//completePDF = ms.ToArray();
}
//return completePDF;
return GetPDFsByteArray(tempFileName);
}

A couple of notes:
PdfCopy implements IDisposable, so you should see whether wrapping it in a using block helps.
PdfCopy.FreeReader() will help too.
Anyway, I'm not sure if you're using MVC or WebForms, but here's a simple working HTTP handler, tested with a 15-page, 125KB test file, that runs on my workstation:
<%@ WebHandler Language="C#" Class="MergeFiles" %>
using System;
using System.Collections.Generic;
using System.Web;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
public class MergeFiles : IHttpHandler
{
public void ProcessRequest(HttpContext context)
{
List<byte[]> pdfs = new List<byte[]>();
var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);
var Response = context.Response;
Response.ContentType = "application/pdf";
Response.AddHeader(
"content-disposition",
"attachment; filename=MergeLotsOfPdfs.pdf"
);
Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
}
byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
{
using (var ms = new MemoryStream())
{
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdfs.Count; ++i)
{
using (PdfReader reader = new PdfReader(
new RandomAccessFileOrArray(pdfs[i]), null))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}
return ms.ToArray();
}
}
public bool IsReusable { get { return false; } }
}
I tried to make the output file similar to what you described in the question, but YMMV, depending on how large the individual PDFs you're dealing with are.

So after a lot of messing around, I realized that there was just no way around it. However, I did manage to find a workaround. Instead of returning a byte array, I return a temp file path, which I then transmit and delete thereafter.
private string MergeLotsOfPDFs(List<byte[]> PDFs)
{
Document doc = new Document();
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/__" + uniqueId.ToString() + ".pdf");
using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
PdfCopy copy = new PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(PDF), null);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
// This is a lie, it still costs money, hue hue hue :)~
copy.FreeReader(reader);
}
reader.Close();
}
// Close the document
doc.Close();
// Close the copier
copy.Close();
}
// Return temp file path
return tempFileName;
}
And here is how I send that data to the client.
// Send the merged PDF file to the user.
System.Web.HttpResponse response = System.Web.HttpContext.Current.Response;
response.ClearContent();
response.ClearHeaders();
response.ContentType = "application/pdf";
response.AddHeader("Content-Disposition", "attachment; filename=1094C.pdf;");
response.WriteFile(tempFileName);
HttpContext.Current.Response.Flush(); // Sends all currently buffered output to the client.
DeleteFile(tempFileName); // Call right after flush but before close
HttpContext.Current.Response.SuppressContent = true; // Gets or sets a value indicating whether to send HTTP content to the client.
HttpContext.Current.ApplicationInstance.CompleteRequest(); // Causes ASP.NET to bypass all events and filtering in the HTTP pipeline chain of execution and directly execute the EndRequest event.
Lastly, here is a fancy DeleteFile method
private void DeleteFile(string fileName)
{
if (File.Exists(fileName))
{
try
{
File.Delete(fileName);
}
catch (Exception ex)
{
//Could not delete the file, wait and try again
try
{
System.GC.Collect();
System.GC.WaitForPendingFinalizers();
File.Delete(fileName);
}
catch
{
//Could not delete the file still
}
}
}
}
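The transmit-then-delete idea can also be sketched independently of ASP.NET. In the hedged example below, TempFileSender and SendAndDelete are hypothetical names, and the output parameter stands in for whatever response stream is available. It uses try/finally for cleanup instead of the GC-based retry shown above, which is a different technique from the one in this answer.

```csharp
using System.IO;

public static class TempFileSender
{
    // Copies the file at path to output in chunks (Stream.CopyTo buffers
    // internally), so the whole PDF never has to sit in one byte array.
    // The file is deleted afterwards, even if the copy fails.
    public static long SendAndDelete(string path, Stream output)
    {
        try
        {
            using (var input = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                input.CopyTo(output);
                return input.Length;
            }
        }
        finally
        {
            // Runs after the using block has released the file handle.
            File.Delete(path);
        }
    }
}
```

The return value is the number of bytes in the file, which can be handy for logging or for setting a Content-Length header.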

Returning memorystream - gives corrupt PDF file or "cannot accessed a closed stream"

I have a web service, which calls the following method. I want to return a memorystream, which is a PDF file.
Now, the problem is the PDF file is corrupt with the following code. I think it's because the files are not being closed. However, if I close them, I get the classic error "Cannot access a closed stream".
When I previously saved it through a filestream, the PDF file wasn't corrupt.
So my humble question is: How to solve it and get back a non-corrupt PDF file? :-)
My code:
public Stream Generate(GiftModel model)
{
var template = HttpContext.Current.Server.MapPath(TemplatePath);
// Magic code which creates a new PDF file from the stream of the other
PdfReader reader = new PdfReader(template);
Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
MemoryStream fs = new MemoryStream();
PdfWriter writer = PdfWriter.GetInstance(document, fs);
document.Open();
// Two products on every page
int bookNumber = 0;
int pagesWeNeed = (int)Math.Ceiling(((double)model.Books.Count / (double)2));
for (var i = 0; i < pagesWeNeed; i++)
{
PdfContentByte cb = writer.DirectContent;
// Creates a new page
PdfImportedPage page = writer.GetImportedPage(reader, 1);
cb.AddTemplate(page, 0, 0);
// Add text strings
DrawGreetingMessages(model.FromName, model.ReceiverName, model.GiftMessage, cb);
// Draw the books
DrawBooksOnPage(model.Books.Skip(bookNumber).Take(2).ToList(), cb);
// Draw boring shit
DrawFormalities(true, model.GiftLink, cb);
bookNumber += 2;
}
// Close off our streams because we can
//document.Close();
//writer.Close();
reader.Close();
fs.Position = 0;
return fs;
}

Reuse of streams can be problematic, especially if you are using an abstraction and you don't quite know what it is doing to your stream. Because of this, I generally recommend never passing streams themselves around. If you can get by with it, try passing just the raw underlying byte array itself. But if passing streams is a requirement, then I still recommend producing the raw byte array at the end and then wrapping that in a new, second stream. Try the code below to see if it works.
public Stream Generate(GiftModel model)
{
//We'll dump our PDF into these when done
Byte[] bytes;
using (var ms = new MemoryStream())
{
using (var doc = new Document())
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
doc.Add(new Paragraph("Hello"));
doc.Close();
}
}
//Finalize the contents of the stream into an array
bytes = ms.ToArray();
}
//Return a new stream wrapped around our array
return new MemoryStream(bytes);
}
