itextsharp / MCV combining multiple forms into one PDF using bytes - c#

Hi My applion is MVC3 c#, I am using itextsharp to produce PDF files for pre disgned forms. In this application I have to different forms. To generate a form I use:
public ActionResult TestPDF(long learnerID = 211, long courseID = 11)
{
var caseList = _studyCaseSvc.ListStudyCases().Where(x => x.Course_ID == courseID);
try
{
MemoryStream memoryStream = new MemoryStream();
PdfConcatenate whole = new PdfConcatenate(memoryStream);
foreach (var ca in caseList)
{
byte[] part = null;
if (ca.CaseType == "CTA")
{
part = GenerateEvaluationCAT_PDF(learnerID, ca.ID);
}
else if (ca.CaseType == "CTAH")
{
part = GenerateEvaluationCATH_PDF(learnerID, ca.ID);
}
else
{
part = null;
}
if (part != null)
{
PdfReader partReader = new PdfReader(part);
whole.AddPages(partReader);
partReader.Close();
}
}
whole.Close();
byte[] byteInfo = memoryStream.ToArray();
SendPdfToBrowser(byteInfo);
}
catch (Exception ex)
{
}
return null;
}
I get this error: An item with the same key has already been added. The error happens at AddPages. So I developed this simpler test:
private void merge()
{
try
{
FileStream output = new FileStream("p3.pdf", FileMode.Create);
PdfConcatenate pdfConcatenate = new PdfConcatenate(output, true);
PdfReader r1 = new PdfReader("p2.pdf");
MemoryStream memoryStream = new MemoryStream();
PdfStamper pdfStamper = new PdfStamper(r1, memoryStream);
pdfStamper.FormFlattening = true;
pdfStamper.Close();
PdfReader r2 = new PdfReader(memoryStream.ToArray());
//pdfConcatenate.AddPages(tempReader);
pdfConcatenate.Open();
int n = r1.NumberOfPages;
for (int i = 1; i <= n; i++)
{
PdfImportedPage imp = pdfConcatenate.Writer.GetImportedPage(r1, i);
pdfConcatenate.Writer.AddPage(imp);
}
pdfConcatenate.Writer.FreeReader(r1);
pdfStamper.Close();
r1.Close();
pdfConcatenate.Close();
}
catch (Exception ex)
{
}
}
Same error.

Well, the problem is the misconception that you can combine multiple PDF files into one by simply concatenating them. That is wrong for PDFs (just like it is wrong for most binary file formats).
Thus, you should update your GenerateAllEvaluation_PDF method to have a PdfConcatenate instance instead of your byte Array whole, cf. http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfConcatenate.html, open each byte array returned by your GenerateEvaluationCATH_PDF method in a PdfReader, add all pages of these readers to the PdfConcatenate and eventually return the bytes generated by that class.
EDIT (I'm more into Java than C#, thus forgive minor errors)
PdfConcatenate whole = new PdfConcatenate(...);
foreach (var ca in caseList)
{
byte[] part = null;
if (ca.CaseType == "CTA")
{
part = GenerateEvaluationCAT_PDF(learnerID, ca.ID);
}
else if (ca.CaseType == "CTAH")
{
part = GenerateEvaluationCATH_PDF(learnerID, ca.ID);
}
else
{
part = ???;
}
PdfReader partReader = new PdfReader(part);
whole.AddPages(partReader);
partReader.Close();
}
The PdfConcatenate can be constructed with a MemoryStream from which you can retrieve the final byte[].
PS: PdfConcatenate may not be a part of iTextSharp version 4.x yet but it is merely a convenience wrapper of PdfCopy and PdfSmartCopy. Thus, you may simply have a look at the sources of iTextSharp (OSS after all) and be inspired: PdfConcatenate.cs.

Related

Merge PDF and return as FileContentResult

I have an ActionResult that is working fine to return a PDF.
public ActionResult TrainingHours(int id)
{
EmployeeModel model = new EmployeeModel(id);
if (!model.CheckEmployeeAccess())
throw new HttpException(403, "Access Denied");
byte[] report = model.GetTrainingHoursReport(id);
return File(report, Constants.PdfMimeType, model.EmployeeHoursReportFileName);
}
The byte[] is returned from a LocalReport's Render("PDF", DeviceInfoString) method and looks perfect when it is downloaded.
The customer wants to be able to click a link and have the same report run for a group of people and merged into one pdf for download.
public ActionResult MassTrainingHours(int[] employeeIds)
{
if (employeeIds == null)
return new EmptyResult();
List<byte[]> reportsList = new List<byte[]>();
foreach (int employeeId in employeeIds)
{
EmployeeModel model = new EmployeeModel(employeeId);
reportsList.Add(model.GetTrainingHoursReport(employeeId));
}
Post.Domain.Services.PdfMerger merger = new Post.Domain.Services.PdfMerger();
byte[] report = merger.MergeFiles(reportsList);
return File(report, Constants.PdfMimeType, "TrainingHours.pdf");
Here's my class that merges the pdfs using iTextSharp.
public class PdfMerger
{
public byte[] MergeFiles(List<byte[]> inputFiles)
{
MemoryStream outputMS = new MemoryStream();
Document document = new Document();
PdfCopy writer = new PdfCopy(document, outputMS);
writer.CloseStream = false;
PdfImportedPage page = null;
document.Open();
foreach (byte[] fileData in inputFiles)
{
PdfReader reader = new PdfReader(fileData);
int n = reader.NumberOfPages;
for (int i = 1; i <= n; i++)
{
page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
PRAcroForm form = reader.AcroForm;
if (form != null)
writer.CopyAcroForm(reader);
}
document.Close();
return outputMS.ToArray();
}
}
This all seems to run with no errors but I get absolutely nothing returned from the ActionResult.
If I use the MergeFiles method slightly modified to a FileStream on a winforms application the process runs smoothly and I get the output file I'm looking for.

iTextSharp System.OutOfMemoryException

I have an issue with trying to create a large PDF file. Basically I have a list of byte arrays, each containing a PDF in a form of a byte array. I wanted to merge the byte arrays into a single PDF. This works great for smaller files (under 2000 pages), but when I tried creating a 12,00 page file it bombed). Originally I was using MemoryStream but after some research, a common solution was to use a FileStream instead. So I tried a file stream approach, however get similar results. The List contains 3,800 records, each containing 4 pages. MemoryStream bombs after around 570. FileStream after about 680 records. The current file size after the code crashed was 60MB. What am I doing wrong? Here is the code I have, and the code crashes on "copy.AddPage(curPg);" directive, inside the "for(" loop.
private byte[] MergePDFs(List<byte[]> PDFs)
{
iTextSharp.text.Document doc = new iTextSharp.text.Document();
byte[] completePDF;
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");
//using (MemoryStream ms = new MemoryStream())
using(FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
}
// Close the reader
reader.Close();
}
// Close the document
doc.Close();
// Close the copier
copy.Close();
// Convert the memorystream to a byte array
//completePDF = ms.ToArray();
}
//return completePDF;
return GetPDFsByteArray(tempFileName);
}
A couple of notes:
PdfCopy implements iDisposable, so you should try and see if a using helps.
PdfCopy.FreeReader() will help.
Anyway, not sure if you're using MVC or WebForms, but here's a simple working HTTP handler tested with a 15 page 125KB test file that runs on my workstation:
<%# WebHandler Language="C#" Class="MergeFiles" %>
using System;
using System.Collections.Generic;
using System.Web;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
public class MergeFiles : IHttpHandler
{
public void ProcessRequest(HttpContext context)
{
List<byte[]> pdfs = new List<byte[]>();
var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);
var Response = context.Response;
Response.ContentType = "application/pdf";
Response.AddHeader(
"content-disposition",
"attachment; filename=MergeLotsOfPdfs.pdf"
);
Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
}
byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
{
using (var ms = new MemoryStream())
{
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdfs.Count; ++i)
{
using (PdfReader reader = new PdfReader(
new RandomAccessFileOrArray(pdfs[i]), null))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}
return ms.ToArray();
}
}
public bool IsReusable { get { return false; } }
}
Tried to make the output file similar to what you described in the question, but YMMV, depending on how large the individual PDFs you're dealing with are in size. Here's the test output from my run:
So after a lot of messing around, I realized that there just was no way around it. However, I did manage to find a work-around. Instead of returning byte array, I return a temp file path, which I then transmit and delete there after.
private string MergeLotsOfPDFs(List<byte[]> PDFs)
{
Document doc = new Document();
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/__" + uniqueId.ToString() + ".pdf");
using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
PdfCopy copy = new PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(PDF), null);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
// This is a lie, it still costs money, hue hue hue :)~
copy.FreeReader(reader);
}
reader.Close();
}
// Close the document
doc.Close();
// Close the document
copy.Close();
}
// Return temp file path
return tempFileName;
}
And here is how I send that data to the client.
// Send the merged PDF file to the user.
System.Web.HttpResponse response = System.Web.HttpContext.Current.Response;
response.ClearContent();
Response.ClearHeaders();
response.ContentType = "application/pdf";
response.AddHeader("Content-Disposition", "attachment; filename=1094C.pdf;");
response.WriteFile(tempFileName);
HttpContext.Current.Response.Flush(); // Sends all currently buffered output to the client.
DeleteFile(tempFileName); // Call right after flush but before close
HttpContext.Current.Response.SuppressContent = true; // Gets or sets a value indicating whether to send HTTP content to the client.
HttpContext.Current.ApplicationInstance.CompleteRequest(); // Causes ASP.NET to bypass all events and filtering in the HTTP pipeline chain of execution and directly execute the EndRequest event.
Lastly, here is a fancy DeleteFile method
private void DeleteFile(string fileName)
{
if (File.Exists(fileName))
{
try
{
File.Delete(fileName);
}
catch (Exception ex)
{
//Could not delete the file, wait and try again
try
{
System.GC.Collect();
System.GC.WaitForPendingFinalizers();
File.Delete(fileName);
}
catch
{
//Could not delete the file still
}
}
}
}

Putting Multiple Pdfs in a MemoryStream

I'm trying to take preexisting pdf files and read them all into a memory stream to then be shown on a telerik pdf viewer. If I just do one file it works but as soon as I try multiple files it gives me a internal null error (object ref not set to blah blah) and can't step in the code to see where its actualy null. Am I doing this wrong or something?
List<string> applicableReports = CurrentWizard.GetApplicableReports();
previousReportsStream = new MemoryStream();
Stream[] streams = new Stream[applicableReports.Count];
for (int i = 0; i < streams.Length; i++)
{
streams[i] = new MemoryStream(DocumentHelper.Instance.ConvertFileToByteArray(applicableReports[i]));
streams[i].CopyTo(previousReportsStream);
}
RadPdfViewer radPdfViewer = new RadPdfViewer();
RadFixedDocument document = new PdfFormatProvider(previousReportsStream, FormatProviderSettings.ReadAllAtOnce).Import();
radPdfViewer.Document = document;
This is where error is thrown:
RadFixedDocument document = new PdfFormatProvider(previousReportsStream, FormatProviderSettings.ReadAllAtOnce).Import();
DocumentHelper File to byte[]:
public byte[] ConvertFileToByteArray(string fileName)
{
FileInfo fileInfo = new FileInfo(fileName);
byte[] fileData = null;
using (FileStream fileStream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
{
BinaryReader binaryReader = new BinaryReader(fileStream);
fileData = binaryReader.ReadBytes((int)fileStream.Length);
}
return fileData;
}
One possible cause is the h process is out of memory because the code creates many MemoryStream object and does not dispose them.
Try change code to this:
List<string> applicableReports = CurrentWizard.GetApplicableReports();
previousReportsStream = new MemoryStream();
try
{
for (int i = 0; i < streams.Length; i++)
{
using( MemoryStream memStream = new MemoryStream(DocumentHelper.Instance.ConvertFileToByteArray(applicableReports[i]))
{
memStream.CopyTo(previousReportsStream);
}
}
RadPdfViewer radPdfViewer = new RadPdfViewer();
RadFixedDocument document = new PdfFormatProvider(previousReportsStream, FormatProviderSettings.ReadAllAtOnce).Import();
radPdfViewer.Document = document;
}
finally
{
previousReportsStream.Close();
}
As MemoryStream implements the IDisposable interface, you call dispose it to free the native resources; if not, the it will lead to high memory usage.
Please read MSDN for more details.

Insert page into existing PDF using itextsharp

We are using itextsharp to create a single PDF from multiple PDF files. How do I insert a new page into a PDF file that has multiple pages already in the file? When I use add page it is overwriting the existing pages and only saves the 1 page that was selected.
Here is the code that I am using to add the page to the existing PDF:
PdfReader reader = new PdfReader(sourcePdfPath);
Document document = new Document(reader.GetPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(document, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
MemoryStream memoryStream = new MemoryStream();
PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
document.AddDocListener(writer);
document.Open();
for (int p = 1; p <= reader.NumberOfPages; p++)
{
if (pagesToExtract.FindIndex(s => s == p) == -1) continue;
document.SetPageSize(reader.GetPageSize(p));
document.NewPage();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage pageImport = writer.GetImportedPage(reader, p);
int rot = reader.GetPageRotation(p);
if (rot == 90 || rot == 270)
{
cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
}
else
{
cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
}
pdfCopy.AddPage(pageImport);
}
pdfCopy.Close();
This code works. You need to have a different file to output the results.
private static void AppendToDocument(string sourcePdfPath1, string sourcePdfPath2, string outputPdfPath)
{
using (var sourceDocumentStream1 = new FileStream(sourcePdfPath1, FileMode.Open))
{
using (var sourceDocumentStream2 = new FileStream(sourcePdfPath2, FileMode.Open))
{
using (var destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create))
{
var pdfConcat = new PdfConcatenate(destinationDocumentStream);
var pdfReader = new PdfReader(sourceDocumentStream1);
var pages = new List<int>();
for (int i = 0; i < pdfReader.NumberOfPages; i++)
{
pages.Add(i);
}
pdfReader.SelectPages(pages);
pdfConcat.AddPages(pdfReader);
pdfReader = new PdfReader(sourceDocumentStream2);
pages = new List<int>();
for (int i = 0; i < pdfReader.NumberOfPages; i++)
{
pages.Add(i);
}
pdfReader.SelectPages(pages);
pdfConcat.AddPages(pdfReader);
pdfReader.Close();
pdfConcat.Close();
}
}
}
}
I've tried this code, and it works for me, but don't forget to do some validations of the number of pages and existence of the paths you use
here is the code:
private static void AppendToDocument(string sourcePdfPath, string outputPdfPath, List<int> neededPages)
{
var sourceDocumentStream = new FileStream(sourcePdfPath, FileMode.Open);
var destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create);
var pdfConcat = new PdfConcatenate(destinationDocumentStream);
var pdfReader = new PdfReader(sourceDocumentStream);
pdfReader.SelectPages(neededPages);
pdfConcat.AddPages(pdfReader);
pdfReader.Close();
pdfConcat.Close();
}
You could use something like this, where src is the IEnumerable<string> of input pdf filenames. Just make sure that your existing pdf file is one of those sources.
The PdfConcatenate class is in the latest iTextSharp release.
var result = "combined.pdf";
var fs = new FileStream(result, FileMode.Create);
var conc = new PdfConcatenate(fs, true);
foreach(var s in src) {
var r = new PdfReader(s);
conc.AddPages(r);
}
conc.Close();
PdfCopy is intended for use with an empty Document. You should add everything you want, one page at a time.
The alternative is to use PdfStamper.InsertPage(pageNum, rectangle) and then draw a PdfImportedPage onto that new page.
Note that PdfImportedPage only includes the page contents, not the annotations or doc-level information ("document structure", doc-level javascripts, etc) that page may have originally used... unless you use one with PdfCopy.
A Stamper would probably be more efficient and use less code, but PdfCopy will import all the page-level info, not just the page's contents.
This might be important, it might not. It depends on what page you're trying to import.
Had to even out the page count with a multiple of 4:
private static void AppendToDocument(string sourcePdfPath)
{
var tempFileLocation = Path.GetTempFileName();
var bytes = File.ReadAllBytes(sourcePdfPath);
using (var reader = new PdfReader(bytes))
{
var numberofPages = reader.NumberOfPages;
var modPages = (numberofPages % 4);
var pages = modPages == 0 ? 0 : 4 - modPages;
if (pages == 0)
return;
using (var fileStream = new FileStream(tempFileLocation, FileMode.Create, FileAccess.Write))
{
using (var stamper = new PdfStamper(reader, fileStream))
{
var rectangle = reader.GetPageSize(1);
for (var i = 1; i <= pages; i++)
stamper.InsertPage(numberofPages + i, rectangle);
}
}
}
File.Delete(sourcePdfPath);
File.Move(tempFileLocation, sourcePdfPath);
}
I know I'm really late to the part here, but I mixed a bit of the two best answers and created a method if anyone needs it that adds a list of source PDF documents to a single document using itextsharp.
private static void appendToDocument(List<string> sourcePDFList, string outputPdfPath)
{
//Output document name and creation
FileStream destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create);
//Object to concat source pdf's to output pdf
PdfConcatenate pdfConcat = new PdfConcatenate(destinationDocumentStream);
//For each source pdf in list...
foreach (string sourcePdfPath in sourcePDFList)
{
//If file exists...
if (File.Exists(sourcePdfPath))
{
//Open the document
FileStream sourceDocumentStream = new FileStream(sourcePdfPath, FileMode.Open);
//Read the document
PdfReader pdfReader = new PdfReader(sourceDocumentStream);
//Create an int list
List<int> pages = new List<int>();
//for each page in pdfReader
for (int i = 1; i < pdfReader.NumberOfPages + 1; i++)
{
//Add that page to the list
pages.Add(i);
}
//Add that page to the pages to add to ouput document
pdfReader.SelectPages(pages);
//Add pages to output page
pdfConcat.AddPages(pdfReader);
//Close reader
pdfReader.Close();
}
}
//Close pdfconcat
pdfConcat.Close();
}

iTextSharp is producing a corrupt PDF

The code snippet below returns a corrupt PDF document however if I return mergedDocument instead it always returns a valid PDF. mergedDocument is based on a PDF file i created using Word, whereas completed document is entirely programmatically generated. The code "works" in that it throws no exceptions. Why is iTextSharp creating a corrupt PDF?
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
mergedDocument = new byte[streamTemplate.Length];
streamTemplate.Position = 0;
streamTemplate.Read(mergedDocument, 0, (int)streamTemplate.Length);
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
completedDocument = new byte[streamCompleted.Length];
streamCompleted.Position = 0;
streamCompleted.Read(completedDocument, 0, (int)streamCompleted.Length);
}
}
return completedDocument;
You need to close the document and copy objects to flush the PDF writing buffer. This, however, causes some problems when trying to read the stream into an array. The fix for that is to use the ToArray() method of the MemoryStream which still works on closed streams. The changes I made have comments on them.
byte[] completedDocument = null;
using (MemoryStream streamCompleted = new MemoryStream())
{
using (Document document = new Document())
{
PdfCopy copy = new PdfCopy(document, streamCompleted);
document.Open();
copy.Open();
foreach (var item in eventItems)
{
byte[] mergedDocument = null;
PdfReader reader = new PdfReader(pdfTemplates[item.DataTokens[NotifyTokenType.OrganisationID]]);
using (MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, streamTemplate))
{
foreach (var token in item.DataTokens)
{
if (stamper.AcroFields.Fields.Any(fld => fld.Key == token.Key.ToString()))
{
stamper.AcroFields.SetField(token.Key.ToString(), token.Value);
}
}
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
}
//Copy the stream's bytes
mergedDocument = streamTemplate.ToArray();
}
reader = new PdfReader(mergedDocument);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
document.SetPageSize(PageSize.A4);
copy.AddPage(copy.GetImportedPage(reader, i));
}
//Close the document and the copy
document.Close();
copy.Close();
}
//ToArray() can operate on closed streams
completedDocument = streamCompleted.ToArray();
}
}
return completedDocument;
Also make sure your html doesn't contains hr tag while converting html to pdf
hdnEditorText.Value.Replace("\"", "'").Replace("<hr />", "").Replace("<hr/>", "")

Categories

Resources