I am using this code to import different pdf files pages to a single document. When i import large files (200 pages or above) I am getting a OutOfMemory exception. Am i doing something wrong here?
private bool SaveToFile(string fileName)
{
try
{
iTextSharp.text.Document doc;
iTextSharp.text.pdf.PdfCopy pdfCpy;
string output = fileName;
doc = new iTextSharp.text.Document();
pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(output, System.IO.FileMode.Create));
doc.Open();
foreach (DataGridViewRow item in dvSourcePreview.Rows)
{
string pdfFileName = item.Cells[COL_FILENAME].Value.ToString();
int pdfPageIndex = int.Parse(item.Cells[COL_PAGE_NO].Value.ToString());
pdfPageIndex += 1;
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFileName);
int pageCount = reader.NumberOfPages;
// set page size for the documents
doc.SetPageSize(reader.GetPageSizeWithRotation(1));
iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, pdfPageIndex);
pdfCpy.AddPage(page);
reader.Close();
}
doc.Close();
return true;
}
catch (Exception ex)
{
return false;
}
}
You're creating a new PdfReader for each pass. That's horribly inefficient. And because you've got a PdfImportedPage from each one, all those (probably redundant) PdfReader instances are never GC'ed.
Suggestions:
Two passes. First build a list of files & pages. Second operate on each file in turn, so you only ever have one PdfReader "open" at a time. Use PdfCopy.freeReader() when you're done with a given reader. This will almost certainly change the order in which your pages are added (maybe a Very Bad Thing).
One pass. Cache your PdfReader instances based on the file name. FreeReader again when you're done... but you probably won't be able to free any of them until you've dropped out of your loop. The caching alone may be enough to keep you from running out of memory.
Keep your code as is, but call freeReader() after you close a given PdfReader instance.
I haven't run into an OOM problems with iTextSharp. Are the PDFs created with iTextSharp or something else? Can you isolate the problem to a single PDF or a set of PDFs that might be corrupt? Below is sample code that creates 10 PDFs with 1,000 pages in each. Then it creates one more PDF and randomly pulls 1 page from those PDFs 500 times. On my machine it takes a little while to run but I don't see any memory issues or anything. (iText 5.1.1.0)
using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
//Folder that we will be working in
string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Big File PDF Test");
//Base name of PDFs that we will be creating
string BigFileBase = Path.Combine(WorkingFolder, "BigFile");
//Final combined PDF name
string CombinedFile = Path.Combine(WorkingFolder, "Combined.pdf");
//Number of "large" files to create
int NumberOfBigFilesToMakes = 10;
//Number of pages to put in the files
int NumberOfPagesInBigFile = 1000;
//Number of pages to insert into combined file
int NumberOfPagesToInsertIntoCombinedFile = 500;
//Create our test directory
if (!Directory.Exists(WorkingFolder)) Directory.CreateDirectory(WorkingFolder);
//First step, create a bunch of files with a bunch of pages, hopefully code is self-explanatory
for (int FileCount = 1; FileCount <= NumberOfBigFilesToMakes; FileCount++)
{
using (FileStream FS = new FileStream(BigFileBase + FileCount + ".pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
{
Doc.Open();
for (int I = 1; I <= NumberOfPagesInBigFile; I++)
{
Doc.NewPage();
Doc.Add(new Paragraph("This is file " + FileCount));
Doc.Add(new Paragraph("This is page " + I));
}
Doc.Close();
}
}
}
}
//Second step, loop around pulling random pages from random files
//Create our output file
using (FileStream FS = new FileStream(CombinedFile, FileMode.Create, FileAccess.Write, FileShare.Read))
{
using (Document Doc = new Document())
{
using (PdfCopy pdfCopy = new PdfCopy(Doc, FS))
{
Doc.Open();
//Setup some variables to use in the loop below
PdfReader reader = null;
PdfImportedPage page = null;
int RanFileNum = 0;
int RanPageNum = 0;
//Standard random number generator
Random R = new Random();
for (int I = 1; I <= NumberOfPagesToInsertIntoCombinedFile; I++)
{
//Just to output our current progress
Console.WriteLine(I);
//Get a random page and file. Remember iText pages are 1-based.
RanFileNum = R.Next(1, NumberOfBigFilesToMakes + 1);
RanPageNum = R.Next(1, NumberOfPagesInBigFile + 1);
//Open the random file
reader = new PdfReader(BigFileBase + RanFileNum + ".pdf");
//Set the current page
Doc.SetPageSize(reader.GetPageSizeWithRotation(1));
//Grab a random page
page = pdfCopy.GetImportedPage(reader, RanPageNum);
//Add it to the combined file
pdfCopy.AddPage(page);
//Clean up
reader.Close();
}
//Clean up
Doc.Close();
}
}
}
}
}
}
Related
I have the following problem when printing the pdf file after merge, the pdf documents get cut off.
Sometimes this happens because the documents aren't 8.5 x 11
they might be like 11 x 17.
Can we make it detect the page size and then use that same page size for those documents?
Or, if not, is it possible to have it fit to page?
Following is the code:
package com.sumit.program;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
public class MergePdf {
public static void main(String[] args) {
try {
List<InputStream> pdfs = new ArrayList<InputStream>();
pdfs.add(new FileInputStream("C:\\Documents and Settings\\Sumit\\Desktop\\NewEcnProject\\Document1.pdf"));
pdfs.add(new FileInputStream("C:\\Documents and Settings\\Sumit\\Desktop\\NewEcnProject\\Landscape.pdf"));
OutputStream output = new FileOutputStream("C:\\Documents and Settings\\Sumit\\Desktop\\NewEcnProject\\merge1.pdf");
MergePdf.concatPDFs(pdfs, output, true);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void concatPDFs(List<InputStream> streamOfPDFFiles,
OutputStream outputStream, boolean paginate) {
Document document = new Document();
try {
List<InputStream> pdfs = streamOfPDFFiles;
List<PdfReader> readers = new ArrayList<PdfReader>();
int totalPages = 0;
Iterator<InputStream> iteratorPDFs = pdfs.iterator();
// Create Readers for the pdfs.
int i=1;
while (iteratorPDFs.hasNext()) {
InputStream pdf = iteratorPDFs.next();
PdfReader pdfReader = new PdfReader(pdf);
System.out.println("Page size is "+pdfReader.getPageSize(1));
readers.add(pdfReader);
totalPages += pdfReader.getNumberOfPages();
i++;
}
// Create a writer for the outputstream
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
writer.setCompressionLevel(9);
document.open();
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
PdfImportedPage page;
int currentPageNumber = 0;
int pageOfCurrentReaderPDF = 0;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Loop through the PDF files and add to the output.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
// Create a new page in the target for each source page.
System.out.println("No. of pages "+pdfReader.getNumberOfPages());
i=0;
while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
Rectangle r=pdfReader.getPageSize(pdfReader.getPageN(pageOfCurrentReaderPDF+1));
if(r.getWidth()==792.0 && r.getHeight()==612.0)
document.setPageSize(PageSize.A4.rotate());
else
document.setPageSize(PageSize.A4);
document.newPage();
pageOfCurrentReaderPDF++;
currentPageNumber++;
i++;
page = writer.getImportedPage(pdfReader,
pageOfCurrentReaderPDF);
System.out.println("Width is "+page.getWidth());
System.out.println("Height is "+page.getHeight());
cb.newlineText();
cb.addTemplate(page, 0, 0);
// Code for pagination.
if (paginate) {
cb.beginText();
cb.setFontAndSize(bf, 9);
cb.showTextAligned(PdfContentByte.ALIGN_CENTER, ""
+ currentPageNumber + " of " + totalPages, 520,
5, 0);
cb.endText();
}
}
pageOfCurrentReaderPDF = 0;
}
outputStream.flush();
document.close();
outputStream.close();
System.out.println("Merging of Pdfs is done.......");
} catch (Exception e) {
e.printStackTrace();
} finally {
if (document.isOpen())
document.close();
try {
if (outputStream != null)
outputStream.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
}
}
Using the Document and PdfWriter class in combination with the addTemplate() method to merge documents is a bad idea. That's not what the addTemplate() method is meant for. You have explicitly or implicitly defined the page size for the Document you are working with. With the addTemplate() method, you add PdfImportedPage instances, and
when you add a new page with the same page size and rotation, you throw away all interactivity that exists in that page, but otherwise all is well,
when you add a new page with a different page size and rotation, you get the result you describe. Because of the difference in size, the imported page and the new page do not match. Parts get cut off, extra margins appear, rotations are different, etc.
This is all explained in chapter 6 of my book. You should use PdfCopy instead of PdfWriter. See for instance the FillFlattenMerge2 example:
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
PdfReader reader;
String line = br.readLine();
// loop over readers
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
// end loop
document.close();
In your case, you also need to add page numbers, you can do this in a second go, as is done in the StampPageXofY example:
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
PdfContentByte pagecontent;
for (int i = 0; i < n; ) {
pagecontent = stamper.getOverContent(++i);
ColumnText.showTextAligned(pagecontent, Element.ALIGN_RIGHT,
new Phrase(String.format("page %s of %s", i, n)), 559, 806, 0);
}
stamper.close();
reader.close();
Or you can add them while merging, as is done in the MergeWithToc example.
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
PageStamp stamp;
document.open();
int n;
int pageNo = 0;
PdfImportedPage page;
Chunk chunk;
for (Map.Entry<String, PdfReader> entry : filesToMerge.entrySet()) {
n = entry.getValue().getNumberOfPages();
for (int i = 0; i < n; ) {
pageNo++;
page = copy.getImportedPage(entry.getValue(), ++i);
stamp = copy.createPageStamp(page);
chunk = new Chunk(String.format("Page %d", pageNo));
if (i == 1)
chunk.setLocalDestination("p" + pageNo);
ColumnText.showTextAligned(stamp.getUnderContent(),
Element.ALIGN_RIGHT, new Phrase(chunk),
559, 810, 0);
stamp.alterContents();
copy.addPage(page);
}
}
document.close();
for (PdfReader r : filesToMerge.values()) {
r.close();
}
reader.close();
I strongly advise against using PdfWriter to merge documents! It's not impossible if you change the page size and the rotation of the page in the Document class, but you're making it harder on yourself. Moreover: using PdfWriter also throws away all interactivity (links, annotations,...) that exists in the pages you're merging. Your customer may experience that as a bug.
I have an issue with trying to create a large PDF file. Basically I have a list of byte arrays, each containing a PDF in a form of a byte array. I wanted to merge the byte arrays into a single PDF. This works great for smaller files (under 2000 pages), but when I tried creating a 12,00 page file it bombed). Originally I was using MemoryStream but after some research, a common solution was to use a FileStream instead. So I tried a file stream approach, however get similar results. The List contains 3,800 records, each containing 4 pages. MemoryStream bombs after around 570. FileStream after about 680 records. The current file size after the code crashed was 60MB. What am I doing wrong? Here is the code I have, and the code crashes on "copy.AddPage(curPg);" directive, inside the "for(" loop.
private byte[] MergePDFs(List<byte[]> PDFs)
{
iTextSharp.text.Document doc = new iTextSharp.text.Document();
byte[] completePDF;
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/" + uniqueId.ToString() + ".pdf");
//using (MemoryStream ms = new MemoryStream())
using(FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(PDF);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
iTextSharp.text.pdf.PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
}
// Close the reader
reader.Close();
}
// Close the document
doc.Close();
// Close the copier
copy.Close();
// Convert the memorystream to a byte array
//completePDF = ms.ToArray();
}
//return completePDF;
return GetPDFsByteArray(tempFileName);
}
A couple of notes:
PdfCopy implements iDisposable, so you should try and see if a using helps.
PdfCopy.FreeReader() will help.
Anyway, not sure if you're using MVC or WebForms, but here's a simple working HTTP handler tested with a 15 page 125KB test file that runs on my workstation:
<%# WebHandler Language="C#" Class="MergeFiles" %>
using System;
using System.Collections.Generic;
using System.Web;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
public class MergeFiles : IHttpHandler
{
public void ProcessRequest(HttpContext context)
{
List<byte[]> pdfs = new List<byte[]>();
var pdf = File.ReadAllBytes(context.Server.MapPath("~/app_data/test.pdf"));
for (int i = 0; i < 4000; ++i) pdfs.Add(pdf);
var Response = context.Response;
Response.ContentType = "application/pdf";
Response.AddHeader(
"content-disposition",
"attachment; filename=MergeLotsOfPdfs.pdf"
);
Response.BinaryWrite(MergeLotsOfPdfs(pdfs));
}
byte[] MergeLotsOfPdfs(List<byte[]> pdfs)
{
using (var ms = new MemoryStream())
{
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdfs.Count; ++i)
{
using (PdfReader reader = new PdfReader(
new RandomAccessFileOrArray(pdfs[i]), null))
{
copy.AddDocument(reader);
copy.FreeReader(reader);
}
}
}
}
return ms.ToArray();
}
}
public bool IsReusable { get { return false; } }
}
Tried to make the output file similar to what you described in the question, but YMMV, depending on how large the individual PDFs you're dealing with are in size. Here's the test output from my run:
So after a lot of messing around, I realized that there just was no way around it. However, I did manage to find a work-around. Instead of returning byte array, I return a temp file path, which I then transmit and delete there after.
private string MergeLotsOfPDFs(List<byte[]> PDFs)
{
Document doc = new Document();
Guid uniqueId = Guid.NewGuid();
string tempFileName = Server.MapPath("~/__" + uniqueId.ToString() + ".pdf");
using (FileStream ms = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.Read))
{
PdfCopy copy = new PdfCopy(doc, ms);
doc.Open();
int i = 0;
foreach (byte[] PDF in PDFs)
{
i++;
// Create a reader
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(PDF), null);
// Cycle through all the pages
for (int currentPageNumber = 1; currentPageNumber <= reader.NumberOfPages; ++currentPageNumber)
{
// Read a page
PdfImportedPage curPg = copy.GetImportedPage(reader, currentPageNumber);
// Add the page over to the rest of them
copy.AddPage(curPg);
// This is a lie, it still costs money, hue hue hue :)~
copy.FreeReader(reader);
}
reader.Close();
}
// Close the document
doc.Close();
// Close the document
copy.Close();
}
// Return temp file path
return tempFileName;
}
And here is how I send that data to the client.
// Send the merged PDF file to the user.
System.Web.HttpResponse response = System.Web.HttpContext.Current.Response;
response.ClearContent();
Response.ClearHeaders();
response.ContentType = "application/pdf";
response.AddHeader("Content-Disposition", "attachment; filename=1094C.pdf;");
response.WriteFile(tempFileName);
HttpContext.Current.Response.Flush(); // Sends all currently buffered output to the client.
DeleteFile(tempFileName); // Call right after flush but before close
HttpContext.Current.Response.SuppressContent = true; // Gets or sets a value indicating whether to send HTTP content to the client.
HttpContext.Current.ApplicationInstance.CompleteRequest(); // Causes ASP.NET to bypass all events and filtering in the HTTP pipeline chain of execution and directly execute the EndRequest event.
Lastly, here is a fancy DeleteFile method
private void DeleteFile(string fileName)
{
if (File.Exists(fileName))
{
try
{
File.Delete(fileName);
}
catch (Exception ex)
{
//Could not delete the file, wait and try again
try
{
System.GC.Collect();
System.GC.WaitForPendingFinalizers();
File.Delete(fileName);
}
catch
{
//Could not delete the file still
}
}
}
}
There are two files on disk .jpg and .pdf, i need to read both files and add them to new pdf and send to browser so that it can be downloaded.
New pdf file only contains pdf contents not jpeg file image.
memoryStream myMemoryStream = new MemoryStream();
//----pdf file--------------
iTextSharp.text.pdf.PdfCopy writer2 = new iTextSharp.text.pdf.PdfCopy(doc, myMemoryStream);
doc.Open();
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(imagepath + "/30244.pdf");
reader.ConsolidateNamedDestinations();
for (int i = 1; i <= reader.NumberOfPages; i++) {
iTextSharp.text.pdf.PdfImportedPage page = writer2.GetImportedPage(reader, i);
writer2.AddPage(page);
}
iTextSharp.text.pdf.PRAcroForm form = reader.AcroForm;
if (form != null) {
writer2.CopyAcroForm(reader);
}
//-----------------jpeg file-------------------------------------
MemoryStream myMemoryStream2 = new MemoryStream();
System.Drawing.Image image = System.Drawing.Image.FromFile(imagepath + "/Vouchers.jpg");
iTextSharp.text.Document doc2 = new iTextSharp.text.Document(iTextSharp.text.PageSize.A4);
iTextSharp.text.pdf.PdfWriter.GetInstance(doc2, myMemoryStream2);
doc2.Open();
iTextSharp.text.Image pdfImage = iTextSharp.text.Image.GetInstance(image, System.Drawing.Imaging.ImageFormat.Jpeg);
doc2.Add(pdfImage);
doc2.close();
doc.close();
byte[] content = myMemoryStream.ToArray;
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "attachment; filename=LeftCorner568.pdf");
Response.BinaryWrite(content);
Since you've been having trouble with this for a while now I'm going to give you a long-ish answer that will hopefully help you.
First, I don't have access to an ASP.Net server so I'm running everything from a folder on the desktop. So instead of reading and writing from and to relative paths you'll see me working from Environment.GetFolderPath(Environment.SpecialFolder.Desktop). I'm assuming that you'll be able to swap your paths in later.
Second, (and not that it really matter) I don't have SSRS so instead I created a helper method that makes a fake PDF for me to work from that returns a PDF as a byte array:
/// <summary>
/// Create a fake SSRS report
/// </summary>
/// <returns>A valid PDF stored as a byte array</returns>
private Byte[] getSSRSPdfAsByteArray() {
using (var ms = new System.IO.MemoryStream()) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
doc.Add(new Paragraph("This is my SSRS report"));
doc.Close();
}
}
return ms.ToArray();
}
}
Third, just so that we're on the same page and to have something to work with I created two additional helper methods that generate some sample images and PDFs:
/// <summary>
/// Create sample images in the folder provided
/// </summary>
/// <param name="count">The number of images to create</param>
/// <param name="workingFolder">The folder to create images in</param>
private void createSampleImages(int count, string workingFolder) {
var random = new Random();
for (var i = 0; i < count; i++) {
using (var bmp = new System.Drawing.Bitmap(200, 200)) {
using (var g = System.Drawing.Graphics.FromImage(bmp)) {
g.Clear(Color.FromArgb(random.Next(0, 255), random.Next(0, 255), random.Next(0, 255)));
}
bmp.Save(System.IO.Path.Combine(workingFolder, string.Format("Image_{0}.jpg", i)));
}
}
}
/// <summary>
/// Create sample PDFs in the folder provided
/// </summary>
/// <param name="count">The number of PDFs to create</param>
/// <param name="workingFolder">The folder to create PDFs in</param>
private void createSamplePDFs(int count, string workingFolder) {
var random = new Random();
for (var i = 0; i < count; i++) {
using (var ms = new System.IO.MemoryStream()) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
var pageCount = random.Next(1, 10);
for (var j = 0; j < pageCount; j++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("This is page {0} of document {1}", j, i)));
}
doc.Close();
}
}
System.IO.File.WriteAllBytes(System.IO.Path.Combine(workingFolder, string.Format("File_{0}.pdf", i)), ms.ToArray());
}
}
}
Just to reiterate, you obviously wouldn't have a need for these three helper methods, they're just so that you and I have a common set of files to work from. These helper methods are also intentionally not commented.
Fourth, at the end of the code below I'm storing the final PDF into a byte array called finalFileBytes and I'm then writing that to disk. Once again, I'm working on the desktop so this is where you'd do Response.BinaryWrite(finalFileBytes) instead.
Fifth, there's different ways to merge and combine files. PdfCopy, PdfSmartCopy and PdfStamper are all commonly used. I encourage you to read the official iText/iTextSharp book or at least the free Chapter 6, Working with existing PDFs that goes into great detail about this. In the code below I'm using PdfSmartCopy and I'm converting each image to a PDF before importing them. There might be a better way but I'm not sure if you can do it all in one pass or not. Bruno would know better than me. But the below works.
See the individual code comments for more details on what's going on.
//The folder that all of our work will be done in
var workingFolder = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Pdf Test");
//This is the final PDF that we'll create for testing purposes
var finalPDF = System.IO.Path.Combine(workingFolder, "test.pdf");
//Create our working directory if it doesn't exist already
System.IO.Directory.CreateDirectory(workingFolder);
//Create sample PDFs and images
createSampleImages(10, workingFolder);
createSamplePDFs(10, workingFolder);
//Create our sample SSRS PDF byte array
var SSRS_Bytes = getSSRSPdfAsByteArray();
//This variable will eventually hold our combined PDF as a byte array
Byte[] finalFileBytes;
//Write everything to a MemoryStream
using (var finalFile = new System.IO.MemoryStream()) {
//Create a generic Document
using (var doc = new Document()) {
//Use PdfSmartCopy to intelligently merge files
using (var copy = new PdfSmartCopy(doc, finalFile)) {
//Open our document for writing
doc.Open();
//#1 - Import the SSRS report
//Bind a reader to our SSRS report
using (var reader = new PdfReader(SSRS_Bytes)) {
//Loop through each page
for (var i = 1; i <= reader.NumberOfPages; i++) {
//Add the imported page to our final document
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
//#2 - Image the images
//Loop through each image in our working directory
foreach (var f in System.IO.Directory.EnumerateFiles(workingFolder, "*.jpg", SearchOption.TopDirectoryOnly)) {
//There's different ways to do this and it depends on what exactly "add an image to a PDF" really means
//Below we add each individual image to a PDF and then merge that PDF into the main PDF
//This could be optimized greatly
//From https://alandjackson.wordpress.com/2013/09/27/convert-an-image-to-a-pdf-in-c-using-itextsharp/
//Get the size of the current image
iTextSharp.text.Rectangle pageSize = null;
using (var srcImage = new Bitmap(f)) {
pageSize = new iTextSharp.text.Rectangle(0, 0, srcImage.Width, srcImage.Height);
}
//Will eventually hold the PDF with the image as a byte array
Byte[] imageBytes;
//Simple image to PDF
using (var m = new MemoryStream()) {
using (var d = new Document(pageSize, 0, 0, 0, 0)) {
using (var w = PdfWriter.GetInstance(d, m)) {
d.Open();
d.Add(iTextSharp.text.Image.GetInstance(f));
d.Close();
}
}
//Grab the bytes before closing out the stream
imageBytes = m.ToArray();
}
//Now merge using the same merge code as #1
using (var reader = new PdfReader(imageBytes)) {
for (var i = 1; i <= reader.NumberOfPages; i++) {
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
}
//#3 - Merge additional PDF
//Look for each PDF in our working directory
foreach (var f in System.IO.Directory.EnumerateFiles(workingFolder, "*.pdf", SearchOption.TopDirectoryOnly)) {
//Because I'm writing samples files to disk but not cleaning up afterwards
//I want to avoid adding my output file as an input file
if (f == finalPDF) {
continue;
}
//Now merge using the same merge code as #1
using (var reader = new PdfReader(f)) {
for (var i = 1; i <= reader.NumberOfPages; i++) {
copy.AddPage(copy.GetImportedPage(reader, i));
}
}
}
doc.Close();
}
}
//Grab the bytes before closing the stream
finalFileBytes = finalFile.ToArray();
}
//At this point finalFileBytes holds a byte array of a PDF
//that contains the SSRS PDF, the sample images and the
//sample PDFs. For demonstration purposes I'm just writing to
//disk but this could be written to the HTTP stream
//using Response.BinaryWrite()
System.IO.File.WriteAllBytes(finalPDF, finalFileBytes);
I have written some code that merges together multiple PDF's into a single PDF that I then display from the MemoryStream. This works great. What I need to do is add a table of contents to the end of the file with links to the start of each of the individual PDF's. I planned on doing this using the GotoLocalPage action which has an option for page numbers but it doesn't seem to work. If I change the action to the code below to one of the presset ones like PDFAction.FIRSTPAGE it works fine. Does this not work because I am using the PDFCopy object for the writer parameter of GotoLocalPage?
Document mergedDoc = new Document();
MemoryStream ms = new MemoryStream();
PdfCopy copy = new PdfCopy(mergedDoc, ms);
mergedDoc.Open();
MemoryStream tocMS = new MemoryStream();
Document tocDoc = null;
PdfWriter tocWriter = null;
for (int i = 0; i < filesToMerge.Length; i++)
{
string filename = filesToMerge[i];
PdfReader reader = new PdfReader(filename);
copy.AddDocument(reader);
// Initialise TOC document based off first file
if (i == 0)
{
tocDoc = new Document(reader.GetPageSizeWithRotation(1));
tocWriter = PdfWriter.GetInstance(tocDoc, tocMS);
tocDoc.Open();
}
// Create link for TOC, added random number of 3 for now
Chunk link = new Chunk(filename);
PdfAction action = PdfAction.GotoLocalPage(3, new PdfDestination(PdfDestination.FIT), copy);
link.SetAction(action);
tocDoc.Add(new Paragraph(link));
}
// Add TOC to end of merged PDF
tocDoc.Close();
PdfReader tocReader = new PdfReader(tocMS.ToArray());
copy.AddDocument(tocReader);
copy.Close();
displayPDF(ms.ToArray());
I guess an alternative would be to link to a named element (instead of page number) but I can't see how to add an 'invisible' element to the start of each file before adding to the merged document?
I would just go with two passes. In your first pass, do the merge as you are but also record the filename and page number it should link to. In your second pass, use a PdfStamper which will give you access to a ColumnText that you can use general abstractions like Paragraph in. Below is a sample that shows this off:
Since I don't have your documents, the below code creates 10 documents with a random number of pages each just for testing purposes. (You obviously don't need to do this part.) It also creates a simple dictionary with a fake file name as the key and the raw bytes from the PDF as a value. You have a true file collection to work with but you should be able to adapt that part.
//Create a bunch of files, nothing special here
//files will be a dictionary of names and the raw PDF bytes
Dictionary<string, byte[]> Files = new Dictionary<string, byte[]>();
var r = new Random();
for (var i = 1; i <= 10; i++) {
using (var ms = new MemoryStream()) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, ms)) {
doc.Open();
//Create a random number of pages
for (var j = 1; j <= r.Next(1, 5); j++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("Hello from document {0} page {1}", i, j)));
}
doc.Close();
}
}
Files.Add("File " + i.ToString(), ms.ToArray());
}
}
This next block merges the PDFs. This is mostly the same as your code except that instead of writing a TOC here I'm just keeping track of what I want to write in the future. Where I'm using file.value you'd use your full file path and where I'm using file.key you'd use your file's name instead.
//Dictionary of file names (for display purposes) and their page numbers
var pages = new Dictionary<string, int>();
//PDFs start at page 1
var lastPageNumber = 1;
//Will hold the final merged PDF bytes
byte[] mergedBytes;
//Most everything else below is standard
using (var ms = new MemoryStream()) {
using (var document = new Document()) {
using (var writer = new PdfCopy(document, ms)) {
document.Open();
foreach (var file in Files) {
//Add the current page at the previous page number
pages.Add(file.Key, lastPageNumber);
using (var reader = new PdfReader(file.Value)) {
writer.AddDocument(reader);
//Increment our current page index
lastPageNumber += reader.NumberOfPages;
}
}
}
}
mergedBytes = ms.ToArray();
}
This last block actually writes the TOC. If we use a PdfStamper we can create a ColumnText which allows us to use Paragraphs
//Will hold the final PDF
byte[] finalBytes;
using (var ms = new MemoryStream()) {
using (var reader = new PdfReader(mergedBytes)) {
using (var stamper = new PdfStamper(reader, ms)) {
//The page number to insert our TOC into
var tocPageNum = reader.NumberOfPages + 1;
//Arbitrarily pick one page to use as the size of the PDF
//Additional logic could be added or this could just be set to something like PageSize.LETTER
var tocPageSize = reader.GetPageSize(1);
//Arbitrary margin for the page
var tocMargin = 20;
//Create our new page
stamper.InsertPage(tocPageNum, tocPageSize);
//Create a ColumnText object so that we can use abstractions like Paragraph
var ct = new ColumnText(stamper.GetOverContent(tocPageNum));
//Set the working area
ct.SetSimpleColumn(tocPageSize.GetLeft(tocMargin), tocPageSize.GetBottom(tocMargin), tocPageSize.GetRight(tocMargin), tocPageSize.GetTop(tocMargin));
//Loop through each page
foreach (var page in pages) {
var link = new Chunk(page.Key);
var action = PdfAction.GotoLocalPage(page.Value, new PdfDestination(PdfDestination.FIT), stamper.Writer);
link.SetAction(action);
ct.AddElement(new Paragraph(link));
}
ct.Go();
}
}
finalBytes = ms.ToArray();
}
We are using itextsharp to create a single PDF from multiple PDF files. How do I insert a new page into a PDF file that has multiple pages already in the file? When I use add page it is overwriting the existing pages and only saves the 1 page that was selected.
Here is the code that I am using to add the page to the existing PDF:
PdfReader reader = new PdfReader(sourcePdfPath);
Document document = new Document(reader.GetPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(document, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
MemoryStream memoryStream = new MemoryStream();
PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
document.AddDocListener(writer);
document.Open();
for (int p = 1; p <= reader.NumberOfPages; p++)
{
if (pagesToExtract.FindIndex(s => s == p) == -1) continue;
document.SetPageSize(reader.GetPageSize(p));
document.NewPage();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage pageImport = writer.GetImportedPage(reader, p);
int rot = reader.GetPageRotation(p);
if (rot == 90 || rot == 270)
{
cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
}
else
{
cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
}
pdfCopy.AddPage(pageImport);
}
pdfCopy.Close();
This code works. You need to have a different file to output the results.
private static void AppendToDocument(string sourcePdfPath1, string sourcePdfPath2, string outputPdfPath)
{
using (var sourceDocumentStream1 = new FileStream(sourcePdfPath1, FileMode.Open))
{
using (var sourceDocumentStream2 = new FileStream(sourcePdfPath2, FileMode.Open))
{
using (var destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create))
{
var pdfConcat = new PdfConcatenate(destinationDocumentStream);
var pdfReader = new PdfReader(sourceDocumentStream1);
var pages = new List<int>();
for (int i = 0; i < pdfReader.NumberOfPages; i++)
{
pages.Add(i);
}
pdfReader.SelectPages(pages);
pdfConcat.AddPages(pdfReader);
pdfReader = new PdfReader(sourceDocumentStream2);
pages = new List<int>();
for (int i = 0; i < pdfReader.NumberOfPages; i++)
{
pages.Add(i);
}
pdfReader.SelectPages(pages);
pdfConcat.AddPages(pdfReader);
pdfReader.Close();
pdfConcat.Close();
}
}
}
}
I've tried this code, and it works for me, but don't forget to do some validations of the number of pages and existence of the paths you use
here is the code:
private static void AppendToDocument(string sourcePdfPath, string outputPdfPath, List<int> neededPages)
{
var sourceDocumentStream = new FileStream(sourcePdfPath, FileMode.Open);
var destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create);
var pdfConcat = new PdfConcatenate(destinationDocumentStream);
var pdfReader = new PdfReader(sourceDocumentStream);
pdfReader.SelectPages(neededPages);
pdfConcat.AddPages(pdfReader);
pdfReader.Close();
pdfConcat.Close();
}
You could use something like this, where src is the IEnumerable<string> of input pdf filenames. Just make sure that your existing pdf file is one of those sources.
The PdfConcatenate class is in the latest iTextSharp release.
var result = "combined.pdf";
var fs = new FileStream(result, FileMode.Create);
var conc = new PdfConcatenate(fs, true);
foreach(var s in src) {
var r = new PdfReader(s);
conc.AddPages(r);
}
conc.Close();
PdfCopy is intended for use with an empty Document. You should add everything you want, one page at a time.
The alternative is to use PdfStamper.InsertPage(pageNum, rectangle) and then draw a PdfImportedPage onto that new page.
Note that PdfImportedPage only includes the page contents, not the annotations or doc-level information ("document structure", doc-level javascripts, etc) that page may have originally used... unless you use one with PdfCopy.
A Stamper would probably be more efficient and use less code, but PdfCopy will import all the page-level info, not just the page's contents.
This might be important, it might not. It depends on what page you're trying to import.
Had to even out the page count with a multiple of 4:
private static void AppendToDocument(string sourcePdfPath)
{
var tempFileLocation = Path.GetTempFileName();
var bytes = File.ReadAllBytes(sourcePdfPath);
using (var reader = new PdfReader(bytes))
{
var numberofPages = reader.NumberOfPages;
var modPages = (numberofPages % 4);
var pages = modPages == 0 ? 0 : 4 - modPages;
if (pages == 0)
return;
using (var fileStream = new FileStream(tempFileLocation, FileMode.Create, FileAccess.Write))
{
using (var stamper = new PdfStamper(reader, fileStream))
{
var rectangle = reader.GetPageSize(1);
for (var i = 1; i <= pages; i++)
stamper.InsertPage(numberofPages + i, rectangle);
}
}
}
File.Delete(sourcePdfPath);
File.Move(tempFileLocation, sourcePdfPath);
}
I know I'm really late to the part here, but I mixed a bit of the two best answers and created a method if anyone needs it that adds a list of source PDF documents to a single document using itextsharp.
private static void appendToDocument(List<string> sourcePDFList, string outputPdfPath)
{
//Output document name and creation
FileStream destinationDocumentStream = new FileStream(outputPdfPath, FileMode.Create);
//Object to concat source pdf's to output pdf
PdfConcatenate pdfConcat = new PdfConcatenate(destinationDocumentStream);
//For each source pdf in list...
foreach (string sourcePdfPath in sourcePDFList)
{
//If file exists...
if (File.Exists(sourcePdfPath))
{
//Open the document
FileStream sourceDocumentStream = new FileStream(sourcePdfPath, FileMode.Open);
//Read the document
PdfReader pdfReader = new PdfReader(sourceDocumentStream);
//Create an int list
List<int> pages = new List<int>();
//for each page in pdfReader
for (int i = 1; i < pdfReader.NumberOfPages + 1; i++)
{
//Add that page to the list
pages.Add(i);
}
//Add that page to the pages to add to ouput document
pdfReader.SelectPages(pages);
//Add pages to output page
pdfConcat.AddPages(pdfReader);
//Close reader
pdfReader.Close();
}
}
//Close pdfconcat
pdfConcat.Close();
}