ITextSharp - using PdfStamper resulting MemoryStream to close

I'm using ITextSharp to split multi-page PDF files into single page files. I also managed to add those single page PDFs to a zip file using MemoryStream.
Now I need to add password protection to those PDFs using PdfStamper before adding them to a zip file. But whenever I try this, an ObjectDisposedException ("Cannot access a closed Stream.") is thrown.
Ionic.Zip.ZipFile zipFile = new Ionic.Zip.ZipFile();
int cnt = 0;
try
{
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdfPath), new ASCIIEncoding().GetBytes(""));
for (cnt = 1; cnt <= reader.NumberOfPages; cnt++)
{
using (MemoryStream memoryStream = new MemoryStream())
{
using (iTextSharp.text.Document document = new iTextSharp.text.Document())
{
iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, memoryStream);
using (PdfStamper stamper = new PdfStamper(reader, memoryStream))
{
stamper.SetEncryption(
null,
Encoding.ASCII.GetBytes("password_here"),
PdfWriter.ALLOW_PRINTING,
PdfWriter.ENCRYPTION_AES_128);
}
memoryStreamForZipFile = new MemoryStream(memoryStream.ToArray());
memoryStreamForZipFile.Seek(0, SeekOrigin.Begin);
}
}
}
zipFile.Save(destinationFolder + "/" + fileName.Replace(".pdf", ".zip"));
reader.Close();
reader.Dispose();
}
catch
{
}
finally
{
GC.Collect();
}
return cnt - 1;
I have removed some code above for clarity.
If I remove the PdfStamper "using" block, the code works just fine. I also tried moving the PdfStamper block around to see whether I was using it in the wrong place.
Am I not using the using blocks properly? Or do I have to fix the order of some of the code here?

You removed some lines that are essential, and what remains is wrong; for instance: I assume that you are adding a PdfImportedPage to the PdfContentByte of a PdfWriter. If that's so, you are ignoring all the warnings given in the official documentation.
You should replace your code by something like this:
PdfReader reader = new PdfReader(pathToFile);
int n = reader.NumberOfPages; // read the page count once, before SelectPages changes it
reader.Close();
int cnt;
for (cnt = 1; cnt <= n; cnt++)
{
reader = new PdfReader(pathToFile);
reader.SelectPages(cnt.ToString());
MemoryStream memoryStream = new MemoryStream();
using (PdfStamper stamper = new PdfStamper(reader, memoryStream))
{
stamper.SetEncryption(
null,
Encoding.ASCII.GetBytes("password_here"),
PdfWriter.ALLOW_PRINTING,
PdfWriter.ENCRYPTION_AES_128);
}
reader.Close();
// now do something with the memoryStream.ToArray()
}
As you can see, there is no need to introduce a Document or a PdfWriter object. If you use those classes, you throw away all interactivity that exists in the original pages. You also get into trouble if the page size of the original pages is different from A4.
Note that you can't reuse the PdfReader instance when using PdfStamper. Once you pass a PdfReader instance to a PdfStamper, that instance is tampered with.
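One way to "do something" with each page's bytes, as the comment in the loop above puts it, is to write them into a zip archive. The following is only a sketch: it uses the framework's System.IO.Compression classes instead of the Ionic.Zip library from the question, and the ZipSketch class, the ZipEntries method, and the entry names are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ZipSketch
{
    // Packs each (entry name, content) pair into an in-memory zip archive
    // and returns the archive as a byte array.
    public static byte[] ZipEntries(IDictionary<string, byte[]> files)
    {
        using (var zipStream = new MemoryStream())
        {
            // leaveOpen: true keeps zipStream usable after the archive is disposed;
            // disposing the archive is what writes the zip central directory.
            using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, true))
            {
                foreach (var file in files)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(file.Value, 0, file.Value.Length);
                    }
                }
            }
            return zipStream.ToArray();
        }
    }

    static void Main()
    {
        // Placeholder bytes stand in for memoryStream.ToArray() of each stamped page.
        var pages = new Dictionary<string, byte[]>
        {
            { "page1.pdf", new byte[] { 1, 2, 3 } },
            { "page2.pdf", new byte[] { 4, 5, 6 } }
        };
        byte[] zipped = ZipEntries(pages);
        Console.WriteLine(zipped.Length > 0);
    }
}
```

In the loop above, the dictionary would be filled with memoryStream.ToArray() for each page, and the returned array could then be written to disk with File.WriteAllBytes.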

Related

C# iTextSharp Merge multiple pdf via byte array

I am new to using iTextSharp and working with Pdf files in general, but I think I'm on the right track.
I iterate through a list of pdf files, convert them to bytes, and push all of the resulting bytes into a byte array. From there I pass the byte array to concatAndAddContent() to merge all of the pdf's into a single large pdf. Currently I'm just getting the last pdf in the list (they seem to be overwriting)
public static byte[] concatAndAddContent(List<byte[]> pdfByteContent)
{
byte[] allBytes;
using (MemoryStream ms = new MemoryStream())
{
Document doc = new Document();
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
doc.SetPageSize(PageSize.LETTER);
doc.Open();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage page;
PdfReader reader;
foreach (byte[] p in pdfByteContent)
{
reader = new PdfReader(p);
int pages = reader.NumberOfPages;
// loop over document pages
for (int i = 1; i <= pages; i++)
{
doc.SetPageSize(PageSize.LETTER);
doc.NewPage();
page = writer.GetImportedPage(reader, i);
cb.AddTemplate(page, 0, 0);
}
}
doc.Close();
allBytes = ms.GetBuffer();
ms.Flush();
ms.Dispose();
}
return allBytes;
}
Above is the working code that results in a single pdf being created, while the rest of the files are being ignored. Any suggestions?

This is pretty much just a C# version of Bruno's code here.
This is pretty much the simplest, safest and recommended way to merge PDF files. The PdfSmartCopy object is able to detect redundancies across the multiple files, which can sometimes reduce file size. One of its overloads accepts a full PdfReader object, which can be instantiated however you want.
public static byte[] concatAndAddContent(List<byte[]> pdfByteContent) {
using (var ms = new MemoryStream()) {
using (var doc = new Document()) {
using (var copy = new PdfSmartCopy(doc, ms)) {
doc.Open();
//Loop through each byte array
foreach (var p in pdfByteContent) {
//Create a PdfReader bound to that byte array
using (var reader = new PdfReader(p)) {
//Add the entire document instead of page-by-page
copy.AddDocument(reader);
}
}
doc.Close();
}
}
//Return just before disposing
return ms.ToArray();
}
}
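A side note on the difference between the question's ms.GetBuffer() and the ms.ToArray() used above: GetBuffer() returns the stream's whole internal buffer, which may be larger than the data actually written, while ToArray() returns a copy of exactly the written bytes. The sketch below shows this with MemoryStream alone, no iTextSharp involved; the BufferVsArrayDemo class and the Measure method are hypothetical names for illustration.

```csharp
using System;
using System.IO;

static class BufferVsArrayDemo
{
    // Writes the payload to a fresh MemoryStream and reports the length of the
    // internal buffer next to the length of the exact copy from ToArray().
    public static Tuple<int, int> Measure(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            byte[] buffer = ms.GetBuffer(); // internal buffer, may contain unused trailing bytes
            byte[] exact = ms.ToArray();    // copy of the written bytes only
            return Tuple.Create(buffer.Length, exact.Length);
        }
    }

    static void Main()
    {
        Tuple<int, int> lengths = Measure(new byte[] { 1, 2, 3 });
        Console.WriteLine("GetBuffer length: " + lengths.Item1);
        Console.WriteLine("ToArray length:   " + lengths.Item2);
    }
}
```

If the extra bytes from GetBuffer() end up in a file, they trail the real PDF content, so ToArray() is the safer call when the result is saved or passed on.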

// concatAndAddContent returns one merged PDF as a byte array, which File.WriteAllBytes can write directly
byte[] merged = concatAndAddContent(bytes);
System.IO.File.WriteAllBytes("path", merged);

iTextSharp - Printing Merged PDF Templates

I am using iTextSharp in ASP.NET (C#) to merge more than one PDF template. There is a print feature that prints only the data of the template. While merging the templates, the fields/controls in the template are renamed using the RenameField method of iTextSharp.
This implementation has broken the print feature, because it was written against the original field names.
For merging the templates, I am using PdfCopy.
Document document = new Document();
bool flag = true;
using (FileStream fileStream = File.Create(newFile))
{
PdfSmartCopy copy = new PdfSmartCopy(document, fileStream);
PdfReader reader;
MemoryStream baos;
for (int i = 0; i < loopCount; i++)
{
reader = new PdfReader(pdfTemplate);
baos = new MemoryStream();
stamper = new PdfStamper(reader, baos);
AcroFields pdfDoc = stamper.AcroFields;
BuildData(datarow,pdfDoc, obj)
renameFields(reader);
stamper.FormFlattening = false;
stamper.Close();
reader = new PdfReader(baos.ToArray());
copy.AddPage(copy.GetImportedPage(reader, 1));
}
document.Close();
strFileName = newFile;
}
private static void renameFields(PdfReader pdfReader)
{
string prepend = String.Format("_{0}", counter++);
foreach (KeyValuePair<string, AcroFields.Item> de in pdfReader.AcroFields.Fields)
{
pdfReader.AcroFields.RenameField(de.Key.ToString(), prepend + de.Key.ToString());
}
}
Edit1: This is the solution I found in the iTextSharp documentation, but it's not working:
"Using PdfCopy with documents
that have named destinations is one of these exceptions. All annotations, such
as link annotations, are kept with PdfCopy, but they no longer work for links to local
named destinations. There is a workaround for this problem."
PdfReader[] readers = {
new PdfReader(LinkActions.RESULT2),
new PdfReader(LinkActions.RESULT1) };
Document document = new Document();
PdfCopy copy =
new PdfCopy(document, new FileOutputStream(RESULT1));
document.open();
int n;
for (int i = 0; i < readers.length; i++) {
readers[i].consolidateNamedDestinations();
n = readers[i].getNumberOfPages();
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(readers[i], ++page));
}
}

The forms no longer work because you have forgotten to add a single line: copy.setMergeFields();
See the MergeForms2 example:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
copy.setMergeFields();
document.open();
List<PdfReader> readers = new ArrayList<PdfReader>();
for (int i = 0; i < 3; ) {
PdfReader reader = new PdfReader(renameFields(src, ++i));
readers.add(reader);
copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
reader.close();
}
}
public byte[] renameFields(String src, int i) throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, baos);
AcroFields form = stamper.getAcroFields();
Set<String> keys = new HashSet<String>(form.getFields().keySet());
for (String key : keys) {
form.renameField(key, String.format("%s_%d", key, i));
}
stamper.close();
reader.close();
return baos.toByteArray();
}
It seems that you're also making the assumption that your template consists of only one page:
copy.addPage(copy.getImportedPage(reader, 1));
It is safer to add the document all at once:
copy.addDocument(reader);
Important:
My examples are written in Java. You are working with iTextSharp in C#. You will have to adapt the methods by changing the Java-specific methods into C#-specific properties or methods.

Returning memorystream - gives corrupt PDF file or "cannot access a closed stream"

I have a web service, which calls the following method. I want to return a memorystream, which is a PDF file.
Now, the problem is the PDF file is corrupt with the following code. I think it's because the files are not being closed. However, if I close them, I get the classic error "Cannot access a closed stream".
When I previously saved it through a filestream, the PDF file wasn't corrupt.
So my humble question is: How to solve it and get back a non-corrupt PDF file? :-)
My code:
public Stream Generate(GiftModel model)
{
var template = HttpContext.Current.Server.MapPath(TemplatePath);
// Magic code which creates a new PDF file from the stream of the other
PdfReader reader = new PdfReader(template);
Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
MemoryStream fs = new MemoryStream();
PdfWriter writer = PdfWriter.GetInstance(document, fs);
document.Open();
// Two products on every page
int bookNumber = 0;
int pagesWeNeed = (int)Math.Ceiling(((double)model.Books.Count / (double)2));
for (var i = 0; i < pagesWeNeed; i++)
{
PdfContentByte cb = writer.DirectContent;
// Creates a new page
PdfImportedPage page = writer.GetImportedPage(reader, 1);
cb.AddTemplate(page, 0, 0);
// Add text strings
DrawGreetingMessages(model.FromName, model.ReceiverName, model.GiftMessage, cb);
// Draw the books
DrawBooksOnPage(model.Books.Skip(bookNumber).Take(2).ToList(), cb);
// Draw boring shit
DrawFormalities(true, model.GiftLink, cb);
bookNumber += 2;
}
// Close off our streams because we can
//document.Close();
//writer.Close();
reader.Close();
fs.Position = 0;
return fs;
}

Reuse of streams can be problematic, especially if you are using an abstraction and you don't quite know what it is doing to your stream. Because of this I generally recommend never passing streams themselves around. If you can get by with it, try just passing the raw underlying byte array itself. But if passing streams is a requirement, then I recommend still getting the raw byte array at the end and then wrapping that in a new second stream. Try the code below to see if it works.
public Stream Generate(GiftModel model)
{
//We'll dump our PDF into these when done
Byte[] bytes;
using (var ms = new MemoryStream())
{
using (var doc = new Document())
{
using (var writer = PdfWriter.GetInstance(doc, ms))
{
doc.Open();
doc.Add(new Paragraph("Hello"));
doc.Close();
}
}
//Finalize the contents of the stream into an array
bytes = ms.ToArray();
}
//Return a new stream wrapped around our array
return new MemoryStream(bytes);
}
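Why the byte array approach sidesteps the "closed stream" error can be checked with MemoryStream alone. MemoryStream.ToArray() is documented to work even after the stream has been closed, whereas writing to the closed stream throws ObjectDisposedException. The following is a minimal sketch, in which Dispose() stands in for a library closing the stream on your behalf; the ClosedStreamDemo class and its method names are hypothetical.

```csharp
using System;
using System.IO;

static class ClosedStreamDemo
{
    // Writes the payload, closes the stream, then copies the contents out.
    public static byte[] WriteCloseAndCopy(byte[] payload)
    {
        var ms = new MemoryStream();
        ms.Write(payload, 0, payload.Length);
        ms.Dispose();        // stands in for a library closing the stream
        return ms.ToArray(); // still allowed on a closed MemoryStream
    }

    // Returns true if writing to a closed MemoryStream throws ObjectDisposedException.
    public static bool WriteAfterCloseThrows()
    {
        var ms = new MemoryStream();
        ms.Dispose();
        try
        {
            ms.WriteByte(1);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    static void Main()
    {
        byte[] copy = WriteCloseAndCopy(new byte[] { 1, 2, 3 });
        // A second stream wrapped around the array is open and positioned at 0.
        using (var fresh = new MemoryStream(copy))
        {
            Console.WriteLine(fresh.Length);
            Console.WriteLine(WriteAfterCloseThrows());
        }
    }
}
```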

Remove Javascript from PDF using iTextSharp

This seems like something that should be quick to do, but in practice there seems to be a problem. I have a bunch of PDF forms that include form fields and embedded javascript. I would like to remove the javascript code safely, but leave the PDF form fields intact.
So far I've been able to find lots of solutions, but all the solutions have either eliminated both the javascript and the form fields, or left both intact.
Here's solution A; it copies both form fields and javascript:
var pdfReader = new PdfReader(infilename);
using (MemoryStream memoryStream = new MemoryStream()) {
PdfCopyFields copy = new PdfCopyFields(memoryStream);
copy.AddDocument(pdfReader);
copy.Close();
File.WriteAllBytes(rawfilename, memoryStream.ToArray());
}
Alternately, I have solution B, that strips out both form fields and javascript:
Document document = new Document();
using (MemoryStream memoryStream = new MemoryStream()) {
PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
document.Open();
document.AddDocListener(writer);
for (int p = 1; p <= pdfReader.NumberOfPages; p++) {
document.SetPageSize(pdfReader.GetPageSize(p));
document.NewPage();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage pageImport = writer.GetImportedPage(pdfReader, p);
int rot = pdfReader.GetPageRotation(p);
if (rot == 90 || rot == 270) {
cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, pdfReader.GetPageSizeWithRotation(p).Height);
} else {
cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
}
}
document.Close();
File.WriteAllBytes(rawfile, memoryStream.ToArray());
}
Does anyone know how to modify either solution A or B to eliminate the javascript but leave the form fields in place?
EDIT: Solution code is here!
using (MemoryStream memoryStream = new MemoryStream()) {
PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
for (int i = 0; i <= pdfReader.XrefSize; i++) {
object o = pdfReader.GetPdfObject(i);
PdfDictionary pd = o as PdfDictionary;
if (pd != null) {
pd.Remove(PdfName.AA);
pd.Remove(PdfName.JS);
pd.Remove(PdfName.JAVASCRIPT);
}
}
stamper.Close();
pdfReader.Close();
File.WriteAllBytes(rawfile, memoryStream.ToArray());
}

To manipulate a single PDF you should use the class PdfStamper and manipulate its contents, in your case iterating over the existing form fields and removing the JavaScript entries.
The iTextSharp sample AddJavaScriptToForm.cs corresponding to AddJavaScriptToForm.java from chapter 13 of iText in Action — 2nd Edition shows how JavaScript actions are added to fields, the central code being:
PdfStamper stamper = new PdfStamper(reader, ms);
AcroFields form = stamper.AcroFields;
AcroFields.Item fd = form.GetFieldItem("married");
PdfDictionary dictYes = (PdfDictionary) PdfReader.GetPdfObject(fd.GetWidgetRef(0));
PdfDictionary yesAction = ...;
dictYes.Put(PdfName.AA, yesAction);
Thus, to remove such JavaScript form field actions you have to iterate over all those PDF form fields and remove the /AA values in the associated dictionaries:
dictXXX.Remove(PdfName.AA);
EDIT: (provided by Ted Spence) Here is the final code that successfully removes javascript while leaving all form fields intact:
using (MemoryStream memoryStream = new MemoryStream())
{
PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
for (int i = 0; i <= pdfReader.XrefSize; i++)
{
PdfDictionary pd = pdfReader.GetPdfObject(i) as PdfDictionary;
if (pd != null)
{
pd.Remove(PdfName.AA); // Removes automatic execution objects
pd.Remove(PdfName.JS); // Removes javascript objects
pd.Remove(PdfName.JAVASCRIPT); // Removes other javascript objects
}
}
stamper.Close();
pdfReader.Close();
File.WriteAllBytes(rawfile, memoryStream.ToArray());
}
EDIT: (by mkl) The solution above is somewhat overachieving because it touches each and every indirect dictionary object. On the other hand it ignores inline dictionaries (I haven't checked the spec, though; maybe all /AA, /JS, and /JAVASCRIPT entries appear only in dictionaries which have to be indirect objects, or at least are de-referenced by this code).
If fulfilling this task was my job, I would try and access the objects possibly carrying JavaScript more specifically.
The advantage of this overachieving procedure might be, though, that even PDF objects are inspected which currently are not specified as carrying JavaScript but will be in later PDF versions.

Add the following lines after the for loop in solution B to keep the AcroForm:
var form = pdfReader.AcroForm;
if (form != null)
writer.CopyAcroForm(pdfReader);

How to add images to pdf with template and concat of pages

I'm struggling to insert images on a multi-page PDF.
To create several pages I'm using PdfConcatenate, and it works. I get to add pages of my template perfectly. The problem starts when I try to add images. It just doesn't load them.
Here's the code that works to add images:
string pdfTemplate = @"Tools\template.pdf";
string targetPdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), fileName + ".pdf");
FileStream output = new FileStream(targetPdfPath, FileMode.Create);
PdfConcatenate pdfConcatenate = new PdfConcatenate(output);
PdfReader pdfReader = new PdfReader(pdfTemplate);
MemoryStream memoryStream = getMemoryStream(output);
PdfStamper pdfStamper = new PdfStamper(pdfReader, output);
int cardIndex = 1;
foreach (Registry reg in registries)
{
setFields(reg, pdfStamper, cardIndex);
if (cardIndex == 4)
{
pdfConcatenate.AddPages(pdfReader);
pdfReader = new PdfReader(pdfTemplate);
pdfStamper = new PdfStamper(pdfReader, output);
cardIndex = 1;
}
else
{
cardIndex++;
}
}
//if (cardIndex != 1)
// pdfConcatenate.AddPages(pdfReader);
//make the form no longer editable
pdfStamper.FormFlattening = true;
pdfStamper.Close();
pdfReader.Close();
//pdfConcatenate.Close();
If I use a MemoryStream for the PdfStamper and uncomment these lines:
//if (cardIndex != 1)
// pdfConcatenate.AddPages(pdfReader);
//pdfConcatenate.Close();
I get it to add pages, but without images.
Any idea of what is wrong?
SOLUTION: (Thanks to @mkl)
string pdfTemplate = @"Tools\template.pdf";
string targetPdfPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), fileName + ".pdf");
FileStream output = new FileStream(targetPdfPath, FileMode.Create);
PdfConcatenate pdfConcatenate = new PdfConcatenate(output);
PdfReader pdfReader = new PdfReader(pdfTemplate);
MemoryStream memoryStream = new MemoryStream();
PdfStamper pdfStamper = new PdfStamper(pdfReader, memoryStream);
int cardIndex = 1;
foreach (Registry reg in registries)
{
setFields(reg, pdfStamper, cardIndex);
if (cardIndex == 4)
{
pdfStamper.FormFlattening = true;
pdfStamper.Close();
PdfReader tempReader = new PdfReader(memoryStream.ToArray());
pdfConcatenate.AddPages(tempReader);
memoryStream = new MemoryStream();
pdfReader = new PdfReader(pdfTemplate);
pdfStamper = new PdfStamper(pdfReader, memoryStream);
cardIndex = 1;
}
else
{
cardIndex++;
}
}
if (cardIndex != 1)
{
pdfStamper.FormFlattening = true;
pdfStamper.Close();
PdfReader tempReader = new PdfReader(memoryStream.ToArray());
pdfConcatenate.AddPages(tempReader);
tempReader.Close();
}
pdfStamper.Close();
pdfReader.Close();
pdfConcatenate.Close();

The problem most likely is a misconception about how PdfStamper works. You seem to think it somehow manipulates the data in the PdfReader it stamps, and also pages exported from that reader beforehand. This is not the case: a PdfStamper generates a new PDF file (in its output stream) based on the data in the reader, but the contents of the reader itself are not updated to reflect the changes (the PdfReader object may be touched in the process, though, and not be reusable afterwards). So...
As already mentioned in the comment, you have the PdfConcatenate and an unknown number of PdfStamper instances all writing to the same FileStream output. As each of these objects creates an independent PDF, you are lucky if one of them wins, because then you'll get at least a proper PDF as output. Otherwise you either get some exception or garbage consisting of multiple intermingled PDFs. Thus, make only the PdfConcatenate target your output file.
If your actual intent is to repeatedly fill the template fields with the content of 4 cards each time and combine the results, you should not add the pages from the PdfReader of the template to the PdfConcatenate (the pages in that reader are not filled in!). Instead, have the PdfStamper output to a MemoryStream, fill its fields, flatten its form, close it, open its output in a new PdfReader, and add all the pages in that reader to the PdfConcatenate.
I don't dare to put that into code, as I'm predominantly using Java, and writing down untested C# code would most likely include multiple errors... ;)
PS: Currently you count on all the PdfReader instances you open being implicitly closed somewhere. While that is true currently, recent check-ins in the iText SVN repository seem to indicate that these implicit close calls are being removed from the code. Thus, please also start explicitly closing PdfReader instances you don't use anymore. Otherwise you will soon have to deal with memory leaks due to readers closing much too late.
