Compression of Split PDF Files in C#

How can I compress sliced PDF documents in C#?
I have a PDF document that I am slicing into single pages. If the original PDF is 10 MB, the total size after slicing increases to 15 MB, so I need to compress the sliced documents. Is there any way to compress them? Please help.
public int ExtractPages(string sourcePdfPath, string DestinationFolder)
{
int p = 0, initialcount = 0;
try
{
iTextSharp.text.Document document;
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdfPath), new ASCIIEncoding().GetBytes(""));
if (!Directory.Exists(DestinationFolder))
{
Directory.CreateDirectory(DestinationFolder);
}
else
{
DirectoryInfo di = new DirectoryInfo(DestinationFolder);
initialcount = di.GetFiles("*.pdf", SearchOption.AllDirectories).Length;
}
for (p = 1; p <= reader.NumberOfPages; p++)
{
using (MemoryStream memoryStream = new MemoryStream())
{
document = new iTextSharp.text.Document();
iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, memoryStream);
writer.SetPdfVersion(iTextSharp.text.pdf.PdfWriter.PDF_VERSION_1_2);
writer.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
writer.SetFullCompression();
document.SetPageSize(reader.GetPageSize(p));
document.NewPage();
document.Open();
document.AddDocListener(writer);
iTextSharp.text.pdf.PdfContentByte cb = writer.DirectContent;
iTextSharp.text.pdf.PdfImportedPage pageImport = writer.GetImportedPage(reader, p);
int rot = reader.GetPageRotation(p);
if (rot == 90 || rot == 270)
{
cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
}
else
{
cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
}
document.Close();
document.Dispose();
File.WriteAllBytes(DestinationFolder + "/" + p + ".pdf", memoryStream.ToArray());
}
}
reader.Close();
reader.Dispose();
}
catch
{
}
finally
{
GC.Collect();
}
if (initialcount > (p - 1))
{
for (int k = (p - 1) + 1; k <= initialcount; k++)
{
try
{
File.Delete(DestinationFolder + "/" + k + ".pdf");
}
catch
{
}
}
}
return p - 1;
}

First of all you should not use PdfWriter with GetImportedPage and its direct content with AddTemplate for a task like that at hand. Instead have a look at the Webified iTextSharp Examples of iText in Action — 2nd Edition.
There you'll find the sample Burst.cs with the central code
PdfReader reader = new PdfReader(pdf);
// loop over all the pages in the original PDF
int n = reader.NumberOfPages;
for (int i = 0; i < n; i++)
{
using (MemoryStream ms = new MemoryStream())
{
// We'll create as many new PDFs as there are pages
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
copy.AddPage(copy.GetImportedPage(reader, i + 1));
}
}
// store ms.ToArray() somewhere
}
}
(I removed some ZIP file packing those webified samples use.)
As you see, no need anymore to deal with page rotations or anything.
Now, all this being said, the sum of the sizes of the individual files will very likely be larger than the size of the original file. After all, in the original file resources could be shared. E.g., a font used on all pages only needed to be embedded once, while in the split documents the font has to be embedded in each individual document with a page on which that font is used.
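To make the size argument concrete, here is a rough back-of-the-envelope sketch in plain Java. It does not use iText at all; the class name, the method names and the byte counts are invented for illustration, and it ignores compression and per-file overhead. It only shows the arithmetic: a shared resource is counted once in the original file but once per page that uses it after splitting.

```java
public class SplitSizeEstimate {

    /** Size of the unsplit file: page content plus each shared resource stored once. */
    public static long originalSize(long[] pageContentBytes, long[] resourceBytes) {
        long total = 0;
        for (long page : pageContentBytes) {
            total += page;
        }
        for (long resource : resourceBytes) {
            total += resource;
        }
        return total;
    }

    /**
     * Combined size of all single-page files: page content plus each resource
     * repeated once for every page that uses it.
     * pagesUsingResource[i] is the number of pages that use resource i (assumed input).
     */
    public static long splitTotalSize(long[] pageContentBytes, long[] resourceBytes,
                                      int[] pagesUsingResource) {
        long total = 0;
        for (long page : pageContentBytes) {
            total += page;
        }
        for (int i = 0; i < resourceBytes.length; i++) {
            total += resourceBytes[i] * pagesUsingResource[i];
        }
        return total;
    }
}
```

With ten pages of 100,000 bytes each and one 500,000-byte font used on every page, the model gives 1,500,000 bytes for the original and 6,000,000 bytes for the ten split files combined. Compression settings cannot remove that duplication; they only shrink each copy.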
PS: If keeping meta information is important, you might want to use PdfReader.selectPages and PdfStamper instead. For this I only have Java code:
for (int i = 1; i <= TEST_FILE_PAGES; i++)
{
FileOutputStream fos = new FileOutputStream(String.format("%03d.pdf", i));
PdfReader reader = new PdfReader(TEST_FILE);
reader.selectPages(Collections.singletonList(i));
PdfStamper stamper = new PdfStamper(reader, fos);
stamper.close();
fos.close();
}
This keeps the PDF meta information and, therefore, might be more apropos depending on your requirements. It is much slower, though, as for each page export the PdfReader contents are manipulated and, therefore, have to be re-read for exporting the next page.

Related

Updating existing markup (FreeText Callout) PDF using itext7 .NET

I have the code below to update an existing markup (FreeText Callout) in a PDF using itext7 .NET. The updated content does not display correctly, but once I edit the annotation in Bluebeam the correct content is shown, as in this image:
What am I missing?
public void UpdateMarkupCallout()
{
string inPDF = @"C:\in PDF.pdf";
string outPDF = @"C:\out PDF.pdf";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(inPDF), new PdfWriter(outPDF));
int numberOfPages = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= numberOfPages; i++)
{
PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
PdfArray annotArray = page.GetAsArray(PdfName.Annots);
if (annotArray == null)
{
continue;
}
int size = annotArray.Size();
for (int x = 0; x < size; x++)
{
PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
if (curAnnot.GetAsString(PdfName.Contents) != null)
{
string contents = curAnnot.GetAsString(PdfName.Contents).ToString();
if (contents != "" && contents.Contains("old content"))
{
curAnnot.Put(PdfName.Contents, new PdfString("new content"));
}
}
}
}
pdfDoc.Close();
}

The answer is in Java but conversion to C# should be a matter of some easy letter case replacements and small tweaks.
Unfortunately, there is no silver bullet solution here, at least not without significant effort.
1. Partial proper solution
There are several issues here. First, you are only updating the /Contents key, while the annotations you are editing also have an /RC key, which ISO 32000 describes as "a rich text string (see Adobe XML Architecture, XML Forms Architecture (XFA) Specification, version 3.3) that shall be used to generate the appearance of the annotation".
On top of that, the appearance (/AP entry) must be regenerated, as dictated by the specification. iText is not capable of doing this at the moment, so you will have to do it yourself.
You need to determine the area where the text must be drawn, taking /RD, or rect diff entry into account.
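As a small illustration of the rect diff arithmetic, here is a sketch in plain Java with no iText types. The class and method names are made up for this example. It assumes /Rect is given as {llx, lly, urx, ury} and /RD as {left, top, right, bottom}, and shrinks the annotation rectangle to the inner rectangle where the text should be drawn.

```java
public class RectDiff {

    /**
     * Computes the inner text rectangle of an annotation.
     *
     * @param rect the /Rect entry as {llx, lly, urx, ury}
     * @param rd   the /RD entry as {left, top, right, bottom}, or null if absent
     * @return the inner rectangle as {llx, lly, urx, ury}
     */
    public static double[] innerRect(double[] rect, double[] rd) {
        if (rd == null) {
            // No rect diff: the text area is the whole annotation rectangle.
            return rect.clone();
        }
        return new double[] {
            rect[0] + rd[0], // move the left edge right by the "left" difference
            rect[1] + rd[3], // move the bottom edge up by the "bottom" difference
            rect[2] - rd[2], // move the right edge left by the "right" difference
            rect[3] - rd[1]  // move the top edge down by the "top" difference
        };
    }
}
```

For example, a /Rect of {100, 200, 300, 400} with an /RD of {10, 20, 30, 40} gives an inner rectangle of {110, 240, 270, 380}.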
To create your appearance you can use pdfHTML add-on which would process the rich text representation from /RC into layout elements that you can transfer to an XObject that you can put into /AP.
With the code similar to the following:
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
PdfArray annotArray = page.getAsArray(PdfName.Annots);
if (annotArray == null) {
continue;
}
int size = annotArray.size();
for (int x = 0; x < size; x++) {
PdfDictionary curAnnot = annotArray.getAsDictionary(x);
if (curAnnot.getAsString(PdfName.Contents) != null) {
String contents = curAnnot.getAsString(PdfName.Contents).toString();
if (!contents.isEmpty() && contents.contains("old content")) //set layer for a FreeText with this content
{
curAnnot.put(PdfName.Contents, new PdfString("new content"));
String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
Document document = Jsoup.parse(richText);
for (Element element : document.select("p")) {
element.html("new content");
}
curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
Rectangle bbox = curAnnot.getAsRectangle(PdfName.Rect);
Rectangle textBbox = bbox.clone();
// left, top, right, bottom
PdfArray rectDiff = curAnnot.getAsArray(PdfName.RD);
if (rectDiff != null) {
textBbox.applyMargins(rectDiff.getAsNumber(1).floatValue(),
rectDiff.getAsNumber(2).floatValue(),
rectDiff.getAsNumber(3).floatValue(),
rectDiff.getAsNumber(0).floatValue(), false);
}
float leftRectDiff = rectDiff != null ? rectDiff.getAsNumber(0).floatValue() : 0;
float topRectDiff = rectDiff != null ? rectDiff.getAsNumber(1).floatValue() : 0;
List<IElement> elements = HtmlConverter.convertToElements(document.body().outerHtml());
PdfFormXObject appearance = new PdfFormXObject(
new Rectangle(0, 0, bbox.getWidth(), bbox.getHeight()));
Canvas canvas = new Canvas(new PdfCanvas(appearance, pdfDocument),
new Rectangle(leftRectDiff, topRectDiff, textBbox.getWidth(), textBbox.getHeight()));
canvas.setProperty(Property.RENDERING_MODE, RenderingMode.HTML_MODE);
for (IElement ele : elements) {
if (ele instanceof IBlockElement) {
canvas.add((IBlockElement) ele);
}
}
curAnnot.getAsDictionary(PdfName.AP).put(PdfName.N, appearance.getPdfObject());
}
}
}
}
pdfDocument.close();
You would get the result that looks like that:
You can see that the new text is displayed as expected, but the overall visual representation is far from our expectations - the background filling, the borders and the arrows are missing. So to generate the appearance properly you would have to further explore other PDF properties such as /CL (arrow descriptors), /BS (border style), /C (background color) etc. This takes quite some time - reading up on the spec, parsing the relevant entries and applying those in your drawing operations. You can get some inspiration from PdfFormField class implementation.
2. Easy solution without any guarantees
In case you expect the text in your annotation to consist of only one line, be plain Latin text and in general the variability of the input documents is small, you can take the current appearance and assume that the text string will be written there in one chunk (it's the case for your input document).
Note that this is a hacky approach which is prone to many potential errors/bugs.
Sample code:
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
PdfArray annotArray = page.getAsArray(PdfName.Annots);
if (annotArray == null) {
continue;
}
int size = annotArray.size();
for (int x = 0; x < size; x++) {
PdfDictionary curAnnot = annotArray.getAsDictionary(x);
if (curAnnot.getAsString(PdfName.Contents) != null) {
String contents = curAnnot.getAsString(PdfName.Contents).toString();
String oldContent = "old content";
if (!contents.isEmpty() && contents.contains(oldContent)) {
String newContent = "new content";
curAnnot.put(PdfName.Contents, new PdfString(newContent));
String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
Document document = Jsoup.parse(richText);
for (Element element : document.select("p")) {
element.html(newContent);
}
curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
PdfStream currentAppearance = curAnnot.getAsDictionary(PdfName.AP).getAsStream(PdfName.N);
String currentBytes = new String(currentAppearance.getBytes(), StandardCharsets.UTF_8);
currentBytes = currentBytes.replace("(" + oldContent + ") Tj", "(" + newContent + ") Tj");
currentAppearance.setData(currentBytes.getBytes(StandardCharsets.UTF_8));
}
}
}
}
pdfDocument.close();
Visual result (as you can see, this is what we want):
3. Non-compliant solution
Another way, which is not compliant with the PDF specification, is to remove the /AP entry altogether. You can do it in the very same loop with curAnnot.remove(PdfName.AP);. Most major PDF viewers will regenerate the appearance themselves. However, my viewer generated the appearance in a rather unappealing way:
So, as you can see, the result depends on the PDF viewer, which illustrates very well why the PDF specification mandates the presence of /AP. Once again, this approach is not compliant with the PDF spec.

Convert Pdf file pages to Images with itextsharp

I want to convert PDF pages into images using the iTextSharp library.
Does anyone have an idea how to convert each page into an image file?

iText/iTextSharp can generate and/or modify existing PDFs but they do not perform any rendering which is what you are looking for. I would recommend checking out Ghostscript or some other library that knows how to actually render a PDF.

You can use ImageMagick to convert a PDF to an image:
convert -density 300 "d:\1.pdf" -scale @1500000 "d:\a.jpg"
To split the PDF you can use iTextSharp.
Here is some code taken from someone else:
void SplitePDF(string filepath)
{
iTextSharp.text.pdf.PdfReader reader = null;
int currentPage = 1;
int pageCount = 0;
//string filepath_New = filepath + "\\PDFDestination\\";
System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
//byte[] arrayofPassword = encoding.GetBytes(ExistingFilePassword);
reader = new iTextSharp.text.pdf.PdfReader(filepath);
reader.RemoveUnusedObjects();
pageCount = reader.NumberOfPages;
string ext = System.IO.Path.GetExtension(filepath);
for (int i = 1; i <= pageCount; i++)
{
iTextSharp.text.pdf.PdfReader reader1 = new iTextSharp.text.pdf.PdfReader(filepath);
string outfile = filepath.Replace((System.IO.Path.GetFileName(filepath)), (System.IO.Path.GetFileName(filepath).Replace(".pdf", "") + "_" + i.ToString()) + ext);
reader1.RemoveUnusedObjects();
iTextSharp.text.Document doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));
iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
doc.Open();
for (int j = 1; j <= 1; j++)
{
iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader1, currentPage);
pdfCpy.SetFullCompression();
pdfCpy.AddPage(page);
currentPage += 1;
}
doc.Close();
pdfCpy.Close();
reader1.Close();
reader.Close();
}
}

You can use Ghostscript to convert PDF files into images. I used the following parameters to convert a PDF into a multi-frame TIFF image:
gswin32c.exe -sDEVICE=tiff12nc -dBATCH -r200 -dNOPAUSE -sOutputFile=[Output].tiff [PDF FileName]
You can also use the -q parameter for silent mode.
More information about its output devices is available in the Ghostscript documentation.
After that I can easily load the TIFF frames as follows:
using (FileStream stream = new FileStream(@"C:\tEMP\image_$i.tiff", FileMode.Open, FileAccess.Read, FileShare.Read))
{
BitmapDecoder dec = BitmapDecoder.Create(stream, BitmapCreateOptions.IgnoreImageCache, BitmapCacheOption.None);
BitmapEncoder enc = BitmapEncoder.Create(dec.CodecInfo.ContainerFormat);
enc.Frames.Add(dec.Frames[frameIndex]);
}

I did it with MuPDFCore NuGet. Here is the link to guide I used : https://giorgiobianchini.com/MuPDFCore/MuPDFCore.pdf
using System;
using System.Threading.Tasks;
using MuPDFCore;
using VectSharp.Raster;
MuPDFContext context = new MuPDFContext();
MuPDFDocument document = new MuPDFDocument(context, @"C:\install\test.pdf");
//Renderers: one per page
MuPDFMultiThreadedPageRenderer[] renderers = new MuPDFMultiThreadedPageRenderer[document.Pages.Count];
//Page size: one per page
RoundedSize[] renderedPageSizes = new RoundedSize[document.Pages.Count];
//Boundaries of the tiles that make up each page: one array per page, with one element per thread
RoundedRectangle[][] tileBounds = new RoundedRectangle[document.Pages.Count][];
//Addresses of the memory areas where the image data of the tiles will be stored: one array per page, with one element per thread
IntPtr[][] destinations = new IntPtr[document.Pages.Count][];
//Cycle through the pages in the document to initialise everything
for (int i = 0; i < document.Pages.Count; i++)
{
//Initialise the renderer for the current page, using two threads (total number of threads: number of pages x 2)
renderers[i] = document.GetMultiThreadedRenderer(i, 2);
//Determine the boundaries of the page when it is rendered with a 2x zoom factor
RoundedRectangle roundedBounds = document.Pages[i].Bounds.Round(2); //zoom factor controls quality; 0.5, 1 etc. also work
renderedPageSizes[i] = new RoundedSize(roundedBounds.Width, roundedBounds.Height);
//Determine the boundaries of each tile by splitting the total size of the page by the number of threads.
tileBounds[i] = renderedPageSizes[i].Split(renderers[i].ThreadCount);
destinations[i] = new IntPtr[renderers[i].ThreadCount];
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
//Allocate the required memory for the j-th tile of the i-th page.
//Since we will be rendering with a 24-bit-per-pixel format, the required memory in bytes is height x width x 3.
destinations[i][j] = System.Runtime.InteropServices.Marshal.AllocHGlobal(tileBounds[i][j].Height * tileBounds[i][j].Width * 3);
}
}
//Start the actual rendering operations in parallel.
Parallel.For(0, document.Pages.Count, i =>
{
renderers[i].Render(renderedPageSizes[i], document.Pages[i].Bounds, destinations[i], PixelFormats.RGB);
});
//The code in this for-loop is not really part of MuPDFCore - it just shows an example of using VectSharp to "stitch" the tiles up and produce the full image.
for (int i = 0; i < document.Pages.Count; i++)
{
//Create a new (empty) image to hold the whole page.
VectSharp.Page renderedPage = new VectSharp.Page(renderedPageSizes[i].Width,
renderedPageSizes[i].Height);
//Draw each tile onto the image.
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
//Create a raster image object containing the pixel data. Yay, we do not need to copy/marshal anything!
VectSharp.RasterImage tile = new VectSharp.RasterImage(destinations[i][j], tileBounds[i][j].Width,
tileBounds[i][j].Height, false, false);
//Draw the tile on the main image page.
renderedPage.Graphics.DrawRasterImage(tileBounds[i][j].X0, tileBounds[i][j].Y0, tile);
}
//Save the full page as a PNG image.
renderedPage.SaveAsPNG(@"C:\install\page" + i.ToString() + ".png");
}
//Clean-up code.
for (int i = 0; i < document.Pages.Count; i++)
{
//Release the allocated memory.
for (int j = 0; j < renderers[i].ThreadCount; j++)
{
System.Runtime.InteropServices.Marshal.FreeHGlobal(destinations[i][j]);
}
//Release the renderer (if you skip this, the quiescent renderer’s threads will not be stopped, and your application will never exit!)
renderers[i].Dispose();
}
document.Dispose();
context.Dispose();

You can extract the images from the PDF and save them as JPG.
Here is some sample code; you need iTextSharp.
public IEnumerable<System.Drawing.Image> ExtractImagesFromPDF(string sourcePdf)
{
// NOTE: This will only get the first image it finds per page.
var pdf = new PdfReader(sourcePdf);
var raf = new RandomAccessFileOrArray(sourcePdf);
try
{
for (int pageNum = 1; pageNum <= pdf.NumberOfPages; pageNum++)
{
PdfDictionary pg = pdf.GetPageN(pageNum);
// recursively search pages, forms and groups for images.
PdfObject obj = ExtractImagesFromPDF_FindImageInPDFDictionary(pg);
if (obj != null)
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
PdfImageObject pdfImage = new PdfImageObject((PRStream)pdfStrem);
System.Drawing.Image img = pdfImage.GetDrawingImage();
yield return img;
}
}
}
finally
{
pdf.Close();
raf.Close();
}
}

Using itextsharp to split a pdf into smaller pdf's based on size

So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. I.e., if the max size is 10 MB, an 8 MB file would be skipped, while a 16 MB file would be split based on the number of pages.
This is code that I inherited, and I feel there has got to be a more efficient way to do this that requires only one method and less instantiation of objects.
We use the following code to call the methods:
List<int> splitPoints = null;
List<byte[]> documents = null;
splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
documents = this.SplitPDF(currentDocument, maxSize, splitPoints);
Methods:
private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
{
List<int> splitPoints = new List<int>();
PdfReader reader = null;
Document document = null;
int pagesRemaining = currentDocument.Pages;
while (pagesRemaining > 0)
{
reader = new PdfReader(currentDocument.Data);
document = new Document(reader.GetPageSizeWithRotation(1));
using (MemoryStream ms = new MemoryStream())
{
PdfCopy copy = new PdfCopy(document, ms);
PdfImportedPage page = null;
document.Open();
//Add pages until we run out from the original
for (int i = 0; i < currentDocument.Pages; i++)
{
int currentPage = currentDocument.Pages - (pagesRemaining - 1);
if (pagesRemaining == 0)
{
//The whole document has been traversed
break;
}
page = copy.GetImportedPage(reader, currentPage);
copy.AddPage(page);
//If the current collection of pages exceeds the maximum size, we save off the index and start again
if (copy.CurrentDocumentSize > maxSize)
{
if (i == 0)
{
//One page is greater than the maximum size
throw new Exception("one page is greater than the maximum size and cannot be processed");
}
//We have gone one page too far, save this split index
splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
break;
}
else
{
pagesRemaining--;
}
}
page = null;
document.Close();
document.Dispose();
copy.Close();
copy.Dispose();
copy = null;
}
}
if (reader != null)
{
reader.Close();
reader = null;
}
document = null;
return splitPoints;
}
private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
{
var documents = new List<byte[]>();
PdfReader reader = null;
Document document = null;
MemoryStream fs = null;
int pagesRemaining = currentDocument.Pages;
while (pagesRemaining > 0)
{
reader = new PdfReader(currentDocument.Data);
document = new Document(reader.GetPageSizeWithRotation(1));
fs = new MemoryStream();
PdfCopy copy = new PdfCopy(document, fs);
PdfImportedPage page = null;
document.Open();
//Add pages until we run out from the original
for (int i = 0; i <= currentDocument.Pages; i++)
{
int currentPage = currentDocument.Pages - (pagesRemaining - 1);
if (pagesRemaining == 0)
{
//We have traversed all pages
//The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
fs.Flush();
copy.Close();
documents.Add(fs.ToArray());
document.Close();
fs.Dispose();
break;
}
page = copy.GetImportedPage(reader, currentPage);
copy.AddPage(page);
pagesRemaining--;
if (splitPoints.Contains(currentPage + 1) == true)
{
//Need to start a new document
//The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
fs.Flush();
copy.Close();
documents.Add(fs.ToArray());
document.Close();
fs.Dispose();
break;
}
}
copy = null;
page = null;
fs.Dispose();
}
if (reader != null)
{
reader.Close();
reader = null;
}
if (document != null)
{
document.Close();
document.Dispose();
document = null;
}
if (fs != null)
{
fs.Close();
fs.Dispose();
fs = null;
}
return documents;
}
As far as I can tell, the only code I can find online is VB and doesn't necessarily address the size issue.
UPDATE:
We're experiencing OutOfMemory exceptions and I believe it's an issue with the Large Object Heap. So one thought was to reduce the code footprint, which would possibly reduce the number of large objects on the heap.
Basically this is part of a loop that goes through any number of PDFs, splits them and stores them in the database. Right now, we have had to change the method from doing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal and won't scale well when we ramp up the tool to more clients.
(We're dealing with 50-100 MB PDFs, but they could be larger.)

I also inherited this exact code, and there appears to be a major flaw in it. In the GetPDFSplitPoints method, it's checking the total size of the copied pages against maxsize to determine at which page to split the file.
In the SplitPDF method, when it reaches the page where the split occurs, sure enough the MemoryStream at that point is below the maximum size allowed, and one more page would put it over the limit. But after document.Close(); is executed, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream went from 9 MB to 19 MB before and after the document.Close). My understanding is that all the necessary resources for the copied pages are added upon Close.
I'm guessing I'll have to rewrite this code completely to ensure I don't exceed the max size while retaining the integrity of the original pages.
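One way such a rewrite could be structured is sketched below in plain Java, without any iText calls. The idea is to decide the split points from the size of the finalized chunk rather than from the size of the stream before it is closed. Everything here is an assumption for illustration: the class name, the method name, and in particular the finalizedSize callback, which the caller would have to implement by actually writing pages first..last to a memory stream, closing the copy and returning the resulting length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongBiFunction;

public class SizeBasedSplit {

    /**
     * Greedily groups pages 1..pageCount into consecutive chunks whose finalized
     * size stays within maxSize.
     *
     * @param finalizedSize hypothetical callback: (firstPage, lastPage) -> size in bytes
     *                      of the chunk AFTER it has been closed
     * @return list of {firstPage, lastPage} pairs, 1-based and inclusive
     */
    public static List<int[]> planChunks(int pageCount, long maxSize,
                                         ToLongBiFunction<Integer, Integer> finalizedSize) {
        List<int[]> chunks = new ArrayList<>();
        int first = 1;
        while (first <= pageCount) {
            if (finalizedSize.applyAsLong(first, first) > maxSize) {
                throw new IllegalArgumentException(
                        "page " + first + " alone exceeds the maximum size");
            }
            int last = first;
            // Extend the chunk one page at a time while the closed result still fits.
            while (last < pageCount
                    && finalizedSize.applyAsLong(first, last + 1) <= maxSize) {
                last++;
            }
            chunks.add(new int[] {first, last});
            first = last + 1;
        }
        return chunks;
    }
}
```

The trade-off is cost: every probe writes a candidate chunk again, so this does more work than the inherited code, and with 50-100 MB inputs each probe should be disposed of promptly. Growing the chunk several pages at a time, or using a binary search over the last page, would reduce the number of probes.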

Using iTextSharp to add repeating data to an existing PDF?

I am going to be using iTextSharp to insert data to a PDF that the Graphics department has created. Most of this data is simple data-to-field mapping, but some data is a list of items that needs to be added (e.g. product data; users can have any number of products and the data needs to be displayed for all of them).
Is it possible to do this with iTextSharp? The PDF template cannot, obviously, be created with a certain number of fields as there is no way of knowing how many fields there will be - it could be 1, or 10, or even 100; what I need to be able to do is "re-use" a section of the PDF and repeat that section for each item within a loop.
Is that doable?

In the past I needed to do something similar. I needed to create a PDF with an unknown number of images plus content. In my case an 'Entry' was defined by an image and a set of fields.
What I did was use a document that served as an 'Entry' template. I then generated a temporary PDF file for each 'Entry' and stored the generated file names in a List.
After all 'Entries' were processed, I merged all the temporary PDF documents into one final document.
Here is some code to give you a better idea (it's not compilable and just serves as a reference, as I took certain parts from an older project of mine).
List<string> files = new List<string>(); // list of files to merge
foreach (string pageId in pages)
{
// create an intermediate page
string intermediatePdf = Path.Combine(_tempPath, System.Guid.NewGuid() + ".pdf");
files.Add(intermediatePdf);
string pdfTemplate = Path.Combine(_templatePath, _template);
CreatePage(pdfTemplate, intermediatePdf, pc, pageValues, imageMap, tmd);
}
// merge into resulting pdf file
string outputFolder = "~/Output/";
if (preview)
{
outputFolder = "~/temp/";
}
string pdfResult = Path.Combine(HttpContext.Current.Server.MapPath(outputFolder), Guid.NewGuid().ToString() + ".pdf");
PdfMerge.MergeFiles(pdfResult, files);
//////////////////////////////////////////////////////////////////////////
// delete temporary files...
foreach (string fd in files)
{
File.Delete(fd);
}
return pdfResult;
Here is the code to merge the templates:
public class PdfMerge
{
public static void MergeFiles(string destinationFile, List<string> sourceFiles)
{
int f = 0;
// we create a reader for a certain document
PdfReader reader = new PdfReader(sourceFiles[f]);
// we retrieve the total number of pages
int n = reader.NumberOfPages;
// step 1: creation of a document-object
Document document = new Document(reader.GetPageSizeWithRotation(1));
// step 2: we create a writer that listens to the document
PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(destinationFile, FileMode.Create));
// step 3: we open the document
document.Open();
PdfContentByte cb = writer.DirectContent;
PdfImportedPage page;
int rotation;
// step 4: we add content
while (f < sourceFiles.Count)
{
int i = 0;
while (i < n)
{
i++;
document.SetPageSize(reader.GetPageSizeWithRotation(i));
document.NewPage();
page = writer.GetImportedPage(reader, i);
rotation = reader.GetPageRotation(i);
if (rotation == 90 || rotation == 270)
{
cb.AddTemplate(page, 0, -1f, 1f, 0, 0, reader.GetPageSizeWithRotation(i).Height);
}
else
{
cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
}
f++;
if (f < sourceFiles.Count)
{
reader = new PdfReader(sourceFiles[f]);
// we retrieve the total number of pages
n = reader.NumberOfPages;
}
}
// step 5: we close the document
document.Close();
}
}
Hope it helps!

How to add a blank page to a pdf using iTextSharp?

I am trying to do something I thought would be quite simple; however, it is not so straightforward and Google has not helped.
I am using iTextSharp to merge PDF documents (letters) together so they can all be printed at once. If a letter has an odd number of pages I need to append a blank page, so we can print the letters double-sided.
Here is the basic code I have at the moment for merging all of the letters:
// initialise
MemoryStream pdfStreamOut = new MemoryStream();
Document document = null;
MemoryStream pdfStreamIn = null;
PdfReader reader = null;
int numPages = 0;
PdfWriter writer = null;
for (int i = 0; i < letterList.Count; i++)
{
byte[] myLetterData = ...;
pdfStreamIn = new MemoryStream(myLetterData);
reader = new PdfReader(pdfStreamIn);
numPages = reader.NumberOfPages;
// open the streams to use for the iteration
if (i == 0)
{
document = new Document(reader.GetPageSizeWithRotation(1));
writer = PdfWriter.GetInstance(document, pdfStreamOut);
document.Open();
}
PdfContentByte cb = writer.DirectContent;
PdfImportedPage page;
int importedPageNumber = 0;
while (importedPageNumber < numPages)
{
importedPageNumber++;
document.SetPageSize(reader.GetPageSizeWithRotation(importedPageNumber));
document.NewPage();
page = writer.GetImportedPage(reader, importedPageNumber);
cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
}
I have tried using:
document.SetPageSize(reader.GetPageSizeWithRotation(1));
document.NewPage();
at the end of the for loop for an odd number of pages without success.

Well I was almost there. The document won't actually create the page until you put something on it, so as soon as I added an empty table, bam! It worked!
Here is the code that will add a blank page if the document I am merging has an odd number of pages:
if (numPages > 0 && numPages % 2 == 1)
{
bool result = document.NewPage();
document.Add(new Table(1));
}
If this doesn't work in newer versions, try this instead:
document.Add(new Chunk());
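The odd-page rule itself is easy to check in isolation. The sketch below is plain Java with no iText involved, and the class and method names are invented for this example. It computes how many blank pages each letter needs and on which output page each letter then starts, so you can confirm that every letter begins on an odd (front) page.

```java
public class DuplexPadding {

    /** Returns 1 if a letter with this many pages needs a trailing blank page, else 0. */
    public static int blankPagesNeeded(int numPages) {
        return (numPages > 0 && numPages % 2 == 1) ? 1 : 0;
    }

    /** Returns the 1-based output page on which each letter starts after padding. */
    public static int[] firstPageOfEachLetter(int[] letterPageCounts) {
        int[] starts = new int[letterPageCounts.length];
        int next = 1;
        for (int i = 0; i < letterPageCounts.length; i++) {
            starts[i] = next;
            // Advance past the letter's own pages plus its padding page, if any.
            next += letterPageCounts[i] + blankPagesNeeded(letterPageCounts[i]);
        }
        return starts;
    }
}
```

For letters of 3, 2 and 1 pages, the start pages come out as 1, 5 and 7, so each letter begins on the front of a sheet.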

Another alternative that works successfully.
if (numPaginas % 2 != 0)
{
documentoPdfUnico.SetPageSize(leitorPdf.GetPageSizeWithRotation(1));
documentoPdfUnico.NewPage();
conteudoPdf.AddTemplate(PdfTemplate.CreateTemplate(escritorPdf, 2480, 3508), 1f, 0, 0, 1f, 0, 0);
}
