PDF modify takes time in C#

I'm using iTextSharp to modify a PDF file and add specific data to it.
Here is my scenario: I have a DataTable with thousands of rows, each representing a customer or client, and a single PDF template.
For each row (client or customer), I have to modify the PDF template to add the client's ID and other data; the resulting file is then stored in the DataTable for that client.
I'm using the code below to do this, but with a large number of rows (4k in my case) it either times out or takes at least 15 minutes, because it means opening and modifying the PDF file 4k times.
// file is the PDF template
// id is the customer's ID (represents one row in the DataTable)
// landingPage is the client-specific page that should be added to the file
private static byte[] GeneratePdfFromPdfFile(byte[] file, int id, string landingPage)
{
    try
    {
        using (var ms = new MemoryStream())
        {
            //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
            var doc = new iTextSharp.text.Document();
            //Create a writer that's bound to our PDF abstraction and our stream
            var writer = PdfWriter.GetInstance(doc, ms);
            //Open the document for writing
            doc.Open();
            PdfContentByte cb = writer.DirectContent;
            ////parse html code to xml
            //iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            PdfReader reader = new PdfReader(file);
            for (int pageNumber = 1; pageNumber < reader.NumberOfPages + 1; pageNumber++)
            {
                doc.SetPageSize(reader.GetPageSizeWithRotation(1));
                doc.NewPage();
                //Insert to Destination on the first page
                PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
                int rotation = reader.GetPageRotation(pageNumber);
                if (rotation == 90 || rotation == 270)
                {
                    cb.AddTemplate(page, 0, -1f, 1f, 0, 0, reader.GetPageSizeWithRotation(pageNumber).Height);
                }
                else
                {
                    cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
                }
            }
            // Add a new page to the PDF file
            doc.NewPage();
            // Set the PDF open action to open the link embedded in the file
            string _embeddedURL = "http://" + landingPage + "/Default.aspx?code=" + id;
            PdfAction act = new PdfAction(_embeddedURL);
            writer.SetOpenAction(act);
            doc.Close();
            return ms.ToArray();
        }
    }
    catch { return null; }
}
Note: I'm using a foreach loop to iterate through the DataTable rows.

Your question (the text) doesn't correspond to your code (the C# part). You are asking one thing (adding content to an existing PDF) and doing another (adding an open action to an existing PDF). But regardless of what you want to do, you shouldn't use PdfWriter to make a shallow copy of every page of an existing PDF (which throws away all interactivity).
You can significantly reduce the lines of code you wrote by using PdfStamper:
PdfReader reader = new PdfReader(file);
MemoryStream ms = new MemoryStream();
PdfStamper stamper = new PdfStamper(reader, ms);
string _embeddedURL = "http://" + landingPage + "/Default.aspx?code=" + id;
PdfAction act = new PdfAction(_embeddedURL);
stamper.Writer.SetOpenAction(act);
stamper.Close();
reader.Close();
return ms.ToArray();
Please read chapter 6 of my book to see why PdfWriter is the wrong class for you. PdfStamper is meant to manipulate an existing PDF document, keeping all its existing features intact.
Also be aware that you're using an old version of iText. The most recent version is iText 7. Please consult the Jump-Start tutorial for more info on using the latest iText version.
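For the 4k-row scenario in the question, the PdfStamper snippet can be wrapped in a helper and called once per DataTable row. The sketch below assumes iTextSharp 5.x; the class name and the column names `id`, `landingPage` and `pdf` are hypothetical and need to be adapted to the real schema. It creates a fresh PdfReader from the template bytes for every row, because iText refuses to stamp a reader that a previous PdfStamper has already consumed.

```csharp
using System.Data;
using System.IO;
using iTextSharp.text.pdf;

public static class OpenActionBatch
{
    // Stamps an open action onto a fresh copy of the template; the template bytes are left untouched.
    public static byte[] AddOpenAction(byte[] template, int id, string landingPage)
    {
        using (var ms = new MemoryStream())
        {
            // A new reader per call: a PdfReader consumed by one PdfStamper can't be stamped again.
            var reader = new PdfReader(template);
            var stamper = new PdfStamper(reader, ms);
            string url = "http://" + landingPage + "/Default.aspx?code=" + id;
            stamper.Writer.SetOpenAction(new PdfAction(url));
            stamper.Close();   // also closes ms, but ToArray() still works on a closed MemoryStream
            reader.Close();
            return ms.ToArray();
        }
    }

    // Hypothetical schema: "id" (int) and "landingPage" (string) in, "pdf" (byte[]) out.
    public static void StampAll(DataTable table, byte[] template)
    {
        if (!table.Columns.Contains("pdf"))
        {
            table.Columns.Add("pdf", typeof(byte[]));
        }
        foreach (DataRow row in table.Rows)
        {
            row["pdf"] = AddOpenAction(template, (int)row["id"], (string)row["landingPage"]);
        }
    }
}
```

This still parses and writes one complete PDF per row, but it no longer rebuilds every page through PdfWriter, so the original pages, including any interactive features, are preserved.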

Related

Duplicating PDF pages creates corrupted file

My program is supposed to take a single-page PDF document and duplicate the page while adding some content from an Excel table.
Editing the first page works, but beyond that it creates a corrupt PDF file.
I use an if/else statement in the foreach loop that iterates through the Excel table (for every row it creates a new page and adds it).
Every page holds two copies of the content: the if/else statement adds one record to the upper half of the page, the next one to the lower half, and so on.
Here is the code (comments translated to English):
// open the reader
PdfReader reader = new PdfReader(btn_izberiPole.Content.ToString());
iTextSharp.text.Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
// open the writer
FileStream fs = new FileStream(btn_izberiCiljnoMapo.Content.ToString(), FileMode.Create, FileAccess.Write); //todo: check whether the file is open
PdfWriter writer = PdfWriter.GetInstance(document, fs);
document.Open();
int count = 1;
var stamper = new PdfStamper(reader, fs);
foreach (string sifra in seznamSifer) //iterates through every number to be added
{
    if (count % 2 != 0) //for the first half of the sheet
    {
        // the pdf content
        PdfContentByte cb = writer.DirectContent;
        PdfImportedPage page = writer.GetImportedPage(reader, 1); //add old page
        cb.AddTemplate(page, 0, 0);
        var pdfContentByte = stamper.GetOverContent(1);
        //the first one is gray
        iTextSharp.text.Image barcode = iTextSharp.text.Image.GetInstance(izdelajCrtnoKodo(System.Drawing.Color.LightGray, sifra), System.Drawing.Imaging.ImageFormat.Jpeg);
        //barcode.ScaleToFit(200f, 15f); //this sets the size of the upper barcodes
        barcode.SetAbsolutePosition(80, 235.9f);
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(91, 9.4f));
        pdfContentByte.AddImage(barcode);
        //second barcode
        barcode = iTextSharp.text.Image.GetInstance(izdelajCrtnoKodo(System.Drawing.Color.White, sifra), System.Drawing.Imaging.ImageFormat.Jpeg);
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(91, 9.4f));
        barcode.SetAbsolutePosition(367, 217.4f);
        pdfContentByte.AddImage(barcode);
        //third barcode
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(79.4f, 8.5f)); //this sets the size of the lower barcode
        barcode.SetAbsolutePosition(381.5f, 172.6f);
        pdfContentByte.AddImage(barcode);
    }
    else //for the second half
    {
        // the pdf content
        PdfContentByte cb = writer.DirectContent;
        PdfImportedPage page = writer.GetImportedPage(reader, 1); //add old page
        cb.AddTemplate(page, 0, 0);
        // var stamper = new PdfStamper(reader, fs);
        var pdfContentByte = stamper.GetOverContent(1);
        //the first one is gray
        iTextSharp.text.Image barcode = iTextSharp.text.Image.GetInstance(izdelajCrtnoKodo(System.Drawing.Color.LightGray, sifra), System.Drawing.Imaging.ImageFormat.Jpeg);
        barcode.SetAbsolutePosition(80, 35.9f); //originally 235.9f
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(91, 9.4f));
        pdfContentByte.AddImage(barcode);
        //second barcode
        barcode = iTextSharp.text.Image.GetInstance(izdelajCrtnoKodo(System.Drawing.Color.White, sifra), System.Drawing.Imaging.ImageFormat.Jpeg);
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(91, 9.4f));
        barcode.SetAbsolutePosition(367, 217.4f);
        pdfContentByte.AddImage(barcode);
        //third barcode
        barcode.ScaleAbsolute(new iTextSharp.text.Rectangle(79.4f, 8.5f)); //this sets the size of the lower barcode
        barcode.SetAbsolutePosition(381.5f, 172.6f);
        pdfContentByte.AddImage(barcode);
        // stamper.Close(); //this has to be used instead of doc.close
        document.NewPage();
    }
    count++;
    /* if (count == 5) */break;
}
stamper.Close(); //this has to be used instead of doc.close
I tried several combinations, such as creating the file stream inside the loop each time, and some other approaches. None has worked so far. Any suggestions?
EDIT: I removed the PdfStamper and used only the PdfWriter for adding content.
One last problem I am encountering: document.Add(image) is adding the images below the main layer. I can select those images in the PDF, but they are not visible. How can I solve this?
LAST EDIT: I solved this by using pdfContentByte.AddImage(barcode) instead of document.Add(barcode).
Thank you for your help, guys! Next time I will definitely read the documentation first.
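The corruption in the original code comes from attaching both a PdfWriter and a PdfStamper to the same FileStream, so two independent PDF files are written into one stream. A minimal sketch of the writer-only approach the asker ended up with, assuming iTextSharp 5.x: the template page is imported once, drawn as the background of each output page, and two codes are placed per page, all on `writer.DirectContent`. It swaps the asker's `izdelajCrtnoKodo` helper (not shown in the question) for iTextSharp's built-in `Barcode128`; the class name is hypothetical and the coordinates are borrowed from the question purely for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class BarcodeSheets
{
    // One output page per two codes: template page as background, one barcode per half.
    public static byte[] Build(byte[] template, IList<string> codes)
    {
        using (var ms = new MemoryStream())
        {
            var reader = new PdfReader(template);
            var document = new Document(reader.GetPageSizeWithRotation(1));
            var writer = PdfWriter.GetInstance(document, ms);
            document.Open();
            PdfImportedPage page = writer.GetImportedPage(reader, 1); // imported once, reused on every page
            PdfContentByte cb = writer.DirectContent;
            for (int i = 0; i < codes.Count; i++)
            {
                bool upperHalf = i % 2 == 0;
                if (upperHalf)
                {
                    if (i > 0) document.NewPage();
                    cb.AddTemplate(page, 0, 0); // background first, so later drawing sits on top
                }
                var barcode = new Barcode128 { Code = codes[i] };
                Image img = barcode.CreateImageWithBarcode(cb, null, null);
                img.SetAbsolutePosition(80, upperHalf ? 235.9f : 35.9f);
                cb.AddImage(img);
            }
            document.Close(); // closes ms; ToArray() still works afterwards
            reader.Close();
            return ms.ToArray();
        }
    }
}
```

Because everything goes through a single writer into a single stream, the output is one well-formed PDF; drawing with `AddImage` on the direct content also places the barcodes above the imported page, which is what the last edit discovered.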

ItextSharp MVC5 C# - text in front of existing pdf

I'm building a web app for editing PDF files using iTextSharp.
When I try to write to the PDF, the text is printed behind the existing content, but I need it printed on top of it.
Can someone explain to me how I can set a depth property for my text?
This is my code:
using (var reader = new PdfReader(oldFile))
{
    using (var fileStream = new FileStream(newFile, FileMode.Create, FileAccess.Write))
    {
        var document = new Document(reader.GetPageSizeWithRotation(1));
        var writer = PdfWriter.GetInstance(document, fileStream);
        document.Open();
        try
        {
            PdfContentByte cb = writer.DirectContent;
            cb.BeginText();
            try
            {
                cb.SetFontAndSize(BaseFont.CreateFont(), 12);
                cb.SetTextMatrix(10, 100);
                cb.ShowText("Customer Name");
            }
            finally
            {
                cb.EndText();
            }
            PdfImportedPage page = writer.GetImportedPage(reader, 1);
            cb.AddTemplate(page, 0, 0);
        }
        finally
        {
            document.Close();
            writer.Close();
            reader.Close();
        }
    }
}

Can someone explain to me how I can set a depth property for my text?
PDF has no explicit depth or z-axis property. Content is painted in the order in which it appears in the page's content stream, so whatever is drawn first is covered by whatever is drawn later.
So if you want the template under your added text, move the code that adds the template before the code that adds the text:
PdfContentByte cb = writer.DirectContent;
PdfImportedPage page = writer.GetImportedPage(reader, 1);
cb.AddTemplate(page, 0, 0);
cb.BeginText();
try
{
    cb.SetFontAndSize(BaseFont.CreateFont(), 12);
    cb.SetTextMatrix(10, 100);
    cb.ShowText("Customer Name");
}
finally
{
    cb.EndText();
}
Alternatively, you can make use of an iTextSharp feature: it actually creates two content streams, the direct content (writer.DirectContent) and the under content (writer.DirectContentUnder), and puts the under content before the direct content.
Thus, if rearranging the code as above is not an option for you, you can add the background to the under content instead of the direct content.
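A minimal sketch of the under-content alternative, assuming iTextSharp 5.x: the text is still written first, but the imported page goes to `writer.DirectContentUnder`, so it ends up behind the text. The class and method names are hypothetical, and the sketch writes to a MemoryStream instead of the question's file.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class UnderContentOverlay
{
    // Copies page 1 of `source` and writes `text` over it, although the text is added first.
    public static byte[] AddTextOverPage(byte[] source, string text)
    {
        using (var ms = new MemoryStream())
        {
            var reader = new PdfReader(source);
            var document = new Document(reader.GetPageSizeWithRotation(1));
            var writer = PdfWriter.GetInstance(document, ms);
            document.Open();

            // Text goes to the direct content...
            PdfContentByte cb = writer.DirectContent;
            cb.BeginText();
            cb.SetFontAndSize(BaseFont.CreateFont(), 12);
            cb.SetTextMatrix(10, 100);
            cb.ShowText(text);
            cb.EndText();

            // ...and the existing page goes to the under content, which is emitted first.
            PdfImportedPage page = writer.GetImportedPage(reader, 1);
            writer.DirectContentUnder.AddTemplate(page, 0, 0);

            document.Close(); // closes ms; ToArray() still works afterwards
            reader.Close();
            return ms.ToArray();
        }
    }
}
```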

C# iTextSharp Merge multiple pdf via byte array

I am new to using iTextSharp and working with Pdf files in general, but I think I'm on the right track.
I iterate through a list of PDF files, convert each one to bytes, and add the resulting byte arrays to a List<byte[]>. From there I pass the list to concatAndAddContent() to merge all of the PDFs into a single large PDF. Currently I'm only getting the last PDF in the list (they seem to be overwriting each other).
public static byte[] concatAndAddContent(List<byte[]> pdfByteContent)
{
    byte[] allBytes;
    using (MemoryStream ms = new MemoryStream())
    {
        Document doc = new Document();
        PdfWriter writer = PdfWriter.GetInstance(doc, ms);
        doc.SetPageSize(PageSize.LETTER);
        doc.Open();
        PdfContentByte cb = writer.DirectContent;
        PdfImportedPage page;
        PdfReader reader;
        foreach (byte[] p in pdfByteContent)
        {
            reader = new PdfReader(p);
            int pages = reader.NumberOfPages;
            // loop over document pages
            for (int i = 1; i <= pages; i++)
            {
                doc.SetPageSize(PageSize.LETTER);
                doc.NewPage();
                page = writer.GetImportedPage(reader, i);
                cb.AddTemplate(page, 0, 0);
            }
        }
        doc.Close();
        allBytes = ms.GetBuffer();
        ms.Flush();
        ms.Dispose();
    }
    return allBytes;
}
The code above runs and creates a single PDF, but the rest of the files are being ignored. Any suggestions?

This is pretty much just a C# version of Bruno's code here.
This is pretty much the simplest, safest and recommended way to merge PDF files. PdfSmartCopy can detect redundant objects across the merged files (identical streams such as fonts or images), which can sometimes reduce the file size. One of the AddDocument overloads accepts a whole PdfReader, which can be instantiated however you want.
public static byte[] concatAndAddContent(List<byte[]> pdfByteContent) {
    using (var ms = new MemoryStream()) {
        using (var doc = new Document()) {
            using (var copy = new PdfSmartCopy(doc, ms)) {
                doc.Open();
                //Loop through each byte array
                foreach (var p in pdfByteContent) {
                    //Create a PdfReader bound to that byte array
                    using (var reader = new PdfReader(p)) {
                        //Add the entire document instead of page-by-page
                        copy.AddDocument(reader);
                    }
                }
                doc.Close();
            }
        }
        //Return just before disposing
        return ms.ToArray();
    }
}

// bytes is the List<byte[]> holding the source PDFs: merge them once, then write the single result
byte[] merged = concatAndAddContent(bytes);
System.IO.File.WriteAllBytes("path", merged);
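The same PdfSmartCopy approach also works when the inputs are files on disk. A minimal sketch, assuming iTextSharp 5.x; the class and method names are hypothetical. It also compares the merged page count with the sum of the inputs, which is a quick way to confirm that no file was overwritten or dropped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class MergeFiles
{
    // Merges the input files into outputPath and returns the merged page count.
    public static int MergeToFile(IEnumerable<string> inputPaths, string outputPath)
    {
        int expectedPages = 0;
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            var doc = new Document();
            var copy = new PdfSmartCopy(doc, output);
            doc.Open();
            foreach (var path in inputPaths)
            {
                var reader = new PdfReader(File.ReadAllBytes(path));
                expectedPages += reader.NumberOfPages;
                copy.AddDocument(reader); // whole document, not page by page
                reader.Close();
            }
            doc.Close(); // also closes the FileStream
        }

        // Re-read the result: every input page should be present exactly once.
        var merged = new PdfReader(outputPath);
        int actualPages = merged.NumberOfPages;
        merged.Close();
        if (actualPages != expectedPages)
        {
            throw new InvalidOperationException("Merged PDF has " + actualPages + " pages, expected " + expectedPages + ".");
        }
        return actualPages;
    }
}
```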

Adding Page Navigation links to PDF document using itextsharp [duplicate]

I have written some code that merges multiple PDFs into a single PDF, which I then display from the MemoryStream. This works great. What I need to do now is add a table of contents to the end of the file, with links to the start of each of the individual PDFs. I planned on doing this with the GotoLocalPage action, which has an option for page numbers, but it doesn't seem to work. If I change the action in the code below to one of the preset ones, like PdfAction.FIRSTPAGE, it works fine. Does this not work because I am using the PdfCopy object for the writer parameter of GotoLocalPage?
Document mergedDoc = new Document();
MemoryStream ms = new MemoryStream();
PdfCopy copy = new PdfCopy(mergedDoc, ms);
mergedDoc.Open();
MemoryStream tocMS = new MemoryStream();
Document tocDoc = null;
PdfWriter tocWriter = null;
for (int i = 0; i < filesToMerge.Length; i++)
{
    string filename = filesToMerge[i];
    PdfReader reader = new PdfReader(filename);
    copy.AddDocument(reader);
    // Initialise TOC document based off first file
    if (i == 0)
    {
        tocDoc = new Document(reader.GetPageSizeWithRotation(1));
        tocWriter = PdfWriter.GetInstance(tocDoc, tocMS);
        tocDoc.Open();
    }
    // Create link for TOC, pointing to an arbitrary page 3 for now
    Chunk link = new Chunk(filename);
    PdfAction action = PdfAction.GotoLocalPage(3, new PdfDestination(PdfDestination.FIT), copy);
    link.SetAction(action);
    tocDoc.Add(new Paragraph(link));
}
// Add TOC to end of merged PDF
tocDoc.Close();
PdfReader tocReader = new PdfReader(tocMS.ToArray());
copy.AddDocument(tocReader);
copy.Close();
displayPDF(ms.ToArray());
I guess an alternative would be to link to a named element (instead of a page number), but I can't see how to add an 'invisible' element to the start of each file before adding it to the merged document.

I would just go with two passes. In the first pass, do the merge as you already do, but also record each file name and the page number it should link to. In the second pass, use a PdfStamper: its over content can back a ColumnText, in which you can use general abstractions like Paragraph. Below is a sample that shows this:
Since I don't have your documents, the code below creates 10 documents with a random number of pages each, just for testing purposes. (You obviously don't need to do this part.) It also creates a simple dictionary with a fake file name as the key and the raw bytes of the PDF as the value. You have a real file collection to work with, but you should be able to adapt that part.
//Create a bunch of files, nothing special here
//files will be a dictionary of names and the raw PDF bytes
Dictionary<string, byte[]> Files = new Dictionary<string, byte[]>();
var r = new Random();
for (var i = 1; i <= 10; i++) {
    using (var ms = new MemoryStream()) {
        using (var doc = new Document()) {
            using (var writer = PdfWriter.GetInstance(doc, ms)) {
                doc.Open();
                //Create a random number of pages (1 to 4); pick the count once
                //so it isn't re-rolled on every loop check
                var pageCount = r.Next(1, 5);
                for (var j = 1; j <= pageCount; j++) {
                    doc.NewPage();
                    doc.Add(new Paragraph(String.Format("Hello from document {0} page {1}", i, j)));
                }
                doc.Close();
            }
        }
        Files.Add("File " + i.ToString(), ms.ToArray());
    }
}
This next block merges the PDFs. It is mostly the same as your code, except that instead of writing a TOC here I'm just keeping track of what I want to write later. Where I'm using file.Value you'd use your full file path, and where I'm using file.Key you'd use your file's name instead.
//Dictionary of file names (for display purposes) and their page numbers
var pages = new Dictionary<string, int>();
//PDFs start at page 1
var lastPageNumber = 1;
//Will hold the final merged PDF bytes
byte[] mergedBytes;
//Most everything else below is standard
using (var ms = new MemoryStream()) {
    using (var document = new Document()) {
        using (var writer = new PdfCopy(document, ms)) {
            document.Open();
            foreach (var file in Files) {
                //Record the page on which this file will start in the merged PDF
                pages.Add(file.Key, lastPageNumber);
                using (var reader = new PdfReader(file.Value)) {
                    writer.AddDocument(reader);
                    //Increment our current page index
                    lastPageNumber += reader.NumberOfPages;
                }
            }
        }
    }
    mergedBytes = ms.ToArray();
}
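The bookkeeping in this first pass is just a running sum: each file starts one page after the last page of the file before it, and the TOC lands one page after the last merged page. A standalone sketch of only that arithmetic (hypothetical names, no iTextSharp needed), which makes it easy to check in isolation; `StartPages` mirrors what `lastPageNumber` does in the loop above, and `TocPageNumber` mirrors `reader.NumberOfPages + 1` in the next block.

```csharp
using System.Collections.Generic;

public static class TocPlanner
{
    // Returns the 1-based page on which each file starts in the merged PDF,
    // given the page count of each file in merge order.
    public static List<int> StartPages(IEnumerable<int> pageCounts)
    {
        var starts = new List<int>();
        int next = 1; // PDF page numbers start at 1
        foreach (int count in pageCounts)
        {
            starts.Add(next);
            next += count;
        }
        return starts;
    }

    // The TOC page is appended right after the last merged page.
    public static int TocPageNumber(IEnumerable<int> pageCounts)
    {
        int total = 0;
        foreach (int count in pageCounts)
        {
            total += count;
        }
        return total + 1;
    }
}
```

For files with 3, 1 and 2 pages, the links point to pages 1, 4 and 5, and the TOC becomes page 7.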
This last block actually writes the TOC. If we use a PdfStamper, we can create a ColumnText, which allows us to use Paragraphs.
//Will hold the final PDF
byte[] finalBytes;
using (var ms = new MemoryStream()) {
    using (var reader = new PdfReader(mergedBytes)) {
        using (var stamper = new PdfStamper(reader, ms)) {
            //The page number to insert our TOC into
            var tocPageNum = reader.NumberOfPages + 1;
            //Arbitrarily pick one page to use as the size of the PDF
            //Additional logic could be added or this could just be set to something like PageSize.LETTER
            var tocPageSize = reader.GetPageSize(1);
            //Arbitrary margin for the page
            var tocMargin = 20;
            //Create our new page
            stamper.InsertPage(tocPageNum, tocPageSize);
            //Create a ColumnText object so that we can use abstractions like Paragraph
            var ct = new ColumnText(stamper.GetOverContent(tocPageNum));
            //Set the working area
            ct.SetSimpleColumn(tocPageSize.GetLeft(tocMargin), tocPageSize.GetBottom(tocMargin), tocPageSize.GetRight(tocMargin), tocPageSize.GetTop(tocMargin));
            //Loop through each recorded file name and its start page
            foreach (var page in pages) {
                var link = new Chunk(page.Key);
                var action = PdfAction.GotoLocalPage(page.Value, new PdfDestination(PdfDestination.FIT), stamper.Writer);
                link.SetAction(action);
                ct.AddElement(new Paragraph(link));
            }
            ct.Go();
        }
    }
    finalBytes = ms.ToArray();
}
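To check that the stamper pass really produced clickable entries, one option is to count the `/Link` annotations on the TOC page, i.e. the last page of `finalBytes`. A minimal sketch, assuming iTextSharp 5.x; the class and method names are hypothetical.

```csharp
using iTextSharp.text.pdf;

public static class TocCheck
{
    // Counts the link annotations on one page (1-based) of a PDF.
    public static int CountLinks(byte[] pdf, int pageNumber)
    {
        var reader = new PdfReader(pdf);
        PdfArray annots = reader.GetPageN(pageNumber).GetAsArray(PdfName.ANNOTS);
        int links = 0;
        if (annots != null)
        {
            for (int i = 0; i < annots.Size; i++)
            {
                // GetAsDict resolves indirect references to the annotation dictionary
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot != null && PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                {
                    links++;
                }
            }
        }
        reader.Close();
        return links;
    }
}
```

Called as `TocCheck.CountLinks(finalBytes, new PdfReader(finalBytes).NumberOfPages)`, the result should match `pages.Count` if every TOC entry became a link.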

How do I append a PDF file from binary to an already 'in-progress' PDF, using iTextSharp?

In code, I am in the process of creating a PDF document using iTextSharp. I have already added content to the document and closed it, and I successfully retrieve it in a response to a web browser.
What I am trying to do is append another PDF document to the one I am creating, but it has to come from binary data, i.e. an object of type Byte[].
I realize there is a document.Add(stuff) method, and I am trying to convert the binary data to an object and then essentially add that to the document in progress. I have seen questions and posts similar to my scenario, but they mostly deal with images.
Here is what I have...
while (sqlExpDocDataReader.Read())
{
    // Read data and fill temp. objects
    string docName = sqlExpDocDataReader["docName"].ToString();
    string docType = sqlExpDocDataReader["docType"].ToString();
    Byte[] docData = (Byte[])sqlExpDocDataReader["docData"];
    // Get current page size
    var pageWidth = document.PageSize.Width;
    var pageHeight = document.PageSize.Height;
    // Is this an image or PDF?
    if (docType.Contains("pdf"))
    {
        // Could I use a memory stream somehow?
        MemoryStream ms = new MemoryStream(docData.ToArray());
    }
    else
    {
        // Here I see how to do it with images.
        Image doc = Image.GetInstance(docData);
        doc.ScaleToFit(pageWidth, pageHeight); // width, height
        document.Add(doc);
    }
}
Any ideas?

With a bit more digging, here is how I was able to resolve my issue...
Basically, I created a MemoryStream object from my binary data and then created a PdfReader to read that object, where normally we would read a file.
I then looped through each page of the reader object (or the file, if you like) and appended the pages as they were found.
if (docType.Contains("pdf"))
{
    MemoryStream ms = new MemoryStream(docData.ToArray());
    PdfReader pdfReader = new PdfReader(ms);
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        PdfImportedPage page = writer.GetImportedPage(pdfReader, i);
        document.Add(iTextSharp.text.Image.GetInstance(page));
    }
}
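Adding each imported page through `document.Add` places it in the text flow, inside the document margins. An alternative that keeps every appended page at its original size is to start a new page with the source page's size and draw the imported page directly onto the writer's content. A minimal sketch, assuming iTextSharp 5.x; the class and method names are hypothetical, and it ignores page rotation and assumes each page's origin is at (0, 0).

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfAppender
{
    // Appends every page of pdfBytes to an open document, one output page per source page.
    public static void AppendPdfPages(Document document, PdfWriter writer, byte[] pdfBytes)
    {
        var reader = new PdfReader(pdfBytes);
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            document.SetPageSize(reader.GetPageSize(i)); // takes effect on the next page
            document.NewPage();
            PdfImportedPage page = writer.GetImportedPage(reader, i);
            writer.DirectContent.AddTemplate(page, 0, 0);
        }
        // Write the imported pages now so the reader can be closed before the document is.
        writer.FreeReader(reader);
        reader.Close();
    }
}
```

After the call, the current page is the last appended one and the page size is still that of the last source page, so call `document.SetPageSize(...)` and `document.NewPage()` before adding further regular content.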

// documentos is a list of objects loaded from the database; the images and PDFs are stored in a binary attribute (bBinarios)
public static byte[] UnificarImagenesPDF(IEnumerable<DocumentoDTO> documentos)
{
    using (MemoryStream workStream = new MemoryStream())
    {
        iTextSharp.text.Document doc = new iTextSharp.text.Document(); // create an iTextSharp Document
        PdfWriter writer = PdfWriter.GetInstance(doc, workStream);
        doc.Open();
        foreach (DocumentoDTO d in documentos) // each DocumentoDTO stores its file extension (e.g. .pdf, .jpg, .png) in sExtension
        {
            try
            {
                if (d.sExtension == ".pdf")
                {
                    MemoryStream ms = new MemoryStream(d.bBinarios.ToArray());
                    PdfReader pdfReader = new PdfReader(ms);
                    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
                    {
                        PdfImportedPage page = writer.GetImportedPage(pdfReader, i);
                        // Each page of the source PDF is added as an image; resizeImagen scales it to fit the iTextSharp page
                        doc.Add(resizeImagen(iTextSharp.text.Image.GetInstance(page)));
                        doc.NewPage(); // start a new page in the iTextSharp document
                    }
                }
                if (d.sExtension != ".pdf")
                {
                    doc.Add(resizeImagen(Image.GetInstance((byte[])d.bBinarios)));
                    doc.NewPage();
                }
            }
            catch
            {
                // errors are silently ignored: a document that fails to load is skipped
            }
        }
        doc.Close();
        writer.Close();
        return workStream.ToArray();
    }
}
private static iTextSharp.text.Image resizeImagen(iTextSharp.text.Image image)
{
    if (image.Height > image.Width)
    {
        // Portrait: scale so that the height becomes 700 points
        float percentage = 700 / image.Height;
        image.ScalePercent(percentage * 100);
    }
    else
    {
        // Landscape or square: scale so that the width becomes 540 points
        float percentage = 540 / image.Width;
        image.ScalePercent(percentage * 100);
    }
    return image;
}
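`resizeImagen` scales portrait images to a height of 700 points and everything else to a width of 540 points, so a near-square portrait image (say 600 x 650) ends up about 646 points wide, wider than the 540 points used for landscape images. Scaling by the smaller of the two ratios keeps the image inside a 540 x 700 box in both directions; in iTextSharp, `image.ScaleToFit(540f, 700f)` does this directly. The helper below (hypothetical name, no iTextSharp needed) just shows the arithmetic behind that choice.

```csharp
using System;

public static class ImageFit
{
    // Returns the ScalePercent value that makes a width x height image fit inside
    // maxWidth x maxHeight while keeping its aspect ratio (small images are scaled up too).
    public static float FitScalePercent(float width, float height, float maxWidth, float maxHeight)
    {
        if (width <= 0 || height <= 0)
        {
            throw new ArgumentException("Image dimensions must be positive.");
        }
        return Math.Min(maxWidth / width, maxHeight / height) * 100f;
    }
}
```

For a 600 x 650 image this gives 90 percent (540 x 585 points), where the original code would produce roughly 646 x 700.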
