Import pdf at a specific page - c#

I have a requirement to merge PDFs. I need to import a PDF into another one at a specific page.
Let me illustrate.
I have two PDFs: the first is 50 pages long and the second is 4 pages long. I need to import the second one at the 13th page of the first PDF.
I can't find any example. There are plenty of examples on how to merge PDFs, but nothing about merging at a specific page.
Based on this example, it looks like I need to iterate over all the pages one by one and import them into a new PDF. That looks a bit painful, especially if you have big PDFs and need to merge many. I would have to create x new PDFs to merge x+1 PDFs.
Is there something I don't understand, or is this really the way to go?

Borrowing from the example, this should be easy to do with a few modifications. You just need to add all the pages before the merge, then all the pages from the second document, then all the rest of the original pages.
Try something like this (not tested or robust - just a starting point maybe):
// Used the ExtractPages example as a starting point.
// insertPage is the 1-based page number at which the second document should start.
public void MergeDocuments(string sourcePdfPath1, string sourcePdfPath2,
    string outputPdfPath, int insertPage) {
    PdfReader reader1 = null;
    PdfReader reader2 = null;
    try {
        reader1 = new PdfReader(sourcePdfPath1);
        reader2 = new PdfReader(sourcePdfPath2);
        // Note: iTextSharp page numbers are 1-based, not 0-based.
        Document document = new Document(reader1.GetPageSizeWithRotation(1));
        PdfCopy pdfCopyProvider = new PdfCopy(document,
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
        document.Open();
        int length1 = reader1.NumberOfPages;
        int length2 = reader2.NumberOfPages;
        int page1 = 1;
        // Having these three loops is the key. First, the pages before the insertion point.
        for (; page1 < insertPage && page1 <= length1; page1++) {
            pdfCopyProvider.AddPage(pdfCopyProvider.GetImportedPage(reader1, page1));
        }
        // These are the pages from the second document.
        for (int page2 = 1; page2 <= length2; page2++) {
            pdfCopyProvider.AddPage(pdfCopyProvider.GetImportedPage(reader2, page2));
        }
        // Finally, add the remaining pages from the first document.
        for (; page1 <= length1; page1++) {
            pdfCopyProvider.AddPage(pdfCopyProvider.GetImportedPage(reader1, page1));
        }
        // Closing the document also closes the PdfCopy and its output stream.
        document.Close();
    } finally {
        // Close the readers only after the document has been written.
        if (reader1 != null) reader1.Close();
        if (reader2 != null) reader2.Close();
    }
}
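To check the page arithmetic without touching any PDF, the three-loop idea can be sketched as a plain function that only computes the output order. This is a minimal sketch: the `MergePlan` class and its `Build` method are hypothetical names, not part of iTextSharp, and it assumes 1-based page numbers (which is how the `for (int i = 1; i <= Reader.NumberOfPages; i++)` loops elsewhere on this page address iTextSharp pages).

```csharp
using System;
using System.Collections.Generic;

public static class MergePlan
{
    // Returns the output page order as (source, page) pairs with 1-based pages.
    // Source 1 is the main document, source 2 is the document being inserted.
    // insertPage is the 1-based output page at which the inserted document starts.
    public static List<Tuple<int, int>> Build(int length1, int length2, int insertPage)
    {
        var plan = new List<Tuple<int, int>>();
        int page1 = 1;
        // Pages of the main document that come before the insertion point.
        for (; page1 < insertPage && page1 <= length1; page1++)
            plan.Add(Tuple.Create(1, page1));
        // All pages of the inserted document.
        for (int page2 = 1; page2 <= length2; page2++)
            plan.Add(Tuple.Create(2, page2));
        // The remaining pages of the main document.
        for (; page1 <= length1; page1++)
            plan.Add(Tuple.Create(1, page1));
        return plan;
    }
}
```

For the 50-page and 4-page documents from the question, `MergePlan.Build(50, 4, 13)` yields 54 entries: the inserted pages occupy output pages 13 to 16, and the original page 13 becomes output page 17.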

Related

Error on closing an empty iTextSharp document

I am successfully merging PDF documents. Now that I'm trying to implement the error handling for the case where no PDF document has been selected, it throws an error when closing the document: The document has no pages
If no PDF document has been added in the "foreach" loop, do I still need to close the document, or not? If you open an object, then it has to be closed at some point. So how do I exit correctly when no page has been added?
private void MergePDFs()
{
DataSourceSelectArguments args = new DataSourceSelectArguments();
DataView view = (DataView)SourceCertCockpit.Select(args);
System.Data.DataTable table = view.ToTable();
List<PdfReader> readerList = new List<PdfReader>();
iTextSharp.text.Document document = new iTextSharp.text.Document();
PdfCopy copy = new PdfCopy(document, Response.OutputStream);
document.Open();
int index = 0;
foreach (DataRow myRow in table.Rows)
{
if (ListadoCertificadosCockpit.Rows[index].Cells[14].Text == "0")
{
PdfReader Reader = new PdfReader(Convert.ToString(myRow[0]));
Chapter Chapter = new Chapter(Convert.ToString(Convert.ToInt32(myRow[1])), 0);
Chapter.NumberDepth = 0;
iTextSharp.text.Section Section = Chapter.AddSection(Convert.ToString(myRow[10]), 0);
Section.NumberDepth = 0;
iTextSharp.text.Section SubSection = Section.AddSection(Convert.ToString(myRow[7]), 0);
SubSection.NumberDepth = 0;
document.Add(Chapter);
readerList.Add(Reader);
for (int i = 1; i <= Reader.NumberOfPages; i++)
{
copy.AddPage(copy.GetImportedPage(Reader, i));
}
Reader.Close();
}
index++;
}
if (document.PageNumber == 0)
{
document.Close();
return;
}
document.Close();
string SalesID = SALESID.Text;
Response.ContentType = "application/pdf";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.AppendHeader("content-disposition", "attachment;filename=" + SalesID + ".pdf");
}
In the old days, iText didn't throw an exception when you created a document and "forgot" to add any content. This resulted in a document with a single, blank page. This was considered a bug: people didn't like single-page, empty documents. Hence the design decision to throw an exception.
Something similar was done for newPage(). A new page can be triggered explicitly (when you add document.newPage() in your code) or implicitly (when the end of a page is reached). In the old days, this often resulted in unwanted blank pages. Hence the decision to ignore newPage() in case the current page is empty.
Suppose you have this:
document.newPage();
document.newPage();
One may expect that two new pages are created. That's not true. We've made a design decision to ignore the second document.newPage() because no content was added after the first document.newPage().
This brings us to the question: what if we want to insert a blank page? Or, in your case: what if it's OK to create a document with nothing more than a single blank page?
In that case, we have to tell iText that the current page shouldn't be treated as an empty page. You can do so by introducing the following line:
writer.setPageEmpty(false);
Now the current page will be fooled into thinking that it has some content, even though it may be blank.
Adding this line to your code will avoid the "The document has no pages" exception and solve your problem of streams not being closed.
Take a look at the NewPage example if you want to experiment with the setPageEmpty() method.
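The snippets above use the Java method names. In iTextSharp the same setting is exposed as the `PageEmpty` property on the writer. The following is a minimal sketch, assuming iTextSharp 5.x; the `BlankPageDemo` class is a hypothetical name, and it uses a plain `PdfWriter` rather than the `PdfCopy` from the question, so treat the transfer to `PdfCopy` as something to verify.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class BlankPageDemo
{
    // Creates a PDF that consists of a single blank page and returns its bytes.
    public static byte[] CreateBlankPdf()
    {
        using (var output = new MemoryStream())
        {
            var document = new Document(PageSize.A4);
            PdfWriter writer = PdfWriter.GetInstance(document, output);
            document.Open();
            // Mark the current page as non-empty even though nothing was added,
            // so that closing the document does not throw "The document has no pages".
            writer.PageEmpty = false;
            document.Close();
            // ToArray() still works after the writer has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```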
You can add an empty page before closing the document, or catch the exception and ignore it.
In case you are still interested in a solution, or maybe someone else is:
I had exactly the same issue and worked around it as follows:
I declare a boolean that records whether at least one page has been added, and I check it before closing the document.
If no pages have been copied, I add a new page to the document with the AddPage method, passing a rectangle as a parameter. I did not find a simpler way to add a page.
So the code should look like the one below (possibly with some syntax errors, as I'm not familiar with C#):
private void MergePDFs()
{
    DataSourceSelectArguments args = new DataSourceSelectArguments();
    DataView view = (DataView)SourceCertCockpit.Select(args);
    System.Data.DataTable table = view.ToTable();
    List<PdfReader> readerList = new List<PdfReader>();
    iTextSharp.text.Document document = new iTextSharp.text.Document();
    PdfCopy copy = new PdfCopy(document, Response.OutputStream);
    document.Open();
    int index = 0;
    // Declared outside the loop so that it is still in scope when the document is closed.
    bool atLeastOnePage = false;
    foreach (DataRow myRow in table.Rows)
    {
        if (ListadoCertificadosCockpit.Rows[index].Cells[14].Text == "0")
        {
            PdfReader Reader = new PdfReader(Convert.ToString(myRow[0]));
            Chapter Chapter = new Chapter(Convert.ToString(Convert.ToInt32(myRow[1])), 0);
            Chapter.NumberDepth = 0;
            iTextSharp.text.Section Section = Chapter.AddSection(Convert.ToString(myRow[10]), 0);
            Section.NumberDepth = 0;
            iTextSharp.text.Section SubSection = Section.AddSection(Convert.ToString(myRow[7]), 0);
            SubSection.NumberDepth = 0;
            document.Add(Chapter);
            readerList.Add(Reader);
            for (int i = 1; i <= Reader.NumberOfPages; i++)
            {
                copy.AddPage(copy.GetImportedPage(Reader, i));
                atLeastOnePage = true;
            }
            Reader.Close();
        }
        index++;
    }
    if (!atLeastOnePage)
    {
        // Nothing was copied: add one blank page so that Close() does not throw.
        // A4 is used here instead of the zero-area Rectangle(10, 10, 10, 10).
        copy.AddPage(iTextSharp.text.PageSize.A4, 0);
        document.Close();
        return;
    }
    document.Close();
    string SalesID = SALESID.Text;
    Response.ContentType = "application/pdf";
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.AppendHeader("content-disposition", "attachment;filename=" + SalesID + ".pdf");
}

Pdfsharp Out of Memory Exception when Combine Multi Pdf File

I have to combine a large (but undefined) number of PDF files into a single PDF. For this I'm using the PDFsharp code shown here.
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
//nProcessedFile = 0;
//nStepConverted++;
//sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
//outputDocument.Save(sNameLastCombineFile);
//outputDocument.Close();
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
For a small number of files the code works, but at some point it throws an out-of-memory exception.
Is there a solution?
P.S. I was thinking of saving the file in steps and then adding the remaining files, so as to free memory, but I cannot find a way to do it.
UPDATE1:
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
outputDocument = PdfReader.Open(sNameLastCombineFile,PdfDocumentOpenMode.Modify);
}
UPDATE 2 (version 1.32)
Complete example
Error on line:
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
Text error:
Cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6.
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
List<String> filesToPrint = new List<string>();
filesToPrint = Directory.GetFiles(@"D:\Downloads\RACCOLTA\FILE PDF", "*.pdf").ToList();
// Get some file names
string[] files = filesToPrint.ToArray();
// Open the output document
PdfDocument outputDocument = new PdfDocument();
PdfPage newPage;
int nProcessedFile = 0;
int nMemoryFile = 5;
int nStepConverted = 0;
String sNameLastCombineFile = "";
try
{
// Iterate files
foreach (string file in files)
{
// Open the document to import pages from it.
PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import);
// Iterate pages
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
PdfPage page = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(page);
}
nProcessedFile++;
if (nProcessedFile >= nMemoryFile)
{
nProcessedFile = 0;
//nStepConverted++;
sNameLastCombineFile = "ConcatenatedDocument" + nStepConverted.ToString() + " _tempfile.pdf";
outputDocument.Save(sNameLastCombineFile);
outputDocument.Close();
inputDocument = PdfReader.Open(sNameLastCombineFile , PdfDocumentOpenMode.Modify);
}
}
// Save the document...
const string filename = "ConcatenatedDocument1_tempfile.pdf";
outputDocument.Save(filename);
// ...and start a viewer.
Process.Start(filename);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.ReadKey();
}
}
}
}
UPDATE 3
Code that generates the out-of-memory exception:
int count = inputDocument.PageCount;
for (int idx = 0; idx < count; idx++)
{
// Get the page from the external document...
newPage = inputDocument.Pages[idx];
// ...and add it to the output document.
outputDocument.AddPage(newPage);
newPage.Close();
}
I cannot tell exactly which line throws the exception.
I had a similar issue; saving, closing and reopening the PdfDocument did not really help.
I am adding a lot (100+) of large (up to 5 MB) images (TIFF, JPG, etc.) to a PDF document, where every image has its own page. It crashed around image #50. After the save-close-reopen it did finish the whole document, but it was still getting close to maximum memory, around 3 GB. A few more images and it would still crash.
After more refining, I implemented a using for the XGraphics object; it was a little better again, but not much.
The big step forward was disposing of the XImage within the loop! After that the application never used more than 100-200 KB, so I removed the save-close-reopen for the PdfDocument and it was no problem.
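The approach described in this answer can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: the `ImagesToPdf` class, its `Build` method and the parameter names are hypothetical, and each page is simply sized to its image.

```csharp
using System.Collections.Generic;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class ImagesToPdf
{
    // Writes one page per image and disposes each image inside the loop.
    public static void Build(IEnumerable<string> imagePaths, string outputPath)
    {
        using (var document = new PdfDocument())
        {
            foreach (string path in imagePaths)
            {
                PdfPage page = document.AddPage();
                // Disposing the XImage in every iteration is the step that
                // the answer reports as making the big difference.
                using (XImage image = XImage.FromFile(path))
                {
                    // Size the page before creating the graphics object for it.
                    page.Width = XUnit.FromPoint(image.PointWidth);
                    page.Height = XUnit.FromPoint(image.PointHeight);
                    using (XGraphics gfx = XGraphics.FromPdfPage(page))
                    {
                        gfx.DrawImage(image, 0, 0, image.PointWidth, image.PointHeight);
                    }
                }
            }
            document.Save(outputPath);
        }
    }
}
```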
After saving and closing outputDocument (the code is commented out in your snippet), you have to open outputDocument again, using PdfDocumentOpenMode.Modify.
It could help to add using(...) for the inputDocument.
If your code is running as a 32-bit process, then switching to 64 bit will allow your process to use more than 2 GB of RAM (assuming your computer has more than 2 GB RAM).
Update: The message "Cannot handle iref streams" means you have to use PDFsharp 1.50 Prerelease, available on NuGet.
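The `using(...)` suggestion above could look like the following sketch. The `PdfConcat` class and its `Concatenate` method are hypothetical names; whether disposing each input document frees enough memory for a given set of files is not guaranteed.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfConcat
{
    // Appends all pages of every input file to one output document.
    public static void Concatenate(IEnumerable<string> files, string outputPath)
    {
        using (var outputDocument = new PdfDocument())
        {
            foreach (string file in files)
            {
                // Each input document is disposed as soon as its pages have been imported.
                using (PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import))
                {
                    for (int idx = 0; idx < inputDocument.PageCount; idx++)
                    {
                        outputDocument.AddPage(inputDocument.Pages[idx]);
                    }
                }
            }
            outputDocument.Save(outputPath);
        }
    }
}
```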

c# / openxml merge of word documents to one, fails --closed--

I have to merge several Word documents with a small C# console application. So far so good. The documents are generated in arcplan. Around 30 files are generated, but somehow some documents are corrupted, although they still show content.
If I merge only the files that are correct, my document is fine, but if there is a corrupted file among them, each corrupted file generates an empty page. I debugged it, of course, but I don't see anything going wrong that would explain the empty page.
The arguments look like this:
"C:\temp\Report_C_01.docx" "C:\temp\Report_D_01.docx" "C:\temp\Report_E_01.docx"
Here's my code:
public static void Merge(params String[] filepaths)
{
String pathName = Path.GetDirectoryName(filepaths[0]);
subfolder = Path.Combine(pathName, "Output\\"); // needed for the merged file
if (filepaths != null && filepaths.Length > 1)
{
WordprocessingDocument myDoc = WordprocessingDocument.Open(@filepaths[0], true); // open the Word file
MainDocumentPart mainPart = myDoc.MainDocumentPart;
for (int i = 1; i < filepaths.Length; i++)
{
String altChunkId = "AltChunkId" + i;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
FileStream fileStream = File.Open(@filepaths[i], FileMode.Open);
chunk.FeedData(fileStream);
DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
altChunk.Id = altChunkId;
//new page, if you like it...
mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page } )));
//next document
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
}
mainPart.Document.Save();
myDoc.Close();
for (int i = 0; i < 1; i++)
{
String fileNameWE = Path.GetFileName(filepaths[i]);
File.Copy(filepaths[i], subfolder + fileNameWE);
}
foreach (String fp in filepaths)
{
File.Delete(fp);
}
}
else
{
Console.WriteLine("Nur 1 Argument");
}
}
Hope someone can help me.
Best regards
Christian
Fixed it. It seems Word cannot merge different formats into one document. So if you have 2 documents with a footer and 3 others without one, it just won't work. Obviously some customers can run into this kind of issue; at least the code is fine.

How to Add bookmarks to PDF file?

I have 4 PDF template files. Using iTextSharp I filled in the values and merged the 4 PDF files into a single document, so all 4 pages are in one PDF file. Now I want to add bookmarks to that PDF file. Is there any way to do this in C#? For a better understanding, please refer to the images below.
Hi, this is what I am trying to do. I am not getting any error, but there is still no bookmark in my PDF. I want to add bookmarks with 4 sections, as shown in the image. After merging, I want to add the bookmarks to the final PDF.
public string MergePDFs()
{
string outPutFilePath = @"D:\jeldsbre.pdf";
string genereatedpdfs = @"D:\genereatedpdfs";
using (FileStream stream = new FileStream(outPutFilePath, FileMode.Create))
{
Document pdfDoc = new Document(PageSize.A4);
PdfCopy pdf = new PdfCopy(pdfDoc, stream);
pdf.SetMergeFields();
pdfDoc.Open();
var files = Directory.GetFiles(genereatedpdfs);
Console.WriteLine("Merging files count: " + files.Length);
int i = 1;
foreach (string file in files)
{
Console.WriteLine(i + ". Adding: " + file);
pdf.AddDocument(new PdfReader(file));
i++;
}
List<Dictionary<string, object>> bookmarks = new List<Dictionary<string, object>>();
IList<Dictionary<string, object>> tempBookmarks = new List<Dictionary<string, object>>();
SimpleBookmark.ShiftPageNumbers(tempBookmarks, 1, null);
bookmarks.AddRange(tempBookmarks);
SimpleBookmark.ShiftPageNumbers(tempBookmarks, 3, null);
bookmarks.AddRange(tempBookmarks);
pdf.Outlines = bookmarks;
if (pdfDoc != null)
pdfDoc.Close();
string base64 = GetBase64(outPutFilePath);
return base64;
}
}
Assuming that your original PDFs already have bookmarks, you should concatenate not only the documents (using the PdfCopy class) but also the bookmark structures of the different files (using the SimpleBookmark class), taking into account that you need to shift the page numbers correctly.
This is done in the ConcatenateBookmarks example in chapter 7 of my book:
// Create a list for the bookmarks
ArrayList<HashMap<String, Object>> bookmarks = new ArrayList<HashMap<String, Object>>();
List<HashMap<String, Object>> tmp;
for (int i = 0; i < src.length; i++) {
reader = new PdfReader(src[i]);
// merge the bookmarks
tmp = SimpleBookmark.getBookmark(reader);
SimpleBookmark.shiftPageNumbers(tmp, page_offset, null);
bookmarks.addAll(tmp);
// add the pages
n = reader.getNumberOfPages();
page_offset += n;
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(reader, ++page));
}
copy.freeReader(reader);
reader.close();
}
// Add the merged bookmarks
copy.setOutlines(bookmarks);
For a C# version of this example, please take a look at http://tinyurl.com/itextsharpIIA2C07 for the corresponding iTextSharp example:
// Create a list for the bookmarks
List<Dictionary<String, Object>> bookmarks =
new List<Dictionary<String, Object>>();
for (int i = 0; i < src.Count; i++) {
PdfReader reader = new PdfReader(src[i]);
// merge the bookmarks
IList<Dictionary<String, Object>> tmp =
SimpleBookmark.GetBookmark(reader);
SimpleBookmark.ShiftPageNumbers(tmp, page_offset, null);
foreach (var d in tmp) bookmarks.Add(d);
// add the pages
int n = reader.NumberOfPages;
page_offset += n;
for (int page = 0; page < n; ) {
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
}
// Add the merged bookmarks
copy.Outlines = bookmarks;
If the existing documents don't have any bookmarks (or if you don't want to copy any existing documents), then your question is a duplicate of a question I answered half a year ago: Merge pdfs and add bookmark with iText in java
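For the case where the source files have no bookmarks, one option is to build the outline list by hand, one entry per merged file. The sketch below only constructs the list and needs no iTextSharp types; the `OutlineBuilder` class, its `Build` method and the parameters are hypothetical names. It assumes the "Title", "Action" and "Page" keys that SimpleBookmark uses for its dictionaries, with "Fit" as the destination type.

```csharp
using System.Collections.Generic;

public static class OutlineBuilder
{
    // titles[i] is the bookmark title for the i-th merged file,
    // pageCounts[i] is the number of pages that file contributes.
    public static List<Dictionary<string, object>> Build(
        IList<string> titles, IList<int> pageCounts)
    {
        var outlines = new List<Dictionary<string, object>>();
        int pageOffset = 0;
        for (int i = 0; i < titles.Count; i++)
        {
            var entry = new Dictionary<string, object>();
            entry["Title"] = titles[i];
            entry["Action"] = "GoTo";
            // Target the first page of this file in the merged document (1-based).
            entry["Page"] = (pageOffset + 1) + " Fit";
            outlines.Add(entry);
            pageOffset += pageCounts[i];
        }
        return outlines;
    }
}
```

The resulting list would then be assigned to `copy.Outlines` before the document is closed, in the same way as the merged bookmarks in the example above.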

Saving 2 copies of PDF template, 2nd file corrupt - iTextSharp

I have a PDF template file for 60 labels per page. My goal was to make copies of the template as needed, fill in the form data and then merge the files into a single PDF (or provide links to individual files...either works)
The problem is that the 2nd PDF copy comes out corrupt regardless of date.
The workflow is user selects a date. The lunch orders for that day are gathered into a generic list that in turn is used to fill in the form fields on the template. At 60, the file is saved as a temp file and a new copy of the template is used for the next 60 names, etc...
09/23/2013 through 09/25 have data. On the 25th there are only 38 orders, so this works as intended. On 09/24/2013 there are over 60 orders, the first page works, but the 2nd page is corrupt.
private List<string> CreateLabels(DateTime orderDate)
{
// create file name to save
string fName = ConvertDateToStringName(orderDate) + ".pdf"; // example 09242013.pdf
// to hold Temp File Names
List<string> tempFNames = new List<string>();
// Get path to template/save directory
string path = Server.MapPath("~/admin/labels/");
string pdfPath = path + "8195a.pdf"; // template file
// Get the students and their lunch orders
List<StudentLabel> labels = DalStudentLabel.GetStudentLabels(orderDate);
// Get number of template pages needed
decimal recCount = Convert.ToDecimal(labels.Count);
decimal pages = Decimal.Divide(recCount, 60);
int pagesNeeded = Convert.ToInt32(Math.Ceiling(pages));
// Make the temp names
for (int c = 0; c < pagesNeeded; c++)
{
tempFNames.Add(c.ToString() + fName); //just prepend a digit to the date string
}
//Create copies of the empty templates
foreach (string tName in tempFNames)
{
try
{ File.Delete(path + tName); }
catch { }
File.Copy(pdfPath, path + tName);
}
// we know we need X pages and there is 60 per page
int x = 0;
// foreach page needed
for (int pCount = 0; pCount < pagesNeeded; pCount++)
{
// Make a new page
PdfReader newReader = new PdfReader(pdfPath);
// pCount.ToString replicates temp names
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Open))
{
PdfStamper stamper = new PdfStamper(newReader, stream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
StudentLabel lbl = null;
string lblInfo = "";
// fill in acro fields with lunch data
foreach (string fieldKey in fieldKeys)
{
try
{
lbl = labels[x];
}
catch
{
break;
} // if we're out of labels, then we're done
lblInfo = lbl.StudentName + "\n";
lblInfo += lbl.Teacher + "\n";
lblInfo += lbl.MenuItem;
form.SetField(fieldKey, lblInfo);
x++;
if (x % 60 == 0) // reached 60, time for new page
{
break;
}
}
stamper.Writer.CloseStream = false;
stamper.FormFlattening = true;
stamper.Close();
newReader.Close();
stream.Flush();
stream.Close();
}
}
return tempFNames;
}
Why are you pre-allocating your files? My guess is that's your problem. You're binding a PdfStamper to a PdfReader for input and an exact copy of the same pdf to a FileStream object for output. The PdfStamper will generate your output file for you, you don't need to help it. You're trying to append new data to an existing file and I'm not quite sure what happens in that case (as I've never actually seen anyone do it.)
So drop your whole File.Copy pre-allocation and change your FileStream declaration to:
using (FileStream stream = new FileStream(path + pCount.ToString() + fName, FileMode.Create, FileAccess.Write, FileShare.None))
You'll obviously also need to adjust how your return array gets populated, too.
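Put together, the suggestion is to let PdfStamper create each output file from the template instead of copying the template first. A minimal sketch, assuming iTextSharp 5.x: the `LabelStamper` class, its `FillTemplate` method and the `fieldValues` parameter are hypothetical names, and the caller would invoke it once per group of 60 labels with a distinct output path.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class LabelStamper
{
    // Fills one copy of the template and writes it to outputPath.
    // fieldValues maps form field names to the text to put into them.
    public static void FillTemplate(string templatePath, string outputPath,
        IDictionary<string, string> fieldValues)
    {
        PdfReader reader = new PdfReader(templatePath);
        try
        {
            // FileMode.Create makes the stamper write a fresh file;
            // no File.Copy of the template is needed beforehand.
            using (var stream = new FileStream(outputPath, FileMode.Create,
                FileAccess.Write, FileShare.None))
            {
                PdfStamper stamper = new PdfStamper(reader, stream);
                foreach (KeyValuePair<string, string> pair in fieldValues)
                {
                    stamper.AcroFields.SetField(pair.Key, pair.Value);
                }
                stamper.FormFlattening = true;
                // Closing the stamper writes the PDF to the stream.
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```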
