How to Add bookmarks to PDF file? - c#

i have 4 pdf templates files by using itextsharp i added values and i merged/added 4 pdf files into single document, so all 4 pages are under one single pdf file name.Now i want to add bookmark to my pdf file. is there any way to do in C# ?for better understanding ,please refer below images
Hi ,this is what i am trying to do, i am not getting any error but still there is no bookmark in my pdf, i want to add bookmark with 4 sections as showed in the image.after merging i want add bookmark to final pdf.
enter code herepublic string MergePDFs()
{
string outPutFilePath = #"D:\jeldsbre.pdf";
string genereatedpdfs = #"D:\genereatedpdfs";
using (FileStream stream = new FileStream(outPutFilePath, FileMode.Create))
{
Document pdfDoc = new Document(PageSize.A4);
PdfCopy pdf = new PdfCopy(pdfDoc, stream);
pdf.SetMergeFields();
pdfDoc.Open();
var files = Directory.GetFiles(genereatedpdfs);
Console.WriteLine("Merging files count: " + files.Length);
int i = 1;
foreach (string file in files)
{
Console.WriteLine(i + ". Adding: " + file);
pdf.AddDocument(new PdfReader(file));
i++;
}
List<Dictionary<string, object>> bookmarks = new List<Dictionary<string, object>>();
IList<Dictionary<string, object>> tempBookmarks = new List<Dictionary<string, object>>();
SimpleBookmark.ShiftPageNumbers(tempBookmarks, 1, null);
bookmarks.AddRange(tempBookmarks);
SimpleBookmark.ShiftPageNumbers(tempBookmarks, 3, null);
bookmarks.AddRange(tempBookmarks);
pdf.Outlines = bookmarks;
if (pdfDoc != null)
pdfDoc.Close();
string base64 = GetBase64(outPutFilePath);
return base64;
}
}

Assuming that your original PDFs already have bookmarks, then you should concatenate not only the documents (using the PdfCopy class), you should also concatenate the different bookmarks structures of the different files (using the SimpleBookMark class), not forgetting to take into account that you need to shift the page numbers correctly.
This is done in the ConcatenateBookmarks example in chapter 7 of my book:
// Create a list for the bookmarks
ArrayList<HashMap<String, Object>> bookmarks = new ArrayList<HashMap<String, Object>>();
List<HashMap<String, Object>> tmp;
for (int i = 0; i < src.length; i++) {
reader = new PdfReader(src[i]);
// merge the bookmarks
tmp = SimpleBookmark.getBookmark(reader);
SimpleBookmark.shiftPageNumbers(tmp, page_offset, null);
bookmarks.addAll(tmp);
// add the pages
n = reader.getNumberOfPages();
page_offset += n;
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(reader, ++page));
}
copy.freeReader(reader);
reader.close();
}
// Add the merged bookmarks
copy.setOutlines(bookmarks);
For a C# version of this example, please take a look at http://tinyurl.com/itextsharpIIA2C07 for the corresponding iTextSharp example:
// Create a list for the bookmarks
List<Dictionary<String, Object>> bookmarks =
new List<Dictionary<String, Object>>();
for (int i = 0; i < src.Count; i++) {
PdfReader reader = new PdfReader(src[i]);
// merge the bookmarks
IList<Dictionary<String, Object>> tmp =
SimpleBookmark.GetBookmark(reader);
SimpleBookmark.ShiftPageNumbers(tmp, page_offset, null);
foreach (var d in tmp) bookmarks.Add(d);
// add the pages
int n = reader.NumberOfPages;
page_offset += n;
for (int page = 0; page < n; ) {
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
}
// Add the merged bookmarks
copy.Outlines = bookmarks;
If the existing documents don't have any bookmarks (or if you don't want to copy any existing documents), then your question is a duplicate of a question I answered half a year ago: Merge pdfs and add bookmark with iText in java

Related

How to Extract Text From a Landscape PDF File

I'm trying to extract the text from the landscape pdf file, I'm using iTextSharp, for Portrait pages, it works well but returns an empty string for Landscape pages.
here is my code
PdfReader reader = new PdfReader(pdfFile);
int intPageNum = reader.NumberOfPages;
var sb = new StringBuilder();
for (int i = 1; i <= intPageNum; i++) {
var text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());
sb.Append(text + "\n");
}

I don't want to rotate page but iTextSharp programmatically rotates my pages

I have pdf files in the "C:\\pdfs\\" directory. I want to get these pdf files and insert metadatas from the meta_data.txt file. With my lock of iTextSharp knowledge, I coded like this:
var pdf_files = Directory.GetFiles("C:\\pdfs\\", "*.pdf");
var i = 0;
foreach (var pdf_file in pdf_files)
{
var read = new PdfReader(pdf_file);
var size = read.GetPageSizeWithRotation(1);
var document = new Document(size);
var write = PdfWriter.GetInstance(document, new FileStream("C:\\temp\\" + "file_" + i, FileMode.Create, FileAccess.Write));
var datas = File.ReadAllLines("C:\\pdfs\\" + #"meta_data.txt");
var str = datas[i].Split('#');
document.AddTitle(str[1]);
document.AddSubject(str[2]);
document.AddCreator(str[3]);
document.AddAuthor(str[4]);
document.AddKeywords(str[5]);
document.Open();
var cb = write.DirectContent;
for (var pageNum = 1; pageNum <= read.NumberOfPages; pageNum++)
{
document.NewPage();
var page = write.GetImportedPage(read, pageNum);
cb.AddTemplate(page, 0, 0);
}
document.Close();
read.Close();
File.Delete(pdf_file);
File.Move("C:\\temp\\" + "file_" + i, "C:\\created\\" + "file_" + i);
i++;
}
This codes get an instance of main pdf files, while creation to temp directory, injects metadatas and later move the created directory. I don't find more practical method than this.
Anyway, in some pdf files (generated as an original pdf file) there is no problem like this:
But some other pdf files (generated from scan or very old dated pdf file) are rotated programmatically. And it seems disgusting like this:
Worse than all, I don't know how I fix the problem. Could you help me how I manage this problem.
The correct answer to this question looks like this:
PdfReader reader = new PdfReader(src);
using (PdfStamper stamper = new PdfStamper(reader,
new FileStream("C:\\temp\\" + "file_" + i, FileMode.Create, FileAccess.Write))) {
Dictionary<String, String> info = reader.Info;
info["Title"] = "Hello World stamped";
info["Subject"] = "Hello World with changed metadata";
info["Keywords"] = "iText in Action, PdfStamper";
info["Creator"] = "Silly standalone example";
info["Author"] = "Bruno Lowagie";
stamper.MoreInfo = info;
}
Habip OĞUZ ignored what I wrote in Chapter 6 of "iText in Action - Second Edition", namely that using Document, PdfWriter and AddTemplate() is wrong when you want to manipulate a single existing PDF. By using AddTemplate(), you throw away plenty of features, such as interactivity, structure, and so on. You also create a PDF that is suboptimal because every page will be stored as a Form XObject.

Error on closing an empty iTextSharp document

I am successfully merging PDF documents; now as I'm trying to implement the error handling in case no PDF document has been selected, it throws an error when closing the document: The document has no pages
In case no PDF document has been added in the "foreach" - loop, I still need to close the document!? Or not? If you open an object then it has do be closed at some point. So how to I escape correctly in case no page had been added?
private void MergePDFs()
{
DataSourceSelectArguments args = new DataSourceSelectArguments();
DataView view = (DataView)SourceCertCockpit.Select(args);
System.Data.DataTable table = view.ToTable();
List<PdfReader> readerList = new List<PdfReader>();
iTextSharp.text.Document document = new iTextSharp.text.Document();
PdfCopy copy = new PdfCopy(document, Response.OutputStream);
document.Open();
int index = 0;
foreach (DataRow myRow in table.Rows)
{
if (ListadoCertificadosCockpit.Rows[index].Cells[14].Text == "0")
{
PdfReader Reader = new PdfReader(Convert.ToString(myRow[0]));
Chapter Chapter = new Chapter(Convert.ToString(Convert.ToInt32(myRow[1])), 0);
Chapter.NumberDepth = 0;
iTextSharp.text.Section Section = Chapter.AddSection(Convert.ToString(myRow[10]), 0);
Section.NumberDepth = 0;
iTextSharp.text.Section SubSection = Section.AddSection(Convert.ToString(myRow[7]), 0);
SubSection.NumberDepth = 0;
document.Add(Chapter);
readerList.Add(Reader);
for (int i = 1; i <= Reader.NumberOfPages; i++)
{
copy.AddPage(copy.GetImportedPage(Reader, i));
}
Reader.Close();
}
index++;
}
if (document.PageNumber == 0)
{
document.Close();
return;
}
document.Close();
string SalesID = SALESID.Text;
Response.ContentType = "application/pdf";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.AppendHeader("content-disposition", "attachment;filename=" + SalesID + ".pdf");
}
In the old days, iText didn't throw an exception when you created a document and "forgot" to add any content. This resulted in a document with a single, blank page. This was considered a bug: people didn't like single-page, empty documents. Hence the design decision to throw an exception.
Something similar was done for newPage(). A new page can be triggered explicitly (when you add document.newPage() in your code) or implicitly (when the end of a page is reached). In the old days, this often resulted in unwanted blank pages. Hence the decision to ignore newPage() in case the current page is empty.
Suppose you have this:
document.newPage();
document.newPage();
One may expect that two new pages are created. That's not true. We've made a design decision to ignore the second document.newPage() because no content was added after the first document.newPage().
This brings us to the question: what if we want to insert a blank page? Or, in your case: what if it's OK to create a document with nothing more than a single blank page?
In that case, we have to tell iText that the current page shouldn't be treated as an empty page. You can do so by introducing the following line:
writer.setPageEmpty(false);
Now the current page will be fooled into thinking that it has some content, even though it may be blank.
Adding this line to your code will avoid the The document has no pages exception and solve your problem of streams not being closed.
Take a look at the NewPage example if you want to experiment with the setPageEmpty() method.
You can add an empty page before closing the document, or catch the exception and ignore it.
In case you are still interested in a solution, or may be someone else.
I had exactly the same issue and I workaround-ed it by:
Declaring a boolean to figure out if at least one page have been added and before closing the document I referred on it.
If no pages have been copied, I add a new page in the document thanks to the AddPages method, with a rectangle as parameter. I did not find a simplest way to add a page.
So the code should be as bellow (with possibly some syntax errors as I'm not familiar with C#):
private void MergePDFs()
{
DataSourceSelectArguments args = new DataSourceSelectArguments();
DataView view = (DataView)SourceCertCockpit.Select(args);
System.Data.DataTable table = view.ToTable();
List<PdfReader> readerList = new List<PdfReader>();
iTextSharp.text.Document document = new iTextSharp.text.Document();
PdfCopy copy = new PdfCopy(document, Response.OutputStream);
document.Open();
int index = 0;
foreach (DataRow myRow in table.Rows)
{
if (ListadoCertificadosCockpit.Rows[index].Cells[14].Text == "0")
{
PdfReader Reader = new PdfReader(Convert.ToString(myRow[0]));
Chapter Chapter = new Chapter(Convert.ToString(Convert.ToInt32(myRow[1])), 0);
Chapter.NumberDepth = 0;
iTextSharp.text.Section Section = Chapter.AddSection(Convert.ToString(myRow[10]), 0);
Section.NumberDepth = 0;
iTextSharp.text.Section SubSection = Section.AddSection(Convert.ToString(myRow[7]), 0);
SubSection.NumberDepth = 0;
document.Add(Chapter);
readerList.Add(Reader);
bool AtLeastOnePage = false;
for (int i = 1; i <= Reader.NumberOfPages; i++)
{
copy.AddPage(copy.GetImportedPage(Reader, i));
AtLeastOnePage = true;
}
Reader.Close();
}
index++;
}
if (AtLeastOnePage)
{
document.Close();
return true;
}
else
{
Rectangle rec = new Rectangle(10, 10, 10, 10);
copy.AddPage(rec, 1);
document.Close();
return false;
}
string SalesID = SALESID.Text;
Response.ContentType = "application/pdf";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.AppendHeader("content-disposition", "attachment;filename=" + SalesID + ".pdf");
}

Import pdf at a specific page

I have the requirement to merge pdf together. I need to import a pdf at a specific page into another one.
Let me illustrate this to you.
I have two pdf, first one is 50 pages long and the second one is 4pages long. I need to import the second one at the 13th page of the first pdf.
I don't find any exemple. There are plenty exemple on how to merge pdf but nothing about merging at a specific page.
Based on this exemple it look like I need to iterate over all pages one by one and import them in a new pdf. That look a bit painfull espicially if you have big pdf and need to merge many. I would create x new pdf to merge x+1 pdf.
Is there something I don't understand or is it really the way to go?
Borrowing from the example, this should be easy to do with a few modifications. You just need to add all the pages before the merge, then all the pages from the second document, then all the rest of the original pages.
Try something like this (not tested or robust - just a starting point maybe):
// Used the ExtractPages as a starting point.
public void MergeDocuments(string sourcePdfPath1, string sourcePdfPath2,
string outputPdfPath, int insertPage) {
PdfReader reader1 = null;
PdfReader reader2 = null;
Document sourceDocument1 = null;
Document sourceDocument2 = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try {
reader1 = new PdfReader(sourcePdfPath1);
reader2 = new PdfReader(sourcePdfPath2);
// Note, I'm assuming pages are 0 based. If that's not the case, change to 1.
sourceDocument1 = new Document(reader1.GetPageSizeWithRotation(0));
sourceDocument2 = new Document(reader2.GetPageSizeWithRotation(0));
pdfCopyProvider = new PdfCopy(sourceDocument1,
new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
sourceDocument1.Open();
sourceDocument2.Open();
int length1 = reader1.NumberOfPages;
int length2 = reader2.NumberOfPages;
int page1 = 0; // Also here I'm assuming pages are 0-based.
// Having these three loops is the key. First is pages before the merge.
for (;page1 < insertPage && page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
// These are the pages from the second document.
for (int page2 = 0; page2 < length2; page2++) {
importedPage = pdfCopyProvider.GetImportedPage(reader2, page2);
pdfCopyProvider.AddPage(importedPage);
}
// Finally, add the remaining pages from the first document.
for (;page1 < length1; page1++) {
importedPage = pdfCopyProvider.GetImportedPage(reader1, page1);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument1.Close();
sourceDocument2.Close();
reader1.Close();
reader2.Close();
} catch (Exception ex) {
throw ex;
}
}

Memory error joining pdf files

I'm building a program in C# that reads 3 PDF files, performs some sorting based on the user ID and generates an output PDF file.
It currently generates 1254 pdf files(approx 5 MB each) and when I try to join them, throws an Out Of Memmory Exception .
How can I join these files, considering that it results in a 4 GB file size?
The following code joins all PDF files into a single output file.
public void CombineMultiplePDFs(string[] fileNames, string outFile)
{
int pageOffset = 0;
ArrayList master = new ArrayList();
int f = 0;
Document document = null;
PdfCopy writer = null;
while (f < fileNames.Length)
{
// we create a reader for a certain document
PdfReader reader = new PdfReader(fileNames[f]);
reader.ConsolidateNamedDestinations();
// we avisorieve the total number of pages
int n = reader.NumberOfPages;
pageOffset += n;
if (f == 0)
{
// step 1: creation of a document-object
document = new Document(reader.GetPageSizeWithRotation(1));
// step 2: we create a writer that listens to the document
writer = new PdfCopy(document, new FileStream(outFile, FileMode.Create));
// step 3: we open the document
document.Open();
}
// step 4: we add content
for (int i = 0; i < n; )
{
++i;
if (writer != null)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
}
PRAcroForm form = reader.AcroForm;
if (form != null && writer != null)
{
writer.CopyAcroForm(reader);
}
f++;
}
// step 5: we close the document
if (document != null)
{
document.Close();
}
}

Categories

Resources