Pdf Merge/Overlap with iText - c#

I have used iText successfully for various utilities, such as merging and editing PDF files. Now I need to overlap two PDF pages.
For instance:
INPUT:
PDF#1 (1 Page)
PDF#2 (1 Page)
OUTPUT:
PDF#3 (1 Page: This is the result of the 2 Input Pages Overlapped)
I don't know whether this is possible with the latest version of iText. I am also considering using one of the two input PDF files as the background for the output PDF file.
Thank you in advance.

It's actually pretty easy to do. The PdfWriter object has an instance method called GetImportedPage() which returns a PdfImportedPage object. This object can be passed to a PdfContentByte's AddTemplate() method.
GetImportedPage() takes a PdfReader object and the page number that you want to get. You can get a PdfContentByte from an instance of a PdfWriter's DirectContent property.
The code below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.2.0 that shows this all off. It first creates two files on the desktop, the first with just a solid red background color and the second with just a paragraph. It then combines those two files overlapping into a third document. See the code for additional comments.
using System;
using System.IO;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//Folder that we'll work from
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string pdf1 = Path.Combine(workingFolder, "pdf1.pdf");//PDF with solid red background color
string pdf2 = Path.Combine(workingFolder, "pdf2.pdf");//PDF with text
string pdf3 = Path.Combine(workingFolder, "pdf3.pdf");//Merged PDF
//Create a basic PDF filled with red, nothing special
using (FileStream fs = new FileStream(pdf1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
PdfContentByte cb = writer.DirectContent;
cb.SetColorFill(BaseColor.RED);
cb.Rectangle(0, 0, doc.PageSize.Width, doc.PageSize.Height);
cb.Fill();
doc.Close();
}
}
}
//Create a basic PDF with a single line of text, nothing special
using (FileStream fs = new FileStream(pdf2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
doc.Add(new Paragraph("This is a test"));
doc.Close();
}
}
}
//Create a basic PDF
using (FileStream fs = new FileStream(pdf3, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Get page 1 of the first file
PdfImportedPage imp1 = writer.GetImportedPage(new PdfReader(pdf1), 1);
//Get page 1 of the second file
PdfImportedPage imp2 = writer.GetImportedPage(new PdfReader(pdf2), 1);
//Add the first file to coordinates 0,0
writer.DirectContent.AddTemplate(imp1, 0, 0);
//Since we don't call NewPage the next call will operate on the same page
writer.DirectContent.AddTemplate(imp2, 0, 0);
doc.Close();
}
}
}
this.Close();
}
}
}
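If one of the two inputs should act purely as a background, a PdfStamper-based variant may be a little shorter. The sketch below is only an illustration, assuming iTextSharp 5.x; the class name PdfOverlaySketch and the method AddBackground are hypothetical names introduced here, not part of the library. It keeps the foreground file's page as-is and draws page 1 of the background file underneath it.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class PdfOverlaySketch
{
    // Hypothetical helper: writes a copy of foregroundPath to outputPath with
    // page 1 of backgroundPath drawn underneath the existing content of page 1.
    public static void AddBackground(string foregroundPath, string backgroundPath, string outputPath)
    {
        PdfReader foreground = new PdfReader(foregroundPath);
        PdfReader background = new PdfReader(backgroundPath);
        using (FileStream fs = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            PdfStamper stamper = new PdfStamper(foreground, fs);
            // Import page 1 of the background file as a reusable template
            PdfImportedPage backgroundPage = stamper.GetImportedPage(background, 1);
            // GetUnderContent draws below whatever is already on the page
            stamper.GetUnderContent(1).AddTemplate(backgroundPage, 0, 0);
            // Closing the stamper writes the result to the stream
            stamper.Close();
        }
        foreground.Close();
        background.Close();
    }
}
```

With the files from the example above, AddBackground(pdf2, pdf1, pdf3) would put the red page behind the text page. The output keeps the page size of the foreground file, and a foreground page that paints an opaque background of its own would hide whatever is drawn under it.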

Related

Pdf Merge with itext sharp

I am trying to create a desktop application that combines existing PDF files into one.
I found some code that helps with the design, selecting the files, and merging them, but it creates new sample PDF files and then saves the merged file to the desktop. I need my code to take existing PDF files, merge them into a single file, and save that file to my desktop. My code is attached; please let me know what I need to fix. I am very new to C#. I understand the basics, but I am unsure what to change in this area and how.
namespace mergePdf
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void btnMerge_Click(object sender, EventArgs e)
{
//Folder that we'll work from
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string pdf1 = Path.Combine(workingFolder, "pdf1.pdf");//PDF with solid red background color
string pdf2 = Path.Combine(workingFolder, "pdf2.pdf");//PDF with text
string pdf3 = Path.Combine(workingFolder, "pdf3.pdf");//Merged PDF
//Create a basic PDF filled with red, nothing special
using (FileStream fs = new FileStream(pdf1, FileMode.Create,
FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc,
fs))
{
doc.Open();
PdfContentByte cb = writer.DirectContent;
cb.SetColorFill(BaseColor.RED);
cb.Rectangle(0, 0, doc.PageSize.Width,
doc.PageSize.Height);
cb.Fill();
doc.Close();
}
}
}
//Create a basic PDF with a single line of text, nothing special
using (FileStream fs = new FileStream(pdf2, FileMode.Create,
FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc,
fs))
{
doc.Open();
doc.Add(new Paragraph("This is a test"));
doc.Close();
}
}
}
//Create a basic PDF
using (FileStream fs = new FileStream(pdf3, FileMode.Create,
FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc,
fs))
{
doc.Open();
//Get page 1 of the first file
PdfImportedPage imp1 = writer.GetImportedPage(new PdfReader(pdf1), 1);
//Get page 1 of the second file
PdfImportedPage imp2 = writer.GetImportedPage(new PdfReader(pdf2), 1);
//Add the first file to coordinates 0,0
writer.DirectContent.AddTemplate(imp1, 0, 0);
//Since we don't call NewPage the next call will operate on the same page
writer.DirectContent.AddTemplate(imp2, 0, 0);
doc.Close();
}
}
}
this.Close();
}
private void Form1_Load(object sender, EventArgs e)
{
textBoxPdfFile1Path.Text = System.IO.Path.Combine(Application.StartupPath, @"C:\Users\jesse\Downloads");
textBoxPdfFile2Path.Text = System.IO.Path.Combine(Application.StartupPath, @"C:\Users\jesse\Downloads");
}
private void btnSelectFile1_Click(object sender, EventArgs e)
{
OpenFileDialog fd = new OpenFileDialog();
fd.Filter = "PDF files (*.pdf)|*.pdf|All files (*.*)|*.*";
if (fd.ShowDialog() == DialogResult.OK)
{
textBoxPdfFile1Path.Text = fd.FileName;
}
}
private void btnSelectFile2_Click(object sender, EventArgs e)
{
OpenFileDialog fd = new OpenFileDialog();
fd.Filter = "PDF files (*.pdf)|*.pdf|All files (*.*)|*.*";
if (fd.ShowDialog() == DialogResult.OK)
{
textBoxPdfFile2Path.Text = fd.FileName;
}
}
}
}
I expect the output to combine existing files into one file saved to my desktop. Right now the code creates two sample PDF files and combines them, but I have no idea how to select existing files instead.
The code below should do what you want as far as merging the documents goes.
How you build the list of paths to the individual PDFs is up to you.
Once the FileStream is closed, your document will have been created at the path given by newPdfPath.
using (FileStream stream = new FileStream(newPdfPath, FileMode.Create))
{
Document document = new Document();
PdfCopy pdf = new PdfCopy(document, stream);
PdfReader reader = null;
document.Open();
foreach (var item in listOfPathsToPDFs)
{
reader = new PdfReader(item);
pdf.AddDocument(reader);
reader.Close();
}
document.Close();
}
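To connect this to the form in the question, the merge can be wrapped in a small helper that takes the paths chosen in the two file dialogs. This is a minimal sketch assuming iTextSharp 5.x; PdfMergeSketch, MergeFiles, and the output name merged.pdf are hypothetical names used only for illustration. It copies page by page with PdfCopy.AddPage rather than AddDocument, which is a different but equivalent route for plain page content.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMergeSketch
{
    // Hypothetical helper: copies every page of each input file, in order, into outputPath.
    public static void MergeFiles(IEnumerable<string> inputPaths, string outputPath)
    {
        using (FileStream stream = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, stream);
            document.Open();
            foreach (string path in inputPaths)
            {
                PdfReader reader = new PdfReader(path);
                // Import and append each page of the current file
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                // Let the writer flush what it holds for this reader before closing it
                copy.FreeReader(reader);
                reader.Close();
            }
            // Closing the document finishes the merged PDF
            document.Close();
        }
    }
}

// Possible use inside btnMerge_Click, replacing the sample-file code:
//   string output = Path.Combine(
//       Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "merged.pdf");
//   PdfMergeSketch.MergeFiles(
//       new[] { textBoxPdfFile1Path.Text, textBoxPdfFile2Path.Text }, output);
```

The helper expects at least one input file with at least one page; iTextSharp raises an error when a document is closed without any pages.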

ItextSharp Find text in pdf and highlight it

I am working on PDF functionality: I want to search for text in a PDF and highlight the matches. I am using iTextSharp for this.
I have not found a solution yet; please suggest one.
I have written the following code:
public ActionResult Index1()
{
string outputFile = @"D:\Test.pdf";
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
doc.Add(new Paragraph("This is a test and sample pdf for test and wait for it"));
doc.Close();
}
}
}
List<int> pages = new List<int>();
PdfReader pdfReader = new PdfReader(outputFile);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
if (currentPageText.Contains("test"))
{
pages.Add(page);
}
}
pdfReader.Close();
//Create a new file from our test file with highlighting
string highLightFile = @"D:\Test1.pdf";
//Bind a reader and stamper to our test PDF
PdfReader reader = new PdfReader(outputFile);
using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (PdfStamper stamper = new PdfStamper(reader, fs))
{
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };
PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
//Set the color
highlight.Color = BaseColor.YELLOW;
//Add the annotation
stamper.AddAnnotation(highlight, 1);
}
}
return View();
}
The code above creates one PDF (Test.pdf), and in another PDF (Test1.pdf) it highlights some text using hard-coded coordinates.
What I need is to find the coordinates of the text I am searching for in the PDF, but I have not been able to.
In iText7, we have implemented this functionality.
It can be found in the class RegexBasedLocationExtractionStrategy.
I suggest you have a look to see how it was done, since this functionality was not backported to iText5.
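As a rough illustration of how that class might be used, here is a minimal sketch assuming iText 7 for .NET (7.1 or later, where ColorConstants is available). HighlightSketch and HighlightMatches are hypothetical names, and the quad points are built in the same order as in the question's code; treat it as a starting point rather than a reference implementation.

```csharp
using iText.Kernel.Colors;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class HighlightSketch
{
    // Hypothetical helper: copies src to dest and adds a yellow highlight
    // annotation over every match of the regular expression on every page.
    public static void HighlightMatches(string src, string dest, string regex)
    {
        PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
        for (int i = 1; i <= pdf.GetNumberOfPages(); i++)
        {
            PdfPage page = pdf.GetPage(i);
            // Use a fresh strategy per page so results from earlier pages are not repeated
            RegexBasedLocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy(regex);
            new PdfCanvasProcessor(strategy).ProcessPageContent(page);
            foreach (IPdfTextLocation location in strategy.GetResultantLocations())
            {
                Rectangle r = location.GetRectangle();
                float[] quad = {
                    r.GetLeft(), r.GetBottom(), r.GetRight(), r.GetBottom(),
                    r.GetLeft(), r.GetTop(), r.GetRight(), r.GetTop()
                };
                PdfAnnotation highlight = PdfTextMarkupAnnotation
                    .CreateHighLight(r, quad)
                    .SetColor(ColorConstants.YELLOW);
                page.AddAnnotation(highlight);
            }
        }
        // Closing the document writes the annotated copy to dest
        pdf.Close();
    }
}
```

For the question's example, HighlightMatches(@"D:\Test.pdf", @"D:\Test1.pdf", "test") would replace both the page-search loop and the hard-coded rectangle.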

Remove metadata from existing Pdf using iTextsharp

I created a PDF, added metadata to it, and encrypted it using the iTextSharp library.
Now I want to remove the encryption from the PDF. I managed to do that with iTextSharp, but I was not able to remove the metadata that I added.
Can anyone please guide me on how to remove the metadata? It's urgent.
Thanks.
When removing meta data it is easiest to work directly with the PdfReader object. Once you do that you can write that back to disk. The code below is a full working C# 2010 WinForms application targeting iTextSharp 5.1.2.0. It first creates a PDF with some meta data, then it modifies an in-memory version of the PDF using a PdfReader, and finally writes the changes to disk. See the code for additional comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//File with meta data added
string InputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");
//File with meta data removed
string OutputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Output.pdf");
//Create a file with meta data, nothing special here
using (FileStream FS = new FileStream(InputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document Doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS)) {
Doc.Open();
Doc.Add(new Paragraph("Test"));
//Add a standard header
Doc.AddTitle("This is a test");
//Add a custom header
Doc.AddHeader("Test Header", "This is also a test");
Doc.Close();
}
}
}
//Read our newly created file
PdfReader R = new PdfReader(InputFile);
//Loop through each piece of meta data and remove it
foreach (KeyValuePair<string, string> KV in R.Info) {
R.Info.Remove(KV.Key);
}
//The code above modifies an in-memory representation of the PDF, we need to write these changes to disk now
using (FileStream FS = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document Doc = new Document()) {
//Use the PdfCopy object to copy each page
using (PdfCopy writer = new PdfCopy(Doc, FS)) {
Doc.Open();
//Loop through each page
for (int i = 1; i <= R.NumberOfPages; i++) {
//Add it to the new document
writer.AddPage(writer.GetImportedPage(R, i));
}
Doc.Close();
}
}
}
this.Close();
}
}
}

iTextSharp HTMLWorker ParseHTML Tablestyle and PDFStamper

Hi, I have successfully used HTMLWorker to convert a GridView using ASP.NET / C#.
(1) I have applied some limited styling to the resulting table, but I cannot see how to apply table styles such as grid lines, or other formatting such as a larger width for a particular column.
(2) I would actually like to put this content onto a pre-existing template that contains a logo etc. I've used PdfStamper for this before, but I cannot see how to use PdfStamper and HTMLWorker together. HTMLWorker needs a Document, which implements IDocListener, but that doesn't seem compatible with using a PdfStamper. I guess what I am looking for is a way to create a PdfStamper, write the title etc., and then add the parsed HTML from the grid. The other problem is that the parsed content doesn't interact with the other content on the page. For instance, below I add a title chunk to the page; rather than starting below it, the parsed HTML is written over the top of it. How do I position the parsed HTML content relative to the rest of what is on the PDF document?
Thanks in advance
Rob
Here's the code I have already:
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 30f, 0f);
HTMLWorker htmlWorker = new HTMLWorker(pdfDoc);
StyleSheet styles = new StyleSheet();
styles.LoadTagStyle("th", "size", "12px");
styles.LoadTagStyle("th", "face", "helvetica");
styles.LoadTagStyle("span", "size", "10px");
styles.LoadTagStyle("span", "face", "helvetica");
styles.LoadTagStyle("td", "size", "10px");
styles.LoadTagStyle("td", "face", "helvetica");
htmlWorker.SetStyleSheet(styles);
PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream);
pdfDoc.Open();
//Title - but this gets obscured by the data, which doesn't move down below it
Font font = new Font(Font.FontFamily.HELVETICA, 14, Font.BOLD);
Chunk chunk = new Chunk(title, font);
pdfDoc.Add(chunk);
//Body
htmlWorker.Parse(sr);
Let me first give you a couple of links to look over when you get a chance:
ItextSharp support for HTML and CSS
How to apply font properties on while passing html to pdf using itextsharp
These answers go deeper into what's going on, and I recommend reading them. The second one in particular shows why you need to use pt instead of px.
To answer your first question let me show you a different way to use the HTMLWorker class. This class has a static method on it called ParseToList that will convert HTML to a List<IElement>. The objects in that list are all iTextSharp specific versions of your HTML. Normally you would do a foreach on those and just add them to a document but you can modify them before adding which is what you want to do. Below is code that takes a static string and does that:
string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
//Our HTML
string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
//ParseToList requires a TextReader instead of just a string so just wrap it in a StringReader
using (StringReader sr = new StringReader(html))
{
//Create a style sheet
StyleSheet styles = new StyleSheet();
//...styles omitted for brevity
//Convert our HTML to iTextSharp elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
//Loop through each element (in this case there's actually just one PdfPTable)
foreach (IElement el in elements)
{
//If the element is a PdfPTable
if (el is PdfPTable)
{
//Cast it
PdfPTable tt = (PdfPTable)el;
//Change the widths, these are relative width by the way
tt.SetWidths(new float[] { 75, 25 });
}
//Add the element to the document
doc.Add(el);
}
}
doc.Close();
}
}
}
Hopefully you can see that once you get access to the raw PdfPTable you can tweak it as necessary.
To answer your second question, if you want to use the normal Paragraph and Chunk objects with a PdfStamper then you need to use a PdfContentByte object. You can get this from your stamper in one of two ways: either by asking for one that sits "above" existing content, stamper.GetOverContent(int), or one that sits "below" existing content, stamper.GetUnderContent(int). Both versions take a single parameter saying which page to work with. Once you have a PdfContentByte you can create a ColumnText object bound to it and use this object's AddElement() method to add your normal elements. Before doing this (and this answers your third question), you'll want to create at least one "column". When I do this I generally create one that essentially covers the entire page. (This part might sound weird, but we're essentially making a single-row, single-column table cell to add our objects to.)
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0 that shows off everything above. First it creates a generic PDF on the desktop. Then it creates a second document based off of the first, adds a paragraph and then some HTML. See the comments in the code for any questions.
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
using System.IO;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
//The two files that we are creating
string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
string file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");
//Create a base file to write on top of
using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document(PageSize.LETTER))
{
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
{
doc.Open();
doc.Add(new Paragraph("Hello world"));
doc.Close();
}
}
}
//Bind a reader to our first document
PdfReader reader = new PdfReader(file1);
//Create our second document
using (FileStream fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (PdfStamper stamper = new PdfStamper(reader, fs))
{
StyleSheet styles = new StyleSheet();
//...styles omitted for brevity
//Our HTML
string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
//ParseToList requires a TextReader instead of just a string so just wrap it in a StringReader
using (StringReader sr = new StringReader(html))
{
//Get our raw PdfContentByte object letting us draw "above" existing content
PdfContentByte cb = stamper.GetOverContent(1);
//Create a new ColumnText object bound to the above PdfContentByte object
ColumnText ct = new ColumnText(cb);
//Get the dimensions of the first page of our source document
iTextSharp.text.Rectangle page1size = reader.GetPageSize(1);
//Create a single column object spanning the entire page
ct.SetSimpleColumn(0, 0, page1size.Width, page1size.Height);
ct.AddElement(new Paragraph("Hello world!"));
//Convert our HTML to iTextSharp elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
//Loop through each element (in this case there's actually just one PdfPTable)
foreach (IElement el in elements)
{
//If the element is a PdfPTable
if (el is PdfPTable)
{
//Cast it
PdfPTable tt = (PdfPTable)el;
//Change the widths, these are relative width by the way
tt.SetWidths(new float[] { 75, 25 });
}
//Add the element to the ColumnText
ct.AddElement(el);
}
//IMPORTANT, this actually commits our object to the PDF
ct.Go();
}
}
}
this.Close();
}
}
}
protected void LinkPdf_Click(object sender, EventArgs e)
{
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=TestPage.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
this.Page.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 100f, 0f);
HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
htmlparser.Parse(sr);
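pdfDoc.Close();
//The PdfWriter has already written the PDF bytes to Response.OutputStream, so nothing else is written to the response here
Response.End();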
}

Hebrew text in PDF

I am creating a PDF document and trying to write Hebrew (UTF-8) text in it, but I cannot get it to work. I am using Windows Forms with C# and Visual Studio 2010 and the following code.
Document Doc = new Document(PageSize.LETTER);
//Create our file stream
using (FileStream fs = new FileStream("C:\\Users\\moshe\\Desktop\\Test18.pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
{
//Bind PDF writer to document and stream
PdfWriter writer = PdfWriter.GetInstance(Doc, fs);
//Open document for writing
Doc.Open();
//Add a page
Doc.NewPage();
//Full path to the Unicode Arial file
string ARIALUNI_TFF = Path.Combine("C:\\Users\\moshe\\Desktop\\proj\\gold\\fop\\gold", "ARIAL.TTF");
//Create a base font object making sure to specify IDENTITY-H
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Create a specific font object
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 12);
//Write some text
Doc.Add(new Phrase("מה קורה", f));
//Write some more text
Doc.Add(new Phrase("תודה לכולם", f));
//Close the PDF
Doc.Close();
}
I put the font in the folder.
What do I need to do?
Use a PdfPTable, then you can set the right-to-left mode:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Document Doc = new Document(PageSize.LETTER);
//Create our file stream
using (FileStream fs = new FileStream(@"C:\Users\moshe\Desktop\Test18.pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
{
//Bind PDF writer to document and stream
PdfWriter writer = PdfWriter.GetInstance(Doc, fs);
//Open document for writing
Doc.Open();
//Add a page
Doc.NewPage();
//Full path to the Arial file
string ARIALUNI_TFF = Path.Combine(@"C:\Users\moshe\Desktop\proj\gold\fop\gold", "ARIAL.TTF");
//Create a base font object making sure to specify IDENTITY-H
BaseFont bf = BaseFont.CreateFont(ARIALUNI_TFF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
//Create a specific font object
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 12);
//Use a table so that we can set the text direction
PdfPTable T = new PdfPTable(1);
//Hide the table border
T.DefaultCell.BorderWidth = 0;
//Set RTL mode
T.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
//Add our text
T.AddCell(new Phrase("מה קורה", f));
//Add table to document
Doc.Add(T);
//Close the PDF
Doc.Close();
}
}
}
}
