How can I remove page breaks from a PDF, so that the output is a single-'page' PDF? For example, if a normal page is 400x900 and I have 4 pages, the resulting file would be 1600x900. I previously did this for TIFF files (Remove page breaks in multi-page tif to make one long page), but would like to do it with PDF. Could I possibly convert to PostScript, remove whatever code means 'page break', and then convert back to PDF?
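This can be done with the iTextSharp library by adding every page to a single-column PdfPTable and sizing the document dynamically based on the pages it contains. Note that this stacks the pages vertically (the heights are added up and the widest page sets the width), so four 400x900 pages become one 400x3600 page.
You'll of course need a reference to the iTextSharp DLL, plus these using directives: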
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
Here's a simple example:
public static void MergePages()
{
    // Original PDF containing page breaks.
    using (PdfReader reader = new PdfReader(@"C:\Users\cmilne\Desktop\AA0081913.pdf"))
    {
        int pages = reader.NumberOfPages;
        float postProcessPageHeight = 0;
        float postProcessPageWidth = 0;
        // The new page is as tall as all pages together and as wide as the widest one.
        for (int p = 1; p <= pages; p++)
        {
            var size = reader.GetPageSize(p);
            postProcessPageHeight += size.Height;
            if (size.Width > postProcessPageWidth)
                postProcessPageWidth = size.Width;
        }
        var rect = new Rectangle(postProcessPageWidth, postProcessPageHeight);
        using (Document document = new Document(rect, 0, 0, 0, 0))
        {
            // Location/name of the new PDF without page breaks.
            PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(@"C:\Users\cmilne\Desktop\AA0081913_NEW.pdf", FileMode.Create));
            document.Open();
            PdfPTable table = new PdfPTable(1);
            table.WidthPercentage = 100;
            // No borders or padding, so the rows add up to the computed height.
            table.DefaultCell.Border = Rectangle.NO_BORDER;
            table.DefaultCell.Padding = 0;
            for (int i = 1; i <= pages; i++)
            {
                PdfImportedPage page = writer.GetImportedPage(reader, i);
                table.AddCell(iTextSharp.text.Image.GetInstance(page));
            }
            document.Add(table);
            document.Close();
        }
    }
}
The final page size must be smaller than 14400 by 14400 units (that is all iTextSharp allows). PDF units are points (1/72 inch), so an 8 1/2 x 11 inch page is 612 x 792, which puts the maximum at about 18 pages (18 x 792 = 14256).
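To sanity-check the size before writing the file, the arithmetic from the loop above can be sketched on its own. This is plain Java (the thread also uses iText's Java API), not iTextSharp; the class and method names are made up for illustration, sizes are in points, and it follows the answer's wording that the result must be strictly smaller than 14400. It covers both the vertical stacking the code does and the side-by-side layout described in the question.

```java
import java.util.List;

// Hypothetical helper, not part of iTextSharp. Sizes are in PDF points
// (1/72 inch), the default user-space units of GetPageSize.
final class MergedPageSize {
    // Limit quoted in the answer; treated as "strictly smaller than".
    static final float LIMIT = 14400f;

    static final class Size {
        final float width;
        final float height;

        Size(float width, float height) {
            this.width = width;
            this.height = height;
        }
    }

    // What the single-column table does: heights add up, widest page wins.
    static Size stackVertically(List<Size> pages) {
        float width = 0f;
        float height = 0f;
        for (Size p : pages) {
            width = Math.max(width, p.width);
            height += p.height;
        }
        return new Size(width, height);
    }

    // The layout described in the question: widths add up, tallest page wins.
    static Size placeSideBySide(List<Size> pages) {
        float width = 0f;
        float height = 0f;
        for (Size p : pages) {
            width += p.width;
            height = Math.max(height, p.height);
        }
        return new Size(width, height);
    }

    static boolean fitsLimit(Size s) {
        return s.width < LIMIT && s.height < LIMIT;
    }

    // Largest n such that n pages of this height stay strictly below the limit.
    static int maxStackedPages(float pageHeight) {
        if (pageHeight <= 0f) {
            throw new IllegalArgumentException("pageHeight must be positive");
        }
        int n = (int) (LIMIT / pageHeight);
        if (n * pageHeight >= LIMIT) {
            n--; // exact multiple: drop one page to stay below the limit
        }
        return n;
    }
}
```

For example, four 400x900 pages give 400 x 3600 when stacked and 1600 x 900 side by side, and maxStackedPages(792) returns 18 for Letter-height pages, matching the estimate above.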
Use the iTextSharp C# library. It gives you a lot of options for manipulating PDFs. I've used it before when I had to write an import application for a closed-source document repository, and it worked like a charm. The only downside is that their documentation is kind of spotty, because they want you to buy their book. You can browse their Java API for free, though, since it's almost identical to the C# one, and just play around with it to find the C# equivalent.
iText: http://itextpdf.com/
I am converting an HTML page to PDF using HtmlToPdf() from SelectPDF. Since the HTML content is big, I am breaking it in half and creating 2 PDFs.
I am struggling to make {total_pages} in the footer show the actual total number of pages across both PDFs, not just the current document, and to make {page_number} show the actual page number in the context of both PDFs.
How can I access {page_number} and {total_pages} to calculate the proper values? All the examples I found use PdfDocument(), not HtmlToPdf().
Dim converter As New HtmlToPdf()
Dim text As New PdfTextSection(0, 10, "Page: {page_number} of {total_pages} ")
text.HorizontalAlign = PdfTextHorizontalAlign.Center
converter.Footer.Add(text)
I am tagging both C# and VB since SelectPDF is available for both languages, and a relevant sample in either one will work for me. Thank you
Today I stumbled upon the same issue and found a workaround. The converter can show page numbers for the document it generates, but it isn't aware of other generated files (you can't access their page properties), so all the pages I concatenated showed "Page 1 of 1".
First I define one PdfDocument (think of it as the main document), and then I use HtmlToPdf to append the documents converted from HTML to this main document.
// Create converter
converter = new HtmlToPdf();
PdfTextSection text = new PdfTextSection(0, 10, "Page: {page_number} of {total_pages} ", new Font("Arial", 8));
text.HorizontalAlign = PdfTextHorizontalAlign.Right;
converter.Footer.Add(text);
// Create main document
pdfDocument = new PdfDocument();
Then I add pages (from html) using this method
public void AddPage(string htmlPage)
{
PdfDocument doc = converter.ConvertHtmlString(htmlPage);
pdfDocument.Append(doc);
converter.Footer.TotalPagesOffset += doc.Pages.Count;
converter.Footer.FirstPageNumber += doc.Pages.Count;
}
This results in correct page numbers for the main document. The same trick could be used for splitting files and page numbers over multiple documents like you described.
EDIT: In case you don't see any page numbering with the HtmlToPdf converter, don't forget to set the following property:
converter.Options.DisplayFooter = true;
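To see what the two += lines in AddPage do to the footer text, here is a small simulation in plain Java (no SelectPDF calls; the class name is hypothetical). It assumes, without confirmation from the SelectPDF docs, that FirstPageNumber starts at 1, that {total_pages} is rendered as the fragment's own page count plus TotalPagesOffset, and that each fragment's footer is drawn when ConvertHtmlString runs, i.e. with the counter values current at that moment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the footer counters in AddPage; no SelectPDF calls.
// Assumptions: FirstPageNumber starts at 1, {total_pages} = fragment pages +
// TotalPagesOffset, and the footer is drawn when the fragment is converted.
final class RunningFooterCounters {
    static List<String> labels(int[] fragmentPageCounts) {
        int firstPageNumber = 1;
        int totalPagesOffset = 0;
        List<String> labels = new ArrayList<>();
        for (int count : fragmentPageCounts) {
            // Footer text of this fragment, using the counters as they are now.
            for (int i = 0; i < count; i++) {
                labels.add("Page: " + (firstPageNumber + i) + " of " + (count + totalPagesOffset));
            }
            // The two += lines from AddPage.
            totalPagesOffset += count;
            firstPageNumber += count;
        }
        return labels;
    }
}
```

Under those assumptions, fragments of 3 and 2 pages give "Page: 1 of 3" through "Page: 3 of 3", then "Page: 4 of 5" and "Page: 5 of 5": the page numbers continue correctly, but earlier fragments only count the pages converted so far. If every page has to show the final total, the page counts need to be known before converting.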
There is an open source library called iTextSharp that can get the total page count.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
namespace GetPages_PDF
{
class Program
{
static void Main(string[] args)
{
// Path to YOUR PDF file
string ppath = "C:\\aworking\\Hawkins.pdf";
// Dispose the reader when you're done with it
using (PdfReader pdfReader = new PdfReader(ppath))
{
    int numberOfPages = pdfReader.NumberOfPages;
    Console.WriteLine(numberOfPages);
}
Console.ReadLine();
}
}
}
You can then also stamp text on the page, but you will need to specify the coordinates where it should go.
link: http://crmhunt.com/how-to-modify-pdf-file-using-itextsharp/
hope this helps in some way.
You should use the following properties:
FirstPageNumber - Controls the page number of the first page being rendered.
TotalPagesOffset - Controls the offset applied to the total number of pages in the generated PDF document.
More details here:
http://selectpdf.com/html-to-pdf/docs/html/HtmlToPdfHeadersAndFooters.htm
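For the original two-halves case, both properties can be worked out per part once you know how many pages each part has, for example from a first conversion whose doc.Pages.Count you read, as in the earlier answer. That two-pass idea, the hypothetical class below, and the assumption that {total_pages} shows the part's own page count plus TotalPagesOffset are my reading of the property descriptions, not something the docs quoted here state. Plain Java, no SelectPDF calls:

```java
// Hypothetical helper; no SelectPDF calls. Assumes {total_pages} is rendered
// as the part's own page count plus TotalPagesOffset.
final class FooterSettings {
    final int firstPageNumber;
    final int totalPagesOffset;

    FooterSettings(int firstPageNumber, int totalPagesOffset) {
        this.firstPageNumber = firstPageNumber;
        this.totalPagesOffset = totalPagesOffset;
    }

    // pageCounts[i] = number of pages in part i, known from a first pass.
    static FooterSettings[] forParts(int[] pageCounts) {
        int grandTotal = 0;
        for (int c : pageCounts) {
            grandTotal += c;
        }
        FooterSettings[] result = new FooterSettings[pageCounts.length];
        int pagesBefore = 0;
        for (int i = 0; i < pageCounts.length; i++) {
            // Continue numbering after all earlier parts.
            int first = pagesBefore + 1;
            // Own pages + offset must equal the grand total.
            int offset = grandTotal - pageCounts[i];
            result[i] = new FooterSettings(first, offset);
            pagesBefore += pageCounts[i];
        }
        return result;
    }
}
```

With halves of 3 and 2 pages this gives FirstPageNumber 1 and TotalPagesOffset 2 for the first half, and 4 and 3 for the second, so every footer would read "of 5".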
The answers above did not work for me, as I was trying to merge multiple PDFs with different orientations. bonnoj's answer did add page numbers, but they were incorrect and I couldn't find a way to correct them. So I took a different approach: I created a PDF, then for each HTML page I added a PdfPage and added a PdfHtmlElement to that page. Finally, I loop over the pages and add a custom footer to each one. This may not be the most efficient way to do it, but it's the only way I could find that put the footer in the correct place when mixing portrait and landscape pages. Hopefully it will save somebody else from spending hours playing with different properties.
var pdfDocument = new PdfDocument(PdfStandard.Full);
foreach (var (html, pdfPageOrientation) in pages)
{
var page = pdfDocument.AddPage(PdfCustomPageSize.A4, new PdfMargins(marginLeft, marginRight, marginTop, marginBottom));
page.Orientation = pdfPageOrientation;
var pdfHtmlElement = new PdfHtmlElement(html, "");
page.Add(pdfHtmlElement);
}
var pdfFont = pdfDocument.AddFont(PdfStandardFont.Helvetica);
pdfFont.Size = 12;
foreach (PdfPage page in pdfDocument.Pages)
{
var customFooter = pdfDocument.AddTemplate(page.PageSize.Width, 30);
var pdfFooterTextElement = new PdfTextElement(0, 15,
pageFooterText,
pdfFont)
{
HorizontalAlign = PdfTextHorizontalAlign.Right,
VerticalAlign = PdfTextVerticalAlign.Bottom,
};
customFooter.Add(pdfFooterTextElement);
page.CustomFooter = customFooter;
}
pdfDocument.Save(stream);
How can I write a multi-page ToC to the end of a PDF consisting of merged documents, using iTextSharp?
The answer to Create Index File(TOC) for merged pdf using itext library in java explains how to create a ToC page when merging PDFs (catalogued in the iTextSharp book http://developers.itextpdf.com/examples/merging-pdf-documents/merging-documents-and-create-table-contents#795-mergewithtoc.java). Code in this answer is based on those examples.
However, it only works if the ToC is one page long. If the content gets longer, it repeats itself on the same page rather than spilling onto the next page.
Trying to add the link directly to the text via:
ct.Add(new Chunk("link").SetLocalGoto("p1"))
causes an exception ("Cannot add Annotations, not enough pages in document").
Can anyone explain a method that will allow me to append multiple pages of content to a PDF when merging them (the more general the approach, the better). Is there a way to write into the document using Document.Add() instead of having to copy in template pages and write on the top of them?
(Note, code is in c#)
This answer is based on the example from the iTextSharp documentation, but converted to C#.
To make the added text span multiple pages, I found I could use ColumnText.HasMoreText(ct.Go()) to tell whether there was more text than could fit on the current page. You can then save the current page, create a new page from the template, and move the ColumnText to the new page. Below, this is wrapped in a function called CheckForNewPage:
private bool CheckForNewPage(PdfCopy copy, ref PdfImportedPage page, ref PdfCopy.PageStamp stamp, ref PdfReader templateReader, ColumnText ct)
{
if (ColumnText.HasMoreText(ct.Go()))
{
//Write current page
stamp.AlterContents();
copy.AddPage(page);
//Start a new page
ct.SetSimpleColumn(36, 36, 559, 778);
templateReader = new PdfReader("template.pdf");
page = copy.GetImportedPage(templateReader, 1);
stamp = copy.CreatePageStamp(page);
ct.Canvas = stamp.GetOverContent();
ct.Go();
return true;
}
return false;
}
This should be called each time text is added to the ct variable.
If CheckForNewPage returns true, you can then increment the page count and reset the y variable to the top of the new page, so that the link annotation ends up in the correct place on the new page.
e.g.
var tocPageCount = 1; // count the first ToC page too, so the SelectPages range further down covers every ToC page
var para = new iTextSharp.text.Paragraph(documentName);
ct.AddElement(para);
ct.Go();
if (CheckForNewPage(copy, ref page, ref stamp, ref tocReader, ct))
{
tocPageCount++;
y = 778;
}
//Add link annotation
action = PdfAction.GotoLocalPage(d.DocumentID.ToString(), false);
link = new PdfAnnotation(copy, TOC_Page.Left, ct.YLine, TOC_Page.Right, y, action);
stamp.AddAnnotation(link);
y = ct.YLine;
This creates the pages correctly. The code below adapts the end of the ToC2 example, which re-orders the pages, so that it handles a ToC longer than one page.
var rdr = new PdfReader(baos.ToArray()); // baos: the MemoryStream the merged PDF (ToC pages at the end) was written to
var totalPageCount = rdr.NumberOfPages;
rdr.SelectPages(String.Format("{0}-{1}, 1-{2}", totalPageCount - tocPageCount +1, totalPageCount, totalPageCount - tocPageCount));
PdfStamper stamper = new PdfStamper(rdr, new FileStream(outputFilePath, FileMode.Create));
stamper.Close();
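The range passed to SelectPages moves the ToC pages from the end to the front and keeps the merged documents after them. As a hedged sketch, here is the same string built in Java with a hypothetical helper; it assumes tocPages is the total number of ToC pages, including the first one (a counter that starts at 0 and only increments when CheckForNewPage starts a new page would be one short of that).

```java
import java.util.Locale;

// Hypothetical helper mirroring the String.Format call passed to SelectPages.
final class TocPageOrder {
    // tocPages: total number of ToC pages at the end of the merged PDF,
    // including the first one.
    static String selectPagesRange(int totalPageCount, int tocPages) {
        if (tocPages < 1 || tocPages >= totalPageCount) {
            throw new IllegalArgumentException("need at least one ToC page and one content page");
        }
        int firstTocPage = totalPageCount - tocPages + 1;
        int lastContentPage = totalPageCount - tocPages;
        // ToC pages first, then the merged documents in their original order.
        return String.format(Locale.ROOT, "%d-%d, 1-%d", firstTocPage, totalPageCount, lastContentPage);
    }
}
```

For a 10-page result with a 2-page ToC this yields "9-10, 1-8".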
By re-using the CheckForNewPage function, you should be able to add any content to the new pages you create and have it span multiple pages. If you don't need the annotations, you can call CheckForNewPage in a loop after adding all your content (just don't call ct.Go() beforehand).
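The bookkeeping around CheckForNewPage (the page count, y and ct.YLine) can be simulated without iText to check where the links will land. This plain-Java sketch is a simplification under stated assumptions: the class is hypothetical, each entry's height is known in advance, the column runs from y = 36 to y = 778 as in SetSimpleColumn above, and an entry that doesn't fit moves to the next page whole, whereas ColumnText can split a paragraph across pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; no iText calls. The column spans y = 36..778 as in
// SetSimpleColumn(36, 36, 559, 778); entries that don't fit move to the next
// page whole (real ColumnText can split a paragraph instead).
final class TocLayout {
    static final float TOP = 778f;
    static final float BOTTOM = 36f;

    static final class Entry {
        final int page;      // 0-based ToC page index
        final float top;     // the y variable when the entry is written
        final float bottom;  // what ct.YLine would report after the entry

        Entry(int page, float top, float bottom) {
            this.page = page;
            this.top = top;
            this.bottom = bottom;
        }
    }

    static List<Entry> layout(float[] entryHeights) {
        List<Entry> entries = new ArrayList<>();
        int page = 0;
        float y = TOP;
        for (float h : entryHeights) {
            if (h > TOP - BOTTOM) {
                throw new IllegalArgumentException("entry is taller than the column");
            }
            if (y - h < BOTTOM) {
                // Same reaction as CheckForNewPage returning true.
                page++;
                y = TOP;
            }
            entries.add(new Entry(page, y, y - h));
            y -= h; // the next entry starts where this one ended
        }
        return entries;
    }

    // Total number of ToC pages, including the first one.
    static int pageCount(List<Entry> entries) {
        return entries.isEmpty() ? 1 : entries.get(entries.size() - 1).page + 1;
    }
}
```

Three 300-point entries put two on the first ToC page (ending at y = 178), and the third starts at y = 778 on a second page, so the ToC is 2 pages long.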
I have a small problem using iTextSharp and C#.
Context:
I download PDFs and merge them into one huge PDF.
Problem:
On every page, the first couple of centimeters are just white, and the imported PDF starts after that white chunk.
The end of every page is correct. There are no overlapping or missing objects/text, which you would expect since it has less space to work with. I think it might get stretched vertically.
So the import works fine, but it always adds a few centimeters of white at the top of every page.
It feels like a top margin, but I can't seem to fix it.
Any ideas?
I appreciate your help. Thanks a lot.
public void method()
{
    // needed variables for the pdf-merging part
    fs = new FileStream(Variables.destinationFile, FileMode.Create);
    writer = PdfWriter.GetInstance(doc, fs);
    doc.Open();
    doc.SetPageSize(PageSize.A4);
    doc.SetMargins(0f, 0f, 0f, 0f);
    pdfContent = writer.DirectContent;
    byte[] result;
    int numPages;
    foreach (Tuple<string, string, int> currentTuple in someArray)
    {
        try
        {
            result = client.DownloadData(new Uri(adress + currentTuple.Item1 + ".pdf"));
            // read and add the pages to the output file
            reader = new PdfReader(result);
            numPages = reader.NumberOfPages;
            for (int i = 1; i < numPages + 1; i++)
            {
                doc.NewPage();
                page = writer.GetImportedPage(reader, i);
                pdfContent.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
            }
        }
        catch (Exception e)
        {
        }
    }
    doc.Close();
    writer.Close();
    fs.Close();
}
You are using the wrong method to merge documents. Your method throws away all interactivity and does not respect page sizes (which explains the problem you are reporting). Please tell me where you got the inspiration for merging documents this way, so that I can go and spank the person responsible for the example you were using ;-)
The correct way of concatenating documents is explained in chapter 6 of my book.
You can find some more examples here:
ITextSharp PdfCopy use examples
copy pdf form with PdfCopy not working in itextsharp 5.4.5.0
Merge PDFs iTextSharp
itextsharp PdfCopy and landscape pages
...
As you can see, your question has been answered many times before on StackOverflow, in the sense that many people have been using the correct way to merge documents (using PdfCopy) instead of doing it the wrong way (using PdfWriter and AddTemplate()).
In your comment, you say that the method AddPage() doesn't exist in PdfCopy. Let's take a look at the most recent version of that class: PdfCopy.cs
I clearly see:
/**
* Add an imported page to our output
* @param iPage an imported page
* @throws IOException, BadPdfFormatException
*/
public virtual void AddPage(PdfImportedPage iPage) {
Note that recent versions also have an AddDocument() method:
virtual public void AddDocument(PdfReader reader) {
Using this method, you no longer have to loop over all the pages, but you can add all the pages of the PDF being read by PdfReader at once.
If you only want to add a selection of pages, you can use:
virtual public void AddDocument(PdfReader reader, List<int> pagesToKeep) {
Please do not use unofficial versions! The official version can be downloaded here: http://sourceforge.net/projects/itextsharp/files/itextsharp/
iText Group does not take any responsibility regarding old versions of iTextSharp, nor can we be held responsible for forks of our software.
At work I sometimes have to merge anywhere from a few to a few hundred PDF files. I've always used the PdfWriter and PdfImportedPage classes. But when I merge all the files into one, the file size becomes enormous (the sum of all the merged files' sizes), because the fonts are attached to every page and not reused (fonts are embedded in every page, not in the whole document).
Not long ago I found out about the PdfSmartCopy class, which reuses embedded fonts and images. And here the problem kicks in: very often, before merging the files, I have to add additional content to them (images, text). For this purpose I usually use the PdfContentByte from the PdfWriter object.
Document doc = new Document();
// @ so that "\t" in the path isn't read as a tab character
PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(@"C:\test.pdf", FileMode.Create));
doc.Open(); // content can only be added to an open document
PdfContentByte cb = writer.DirectContent;
cb.Rectangle(100, 100, 100, 100);
cb.SetColorStroke(BaseColor.RED);
cb.SetColorFill(BaseColor.RED);
cb.FillStroke();
When I do the same thing with a PdfSmartCopy object, the pages are merged, but no additional content is added. Full code of my test with PdfSmartCopy:
using (Document doc = new Document())
{
using (PdfSmartCopy copy = new PdfSmartCopy(doc, new FileStream(Path.GetDirectoryName(pdfPath[0]) + "\\testas.pdf", FileMode.Create)))
{
doc.Open();
PdfContentByte cb = copy.DirectContent;
for (int i = 0; i < pdfPath.Length; i++)
{
PdfReader reader = new PdfReader(pdfPath[i]);
for (int ii = 0; ii < reader.NumberOfPages; ii++)
{
PdfImportedPage import = copy.GetImportedPage(reader, ii + 1);
copy.AddPage(import);
cb.Rectangle(100, 100, 100, 100);
cb.SetColorStroke(BaseColor.RED);
cb.SetColorFill(BaseColor.RED);
cb.FillStroke();
doc.NewPage();// not necessary line
//ColumnText col = new ColumnText(cb);
//col.SetSimpleColumn(100,100,500,500);
//col.AddText(new Chunk("wdasdasd", PdfFontManager.GetFont(@"C:\Windows\Fonts\arial.ttf", 20)));
//col.Go();
}
}
}
}
Now I have few questions:
Is it possible to edit PdfSmartCopy object's DirectContent?
If not, is there another way to merge multiple PDF files into one without increasing its size dramatically, while still being able to add content to the pages during the merge?
First this: using PdfWriter/PdfImportedPage is not a good idea. You throw away all interactive features! As the author of iText, I find it very frustrating to see so many people making the same mistake, in spite of the fact that I wrote two books about this, and in spite of the fact that I convinced my publisher to offer one of the most important chapters for free: http://www.manning.com/lowagie2/samplechapter6.pdf
Is my writing really that bad? Or is there another reason why people keep on merging documents using PdfWriter/PdfImportedPage?
As for your specific questions, here are the answers:
1. Yes. Download the sample chapter and search the PDF file for PageStamp.
2. Only if you create the PDF in two passes. For instance: create the huge PDF first, then reduce the size by passing it through PdfCopy; or create the merged PDF first with PdfCopy, then add the extra content in a second pass using PdfStamper.
Code after applying Bruno Lowagie's answer:
for (int i = 0; i < pdfPath.Length; i++)
{
PdfReader reader = new PdfReader(pdfPath[i]);
PdfImportedPage page;
PdfSmartCopy.PageStamp stamp;
for (int ii = 0; ii < reader.NumberOfPages; ii++)
{
page = copy.GetImportedPage(reader, ii + 1);
stamp = copy.CreatePageStamp(page);
PdfContentByte cb = stamp.GetOverContent();
cb.Rectangle(100, 100, 100, 100);
cb.SetColorStroke(BaseColor.RED);
cb.SetColorFill(BaseColor.RED);
cb.FillStroke();
stamp.AlterContents(); // don't forget to add this line
copy.AddPage(page);
}
}
Regarding point 2 of the answer above ("Only if you create the PDF in two passes..."): using PdfStamper in a second pass is much more difficult. When you're working with lots of data, it's far easier to create a page stamp and then append the page.
PdfCopyFields used to work well for this, but it no longer works as of the 5.4.4.0 release, which is why I'm here.
I'm using iText to generate a PDF document that consists of several copies of almost the same information.
E.g.: An invoice. One copy is given to the customer, another is filed and a third one is given to an accountant for book-keeping.
All the copies must be exactly the same except for a little piece of text that indicates who the copy is for (Customer, Accounting, File, ...).
There are two possible scenarios (I don't know if the solution is the same for both of them):
a) Each copy goes on a different page.
b) All the copies go on the same page (the paper will have cutting holes to separate the copies).
There will be a wrapper or helper class that uses iText to generate the PDF, so that I can do something like var pdf = HelperClass.CreateDocument(DocumentInfo info);. The multiple-copies problem will be solved inside this wrapper/helper.
What does iText provide to accomplish this? Do I need to write each element in the document several times at different positions/pages? Or does iText provide some way to write one copy to the document and then copy it to another position/page?
Note: It's a .NET project, but I tagged the question with both java and c# because this question is about how to use iText properly, so the answer will help developers in both languages.
If each copy goes on a different page, you can create a new document and copy in the page multiple times. Using iText in Java you can do it like this:
// Create output PDF
Document document = new Document(PageSize.A4);
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
// Load existing PDF
PdfReader reader = new PdfReader(templateInputStream);
PdfImportedPage page = writer.getImportedPage(reader, 1);
// Copy first page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
// Add your first piece of text here
document.add(new Paragraph("Customer"));
// Copy second page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
// Add your second piece of text here
document.add(new Paragraph("Accounting"));
// etc...
document.close();
If you want to put all the copies on the same page, the code is similar but instead of using zeroes in addTemplate(page, 0, 0) you'll need to set values for the correct position; the numbers to use depend on the size and shape of your invoice.
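For the same-page variant, the positions can be computed rather than guessed. The sketch below (plain Java, hypothetical class name) assumes the copies are stacked top to bottom in equal horizontal strips, centered horizontally, and scaled uniformly so each fits its strip; the resulting scale, x and y are meant for iText's six-argument addTemplate(template, a, b, c, d, e, f), i.e. addTemplate(page, scale, 0, 0, scale, x, y).

```java
// Hypothetical helper; no iText calls. Computes scale and offset for stacking
// several copies of one template on a single page, top to bottom.
final class CopyPlacement {
    static final class Placement {
        final float scale;
        final float x;
        final float y;

        Placement(float scale, float x, float y) {
            this.scale = scale;
            this.x = x;
            this.y = y;
        }
    }

    static Placement[] stack(int copies, float pageWidth, float pageHeight,
                             float templateWidth, float templateHeight) {
        if (copies < 1) {
            throw new IllegalArgumentException("copies must be at least 1");
        }
        float slotHeight = pageHeight / copies;
        // Shrink just enough to fit both the page width and one strip; never enlarge.
        float scale = Math.min(1f, Math.min(pageWidth / templateWidth, slotHeight / templateHeight));
        // Center each copy horizontally.
        float x = (pageWidth - templateWidth * scale) / 2f;
        Placement[] result = new Placement[copies];
        for (int i = 0; i < copies; i++) {
            // PDF y grows upwards, so copy 0 goes into the top strip.
            float y = pageHeight - (i + 1) * slotHeight;
            result[i] = new Placement(scale, x, y);
        }
        return result;
    }
}
```

Two copies of a 600 x 800 template on a 600 x 800 page are scaled to 0.5 and placed at x = 150, with y = 400 for the top copy and y = 0 for the bottom one.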
See also iText - add content to existing PDF file — the above code is based on the code I wrote in that answer.
Here's how I see this working.
PdfReader reader = new PdfReader( templatePDFPath );
Document doc = new Document();
PdfWriter writer = PdfWriter.getInstance( doc, new FileOutputStream("blah.pdf" ) );
doc.open();
PdfImportedPage inputPage = writer.getImportedPage( reader, 1 );
PdfContentByte curPageContent = writer.getDirectContent();
// getExtraStuff(), x, y, someFont and someSize are placeholders you supply.
String extraStuff[] = getExtraStuff();
for (String stuff : extraStuff) {
    curPageContent.saveState();
    curPageContent.addTemplate( inputPage, 0, 0 ); // pass other x, y values to move it
    curPageContent.restoreState();
    curPageContent.beginText();
    curPageContent.setTextMatrix(x, y);
    curPageContent.setFontAndSize( someFont, someSize );
    // the actual work:
    curPageContent.showText( stuff );
    curPageContent.endText();
    // save the contents of curPageContent out to the file and reset it for the next page.
    doc.newPage();
}
doc.close();
That's the bare minimum of work on the computer's part. It's quite efficient, and it results in a smaller PDF: rather than having N copies of that page with tweaks, you have one copy of that page reused on N pages, with little tweaks on top.
You could do the same thing, and use the "x,y" parameters in addTemplate to draw them all on the same page. Up to you.
PS: you'll need to figure out the coordinates for setTextMatrix in advance.
You could also use PdfCopy or PdfSmartCopy to do this.
PdfReader reader = new PdfReader(@"Path\To\File");
Document doc = new Document();
PdfCopy copier = new PdfCopy(doc, ms1); // ms1: your MemoryStream; count: number of copies
//PdfSmartCopy copier = new PdfSmartCopy(doc, ms1);
doc.Open();
copier.CloseStream = false;
PdfImportedPage inputPage = copier.GetImportedPage(reader, 1);
for (int i = 0; i < count; i++)
{
copier.AddPage(inputPage);
}
doc.Close();
ms1.Flush();
ms1.Position = 0;
The difference between PdfCopy and PdfSmartCopy is that PdfSmartCopy checks for redundant objects (for example the same font or image coming from several pages or documents) and writes them only once, with every page referencing that single copy, while PdfCopy does not do that check. That results in a smaller file and less bandwidth on a network, but PdfSmartCopy uses more memory on the server and takes longer to process.