I'm developing a utility that takes a tab-indented text file as input to create a bookmark tree (aka outlines) in an existing PDF file, using iText 7.
Obviously this is not the real code, but this is basically how I build the tree:
PdfReader reader = new PdfReader(srcFilePath);
PdfWriter writer = new PdfWriter(targetFilePath);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
PdfOutline rootOutline = pdfDoc.GetOutlines(false);
PdfOutline mainTitleOutline;
(mainTitleOutline = rootOutline.AddOutline("Title 1")).AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(1)));
mainTitleOutline.AddOutline("Sub title 1.1").AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(2)));
mainTitleOutline.AddOutline("Sub title 1.2").AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(3)));
(mainTitleOutline = rootOutline.AddOutline("Title 2")).AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(4)));
mainTitleOutline.AddOutline("Sub title 2.1").AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(5)));
mainTitleOutline.AddOutline("Sub title 2.2").AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(6)));
pdfDoc.Close();
This works well when the PDF doesn't have any bookmarks yet, but when it does (pdfDoc.GetOutlines(false).GetAllChildren().Count > 0), I'd like to delete the whole tree beforehand (i.e. overwrite the existing bookmarks), because otherwise I end up ADDING the new outlines to the old ones.
Is there a way to do it?
A convenience API for this is indeed missing at the moment, but you can still do it at a low level with one line of code:
pdfDoc.GetCatalog().GetPdfObject().Remove(PdfName.Outlines);
Just make sure to remove the outlines before you access them for the first time, i.e.:
PdfReader reader = new PdfReader(srcFilePath);
PdfWriter writer = new PdfWriter(targetFilePath);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
// Remove outlines before getting PdfOutline object by calling GetOutlines
pdfDoc.GetCatalog().GetPdfObject().Remove(PdfName.Outlines);
PdfOutline rootOutline = pdfDoc.GetOutlines(false);
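Since the original goal is to build the tree from a tab-indented text file, here is a minimal sketch that combines the removal with a stack-based parser. It assumes the iText 7 for .NET packages and a hypothetical line format: leading tabs give the depth, followed by `Title|pageNumber`. `OutlineRebuilder` and `Rebuild` are illustrative names, not part of iText.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Navigation;

public static class OutlineRebuilder
{
    // Replaces any existing bookmarks of srcPath with a tree built from outlineLines.
    // Hypothetical line format: leading tabs = depth, then "Title|pageNumber".
    public static void Rebuild(string srcPath, string destPath, IEnumerable<string> outlineLines)
    {
        using (var pdfDoc = new PdfDocument(new PdfReader(srcPath), new PdfWriter(destPath)))
        {
            // Remove the old tree BEFORE the first GetOutlines call.
            pdfDoc.GetCatalog().GetPdfObject().Remove(PdfName.Outlines);
            PdfOutline root = pdfDoc.GetOutlines(false);

            // parents[d] is the outline that receives children at depth d; parents[0] is the root.
            var parents = new List<PdfOutline> { root };
            foreach (string line in outlineLines)
            {
                if (line.Trim().Length == 0) continue; // skip blank lines

                int depth = line.Length - line.TrimStart('\t').Length;
                if (depth >= parents.Count)
                    throw new FormatException("Indentation skips a level: " + line);

                string[] parts = line.Trim().Split('|');
                string title = parts[0].Trim();
                int page = int.Parse(parts[1].Trim());

                PdfOutline outline = parents[depth].AddOutline(title);
                outline.AddDestination(PdfExplicitDestination.CreateFit(pdfDoc.GetPage(page)));

                // Forget deeper parents and make this outline the parent for depth + 1.
                parents.RemoveRange(depth + 1, parents.Count - depth - 1);
                parents.Add(outline);
            }
        }
    }
}
```

Keeping one parent per depth means each line only has to look at its own indentation; a line that skips a level is rejected instead of being attached somewhere arbitrary.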
I have a PDF template that I want to fill in and save as a new document. I need to get some fields from the template PDF and write new values into them in the new PDF. I do this:
PdfReader templatereader = new PdfReader("Templates//PDF_Template_Empty.pdf");
PdfDocument template = new PdfDocument(templatereader);
var writer = new PdfWriter(OutputFilepath);
PdfDocument newreport = new PdfDocument(writer);
var fields = PdfAcroForm.GetAcroForm(template, true); //!!!
But then I get this exception:
iText.Kernel.PdfException: 'There is no associate PdfWriter for making indirects.'
What am I doing wrong, and how do I fix it? I'm using iTextSharp 7.
The error explains exactly what goes wrong: you didn't define a PdfWriter instance for the PdfDocument instance named template. You create a PdfWriter instance for newreport, but you never use newreport.
This is how it should be done:
PdfReader templatereader = new PdfReader("Templates//PDF_Template_Empty.pdf");
var writer = new PdfWriter(OutputFilepath);
PdfDocument template = new PdfDocument(templatereader, writer);
var fields = PdfAcroForm.GetAcroForm(template, true);
As you can see, there is no need for the newreport instance. The template instance takes the templatereader as input and will create a new PDF as output using the writer.
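To actually write the new values, a minimal sketch along the same lines could look like this. It assumes iText 7 for .NET (the `GetFormFields`/`SetValue` API of the 7.x line); `TemplateFiller` is a hypothetical name, and the field names you pass in must be the ones your template defines.

```csharp
using System.Collections.Generic;
using iText.Forms;
using iText.Forms.Fields;
using iText.Kernel.Pdf;

public static class TemplateFiller
{
    // Opens templatePath in stamping mode (reader + writer), fills the given fields,
    // and writes the result to outputPath.
    public static void Fill(string templatePath, string outputPath, IDictionary<string, string> values)
    {
        using (var pdf = new PdfDocument(new PdfReader(templatePath), new PdfWriter(outputPath)))
        {
            PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
            IDictionary<string, PdfFormField> fields = form.GetFormFields();
            foreach (KeyValuePair<string, string> entry in values)
            {
                PdfFormField field;
                if (fields.TryGetValue(entry.Key, out field))
                    field.SetValue(entry.Value); // unknown names are silently skipped
            }
        }
    }
}
```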
Is it possible to modify/remove the creation date in metadata? I'm looking to do something similar to this:
Overwrite creationDate in pdf using iText and pdf writer
EDIT:
I have tried the following methods:
writer.Info.Remove(PdfName.CREATIONDATE);
or
writer.Info.Put(PdfName.CREATIONDATE, new PdfDate(new DateTime(2017, 01, 01)));
where writer is a PdfWriter object.
However, that creates a copy of the object (a PdfDictionary) and doesn't modify the PDF I'm creating.
I also can't assign it directly, e.g. writer.Info = info.
I tried following the advice given in the Java article.
I tried to do this:
var info = writer.Info;
stamper.MoreInfo = info;
where stamper is a PdfStamper
But the types are incompatible and I don't think this would work. Does anyone know the actual methods to remove/modify the metadata?
EDIT 2:
Here is the code; I'm creating a new file from an existing PDF.
var filename = @"C:\Users\Someone\Documents\aPdf.pdf";
using (var output = new MemoryStream())
{
    Document document = new Document();
    PdfCopy writer = new PdfCopy(document, output);
    writer.CloseStream = false;
    document.Open();
    // read in PDF
    PdfReader reader = new PdfReader(filename);
    reader.ConsolidateNamedDestinations();
    PdfImportedPage page = writer.GetImportedPage(reader, 1);
    writer.AddPage(page);
    reader.Close();
    writer.Close();
    document.Close();
    return output.ToArray();
}
Now, when I open the file in a text editor, I see that this line has been inserted (I need it to be constant or gone):
<</Producer(iTextSharp™ 5.5.12 ©2000-2017 iText Group NV \(AGPL-version\))/CreationDate(D:20180412155130+01'00')/ModDate(D:20180412155130+01'00')>>
We need to remove or fix the date because we compute the MD5 hash of the file. Every time a new document is generated, that line changes, which leads to a different MD5 hash.
Because I was trying to get a constant MD5 checksum for the generated file, I also had to make the document ID constant, as mentioned by mkl.
My solution was to search the produced byte array (i.e. the created PDF) and manually set the values to constants; the text consists of ASCII characters. I removed the /CreationDate and /ModDate entries from the PDF entirely and set the generated ID to an arbitrary constant value.
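The byte-level approach described above can be sketched as follows. Rather than deleting the entries, this hypothetical helper overwrites them with spaces and sets the hex digits of `/ID` to `0`, so the file length and the byte offsets recorded in the cross-reference table stay unchanged (removing bytes would shift every later offset). It assumes an unencrypted, uncompressed Info dictionary and trailer, as in the line shown above, and date strings without nested parentheses.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PdfDeterminism
{
    // Returns a copy of pdf with /CreationDate and /ModDate blanked out and the
    // hex digits of /ID set to '0'. The length never changes, so xref offsets stay valid.
    public static byte[] Normalize(byte[] pdf)
    {
        byte[] result = (byte[])pdf.Clone();
        // ISO-8859-1 maps each byte to exactly one char, so string index == byte offset.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);
        BlankEntries(result, text, "/CreationDate(");
        BlankEntries(result, text, "/ModDate(");
        ZeroIds(result, text);
        return result;
    }

    // Overwrites "key...)" with spaces; assumes the value has no nested parentheses.
    static void BlankEntries(byte[] bytes, string text, string key)
    {
        int start = 0;
        while ((start = text.IndexOf(key, start, StringComparison.Ordinal)) >= 0)
        {
            int end = text.IndexOf(')', start);
            if (end < 0) break;
            for (int i = start; i <= end; i++) bytes[i] = (byte)' ';
            start = end + 1;
        }
    }

    // Replaces every hex digit inside "/ID [<...><...>]" with '0'.
    static void ZeroIds(byte[] bytes, string text)
    {
        int start = 0;
        while ((start = text.IndexOf("/ID", start, StringComparison.Ordinal)) >= 0)
        {
            int i = start + 3;
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            if (i >= text.Length || text[i] != '[') { start += 3; continue; } // not an ID array
            int close = text.IndexOf(']', i);
            if (close < 0) break;
            bool inHex = false;
            for (int j = i; j < close; j++)
            {
                char c = text[j];
                if (c == '<') inHex = true;
                else if (c == '>') inHex = false;
                else if (inHex && Uri.IsHexDigit(c)) bytes[j] = (byte)'0';
            }
            start = close + 1;
        }
    }

    // Uppercase hex MD5 of the given bytes.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
            return BitConverter.ToString(md5.ComputeHash(data)).Replace("-", "");
    }
}
```

Usage would be `PdfDeterminism.Md5Hex(PdfDeterminism.Normalize(output.ToArray()))` on the bytes returned by the code in the question.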
I am trying to copy one page from an existing .pdf file and paste it into a new document like this:
using (var writer = new PdfWriter(OutputFile))
{
    var reader = new PdfReader("Templates//PDF_Template_Empty.pdf");
    PdfDocument template = new PdfDocument(reader);
    var titlepage = template.GetPage(1);
    using (var pdf = new PdfDocument(writer))
    {
        pdf.AddPage(titlepage); // exception
    }
}
But AddPage() throws this exception:
iText.Kernel.PdfException: 'Page iText.Kernel.Pdf.PdfPage cannot be added to document iText.Kernel.Pdf.PdfDocument, because it belongs to document iText.Kernel.Pdf.PdfDocument.'
How can I fix this ?
A PDF page object usually has a number of related objects. If you only add the page itself to a new document and not the related objects, the resulting page would be incomplete.
Thus, iText 7 checks in AddPage whether the page in question has been created inside the target document or not, and in the latter case throws an exception to prevent missing dependent objects.
To copy pages across documents, use the PdfDocument method CopyPagesTo, which has many overloads. In your case, e.g.:
PdfDocument template = new PdfDocument(reader);
using (var pdf = new PdfDocument(writer))
{
// copy template pages 1..1 to pdf as target page 1 onwards
template.CopyPagesTo(1, 1, pdf, 1);
}
(Beware, if there are extras on the page, you might want to choose an overload of that method which accepts an additional IPdfPageExtraCopier instance, e.g. for AcroForm fields a PdfPageFormCopier.)
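For completeness, a self-contained sketch of the CopyPagesTo approach using the PdfPageFormCopier overload, assuming iText 7 for .NET with the forms module (`iText.Forms`); `PageCopy` and `CopyFirstPage` are hypothetical names.

```csharp
using iText.Forms;
using iText.Kernel.Pdf;

public static class PageCopy
{
    // Copies page 1 of templatePath into a new PDF at outputPath.
    public static void CopyFirstPage(string templatePath, string outputPath)
    {
        using (var template = new PdfDocument(new PdfReader(templatePath)))
        using (var pdf = new PdfDocument(new PdfWriter(outputPath)))
        {
            // Copy template pages 1..1 into pdf before page 1; PdfPageFormCopier
            // additionally copies AcroForm fields attached to the copied page.
            template.CopyPagesTo(1, 1, pdf, 1, new PdfPageFormCopier());
        }
    }
}
```

The nested `using` blocks dispose the target document before the source, so the copy is written out while the template is still open.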
I'm using iText 7 version 7.0.2.2 and I'm new to it. I'm trying to merge several PDFs into one, starting with one that I upload first, and that works fine. The problem arises when I try to dynamically insert some text into one of the PDFs and then merge it: I use a PdfWriter to write some content into the PDF and then try to merge it, but I get this exception: 'Cannot copy indirect object from the document that is being written.'
This is some of the code I'm using:
private byte[] MergePdfForms( HttpPostedFileBase firstPdf, List<SectionAndPdfs> sectionsAndPdf)
{
var dest = new MemoryStream();
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
firstSourcePdf = new PdfDocument(new PdfReader(keyValuePair.Value), new PdfWriter(dest));
Document document = new Document(firstSourcePdf);
document.Add(new Paragraph(sectionsAndPdf[i].Key).SetBackgroundColor(iText.Kernel.Colors.Color.GRAY));
merger.Merge(firstSourcePdf, 1, subPages); //I'm getting the exception here..
firstSourcePdf.Close();
}
This is a known bug in the class PdfDestination. It has been fixed, and the fix will be present in our next release. In the meantime, you can use the snapshot release, which should solve the problem.
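Independently of that bug, the question's code creates two PdfWriter instances on the same `dest` stream, so the merged output and the stamped source would overwrite each other. A minimal sketch that keeps them apart, assuming iText 7 for .NET: stamp the text into its own MemoryStream, re-open the result read-only, and only then merge it. `StampThenMerge` is a hypothetical name, and it is not known whether this alone sidesteps the PdfDestination bug in 7.0.2.2.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;
using iText.Layout;
using iText.Layout.Element;

public static class StampThenMerge
{
    // Adds a heading to sourcePdf, then merges the stamped result into a new PDF.
    public static byte[] Merge(byte[] sourcePdf, string heading)
    {
        // 1. Stamp into a stream of its own, never into the merge target's stream.
        var stamped = new MemoryStream();
        var stampDoc = new PdfDocument(new PdfReader(new MemoryStream(sourcePdf)), new PdfWriter(stamped));
        var layout = new Document(stampDoc);
        layout.Add(new Paragraph(heading));
        layout.Close(); // also closes stampDoc; ToArray() still works on a closed MemoryStream
        byte[] stampedBytes = stamped.ToArray();

        // 2. Re-open the stamped bytes read-only and merge them into the target.
        var dest = new MemoryStream();
        var target = new PdfDocument(new PdfWriter(dest));
        var merger = new PdfMerger(target);
        var stampedSource = new PdfDocument(new PdfReader(new MemoryStream(stampedBytes)));
        merger.Merge(stampedSource, 1, stampedSource.GetNumberOfPages());
        stampedSource.Close();
        target.Close();
        return dest.ToArray();
    }
}
```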
I need to remove the first few pages of a PDF file. Apparently, the easiest way to do that is to create a copy of it and leave out the unwanted pages. This works, but the copied pages look a lot smaller than they should. Any ideas?
[Image: how it should look]
[Image: how it actually looks]
private static void ClipSpecificPDF(string input, string output, int pagesToCut)
{
PdfReader myReader = new PdfReader(input);
using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document())
{
using (PdfWriter myWriter = PdfWriter.GetInstance(doc, fs))
{
//Open the destination for writing
doc.Open();
//Loop through each page that we want to keep
for (int i = pagesToCut; i < myReader.NumberOfPages; i++)
{
//Add a new blank page to destination document
var PS = myReader.GetPageSizeWithRotation(i);
myWriter.SetPageSize(PS);
doc.NewPage();
//Extract the given page from our reader and add it directly to the destination PDF
myWriter.DirectContent.AddTemplate(myWriter.GetImportedPage(myReader, i + 1), 0, 0);
}
//Close our document
doc.Close();
}
}
}
}
The problem you describe is explained in the FAQ, for instance in the answers to these questions:
How to merge documents correctly?
Why does the function to concatenate / merge PDFs cause issues in some cases?
Using PdfWriter to manipulate PDF documents is a very bad idea. Read chapter 6 of my book to discover why this is a bad idea, and take a look at Table 6.1 to find out which class is a better fit.
In the same chapter, you'll find the SelectPages example. Suppose that you want to create a new PDF containing only pages 4 to 8. In that case, you simply use the SelectPages() method and PdfStamper:
PdfReader reader = new PdfReader(src);
reader.SelectPages("4-8");
PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create, FileAccess.Write));
stamper.Close();
reader.Close();
By using PdfReader together with PdfStamper, the page size is preserved, as well as any interactive features that may be present.
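Applied to the original question (dropping the first `pagesToCut` pages), the same SelectPages/PdfStamper approach could look like this minimal iTextSharp 5 sketch; `PageRemover` and `RemoveFirstPages` are hypothetical names.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

public static class PageRemover
{
    // Writes a copy of input without its first pagesToCut pages (iTextSharp 5).
    public static void RemoveFirstPages(string input, string output, int pagesToCut)
    {
        PdfReader reader = new PdfReader(input);
        try
        {
            int total = reader.NumberOfPages;
            if (pagesToCut < 0 || pagesToCut >= total)
                throw new ArgumentOutOfRangeException(nameof(pagesToCut));
            // Keep pages pagesToCut+1 .. total, e.g. "3-10" when cutting 2 of 10 pages.
            reader.SelectPages(string.Format("{0}-{1}", pagesToCut + 1, total));
            using (var fs = new FileStream(output, FileMode.Create, FileAccess.Write))
            {
                PdfStamper stamper = new PdfStamper(reader, fs);
                stamper.Close(); // writes the reduced document; page sizes are kept as-is
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Unlike the PdfWriter/AddTemplate loop in the question, nothing is re-drawn here: the remaining pages keep their own size, rotation and annotations.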
Your approach is bad because you do not respect the original page size: you copy a document with letter (?) format to a document with A4 pages. If the origin of the page doesn't correspond with the lower-left corner, parts of your document will be invisible. If there are interactive features in your PDF, they will be lost. Of all the possible examples you could have followed, you picked the worst one...