How to hide one section in PDF using iText7

Based on a condition, I need to hide one section, and the section below it should move up, so that the hidden section does not show up as a blank area in the generated PDF.

Some clarification:
If you are doing this with an existing PDF, it is not likely to work. PDF documents are not a WYSIWYG format. Think of them more as containers of drawing instructions than as containers of text.
Moving a section of an existing document will not work because:
- the document itself contains no information about which instructions go together to make up lines, paragraphs, and sections
- the document uses compression and byte offsets, so moving or deleting part of it would mean recalculating all the byte offsets
If you drop the requirement of reflowing the text, it is certainly possible. iText already has an add-on for that called pdfSweep, which looks at all the drawing and rendering operations and removes the ones that intersect with a given rectangle (or adjusts them, for instance when a path goes through the rectangle).
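To make the "remove what intersects a rectangle" idea concrete, here is a minimal Java sketch. It is not pdfSweep's API or implementation: DrawOp and SweepSketch are hypothetical names, each operation is reduced to a bounding box, and whole operations are dropped rather than clipped. Note that the surviving content keeps its position, which is exactly the missing reflow described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: one drawing operation reduced to its bounding box in page coordinates. */
final class DrawOp {
    final String name;
    final double llx, lly, urx, ury;

    DrawOp(String name, double llx, double lly, double urx, double ury) {
        this.name = name;
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }
}

final class SweepSketch {
    /** True if the operation's box overlaps the cleanup rectangle (boxes that only touch do not count). */
    static boolean intersects(DrawOp op, double llx, double lly, double urx, double ury) {
        return op.llx < urx && llx < op.urx && op.lly < ury && lly < op.ury;
    }

    /** Returns the surviving operations; they keep their original coordinates (no reflow). */
    static List<DrawOp> removeIntersecting(List<DrawOp> ops, double llx, double lly, double urx, double ury) {
        List<DrawOp> kept = new ArrayList<>();
        for (DrawOp op : ops) {
            if (!intersects(op, llx, lly, urx, ury)) {
                kept.add(op);
            }
        }
        return kept;
    }
}
```

With a title, a section, and a footer on a page, sweeping a rectangle over the section leaves the footer exactly where it was, so a gap remains.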
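If you are generating the PDF, this is of course trivial. You can simply do something like:
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import java.io.File;
File outputFile = new File(System.getProperty("user.home"), "output.pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfWriter(outputFile));
Document layoutDocument = new Document(pdfDocument);
// When the optional section is skipped, the next element is laid out in its place, so no blank gap is left.
if (some_condition)
{
layoutDocument.add(new Paragraph("Lorem Ipsum Dolor Sit Amet"));
}
layoutDocument.add(new Paragraph("Never gonna give you up. Never gonna let you down."));
layoutDocument.close(); // also closes pdfDocument and writes the file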
For more on pdfSweep, check out http://itextpdf.com/itext7/pdfsweep

Related

How to change redaction text of a redact annotation created in Adobe Acrobat

Update: 2021-01-15 - Added Bounty
I am trying to alter the redaction annotation to change the underlying text that gets burned into a PDF when you apply redactions. In Acrobat, you can set up a collection of "redaction codes" that can be used to identify why you are marking something as redacted. My goal is to overwrite what was selected by the user with a system-defined value. The code will be run before the redactions are applied.
In my attempts, I have discovered that the "preview" that is available in Acrobat products when hovering your cursor over a redact box is unique to Acrobat, and most other viewers won't show the preview. It also seems like the preview is maintained separately from the actual redaction that is applied. I don't need to alter the text that is shown in the preview, just what is shown after redactions are applied.
I have added a bounty of 150 reputation, as I don't think that I will be able to work out a solution on my own. My original question specified iText7, as that was the library that got me the closest in my own attempts. While I would prefer to use iText7, I will also consider solutions using other libraries that I can reasonably access (I do have a small budget that I could use to purchase another library, if I need to).
I've kept my original question and the follow-up with what I've personally tried below. I appreciate any help offered.
If you need a sample to test with, this DropBox folder has a file called 01 - Original.pdf that you can use as the source document. The desired result is to be able to change the text that appears when applying redactions from "Original Overlay Text" to any other value, such as "New Text".
Original Question:
I am trying to alter the text contained within every redaction annotation in a PDF, using iText7. The PdfRedactAnnotation object has a method called SetOverlayText() that looks like it should do what I want. So, I wrote a method that opens a PDF, loops through the pages, then loops through the annotations on each page, and checks if an annotation is a PdfRedactAnnotation. If it is, it calls SetOverlayText().
When debugging and looking at the annotation properties, I can see that the OverlayText has definitely changed. When I open the file and check the overlay text by hovering over a redaction marking with my cursor, however, the original overlay text is still there.
Additionally, if I apply the redactions, the original overlay text is what gets burned into the page.
However, when I right-click on the annotation (before applying redactions), the overlay text immediately gets updated to the new text.
At this point, when I apply redactions, it's the new text that is burned into the PDF.
Is there any way that I can trigger the Redaction Annotation update programmatically, without having to open and right-click on every one? I've included my code below. Thank you for any advice anyone might be able to offer.
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;
using iText.Layout;
PdfDocument pdfDoc = new PdfDocument(new PdfReader(@"C:\temp\Test - Original.pdf"), new PdfWriter(@"C:\temp\Test - Output.pdf"));
Document doc = new Document(pdfDoc);
int pageCount = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= pageCount; i++)
{
var annotations = pdfDoc.GetPage(i).GetAnnotations();
foreach (var annotation in annotations)
{
if (annotation is PdfRedactAnnotation)
{
PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.SetOverlayText(new PdfString("New Text"));
}
}
}
doc.Close();
Update: Findings as of 2021-01-07
As @mkl's answer points out, the specification of the redaction annotation clarifies the underlying dictionary entries. OverlayText is just one part of the equation. If you use OverlayText, a DA entry must also be defined (DA is a string that provides formatting information for the OverlayText). Finally, if RO is defined, it takes precedence over the IC, OverlayText, DA, and Q entries.
My testing document was made using Acrobat DC Pro, by manually adding a redaction in Acrobat. Doing this results in a Redact annotation with all of the above entries set. Copies of my test documents can be found in this DropBox folder.
(Side note: In my original question, I mention hovering over the redaction's red rectangle in order to preview what the applied redaction will look like... After testing in multiple browsers and other PDF Viewers like Foxit Reader, it looks like the function to 'preview' what the redaction will look like when applied by hovering your mouse over the red outline is only supported in Acrobat products. All other viewers tested will only show the red border, with nothing occurring when you hover your cursor over it. The black rectangles shown above can only be viewed in other programs after redactions have been applied.
Additional testing has shown that the hover-over preview is maintained separately from the redaction details itself, with Acrobat operating to try to keep the hover-over details in-sync with the underlying annotation. It is best to ignore the hover-over preview when testing, and refer to the results after applying redactions.)
@mkl's recommendation to remove the RO entry so that the OverlayText could take priority was a good idea, but unfortunately it didn't work. There was no notable difference from my original results.
After poking around in iText7's PdfRedactAnnotation, I found that the following methods all result in a reference to the Redact object's RO entry:
PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.GetRolloverAppearanceObject();
redact.GetRedactRolloverAppearance();
redact.GetPdfObject().Get(PdfName.RO);
redact.GetAppearanceDictionary().Get(PdfName.R);
(I confirmed that they are in fact the exact same reference: as they are reference types, comparing them with == returned true in every case.)
On further testing, I have concluded that the RO property must have a copy of the same OverlayText stored internally. If you have two redactions with different original values, you can "copy" the RO element from one redaction to another:
PdfObject ro = firstRedact.GetPdfObject().Get(PdfName.RO);
secondRedact.GetPdfObject().Put(PdfName.RO, ro);
If you do this and apply redactions, the "overlay text" from the first redact will have replaced the "overlay text" in the second. The other RO element values are also copied (such as BBox, which defines the black rectangle's dimensions)... but at least those elements can be adjusted.
The problem remains that the iText7 PdfObject of RO has 7 sub elements, and none of them or their descendant elements appear to expose the text that I'm trying to change.
My final test was whether I could copy RO elements from one PDF to another (so that I could use a second source PDF with an annotation with the desired RO "overlay text" already configured), but it looks like indirect objects don't like being .Put() into other documents.
So now, I'm left with trying to either find a way to access/alter the text stored away in RO, or to clone a preconfigured RO from another document.

What does the specification say?
The OverlayText entry of redaction annotations is specified as
Key | Type | Value
--- | --- | ---
OverlayText | text string | (Optional) A text string specifying the overlay text that should be drawn over the redacted region after the affected content has been removed. This entry is ignored if the RO entry is present.
(ISO 32000-2, Table 195 — Additional entries specific to a redaction annotation)
Maybe in your source PDF the redaction annotation has an RO entry taking precedence.
Furthermore, that table says this concerning the DA entry:
Key | Type | Value
--- | --- | ---
DA | byte string | (Required if OverlayText is present, ignored otherwise) The appearance string that shall be used in formatting the overlay text when it is drawn after the affected content has been removed (see 12.7.4.3, "Variable text"). This entry is ignored if the RO entry is present.
If you use OverlayText, therefore, you also have to make sure the DA default appearance string is set. Did you?
The RO entry in the same table is specified as
Key | Type | Value
--- | --- | ---
RO | stream | (Optional) A form XObject specifying the overlay appearance for this redaction annotation. After this redaction is applied and the affected content has been removed, the overlay appearance should be drawn such that its origin lines up with the lower-left corner of the annotation rectangle. This form XObject is not necessarily related to other annotation appearances, and may or may not be present in the AP dictionary. This entry takes precedence over the IC, OverlayText, DA, and Q entries.
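The precedence rules quoted above can be summarized as a small decision function. This is a minimal Java sketch, not an iText API: the annotation dictionary is modeled as a plain Map, and RedactOverlayRules and overlaySource are hypothetical names. It also illustrates why an Acrobat-created annotation, which carries an RO stream, ignores a changed OverlayText.

```java
import java.util.Map;

final class RedactOverlayRules {
    /** Which entry determines the overlay drawn after the redaction is applied. */
    enum Source { RO_STREAM, OVERLAY_TEXT, NO_OVERLAY_TEXT }

    /**
     * Hypothetical helper: the annotation dictionary is modeled as a Map from key name to value.
     * Mirrors the precedence stated in ISO 32000-2, Table 195.
     */
    static Source overlaySource(Map<String, ?> annot) {
        if (annot.containsKey("RO")) {
            // RO takes precedence over IC, OverlayText, DA and Q, so editing OverlayText alone has no effect.
            return Source.RO_STREAM;
        }
        if (annot.containsKey("OverlayText")) {
            // DA is required whenever OverlayText is present.
            if (!annot.containsKey("DA")) {
                throw new IllegalArgumentException("OverlayText requires a DA entry");
            }
            return Source.OVERLAY_TEXT;
        }
        // No text overlay; IC (if present) may still fill the redacted region.
        return Source.NO_OVERLAY_TEXT;
    }
}
```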
So what to do now?
According to the details posted above, one obvious option to proceed is to create a redaction overlay XObject (RO) for the changed redaction annotations. You can do this by replacing your
if (annotation is PdfRedactAnnotation)
{
PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.SetOverlayText(new PdfString("New Text"));
}
by
if (annotation is PdfRedactAnnotation)
{
PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.SetOverlayText(new PdfString("New Text"));
// Start from the annotation rectangle, but prefer the BBox of an existing RO stream if there is one.
Rectangle rectangle = redact.GetRectangle().ToRectangle();
PdfStream stream = redact.GetRedactRolloverAppearance();
if (stream != null)
{
rectangle = stream.GetAsArray(PdfName.BBox).ToRectangle();
}
// New form XObject; the Matrix shifts the BBox lower-left corner to the origin.
PdfFormXObject redactionOverlay = new PdfFormXObject(rectangle);
redactionOverlay.GetPdfObject().Put(PdfName.Matrix, new PdfArray(new double[] { 1, 0, 0, 1, -rectangle.GetX(), -rectangle.GetY() }));
using (Canvas canvas = new Canvas(redactionOverlay, pdfDoc))
{
PdfCanvas pdfCanvas = canvas.GetPdfCanvas();
// Black fill for the redacted area...
pdfCanvas.SetFillColorGray(0);
pdfCanvas.Rectangle(rectangle);
pdfCanvas.Fill();
// ...then white as the current fill color for the overlay text drawn on top.
pdfCanvas.SetFillColorGray(1);
canvas.Add(new Paragraph("New Text"));
}
// Use the new XObject as the RO entry (which takes precedence) and as the rollover/down appearances.
stream = redactionOverlay.GetPdfObject();
redact.SetRolloverAppearance(stream);
redact.SetDownAppearance(stream);
redact.SetRedactRolloverAppearance(stream);
}
[Screenshot: the result after redacting in Acrobat]
By adapting the used fill colors and the paragraph style you can make the appearance correspond more closely to the Adobe Acrobat generated appearances (or you alternatively can generate a look completely of your own design).
Beware, I only have a fairly old Adobe Acrobat version available, v9.5, so probably current versions don't accept a redaction appearance as generated above or at least apply it differently.
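The Matrix entry set in the code above is a pure translation. As a minimal Java sketch (PdfMatrix and its methods are hypothetical helpers, not iText API), applying [1 0 0 1 -x -y] with the PDF convention x' = a*x + c*y + e, y' = b*x + d*y + f moves the BBox's lower-left corner to the form's origin, which is the point the specification lines up with the lower-left corner of the annotation rectangle.

```java
/** Hypothetical helpers for PDF transformation matrices written as [a b c d e f]. */
final class PdfMatrix {
    /** Applies the matrix to (x, y): x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] transform(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    /** The translation used for the RO form XObject: maps the BBox corner (bboxX, bboxY) to (0, 0). */
    static double[] overlayMatrix(double bboxX, double bboxY) {
        return new double[] { 1, 0, 0, 1, -bboxX, -bboxY };
    }
}
```

For a BBox whose lower-left corner is (100, 500), that corner lands on (0, 0) and every other point shifts by the same amount.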

I was able to change the redaction annotation overlay text and, upon redaction, have that text display correctly over the redacted block. I used the SyncFusion Essential PDF library that is included as a part of SyncFusion File Formats. (I am not affiliated with SyncFusion, though I do have a paid license to their File Formats libraries through my employer.) I tested with Adobe Acrobat Pro DC.
When I first attempted to replace the redaction overlay text, I ran into a similar issue with SyncFusion as the OP did with iText 7: the overlay would display as changed after running my code, but redaction would bring back the formerly replaced overlay text. As there was no way to change both the displayed text overlay and the overlay text accessible by the redaction process, I got around this issue by writing code that makes the desired changes, exports the PDF's annotations to a JSON file, deletes the PDF's annotations, and then imports the JSON file back into the PDF. This generates new annotations that have the same text value for both the text overlay and the redaction process (the redaction process overlay text, I believe, is generated as a result of the creation of the PDF annotation). This is the code using SyncFusion Essential PDF:
using System.Drawing;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Interactive;
using Syncfusion.Pdf.Parsing;
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(@"C:\Users\Joe\Desktop\Redact\MarkedOriginal.pdf");
PdfLoadedPage page = loadedDocument.Pages[0] as PdfLoadedPage;
// Only touch redaction annotations; other annotation types on the page are skipped.
foreach (PdfAnnotation annotation in page.Annotations)
{
if (annotation is PdfLoadedRedactionAnnotation redactionAnnotation)
{
PdfStandardFont font = new PdfStandardFont(PdfFontFamily.Helvetica, 10);
redactionAnnotation.Font = font;
redactionAnnotation.TextColor = Color.White;
redactionAnnotation.BorderColor = Color.Black; //See note in SO answer about this
redactionAnnotation.OverlayText = "New Text";
}
}
//Export, delete, and then import annotations to create a redaction annotation with the same preview and final redaction
loadedDocument.ExportAnnotations(@"C:\Users\Joe\Desktop\Redact\Output.json", AnnotationDataFormat.Json);
// Remove from the end: removing at ascending indices skips annotations as the collection shrinks.
for (int i = page.Annotations.Count - 1; i >= 0; i--)
{
page.Annotations.RemoveAt(i);
}
loadedDocument.ImportAnnotations(@"C:\Users\Joe\Desktop\Redact\Output.json", AnnotationDataFormat.Json);
loadedDocument.Save();
loadedDocument.Close(true);
If OP needs the border of the redaction marking boxes to be a color other than black, some more code will need to be written. I found that when I used redactionAnnotation.BorderColor = Color.Black; the redaction marking box looked as expected. However, when I used Color.Red or other colors, the border retained the black color with the new color also bordering the first redaction and only black bordering the second redaction in the file supplied by the OP. With further research, I suspect this can be remediated via SyncFusion, iText 7, or possibly by editing the JSON file's annotation defaultappearance line prior to importing the file back into the PDF. This is the defaultappearance line generated when I ran my code:
"defaultappearance": "1 1 1 RG 0 g 0 Tc 0 Tw 100 Tz 0 TL 0 Ts 0 Tr /Helv 10 Tf"
It's worth pointing out that SyncFusion has free and paid tiers for licensing their software. The SyncFusion Community License is, per SyncFusion, free for "companies and individuals with less than $1 million USD in annual gross revenue and 5 or fewer developers." The SyncFusion File Formats Developer License would cover everyone else.
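If you want to try editing the defaultappearance line shown above before re-importing the JSON, the edit itself is a small string manipulation. This is a minimal Java sketch under an unverified assumption: that the border color you see is driven by the stroking color (the RG operator) in that string. DaColorEdit and replaceStrokeColor are hypothetical names, and whether Acrobat honors the change after import has not been confirmed.

```java
/** Hypothetical helper: rewrites the stroking RGB color ("r g b RG") in a default appearance string. */
final class DaColorEdit {
    static String replaceStrokeColor(String da, double r, double g, double b) {
        String[] tokens = da.trim().split("\\s+");
        // RG takes three operands, so the earliest position it can appear at is index 3.
        for (int i = 3; i < tokens.length; i++) {
            if (tokens[i].equals("RG")) {
                tokens[i - 3] = fmt(r);
                tokens[i - 2] = fmt(g);
                tokens[i - 1] = fmt(b);
                return String.join(" ", tokens);
            }
        }
        // No RG operator present: prepend one.
        return fmt(r) + " " + fmt(g) + " " + fmt(b) + " RG " + da.trim();
    }

    /** Writes whole numbers without a trailing ".0" (1.0 becomes "1"), other values as-is. */
    private static String fmt(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }
}
```

For the string generated above, replaceStrokeColor(da, 1, 0, 0) turns the leading "1 1 1 RG" into "1 0 0 RG" and leaves the rest untouched.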

How to add a footer as a watermark so that it can be removed later

I have some scanned PDF documents (pretty flat, no selectable text, tags, objects, etc) and I would like to add a footer that can also be removed after being added. However, if it overwrites on top of anything, I want to remove the footer only. We can assume that, after the watermark is added, it won't be rescanned, changed, or flattened. (I should mention, in case any iText employees see this question, that my organization has recently purchased a license but I just started this project and I am waiting to have it sent to me so I can register for official support.)
I found an excellent answer for adding and removing watermarks here: iText 7 - Add and Remove Watermark on a PDF . My problem, as stupid as it might sound, is I'm really struggling with getting the variables right, even after lots of trial and error. The scanned documents seem to be coming in as portrait (when viewed in a PDF viewer) but they have a rotation of 270 such that, PdfDocument.GetPage(i).GetPageSize() and GetPageSizeWithRotation() have the height and width reversed and I need to take this into account but also don't want to assume that this is always the case. The footer should be centered at the bottom of the page.
The method signature can be as in the link provided (https://stackoverflow.com/a/45225597):
public static void WatermarkPDF(string sourceFile, string destinationPath)
Thank you in advance for the help and support.
Okay, BIG TIME EDIT: requirements are changing. In fact, they want to be able to have 2 lines of text as a left-aligned header, with the ability to remove or replace either or both, AND additionally a right-aligned footer that can also be removed or replaced. I'm not sure anymore whether this should be implemented as a watermark. Again, I can assume that, once I add the headers and/or footers, the document won't be reflattened or edited in any major way... so, if they are added as elements, they should be able to be removed as elements, but the problem is that the scanned documents have no structure to begin with anyway (at least they don't seem to so far). So, there's no parent element, tag, or whatever.
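The rotation problem can be reduced to a coordinate mapping. Below is a minimal Java sketch, assuming the standard meaning of a page's /Rotate value (the page is displayed rotated clockwise by that many degrees) and that w and h come from the unrotated page box (what GetPageSize() returns, as described above). RotatedPagePoint and toUserSpace are hypothetical names: you describe a point as it should appear on screen (dx from the displayed left edge, dy from the displayed bottom edge) and get back the unrotated user-space point, plus the counterclockwise angle at which text must be drawn there to read left to right. The same mapping serves a left-aligned header or a right-aligned footer.

```java
/** Hypothetical helper: maps a point given in displayed (rotated) page coordinates to unrotated user space. */
final class RotatedPagePoint {
    final double x;
    final double y;
    final double textAngleDegrees; // counterclockwise angle so text reads left-to-right on screen

    RotatedPagePoint(double x, double y, double textAngleDegrees) {
        this.x = x;
        this.y = y;
        this.textAngleDegrees = textAngleDegrees;
    }

    /** /Rotate must be a multiple of 90; normalize negative or large values into 0..270. */
    static int normalize(int rotation) {
        return ((rotation % 360) + 360) % 360;
    }

    /** Width of the page as the user sees it. */
    static double displayedWidth(double w, double h, int rotation) {
        int r = normalize(rotation);
        return (r == 90 || r == 270) ? h : w;
    }

    /**
     * llx/lly/w/h describe the unrotated page box; dx/dy are measured from the
     * displayed bottom-left corner, rightwards and upwards as seen on screen.
     */
    static RotatedPagePoint toUserSpace(double llx, double lly, double w, double h,
                                        int rotation, double dx, double dy) {
        switch (normalize(rotation)) {
            case 0:   return new RotatedPagePoint(llx + dx, lly + dy, 0);
            case 90:  return new RotatedPagePoint(llx + w - dy, lly + dx, 90);
            case 180: return new RotatedPagePoint(llx + w - dx, lly + h - dy, 180);
            case 270: return new RotatedPagePoint(llx + dy, lly + h - dx, 270);
            default:  throw new IllegalArgumentException("Rotate must be a multiple of 90: " + rotation);
        }
    }
}
```

For the centered footer in the original question, you would use dx = displayedWidth(w, h, rotation) / 2 and a small dy. For a 612 x 792 page with /Rotate 270 and dy = 20, that gives the point (20, 396) with a text angle of 270 degrees. With iText you could then pass x, y, and the angle converted to radians to a text-placement method such as ShowTextAligned; check the exact overload for your iText version.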

Continuous Labelling using iTextSharp PDF

I have working iTextSharp code for printing labels. Now the problem is that I have a requirement to print the labels on continuous paper, which I am not able to select in the iTextSharp configuration (iTextSharp.text.PageSize.A4). Please advise how I can select the page size for my current scenario.
Thanks

Your problem is related to PDF as a document format. In PDF, the content is distributed over different pages. You can define the size of such a page yourself. You mention iTextSharp.text.PageSize.A4, but you can define the page size as a Rectangle object yourself. See iTextsharp landscape document
If you want a long, narrow page, you could define the page size like this:
Document Doc = new Document(new Rectangle(595f, 14400f));
There are some implementation limits though. The maximum height or width of a page is 14,400 user units. See the blog post Help, I only see blank pages in my PDF!
However, I am pretty sure that you don't want to create a long narrow page. If you want to print labels on "continuous paper", you want to create a PDF document in which every page has the size of exactly one label. Your PDF will have as many pages as there are labels.
Suppose that the size of one label is 5 by 2 inches (width: 12.7cm; height: 5.08cm). With the default user unit of 1/72 inch, that is 5 × 72 = 360 by 2 × 72 = 144 user units, so you should create a document like this:
Document Doc = new Document(new Rectangle(360, 144));
And you should make sure that all the content of a label fits on a single page. Your label printer should know that each page in the PDF should be printed on a separate label.
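The arithmetic behind those numbers fits in a small helper. This is a minimal Java sketch (LabelSize and its methods are hypothetical names), assuming the default user unit of 1/72 inch; the resulting width and height are what you would pass to new Rectangle(width, height) in iTextSharp.

```java
/** Hypothetical helper: converts label dimensions to PDF user units (default 1 unit = 1/72 inch). */
final class LabelSize {
    static final double UNITS_PER_INCH = 72.0;
    static final double MAX_PAGE_DIMENSION = 14_400.0; // implementation limit mentioned above

    static double inchesToUnits(double inches) {
        return inches * UNITS_PER_INCH;
    }

    static double cmToUnits(double cm) {
        return cm / 2.54 * UNITS_PER_INCH;
    }

    /** Returns {width, height} for one label-sized page, rejecting sizes above the limit. */
    static double[] pageSizeForLabel(double widthInches, double heightInches) {
        double w = inchesToUnits(widthInches);
        double h = inchesToUnits(heightInches);
        if (w > MAX_PAGE_DIMENSION || h > MAX_PAGE_DIMENSION) {
            throw new IllegalArgumentException("Page dimension exceeds 14,400 user units");
        }
        return new double[] { w, h };
    }
}
```

pageSizeForLabel(5, 2) yields {360, 144}, matching the Rectangle(360, 144) above; a 250-inch "continuous" page would be rejected.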
(Thank you @amedeeVanGasse for correcting my initial answer.)

iText7, Is there a way to control carriage return inside rectangle?

I'm using iText 7 with C# and I have to write a long line.
With canvas.BeginText().ShowText("My text") I can't find a way to make the text pass on a second line, \n is not recognized.
So I used rectangles and document renderer but I have the same problem, I can't control where I want my text to create a new line.
I use an existing PDF (a template) where I have to write some texts (as short as a single line) and some paragraphs (composed of several lines). Those elements are defined in an XML file, where I can have carriage returns to delimit new lines in paragraphs. In short, the document is composed dynamically, and its content and the placement of its elements are defined in an XML file.

Adding content with low-level methods such as BeginText(), ShowText(), EndText() and so on, requires a sound knowledge of the PDF specification (ISO 32000). The fact that you are surprised at the fact that \n is ignored tells me that you aren't that well versed in PDF.
iText was written for people who don't want to deal with the low-level syntax of PDF. For instance: if you want to add text inside a rectangle with iText, you just have to create a Canvas object to which you pass a Rectangle object:
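using iText.IO.Font; // FontConstants (iText 7.0; later versions use StandardFonts in iText.IO.Font.Constants)
using iText.Kernel.Font;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;
using iText.Layout;
using iText.Layout.Element;
// dest is the path of the output file
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfPage page = pdf.AddNewPage();
PdfCanvas pdfCanvas = new PdfCanvas(page);
// Content added to this Canvas is laid out inside the rectangle (x=36, y=650, width=100, height=100)
Rectangle rectangle = new Rectangle(36, 650, 100, 100);
Canvas canvas = new Canvas(pdfCanvas, pdf, rectangle);
PdfFont font = PdfFontFactory.CreateFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.CreateFont(FontConstants.TIMES_BOLD);
Text title =
new Text("The Strange Case of Dr. Jekyll and Mr. Hyde").SetFont(bold);
Text author = new Text("Robert Louis Stevenson").SetFont(font);
// The layout engine wraps the paragraph onto new lines when it reaches the rectangle's right edge
Paragraph p = new Paragraph().Add(title).Add(" by ").Add(author);
canvas.Add(p);
pdf.Close();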
This example can be found in chapter 2 of the online iText 7 tutorial.
The screenshot shows how a long sentence was added inside a Rectangle, and how that sentence got distributed over different lines (introducing new lines automatically). The concept of the \n character doesn't exist in PDF (check ISO 32000 when in doubt). If you want to introduce a newline, it's sufficient to put one part of the content in one Paragraph and the other part in another Paragraph.
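Applied to the XML-driven case in the question, that means splitting the text at its carriage returns before it reaches iText. A minimal Java sketch (ParagraphSplitter is a hypothetical name); each returned string would then be added as its own Paragraph, for example canvas.Add(new Paragraph(part)) in the C# code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns text containing line breaks into one string per paragraph. */
final class ParagraphSplitter {
    static List<String> split(String text) {
        // Accept Windows (CRLF), old Mac (CR) and Unix (LF) line breaks; CRLF is tried first.
        // String.split drops trailing empty strings, so a final line break adds no empty paragraph,
        // while an empty line in the middle is kept as an empty string.
        return new ArrayList<>(Arrays.asList(text.split("\r\n|\r|\n")));
    }
}
```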

iTextSharp - Copying elements from one PDF to another

I want to copy certain elements from one PDF to another using iTextSharp.
I want to read one PDF, read text elements from that and correct them and create a new PDF using the updated text elements and all the images etc. from the first PDF.
Please help me figure out how this can be achieved.

This task is very complex. I wrote a program to do this for a large greeting card maker.
First you have to locate the text and calculate the glyph bounding boxes. Next you have to modify the content stream to remove the text. The text may be broken into many pieces, depending on the PDF creator. You have to remove those operators from the content stream and adjust the positioning of the remaining text, because some operators use relative positioning. Finally, you have to insert the replacement text, matching the original text's style (font, size, color, orientation, etc.).
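Locating the text starts with measuring it. As a minimal Java sketch (TextAdvance and its methods are hypothetical names; the formula is the horizontal-displacement formula from ISO 32000 section 9.4.4, with TJ kerning adjustments left out), the width of a run of glyphs can be computed from the font's glyph widths and the text state parameters. A full bounding box also needs the font's ascent and descent plus the text and transformation matrices, which this sketch does not cover.

```java
/** Hypothetical helper: horizontal glyph displacement in text space, before the text matrix is applied. */
final class TextAdvance {
    /**
     * tx = (w0 / 1000 * Tfs + Tc + Tw) * Th
     * w0: glyph width in thousandths of text space units, Tfs: font size,
     * Tc: character spacing, Tw: word spacing (only for the single-byte code 32), Th: Tz / 100.
     */
    static double glyphAdvance(double w0, double fontSize, double charSpacing, double wordSpacing,
                               boolean isSingleByteSpace, double horizontalScaling) {
        double tw = isSingleByteSpace ? wordSpacing : 0.0;
        return (w0 / 1000.0 * fontSize + charSpacing + tw) * horizontalScaling;
    }

    /** Sum of the advances of a run of glyphs, i.e. the run's horizontal extent in text space. */
    static double runWidth(double[] widths, boolean[] isSpace, double fontSize,
                           double charSpacing, double wordSpacing, double horizontalScaling) {
        double total = 0.0;
        for (int i = 0; i < widths.length; i++) {
            total += glyphAdvance(widths[i], fontSize, charSpacing, wordSpacing, isSpace[i], horizontalScaling);
        }
        return total;
    }
}
```

For example, a glyph with width 500 at font size 12 and no extra spacing advances 6 units.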
As for copying elements from one PDF to another, most of the steps above are required, plus you have to copy resources (e.g., fonts, color spaces, patterns) to the new PDF.
