iTextSharp - Change order of Optional content groups - c#

I have a PDF file with a hierarchy of layers (aka OCG). Using the following code snippet
var ocProps = reader.Catalog.GetAsDict(PdfName.OCPROPERTIES);
var occd = ocProps.GetAsDict(PdfName.D);
var order = occd.GetAsArray(PdfName.ORDER);
I can query the current order from the source file. But I have no idea how to modify this data in order to copy it into a new file with the following snippet.
var reader = new PdfReader(input);
var document = new Document(reader.GetPageSizeWithRotation(1));
var pdfCopyProvider = new PdfCopy(document,
new System.IO.FileStream(output, System.IO.FileMode.Create));
document.Open();
// TBD do OCG modification ...
var importedPage = pdfCopyProvider.GetImportedPage(reader, 1);
pdfCopyProvider.AddPage(importedPage);
document.Close();
Nonetheless, the OCG information is copied to the new PDF file by default. Several weeks ago I saw a comment from Bruno Lowagie about merging OCGs (https://stackoverflow.com/questions/21573892/itextsharp-merge-impose-pdfs-while-maintaining-layers-optional-content-groups), but I'm not sure whether it also covers simple copying.
Any hint on this is welcome. Merging OCGs might become a topic in the future, so hints on that are welcome, too.
Regards,
Holger
Added: I'm using the most recent version, 5.5.0.0.
Added:
In addition to Bruno's answer, here is the C# version of the manipulatePdf method:
public void ManipulatePdf(string source, string destination)
{
var reader = new PdfReader(source);
var ocProps = reader.Catalog.GetAsDict(PdfName.OCPROPERTIES);
var occd = ocProps.GetAsDict(PdfName.D);
var order = occd.GetAsArray(PdfName.ORDER);
var nestedLayers = (PdfObject)order[0];
var nestedLayerArray = (PdfObject)order[1];
var groupedLayers = (PdfObject)order[2];
var radiogroup = (PdfObject)order[3];
order[0] = radiogroup;
order[1] = nestedLayers;
order[2] = nestedLayerArray;
order[3] = groupedLayers;
var stamper = new PdfStamper(reader, new System.IO.FileStream(destination, System.IO.FileMode.Create));
stamper.Close();
reader.Close();
}

You're already very close to the solution. See the ChangeOCGOrder example to find out how to change ocg.pdf into ocg_reordered.pdf. (The code is in Java, but you shouldn't have any trouble porting it to... VB.NET?)
You already had something like this:
PdfDictionary catalog = reader.getCatalog();
PdfDictionary ocProps = catalog.getAsDict(PdfName.OCPROPERTIES);
PdfDictionary occd = ocProps.getAsDict(PdfName.D);
PdfArray order = occd.getAsArray(PdfName.ORDER);
This is good: you're looking at the right place!
Now you need something like this:
PdfObject nestedLayers = order.getPdfObject(0);
PdfObject nestedLayerArray = order.getPdfObject(1);
PdfObject groupedLayers = order.getPdfObject(2);
PdfObject radiogroup = order.getPdfObject(3);
order.set(0, radiogroup);
order.set(1, nestedLayers);
order.set(2, nestedLayerArray);
order.set(3, groupedLayers);
In my example, the ORDER array contains 4 elements. I get these four elements, and I change the order of the entries in the original array.
Note that I could also have done something like:
order.addFirst(order.remove(3));
That would have the same effect as the 8 lines of code above, but the 8 lines help you understand the mechanism.
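To make the equivalence concrete without needing iText on the classpath, here is a minimal sketch that models the ORDER array as a plain java.util.List of strings. The class name, method names, and the use of strings in place of PdfObject entries are assumptions made purely for illustration; the real PdfArray calls are the ones shown above.

```java
import java.util.ArrayList;
import java.util.List;

class ReorderSketch {
    // Mirrors the eight get/set lines: read all four entries,
    // then write them back with the last entry moved to the front.
    static List<String> reorderExplicit(List<String> order) {
        List<String> copy = new ArrayList<>(order);
        String first = copy.get(0);
        String second = copy.get(1);
        String third = copy.get(2);
        String fourth = copy.get(3);
        copy.set(0, fourth);
        copy.set(1, first);
        copy.set(2, second);
        copy.set(3, third);
        return copy;
    }

    // Mirrors order.addFirst(order.remove(3)): remove the entry at
    // index 3 and re-insert it at index 0.
    static List<String> reorderShort(List<String> order) {
        List<String> copy = new ArrayList<>(order);
        copy.add(0, copy.remove(3));
        return copy;
    }
}
```

Both methods return the same list for a four-element input, which is the point being made about the one-liner.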

Related

iText Acrofields Are Empty, form.getAsArray(PdfName.FIELDS) is also null

I am using iText7.NET. A third party has provided PDFs with form fields. The fields are present, and Adobe Acrobat has no issues opening and displaying the PDF, but in iText the fields collection is empty.
I've seen the answer at "ItextSharp - Acrofields are empty" and the related knowledge-base articles on iText's site, but the fix does not work in my case: form.getAsArray(PdfName.FIELDS) returns null, so there is nothing to add to.
I've also checked for XFA, which does not seem to be present:
XfaForm xfa = form.GetXfaForm();
xfa.IsXfaPresent() // returns false
Is it possible to add PdfName.FIELDS to the document and then populate?
Thank You
So I think I have figured out what causes the issue and have a short-term fix for my particular case. In this document some fields are of subtype "Link", not "Widget", and the fix code I was using (based on the link above, which most likely came from https://kb.itextsupport.com/home/it7kb/faq/why-are-the-acrofields-in-my-document-empty) fails on them. My fix is to skip subtype Link, which I don't need, although a better solution that does not skip Links may exist.
If I don't skip Links, when the saved PDF is loaded again it fails on
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
In the lower-level code in itext.forms, IterateFields() is called, and within it formField.GetParent() is passed as a parameter to PdfFormField.MakeFormField. GetParent() returns null for the Link fields, so an exception is thrown.
Below is the RUPS hierarchy to the first subtype Link field that causes a problem
So the solution at the moment to fix my particular issue is to skip sub type links. The code is as follows
PdfReader reader = new PdfReader(pdf);
MemoryStream dest = new MemoryStream();
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
PdfCatalog root = pdfDoc.GetCatalog();
PdfDictionary form = root.GetPdfObject().GetAsDictionary(PdfName.AcroForm);
PdfArray fields = form.GetAsArray(PdfName.Fields);
if (fields == null)
{
form.Put(PdfName.Fields, new PdfArray());
fields = form.GetAsArray(PdfName.Fields);
}
for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
{
PdfPage page = pdfDoc.GetPage(i);
var annots = page.GetAnnotations();
for (int j = 0; j < annots.Count(); j++)
{
PdfObject o = annots[j].GetPdfObject();
PdfDictionary m = o as PdfDictionary;
string subType = m?.GetAsName(PdfName.Subtype)?.GetValue() ?? "";
if (subType != "Link")
{
fields.Add(o);
fields.SetModified();
}
}
}
pdfDoc.Close();

C# - Read the content of PDF(form based) in the form of text [duplicate]

Good Morning,
I don't know how to read the field names from the PDF below.
I tried all the AcroFields methods, but they all return 0 or null.
http://www.finanse.mf.gov.pl/documents/766655/1481810/PIT-8C(7)_v1-0E.pdf
my code:
try {
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(new FileInputStream("/root/TestPit8/web/notmod.pdf"));
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("/root/TestPit8/web/testpdf.pdf"));
AcroFields form = stamper.getAcroFields();
form.setField("text_1", "666");
form.setField("text_2", "666");
form.setField("text_3", "666");
form.setFieldProperty("text_3", "clrfflags", TextField.PASSWORD, null);
form.setFieldProperty("text_3", "setflags", PdfAnnotation.FLAGS_PRINT, null);
form.setField("text_3", "12345678", "xxxxxxxx");
form.setFieldProperty("text_4", "textsize", new Float(12), null);
form.regenerateField("text_4");
stamper.close();
reader.close();
} catch( Exception ex) {
ex.printStackTrace();
}
Thanks for the help.
The form you share is a pure XFA form. XFA stands for the XML Forms Architecture.
Please read The Best iText Questions on StackOverflow and scroll to the section entitled "Interactive forms".
These are the first two questions of this section:
How to fill out a pdf file programmatically? (AcroForm technology)
How to fill out a pdf file programmatically? (Dynamic XFA)
You are filling out the form as if it were based on AcroForm technology. That isn't supposed to work, is it? Your form is an XFA form!
Filling out an XFA form is explained in my book, in the XfaMovies example:
public void manipulatePdf(String src, String xml, String dest)
throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
XfaForm xfa = form.getXfa();
xfa.fillXfaForm(new FileInputStream(xml));
stamper.close();
reader.close();
}
In this case, src is a path to the original form, xml is a path to the XML data, and dest is the path of the filled out form.
If you want to read the data, you need the XfaMovie example:
This reads the full form (all the XFA):
public void readXfa(String src, String dest)
throws IOException, ParserConfigurationException, SAXException,
TransformerFactoryConfigurationError, TransformerException {
FileOutputStream os = new FileOutputStream(dest);
PdfReader reader = new PdfReader(src);
XfaForm xfa = new XfaForm(reader);
Document doc = xfa.getDomDocument();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(doc), new StreamResult(os));
reader.close();
}
If you only want the data, you need to examine the datasets node:
public void readData(String src, String dest)
throws IOException, ParserConfigurationException, SAXException,
TransformerFactoryConfigurationError, TransformerException {
FileOutputStream os = new FileOutputStream(dest);
PdfReader reader = new PdfReader(src);
XfaForm xfa = new XfaForm(reader);
Node node = xfa.getDatasetsNode();
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if("data".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
if("movies".equals(list.item(i).getLocalName())) {
node = list.item(i);
break;
}
}
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(node), new StreamResult(os));
reader.close();
}
Note that I don't understand why you think there are fields such as text_1, text_2 in the form. XFA fields are easy to recognize because they contain plenty of [] characters.
Also: from the screenshot below (taken with iText RUPS), it is clear that there are no such fields in the form:
The tools are there on the iText web site. The documentation is there. Please use it!
Update:
So... instead of accepting my comprehensive answer, you decided to post a comment asking me to do your work for you by asking "where I can find example code?", in spite of the fact that I provided links to XfaMovie and XfaMovies.
Well, here are two new examples for you:
ReadXFA takes xfa_form_poland.pdf and reads the data with data.xml as result.
FillXFA2 takes xfa_form_poland.pdf and fills it out with xfa_form_poland.xml resulting in xfa_form_poland_filled.pdf
Of course, I don't understand Polish, so I didn't always fill out the correct values, but now at least you no longer have a reason to ask "where I can find example code?"
Update 2:
In an extra comment, you claim that you can't find the NIP number (number 10 in the form) anywhere in the data structure.
This means either that you haven't examined data.xml, or that you don't understand XML.
Allow me to show the relevant part of the XML that contains the NIP number:
<Deklaracja xmlns="http://crd.gov.pl/wzor/2014/12/05/1880/" xmlns:etd="http://crd.gov.pl/xml/schematy/dziedzinowe/mf/2011/06/21/eD/DefinicjeTypy/">
....
<Podmiot2 rola="Podatnik">
<etd:OsobaFizyczna>
<etd:NIP>0123456789</etd:NIP>
<etd:ImiePierwsze>JUST TRY</etd:ImiePierwsze>
<etd:Nazwisko>DUDE</etd:Nazwisko>
<etd:DataUrodzenia>2015-02-19</etd:DataUrodzenia>
</etd:OsobaFizyczna>
</Podmiot2>
...
</Deklaracja>
In other words, the field name you're looking for is probably something like this: Deklaracja[0].Podmiot2[0].OsobaFizyczna[0].NIP[0] (whatever these words may mean, I only know one Polish word: Podpis).
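As a rough illustration of locating that value in the extracted data, the following sketch parses XML shaped like the fragment above with the JDK's own DOM parser and looks up the NIP element by namespace and local name. The class and method names are hypothetical, and the sketch assumes the data has already been extracted to a string; it does not touch iText or the PDF itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class NipLookupSketch {
    // Namespace bound to the "etd" prefix in the XML fragment above.
    static final String ETD_NS =
        "http://crd.gov.pl/xml/schematy/dziedzinowe/mf/2011/06/21/eD/DefinicjeTypy/";

    // Returns the text of the first etd:NIP element, or null if there is none.
    static String findNip(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Required so that getElementsByTagNameNS can match on namespaces.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagNameNS(ETD_NS, "NIP");
            return hits.getLength() > 0 ? hits.item(0).getTextContent() : null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Matching on the namespace URI rather than on the literal "etd:" prefix keeps the lookup working even if a document binds the same namespace to a different prefix.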

Add picture in header to a .docx using novacode DocX

Header header_default = doc.Headers.first;
Paragraph p1 = header_default.InsertParagraph();
I'm trying to add a picture to the header of a Word file. I tried it with:
p1.AppendPicture(picture);
And also with a table:
Table t1 = header_default.InsertTable(10,2);
t1.Pictures.Add(picture);
The big problem is that the code never gets that far: it always crashes when inserting a paragraph into the header:
Paragraph p1 = header_default.InsertParagraph();
or
Table t1 = header_default.InsertTable(10,2);
Error: System.NullReferenceException
I'm new to .NET and the DocX library; I hope someone can help me with this problem.
This is how I do it. Notice that I call Doc.AddHeaders() first and use Doc.Headers.odd rather than first:
Doc.AddHeaders();
var headerDefault = Doc.Headers.odd;
var headlineFormat = GetTopHeadlineFormat();
var logo = System.Drawing.Image.FromFile(AppSettings.MulalleyLogoSmall);
using (var ms = new MemoryStream())
{
logo.Save(ms, logo.RawFormat);
ms.Seek(0, SeekOrigin.Begin);
var img = Doc.AddImage(ms);
var pic1 = img.CreatePicture();
var p = headerDefault.InsertParagraph();
p.InsertPicture(pic1);
p.InsertParagraphBeforeSelf(Doc.InsertParagraph());
}

iTextSharp Add Watermark Only If it Doesn't Already Exist

Does anyone know if there is a way to check for a watermark on a PDF document using iTextSharp?
I want to do this before adding a new one. In my case, I have to add a new watermark if it wasn't already added by someone, but I don't know how to check this using iTextSharp's PdfReader class.
Something like this:
var reader = new PdfReader(bytes);
var stamper = new PdfStamper(reader, ms);
var dc = stamper.GetOverContent(pageNumber);
bool alreadyStamped = dc.CheckIfTextOrImageExists();
After some investigation, thanks to @ChrisHaas's comment, I was able to implement that check. If the text is present on a particular page, I can find it using SimpleTextExtractionStrategy, even if it's in the watermark collection.
PdfReader pdfReader = new PdfReader(bytes);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
if (currentPageText.Contains(searchText))
{
// adding new WaterMark here
Console.WriteLine("text was found on page " + page);
}
}
pdfReader.Close();
Hopefully this approach helps someone with a similar issue.

Change object's color inside existing PDF with iTextSharp

A major part of my job is automating engineering processes, so I have to create a simple program that compares two versions of the same drawn element by overlaying the drawings to review the differences. Each drawing is a single-page PDF file.
I'm using .Net Framework and C# 4.5;
iTextSharp library for editing PDF files;
First, I read the two input files and create a third one that contains the result:
var file1 = "file1.pdf";
var file2 = "file2.pdf";
var result = "result.pdf";
using (Stream f1Stream = new FileStream(file1, FileMode.Open))
using (Stream f2Stream = new FileStream(file2, FileMode.Open))
using (Stream resultStream = new FileStream(result, FileMode.Create, FileAccess.ReadWrite))
using (PdfReader f2Reader = new PdfReader(f2Stream))
using (PdfReader f1Reader = new PdfReader(f1Stream))
{
PdfStamper pdfStamper = new PdfStamper(f1Reader, resultStream);
PdfContentByte pdfContentByte = pdfStamper.GetOverContent(1);
var page = pdfStamper.GetImportedPage(f2Reader, 1);
pdfContentByte.AddTemplate(page,2,2);
pdfStamper.Close();
}
The code above does just that, but it raises a few follow-up questions:
I want to change the color of the elements in the result file, i.e. elements from the 1st drawing in green and those from the 2nd in red. Maybe I have to change the color of the entities in the two initial PDFs first and then merge them;
The initial files have layers, and because they are two successive revisions of the same construction element with very few differences, their layers are identical. I want to have "layerFoo" and "layerFoo#" in the result PDF. Maybe I have to rename all the layers in one of the 2 initial PDFs and then merge them.
All suggestions are welcome, including the use of another library :)
--> Edit1
Big thanks to Chris Haas! You are absolutely right about the token type and string value! iTextRUPS is a great tool for understanding the structure of PDF files.
The following code is taken from the post you pointed me to.
The following statement:
stream.SetData(System.Text.Encoding.ASCII.GetBytes(String.Join("\n", newBuf.ToArray())));
updates the stream of the file and then with
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
var stamper = new PdfStamper(reader, fs);
reader.SetPageContent(1,reader.GetPageContent(1));
stamper.Close();
}
the new file is created with updated stream.
I made a simple test file with only 2 lines, changed their color and saved the result to a new file.
No problem!
After that, I tried the same simple operation on a real file that represents a real drawing of a construction element; the result file was less than half the size of the original and was broken.
My guess is that only the updated stream is saved to the new file, while the information in the other containers is not.
Because I'm stuck on that, I moved on to the next step of the investigation: layers.
I wrote this code to get the available layers in a PDF file. I will try to insert more records into the layers dictionary to see what happens.
var resourcesReference = page.Get(PdfName.RESOURCES) as PdfIndirectReference;
var resources = PdfReader.GetPdfObject(resourcesReference) as PdfDictionary;
var propertiesObjectReferences = resources.Get(PdfName.PROPERTIES);
var properties = PdfReader.GetPdfObject(propertiesObjectReferences) as PdfDictionary;
foreach (var property in properties.Keys)
{
var layerReference = properties.Get(property);
var layerObject = PdfReader.GetPdfObject(layerReference) as PdfDictionary;
foreach (var key in layerObject.Keys)
{
if (key.ToString()!=PdfName.TYPE.ToString())
{
var layerName = layerObject.GetAsString(key).ToUnicodeString();
}
}
}
Coming back to my main goal from the top of the post, I intend to insert the stream and layers from the first file into the second in order to obtain a result file that contains the objects from both, painted in different colors, plus the layers from both.
Feel free to suggest another, simpler and more elegant solution! I would be happy if you reviewed and corrected my code. Thank you very much!
EDIT 2
Because of lack of time I will simplify the task: just change the color of the entities inside one PDF and put it in the background of the other.
const string Pdf = "file1.pdf";
var reader = new PdfReader(Pdf);
var page = reader.GetPageN(1);
var objectReference = page.Get(PdfName.CONTENTS) as PdfIndirectReference;
var stream = (PRStream)PdfReader.GetPdfObject(objectReference);
var streamBytes = PdfReader.GetStreamBytes(stream);
var tokenizer = new PRTokeniser(new RandomAccessFileOrArray(streamBytes));
var newBuf = new List<string>();
while (tokenizer.NextToken())
{
var token = tokenizer.StringValue;
newBuf.Add(token);
if (tokenizer.TokenType == PRTokeniser.TokType.OTHER
&& newBuf[newBuf.Count - 1].Equals("S", StringComparison.CurrentCultureIgnoreCase))
{
newBuf.Insert(newBuf.Count - 1, "0");
newBuf.Insert(newBuf.Count - 1, "1");
newBuf.Insert(newBuf.Count - 1, "1");
newBuf.Insert(newBuf.Count - 1, "RG");
}
}
var resultStream = String.Join("\n", newBuf.ToArray());
stream.SetData(System.Text.Encoding.ASCII.GetBytes(resultStream));
var file2 = Pdf.Insert(Pdf.Length - 4, "Result");
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
{
var stamper = new PdfStamper(reader, fs);
reader.SetPageContent(1, reader.GetPageContent(1));
stamper.Close();
}
The resulting PDF is broken, and iTextRUPS throws an exception when it tries to get the stream data from the page.
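For comparison, the color-injection step of the loop can be sketched in plain Java (no iText), working on the raw text of a content stream and copying every token verbatim. This is only a sketch under simplifying assumptions: tokens are separated by whitespace, and the stream contains no string literals, inline images or other data in which a bare S or s could appear. The class and method names are made up for illustration. Whether delimiters lost when rebuilding the stream from tokenizer string values are what breaks the real file is an assumption, not something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

class StrokeColorSketch {
    // Inserts "0 1 1 RG" (set RGB stroke color) before every stroke
    // operator, S or s, and leaves all other tokens exactly as they were.
    static String recolorStrokes(String content) {
        List<String> out = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("S") || token.equals("s")) {
                out.add("0");
                out.add("1");
                out.add("1");
                out.add("RG");
            }
            out.add(token);
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        // A single line segment followed by a stroke operator.
        System.out.println(recolorStrokes("10 10 m 100 100 l S"));
    }
}
```

Because each token is emitted unchanged, name tokens such as /GS0 keep their leading slash in the output.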
