I have a ready made PDF, and I would need to modify the trimbox, bleedbox with SetBoxSize and use the setPDFXConformance. Is there a way to do this?
I've tried with stamper.Writer, but it doesn't care about what I set there
2011.02.01.
We've tested it with Acrobat Pro, and it said that the trimbox was not defined. It seems the the stamper's writer's methods/properties don't effect the resulting pdf. Here are the source and result files: http://stemaweb.hu/pdfs.zip
my code:
PdfReader reader = new PdfReader(#"c:\source.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileStream(#"c:\result.pdf", FileMode.Create));
stamper.Writer.SetPageSize(PageSize.A4);
stamper.Writer.PDFXConformance = PdfWriter.PDFX32002;
stamper.Writer.SetBoxSize("trim", new iTextSharp.text.Rectangle(20, 20, 100, 100));
PdfContentByte cb = stamper.GetOverContent(1);
/*drawing*/
stamper.Close();
Because the boxes are not visible, I tried to modify the pagesize with the writer but that didn't do anything either.
SetPDFXConformance won't turn a "normal" PDF into a PDF/X pdf. SetPDFXConformance is really just for document generation, causing iText to throw an exception if you do something blatantly off spec.
"it doesn't care about what I set there". Trim and bleed boxes are not something you can see visually in Reader. How are you testing for them?
Could you post some code, and a link to your output PDF?
Ah. You're using stamper.Writer. In this case, that doesn't work out so well. All the page level, Well Supported Actions via PdfStamper will take a page number or page's PdfDictionary as an argument. SetBoxSize just takes a string & a rectangle, so that's youre clue.
Going "under the hood" as you are is actually defaulting back to PdfWriter.setBoxSize... which is only for creating PDFs, not modifying an existing page.
So: You need to use the low-level PDF objects make the changes you want. No Problemo:
for (int i = 1; i <= myReader.getNumberOfPages(); ++i) {
PdfDictionary pageDict = myREADER_YES_READER.getPageN(i);
PdfRectangle newBox = new PdfRectangle( 20, 20, 100, 100 );
pageDict.put(PdfName.TRIMBOX, newBox);
newBox = new PdfRectangle( PageSize.A4 );
pageDict.put(PdfName.MEDIABOX, newBox );
}
/* drawing */
stamper.close();
As to the PDFX32002 conformance, I think you're going to have to go code diving to figure out exactly what is needed. Writer.PDFXConformance is another aspect of Writer that only works when generating a PDF, not modifying an existing one.
The good news is that PdfXConformanceImp is a public class. The bad news is that its only used internally by PdfWriter and PdfContentByte... hey. You are getting some changes in behavior with your present code (just not enough). Specifically, if you try something that isn't allowed within that PdfContentByte, you'll get a PdfXConformanceException with message describing the restriction you've violated. Trying to add an optional content group (layer) would throw for example.
Ah. That's not so bad. MAYBE. Try this:
PDFXConformanceImp pdfx = new PDFXConformanceImp();
pdfx.setConformance(PdfWriter.PDFX32002);
pdfx.commpleteInfoDictionary(stamper.Writer.getInfo());
pdfx.completeExtraCatalog(stamper.Writer.getExtraCatalog());
stamper.close();
If you drop stamper.Writer.PDFXConformance = PdfWriter.PDFX32002;, you won't get exceptions when you do something Forbidden in your contentByte. Other than that, I don't think it'll matter.
Hmm.. That's not the whole solution. The OutputIntents from the extraCatalog are merged into the main catalog as well. Perhaps this will work:
//replace the completeExtraCatalog call above with this
pdfx.completeExtraCatalog(myReader.getCatalog());
I wish you luck.
Related
I am having a bit of a struggle reading a PDF using iText7 in Blazor WebAssembly.
The InputFile component creates a IBrowserFile:
<div>
<InputFile OnChange="#OnFileSelection"></InputFile>
<div class="row">
<textarea>#outputText</textarea>
</div>
</div>
I can then read the file with Stream - and iText7 will supposedly read that - but it won't give a page count or anything else that I have tried. It also doesn't seem to pass over the reader, and doesn't even seem to get to the pageCount.
int pageCount = 0;
IBrowserFile pdfFile = e.File;
Stream stream = pdfFile.OpenReadStream();
PdfDocument pdfDoc = new PdfDocument(new PdfReader(stream));
pageCount = pdfDoc.GetNumberOfPages();
stream.Close();
outputText = $"{pageCount}";
StateHasChanged();
I have also tried reading the Stream into a MemoryStream first, same outcome. I have followed the information here:
https://learn.microsoft.com/en-us/aspnet/core/blazor/file-uploads?view=aspnetcore-6.0&pivots=webassembly
Same outcome.
Is there a way to handle the PDF file in such a way as the functionality of iText7 remains intact, so you can get page counts, extracted text etc?
The file I am testing on is below the 500kb limit, it is 66kb. I don't need to display the PDF - I just need to know what the contents of it are ideally on a page by page basis, but for now, simply being able to read a page or get a page count would be a big step forward.
If you look at your developer console, you'll find that it's emitting the error:
Synchronous reads are not supported.
You'll notice you aren't using await anywhere, and, unfortunately, neither is iText7. Blazor strongly enforces use of asynchronous semantics and if it's violated, you'll see an error like this.
Fortunately, you can still make this work. You said:
I have also tried reading the Stream into a MemoryStream first, same outcome.
You should show what you tried, but my hunch is it looked something like this:
var copy = new MemoryStream();
stream.CopyTo(copy);
copy.Position = 0;
PdfDocument pdfDoc = new PdfDocument(new PdfReader(copy));
This will lead to the exact same error on the line .CopyTo. And for the same reasons. If you instead make the copy process properly use async semantics, it will work:
var copy = new MemoryStream();
await stream.CopyToAsync(copy);
copy.Position = 0;
PdfDocument pdfDoc = new PdfDocument(new PdfReader(copy));
Notice await stream.CopyToAsync(copy);. You'll need to make your surrounding method async in order for the await to work, but the return type ought to be a Task already. (And if it isn't, you can make it so)
Using this, I was able to see the page count display in your text area.
I've been attempting to find an easy solution to exporting a Canvas in my WPF Application to a PDF Document.
So far, the best solution has been to use the PrintDialog and set it up to automatically use the Microsoft Print the PDF 'printer'. The only problem I have had with this is that although the PrintDialog is skipped, there is a FileDialog to choose where the file should be saved.
Sadly, this is a deal-breaker because I would like to run this over a large number of canvases with automatically generated PDF names (well, programitically provided anyway).
Other solutions I have looked at include:
Using PrintDocument, but from my experimentation I would have to manually iterate through all my Canveses children and manually invoke the correct Draw method (of which a lot of my custom elements with transformation would be rather time consuming to do)
Exporting as a PNG image and then embedding that in a PDF. Although this works, TextBlocks within my canvas are no longer text. So this isn't an ideal situation.
Using the 3rd party library PDFSharp has the same downfall as the PrintDocument. A lot of custom logic for each element.
With PDFSharp. I did find a method fir generating the XGraphics from a Canvas but no way of then consuming that object to make a PDF Page
So does anybody know how I can skip or automate the PDF PrintDialog, or consume PDFSharp XGraphics to make
A page. Or any other ideas for directions to take this besides writing a whole library to convert each of my Canvas elements to PDF elements.
If you look at the output port of a recent windows installation of Microsoft Print To PDF
You may note it is set to PORTPROMP: and that is exactly what causes the request for a filename.
You might note lower down, I have several ports set to a filename, and the fourth one down is called "My Print to PDF"
So very last century methodology; when I print with a duplicate printer but give it a different name I can use different page ratios etc., without altering the built in standard one. The output for a file will naturally be built:-
A) Exactly in one repeatable location, that I can file monitor and rename it, based on the source calling the print sequence, such that if it is my current default printer I can right click files to print to a known \folder\file.pdf
B) The same port can be used via certain /pt (printto) command combinations to output, not just to that default port location, but to a given folder\name such as
"%ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE" /pt listIN.doc "My Print to PDF" "My Print to PDF" "listOUT.pdf"
Other drivers usually charge for the convenience of WPF programmable renaming, but I will leave you that PrintVisual challenge for another of your three wishes.
MS suggest XPS is best But then they would be promoting it as a PDF competitor.
It does not need to be Doc[X]2PDF it could be [O]XPS2PDF or aPNG2PDF or many pages TIFF2PDF etc. etc. Any of those are Native to Win 10 also other 3rd party apps such as [Free]Office with a PrintTo verb will do XLS[X]2PDF. Imagination becomes pagination.
I had a great success in generating PDFs using PDFSharp in combination with SkiaSharp (for more advanced graphics).
Let me begin from the very end:
you save the PdfDocument object in the following way:
PdfDocument yourDocument = ...;
string filename = #"your\file\path\document.pdf"
yourDocument.Save(filename);
creating the PdfDocument with a page can be achieved the following way (adjust the parameters to fit your needs):
PdfDocument yourDocument = new PdfDocument();
yourDocument.PageLayout = PdfPageLayout.SinglePage;
yourDocument.Info.Title = "Your document title";
PdfPage yourPage = yourDocument.AddPage();
yourDocument.Orientation = PageOrientation.Landscape;
yourDocument.Size = PageSize.A4;
the PdfPage object's content (as an example I'm putting a string and an image) is filled in the following way:
using (XGraphics gfx = XGraphics.FromPdfPage(yourPage))
{
XFont yourFont = new XFont("Helvetica", 20, XFontStyle.Bold);
gfx.DrawString(
"Your string in the page",
yourFont,
XBrushes.Black,
new XRect(0, XUnit.FromMillimeter(10), page.Width, yourFont.GetHeight()),
XStringFormats.Center);
using (Stream s = new FileStream(#"path\to\your\image.png", FileMode.Open))
{
XImage image = XImage.FromStream(s);
var imageRect = new XRect()
{
Location = new XPoint() { X = XUnit.FromMillimeter(42), Y = XUnit.FromMillimeter(42) },
Size = new XSize() { Width = XUnit.FromMillimeter(42), Height = XUnit.FromMillimeter(42.0 * image.PixelHeight / image.PixelWidth) }
};
gfx.DrawImage(image, imageRect);
}
}
Of course, the font objects can be created as static members of your class.
And this is, in short to answer your question, how you consume the XGraphics object to create a PDF page.
Let me know if you need more assistance.
I am using iTextSharp for PDF processing, and I need to extract all text from an existing PDF that is written in a certain font.
A way to do that is to inherit from a RenderFilter and only allow text that has a certain PostscriptFontName. The problem is that when I do this, I see the following font names in the PDF:
CIDFont+F1
CIDFont+F2
CIDFont+F3
CIDFont+F4
CIDFont+F5
which is nothing like the actual font names I am looking for.
I have tried enumerating the font resources, and it shows the same result.
I have tried opening the PDF in the full Adobe Acrobat. It also shows the mangled font names:
I have tried analysing the file with iText RUPS. Same result.
That is, I have not been able to see the actual font names anywhere in the document structure.
Yet, Adobe Acrobat DC does show the correct font names in the Format pane when I select various text boxes on the document canvas (e.g. Arial, Courier New, Roboto), so that information must be stored somewhere.
How do I get those real font names when parsing PDFs with iTextSharp?
As determined in the course of the comments to the question, the font names are anonymized in all PDF metadata for the font but the embedded font program itself contains the actual font name.
(So the PDF strictly speaking is broken, even though in a way hardly any software will ever complain about.)
If we want to retrieve those names, therefore, we have to look inside these font programs.
Here a proof of concept following the architecture used in this answer you referenced, i.e. using a RenderFilter:
class FontProgramRenderFilter : RenderFilter
{
public override bool AllowText(TextRenderInfo renderInfo)
{
DocumentFont font = renderInfo.GetFont();
PdfDictionary fontDict = font.FontDictionary;
PdfName subType = fontDict.GetAsName(PdfName.SUBTYPE);
if (PdfName.TYPE0.Equals(subType))
{
PdfArray descendantFonts = fontDict.GetAsArray(PdfName.DESCENDANTFONTS);
PdfDictionary descendantFont = descendantFonts[0] as PdfDictionary;
PdfDictionary fontDescriptor = descendantFont.GetAsDict(PdfName.FONTDESCRIPTOR);
PdfStream fontStream = fontDescriptor.GetAsStream(PdfName.FONTFILE2);
byte[] fontData = PdfReader.GetStreamBytes((PRStream)fontStream);
MemoryStream dataStream = new MemoryStream(fontData);
dataStream.Position = 0;
MemoryPackage memoryPackage = new MemoryPackage();
Uri uri = memoryPackage.CreatePart(dataStream);
GlyphTypeface glyphTypeface = new GlyphTypeface(uri);
memoryPackage.DeletePart(uri);
ICollection<string> names = glyphTypeface.FamilyNames.Values;
return names.Where(name => name.Contains("Arial")).Count() > 0;
}
else
{
// analogous code for other font subtypes
return false;
}
}
}
The MemoryPackage class is from this answer which was my first find searching for how to read information from a font in memory using .Net.
Applied to your PDF file like this:
using (PdfReader pdfReader = new PdfReader(SOURCE))
{
FontProgramRenderFilter fontFilter = new FontProgramRenderFilter();
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), fontFilter);
Console.WriteLine(PdfTextExtractor.GetTextFromPage(pdfReader, 1, strategy));
}
the result is
This is Arial.
Beware: This is a mere proof of concept.
On one hand you will surely also need to implement the part commented as analogous code for other font subtypes above; and even the TYPE0 part is not ready for production use as it only considers FONTFILE2 and does not handle null values gracefully.
On the other hand you will want to cache names for fonts already inspected.
I am using iTextSharp (5.5.5.90) to generate PDF files. I am using paragraphs and importing pages from readers and such. Here is how I create my document, from there I just append what I need:
FileStream fs = new FileStream("filename.pdf", FileMode.Create, FileAccess.Write, FileShare.None);
Document doc = new Document(new Rectangle(PageSize.LETTER), 58, 58, 100, 50);
PdfWriter writer = PdfWriter.GetInstance(doc, fs);
Once the file is created, I add paragraphs like this:
doc.Add(new Paragraph("Paragraph text"));
And import pages from readers like this:
writer.DirectContent.AddTemplate(writer.GetImportedPage(reader, page), 0, 0);
My question is how would I go back to page one after generating the entire document and add an element to page one? I will be adding a barcode (I know how to add barcodes, tables and such where I want them on the current page), but I don't know how to "go back" to page one to add an element.
Here is the full code, but you won't be able to compile it because of dependencies. Also, don't get caught up in the details of the full code, as this is a large project to create dynamically generated documents. https://pastebin.com/kABi7fzW
I can't attest as to the exactly calls to iTextSharp's as we approached our documentation quite differently; open a Word template, at the data as DataTables, etc., do a MailMerge, close and reopen and save as PDF. Sounds more involved but doesn't require the granular level of detail you're doing of creating the document paragraph by paragraph but it does allow the document generator to worry about content and not style placement (handled via Word, manually and external to the application).
From experience with iTextSharp, you'll have a lot of trouble trying to float an element on top of a section to insert the barcode. The document generation tool has an annoying tendency to not quite work in this scenario. We endured many weeks of back and forth with iTextSharp support and a version upgrade and still couldn't get it to behave properly in all scenarios.
As discussed in the comments and given how you've already written your code (I doubt you'll scrap all that code and start with a MailMerge unless you really, really have to), you'll need to insert a placeholder block that you can locate via iTextSharp's PdfBuilder api. I'd imagine that setting a bookmark location would likely be the easiest way.
If it's possible (preferable?) to have the barcode on a page of its own, then you already have the code needed to do this (circa line 324 in your pastebin link) with;
// create doc...
// reopen doc and get page count
doc.NewPage();
// add barcode with page count + 1
// save
I'm not sure that this is possible but I figured it would be worth asking. I have figured out how to set the font of a formfield using the pdfstamper and acrofields methods but I would really like to be able to set the font of different parts of the text in the same field. Here's how I'm setting the font of the form fields currently:
// Use iTextSharp PDF Reader, to get the fields and send to the
//Stamper to set the fields in the document
PdfReader pdfReader = new PdfReader(fileName);
// Initialize Stamper (ms is a MemoryStream object)
PdfStamper pdfStamper = new PdfStamper(pdfReader, ms);
// Get Reference to PDF Document Fields
AcroFields pdfFormFields = pdfStamper.AcroFields;
//create a bold font
iTextSharp.text.Font bold = FontFactory.GetFont(FontFactory.COURIER, 8f, iTextSharp.text.Font.BOLD);
//set the field to bold
pdfFormFields.SetFieldProperty(nameOfField, "textfont", bold.BaseFont, null);
//set the text of the form field
pdfFormFields.SetField(nameOfField, "This: Will Be Displayed In The Field");
// Set the flattening flag to false, so the document can continue to be edited
pdfStamper.FormFlattening = true;
// close the pdf stamper
pdfStamper.Close();
What I'd like to be able to do where I set the text above is set the "This: " to bold and leave the "Will Be Displayed In The Field" non-bolded. I'm not sure this is actually possible but I figured it was worth asking because it would really be helpful in what I'm currently working on.
Thanks in advance!
Yes, kinda. PDF fields can have a rich text value (since acrobat 6/pdf1.5) along with a regular value.
The regular value uses the font defined in the default appearances... a single font.
The rich value format (which iText doesn't support directly, at least not yet), is described in chapter 12.7.3.4 of the PDF Reference. <b>, <i>, <p>, and quite a few css2 text attributes. It requires a with various attributes.
To enable rich values, you have to set bit 26 of the field flags (PdfName.FF) for a text field. PdfFormField doesn't have a "setRichValue", but they're dictionaries, so you can just:
myPdfFormField.put(PdfName.RV, new PdfString( richTextValue ) );
If you're trying to add rich text to an existing field that doesn't already support it:
AcroFields fields = stamper.getAcroFields();
AcroFields.Item fldItem = fields.getFieldItem(fldName);
PdfDictionary mergedDict = item.getMerged(0);
int flagVal = mergedDict.getAsNumber(PdfName.FF).intValue();
flagVal |= (1 << 26);
int writeFlags = AcroFields.Item.WRITE_MERGED | AcroFields.Item.WRITE_VALUE;
fldItem.writeToAll(PdfName.FF, new PdfNumber(flagVal), writeFlags);
fldItem.writeToAll(PdfName.RV, new PdfString(richTextValue), writeFlags);
I'm actually adding rich text support to iText (not sharp) as I type this message. Hurray for contributors on SO. Paulo's been good about keeping iTextSharp in synch lately, so that shouldn't be an issue. The next trunk release should have this feature... so you'd be able to write:
myPdfFormField.setFieldFlags( PdfFormField.FF_RICHTEXT );
myPdfFormField.setRichValue( richTextValue );
or
// note that this will fail unless the rich flag is set
acroFields.setFieldRichValue( richTextValue );
NOTE: iText's appearance generation hasn't been updated, just the value side of things. That would take considerably more work. So you'll want to acroFields.setGenerateAppearances(false) or have JS that resets the field value when the form its opened to force Acrobat/Reader to build the appearance[s] for you.
It took me some time to figure out after richtextfield did not work the way it was suppose too with
acrofields and most of the cases were the pdf was not editable with different fonts at runtime.
I worked out way of setting different fonts in acrofields and passing values and editing at runtime with itextsharp and I thought it will be useful for others.
Create pdf with a text field in PDF1.pdf (I hope you know how to create field in pdf)
e.g., txtComments
Go to the property section and Set the property to richtext,Mulitiline
Format the text content in word or pdf by adding the fonts and colors.
If done in word, copy and paste the content in pdf - txtcomments field.
Note:
If you want to add dynamic content. Set the parameter “{0}” to the txtComment field in the pdf.
using string format method you can set values to it.This is shown in the code below.
e.g., "Set different parts of a form field to have different fonts using {0}"
Add the following code in a button (this is not specific) event by reference in the itextsharp.dll 5.4.2
Response.ContentType = "application/pdf";
Response.AddHeader("Content-disposition","attachment; filename=your.pdf");
PdfReader reader = new PdfReader(#"C:\test\pdf1.pdf");
PdfStamper stamp = new PdfStamper(reader, Response.OutputStream);
AcroFields field = pdfStamp.AcroFields;
string comments = field .GetFieldRichValue("txtcomments");
string Name = "Test1";
string value = string.Format(comments,Name);
field.SetField("txtComment", value );
field.GenerateAppearances = false;//Pdf knows what to do;
stamp.FormFlattening = false;//available for edit at run time
stamp.FreeTextFlattening = true;
stamp.Close();
reader.Close()
Hope this helps.