I am generating a pdf using iTextSharp. If certain properties are true then I also want to insert an existing pdf with static content.
private byte[] GeneratePdf(DraftOrder draftOrder)
{
    // create a pdf document
    var document = new Document();
    // set the page size, set the orientation
    document.SetPageSize(PageSize.A4);
    // create a writer instance
    var pdfWriter = PdfWriter.GetInstance(document, new FileStream(file, FileMode.Create));
    document.Open();
    // declared outside the if block so it is still in scope for FormatHtml
    var textToBeAdded = "";
    if (draftOrder.hasProperty)
    {
        // add these things to the pdf
        textToBeAdded = "<table><tr>....</table>";
    }
    FormatHtml(document, textToBeAdded, css);
    if (someOtherProperty)
    {
        // add static pdf from file
        document.NewPage();
        var reader = new PdfReader("myPath/existing.pdf");
        PdfImportedPage page;
        // iText page numbers are 1-based
        for (var i = 1; i <= reader.NumberOfPages; i++)
        {
            // It's this bit I don't really understand
            // **how can I add the page read to the document being created?**
        }
    }
    // ...
}
I can load the pdf from the source but when I iterate over the pages I can't seem to be able to add them to the document I am creating.
Cheers
Please read http://manning.com/lowagie2/samplechapter6.pdf
If you don't mind losing all interactivity, you can get the imported page as a template from the writer object with the GetImportedPage() method and add it to the document with AddTemplate().
This question has been answered many times on StackOverflow, and you'll notice that I always warn about some dangers: you need to realize that the dimensions of the imported page can be different from the page size you initially defined. Because of this, invisible parts of the imported page can become visible, and visible parts can become invisible.
I'd prefer adding the extra page in a second go using PdfCopy, but maybe that's just me.
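To make the GetImportedPage() / AddTemplate() route concrete, here is a minimal sketch against iTextSharp 5. The class name, method name and parameters are hypothetical; it assumes the document has already been opened, and it ignores page rotation and crop boxes with a non-zero origin, which is exactly where the dangers mentioned above show up.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImportPagesSketch
{
    // Appends every page of an existing PDF to a document that is being created.
    // Hypothetical helper: 'document' must already be open and 'writer' must be
    // the PdfWriter that was created for it.
    public static PdfReader AppendPages(Document document, PdfWriter writer, string existingPdfPath)
    {
        var reader = new PdfReader(existingPdfPath);
        PdfContentByte canvas = writer.DirectContent;

        // iText page numbers start at 1, not 0
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // match the new page to the size of the imported page (rotation is ignored here)
            document.SetPageSize(reader.GetPageSize(i));
            document.NewPage();

            // the imported page is a template: all interactivity (links, fields) is lost
            PdfImportedPage page = writer.GetImportedPage(reader, i);
            canvas.AddTemplate(page, 0, 0);
        }

        // returned so the caller can close it AFTER document.Close()
        return reader;
    }
}
```

The reader is handed back rather than closed inside the helper because the imported pages may only be written out when the document is closed.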
I need to read the text of a PDF and do some processing after extracting it. I'm using iTextSharp to read the PDF. The problem is that PdfTextExtractor.GetTextFromPage doesn't give me all the contents of the page. For example:
In the above PDF I am unable to read the text highlighted in blue. I can read the rest of the characters. Below is the code that does the extraction:
`string filePath = "myFile path";
PdfReader pdfReader = new PdfReader(filePath);
for (int page = 1; page<=1; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
}`
Any suggestions here?
I have gone through lots of questions and solutions on SO, but none is specific to this problem.
The reason for text extraction not extracting those texts is pretty simple: Those texts are not part of the static page content but form fields! But "Text extraction" in iText (and other PDF libraries I know, too) is considered to mean "extraction of the text of the static page content". Thus, those texts you miss simply are not subject to text extraction.
If you want to make form field values subject to your text extraction code, too, you first have to flatten the form field visualizations. "Flattening" here means making them part of the static page content and dropping all their form field dynamics.
You can do that by adding code after the line in which the PDF is read,
PdfReader pdfReader = new PdfReader(filePath);
that flattens the PDF and loads the flattened PDF into pdfReader, e.g. like this:
MemoryStream memoryStream = new MemoryStream();
PdfStamper pdfStamper = new PdfStamper(pdfReader, memoryStream);
pdfStamper.FormFlattening = true;
pdfStamper.Writer.CloseStream = false;
pdfStamper.Close();
memoryStream.Position = 0;
pdfReader = new PdfReader(memoryStream);
Extracting the text from this re-initialized pdfReader will give you the text from the form fields, too.
Unfortunately, the flattened form text is added at the end of the content stream. As your chosen text extraction strategy SimpleTextExtractionStrategy simply returns the text in the order it is drawn, the former form fields contents all are extracted at the end.
You can change this by using a different text extraction strategy, i.e. by replacing this line:
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
Using the LocationTextExtractionStrategy (which is part of the iText distribution) already returns a better result; unfortunately the form field values are not exactly on the same base line as the static contents we perceive to be on the same line, so there are some unexpected line breaks.
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
Using the HorizontalTextExtractionStrategy (from this answer which contains both a Java and a C# version thereof) the result is even better. Beware, though, this strategy is not universally better, read the warnings in the answer text.
ITextExtractionStrategy strategy = new HorizontalTextExtractionStrategy();
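Putting the two steps of this answer together (flatten first, then extract with a position-aware strategy), a minimal sketch could look like the following. The class and method names are hypothetical, and LocationTextExtractionStrategy is used only because it ships with iText; swap in another strategy if it suits your documents better.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class FlattenedTextSketch
{
    // Returns the text of all pages, including former form field values.
    public static string ExtractTextIncludingFields(string filePath)
    {
        byte[] flattenedPdf;
        PdfReader pdfReader = new PdfReader(filePath);
        using (MemoryStream memoryStream = new MemoryStream())
        {
            // flatten: field appearances become part of the static page content
            PdfStamper pdfStamper = new PdfStamper(pdfReader, memoryStream);
            pdfStamper.FormFlattening = true;
            pdfStamper.Close();
            flattenedPdf = memoryStream.ToArray();
        }
        pdfReader.Close();

        // re-read the flattened PDF and extract page by page
        PdfReader flattenedReader = new PdfReader(flattenedPdf);
        StringBuilder text = new StringBuilder();
        for (int page = 1; page <= flattenedReader.NumberOfPages; page++)
        {
            // a fresh strategy per page, so text from earlier pages is not carried over
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            text.AppendLine(PdfTextExtractor.GetTextFromPage(flattenedReader, page, strategy));
        }
        flattenedReader.Close();
        return text.ToString();
    }
}
```

Copying the bytes out with ToArray() avoids having to keep the stream open and rewind it, at the cost of one extra copy in memory.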
I'm creating PDFs using iText 7. It allows me to add paragraphs to the first page, but I'm not sure how I can add content to the second page. If I create a Canvas after calling AddNewPage(), it works fine, but it doesn't work when I use a paragraph and add it to the document. Thanks for your help. In my example, firstPageText and secondPageText are both displayed on the first page:
protected void CreatePdf(string filePath, string firstPageText, string secondPageText)
{
PdfWriter writer = new PdfWriter(filePath);
PdfDocument pdfDocument = new PdfDocument(writer);
Document doc = new Document(pdfDocument);
doc.Add(new Paragraph(firstPageText));
pdfDocument.AddNewPage();
doc.Add(new Paragraph(secondPageText));
doc.Close();
}
This is explained in chapter 2 of the iText 7: Building Blocks. Allow me to copy a snippet of that tutorial:
If we had used an AreaBreak of type NEXT_PAGE, a new page would have been started; see figure 2.11. In the JekyllHydeV5 example, we changed a single line:
AreaBreak nextPage = new AreaBreak(AreaBreakType.NEXT_PAGE);
Instead of skipping to the next column, iText now skips to the next page.
By default, the newly created page will have the same page size as the current page. If you want iText to create a page of another size, you can use the constructor that accepts a PageSize object as a parameter. For instance: new AreaBreak(PageSize.A3).
There's also an AreaBreak of type LAST_PAGE. This AreaBreakType is to be used when switching between different renderers.
It surprises me that you'd do this:
doc.Add(new Paragraph(firstPageText));
pdfDocument.AddNewPage();
doc.Add(new Paragraph(secondPageText));
While the documented way is to do it like this:
doc.Add(new Paragraph(firstPageText));
doc.Add(new AreaBreak(AreaBreakType.NEXT_PAGE));
doc.Add(new Paragraph(secondPageText));
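For completeness, here is the method from the question rewritten with the AreaBreak approach, as a minimal sketch against iText 7 for .NET. Only the using directives and the wrapping class name (hypothetical) are added; the rest follows the question's code.

```csharp
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Properties;

public static class TwoPageSketch
{
    public static void CreatePdf(string filePath, string firstPageText, string secondPageText)
    {
        PdfWriter writer = new PdfWriter(filePath);
        PdfDocument pdfDocument = new PdfDocument(writer);
        Document doc = new Document(pdfDocument);

        doc.Add(new Paragraph(firstPageText));
        // tell the layout engine to continue on a new page, instead of
        // adding a page to the PdfDocument behind the layout engine's back
        doc.Add(new AreaBreak(AreaBreakType.NEXT_PAGE));
        doc.Add(new Paragraph(secondPageText));

        doc.Close();
    }
}
```

The design point is that Document keeps track of where the next element goes; pdfDocument.AddNewPage() adds a page but does not move that position, whereas an AreaBreak does.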
Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE], so whatever contents before this fields need to be extracted, the [STOP_HERE] field could be located on a different page for each document, so using page numbers wouldn't help here.
I searched online, and all I can find is a way to copy only the form fields from a document, not all document elements, including images and text, with their exact location and style.
Can IText do the job here?
EDIT: More details
[STOP_HERE] is an AcroForms text field which has been placed in a document by the PDF design person to indicate that everything before this element should be copied as is into a different document. The field itself is not important, I don't want to fill or do anything with it, it's just used as a signal to let the document parser stop there and copy all previous (upper) contents, I just don't know how to read all contents (without changing style, contents, etc) before this field.
Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE]
Unfortunately the OP didn't tell whether the page containing the form field [STOP_HERE] is to be included or not. As that is a mere +/-1 matter, though, I simply assumed the page is to be included.
Thus, the task can be implemented like this:
PdfReader reader = new PdfReader(srcFile);
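// GetFieldItem returns null for a missing field; indexing the generic Fields dictionary would throw instead
AcroFields.Item field = reader.AcroFields.GetFieldItem("[STOP_HERE]");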
if (field != null)
{
int firstPage = reader.NumberOfPages + 1;
for (int index = 0; index < field.Size; index++)
{
int page = field.GetPage(index);
if (page > 0 && page < firstPage)
firstPage = page;
}
if (firstPage <= reader.NumberOfPages)
{
reader.SelectPages("1-" + firstPage);
PdfStamper stamper = new PdfStamper(reader, new FileStream(dstFile, FileMode.Create, FileAccess.Write));
stamper.Close();
}
}
reader.Close();
The code opens the source file in a PdfReader and first looks for the field. If it exists, it iterates over all appearances of that field and determines the earliest page with an appearance of the field. If there is such a page, the code restricts the reader to the pages up to that page and stores this restriction using a PdfStamper.
I am trying to fill up a form with ITextsharp, and trying out the following code to get all the fields in the pdf:
string pdfTemplate = @"c:\Temp\questionnaire.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
StringBuilder sb = new StringBuilder();
foreach (var de in pdfReader.AcroFields.Fields)
{
sb.Append(de.Key.ToString() + Environment.NewLine);
}
But the foreach loop never runs, because the Fields collection is always empty. Do I need to do something to the file itself? I have tried the example from here and it works fine... this is an example of the PDF I am trying to fill
any ideas?
Edit ::
As it turned out, the PDF "form" to fill in actually wasn't a form (in PDF terms) at all. Thus, you have two choices:
You add the text to the page contents directly using hardcoded or configured "field" positions and dimensions, as described by @tschmit007 in comments to his answer.
You add actual PDF form fields to your PDF to generate a true PDF form which you take as template to fill in later.
You can add actual form fields either using some graphical tool allowing that, e.g. Adobe Acrobat, or you can use iText(Sharp). Have a look at chapter 8 of iText in Action — 2nd Edition and the samples available here for Java and here for .Net.
Those samples mostly add form fields to newly generated PDF documents. You can virtually use the same code, though, for adding form fields to a PdfStamper which exposes its inner PdfWriter using stamper.getWriter() in Java and the stamper.Writer in C#. Instead of writer.addAnnotation(field) you have to use stamper.addAnnotation(field, page), though.
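As an illustration of the second choice, a minimal sketch that adds one text field to an existing PDF through a PdfStamper could look like this. The class name, the field name "name", the rectangle coordinates and the page number are all hypothetical placeholders; you would have to pick positions that match your own questionnaire.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class AddFieldSketch
{
    // Writes a copy of srcFile to dstFile with one additional text field on page 1.
    public static void AddTextField(string srcFile, string dstFile)
    {
        PdfReader reader = new PdfReader(srcFile);
        using (FileStream output = new FileStream(dstFile, FileMode.Create, FileAccess.Write))
        {
            PdfStamper stamper = new PdfStamper(reader, output);

            // hypothetical position: lower-left (36, 700), upper-right (300, 720), in points
            Rectangle box = new Rectangle(36, 700, 300, 720);
            TextField textField = new TextField(stamper.Writer, box, "name");

            // with a stamper, the annotation is added to a specific page
            PdfFormField field = textField.GetTextField();
            stamper.AddAnnotation(field, 1);

            stamper.Close();
        }
        reader.Close();
    }
}
```

The resulting file is a true PDF form, so the field should then show up in AcroFields.Fields and can be filled with SetField.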
try:
using (FileStream outFile = new FileStream("result.pdf", FileMode.Create)) {
PdfReader pdfReader = new PdfReader("file.pdf");
PdfStamper pdfStamper = new PdfStamper(pdfReader, outFile);
AcroFields fields = pdfStamper.AcroFields;
//rest of the code here
//fields.SetField("n°1", "value");
//...
pdfStamper.Close();
pdfReader.Close();
}
I've generated a pdf using iTextSharp and I can preview it very well in ASP.Net but I need to send it directly to printer without a preview. I want the user to click the print button and automatically the document prints.
I know that a page can be sent directly to printer using the javascript window.print() but I don't know how to make it for a PDF.
Edit: it is not embedded, I generate it like this;
...
FileStream stream = new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create);
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf, stream);
pdf.Open();
pdf.Add(new Paragraph(member.ToString()));
pdf.Close();
Response.Redirect("~1.pdf");
...
And here I am.
Finally I made it work, but I had to use an iframe. I defined an iframe in the aspx without setting its src property; in the cs file I generate the PDF file and set the src property of the iframe to the generated PDF file name, like this:
Document pdf = new Document(PageSize.LETTER);
PdfWriter writer = PdfWriter.GetInstance(pdf,
new FileStream(Request.PhysicalApplicationPath + "~1.pdf", FileMode.Create));
pdf.Open();
//This action leads directly to printer dialogue
PdfAction jAction = PdfAction.JavaScript("this.print(true);\r", writer);
writer.AddJavaScript(jAction);
pdf.Add(new Paragraph("My first PDF on line"));
pdf.Close();
//Open the pdf in the frame
frame1.Attributes["src"] = "~1.pdf";
And that did the trick. However, I think I should implement your solution, Stefan. The problem is that I'm new to ASP.NET and JavaScript, and without complete source code I could not implement your suggestion, but at least this is a first step. I was very surprised how much HTML and JavaScript I need to learn. Thanks.
Is the PDF embedded in the page with an embed tag, or just opened in a frame? How are you showing it?
If it's embedded, just make sure that the object is selected and then call print().
Get the ref to the embedded document.
var x = document.getElementById("mypdfembeddobject");
x.click();
x.setActive();
x.focus();
x.print();
It's a little trickier if you're using PDFsharp, but quite doable:
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
XGraphics gfx = XGraphics.FromPdfPage(page);
XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic);
// Draw the text
gfx.DrawString("Hello, World!", font, XBrushes.Black,
new XRect(0, 0, page.Width, page.Height),
XStringFormats.Center);
// real stuff starts here
// current version of pdfsharp doesn't support actions
// http://www.pdfsharp.net/wiki/WorkOnPdfObjects-sample.ashx
// so we got to get close to the metal see chapter 12.6.4 of
// http://partners.adobe.com/public/developer/pdf/index_reference.html
PdfDictionary dict = new PdfDictionary(document); //
dict.Elements["/S"] = new PdfName("/JavaScript"); //
dict.Elements["/JS"] = new PdfString("this.print(true);\r");
document.Internals.AddObject(dict);
document.Internals.Catalog.Elements["/OpenAction"] =
PdfInternals.GetReference(dict);
document.Save(Server.MapPath("2.pdf"));
frame1.Attributes["src"] = "2.pdf";
Also, try this gem:
<link rel="alternate" media="print" href="mypdf.pdf">
I haven't tested it, but from what I have read about it, it can be used this way to have mypdf.pdf printed instead of the page content, whatever method you are using to print the page.
Search for media="print" to check out more.
You can embed javascript in the pdf, so that the user gets a print dialog as soon as their browser loads the pdf.
I'm not sure about iTextSharp, but the javascript that I use is
var pp = this.getPrintParams();
pp.interactive = pp.constants.interactionLevel.automatic;
this.print(pp);
For iTextSharp, check out http://itextsharp.sourceforge.net/examples/Chap1106.cs
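For what it's worth, the script above can be embedded with the same PdfAction.JavaScript / AddJavaScript calls shown in the iframe answer. This is a minimal sketch, not a tested recipe: the class and method names are hypothetical, and whether the script actually runs depends on the PDF viewer, since many browser-integrated viewers ignore or restrict document JavaScript.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class AutoPrintSketch
{
    // Builds a one-paragraph PDF in memory that asks the viewer to print on open.
    public static byte[] CreateAutoPrintPdf(string text)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            Document pdf = new Document(PageSize.LETTER);
            PdfWriter writer = PdfWriter.GetInstance(pdf, stream);
            pdf.Open();

            // document-level JavaScript, executed by viewers that support it
            string script =
                "var pp = this.getPrintParams();\r" +
                "pp.interactive = pp.constants.interactionLevel.automatic;\r" +
                "this.print(pp);\r";
            writer.AddJavaScript(PdfAction.JavaScript(script, writer));

            pdf.Add(new Paragraph(text));
            pdf.Close();

            // ToArray still works after the document has closed the stream
            return stream.ToArray();
        }
    }
}
```

Returning the bytes instead of writing a file under the application path lets you stream the PDF to the response or save it wherever the iframe expects it.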