How to fill forms like this using iText for .NET - c#

I am trying to fill in the name and address, one character per box, using:
cb.SetTextMatrix(x, y); // x and y positions
cb.ShowText("data");
but the text does not end up in the boxes.
[Screenshot: the problem]

The code you are using isn't entirely incorrect, but it has several flaws. For starters: you don't know the value of the x and y parameters, and that's kind of crucial if you want the text to be in the correct position.
Also: you are writing PDF syntax directly into the content stream. In your snippet, you forgot to create the text object (with cb.BeginText() and cb.EndText()). If you are new at PDF, you shouldn't try writing PDF syntax directly into the content stream unless you have a solid understanding of ISO-32000-1. Have you ever read ISO-32000-1? If not, then why are you using low-level operations? That doesn't make much sense, does it? There are helper classes such as ColumnText to add content at absolute positions.
Looking at the screen shot you shared, I see that some fields require "Comb" functionality. This functionality makes sure that each small box contains exactly one glyph (if you don't know what a glyph is, think of it as the visual representation of a character).
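To make the comb idea concrete, here is a minimal sketch of the arithmetic behind it: the field is split into equally wide cells and each glyph is centred in its own cell. CombLayout and glyphCentres are hypothetical names used for illustration only, not part of iText, and the sketch assumes one char per glyph.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: where each glyph lands in a comb field (hypothetical helper, not an iText class). */
final class CombLayout {

    /**
     * Returns the horizontal centre of each comb cell that receives a glyph.
     *
     * @param left   x coordinate of the field's left edge, in points
     * @param width  total width of the field, in points
     * @param maxLen number of boxes the field is divided into
     * @param value  text to place; characters beyond maxLen are dropped
     */
    static List<Double> glyphCentres(double left, double width, int maxLen, String value) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        double cellWidth = width / maxLen;
        // Assumption: one char corresponds to one glyph (no ligatures or surrogate pairs).
        int count = Math.min(value.length(), maxLen);
        List<Double> centres = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            centres.add(left + cellWidth * (i + 0.5));
        }
        return centres;
    }
}
```

With a real Comb field the viewer or the library does this positioning for you, which is why defining proper fields is easier than computing x and y by hand.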
If you want to make it easy on yourself, you should test if the form is interactive first. Answer this question:
Does the form contain AcroFields?
If the answer is "Yes", fill out the form using the AcroFields object. You can find out which field names to use by following the instructions in the answer to this question: How do I enumerate all the fields in a PDF file in ITextSharp ?
If the answer is "No", open the file in Adobe Acrobat, and manually add fields. Define the fields as Comb fields so that each box contains exactly one glyph. To get a nice-looking result, select a monospaced font such as Courier (using a proportional font will probably give you an uglier result). This operation adds AcroForm fields.
Once you have an interactive form with AcroFields (assuming you have defined them correctly), filling out the form is as easy as this in iText 5:
PdfReader reader = new PdfReader(template);
PdfStamper stamper = new PdfStamper(reader,
new FileStream(newFile, FileMode.Create));
AcroFields form = stamper.AcroFields;
form.SetField(key1, value1);
form.SetField(key2, value2);
form.SetField(key3, value3);
...
stamper.Close();
See How to create and fill out a PDF form
However, since you are new at all of this, I recommend that you use iText 7 as described in the jump-start tutorial:
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue(key1, out toSet);
toSet.SetValue(value1);
fields.TryGetValue(key2, out toSet);
toSet.SetValue(value2);
...
pdf.Close();
If you want to remove the interactivity after filling out the form, you need to flatten the fields. This is also documented on the official web site.

Related

PDF Table Structure

I have a PDF file with tabular structure but I am not able to store it in database as the PDF file is in Mangal font.
So two problems occur to me:
Extract table data from PDF
Text is in Marathi language
I have managed to do this for English with the following code:
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, i+1, strategy);
text.Append(currentText);
string rawPdfContent = Encoding.UTF8.GetString(Encoding.Convert(Encoding.UTF8, Encoding.UTF8, pdfReader.GetPageContent(i + 1)));
This gives me the tabular structure, but only for English text; I want to know how to do the same for Marathi.
Funnily enough, requirement no. 1 is actually the hardest.
In order to understand why, you need to understand PDF a bit.
PDF is not a WYSIWYG format. If you open a PDF file in notepad (or notepad++), you'll see that it doesn't seem to contain any human-readable information.
In fact, PDF contains instructions that tell a viewer program (like Adobe) how to render the PDF.
So instead of having an actual table in there (like you might expect in an HTML document), it will contain stuff like:
draw a line from .. to ..
go to position ..
draw the characters '123'
set the font to Helvetica bold
go to position ..
draw a line from .. to ..
draw the characters '456'
etc
See also How does TextRenderInfo work in iTextSharp?
In order to extract the table from the PDF, you need to do several things.
implement IEventListener (this is a class that you can attach to a Parser instance, a Parser will go over the entire page, and notify all listeners of things like TextRenderInfo, ImageRenderInfo and PathRenderInfo events)
watch out for PathRenderInfo events
build a datastructure that tracks which paths are being drawn
as soon as you detect a cluster of lines that is at roughly 90° angles, you can assume a table is being drawn
determine the biggest bounding box that fits the cluster of lines (this is closely related to the convex hull problem, which can be solved with, for example, the gift wrapping algorithm)
now you have a rectangle that tells you where (on the page) the table is located.
you can now recursively apply the same logic within the table to determine rows and columns
you can also keep track of TextRenderInfo events, and sort them into bins depending on the rectangles that fit each individual cell of the table
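A much-simplified sketch of the last few steps, assuming the line segments have already been collected from the path events: it keeps only horizontal and vertical rules, derives the table's bounding box, and finds the cell a text position falls into. TableGrid is a hypothetical class written for illustration; it uses no iText types, does not merge nearly coincident rules, and assumes every rule spans the whole table (no merged cells).

```java
import java.util.List;
import java.util.TreeSet;

/** Sketch only: turns axis-aligned rules into a grid of cells (hypothetical, not an iText class). */
final class TableGrid {
    private final double[] columnEdges; // sorted x positions of vertical rules
    private final double[] rowEdges;    // sorted y positions of horizontal rules

    /** Each segment is {x1, y1, x2, y2}; slanted segments are ignored. */
    TableGrid(List<double[]> segments, double tolerance) {
        TreeSet<Double> xs = new TreeSet<>();
        TreeSet<Double> ys = new TreeSet<>();
        for (double[] s : segments) {
            if (Math.abs(s[0] - s[2]) <= tolerance) {
                xs.add(s[0]); // vertical rule: x stays (almost) constant
            } else if (Math.abs(s[1] - s[3]) <= tolerance) {
                ys.add(s[1]); // horizontal rule: y stays (almost) constant
            }
        }
        columnEdges = xs.stream().mapToDouble(Double::doubleValue).toArray();
        rowEdges = ys.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Returns {minX, minY, maxX, maxY}, or null when the rules do not form a grid. */
    double[] boundingBox() {
        if (columnEdges.length < 2 || rowEdges.length < 2) {
            return null;
        }
        return new double[] {columnEdges[0], rowEdges[0],
                columnEdges[columnEdges.length - 1], rowEdges[rowEdges.length - 1]};
    }

    /** Returns {row, column} for a point, or null if it is outside the grid. Row 0 is the bottom row. */
    int[] cellAt(double x, double y) {
        int col = intervalIndex(columnEdges, x);
        int row = intervalIndex(rowEdges, y);
        return (col < 0 || row < 0) ? null : new int[] {row, col};
    }

    private static int intervalIndex(double[] edges, double v) {
        for (int i = 0; i + 1 < edges.length; i++) {
            if (v >= edges[i] && v < edges[i + 1]) {
                return i;
            }
        }
        return -1;
    }
}
```

Each text chunk's position can then be passed to cellAt to sort the text into bins per cell. Real documents need far more care (rules drawn as thin filled rectangles, merged cells, tables without rules), which is where the effort goes.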
This is a lot of work. None of this is trivial. In fact, this is the kind of stuff people write PhD theses about.
iText has a good implementation of most of these algorithms in the form of the pdf2Data tool.
Code:
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, i+1, strategy);
string rawPdfContent = Encoding.UTF8.GetString(Encoding.Convert(Encoding.UTF8, Encoding.UTF8, pdfReader.GetPageContent(i + 1)));
Then I identified the lines (horizontal and vertical) in the PDF. Lines are drawn in the content stream with either the re operator (rectangle) or the m and l operators (move-to and line-to).
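A rough sketch of that line-detection step, assuming the page content is available as decoded text: it walks the tokens, keeps numeric operands on a stack, and emits a segment for every l and four edges for every re. PathOperatorScanner is a hypothetical helper written for illustration; it ignores the transformation matrix (cm), strings, arrays and inline images, so it only works on very simple content streams.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch only: collects straight segments drawn with m/l and re (hypothetical helper). */
final class PathOperatorScanner {

    /** Each result is {x1, y1, x2, y2} in untransformed user space. */
    static List<double[]> segments(String content) {
        List<double[]> out = new ArrayList<>();
        Deque<Double> operands = new ArrayDeque<>();
        double curX = 0, curY = 0;
        for (String token : content.trim().split("\\s+")) {
            Double number = tryParse(token);
            if (number != null) {
                operands.addLast(number); // operands precede their operator
                continue;
            }
            switch (token) {
                case "m": // move-to: start a new subpath
                    if (operands.size() >= 2) {
                        curY = operands.removeLast();
                        curX = operands.removeLast();
                    }
                    break;
                case "l": // line-to: segment from the current point
                    if (operands.size() >= 2) {
                        double y = operands.removeLast(), x = operands.removeLast();
                        out.add(new double[] {curX, curY, x, y});
                        curX = x;
                        curY = y;
                    }
                    break;
                case "re": // rectangle: x y width height
                    if (operands.size() >= 4) {
                        double h = operands.removeLast(), w = operands.removeLast();
                        double y = operands.removeLast(), x = operands.removeLast();
                        out.add(new double[] {x, y, x + w, y});
                        out.add(new double[] {x + w, y, x + w, y + h});
                        out.add(new double[] {x + w, y + h, x, y + h});
                        out.add(new double[] {x, y + h, x, y});
                        curX = x;
                        curY = y;
                    }
                    break;
                default: // any other operator is skipped
                    break;
            }
            operands.clear(); // every operator consumes its operands
        }
        return out;
    }

    private static Double tryParse(String token) {
        try {
            return Double.parseDouble(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Listening for path events through the library's parser is more robust than scanning the raw stream, because the parser already applies the current transformation matrix.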
Then I worked on the Marathi text that I got from iTextSharp.
Then I combined the two: for the desired location, I extract the text using this code:
Int64 width = Convert.ToInt64(linesVertical[5].StartPoint.X) - Convert.ToInt64(linesVertical[2].StartPoint.X);
Int64 height = Convert.ToInt64(linesVertical[2].EndPoint.Y) - (Convert.ToInt64(linesVertical[2].StartPoint.Y));
System.util.RectangleJ rect = new System.util.RectangleJ(Convert.ToInt64(linesVertical[2].StartPoint.X), (800 - Convert.ToInt64(linesVertical[2].EndPoint.Y) + 150), width, height);
RenderFilter[] renderFilter = new RenderFilter[1];
renderFilter[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
Owner_Name = PdfTextExtractor.GetTextFromPage(reader, 1, textExtractionStrategy);
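The hard-coded 800 and 150 above are specific to that document. If their purpose is to flip between a top-left origin and PDF's bottom-left origin (an assumption; the answer does not say), the conversion can be derived from the page height instead. PageCoordinates and toPdfRect are hypothetical names used for illustration.

```java
/** Sketch only: converts a top-left-origin rectangle to PDF user space (hypothetical helper). */
final class PageCoordinates {

    /**
     * @param left       distance from the left page edge
     * @param top        distance from the top page edge to the rectangle's top
     * @param width      rectangle width
     * @param height     rectangle height
     * @param pageHeight height of the page, in the same units
     * @return {llx, lly, urx, ury} with the origin at the bottom-left corner
     */
    static double[] toPdfRect(double left, double top, double width, double height, double pageHeight) {
        double ury = pageHeight - top; // the y axis points upward in PDF
        double lly = ury - height;
        return new double[] {left, lly, left + width, ury};
    }
}
```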

iTextSharp Mixed Alignments in Same Object

I want to have a function that returns one object, with this object containing two paragraph with different alignments. This is easy to do manually by making them separate paragraphs and adding them to the pdf one at a time, but I would like my function to return it as a whole object to be added to a pdf. Is this possible? As an example of what I want:
someTextHere
someMoreTextHere
But as one object which I can then add to a pdf.
I have created a small standalone iText 7 example that puts a left-aligned and a right-aligned paragraph inside a single gray box.
The PDF file was created like this:
public void createPdf(String dest) throws IOException {
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
Div div = new Div()
.add(new Paragraph("Left").setTextAlignment(TextAlignment.LEFT))
.add(new Paragraph("Right").setTextAlignment(TextAlignment.RIGHT))
.setBackgroundColor(ColorConstants.GRAY)
.setWidth(200);
document.add(div);
document.close();
}
As you can see, I created a Div element (similar to a <div> tag in HTML) to which I added two Paragraph objects with a different text alignment. That seems to be exactly what you need.
I am not a C# developer, hence I provide the code in Java. However, if you're proficient in C#, you shouldn't have any problem porting it from Java to C# (it's just a matter of changing lowercases into uppercases, such as changing add() into Add()).
Note that this is iText 7 code; if you're still using iText 5, you should consider upgrading to the latest iText version, since iText 5 went into maintenance mode a while ago. Maintenance mode means that development on that version has stopped; it's no longer supported for users who aren't customers.

WPF to XPS with outline and structure

I am developing a WPF reporting application.
My report is constructed as a WPF control (FlowDocument or FixedDocument) and contains tables. I want to save it as XPS while preserving its structure (this means that I can copy a table as a table, not as plain text, as explained in this article). I found a way to save a WPF control with XpsDocumentWriter or XpsSerializationManager, but the result has no structure or outline. Is it possible to save a WPF control as XPS while preserving its structure?
XPS is a fixed document format, and WPF allows you to save a FlowDocument as a FixedDocument in an XPS file. Extra code is needed if you want more features; the following article goes further into that:
Convert XAML Flow Document to XPS with Style (multiple page, page size, header, margin)
It seems that there is no way to preserve WPF element semantics when serializing it with XpsDocumentWriter or XpsSerializationManager.
The only way to construct a document with structure is to use the low-level API from the System.Windows.Xps.Packaging namespace, as described in this article. Using this API you can obtain an XmlWriter for constructing the FixedPage content:
XpsDocument document = new XpsDocument(destFileName,FileAccess.ReadWrite);
IXpsFixedDocumentSequenceWriter docSeqWriter = document.AddFixedDocumentSequence();
IXpsFixedDocumentWriter docWriter = docSeqWriter.AddFixedDocument();
IXpsFixedPageWriter pageWriter = docWriter.AddFixedPage();
XmlWriter xmlWriter = pageWriter.XmlWriter;
and a Stream for writing document structure
XpsResource storyFragments = pageWriter.AddStoryFragment();
Stream stream = storyFragments.GetStream();
Although there are classes in the System.Windows.Documents.DocumentStructures namespace representing the StoryFragments element and its children, you cannot use them while writing to the resource stream.

OpenXML: Any way to see if a Word document fits on one page

While I doubt it, if I open up a word document using OpenXML sdk in C# and add some info, is there any way for me to see if it still fits one page?
If it doesn't, I want to reduce the font size on specific items I added until it fits.
I could write this algorithm if I had the current size in relation to page size with margins and all that.
I ran across this example on another site, don't know if it'll work in your case, as it requires the Office PIA...
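// using Word = Microsoft.Office.Interop.Word;
var app = new Word.Application();
var doc = app.Documents.Open("path/to/file");
doc.Repaginate();
// Assumes the interop types are embedded, so the property collection is dynamic.
int pageCount = (int)doc.BuiltInDocumentProperties["Number of Pages"].Value;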
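Once a page count is available, the shrinking loop the question describes could look roughly like the sketch below. FitToOnePage and pageCountAt are hypothetical names; pageCountAt stands in for whatever applies a font size to the added items, repaginates, and reports the number of pages.

```java
import java.util.function.DoubleToIntFunction;

/** Sketch only: lowers a font size step by step until the document fits on one page. */
final class FitToOnePage {

    /**
     * @param startSize   font size to try first
     * @param minSize     smallest acceptable font size
     * @param step        amount to subtract after each failed attempt
     * @param pageCountAt hypothetical callback: applies the size and returns the page count
     * @return the first size that fits on one page, or -1 if no size down to minSize fits
     */
    static double shrinkUntilFits(double startSize, double minSize, double step,
                                  DoubleToIntFunction pageCountAt) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        for (double size = startSize; size >= minSize; size -= step) {
            if (pageCountAt.applyAsInt(size) <= 1) {
                return size;
            }
        }
        return -1;
    }
}
```

Each call to the callback means a full repagination, so a coarse step (or a binary search over the size range) keeps the number of round trips down.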

Copy Form Fields From One PDF to Another

I have a situation where I need to copy all of the form fields from one PDF to another. The purpose is to automate the overlaying of the fields when small edits are made to the underlying Word pages.
I've been using the trial version of Aspose.Pdf.Kit, and I'm able to copy everything but radio buttons to a new form. However, Aspose doesn't support copying the radio buttons, which completely nullifies its usefulness, not to mention that their customer support has been subpar.
In any event, I'm looking for some sort of library or plug-in that does support copying all types of form fields.
Does anyone have any ideas?
Thanks,
~DJ
Yes, it is possible. No, setField() won't do the trick... madisonw's code will copy the field values, but not the fields themselves.
OTOH, it really isn't that hard.
Something like:
PdfReader currentReader = new PdfReader( CURRENT_PDF_PATH ); // throws
PdfReader pdfFromWord = new PdfReader( TWEAKED_PDF_FROM_WORD_PATH ); // throws
PdfStamper stamper = new PdfStamper( currentReader , outputFile ); //throws
for( int i = 1; i <= pdfFromWord.getNumberOfPages(); ++i) {
stamper.replacePage( pdfFromWord, i, i );
}
stamper.close(); // throws
I'm ignoring a bunch of exceptions, and am writing in Java, but C# should look virtually identical.
Also, this code ignores the case where someone ADDS A PAGE... which would get quite thorny. Was it added before or after the pages with fields on them? Did those pages reflow at all, requiring you to move the fields? At that point you really need a manual process with Acrobat Pro.
I agree with Oded, iTextSharp should be able to do the job. I've used code similar to the following snippet and never had problems with any field types. I'm sure there must have been a radio button in the mix.
private void CopyFields(PdfStamper targetFile, PdfReader sourceFile)
{
    // For every field in the target, copy the value of the same-named field in the source.
    foreach (DictionaryEntry de in targetFile.AcroFields.Fields)
    {
        string fieldName = de.Key.ToString();
        targetFile.AcroFields.SetField(fieldName, sourceFile.AcroFields.GetField(fieldName));
    }
}
