itext create pdf based on existing one with changed content - c#

I got quite complicated ready pdf file. It has got barcode and fancy looking table.
I have to create based on it application which will generate pdfs that will look the same but contain different records in the table and different barcode.
Is it possible to copy existing pdf and just change content of barcode and table ?
What would be the best approach to create the same looking pdf but with different content ?
Whank You very much for help

If the barcode and table are static I would open it in photoshop or illustrator delete everything I dont want, Then save it as a pdf again. Then follow this guide iText - add content to existing PDF file and use it as a template to put my custom content in.
If the table and bar code are dynamically generated (each one is different) and you need to crop out content on the fly I would pull some hacky crap and draw white squares over all the content I want gone. then proceed to use it as a template.
Just my 2 cents given the information provided.

Related

Making PDF file from C# application

How do I make a PDF with C# application?
I would like to make an application which creates a PDF document, but it could be Excel too.
I wish to make a file which contains tables and header.
You could use ITextSharp to create your PDF file.
And this post to help you with Excel.
Take a look at PDFFlow library for generation of PDF in C#. It has features that you need: tables, multi-page spread tables, repeating areas (headers, footers, left, right), table repeating headers... and even more unique features (like automatic page creation, multi-level list, word-like tabulation, barcodes/QR codes, etc...) and easy fluent syntax:
var DocumentBuilder.New()
.AddSection()
.AddParagraph("Hello World!").SetFontColor(Color.Red)
.ToSection()
.AddImageToSection("Smile.png")
.AddTable()
.AddColumnToTable("Column1").AddColumnToTable("Column2")
.AddRow().AddCellToRow("Row1, Cell1").AddCell("Row1, Cell 2")
.ToTable()
.AddRow().AddCellToRow("Row2, Cell1").AddCell("Row2, Cell 2")
.ToDocument()
.Build("Result.pdf");
This is first pages of my business document with multi-page spread table and repeating headers, which I created using this library:
There are examples with source code and explanation, I found them very useful: examples.
Hope, this will help.

Skip section of PDF when extracting text and Get image coordinates

Currently, I can extract all the text chunks with their location data from a PDF. The problem is that the PDF contains images with text annotations which I do not want including in the extraction.
However, for whatever reason whenever I search the PDF for images, it only finds 1 of the images and usually throws the exception: The colour space is not supported. It's as if it doesn't recognise them as images?
I am not wishing to extract the images, just locate where they start and end in relation to the PDF so I can exempt the text that is on top of the images.
For example:
Where the numbers on the graph are unwanted and need to be removed from the extracted text.
Im just not sure how to:
A) Locate all the images and store the coordinates of where it starts and ends
B) Ignore the text that is on top of the images in the PDF document
(I am using iTextSharp to try and achieve this, but so far I am not having much luck)
I'm not exactly sure how iTextSharp works but the PostScript language reference or the PDF Reference manuals may be a good place to start figuring out what you need to know.
I just cracked open a PDF file in a text editor to check out the format because I haven't seen it in a while and then realized what the problem might be.
PDFs support "Images", and "Stream Objects" which can contain image data. Stream objects actually declare enough information that you can know where they begin and end and write something to manually ignore them.
A Stream Object Header looks like this:
<</Intent/RelativeColorimetric/Subtype/Image/Length 19678/Filter/DCTDecode/Name/X/Metadata 4314 0 R/BitsPerComponent 8/ColorSpace 5247 0 R/Width 290/Height 372/Type/XObject>>stream
It's entirely possible that your particular PDF has only one "Image" and then the rest of it is "Streams".
I suggest cracking it open to take a look. It would also be beneficial if you included some sample code with on the library you're using.
I also found by opening a PDF in a text editor this string /Type /Page which seems to create new pages, so you there's a chance you could count those to determine which page you're currently on.
The header at the top of the document I'm reviewing is %PDF-1.2 and the latest version is 1.7, so there may be some disparity here because of that.
Any chance you can share the PDF file you're working with?

How to recognize tables inside a pdf file

I want to recognize tables inside a pdf files. What SDK is used in C# to recognize tables inside pdfs and some mechanism to read cell by cell, can any one please suggest?
PDF Sharp is good and its free. I've never done this in specific but it does correlate to all the major objects in the PDF format.
Tables do not exist inside a PDF as a structure unless it was created as Marked content with additional tagging in it. I wrote a blog post explaining some of the issues with text extraction from PDF files at http://www.jpedal.org/PDFblog/2009/04/pdf-text/

Consolidate the same image used multiple times in a PDF

I am generating PDF documents using DevExpress XtraReports.
I am using the same image over and over (in rows of status lights).
The PDF generated seems to duplicate the image definition for each image included. I would prefer if it included the image once and referenced it wherever it needed another copy - this would drastically reduce the size of my PDF docs.
Is there any way to achieve this using DevExpress or even post processed via a third party application. Any help is appreciated.
Two options:
OPT1: I suppose your image is a background or a company logo and the image is the same on all the pages of the pdf. If yes, then create the pdf without the image. Post-process the pdf and add the image on all the pages (you can do that using itext/itextsharp or pdflib).
OPT2: take your actual pdf and convert it using Ghoscript. Using Ghsoscript you can do a "pdf to pdf" conversion. During the conversion Ghostscript try to identify repeated images and removes them. The resulting file is smaller. (Ghostscript is not always able to do that... try with you pdf file).
It is possible to re-use the same image content in multiple locations throughout your document. But it's a fair bit easier to do this while adding the image(s) to the PDF.
I'm not sure if DevExpress supports this.

Filling out a PDF / form using Winform application

I have a 2-page PDF that I want to overlay information on (think of a form that someone manually fills out) using a C#/.NET Windows application. After this form is generated, it will need to be previewed and printed (exported into a graphic or PDF is nice, but not a requirement).
At first glance, I'm thinking of two ways to do this:
Use a PDF manipulator like iTextSharp, take a copy of the blank PDF form, and add text to the PDF. Then, launch Adobe Reader to do a print or preview.
Convert the PDF into a graphic, and put the graphic into a C# Report. Then, overlay text fields onto the report, and use a .NET ReportViewer control to preview and print the report.
The text does not need to be searchable or copyable or have any of the cool things that PDF gives me, so I'm leaning towards the second option. Am I missing anything, or is there something I'm not thinking of? Thanks in advance.
We use #1 extensively and it works perfectly. You shouldn't have any problems, it's rather easy (just need to make the fields writeable and use a FDF file and merge it with the PDF file). At least that's how we did it.

Categories

Resources