How to recognize tables inside a PDF file - C#

I want to recognize tables inside PDF files. Can anyone suggest an SDK for C# that recognizes tables inside PDFs and provides some mechanism to read them cell by cell?

PDFsharp is good, and it's free. I've never used it for this specific task, but it does map to all the major objects in the PDF format.

Tables do not exist inside a PDF as a structure unless the file was created with marked content and additional tagging (a tagged PDF). I wrote a blog post explaining some of the issues with text extraction from PDF files at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
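Because an untagged page usually holds only positioned text, one common workaround is to ask a text-extraction library for each text fragment together with its coordinates and rebuild the rows and cells yourself. Below is a minimal sketch of that idea, assuming you already have the fragments as (x, y, text). The `TextChunk` type and `GroupIntoRows` method are hypothetical illustrations, not part of PDFsharp or any other library, and the 2-point row tolerance is an arbitrary assumption you would tune per document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one piece of text plus the (x, y) position where it
// starts, as a PDF text-extraction library might report it.
// PDF user space puts y = 0 at the bottom of the page, so larger y is higher.
public sealed class TextChunk
{
    public TextChunk(double x, double y, string text)
    {
        X = x;
        Y = y;
        Text = text;
    }

    public double X { get; }
    public double Y { get; }
    public string Text { get; }
}

public static class TableReconstructor
{
    // Groups chunks into rows and returns each row's cell texts, left to right.
    // A chunk joins the current row when its y lies within rowTolerance
    // (in PDF points) of the y of the row's first chunk; otherwise it starts
    // a new row. Rows come out top to bottom.
    public static List<List<string>> GroupIntoRows(
        IEnumerable<TextChunk> chunks, double rowTolerance = 2.0)
    {
        var rows = new List<List<TextChunk>>();

        // Walk the page from top (largest y) to bottom.
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var current = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (current != null && Math.Abs(current[0].Y - chunk.Y) <= rowTolerance)
                current.Add(chunk);
            else
                rows.Add(new List<TextChunk> { chunk });
        }

        // Order the cells inside every row by their x position.
        return rows
            .Select(row => row.OrderBy(c => c.X).Select(c => c.Text).ToList())
            .ToList();
    }
}
```

Each inner list is one row, so `rows[r][c]` gives cell-by-cell access. This heuristic does not handle empty cells, merged cells, or text that wraps inside a cell; those need column boundaries derived from the ruling lines or from the x-gaps shared across all rows.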

Related

Making a PDF file from a C# application

How do I make a PDF from a C# application?
I would like to make an application that creates a PDF document (an Excel file would also work).
I want the file to contain tables and a header.
You could use iTextSharp to create your PDF file.
And this post can help you with Excel.
Take a look at the PDFFlow library for generating PDFs in C#. It has the features you need: tables, tables spanning multiple pages, repeating areas (headers, footers, left, right), repeating table headers, and more (automatic page creation, multi-level lists, Word-like tabulation, barcodes/QR codes, etc.), all with an easy fluent syntax:
DocumentBuilder.New()
    .AddSection()
    .AddParagraph("Hello World!").SetFontColor(Color.Red)
    .ToSection()
    .AddImageToSection("Smile.png")
    .AddTable()
    .AddColumnToTable("Column1").AddColumnToTable("Column2")
    .AddRow().AddCellToRow("Row1, Cell1").AddCell("Row1, Cell 2")
    .ToTable()
    .AddRow().AddCellToRow("Row2, Cell1").AddCell("Row2, Cell 2")
    .ToDocument()
    .Build("Result.pdf");
I used this library to create a business document with a multi-page table and repeating headers.
There are examples with source code and explanations, which I found very useful.
Hope this helps.

iText: create a PDF based on an existing one with changed content

I have a fairly complicated, finished PDF file. It contains a barcode and a fancy-looking table.
Based on it, I have to create an application that generates PDFs that look the same but contain different records in the table and a different barcode.
Is it possible to copy the existing PDF and just change the content of the barcode and the table?
What would be the best approach to create an identical-looking PDF with different content?
Thank you very much for your help.
If the barcode and table are static, I would open the file in Photoshop or Illustrator, delete everything I don't want, and save it as a PDF again. Then I would follow the guide "iText - add content to existing PDF file" and use the file as a template for my custom content.
If the table and barcode are dynamically generated (each one is different) and you need to remove content on the fly, I would pull a hacky trick: draw white rectangles over all the content I want gone, then use the result as a template.
Just my 2 cents given the information provided.

Is there a way to read (parse) table data from a PDF file using the Adobe Reader type library?

I am working on a C#.NET project. I have a PDF file that contains some tabular data. I have googled a lot but have not been able to find out how to read the table data from a PDF file in C# code.
I tried iTextSharp, PDFBox, PDFsharp, etc., but could not get it to work. Is there a way to read the data?
OR
Is it possible to read table data from a PDF file using the Adobe Reader type library?
I have seen many questions on Stack Overflow, but none had a satisfactory answer.
Please help me out.
Check out Ivy PDF: www.ivytools.net
It can extract tabular data pretty well (assuming the tables have a proper structure).
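If you are limited to plain-text extraction instead, some extractors offer a layout-preserving mode that pads columns with spaces (for example, `pdftotext -layout` from Poppler/Xpdf). A minimal sketch of splitting such text into rows and cells, assuming that columns are separated by at least two spaces or a tab while a single space stays inside a cell. Whether your extractor produces that spacing is an assumption you must check; `PlainTextTableParser` and `ParseRows` are hypothetical names, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlainTextTableParser
{
    // Assumption: a run of two or more spaces, or one or more tabs, marks a
    // column gap, while a single space is part of the cell text.
    private static readonly Regex ColumnGap = new Regex(@" {2,}|\t+");

    // Splits extracted page text into rows (one per non-empty line) and
    // each row into its cells.
    public static List<string[]> ParseRows(string pageText)
    {
        return pageText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Select(line => ColumnGap.Split(line))
            .ToList();
    }
}
```

This breaks on empty cells (the gap collapses and later cells shift left) and on wrapped cell text, so it only suits simple, well-aligned tables.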

Generating reports from custom data (an HTML table), not a DataSet

I have read several articles on this site and many others recommending report-generation tools of various kinds, but all of them use a DataSet as their data source.
This is not my requirement!
I need some way to export the data displayed in the HtmlTable on mypage.aspx:
The user clicks "view record".
A PDF opens, containing the data of the dynamically built HTML table.
I want to generate a PDF on the fly, using an HTML table as the data source, and display it as a PDF document in the browser without having to save the result on the web server.
You may take a look at iTextSharp and here's a tutorial on how to get started. There's also a section in the documentation which illustrates how to create tables in PDF documents.
Check out this thread regarding abcPDF. What you are describing is very similar to what I have used their library for. When I was originally testing various methods, I tried iTextSharp, but found that for a very small cost, abcPDF did everything I required (specifically HTML-to-PDF generation) with minimal lines of code.
Generation PDF from HTML (component for .NET)

Need a .NET PDF library to edit PDF info

I need a 100% .NET library to edit PDF Info fields such as Author, Title, Creator, Subject and Keywords. All the PDF libraries I tried are unable to do this without completely re-saving the whole PDF document, so for huge files (>35 MB) it takes too much time. I only need to update a few text fields (listed above), and I don't want to re-save the entire document for this.
Is there any library that can do this the way image libraries change IPTC/EXIF fields (without changing the original image)?
Thanks for any help,
Murat
iTextSharp (hosted here)?
