Programmatic Reading of PDFs in C# [closed] - c#

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I see many questions and answers about using C# to generate PDF files. I have a related, but different task.
I have a large number of PDF files already created, and I would like to validate certain parts of the content with Regular Expressions (RegExs). I want to open the PDFs in C#, and be able to read out the text in something approaching a linear fashion.
If headers, footers, any sidebars, etc, get skipped or read out of order, it doesn't matter. I'm just after as much of the main-body text as I can retrieve.
Can you point me towards tools, libraries, API's, etc, that will enable me to programmatically read text in PDF files?

I used PDFSharp as recently as last autumn and found it very easy to use in comparison to others. Home page for PDFSharp.

I have successfully used two different libraries for this purpose. One is PDF Box (part of the Apache project), and also one from Snowtide Informatics.
Both are Java libraries, but you can use them with .NET in combination with IKVM.

There is a library for .NET called PDF Clown.
There is also a nice article over at CodeProject that details a few other libraries and approaches for reading PDF documents.

Here is another one:
http://csharp-source.net/open-source/pdf-libraries

Looks like iTextSharp was a popular answer in "Reading PDF documents in .NET".
Also check out "Reading/Writing PDF files in Visual C# Windows Forms".
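As a rough sketch of the iTextSharp route mentioned above, the following reads each page's text in order and runs a regular expression over the result. It assumes the iTextSharp 5.x package (the `iTextSharp.text.pdf.parser` namespace); the `PdfTextCheck` class and the invoice-number pattern are made up for illustration and are not from the question. Text order depends on how the PDF was produced, so treat the output as approximate.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;          // assumption: iTextSharp 5.x is referenced
using iTextSharp.text.pdf.parser;

static class PdfTextCheck
{
    // Concatenates the extracted text of every page, in page order.
    public static string ExtractText(string pdfPath)
    {
        var reader = new PdfReader(pdfPath);
        try
        {
            var text = new StringBuilder();
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            return text.ToString();
        }
        finally
        {
            reader.Close();
        }
    }

    // Pure helper: true if the extracted text contains a match for the pattern.
    public static bool Matches(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern, RegexOptions.Multiline);
    }

    static void Main(string[] args)
    {
        string text = ExtractText(args[0]);   // args[0]: path to a PDF file
        // Hypothetical rule: the body must contain something like "INV-12345".
        Console.WriteLine(Matches(text, @"INV-\d{5}") ? "ok" : "pattern not found");
    }
}
```

Keeping the regex check separate from the extraction makes it possible to swap in a different library (PDFBox via IKVM, PDF Clown, and so on) without touching the validation rules.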

Related

Diff tool that can be integrated into a C# app [closed]

Closed 10 years ago.
I have a small C# app and I'd like to provide the ability to preview diffs and accept changes. My inputs are only text files. I came across some tools like kdiff3 and winmerge and I was wondering if anyone's integrated them inside a C# app and if yes, how was it done? I also came across some nice projects on CodeProject from an earlier stackoverflow question but since those projects were written in 2004, I was wondering if you have any suggestions for an open source diff and merge tool that I can integrate? Thanks!
Have you checked out csdiff?
http://code.google.com/p/csdiff/
You might want to check out DiffPlex. It is (amongst other things) a library that can be used to generate text diffs. It also provides some higher-level classes that offer a more complete "diff model", which should be easier to use for rendering diffs in, say, a textbox.
Personally, I have only used it for minor tasks, but it looks powerful enough to handle more sophisticated scenarios as well.
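For a sense of what the "diff model" looks like in practice, here is a minimal sketch assuming the DiffPlex NuGet package and its inline diff builder. The sample strings and the `DiffPreview` class are illustrative only; a real app would render the lines in a control rather than on the console.

```csharp
using System;
using DiffPlex;
using DiffPlex.DiffBuilder;
using DiffPlex.DiffBuilder.Model;

static class DiffPreview
{
    static void Main()
    {
        // Illustrative inputs; in the app these would be the two text files' contents.
        string oldText = "alpha\nbeta\ngamma";
        string newText = "alpha\nbeta changed\ngamma";

        // Build a single-pane (inline) model of the differences.
        var builder = new InlineDiffBuilder(new Differ());
        DiffPaneModel diff = builder.BuildDiffModel(oldText, newText);

        // Each piece is one line tagged with how it changed.
        foreach (DiffPiece line in diff.Lines)
        {
            string prefix;
            switch (line.Type)
            {
                case ChangeType.Inserted: prefix = "+ "; break;
                case ChangeType.Deleted:  prefix = "- "; break;
                default:                  prefix = "  "; break;
            }
            Console.WriteLine(prefix + line.Text);
        }
    }
}
```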
WinMerge, as you mentioned, can be integrated with other apps via the command line. Here's an example of Visual Studio using these command-line parameters to replace the built-in diff client. As for launching WinMerge itself, I found this simple example of how to call an external program from C#.
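Launching an external diff tool from C# could look roughly like this. The install path is an assumption and should be adjusted to the machine; only the two file paths are passed, and any further WinMerge switches are left out here.

```csharp
using System;
using System.Diagnostics;

static class ExternalDiff
{
    // Builds the start info separately so it can be inspected without launching anything.
    public static ProcessStartInfo BuildStartInfo(string toolPath, string leftFile, string rightFile)
    {
        return new ProcessStartInfo
        {
            FileName = toolPath,
            // Quote both paths so that spaces in file names survive.
            Arguments = "\"" + leftFile + "\" \"" + rightFile + "\"",
            UseShellExecute = false
        };
    }

    static void Main(string[] args)
    {
        // Assumed install location; not taken from the original post.
        string winMerge = @"C:\Program Files\WinMerge\WinMergeU.exe";

        using (Process p = Process.Start(BuildStartInfo(winMerge, args[0], args[1])))
        {
            p.WaitForExit();   // block until the user closes the diff window
        }
    }
}
```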

Print formatted documents [closed]

Closed 10 years ago.
I'm developing an application in C# using WinForms and I need to print some documents such as contracts, amortization tables and other material, all based on predefined templates. The templates' structure isn't that complex: just some tables, text formatting and two-column pages. The goal is to print documents based on those templates with some data loaded dynamically. At first I was thinking of doing this with a PDF library like PDFSharp or iText, but then I found out that other technologies such as XPS or XSL-FO may be suitable for my needs.
What are your recommendations guys? Any help would be appreciated.
How about doing office interop and just using Word templates?
The client would probably have to want to do this. The thing I like about it is that with bookmarks it is pretty easy to replace data in the template with data from your application. It's also convenient when some of the static text changes: you can just update the templates and distribute them, with no recompiling. Also, if the client wants to preview it in Word before printing, that is somewhat automatic. There are some nitty-gritty details about using interop, but with C# 4.0 it certainly got a lot easier, and it is always a good skill to have.
On the downside if your templates are in a convenient location an ornery user could go in and modify things and break the reports. Also with tables if you have a lot of data, the standard documented method of adding a cell at a time can be pretty slow. Creating delimited text and then doing the text-to-table call is pretty fast but then you have to do all the formatting in code.
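The bookmark approach could be sketched as follows, assuming Word is installed and the project references `Microsoft.Office.Interop.Word` (C# 4.0 or later, so optional and `ref` COM parameters can be omitted). The template path, bookmark names and values are hypothetical.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class ContractPrinter
{
    static void Main()
    {
        // Hypothetical template; it is assumed to define the bookmarks used below.
        string templatePath = @"C:\Templates\Contract.dotx";

        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Create a new document based on the template, leaving the template itself untouched.
            doc = app.Documents.Add(templatePath);

            SetBookmark(doc, "CustomerName", "Jane Doe");
            SetBookmark(doc, "ContractDate", DateTime.Today.ToShortDateString());

            // Print synchronously so Word is not closed while the job is still spooling.
            doc.PrintOut(Background: false);
        }
        finally
        {
            if (doc != null) doc.Close(false);   // discard changes
            app.Quit();
        }
    }

    // Replaces the text at a bookmark, if the template defines it.
    static void SetBookmark(Word.Document doc, string name, string value)
    {
        if (doc.Bookmarks.Exists(name))
        {
            doc.Bookmarks[name].Range.Text = value;
        }
    }
}
```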
On the XPS side I did see a good overview video on pluralsight. It was part of the WPF course so I'm not sure how it translates to a winform app. I think you get to watch at least a couple of hours for free so it might be worth checking out if you want to get an overview of how XPS printing works.

C# .NET RTF to HTML Convertor, and HTML to RTF Convertor Library [closed]

Closed 10 years ago.
I need a library for my ASP.NET MVC3 Application that can convert to and from HTML and RTF.
There are a lot out there that do just one, converting RTF to HTML. But I need to be able to go backwards too.
The closest I've come is:
http://code.msdn.microsoft.com/windowsdesktop/Converting-between-RTF-and-aaa02a6e
But that throws the error "The calling thread must be STA, because many UI components require this", even though I've not changed the code at all.
Disclaimer: I'm working for this company.
Doomsknight, try our DLL library. I have posted about RTF to HTML here; as for HTML to RTF, our company also has a component for this, "HTML to RTF", here: http://www.htmltortf.com/convert-html-to-rtf-net/html-to-rtf-csharp-aspnet.php. An online version of HTML to RTF Pro DLL.Net is also available.
Small sample code to help:
Convert HTML file to RTF file in C#:
SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
h.PageStyle.PageSize.Letter();
h.ConvertFile(@"c:\test.htm", @"c:\test.rtf");
All the best :)
You should use XSLT for the conversion to RTF, provided your HTML is XML-compliant. Do a search for html2rtf.xsl[t] and you should be able to find something.
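On the "calling thread must be STA" error from the question: ASP.NET request threads are not STA, so one common workaround is to run the conversion on a dedicated STA thread. A minimal sketch, assuming Windows; the `StaRunner` helper is made up for illustration and is not part of the MSDN sample.

```csharp
using System;
using System.Threading;

static class StaRunner
{
    // Runs func on a dedicated STA thread, waits for it, and returns its result.
    // Any exception thrown by func is rethrown on the caller as an inner exception.
    public static T Run<T>(Func<T> func)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = func(); }
            catch (Exception ex) { error = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA);   // must be set before Start
        thread.Start();
        thread.Join();

        if (error != null)
        {
            throw new InvalidOperationException("STA call failed", error);
        }
        return result;
    }
}
```

Usage would be along the lines of `string html = StaRunner.Run(() => ConvertRtfToHtml(rtf));`, where `ConvertRtfToHtml` stands in for whatever conversion method the sample exposes. Creating a thread per request has a cost, so this is a stopgap rather than a tuned solution.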

Free OCR SDK for .net which can extract text,tables with format and images into Office word document [closed]

Closed 10 years ago.
I want to have a free OCR SDK which can extract text, tables with data and images from scanned document files (.tiff,.png etc) and store into Office Word document file.
Please help me to sort out this issue. I have already managed to extract text only from images using MODI, but I could not find a way to use MODI to extract tables and images and store them in an Office Word document file.
I'm not sure whether open-source SDKs can solve your task. Based on what you describe, you need a complex OCR application with functions for reconstructing a document's logical structure. If you are planning business software, you may look at ABBYY FineReader Engine. It has a set of document analysis and reconstruction features, provides an API for C#, and is free to try. It is not affordable for free-to-use programs, but when it comes to business software, ABBYY OCR technologies can add serious value to your product, so consider trying it out. I work at ABBYY and can provide additional info if necessary.
Best regards, Nikolay.

MS-Office Document Conversion to .PDF Format [closed]

Closed 10 years ago.
I am looking for a MS-Office document to .PDF 3rd party software that does not create the need for my code to manipulate the COM directly. I am looking for a package that is native to .net. I have already looked at the following:
http://www.cete.com/
http://www.pdfonline.com
Are there any other SDK packages that you are aware of that can meet my needs?
If the software package manipulates the COM on its own, that is fine. I just don't want to perform any operations against the COM within my code. I would also prefer it to be C# based.
I think you might find one here: http://www.codeproject.com/KB/cs/sertf2pdf.aspx ; and perhaps here as well: www.novapdf.com/kb/convert-word-to-pdf-microsoft-office-word-documents-to-pdf-208.html.
Have a look at ASPOSE.NET Total & TXTextControl .NET.
Have a look at the PDF Conversion Services. It provides everything in a modern and convenient Web Services based SDK. All self contained, no need for IIS.
C# sample code can be found here.
Did you check win32ole for converting the Microsoft document into PDF?
Check the SaveAs command on the document object of win32ole.
You can save an MS Office doc (DOC/XLS/PPT) as PDF.
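In C#, the SaveAs idea above could look roughly like this for a Word document. It goes through COM interop directly, which the asker wanted to avoid, so it is shown only to illustrate the call. It assumes a Word version with built-in PDF export and a reference to `Microsoft.Office.Interop.Word`; the `WordToPdf` class and its arguments are illustrative.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordToPdf
{
    // Opens a Word document read-only and saves a PDF copy next to wherever pdfPath points.
    public static void Convert(string docPath, string pdfPath)
    {
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(docPath, ReadOnly: true);
            doc.SaveAs(pdfPath, Word.WdSaveFormat.wdFormatPDF);
        }
        finally
        {
            if (doc != null) doc.Close(false);   // do not save changes to the source
            app.Quit();
        }
    }

    static void Main(string[] args)
    {
        // args[0]: full path to the .doc/.docx file, args[1]: full path of the PDF to write
        Convert(args[0], args[1]);
    }
}
```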
