Parsing pdf files [closed] - c#

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have a requirement to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split PDF documents by page number, but it cannot split a document based on its content. As far as I can tell, it also has no search function for determining where content is located (if I am wrong, please let me know).
How can I find the location of text in a PDF file using .NET?
Thanks

You might try the Docotic.Pdf library for your task.
The library can extract text from PDFs (with or without formatting).
Or you could retrieve a collection of words with their bounding rectangles from the PDF. This should help you find the location of the text in the file.
Disclaimer: I work for the vendor of the library.
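Once the text of each page is available from whichever library you pick, deciding where to split is plain string work. The following is a minimal sketch under that assumption; `SplitPlanner`, `PlanRanges`, `pageTexts` and `marker` are hypothetical names for illustration and are not part of Docotic.Pdf or easyPDF.

```csharp
using System;
using System.Collections.Generic;

public static class SplitPlanner
{
    // Returns 1-based, inclusive (First, Last) page ranges.
    // A new range starts on every page (after the first) whose text contains
    // the marker; page 1 always opens the first range.
    public static List<(int First, int Last)> PlanRanges(
        IReadOnlyList<string> pageTexts, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("Marker must not be empty.", nameof(marker));

        var ranges = new List<(int First, int Last)>();
        if (pageTexts.Count == 0)
            return ranges;

        int start = 1;
        for (int page = 2; page <= pageTexts.Count; page++)
        {
            // Case-insensitive search for the marker in this page's text.
            bool startsNewSection =
                pageTexts[page - 1].IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
            if (startsNewSection)
            {
                ranges.Add((start, page - 1)); // close the previous section
                start = page;
            }
        }
        ranges.Add((start, pageTexts.Count)); // close the final section
        return ranges;
    }
}
```

The resulting ranges can then be handed to a page-number-based split, which the question says easyPDF already supports.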

You need a PDF library for .NET, such as iTextSharp (the .NET port of iText).

Take a look at this question. It has links to some libraries that may satisfy your requirements:
How to programatically search a PDF document in c#

Related

Is there an effective way to convert PDF to SVG in C# WPF? [closed]

Closed 1 year ago.
I'm struggling to convert PDF to SVG in C# WPF (.NET 5).
I have already read various articles and tested with Inkscape.
My PDF documents contain a lot of text and images, including math formulas and complex characters.
I need a vector image that doesn't require any editing, so I was hoping to convert everything by changing all the objects to paths.
I was able to pass these options on Inkscape's command line.
The results were very satisfactory.
Most of the characters are converted to vector form, independent of fonts, wherever possible.
However, there are a few issues.
Converting all the pages in my PDF takes too long.
When I try to work around this by dividing the task and running the conversions separately, the conversion does not complete properly and Inkscape keeps running and hangs.
inkscape --export-filename=d:\testtest.svg --actions="select-all;object-to-path;" --pdf-poppler --pdf-page=190 d:\Test.pdf
The command I used is as above.
Is there any other command to embed all fonts and images inside the SVG when converting PDF to SVG?
Can someone recommend a separate library or commercial product that converts PDF to SVG (ideally one that lets me convert any object I want to a path and save it)?
The Spire.PDF library can convert PDF to SVG; you may want to give it a try.
Code example:
using Spire.Pdf;

// Load the source PDF and save it as SVG.
PdfDocument document = new PdfDocument();
document.LoadFromFile("Test.pdf");
document.SaveToFile(@"E:\Program Files\Result.svg", FileFormat.SVG);
For more information, visit this link.
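As for the hang described in the question when the Inkscape work is divided up, one possible approach is to run one page per process and give each process a timeout, so a stuck instance is killed instead of blocking the batch. This is only a sketch: it reuses the exact options from the command quoted in the question, while `InkscapePageExport`, its method names and the timeout handling are assumptions for illustration, not something Inkscape prescribes.

```csharp
using System;
using System.Diagnostics;

public static class InkscapePageExport
{
    // Builds the same options as the command in the question, for a single page.
    public static string BuildArguments(string pdfPath, int page, string svgPath)
    {
        return $"--export-filename=\"{svgPath}\" " +
               "--actions=\"select-all;object-to-path;\" " +
               $"--pdf-poppler --pdf-page={page} \"{pdfPath}\"";
    }

    // Runs one export. Returns false if Inkscape could not be started,
    // exits with a non-zero code, or does not finish within the timeout.
    public static bool ExportPage(string inkscapeExe, string pdfPath, int page,
                                  string svgPath, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(inkscapeExe, BuildArguments(pdfPath, page, svgPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            if (process == null)
                return false;

            if (!process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                process.Kill();        // stop the hung instance
                process.WaitForExit(); // wait until it has actually exited
                return false;
            }
            return process.ExitCode == 0;
        }
    }
}
```

Calling `ExportPage` in a loop over the page numbers, and recording which pages returned false, at least shows which pages trigger the hang.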

PDF to HTML Conversion API - Best Option using C# [closed]

Closed 8 years ago.
I am looking for the best way to convert PDF to HTML. Note that the PDFs contain different layouts, SmartArt, and images. Can you please suggest something? I would prefer an API that I can use in a C# program to convert a number of files programmatically. I would also prefer to convert the images and embed them as base64.
Some time ago (2013), I developed a PDF to EPUB (a variation of HTML) converter.
I also wanted to develop in C# and looked at what was available, but the best libraries are in C/C++. You probably know that PDF is a very tricky format, and even the best converters fail on some documents, so you really have to stick with the best options.
From C#, you can easily call C or C++ functions, so using a library in those languages is not much of a problem.
Poppler http://poppler.freedesktop.org/ is the PDF library that I chose. It is based on the Xpdf PDF viewer. It is reliable, but you will have to postprocess the HTML code anyway. The package contains command-line utilities including pdftohtml, a PDF to HTML converter. Source files are also available.
Another very good option is PDFlib: http://www.pdflib.com/ It is a commercial product.
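The question asks for images embedded as base64, which can be handled in the postprocessing step mentioned above: each extracted image file is turned into a data URI and substituted for the file reference in the generated HTML. A minimal sketch of the encoding part follows; `DataUri` and its methods are hypothetical helper names, and the MIME type is assumed to be known by the caller.

```csharp
using System;
using System.IO;

public static class DataUri
{
    // Encodes raw image bytes as a data URI usable in an <img src="..."> attribute.
    public static string FromBytes(byte[] imageBytes, string mimeType)
    {
        return $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";
    }

    // Convenience overload that reads the image from disk first.
    public static string FromFile(string path, string mimeType)
    {
        return FromBytes(File.ReadAllBytes(path), mimeType);
    }
}
```

For example, `DataUri.FromFile("page1-img1.png", "image/png")` (the file name is made up) yields a string that can replace the original image path in the HTML.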

Word document (.doc & .docx) to pdf conversion using C# in ASP.NET [closed]

Closed 4 years ago.
I am looking for an easy way to convert .doc and .docx files to PDF using C# and ASP.NET. I previously used iTextSharp for this, but that requires creating the document from scratch. I want to convert the Word files as they are: if they include images, tables, and so on, these must appear in the PDF unchanged. Is there any free library or code? Thanks in advance.
1) You can check this PDFConverter; it might be helpful to you. It is a COM component, callable from .NET.
2) Or you can check the open source library PDFsharp.
3) The third option is the Aspose library.
If you are able to buy a component, you can use Aspose.Words, which is well suited to converting a Word document to PDF as it is. There is no need to install MS Office if this component is used.

Convert (doc,docx,xls,xlsx,jpeg,jpg,txt,pdf,rtf) file into pdf by using asp.net code [closed]

Closed 8 years ago.
I am searching for a tool that can convert files with any of these extensions (doc, docx, xls, xlsx, jpeg, jpg, txt, pdf, rtf) into a PDF file from ASP.NET code. The Aspose tool does this, but it is too costly; I want the same functionality at a lower price.
Please suggest any tool like this.
Thanks
iTextSharp maybe would do the trick for you?
Here is a link for you
I am sure that if you want to put the time into it, the Office COM interop objects can do it. LibreOffice has an API that can do it as well.
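Besides its API, LibreOffice can also be driven from the command line in headless mode, which is often the quickest thing to try from server-side code. The sketch below takes that command-line route rather than the API mentioned above; `LibreOfficeConvert` and its method names are hypothetical, and the path to the `soffice` executable is an assumption that depends on the installation.

```csharp
using System.Diagnostics;

public static class LibreOfficeConvert
{
    // Arguments for a headless conversion of one input file to PDF.
    public static string BuildArguments(string inputPath, string outputDir)
    {
        return $"--headless --convert-to pdf --outdir \"{outputDir}\" \"{inputPath}\"";
    }

    // Starts soffice, waits for it to finish, and returns its exit code
    // (-1 if the process could not be started).
    public static int ConvertToPdf(string sofficeExe, string inputPath, string outputDir)
    {
        var startInfo = new ProcessStartInfo(sofficeExe, BuildArguments(inputPath, outputDir))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            if (process == null)
                return -1;
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

The converted file is written to the output directory with the same base name as the input and a .pdf extension.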
3-Heights has a component (Document Converter) that does this. For HTML there are several other alternatives (see here: Convert HTML to PDF in .NET).
If you have full control over the web server, you could try to print those documents to a PDF printer or use an installed copy of Acrobat, though I wouldn't recommend this solution. It smells somewhat like a hack...
Edit: 3 Heights Document Converter Service

what tool should I use to generate multiple pages pdf file? [closed]

Closed 8 years ago.
I'm working on a .NET/C# project that needs to generate a contract as a PDF file (for printing and browsing) based on information retrieved from a database.
The file also includes several pages of fixed content. It seems that Crystal Reports does not deal well with multi-page files. I also did some research online, and someone suggested iTextSharp.
The problem is that the format of the file can be complicated, and iTextSharp does not seem efficient for this.
Does anyone have an idea?
PDFsharp is an excellent library for this. They also have MigraDoc, which allows you to write documents to PDF, XPS and RTF. The API is robust and based on GDI. Multiple pages shouldn't be a problem, and you can even draw tables and similar elements.
Quick samples are here, but download the project source; it has a hoard of good samples.
Please take a look at Windward Reports (I'm the CTO at Windward). With Windward you design in Word, Excel, or PowerPoint, so anything you can lay out in Office, no matter how complex, we can then render with data in PDF.
