Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I am trying to convert a docx file into a PDF file from an ASP.NET MVC application. Until now I have been using the Microsoft Office interop SaveAs command, but it sometimes (not always) fails with the error "command failed". I have read that it is deprecated and no longer supported by Microsoft, and that Microsoft does not recommend using it from an ASP.NET application, so I am looking for alternatives.
I have seen that there is a good one, Aspose.Words, but it is not free. I am interested in a free one. So is there currently any free alternative that is compatible with Microsoft docx documents and can convert them to PDF without problems?
I am interested in a free one
There isn't one. Office/Word's .docx file format is incredibly long and complicated (see below), so writing a program that can fully parse a Word document is a mammoth undertaking on its own, let alone the equally important tasks of generating a visual-formatting model from it and then converting that model to a PDF file by emitting PostScript/PDF commands.
This is what the OOXML specification looks like when it's printed out:
(Source: https://fussnotes.typepad.com/plexnex/2007/05/ooxml_more_than_1.html )
Then consider all the features and edge-cases present in the Word formatting model: tables, headings, drop-caps, captions, (don't forget embedded and external content using OLE!), floating textboxes, WordArt, and so on.
Non-visual processing of the XML representation of a Word document is actually trivial and can be done with any XML library, though you should use an OOXML-schema-aware library so that you process the Word document correctly (so you don't end up inserting a paragraph into a header, or a caption that fills the page).
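To illustrate how little is needed for the non-visual part, here is a minimal sketch in Python using only the standard library (the same idea should carry over to any platform that has a zip reader and an XML parser). It relies on two facts: a .docx file is a zip archive, and the main body lives in `word/document.xml`. The helper names `paragraph_texts` and `make_sample_docx` are made up for this example, and the sample archive is deliberately stripped down (it is not a complete, Word-openable package). Note that nothing here deals with layout, which is the hard part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes):
    """Return the plain text of each paragraph (w:p) in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; the visible text sits in w:t elements.
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_sample_docx():
    """Build a stripped-down archive holding only word/document.xml (illustration only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    for text in paragraph_texts(make_sample_docx()):
        print(text)
```

This reads text only. Headers, footnotes, styles and embedded objects live in other parts of the package, which is where a schema-aware library starts to pay off.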
Everything else is the difficult (and expensive) part of the problem. This is why, even today, almost 40 years after Word was first released and 15 years after the OOXML format specification was released, third-party software like OpenOffice (née StarOffice) and Apple iWork still cannot fully and correctly import or render Word documents.
Closed 2 years ago.
I have a task to convert a Word document to a PDF file, and I need a process that can be done for free. Could anyone help me with converting Word to PDF for free without Microsoft Interop?
The sad truth is that you probably can't.
If it is a simple document with little styling and simple tables, or even less, then yes, you can probably find a free solution.
The paid solutions don't really work well either unless it's a somewhat simple document.
I was involved in a project where I made a document-generating system that had to prepare around 24,000 Word documents in .docx and .pdf every day, and believe me, we tried everything.
The free solution that almost kind of worked, as long as the document did not contain any advanced plots or tables, was a Java solution, docx4j.
We tried using Aspose, GemBox and a bunch of others, but none of them could transform the advanced documents to a proper PDF without messing up the formatting.
Try converting something like this example without using Word. It won't work. Or at least it wouldn't approximately a year ago.
We ended up setting up a dedicated document server that hosts a very much abused Microsoft Word process that does nothing all day except generate and convert documents.
I would be very happy to discover a decent free (or paid) alternative. But my experience is that as soon as your document gets very complicated (see the example), no one knows .docx like Microsoft. And it sucks that they can't/won't just make a proper .dll you can include in your project for conversion, but that is the way it is.
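The dedicated document server described above boils down to one long-lived converter that handles requests one at a time. Below is a rough sketch of that shape in Python, not the setup we actually ran: `ConversionServer` and `fake_convert` are hypothetical names, and `fake_convert` only stands in for whatever really drives the Word process. The point is that all jobs go through a single queue and a single worker thread, so the one converter instance is never used concurrently.

```python
import queue
import threading
from concurrent.futures import Future


class ConversionServer:
    """Serialises conversion jobs onto one worker thread (hypothetical sketch)."""

    def __init__(self, convert):
        # `convert` is an assumed callable: it takes a .docx path and returns a PDF path.
        self._convert = convert
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, docx_path):
        """Queue a job and return a Future that resolves to the PDF path."""
        future = Future()
        self._jobs.put((docx_path, future))
        return future

    def _run(self):
        while True:
            docx_path, future = self._jobs.get()
            try:
                future.set_result(self._convert(docx_path))
            except Exception as exc:
                # Report the failure to the caller; the worker keeps serving jobs.
                future.set_exception(exc)


def fake_convert(docx_path):
    """Placeholder converter: only derives the output name, converts nothing."""
    if not docx_path.endswith(".docx"):
        raise ValueError(f"not a .docx file: {docx_path}")
    return docx_path[: -len(".docx")] + ".pdf"


if __name__ == "__main__":
    server = ConversionServer(fake_convert)
    print(server.submit("contract.docx").result(timeout=5))
```

A real version would also need to restart the converter when it hangs or crashes, which is not shown here.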
If you have only small doc and docx files, you could use the free version of GemBox.Document.
If you want to convert your documents with all styles and so on, I think you have to buy a component. I've spent a lot of time searching for an open source solution, but could not find anything. GemBox.Document has a really good price/performance ratio.
Closed 8 years ago.
I am looking for the best way to convert PDF to HTML. Note that the PDFs contain different layouts, SmartArt and images. Can you please suggest something? I would prefer an API that I can use in a C# program, so that I can convert a number of files programmatically. I would also prefer to convert the images and embed them in the HTML as base64.
Some time ago (2013), I developed a PDF to epub (a variation of HTML) converter.
I also wanted to develop in C# and looked at what was available, but the best libraries are in C/C++. You probably know that PDF is a very tricky format, and even the best converters fail on some documents, so you really have to stick with the best options.
From C#, you can easily call C or C++ functions, so using a library in those languages is not much of a problem.
Poppler (http://poppler.freedesktop.org/) is the PDF library that I chose. It is based on the Xpdf PDF viewer. It is reliable, but you will have to postprocess the HTML code anyway. The package contains command line utilities, including pdftohtml, a PDF to HTML converter. Source files are also available.
Another very good option is PDFlib: http://www.pdflib.com/. It is a commercial product.
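For the Poppler route, here is a small sketch of the postprocessing step, in Python for brevity. It assumes you have already run pdftohtml into some output directory and now want the images embedded as base64, as asked in the question. `pdftohtml_command` and `inline_images` are names invented for this example. The `-noframes` option is a real pdftohtml flag, but the exact output file names depend on the Poppler version, so treat the command part as an assumption to verify locally.

```python
import base64
import mimetypes
import re
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path, out_base):
    """Build the command line for Poppler's pdftohtml (single HTML file, no frames)."""
    return ["pdftohtml", "-noframes", str(pdf_path), str(out_base)]


def inline_images(html, image_dir):
    """Replace local <img src="..."> references with base64 data URIs."""

    def to_data_uri(match):
        image_path = Path(image_dir) / match.group(2)
        if not image_path.is_file():
            # Remote or missing image: leave the tag untouched.
            return match.group(0)
        mime = mimetypes.guess_type(image_path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(image_path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    # Simple pattern for double-quoted src attributes; a real HTML parser is more robust.
    pattern = r'(<img\b[^>]*?\bsrc=")([^"]+)(")'
    return re.sub(pattern, to_data_uri, html, flags=re.IGNORECASE)


if __name__ == "__main__":
    # Assumes pdftohtml is on PATH and that input.pdf exists; both are placeholders.
    subprocess.run(pdftohtml_command("input.pdf", "out/page"), check=True)
    html_file = Path("out/page.html")  # assumed output name, check your Poppler version
    html_file.write_text(inline_images(html_file.read_text(), "out"))
```

The regular expression only covers the simple markup a converter tends to emit. For arbitrary HTML you would want a proper parser.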
Closed 7 years ago.
I'm developing a tool in ASP.Net MVC 3 Razor. There is a page where the candidate uploads the Curriculum Vitae (rtf, pdf, doc, docx formats supported).
I've done that part. The challenging part for me now is that I need another page to view the CV uploaded by the candidate. So this is a kind of document viewer shown in the browser to see what has been uploaded. It should work even in the absence of Acrobat / MS Word installations.
Please can someone throw some light on this?
There are a couple of commercial and non-commercial solutions for viewing documents on the web. They mostly come in two types: Flash-based or HTML-based. Some viewers are capable of viewing remote files, while others require you to upload your documents to their servers before you can use them.
If you need a quick and free option, I recommend the following:
Google Docs document viewer
http://crocodoc.com/ (Also Commercial)
https://viewer.zoho.com/home.do (RIP! Not available anymore!)
If you need to secure documents and limit access to authorized users only, then I really recommend going to commercial solutions which will give you more sophisticated APIs to implement it. I hope this helps :-)
You may give Doconut a try. More details at http://www.doconut.com
It is able to view all popular document formats and works for MVC also.
PS: I am the author of the tool
At work, we use Aspose.Words to convert different document types to XPS and render them in the browser using the Silverlight Document Toolkit. Aspose has components for other types of files as well such as PDF and Excel documents. Both products are commercial and especially the Aspose components are not cheap. The combination has worked great for us so far.
Closed 8 years ago.
I'm now working on a .NET/C# project that needs to generate a contract PDF file (for printing and browsing) based on information retrieved from a database.
The file also includes several pages of fixed content. It seems that Crystal Reports does not deal well with multi-page files. I also did some research online, and someone suggested iTextSharp.
The problem is that the format of the file can be complicated, and iTextSharp is not very efficient for that.
Does anyone have an idea?
PDFsharp is an excellent library for this. They also have MigraDoc, which allows you to write documents to PDF, XPS and RTF. The API is robust and based on GDI. Multiple pages shouldn't be a problem, and you can even draw tables and so on.
Quick samples are here, but download the project source: they have a hoard of good samples.
Please take a look at Windward Reports (I'm the CTO at Windward). With Windward you design in Word, Excel, or PowerPoint, so anything you can lay out in Office, no matter how complex, we can then render with data as a PDF.
Closed 7 years ago.
I'm using GemBox for my project now. Currently my module requires only Excel, which is why I tried it, and I really like using it. I talked to my PM about the possibility of purchasing it, but they wanted something similar that also works for Word documents, so that others could use it too, since generating Word documents is more common.
These products have both Excel and Word components:
SoftArtisans OfficeWriter
http://www.softartisans.com/
Aspose.Cells and Aspose.Words
http://www.aspose.com/
office.Net
http://www.independentsoft.de/office/index.html
GemBox also has both Excel and Word components:
GemBox.Spreadsheet for Excel in .NET
GemBox.Document for Word in .NET
I have worked with both GemBox and Aspose (in particular, using Aspose with Word and PDF), and personally I prefer the second one.
In addition, Aspose lets you manipulate a lot of file formats (PDF, Word, Excel, PowerPoint, OneNote, EML, MSG and so on), depending on which license you buy.
You can try Docentric Toolkit. It uses Word template documents and fills them with data from any source that can be read by .NET. Templates can even be prepared or modified by end users, if they are permitted to do so.