Convert Word to PDF in C# with free libs - c#

I know there are some posts about this issue, but none of them cover my simple problem. I need to print my DOCX or XLSX files to PDF. First of all, DOCX.
I don't want to use Word Interop because it requires Word on the server machine.
I tried to use PDFCreator, but I can't pass my docx to PDFCreator... or I'm not able to do this.
UPDATE
Using free libs and without word interop

I used NuGet to install something called FreeSpire.Doc, which claims to convert the first 500 paragraphs for free. It seems to be working so far.

File -> Export -> Create PDF/XPS-document

Related

Is it at all possible to convert a document to PDF or edit a PDF in C# using only free software?

I had this stupid idea of creating a template as a .docx or .rtf or .pdf and then replacing the text in that document to generate reports. This seemed like a better way of doing it than using paid reporting software.
Well, I believe I've tried just about everything now and I'm amazed at how impossible it is to do anything with pdfs.
Try 1
HTML -> PDF
A lot harder to design the template. It doesn't look the same when you print it. Never got it working outside of a command line example (not sure how well, say, iTextSharp-LGPL would even work or if it could handle base64 strings as I'm not sure how else you are going to tell it about images). In any case, doing it this way makes it too hard to design the template.
Try 2
OpenXml -> PDF
I stupidly assumed that because Word could save as PDF, OpenXml could too. I was wrong. It cannot save as a PDF.
Try 3
OpenOffice/LibreOffice (docX -> PDF)
It can't read OpenXml which is a problem because I was editing the template as OpenXml and then saving that result (as a .docx) but it can't read that saved document.
Try 4
iTextSharp LGPL
This one just doesn't work, lol. And apparently, even though the ONLY thing that comes up when you google "convert rtf to pdf" is iText and its derivatives, it doesn't convert RTF documents to PDF documents. I verified this myself (it only saves the text, not the formatting) and later found this post, which convinced me I wasn't doing something wrong.
Try 5
PDF -> PDF
Since converting ANYTHING to a PDF seems to be impossible maybe I can save the template as a PDF and just do a text replace on that. Nope, lol, that is apparently a very difficult thing to do.
Try 6
Pandoc (.odt/.docx -> pdf), (.rtf -> .pdf not supported)
pandoc mockup2.odt -s -o mockup2.pdf
link to the files in the picture. *note, it messes up in the same way if you try converting .odt/.docx to .tex.
What do I do here? Buy software so that I can save a file as PDF? Is that the only option?
I have a solution. I'm not saying it's the best solution. LibreOffice (or possibly OpenOffice if you are so inclined) accepts command line arguments that will do the switch.
soffice.exe --headless --convert-to pdf mockup.odt
*note - this is after I added libreoffice to my path (C:\Program Files\LibreOffice\program). idk why it's called soffice.exe instead of libreoffice.exe.
where i found the answer
relevant documentation
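For anyone calling this from C#, the same command can be launched with System.Diagnostics.Process. This is only a sketch: it assumes soffice is on the PATH as in the note above, it targets .NET Core 2.1 or later (needed for ArgumentList), and the class and method names are made up for illustration. The --outdir option tells LibreOffice where to write the PDF.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of any library.
public static class LibreOfficeConverter
{
    // Describes the command:
    //   soffice --headless --convert-to pdf --outdir <outputDir> <inputPath>
    // Assumption: "soffice" can be resolved through PATH.
    public static ProcessStartInfo BuildStartInfo(string inputPath, string outputDir)
    {
        var psi = new ProcessStartInfo("soffice")
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        psi.ArgumentList.Add("--headless");
        psi.ArgumentList.Add("--convert-to");
        psi.ArgumentList.Add("pdf");
        psi.ArgumentList.Add("--outdir");
        psi.ArgumentList.Add(outputDir);
        psi.ArgumentList.Add(inputPath);
        return psi;
    }

    // Runs the conversion and returns the path where the PDF is expected.
    // The caller should still check that the file exists afterwards.
    public static string ConvertToPdf(string inputPath, string outputDir)
    {
        using var process = Process.Start(BuildStartInfo(inputPath, outputDir))
            ?? throw new InvalidOperationException("Could not start soffice.");
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"soffice exited with code {process.ExitCode}.");
        return Path.Combine(outputDir, Path.GetFileNameWithoutExtension(inputPath) + ".pdf");
    }
}
```

Usage would be something like LibreOfficeConverter.ConvertToPdf("mockup.odt", "out"). Passing the arguments as a list avoids quoting problems with paths that contain spaces.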
I might have a working solution for you, if you are stuck with the docx-file for the template.
I found one free solution for docx to pdf conversions, without using microsoft.interop, etc.: See first answer in this stack overflow post
It uses two tools: The open xml power tools and DinkToPdf (Which is essentially a wkhtmltopdf wrapper). The html to pdf part works just fine, but the docx to html part looks like a catastrophe at first. You can fix this with custom css (There are some resources online).
Powertools-.NetStandard
DinkToPdf-GitHub
There are more possibilities with proprietary software, like Aspose.Words and Syncfusion file-formats. Most of the proprietary solutions are pretty expensive...
If you are working only in a Windows environment where MS Office is installed, you can use Microsoft.Interop. It is by far the easiest solution (in this post, Interop is mentioned several times: Stackoverflow Word to PDF).
If you found another (better) working solution, please let me know. I still have not decided if I will use a proprietary or a free solution. :-)

Is it possible to generate .docx files without having MS Word installed?

I want to use "OLE automation" (or whatever it's called now) to generate a Word document.
I assume that it's possible to perform the following programmatically:
Set page size (height, width, margin vals)
Set font type/name, style, and size
Add page numbering
Add pages
Insert page breaks
What I'm not sure of is if I need to have MS Word on my system to do this (to have the necessary DLLs, perhaps)? I use Open Office (I like it, and it's free), but I reckon controlling the creation of docs programmatically is probably easier/better documented for MS Word than it is for Open Office and/or Libre Office - that's why I'm strongly considering making this "rendezvous with Redmond."
This question is tangentially related to this one
If Google Docs is a possibility here, I'd be willing to have a "meeting with Mountain View" but I know nothing about that file format or whether it can be "automated" etc.
I need to end up with something that I can either convert to a PDF file or a DOCX file. Open Office can open DOCX and convert files to PDF, but I don't know about Google Docs.
I've found https://docx.codeplex.com/ to be very useful in dynamically building docx documents.
Yes, it is possible. Check this link: http://www.microsoft.com/en-us/download/details.aspx?id=30425
This is a library for Open XML documents (*.docx, *.xlsx and PowerPoint files).
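To illustrate why Word itself is not needed: a .docx is a zip package of XML parts, so a very small one can be written with nothing but the .NET base class library. The sketch below is not the Open XML SDK's API (the SDK builds the same kind of parts through typed classes). MinimalDocx and Create are hypothetical names, and the page size and margins are assumed A4 values in twentieths of a point.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: writes the three parts a bare-bones .docx needs.
public static class MinimalDocx
{
    private const string ContentTypes =
        "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>" +
        "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>" +
        "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>" +
        "<Default Extension='xml' ContentType='application/xml'/>" +
        "<Override PartName='/word/document.xml' ContentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'/>" +
        "</Types>";

    private const string Rels =
        "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>" +
        "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>" +
        "<Relationship Id='rId1' Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' Target='word/document.xml'/>" +
        "</Relationships>";

    // One paragraph of text per page, with a page break between pages.
    public static byte[] Create(params string[] pages)
    {
        var body = new StringBuilder();
        for (int i = 0; i < pages.Length; i++)
        {
            if (i > 0) body.Append("<w:p><w:r><w:br w:type='page'/></w:r></w:p>");
            body.Append("<w:p><w:r><w:t xml:space='preserve'>")
                .Append(SecurityElement.Escape(pages[i]))
                .Append("</w:t></w:r></w:p>");
        }
        // Section properties: A4 page (11906 x 16838) with 1 inch (1440) margins.
        body.Append("<w:sectPr><w:pgSz w:w='11906' w:h='16838'/>")
            .Append("<w:pgMar w:top='1440' w:right='1440' w:bottom='1440' w:left='1440' ")
            .Append("w:header='708' w:footer='708' w:gutter='0'/></w:sectPr>");

        string document =
            "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>" +
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body>" + body + "</w:body></w:document>";

        using var output = new MemoryStream();
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            WritePart(zip, "[Content_Types].xml", ContentTypes);
            WritePart(zip, "_rels/.rels", Rels);
            WritePart(zip, "word/document.xml", document);
        }
        return output.ToArray();
    }

    private static void WritePart(ZipArchive zip, string name, string xml)
    {
        using var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false));
        writer.Write(xml);
    }
}
```

Something like File.WriteAllBytes("report.docx", MinimalDocx.Create("Page one", "Page two")) would write the file to disk. For fonts, page numbering and anything beyond plain text, a library such as the Open XML SDK is far less error-prone than hand-written XML.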
Yes, you can use OpenXML. With OpenXML you can also create Excel files and ... (it cannot write PDF itself, though).
Check This out
You can use this library to generate document by template:
https://github.com/StasClick/DocumentGenerator
'DocumentGenerator' can generate one leaflet, multiple leaflets in one document or registers.

Need to find DOCX to PDF Conversion C# API

From within C#, I want to be able to take a DOCX file and convert it to PDF.
How can I do this?
The catch is that I would like to do other types too, e.g. images, doc files, etc.
I also ideally would like there to be no office installed on the computer where this software will be running.
Perhaps the answer is some software that 'prints to PDF'.
My software is dealing with arrays of data representing the file, so it would ideally be some kind of API that handles byte arrays.
There aren't a ton of good C# libraries for this one. It's hard to do without COM.
Here's one option:
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
If you want something free (but requires Microsoft Word to be installed), you could try using Word itself via .NET code:
http://www.codeproject.com/KB/cs/CreatePDFsForFree.aspx
It isn't the solution for everything but it can be useful at times.
DOCX is the Office 2007 format. If you don't mind using the built-in functionality of Office 2007, you might want to check this link out:
http://msdn.microsoft.com/en-us/library/bb412305.aspx
Office automation + Save As Pdf Add-in ?

Replacing contents inside docx and pdf file using asp.net c#

In my application I am using some templates in docx and pdf format. I store these docs in the DB as bytes.
Before showing/sending these docs back to the user or application, I need to replace some content inside the doc, e.g. if the doc contains ##username##, I need to replace it with the actual username of the customer. I have not found a proper solution for this. Any good ideas?
For the docx file, your best bet is to use OpenXML, and instead of having special text like ##username##, replace it with a content control that you can fill in.
Since you specified docx, you can use OpenXML, which is great, it's an API. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form and fill it in at runtime (using a tool like iTextSharp).
HTH,
Brian
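If you keep the ##username## style of marker, a rough sketch of the replacement on a byte array could look like this. It uses only System.IO.Compression rather than the Open XML SDK, DocxPlaceholders.Fill is a hypothetical name, and it assumes the marker sits inside a single run in word/document.xml and that the part is UTF-8 encoded. Word often splits typed text across several runs, in which case a plain string replace will not find the marker, which is one reason content controls are the more robust choice.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Hypothetical helper: copies a .docx package and rewrites its main part.
public static class DocxPlaceholders
{
    // Returns new .docx bytes in which every key of "values" found in
    // word/document.xml is replaced by its XML-escaped value.
    public static byte[] Fill(byte[] docx, IReadOnlyDictionary<string, string> values)
    {
        using var input = new ZipArchive(new MemoryStream(docx), ZipArchiveMode.Read);
        using var output = new MemoryStream();
        using (var result = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in input.Entries)
            {
                using Stream source = entry.Open();
                using Stream target = result.CreateEntry(entry.FullName).Open();

                if (entry.FullName != "word/document.xml")
                {
                    source.CopyTo(target); // all other parts are copied unchanged
                    continue;
                }

                string xml = new StreamReader(source, Encoding.UTF8).ReadToEnd();
                foreach (KeyValuePair<string, string> pair in values)
                    xml = xml.Replace(pair.Key, SecurityElement.Escape(pair.Value));

                byte[] bytes = new UTF8Encoding(false).GetBytes(xml);
                target.Write(bytes, 0, bytes.Length);
            }
        }
        return output.ToArray();
    }
}
```

With the template loaded from the DB as templateBytes, the call would be along the lines of DocxPlaceholders.Fill(templateBytes, new Dictionary<string, string> { ["##username##"] = customerName }). Escaping the value matters because a name containing & or < would otherwise corrupt the XML.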
For DOC / DOCX:
You should use the MS Word object model through the MS Word assembly reference (this works only on machines that have MS Office installed; otherwise you can use something like the Aspose.Words libraries, which don't need an MS Office installation on the server). You can programmatically trigger Word's Find-Replace through the library's API.
For PDF: You will need a third-party library for editing PDF files. Third-party libraries like ABCpdf are available (not sure whether Adobe itself has something for this).
The mechanism is the same as for the Word library, but I am not sure whether you can trigger a find-and-replace here or have to do something else. I have not used a PDF generation library.

Best way to extract data from Microsoft Word

The release notes of a piece of software have some important data that I would like to extract with every release. Is there a way to extract certain information from Microsoft Word?
The application that I am thinking of would be written in C#, but I am okay if it is any other solution.
All MS Office products (Word, Excel, etc.) are fully scriptable, both internally (using VBA) and externally (via OLE Automation, also known as ActiveX; in fact, VBA uses the interface exposed through OLE).
My suggestion would be to look for a library in your language that supports this. Here is a link to a Perl module, Win32::OLE, that does: as you can see, it's quite easy to use and very powerful. The interface should be similar for other languages.
I went through this a few years back. You can:
Use Word to convert the file into some other format, ASCII, RTF, XML etc.
Use some third-party app to convert to another format, such as ASCII.
Access the Word API through OLE and extract the information directly.
I couldn't find any generic libraries to read Word files, and back then all of the applications that read Word files only worked for a subset. Word changed often enough that they had trouble keeping up.
There were some documents that listed the specifics of the older Word file formats, and the underlying file structure is outrageously complicated. Without a lot of resources it would be hard to keep code in sync with the file format.
Initially, I used Perl to drive Word and create new documents, but the solution was too fragile. Later I switched the whole application to work with PDFs instead, and gave up on Word.
Paul.
Probably not the most elegant solution but this seems to be the lightest method: Use a Cscript.
I just tried it on a sample Word doc (2003) and it works perfectly.
More information: http://www.gregthatcher.com/Papers/VBScript/WordExtractScript.aspx
I did a lot of Excel programming with VSTO (Visual Studio Tools for Office). I think you will be able to use the VSTO API to read a Word doc, and you should be able to use C#.
You could write an IFilter to extract text from word files. No need to have Word installed.
You can work from within Word (VBA, VSTO) or outside it.
From outside it, automation is one approach.
Another is to avoid using Word entirely. If the docs are .docx, you can use anything which can manipulate an Open XML file. Microsoft has its Open XML SDK, and in the Java world you can use docx4j or POI.
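As a rough idea of what "anything which can manipulate an Open XML file" can mean in C#, the sketch below opens the .docx as a zip and reads paragraph text from word/document.xml with LINQ to XML. It is not the Open XML SDK's API, DocxText.ReadParagraphs is a hypothetical name, and it only looks at the main document part (headers, footers and footnotes live in other parts).

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: pulls plain text out of a .docx without Word.
public static class DocxText
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph (w:p) in the main document part,
    // joining the text runs (w:t) that Word may have split it into.
    public static List<string> ReadParagraphs(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("No word/document.xml part found.");

        using Stream part = entry.Open();
        XDocument xml = XDocument.Load(part);

        return xml.Descendants(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
            .ToList();
    }
}
```

For release notes, you could then scan the returned list for the lines you care about, for example with a regular expression per paragraph. This only works for .docx; the older binary .doc format needs one of the other approaches mentioned above.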
