Can I save a DOCX file as HTML using the DOCX Library? - c#

I am using the DOCX library to manipulate *.docx files.
I would like to save a *.docx file as an HTML file, but this code:
using (DocX sourceDoc = DocX.Load(sourceFilename))
{
    sourceDoc.SaveAs(sourceHTMLFileName);
}
...does not work (sourceHTMLFileName is "Bla.html").
Is it possible? If so, how?

The author of DocX has stated in a blog post that his library does not support this feature yet (I got the link from the CodePlex page for the library).
Quote from the link:
I would love to add this functionality to DocX, however there is a problem.
[...]
The only easy way to do this conversion, is to use Microsoft’s Office interop libraries
[...]
Is there no way to do conversions without having Word.exe installed on my machine? I didn't say that; I said there is no easy way. This looks very promising; now if only I could find the time.
He suggests a workaround using Interop but that might not be possible depending on your environment.
Using SaveAs with a file that ends in .html simply saves a .docx file with the wrong extension; there is no conversion done.
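You can confirm this by looking at the bytes instead of the extension. The C# sketch below uses only System.IO.Compression (the class and method names are hypothetical) and treats a file as a .docx if it is a ZIP archive containing the conventional main part word/document.xml. Strictly speaking, the main part is located through _rels/.rels, so this is a heuristic rather than a full package validation.

```csharp
using System.IO;
using System.IO.Compression;

// Checks whether a stream/file is a .docx package, regardless of its file extension.
public static class DocxSniffer
{
    public static bool IsDocxPackage(Stream stream)
    {
        // Every non-empty ZIP archive starts with the local file header signature "PK\x03\x04".
        long start = stream.Position;
        var header = new byte[4];
        int read = stream.Read(header, 0, header.Length);
        stream.Position = start; // rewind so ZipArchive sees the whole stream
        if (read < 4 || header[0] != 0x50 || header[1] != 0x4B || header[2] != 0x03 || header[3] != 0x04)
            return false;

        try
        {
            // Look for the conventional main document part of a WordprocessingML package.
            using var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true);
            return zip.GetEntry("word/document.xml") != null;
        }
        catch (InvalidDataException)
        {
            return false; // starts like a ZIP but is corrupt or truncated
        }
    }

    public static bool IsDocxPackage(string path)
    {
        using var fs = File.OpenRead(path);
        return IsDocxPackage(fs);
    }
}
```

Given the behaviour described above, calling DocxSniffer.IsDocxPackage("Bla.html") on the file produced by SaveAs would be expected to return true, whereas a real HTML file returns false.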

Related

Is it at all possible to convert a document to PDF or edit a PDF in C# using only free software?

I had this stupid idea of creating a template as a .docx or .rtf or .pdf and then replacing the text in that document to generate reports. This seemed like a better way of doing it than using paid reporting software.
Well, I believe I've tried just about everything now, and I'm amazed at how impossible it is to do anything with PDFs.
Try 1
HTML -> PDF
It's a lot harder to design the template, and it doesn't look the same when you print it. I never got it working outside of a command-line example (I'm not sure how well, say, iTextSharp-LGPL would even work, or whether it could handle base64 strings, since I'm not sure how else you would tell it about images). In any case, doing it this way makes the template too hard to design.
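On the image question: a common way to put images into an HTML template without separate files is to embed them as a base64 data URI in the img tag. A minimal sketch, assuming your HTML-to-PDF converter accepts data: URIs (check this for the specific tool; the helper name is hypothetical):

```csharp
using System;
using System.Net;

public static class HtmlTemplateImages
{
    // Returns an <img> tag that carries the image bytes inline as a base64 data URI,
    // so the HTML template needs no separate image file or URL.
    public static string ImgTag(byte[] imageBytes, string mimeType, string altText)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return $"<img src=\"data:{mimeType};base64,{base64}\" alt=\"{WebUtility.HtmlEncode(altText)}\" />";
    }
}
```

For example, HtmlTemplateImages.ImgTag(File.ReadAllBytes("logo.png"), "image/png", "Logo") produces a tag you can substitute into the template like any other text.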
Try 2
OpenXml -> PDF
I stupidly assumed that because Word could save as PDF, OpenXml could too. I was wrong: it cannot save as PDF.
Try 3
OpenOffice/LibreOffice (docX -> PDF)
It can't read OpenXml, which is a problem because I was editing the template as OpenXml and then saving the result (as a .docx), but it can't read that saved document.
Try 4
iTextSharp LGPL
This one just doesn't work, lol. Even though the ONLY thing that comes up when you google "convert rtf to pdf" is iText and its derivatives, apparently it doesn't convert RTF documents to PDF documents. I verified this myself (it only saves the text, not the formatting) and later found this post, which convinced me I wasn't doing something wrong.
Try 5
PDF -> PDF
Since converting ANYTHING to a PDF seems to be impossible, maybe I can save the template as a PDF and just do a text replace on that. Nope, lol, that is apparently a very difficult thing to do.
Try 6
Pandoc (.odt/.docx -> pdf), (.rtf -> .pdf not supported)
pandoc mockup2.odt -s -o mockup2.pdf
(The original post linked to the files shown in the picture.) *Note: it messes up in the same way if you try converting .odt/.docx to .tex.
What do I do here? Buy software so that I can save a file as PDF? Is that the only option?
I have a solution. I'm not saying it's the best solution. LibreOffice (or possibly OpenOffice, if you are so inclined) accepts command-line arguments that will do the conversion.
soffice.exe --headless --convert-to pdf mockup.odt
*Note: this is after I added LibreOffice to my PATH (C:\Program Files\LibreOffice\program). idk why it's called soffice.exe instead of libreoffice.exe.
where i found the answer
relevant documentation
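To drive the same command from C#, a minimal sketch with System.Diagnostics.Process might look like the following. It assumes soffice is on the PATH as described above (on Windows you may need the full path to soffice.exe), uses the real --outdir option, and assumes the output lands in that directory under the input's base name with the new extension. ProcessStartInfo.ArgumentList needs .NET Core 2.1 or later; the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class SofficeConverter
{
    // Builds (without starting) the equivalent of:
    //   soffice --headless --convert-to <format> --outdir <outDir> <inputPath>
    public static ProcessStartInfo BuildStartInfo(string inputPath, string outDir, string format = "pdf")
    {
        var psi = new ProcessStartInfo("soffice") { UseShellExecute = false };
        foreach (var arg in new[] { "--headless", "--convert-to", format, "--outdir", outDir, inputPath })
            psi.ArgumentList.Add(arg); // each argument is quoted correctly for us
        return psi;
    }

    // Output file = input base name + new extension, inside outDir.
    public static string ExpectedOutputPath(string inputPath, string outDir, string format = "pdf")
    {
        string extension = format.Split(':')[0]; // "pdf:writer_pdf_Export" -> "pdf"
        return Path.Combine(outDir, Path.GetFileNameWithoutExtension(inputPath) + "." + extension);
    }

    // Runs the conversion, waits for it, and returns the path of the generated file.
    public static string ConvertFile(string inputPath, string outDir, int timeoutMs = 120_000)
    {
        using var process = Process.Start(BuildStartInfo(inputPath, outDir))
            ?? throw new InvalidOperationException("Could not start soffice.");
        if (!process.WaitForExit(timeoutMs))
        {
            process.Kill();
            throw new TimeoutException("soffice did not finish in time.");
        }
        string output = ExpectedOutputPath(inputPath, outDir);
        if (process.ExitCode != 0 || !File.Exists(output))
            throw new IOException($"Conversion failed (exit code {process.ExitCode}).");
        return output;
    }
}
```

Passing each argument through ArgumentList avoids hand-building a quoted command line, which matters for paths like C:\Program Files\... that contain spaces.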
I might have a working solution for you, if you are stuck with the docx file for the template.
I found one free solution for docx to PDF conversion that doesn't use Microsoft.Interop etc.: see the first answer in this Stack Overflow post.
It uses two tools: the Open XML PowerTools and DinkToPdf (which is essentially a wkhtmltopdf wrapper). The HTML to PDF part works just fine, but the docx to HTML part looks like a catastrophe at first. You can fix this with custom CSS (there are some resources online).
Powertools-.NetStandard
DinkToPdf-GitHub
There are more possibilities with proprietary software, like Aspose.Words and Syncfusion File Formats. Most of the proprietary solutions are pretty expensive...
If you are only working in a Windows environment where MS Office is installed, you can use Microsoft.Interop. It is by far the easiest solution (Interop is mentioned several times in this post: Stackoverflow Word to PDF).
If you found another (better) working solution, please let me know. I still have not decided if I will use a proprietary or a free solution. :-)

Convert Word to PDF in c# with free libs

I know there are some posts about this issue, but none of them covers my simple problem. I need to print my DOCX or XLSX files to PDF, DOCX first.
I don't want to use Word Interop because it requires Word on the server machine.
I tried to use PDFCreator, but I can't pass my DOCX to PDFCreator... or I'm not able to do this.
UPDATE
Using free libs and without word interop
I used NuGet to install something called FreeSpire.Doc, which claims to convert the first 500 paragraphs for free. It seems to be working so far.
File -> Export -> Create PDF/XPS-document

Is it possible to generate .docx files without having MS Word installed?

I want to use "OLE automation" (or whatever it's called now) to generate a Word document.
I assume that it's possible to perform the following programmatically:
Set page size (height, width, margin vals)
Set font type/name, style, and size
Add page numbering
Add pages
Insert page breaks
What I'm not sure of is if I need to have MS Word on my system to do this (to have the necessary DLLs, perhaps)? I use Open Office (I like it, and it's free), but I reckon controlling the creation of docs programmatically is probably easier/better documented for MS Word than it is for Open Office and/or Libre Office - that's why I'm strongly considering making this "rendezvous with Redmond."
This question is tangentially related to this one
If Google Docs is a possibility here, I'd be willing to have a "meeting with Mountain View" but I know nothing about that file format or whether it can be "automated" etc.
I need to end up with something that I can either convert to a PDF file or a DOCX file. Open Office can open DOCX and convert files to PDF, but I don't know about Google Docs.
I've found https://docx.codeplex.com/ to be very useful in dynamically building docx documents.
Yes, it is possible. Check this link: http://www.microsoft.com/en-us/download/details.aspx?id=30425
This is a library for Open XML documents (*.docx, *.xlsx and PowerPoint files).
Yes, you can use Open XML; with Open XML you can also create Excel files and more.
Check This out
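Because a .docx is just a ZIP package of XML parts, the answers above can be illustrated without Word, Interop or even the SDK. The sketch below (BCL only; class and method names are hypothetical) writes the three parts a minimal WordprocessingML package needs and covers some items from the question: page size and margins via w:sectPr, font name, size and bold via run properties, and explicit page breaks. Page numbering needs a separate footer part with a PAGE field and is left out to keep it short; for real documents the Open XML SDK or DocX are far more convenient.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

// Writes a minimal .docx (a ZIP of three XML parts) with no Word, Interop or SDK.
public static class MinimalDocx
{
    const string Decl = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    const string ContentTypes = Decl +
        "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
        "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
        "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
        "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
        "</Types>";

    const string PackageRels = Decl +
        "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
        "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>" +
        "</Relationships>";

    const string PageBreak = "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>";

    // Page size and margins are in twentieths of a point (1440 = 1 inch); 12240 x 15840 is US Letter.
    const string Section =
        "<w:sectPr><w:pgSz w:w=\"12240\" w:h=\"15840\"/>" +
        "<w:pgMar w:top=\"1440\" w:right=\"1440\" w:bottom=\"1440\" w:left=\"1440\" " +
        "w:header=\"720\" w:footer=\"720\" w:gutter=\"0\"/></w:sectPr>";

    // One paragraph with one run. Size is given in points; WordprocessingML stores half-points.
    static string Paragraph(string text, string font, int sizePt, bool bold) =>
        "<w:p><w:r><w:rPr>" +
        $"<w:rFonts w:ascii=\"{font}\" w:hAnsi=\"{font}\"/>" +
        (bold ? "<w:b/>" : "") +
        $"<w:sz w:val=\"{sizePt * 2}\"/>" +
        $"</w:rPr><w:t xml:space=\"preserve\">{SecurityElement.Escape(text)}</w:t></w:r></w:p>";

    // Writes one page of text per list item, separated by page breaks.
    public static void Write(Stream output, IReadOnlyList<string> pages, string font = "Arial", int sizePt = 12)
    {
        var body = new StringBuilder();
        for (int i = 0; i < pages.Count; i++)
        {
            if (i > 0) body.Append(PageBreak);
            body.Append(Paragraph(pages[i], font, sizePt, bold: i == 0)); // first page bold, just to show a style
        }
        string document = Decl +
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            $"<w:body>{body}{Section}</w:body></w:document>";

        using var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true);
        WriteEntry(zip, "[Content_Types].xml", ContentTypes);
        WriteEntry(zip, "_rels/.rels", PackageRels);
        WriteEntry(zip, "word/document.xml", document);
    }

    // Stores one part as UTF-8 without a byte order mark.
    static void WriteEntry(ZipArchive zip, string name, string xml)
    {
        using var stream = zip.CreateEntry(name).Open();
        byte[] bytes = new UTF8Encoding(false).GetBytes(xml);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

Usage: using var fs = File.Create("report.docx"); MinimalDocx.Write(fs, new[] { "Title page", "Second page" }); The resulting file can then be opened, or converted to PDF, with LibreOffice/OpenOffice as mentioned in the question.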
You can use this library to generate documents from a template:
https://github.com/StasClick/DocumentGenerator
'DocumentGenerator' can generate one leaflet, multiple leaflets in one document or registers.

Manipulating Word documents on server without Office installed (ASP.NET)

I'm working on code to make an MS Word to HTML system. After googling for about half a minute, I found code which does exactly what I need. It works offline on the ASP.NET development server, but it doesn't work when I upload the files to my server.
I read a couple of posts, and the problem seems to be that the server does not have MS Office installed. I'm not sure whether it has (I'm still awaiting an email from the good people # hosting, but I assume it's not installed), but my question is...
Is there ANY way to make it work without MS Office installed?
I'm using Microsoft.Office.Interop.Word version 12, ASP.NET 3.5 and C#, and the error I'm getting is:
Could not load file or assembly 'Microsoft.Office.Interop.Word, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' or one of its dependencies.
Thank you for your time!
The Interop library is not a "working" library in itself, it is only a wrapper around winword.exe for .NET programs, so using this library does not make any sense if you don't install or use Microsoft Word.
Instead you will need to find a library that allows for manipulating Word Documents. If you can constrain the documents to be in the new format (docx), then it will be quite an easy task, e.g. using the OOXML SDK (as proposed by Stilgar, too). But there are libraries for the old format, too.
Update: I have to admit that, although I was convinced I had searched for and found some libraries for the old doc format before, I no longer manage to find them, probably because the result lists are "spoiled" by the many offers for docx. To be clear:
If you can afford to stick to docx (2007 or later) format, you should do that. Office Open XML is a (more or less) open standard based on ZIP and XML, and many tools already exist and will be developed in the future. The old format is much less supported nowadays.
If you have to go for the old format, too, then Aspose (as proposed by Uwe) is the only library I found.
I think the OOXML SDK may contain something, but it will only work with docx and not with the old doc format.
As for the old formats, I am also interested in a cheap and easy way to support them without needing the Automation APIs.
You should explain better what result you want to achieve.
NO WAY: MS Office interop needs MS Word to be installed on the server.
Depending on your needs, you should find the best third-party library (I suggest the Open XML SDK's DocumentFormat.OpenXml.Packaging.WordprocessingDocument), but the code will have to be rewritten.
You can use Code7248.word_reader.dll.
Below is sample code showing how to use Code7248.word_reader.dll.
Add a reference to this DLL in your project and copy the code below.
using System;
// Extra namespace from the referenced Code7248.word_reader.dll
using Code7248.word_reader;

namespace testWordRead
{
    class Program
    {
        // Extracts the text of the document at 'path' and prints it to the console.
        private static void ReadFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }

        static void Main(string[] args)
        {
            // Verbatim string (@"...") so the backslashes are literal;
            // as a regular literal "D:\Test\..." does not compile (\T is not a valid escape).
            string path = @"D:\Test\testdoc1.docx";
            ReadFileContent(path);
            Console.ReadLine(); // keep the console window open
        }
    }
}

Replacing contents inside docx and pdf file using asp.net c#

In my application I am using some templates in docx and pdf format. I am storing these documents in the DB as bytes.
Before showing or sending these documents back to the user or application, I need to replace some content inside them, e.g. if the document contains ##username##, I need to replace it with the actual username of the customer. I haven't found a proper solution for this. Any good ideas?
For the docx file, your best bet is to use OpenXML, and instead of having special text like ##username##, replace it with a content control that you can fill in.
Since you specified docx, you can use OpenXML, which is great, it's an API. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form and fill it in at runtime (using a tool like iTextSharp).
HTH,
Brian
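For the ##username## approach in the question, a minimal BCL-only sketch that works directly on the byte[] loaded from the database might look like the following (class and method names are hypothetical). Important caveat: Word frequently splits typed text such as ##username## across several w:r runs (spell-check and revision markup), and then a plain string replace will not find it. That is one reason content controls, as suggested above, are the more robust option.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Security;
using System.Text;

public static class DocxPlaceholders
{
    // Copies every entry of the .docx into a new package, replacing placeholders
    // in the XML parts under word/ (document body, headers, footers).
    public static byte[] Replace(byte[] docx, IReadOnlyDictionary<string, string> values)
    {
        using var input = new ZipArchive(new MemoryStream(docx), ZipArchiveMode.Read);
        var output = new MemoryStream();
        using (var outZip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var entry in input.Entries)
            {
                using var source = entry.Open();
                using var target = outZip.CreateEntry(entry.FullName).Open();
                if (entry.FullName.StartsWith("word/", StringComparison.Ordinal) &&
                    entry.FullName.EndsWith(".xml", StringComparison.Ordinal))
                {
                    string xml = new StreamReader(source, Encoding.UTF8).ReadToEnd();
                    foreach (var pair in values)
                        xml = xml.Replace(pair.Key, SecurityElement.Escape(pair.Value)); // escape &, <, >, quotes
                    byte[] bytes = new UTF8Encoding(false).GetBytes(xml);
                    target.Write(bytes, 0, bytes.Length);
                }
                else
                {
                    source.CopyTo(target); // images, relationships etc. are copied unchanged
                }
            }
        }
        return output.ToArray();
    }

    // Reads one part back as text, handy for checking the result.
    public static string ReadPart(byte[] docx, string partName)
    {
        using var zip = new ZipArchive(new MemoryStream(docx), ZipArchiveMode.Read);
        using var reader = new StreamReader(zip.GetEntry(partName)!.Open());
        return reader.ReadToEnd();
    }
}
```

The returned byte[] can be stored back in the DB or sent to the user as-is. Escaping the replacement value matters: a username like "Tom & Jerry" inserted raw would make document.xml invalid XML and Word would refuse to open the file.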
For DOC / DOCX:
You should use the MS Word object model through a reference to the Word interop assembly (this works only on machines which have MS Office installed), or else use something like the Aspose.Words libraries, which don't need an MS Office installation on the server. You can programmatically trigger Word's Find-Replace through the library's API.
For PDF: you will need a third-party library for editing PDF files; libraries like ABCpdf are available (not sure whether Adobe itself has something for this).
The mechanism would be the same as for the Word library, but I am not sure whether you will be able to trigger a Find-Replace there or would have to do something else. I have not used a PDF generation library.
