I am using CKEditor to collect input from users, and I'd like to save whatever the user types in this HTML editor to a Word document without losing the look and feel.
I have done a lot of searching but wasn't able to find any really useful resources.
Some of the questions I looked at:
How can a Word document be created in C#?
Problem writing HTML content to Word document in ASP.NET
and so on.
Not sure why there is no clear direction on this.
Any pointers? Any libraries you can recommend?
You can use the Open XML SDK to create Word documents programmatically without installing Word on the server. You can convert simple HTML to a Word document using Html to Openxml. Alternatively, you can insert the HTML file into a Word document as an alt chunk without losing the formatting (unless you are using external CSS). There is example code for it here and here.
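To make the alt chunk idea concrete, here is a minimal sketch of what ends up inside the .docx package. It is written in Python with only the standard library rather than C#, purely to show the parts involved; it is not the Open XML SDK API. The function name `html_to_docx_bytes`, the part name `word/afchunk.html`, and the relationship id `htmlChunk` are my own choices for illustration.

```python
import io
import zipfile

# Package-level plumbing. "html" is registered as a default content type so
# the embedded HTML part is recognised.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Default Extension="html" ContentType="text/html"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# The main document points at the HTML part through an aFChunk relationship.
DOC_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="htmlChunk" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk" Target="afchunk.html"/>'
    '</Relationships>'
)

# The body contains nothing but the w:altChunk reference.
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    '<w:body><w:altChunk r:id="htmlChunk"/></w:body>'
    '</w:document>'
)


def html_to_docx_bytes(html: str) -> bytes:
    """Wrap an HTML string in a minimal .docx package as an alt chunk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", DOCUMENT)
        package.writestr("word/_rels/document.xml.rels", DOC_RELS)
        package.writestr("word/afchunk.html", html.encode("utf-8"))
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .docx extension gives a document whose HTML is converted by Word itself when the file is opened, so other consumers of the file may not render it. In C# the same parts would be created through the Open XML SDK instead of by hand.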
Related
I'm working on a C# project and I need to open a Word document, do a search/replace on it, and save the result for later editing within Word itself.
This is to be a standalone application, not a Word plugin.
Is there any simple code to get me started?
I've searched and not found anything helpful.
EDIT:
Looks like the NuGet package DocX will do what I need.
http://docx.codeplex.com/
http://nuget.org/packages/DocX
If you initially save the doc as .xml from within Word, you can open it as plain markup (as opposed to a binary) and do a (very rough) search and replace on the raw document. You'll have to make sure you don't mangle any tags containing the target words, but it would work.
You'll preserve all formatting and will be able to open/redistribute it as normal in Word; the .xml is basically just an uncompressed .docx.
Edit: I'm offering this as a possible easy solution, not necessarily saying it's the best idea.
With Open XML you can open and manipulate a Word document.
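As a rough illustration of a search/replace that touches only text and never the tags, here is a sketch in Python using just the standard library (not the Open XML SDK or DocX). It assumes the main part is `word/document.xml`, is UTF-8 encoded, and uses the usual `w:` prefix; the function name `replace_in_docx` is made up for this example. It will miss a search term that Word has split across several runs, which is a common catch with this approach.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Matches the text of <w:t> elements only. <w:tab/>, <w:tbl>, <w:tc> and
# friends do not match because "<w:t" must be followed by ">" or whitespace.
TEXT_ELEMENT = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(docx_bytes: bytes, old: str, new: str) -> bytes:
    """Return a copy of the .docx with `old` replaced by `new` in body text."""
    # Text inside the XML is escaped, so escape both strings the same way.
    old_xml, new_xml = escape(old), escape(new)

    def swap(match):
        text = match.group(2).replace(old_xml, new_xml)
        return match.group(1) + text + match.group(3)

    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            data = source.read(name)
            if name == "word/document.xml":
                data = TEXT_ELEMENT.sub(swap, data.decode("utf-8")).encode("utf-8")
            # Every other part (styles, images, relationships) is copied as-is.
            target.writestr(name, data)
    return output.getvalue()
```

Because it works on bytes in and bytes out, the same function can be used on a file read from disk or on a document stored elsewhere. Headers and footers live in separate parts and are not touched by this sketch.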
There is a data list with hundreds of items (suppose each item is a customer) and a predefined Word document used as a template. The requirement: for each data item, fill the corresponding data into the template fields and generate a read-only PDF file as the result.
The preferred platform is ASP.NET with C#.
I found two solutions:
Change the Word document into a PDF form and use iTextSharp to fill the form fields. But creating a PDF form with the correct format (font, layout, etc.) is difficult work, and it requires a particular tool and a new skill whenever a system user wants to add a new template (unless the PDF form is always created by a developer).
Add text placeholders to the Word file; the program then reads the Word file, replaces the text, and converts the result to PDF. But I'm not sure which components should be used.
I'd like to get some advice on this problem. Thanks.
Update 20130416:
After some searching and experiments, my conclusions are below:
Client solution: use Microsoft.Office.Interop.Word (Office2007+plugin or Office2012) to read data, convert to PDF, etc. But running this method on the server side may be unsafe.
Server solution:
Make a PDF form and use iTextSharp to fill the form fields. The disadvantage has been mentioned above.
Make an HTML template, replace the field placeholders, and use iTextSharp+XMLWorker to convert the HTML to PDF. The difficulty is creating the HTML template manually and optimizing the PDF output.
MS SharePoint Office Automation Service is a server solution based on MS Office. This method may be easier, but it requires a license and the cost of a SharePoint server.
Finally, I chose the HTML template solution for this request. QED.
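For what it's worth, the placeholder step of the HTML template approach can be sketched in a few lines. This is Python, not C#, and only covers filling the template; the `{{field}}` placeholder syntax, the function name `fill_template`, and the sample field names are assumptions for illustration, since the post does not say what the placeholders look like. The HTML-to-PDF conversion (iTextSharp+XMLWorker in the post) is not shown.

```python
import html
import re

# Assumed placeholder syntax: {{field_name}}, optional spaces inside braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, record: dict) -> str:
    """Replace each {{field}} with the HTML-escaped value from `record`."""
    def lookup(match):
        # A missing field raises KeyError, so a bad template fails loudly
        # instead of producing a PDF with a blank in it.
        return html.escape(str(record[match.group(1)]))

    return PLACEHOLDER.sub(lookup, template)


# One rendered HTML string per data item; each would then go to the converter.
template = "<p>Dear {{name}}, your balance is {{balance}}.</p>"
customers = [
    {"name": "Smith & Sons", "balance": 120},
    {"name": "Jones", "balance": 75},
]
pages = [fill_template(template, customer) for customer in customers]
```

Escaping the values matters here because customer data containing `&` or `<` would otherwise produce markup the converter may reject.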
Another option would be to use TX Text Control for ASP.NET. They have a
mail merge feature that allows you to fill data into a Word template.
The merged document can easily be saved as a PDF.
For the second option you can use iTextSharp, or Aspose, which supports placeholder replacement and PDF generation. It supports creating files based on MS Word and OpenOffice templates, which could be useful for users who do not want to buy MS Word only to create a template.
Another option: you can use Nustache templates, fill them with the list data, and then use XMLWorker from iTextSharp to render to PDF.
Is there any way to convert the contents (along with the formatting and embedded images) of a rich text content control in Word 2010 to HTML?
The contentControl.Range.WordOpenXml property returns XML in the Open XML format. However, this XML is not full-fledged Open XML document markup, just a part of it. This rules out using any of the Open XML to HTML converter libraries.
I gave it a try with Power Tools for Open XML by EricWhiteDpe, but the markup cannot even be loaded into a WordprocessingDocument object. (Error: Corrupt data)
What else could be my options?
I'm not sure if this will work, but the idea I have is this: try creating a new document, add the contents of contentControl.Range.WordOpenXml to the new document, then save it. You could then take this new document's Office Open XML and convert it to HTML using the tools you mentioned.
I have my application written using C#.NET and the Open XML SDK (2.0). I have split the Word file paragraph-wise and section-wise using Open XML tags, but I could not find anything about splitting a Word file page-wise...
Please guide me out of this issue.
However, after Word (or Word Automation Services) has rendered the document, you can find the w:lastRenderedPageBreak element, which, combined with hard page breaks, can let you know where page breaks are. There are no guarantees about this - you could potentially go into an Open XML document and alter content using the Open XML SDK, and then the w:lastRenderedPageBreak elements would not be in the correct place.
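Here is a rough sketch of that idea: walk the main document part in order and start a new page at every w:lastRenderedPageBreak or hard page break (w:br with w:type="page"). It is Python with the standard library only, not the Open XML SDK, and the function name `split_text_by_page` is made up. Treat the result as an approximation: it is only as good as the markers Word last saved, and I have not checked whether Word can emit both kinds of marker for the same break, which would produce an empty extra page here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PARAGRAPH = f"{{{W}}}p"
TEXT = f"{{{W}}}t"
BREAK = f"{{{W}}}br"
RENDERED_BREAK = f"{{{W}}}lastRenderedPageBreak"
BREAK_TYPE = f"{{{W}}}type"


def split_text_by_page(document_xml: str) -> list:
    """Group the document's text by page using the stored break markers."""
    pages = [""]
    # iter() walks the tree in document order, so text and break markers
    # are seen in the same sequence in which they appear on the page.
    for node in ET.fromstring(document_xml).iter():
        if node.tag == TEXT:
            pages[-1] += node.text or ""
        elif node.tag == PARAGRAPH:
            if pages[-1]:
                pages[-1] += "\n"  # separate paragraphs on the same page
        elif node.tag == RENDERED_BREAK:
            pages.append("")  # soft break recorded by Word's last layout
        elif node.tag == BREAK and node.get(BREAK_TYPE) == "page":
            pages.append("")  # hard (manual) page break
    return [page.rstrip("\n") for page in pages]
```

The input is the content of `word/document.xml`. Splitting the file itself page by page would mean copying the elements between two markers into new documents, which this sketch does not attempt.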
In my application I am using some templates in docx and PDF format. I am storing these docs in the DB as bytes.
Before showing/sending these docs back to the user or application, I need to replace some content inside the doc. E.g., if the doc contains ##username##, I need to replace it with the actual username of the customer. I have not found a proper solution for this. Any good ideas?
For the docx file, your best bet is to use OpenXML, and instead of having special text like ##username##, replace it with a content control that you can fill in.
Since you specified docx, you can use OpenXML, which is great because it's an API. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form and fill it in at runtime (using a tool like iTextSharp).
HTH,
Brian
For DOC / DOCX:
You should use the MS Word object model through the MS Word assembly reference (this will work only on machines that have MS Office installed; otherwise you can use something like the Aspose.Words libraries, which don't need MS Office installed on the server). You can programmatically trigger Word's Find-Replace through the library's API.
For PDF: You will need a third-party library for editing PDF files. Third-party libraries like ABCpdf are available (not sure whether Adobe itself has something for this).
The mechanism is the same as for the Word library, but I am not sure whether you will be able to trigger Find-Replace here or will have to do something else. I have not used a PDF generation library.