Manipulating and printing docx documents in C# - c#

I have developed an application in C#. I want to let users define their own print templates for printing certificates. Currently it accepts HTML and replaces keywords with values. However, users are not familiar with HTML design, converters do not give well-designed results, and hand-writing HTML to reproduce the same design is time-consuming.
I want to enable my application to open the docx file, replace keywords with values and print.
Any idea will be helpful.

Please check out Automated Search and Replace in Word 2007 documents with C#
Also you can look at Microsoft Word Templates?

The MSDN site has info about the Word Object Model.

Related

Creating word documents within a c# application?

I've been searching for ways to create letters from a C# Windows Forms application in Visual Studio, based on information from a local SQL Server. I've seen some other topics, but each answer seems to be really different.
My knowledge of this is pretty basic and I wouldn't mind a step in the right direction. Is it actually possible? Is there a better solution than creating Word documents?
I only really need to be able to create a Word document which has some text and tables. Is this possible?
If you are creating documents of the same kind with different data, use a Word template (.dotx) with content controls or bookmarks in the document.
Advantage: it saves you the time otherwise spent on formatting and alignment in the document.
Then use Open XML to replace the content controls or bookmarks with the values.
Advantage: you don't need to deploy the Word Interop assemblies. It is faster and recommended by Microsoft.
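To make the content-control idea above concrete, here is a minimal sketch in Python using only the standard library. It works directly on the XML of the main document part rather than through the Open XML SDK, so it is an illustration of the technique and not the SDK's API. The function name `fill_content_controls`, the tag name `Recipient`, and the sample XML are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
ET.register_namespace("w", W)  # keep the familiar "w:" prefix on output


def fill_content_controls(document_xml, values):
    """Set the text of each content control whose tag appears in `values`."""
    root = ET.fromstring(document_xml)
    for sdt in root.iter("{%s}sdt" % W):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue
        name = tag.get("{%s}val" % W)
        if name not in values:
            continue
        texts = list(content.iter("{%s}t" % W))
        if not texts:
            continue  # assumption: the control already holds a run with text
        texts[0].text = values[name]
        for extra in texts[1:]:
            extra.text = ""  # blank any further runs of placeholder text
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment with one content control tagged "Recipient".
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body><w:p>'
    '<w:sdt><w:sdtPr><w:tag w:val="Recipient"/></w:sdtPr>'
    "<w:sdtContent><w:r><w:t>placeholder</w:t></w:r></w:sdtContent></w:sdt>"
    "</w:p></w:body></w:document>" % W
)

filled_xml = fill_content_controls(SAMPLE, {"Recipient": "Jane Doe"})
print(filled_xml)
```

One caveat: a document saved by Word declares many more namespaces than this sample, and ElementTree rewrites prefixes it has not been told about, so a real implementation would need to register those as well or use a library built for the format.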
Depending on your needs, you could either simply copy a pre-existing file, or if you need to modify the document from code, you can use Microsoft Office Automation. (see C# office 2010 automation)

Convert MS Word Content to HTML

I need to make an upload tool in which a Word document is converted to HTML for saving to a database. Any ideas?
I've written one (see the Doc to HTML Converter).
To implement it, I downloaded the PIAs for Word, which let me open a document using Word, and control the format in which Word then re-saves the document.
Alternatively (instead of doing it yourself) there are tools like mine (and others, more famous) which you can use (some of which don't even use Word).
I know this is an old post, but I just wrote an app that converts a Word-doc to a usable web-page. The app provides some of the requirements in the OP.
The app is WordWebNav (WWN). It's free and open-source.
WWN provides a Word VBA program that converts Word-docs to Word-HTML.
WWN also provides a Python program that converts the Word-HTML to a usable web-page:
It adds missing features to the Word-HTML, e.g., a navigation pane.
And, WWN fixes some common bugs in Word's HTML, e.g., mis-formatted lists, and overly-wide paragraphs.
The Python program uses a CLI, and it can be called externally.
If this is a client application and you have access to Word, why not automate Word? Word can save as HTML (although you will probably have to clean the HTML up a bit). However, I will warn you that this is not very portable; whoever is going to use the application will need to have the same version of Word you developed it with.

What is the best way to populate a Word 2007 template in C#?

I need to populate a Word 2007 document from code, including repeating table sections. Currently I use an XML transform on the document.xml part of the docx, but this is extremely time-consuming to set up (each time you edit the template document, you have to recreate the transform.xsl file, which can take up to a day for complex documents).
Is there any better way, preferably one that doesn't require you to run Word 2007 during the process?
Regards
Richard
I tried myself to write some code for that purpose, but gave up. Now I use a 3rd party product: Aspose Words and am quite happy with that component.
It doesn't need Microsoft Word on the machine.
"Aspose.Words enables .NET and Java applications to read, modify and write Word® documents without utilizing Microsoft Word®."
"Aspose.Words supports a wide array of features including document creation, content and formatting manipulation, powerful mail merge abilities, comprehensive support of DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats. Aspose.Words is truly the most affordable, fastest and feature rich Word component on the market."
DISCLAIMER: I am not affiliated with that company.
Since a DOCX file is simply a ZIP file containing a folder structure with images and XML files, you should be able to manipulate those XML files using your favorite XML manipulation API. The specification of the format is known as WordprocessingML, part of the Office Open XML standard.
I thought I'd mention it in case the 3rd party tool suggested by splattne is not an option.
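As a rough illustration of the ZIP-plus-XML approach above, the sketch below uses only the Python standard library; the same steps map onto System.IO.Compression in .NET. The `{{NAME}}` placeholder syntax and the helper names `fill_docx` and `make_demo_package` are assumptions made for the example, not part of any Word or Open XML API. It also assumes each placeholder sits inside a single run of text; Word sometimes splits typed text across several runs, in which case a plain string replace will not find it.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template_bytes, values):
    """Return a copy of the package with {{key}} markers in word/document.xml replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the value cannot break the XML.
                    xml = xml.replace("{{" + key + "}}", escape(value))
                data = xml.encode("utf-8")
            # Every other part (styles, images, relationships) is copied unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()


def make_demo_package():
    """Build a tiny stand-in package for the demo; a real template comes from Word."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body><w:p><w:r>'
        "<w:t>Awarded to {{NAME}}</w:t></w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", body)
    return buf.getvalue()


filled = fill_docx(make_demo_package(), {"NAME": "Ann & Bob"})
with zipfile.ZipFile(io.BytesIO(filled)) as z:
    result_xml = z.read("word/document.xml").decode("utf-8")
print(result_xml)
```

The demo package contains only the main document part, so it is not a file Word would open; in practice the input bytes would be read from a template saved by Word, and the returned bytes written back out as a new .docx.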
Have you considered using the Open XML SDK from Microsoft? The only dependency is on .NET 3.5.
Documentation: http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx
Download: http://www.microsoft.com/downloads/details.aspx?familyid=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en
Use the Invoke docx library. It supports table data (http://invoke.co.nz/products/help/docx_tables.aspx). More info at http://invoke.co.nz/products/docx.aspx
Have you considered using VB? You could create a separate assembly to populate your document.
I know you are looking for a C# solution, but XML literal support is one area where VB could help you populate the document. Create a document in Word to serve as a template, unzip the docx, paste the relevant XML section you want to change into your VB code, and add code to fill in the parts you wish to change. It's difficult to say from your description whether this would meet your requirements, but I would suggest looking into it.
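The "take the relevant XML and fill in the parts" idea above, applied to the repeating table rows from the question, might look like the following Python sketch (standard library only). It treats the last row of the first table as a prototype and clones it once per record. The function name `repeat_last_row`, the sample table, and the convention that the last row is the prototype are assumptions made for the example.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the "w:" prefix on output


def q(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return "{%s}%s" % (W, name)


def repeat_last_row(document_xml, records):
    """Replace the last row of the first table with one cloned row per record."""
    root = ET.fromstring(document_xml)
    table = next(root.iter(q("tbl")))
    prototype = table.findall(q("tr"))[-1]
    position = list(table).index(prototype)
    table.remove(prototype)
    for offset, record in enumerate(records):
        row = copy.deepcopy(prototype)  # keeps the prototype's formatting
        for cell, value in zip(row.findall(q("tc")), record):
            texts = list(cell.iter(q("t")))
            if texts:  # assumption: each prototype cell holds a run with text
                texts[0].text = str(value)
        table.insert(position + offset, row)
    return ET.tostring(root, encoding="unicode")


def cell(text):
    return "<w:tc><w:p><w:r><w:t>%s</w:t></w:r></w:p></w:tc>" % text


# Hypothetical template: a header row followed by one prototype row.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body><w:tbl>' % W
    + "<w:tr>" + cell("Name") + cell("Score") + "</w:tr>"
    + "<w:tr>" + cell("name") + cell("score") + "</w:tr>"
    + "</w:tbl></w:body></w:document>"
)

table_xml = repeat_last_row(SAMPLE, [("Ann", 91), ("Bob", 78)])
print(table_xml)
```

Because the rows are cloned from the template at run time, editing the template's formatting in Word does not require regenerating any transform, which is the pain point described in the question.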

Using Word 2007 as CMS page editor

I have been searching for several hours but couldn't find anything about this. Basically, I would like to create a template or plug-in for Word 2007 that would allow someone to create new pages for a CMS. What I have in mind is something similar to a blog post template. I know how to create a basic template, but I can't find a way to publish the created document using a publish button inside Word.
Thanks in advance.
I understand what you are trying to achieve, but Word is the wrong starting point. I would start with a much more basic text editor.
Word is horrible, horrible, horrible. Your site will define clear styles, yet Word will output nasty HTML that won't match your website's CSS definitions.
Your best bet therefore is to have a means to drop the Word file into the site, and have code programmatically analyse it and transform it into site-valid HTML. In Java you could use Apache POI, but that's very raw still. Might be a lot easier in a Microsoft centric world.
Far better, in my opinion, is to force people to learn Markdown, or BBCode, or HTML, or to use a Styled HTML Editor in your CMS - cut and paste plain text in, then style with the CMS defined styles.
As you are using Word 2007 you can export the document as XML and then use XSLT to generate the HTML.
If your CMS has an API or import facility you could convert the output from Word to suit that interface.
You can write a Word macro to add a Publish button/menu option to Word that will generate the correct output.
It's not a bad idea, since it's all about the end user. If Word produces bad HTML, you should just make it semantically correct before posting it to the CMS.
I've never done this, but I'm sure that it's possible with .NET via the "Word 2007 Addin" template (assuming Office 2007).
Good luck!
You can do what you want if you use SharePoint 2007 as your CMS. You can set up a blog on SharePoint 2007 and post to the blog from Word. If you use Office 2007 on the client end then you will get some nice buttons like "post to my blog" etc.
If you can't use SharePoint or are talking about an existing CMS, you have a lot of hurdles to jump over. This is a major undertaking and not something you can get a simple answer for on Stack Overflow.
Have you considered using one of the freely available Javascript WYSIWYG Editors such as TinyMCE http://tinymce.moxiecode.com/? When configured with all the options, it has an impressive amount of functionality and the interface is very similar to Word. I realize this doesn't directly answer your question, but as others have pointed out starting from Word is going to be difficult.
I've been on a team that wrote a Word addin for a custom CMS system. It was written in VB6 and was able to take a Word document and turn basic formatting information - lists, bold, italic and even tables into HTML, which was uploaded to the server. It didn't create new pages or manage the site in the addin though.
From my experience, I would definitely avoid choosing Word as the editor for your CMS. The biggest issue is that each time you want to update the addin you have to redistribute it to the company or companies using it. You can do this as an IE ActiveX control, but it's far easier just to limit the user to a restricted set of styling options via a Javascript editor.
Word does have a powerful API for manipulating your content with, however we needed to disable so many options in Word to avoid unwanted fonts and so on, it resembled Wordpad more than Word in the end.
If it's a greenfield project and you have the time, I would in fact recommend using Silverlight 4.0 over a Javascript editor. Version 4.0 has a RichTextBox control built in, plus there is also the excellent Vectorlight one.
Maybe this helps: Umbraco CMS allows editing with Microsoft Word.
For some reason this is a feature that Excel enjoys but not Word.
Excel can automatically publish an HTML version of your document when you save it.
Unfortunately Word seems to only be able to achieve this functionality when using Sharepoint, which is a shame because it can be quite useful.
What you can do, short of creating your own add-in is to add a bit of code to your template to create a HTML copy of your document whenever the user saves it.
First, make sure your template is macro-enabled (saved as .dotm file).
Second, while editing the template in Word, open the VBA code editor (ALT-F11)
In the project list double-click on your document to open its code-behind file.
Add the following bit of code to it, changing the ActiveDocument.SaveAs path to something more appropriate for you, such as a shared network folder exposed by your CMS.
Sub FileSave()
    ' First save the main document
    ActiveDocument.Save
    ' Now create a new document based on the current one
    Selection.WholeStory
    Selection.Copy
    Documents.Add
    Selection.PasteAndFormat wdPasteDefault
    ' Save it as HTML and close it
    ActiveDocument.SaveAs "c:\temp\mydoc.html", FileFormat:=wdFormatHTML
    ActiveDocument.Close
End Sub
This will copy the original file into a blank new one that will be saved to HTML and closed before returning to the original file.
You can check some of the options to the Documents.Add if you want to use a different template than the normal one.
Security
Because this template contains macros, you will have to install it with the other templates, where Word expects them.
If you don't, then you'll get a security warning.
To avoid getting it, you can add the path where your templates are located to the list of Trusted Locations under Word's Options > Trust Center > Trust Center Settings > Trusted Locations.

Parsing Office Documents

I'd like to be able to read the content of Office documents (for a custom crawler).
The Office versions that need to be readable are 2000 to 2007. I mainly want to crawl Word, Excel and PowerPoint documents.
I don't want to retrieve the formatting, only the text.
The crawler is based on Lucene.NET, if that helps, and is written in C#.
I already use iTextSharp for parsing PDFs.
If you're already using Lucene.NET, you might just want to take advantage of the various IFilters already available for doing this. Take a look at the open source SeekAFile project. It will show you how to use an IFilter to open and extract this information from any file type for which an IFilter is available. There are IFilters for Word, Excel, PowerPoint, PDF, and most of the other common document types.
There is an excellent open source project, POI; the only drawback is that it is written for Java.
The .NET port is still very much in beta.
Here is a good list of various tools for converting Word documents to plaintext, which you can then do whatever with.
Here's a nice little post on c-sharpcorner by Krishnan LN that gives basic code to grab the text from a Word document using the Word Primary Interop Assemblies.
Basically, you get the "WholeStory" property out of the Word document, paste it to the clipboard, then pull it from the clipboard while converting it to text format. The clipboard step is presumably done to strip out formatting.
For PowerPoint, you do a similar thing, but you need to loop through the slides, then for each slide loop through the shapes, and grab the "TextFrame.TextRange.Text" property in each shape.
For Excel, since Excel can be an OleDb data source, it's easiest to use ADO.NET. Here's a good post by Laurent Bugnion that walks through this technique.
You might also consider checking out DtSearch (www.DtSearch.com). Although it is primarily a searching tool, it does a great job of extracting text from a large number of file types and is considerably cheaper than other options like the Oracle/Stellent OutsideIn technology or the equivalent from Autonomy.
I've been using DtSearch for years and find it indispensable for this type of task.
