Embed custom data in Word 2003, 2007, 2010 documents - c#

I'm working on a Word plugin that has to be compatible with all Word versions beginning with 2003. One of the requirements needed for a feature is the ability to embed custom data of any length in a document. The Word.Document.CustomDocumentProperties works for all the required versions but unfortunately limits the size of the data. While researching, I also discovered that Word.Document.CustomXML might be the solution I need. However, I was not able to find out if this property works properly for Word 2003. Additionally, performing CRUD operations must be possible on the custom data but from what I see the CustomXML part only provides methods to add and retrieve the data. How should I tackle this requirement?

Concerning CustomXML, please remember the patent issue Microsoft encountered last year, which lead to a removal of the feature in Office 2007 (at least in the US). See this blog for details. I remember that the removal was limited to the US edition, and that for Office 2010 MS uses another approach. But that means that you depend from the patch state of your customer's Office installation.
You may look into Word document variables instead of document properties. They are supported ever since Word 95 and can be seen as an INI file attached to the document. I've made good experiences with a XML structure saved just in one document variable (the XML content you can also encrypt, if necessary).
However, you can just read or write document variables (or delete them). There is no way to assure CRUD operations through the Word object model in itself; it's your application which is responsible.
Please keep in mind too, that (if your documents go out and come back thr' e-mail attachments), that some of your customers may use some meta-data removal tool; I've seen the use of it in the legal market a lot. Those tools can remove document variables and Office document properties. With Office documents (Word, Excel), it is not possible to have a 100% security that your data will remain in the document.

Related

Creating word documents within a c# application?

I've been searching on ways to create letters using a C# Windows Form application in Visual Studio based on information from a local SQL server. I've seen some other topics but each answer seems to be really different.
My knowledge is pretty basic about this and wouldn't mind a step in the right direction. Is it actually possible? is there a better solution rather than creating word documents?
I only really need to be able to create a word document which has some text and and tables is this possible?
If you are creating a document of same kind with different data then use a Word template (.dotx) and use content controls or bookmarks in the document.
Advantage: Saves your time in manipulating with formatting and alignment in the document
Then use Open XML to just replace your content controls or bookmarks with the values.
Advantage: You dont need Word Interop assemblies to be deployed. Faster and recommended by Microsoft.
Depending on your needs, you could either simply copy a pre-existing file, or if you need to modify the document from code, you can use Microsoft Office Automation. (see C# office 2010 automation)

Give users the ability to edit Word 97-2010 documents within a .NET application

I was wondering if anyone has any idea of any product/method to give my end users the ability to edit Word documents within our C#/.NET application, avoiding the use of Automation and separate instances of Word opening outside of the application. This is a possibility [backup plan!] - but one that I'd rather not have to implement (due to the amount of work involved and having users exit our application).
I know that I could possibly use the WebBrowser control - but from what I've been able to find -- support for this is sketchy at best, and things such as toolbars are not present, and it does not appear to work with Word 2010 anyway.
I've been evaluating a few products that claim to do this but many are lacking in features or produce compatibility errors within documents rendering them useless when opened in Word.
We are using Word 2003 and Word 2010. Our documents start out as .DOCX files through our custom merge/templating processes.
Any suggestions for products or other ideas would be great.
Edit:
We're creating documents without issue using OpenXML. Fun stuff, works really well. However, at the end of the day I would prefer to have users editing the created documents as well as legacy documents (created as .DOC files) within our .NET application directly. Unforunately, with Microsoft removing the ability to embed via ActiveX/OLE, etc. there isn't a way to do this. What I am looking for is a 3rd party product to achieve this, which should be virtually 100% compatible with both the .DOC and .DOCX formats.
For those asking why ? Security, ease of use, etc. We are storing documents in a database. Once I start dropping files on the filesystem and working with Automation support/macros, ... there's a lot of things that would have to be done to get the files back into the database / update, etc. This is made especially difficult since Word doesn't expose the raw bytes[] of a document and files must be saved as temporary files somewhere on the fs. Just a lot of headaches.
So, the "easiest" solution - embed Word [seems not possible] or use a 3rd party product that supports editing .DOC/.DOCX files.
An example is DevExpress XtraRichEdit control - unfortunately, while it supports a lot of nice Word-like/compatible features it only works with .DOCX files.. and isn't 100% feature complete, compared to Word.
The file structure of a word document is huge, it could take hundreds of man hours to program even limited .doc/docx support. What exactly is the reason for using your program to edit a word file over word itself?
I am not exactly sure how Word 2003 has .docx support though, my understanding is there was only a word viewer release when Office 2007 was released, it of course has been years since thats been a problem.
If you are going to actually do this only add support for .doc files since there is more information out there, you can allow word itself to handle the converstion to a .docx file if you want.
You are not going to find a third party product that does this. The amount of effort required to build an app that 100% supports the Word formats is beyond consideration. Not just every feature, but every bug as well would have to be duplicated. Considering the potential legal pitfalls of doing such, no one in their right mind would bother trying. The legal aspects, incidentally, is one of the primary reasons for the new formats.
Which means you have to go external. There are two really good options here.
One would be to hook into Office Live to give them the ability to edit Microsoft Documents online.
Another possibility is to just leverage Sharepoint in your application. It has built in methods for document workflow and integrates nicely with Office.
A third possibility would be to write your own word add-in which would take care of saving / loading the documents from your system. I'd go with the first two above before going this route.
This used to be supported through a feature called OLE Embedding. Support for it has been disappearing from Microsoft software and tools over the past 10 years. Notably .NET has no support for it whatsoever. Office was one of the last hold-outs with 2007 already getting pretty cranky about it. But this indeed looks to be completely gonzo in the 2010 edition. All download links to the DSOFramer control, a generic ActiveX embedding control were removed around the time that 2010 went into beta.
There's no future here, look at VSTO for the road ahead.
Word Automation Services and Office Web Apps (requires SP 2010).
Certainly not 100% coverage of Word features, but have you tried ASPOSE.Words.NET Total or TXTextControl.NET?

Generating a Word Document with C#

Given a list of mailing addresses, I need to open an existing Word document, which is formatted for printing labels, and then insert each address into a different cell of the table. The current solution opens the Word application and moves the cursor to insert the text. However, after reading about the security issues and problems associated with opening the newer versions of Word from a web application, I have decided that I need to use another method.
I have looked into using Office Open XML, but I have not found any good resources that provide concrete information on exactly how to use it. Also, someone suggested that I use SQL reporting services, but searching for information on how to use them, lead me nowhere.
Which method do you think is the most appropriate for my problem?
Code samples and links to good tutorials would be extremely helpful.
Thanks for all the answers, but I really did not want to pay for a plugin and using Word automation was out of the question. So I kept searching and eventually, through some trial and error, found some answers.
After throughly searching through Microsoft's site, I found some newer articles on the Office Open XML SDK. I downloaded the new tools and just started going through each them.
I then found the Document Reflector, which creates a class to generate XML code based off an existing Word Document (.docx). Using my Label Template Document and the code this tool generated, I went through and added a loop that appends table cells for each address. It actually proved to be fairly simple and way faster than using Word automation.
So, if you're still using Word automation check out the Office Open XML tools. Their surprisingly extensive for a free download from Microsoft.
Office Open XML SDK 2.0 Download
I use the Words plugin from Aspose.com to do mail merges (programming guide).
You can take a look show 137 and 138 on dnrTV (www.dnrtv.com). In these video's Beth Massi shows how to do some editing and mail merging with OpenXML. She does this by using the Open XML SDK and xml literals in VB. It requires no third party components. Also it doesn't require MS Office to be installed on the machine.
This video inspired me as a C# developed (and no VB experience) to do some XML manipulation in a separate dll in VB. I call into this dll from my C# application.
It is worth a try.
We have the product Aspose that tvanfosson has mentioned. The edition that we purchased works with SQL Reporting Services so it can be used with the scheduler for creating output. It is really a great product and we used in a system that needed to support Korean characters in the final document. It works great and was under $1K with support. Not bad.
The advantage of using a product like this is that you can continue to manage your data and the skill set required to produce the documents is at a level where a variety of developers can support its use.
Vanstee,
If you really want to do this in code, check out this post I just found on Google
http://kellychronicles.spaces.live.com/blog/cns!A0D71E1614E8DBF8!1364.entry
If you are using reporting services cant you just move the information in the word doc into a database table and read it from there, taking word out of the equation?

Sharepoint as template application

We have a site that needs to (as part of our process) generate a document (e.g. Word docx document) that is derived from data within our application merged with a template document. This document can be edited at run time by the user after it is generated. We know we are looking at a CMS like system (since the users will need to be able to edit/create new templates) but I was wondering if Sharepoint could be of better use (since we don;t need a lot of the overhead of a traditional CMS system). Can any sharepoint experts weigh in and give me some pointers of where to look.
I am working on a similar problem -- generating word documents from information stored in a SharePoint site. The real magic here relies on using Content Controls in office 2007 - as the new version of the Office suite is based on Office Open XML, generating documents from data is almost trivial.
Enabling the documents to be editable after creation is a simple configuration change that can be made either programmatically or in the document template itself. In fact I think the true value of the platform wills shine when you can see how easily you can your organizations business processes to SharePoint.
Getting content from deep within a segment of your company, through the various approval steps to the public facing internet site is breathtakingly simple, once everything is configured properly.
Here are some good blog posts on generating OOXML docs on the server
ECMA Office Open XML - Options for
Generating Word Documents (on the
server;))
Generating Word Documents on the
Server (Part 2) - Dynamically Adding
Content Controls / Structured
Document Tags (SDT) using
System.IO.Packaging
Note that clients DO NOT need to run Office 2007 to open these documents, you can either have a conversion process, or you can install the free compatibility packs for Office XP, 2000 & 2003
As far as SharePoint as CMS, I think its a pretty compelling proposition. There is definitely some configuration and implementation challenges, but I think that will be the case with any enterprise CMS package. One important consideration is the amount of traffic your CMS'd site will see. I don't think SharePoint is ready to scale up to google-esque traffic, but its certainly going to be good enough for a typical corporate internet presence.
Here is a list of some public sites that are running MOSS
Once you get past the hurdle of initial configuration, it becomes very easy to enable CMS tasks across the organization as it works so well on both sides of the Firewall.
I think its a great product, and amazed by its flexibility and extensibility.
jt
Infopath is the forms technology that Sharepoint people are going to recommend to you. Most Infopath forms features require a full MOSS installation and the associated licensing fees. I think you can acheive what you want here with just WSS however.
If DOCX files as the form template is a hard and fast requirement of your app you can;
Automate Word (physically isolate this), this is not officially supported by MS
Explore using OpenXML (assuming all clients are using Word 2007).
If you ultimately need to generate PDFs or Word 2003 DOC files as output you are stuck automating Word otherwise #2 is the most server friendly solution.
Either way I think you can use a SP document library to hold the DOCX files and your users can share and edit the templates with versioning this way. You can programatically access these files in your application specific code and perform the data merging 'out of band'.
Here is the component that generates document based on the custom template. The documents are generated from the sharepoint list ... so the data is pulled from the list item into the document on the fly:
http://store.sharemuch.com/products/generate-word-documents-from-sharepoint-list
Hope that helps,
Yaroslav Pentsarskyy
Blog: www.sharemuch.com
Sharepoint is not going to relieve you of the weight and baggage of a traditional CMS system. If anything it will increase your baggage compared with a lot of CMS tools out there.
So in short - dont look at Sharepoint - look elsewhere!

How can a Word document be created in C#? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I have a project where I would like to generate a report export in MS Word format. The report will include images/graphs, tables, and text. What is the best way to do this? Third party tools? What are your experiences?
The answer is going to depend slightly upon if the application is running on a server or if it is running on the client machine. If you are running on a server then you are going to want to use one of the XML based office generation formats as there are know issues when using Office Automation on a server.
However, if you are working on the client machine then you have a choice of either using Office Automation or using the Office Open XML format (see links below), which is supported by Microsoft Office 2000 and up either natively or through service packs. One draw back to this though is that you might not be able to embed some kinds of graphs or images that you wish to show.
The best way to go about things will all depend sightly upon how much time you have to invest in development. If you go the route of Office Automation there are quite a few good tutorials out there that can be found via Google and is fairly simple to learn. However, the Open Office XML format is fairly new so you might find the learning curve to be a bit higher.
Office Open XML Iinformation
Office Open XML - http://en.wikipedia.org/wiki/Office_Open_XML
OpenXML Developer - http://openxmldeveloper.org/default.aspx
Introducing the Office (2007) Open XML File Formats - http://msdn.microsoft.com/en-us/library/aa338205.aspx
DocX free library for creating DocX documents, actively developed and very easy and intuitive to use. Since CodePlex is dying, project has moved to github.
I have spent the last week or so getting up to speed on Office Open XML. We have a database application that stores survey data that we want to report in Microsoft Word. You can actually create Word 2007 (docx) files from scratch in C#. The Open XML SDK version 2 includes a cool application called the Document Reflector that will actually provide the C# code to fully recreate a Word document. You can use parts or all of the code, and substitute the bits you want to change on the fly. The help file included with the SDK has some good code samples as well.
There is no need for the Office Interop or any other Office software on the server - the new formats are 100% XML.
Have you considered using .RTF as an alternative?
It supports embedding images and tables as well as text, opens by default using Microsoft Word and whilst it's featureset is more limited (count out any advanced formatting) for something that looks and feels and opens like a Word document it's not far off.
Your end users probably won't notice.
I have found Aspose Words to be the best as not everybody can open Office Open XML/*.docx format files and the Word interop and Word automation can be buggy. Aspose Words supports most document file types from Word 97 upwards.
It is a pay-for component but has great support. The other alternative as already suggested is RTF.
To generate Word documents with Office Automation within .NET, specifically in C# or VB.NET:
Add the Microsoft.Office.Interop.Word assembly reference to your project. The path is \Visual Studio Tools for Office\PIA\Office11\Microsoft.Office.Interop.Word.dll.
Follow the Microsoft code example
you can find here: http://support.microsoft.com/kb/316384/en-us.
Schmidty, if you want to generate Word documents on a web server you will need a licence for each client (not just the web server). See this section in the first link Rob posted:
"Besides the technical problems, you must also consider licensing issues. Current licensing guidelines prevent Office applications from being used on a server to service client requests, unless those clients themselves have licensed copies of Office. Using server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."
If you meet the licensing requirements, I think you will need to use COM Interop - to be specific, the Office XP Primary Interop Assemblies.
Check out VSTO (Visual Studio Tools for Office). It is fairly simple to create a Word template, inject an xml data island into it, then send it to the client. When the user opens the doc in Word, Word reads the xml and transforms it into WordML and renders it. You will want to look at the ServerDocument class of the VSTO library. No extra licensing is required from my experience.
I have had good success using the Syncfusion Backoffice DocIO which supports doc and docx formats.
In prior releases it did not support everything in word, but accoriding to your list we tested it with tables and text as a mail merge approach and it worked fine.
Not sure about the import of images though. On their blurb page http://www.syncfusion.com/products/DocIO/Backoffice/features/default.aspx it says
Blockquote
Essential DocIO has support for inserting both Scalar and Vector images into the document, in almost all formats. Bitmap, gif, png and tiff are some of the common image types supported.
So its worth considering.
As others have mentioned you can build up a RTF document, there are some good RTF libraries around for .net like http://www.codeproject.com/KB/string/nrtftree.aspx
I faced this problem and created a small library for this. It was used in several projects and then I decided to publish it. It is free and very very simple but I'm sure it will help with you with the task. Invoke the Office Open XML Library, http://invoke.co.nz/products/docx.aspx.
I've written a blog post series on Open XML WordprocessingML document generation. My approach is that you create a template document that contains content controls, and in each content control you write an XPath expression that defines how to retrieve the content from an XML document that contains the data that drives the document generation process. The code is free, and is licensed under the the Microsoft Reciprocal License (Ms-RL). In that same blog post series, I also explore an approach where you write C# code in content controls. The document generation process then processes the template document and generates a C# program that generates the desired documents. One advantage of this approach is that you can use any data source as the source of data for the document generation process. That code is also licenced under the Microsoft Reciprocal License.
I currently do this exact thing.
If the document isn't very big, doesn't contain images and such, then I store it as an RTF with #MergeFields# in it and simply replace them with content, sending the result down to the user as an RTF.
For larger documents, including images and dynamically inserted images, I save the initial Word document as a Single Webpage *.mht file containing the #MergeFields# again. I then do the same as above. Using this, I can easily render a DataTable with some basic Html table tags and replace one of the #MergeFields# with a whole table.
Images can be stored on your server and the url embedded into the document too.
Interestingly, the new Office 2007 file formats are actually zip files - if you rename the extension to .zip you can open them up and see their contents. This means you should be able to switch content such as images in and out using a simple C# zip library.
#Dale Ragan: That will work for the Office 2003 XML format, but that's not portable (as, say, .doc or .docx files would be).
To read/write those, you'll need to use the Word Object Library ActiveX control:
http://www.codeproject.com/KB/aspnet/wordapplication.aspx
#Danny Smurf: Actually this article describes what will become the Office Open XML format which Rob answered with. I will pay more attention to the links I post for now on to make sure there not obsolete. I actually did a search on WordML, which is what it was called at the time.
I believe that the Office Open XML format is the best way to go.
LibreOffice also supports headless interaction via API. Unfortunately there's currently not much information about this feature yet.. :(
You could also use Word document generator. It can be used for client-side or server-side deployment. From the project description:
WordDocumentGenerator is an utility to generate Word documents from
templates using Visual Studio 2010 and Open XML 2.0 SDK.
WordDocumentGenerator helps generate Word documents both
non-refresh-able as well as refresh-able based on predefined templates
using minimum code changes. Content controls are used as placeholders
for document generation. It supports Word 2007 and Word 2010.
Grab it: http://worddocgenerator.codeplex.com/
Download SDK: http://www.microsoft.com/en-us/download/details.aspx?id=5124
Another alternative is Windward Docgen (disclaimer - I'm the founder). With Windward you design the template in Word, including images, tables, graphs, gauges, and anything else you want. You can set tags where data from an XML or SQL datasource is inserted (including functionality like forEach loops, import, etc). And then generate the report to DOCX, PDF, HTML, etc.

Categories

Resources