How to read metadata information from docx documents? - c#

what I need to achieve is to have a word document template(docx), which will contain Title, Author name, Date, etc.
This template then will be used by users to complete it. I need to create a c# program, that will take in the docx file and read all the information of interest(title, name, date, ..).
So my questions are:
How do I put the metadata into the template saying: this is Title, this is Date, this is Name, etc? (not programatically)
How do I programmatically read that information?

One way to approach this would be to use Content Controls. In Office, you can create your template, and then for each of your respective inputs of interest you can place one of these controls. They're under the Developer tab in Office.
After inserting your controls you'll need for each of them to have a unique name. Office will let them all have the same name, but you'll need to uniquely identify all of them in your template document.
You now need to get the data that's input in to these controls. Again, there's likely to be some better solutions but Eric White has all kinds of great OpenXML stuff, and so here's one of his: Iterating over Content Controls
I think there's problems with finding content controls nested within a table. So, if you do that, then I think you have to specifically loop over the elements of the table to find content controls within.
Also, you're probably going to want to save a .docx from your .doct file, which I don't think there's any built-in "one-liner" method in OpenXML; however, you can create a new Word document, and then write the file stream of the template in to the newly created docx file. Again, of course, there may be better solutions out there.
Have you been here? There's lots of good stuff:
Introduction to OpenXML
Additionally, Eric has been releasing more and more videos on the OpenXML YouTube channel

1) how do I put the metadata into the template saying: this is Title,
this is Date, this is Name, etc? (not programatically)
You could do that on Info tab in MS Word 2010 as shown below:
2) how do I programmatically read that information?
Once you created your document (or template) you could always look inside it with Open XML SDK 2.0 Productivity Tool (wich is installed with OpenXML SDK) to see where (what classes to use) to get/set some information from/to document.
Also I think this post might help you to solve your task:
Add and update custom document properties in a docx
UPDATE:
Hi Dave,
Please have a look at this MSDN Article - Retrieving Application Properties from Word 2010 Documents by Using the Open XML SDK 2.0
Hope this is exactly what you are looking for.

All OpenXML documents have built in core Metadata that will do what you need through System.IO.Packaging. Once you open the word file using the open xml sdk in c#, you can get to these values via the PackageProperties class. There are 11 Properties you can use.
You "encourage" your user to enter the metadata using Word's Document Information Panel (DIP).
You can force this on by default when they open your template, by a setting in the Developer Toolbar for the template. See the following article on how to set this in your template.
I wrote a quick Windows Form app that displays this information using open xml sdk call to the PackageProperties of the Word file that is displayed above.
Here is the full solution with the sample word file included.
Hope this helps.

Related

Is it possible to generate .docx files without having MS Word installed?

I want to use "OLE automation" (or whatever it's called now) to generate a Word document.
I assume that it's possible to perform the following programmatically:
Set page size (height, width, margin vals)
Set font type/name, style, and size
Add page numbering
Add pages
Insert page breaks
What I'm not sure of is if I need to have MS Word on my system to do this (to have the necessary DLLs, perhaps)? I use Open Office (I like it, and it's free), but I reckon controlling the creation of docs programmatically is probably easier/better documented for MS Word than it is for Open Office and/or Libre Office - that's why I'm strongly considering making this "rendezvous with Redmond."
This question is tangentially related to this one
If Google Docs is a possibility here, I'd be willing to have a "meeting with Mountain View" but I know nothing about that file format or whether it can be "automated" etc.
I need to end up with something that I can either convert to a PDF file or a DOCX file. Open Office can open DOCX and convert files to PDF, but I don't know about Google Docs.
I've found https://docx.codeplex.com/ to be very useful in dynamically building docx documents.
Yes,
it is possible. Check this link: http://www.microsoft.com/en-us/download/details.aspx?id=30425
this is a library for open xml documents (*.docx, *.xlsx and powerpoint files)
yes you can Use Openxml , also with openXml you can create Excel Pdf and ...
Check This out
You can use this library to generate document by template:
https://github.com/StasClick/DocumentGenerator
'DocumentGenerator' can generate one leaflet, multiple leaflets in one document or registers.

Navigate By Columns using Aspose.words in C#

Am evaluating the Aspose.words for one of my client, almost all of the feature i have migrated from MS Word library to Aspose.word library. Just one more to go, but am struggling to find the solution for the below:
We have Template document which is in .docx format. Template has a Two column page layout. at run time system would copy paste the content from other document to this Template document. still this steps works fine.
When i open the template page it looks good with 2 column layout.
But we have some logic that should read the last line of First Column & checks whether the text is in specific format, if it is then moves one line down which would automaticaly moves to the next column.
This logic is easily acheivable in Word but i couldn't find any refference in Aspose.words to implement this.
Also i tried to find different option by convering the document to Xml. & found that there is one node called . but this node is visble only when i save the document as xml From Microsoft word. Not occurs if i save the document as xml from Aspose.words.
Please advice me to solve this issue.
Thanks in advance
Gunasekara S
we just have finished integrating a feature into Aspose.Words to open up access to the rendering engine so that each element of the rendered document can be read as it appears as pages, columns, lines, spans etc. This functionality is exactly what you need and will be available in the next version of Aspose.Words which is expected to release in about a week's time. Soon I will be sharing the code snippet to accomplish your requirement.
My name is Nayyer and I am developer evangelist at Aspose.

Creating word documents within a c# application?

I've been searching on ways to create letters using a C# Windows Form application in Visual Studio based on information from a local SQL server. I've seen some other topics but each answer seems to be really different.
My knowledge is pretty basic about this and wouldn't mind a step in the right direction. Is it actually possible? is there a better solution rather than creating word documents?
I only really need to be able to create a word document which has some text and and tables is this possible?
If you are creating a document of same kind with different data then use a Word template (.dotx) and use content controls or bookmarks in the document.
Advantage: Saves your time in manipulating with formatting and alignment in the document
Then use Open XML to just replace your content controls or bookmarks with the values.
Advantage: You dont need Word Interop assemblies to be deployed. Faster and recommended by Microsoft.
Depending on your needs, you could either simply copy a pre-existing file, or if you need to modify the document from code, you can use Microsoft Office Automation. (see C# office 2010 automation)

Databinding in existing XPS document

I have an existing XPS file that I would like to use as a template and possibly bind data to it. I have tried several methods, but cannot seem to get it to work.
Does anyone have any experience altering an existing XPS file to add data at runtime and then print or save?
Any help is appreciated.
XPS documents conform to the Open XML standard. There is an SDK for working with these docs. Here is a How-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".
Since you are working with the internal doc structure you might also check out 'Open XML Package Editor" which lets you explore the doc with Visual Studio. Here is another How-to by Beth Massi: "Handy Visual Studio Add-In to View Office 2007 Files".
+tom
it's a bit of a challenge to do this with XPS, but it is possible.
You can do this with our NiXPS SDK.
I've posted an example on my blog a while ago:
XPS variable data example
Regards,
Nick
Bindings are evaluated during the process of writing to an XPS document. So you can't set up a {Binding} in a FixedDocument, Write that FD to an XpsDocument, and expect to get that original FD back again when you next open that saved doc.
Also, the standard XpsWriter does convert everything into Glyphs on canvases, so you can't, say, a textbox in the original and expect to be able to find it after its been saved to a document.
I've never used the NiXPS libraries, so if Nick says it can be done you might want to check it out.
One last possibility--You can create placeholders in a form that you will be able to find later. They'd have to be text (something like [[{{FORMFIELDHERELOL}}]]) with some kind of delimiter scheme to differentiate the text from everything else. You could then go spelunking in the XML looking for text that fits the delimeter pattern and switch out those glyphs for your binding text. Of course, the issue with THAT is that if you aren't putting X chars in place of X chars you might find you have to do some repositioning. As its all glyphs on canvas this might be slightly harder than, say, threading a needle with a shoelace.

Using Word 2007 as CMS page editor

I have been searching for several hours but i couldn't find anything about this... Basically I would like to create a template or plug-in for word 2007 that would allow someone to create new pages for a CMS. What I have in mind is something similar to blog post template. I know how to create a basic template but I can't find a way to publish the created document using a publish button inside the Word.
thnx in advance
I understand what you are trying to achieve, but Word is the wrong starting point. I would start with a much more basic text editor.
Word is horrible, horrible, horrible. Your site will define clear styles, yet Word will output nasty HTML that won't match your website's CSS definitions.
Your best bet therefore is to have a means to drop the Word file into the site, and have code programmatically analyse it and transform it into site-valid HTML. In Java you could use Apache POI, but that's very raw still. Might be a lot easier in a Microsoft centric world.
Far better, in my opinion, is to force people to learn Markdown, or BBCode, or HTML, or to use a Styled HTML Editor in your CMS - cut and paste plain text in, then style with the CMS defined styles.
As you are using Word 2007 you can export the document as XML and then use XSLT to generate the HTML.
If your CMS has an API or import facility you could convert the output from Word to suit that interface.
You can write a Word macro to add a Publish button/menu option to Word that will generate the correct output.
It's not a bad idea since it's all about the end user. If Word produces bad HTML, you should just make it semantic correct before posting it to the CMS.
I've never done this but I'm sure that it's possible to with .NET via the "Word 2007 Addin"-template (assuming Office 2007).
Good luck!
You can do what you want if you use SharePoint 2007 as your CMS. You can set up a blog on SharePoint 2007 and post to the blog from Word. If you use Office 2007 on the client end then you will get some nice buttons like "post to my blog" etc.
If you can't use SharePoint or are talking about an existing CMS, you have a lot of hurdles to jump through. This is a major undertaking and not something you can get a simple answer out of Stack Overflow.
Have you considered using one of the freely available Javascript WYSIWYG Editors such as TinyMCE http://tinymce.moxiecode.com/? When configured with all the options, it has an impressive amount of functionality and the interface is very similar to Word. I realize this doesn't directly answer your question, but as others have pointed out starting from Word is going to be difficult.
I've been on a team that wrote a Word addin for a custom CMS system. It was written in VB6 and was able to take a Word document and turn basic formatting information - lists, bold, italic and even tables into HTML, which was uploaded to the server. It didn't create new pages or manage the site in the addin though.
I would definitely avoid choosing Word as the editor for your CMS from my experience. The biggest issue is each time you want to update the addin you have to redistribute it to the company or companies using it. You can do this is as an IE active-x control but it's far easier just to handicap the user to a limited set of styling options via a Javascript editor.
Word does have a powerful API for manipulating your content with, however we needed to disable so many options in Word to avoid unwanted fonts and so on, it resembled Wordpad more than Word in the end.
If it's a greenfield project and you have the time, I would infact recommend using Silverlight 4.0 over a Javascript editor. Version 4.0 has a richtextbox control built in, plus there is also the excellent Vectorlight one.
May be it helps you, umbraco CMS allow editing with Microsoft Word.
For some reason this is a feature that Excel enjoys but not Word.
Excel can can automatically publish an HTML file version of your document when you save it.
Unfortunately Word seems to only be able to achieve this functionality when using Sharepoint, which is a shame because it can be quite useful.
What you can do, short of creating your own add-in is to add a bit of code to your template to create a HTML copy of your document whenever the user saves it.
First, make sure your template is macro-enabled (saved as .dotm file).
Second, while editing the template in Word, open the VBA code editor (ALT-F11)
In the project list double-click on your document to open its code-behind file.
Add the following bit of code to it, modifying the ActiveDocument.SaveAs path to something more appropriate to you, like a shared network folder where your CMS exposed by your CMS.
Sub FileSave()
' First Save the main document
ActiveDocument.Save
' Now we create a new document based on the current one
Selection.WholeStory
Selection.Copy
Documents.Add
Selection.PasteAndFormat wdPasteDefault
' Save it as HTML and close it
ActiveDocument.SaveAs "c:\temp\mydoc.html", fileformat:=wdFormatHTML
ActiveDocument.Close
End Sub
This will copy the original file into a blank new one that will be saved to HTML and closed before returning to the original file.
You can check some of the options to the Documents.Add if you want to use a different template than the normal one.
Security
because this template contains macros, you will have to install it with the other templates where Word expect them.
If you don't, then you'll get a security warning.
To avoid getting it, you can add the path where your templates are located to the list of Trusted Locations under Word's Options > Trust Center > Trust Center Settings > Trusted Locations.

Categories

Resources