Generating a .docx from a .dotx using merge (SimpleField) fields - c#

So, first off, here's my code to open the .dotx and create a new .docx copy, which is then modified. It's cut for brevity, but essentially it takes three parameters: a DataTable (to keep it usable by legacy systems), the UNC path to the template as a string, and the UNC path to the output document as a string:
// Requires: using System.Linq;
using (WordprocessingDocument docGenerated = WordprocessingDocument.Open(outputPath, true))
{
    docGenerated.ChangeDocumentType(WordprocessingDocumentType.Document);
    // Snapshot the fields first: replacing nodes while enumerating Descendants() is unsafe.
    foreach (SimpleField field in docGenerated.MainDocumentPart.Document.Descendants<SimpleField>().ToList())
    {
        string mergeFieldName = GetFieldName(field).Trim();
        // Double any single quotes so the field name cannot break the filter expression.
        DataRow[] dr = dtSchema.Select("FieldName = '" + mergeFieldName.Replace("'", "''") + "'");
        if (dr.Length > 0)
        {
            Run run = new Run();
            // Carry over the formatting of the field's first run, if it has any.
            RunProperties fieldProperties = field.Descendants<RunProperties>().FirstOrDefault();
            if (fieldProperties != null)
            {
                run.Append(fieldProperties.CloneNode(true));
            }
            run.Append(new Text(dr[0]["FieldDataValue"].ToString()));
            field.Parent.ReplaceChild<SimpleField>(run, field);
        }
    }
    docGenerated.MainDocumentPart.Document.Save();
}
What I did initially was take a .dot template, re-save it as a .dotx and cross my fingers; that didn't work. So instead I tried deleting all the merge fields in the .dotx and adding them again. This worked, but it would only find one merge field (as a SimpleField), specifically the last one added before saving the .dotx. Looking further at the template with the Open XML SDK Productivity Tool, I can see that all the other merge fields are stored as w:instrText, which is why they're being ignored.
I'm literally just starting out with OpenXML as we're looking to replace our current office automation with it so I know very little at this point. Could someone please instruct me a bit further or point me to a good resource? I've Google'd around a bit but I can't find my specific problem. I am trying to put off reading through the whole SDK documentation (I know, I know!) as I need to get a solution put together quickly so am focusing on a single task which is to take our existing .dot templates, convert them to .dotx and just replace merge fields with data to derive a .docx.
Thanks in advance!

When working with Open XML you don't strictly need to use .dotx for your templates; you can build your templates as .docx files straight away. A good resource for learning Open XML is http://openxmldeveloper.org/, and you can find a good PDF to read there.
Also worth a look is the third-party API at docx.codeplex.com, which I am now using to develop a server-side document automation solution for my company. See the example at http://cathalscorner.blogspot.co.uk/2009/08/docx-v1007-released.html, which is similar to your scenario: merging fields with data.
Hope this helps.

Here is the link to "C# OpenXML Mail Merge Complete Example" http://www.jarredcapellman.com/2012/10/22/c-openxml-mail-merge-complete-example/
I used it to update my web application code that used to work with mergefields in .DOC (.DOT) files (and required MS Word to be installed and the corresponding DCOM to be configured on the web server). Now the solution requires just OpenXML SDK 2.0 to be installed on the web server and the .DOC (.DOT) templates to be saved as .DOCX (.DOTX).
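For the merge fields that show up as w:instrText (complex fields) rather than as SimpleField, something along these lines might be a starting point. It is only a sketch: it assumes the usual begin / instrText / separate / result / end sequence, it ignores nested fields and quoted field names containing spaces, and ComplexFieldMerge, Fill and ParseName are made-up names, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ComplexFieldMerge
{
    // Overwrites the displayed result of every complex MERGEFIELD under 'root'
    // whose name is a key in 'values'. Returns how many fields were filled.
    public static int Fill(OpenXmlElement root, IDictionary<string, string> values)
    {
        int replaced = 0;
        string instruction = null;   // non-null while reading a field's instruction text
        string pendingValue = null;  // non-null while inside a matched field's result
        bool firstText = false;

        foreach (OpenXmlElement e in root.Descendants().ToList())
        {
            FieldChar fc = e as FieldChar;
            FieldCode code = e as FieldCode;
            Text text = e as Text;

            if (fc != null && fc.FieldCharType != null)
            {
                if (fc.FieldCharType.Value == FieldCharValues.Begin)
                {
                    instruction = string.Empty;
                    pendingValue = null;
                }
                else if (fc.FieldCharType.Value == FieldCharValues.Separate)
                {
                    string name = ParseName(instruction);
                    string value;
                    instruction = null;
                    if (name != null && values.TryGetValue(name, out value))
                    {
                        pendingValue = value;
                        firstText = true;
                    }
                }
                else if (fc.FieldCharType.Value == FieldCharValues.End)
                {
                    instruction = null;
                    pendingValue = null;
                }
            }
            else if (code != null && instruction != null)
            {
                // The instruction can be split across several w:instrText runs.
                instruction += code.Text;
            }
            else if (text != null && pendingValue != null)
            {
                // Put the value in the first result run and blank out the rest.
                text.Text = firstText ? pendingValue : string.Empty;
                if (firstText) { replaced++; firstText = false; }
            }
        }
        return replaced;
    }

    // Extracts "Name" from an instruction such as " MERGEFIELD Name \* MERGEFORMAT ".
    static string ParseName(string instruction)
    {
        if (instruction == null) return null;
        string[] tokens = instruction.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 2 || !tokens[0].Equals("MERGEFIELD", StringComparison.OrdinalIgnoreCase)) return null;
        return tokens[1].Trim('"');
    }
}
```

Called as ComplexFieldMerge.Fill(doc.MainDocumentPart.Document, values), this leaves the field markers in place and only changes the visible text, so Word may still overwrite it if the fields are updated later. Removing the fldChar and instrText runs as well would turn the result into plain text.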

If you want to use content controls you can also try WordDocumentGenerator, a utility for generating Word documents from templates using Visual Studio 2010 and the Open XML 2.0 SDK: http://worddocgenerator.codeplex.com

Related

Change Document template path in multiple Word Templates

I have about 500 Word documents in which I need to change the server name in the document template path. I am not an expert with VBA, and the several solutions I have tried have not worked for me. Is there a way (perhaps with C#, using a foreach loop over the directory?) to do a very simple find and replace on this field?
i.e.
\\ASDCFS\NtierFiles\...
becomes
\\NewServer\NtierFiles\...
You can't write to the field in the dialog box directly. In the object model the equivalent is Document.AttachedTemplate and, yes, you can work with that. Using the object model (whether from VBA or C#), you'd loop over the documents in a folder, open each in Word, assign the correct path, then save and close.
More efficient and less prone to "hiccups" if the original template path is already invalid, would be to edit the documents' Word Open XML directly, without using the Word application. The Open XML SDK would be a good tool for this. It provides the AttachedTemplate class (https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.attachedtemplate(v=office.14).aspx).
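A rough sketch of the Open XML SDK route, in case it helps. In the package, the AttachedTemplate element in the settings part only holds a relationship id; the actual path is the target of an external relationship on that part, so the relationship is what gets replaced. TemplatePathFixer, Retarget and ReplaceServer are hypothetical names, the server name is matched as a plain case-insensitive substring, and how the target string round-trips is worth checking on a copy of one document before running it over all 500.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplatePathFixer
{
    // Returns true when the attached-template relationship was rewritten.
    public static bool Retarget(string docPath, string oldServer, string newServer)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docPath, true))
        {
            DocumentSettingsPart settingsPart = doc.MainDocumentPart.DocumentSettingsPart;
            if (settingsPart == null || settingsPart.Settings == null) return false;

            AttachedTemplate attached = settingsPart.Settings.GetFirstChild<AttachedTemplate>();
            if (attached == null || attached.Id == null) return false;

            string id = attached.Id.Value;
            ExternalRelationship rel = settingsPart.ExternalRelationships.FirstOrDefault(r => r.Id == id);
            if (rel == null) return false;

            string oldTarget = rel.Uri.OriginalString;
            string newTarget = ReplaceServer(oldTarget, oldServer, newServer);
            if (newTarget == oldTarget) return false;

            // Relationships cannot be edited in place: drop the old one and
            // re-create it under the same id so the settings XML stays valid.
            string relationshipType = rel.RelationshipType;
            settingsPart.DeleteExternalRelationship(rel);
            settingsPart.AddExternalRelationship(
                relationshipType, new Uri(newTarget, UriKind.RelativeOrAbsolute), id);
            return true;
        }
    }

    // Replaces the first occurrence of oldServer, ignoring case.
    public static string ReplaceServer(string target, string oldServer, string newServer)
    {
        int i = target.IndexOf(oldServer, StringComparison.OrdinalIgnoreCase);
        if (i < 0) return target;
        return target.Substring(0, i) + newServer + target.Substring(i + oldServer.Length);
    }
}
```

Looping it over a folder is then just a foreach over Directory.GetFiles(folder, "*.docx") calling TemplatePathFixer.Retarget(file, "ASDCFS", "NewServer").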
You can use WTC to correct the template path across many documents in bulk. The source code and binary are on GitHub: https://github.com/NeosIT/wtc

Word 2010 doesn't support XML Schemes... any alternatives? (asp.net)

I'm looking for new alternatives for creating/changing doc/docx files from a Word template (asp.net). In the past I created documents using XML schemas, but since the i4i case Word 2010 doesn't support them. Microsoft suggests custom controls, but I have large documents with quite a few dynamically generated tables and I don't think these controls are a good solution.
Does anyone have any alternatives similar to XML schemas?
Check out the Open XML SDK 2.0 from Microsoft.
Then you can use the Simple OOXML project to get you started.
This does not work for legacy Office documents (i.e. .doc/.xls), only the newer formats such as .docx and .xlsx.
Hint: a .docx/.xlsx is really a zip file full of XML documents. Just change the extension to .zip to look inside.
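You don't even need to rename the file to see this from code. A minimal sketch using only the framework's zip support (PackageLister is a made-up name; on .NET Framework this needs references to System.IO.Compression and System.IO.Compression.FileSystem):

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

public static class PackageLister
{
    // Returns the names of all parts stored inside a .docx/.xlsx package.
    public static List<string> ListParts(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

For a typical Word document the list usually includes [Content_Types].xml and word/document.xml, alongside styles, settings and relationship parts.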

Merge documents

I'm trying to merge two docx-documents into one docx-document using OpenXML SDK 2.0. The documents should be merged without losing their styling and custom headers and footers. I hope I can achieve this using AltChunk and a section break, but I can't get it working.
Is what I'm trying to do possible? Can someone give me a hint how to achieve this?
The above answer is NOT correct at all! This is EXACTLY what AltChunk has been designed to do, and it works great!
NOTE: that the documents will not be merged into one document UNTIL Word opens the file for the first time (obviously the file has to be saved or the file on disk won't be updated.)
See this blog for more information on how to do it properly:
https://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx?Redirected=true
p.s. As for examining Open XML using the productivity tool, my opinion is to just install the official Visual Studio Open XML add-on and open the Office Documents from Visual Studio to examine them, it's super convenient! :-)
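As a rough illustration of the AltChunk approach (a sketch, not the code from the linked blog post; AltChunkAppender and Append are made-up names), the source document is embedded as an alternative format import part and an altChunk element pointing at it is placed in the body:

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkAppender
{
    // Embeds sourcePath at the end of destPath's body as an AltChunk.
    public static void Append(string destPath, string sourcePath)
    {
        using (WordprocessingDocument dest = WordprocessingDocument.Open(destPath, true))
        {
            MainDocumentPart mainPart = dest.MainDocumentPart;

            // The relationship id must be unique within the main document part.
            string chunkId = "AltChunk" + Guid.NewGuid().ToString("N");
            AlternativeFormatImportPart chunkPart = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, chunkId);

            using (FileStream source = File.OpenRead(sourcePath))
            {
                chunkPart.FeedData(source);
            }

            AltChunk altChunk = new AltChunk { Id = chunkId };
            Body body = mainPart.Document.Body;

            // The body-level sectPr has to stay the last child of the body.
            SectionProperties lastSection = body.Elements<SectionProperties>().LastOrDefault();
            if (lastSection != null)
            {
                body.InsertBefore(altChunk, lastSection);
            }
            else
            {
                body.Append(altChunk);
            }

            mainPart.Document.Save();
        }
    }
}
```

After this runs, the destination file contains the whole source document as an embedded part; the content is only turned into ordinary paragraphs when Word opens and re-saves the file.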
Using the 'Open XML Productivity Tool' I analyzed the structure of a docx-document, and concluded that merging documents with their styles, headers, footers, ... is not possible out of the box using AltChunk. You can download the tool separately from the Open XML SDK.
What I'm doing now, and what is working, is copying everything manually into the document, making sure that all style-references, header-references, footer-references, ... are preserved. This means that I give them a new unique id before I copy them into the document and change all references from the old id to the new id. There is a lot of code to do this, but the tool mentioned above really helped.
Adding a section break is also quite difficult. You should know that the SectionProperties-tag describes all the properties of a section and that there can be one SectionProperties-tag under the Body-tag, describing the properties of the last section. So adding a new section break means copying the last SectionProperties-tag to the last paragraph of the section and adding a new SectionProperties-tag under the Body-tag. I also got a lot of information from the productivity tool.
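The section-break step described above might look roughly like this. It is a simplified sketch: instead of modifying the existing last paragraph it adds an empty paragraph that carries a copy of the body-level SectionProperties, and it leaves the original in place to describe whatever gets appended afterwards. SectionBreaker and CloseCurrentSection are made-up names.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SectionBreaker
{
    // Ends the current last section so that content appended afterwards
    // starts a new section. Returns false when the body has no sectPr.
    public static bool CloseCurrentSection(Body body)
    {
        SectionProperties last = body.Elements<SectionProperties>().LastOrDefault();
        if (last == null) return false;

        // A sectPr inside a paragraph's properties marks that paragraph
        // as the final paragraph of its section.
        SectionProperties copy = (SectionProperties)last.CloneNode(true);
        Paragraph sectionEnd = new Paragraph(new ParagraphProperties(copy));

        // Keep the body-level sectPr as the last child of the body.
        body.InsertBefore(sectionEnd, last);
        return true;
    }
}
```

The body-level SectionProperties can then be replaced with the one from the document being merged in, once its header and footer references have been re-mapped to the new ids.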

What is the best way to populate a Word 2007 template in C#?

I need to populate a Word 2007 document from code, including repeating table sections. Currently I use an XML transform on the document.xml portion of the docx, but this is extremely time-consuming to set up (each time you edit the template document, you have to recreate the transform.xsl file, which can take up to a day for complex documents).
Is there any better way, preferably one that doesn't require you to run Word 2007 during the process?
Regards
Richard
I tried myself to write some code for that purpose, but gave up. Now I use a 3rd party product: Aspose Words and am quite happy with that component.
It doesn't need Microsoft Word on the machine.
"Aspose.Words enables .NET and Java applications to read, modify and write Word® documents without utilizing Microsoft Word®."
"Aspose.Words supports a wide array of features including document creation, content and formatting manipulation, powerful mail merge abilities, comprehensive support of DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats. Aspose.Words is truly the most affordable, fastest and feature rich Word component on the market."
DISCLAIMER: I am not affiliated with that company.
Since a DOCX file is simply a ZIP file containing a folder structure with images and XML files, you should be able to manipulate those XML files using your favorite XML manipulation API. The specification of the format is known as WordprocessingML, part of the Office Open XML standard.
I thought I'd mention it in case the 3rd party tool suggested by splattne is not an option.
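To give an idea of what that looks like without any third-party library, here is a minimal sketch using ZipArchive and LINQ to XML. It assumes the main part lives at word/document.xml (the conventional location, though the package format does not guarantee it) and that each placeholder sits entirely inside a single w:t element, which Word does not guarantee either. RawDocxEditor and ReplaceText are made-up names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class RawDocxEditor
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces 'placeholder' with 'value' in every w:t element of the main
    // document part. Returns the number of text elements that were changed.
    public static int ReplaceText(string docxPath, string placeholder, string value)
    {
        using (ZipArchive zip = ZipFile.Open(docxPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return 0;

            XDocument xml;
            using (Stream input = entry.Open())
            {
                xml = XDocument.Load(input, LoadOptions.PreserveWhitespace);
            }

            int changed = 0;
            foreach (XElement t in xml.Descendants(W + "t")
                                      .Where(x => x.Value.Contains(placeholder))
                                      .ToList())
            {
                t.Value = t.Value.Replace(placeholder, value);
                changed++;
            }

            if (changed > 0)
            {
                using (Stream output = entry.Open())
                {
                    output.SetLength(0); // discard the old XML before writing
                    xml.Save(output, SaveOptions.DisableFormatting);
                }
            }
            return changed;
        }
    }
}
```

Repeating table rows can be handled the same way by cloning a w:tr element once per data row before filling in its text, which avoids maintaining a separate transform for every template.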
Have you considered using the Open XML SDK from Microsoft? The only dependency is on .NET 3.5.
Documentation: http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx
Download: http://www.microsoft.com/downloads/details.aspx?familyid=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en
Use the Invoke docx library. It supports table data (http://invoke.co.nz/products/help/docx_tables.aspx). More info at http://invoke.co.nz/products/docx.aspx
Have you considered using VB? You could create a separate assembly to populate your document.
I know you are looking for a C# solution, but VB's XML literal support is one area that could help you populate the document. Create a document in Word to serve as a template, unzip the docx, paste the relevant XML section you want to change into your VB code, and add code to fill in the parts you wish to change. It's difficult to say from your description whether this would meet your requirements, but I would suggest looking into it.

Programatically Break Apart a PDF created by a scanner into separate PDF documents

I have PDF documents from a scanner. These PDFs contain forms filled out and signed by staff for a day's work. I want to place a bar code or a standard area for OCR text on every form type so the batch scan can be programmatically broken apart into separate PDF documents based on form type.
I would like to do this in Microsoft .NET 2.0.
I can purchase the required Adobe or other namespaces/DLLs needed to accomplish the task if there are no open source namespaces/DLLs available.
Not a free or open source option, but you might also look at ABCPdf by webSuperGoo as another alternative to Adobe.
You can research the iTextSharp library, which can split pdf files.
But it isn't very good for reading the actual pdfs. So I have no idea how it would know where to split them.
There are companies that already do this for you.
You can research the kwiktag company.
iTextSharp will help you split, reassemble, and apply barcodes to PDFs in .NET languages. I don't think it can OCR a document, but I haven't looked (I used the ABBYY FineReader engine).
From the title of your question I'm assuming that you just need to break apart PDF files and that they are already OCR'd. There are a few open source .NET PDF libraries out there. I have successfully used PDFSharp in a project of my own.
Here is a quick snippet that shows how to cull out each page from a PDF document using PDFSharp:
string filePath = @"c:\file.pdf";
// Import mode is needed so that pages can be added to another document.
using (PdfDocument ipdf = PdfReader.Open(filePath, PdfDocumentOpenMode.Import))
{
    int i = 1;
    foreach (PdfPage page in ipdf.Pages)
    {
        // Write each page of the input into its own single-page document.
        using (PdfDocument opdf = new PdfDocument())
        {
            opdf.Version = ipdf.Version;
            opdf.AddPage(page);
            opdf.Save("page " + i++ + ".pdf");
        }
    }
}
Assuming also that you need to access the text in the document for grouping you can use the PdfPage.Contents property.
You can use several, try these free tools:
PDF Toolkit
Multivalent
Check out the Tesseract .NET wrapper (v 2.04.0) around the C++ OCR engine of the same name, which was developed by HP in the late 90's and won awards for its ingenuity.
