Headers lose styles after merging documents with Xceed Docx - c#

Let me explain my scenario.
I am making use of Xceed Docx library to merge and manipulate word documents.
I have multiple templates that need to be merged to form one customer-facing document.
Each of them has its own document headers, tables and images.
As per business requirements, we need to make use of content controls as there will be manual intervention.
PROBLEM:
All goes well and the merge works as expected, but it seems to drop the styling of the headers in merged document. But this only occurs when I include CONTENT CONTROLS (rich text content control)!
For example: Header 1 and Header 2 become normal text.
Has anyone experienced anything similar with this library?
Is there something I am doing wrong or missing?

I did try to contact the developers of DocX, to no avail.
I tried merging the files with OpenXml using AltChunk.
This did work, but not to the extent that I required.
Let me explain.
AltChunk inserts the entire file (doc2.docx) into the base file (doc1.docx) and then only adds a reference to doc2 inside doc1's XML file.
Hope that makes sense.
MS Word can open this file, but when I want to make changes using DocX it is unable to load the file.
I ended up using DocX for all the document manipulation and OpenXmlPowerTools to merge the documents.
OpenXmlPowerTools seems to resolve the above-mentioned issue, as it appears to do a complete image, chart and text merge.
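For anyone wanting a starting point, here is a minimal sketch of the OpenXmlPowerTools merge step, assuming the OpenXmlPowerTools NuGet package is referenced. The class name TemplateMerger and its parameters are made up for illustration; DocumentBuilder, Source and WmlDocument come from the library. This is not necessarily the exact code used above, just one plausible shape of it.

```csharp
using System.Collections.Generic;
using System.Linq;
using OpenXmlPowerTools;

// Hypothetical helper, not part of OpenXmlPowerTools itself.
public static class TemplateMerger
{
    // Merges the given .docx files, in order, into a single output file.
    public static void Merge(IEnumerable<string> templatePaths, string outputPath)
    {
        // keepSections = true asks DocumentBuilder to carry over each source's
        // sections, which is where headers and footers are attached.
        List<Source> sources = templatePaths
            .Select(path => new Source(new WmlDocument(path), true))
            .ToList();

        DocumentBuilder.BuildDocument(sources, outputPath);
    }
}
```

The merged file can then be opened with DocX for further manipulation. Whether heading styles inside content controls survive should still be checked against your own templates.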
I hope this helps someone in the near future ;-P

Related

Merging word documents and preserve their formatting, header and footer

I am having trouble merging multiple Word documents into a single one. I have a scenario where I generate Word documents from HTML, each with a header and footer. I have around 10-15 such documents. I generate these Word documents individually and they work fine.
Now I have a requirement to generate the HTML of all 10 pages and combine them into a single Word report. The result should preserve each individual report's formatting, header and footer.
I have tried this in two ways but without success:
Combined the HTML of all pages into one HTML page and then saved that file as a Word file.
Created a Word report for each of the 10 HTML files individually and merged them using Microsoft.Office.Interop.
I was able to merge the documents, but was not able to keep the header, footer and formatting of the individual documents.
I have searched about section-break too but not sure how to use this.
Please see if anyone can guide me toward the possible solution or anything else that can help me.
Thanks in advance.
You could try merging with DocumentBuilder
If that doesn't give you enough control, see whether docx4j.NET (commercial edition) might help, with its demo merge webapp. Docx4j's MergeDocx provides fine grained control over header/footer behaviour.

Navigate By Columns using Aspose.words in C#

I am evaluating Aspose.Words for one of my clients, and I have migrated almost all of the features from the MS Word library to the Aspose.Words library. There is just one more to go, but I am struggling to find a solution for the following:
We have a template document in .docx format. The template has a two-column page layout. At run time the system copies content from another document into this template document. Up to this step everything works fine.
When I open the template page it looks good with the 2-column layout.
But we have some logic that should read the last line of the first column and check whether the text is in a specific format; if it is, it moves the text one line down, which automatically moves it to the next column.
This logic is easily achievable in Word, but I couldn't find any reference in Aspose.Words to implement it.
I also tried to find a different option by converting the document to XML, and found that there is one node called . But this node is visible only when I save the document as XML from Microsoft Word. It does not occur if I save the document as XML from Aspose.Words.
Please advise me on how to solve this issue.
Thanks in advance
Gunasekara S
We have just finished integrating a feature into Aspose.Words that opens up access to the rendering engine, so that each element of the rendered document can be read as it appears in pages, columns, lines, spans etc. This functionality is exactly what you need and will be available in the next version of Aspose.Words, which is expected to be released in about a week's time. I will share a code snippet that accomplishes your requirement soon.
My name is Nayyer and I am developer evangelist at Aspose.

How to read metadata information from docx documents?

what I need to achieve is to have a word document template(docx), which will contain Title, Author name, Date, etc.
This template then will be used by users to complete it. I need to create a c# program, that will take in the docx file and read all the information of interest(title, name, date, ..).
So my questions are:
How do I put the metadata into the template saying: this is Title, this is Date, this is Name, etc.? (not programmatically)
How do I programmatically read that information?
One way to approach this would be to use Content Controls. In Office, you can create your template, and then for each of your respective inputs of interest you can place one of these controls. They're under the Developer tab in Office.
After inserting your controls you'll need for each of them to have a unique name. Office will let them all have the same name, but you'll need to uniquely identify all of them in your template document.
You now need to get the data that's input in to these controls. Again, there's likely to be some better solutions but Eric White has all kinds of great OpenXML stuff, and so here's one of his: Iterating over Content Controls
I think there are problems with finding content controls nested within a table. So, if you do that, then I think you have to specifically loop over the elements of the table to find the content controls within.
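To make the reading side concrete, here is a minimal sketch using the Open XML SDK (DocumentFormat.OpenXml). It assumes each control was given a unique Tag in Word; the class and method names are made up for illustration. Descendants walks the whole element tree, so controls placed inside table cells should be visited as well, though that is worth verifying on your own template.

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration only.
public static class ContentControlReader
{
    // Returns tag -> text for every tagged content control in the main document part.
    public static Dictionary<string, string> ReadTaggedControls(Stream docx)
    {
        var result = new Dictionary<string, string>();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docx, false))
        {
            // SdtElement is the common base of block, run, row and cell level controls.
            foreach (SdtElement sdt in doc.MainDocumentPart.Document.Descendants<SdtElement>())
            {
                string tag = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;
                if (!string.IsNullOrEmpty(tag))
                {
                    result[tag] = sdt.InnerText; // text typed into the control
                }
            }
        }
        return result;
    }
}
```

Controls in headers and footers live in separate parts and are not covered by this sketch.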
Also, you're probably going to want to save a .docx from your .dotx file. I don't think there's a built-in "one-liner" method for that in OpenXML; however, you can create a new Word document and then write the file stream of the template into the newly created docx file. Again, of course, there may be better solutions out there.
Have you been here? There's lots of good stuff:
Introduction to OpenXML
Additionally, Eric has been releasing more and more videos on the OpenXML YouTube channel
1) How do I put the metadata into the template saying: this is Title, this is Date, this is Name, etc.? (not programmatically)
You could do that on Info tab in MS Word 2010 as shown below:
2) how do I programmatically read that information?
Once you have created your document (or template), you can always look inside it with the Open XML SDK 2.0 Productivity Tool (which is installed with the Open XML SDK) to see which classes to use to get or set information in the document.
Also I think this post might help you to solve your task:
Add and update custom document properties in a docx
UPDATE:
Hi Dave,
Please have a look at this MSDN Article - Retrieving Application Properties from Word 2010 Documents by Using the Open XML SDK 2.0
Hope this is exactly what you are looking for.
All OpenXML documents have built in core Metadata that will do what you need through System.IO.Packaging. Once you open the word file using the open xml sdk in c#, you can get to these values via the PackageProperties class. There are 11 Properties you can use.
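As a rough sketch of the PackageProperties route, the snippet below reads three of the core properties through System.IO.Packaging. The class and method names are made up for illustration. On .NET Framework the namespace lives in WindowsBase; on newer runtimes it is assumed to come from the System.IO.Packaging NuGet package.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper for illustration only.
public static class DocxMetadata
{
    // Reads a few of the built-in core properties from a .docx (or any OPC package).
    public static (string Title, string Creator, DateTime? Created) ReadCore(Stream docx)
    {
        using (Package package = Package.Open(docx, FileMode.Open, FileAccess.Read))
        {
            PackageProperties props = package.PackageProperties;
            // Properties the user never filled in come back as null.
            return (props.Title, props.Creator, props.Created);
        }
    }
}
```

Typical use would be something like DocxMetadata.ReadCore(File.OpenRead(path)), with the other core properties (Subject, Keywords, Category and so on) read the same way.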
You "encourage" your user to enter the metadata using Word's Document Information Panel (DIP).
You can force this on by default when they open your template, by a setting in the Developer Toolbar for the template. See the following article on how to set this in your template.
I wrote a quick Windows Form app that displays this information using open xml sdk call to the PackageProperties of the Word file that is displayed above.
Here is the full solution with the sample word file included.
Hope this helps.

How to split a document in OpenXML SDK

I need to split a document in OpenXml sdk 2.0. The document has sections that each have a footer with a text element (name of the section). Is there a straightforward way to copy from one OpenXml document to another?
DocumentBuilder is the tool you are looking for. See for example, http://blogs.msdn.com/b/ericwhite/archive/2010/01/08/how-to-control-sections-when-using-openxml-powertools-documentbuilder.aspx
This would require a lot of work on your part to copy and merge stylesheets among other things. I'd recommend using altChunk to do the merging for you as it will take care of all the hard stuff for you. Here are two links to help explain it more: How to Use altChunk for Document Assembly and How to: Assemble Multiple Word Processing Documents in One
I have done similar to what you describe using just the OpenXmlSDK. Though I have to say it wasn't much fun, and I was left wanting a solution I didn't have to carve myself. In my case I had to keep footers/headers etc. with section content and split the document into several other documents.
At the time I couldn't locate any samples on identifying which section an element belonged to, and had to write a utility myself. (The way word splits sections is by injecting a section break after the content, and the SDK didn't seem to provide any helpers.) I then had to locate the header definition by using the headerReference and grab that content too, before creating a new document and injecting the header, footer, and section content.
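The kind of utility described above can be sketched with plain LINQ to XML over the w:body element of document.xml. This is not the original code, only an illustration of the idea: a paragraph whose w:pPr contains a w:sectPr closes a section, and the w:sectPr directly under w:body closes the last one. The class and method names are made up.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class SectionSplitter
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Groups the children of w:body by section and pairs each group with the
    // w:sectPr that closes it.
    public static List<(List<XElement> Content, XElement SectPr)> Split(XElement body)
    {
        var sections = new List<(List<XElement> Content, XElement SectPr)>();
        var current = new List<XElement>();

        foreach (XElement child in body.Elements())
        {
            if (child.Name == W + "sectPr")
            {
                // Body-level sectPr: the properties of the final section.
                sections.Add((current, child));
                current = new List<XElement>();
                continue;
            }

            current.Add(child);

            // A paragraph carrying a sectPr in its pPr is the last paragraph of its section.
            XElement sectPr = child.Name == W + "p"
                ? child.Element(W + "pPr")?.Element(W + "sectPr")
                : null;
            if (sectPr != null)
            {
                sections.Add((current, sectPr));
                current = new List<XElement>();
            }
        }

        if (current.Count > 0) sections.Add((current, null)); // no closing sectPr found
        return sections;
    }
}
```

Each returned SectPr can then be searched for w:headerReference and w:footerReference children, whose r:id attributes point at the header and footer parts that belong to that section.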
I wish you the best of luck!

Merge documents

I'm trying to merge two docx documents into one docx document using OpenXML SDK 2.0. The documents should be merged without losing their styling and custom headers and footers. I hope I can achieve this using AltChunk and a section break, but I can't get it working.
Is it possible what I'm trying to do? Can someone give me a hint how to achieve this?
The above answer is NOT correct at all! This is EXACTLY what AltChunk has been designed to do, and it works great!
NOTE: the documents will not be merged into one document UNTIL Word opens the file for the first time (and the file then has to be saved, or the file on disk won't be updated).
See this blog for more information on how to do it properly:
https://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx?Redirected=true
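For reference, a minimal AltChunk sketch with the Open XML SDK looks roughly like this. The class name, method name and the altChunkId value are made up for illustration; the id only has to be a relationship id that is unique within the base document.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration only.
public static class AltChunkAppender
{
    // Embeds chunkPath into basePath as an altChunk placed at the end of the body.
    public static void Append(string basePath, string chunkPath, string altChunkId)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(basePath, true))
        {
            MainDocumentPart main = doc.MainDocumentPart;

            // The whole second file is stored as a part inside the base package.
            AlternativeFormatImportPart part = main.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream chunk = File.OpenRead(chunkPath))
            {
                part.FeedData(chunk);
            }

            // The body only gets a reference to that part.
            var altChunk = new AltChunk { Id = altChunkId };
            Body body = main.Document.Body;
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null) body.InsertBefore(altChunk, sectPr);
            else body.AppendChild(altChunk);

            main.Document.Save();
        }
    }
}
```

Until Word opens and saves the result, the file still contains the embedded chunk rather than merged content, which is why other libraries may fail to load it.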
p.s. As for examining Open XML using the productivity tool, my opinion is to just install the official Visual Studio Open XML add-on and open the Office Documents from Visual Studio to examine them, it's super convenient! :-)
Using the 'Open XML Productivity Tool' I analyzed the structure of a docx document, and concluded that merging documents with their styles, headers, footers, ... is not possible out of the box using AltChunk. The tool is a separate download that accompanies the Open XML SDK.
What I'm doing now, and what is working, is copying everything manually into the document, making sure that all style references, header references, footer references, ... are preserved. This means that I give them a new unique id before I copy them into the document, and change all references from the old id to the new id. There is a lot of code to do this, but the tool mentioned above really helped.
Adding a section break is also quite difficult. You should know that the SectionProperties tag describes all the properties of a section, and that there can be one SectionProperties tag under the Body tag, describing the properties of the last section. So adding a new section break means copying the last SectionProperties tag to the last paragraph of the section and adding a new SectionProperties tag under the Body tag. I also got a lot of information on this from the productivity tool.
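The section-break step can be sketched with LINQ to XML on the w:body element. This is an illustration, not the code from the answer; the class and method names are made up. It assumes the last content element of the body is a paragraph and that the body-level w:sectPr may be reused unchanged for the section that follows.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class SectionBreaks
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Ends the current last section at the body's last paragraph: a copy of the
    // body-level w:sectPr goes into that paragraph's w:pPr, while the body-level
    // w:sectPr stays in place and describes whatever is appended afterwards.
    public static void EndSectionAtLastParagraph(XElement body)
    {
        XElement bodySectPr = body.Elements(W + "sectPr").LastOrDefault();
        XElement lastParagraph = body.Elements(W + "p").LastOrDefault();
        if (bodySectPr == null || lastParagraph == null) return;

        XElement pPr = lastParagraph.Element(W + "pPr");
        if (pPr == null)
        {
            pPr = new XElement(W + "pPr");
            lastParagraph.AddFirst(pPr); // w:pPr has to be the first child of w:p
        }
        if (pPr.Element(W + "sectPr") != null) return; // paragraph already ends a section

        pPr.Add(new XElement(bodySectPr)); // XElement copy constructor makes a deep copy
    }
}
```

After this call, content from the next document can be inserted before the body-level w:sectPr, which can then be replaced with that document's own section properties once its header and footer ids have been remapped.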
