Accessing CustomXMLPart in document included using INCLUDETEXT field - c#

I have a docx Word document that contains Content Controls bound to data in a CustomXMLPart.
This document (or bookmarks therein) is then included in another Word document by using INCLUDETEXT.
When the first document is included into the second is there any way of getting the CustomXMLPart from the original document (I already have a VSTO Word Addin running in Word looking at the document)?
What I want to do is merge it with the CustomXMLParts already present in the second document so that the Content Controls are still bound to the data in the XMLPart.
Alternatively, is there another way to do this without using the INCLUDETEXT field?

I decided this probably wasn't possible using VSTO and IncludeText fields and investigated using altChunks as an alternative.
I was already doing some processing on the file using the Open XML SDK 2 before opening it, so I could do the extra work required to merge the documents together there.
Although using the altChunk method embeds the whole second document in the first, including its own CustomXmlParts, the CustomXmlParts are discarded by Word when the document is opened and the second merged with the first.
I ended up with code similar to the following. It replaces defined Content Controls with altChunk data and merges specific CustomXmlParts together.
private static void CreateAltChunksInWordDocument(WordprocessingDocument doc, string externalDocumentPath)
{
foreach (var control in doc.ContentControls().ToList()) //Have to do .ToList() on this as when we update the Doc in the loop it stops enumerating otherwise
{
SdtProperties props = control.Elements<SdtProperties>().FirstOrDefault();
if (props == null)
continue;
SdtAlias alias = props.Elements<SdtAlias>().FirstOrDefault();
if (alias == null || !alias.Val.HasValue || alias.Val.Value != "External Template")
continue;
using (WordprocessingDocument externaldoc = WordprocessingDocument.Open(externalDocumentPath, false))
{
//Replace the Content Control with an AltChunk section, and stream in the external file
string altChunkId = "AltChunkId" + Guid.NewGuid().ToString("N"); //"N" = 32 hex digits, no hyphens or braces
AlternativeFormatImportPart chunk = doc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
//Dispose the file stream once its content has been copied into the chunk
using (FileStream externalStream = File.OpenRead(externalDocumentPath))
{
chunk.FeedData(externalStream);
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
OpenXmlElement parent = control.Parent;
parent.InsertAfter(altChunk, control);
control.Remove();
XDocument xDocMain;
CustomXmlPart partMain = MyCommon.GetMyXmlPart(doc.MainDocumentPart, out xDocMain);
XDocument xDocExternal;
CustomXmlPart partExternal = MyCommon.GetMyXmlPart(externaldoc.MainDocumentPart, out xDocExternal);
if (xDocMain != null && partMain != null && xDocExternal != null && partExternal != null)
{
MyCommon.MergeXmlPartFields(xDocMain, xDocExternal);
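//Save the updated part; FileMode.Create truncates the part so no stale bytes remain if the new XML is shorter
using (Stream outputStream = partMain.GetStream(FileMode.Create, FileAccess.Write))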
{
using (StreamWriter ts = new StreamWriter(outputStream))
{
ts.Write(xDocMain.ToString());
}
}
}
}
}
}
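The helper `MyCommon.MergeXmlPartFields` used above is not shown. The following is only a minimal sketch of what such a merge might look like, assuming both custom XML parts have a flat root element whose children are the bound fields, and that a field already present in the main document should win. The class name and the matching rule are assumptions for illustration, not the original implementation.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlPartMergeSketch
{
    // Hypothetical stand-in for MyCommon.MergeXmlPartFields: copies every
    // child element of the external root that the main root does not
    // already contain (matched by qualified element name).
    public static void MergeXmlPartFields(XDocument main, XDocument external)
    {
        if (main.Root == null || external.Root == null)
            return;

        foreach (XElement field in external.Root.Elements())
        {
            bool exists = main.Root.Elements(field.Name).Any();
            if (!exists)
                main.Root.Add(new XElement(field)); // deep copy into the main part
        }
    }
}
```

If the Content Controls bind through XPath expressions that include the root element's name and namespace, both parts need to use the same root for the bindings to keep resolving after the merge.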

Related

OpenXML Word Document - Unreadable content found after altchunk

I built a small .NET 6 application that replaces values inside a Word document and saves a copy.
Some keys are replaced with the content of other files using an AltChunk.
Using file A, into which I merge AltChunk1, the output works fine.
Using file B with the same AltChunk1, the output produces the error "found unreadable content" when opened with Word.
Using file B and a different AltChunk file (even the same after I trimmed it) can, in some cases, work.
I don't have any clue what the issue might be.
I tried comparing the files using the Open XML SDK Productivity Tool, however:
File A and File B have a lot of differences, so it is really hard to find anything that would explain this behavior.
They are identical in the place where the AltChunk is put.
I tried comparing the non-working result with what Word creates after a repair, but Word does not keep the AltChunk: it merges the content of the AltChunk completely into File B, which makes a comparison with my non-working result almost impossible.
Here is the code I use:
The first method creates the AltChunk from a file, then calls the methods used to replace "keys" with the wanted value (including the case where the key is split across several runs).
internal static void MergeOutSideDocument(string key, string filePath, IEnumerable<string> outsideDocs)
{
if (string.IsNullOrEmpty(key)) throw new ArgumentException("Cannot replace empty key.");
if (!File.Exists(filePath) || outsideDocs.Any(path => !File.Exists(path))) throw new FileNotFoundException();
using WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true);
List<OpenXmlElement> altChunks = new();
foreach (var outsideDoc in outsideDocs)
{
var existingIds = doc.MainDocumentPart.Document.Body.Descendants<AltChunk>();
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString();
MainDocumentPart mainPart = doc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(outsideDoc, FileMode.Open))
chunk.FeedData(fileStream);
altChunks.Add(new AltChunk()
{
Id = altChunkId
});
inMemoryAltChunkIds.Add(altChunkId);
}
var body = doc.MainDocumentPart.Document.Body;
SetElementForKey(key, altChunks,
body.Descendants<Paragraph>().First(par => par.Contains(key)),
body);
}
private static void SetElementForKey(string key, List<OpenXmlElement> replacements, OpenXmlElement el, Body body)
{
List<Run> previousRuns = new();
if (el?.InnerText.Contains(key) != true) return;
for (int i = 0; i <= el.Descendants<Run>().Count(); i++)
{
var innerText = string.Join("", previousRuns.Select(r => r.InnerText));
if (innerText.Contains(key))
{
var usedRuns = GetRequiredRunsForText(previousRuns, key);
var firstRun = usedRuns.First();
MergeRunsWithKey(key, usedRuns, firstRun);
var usedRun = usedRuns.First();
var firstPart = usedRun.InnerText.IndexOf(key) != -1 ? usedRun.InnerText[..usedRun.InnerText.IndexOf(key)] : "";
ReplaceText(key, "", usedRun);
foreach (var replacement in replacements)
el.Parent.InsertAfter(replacement, el);
if (string.IsNullOrEmpty(usedRun.InnerText)) usedRun.Remove();
if (string.IsNullOrEmpty(el.InnerText)) el.Remove();
break;
}
else
{
previousRuns.Add(el.Descendants<Run>().ElementAt(i));
}
}
}
private static void MergeRunsWithKey(string key, List<Run> usedRuns, Run firstRun)
{
while (!usedRuns.First().InnerText.Contains(key))
{
AddText(usedRuns.Skip(1).First().InnerText, firstRun);
usedRuns.Skip(1).First().Remove();
usedRuns.RemoveAt(1);
}
}
private static void AddText(string newText, Run run)
{
Text text = run.Elements<Text>().LastOrDefault();
if (text == null)
{
run.Append(new Text());
text = run.Elements<Text>().Last();
}
text.Text += newText;
if (text.Text.StartsWith(" ") || text.Text.EndsWith(" "))
text.Space = SpaceProcessingModeValues.Preserve;
}
What can I do to understand where the problem lies?
I tried replacing some values I don't understand in File B with the ones from File A (the header and footer have rectangles with different gfxdata values; the version "recovered" by Word set the same values as File A).
I tried a different way of generating the AltChunkIds and storing a global list for the file.
I tried comparing various parts of the documents (File A and File B, or File B's result and its recovered version). There are differences, but too many, and none seem to be relevant.
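One further diagnostic, not mentioned in the steps above, is to run the SDK's schema validator over the generated file and list what it reports. The sketch below assumes the DocumentFormat.OpenXml package is referenced; the class and method names are made up for illustration. The validator checks the markup against the Open XML schema, so it may not flag everything that Word itself rejects.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class ValidationSketch
{
    // Opens the document read-only and returns one line per validation
    // error: the part it occurred in, the description and the XPath.
    public static List<string> GetValidationErrors(string path)
    {
        var messages = new List<string>();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();
            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                messages.Add(error.Part?.Uri + ": " + error.Description +
                             " (" + error.Path?.XPath + ")");
            }
        }
        return messages;
    }
}
```

Running it on the File A result and the File B result and comparing the two lists may narrow the search more quickly than diffing the whole packages.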

Modifying Hyperlink with openxml

I'm trying to modify a hyperlink inside a Word document. The hyperlink originally points to a bookmark in an external document. What I want to do is change it to point to an internal bookmark that has the same Anchor.
Here's the code I use. It seems to work when I watch the variables, but when I look at the saved document it is exactly like the original.
Why don't my changes persist?
// read file specified in stream
MemoryStream stream = new MemoryStream(File.ReadAllBytes("C:\\TEMPO\\smartbook\\text1.docx"));
WordprocessingDocument doc = WordprocessingDocument.Open(stream, true);
MainDocumentPart mainPart = doc.MainDocumentPart;
// The first hyperlink -- it happens to be the one I want to modify
Hyperlink hLink = mainPart.Document.Body.Descendants<Hyperlink>().FirstOrDefault();
if (hLink != null)
{
// get hyperlink's relation Id (where path stores)
string relationId = hLink.Id;
if (relationId != string.Empty)
{
// get current relation
HyperlinkRelationship hr = mainPart.HyperlinkRelationships.Where(a => a.Id == relationId).FirstOrDefault();
if (hr != null)
{
// remove current relation
mainPart.DeleteReferenceRelationship(hr);
// add new relation with relation
// mainPart.AddHyperlinkRelationship(new Uri("C:\\TEMPO\\smartbook\\test.docx"), false, relationId);
}
}
// change hyperlink attributes
hLink.DocLocation = "#";
hLink.Id = "";
hLink.Anchor = "TEST";
}
// save stream to a new file
File.WriteAllBytes("C:\\TEMPO\\smartbook\\test.docx", stream.ToArray());
doc.Close();
You have not yet saved your OpenXmlPackage when you write the stream ...
// types that implement IDisposable are better wrapped in a using statement
using(var stream = new MemoryStream(File.ReadAllBytes(@"C:\TEMPO\smartbook\text1.docx")))
{
using(var doc = WordprocessingDocument.Open(stream, true))
{
// do all your changes
// call doc.Close because that SAVES your changes to the stream
doc.Close();
}
// save stream to a new file
File.WriteAllBytes(@"C:\TEMPO\smartbook\test.docx", stream.ToArray());
}
The Close method states explicitly:
Saves and closes the OpenXml package plus all underlying part streams.
You could also set the AutoSave property to true in which case the OpenXMLPackage will be saved when Dispose is called. The using statement I used above would guarantee that that will happen.

Merged document using AltChunks has innertext empty

I am trying to merge multiple documents into a single one and then open the result document and process it further.
The "ChunkId" is a property that is increased every time this method is called in order to get a unique id. I followed the example from this site.
This is the code used to merge multiple documents (using altchunks):
`
private void MergeDocument(string mergePath, bool appendPageBreak)
{
if (!File.Exists(mergePath))
{
Log.Warn(string.Format("Document: \"{0}\" was not found.", mergePath));
return;
}
ChunkId++;
var altChunkId = "AltChunkId" + ChunkId;
var mainDocPart = DestinationDocument.MainDocumentPart;
if (mainDocPart == null)
{
DestinationDocument.AddMainDocumentPart();
mainDocPart = DestinationDocument.MainDocumentPart;
if (mainDocPart.Document == null)
mainDocPart.Document = new Document { Body = new Body() };
}
try
{
var chunk = mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
if (chunk != null)
using (var ms = new FileStream(mergePath, FileMode.Open))
{
chunk.FeedData(ms);
}
else
{
Log.Error(string.Format("Merge - Failed to create chunk document based on \"{0}\".", mergePath));
return; // failed to create chunk document, return from merge method
}
}
catch (Exception e)
{
Log.Error(string.Format("Merge - Failed to insert chunk document based on \"{0}\".", mergePath));
return; // failed to create chunk document, return from merge method
}
var altChunk = new AltChunk { Id = altChunkId };
//append the page break
if (appendPageBreak)
try
{
AppendPageBreak(mainDocPart);
Log.Info(string.Format("Successfully appended page break."));
}
catch (Exception ex)
{
Log.Error(string.Format("Error appending page break. Message: \"{0}\".", ex.Message));
return; // return if page break insertion failed
}
// insert the document
var last = mainDocPart.Document
.Body
.Elements()
.LastOrDefault(e => e is Paragraph || e is AltChunk);
try
{
if (last == null)
mainDocPart.Document.Body.InsertAt(altChunk, 0);
else
last.InsertAfterSelf(altChunk);
Log.Info(string.Format("Successfully inserted new doc \"{0}\" into destination.", mergePath));
}
catch (Exception ex)
{
Log.Error(string.Format("Error merging document \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
return; // return if the merge was not successful
}
try
{
mainDocPart.Document.Save();
}
catch (Exception ex)
{
Log.Error(string.Format("Error saving document \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
}
}`
If I open the merged document with Word I can see its content (tables, text, paragraphs...), but if I open it from code again, the inner text is "" (an empty string). I need the inner text to reflect what the document contains, because I have to replace placeholders such as "##name##" with other text, and I can't do that if the inner text is empty.
This is the innerxml of the merged document,
This is how I open the merged document:
DestinationDocument = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true);
How can I read the inner text of the document? Or how can I merge these documents into a single one so that this problem would not occur anymore?
When documents are merged with altChunks, they behave like attachments embedded in the original Word document. The client (MS Word) handles the rendering of the altChunk sections, so the resulting document does not contain the Open XML markup of the merged documents.
If you want to use the resulting document for further programmatic post-processing, use Open XML PowerTools; please refer to my answer here.
Openxml powertools - https://github.com/OfficeDev/Open-Xml-PowerTools
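To see why the inner text comes back empty, it can help to look at what the main part contains after such a merge. The sketch below is an illustration only: it parses a `document.xml` string with LINQ to XML (no Open XML SDK needed), and the class and method names are made up for the example. A body that holds nothing but `w:altChunk` placeholders has no `w:t` elements, so its text is empty.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AltChunkTextSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated text of all w:t elements in the body and the
    // number of w:altChunk placeholders found in a document.xml string.
    public static (string Text, int AltChunks) Inspect(string documentXml)
    {
        XElement body = XDocument.Parse(documentXml).Root.Element(W + "body");
        string text = string.Concat(body.Descendants(W + "t").Select(t => t.Value));
        int altChunks = body.Descendants(W + "altChunk").Count();
        return (text, altChunks);
    }
}
```

A non-zero altChunk count together with empty text indicates that the content still lives in the embedded chunk parts rather than in the main document.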
The problem is that the documents are not really merged (per se), the altChunk element only defines a place where the alternative content should be placed in the document and it has a reference to that alternative content.
When you open this document in MS Word then it will actually merge all those alternative contents automatically for you. So when you resave that document with MS Word you'll no longer have altChunk elements.
Nevertheless, what you can do is manipulate those altChunk DOCX files (the child DOCX documents) just as you do the main DOCX file (the parent document).
For example:
string destinationPath = "Sample.docx";
string search = "##name##";
string replace = "John Doe";
using (var parent = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true))
{
foreach (var altChunk in parent.MainDocumentPart.GetPartsOfType<AlternativeFormatImportPart>())
{
if (Path.GetExtension(altChunk.Uri.OriginalString) != ".docx")
continue;
using (var child = WordprocessingDocument.Open(altChunk.GetStream(), true))
{
var foundText = child.MainDocumentPart.Document.Body
.Descendants<Text>()
.Where(t => t.Text.Contains(search))
.FirstOrDefault();
if (foundText != null)
{
foundText.Text = foundText.Text.Replace(search, replace);
break;
}
}
}
}
Alternatively you'll need to use some approach to merge those documents for real. One solution is mentioned by Flowerking, another one that you could try is with GemBox.Document library. It will merge those alternative contents for you on loading (just as MS Word does when opening).
For example:
string destinationPath = "Sample.docx";
string search = "##name##";
string replace = "John Doe";
DocumentModel document = DocumentModel.Load(destinationPath);
ContentRange foundText = document.Content.Find(search).FirstOrDefault();
if (foundText != null)
foundText.LoadText(replace);
document.Save(destinationPath);

OpenXml-SDK: How to apply FontFamily/Size to AltChunk of Type [TextPlain]

Can anybody show me how to apply Fontfamily/size to an AltChunk of Type
AlternativeFormatImportPartType.TextPlain
This is my code, but I can't figure out how to do this at all (even Google doesn't help):
MainDocumentPart main = doc.MainDocumentPart;
string altChunkId = "AltChunkId" + Guid.NewGuid().ToString().Replace("-", "");
var chunk = main.AddAlternativeFormatImportPart
(AlternativeFormatImportPartType.TextPlain, altChunkId);
using (var mStream = new MemoryStream())
{
using (var writer = new StreamWriter(mStream))
{
writer.Write(value);
writer.Flush();
mStream.Position = 0;
chunk.FeedData(mStream);
}
}
var altChunk = new AltChunk();
altChunk.Id = altChunkId;
OpenXmlElement afterThat = null;
foreach (var para in main.Document.Body.Descendants<Paragraph>())
{
if (para.InnerText.Equals("Notizen:"))
{
afterThat = para;
}
}
main.Document.Body.InsertAfter(altChunk, afterThat);
If I do it this way, I get "Courier New" with a size of "10,5".
UPDATE
This is the working Solution I came up with:
Convert Plaintext to RTF, change the Fontfamily/size and apply it to the WordProcessingDocument!
public static string PlainToRtf(string value)
{
using (var rtf = new System.Windows.Forms.RichTextBox())
{
rtf.Text = value;
rtf.SelectAll();
rtf.SelectionFont = new System.Drawing.Font("Calibri", 10);
return rtf.Rtf;
}
}
var chunk = main.AddAlternativeFormatImportPart
(AlternativeFormatImportPartType.Rtf, altChunkId);
using (var mStream = new MemoryStream())
{
using (var writer = new StreamWriter(mStream))
{
var rtf = PlainToRtf(value);
writer.Write(rtf);
writer.Flush();
mStream.Position = 0;
chunk.FeedData(mStream);
}
}
//proceed with creating AltChunk and inserting it to the Document...
How to apply FontFamily/Size to AltChunk of Type [TextPlain]
I am afraid this is NOT possible, in any case, not with OpenXml SDK.
Why?
An altChunk (Anchor for Imported External Content) object is designed for importing content into the document. These objects are 'temporary': an altChunk is just a reference to external content that is incorporated "as is" into the document; when the document is later opened and saved with Word, Word converts this external content into valid Open XML content.
So you can't, for a newly created document, loop into the paragraphs in order to retrieve it and apply a style.
If you import rtf content for example, the style must be applied to rtf before importing it.
In the case of plain text (TextPlain, i.e. a .txt file), there is no style conversion, because no style is attached to a text file. You can change the font in Notepad, but it then applies to all documents: it is an application-level setting.
And I can confirm that Word creates by default a style with "Courier New 10,5" to display the content of the file. I just tested.
What can I do?
Apply the style after the document has been opened and saved with Word. Note that you will have to retrieve the paragraph(s), or you could try to retrieve the style created in the document and change the font there. This link could help to achieve this:
How to: Apply a style to a paragraph in a word processing document (Open XML SDK).
Or maybe there is a registry key that you can change to alter Word's default behavior on your computer. Even if there is, it doesn't solve the problem for a newly created document that is opened for the first time on the client.
Note from the OP:
I think a possible solution to the problem could be converting the plain text to RTF, applying the style information, and then appending it to the WordprocessingDocument as an AltChunk.
I totally agree. Just note that when he says "apply StyleInformation", it means at the RTF level.
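As an illustration of applying the style at the RTF level without a WinForms `RichTextBox`, here is a minimal sketch that builds the RTF by hand. It is a swapped-in technique, not the code from the update above: the class name and parameters are hypothetical, it assumes a simple font name with no special characters, and how Word renders the imported chunk should be verified.

```csharp
using System.Text;

public static class RtfSketch
{
    // Builds a minimal RTF document that renders plain text in one font.
    // RTF's \fs control word takes half-points, hence fontSizePt * 2.
    public static string PlainToRtf(string value, string fontName, int fontSizePt)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\rtf1\ansi\deff0{\fonttbl{\f0 ").Append(fontName).Append(";}}");
        sb.Append(@"\f0\fs").Append(fontSizePt * 2).Append(' ');

        foreach (char c in value.Replace("\r\n", "\n"))
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);                  // escape RTF syntax characters
            else if (c == '\n')
                sb.Append(@"\par ");                        // line break becomes a paragraph break
            else if (c > 127)
                sb.Append(@"\u").Append(unchecked((short)c)).Append('?'); // signed 16-bit Unicode escape
            else
                sb.Append(c);
        }
        return sb.Append('}').ToString();
    }
}
```

The returned string contains only 7-bit ASCII, so it can be written to the stream and fed to a part of type `AlternativeFormatImportPartType.Rtf` in the same way as in the update above, for example with `PlainToRtf(value, "Calibri", 10)`.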

OpenXml Sdk - Copy Sections of docx into another docx

I am trying the following code. It takes a fileName (docx file with many sections) and I try to iterate through each section getting the section name. The problem is that I end up with unreadable docx files. It does not error, but I think I am doing something wrong with getting the elements in the section.
public void Split(string fileName) {
using (WordprocessingDocument myDoc =
WordprocessingDocument.Open(fileName, true)) {
string curCliCode = "";
MainDocumentPart mdp = myDoc.MainDocumentPart;
foreach (var element in mdp.Document.Body.ChildElements) {
if (element.Descendants().OfType<SectionProperties>().Count() == 1) {
//get the name of the section from the footer
var footer = (FooterPart) mdp.GetPartById(
element.Descendants().OfType<SectionProperties>().First().OfType
<FooterReference>().First().
Id.Value);
foreach (Paragraph p in footer.Footer.ChildElements.OfType<Paragraph>()) {
if (p.InnerText != "") {
curCliCode = p.InnerText;
}
}
if (curCliCode != "") {
var forFile = new List<OpenXmlElement>();
var els = element.ElementsBefore();
if (els != null) {
foreach (var e in els) {
if (e != null) {
forFile.Add(e);
}
}
for (int i = 0; i < els.Count(); i++) {
els.ElementAt(i).Remove();
}
}
Create(curCliCode, forFile);
}
}
}
}
}
private void Create(string cliCode,IEnumerable<OpenXmlElement> docParts) {
var parts = from e in docParts select e.Clone();
const string template = @"\Test\toSplit\blank.docx";
string destination = string.Format(@"\Test\{0}.docx", cliCode);
File.Copy(template, destination,true);
/* Create the package and main document part */
using (WordprocessingDocument myDoc =
WordprocessingDocument.Open(destination, true)) {
MainDocumentPart mainPart = myDoc.MainDocumentPart;
/* Create the contents */
foreach(var part in parts) {
mainPart.Document.Body.Append((OpenXmlElement)part);
}
/* Save the results and close */
mainPart.Document.Save();
myDoc.Close();
}
}
Does anyone know what the problem could be (or how to properly copy a section from one document to another)?
I've done some work in this area, and what I have found invaluable is diffing a known good file with a prospective file; the error is usually fairly obvious.
What I would do is take a file that you know works, and copy all of the sections into the template. Theoretically, the two files should be identical. Run a diff on the document.xml inside each docx file, and you'll see the difference.
BTW, I'm assuming that you know that the docx is actually a zip; change the extension to "zip", and you'll be able to get at the actual xml files which compose the format.
As far as diff tools, I use Beyond Compare from Scooter Software.
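Since a docx is a zip archive, the XML can also be pulled out in code rather than by renaming the file. The following is a small sketch using only `System.IO.Compression`; the class and method names are made up for the example, and it assumes the main part lives at the conventional `word/document.xml` location, which is typical but not guaranteed.

```csharp
using System.IO;
using System.IO.Compression;

public static class DocxZipSketch
{
    // Reads word/document.xml out of a .docx without renaming it to .zip.
    public static string ReadDocumentXml(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new FileNotFoundException("word/document.xml not found in " + docxPath);

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

Writing the result for the known-good file and the suspect file to two text files gives the diff tool something to compare directly.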
An approach along the lines of what you are doing will work only for simple documents (i.e. those not containing images, hyperlinks, comments, etc.). To handle more complex documents, take a look at http://blogs.msdn.com/b/ericwhite/archive/2009/02/05/move-insert-delete-paragraphs-in-word-processing-documents-using-the-open-xml-sdk.aspx and the resulting DocumentBuilder API (part of the PowerTools for Open XML project on CodePlex).
In order to split a docx into sections using DocumentBuilder, you'll still need to first find the index of the paragraphs containing sectPr elements.
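Finding those indexes could look roughly like the sketch below. It works on the raw `document.xml` text with LINQ to XML rather than through DocumentBuilder, and the class and method names are hypothetical. It returns the zero-based positions, among the direct children of `w:body`, of every element that carries a `w:sectPr`, which includes the final body-level `w:sectPr` itself.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SectionIndexSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Each returned index marks the last body child of one section:
    // a paragraph whose properties hold a w:sectPr, or the trailing
    // w:sectPr that closes the last section of the document.
    public static List<int> FindSectionBreakIndexes(string documentXml)
    {
        XElement body = XDocument.Parse(documentXml).Root.Element(W + "body");
        return body.Elements()
            .Select((element, index) => new { element, index })
            .Where(x => x.element.DescendantsAndSelf(W + "sectPr").Any())
            .Select(x => x.index)
            .ToList();
    }
}
```

Consecutive indexes then delimit the ranges of body children that belong to each section.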
