Write Word form fields with DocumentFormat.OpenXML SDK - C#

I am writing an application that should use the DocumentFormat.OpenXML SDK to write data to form fields in a Word template, but I cannot find a property on the SDK's document object where the form fields are stored.
I tried this code:
using (WordprocessingDocument document = WordprocessingDocument.Open("Path/To/document.dotx", true))
{
    document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
    MainDocumentPart mainPart = document.MainDocumentPart;
    var fields = mainPart.Document.Body.Descendants<FormFieldData>();
    foreach (var field in fields)
    {
        if (field.GetType() == typeof(FormFieldData))
        {
            if (field.LocalName == "Name")
            {
                Console.WriteLine("Hi!");
            }
        }
    }
}
But fields is always null.

You can do that by replacing this line:
if (field.LocalName == "Name")
with this one:
if (((FormFieldName)field.FirstChild).Val.InnerText.Equals("Name"))
Besides, you can use the following code to put text inside the form field element, using the function SetFormFieldValue provided in another SO answer:
if (((FormFieldName)field.FirstChild).Val.InnerText.Equals("Name"))
{
    TextInput text = field.Descendants<TextInput>().First();
    SetFormFieldValue(text, "Put some text inside the field");
}
See "Write data into TextInput elements in docx documents with OpenXML 2.5" for the implementation of SetFormFieldValue.
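The reason the LocalName comparison never matches is that the field's name is not the element name: it is stored in the w:val attribute of the w:name child of w:ffData. Here is a minimal sketch of that lookup on the raw document XML, using only System.Xml.Linq rather than the Open XML SDK. FormFieldSketch and FindFormFields are names made up for this illustration, and the document XML is assumed to be loaded into an XDocument already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FormFieldSketch
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every w:ffData element whose w:name child carries the given w:val.
    // Hypothetical helper for illustration; not part of any SDK.
    public static List<XElement> FindFormFields(XDocument documentXml, string fieldName)
    {
        return documentXml
            .Descendants(W + "ffData")
            .Where(ff => (string)ff.Element(W + "name")?.Attribute(W + "val") == fieldName)
            .ToList();
    }
}
```

Calling FormFieldSketch.FindFormFields(xml, "Name") gives back the matching w:ffData elements. Unlike the FirstChild cast in the answer above, it does not depend on w:name being the first child.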

Is it possible that your document is using Custom Properties to fill the field of the form? Try to have a look at this MSDN page that explains how to read and manipulate custom properties.

Related

Replace text in docx file with content of another docx file

I'm trying to use OpenXml to replace the text "Veteran" in file A.docx with the content of B.docx. If B.docx contains text or a paragraph, it works fine and I get a modified A.docx file.
However, if B.docx contains a table, then the code doesn't work.
static void Main(string[] args)
{
SearchAndReplace(@"C:\A.docx", @"C:\B.docx");
}
public static void SearchAndReplace(string docTo, string docFrom)
{
List<WordprocessingDocument> docList = new List<WordprocessingDocument>();
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docTo, true))
using (WordprocessingDocument wordDoc1 = WordprocessingDocument.Open(docFrom, true))
{
var parts = wordDoc1.MainDocumentPart.Document.Descendants().FirstOrDefault();
docList.Add(wordDoc);
docList.Add(wordDoc1);
if (parts != null)
{
foreach (var node in parts.ChildElements)
{
if (node is Table)
{
ParseTable(docList, (Table)node, textBuilder);
}
}
}
}
}
public static void ParseText(List<WordprocessingDocument> wpd, Paragraph node, StringBuilder textBuilder)
{
Body body = wpd[0].MainDocumentPart.Document.Body;
Body body1 = wpd[1].MainDocumentPart.Document.Body;
string content = body1.InnerXml;
var paras = body.Elements<Paragraph>();
foreach (var para in paras)
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("Veteran"))
{
run.InnerXml.Replace(run.InnerXml, content);
break;
}
}
}
}
}
public static void ParseTable(List<WordprocessingDocument> wpd, Table node, StringBuilder textBuilder)
{
foreach (var row in node.Descendants<TableRow>())
{
textBuilder.Append("| ");
foreach (var cell in row.Descendants<TableCell>())
{
foreach (var para in cell.Descendants<Paragraph>())
{
ParseText(wpd, para, textBuilder);
}
textBuilder.Append(" | ");
}
textBuilder.AppendLine("");
}
}
}
}
How can I make this work? Is there a better way to replace content with the content of another docx file?
Not having enough detail for a specific answer, here's how you solve such problems in general:
Ensure you understand the Open XML specification and valid Open XML markup on an appropriate level of detail.
If you don't understand what w:document, w:body, w:p, w:r, w:t, w:tbl, etc. are and how they relate to each other, you have no chance.
You must look at actual Open XML markup, e.g., using the Open XML Productivity Tool or the Open XML Package Editor for Modern Visual Studios to get to an appropriate level of understanding and develop Open XML-based solutions.
Understand that most Open XML-related code transforms some source markup into some target markup. Therefore, you must:
understand the source and target markup first and then
define the transformation required to create the target from the source.
Depending on what you need to do, the Open XML Productivity Tool can help create the transforming code. If you have a source and target document, you can use the Productivity Tool to compare those documents. This shows the difference in the markup, so you see what markup is created, deleted, or changed. It even shows you the Open XML SDK-based code required to effect the change.
In my own use cases, I typically prefer to write recursive, pure functional transformations. While you need to wrap your head around the concept, this is an extremely powerful approach.
In your case, you should:
take a few representative, manually-created samples of source (A.docx with "Veteran" still to be replaced) and target (A.docx with "Veteran" replaced as desired) documents;
look at the Open XML markup of the source and target documents; and
write code that creates the target markup.
Once you have created code that at least tries to create valid target Open XML markup, you could come back with further questions in case you identify further issues.
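As one possible shape for such a transformation, here is a rough sketch that works on the raw w:document XML with System.Xml.Linq. It replaces each body-level paragraph containing the marker text with copies of the block-level children (w:p, w:tbl, and so on) of the other document's body. ReplaceSketch and ReplaceParagraphs are hypothetical names. The sketch assumes both main document parts are already loaded as XDocument instances. It deliberately ignores styles, numbering, images and other relationships, which a real merge would have to carry over.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ReplaceSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every body-level paragraph of `target` whose text contains `marker`
    // with copies of the block-level children of `source`'s body.
    // Returns the number of paragraphs replaced.
    public static int ReplaceParagraphs(XDocument target, XDocument source, string marker)
    {
        XElement targetBody = target.Root?.Element(W + "body");
        XElement sourceBody = source.Root?.Element(W + "body");
        if (targetBody == null || sourceBody == null) return 0;

        // Section properties describe the source document's layout; skip them.
        var blocks = sourceBody.Elements()
            .Where(e => e.Name != W + "sectPr")
            .ToList();

        // Concatenate all w:t values so a marker split across runs is still found.
        var hits = targetBody.Elements(W + "p")
            .Where(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)).Contains(marker))
            .ToList(); // materialize before mutating the tree

        foreach (XElement paragraph in hits)
        {
            // new XElement(b) makes a deep copy, so the source document stays intact.
            paragraph.ReplaceWith(blocks.Select(b => new XElement(b)));
        }
        return hits.Count;
    }
}
```

Because the replacement happens at the paragraph level rather than inside a run, a table from B.docx ends up as a child of w:body, which is a place where w:tbl is valid.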

How to access OpenXML content by page number?

Using OpenXML, can I read the document content by page number?
wordDocument.MainDocumentPart.Document.Body gives content of full document.
public void OpenWordprocessingDocumentReadonly()
{
    string filepath = @"C:\...\test.docx";
    // Open a WordprocessingDocument based on a filepath.
    using (WordprocessingDocument wordDocument =
        WordprocessingDocument.Open(filepath, false))
    {
        // Assign a reference to the existing document body.
        Body body = wordDocument.MainDocumentPart.Document.Body;
        int pageCount = 0;
        if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
        {
            pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
        }
        for (int i = 1; i <= pageCount; i++)
        {
            //Read the content by page number
        }
    }
}
MSDN Reference
Update 1:
It looks like page breaks are stored as below:
<w:p w:rsidR="003328B0" w:rsidRDefault="003328B0">
<w:r>
<w:br w:type="page" />
</w:r>
</w:p>
So now I need to split the XML on the above element and take the InnerText of each part, which will give me the text page by page.
Now the question becomes: how can I split the XML on that element?
Update 2:
Page break elements are present only where the document has explicit page breaks. If text simply flows from one page onto the next, no page break element is set, so it comes back to the same challenge: how to identify the page separations.
You cannot reference OOXML content via page numbering at the OOXML data level alone.
Hard page breaks are not the problem; hard page breaks can be counted.
Soft page breaks are the problem. These are calculated according to line break and pagination algorithms which are implementation dependent; it is not intrinsic to the OOXML data. There is nothing to count.
What about w:lastRenderedPageBreak, which is a record of the position of a soft page break at the time the document was last rendered? No, w:lastRenderedPageBreak does not help in general either because:
By definition, w:lastRenderedPageBreak position is stale when content has been changed since last opened by a program that paginates its content.
In MS Word's implementation, w:lastRenderedPageBreak is known to be unreliable in various circumstances including
when table spans two pages
when next page starts with an empty paragraph
for multi-column layouts with text boxes starting a new column
for large images or long sequences of blank lines
If you're willing to accept a dependence on Word Automation, with all of its inherent licensing and server operation limitations, then you have a chance of determining page boundaries, page numberings, page counts, etc.
Otherwise, the only real answer is to move beyond page-based referencing frameworks that are dependent upon proprietary, implementation-specific pagination algorithms.
This is how I ended up doing it.
public void OpenWordprocessingDocumentReadonly()
{
    string filepath = @"C:\...\test.docx";
    // Open a WordprocessingDocument based on a filepath.
    Dictionary<int, string> pageviseContent = new Dictionary<int, string>();
    int pageCount = 0;
    using (WordprocessingDocument wordDocument =
        WordprocessingDocument.Open(filepath, false))
    {
        // Assign a reference to the existing document body.
        Body body = wordDocument.MainDocumentPart.Document.Body;
        if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
        {
            pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
        }
        int i = 1;
        StringBuilder pageContentBuilder = new StringBuilder();
        foreach (var element in body.ChildElements)
        {
            // No page break inside this element: its text belongs to the current page.
            if (element.InnerXml.IndexOf("<w:br w:type=\"page\" />", StringComparison.OrdinalIgnoreCase) < 0)
            {
                pageContentBuilder.Append(element.InnerText);
            }
            else
            {
                // Page break found: store the collected text and start the next page.
                pageviseContent.Add(i, pageContentBuilder.ToString());
                i++;
                pageContentBuilder = new StringBuilder();
            }
            if (body.LastChild == element && pageContentBuilder.Length > 0)
            {
                pageviseContent.Add(i, pageContentBuilder.ToString());
            }
        }
    }
}
Downside: This won't work in all scenarios. It works only where there is an explicit page break; if text simply runs over from page 1 to page 2, nothing in the markup tells you that you are on page two.
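A variant of the same idea that matches the w:br element itself instead of searching the serialized XML string (which depends on exact spacing and attribute order) could look roughly like this. It is only a sketch on the raw document XML using System.Xml.Linq. PageSplitSketch and SplitAtHardPageBreaks are made-up names, and the same limitation applies: soft page breaks are not detected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class PageSplitSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Splits the body text into chunks at hard page breaks (<w:br w:type="page"/>).
    // Soft page breaks are not stored in the markup, so they are not detected.
    public static List<string> SplitAtHardPageBreaks(XDocument documentXml)
    {
        var pages = new List<string>();
        var current = new StringBuilder();
        XElement body = documentXml.Root?.Element(W + "body");
        if (body == null) return pages;

        // Descendants() walks the tree in document order.
        foreach (XElement element in body.Descendants())
        {
            if (element.Name == W + "t")
            {
                current.Append(element.Value);
            }
            else if (element.Name == W + "br" &&
                     (string)element.Attribute(W + "type") == "page")
            {
                // Hard page break: close the current chunk and start a new one.
                pages.Add(current.ToString());
                current.Clear();
            }
        }
        pages.Add(current.ToString()); // text after the last break
        return pages;
    }
}
```

Text that sits in the same paragraph as the break is kept and assigned to the chunk before or after it, depending on its position.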
Unfortunately, as the answers to "Why only some page numbers stored in XML of docx file?" explain, a docx file does not contain reliable page numbers. The XML files carry no page numbers; those exist only once Microsoft Word opens the document and renders it dynamically. Reading the Open XML documentation, such as https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.pagenumber?view=openxml-2.8.1, does not change that.
You can unzip a few docx files and search for "page" or "pg" to see this for yourself. I did this with different kinds of docx files in my situation, and they all told me the same thing. Glad if this helps.
List<Paragraph> Allparagraphs = wp.MainDocumentPart.Document.Body.OfType<Paragraph>().ToList();
List<Paragraph> PageParagraphs = Allparagraphs.Where(x => x.Descendants<LastRenderedPageBreak>().Count() == 1).Select(x => x).Distinct().ToList();
Rename the .docx file to .zip and open the docProps\app.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal</Template>
<TotalTime>0</TotalTime>
<Pages>1</Pages>
<Words>141</Words>
<Characters>809</Characters>
<Application>Microsoft Office Word</Application>
<DocSecurity>0</DocSecurity>
<Lines>6</Lines>
<Paragraphs>1</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<HeadingPairs>
<vt:vector size="2" baseType="variant">
<vt:variant>
<vt:lpstr>Название</vt:lpstr>
</vt:variant>
<vt:variant>
<vt:i4>1</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
<TitlesOfParts>
<vt:vector size="1" baseType="lpstr">
<vt:lpstr/>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>949</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0000</AppVersion>
</Properties>
The OpenXML library reads wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text from the <Pages>1</Pages> property. These properties are created only by the Word application (winword). If the document has been changed since Word last saved it, wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text is out of date. If the document was created programmatically, wordDocument.ExtendedFilePropertiesPart is often null.
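Since the part or the element can be missing, it may be safer to read the value defensively. A small sketch that reads the cached count straight from the app.xml content shown above, using System.Xml.Linq (AppPropertiesSketch and ReadCachedPageCount are hypothetical names):

```csharp
using System;
using System.Xml.Linq;

public static class AppPropertiesSketch
{
    // Namespace of docProps/app.xml, as seen in the sample above.
    static readonly XNamespace Ep =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Returns the cached page count, or null when <Pages> is missing or not a number.
    public static int? ReadCachedPageCount(XDocument appXml)
    {
        string raw = appXml.Root?.Element(Ep + "Pages")?.Value;
        return int.TryParse(raw, out int pages) ? pages : (int?)null;
    }
}
```

A null result means "unknown", which is different from a document with zero pages. Even a non-null value is only what the last application to save the file wrote there.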

Read bookmarks in outlook MSG file with C#

My goal is to read the bookmarks in an Outlook .msg file and then replace them with different text. I want to do this with C#.
I know how to access the body and change the text, but I was wondering whether there is a way to directly access the list of all bookmarks and their locations, so that I can easily replace them instead of going through the whole body text, splitting it up, and so on.
Edit: this is what the bookmark window looks like. From this window one can assign bookmarks, but it should be possible to obtain this list via C#.
Any relevant info is appreciated.
Thanks in advance.
Since Outlook most often uses Word as its body editor, you need to add a project reference to Microsoft.Office.Interop.Word.dll and then access the Outlook Inspector's WordEditor during the Inspector.Activate event. Once you have access to the Word.Document, it's trivial to load the Bookmarks and access or modify their values.
Outlook.Inspector inspector = Globals.ThisAddIn.Application.ActiveInspector();
((Outlook.InspectorEvents_10_Event)inspector).Activate += () =>
{
    // validation to ensure we are using Word Editor
    if (inspector.EditorType == Outlook.OlEditorType.olEditorWord && inspector.IsWordMail())
    {
        Word.Document wordDoc = inspector.WordEditor as Word.Document;
        if (wordDoc != null)
        {
            var bookmarks = wordDoc.Bookmarks;
            foreach (Word.Bookmark item in bookmarks)
            {
                string name = item.Name; // bookmark name
                Word.Range bookmarkRange = item.Range; // bookmark range
                string bookmarkText = bookmarkRange.Text; // bookmark text
                item.Select(); // triggers bookmark selection
            }
        }
    }
};

Accessing CustomXMLPart in document included using INCLUDETEXT field

I have a docx Word document that contains Content Controls bound to data in a CustomXMLPart.
This document (or bookmarks therein) is then included in another Word document by using INCLUDETEXT.
When the first document is included into the second is there any way of getting the CustomXMLPart from the original document (I already have a VSTO Word Addin running in Word looking at the document)?
What I want to do is merge it with the CustomXMLParts already present in the second document so that the Content Controls are still bound to the data in the XMLPart.
Alternatively, is there another way to do this without using the INCLUDETEXT field?
I decided this probably wasn't possible using VSTO and IncludeText fields and investigated using altChunks as an alternative.
I was already doing some processing on the file using the Open XML SDK 2 before opening it, so I could do the extra work required to merge the documents together there.
Although using the altChunk method embeds the whole second document in the first, including its own CustomXmlParts, the CustomXmlParts are discarded by Word when the document is opened and the second merged with the first.
I ended up with code similar to the following. It replaces defined Content Controls with altChunk data and merges specific CustomXmlParts together.
private static void CreateAltChunksInWordDocument(WordprocessingDocument doc, string externalDocumentPath)
{
    // .ToList() is needed: updating the document inside the loop would otherwise stop the enumeration
    foreach (var control in doc.ContentControls().ToList())
    {
        SdtProperties props = control.Elements<SdtProperties>().FirstOrDefault();
        if (props == null)
            continue;
        SdtAlias alias = props.Elements<SdtAlias>().FirstOrDefault();
        if (alias == null || !alias.Val.HasValue || alias.Val.Value != "External Template")
            continue;
        using (WordprocessingDocument externaldoc = WordprocessingDocument.Open(externalDocumentPath, false))
        {
            //Replace the Content Control with an AltChunk section, and stream in the external file
            string altChunkId = "AltChunkId" + Guid.NewGuid().ToString().Replace("{", "").Replace("}", "").Replace("-", "");
            AlternativeFormatImportPart chunk = doc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            // Dispose the file stream once the data has been copied into the part
            using (FileStream externalStream = File.OpenRead(externalDocumentPath))
            {
                chunk.FeedData(externalStream);
            }
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            OpenXmlElement parent = control.Parent;
            parent.InsertAfter(altChunk, control);
            control.Remove();
            XDocument xDocMain;
            CustomXmlPart partMain = MyCommon.GetMyXmlPart(doc.MainDocumentPart, out xDocMain);
            XDocument xDocExternal;
            CustomXmlPart partExternal = MyCommon.GetMyXmlPart(externaldoc.MainDocumentPart, out xDocExternal);
            if (xDocMain != null && partMain != null && xDocExternal != null && partExternal != null)
            {
                MyCommon.MergeXmlPartFields(xDocMain, xDocExternal);
                //Save the updated part; FileMode.Create truncates it so no stale bytes remain after a shorter write
                using (Stream outputStream = partMain.GetStream(FileMode.Create, FileAccess.Write))
                {
                    using (StreamWriter ts = new StreamWriter(outputStream))
                    {
                        ts.Write(xDocMain.ToString());
                    }
                }
            }
        }
    }
}

Programmatically filling content controls in Word document (OpenXML) in .NET

I have a really simple Word document with Content Controls (all text).
I want to loop through the controls, filling them with values from a dictionary. Should be super simple, but something is wrong:
var myValues = new Dictionary<string, string>(); //And fill it
using (var wordDoc = WordprocessingDocument.Open(outputFile, true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    foreach (SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
    {
        SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
        if (alias != null)
        {
            string sdtTitle = alias.Val.Value;
            sdt.??? = myValues[sdtTitle];
        }
    }
    mainPart.Document.Save();
}
How do I write my value into the document?
Do I need a CustomXmlPart?
If you are going to do something like that, you'll need to write suitable content into the Sdt's SdtContent: a paragraph or a run or a tc etc depending on the sdt's parent element.
The alternative is to put the contents of your dictionary into a CustomXml part, and set up databindings on each content control which refer to the relevant dictionary element. Word will then resolve the bindings when the docx is first opened (which is not much good to you if you expect your users to open it with something else).
You can use this code.
foreach (SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
{
    SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
    if (alias != null)
    {
        string sdtTitle = alias.Val.Value;
        Text t = sdt.Descendants<Text>().First();
        t.Text = "test";
    }
}
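To tie this back to the dictionary in the question, here is a rough sketch of the same idea on the raw document XML with System.Xml.Linq. For every content control whose alias is a key in the dictionary, it writes the value into the first w:t inside w:sdtContent and empties any further w:t elements, so that leftover placeholder text does not remain. ContentControlSketch and FillContentControls are hypothetical names. The sketch assumes the control already contains at least one run with text (for example its placeholder) and does not handle nested content controls specially.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Fills each content control whose alias (title) is a key in `values`.
    // Returns the number of controls that were filled.
    public static int FillContentControls(XDocument documentXml, IDictionary<string, string> values)
    {
        int filled = 0;
        foreach (XElement sdt in documentXml.Descendants(W + "sdt").ToList())
        {
            string title = (string)sdt.Element(W + "sdtPr")?.Element(W + "alias")?.Attribute(W + "val");
            // Skip controls without a title or without a matching dictionary entry.
            if (title == null || !values.TryGetValue(title, out string value)) continue;

            List<XElement> texts = sdt.Element(W + "sdtContent")?.Descendants(W + "t").ToList();
            if (texts == null || texts.Count == 0) continue; // nothing to write into

            texts[0].Value = value;
            foreach (XElement extra in texts.Skip(1)) extra.Value = string.Empty;
            filled++;
        }
        return filled;
    }
}
```

Using TryGetValue instead of the indexer means a control whose title is not in the dictionary is simply skipped rather than causing a KeyNotFoundException.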
