Read bookmarks in outlook MSG file with C#

Read bookmarks in outlook MSG file with C# - c#

My goal is to somehow be able to read bookmarks in an outlook .msg file, then replace them with a different text. I want to do this with C#.
I know how to access the body and change the text, but was wondering if there was a way to access directly the list of all the bookmarks and its location so that i can easily replace them, instead going through the whole body text, splitting it up, etc etc...
edit: this is how a bookmark window looks like from this window one can assign bookmarks, but it should be possible to obtain this list via c#.
Any relevant info is appreciated.
Thanks in advance.

Since Outlook most often uses Word as it's body editor - you need to add a project reference to Microsoft.Office.Interop.Word.dll and then access to the Outlook Inspector's WordEditor during the Inspector.Activate event. Once you have access to the Word.Document - it's trivial to load up the Bookmarks and access/modify their values.
Outlook.Inspector inspector = Globals.ThisAddIn.Application.ActiveInspector();
((Outlook.InspectorEvents_10_Event)inspector).Activate += () =>
{ // validation to ensure we are using Word Editor
if (inspector.EditorType == Outlook.OlEditorType.olEditorWord && inspector.IsWordMail())
{
Word.Document wordDoc = inspector.WordEditor as Word.Document;
if (wordDoc != null)
{
var bookmarks = wordDoc.Bookmarks;
foreach (Word.Bookmark item in bookmarks)
{
string name = item.Name; // bookmark name
Word.Range bookmarkRange = item.Range; // bookmark range
string bookmarkText = bookmarkRange.Text; // bookmark text
item.Select(); // triggers bookmark selection
}
}
}
};

Related

C# word document

I want to read a word document without opening it. Then replace some text in it and then save it with another name in the same format using C#. But the document is not being saved with the changes but int the same way as the original document. Any help will be appreciated. Thanks in advance.
string path = Server.MapPath("~/CustomerDocument/SampleNDA5.docx");
WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, false);
{
var body = wordprocessingDocument.MainDocumentPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
if (text.Text.Contains("<var_Date>"))
{
text.Text = text.Text.Replace("<var_Date>", DateTime.Now.ToString("MMM dd,yyyy"));
}
}
wordprocessingDocument.SaveAs(Server.MapPath("~/CustomerDocument/SampleNDA10.docx"));
wordprocessingDocument.Close();
}

I think this should help. It explains how to put edits back into a collection.
https://www.codeproject.com/Articles/87616/List-T-ForEach-or-Foreach-It-Doesn-t-Matter-Or-Doe

So if your word template is the same each time you essentially
Copy The Template
Work On The Template
Save In Desired Format
Delete Template Copy
Each of the sections that you are replacing within your word document you have to insert a bookmark for that location (easiest way to input text in an area).
I always create a function to accomplish this, and I end up passing in the path - as well as all of the text to replace my in-document bookmarks. The function call can get long sometimes, but it works for me.
Application app = new Application();
Document doc = app.Documents.Open("sDocumentCopyPath.docx");
if (doc.Bookmarks.Exists("bookmark_1"))
{
object oBookMark = "bookmark_1";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text =
"My Text To Replace bookmark_1";
}
if (doc.Bookmarks.Exists("bookmark_2"))
{
object oBookMark = "bookmark_2";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text =
"My Text To Replace bookmark_2";
}
doc.ExportAsFixedFormat("myNewPdf.pdf", WdExportFormat.wdExportFormatPDF);
((_Document)doc).Close();
((_Application)app).Quit();
This code should get you up and running unless you want to pass in all the values into a function.
If you need some more explanation I can help as well :) my example saves it as a .pdf, but you can do any format you prefer.

How to access OpenXML content by page number?

Using OpenXML, can I read the document content by page number?
wordDocument.MainDocumentPart.Document.Body gives content of full document.
public void OpenWordprocessingDocumentReadonly()
{
string filepath = #"C:\...\test.docx";
// Open a WordprocessingDocument based on a filepath.
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Open(filepath, false))
{
// Assign a reference to the existing document body.
Body body = wordDocument.MainDocumentPart.Document.Body;
int pageCount = 0;
if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
{
pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
}
for (int i = 1; i <= pageCount; i++)
{
//Read the content by page number
}
}
}
MSDN Reference
Update 1:
it looks like page breaks are set as below
<w:p w:rsidR="003328B0" w:rsidRDefault="003328B0">
<w:r>
<w:br w:type="page" />
</w:r>
</w:p>
So now I need to split the XML with above check and take InnerTex for each, that will give me page vise text.
Now question becomes how can I split the XML with above check?
Update 2:
Page breaks are set only when you have page breaks, but if text is floating from one page to other pages, then there is no page break XML element is set, so it revert back to same challenge how o identify the page separations.

You cannot reference OOXML content via page numbering at the OOXML data level alone.
Hard page breaks are not the problem; hard page breaks can be counted.
Soft page breaks are the problem. These are calculated according to
line break and pagination algorithms which are implementation
dependent; it is not intrinsic to the OOXML data. There is nothing
to count.
What about w:lastRenderedPageBreak, which is a record of the position of a soft page break at the time the document was last rendered? No, w:lastRenderedPageBreak does not help in general either because:
By definition, w:lastRenderedPageBreak position is stale when content has
been changed since last opened by a program that paginates its
content.
In MS Word's implementation, w:lastRenderedPageBreak is known to be unreliable in various circumstances including
when table spans two pages
when next page starts with an empty paragraph
for
multi-column layouts with text boxes starting a new column
for
large images or long sequences of blank lines
If you're willing to accept a dependence on Word Automation, with all of its inherent licensing and server operation limitations, then you have a chance of determining page boundaries, page numberings, page counts, etc.
Otherwise, the only real answer is to move beyond page-based referencing frameworks that are dependent upon proprietary, implementation-specific pagination algorithms.

This is how I ended up doing it.
public void OpenWordprocessingDocumentReadonly()
{
string filepath = #"C:\...\test.docx";
// Open a WordprocessingDocument based on a filepath.
Dictionary<int, string> pageviseContent = new Dictionary<int, string>();
int pageCount = 0;
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Open(filepath, false))
{
// Assign a reference to the existing document body.
Body body = wordDocument.MainDocumentPart.Document.Body;
if (wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text != null)
{
pageCount = Convert.ToInt32(wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text);
}
int i = 1;
StringBuilder pageContentBuilder = new StringBuilder();
foreach (var element in body.ChildElements)
{
if (element.InnerXml.IndexOf("<w:br w:type=\"page\" />", StringComparison.OrdinalIgnoreCase) < 0)
{
pageContentBuilder.Append(element.InnerText);
}
else
{
pageviseContent.Add(i, pageContentBuilder.ToString());
i++;
pageContentBuilder = new StringBuilder();
}
if (body.LastChild == element && pageContentBuilder.Length > 0)
{
pageviseContent.Add(i, pageContentBuilder.ToString());
}
}
}
}
Downside: This wont work in all scenarios. This will work only when you have a page break, but if you have text extended from page 1 to page 2, there is no identifier to know you are in page two.

Unfortunately, As Why only some page numbers stored in XML of docx file? answers, docx dose not contains reliable page number service. Xml files carry no page number, until microsoft Word open it and render dynamically. Even you read openxml documents like https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.pagenumber?view=openxml-2.8.1 .
You can unzip some docx files, and search "page" or "pg". Then you will know it. I do this on different kinds of docx files in my situation. All tell me the same truth. Glad if this helps.

List<Paragraph> Allparagraphs = wp.MainDocumentPart.Document.Body.OfType<Paragraph>().ToList();
List<Paragraph> PageParagraphs = Allparagraphs.Where (x=>x.Descendants<LastRenderedPageBreak>().Count() ==1) .Select(x => x).Distinct().ToList();

Rename docx to zip.
Open docProps\app.xml file. :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal</Template>
<TotalTime>0</TotalTime>
<Pages>1</Pages>
<Words>141</Words>
<Characters>809</Characters>
<Application>Microsoft Office Word</Application>
<DocSecurity>0</DocSecurity>
<Lines>6</Lines>
<Paragraphs>1</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<HeadingPairs>
<vt:vector size="2" baseType="variant">
<vt:variant>
<vt:lpstr>Название</vt:lpstr>
</vt:variant>
<vt:variant>
<vt:i4>1</vt:i4>
</vt:variant>
</vt:vector>
</HeadingPairs>
<TitlesOfParts>
<vt:vector size="1" baseType="lpstr">
<vt:lpstr/>
</vt:vector>
</TitlesOfParts>
<Company/>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>949</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>14.0000</AppVersion>
</Properties>
OpenXML lib reads wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text from <Pages>1</Pages> property . This properies are created only by winword application. if word document changed wordDocument.ExtendedFilePropertiesPart.Properties.Pages.Text is not actual. if word document created programmatically the wordDocument.ExtendedFilePropertiesPart is offten null.

Write word form fields with DocumentFormat.OpenXML SDK

I am writing an application which should use DocumentFormat.OpenXML SDK for writing data to form fields in a word template. But I cannot find a property in the document-object of the SDK where the form fields are stored.
I tried this code:
using (WordprocessingDocument document = WordprocessingDocument.Open("Path/To/document.dotx", true))
{
document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = document.MainDocumentPart;
var fields = mainPart.Document.Body.Descendants<FormFieldData>();
foreach (var field in fields)
{
if (field.GetType() == typeof(FormFieldData))
{
if (field.LocalName == "Name")
{
Console.WriteLine("Hi!");
}
}
}
}
But fields is always null.

You can do that by replacing this line:
if (field.LocalName == "Name")
with this one:
if (((FormFieldName)field.FirstChild).Val.InnerText.Equals("Name"))
Besides, you can use the following code to put a text inside the form field element using the function SetFormFieldValue provided in the another SO answer:
if (((FormFieldName)field.FirstChild).Val.InnerText.Equals("Name"))
{
TextInput text = field.Descendants<TextInput>().First();
SetFormFieldValue(text, "Put some text inside the field");
}
See Write data into TextInput elements in docx documents with OpenXML 2.5 for the implementation of SetFormFieldValue

Is it possible that your document is using Custom Properties to fill the field of the form? Try to have a look at this MSDN page that explains how to read and manipulate custom properties.

Word Interop Join documents in memory

I have a List that I would like to join in a single Word.Document. Below is all that I have so far.
Any ideas?
public static Word.Document JoinDocuments(List<Word.Document> DocstoJoin)
{
Word.Document JoinedDoc = new Word.Document();
foreach (Word.Document doc in DocstoJoin)
{
foreach (Word.Section sec in doc.Sections)
{
**????**
}
}
return JoinedDoc;
}

The Selection class provides the following methods that can be used to get the job done:
Copy - copies the specified selection to the Clipboard.
Paste - inserts the contents of the Clipboard at the specified selection.
Also you may consider using the Open XML SDK if you deal with open XML documents only.

Programatically filling content controls in Word document (OpenXML) in .NET

I have a really simple word document with Content Controls (all text).
I want to loop through the controls, filling them with values from a dictionary. Should be super simple, but something is wrong:
var myValues = new Dictionary<string, string>(); //And fill it
using (var wordDoc = WordprocessingDocument.Open(outputFile, true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
foreach(SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
{
SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
if (alias != null)
{
string sdtTitle = alias.Val.Value;
sdt.??? = myValues[sdtTitle];
}
}
mainPart.Document.Save();
}
How do I write my value into the document?
Do I need a CustomXmlPart?

If you are going to do something like that, you'll need to write suitable content into the Sdt's SdtContent: a paragraph or a run or a tc etc depending on the sdt's parent element.
The alternative is to put the contents of your dictionary into a CustomXml part, and set up databindings on each content control which refer to the relevant dictionary element. Word will then resolve the bindings when the docx is first opened (which is not much good to you if you expect your users to open it with something else).

You can use this code.
foreach (SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
{
SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
if (alias != null)
{
string sdtTitle = alias.Val.Value;
Text t = sdt.Descendants<Text>().First();
t.Text = "test";
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Read bookmarks in outlook MSG file with C# - c#

Related

C# word document

How to access OpenXML content by page number?

Write word form fields with DocumentFormat.OpenXML SDK

Word Interop Join documents in memory

Programatically filling content controls in Word document (OpenXML) in .NET

Categories

Resources