I have worked a bit with OpenXML SDK, and made a POC of replacing images in a header in a word document. However, when I try to call DeletePart or DeleteParts with the images I want to remove, it doesn't go as expected.
When I open the word doc afterwards, where there before was an image, there now is a frame with the text "This image cannot currently be displayed" and a red cross.
From a bit of googling it appears as if the references have not been completely removed, but I can't find any help on how to do that..
Below is an example of how I delete images. I only add some of them to the list, because I need to remove all but the ones with a specific uri..
//...
foreach(HeaderPart headerPart in document.MainDocumentPart.HeaderParts) {
List<ImagePart> list = new List<ImagePart>();
List<ImagePart> imgParts = new List<ImagePart> (headerPart.ImageParts);
foreach(ImagePart headerImagePart in imgParts) {
string newUri = headerImagePart.Uri.ToString();
if(newUri != uri) {
list.Add(headerImagePart);
}
}
headerPart.DeleteParts(list);
}
//...
Images are made up of 2 parts in OpenXml; you have the actual image itself and you also have details of the Picture container that the image is displayed within in the document.
This makes sense if you think about an image being displayed more than once in the same document; details of the image can be stored once and the position(s) of the image can be stored as many times as needed.
The following code will find any Drawing objects that contain the ImagePart objects that you wish to delete. This is done by matching the Embed property of the Blip against the Id of the ImagePart.
using (WordprocessingDocument document = WordprocessingDocument.Open(filename, true))
{
foreach (HeaderPart headerPart in document.MainDocumentPart.HeaderParts)
{
List<ImagePart> list = new List<ImagePart>();
List<ImagePart> imgParts = new List<ImagePart>(headerPart.ImageParts);
List<Drawing> drwdDeleteParts = new List<Drawing>();
List<Drawing> drwParts = new List<Drawing>(headerPart.RootElement.Descendants<Drawing>());
foreach (ImagePart headerImagePart in imgParts)
{
string newUri = headerImagePart.Uri.ToString();
if (newUri != uri)
{
list.Add(headerImagePart);
//you also need to find the Drawings the image was related to
IEnumerable<Drawing> drawings = drwParts.Where(d => d.Descendants<Pic.Picture>().Any(p => p.BlipFill.Blip.Embed == headerPart.GetIdOfPart(headerImagePart)));
foreach (var drawing in drawings)
{
if (drawing != null && !drwdDeleteParts.Contains(drawing))
drwdDeleteParts.Add(drawing);
}
}
}
foreach (var d in drwdDeleteParts)
{
d.Remove();
}
headerPart.DeleteParts(list);
}
}
As you pointed out in the comments, you'll need to add a using statement:
Pic = DocumentFormat.OpenXml.Drawing.Pictures;
Related
How can I get all unloaded images in the word document which is generated by Aspose.Word. Images like this. Unloaded Image
I have solved my problem. I am posting my answer here, maybe it's helpful to someone other.
var doc = new Document();
var shapeCollection = doc.GetChildNodes(Word.NodeType.Shape, true);
// getting all images from Word document.
foreach (Shape shape in shapeCollection)
{
if (shape.ShapeType == ShapeType.Image)
{
//Unloaded image alwasy have 924 imagebytes
if (shape.ImageData.ImageBytes.Length == 924)
{
shape.Width = 72.0;
shape.Height = 72.0;
}
}
}
I'm really having trouble in editing bookmarks in a Word template using Document.Format.OpenXML and then saving it to a new PDF file.
I cannot use Microsoft.Word.Interop as it gives a COM error on the server.
My code is this:
public static void CreateWordDoc(string templatePath, string destinationPath, Dictionary<string, dynamic> dictionary)
{
byte[] byteArray = File.ReadAllBytes(templatePath);
using (MemoryStream stream = new MemoryStream())
{
stream.Write(byteArray, 0, (int)byteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
var bookmarks = (from bm in wordDoc.MainDocumentPart.Document.Body.Descendants<BookmarkStart>()
select bm).ToList();
foreach (BookmarkStart mark in bookmarks)
{
if (mark.Name != "Table" && mark.Name != "_GoBack")
{
UpdateBookmark(dictionary, mark);//Not doing anything
}
else if (mark.Name != "Table")
{
// CreateTable(dictionary, wordDoc, mark);
}
}
File.WriteAllBytes("D:\\RohitDocs\\newfile_rohitsingh.docx", stream.ToArray());
wordDoc.Close();
}
// Save the file with the new name
}
}
private static void UpdateBookmark(Dictionary<string, dynamic> dictionary, BookmarkStart mark)
{
string name = mark.Name;
string value = dictionary[name];
Run run = new Run(new Text(value));
RunProperties props = new RunProperties();
props.AppendChild(new FontSize() { Val = "20" });
run.RunProperties = props;
var paragraph = new DocumentFormat.OpenXml.Wordprocessing.Paragraph(run);
mark.Parent.InsertAfterSelf(paragraph);
paragraph.PreviousSibling().Remove();
mark.Remove();
}
I was trying to replace bookmarks with my text but the UpdateBookmark method doesn't work. I'm writing stream and saving it because I thought if bookmarks are replaced then I can save it to another file.
I think you want to make sure that when you reference mark.Parent that you are getting the correct instance that you are expecting.
Once you get a reference to the correct Paragraph element where your content should go, use the following code to add/swap the run.
// assuming you have a reference to a paragraph called "p"
p.AppendChild<Run>(new Run(new Text(content)) { RunProperties = props });
// and here is some code to remove a run
p.RemoveChild<Run>(run);
To answers the second part of your question, when I did a similar project a few years ago we used iTextSharp to create PDFs from Docx. It worked very well and the API was easy to grok. We even added password encryption and embedded watermarks to the PDFs.
I'm developing a web application with asp.net and I have a file called Template.docx that works like a template to generate other reports. Inside this Template.docx I have some MergeFields (Title, CustomerName, Content, Footer, etc) to replace for some dynamic content in C#.
I would like to know, how can I put a content in a mergefield in docx ?
I don't know if MergeFields is the right way to do this or if there is another way. If you can suggest me, I appreciate!
PS: I have openxml referenced in my web application.
Edits:
private MemoryStream LoadFileIntoStream(string fileName)
{
MemoryStream memoryStream = new MemoryStream();
using (FileStream fileStream = File.OpenRead(fileName))
{
memoryStream.SetLength(fileStream.Length);
fileStream.Read(memoryStream.GetBuffer(), 0, (int) fileStream.Length);
memoryStream.Flush();
fileStream.Close();
}
return memoryStream;
}
public MemoryStream GenerateWord()
{
string templateDoc = "C:\\temp\\template.docx";
string reportFileName = "C:\\temp\\result.docx";
var reportStream = LoadFileIntoStream(templateDoc);
// Copy a new file name from template file
//File.Copy(templateDoc, reportFileName, true);
// Open the new Package
Package pkg = Package.Open(reportStream, FileMode.Open, FileAccess.ReadWrite);
// Specify the URI of the part to be read
Uri uri = new Uri("/word/document.xml", UriKind.Relative);
PackagePart part = pkg.GetPart(uri);
XmlDocument xmlMainXMLDoc = new XmlDocument();
xmlMainXMLDoc.Load(part.GetStream(FileMode.Open, FileAccess.Read));
// replace some keys inside xml (it will come from database, it's just a test)
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_customer", "My Customer Name");
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_title", "Report of Documents");
xmlMainXMLDoc.InnerXml = xmlMainXMLDoc.InnerXml.Replace("field_content", "Content of Document");
// Open the stream to write document
StreamWriter partWrt = new StreamWriter(part.GetStream(FileMode.Open, FileAccess.Write));
//doc.Save(partWrt);
xmlMainXMLDoc.Save(partWrt);
partWrt.Flush();
partWrt.Close();
reportStream.Flush();
pkg.Close();
return reportStream;
}
PS: When I convert MemoryStream to a file, I got a corrupted file. Thanks!
I know this is an old post, but I could not get the accepted answer to work for me. The project linked would not even compile (which someone has already commented in that link). Also, it seems to use other Nuget packages like WPFToolkit.
So I'm adding my answer here in case someone finds it useful. This only uses the OpenXML SDK 2.5 and also the WindowsBase v4. This works on MS Word 2010 and later.
string sourceFile = #"C:\Template.docx";
string targetFile = #"C:\Result.docx";
File.Copy(sourceFile, targetFile, true);
using (WordprocessingDocument document = WordprocessingDocument.Open(targetFile, true))
{
// If your sourceFile is a different type (e.g., .DOTX), you will need to change the target type like so:
document.ChangeDocumentType(WordprocessingDocumentType.Document);
// Get the MainPart of the document
MainDocumentPart mainPart = document.MainDocumentPart;
var mergeFields = mainPart.RootElement.Descendants<FieldCode>();
var mergeFieldName = "SenderFullName";
var replacementText = "John Smith";
ReplaceMergeFieldWithText(mergeFields, mergeFieldName, replacementText);
// Save the document
mainPart.Document.Save();
}
private void ReplaceMergeFieldWithText(IEnumerable<FieldCode> fields, string mergeFieldName, string replacementText)
{
var field = fields
.Where(f => f.InnerText.Contains(mergeFieldName))
.FirstOrDefault();
if (field != null)
{
// Get the Run that contains our FieldCode
// Then get the parent container of this Run
Run rFldCode = (Run)field.Parent;
// Get the three (3) other Runs that make up our merge field
Run rBegin = rFldCode.PreviousSibling<Run>();
Run rSep = rFldCode.NextSibling<Run>();
Run rText = rSep.NextSibling<Run>();
Run rEnd = rText.NextSibling<Run>();
// Get the Run that holds the Text element for our merge field
// Get the Text element and replace the text content
Text t = rText.GetFirstChild<Text>();
t.Text = replacementText;
// Remove all the four (4) Runs for our merge field
rFldCode.Remove();
rBegin.Remove();
rSep.Remove();
rEnd.Remove();
}
}
What the code above does is basically this:
Identify the 4 Runs that make up the merge field named "SenderFullName".
Identify the Run that contains the Text element for our merge field.
Remove the 4 Runs.
Update the text property of the Text element for our merge field.
UPDATE
For anyone interested, here is a simple static class I used to help me with replacing merge fields.
Frank Fajardo's answer was 99% of the way there for me, but it is important to note that MERGEFIELDS can be SimpleFields or FieldCodes.
In the case of SimpleFields, the text runs displayed to the user in the document are children of the SimpleField.
In the case of FieldCodes, the text runs shown to the user are between the runs containing FieldChars with the Separate and the End FieldCharValues. Occasionally, several text containing runs exist between the Separate and End Elements.
The code below deals with these problems. Further details of how to get all the MERGEFIELDS from the document, including the header and footer is available in a GitHub repository at https://github.com/mcshaz/SimPlanner/blob/master/SP.DTOs/Utilities/OpenXmlExtensions.cs
private static Run CreateSimpleTextRun(string text)
{
Run returnVar = new Run();
RunProperties runProp = new RunProperties();
runProp.Append(new NoProof());
returnVar.Append(runProp);
returnVar.Append(new Text() { Text = text });
return returnVar;
}
private static void InsertMergeFieldText(OpenXmlElement field, string replacementText)
{
var sf = field as SimpleField;
if (sf != null)
{
var textChildren = sf.Descendants<Text>();
textChildren.First().Text = replacementText;
foreach (var others in textChildren.Skip(1))
{
others.Remove();
}
}
else
{
var runs = GetAssociatedRuns((FieldCode)field);
var rEnd = runs[runs.Count - 1];
foreach (var r in runs
.SkipWhile(r => !r.ContainsCharType(FieldCharValues.Separate))
.Skip(1)
.TakeWhile(r=>r!= rEnd))
{
r.Remove();
}
rEnd.InsertBeforeSelf(CreateSimpleTextRun(replacementText));
}
}
private static IList<Run> GetAssociatedRuns(FieldCode fieldCode)
{
Run rFieldCode = (Run)fieldCode.Parent;
Run rBegin = rFieldCode.PreviousSibling<Run>();
Run rCurrent = rFieldCode.NextSibling<Run>();
var runs = new List<Run>(new[] { rBegin, rCurrent });
while (!rCurrent.ContainsCharType(FieldCharValues.End))
{
rCurrent = rCurrent.NextSibling<Run>();
runs.Add(rCurrent);
};
return runs;
}
private static bool ContainsCharType(this Run run, FieldCharValues fieldCharType)
{
var fc = run.GetFirstChild<FieldChar>();
return fc == null
? false
: fc.FieldCharType.Value == fieldCharType;
}
You could try http://www.codeproject.com/KB/office/Fill_Mergefields.aspx which uses the Open XML SDK to do this.
I have a really simple word document with Content Controls (all text).
I want to loop through the controls, filling them with values from a dictionary. Should be super simple, but something is wrong:
var myValues = new Dictionary<string, string>(); //And fill it
using (var wordDoc = WordprocessingDocument.Open(outputFile, true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
foreach(SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
{
SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
if (alias != null)
{
string sdtTitle = alias.Val.Value;
sdt.??? = myValues[sdtTitle];
}
}
mainPart.Document.Save();
}
How do I write my value into the document?
Do I need a CustomXmlPart?
If you are going to do something like that, you'll need to write suitable content into the Sdt's SdtContent: a paragraph or a run or a tc etc depending on the sdt's parent element.
The alternative is to put the contents of your dictionary into a CustomXml part, and set up databindings on each content control which refer to the relevant dictionary element. Word will then resolve the bindings when the docx is first opened (which is not much good to you if you expect your users to open it with something else).
You can use this code.
foreach (SdtElement sdt in mainPart.Document.Descendants<SdtElement>())
{
SdtAlias alias = sdt.Descendants<SdtAlias>().FirstOrDefault();
if (alias != null)
{
string sdtTitle = alias.Val.Value;
Text t = sdt.Descendants<Text>().First();
t.Text = "test";
}
}
I am trying the following code. It takes a fileName (docx file with many sections) and I try to iterate through each section getting the section name. The problem is that I end up with unreadable docx files. It does not error, but I think I am doing something wrong with getting the elements in the section.
public void Split(string fileName) {
using (WordprocessingDocument myDoc =
WordprocessingDocument.Open(fileName, true)) {
string curCliCode = "";
MainDocumentPart mdp = myDoc.MainDocumentPart;
foreach (var element in mdp.Document.Body.ChildElements) {
if (element.Descendants().OfType<SectionProperties>().Count() == 1) {
//get the name of the section from the footer
var footer = (FooterPart) mdp.GetPartById(
element.Descendants().OfType<SectionProperties>().First().OfType
<FooterReference>().First().
Id.Value);
foreach (Paragraph p in footer.Footer.ChildElements.OfType<Paragraph>()) {
if (p.InnerText != "") {
curCliCode = p.InnerText;
}
}
if (curCliCode != "") {
var forFile = new List<OpenXmlElement>();
var els = element.ElementsBefore();
if (els != null) {
foreach (var e in els) {
if (e != null) {
forFile.Add(e);
}
}
for (int i = 0; i < els.Count(); i++) {
els.ElementAt(i).Remove();
}
}
Create(curCliCode, forFile);
}
}
}
}
}
private void Create(string cliCode,IEnumerable<OpenXmlElement> docParts) {
var parts = from e in docParts select e.Clone();
const string template = #"\Test\toSplit\blank.docx";
string destination = string.Format(#"\Test\{0}.docx", cliCode);
File.Copy(template, destination,true);
/* Create the package and main document part */
using (WordprocessingDocument myDoc =
WordprocessingDocument.Open(destination, true)) {
MainDocumentPart mainPart = myDoc.MainDocumentPart;
/* Create the contents */
foreach(var part in parts) {
mainPart.Document.Body.Append((OpenXmlElement)part);
}
/* Save the results and close */
mainPart.Document.Save();
myDoc.Close();
}
}
Does anyone know what the problem could be (or how to properly copy a section from one document to another)?
I've done some work in this area, and what I have found invaluable is diffing a known good file with a prospective file; the error is usually fairly obvious.
What I would do is take a file that you know works, and copy all of the sections into the template. Theoretically, the two files should be identical. Run a diff on them the document.xml inside the docx file, and you'll see the difference.
BTW, I'm assuming that you know that the docx is actually a zip; change the extension to "zip", and you'll be able to get at the actual xml files which compose the format.
As far as diff tools, I use Beyond Compare from Scooter Software.
An approach along the lines of what you are doing will work only for simple documents (ie those not containing images, hyperlinks, comments etc). To handle these more complex documents, take a look at http://blogs.msdn.com/b/ericwhite/archive/2009/02/05/move-insert-delete-paragraphs-in-word-processing-documents-using-the-open-xml-sdk.aspx and the resulting DocumentBuilder API (part of the PowerTools for Open XML project on CodePlex).
In order to split a docx into sections using DocumentBuilder, you'll still need to first find the index of the paragraphs containing sectPr elements.