OpenXml image relationship doesn't exist

I'm an intern at a large company and was tasked with working on a previous intern's project that apparently worked at some point but is now broken. The program takes a bunch of text and images out of a document and inserts them into a template document. The problem is that about half of the images aren't forming relationships, and I'm getting the red X "Image cannot be displayed" empty box. I've been doing some digging with the Open XML SDK Productivity Tool and found a couple of duplicate IDs as well as quite a few non-existent relationships, although looking at his code I'm not sure what might be causing that. Here are his two methods for copying images:
internal static void CopyImages(OpenXmlElement oldTable, OpenXmlElement newTable,
WordprocessingDocument testData, WordprocessingDocument testReport)
{
List<Blip> sourceBlips = DocumentHelper.GetAllBlips(oldTable);
List<Blip> targetBlips = DocumentHelper.GetAllBlips(newTable);
foreach (Blip sourceBlip in sourceBlips)
{
foreach (Blip targetBlip in targetBlips)
{
if (targetBlip.Embed.Value == sourceBlip.Embed.Value)
{
if (testData.MainDocumentPart.GetPartById(sourceBlip.Embed.Value) is ImagePart imagePart)
{
ImagePart newImagePart = testReport.MainDocumentPart.AddPart(imagePart);
targetBlip.Embed.Value = testReport.MainDocumentPart.GetIdOfPart(newImagePart);
break;
}
}
}
}
}
internal static void CopyEmbeddedVisioImages(OpenXmlElement oldTable, OpenXmlElement newTable,
WordprocessingDocument testData, WordprocessingDocument testReport)
{
List<EmbeddedObject> sourceObjects = oldTable.Descendants<EmbeddedObject>().ToList();
List<EmbeddedObject> targetObjects = newTable.Descendants<EmbeddedObject>().ToList();
foreach (EmbeddedObject targetobj in targetObjects)
{
foreach (EmbeddedObject sourceObj in sourceObjects)
{
if (testData.MainDocumentPart.GetPartById(sourceObj.Descendants<ImageData>()
.FirstOrDefault().RelationshipId) is ImagePart oldImagePart)
{
ImagePart newImagePart = testReport.MainDocumentPart.AddPart(oldImagePart);
targetobj.Descendants<ImageData>().FirstOrDefault().RelationshipId =
testReport.MainDocumentPart.GetIdOfPart(newImagePart);
}
if (testData.MainDocumentPart.GetPartById(sourceObj.Descendants<OleObject>()
.FirstOrDefault().Id) is OpenXmlPart openXmlPart)
{
EmbeddedObjectPart newEmbeddedObj = (EmbeddedObjectPart)testReport.MainDocumentPart.AddPart(openXmlPart);
targetobj.Descendants<OleObject>().FirstOrDefault().Id =
testReport.MainDocumentPart.GetIdOfPart(newEmbeddedObj);
}
}
}
}
I've tried calling Save() and Close() on the documents. I even tried calling Dispose(). Wrapping the document in using (WordprocessingDocument foo = WordprocessingDocument.Open(bar, false)) {} doesn't seem to help either. I'm not too worried about the duplicate IDs for now, but I have no idea why only some of the relationships are being formed while others are not. This is a massive project, so navigating through some of it can be pretty tricky.
Edit: It's probably also worth mentioning that the images stop forming relationships at a certain point. It isn't random. About 2/3 of the way down none of the images work.
Here's the updated set of methods
internal static void CopyImages(OpenXmlElement oldTable, OpenXmlElement newTable,
WordprocessingDocument testData, WordprocessingDocument testReport)
{
List<Blip> sourceBlips = DocumentHelper.GetAllBlips(oldTable);
List<Blip> targetBlips = DocumentHelper.GetAllBlips(newTable);
foreach (Blip sourceBlip in sourceBlips)
{
foreach (Blip targetBlip in targetBlips)
{
if (targetBlip.Embed.Value == sourceBlip.Embed.Value)
{
if (testData.MainDocumentPart.GetPartById(sourceBlip.Embed.Value) is ImagePart imagePart)
{
//ImagePart newImagePart = testReport.MainDocumentPart.AddPart(imagePart);
ImagePart newImagePart = testReport.MainDocumentPart.AddImagePart(imagePart.ContentType);
newImagePart.FeedData(imagePart.GetStream(FileMode.Open, FileAccess.Read));
targetBlip.Embed.Value = testReport.MainDocumentPart.GetIdOfPart(newImagePart);
break;
}
}
}
}
}
internal static void CopyEmbeddedVisioImages(OpenXmlElement oldTable, OpenXmlElement newTable,
WordprocessingDocument testData, WordprocessingDocument testReport)
{
List<EmbeddedObject> sourceObjects = oldTable.Descendants<EmbeddedObject>().ToList();
List<EmbeddedObject> targetObjects = newTable.Descendants<EmbeddedObject>().ToList();
foreach (EmbeddedObject targetobj in targetObjects)
{
foreach (EmbeddedObject sourceObj in sourceObjects)
{
if (testData.MainDocumentPart.GetPartById(sourceObj.Descendants<ImageData>()
.FirstOrDefault().RelationshipId) is ImagePart oldImagePart)
{
//ImagePart newImagePart = testReport.MainDocumentPart.AddPart(oldImagePart);
ImagePart newImagePart = testReport.MainDocumentPart.AddImagePart(oldImagePart.ContentType);
newImagePart.FeedData(oldImagePart.GetStream(FileMode.Open, FileAccess.Read));
targetobj.Descendants<ImageData>().FirstOrDefault().RelationshipId =
testReport.MainDocumentPart.GetIdOfPart(newImagePart);
}
if (testData.MainDocumentPart.GetPartById(sourceObj.Descendants<OleObject>()
.FirstOrDefault().Id) is OpenXmlPart openXmlPart)
{
EmbeddedObjectPart newEmbeddedObj = (EmbeddedObjectPart)testReport.MainDocumentPart.AddPart(openXmlPart);
targetobj.Descendants<OleObject>().FirstOrDefault().Id =
testReport.MainDocumentPart.GetIdOfPart(newEmbeddedObj);
}
}
}
}
Here's an update on my findings.
There are 25 total blips in the entire document.
targetBlip.Embed.Value != sourceBlip.Embed.Value in most cases or maybe it's something else?
Elements containing pictures are cloned from source doc and then saved into target doc.
All elements are being read. Tables containing pictures with broken relationships exist and are filled with other content, so it's not like it's missing those elements.
The duplicate IDs are due to the target document containing a couple images to begin with, so when I copy over the other images, some of those IDs are duplicated. This isn't my concern for now.

Images from a source document can't be added as-is to a target document. Each image part is referenced by a relationship id that is unique only within its parent document, so it can conflict with an id that already exists in the target document.
Replace the following line
ImagePart newImagePart = testReport.MainDocumentPart.AddPart(imagePart);
with the two lines below. They embed a whole new image part in the target document, which gets a new id assigned.
ImagePart newImagePart = testReport.MainDocumentPart.AddImagePart(imagePart.ContentType);
newImagePart.FeedData(imagePart.GetStream(FileMode.Open, FileAccess.Read));
It's important that the ids in the target document are unique.
Here are some older code fragments showing how I merged images from one document into another. (They come from a more complete and complex implementation in which duplicate images are detected and prevented from being inserted more than once.)
It starts by iterating over all images in the source document and building a list of them together with their original ids in that document. Then all images are inserted into the target document; while doing so, the new id in the target document is recorded for each item.
Each drawing in the source document is then updated with the id from the target document; the list contains both the original source id and the new target id. (This sounds bizarre, but at the time only this gave me the expected result.)
Only after the image merge has completed is the content (paragraphs and tables) merged into the target document, which consists of adding clones of these items.
public class DocumentMerger
{
    private readonly WordprocessingDocument _targetDocument;
    public DocumentMerger(WordprocessingDocument targetDocument)
    {
        this._targetDocument = targetDocument;
    }
    public void Merge(WordprocessingDocument sourceDocument)
    {
        // Copy the images first, so the source drawings already carry target ids.
        ImagesMerger imagesMerger = new ImagesMerger(this._targetDocument);
        imagesMerger.Merge(sourceDocument);
        // Merge the content; paragraphs and tables.
        this._targetDocument.MainDocumentPart.Document.Save();
    }
}
public class ImageInfo
{
    private String _id;
    private ImagePart _image;
    private readonly String _originalId;
    private ImageInfo(ImagePart image, String id)
    {
        this._id = id;
        this._image = image;
        this._originalId = id;
    }
    // Current id; equals OriginalId until Reparent has been called.
    public String Id
    {
        get { return this._id; }
    }
    public ImagePart Image
    {
        get { return this._image; }
    }
    // Id of the image within the source document.
    public String OriginalId
    {
        get { return this._originalId; }
    }
    public static ImageInfo Create(MainDocumentPart documentPart, ImagePart image)
    {
        String id = documentPart.GetIdOfPart(image);
        ImageInfo r = new ImageInfo(image, id);
        return r;
    }
    // Copies the image into the given document part and switches Id to the new id.
    public void Reparent(MainDocumentPart documentPart)
    {
        ImagePart newImage = documentPart.AddImagePart(this._image.ContentType);
        using (Stream stream = this._image.GetStream(FileMode.Open, FileAccess.Read))
        {
            newImage.FeedData(stream);
        }
        String newId = documentPart.GetIdOfPart(newImage);
        this._id = newId;
        this._image = newImage;
    }
}
public class ImagesMerger
{
    private readonly IList<ImageInfo> _imageInfosOfTheTargetDocument = new List<ImageInfo>();
    private readonly MainDocumentPart _targetDocumentPart;
    public ImagesMerger(WordprocessingDocument targetDocument)
    {
        this._targetDocumentPart = targetDocument.MainDocumentPart;
    }
    public void Merge(WordprocessingDocument sourceDocument)
    {
        MainDocumentPart sourceDocumentPart = sourceDocument.MainDocumentPart;
        IList<ImageInfo> imageInfosOfTheSourceDocument = this.getImageInfos(sourceDocumentPart);
        if (0 == imageInfosOfTheSourceDocument.Count) { return; }
        this.addTheImagesToTheTargetDocument(imageInfosOfTheSourceDocument);
        this.rereferenceTheImagesToTheirCorrespondingImageParts(sourceDocumentPart, imageInfosOfTheSourceDocument);
    }
    private void addTheImagesToTheTargetDocument(IList<ImageInfo> imageInfosOfTheSourceDocument)
    {
        for (Int32 i = 0, j = imageInfosOfTheSourceDocument.Count; i < j; i++)
        {
            ImageInfo imageInfoOfTheSourceDocument = imageInfosOfTheSourceDocument[i];
            imageInfoOfTheSourceDocument.Reparent(this._targetDocumentPart);
            this._imageInfosOfTheTargetDocument.Add(imageInfoOfTheSourceDocument);
        }
    }
    private IList<ImageInfo> getImageInfos(MainDocumentPart documentPart)
    {
        List<ImageInfo> r = new List<ImageInfo>();
        foreach (ImagePart image in documentPart.ImageParts)
        {
            ImageInfo imageInfo = ImageInfo.Create(documentPart, image);
            r.Add(imageInfo);
        }
        return r;
    }
    private void rereferenceTheImagesToTheirCorrespondingImageParts(MainDocumentPart sourceDocumentPart, IList<ImageInfo> imageInfosOfTheSourceDocument)
    {
        IEnumerable<Drawing> images = sourceDocumentPart.Document.Body.Descendants<Drawing>();
        foreach (Drawing image in images)
        {
            // Descendants also covers anchored (floating) drawings, which have no Inline.
            Blip blip = image.Descendants<Blip>().FirstOrDefault();
            if (blip == null || blip.Embed == null) { continue; }
            String originalId = blip.Embed.Value;
            ImageInfo imageInfo = imageInfosOfTheSourceDocument.FirstOrDefault(o => String.Equals(o.OriginalId, originalId, StringComparison.Ordinal));
            // Skip blips that do not point at an image part of the main document.
            if (imageInfo == null) { continue; }
            blip.Embed.Value = imageInfo.Id;
        }
    }
}
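The id bookkeeping described in this answer (remember which new id each original id was mapped to, and copy each image only once) can be sketched without the Open XML SDK at all. This is only an illustration: RelationshipIdMap and its copyPart delegate are hypothetical names, not SDK types. In real code, the delegate would presumably wrap AddImagePart, FeedData and GetIdOfPart as shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Open XML SDK): remembers which target id
// each source relationship id was copied to, so every part is copied only once.
public sealed class RelationshipIdMap
{
    private readonly Dictionary<string, string> _targetIdBySourceId =
        new Dictionary<string, string>(StringComparer.Ordinal);
    private readonly Func<string, string> _copyPart;

    // copyPart receives a source id, copies that part into the target document
    // and returns the id the target document assigned to the copy.
    public RelationshipIdMap(Func<string, string> copyPart)
    {
        _copyPart = copyPart ?? throw new ArgumentNullException(nameof(copyPart));
    }

    // Returns the target id for a source id, copying the part on first use only.
    public string GetTargetId(string sourceId)
    {
        if (!_targetIdBySourceId.TryGetValue(sourceId, out string targetId))
        {
            targetId = _copyPart(sourceId);
            _targetIdBySourceId.Add(sourceId, targetId);
        }
        return targetId;
    }
}
```

With such a map, each cloned blip could be updated with something like blip.Embed.Value = map.GetTargetId(blip.Embed.Value), and two blips that share a source image would end up sharing one copied part.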

Related

C# OpenXML SDK - Inserting a new slide from slide masters

I am attempting to implement the solutions given here and/or here.
I have a .pptx file that contains zero slides initially. One of the layouts is named "One content". For now, I just want to produce a new PPTX file with a single slide based on this layout. Should be trivial, no? No, apparently not.
In file OpenXmlUtils.cs I have the following method which I use to create a new PPTX from the "template" file:
public static void CopyTemplate(string template, string target)
{
string targetPath = Path.GetFullPath(target);
string targetFolder = Path.GetDirectoryName(targetPath);
if (!System.IO.Directory.Exists(targetFolder))
{
System.IO.Directory.CreateDirectory(targetFolder);
}
System.IO.File.Copy(template, targetPath, true);
}
My PPTWriter.cs broken down to MCVE:
public PPTOpenXMLWriter(string templatePath, string presSaveAsPath)
{
if (File.Exists(presSaveAsPath)) { File.Delete(presSaveAsPath); }
OpenXmlUtils.CopyTemplate(templatePath, presSaveAsPath);
_createPresentation(presSaveAsPath);
}
private void _createPresentation(string presSaveAsPath)
{
using (PresentationDocument presentationDocument = PresentationDocument.Open(presSaveAsPath, true))
{
string layoutName = "One content";
_insertNewSlide(presentationDocument.PresentationPart, layoutName);
presentationDocument.Save();
}
}
private void _insertNewSlide(PresentationPart presentationPart, string layoutName)
{
Slide slide = new Slide(new CommonSlideData(new ShapeTree()));
SlidePart slidePart = presentationPart.AddNewPart<SlidePart>();
slide.Save(slidePart);
SlideMasterPart slideMasterPart = presentationPart.SlideMasterParts.FirstOrDefault();
SlideLayoutPart slideLayoutPart = slideMasterPart.SlideLayoutParts.SingleOrDefault
(sl => sl.SlideLayout.CommonSlideData.Name.Value.Equals(layoutName, StringComparison.OrdinalIgnoreCase));
slidePart.AddPart<SlideLayoutPart>(slideLayoutPart);
slidePart.Slide.CommonSlideData = (CommonSlideData)slideLayoutPart.SlideLayout.CommonSlideData.Clone();
SlideIdList slideIdList = null;
if ( presentationPart.Presentation.SlideIdList is null)
{
presentationPart.Presentation.SlideIdList = new SlideIdList();
}
slideIdList = presentationPart.Presentation.SlideIdList;
// find the highest id
uint maxSlideId = 0;
if (slideIdList.ChildElements.Count() > 0)
maxSlideId = slideIdList.ChildElements
.Cast<SlideId>()
.Max(x => x.Id.Value);
// Insert the new slide into the slide list after the previous slide.
SlideId newSlideId = new SlideId();
slideIdList.Append(newSlideId);
newSlideId.Id = maxSlideId;
newSlideId.RelationshipId = presentationPart.GetIdOfPart(slidePart);
// Save the modified presentation.
presentationPart.Presentation.Save();
}
The resulting file is corrupt and needs to be "repaired" by PowerPoint, after which repair process the slide layout is not the layout that was specified. In fact it's a completely different layout with a radically different XML structure and all I can gather is that it's somehow defaulting back to the ordinally first layout in the master ("Title"), because it doesn't know how to handle whatever it's actually been given via OpenXML.
This seems like it ought to be a fairly common use-case, and perhaps my expectations are wrong, but it seems like given an already existing slide layout, you ought to be able to (relatively easily) create a new slide based on that layout which will contain all of the same placeholder shapes, etc.

Got it. The following is working for my test scenarios (thanks to your code for help):
presentationPart.InsertNewSlide("CV Full page");
presentationPart.InsertNewSlide("CV Half page");
presentationPart.InsertNewSlide("Credential full page");
presentationPart.InsertNewSlide("CV or Credential 5 to a page", 3);
public static void InsertNewSlide(this PresentationPart presentationPart, string layoutName, int? position = null)
{
Slide slide = new Slide();
SlidePart slidePart = presentationPart.AddNewPart<SlidePart>();
slide.Save(slidePart);
SlideMasterPart slideMasterPart = presentationPart.SlideMasterParts.FirstOrDefault();
SlideLayoutPart slideLayoutPart = slideMasterPart.GetSlideLayoutPartByLayoutName(layoutName);
slidePart.AddPart(slideLayoutPart, slideMasterPart.GetIdOfPart(slideLayoutPart));
slidePart.Slide.CommonSlideData = (CommonSlideData)slideLayoutPart.SlideLayout.CommonSlideData.Clone();
string id = slideMasterPart.GetIdOfPart(slideLayoutPart);
slidePart.CloneSlideLayout(slideLayoutPart, id);
slideMasterPart.AddPart(slideLayoutPart, id);
presentationPart.SetSlideID(slidePart, position);
}
public static void SetSlideID(this PresentationPart presentationPart, SlidePart slidePart, int? position = null)
{
SlideIdList slideIdList = presentationPart.Presentation.SlideIdList;
if (slideIdList == null)
{
slideIdList = new SlideIdList();
presentationPart.Presentation.SlideIdList = slideIdList;
}
if (position != null && position > slideIdList.Count())
throw new InvalidOperationException($"Unable to set slide to position '{position}'. There are only '{slideIdList.Count()}' slides.");
uint newId = slideIdList.ChildElements.Count() == 0 ? 256 : slideIdList.GetMaxSlideId() + 1;
if (position == null)
{
var newSlideId = slideIdList.AppendChild(new SlideId());
newSlideId.Id = newId;
newSlideId.RelationshipId = presentationPart.GetIdOfPart(slidePart);
}
else
{
SlideId nextSlideId = (SlideId)slideIdList.ChildElements[position.Value - 1];
var newSlideId = slideIdList.InsertBefore(new SlideId(), nextSlideId);
newSlideId.Id = newId;
newSlideId.RelationshipId = presentationPart.GetIdOfPart(slidePart);
}
}
public static uint GetMaxSlideId(this SlideIdList slideIdList)
{
uint maxSlideId = 0;
if (slideIdList.ChildElements.Count() > 0)
maxSlideId = slideIdList.ChildElements
.Cast<SlideId>()
.Max(x => x.Id.Value);
return maxSlideId;
}
public static SlideLayoutPart GetSlideLayoutPartByLayoutName(this SlideMasterPart slideMasterPart, string layoutName)
{
return slideMasterPart.SlideLayoutParts.SingleOrDefault
(sl => sl.SlideLayout.CommonSlideData.Name.Value.Equals(layoutName, StringComparison.OrdinalIgnoreCase));
}
public static void CloneSlideLayout(this SlidePart newSlidePart, SlideLayoutPart slPart, string id)
{
/* ensure we added the rel ID to this part */
newSlidePart.AddPart(slPart, id);
using (Stream stream = slPart.GetStream()) { newSlidePart.SlideLayoutPart.FeedData(stream); }
newSlidePart.Slide.CommonSlideData = (CommonSlideData)slPart.SlideLayout.CommonSlideData.Clone();
foreach (ImagePart iPart in slPart.ImageParts)
newSlidePart.AddPart(iPart, slPart.GetIdOfPart(iPart));
}

I noticed some discrepancies in the slide's .rels, from the correct, manually produced slide:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Target="../slideLayouts/slideLayout8.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Id="rId1"/>
</Relationships>
And the incorrect one looked like:
<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="R522c7c9989a04964" Target="/ppt/slideLayouts/slideLayout8.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout"/>
<Relationship Id="rId5" Target="/ppt/media/image2.bin" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>
</Relationships>
Two discrepancies, which I believe are as follows:
The image2.bin relationship: I believe I traced this back to a 1x1 pixel autoshape "object" that was present on several of the slide masters. I manually removed it from each slide master where it existed and re-saved my template pptx file.
The slide is missing the rel ID back to the slide layout, which seemed easy enough to fix. I added some extension methods to the OpenXmlUtils class, and modified the _insertNewSlide method as follows:
private void _insertNewSlide(PresentationPart presentationPart, string layoutName)
{
Slide slide = new Slide();
SlidePart slidePart = presentationPart.AddNewPart<SlidePart>();
slide.Save(slidePart);
SlideMasterPart slideMasterPart = presentationPart.SlideMasterParts.FirstOrDefault();
SlideLayoutPart slideLayoutPart = slideMasterPart.GetSlideLayoutPartByLayoutName(layoutName); // extension method
/* ensure we added the rel ID to this part */
slidePart.AddPart<SlideLayoutPart>(slideLayoutPart, slideMasterPart.GetIdOfPart(slideLayoutPart));
slidePart.Slide.CommonSlideData = (CommonSlideData)slideLayoutPart.SlideLayout.CommonSlideData.Clone();
slidePart.CloneSlideLayout(slideLayoutPart, slideMasterPart.GetIdOfPart(slideLayoutPart)); // extension method
presentationPart.AppendSlide(slidePart); // extension method
}
I've added the following extension methods in OpenXmlUtils.cs:
public static void CloneSlideLayout(this SlidePart newSlidePart, SlideLayoutPart slPart, string id)
{
    // creates a Slide from a SlideLayout
    /* ensure we added the rel ID to this part */
    newSlidePart.AddPart(slPart, id);
    using (Stream stream = slPart.GetStream()) { newSlidePart.SlideLayoutPart.FeedData(stream); }
    newSlidePart.Slide.CommonSlideData = (CommonSlideData)slPart.SlideLayout.CommonSlideData.Clone();
    foreach (ImagePart iPart in slPart.ImageParts)
    {
        newSlidePart.AddPart<ImagePart>(iPart, slPart.GetIdOfPart(iPart));
    }
}
public static uint GetNextSlideId(this SlideIdList slideIdList)
{
    uint nextId;
    uint maxId = GetMaxSlideId(slideIdList);
    if (maxId == 0)
    {
        // Slide Id must be >= 256
        nextId = 256;
    }
    else
    {
        // maxId++ would return the old value and produce a duplicate id
        nextId = maxId + 1;
    }
    return nextId;
}
public static uint GetMaxSlideId(this SlideIdList slideIdList)
{
    // find the highest id
    uint maxSlideId = 0;
    if (slideIdList.ChildElements.Count() > 0)
        maxSlideId = slideIdList.ChildElements
            .Cast<SlideId>()
            .Max(x => x.Id.Value);
    return maxSlideId;
}
public static SlideLayoutPart GetSlideLayoutPartByLayoutName(this SlideMasterPart slideMasterPart, string layoutName)
{
    return slideMasterPart.SlideLayoutParts.SingleOrDefault
        (sl => sl.SlideLayout.CommonSlideData.Name.Value.Equals(layoutName, StringComparison.OrdinalIgnoreCase));
}
public static void AppendSlide(this PresentationPart presentationPart, SlidePart newSlidePart)
{
    // Register the slide part at the end of the presentation's slide id list.
    if (presentationPart.Presentation.SlideIdList is null)
    {
        presentationPart.Presentation.SlideIdList = new SlideIdList();
    }
    SlideIdList slideIdList = presentationPart.Presentation.SlideIdList;
    // Compute the id before appending, because the new SlideId has no Id yet.
    uint newId = slideIdList.GetNextSlideId();
    SlideId newSlideId = slideIdList.AppendChild(new SlideId());
    newSlideId.Id = newId;
    newSlideId.RelationshipId = presentationPart.GetIdOfPart(newSlidePart);
    // Save the modified presentation.
    presentationPart.Presentation.Save();
}
Having implemented these changes, I can successfully produce the "One content" slide from the master, and it looks like most of the other layouts are output correctly as well, but if I try to create an instance of each slide layout, there is still a "repair" issue which I'll need to isolate.
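One thing worth checking when a "repair" prompt remains is slide id allocation, since the ids in the slide id list have to be unique and, as the code comments note, at least 256. Below is a minimal sketch of that rule as a pure function, independent of the SDK types; SlideIdAllocator and NextSlideId are illustrative names, not SDK members. Note that a C# post-increment (max++) evaluates to the value before the increment, which is why max + 1 is used here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper, not part of the Open XML SDK.
public static class SlideIdAllocator
{
    // Smallest id allowed for a slide, per the comment in the code above.
    public const uint MinSlideId = 256;

    // Returns an id that is larger than every existing id and never below 256.
    public static uint NextSlideId(IEnumerable<uint> existingIds)
    {
        uint max = existingIds.DefaultIfEmpty(0u).Max();
        return Math.Max(max + 1, MinSlideId);
    }
}
```

For an empty list this yields 256, and for existing ids 256 and 257 it yields 258.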
Update:

Read Line Having Track Revisions in Word

Is there any way to find the lines that have tracked changes (inserted or deleted) using the Open XML SDK? With the code below I can detect whether the document body has tracked changes, and that works correctly. What I want now is to find which text lines of the body contain the tracked changes.
public static System.Type[] trackedRevisionsElements = new System.Type[] {
typeof(CellDeletion),
typeof(CellInsertion),
typeof(CellMerge),
typeof(CustomXmlDelRangeEnd),
typeof(CustomXmlDelRangeStart),
typeof(CustomXmlInsRangeEnd),
typeof(CustomXmlInsRangeStart),
typeof(Deleted),
typeof(DeletedFieldCode),
typeof(DeletedMathControl),
typeof(DeletedRun),
typeof(DeletedText),
typeof(Inserted),
typeof(InsertedMathControl),
typeof(InsertedRun),
typeof(MoveFrom),
typeof(MoveFromRangeEnd),
typeof(MoveFromRangeStart),
typeof(MoveTo),
typeof(MoveToRangeEnd),
typeof(MoveToRangeStart),
typeof(MoveToRun),
typeof(NumberingChange),
typeof(ParagraphMarkRunPropertiesChange),
typeof(ParagraphPropertiesChange),
typeof(RunPropertiesChange),
typeof(SectionPropertiesChange),
typeof(TableCellPropertiesChange),
typeof(TableGridChange),
typeof(TablePropertiesChange),
typeof(TablePropertyExceptionsChange),
typeof(TableRowPropertiesChange),
};
public static bool PartHasTrackedRevisions(OpenXmlPart part)
{
List<OpenXmlElement> insertions =
part.RootElement.Descendants<Inserted>()
.Cast<OpenXmlElement>().ToList();
//Body bdy = wordDoc.MainDocumentPart.Document.Body;
if (part.RootElement.Descendants()
.Any(e => trackedRevisionsElements.Contains(e.GetType())))
{
var initialTextDescendants = part.RootElement.Descendants<Text>();
string dummy = string.Empty;
foreach (Text t in initialTextDescendants)
{
MessageBox.Show(t.Text);
}
}
return part.RootElement.Descendants()
.Any(e => trackedRevisionsElements.Contains(e.GetType()));
}
public static bool HasTrackedRevisions(WordprocessingDocument doc)
{
if (PartHasTrackedRevisions(doc.MainDocumentPart))
return true;
foreach (var part in doc.MainDocumentPart.HeaderParts)
if (PartHasTrackedRevisions(part))
return true;
foreach (var part in doc.MainDocumentPart.FooterParts)
if (PartHasTrackedRevisions(part))
return true;
if (doc.MainDocumentPart.EndnotesPart != null)
if (PartHasTrackedRevisions(doc.MainDocumentPart.EndnotesPart))
return true;
if (doc.MainDocumentPart.FootnotesPart != null)
if (PartHasTrackedRevisions(doc.MainDocumentPart.FootnotesPart))
return true;
return false;
}
private void button2_Click(object sender, EventArgs e)
{
foreach (var documentName in Directory.GetFiles(".", "*.docx"))
{
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(documentName, false))
{
if (HasTrackedRevisions(wordDoc)) {
//Body bdy = wordDoc.MainDocumentPart.Document.Body;
//var initialTextDescendants = bdy.Descendants<Text>();
//string dummy = string.Empty;
//foreach (Text t in initialTextDescendants)
//{
// richTextBox1.Text = richTextBox1.Text + t.Text;
//}
Console.WriteLine("{0} contains tracked revisions", documentName);
}
else
Console.WriteLine("{0} does not contain tracked revisions", documentName);
}
}
}

What exactly do you mean by "text line of body"? The line of text as it would appear in a laid-out document, or the Open XML elements that were changed?
If this is about the line of text in a laid-out document, as produced by Microsoft Word, this is hard, because you would need a layout algorithm to work out where the lines with those tracked changes would be rendered.
If this is about the OpenXmlElements, e.g., Text or Paragraph, you already have part of your solution, as this is just a matter of querying the XML markup.
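For the second reading, here is a minimal sketch that works directly on the WordprocessingML markup with LINQ to XML. It reports the text of each paragraph that contains an insertion (w:ins) or a deletion (w:del). TrackedChangeFinder is an illustrative name, and the sketch only looks at those two revision kinds, not at the full list of revision elements in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative helper, not part of the Open XML SDK.
public static class TrackedChangeFinder
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every paragraph that contains a w:ins or w:del
    // element. Deleted text is stored in w:delText, so it is included as well.
    public static List<string> ParagraphsWithRevisions(XDocument documentXml)
    {
        return documentXml
            .Descendants(W + "p")
            .Where(p => p.Descendants(W + "ins").Any() || p.Descendants(W + "del").Any())
            .Select(p => string.Concat(
                p.Descendants()
                 .Where(e => e.Name == W + "t" || e.Name == W + "delText")
                 .Select(e => e.Value)))
            .ToList();
    }
}
```

The XDocument could presumably be loaded from a part's stream, e.g. XDocument.Load(doc.MainDocumentPart.GetStream()). The result identifies paragraphs, not rendered lines.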

OpenXml Excel: throw error in any word after mail address

I read Excel files using OpenXml. Everything works fine, but if the spreadsheet contains a cell that has an email address followed by a space and another word, such as:
abc@abc.com abc
It throws an exception immediately at the opening of the spreadsheet:
var _doc = SpreadsheetDocument.Open(_filePath, false);
exception:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
Additional information:
Invalid Hyperlink: Malformed URI is embedded as a
hyperlink in the document.

There is an open issue on the OpenXml forum related to this problem: Malformed Hyperlink causes exception
In the post they talk about encountering this issue with a malformed "mailto:" hyperlink within a Word document.
They propose a work-around here: Workaround for malformed hyperlink exception
The workaround is essentially a small console application which locates the invalid URL and replaces it with a hard-coded value; here is the code snippet from their sample that does the replacement; you could augment this code to attempt to correct the passed brokenUri:
private static Uri FixUri(string brokenUri)
{
return new Uri("http://broken-link/");
}
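As a rough idea of what "attempting to correct" the broken URI could look like, here is a sketch that keeps only the part before the first whitespace and falls back to the placeholder if that still isn't a usable link. The heuristic and the UriRepair name are my own assumptions, not part of the linked workaround, and the set of accepted schemes is arbitrary.

```csharp
using System;

// Illustrative helper; the repair heuristic is an assumption, not the original workaround.
public static class UriRepair
{
    // Fallback value used by the original workaround.
    private static readonly Uri Placeholder = new Uri("http://broken-link/");

    public static Uri FixUri(string brokenUri)
    {
        string candidate = (brokenUri ?? string.Empty).Trim();

        // Assumption: the target is a usable link followed by stray text,
        // so keep only the part before the first whitespace.
        int firstSpace = candidate.IndexOfAny(new[] { ' ', '\t' });
        if (firstSpace > 0)
        {
            candidate = candidate.Substring(0, firstSpace);
        }

        // Accept the repaired value only for schemes a hyperlink would normally use.
        if (Uri.TryCreate(candidate, UriKind.Absolute, out Uri repaired)
            && (repaired.Scheme == Uri.UriSchemeHttp
                || repaired.Scheme == Uri.UriSchemeHttps
                || repaired.Scheme == Uri.UriSchemeMailto))
        {
            return repaired;
        }
        return Placeholder;
    }
}
```

It could be passed to the fixer in place of the hard-coded version, e.g. UriFixer.FixInvalidUri(fs, UriRepair.FixUri).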
The problem I had was actually with an Excel document (like you) and it had to do with a malformed http URL; I was pleasantly surprised to find that their code worked just fine with my Excel file.
Here is the entire work-around source code, just in case one of these links goes away in the future:
void Main(string[] args)
{
var fileName = @"C:\temp\corrupt.xlsx";
var newFileName = @"c:\temp\Fixed.xlsx";
var newFileInfo = new FileInfo(newFileName);
if (newFileInfo.Exists)
newFileInfo.Delete();
File.Copy(fileName, newFileName);
WordprocessingDocument wDoc;
try
{
using (wDoc = WordprocessingDocument.Open(newFileName, true))
{
ProcessDocument(wDoc);
}
}
catch (OpenXmlPackageException e)
{
e.Dump();
if (e.ToString().Contains("The specified package is not valid."))
{
using (FileStream fs = new FileStream(newFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
}
}
}
}
private static Uri FixUri(string brokenUri)
{
brokenUri.Dump();
return new Uri("http://broken-link/");
}
private static void ProcessDocument(WordprocessingDocument wDoc)
{
var elementCount = wDoc.MainDocumentPart.Document.Descendants().Count();
Console.WriteLine(elementCount);
}
public static class UriFixer
{
public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
{
XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
{
foreach (var entry in za.Entries.ToList())
{
if (!entry.Name.EndsWith(".rels"))
continue;
bool replaceEntry = false;
XDocument entryXDoc = null;
using (var entryStream = entry.Open())
{
try
{
entryXDoc = XDocument.Load(entryStream);
if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
{
var urisToCheck = entryXDoc
.Descendants(relNs + "Relationship")
.Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
foreach (var rel in urisToCheck)
{
var target = (string)rel.Attribute("Target");
if (target != null)
{
try
{
Uri uri = new Uri(target);
}
catch (UriFormatException)
{
Uri newUri = invalidUriHandler(target);
rel.Attribute("Target").Value = newUri.ToString();
replaceEntry = true;
}
}
}
}
}
catch (XmlException)
{
continue;
}
}
if (replaceEntry)
{
var fullName = entry.FullName;
entry.Delete();
var newEntry = za.CreateEntry(fullName);
using (StreamWriter writer = new StreamWriter(newEntry.Open()))
using (XmlWriter xmlWriter = XmlWriter.Create(writer))
{
entryXDoc.WriteTo(xmlWriter);
}
}
}
}
}
}

The fix by @RMD works great. I've been using it for years. But there is a new fix.
You can see the fix here in the changelog for issue #793
Upgrade OpenXML to 2.12.0.
Right click solution and select Manage NuGet Packages.
Implement the fix
It is helpful to have a unit test. Create an Excel file with a bad email address like test@gmail,com. (Note the comma instead of the dot.)
Make sure the stream you open and the call to SpreadsheetDocument.Open allows Read AND Write.
You need to implement a RelationshipErrorHandlerFactory and use it in the options when you open. Here is the code I used:
public class UriRelationshipErrorHandler : RelationshipErrorHandler
{
public override string Rewrite(Uri partUri, string id, string uri)
{
return "https://broken-link";
}
}
Then you need to use it when you open the document like this:
var openSettings = new OpenSettings
{
RelationshipErrorHandlerFactory = package =>
{
return new UriRelationshipErrorHandler();
}
};
using var document = SpreadsheetDocument.Open(stream, true, openSettings);
One of the nice things about this solution is that it does not require you to create a temporary "fixed" version of your file and it is far less code.

Unfortunately, the solution where you open the file as a zip archive and replace the broken hyperlink would not help me.
I was wondering how it is possible that everything works fine when the target framework is 4.0, even if the only installed .NET Framework is version 4.7.2.
I found out that there is a private static field inside System.UriParser that selects the version of the URI RFC specification to follow. So it is possible to set it to V2, as it is for .NET 4.0 and lower versions of the .NET Framework. The only problem is that the field is private static readonly.
Someone may want to set it globally for the whole application, but I wrote a UriQuirksVersionPatcher that updates this version and restores it in its Dispose method. It is obviously not thread-safe, but that is acceptable for my purpose.
using System;
using System.Diagnostics;
using System.Reflection;
namespace BarCap.RiskServices.RateSubmissions.Utility
{
#if (NET20 || NET35 || NET40)
public class UriQuirksVersionPatcher : IDisposable
{
public void Dispose()
{
}
}
#else
public class UriQuirksVersionPatcher : IDisposable
{
private const string _quirksVersionFieldName = "s_QuirksVersion"; //See Source\ndp\fx\src\net\System\_UriSyntax.cs in NexFX sources
private const string _uriQuirksVersionEnumName = "UriQuirksVersion";
/// <code>
/// private enum UriQuirksVersion
/// {
/// V1 = 1, // RFC 1738 - Not supported
/// V2 = 2, // RFC 2396
/// V3 = 3, // RFC 3986, 3987
/// }
/// </code>
private const string _oldQuirksVersion = "V2";
private static readonly Lazy<FieldInfo> _targetFieldInfo;
private static readonly Lazy<int?> _patchValue;
private readonly int _oldValue;
private readonly bool _isEnabled;
static UriQuirksVersionPatcher()
{
var targetType = typeof(UriParser);
_targetFieldInfo = new Lazy<FieldInfo>(() => targetType.GetField(_quirksVersionFieldName, BindingFlags.Static | BindingFlags.NonPublic));
_patchValue = new Lazy<int?>(() => GetUriQuirksVersion(targetType));
}
public UriQuirksVersionPatcher()
{
int? patchValue = _patchValue.Value;
_isEnabled = patchValue.HasValue;
if (!_isEnabled) //Disabled if it failed to get enum value
{
return;
}
int originalValue = QuirksVersion;
_isEnabled = originalValue != patchValue;
if (!_isEnabled) //Disabled if value is proper
{
return;
}
_oldValue = originalValue;
QuirksVersion = patchValue.Value;
}
private int QuirksVersion
{
get
{
return (int)_targetFieldInfo.Value.GetValue(null);
}
set
{
_targetFieldInfo.Value.SetValue(null, value);
}
}
private static int? GetUriQuirksVersion(Type targetType)
{
int? result = null;
try
{
result = (int)targetType.GetNestedType(_uriQuirksVersionEnumName, BindingFlags.Static | BindingFlags.NonPublic)
.GetField(_oldQuirksVersion, BindingFlags.Static | BindingFlags.Public)
.GetValue(null);
}
catch
{
#if DEBUG
Debug.WriteLine("ERROR: Failed to find UriQuirksVersion.V2 enum member.");
throw;
#endif
}
return result;
}
public void Dispose()
{
if (_isEnabled)
{
QuirksVersion = _oldValue;
}
}
}
#endif
}
Usage:
using(new UriQuirksVersionPatcher())
{
using(var document = SpreadsheetDocument.Open(fullPath, false))
{
//.....
}
}
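The general pattern behind the patcher (swap a private static value in the constructor, restore it in Dispose) can be sketched in isolation. This is only an illustration under stated assumptions: LegacyParser, its s_mode field, and StaticFieldOverride are hypothetical names, and the demo field is deliberately not readonly so the sketch does not depend on how a given runtime treats reflection writes to readonly fields.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a framework type that keeps a private static setting.
public static class LegacyParser
{
    private static int s_mode = 3;
    public static int CurrentMode => s_mode;
}

// Swaps a private static field for the lifetime of the scope and restores the
// previous value on Dispose. Like the patcher above, this is not thread-safe.
public sealed class StaticFieldOverride : IDisposable
{
    private readonly FieldInfo _field;
    private readonly object _previous;
    private bool _disposed;

    public StaticFieldOverride(Type type, string fieldName, object temporaryValue)
    {
        _field = type.GetField(fieldName, BindingFlags.Static | BindingFlags.NonPublic)
                 ?? throw new MissingFieldException(type.FullName, fieldName);
        _previous = _field.GetValue(null);      // remember the original value
        _field.SetValue(null, temporaryValue);  // apply the temporary value
    }

    public void Dispose()
    {
        if (_disposed) return;
        _field.SetValue(null, _previous);       // put the original value back
        _disposed = true;
    }
}

public static class Demo
{
    // Returns the value that code running inside the scope observes.
    public static int ModeInsideScope()
    {
        using (new StaticFieldOverride(typeof(LegacyParser), "s_mode", 2))
        {
            return LegacyParser.CurrentMode;
        }
    }
}
```

Because the override is process-wide while the scope is open, any other thread that reads the field during that window sees the temporary value too, which is the thread-safety caveat mentioned above.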
P.S. Later I found that someone had already implemented this patcher: https://github.com/google/google-api-dotnet-client/blob/master/Src/Support/Google.Apis.Core/Util/UriPatcher.cs

I haven't used OpenXml, but if there's no specific reason for using it, I highly recommend LinqToExcel. Here is a code example:
var sheet = new ExcelQueryFactory("filePath");
var allRows = from r in sheet.Worksheet() select r;
foreach (var r in allRows) {
var cella = r["Header"].ToString();
}

Iterate through an InfoPath form

I have an InfoPath form that may contain multiple attachments: using a group of repeating elements, the user can click an "add item" option to upload more attachments.
In SharePoint I am using a workflow to extract the attachments and put them in a separate list. So far I have only managed to extract the first one, and the workflow finishes successfully.
Can I use a loop or something similar to iterate through the form?
The code is attached below:
public sealed partial class FileCopyFeature : SharePointSequentialWorkflowActivity
{
public FileCopyFeature()
{
InitializeComponent();
}
public Guid workflowId = default(System.Guid);
public Microsoft.SharePoint.Workflow.SPWorkflowActivationProperties workflowProperties = new Microsoft.SharePoint.Workflow.SPWorkflowActivationProperties();
private void CopyFile(object sender, EventArgs e)
{
// Retrieve the file associated with the item
// on which the workflow has been instantiated
SPFile file = workflowProperties.Item.File;
if (file == null)
return;
// Get the binary data of the file
byte[] xmlFormData = null;
xmlFormData = file.OpenBinary();
// Load the data into an XPathDocument object
XPathDocument ipForm = null;
if (xmlFormData != null)
{
using (MemoryStream ms = new MemoryStream(xmlFormData))
{
ipForm = new XPathDocument(ms);
ms.Close();
}
}
if (ipForm == null)
return;
// Create an XPathNavigator object to navigate the XML
XPathNavigator ipFormNav = ipForm.CreateNavigator();
ipFormNav.MoveToFollowing(XPathNodeType.Element);
XmlNamespaceManager nsManager =
new XmlNamespaceManager(new NameTable());
foreach (KeyValuePair<string, string> ns
in ipFormNav.GetNamespacesInScope(XmlNamespaceScope.All))
{
if (ns.Key == String.Empty)
{
nsManager.AddNamespace("def", ns.Value);
}
else
{
nsManager.AddNamespace(ns.Key, ns.Value);
}
}
do
{
XPathNavigator nodeNav = ipFormNav.SelectSingleNode("//my:field2", nsManager);
// Retrieve the value of the attachment in the InfoPath form
//XPathNavigator nodeNav = ipFormNav.SelectSingleNode(
//"//my:field2", nsManager);
string ipFieldValue = string.Empty;
if (nodeNav != null)
{
ipFieldValue = nodeNav.Value;
// Decode the InfoPath file attachment
InfoPathAttachmentDecoder dec =
new InfoPathAttachmentDecoder(ipFieldValue);
string fileName = dec.Filename;
byte[] data = dec.DecodedAttachment;
// Add the file to a document library
using (SPWeb web = workflowProperties.Web)
{
SPFolder docLib = web.Folders["Doc"];
docLib.Files.Add(fileName, data);
docLib.Update();
// workflowProperties.Item.CopyTo(data + "/Doc/" + fileName);
}
}
}
while (ipFormNav.MoveToNext());
}
}
/// <summary>
/// Decodes a file attachment and saves it to a specified path.
/// </summary>
public class InfoPathAttachmentDecoder
{
private const int SP1Header_Size = 20;
private const int FIXED_HEADER = 16;
private int fileSize;
private int attachmentNameLength;
private string attachmentName;
private byte[] decodedAttachment;
/// <summary>
/// Accepts the Base64 encoded string
/// that is the attachment.
/// </summary>
public InfoPathAttachmentDecoder(string theBase64EncodedString)
{
byte[] theData = Convert.FromBase64String(theBase64EncodedString);
using (MemoryStream ms = new MemoryStream(theData))
{
BinaryReader theReader = new BinaryReader(ms);
DecodeAttachment(theReader);
}
}
private void DecodeAttachment(BinaryReader theReader)
{
//Position the reader to get the file size.
byte[] headerData = new byte[FIXED_HEADER];
headerData = theReader.ReadBytes(headerData.Length);
fileSize = (int)theReader.ReadUInt32();
attachmentNameLength = (int)theReader.ReadUInt32() * 2;
byte[] fileNameBytes = theReader.ReadBytes(attachmentNameLength);
// The file name is stored as UTF-16 (Encoding.Unicode), two bytes per character; the trailing null character is skipped below.
Encoding enc = Encoding.Unicode;
attachmentName = enc.GetString(fileNameBytes, 0, attachmentNameLength - 2);
decodedAttachment = theReader.ReadBytes(fileSize);
}
public void SaveAttachment(string saveLocation)
{
string fullFileName = saveLocation;
if (!fullFileName.EndsWith(Path.DirectorySeparatorChar.ToString()))
{
fullFileName += Path.DirectorySeparatorChar;
}
fullFileName += attachmentName;
if (File.Exists(fullFileName))
File.Delete(fullFileName);
FileStream fs = new FileStream(fullFileName, FileMode.CreateNew);
BinaryWriter bw = new BinaryWriter(fs);
bw.Write(decodedAttachment);
bw.Close();
fs.Close();
}
public string Filename
{
get { return attachmentName; }
}
public byte[] DecodedAttachment
{
get { return decodedAttachment; }
}
}

It appears that your issue has to do with your use of MoveToNext. Per the documentation, this method moves to the next sibling and does not navigate to child elements. Your code appears to go to the first element it finds (presumably my:myFields), look for the first my:field2 element (it only pulls the first one, since you are using SelectSingleNode), and then go to the next sibling of my:myFields (not the next sibling of my:field2). One way to fix this might be to replace your current do-while loop with a call to Select, like the following, and then iterate over the returned XPathNodeIterator. Select is the XPathNavigator counterpart of XmlNode.SelectNodes.
XPathNodeIterator nodes = ipFormNav.Select("//my:field2", nsManager);
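Assuming the navigator-based setup from the question, iterating over every match could be sketched with XPathNavigator.Select, which returns an XPathNodeIterator. The namespace URI, the sample XML, and the RepeatingFieldReader class below are made up for illustration; a real InfoPath form declares its own "my" namespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class RepeatingFieldReader
{
    // Hypothetical namespace URI, used only for this sample document.
    private const string MyNs = "http://example.com/my";

    // Returns the text value of every node matched by the XPath expression.
    public static List<string> ReadAllValues(string xml, string xpath)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        var nsManager = new XmlNamespaceManager(nav.NameTable);
        nsManager.AddNamespace("my", MyNs);

        var values = new List<string>();
        XPathNodeIterator nodes = nav.Select(xpath, nsManager);
        while (nodes.MoveNext())
        {
            // nodes.Current is a navigator positioned on the matched element.
            values.Add(nodes.Current.Value);
        }
        return values;
    }

    // Builds a small form with two repeating groups and reads both fields.
    public static List<string> Demo()
    {
        string xml =
            "<my:myFields xmlns:my='" + MyNs + "'>" +
            "<my:group1><my:field2>first</my:field2></my:group1>" +
            "<my:group1><my:field2>second</my:field2></my:group1>" +
            "</my:myFields>";
        return ReadAllValues(xml, "//my:field2");
    }
}
```

In the workflow, each non-empty value from the loop would be passed to InfoPathAttachmentDecoder and added to the document library, instead of decoding only the single node returned by SelectSingleNode.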

Open-XML saving word document produces corrupted file

Although I'm new to the Open XML world, I've already run into some problems using it. Most of them were easily solved, but I can't get around this one:
public class ReportDocument : IDisposable
{
private MemoryStream stream;
private WordprocessingDocument document;
private MainDocumentPart mainPart;
public byte[] DocumentData
{
get
{
this.document.ChangeDocumentType(WordprocessingDocumentType.MacroEnabledDocument);
byte[] documentData = this.stream.ToArray();
return documentData;
}
}
public ReportDocument()
{
byte[] template = DocumentTemplates.SingleReportTemplate;
this.stream = new MemoryStream();
stream.Write(template, 0, template.Length);
this.document = WordprocessingDocument.Open(stream, true);
this.mainPart = document.MainDocumentPart;
}
public void SetReport(Report report)
{
Body body = mainPart.Document.Body;
var placeholder = body.Descendants<SdtBlock>();
this.SetPlaceholderTextValue(placeholder, "Company", WebApplication.Service.Properties.Settings.Default.CompanyName);
this.SetPlaceholderTextValue(placeholder, "Title", String.Format("Status Report for {0} to {1}", report.StartDate.ToShortDateString(),
report.ReportingInterval.EndDate.ToShortDateString()));
//this.SetPlaceholderTextValue(placeholder, "Subtitle", String.Format("for {0}", report.ReportingInterval.Project.Name));
this.SetPlaceholderTextValue(placeholder, "Author", report.TeamMember.User.Username);
this.SetPlaceholderTextValue(placeholder, "Date", String.Format("for {0}", DateTime.Today.ToShortDateString()));
}
private void SetPlaceholderTextValue(IEnumerable<SdtBlock> sdts, string alias, string value)
{
SdtContentBlock contentBlock = this.GetContentBlock(sdts, alias);
Text text = contentBlock.Descendants<Text>().First();
text.Text = value;
}
private SdtContentBlock GetContentBlock(IEnumerable<SdtBlock> sdts, string alias)
{
return sdts.First(sdt => sdt.Descendants<SdtAlias>().First().Val.Value == alias).SdtContentBlock;
}
public void Dispose()
{
this.document.Close();
}
}
So I create a new document based on a template, which it gets through the memory stream, and I want to write it back to a memory stream once the changes are made.
The big problem is that when I save the resulting byte array, the docx file is corrupted:
The document.xml in .\word is called document2.xml
The document.xml.rels in .\word\_rels is called document2.xml.rels and it contains
I hope some of you can provide a good solution for it.
Kind regards, SakeSushiBig

Change your DocumentData property to this and I think it should work. The important thing is to close the document before you read the MemoryStream, since the package contents are not guaranteed to be completely written to the stream until then.
public byte[] DocumentData
{
get
{
this.document.ChangeDocumentType(WordprocessingDocumentType.MacroEnabledDocument);
this.document.MainDocumentPart.Document.Save();
this.document.Close();
byte[] documentData = this.stream.ToArray();
return documentData;
}
}
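One possible way to make the "close first, then read" ordering hard to get wrong is to scope the document in a using block and call ToArray only after that block ends. The following is a sketch under assumptions, not the asker's design: TemplateFiller and Fill are hypothetical names, and the edit callback stands in for the SetReport logic.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class TemplateFiller
{
    // Copies the template into an expandable MemoryStream, applies the caller's
    // edits, and reads the bytes only after the package has been disposed.
    public static byte[] Fill(byte[] template, Action<WordprocessingDocument> edit)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(template, 0, template.Length);

            using (WordprocessingDocument document = WordprocessingDocument.Open(stream, true))
            {
                edit(document);
                document.MainDocumentPart.Document.Save();
            } // Disposing the document closes the package and flushes it to the stream.

            return stream.ToArray();
        }
    }
}
```

A call might look like `byte[] result = TemplateFiller.Fill(templateBytes, doc => { /* set placeholder text here */ });`. Compared with closing the document inside a property getter, this keeps the document from being used after it has been closed.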
