I want to read the BuiltInDocumentProperties/CustomDocumentProperties of a Word document. The following code always returns null :-(
using Microsoft.Office.Core;
using Word = Microsoft.Office.Interop.Word;
.....
private void toolStripMenuItemTmp_Click(object sender, EventArgs e)
{
Word.Application word = new Word.Application();
Word.Document document = word.Documents.Open(@"C:\Users\fillibuster\Desktop\docproperty.docx");
DocumentProperties properties = (DocumentProperties)document.CustomDocumentProperties;
if (properties != null)
{
foreach (Microsoft.Office.Core.DocumentProperty item in properties)
{
MessageBox.Show(item.Name.ToString() + item.Value.ToString());
}
}
else
{
MessageBox.Show("null");
}
}
What's wrong with the source? CustomDocumentProperties and BuiltInDocumentProperties are available and filled in the document!
I had the same issue with a .docx document. One way around it was to drop the type cast and use dynamic and object as the types instead; then the code worked. I suspect that the COM property of a .docx file is not the type described on MSDN...
So this code reads the raw document properties and stores them in a Dictionary.
try
{
BuiltInDocumentProperties = new Dictionary<string, string>();
var builtinProps = Doc.BuiltInDocumentProperties; // don't strong cast this or you will get null
SetBuiltInProperty(builtinProps, "Title");
SetBuiltInProperty(builtinProps, "Keywords");
}
catch (Exception e)
{
// Ignore the error
Log.Warn("Unexpected error while reading the document's built-in properties", e);
}
IDictionary<string, string> BuiltInDocumentProperties { get; set; }
internal void SetBuiltInProperty(dynamic builtInProps, string property)
{
if (builtInProps != null)
{
try
{
var prop = builtInProps[property];
if (prop != null)
{
string str = prop.Value.ToString();
BuiltInDocumentProperties[property] = str;
}
}
catch (RuntimeBinderException)
{
// Property is missing
}
catch (COMException)
{
}
}
}
Here is the code for my work.
public void InsertValue(WordprocessingDocument doc, string bookMark, string txt)
{
try
{
RemoveBookMarkContent(doc, bookMark);
var bmStart = FindBookMarkStart(doc, bookMark);
if (bmStart == null)
return;
var run = new Run();
run.Append(GetRunProperties());
run.Append(new Text(txt));
bmStart.Parent.InsertAfter(run, bmStart);
}
catch (Exception c)
{
//not Exception
}
}
private void RemoveBookMarkContent(WordprocessingDocument doc, string bmName)
{
BookmarkStart bmStart = FindBookMarkStart(doc, bmName);
if (bmStart == null)
return;
BookmarkEnd bmEnd = FindBookMarkEnd(doc, bmStart.Id);
while (true)
{
var run = bmStart.NextSibling();
if (run == null)
{
break;
}
if (run is BookmarkEnd && (BookmarkEnd)run == bmEnd)
{
break;
}
run.Remove();
}
}
Several helper methods are not shown here. The process is: first find the bookmark location, delete the content at the bookmark, and then add the new content. I have also tried adding a Paragraph at the bookmark location, but that doesn't work.
The text to insert at the bookmark is, for example: 露点:U=0.15℃(k=2);相对湿度:U=1.0%RH(k=2) (that is, dew point: U=0.15℃ (k=2); relative humidity: U=1.0%RH (k=2)). Both U and k must be italic. Any help will be appreciated. Thanks.
I tried a different component, [Spire.Office][1].
At first I couldn't think of a solution, but then I used a global search to find each match and checked whether the match location has a bookmark, which solved the problem.
Here is the code for my work.
var selection = document.FindAllString("U", false, true);
foreach (var sec in selection)
{
var range = sec.GetAsOneRange();
// Italicize the match only when its owning paragraph ends with a bookmark end.
if (range?.Owner?.LastChild?.DocumentObjectType == DocumentObjectType.BookmarkEnd)
{
range.CharacterFormat.Italic = true;
}
}
I didn't try to do this with Open XML, but I think the principle should be the same.
[1]: https://www.e-iceblue.cn/Buy/Spire-PDF-NET.html
Is there any way to find the lines that have tracked changes (inserted or deleted) using the Open XML SDK? With the code below I can detect whether the document body has tracked changes, and that works correctly. What I want now is to find which lines of text in the body contain tracked changes.
public static System.Type[] trackedRevisionsElements = new System.Type[] {
typeof(CellDeletion),
typeof(CellInsertion),
typeof(CellMerge),
typeof(CustomXmlDelRangeEnd),
typeof(CustomXmlDelRangeStart),
typeof(CustomXmlInsRangeEnd),
typeof(CustomXmlInsRangeStart),
typeof(Deleted),
typeof(DeletedFieldCode),
typeof(DeletedMathControl),
typeof(DeletedRun),
typeof(DeletedText),
typeof(Inserted),
typeof(InsertedMathControl),
typeof(InsertedRun),
typeof(MoveFrom),
typeof(MoveFromRangeEnd),
typeof(MoveFromRangeStart),
typeof(MoveTo),
typeof(MoveToRangeEnd),
typeof(MoveToRangeStart),
typeof(MoveToRun),
typeof(NumberingChange),
typeof(ParagraphMarkRunPropertiesChange),
typeof(ParagraphPropertiesChange),
typeof(RunPropertiesChange),
typeof(SectionPropertiesChange),
typeof(TableCellPropertiesChange),
typeof(TableGridChange),
typeof(TablePropertiesChange),
typeof(TablePropertyExceptionsChange),
typeof(TableRowPropertiesChange),
};
public static bool PartHasTrackedRevisions(OpenXmlPart part)
{
List<OpenXmlElement> insertions =
part.RootElement.Descendants<Inserted>()
.Cast<OpenXmlElement>().ToList();
//Body bdy = wordDoc.MainDocumentPart.Document.Body;
if (part.RootElement.Descendants()
.Any(e => trackedRevisionsElements.Contains(e.GetType())))
{
var initialTextDescendants = part.RootElement.Descendants<Text>();
string dummy = string.Empty;
foreach (Text t in initialTextDescendants)
{
MessageBox.Show(t.Text);
}
}
return part.RootElement.Descendants()
.Any(e => trackedRevisionsElements.Contains(e.GetType()));
}
public static bool HasTrackedRevisions(WordprocessingDocument doc)
{
if (PartHasTrackedRevisions(doc.MainDocumentPart))
return true;
foreach (var part in doc.MainDocumentPart.HeaderParts)
if (PartHasTrackedRevisions(part))
return true;
foreach (var part in doc.MainDocumentPart.FooterParts)
if (PartHasTrackedRevisions(part))
return true;
if (doc.MainDocumentPart.EndnotesPart != null)
if (PartHasTrackedRevisions(doc.MainDocumentPart.EndnotesPart))
return true;
if (doc.MainDocumentPart.FootnotesPart != null)
if (PartHasTrackedRevisions(doc.MainDocumentPart.FootnotesPart))
return true;
return false;
}
private void button2_Click(object sender, EventArgs e)
{
foreach (var documentName in Directory.GetFiles(".", "*.docx"))
{
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(documentName, false))
{
if (HasTrackedRevisions(wordDoc)) {
//Body bdy = wordDoc.MainDocumentPart.Document.Body;
//var initialTextDescendants = bdy.Descendants<Text>();
//string dummy = string.Empty;
//foreach (Text t in initialTextDescendants)
//{
// richTextBox1.Text = richTextBox1.Text + t.Text;
//}
Console.WriteLine("{0} contains tracked revisions", documentName);
}
else
Console.WriteLine("{0} does not contain tracked revisions", documentName);
}
}
}
What exactly do you mean by "Text line of body"? The line of text as it would appear in a laid-out document (which is not easy to determine), or the Open XML elements that were changed?
If this is about the line of text on a laid out document, as produced by Microsoft Word, this is hard, because you would require a layout algorithm to understand where the lines with those tracked changes would be rendered.
If this is about the OpenXmlElements, e.g., Text or Paragraph, you already have part of your solution as this is about querying the XML mark-up.
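To illustrate the "querying the XML mark-up" route, here is a minimal sketch that works on the raw WordprocessingML with LINQ to XML rather than the SDK's typed classes. It returns the text of every paragraph that contains a `w:ins` or `w:del` element. The names `RevisionFinder` and `ParagraphsWithRevisions` are made up for this example, and it only looks at insertions and deletions, not the other revision types from the question's list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class RevisionFinder
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph that has at least one
    // w:ins or w:del descendant. Deleted text (w:delText) is included
    // so the caller can see what was removed.
    public static List<string> ParagraphsWithRevisions(XDocument documentXml)
    {
        return documentXml
            .Descendants(W + "p")
            .Where(p => p.Descendants(W + "ins").Any()
                     || p.Descendants(W + "del").Any())
            .Select(p => string.Concat(
                p.Descendants()
                 .Where(e => e.Name == W + "t" || e.Name == W + "delText")
                 .Select(e => e.Value)))
            .ToList();
    }
}
```

The input would be the main document part loaded into an `XDocument`. With the SDK's typed classes the same idea should apply: enumerate the paragraphs and keep those that have a tracked-revision element among their descendants.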
I read Excel files using OpenXml. Everything works fine, but if the spreadsheet contains a cell with an email address followed by a space and another word, such as:
abc@abc.com abc
It throws an exception immediately at the opening of the spreadsheet:
var _doc = SpreadsheetDocument.Open(_filePath, false);
exception:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
Additional information:
Invalid Hyperlink: Malformed URI is embedded as a
hyperlink in the document.
There is an open issue on the OpenXml forum related to this problem: Malformed Hyperlink causes exception
In the post they talk about encountering this issue with a malformed "mailto:" hyperlink within a Word document.
They propose a work-around here: Workaround for malformed hyperlink exception
The workaround is essentially a small console application which locates the invalid URL and replaces it with a hard-coded value; here is the code snippet from their sample that does the replacement; you could augment this code to attempt to correct the passed brokenUri:
private static Uri FixUri(string brokenUri)
{
return new Uri("http://broken-link/");
}
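As a rough illustration of what "attempting to correct the passed brokenUri" could look like, the sketch below keeps only the first whitespace-delimited token of the broken target and falls back to the hard-coded value when that token is still not a valid absolute URI. It assumes the broken target has the shape of a valid URI followed by stray text (for example `mailto:abc@abc.com abc`); the actual content of your relationship targets may differ. `UriRepair` and `Fallback` are names invented for this example.

```csharp
using System;

// Hypothetical replacement for the sample's FixUri.
public static class UriRepair
{
    // Same placeholder the original workaround uses.
    public static readonly Uri Fallback = new Uri("http://broken-link/");

    public static Uri FixUri(string brokenUri)
    {
        if (string.IsNullOrWhiteSpace(brokenUri))
            return Fallback;

        // Assumption: everything after the first whitespace is stray text.
        string candidate = brokenUri
            .Trim()
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)[0];

        // Use the trimmed value only if it parses as an absolute URI.
        if (Uri.TryCreate(candidate, UriKind.Absolute, out Uri repaired))
            return repaired;

        return Fallback;
    }
}
```

It can be passed to the fixer in the same way as the original, e.g. `brokenUri => UriRepair.FixUri(brokenUri)`.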
The problem I had was actually with an Excel document (like you) and it had to do with a malformed http URL; I was pleasantly surprised to find that their code worked just fine with my Excel file.
Here is the entire work-around source code, just in case one of these links goes away in the future:
void Main(string[] args)
{
var fileName = @"C:\temp\corrupt.xlsx";
var newFileName = @"c:\temp\Fixed.xlsx";
var newFileInfo = new FileInfo(newFileName);
if (newFileInfo.Exists)
newFileInfo.Delete();
File.Copy(fileName, newFileName);
WordprocessingDocument wDoc;
try
{
using (wDoc = WordprocessingDocument.Open(newFileName, true))
{
ProcessDocument(wDoc);
}
}
catch (OpenXmlPackageException e)
{
e.Dump();
if (e.ToString().Contains("The specified package is not valid."))
{
using (FileStream fs = new FileStream(newFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
}
}
}
}
private static Uri FixUri(string brokenUri)
{
brokenUri.Dump();
return new Uri("http://broken-link/");
}
private static void ProcessDocument(WordprocessingDocument wDoc)
{
var elementCount = wDoc.MainDocumentPart.Document.Descendants().Count();
Console.WriteLine(elementCount);
}
public static class UriFixer
{
public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
{
XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
{
foreach (var entry in za.Entries.ToList())
{
if (!entry.Name.EndsWith(".rels"))
continue;
bool replaceEntry = false;
XDocument entryXDoc = null;
using (var entryStream = entry.Open())
{
try
{
entryXDoc = XDocument.Load(entryStream);
if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
{
var urisToCheck = entryXDoc
.Descendants(relNs + "Relationship")
.Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
foreach (var rel in urisToCheck)
{
var target = (string)rel.Attribute("Target");
if (target != null)
{
try
{
Uri uri = new Uri(target);
}
catch (UriFormatException)
{
Uri newUri = invalidUriHandler(target);
rel.Attribute("Target").Value = newUri.ToString();
replaceEntry = true;
}
}
}
}
}
catch (XmlException)
{
continue;
}
}
if (replaceEntry)
{
var fullName = entry.FullName;
entry.Delete();
var newEntry = za.CreateEntry(fullName);
using (StreamWriter writer = new StreamWriter(newEntry.Open()))
using (XmlWriter xmlWriter = XmlWriter.Create(writer))
{
entryXDoc.WriteTo(xmlWriter);
}
}
}
}
}
}
The fix by @RMD works great. I've been using it for years. But there is now a newer fix.
You can see the fix here in the changelog for issue #793
Upgrade OpenXML to 2.12.0.
Right click solution and select Manage NuGet Packages.
Implement the fix
It is helpful to have a unit test. Create an Excel file with a bad email address like test@gmail,com. (Note the comma instead of the dot.)
Make sure the stream you open and the call to SpreadsheetDocument.Open both allow read AND write access.
You need to implement a RelationshipErrorHandlerFactory and use it in the options when you open. Here is the code I used:
public class UriRelationshipErrorHandler : RelationshipErrorHandler
{
public override string Rewrite(Uri partUri, string id, string uri)
{
return "https://broken-link";
}
}
Then you need to use it when you open the document like this:
var openSettings = new OpenSettings
{
RelationshipErrorHandlerFactory = package =>
{
return new UriRelationshipErrorHandler();
}
};
using var document = SpreadsheetDocument.Open(stream, true, openSettings);
One of the nice things about this solution is that it does not require you to create a temporary "fixed" version of your file and it is far less code.
Unfortunately, the solution where you open the file as a zip and replace the broken hyperlink would not help me.
I was wondering how it is possible that everything works fine when your target framework is 4.0, even if the only installed .NET Framework version is 4.7.2.
I found out that there is a private static field inside System.UriParser that selects which version of the URI RFC specification is followed. So it is possible to set it to V2, as it is for .NET 4.0 and lower versions of the .NET Framework. The only problem is that the field is private static readonly.
Maybe someone will want to set it globally for the whole application. I wrote UriQuirksVersionPatcher, which updates this version and restores it in its Dispose method. It is obviously not thread-safe, but that is acceptable for my purpose.
using System;
using System.Diagnostics;
using System.Reflection;
namespace BarCap.RiskServices.RateSubmissions.Utility
{
#if (NET20 || NET35 || NET40)
public class UriQuirksVersionPatcher : IDisposable
{
public void Dispose()
{
}
}
#else
public class UriQuirksVersionPatcher : IDisposable
{
private const string _quirksVersionFieldName = "s_QuirksVersion"; //See Source\ndp\fx\src\net\System\_UriSyntax.cs in NexFX sources
private const string _uriQuirksVersionEnumName = "UriQuirksVersion";
/// <code>
/// private enum UriQuirksVersion
/// {
/// V1 = 1, // RFC 1738 - Not supported
/// V2 = 2, // RFC 2396
/// V3 = 3, // RFC 3986, 3987
/// }
/// </code>
private const string _oldQuirksVersion = "V2";
private static readonly Lazy<FieldInfo> _targetFieldInfo;
private static readonly Lazy<int?> _patchValue;
private readonly int _oldValue;
private readonly bool _isEnabled;
static UriQuirksVersionPatcher()
{
var targetType = typeof(UriParser);
_targetFieldInfo = new Lazy<FieldInfo>(() => targetType.GetField(_quirksVersionFieldName, BindingFlags.Static | BindingFlags.NonPublic));
_patchValue = new Lazy<int?>(() => GetUriQuirksVersion(targetType));
}
public UriQuirksVersionPatcher()
{
int? patchValue = _patchValue.Value;
_isEnabled = patchValue.HasValue;
if (!_isEnabled) //Disabled if it failed to get enum value
{
return;
}
int originalValue = QuirksVersion;
_isEnabled = originalValue != patchValue;
if (!_isEnabled) //Disabled if value is proper
{
return;
}
_oldValue = originalValue;
QuirksVersion = patchValue.Value;
}
private int QuirksVersion
{
get
{
return (int)_targetFieldInfo.Value.GetValue(null);
}
set
{
_targetFieldInfo.Value.SetValue(null, value);
}
}
private static int? GetUriQuirksVersion(Type targetType)
{
int? result = null;
try
{
result = (int)targetType.GetNestedType(_uriQuirksVersionEnumName, BindingFlags.Static | BindingFlags.NonPublic)
.GetField(_oldQuirksVersion, BindingFlags.Static | BindingFlags.Public)
.GetValue(null);
}
catch
{
#if DEBUG
Debug.WriteLine("ERROR: Failed to find UriQuirksVersion.V2 enum member.");
throw;
#endif
}
return result;
}
public void Dispose()
{
if (_isEnabled)
{
QuirksVersion = _oldValue;
}
}
}
#endif
}
Usage:
using(new UriQuirksVersionPatcher())
{
using(var document = SpreadsheetDocument.Open(fullPath, false))
{
//.....
}
}
P.S. Later I found that someone had already implemented this patcher: https://github.com/google/google-api-dotnet-client/blob/master/Src/Support/Google.Apis.Core/Util/UriPatcher.cs
I haven't used OpenXml, but if there's no specific reason for using it, I highly recommend LinqToExcel. Example code:
var sheet = new ExcelQueryFactory("filePath");
var allRows = from r in sheet.Worksheet() select r;
foreach (var r in allRows) {
var cella = r["Header"].ToString();
}
I'm having trouble figuring out how to determine if a number is duplicated.
Right now, the process is when the user clicks on a button to browse for an xml file, the xml file gets deserialized and stored into db and the data gets shown on a DataGrid on the view.
So I added a confirmation dialog: when the user clicks browse, the code checks whether the lot_number being deserialized already exists in a column of a database table. I only want the user to be able to add lot numbers that are not duplicates.
Here's my code so far:
public void DeSerializationStream(string filePath)
{
XmlRootAttribute xRoot = new XmlRootAttribute();
xRoot.ElementName = "lot_information";
xRoot.IsNullable = false;
// Create an instance of lotinformation class.
var lot = new LotInformation();
// Create an instance of stream reader.
TextReader txtReader = new StreamReader(filePath);
// Create an instance of XmlSerializer class.
XmlSerializer xmlSerializer = new XmlSerializer(typeof(LotInformation), xRoot);
// DeSerialize from the StreamReader
lot = (LotInformation)xmlSerializer.Deserialize(txtReader);
// Close the stream reader
txtReader.Close();
}
public void ReadLot(LotInformation lot)
{
try
{
using (var db = new DDataContext())
{
var lotNumDb = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number));
if (lotNumDb != null || lotNumDb.lot_number.ToString().Equals(lot.lot_number))
{
confirmationWindow.Message = LanguageResources.Resource.Sample_Exists_Already;
dialogService.ShowDialog(LanguageResources.Resource.Error, confirmationWindow);
}
else {
Console.WriteLine("lot does not exist. yay");
}
DateTime ExpirationDate = lot.exp_date;
if (ExpirationDate != null)
{
if (DateTime.Compare(ExpirationDate, DateTime.Now) > 0)
{
try
{
LotInformation lotInfo = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(lotNumber));
}
catch (InvalidOperationException e)
{
//TODO: Add a Dialog Here
}
}
else
{
Console.WriteLine(ExpirationDate);
errorWindow.Message = LanguageResources.Resource.Lot_Expired;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
}
}
else
{
errorWindow.Message = LanguageResources.Resource.Lot_Not_In_Database;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
}
}
}
catch
{
errorWindow.Message = LanguageResources.Resource.Database_Error;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
logger.writeErrLog(LanguageResources.Resource.Database_Error);
}
}
I think I'm just having problems with when to grab the lot_number in this process.
This part below gives me problems. It keeps showing the Sample Exists already message for unique lot numbers that I'm uploading and I'm not sure why. I think it's a problem with my LINQ query but I'm not sure how to fix it. Any ideas?
var lotNumDb = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number));
if (lotNumDb != null || lotNumDb.lot_number.ToString().Equals(lot.lot_number))
{
confirmationWindow.Message = LanguageResources.Resource.Sample_Exists_Already;
dialogService.ShowDialog(LanguageResources.Resource.Error, confirmationWindow);
}
else {
Console.WriteLine("lot does not exist. yay");
}
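You can't write the query like this, because it compares each row's lot_number with itself, which is true for every row:
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number))
It should probably be:
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(lot.lot_number))
or, with any other string variable (here a hypothetical someLotNumber):
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(someLotNumber))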
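The difference can be reproduced without a database. The sketch below uses an in-memory list and a hypothetical `LotRecord` class as a stand-in for the `LotInformation` entity; it assumes `lot_number` is a string, which the question does not state explicitly. `Any` is used for the existence check, which also avoids dereferencing a null result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the LotInformation entity.
public sealed class LotRecord
{
    public string lot_number { get; set; }
}

public static class LotChecks
{
    // The query from the question: each row is compared with itself,
    // so this is true whenever the table has at least one row.
    public static bool ExistsComparingRowToItself(IEnumerable<LotRecord> rows)
    {
        return rows.FirstOrDefault(r => r.lot_number.Equals(r.lot_number)) != null;
    }

    // Corrected check: compare each row with the incoming lot number.
    public static bool Exists(IEnumerable<LotRecord> rows, string lotNumber)
    {
        return rows.Any(r => r.lot_number == lotNumber);
    }
}
```

With rows "A1" and "B2", the first method reports a duplicate no matter which lot is being uploaded, while `Exists(rows, "C3")` is false and `Exists(rows, "B2")` is true.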
I am not sure whether WebBrowser.DocumentText contains only the top document's source or whether the text of frame documents is included as well. I could not find that on the MSDN page.
No, it does not. I have tried the following:
DocumentText:
File.WriteAllText(@"C:\doc.txt", webBrowser1.DocumentText, Encoding.UTF8);
GetElementsByTagName("HTML")
HtmlElement elem;
if (webBrowser1.Document != null)
{
HtmlElementCollection elems = webBrowser1.Document.GetElementsByTagName("HTML");
if (elems.Count == 1)
{
elem = elems[0];
string pageSource = elem.OuterHtml;
File.WriteAllText(@"C:\doc.txt", pageSource, Encoding.UTF8);
}
}
IOleCommandTarget
public void ShowSource()
{
IOleCommandTarget cmdt = null;
object o = null;
object oIE = null;
try {
cmdt = (IOleCommandTarget)this.Document.DomDocument;
cmdt.Exec(cmdGUID, oCommands.ViewSource, 1, o, o);
} catch (Exception ex) {
throw new Exception(ex.Message.ToString(), ex.InnerException);
} finally {
cmdt = null;
}
}
The only way is to go through all the frame documents.
Update: If an iframe has a different URL, you will get an UnauthorizedAccessException when trying to retrieve the iframe's document.