DocumentItem.SaveAs results in corrupted file

DocumentItem.SaveAs results in corrupted file - c#

Within outlook I have various DocumentItems in folders such as the inbox and I am trying to save these to the file system within a drag/drop event.
Here is the code:
for (var i = 1; i <= _application.ActiveExplorer().Selection.Count; i++)
{
var temp = _application.ActiveExplorer().Selection[i];
var documentItem = (temp as DocumentItem);
if (documentItem == null)
continue;
var tempFileName = Path.GetTempPath() + documentItem.Subject;
documentItem.SaveAs(tempFileName);
}
They seem to save successfully and have file sizes:
But when I try to open any of them they all say they cannot be opened so they are corrupted somehow, does anyone have any ideas?

You are calling SaveAs without specifying the format, and Outlook Object Model defaults it to olMsg. You end up with an MSG file with a JPG extension.
What you need to do is loop though all attachments in the DocumentItem.Attachments collection and call Attachment.SaveAsFile. You might also want to use the Attachmeent.FileName property.
Just a general comment - multiple dot notation is evil, especially in a loo:
Selection selection = _application.ActiveExplorer().Selection;
for (var i = 1; i <= selection.Count; i++)
{
var temp = selection[i];
var documentItem = (temp as DocumentItem);
...

Related

How can I debug my add-in when WordEditor's Content property is crashing Outlook?

I have folder full of *.msg files saved from Outlook and I'm trying to convert them to Word.
There is a loop that loads each *.msg as MailItem and saves them.
public static ConversionResult ConvertEmailsToWord(this Outlook.Application app, string source, string target)
{
var word = new Word.Application();
var emailCounter = 0;
var otherCounter = 0;
var directoryTree = new PhysicalDirectoryTree();
foreach (var node in directoryTree.Walk(source))
{
foreach (var fileName in node.FileNames)
{
var currentFile = Path.Combine(node.DirectoryName, fileName);
var branch = Regex.Replace(node.DirectoryName, $"^{Regex.Escape(source)}", string.Empty).Trim('\\');
Debug.Print($"Processing file: {currentFile}");
// This is an email. Convert it to Word.
if (Regex.IsMatch(fileName, #"\.msg$"))
{
if (app.Session.OpenSharedItem(currentFile) is MailItem item)
{
if (item.SaveAs(word, Path.Combine(target, branch), fileName))
{
emailCounter++;
}
item.Close(SaveMode: OlInspectorClose.olDiscard);
}
}
// This is some other file. Copy it as is.
else
{
Directory.CreateDirectory(Path.Combine(target, branch));
File.Copy(currentFile, Path.Combine(target, branch, fileName), true);
otherCounter++;
}
}
}
word.Quit(SaveChanges: false);
return new ConversionResult
{
EmailCount = emailCounter,
OtherCount = otherCounter
};
}
The save method looks likes this:
public static bool SaveAs(this MailItem mail, Word.Application word, string path, string name)
{
Directory.CreateDirectory(path);
name = Path.Combine(path, $"{Path.GetFileNameWithoutExtension(name)}.docx");
if (File.Exists(name))
{
return false;
}
var copy = mail.GetInspector.WordEditor as Word.Document;
copy.Content.Copy();
var doc = word.Documents.Add();
doc.Content.Paste();
doc.SaveAs2(FileName: name);
doc.Close();
return true;
}
It works for most *.msg files but there are some that crash Outlook when I call copy.Content on a Word.Document.
I know you cannot tell me what is wrong with it (or maybe you do?) so I'd like to findit out by myself but the problem is that I am not able to catch the exception. Since a simple try\catch didn't work I tried it with AppDomain.CurrentDomain.UnhandledException this this didn't catch it either.
Are there any other ways to debug it?
The mail that doesn't let me get its content inside a loop doesn't cause any troubles when I open it in a new Outlook window and save it with the same method.

It makes sense to add some delays between Word calls. IO operations takes some time to finish. Also there is no need to create another document in Word for copying the content:
var copy = mail.GetInspector.WordEditor as Word.Document;
copy.Content.Copy();
var doc = word.Documents.Add();
doc.Content.Paste();
doc.SaveAs2(FileName: name);
doc.Close();
Instead, do the required modifications on the original document instance and then save it to the disk. The original mail item will remain unchanged until you call the Save method from the Outlook object model. You may call the Close method passing the olDiscard which discards any changes to the document.
Also consider using the Open XML SDK if you deal with open XML documents only, see Welcome to the Open XML SDK 2.5 for Office for more information.

Do you actually need to use Inspector.WordEditor? You can save the message in a format supported by Word (such as MHTML) using OOM alone by calling MailItem.Save(..., olMHTML) and open the file in Word programmatically to save it in the DOCX format.

Replacing Invalid XML characters from an excel file and writing it back to disk causes file is corrupted error on opening in MS Excel

A little background on problem:
We have an ASP.NET MVC5 Application where we use FlexMonster to show the data in grid. The data source is a stored procedure that brings all the data into the UI grid, and once user clicks on export button, it exports the report to Excel. However, in some cases export to excel is failing.
Some of the data has some invalid characters, and it is not possible/feasible to fix the source as suggested here
My approach so far:
EPPlus library fails on initializing the workbook as the input excel file contains some invalid XML characters. I could find that the file is dumped with some invalid character in it. I looked into the possible approaches .
Firstly, I identified the problematic character in the excel file. I first tried to replace the invalid character with blank space manually using Notepad++ and the EPPlus could successfully read the file.
Now using the approaches given in other SO thread here and here, I replaced all possible occurrences of invalid chars. I am using at the moment
XmlConvert.IsXmlChar
method to find out the problematic XML character and replacing with blank space.
I created a sample program where I am trying to work on the problematic excel sheet.
//in main method
String readFile = File.ReadAllText(filePath);
string content = RemoveInvalidXmlChars(readFile);
File.WriteAllText(filePath, content);
//removal of invalid characters
static string RemoveInvalidXmlChars(string inputText)
{
StringBuilder withoutInvalidXmlCharsBuilder = new StringBuilder();
int firstOccurenceOfRealData = inputText.IndexOf("<t>");
int lastOccurenceOfRealData = inputText.LastIndexOf("</t>");
if (firstOccurenceOfRealData < 0 ||
lastOccurenceOfRealData < 0 ||
firstOccurenceOfRealData > lastOccurenceOfRealData)
return inputText;
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(0, firstOccurenceOfRealData));
int remaining = lastOccurenceOfRealData - firstOccurenceOfRealData;
string textToCheckFor = inputText.Substring(firstOccurenceOfRealData, remaining);
foreach (char c in textToCheckFor)
{
withoutInvalidXmlCharsBuilder.Append((XmlConvert.IsXmlChar(c)) ? c : ' ');
}
withoutInvalidXmlCharsBuilder.Append(inputText.Substring(lastOccurenceOfRealData));
return withoutInvalidXmlCharsBuilder.ToString();
}
If I replaces the problematic character manually using notepad++, then the file opens fine in MSExcel. The above mentioned code successfully replaces the same invalid character and writes the content back to the file. However, when I try to open the excel file using MS Excel, it throws an error saying that file may have been corrupted and no content is displayed (snapshots below). Moreover, Following code
var excelPackage = new ExcelPackage(new FileInfo(filePath));
on the file that I updated via Notepad++, throws following exception
"CRC error: the file being extracted appears to be corrupted. Expected 0x7478AABE, Actual 0xE9191E00"}
My Questions:
Is my approach to modify content this way correct?
If yes, How can I write updated string to an Excel file?
If my approach is wrong then, How can I proceed to get rid of invalid XML chars?
Errors shown on opening file (without invalid XML char):
First Pop up
When I click on yes
Thanks in advance !

It does sounds like a binary (presumable XLSX) file based on your last comment. To confirm, open the file created by the FlexMonster with 7zip. If it opens properly and you see a bunch of XML files in folders, its a XLSX.
In that case, a search/replace on a binary file sounds like a very bad idea. It might work on the XML parts but might also replace legit chars in other parts. I think the better approach would be to do as #PanagiotisKanavos suggests and use ZipArchive. But you have to do rebuild it in the right order otherwise Excel complains. Similar to how it was done here https://stackoverflow.com/a/33312038/1324284, you could do something like this:
public static void ReplaceXmlString(this ZipArchive xlsxZip, FileInfo outFile, string oldString, string newstring)
{
using (var outStream = outFile.Open(FileMode.Create, FileAccess.ReadWrite))
using (var copiedzip = new ZipArchive(outStream, ZipArchiveMode.Update))
{
//Go though each file in the zip one by one and copy over to the new file - entries need to be in order
foreach (var entry in xlsxZip.Entries)
{
var newentry = copiedzip.CreateEntry(entry.FullName);
var newstream = newentry.Open();
var orgstream = entry.Open();
//Copy non-xml files over
if (!entry.Name.EndsWith(".xml"))
{
orgstream.CopyTo(newstream);
}
else
{
//Load the xml document to manipulate
var xdoc = new XmlDocument();
xdoc.Load(orgstream);
var xml = xdoc.OuterXml.Replace(oldString, newstring);
xdoc = new XmlDocument();
xdoc.LoadXml(xml);
xdoc.Save(newstream);
}
orgstream.Close();
newstream.Flush();
newstream.Close();
}
}
}
When it is used like this:
[TestMethod]
public void ReplaceXmlTest()
{
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
new DataColumn("Col1", typeof (int)),
new DataColumn("Col2", typeof (int)),
new DataColumn("Col3", typeof (string))
});
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow();
row[0] = i;
row[1] = i * 10;
row[2] = i % 2 == 0 ? "ABCD" : "AXCD";
datatable.Rows.Add(row);
}
using (var pck = new ExcelPackage())
{
var workbook = pck.Workbook;
var worksheet = workbook.Worksheets.Add("source");
worksheet.Cells.LoadFromDataTable(datatable, true);
worksheet.Tables.Add(worksheet.Cells["A1:C11"], "Table1");
//Now similulate the copy/open of the excel file into a zip archive
using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
{
var fi = new FileInfo(#"c:\temp\ReplaceXmlTest.xlsx");
if (fi.Exists)
fi.Delete();
orginalzip.ReplaceXmlString(fi, "AXCD", "REPLACED!!");
}
}
}
Gives this:
Just keep in mind that this is completely brute force. Anything you can do to make the file filter smarter rather then simply doing ALL xml files would be a very good thing. Maybe limit it to the SharedString.xml file if that is where the problem lies or in the xml files in the worksheet folders. Hard to say without knowing more about the data.

C# splitting large excel file to smaller files

I want to split one large Excel file to few smaller and accessible files.
I already tried to use this code but the files are not accessible:
using (System.IO.StreamReader sr = new System.IO.StreamReader("path"))
{
int fileNumber = 0;
while (!sr.EndOfStream)
{
int count = 0;
using (System.IO.StreamWriter sw = new System.IO.StreamWriter("other path" + ++fileNumber + ".xlsx"))
{
sw.AutoFlush = true;
while (!sr.EndOfStream && ++count < 20000)
{
sw.WriteLine(sr.ReadLine());
}
}
}
}
Any ideas?
thanks.

Files, other than text files, don't work this way. You can't simply cut at a certain point and obtain a working copy.
As for Excel files, you may look into the following tutorial, which illustrates how to automate Excel from C#:
https://support.microsoft.com/en-us/help/302084/how-to-automate-microsoft-excel-from-microsoft-visual-c--net
Basically, what you want to do is open your large Excel file, decide where you want to split it (every n rows, every n sheets and so on), read each portion and write into a newly created xlsx.

Outputing to .txt file

I am a little stuck. I have to use the getItems method and output it to PrintItems.txt, but I am not sure how to appoach this problem.
This is my GetItems method:
public string getItems()
{
string strout = "stock items " + "\n";
foreach (Stock s in stock)
{
strout = strout + s.ToString() + "\n";
}
return strout;
}
This is my PrintItems method:
string filename = "printitems.txt";
int count = 0;
using (StreamWriter sw = new StreamWriter(filename))
{
while (count < 5 && count < stock.Count)
{
Stock t = stock[count];
sw.WriteLine(t);
count++;
}
}
It doesn't work because it doesn't write to a file at all.

You code generally should work.
But since you haven't specified full path to the text file - it will be created in the same folder where your executable file is.
If you running it from Visual Studio - it should be in your_project\bin\Debug or your_project\bin\Release folder.

You could use:
File.WriteAllText(#"C:\dummy.txt", strout);
(As long as you don't expect the string to be massive - i.e. 10MB)
'File' uses System.IO

If you made no copy & paste error then there are 4 possibilities for your pheonomenon:
string filename = "printitems.txt";
int count = 0;
using (StreamWriter sw = new StreamWriter(filename))
{
while (count < 5 && count < stock.Count)
{
Stock t = stock[count];
sw.WriteLine(t.ToString());
count++;
}
}
You had no { after the streamwriter using line was something I saw first. The second thing is that Stock t normally can't be written directly but instead you need to do the same thing as when printing it out to the console.
Third: your code does not say anything about if stock is filled or not.
Fourth: The file: You should specify a directory (not only the filename) as else it can be that it is tried to create the file in a location where you have no permissions to create a file in (normally if you put no additional path info in the same path as where the application runs in is used [if you start from the visual studio itself then the appropraite bin path] as location where the file would be created).
Additionally: You should put a try catch block around the whole thing as there can be unexpected phenomenons which result in an exception.

getting URLs from PDF using PDFNet

We're using the PDFNet library to extract the contents of a PDF file. One of the things we need to do is extract the URLs in the PDF. Unfortunately, as you scan through the elements in the file, you get the URL in pieces, and it's not always clear which piece goes with which.
What is the best way to get complete URLs from PDFNet?

Links are stored on the pages as annotations. You can do something like the following code to get the URI from the annotation. The try/catch block is there because if any of the values are missing, they still return an Obj object, but you cannot call any method on it without it throwing.
Also, be aware that not everything that looks like a link is the same. We created two PDFs from the same Word file. The first we created with print to PDF. The second we created from within Acrobat.
The links in both files work fine with Acrobat Reader, but only the second file has annotations that PDFNet can see.
Page page = doc.GetPage(1);
for (int i = 1; j < page.GetNumAnnots(); j++) {
Annot annot = page.GetAnnot(i);
if (!annot.IsValid())
continue;
var sdf = annot.GetSDFObj();
string uri = ParseURI(sdf);
Console.WriteLine(uri);
}
private string ParseURI(pdftron.SDF.Obj obj) {
try {
if (obj.IsDict()) {
var aDictionary = obj.Find("A").Value();
var uri = aDictionary.Find("URI").Value();
return uri.GetAsPDFText();
}
} catch (Exception ) {
return null;
}
return null;
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

DocumentItem.SaveAs results in corrupted file - c#

Related

How can I debug my add-in when WordEditor's Content property is crashing Outlook?

Replacing Invalid XML characters from an excel file and writing it back to disk causes file is corrupted error on opening in MS Excel

C# splitting large excel file to smaller files

Outputing to .txt file

getting URLs from PDF using PDFNet

Categories

Resources