SharpCompress unpacking locks files - C#

I'm trying to build a process that unpacks a set of RAR files using SharpCompress and then deletes them.
I start with a foreach loop that iterates over the different sets of files to be unpacked, like this:
foreach (var package in extractionPackages.Where(package => Extractor.ExtractToFolder(package, destinationFolder)))
{
    FileHelper.DeleteExtractionFiles(package);
}
The extraction method is taken straight from the SharpCompress test code and goes like this:
public static bool ExtractToFolder(ExtractionPackage extractionPackage, string extractionPath)
{
    var fullExtractionPath = Path.Combine(extractionPath, extractionPackage.FolderName);
    try
    {
        using (var reader = RarReader.Open(extractionPackage.ExtractionFiles.Select(p => File.OpenRead(p.FullPath))))
        {
            while (reader.MoveToNextEntry())
            {
                reader.WriteEntryToDirectory(fullExtractionPath, ExtractOptions.ExtractFullPath | ExtractOptions.Overwrite);
            }
        }
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
As you can see in the first code block, I then call the delete, but there I receive an error because the files are locked by another process:
The process cannot access the file 'file.rar' because it is being used by another process.
If I move the deletion to after the foreach loop, I am able to delete all but the last set of files when there is more than one set. If there is just one set, the same issue occurs, since the last set of files never seems to be "unlocked".
How can I structure the code so that the files are unlocked?

By using an example from NUnrar and modifying it a bit, I finally seem to have solved the issue:
public static bool ExtractToFolder(ExtractionPackage extractionPackage, string extractionPath)
{
    var fullExtractionPath = Path.Combine(extractionPath, extractionPackage.FolderName);
    try
    {
        using (var reader = RarReader.Open(GetStreams(extractionPackage))) //extractionPackage.ExtractionFiles.Select(p => File.OpenRead(p.FullPath)), Options.None))
        {
            while (reader.MoveToNextEntry())
            {
                reader.WriteEntryToDirectory(fullExtractionPath, ExtractOptions.ExtractFullPath | ExtractOptions.Overwrite);
            }
        }
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
private static IEnumerable<Stream> GetStreams(ExtractionPackage package)
{
    foreach (var item in package.ExtractionFiles)
    {
        using (Stream input = File.OpenRead(item.FullPath))
        {
            yield return input;
        }
    }
}
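The key point is that the using block around each yield return disposes the stream as soon as the consumer advances the iterator, so each archive part is unlocked right after it has been read. A minimal stand-alone sketch of the same pattern (file names invented):

using System;
using System.Collections.Generic;
using System.IO;

class LazyStreamDemo
{
    // Yields one open stream at a time; Dispose runs when the caller
    // moves to the next element, unlocking the file just read.
    static IEnumerable<Stream> OpenLazily(IEnumerable<string> paths)
    {
        foreach (var path in paths)
        {
            using (var stream = File.OpenRead(path))
            {
                yield return stream;
            } // stream disposed here, before the next file is opened
        }
    }

    static void Main()
    {
        foreach (var s in OpenLazily(new[] { "part1.rar", "part2.rar" }))
        {
            Console.WriteLine(s.Length); // the file is unlocked once this iteration ends
        }
    }
}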

By default (and probably a flaw), I leave the streams open, as I did not create them myself. However, the .NET Framework usually breaks this rule with things like StreamReader etc., so it's expected that they'd be closed.
I'm thinking I'm going to change this behavior. In the meantime, you ought to be able to fix this yourself by calling this:
RarReader.Open(extractionPackage.ExtractionFiles.Select(p => File.OpenRead(p.FullPath)), Options.None)
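In more recent SharpCompress releases the Options enum above has been replaced; if you are on a current version, the equivalent appears to be ReaderOptions and ExtractionOptions. A hedged sketch (verify the names against the version you target):

// Assumption: a current SharpCompress release; types per its newer API.
using (var reader = RarReader.Open(
    extractionPackage.ExtractionFiles.Select(p => File.OpenRead(p.FullPath)),
    new ReaderOptions { LeaveStreamOpen = false })) // let the reader dispose the streams
{
    while (reader.MoveToNextEntry())
    {
        reader.WriteEntryToDirectory(fullExtractionPath, new ExtractionOptions
        {
            ExtractFullPath = true,
            Overwrite = true
        });
    }
}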

I had a similar problem. I used SharpCompress to add files to an archive, and I needed to delete the files after archiving. I tried wrapping the streams with using() {}, but that didn't work in my case. Adding Application.DoEvents() to my code solved the problem.

Related

Handle network fail while copying files in C#

I am writing a C# app that copies files over a network; the problem is that the total size of the files and folders to copy is more than 1 TB.
My method is as follows:
public static void SubmitDocsToRepository(string p_FilePaths)
{
    IEnumerable<(string, string)> directoryLevels = GetAllFolders(p_FilePaths);
    IEnumerable<(string, string)> filesLevels = GetAllFiles(p_FilePaths);
    foreach (var tuple in directoryLevels)
    {
        // Folder copy logic
    }
    foreach (var tuple in filesLevels)
    {
        // File copy logic
    }
}
This would work fine, but suppose something happens to the network or the remote server, or the electric power is lost for whatever reason: what should I add to this code to allow me to continue where I left off, and in particular, how can I retrace my steps to where I was?
It could be something like this:
public static void SubmitDocsToRepository(string p_FilePaths)
{
    IEnumerable<(string, string)> directoryLevels = GetAllFolders(p_FilePaths);
    IEnumerable<(string, string)> filesLevels = GetAllFiles(p_FilePaths);
    foreach (var tuple in directoryLevels)
        while (!CopyDirectory(tuple)) ;
    foreach (var tuple in filesLevels)
        while (!CopyFile(tuple)) ;
}

static bool CopyDirectory((string, string) tuple)
{
    try
    {
        // Copy logic
    }
    catch
    {
        // Some logging here
        return false;
    }
    return true;
}

static bool CopyFile((string, string) tuple)
{
    try
    {
        // Copy logic
    }
    catch
    {
        // Some logging here
        return false;
    }
    return true;
}
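Note that while (!CopyFile(tuple)) ; retries forever with no pause, so it will spin for as long as the outage lasts. A bounded retry with a delay, plus skipping files already present at the destination, makes a rerun resume cheaply. A sketch (the helper below is not from the original answer; assumes using System and System.Threading):

// Hypothetical bounded-retry helper; tune attempts and delay to your network.
static bool Retry(Func<bool> operation, int maxAttempts = 5)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        if (operation())
            return true;
        Thread.Sleep(TimeSpan.FromSeconds(10 * attempt)); // simple linear backoff
    }
    return false;
}

// Inside CopyFile, skipping work already done lets a rerun pick up where it left off:
// if (File.Exists(dest) && new FileInfo(dest).Length == new FileInfo(src).Length)
//     return true; // already copied on a previous run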

Fetching Custom Document Properties in Excel using C# is slow

I am fetching custom properties from a workbook using this method:
var wb = Globals.ThisAddIn.Application.Workbooks.Open(file.FullName);
if (wb != null)
{
    wb.Windows[1].Visible = false;
    foreach (var prop in wb.CustomDocumentProperties)
    {
        try
        {
            Console.WriteLine(prop.Name.ToString());
            Console.WriteLine(prop.Value.ToString());
            Properties.Add(new CustomDocumentProperty { Key = prop.Name.ToString(), Value = prop.Value.ToString() });
        }
        catch (Exception)
        {
            Console.WriteLine();
        }
    }
    wb.Close(false);
}
This method works, but the problem is that it is really slow fetching the properties before executing the loop. Is there any way to speed this up? I have tried to look at other posts on this site, but I haven't seen anyone mention this issue. Please let me know if I need to post any more code. (Properties is a list of a custom class.)
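For what it's worth, the delay before the loop is usually the Workbooks.Open call itself rather than the property access. A hedged sketch of interop settings that often reduce it (opening read-only, skipping link updates, and suppressing UI work; parameter names per the Excel interop API — measure against your workbooks before adopting):

// Sketch only: these are standard Excel interop members, not code from the question.
var app = Globals.ThisAddIn.Application;
app.ScreenUpdating = false;   // no repaints while the workbook loads
app.DisplayAlerts = false;    // no blocking dialogs
app.EnableEvents = false;     // no event handlers firing on open
var wb = app.Workbooks.Open(file.FullName, UpdateLinks: 0, ReadOnly: true);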

An error occured when call to 'gsapi_init_with_args' is made: -100

I'm trying to build a PostScript-to-PDF converter using Ghostscript.NET.
The args that GetArgs returns are the ones I usually use to call gswin32c.exe, and they work fine there.
But every time I call Process, I get an error saying "An error occured when call to 'gsapi_init_with_args' is made: -100". Googling that error didn't bring anything up, so I thought I might ask here.
Are there different arguments to consider when calling the DLL directly with Ghostscript.NET? Or did I make a mistake somewhere else?
Here's my class:
public class PdfConverter
{
    #region Private Fields
    private List<GhostscriptVersionInfo> _Versions = GhostscriptVersionInfo.GetInstalledVersions(GhostscriptLicense.GPL | GhostscriptLicense.AFPL | GhostscriptLicense.Artifex);
    #endregion

    #region Private Properties
    private GhostscriptVersionInfo Version { get; set; }
    #endregion

    #region Construction
    public PdfConverter()
    {
        Version = GhostscriptVersionInfo.GetLastInstalledVersion();
    }
    #endregion

    #region Public Members
    public bool ConvertToPdf(DirectoryInfo dir)
    {
        var d = dir;
        if (!d.Exists)
            return false;
        var postScriptFiles = d.GetFiles("*.ps");
        // ToList() materializes the query so ForEach exists and the same
        // FileInfo instances are refreshed and checked below.
        var pdfFiles = postScriptFiles.Select(psf => new FileInfo(Path.ChangeExtension(psf.FullName, ".pdf"))).ToList();
        foreach (var file in postScriptFiles)
        {
            //ThreadPool.QueueUserWorkItem(new WaitCallback((o) => {
            Process(file, new FileInfo(Path.ChangeExtension(file.FullName, ".pdf")));
            //}));
        }
        pdfFiles.ForEach(pdf => pdf?.Refresh());
        return pdfFiles.All(pdf => pdf.Exists);
    }
    #endregion

    #region Private Helpers
    private void Process(FileInfo inputFile, FileInfo outputFile)
    {
        Console.WriteLine($"Converting {inputFile} to {outputFile}");
        var proc = new GhostscriptProcessor(Version, true);
        proc.Process(GetArgs(inputFile, outputFile).ToArray(), new ConsoleStdIO(true, true, true));
    }

    private IEnumerable<string> GetArgs(FileInfo inputFile, FileInfo outputFile)
    {
        return new[] {
            $"-q ",
            $"-sDEVICE=pdfwrite",
            $"-dSAFER",
            $"-dNOPAUSE",
            $"-dBATCH",
            $"-sPAPERSIZE=a4",
            $"-dEmbedAllFonts=true",
            $"-dAutoRotatePages=/None",
            $"-sOutputFile=\"{outputFile.FullName}\"",
            $"-dCompatibilityLevel#1.4",
            $"-c .setpdfwrite",
            $"-f \"{inputFile.FullName}\""
        };
    }
    #endregion
}
Edit:
I forgot to mention: to implement this, I had to make my own GhostscriptStdIO class. I admit I'm not entirely sure if I did it right, although it does get instantiated without exceptions, override StdOut(...) gets called, and the output is written to the console as expected. override void StdError(...) gets called as well, and is also written to the console as expected.
By the way, the output of the error is:
"**** Could not open the file "c:\temp\test.pdf""
"**** Unable to open the initial device, quitting."
Here's my ConsoleStdIO class:
public class ConsoleStdIO : Ghostscript.NET.GhostscriptStdIO
{
    #region Construction
    public ConsoleStdIO(bool handleStdIn, bool handleStdOut, bool handleStdError) : base(handleStdIn, handleStdOut, handleStdError) { }
    #endregion

    #region Overrides
    public override void StdError(string error)
    {
        var bytes = Encoding.Default.GetBytes(error);
        var length = bytes.Length;
        using (var err = Console.OpenStandardError())
        {
            if (err.CanWrite)
                err.Write(bytes, 0, length);
        }
    }

    public override void StdIn(out string input, int count)
    {
        // The buffer must be count bytes long; a zero-length array would read nothing.
        byte[] bytes = new byte[count];
        using (var stdInput = Console.OpenStandardInput())
        {
            stdInput.Read(bytes, 0, count);
        }
        input = Encoding.Default.GetString(bytes);
    }

    public override void StdOut(string output)
    {
        var bytes = Encoding.Default.GetBytes(output);
        var length = bytes.Length;
        // Standard output, not standard error, for regular output.
        using (var stdOut = Console.OpenStandardOutput())
        {
            if (stdOut.CanWrite)
                stdOut.Write(bytes, 0, length);
        }
    }
    #endregion
}
Again: doing the same operation with the exact same files and arguments using gswin32c.exe works fine.
Happy Hacking
Error -100 is gs_error_Fatal, which means 'something catastrophic went wrong'. It's an indication that the program failed to start up properly, and we can't tell why. The back channel may contain more information.
And indeed, the back channel tells you what's wrong:
**** Could not open the file "c:\temp\test.pdf
**** Unable to open the initial device, quitting.
Ghostscript is unable to open the output file, which means it can't open the pdfwrite device (because that requires an output file), so it aborts the operation.
There could be a number of reasons why Ghostscript can't open the output file. The first thing I'd do is trim down the number of arguments:
You don't want -q (quiet) when you are trying to debug a problem; you want all the information you can get.
I'd remove -dSAFER, at least to start with, because it prevents Ghostscript accessing directories outside the current working directory and certain 'special' ones. It may well prevent you accessing the temp directory.
You don't need to set EmbedAllFonts when it's the same value as the default.
You could drop the CompatibilityLevel switch (and note that you've used a # there instead of an =), and AutoRotatePages, while getting this to work.
The "-c .setpdfwrite -f" string has been pointless for years, but people still keep using it. All it does these days is slow down the start of processing; ditch it.
Finally, you can try changing the backslash ('\') characters to forward slashes ('/') in case your string handling is messing them up, or use double backslashes (I'd use the forward slash myself).
You should also check that c:\temp\test.pdf doesn't exist, or if it does exist, that it is not read-only or already open in a different application.
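Put together, a trimmed-down GetArgs for debugging might look like this (a sketch following the advice above; paths taken from the question):

private IEnumerable<string> GetArgs(FileInfo inputFile, FileInfo outputFile)
{
    // Minimal set: no -q, no -dSAFER, no pdfwrite-specific toggles,
    // forward slashes to sidestep backslash-escaping problems.
    return new[]
    {
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE",
        "-dBATCH",
        $"-sOutputFile={outputFile.FullName.Replace('\\', '/')}",
        inputFile.FullName.Replace('\\', '/')
    };
}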
So I solved the problem...
After taking KenS' advice, I could run the application without Ghostscript (not Ghostscript.NET) giving me any errors, but it did not produce an actual PDF file.
So KenS's answer did not quite solve the problem, but since 'less is more', and since he took the time to talk to me on IRC to verify that my args were in themselves correct, I'll give his answer the points nonetheless.
What actually solved my problem was the following:
Here is my original GetArgs(...):
private IEnumerable<string> GetArgs(FileInfo inputFile, FileInfo outputFile)
{
    return new[] {
        $"-sDEVICE=pdfwrite",
        $"-dNOPAUSE",
        $"-dBATCH",
        $"-sPAPERSIZE=a4",
        "-sFONTPATH=" + System.Environment.GetFolderPath(System.Environment.SpecialFolder.Fonts),
        $"-sOutputFile={outputFile.FullName}",
        $"{inputFile.FullName}",
    };
}
Someone in #csharp pointed out to me that in C, the first argument passed to a program is always the name of the command (argv[0]), and gsapi_init_with_args expects its argument vector in the same form. So he suggested just putting "gs" as the first argument (as a dummy) and trying that... And that's what actually solved my problem.
So this is how the working GetArgs(...) looks:
private IEnumerable<string> GetArgs(FileInfo inputFile, FileInfo outputFile)
{
    return new[] {
        $"gs",
        $"-sDEVICE=pdfwrite",
        $"-dNOPAUSE",
        $"-dBATCH",
        $"-sPAPERSIZE=a4",
        "-sFONTPATH=" + System.Environment.GetFolderPath(System.Environment.SpecialFolder.Fonts),
        $"-sOutputFile={outputFile.FullName}",
        $"{inputFile.FullName}",
    };
}

Move Outlook item to archive

I have a C# program (actually a C# library that is being used by NUnit) that I wish to modify slightly, based on this article: How to Programmatically move items in Outlook. I'm currently faced with a folder that has about 3,500 messages, all around 350 KB, and it is taking FOREVER to move them to my archive so I can send and receive emails again (since my inbox is currently at 1.5 GB of a 500 MB quota... lol), but for the life of me I can't figure out how to get my archive folder. I'm multitasking a bit since I'm at work, so I can edit as I go. If you have any code readily available that finds the archive folder, that would be great. Thank you.
EDIT
OK, to show that I do have some work in progress (based on negative feedback), here is the code I have in place right now (since, yes, I know this reads like a "give me teh codez" question).
Here is my NUnit test case that looks at a folder and gives me specific information:
[Test]
public void CheckMessages()
{
    List<EmailMessage> messages = new List<EmailMessage>();
    using (var target = new EmailMessageProvider())
    {
        messages.AddRange(target.GetEmailMessages("UnexpectedErrors\\NotFindImage"));
    }
    Dictionary<int, string> asdf = new Dictionary<int, string>();
    foreach (var item in messages)
    {
        var line = item.Body.Split(new string[] { Environment.NewLine }, StringSplitOptions.None)[2];
        var revisionId = int.Parse(Regex.Match(line, @"\-*\d+").Value);
        var path = line.Substring(line.IndexOf("\\\\"));
        if (asdf.ContainsKey(revisionId))
        {
            Assert.That(path, Is.EqualTo(asdf[revisionId]));
        }
        else
        {
            asdf.Add(revisionId, path);
        }
    }
    foreach (var item in asdf.OrderBy(x => x.Key))
    {
        Console.WriteLine($"{item.Key} {item.Value}");
    }
}
I use the same class to find messages (in another test) and move them to the subfolder that test is using.
Here is the code I have that does the moving:
public void MoveSurveyPrintComponentsNotFound()
{
    var destination = _baseFolder.Folders["UnexpectedErrors"].Folders["NotFindImage"];
    foreach (var mailItem in _baseFolder.Folders["UnexpectedErrors"].Items.OfType<MailItem>())
    {
        mailItem.UseMailItem(x =>
        {
            if (x.Body.Contains("Foobar.Common.Exceptions.ImageNotFound"))
                x.Move(destination);
        });
    }
}
EDIT 2
It looks like I may have just about got it. I found that in the MAPI namespace, one of the subfolders is Archives. I'm going to try to change a few of the variables and see if it moves. The problem is that just checking one folder takes over 31 seconds. Oh well, better than never.
I figured it out. It wasn't as hard as I had thought, either, so I'll share what I have in case someone else has this problem. In my program I did two things: first, I set _baseFolder to my default email address's folder; second, I set _mapi to Outlook.GetNamespace("MAPI"). Those two things I already had in my constructor.
private readonly OutlookApplication _outlook;
private readonly NameSpace _mapi;
private MAPIFolder _baseFolder;

public EmailMessageProvider()
{
    _outlook = new OutlookApplication();
    _mapi = _outlook.GetNamespace("MAPI");
    _baseFolder = _mapi.Folders["robert#defaultEmail.com"];
}
Archives works just like any other MAPIFolder, so it's just a matter of getting said folder. For me it was in _mapi.Folders["Archive"]. I would imagine this is fairly standard, so if you copy and paste it, it should work just fine.
So now I can list out all of the emails I want to go through and move them appropriately.
public void MoveSpecificEmailsToArchives()
{
    var destination = _mapi.Folders["Archives"];
    foreach (var mailItem in _baseFolder.Folders["Unexpected Error"].Items.OfType<MailItem>())
    {
        mailItem.UseMailItem(x =>
        {
            if (x.Body.Contains("offensiveProgram.exe ERROR "))
                x.Move(destination);
        });
    }
    Release(destination);
}
FYI, UseMailItem is an extension method. It looks like this:
public static void UseMailItem(this MailItem item, Action<MailItem> mailItemAction)
{
    mailItemAction(item);
    Marshal.ReleaseComObject(item);
}
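The Release helper called above isn't shown in the post; presumably it is a thin wrapper over Marshal.ReleaseComObject, something like this (an assumption, mirroring the extension method above):

// Hypothetical definition of the Release(destination) call used earlier.
private static void Release(object comObject)
{
    if (comObject != null)
        Marshal.ReleaseComObject(comObject);
}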

Adding an AsParallel() call causes my code to break when writing a file

I'm building a console application that has to process a bunch of documents.
To keep it simple, the process is:
for each year between X and Y, query the DB to get a list of document references to process
for each of these references, process a local file
The process method is, I think, independent and should be parallelizable as long as the input args are different:
private static bool ProcessDocument(
    DocumentsDataset.DocumentsRow d,
    string langCode
)
{
    try
    {
        var htmFileName = d.UniqueDocRef.Trim() + langCode + ".htm";
        var htmFullPath = Path.Combine(@"x:\path", htmFileName);
        var missingHtmlFile = !File.Exists(htmFullPath);
        if (!missingHtmlFile)
        {
            var html = File.ReadAllText(htmFullPath);
            // ProcessHtml is quite long: it uses a regex search for a list of references,
            // which are other documents, then sends the result to a custom WS
            ProcessHtml(ref html);
            File.WriteAllText(htmFullPath, html);
        }
        return true;
    }
    catch (Exception exc)
    {
        Trace.TraceError("{0,8}Fail processing {1} : {2}", "[FATAL]", d.UniqueDocRef, exc.ToString());
        return false;
    }
}
To enumerate my documents, I have this method:
private static IEnumerable<DocumentsDataset.DocumentsRow> EnumerateDocuments()
{
    return Enumerable.Range(1990, 2020 - 1990).AsParallel().SelectMany(year => {
        return Document.FindAll((short)year).Documents;
    });
}
Document is a business class that wraps the retrieval of documents. The output of this method is a typed dataset (I'm returning the Documents table). The method expects a year, and I'm sure a document can't be returned by more than one year (year is actually part of the key).
Note the use of AsParallel() here; I never had an issue with this one.
Now, my main method is:
var documents = EnumerateDocuments();
var result = documents.Select(d => {
    bool success = true;
    foreach (var langCode in new string[] { "-e", "-f" })
    {
        success &= ProcessDocument(d, langCode);
    }
    return new {
        d.UniqueDocRef,
        success
    };
});

using (var sw = File.CreateText("summary.csv"))
{
    sw.WriteLine("Level;UniqueDocRef");
    foreach (var item in result)
    {
        string level;
        if (!item.success) level = "[ERROR]";
        else level = "[OK]";
        sw.WriteLine(
            "{0};{1}",
            level,
            item.UniqueDocRef
        );
        //sw.WriteLine(item);
    }
}
This method works as expected in this form. However, if I replace
var documents = EnumerateDocuments();
with
var documents = EnumerateDocuments().AsParallel();
it stops working, and I don't understand why.
The error appears exactly here (in my process method):
File.WriteAllText(htmFullPath, html);
It tells me that the file is already opened by another program.
I don't understand what could cause my program not to work as expected. Since my documents variable is an IEnumerable returning unique values, why is my process method breaking?
Thanks for any advice.
[Edit] Code for retrieving documents:
/// <summary>
/// Get all documents in data store
/// </summary>
public static DocumentsDS FindAll(short? year)
{
    Database db = DatabaseFactory.CreateDatabase(connStringName); // MS Entlib
    DbCommand cm = db.GetStoredProcCommand("Document_Select");
    if (year.HasValue) db.AddInParameter(cm, "Year", DbType.Int16, year.Value);
    string[] tableNames = { "Documents", "Years" };
    DocumentsDS ds = new DocumentsDS();
    db.LoadDataSet(cm, ds, tableNames);
    return ds;
}
[Edit2] Possible source of my issue, thanks to mquander. If I write:
var test = EnumerateDocuments().AsParallel().Select(d => d.UniqueDocRef);
var testGr = test.GroupBy(d => d).Select(d => new { d.Key, Count = d.Count() }).Where(c => c.Count > 1);
var testLst = testGr.ToList();
Console.WriteLine(testLst.Where(x => x.Count == 1).Count());
Console.WriteLine(testLst.Where(x => x.Count > 1).Count());
I get this result:
0
1758
Removing the AsParallel returns the same output.
Conclusion: my EnumerateDocuments has something wrong and returns each document twice.
I'll have to dive in here, I think.
The cause is probably my source enumeration.
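As a stopgap while the source enumeration is being fixed, duplicates can be dropped by key before processing (a sketch; it assumes UniqueDocRef is usable as the key, per the question):

// De-duplicate by key so no document row is handed to two workers.
var documents = EnumerateDocuments()
    .GroupBy(d => d.UniqueDocRef)
    .Select(g => g.First());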
I suggest having each task put the file data into a global queue, and having a separate thread take write requests from that queue and do the actual writing.
In any case, the performance of writing in parallel to a single disk is much worse than writing sequentially, because the disk needs to spin to seek the next writing location, so you are just bouncing the disk around between seeks. It's better to do the writes sequentially.
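A minimal sketch of that producer/consumer setup, using .NET's BlockingCollection (an illustration, not the poster's code; assumes using System.Collections.Concurrent, System.IO, and System.Threading.Tasks):

var writeQueue = new BlockingCollection<(string Path, string Content)>();

// Single consumer: the only code that touches the disk, so writes stay sequential.
var writer = Task.Run(() =>
{
    foreach (var (path, content) in writeQueue.GetConsumingEnumerable())
        File.WriteAllText(path, content);
});

// In each parallel worker, replace File.WriteAllText(htmFullPath, html) with:
//     writeQueue.Add((htmFullPath, html));

// After all workers finish:
writeQueue.CompleteAdding();
writer.Wait();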
Is Document.FindAll((short)year).Documents thread-safe? The difference between the first and second versions is that in the second (broken) version, this call runs multiple times concurrently. That could plausibly be the cause of the issue.
It sounds like you're trying to write to the same file from multiple threads. Only one thread/program can write to a file at a given time, so you can't use Parallel for that.
If you're reading from the same file, you need to open it with read-only permissions so as not to put a write lock on it.
The simplest way to fix the issue is to place a lock around your File.WriteAllText, assuming the writing is fast and it's worth parallelizing the rest of the code.
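For instance (a sketch of the lock suggestion; the field name is invented):

// Shared gate: only one thread writes at a time, while the rest of
// ProcessDocument still runs in parallel.
private static readonly object _writeLock = new object();

// ... inside ProcessDocument, in place of the bare call:
lock (_writeLock)
{
    File.WriteAllText(htmFullPath, html);
}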
