race condition while working with file system - c#

I'm using a System.IO.FileSystemWatcher to get notified when files are renamed inside a directory. These files are log files created by a different process.
The event handler looks like this:
private async void FileRenamedHandler(object sender, RenamedEventArgs e)
{
//when file is renamed
//try to upload it to a storage
//if upload is successful delete it from disk
}
All looks good so far, but I need to add a second method that iterates through the directory when this application starts, in order to upload existing log files to storage:
public async Task UploadAllFilesInDirectory()
{
foreach (var file in Directory.GetFiles(_directoryPath))
{
await TryUploadLogAsync(file);
}
}
The problem is that I run into race conditions, for example:
A file has just been renamed and FileRenamedHandler is triggered, but the same file is also picked up by the UploadAllFilesInDirectory method. At that moment I may upload the same file twice, or I get an exception when trying to delete it from disk because it has already been deleted.
I can see more race conditions in this code.
Any idea how I can solve this?
Thanks

You can use a ConcurrentDictionary to keep track of the items currently being processed, and let it worry about the thread safety.
Create the dictionary in which the key is the file path (or some other identifying object) and the value is...whatever. We're treating this as a set, not a dictionary, but there is no ConcurrentSet, so this will have to do.
Then for each file you have to process call TryAdd. If it returns true you added the object, and you can process the file. If it returns false then the file was there, and it's being processed elsewhere.
You can then remove the object when you're done processing it:
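//store this somewhere shared by both code paths
var dic = new ConcurrentDictionary<string, string>();
//to process each file
if (dic.TryAdd(path, path))
{
    try
    {
        //process the file at "path"
    }
    finally
    {
        //always release the entry, even if processing throws
        string removed;
        dic.TryRemove(path, out removed);
    }
}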
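As a minimal sketch of how this could be wired into the question's two code paths: the LogUploader class and its upload delegate below are hypothetical names (the delegate stands in for the question's TryUploadLogAsync), and the case-insensitive key comparison assumes Windows-style paths. Both FileRenamedHandler and UploadAllFilesInDirectory would call TryProcessAsync instead of uploading directly.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical wrapper; "upload" stands in for the question's TryUploadLogAsync.
public sealed class LogUploader
{
    // Used as a concurrent set: only the keys matter.
    // Assumption: paths are compared case-insensitively, as on Windows.
    private readonly ConcurrentDictionary<string, byte> _inProgress =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, Task<bool>> _upload;

    public LogUploader(Func<string, Task<bool>> upload)
    {
        _upload = upload;
    }

    // Returns true only if this call uploaded and deleted the file.
    public async Task<bool> TryProcessAsync(string path)
    {
        string key = Path.GetFullPath(path);
        if (!_inProgress.TryAdd(key, 0))
            return false; // another caller is already handling this file

        try
        {
            if (!File.Exists(key))
                return false; // already uploaded and deleted by an earlier call

            if (!await _upload(key))
                return false; // leave the file on disk for a later retry

            File.Delete(key);
            return true;
        }
        finally
        {
            // Release the entry even if the upload throws.
            byte ignored;
            _inProgress.TryRemove(key, out ignored);
        }
    }
}
```

The File.Exists check inside the guarded region covers the case where one path finished (and deleted the file) just before the other path picked it up.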

I would suggest to build a queue and store the files to be uploaded as some sort of job into the queue. If you process the items in the queue you can check the existence of every file before trying to upload it.
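A rough sketch of the queue idea, under stated assumptions: UploadQueue and its upload delegate are hypothetical names, and a single worker drains a BlockingCollection so that no two uploads of the same file can overlap. The watcher handler and the startup scan would both just call Enqueue.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical job queue; "upload" stands in for the question's upload routine.
public sealed class UploadQueue : IDisposable
{
    private readonly BlockingCollection<string> _jobs = new BlockingCollection<string>();
    private readonly Task _worker;

    public UploadQueue(Func<string, Task<bool>> upload)
    {
        // One dedicated worker processes jobs strictly one at a time.
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (string path in _jobs.GetConsumingEnumerable())
            {
                try
                {
                    // A duplicate job finds the file already gone and is skipped.
                    if (!File.Exists(path))
                        continue;
                    if (upload(path).GetAwaiter().GetResult())
                        File.Delete(path);
                }
                catch (Exception)
                {
                    // A failed job should not stop the worker; real code would log here.
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    public void Enqueue(string path)
    {
        _jobs.Add(path);
    }

    // Stops accepting jobs and waits for the pending ones to finish.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        _worker.Wait();
        _jobs.Dispose();
    }
}
```

Because only one thread ever touches the files, duplicates in the queue are harmless: the second job for the same path sees that the file no longer exists.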

Related

Delete .xlsx or .pdf after closing file

I'm trying to delete .xlsx or .pdf files after using them. When files are created I display them, but then users want automatic file deletion after closing them.
I've tried a couple of things, but none of them seems to work properly. The issue:
When multiple files (.xlsx or .pdf) are open, I can't track the closing of a single file. Instead, the file gets deleted only when I close all files handled by the same process (Excel or the PDF viewer). As far as I could investigate, this happens because Excel and the PDF viewer each run as a single instance. However, the code works as expected when only one file is open...
This is what I have so far:
var process= Process.Start(file_path); //file_path is global variable
Set_event(process);
private void Set_event(Process process)
{
process.EnableRaisingEvents = true;
process.Exited += new EventHandler(Delete_File);
}
public void Delete_File(object sender, EventArgs e)
{
//Delete file on close
File.Delete(file_path);
}
I've also tried the DeleteOnClose option of FileOptions, but unfortunately that doesn't display the file to the user and doesn't delete the file immediately after use, only after my Windows app is closed. That isn't my desired outcome, but at least the files are deleted, so if I could fix that I would be partially satisfied too. Here is my line for that:
var open_file = new FileStream(file_path,FileMode.Open, FileAccess.ReadWrite,FileShare.ReadWrite, 512, FileOptions.DeleteOnClose);
With all that said, are there any other options I missed? Thanks in advance for your help.
I've tried almost everything I could find (different variations of the Exited event for Process, monitoring with FileSystemWatcher, creating files with DeleteOnClose, even the API), but none of them worked as expected.
Everything ends or fails with the issue I described in the first place: some apps, like Microsoft Excel or Adobe Acrobat, use one instance to open files (.pdf or .xls/.xlsx), so you can't reference a single file as an object while more files are open. That means you either end up with an error when trying to assign the Exited event to a single file, or you get no error but the file is deleted only when you close all files of the same type...
BUT fortunately I figured out one thing: when more than one file in question (.pdf or .xlsx) is open, something happens in the background of the OS. If you loop through processes of the same type at that time, you get only the particular instance that is currently in use.
In other words, while you have 2 Excel files open, looping through the processes shows you only the file that is currently active for the "EXCEL" process.
That led me to a completely new approach that might solve this issue. For a complete solution you have to:
1. Create a method to check whether the file is no longer in use.
2. Set a Timer with a delay of 2 seconds, to make sure the process has really ended. This may need to be increased for other purposes...
3. Set a Timer Tick event handler, where you loop through the processes to see whether the particular file is listed as active, and whether the user has already closed this file. As described by other users this method isn't quite accurate, but with the Timer delay I think there shouldn't be any problems anymore.
Here is the complete code for this (for .pdf and .xlsx, which is what I needed):
//as global variable
System.Windows.Forms.Timer delete_file = new System.Windows.Forms.Timer();
Process.Start(file_path); //file_path is global variable
delete_file.Tick += new EventHandler(timer_Tick);
delete_file.Interval = (2000);
delete_file.Enabled = true;
delete_file.Start();
private void timer_Tick(object sender, EventArgs e)
{
Boolean file_is_opened = false;
// Loop processes and list active files in use
foreach (var process in Process.GetProcesses())
{
if (process.MainWindowTitle.Contains(Path.GetFileName(file_path)))
{
file_is_opened = true;
}
}
//If our file is not listed under active processes we check
//whether user has already closed file - If so, we finally delete It
if (file_is_opened==false)
{
if (!File_In_Use(new FileInfo(file_path)))
{
File.Delete(file_path);
delete_file.Enabled = false;
delete_file.Stop();
return;
}
}
}
private bool File_In_Use(FileInfo file)
{
//Method to check whether file is in use
FileStream stream = null;
try
{
//If file doesn't exist
if (!file.Exists)
{
return false;
}
stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
}
catch (IOException)
{
//File is unavailable:
//because someone writes to It, or It's being processed
return true;
}
finally
{
if (stream!=null)
{
stream.Close();
}
}
//File not locked
return false;
}
This is how I did it. It might not be a perfect solution, but it works for me on Win 10 with no errors so far.
If someone has a suggestion to improve the code above, please let me know. Otherwise I hope this will help someone in the future, as I noticed there were already some questions about this in the past with no proper answer.

detecting that a file is currently being written to

(I know It's a common problem but I couldn't find an exact answer)
I need to write a Windows service that monitors a directory and, upon the arrival of a file, opens it, parses the text, does something with it, and moves it to another directory afterwards. I used the IsFileLocked method mentioned in this post to find out whether a file is still being written. My problem is that I don't know how long it takes for the other party to finish writing the file. I could wait a few seconds before opening the file, but this is not a perfect solution since I don't know at what rate the file is written, and a few seconds may not suffice.
here's my code:
while (true)
{
var d = new DirectoryInfo(path);
var files = d.GetFiles("*.txt").OrderBy(f => f.Name); //FileInfo is not IComparable, so sort by a property such as Name
foreach (var file in files)
{
if (!IsFileLocked(file))
{
//process file
}
else
{
//???
}
}
}
I think you could use a FileSystemWatcher (more info about it here: http://msdn.microsoft.com/it-it/library/system.io.filesystemwatcher(v=vs.110).aspx ).
Specifically, you could hook into the Changed event and, when it is raised, call IsFileLocked to check whether the file is still being written.
This strategy should spare you from actively polling the directory.
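A minimal sketch of that combination, with several assumptions: the IsFileLocked shown here is a guess at the helper the question refers to (an exclusive open that fails while a writer still holds the file), and FileReadiness, WaitUntilUnlocked, Watch, the 30-second timeout, and the process callback are all hypothetical names and values chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileReadiness
{
    // An exclusive open fails with IOException while another process has the file open.
    public static bool IsFileLocked(string path)
    {
        if (!File.Exists(path))
            return false;
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return false;
            }
        }
        catch (IOException)
        {
            return true;
        }
    }

    // Polls until the file can be opened exclusively or the timeout elapses.
    public static bool WaitUntilUnlocked(string path, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (IsFileLocked(path))
        {
            if (DateTime.UtcNow >= deadline)
                return false;
            Thread.Sleep(200);
        }
        return true;
    }

    // Hypothetical wiring: "process" is whatever parses and moves the file.
    public static FileSystemWatcher Watch(string directory, Action<string> process)
    {
        var watcher = new FileSystemWatcher(directory, "*.txt");
        FileSystemEventHandler handler = (sender, e) =>
        {
            // The file may already have been moved by an earlier event.
            if (WaitUntilUnlocked(e.FullPath, TimeSpan.FromSeconds(30)) && File.Exists(e.FullPath))
                process(e.FullPath);
        };
        watcher.Created += handler;
        watcher.Changed += handler;
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Changed can be raised several times for a single write, so the callback should tolerate being invoked more than once for the same file. This check also only helps if the writer keeps the file open without sharing it for the whole time it is writing.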

Is it possible, through FileSystemWatcher or something else, to be able to retrieve only the new entries of a text/log file that is being watched?

I have a folder where I am watching, through FileSystemWatcher, the contents of any log file, i.e. watching *.log within a particular directory. I'm now realizing that FileSystemWatcher is rather limited: it only raises events when something changes, and it neither returns nor knows what changes were made. Is there a library that can handle this, or something else I should be using?
When I detect changes to this DNS log, I want to get the new entries that were added to the end of the file. I can't even get line numbers from StreamReader, which I thought might have helped. What are my options?
You have several options. You can read the whole file on each change and compare the new lines with the previous ones each time. This would be overkill if the file is too large. If your file is changed only by appending text to the end, then you can read the file once, keep the line count, and on every change skip the previous lines, get the new ones, and update the counter. For example:
private static int counter;
private static string[] currentLines;
private static void Main(string[] args)
{
    //FileSystemWatcher takes a directory plus a file name filter, not a file path
    FileSystemWatcher watcher = new FileSystemWatcher(Environment.CurrentDirectory, "myFile.txt");
    watcher.Changed += fileChanged;
    currentLines = File.ReadLines("myFile.txt").ToArray();
    counter = currentLines.Length;
    //events are only raised once this is set
    watcher.EnableRaisingEvents = true;
    Console.ReadLine();
}
private static void fileChanged(object sender, FileSystemEventArgs e)
{
    //skip the lines already seen and keep only the new ones
    var temp = File.ReadLines("myFile.txt").Skip(counter).ToArray();
    if (temp.Any())
    {
        currentLines = temp;
        counter += temp.Length;
    }
}
The right solution is to employ a filesystem filter driver and intercept file write operations, thus getting the data being written right after (or even before) it reaches the file.
You can write a filesystem driver yourself (this is quite tricky) or use our CallbackFilter library, which includes a ready-to-use driver.
Also, if the file is not open when you detect the change, or is opened in a mode that allows reading, you can read the known data from it as Selman22 described in his answer. Note that it's a bad idea to read text lines. Instead, read the written data as binary (and only the data after the position you remembered during the previous read) and split it into lines in your code.
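The binary-read variant could look roughly like the sketch below. LogTail is a hypothetical class name, and the code assumes the log is UTF-8 (or another ASCII-compatible encoding) without a byte order mark and that lines end in \n or \r\n. It remembers the byte offset of the last read and holds back an unfinished trailing line until its newline arrives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper that returns only the lines appended since the last call.
public sealed class LogTail
{
    private readonly string _path;
    private long _position; // byte offset that has already been consumed
    private readonly List<byte> _partial = new List<byte>(); // bytes of an unfinished last line

    public LogTail(string path)
    {
        _path = path;
    }

    public IList<string> ReadNewLines()
    {
        var lines = new List<string>();
        // FileShare.ReadWrite lets the writing process keep the file open.
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read,
                                       FileShare.ReadWrite | FileShare.Delete))
        {
            if (fs.Length < _position)
            {
                // The file was truncated or replaced: start over.
                _position = 0;
                _partial.Clear();
            }
            fs.Seek(_position, SeekOrigin.Begin);

            var buffer = new byte[4096];
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                _position += read;
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        // Decode only complete lines, so multi-byte characters are never split.
                        lines.Add(Encoding.UTF8.GetString(_partial.ToArray()).TrimEnd('\r'));
                        _partial.Clear();
                    }
                    else
                    {
                        _partial.Add(buffer[i]);
                    }
                }
            }
        }
        return lines;
    }
}
```

A Changed handler would call ReadNewLines on a single LogTail instance per file. Calls should be serialized, since the watcher can raise overlapping events.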

Upload file still in use *sometimes* when trying to move

After an image is uploaded to my server, my code moves it into a specific folder determined by the user details. I think it sometimes tries to move the file too early, or the uploaded file is still in use, so 9 times out of 10 the function doesn't perform the move.
Is there a way to add a 'wait' or a way to check if a file is in use and possibly perform a while loop until the file is allowed to be moved?
Current move function in my controller:
while (!File.Exists(uploadedPath))
{
}
File.Move(uploadedPath, savePath);
PS. I intend to add in a counter to ensure the while loop doesn't get stuck and has a timeout.
If you have control over the code receiving the file, I would update it to notify the moving code when the file is received completely. Alternatively I would move the file from there or even save the file where it should be eventually.
Otherwise, it will be a hack. You need to:
1. Try to move the file.
2. Catch the exception if it doesn't move.
3. Use Thread.Sleep to wait a few seconds.
4. Go to 1.
Something along the lines:
bool success = false;
for (var count = 0; !success && count < 10; ++count)
{
try
{
File.Move(uploadedPath, savePath);
success = true;
}
catch (IOException)
{
Thread.Sleep(1000);
}
}
You also need to handle the situation when it cannot move the file at all. So it is a hack and should not be done in general if there are other ways to notify the moving code.
Also note:
From File.Move msdn:
If you try to move a file across disk volumes and that file is in use,
the file is copied to the destination, but it is not deleted from the
source.
which means that your file will remain in the received files directory after moving.
Are UploadFile and MoveFile 2 different components that are independent of each other? If so, I don't think it's a good architecture. I would recommend having UploadFile pass control to MoveFile once its part is done. This way you can avoid multiple processes trying to access the same file.

How to Lock a file and avoid readings while it's writing

My web application returns a file from the filesystem. These files are dynamic, so I have no way to know the names or how many of them there will be. When a file doesn't exist, the application creates it from the database. I want to prevent two different threads from recreating the same file at the same time, and a thread from returning the file while another thread is creating it.
Also, I don't want to take a lock on an element that is common to all the files. Therefore I should lock the file only while I'm creating it.
So I want to lock a file until its recreation is complete; if another thread tries to access it, it will have to wait for the file to be unlocked.
I've been reading about FileStream.Lock, but I have to know the file length and it won't prevent another thread from trying to read the file, so it doesn't work for my particular case.
I've also been reading about FileShare.None, but it will throw an exception (which exception type?) if another thread/process tries to access the file... so I would have to develop a "try again while it is failing" loop, and I'd like to avoid generating exceptions... I don't like that approach much, although maybe there is no better way.
The approach with FileShare.None would be this more or less:
static void Main(string[] args)
{
new Thread(new ThreadStart(WriteFile)).Start();
Thread.Sleep(1000);
new Thread(new ThreadStart(ReadFile)).Start();
Console.ReadKey(true);
}
static void WriteFile()
{
using (FileStream fs = new FileStream("lala.txt", FileMode.Create, FileAccess.Write, FileShare.None))
using (StreamWriter sw = new StreamWriter(fs))
{
Thread.Sleep(3000);
sw.WriteLine("trolololoooooooooo lolololo");
}
}
static void ReadFile()
{
Boolean readed = false;
Int32 maxTries = 5;
while (!readed && maxTries > 0)
{
try
{
Console.WriteLine("Reading...");
using (FileStream fs = new FileStream("lala.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using (StreamReader sr = new StreamReader(fs))
{
while (!sr.EndOfStream)
Console.WriteLine(sr.ReadToEnd());
}
readed = true;
Console.WriteLine("Readed");
}
catch (IOException)
{
Console.WriteLine("Fail: " + maxTries.ToString());
maxTries--;
Thread.Sleep(1000);
}
}
}
But I don't like the fact that I have to catch exceptions, retry several times, and wait an arbitrary amount of time :|
You can handle this by using the FileMode.CreateNew argument to the stream constructor. One of the threads is going to lose: it finds out that the file was already created a microsecond earlier by another thread, and gets an IOException.
It will then need to spin, waiting for the file to be fully created, which you enforce with FileShare.None. Catching exceptions here doesn't matter, since it is spinning anyway. There's no other workaround unless you P/Invoke.
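A minimal sketch of that approach, assuming text content: FileCache, GetOrCreate, produceContent, and the retry limits are hypothetical names and values, with produceContent standing in for "create the file from the database".

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileCache
{
    // Returns the file's content, creating the file first if it does not exist yet.
    public static string GetOrCreate(string path, Func<string> produceContent, int maxAttempts = 100)
    {
        FileStream created = null;
        try
        {
            // CreateNew succeeds for exactly one caller; FileShare.None keeps readers out.
            created = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None);
        }
        catch (IOException) when (File.Exists(path))
        {
            // Lost the race, or the file was created earlier: fall through to reading.
        }

        if (created != null)
        {
            using (created)
            using (var writer = new StreamWriter(created))
            {
                string content = produceContent();
                writer.Write(content);
                return content;
            }
        }

        // Spin until the creator has closed the file and a shared read is allowed.
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
                using (var reader = new StreamReader(fs))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (IOException)
            {
                Thread.Sleep(50);
            }
        }
        throw new TimeoutException("File was not readable in time: " + path);
    }
}
```

One caveat of this sketch: if produceContent throws, an empty or partial file is left behind and later callers will read it, so real code would need to delete the file on failure or write to a temporary name and rename it.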
I think the right approach would be the following:
create a set of strings where you store the names of the files being processed,
so that only one thread processes a given file at a time, something like this:
//somewhere in your code, or put it on a singleton
static System.Collections.Generic.HashSet<String> filesAlreadyProcessed = new System.Collections.Generic.HashSet<String>();
//thread main method code
bool fileAlreadyProcessed = false;
lock (filesAlreadyProcessed)
{
    if (filesAlreadyProcessed.Contains(filename))
    {
        fileAlreadyProcessed = true;
    }
    else
    {
        filesAlreadyProcessed.Add(filename);
    }
}
if (!fileAlreadyProcessed)
{
    //ProcessFile
}
Do you have a way to identify what files are being created?
Say every one of those files corresponds to a unique ID in your database. You create a centralised location (Singleton?), where these IDs can be associated with something lockable (Dictionary). A thread that needs to read/write to one of those files does the following:
//Request access
ReaderWriterLockSlim fileLock = null;
bool needCreate = false;
lock(Coordination.Instance)
{
if(Coordination.Instance.ContainsKey(theId))
{
fileLock = Coordination.Instance[theId];
}
else if(!fileExists(theId)) //check if the file exists at this moment
{
Coordination.Instance[theId] = fileLock = new ReaderWriterLockSlim();
fileLock.EnterWriteLock(); //give no other thread the chance to get into write mode
needCreate = true;
}
else
{
//The file exists, and whoever created it, is done with writing. No need to synchronize in this case.
}
}
if(needCreate)
{
createFile(theId); //Writes the file from the database
lock(Coordination.Instance)
Coordination.Instance.Remove(theId);
fileLock.ExitWriteLock();
fileLock = null;
}
if(fileLock != null)
fileLock.EnterReadLock();
//read your data from the file
if(fileLock != null)
fileLock.ExitReadLock();
Of course, threads that don't follow this exact locking protocol will have access to the file.
Now, locking over a Singleton object is certainly not ideal, but if your application needs global synchronization then this is a way to achieve it.
Your question really got me thinking.
Instead of having every thread responsible for file access and having them block, what if you used a queue of files that need to be persisted and have a single background worker thread dequeue and persist?
While the background worker is cranking away, you can have the web application threads return the db values until the file does actually exist.
I've posted a very simple example of this on GitHub.
Feel free to give it a shot and let me know what you think.
FYI, if you don't have git, you can use svn to pull it http://svn.github.com/statianzo/MultiThreadFileAccessWebApp
The question is old and there is already a marked answer. Nevertheless I would like to post a simpler alternative.
I think we can directly use the lock statement on the filename, as follows:
lock(string.Intern("FileLock:absoluteFilePath.txt"))
{
// your code here
}
Generally, locking on a string is a bad idea because of string interning. But in this particular case interning ensures that every thread locking on the same path gets the same lock object. Just use the same lock string before attempting to read. Here interning works for us and not against us. Note that this only synchronizes threads within the same process.
PS: The text 'FileLock' is just some arbitrary text to ensure that other string file paths are not affected.
Why aren't you just using the database - e.g. if you have a way to associate a filename with the data from the db it contains, just add some information to the db that specifies whether a file exists with that information currently and when it was created, how stale the information in the file is etc. When a thread needs some information, it checks the db to see if that file exists and if not, it writes out a row to the table saying it's creating the file. When it's done it updates that row with a boolean saying the file is ready to be used by others.
The nice thing about this approach: all your information is in one place, so you can do nice error recovery. For example, if the thread creating the file dies badly for some reason, another thread can come along and decide to rewrite the file because the creation time is too old. You can also create simple batch cleanup processes and get accurate data on how frequently certain data is being used for a file and how often information is updated (by looking at the creation times etc). Also, you avoid having to do many disk seeks across your filesystem as different threads look for different files all over the place, especially if you decide to have multiple front-end machines seeking across a common disk.
The tricky thing - you'll have to make sure your db supports row-level locking on the table that threads write to when they create files because otherwise the table itself may be locked which could make this unacceptably slow.
