How to check if a file is in use? - C#

Is there any way to first test if a file is in use before attempting to open it for reading? For example, this block of code will throw an exception if the file is still being written to or is considered in use:
try
{
FileStream stream = new FileStream(fullPath, FileMode.Open, FileAccess.Read, FileShare.Read);
}
catch (IOException ex)
{
// ex.Message == "The process cannot access the file 'XYZ' because it is being used by another process."
}
I've looked all around and the best I can find is to perform some sort of polling with a try catch inside, and that feels so hacky. I would expect there to be something on System.IO.FileInfo but there isn't.
Any ideas on a better way?

"You can call the LockFile API function through the P/Invoke layer directly. You would use the handle returned by the SafeFileHandle property on the FileStream.
Calling the API directly will allow you to check the return value for an error condition as opposed to resorting to catching an exception."
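For illustration, a rough P/Invoke sketch of that idea might look like the following. It assumes you already hold a FileStream (so the file is open) and just want a non-throwing way to test whether a byte range can be locked; this is a sketch, not production code.
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FileLockInterop
{
    // LockFile lives in kernel32.dll; offset and length are split into low/high DWORDs.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool LockFile(SafeFileHandle hFile,
        uint dwFileOffsetLow, uint dwFileOffsetHigh,
        uint nNumberOfBytesToLockLow, uint nNumberOfBytesToLockHigh);

    // Returns false instead of throwing; inspect Marshal.GetLastWin32Error() on failure.
    public static bool TryLockWholeFile(FileStream stream)
    {
        return LockFile(stream.SafeFileHandle, 0, 0, uint.MaxValue, uint.MaxValue);
    }
}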
"The try/catch block is the CORRECT solution (though you want to catch IOException, not all exceptions). There's no way you can properly synchronize, because testing the lock + acquiring the lock is not an atomic operation."
"Remember, the file system is volatile: just because your file is in one state for one operation doesn't mean it will be in the same state for the next operation. You have to be able to handle exceptions from the file system."
Using C# is it possible to test if a lock is held on a file
http://www.dotnet247.com/247reference/msgs/32/162678.aspx
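If you do go the try/catch route, a small helper like the sketch below keeps it tidy. Note that the answer is only a hint, since the file's state can change the moment the method returns.
using System.IO;

// Heuristic only: the result may be stale by the time you act on it.
static bool IsFileInUse(string path)
{
    try
    {
        using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            return false;
        }
    }
    catch (IOException)
    {
        // Sharing violation or some other I/O problem.
        return true;
    }
}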

A function that tried to do this would simply use a try/catch inside a loop. Just like with databases, the best way to find out whether you can do something is to try to do it and deal with any failure. Unless your threading code is off, there is no reason your program shouldn't be able to open a file, unless the user has it open in another program.
Unless, of course, you're doing interesting things.

Related

How to use TransferUtility to upload multiple files

I am trying to make sense of the documentation for:
TransferUtility.UploadDirectory
The documentation does not describe the error condition of the upload. Typically I would guess something like System.Net.Http.HttpRequestException.
After reading multiple comments, it seems that S3 does not support TransactionScope. The only thing that seems to be supported is at file level:
Are writes to Amazon S3 atomic (all-or-nothing)?
and
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel
So my questions are:
Where can I find the error condition of UploadDirectory?
Does it make sense to use UploadDirectory, since atomic operations are at the file (object) level?
My question is about uploading multiple files (i.e. S3 'objects'), not about doing a multipart upload of a single file.
AFAIK, the best you can do here is wrap it in a try/catch:
try
{
...
}
catch (AmazonS3Exception e)
{
// implement rollback operation
...
}
catch (Exception e)
{
// no possible rollback operation, abort program ?
...
}
You can keep track of progress using the UploadDirectoryProgressEvent. In the event of an error, if you want to clean up, you'd have to compare the progress, note the differences, and take action as appropriate (e.g. by removing objects if you don't want to keep them in S3 and you want the entire operation to be atomic).
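As a sketch of that idea (the bucket name, local directory, and s3Client below are placeholders, and the per-file completion check is only a heuristic):
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Transfer;

var uploaded = new List<string>();

var request = new TransferUtilityUploadDirectoryRequest
{
    BucketName = "my-bucket",
    Directory = @"C:\data\to-upload"
};

request.UploadDirectoryProgressEvent += (sender, e) =>
{
    // Record files that appear to have finished uploading, so a rollback step
    // knows which objects to delete. CurrentFile can be null when uploading
    // concurrently (see below).
    if (e.CurrentFile != null &&
        e.TransferredBytesForCurrentFile == e.TotalNumberOfBytesForCurrentFile)
    {
        uploaded.Add(e.CurrentFile);
    }
};

try
{
    new TransferUtility(s3Client).UploadDirectory(request);
}
catch (AmazonS3Exception)
{
    // 'uploaded' approximates what reached S3; delete those objects here
    // if you want all-or-nothing semantics.
    throw;
}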
Pay special attention to the fact that:
var request = new TransferUtilityUploadDirectoryRequest
{
UploadFilesConcurrently = true,
};
will have an impact on your rollback mechanism. Setting UploadFilesConcurrently to true means that the UploadDirectoryProgressArgs received in the UploadDirectoryProgressEvent will have a null value for CurrentFile:
https://github.com/aws/aws-sdk-net/issues/317
In that case you can only implement a rollback by deleting the full remote directory.
Note also the documentation on multi-part uploads:
If a multipart upload is interrupted, TransferUtility will attempt to abort the multipart upload. Under certain circumstances (network outage, power failure, etc.), TransferUtility will not be able to abort the multipart upload. In this case, in order to stop getting charged for the storage of uploaded parts, you should manually invoke TransferUtility.AbortMultipartUploads() to abort the incomplete multipart uploads.
The documentation has examples of both tracking and aborting multipart uploads.
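A periodic cleanup along these lines is one way to do the latter (the bucket name and cutoff date are placeholders, and you should check the exact overload available in your SDK version):
// Uses the same s3Client placeholder as above.
var transferUtility = new TransferUtility(s3Client);

// Abort incomplete multipart uploads that were initiated more than a day ago,
// so you stop being charged for the storage of their parts.
transferUtility.AbortMultipartUploads("my-bucket", DateTime.UtcNow.AddDays(-1));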
As for your other question:
Does it make sense to use UploadDirectory since atomic operations are at the file (object) level?
I'd say that depends. The code to upload an entire directory of files might be somewhat cleaner, but since you still potentially have to track and clean up, you might as well process the files one by one.

Force a file to be closed

I am reading from, then writing to, a text file. I do this in multiple parts of my program. After I'm done writing I always close it (I use StreamReader/StreamWriter). There are usually about 3 seconds between the close and the next time it's opened.
However, the second time I need to write to the same file, I always get an access denied error because another process is using it. At no point is any other process ever using it, and restarting my program lets me read from it.
This is the open/write/close code:
System.IO.StreamWriter file = new System.IO.StreamWriter(saveFileLocation.Text);
file.WriteLine(account);
file.Close();
Assuming there is no multi-threading, the issue is with proper disposal. The correct way to dispose of a stream (or, in general, of any type that implements IDisposable) is to wrap it in a using statement. The using statement ensures proper disposal and uses a finally block so that the stream is closed even in exceptional circumstances.
using(var file = new System.IO.StreamWriter(saveFileLocation.Text))
{
//do work...
file.WriteLine(account);
}//when file goes out of scope it will close
Do this for all your streams.
Use a using statement, or try { } finally { file.Close(); }
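For reference, the try/finally form mentioned above looks roughly like this (it is essentially what the using statement expands to):
var file = new System.IO.StreamWriter(saveFileLocation.Text);
try
{
    file.WriteLine(account);
}
finally
{
    file.Close(); // runs even if WriteLine throws
}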
Are you sure an exception isn't being thrown, preventing close from being called? Either way this is better code:
using (System.IO.StreamWriter file = new System.IO.StreamWriter(saveFileLocation.Text))
{
file.WriteLine(account);
}

do exceptions reduce performance?

My application traverses a directory tree, and in each directory it tries to open a file with a particular name (using File.OpenRead()). If this call throws FileNotFoundException then it knows that the file does not exist. Would it be better to make a File.Exists() call first to check whether the file exists? Would this be more efficient?
Update
I ran these two methods in a loop and timed each:
void throwException()
{
try
{
throw new NotImplementedException();
}
catch
{
}
}
void fileOpen()
{
string filename = string.Format("does_not_exist_{0}.txt", random.Next());
try
{
File.Open(filename, FileMode.Open);
}
catch
{
}
}
void fileExists()
{
string filename = string.Format("does_not_exist_{0}.txt", random.Next());
File.Exists(filename);
}
Random random = new Random();
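The timing harness isn't included above; a minimal sketch of one, using Stopwatch (an assumption, not the original code), could be:
using System;
using System.Diagnostics;

static double IterationsPerSecond(Action action, int iterations = 100000)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    sw.Stop();
    return iterations / sw.Elapsed.TotalSeconds;
}

// e.g. Console.WriteLine(IterationsPerSecond(throwException));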
These are the results without the debugger attached and running a release build:
Method Iterations per second
throwException 10100
fileOpen 2200
fileExists 11300
The cost of throwing an exception is a lot higher than I was expecting, and calling File.Open on a file that doesn't exist seems much slower than checking the existence of a file that doesn't exist.
In the case where the file will often not be present, it appears to be faster to check whether the file exists. I would imagine that in the opposite case, when the file is usually present, you will find it is faster to catch the exception. If performance is critical to your application, I suggest you benchmark both approaches on realistic data.
As mentioned in other answers, remember that even if you check for the existence of the file before opening it, you should be careful of the race condition where someone deletes the file after your existence check but just before you open it. You still need to handle the exception.
No, don't. If you use File.Exists, you introduce a concurrency problem. If you wrote this code:
if file exists then
open file
then if another program deleted your file between your File.Exists check and the moment you actually open the file, the program would still throw an exception.
Second, even if a file exists, that does not mean you can actually open it: you might not have permission, or the file might be on a read-only filesystem so you can't open it in write mode, etc.
File I/O is much, much more expensive than an exception, so there is no need to worry about the performance of exceptions here.
EDIT:
Benchmarking Exception vs Exists in Python under Linux
import timeit
setup = 'import random, os'
s = '''
try:
    open('does not exist_%s.txt' % random.randint(0, 10000)).read()
except Exception:
    pass
'''
byException = timeit.Timer(stmt=s, setup=setup).timeit(1000000)
s = '''
fn = 'does not exists_%s.txt' % random.randint(0, 10000)
if os.path.exists(fn):
    open(fn).read()
'''
byExists = timeit.Timer(stmt=s, setup=setup).timeit(1000000)
print 'byException: ', byException # byException: 23.2779269218
print 'byExists: ', byExists # byExists: 22.4937438965
Is this behavior truly exceptional? If it is expected, you should be testing with an if statement, and not using exceptions at all. Performance isn't the only issue with this solution and from the sound of what you are trying to do, performance should not be an issue. Therefore, style and a good approach should be the items of concern with this solution.
So, to summarize, since you expect some tests to fail, do use the File.Exists to check instead of catching exceptions after the fact. You should still catch other exceptions that can occur, of course.
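A sketch of that combined pattern (check first for the common case, but still catch for races and permission problems; the method and its return behavior here are illustrative):
using System.IO;

static string ReadIfPresent(string path)
{
    if (!File.Exists(path))
    {
        return null; // the expected, non-exceptional case
    }
    try
    {
        return File.ReadAllText(path);
    }
    catch (IOException)
    {
        // Deleted or locked between the check and the open; still handled.
        return null;
    }
    catch (UnauthorizedAccessException)
    {
        return null; // exists, but we are not allowed to read it
    }
}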
It depends!
If there's a high chance of the file being there (you know this for your scenario; as an example, something like desktop.ini), I would prefer to just try to open it directly.
Either way, when using File.Exists you still need to put File.OpenRead in a try/catch for concurrency reasons and to avoid run-time exceptions, but the check can considerably boost your application's performance if the chance of the file being there is low. (See the Ostrich algorithm.)
Wouldn't it be most efficient to run a directory search, find it, and then try to open it?
Dim Files() As String = System.IO.Directory.GetFiles("C:\", "SpecificName.txt", IO.SearchOption.AllDirectories)
Then you would get an array of strings that you know exist.
Oh, and as an answer to the original question, I would say that yes, try/catch introduces more processor cycles, but I would also assume the I/O itself takes longer than that processor overhead.
Running Exists first and then the open is two I/O operations against one if you just try the open. So really, the overall performance is a judgment call between processor time and disk speed on the machine the code runs on. With a slower processor I'd go with the check; with a fast processor I might go with the try/catch.
File.Exists is a good first line of defense. If the file doesn't exist, then you're guaranteed to get an exception if you try to open it. The existence check is cheaper than the cost of throwing and catching an exception. (Maybe not much cheaper, but a bit.)
There's another consideration, too: debugging. When you're running in the debugger, the cost of throwing and catching an exception is higher, because the IDE has hooks into the exception mechanism that increase your overhead. And if you've checked any of the "Break on thrown" checkboxes in Debug > Exceptions, then any avoidable exceptions become a huge pain point. For that reason alone, I would argue for preventing exceptions when possible.
However, you still need the try-catch, for the reasons pointed out by other answers here. The File.Exists call is merely an optimization; it doesn't save you from needing to catch exceptions due to timing, permissions, solar flares, etc.
I don't know about efficiency but I would prefer the File.Exists check. The problem is all the other things that could happen: bad file handle, etc. If your program logic knows that sometimes the file doesn't exist and you want to have a different behavior for existing vs. non-existing files, use File.Exists. If its lack of existence is the same as other file-related exceptions, just use exception handling.
Vexing Exceptions -- more about using exceptions well
Yes, you should use File.Exists. Exceptions should be used for exceptional situations, not to control the normal flow of your program. In your case, a file not being there is not an exceptional occurrence, so you should not rely on exceptions.
UPDATE:
So everyone can try it for themselves, I'll post my test code. For non-existent files, relying on File.Open to throw an exception for you is about 50 times worse than checking with File.Exists.
using System;
using System.IO;

class Program
{
static void Main(string[] args)
{
TimeSpan ts1 = TimeIt(OpenExistingFileWithCheck);
TimeSpan ts2 = TimeIt(OpenExistingFileWithoutCheck);
TimeSpan ts3 = TimeIt(OpenNonExistingFileWithCheck);
TimeSpan ts4 = TimeIt(OpenNonExistingFileWithoutCheck);
}
private static TimeSpan TimeIt(Action action)
{
int loopSize = 10000;
DateTime startTime = DateTime.Now;
for (int i = 0; i < loopSize; i++)
{
action();
}
return DateTime.Now.Subtract(startTime);
}
private static void OpenExistingFileWithCheck()
{
string file = @"C:\temp\existingfile.txt";
if (File.Exists(file))
{
using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
{
}
}
}
private static void OpenExistingFileWithoutCheck()
{
string file = @"C:\temp\existingfile.txt";
using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
{
}
}
private static void OpenNonExistingFileWithCheck()
{
string file = @"C:\temp\nonexistantfile.txt";
if (File.Exists(file))
{
using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
{
}
}
}
private static void OpenNonExistingFileWithoutCheck()
{
try
{
string file = @"C:\temp\nonexistantfile.txt";
using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read))
{
}
}
catch (Exception ex)
{
}
}
}
On my computer:
ts1 = .75 seconds (same with or without debugger attached)
ts2 = .56 seconds (same with or without debugger attached)
ts3 = .14 seconds (same with or without debugger attached)
ts4 = 14.28 seconds (with debugger attached)
ts4 = 1.07 seconds (without debugger attached)
UPDATE:
I added details on whether a debugger was attached or not. I tested debug and release builds, but the only thing that made a difference was the one function that ended up throwing exceptions while the debugger was attached (which makes sense). Still, checking with File.Exists is the best choice.
I would say that, generally speaking, exceptions "increase" the overall "performance" of your system!
In your sample, anyway, it is better to use File.Exists...
The problem with using File.Exists first is that it opens the file too. So you end up opening the file twice. I haven't measured it, but I guess this additional opening of the file is more expensive than the occasional exceptions.
Whether the File.Exists check improves performance depends on the probability of the file existing. If it likely exists then don't use File.Exists; if it usually doesn't exist, then the additional check will improve performance.
The overhead of an exception is noticeable, but it's not significant compared to file operations.

How to Lock a file and avoid readings while it's writing

My web application returns a file from the filesystem. These files are dynamic, so I have no way to know their names or how many of them there will be. When a file doesn't exist, the application creates it from the database. I want to avoid two different threads recreating the same file at the same time, or a thread trying to return the file while another thread is creating it.
Also, I don't want to take a lock on an element that is common to all the files. Therefore I should lock the file only while I'm creating it.
So I want to lock a file until its recreation is complete; if another thread tries to access it, it will have to wait for the file to be unlocked.
I've been reading about FileStream.Lock, but I have to know the file length, and it won't prevent other threads from trying to read the file, so it doesn't work for my particular case.
I've also been reading about FileShare.None, but it will throw an exception (which exception type?) if another thread/process tries to access the file... so I would have to develop a "try again while it's failing" loop, and I'd like to avoid generating the exception. I don't like that approach too much, although maybe there is no better way.
The approach with FileShare.None would be more or less this:
static void Main(string[] args)
{
new Thread(new ThreadStart(WriteFile)).Start();
Thread.Sleep(1000);
new Thread(new ThreadStart(ReadFile)).Start();
Console.ReadKey(true);
}
static void WriteFile()
{
using (FileStream fs = new FileStream("lala.txt", FileMode.Create, FileAccess.Write, FileShare.None))
using (StreamWriter sw = new StreamWriter(fs))
{
Thread.Sleep(3000);
sw.WriteLine("trolololoooooooooo lolololo");
}
}
static void ReadFile()
{
Boolean readed = false;
Int32 maxTries = 5;
while (!readed && maxTries > 0)
{
try
{
Console.WriteLine("Reading...");
using (FileStream fs = new FileStream("lala.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using (StreamReader sr = new StreamReader(fs))
{
while (!sr.EndOfStream)
Console.WriteLine(sr.ReadToEnd());
}
readed = true;
Console.WriteLine("Readed");
}
catch (IOException)
{
Console.WriteLine("Fail: " + maxTries.ToString());
maxTries--;
Thread.Sleep(1000);
}
}
}
But I don't like the fact that I have to catch exceptions, try several times and wait an inaccurate amount of time :|
You can handle this by using the FileMode.CreateNew argument to the stream constructor. One of the threads is going to lose and find out that the file was already created a microsecond earlier by another thread, and it will get an IOException.
It will then need to spin, waiting for the file to be fully created, which you enforce with FileShare.None. Catching exceptions here doesn't matter; it is spinning anyway. There's no other workaround unless you P/Invoke.
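A rough sketch of that scheme (names and the sleep interval are illustrative): the winner creates the file with FileMode.CreateNew and FileShare.None, losers get an IOException and spin until the content becomes readable.
using System;
using System.IO;
using System.Threading;

static FileStream OpenOrCreateExclusively(string path, Action<Stream> writeContent)
{
    while (true)
    {
        try
        {
            // Winner: the file did not exist yet; create and fill it exclusively.
            // (Error handling for a failed writeContent is omitted in this sketch.)
            using (var fs = new FileStream(path, FileMode.CreateNew,
                                           FileAccess.Write, FileShare.None))
            {
                writeContent(fs);
            }
        }
        catch (IOException)
        {
            // Loser: the file already exists, or the winner is still writing it.
        }

        try
        {
            // Everyone eventually falls through to a plain read.
            return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        }
        catch (IOException)
        {
            Thread.Sleep(50); // still being written; spin and retry
        }
    }
}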
I think the right approach would be the following:
Create a set of strings where you save the name of the file currently being processed,
so only one thread processes a given file at a time, something like this:
//somewhere in your code, or in a singleton
static System.Collections.Generic.HashSet<String> filesAlreadyProcessed = new System.Collections.Generic.HashSet<String>();

//thread main method code
bool fileAlreadyProcessed = false;
lock (filesAlreadyProcessed)
{
    if (filesAlreadyProcessed.Contains(filename))
    {
        fileAlreadyProcessed = true;
    }
    else
    {
        filesAlreadyProcessed.Add(filename);
    }
}
if (!fileAlreadyProcessed)
{
    //ProcessFile
}
Do you have a way to identify what files are being created?
Say every one of those files corresponds to a unique ID in your database. You create a centralised location (Singleton?), where these IDs can be associated with something lockable (Dictionary). A thread that needs to read/write to one of those files does the following:
//Request access
ReaderWriterLockSlim fileLock = null;
bool needCreate = false;
lock(Coordination.Instance)
{
if(Coordination.Instance.ContainsKey(theId))
{
fileLock = Coordination.Instance[theId];
}
else if(!fileExists(theId)) //check if the file exists at this moment
{
Coordination.Instance[theId] = fileLock = new ReaderWriterLockSlim();
fileLock.EnterWriteLock(); //give no other thread the chance to get into write mode
needCreate = true;
}
else
{
//The file exists, and whoever created it, is done with writing. No need to synchronize in this case.
}
}
if(needCreate)
{
createFile(theId); //Writes the file from the database
lock(Coordination.Instance)
Coordination.Instance.Remove(theId);
fileLock.ExitWriteLock();
fileLock = null;
}
if(fileLock != null)
fileLock.EnterReadLock();
//read your data from the file
if(fileLock != null)
fileLock.ExitReadLock();
Of course, threads that don't follow this exact locking protocol will still have unrestricted access to the file.
Now, locking over a Singleton object is certainly not ideal, but if your application needs global synchronization then this is a way to achieve it.
Your question really got me thinking.
Instead of having every thread responsible for file access and having them block, what if you used a queue of files that need to be persisted and have a single background worker thread dequeue and persist?
While the background worker is cranking away, you can have the web application threads return the db values until the file does actually exist.
I've posted a very simple example of this on GitHub.
Feel free to give it a shot and let me know what you think.
FYI, if you don't have git, you can use svn to pull it http://svn.github.com/statianzo/MultiThreadFileAccessWebApp
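A minimal sketch of that producer/consumer idea using BlockingCollection (this is illustrative, not the code from the linked repository):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static readonly BlockingCollection<string> FilesToPersist = new BlockingCollection<string>();

// Started once at application startup.
static void StartPersistenceWorker(Action<string> createFileFromDatabase)
{
    Task.Run(() =>
    {
        // Single consumer: files are created one at a time, so no per-file locking is needed.
        foreach (var fileName in FilesToPersist.GetConsumingEnumerable())
        {
            createFileFromDatabase(fileName);
        }
    });
}

// Request threads just enqueue the file name and keep serving the data
// straight from the database until the file shows up on disk.
static void RequestFile(string fileName)
{
    FilesToPersist.Add(fileName);
}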
The question is old and there is already a marked answer. Nevertheless I would like to post a simpler alternative.
I think we can directly use the lock statement on the filename, as follows:
lock(string.Intern("FileLock:absoluteFilePath.txt"))
{
// your code here
}
Generally, locking on a string is a bad idea because of string interning. But in this particular case interning guarantees that every caller using the same string gets the same lock object, so here it works for us rather than against us. Just use the same lock string before attempting to read.
PS: The text 'FileLock' is just some arbitrary text to ensure that other string file paths are not affected.
Why aren't you just using the database? For example, if you have a way to associate a filename with the data from the db it contains, just add some information to the db that specifies whether a file currently exists with that information, when it was created, how stale the information in the file is, etc. When a thread needs some information, it checks the db to see if that file exists and, if not, writes a row to the table saying it's creating the file. When it's done, it updates that row with a boolean saying the file is ready to be used by others.
The nice thing about this approach is that all your information is in one place, so you can do nice error recovery. For example, if the thread creating the file dies badly for some reason, another thread can come along and decide to rewrite the file because the creation time is too old. You can also create simple batch cleanup processes and get accurate data on how frequently certain data is being used for a file, how often the information is updated (by looking at the creation times, etc.). You also avoid doing many disk seeks across your filesystem as different threads look for different files all over the place, especially if you decide to have multiple front-end machines seeking across a common disk.
The tricky part: you'll have to make sure your db supports row-level locking on the table that threads write to when they create files, because otherwise the table itself may get locked, which could make this unacceptably slow.
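A sketch of the "claim a row" part of that idea, assuming SQL Server and a hypothetical FileRegistry table (the table, columns, and method names here are made up for illustration):
using System.Data.SqlClient;

// Assumed schema: FileRegistry(FileName nvarchar(260) PRIMARY KEY,
//                              IsReady bit NOT NULL,
//                              CreatedUtc datetime2 NOT NULL)
static bool TryClaimFileCreation(string connectionString, string fileName)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var cmd = new SqlCommand(
            "INSERT INTO FileRegistry (FileName, IsReady, CreatedUtc) " +
            "VALUES (@name, 0, SYSUTCDATETIME())", conn))
        {
            cmd.Parameters.AddWithValue("@name", fileName);
            try
            {
                cmd.ExecuteNonQuery();
                return true;  // we won: this thread/machine creates the file
            }
            catch (SqlException)
            {
                return false; // primary key violation: someone else claimed it first
            }
        }
    }
}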

how to best wait for a filelock to release

I have an application where I sometimes need to read from a file that is being written to and, as a result, is locked. As I understand from other questions, I should catch the IOException and retry until I can read.
But my question is: how do I know for certain that the file is locked, and that it is not some other IOException that occurred?
When you open a file for reading in .NET, it will at some point try to create a file handle using the CreateFile API function, which sets an error code that can be used to see why the call failed:
// requires: using System.Runtime.InteropServices; (for Marshal)
const int ERROR_SHARING_VIOLATION = 32;
try
{
using (var stream = new FileStream("test.dat", FileMode.Open, FileAccess.Read, FileShare.Read))
{
}
}
catch (IOException ex)
{
if (Marshal.GetLastWin32Error() == ERROR_SHARING_VIOLATION)
{
Console.WriteLine("The process cannot access the file because it is being used by another process.");
}
}
There's a useful discussion on Google Groups which you really should read. One of the options is close to darin's; however, to guarantee you get the right Win32 error, you really should call the Win32 OpenFile() API yourself (otherwise, you don't really know which error you are retrieving).
Another is to parse the error message: that will fail if your application is run on another language version.
A third option is to hack inside the exception class with reflection to fish out the actual HRESULT.
None of the alternatives are really that attractive: the IOException hierarchy would benefit from a few more subclasses IMHO.
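On newer framework versions (.NET 4.5 and later), Exception.HResult is publicly readable, so a sketch like the one below avoids both reflection and extra P/Invoke; the low 16 bits of the HRESULT carry the Win32 error code. Treat this as an assumption to verify on your target framework.
using System.IO;

const int ERROR_SHARING_VIOLATION = 32;
const int ERROR_LOCK_VIOLATION = 33;

static bool IsSharingOrLockViolation(IOException ex)
{
    int win32Error = ex.HResult & 0xFFFF; // low 16 bits hold the Win32 error code
    return win32Error == ERROR_SHARING_VIOLATION
        || win32Error == ERROR_LOCK_VIOLATION;
}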
To read data you can do:
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete))
{
    ....
}
and to save into the file:
using (FileStream fs = new FileStream(fileName, FileMode.Append, FileAccess.Write, FileShare.Read | FileShare.Delete))
{
    ...
}
The FileShare flags at the end of the constructor describe what other processes can do with the file. This works fine, of course, if you control both the writer and the reader...
You may open it (as described by bezieur) and then try to lock sections of it (or the whole file):
http://www.java2s.com/Code/VB/File-Directory/Lockandunlockafile.htm
Do you mean you are both reading and writing to the file? Or is an external application writing to it?
If you are doing both the reading and the writing, then I assume you're doing it on different threads, in which case take a look at the ReaderWriterLock class, which will do the management for you and lets you specify timeouts.
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx
Otherwise, all you need to do is open the file in read-only mode; then you shouldn't have any problems:
var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read);
You can test the exception against more specific types to see whether it is actually something other than a locking problem, such as:
if (ex is FileNotFoundException)
You might want to look at the documentation for System.IO; a lot of the exceptions in that namespace inherit from IOException. Beyond checking for another exception type, you may have to look at the exception message, or you might look into making a Win32 API call into shell32.dll; there may be a function in there to check whether a file is locked.
Also, if you absolutely need to wait, you can use a loop; but if you want to do other work while waiting, use an asynchronous thread.
