Is "sequential" file I/O with System.IO.File helper methods safe? - c#

I just saw this question: Is it safe to use static methods on File class in C#?. To summarize OP has an IOException because file is in use in this ASP.NET code snippet:
var text= File.ReadAllText("path-to-file.txt");
// Do something with text
File.WriteAllText("path-to-file.txt");
My first thought has been it's a simple concurrent access issue because of multiple ASP.NET overlapping requests. Something I'd solve centralizing I/O into a synchronized thread-safe class (or dropping files in favor of something else). I read both answers and when I was about to downvote one of them then I saw who those users are and I thought what the h* and stopped.
I'll cite them both (then please refer to original answers for more context).
For this OP paragraph:
I am guessing that the file read operation sometimes is not closing the file before the write operation happens [...]
An answer says:
Correct. File systems do not support atomic updates well [...] Using FileStream does not help [...] File has no magic inside. It just uses FileStream wrapped for your convenience.
However I don't see any expectancy for an atomic operation (read + subsequent write) and parallel (because of partially overlapping multi-threaded requests) may cause concurrent accesses. Even an atomic I/O operation (read + write) will have exactly same issue. OK FileStream may be asynchronous but it's not how File.ReadAllText() and File.WriteAllText() use it.
The other answer made me much more perplex, it says:
Although according to the documentation the file handle is guaranteed to be closed by this method, even if exceptions are raised, the timing of the closing is not guaranteed to happen before the method returns: the closing could be done asynchronously.
What? MSDN says method will open, read and close file (also in case of exceptions). Is it ever possible that such method will close file asynchronously? Will OS defer CloseHandle()? In which cases? Why?
In short: is it just a misunderstanding or CloseHandle() is asynchronous? I'm missing something extremely important?

If you look at the CloseHandle documentation, it states that each method which opens a handle has a description of how it should be closed:
The documentation for the functions that create these objects
indicates that CloseHandle should be used when you are finished with
the object, and what happens to pending operations on the object after
the handle is closed. In general, CloseHandle invalidates the
specified object handle, decrements the object's handle count, and
performs object retention checks. After the last handle to an object
is closed, the object is removed from the system.
When you look at the CreateFile docs, this is what it says:
When an application is finished using the object handle returned by
CreateFile, use the CloseHandle function to close the handle. This not
only frees up system resources, but can have wider influence on things
like sharing the file or device and committing data to disk.
I would find it peculiar that CloseHandle would yield that the underlying handle is closed while asynchronously retaining the file for additional checks. This would weaken many guarantees the OS makes to the callers, and would be a source for many bugs.

The first two quotes in your question are not supposed to be related. When File.* is done, or when you close a FileStream, the file is unlocked immediately. There never is any kind of "lingering". If there was you could never safely access the same file again without rebooting.
May answer assumes that the code in the question is being run multiple times in parallel. If not, that code is clearly safe.
However I don't see any expectancy for an atomic operation ... Even an atomic I/O operation (read + write) will have exactly same issue.
That's true. I don't know why I made a statement about that in my answer (it's correct, though. Just not relevant).
the timing of the closing is not guaranteed to happen before the method returns: the closing could be done asynchronously.
I don't know why he said that because it's not correct under any circumstances that I can think of. Closing a handle has an immediate effect.
I think your understanding of the situation is completely accurate. Apparently, our answers were unclear and slightly misleading... Sorry about that.

Related

What to do when lost stream blocks a file?

I've had this problem several times and until now not found a satisfying solution for it (restarting the computer is quite annoying if it takes 15 min to do so...):
When programming something with files, you have to use filestreams. The problem with them is (at least in C#) that they need to release the file again before you can access it from another place. As this is of course a good idea most of the time, it happened to me a few times that I forgot to release the file in the process of programming and debugging or the program crashed before the stream could be closed.
Is there any way to find and kill those streams using windows features or something like that? Problems like that occured in C# as well as in C++ (or C, I am not sure anymore).
From the answers I read, that this should not occur when the stream is handled properly. But what if I was to dumb to handle it right and the stream is not closed properly (because of whatever reason)? Is there a way to fix this while the PC is running?
This should not be a problem.
When your application is running normally, you use a using block to ensure that unmanaged resources like these are properly released. This is necessary for any object that implements the IDisposable interface.
If that fails, the operating system will jump in and save your bacon. The OS automatically releases any file handles that a process had open when it terminates, so even if your application crashes, there is no issue.
Anything that implements the IDisposable interface (which includes Streams) should always be either enclosed in a using block or be a member of a class which in turn implements IDisposable itself (and the new Dispose() method will also call the member's Dispose() method). There ought to be a compiler warning for this, imo.
Do this, and your file locks will be released properly.

After Directory.Delete the Directory.Exists returning true sometimes?

I have very weird behavior. I have,
Directory.Delete(tempFolder, true);
if (Directory.Exists(tempFolder))
{
}
Sometimes Directory.Exists return true. Why? May be the explorer is open?
Directory.Delete calls the Windows API function RemoveDirectory. The observed behavior is documented:
The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed.
The .NET documentation is unfortunately missing this information. Whether the static Directory.Delete method opens a handle to the directory is not documented. Likewise, if it does, it is not documented when the handle is closed.
Without any of this information, the best you can do is to poll for completion:
Directory.Delete(tempFolder, true);
while (Directory.Exists(tempFolder)) Thread.Sleep(0);
// At this point the directory has been removed from the filesystem
Even though polling should generally be avoided in preference of events, installing a filesystem watcher would be a bit over the top for this. Still, keep in mind, that this operation does not come for free, particularly when dealing with a network drive.
Update: With .NET's Reference Source available, the implementation of Directory.Delete can be inspected. The first action of this method is to iterate over all files and delete them. The iteration is implemented using FindFirstFile/FindNextFile. The returned handle is stored as a SafeFindHandle, a concrete subclass of SafeHandle. As the documentation points out, the native handle is freed through a concrete ReleaseHandle override. ReleaseHandle is called from a (postponed) critical finalizer. Since finalization is non-deterministic, this explains the open handle, responsible for the delayed directory delete.
This information, however, does not help in finding a better solution than the one described above (polling for completion).
Other answers to this question did not identify the core issue, and work by coincidence at best. BanksySan's answer adds unrelated code that introduces a delay to allow time for open handles to be closed. Byeni's answer is closer, yet still off: When he talks about the object referencing the directory he almost nails it. However, the object referencing the directory is called a handle, a native resource. Native resources are disposed of in finalizers, and GC.Collect() does not run finalizers. This, too, appears to work by buying extra time.
Use DirectoryInfo instead, and call Refresh() on that.
var dir = new DirectoryInfo(tempFolder);
dir.Delete();
dir.Refresh();
Because we are performing many operations on the directory, it is more performant to use DirectoryInfo rather that Directory. This probably explains why there is no Refresh() on the static class, it is meant for one off operations and so would never need to be refreshed.
If might be worth adding a Thread.Sleep(0) after the refresh to relinquish the thread and put it to the back of the pool. Haven't tested that though, it's just a musing.

Check if a file Exists async?

I wish there was a File.ExistsAsync()
I have:
bool exists = await Task.Run(() => File.Exists(fileName));
Using a thread for this feels like an antipattern.
Is there a cleaner way?
There is no cleaner way than your solution.
The problems of race conditions aside I believe your solution can be used in some situations.
e.g.
I have static file content in many different folders. (in my case cshtml views,script files, css files, for mvc)
These files (which do not change much, during application execution) are always checked for in every request to the webserver, due to my application architecture, there are alot more places that files are checked for than in the default mvc application. So much so that file.exists takes up quite a portion of time each request.
so race conditions will generally not happen. The only interesting question for me is performance
starting a task with Task.Factory.StartNew() takes 0.002 ms (source Why so much difference in performance between Thread and Task?)
calling file.exists takes "0.006255ms when the file exists and 0.010925ms when the file does not exist." [Richard Harrison]
so by simple math calling the async File.Exists takes 0.008 ms up to 0.012 ms
in the best case async File.Exists takes 1.2 times as long as File.Exists and in the worst case it takes 1.3 times as long. (in my case most paths that are searched do not exist) so most of the time a File.Exists is mostly close to 0.01 ms
so it is not that much overhead, and you can utilize multiple cores/ harddisk controllers etc. more efficiently. With these calculations you can see that asynchroniously checking for existence of 2 files you will already have a performance increase of 1.6 in the worst case (0.02/ 0.012 )
well i'm just asyning async File.Exists is worth it in specific situations.
caveats of my post:
i might have not calculated everything correctly
i rounded alot
i did not measure performance on a single pc
i took performance from other posts
i just added the time of File.Exists and Task.Factory.StartNew() (this may be wrong)
i disregard alot of sideffects of multithreading
Long time since this thread, but I found it today...
ExistsAsync should definitely be a thing. In fact, in UWP, you have to use Async methods to find out if a file exists, as it could take longer than 50ms (anything that 'could' take longer than 50ms should be async in UWP language).
However, this is not UWP. The reason I need it is to check for folder.exists which if on a network share, remote disk, or idle disk would block the UI. So I can put all the messages like "checking..." but the UI wouldn't update without aysnc (or ViewModel, or timers, etc.)
bool exists = await Task.Run(() => File.Exists(fileName)); works perfectly. In my code, I have both (Exists and ExistsAsync) so that I can run Exists() when running on a non UI thread and don't have to worry about the overhead.
There isn't a File.ExistsAsync probably for good reason; because it makes absolutely no sense to have one; File.Exists is not going to take very long; I tested it as 0.006255ms when the file exists and 0.010925ms when the file does not exist.
There are a few times when it is sensible to call File.Exists; however usually I think the correct solution would be to open the file (thus preventing deletion), catching any exceptions - as there is no guarantee that the file will continue to exist after the call to File.Exists.
When you want to create a new file and not overwrite the old one :
File.Open("fn", FileMode.CreateNew)
For most of the use cases I can think of File.Open() (whether for existing or create new) is going to be better because once the call succeeds you will have a handle to the file and be able to do something with it. Even when using the file's existence as a flag I think I'd still open and close it. The only time I've really used File.Exists is when checking to see if a local HTML file is there before calling the browser so I can show a nice error message when it isn't.
The is no guarantee that something else won't delete the file after File.Exists; so if you did open it after checking with File.Exists the open call could still fail.
In my tests using a FileExists on network drive takes longer than File.Open, File.Exists takes 1.5967ms whereas File.OpenRead takes 0.3927ms)
Maybe if you could expand upon why you're doing this we'd be better able to answer; until then I'd say that you shouldn't do this

Does FileStream.Dispose close the file immediately?

I have some code that writes a file by saving a MemoryStream to a FileStream using MemoryStream.WriteTo(). After the file is closed it is opened up again to read some metdata...
This works about 80 - 90% of the time. The other 20% I get an exception saying the file is "in use by another process".
Does FileStream.Dispose() not release resources synchronously? Is there something going on lower in Win32 land I'm not aware of? I'm not seeing anything obvious in the .Net documentation.
As "immediately" as possible. There can easily be some lag due to outstanding writes, delay in updating the directory info etc. It could also be anti-virus software checking your changed file.
This may be a rare case where a Thread.Sleep(1) is called for. But to be totally safe you will have to catch the (any) exception and try again a set number of times.

different thread accessing MemoryStream

There's a bit of code which writes data to a MemoryStream object directly into it's data buffer by calling GetBuffer(). It also uses and updates the Position and SetLength() properties appropriately.
This code works properly 99.9999% of the time. Literally. Only every so many 100,000's of iterations it will barf. The specific problem is that the Position property of MemoryStream suddenly returns zero instead of the appropriate value.
However, code was added that checks for the 0 and throws an exception which includes log of the MemoryStream properties like Position and Length in a separate method. Those return the correct value. Further addition of logging within the same method shows that when this rare condition occurs, the Position only has zero inside this particular method.
Okay. Obviously, this must be a threading issue. And most likely a compiler optimization issue.
However, the nature of this software is that it's organized by "tasks" with a scheduler and so any one of several actual O/S thread may run this code at any give time--but never more than one at a time.
So it's my guess that ordinarily it so happens that the same thread keeps getting used for this method and then on a rare occasion a different thread get used. (Just code the idea to test this theory by capturing and comparing the thread id.)
Then due to compiler optimizations, the different thread never gets the correct value. It gets a "stale" value.
Ordinarily in a situation like this I would apply a "volatile" keyword to the variable in question to see if that fixes it. But in this case the variables are inside the MemoryStream object.
Does anyone have any other idea? Or does this mean we have to implement our own MemoryStream object?
Sincerely,
Wayne
EDIT: Just ran a test which counts the total number of calls to this method and counts the number of times the ManagedThreadId is different than the last call. It's almost exactly 50% of the time that it switches threads--alternating between them. So my theory above is almost certainly wrong or the error would occur far more often.
EDIT: This bug occurs so rarely that it would take nearly a week to run without the bug before feeling any confidence it's really gone. Instead, it's better to run experiments to confirm precisely the nature of the problem.
EDIT: Locking currently is handled via lock() statements in each of 5 methods that use the MemoryStream.
(Really need exemplar code to confirm this.)
MemoryStream members are not documented as thread safe (e.g. Position) so you need to ensure you are only access this instance (or any reference to an object logically a part of the MemoryStream) from one thread at a time.
But MemoryStream is not documented as having thread affinity, so you can access an instance from a different thread—as long as such an access is not concurrent.
Threading is hard (axiomatic for this Q&A).
I would suggest you have some concurrent access going on, with two threads both accessing the same instance concurrently and this is, occasionally, corrupting some aspect of the instance state.
I would ensure I keep the locking as simple as possible (trying to be extra clever and limiting locking is often a cause of very hard to find bugs) and get things working. Testing on a multi-core system may also help. Only try and optimise the locking if profiling shows there is potential for significant net (application as a whole) gain.

Categories

Resources