Is it thread safe to write to a file with the FileStream class while copying it with the .NET File.Copy method at the same time? It seems like the operating system should handle concurrent access to the file safely, but I cannot find any documentation on this. I've written a simple application to test it and am seeing weird results: the copy of the file shows as 2 MB, but when I inspect its content with Notepad++ it's empty inside. The original file contains data.
using System;
using System.Threading.Tasks;
using System.Threading;
using System.IO;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = Environment.CurrentDirectory + @"\test.txt";
            using (FileStream fileStream = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
            {
                Task fileWriteTask = Task.Run(() =>
                {
                    for (int i = 0; i < 10000000; i++)
                    {
                        fileStream.WriteByte((Byte)i);
                    }
                });
                Thread.Sleep(50);
                File.Copy(filePath, filePath + ".copy", true);
                fileWriteTask.Wait();
            }
        }
    }
}
Thanks for the help!
It depends.
It depends on what you mean when you say "thread safe".
First of all, look at this constructor:
public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
Notice the last parameter: it states what you allow other threads and processes to do with the file. The default that applies to constructors without it is FileShare.Read, which means you allow others to view the file as read-only. This is of course unwise if you are writing to it.
That's basically what you did: you opened a file for writing while allowing others to read it, and "read" includes copying.
Also please note that without fileWriteTask.Wait(); at the end of your code, your entire function isn't thread safe, because the FileStream might be closed before you even start writing.
Windows does make file access thread safe, but in a fairly non-trivial manner. For example, if you had opened the file with FileShare.None, File.Copy would have failed, and to the best of my knowledge there isn't an elegant way to handle this in .NET. The general approach Windows uses to synchronize file access is optimistic concurrency: assume your action is possible, and fail if it isn't.
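To make that concrete, here is a small self-contained sketch (the class name and temp-file paths are mine, purely illustrative): holding a file open with FileShare.Read lets File.Copy proceed, while FileShare.None makes the copy fail with an IOException.

```csharp
using System;
using System.IO;

class ShareModeDemo
{
    // Returns true if File.Copy succeeds while the source file is
    // held open with the given share mode.
    public static bool CanCopyWhileOpen(FileShare share)
    {
        string src = Path.Combine(Path.GetTempPath(), "share-demo.txt");
        string dst = src + ".copy";
        File.WriteAllText(src, "hello");
        using (var fs = new FileStream(src, FileMode.Open, FileAccess.ReadWrite, share))
        {
            try
            {
                File.Copy(src, dst, true);
                return true;
            }
            catch (IOException)
            {
                return false;   // sharing violation: the copy was denied
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(CanCopyWhileOpen(FileShare.Read));  // readers are allowed
        Console.WriteLine(CanCopyWhileOpen(FileShare.None));  // copy is denied
    }
}
```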
This question discusses waiting for a file lock in .NET.
Sharing files between processes is a common issue, and one of the ways to do it, mostly for inter-process communication, is memory-mapped files; this is the MSDN documentation.
If you are brave and willing to play around with the WinAPI and overlapped I/O, LockFileEx allows nice file locking, if I remember correctly...
Also, there was once a magical thing called Transactional NTFS, but it has moved on into the realm of Microsoft's deprecated technologies.
It is thread-safe in the sense that neither C# object will be corrupted.
The result of the operation will be more or less random (empty file, partial copy, access denied) and depends on the sharing mode used to open the file for each operation.
If carefully set up, this can produce sensible results. For example, flushing the file after each line and specifying a compatible share mode will let you be reasonably sure that complete lines are copied.
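One way to read that advice, sketched below (the helper name is mine): flush after every whole line and open with FileShare.Read, so a concurrent copy can only ever observe complete lines, though it may miss the newest ones.

```csharp
using System.IO;
using System.Text;

static class LineFlushWriter
{
    // Appends one whole line and flushes it to disk before returning.
    // FileShare.Read allows a concurrent File.Copy to open the file;
    // because we flush at line granularity, a copy taken at any moment
    // contains only complete lines.
    public static void AppendLine(string path, string line)
    {
        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(line + "\n");
            fs.Write(bytes, 0, bytes.Length);
            fs.Flush(true);   // true = flush OS buffers to disk as well
        }
    }
}
```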
The answer is no. You cannot in general operate on file system objects from different threads and achieve consistent or predictable results for the file contents.
Individual .NET Framework functions may or may not be thread-safe, but this is of little consequence. The timing and order in which data is read from, written to, or copied between individual files on disk is essentially non-deterministic. By which I mean that if you do the same thing multiple times you will get different results, depending on factors outside your control such as machine load and disk layout.
The situation is made worse because the Windows API responsible for File.Copy is run on a system process and is only loosely synchronised with your program.
Bottom line is that if you want file level synchronisation you have no choice but to use file-level primitives to achieve it. That means things like open/close, flushing, locking. Finding combinations that work is non-trivial.
In general you are better off keeping all the operations on a file inside one thread, and synchronising access to that thread.
In answer to a comment, if you operate on a file by making it memory-mapped, the in-memory contents are not guaranteed to be consistent with the on-disk contents until the file is closed. The in-memory contents can be synchronised between processes or threads, but the on-disk contents cannot.
A named mutex locks as between processes, but does not guarantee anything as to consistency of file system objects.
File system locks are one of the ways I mentioned that could be used to ensure file system consistency, but in many situations there are still no guarantees. You are relying on the operating system to invalidate cached disk contents and flush to disk, and this is not guaranteed for all files at all times. For example, it may be necessary to use the FILE_FLAG_NO_BUFFERING, FILE_FLAG_OVERLAPPED and FILE_FLAG_WRITE_THROUGH flags, which may severely affect performance.
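Of those flags, only write-through has a managed equivalent in .NET; a brief sketch (paths and names are mine, for illustration):

```csharp
using System;
using System.IO;

class WriteThroughDemo
{
    // FileOptions.WriteThrough corresponds to FILE_FLAG_WRITE_THROUGH on
    // Windows. There is no managed flag for FILE_FLAG_NO_BUFFERING; that
    // would require opening the handle via P/Invoke.
    public static long WriteSample(string path)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 4096, FileOptions.WriteThrough))
        {
            byte[] payload = { 1, 2, 3 };
            fs.Write(payload, 0, payload.Length);
        }
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "writethrough-demo.bin");
        Console.WriteLine(WriteSample(path)); // 3
    }
}
```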
Anyone who thinks this is an easy problem with a simple one-size-fits-all solution has simply never tried to get it to work in practice.
My team requires a bulletproof way to save a file (less than 100 KB) on Windows 10 IoT.
The file cannot be corrupted, but it's OK to lose the most recent version if the save failed because of power-off etc.
Since file I/O has changed significantly (no more File.Replace), we are not sure how to achieve it.
We can see that:
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
await Windows.Storage.FileIO.WriteTextAsync(file, data);
is reliably unreliable (it repeatedly broke when we stopped debugging or reset the device), leaving us with a corrupted file (full of zeroes) and a .tmp file next to it. We can recover this .tmp file, but I'm not confident we should base our solution on undocumented behaviour.
One way we want to try is:
var tmpfile = await folder.CreateFileAsync(fileName + ".tmp",
    CreationCollisionOption.ReplaceExisting);
await Windows.Storage.FileIO.WriteTextAsync(tmpfile, data);
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
// can this end up with a corrupt or missing file?
await tmpfile.MoveAndReplaceAsync(file);
In summary, is there a safe way to save some text to a file that will never corrupt the file?
Not sure if there's a best practice for this, but if I needed to come up with something myself:
I would do something like calculating a checksum and saving it along with the file.
When saving the next time, don't overwrite the previous file (which should be "known good") but save next to it, and delete the previous one only after verifying that the new save completed successfully (together with the checksum).
Also, I would assume that a rename operation should not corrupt the file, but I haven't researched that.
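A minimal sketch of that idea (the class name and the choice of SHA-256 are mine, not from the answer): write the new version beside the old one together with a checksum, verify the bytes that actually landed on disk, and only then replace the previous file.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ChecksummedSave
{
    static string Hash(byte[] data)
    {
        using (var sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(data));
    }

    // Writes to <path>.new plus a checksum file, verifies what is on
    // disk, and only then replaces the "known good" copy.
    public static void Save(string path, byte[] data)
    {
        string tmp = path + ".new";
        File.WriteAllBytes(tmp, data);
        File.WriteAllText(tmp + ".sha256", Hash(data));

        if (Hash(File.ReadAllBytes(tmp)) != File.ReadAllText(tmp + ".sha256"))
            throw new IOException("verification failed; previous file left intact");

        File.Move(tmp, path, overwrite: true);               // .NET Core 3.0+ overload
        File.Move(tmp + ".sha256", path + ".sha256", true);
    }
}
```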
The article Best practices for writing to files has a good explanation of the underlying processes involved in writing to files in UWP.
The following common issues are highlighted:
A file is partially written.
The app receives an exception when calling one of the methods.
The operations leave behind .TMP files with a file name similar to the target file name.
What is not easily deduced from the convenience-vs-control trade-off discussion is that while create and edit operations are more prone to failure, because they do a lot of things, rename operations are a lot more fault tolerant, since they are not physically writing bits around the filesystem.
Your suggestion of creating a temp file first is on the right track and may serve you well, but using MoveAndReplaceAsync means you are still susceptible to these known issues if the destination file already exists.
UWP will use a transactional pattern with the file system and may create various backup copies of the source and the destination files.
You can take control of the final step by deleting the original file before calling MoveAndReplaceAsync, or you could simply use RenameAsync if your temp file is in the same folder; these have fewer components, which should reduce the surface area for failure.
@hansmbakker has an answer along these lines. How you identify that the file write was successful is up to you, but isolating the heavy write operation and verifying it before overwriting your original is a good idea if you need it to be bulletproof.
About Failure
I have observed the .TMP files a lot when using the Append variants of FileIO writing. The .TMP files have the content of the original file before the Append, but the actual file does not always have all of the original content; sometimes it's a mix of old and new content.
In my experience, UWP file writes are very reliable when your entire call structure to the write operation is asynchronous and correctly awaits the pipeline, AND you take steps to ensure that only one process is trying to access the same file at any point in time.
When you try to manipulate files from a synchronous context, you can start to see the "unreliable" nature you have identified; this happens a lot in code that is being transitioned from the old synchronous operations to the newer async variants of the FileIO operations.
Make sure the code calling your write method is non-blocking and correctly awaits; this will allow you to catch any exceptions that might be raised.
It is common for us traditionally synchronous-minded developers to try to use a lock(){} pattern to ensure single access to the file, but you cannot easily await inside a lock, and attempts to do so often become the source of UWP file-write issues.
If your code has a locking mechanism to ensure singleton access to the file, have a read over these articles for a different approach, they're old but a good resource that covers the transition for a traditional synchronous C# developer into async and parallel development.
What’s New For Parallelism in .NET 4.5
Building Async Coordination Primitives, Part 6: AsyncLock
Building Async Coordination Primitives, Part 7: AsyncReaderWriterLock
Other times we encounter a synchronous constraint are when an Event, Timer, or Dispose context is the trigger for writing to the file in the first place. There are different techniques to use there; please post another question that covers that scenario specifically if you think it might be contributing to your issues. :)
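For reference, the usual awaitable replacement for lock(){} is a SemaphoreSlim(1, 1). This sketch uses File.WriteAllTextAsync for the sake of a runnable example; in UWP the same shape applies with FileIO.WriteTextAsync, and the class name is mine.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class AsyncFileGate
{
    // A SemaphoreSlim with a count of 1 acts as an async lock:
    // unlike lock(){}, you are allowed to await while holding it.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public static async Task WriteAsync(string path, string text)
    {
        await Gate.WaitAsync();
        try
        {
            await File.WriteAllTextAsync(path, text);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```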
Imagine I have a DLL with a class that writes data to a file. Imagine also that the class uses locks to make sure that no two threads write to that file at the same time.
Now imagine I have two different programs, A and B, which separately use this DLL.
In both cases the path to the file where data is written is the same. In this case it is not thread safe anymore, because both programs are writing to the same file, am I right? And the locks I mentioned in the DLL help only when it is used from ONLY program A, and not from program B simultaneously, am I right?
To put this question a different way: my point is basically that if there is ONLY ONE copy of the DLL loaded, regardless of how many different programs are using it, then I am on the safe side, because the locks I had in the DLL will work and will not let different threads from all these programs write to the file in an out-of-sync way. Am I right?
Well, the code runs thread-safe in the sense that a thread of program A cannot change the state of a thread in program B. You won't have problems with variables and objects being accessed simultaneously.
But you have the problem of accessing a global resource which cannot, or should not, be shared. There are ways of synchronizing programs as well, e.g. with an IpcChannel (inter-process communication, internally using a named pipe), a named Mutex, or an EventWaitHandle.
You can also use the file itself as a synchronization object, if only a single operation on the file is needed. See the File.Open() method, especially with the FileShare.None option.
Simply access the file. If it is possible, then this process has access to the file. If it's not possible, opening the file will result in an exception. Wait until the file lock is released, then try again.
At the moment you are locking an object. A .NET object is created in (virtual) memory. Virtual memory is not shared between processes, so each process will have a different object that it locks on. Therefore, synchronization between processes is not possible with the lock statement.
The code in your DLL will be executed in each process that loads it.
I'm assuming you mean the C# lock statement, which is process-local, so nothing is stopping both processes from writing to the file simultaneously.
Your best bet is to let the file system handle it. Open the file in a mode that prevents other processes from writing to it:
File.Open("test.txt", FileMode.Open, FileAccess.Write, FileShare.Read);
Where FileShare.Read specifies that subsequent attempts to open the file will succeed only for read access.
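A quick demonstration of that sharing check (class name and temp path are mine): while the first writable handle is open with FileShare.Read, a second attempt to open the file for writing is rejected.

```csharp
using System;
using System.IO;

class ExclusiveWriteDemo
{
    // Returns true if a second write handle can be opened while the
    // first one (opened with FileShare.Read) is still held.
    public static bool TrySecondWriter(string path)
    {
        using (var first = File.Open(path, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            try
            {
                using (var second = File.Open(path, FileMode.Open, FileAccess.Write, FileShare.Read))
                    return true;
            }
            catch (IOException)
            {
                return false;   // sharing violation: only readers are allowed
            }
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "exclusive-demo.txt");
        Console.WriteLine(TrySecondWriter(path));
    }
}
```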
I have a file that I'm going to fill up, so I thought it might be better to do it simultaneously.
Notes:
I get the file from multiple computers simultaneously.
I set the position every time before calling StartWrite. -> Must I lock it each time before using it?
Is it a good solution? Do you have a better one?
By the way, what does Stream.Flush() do?
Thanks.
No, that would be conceptually wrong. Stream (I assume you mean the System.IO.Stream class) is an abstract class. When you instantiate an object you are using one of its many child classes.
Assuming anything about child classes is the wrong approach because:
a) Somebody might come after you to make modifications to your code and not see what the actual child class implementation does.
b) Less likely, but the implementation can change. For example, what if someone runs your code on the Mono framework?
If you are using FileStream class, consider creating two (or more) FileStream objects over the same underlying file with FileShare parameter set to Write. This way you specify that there might be simultaneous writing, but each stream has its own location pointer.
Update: Only now I saw your comment "each computer send me a part with start index, end index and byte[]". Actually, multiple FileStreams should work OK for this scenario.
void DataReceived(int start, byte[] data)
{
    using (var f = new System.IO.FileStream("file.dat", System.IO.FileMode.Open,
        System.IO.FileAccess.Write, System.IO.FileShare.ReadWrite))
    {
        f.Seek(start, System.IO.SeekOrigin.Begin);
        // the second argument is the offset within 'data', not the file position
        f.Write(data, 0, data.Length);
    }
}
This is unsafe in principle, because even if your stream were thread-safe you would still have to set the position and write non-atomically.
The native Windows file APIs support this, .NET doesn't. Windows is perfectly capable of concurrent IO to the same file (how would SQL Server work if Windows didn't support this?).
I suggest you just use one writing FileStream per thread.
It's pointless to try to do several write operations to the same stream at the same time.
The underlying system can only write to one position in the file at a time, so even if the asynchronous write method would support multi threading, the writes would still be blocked.
Just do regular writes to the file, and use locking so that only one thread at a time writes to the file.
Is there a way to bypass or remove the file lock held by another thread without killing the thread?
I am using a third-party library in my app that is performing read-only operations on a file. I need a second thread to read the file at the same time to extract some extra data the third-party library is not exposing. Unfortunately, the third-party library opened the file using a read/write lock, and hence I am getting the usual "The process cannot access the file ... because it is being used by another process" exception.
I would like to avoid pre-loading the entire file with my thread, because the file is large and that would cause unwanted delays in the loading of this file and excess memory usage. Copying the file is not practical due to the size of the files. During normal operation, two threads hitting the same file would not cause any significant IO contention/performance problems. I don't need perfect time-synchronization between the two threads, but they need to be reading the same data within half a second of each other.
I cannot change the third-party library.
Are there any work-arounds to this problem?
If you start messing with the underlying file handle you may be able to unlock portions; the trouble is that the thread accessing the file is not designed to handle this kind of tampering and may end up crashing.
My strong recommendation would be to patch the third party library, anything you do can and probably will blow up in real world conditions.
In short, you cannot do anything about the locking of the file by a third party. You may get away with Richard E's answer above, which mentions the utility Unlocker.
Once the third party opens a file and sets the lock on it, the underlying system will give that third party a lock to ensure no other process can access it. There are two trains of thought on this.
Use DLL injection to patch up the code to explicitly set or unset the lock. This can be dangerous, as you would be messing with another process's stability and could end up crashing it and causing grief. Think about it: the underlying system keeps track of files opened by a process. DLL injection requires the technical knowledge to determine which process to inject into at run time and to alter the flags upon interception of the Win32 API call OpenFile(...).
Since this was tagged as .NET, why not disassemble the source of the third party into .il files, alter the flag for the lock to shared, and rebuild the library by recompiling all the .il files back into a DLL? This, of course, requires rooting around the code for the class where the opening of the file takes place.
Have a look at the podcast here, and have a look here for an explanation of how to do the second option highlighted above.
Hope this helps,
Best regards,
Tom.
This doesn't address your situation directly, but a tool like Unlocker achieves what you're trying to do, via a Windows UI.
Any low-level hack to do this may result in a thread crashing, file corruption, etc.
Hence I thought I'd mention the next best thing: just wait your turn and poll until the file is not locked: https://stackoverflow.com/a/11060322/495455
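A minimal sketch of that wait-your-turn approach (the method name is mine): try to open the file, and on a sharing violation back off and retry until a deadline passes.

```csharp
using System;
using System.IO;
using System.Threading;

static class WaitForUnlock
{
    // Polls until the file can be opened for reading, rethrowing the
    // IOException if the timeout expires first.
    public static FileStream OpenWhenReady(string path, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                return new FileStream(path, FileMode.Open,
                                      FileAccess.Read, FileShare.ReadWrite);
            }
            catch (IOException) when (DateTime.UtcNow < deadline)
            {
                Thread.Sleep(100);   // still locked; back off and retry
            }
        }
    }
}
```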
I don't think this second piece of advice will help, but the closest thing (that I know of) would be DemandReadFileIO:
IntSecurity.DemandReadFileIO(filename);

internal static void DemandReadFileIO(string fileName)
{
    string full = UnsafeGetFullPath(fileName);
    new FileIOPermission(FileIOPermissionAccess.Read, full).Demand();
}
I do think this is a problem that can be solved with C++. It is annoying, but at least it works (as discussed here: win32 C/C++ read data from a "locked" file).
The steps are:
Open the file before the third-party library does, using _fsopen and the _SH_DENYNO flag
Open the file with the third-party library
Read the file from within your code
You may be interested in these links as well:
Calling C++ from C# (Possible to call C++ code from C#?)
The inner link from this post with a sample (http://blogs.msdn.com/b/borisj/archive/2006/09/28/769708.aspx)
Have you tried making a dummy copy of the file before your third-party library gets hold of it, then using the actual copy for your manipulations? Logically this would only be considered if the file in question is fairly small, but it is kind of a cheat. :) Good luck.
If the file is locked and isn't being used, then you have a problem with the way your file locking/unlocking mechanism works. You should only lock a file when you are modifying it, and should then immediately unlock it to avoid situations like this.
In an ASP.NET web application, I want to write to a file. This function will first get data from the database and then write out the flat file.
What can I do to make sure only one write occurs, and that once the write has occurred, the other threads that may want to write to the file don't, since the write already took place?
I want to have this write done ONLY if it hasn't been done in say 15 minutes.
I know there is a lock keyword, so should I wrap everything in a lock, then check whether it has been updated in the last 15 minutes or more, or vice versa?
Update
Workflow:
Since this is a web application, the multiple instances will be people viewing a particular web page. I could use the built-in cache system, but if ASP.NET recycles, it will be expensive to rebuild the cache, so I just want to write it out to a flat file. My other option would be to create a Windows service, but that is more work to manage than I want.
Synchronize your writing code to lock on a shared object so that only one thread gets inside the block. Others wait till the current one exits.
lock (this)
{
    // perform the write.
}
Update: I assumed that you have a shared object. If these are different processes on the same machine, you'd need something like a Named Mutex. Looky here for an example
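A sketch of the named-Mutex variant (the mutex name and class name are made up): because the name is machine-wide, it serializes the write across processes, not just across threads.

```csharp
using System.IO;
using System.Threading;

static class CrossProcessWriter
{
    // A named mutex is visible to every process on the machine, so two
    // programs using the same name take turns at the file.
    public static void Append(string path, string text)
    {
        using (var mutex = new Mutex(false, @"Global\MyAppFlatFileMutex"))
        {
            mutex.WaitOne();
            try
            {
                File.AppendAllText(path, text);
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```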
Is it not better to lock an object variable rather than the whole instance?
File I/O operations that write to the file will automatically lock the file. Check whether the file is locked (by attempting the write), and if it is, do not write. Before doing any writes, check the timestamp on the file and see if it's more than 15 minutes old.
AFAIK you cannot write to a file without it being locked by Windows/whatever.
Now all that's left for you is to look up how to do the above on MSDN (sorry, I can't be bothered to look it all up and I don't remember the C# classes very well). :)
I'm not convinced that .NET's locking is applicable across different processes. Moreover, lock(this) will only exclude other threads that are running the method on the same instance of this, so other threads, even in the same process, could run at once on different instances.
Assuming all your processes are running on the same machine, file locking should do it, though.
If you're on different machines, your mileage may vary: Win32 claims to have file locking that works over a network, but historically applications that rely on it (think MS Access) have had problems with file corruption anyway.
// TryEnter will return false if another thread owns the lock
if (Monitor.TryEnter(lockObj))
{
    try
    {
        // check last write time here, return if too soon; otherwise, write
    }
    finally
    {
        Monitor.Exit(lockObj);
    }
}
Use file-system locks
As others have suggested, .NET locks will be of limited use in this situation.
Here is the code:
FileInfo fi = new FileInfo(path);
if (fi.Exists
    && (DateTime.UtcNow - fi.LastWriteTimeUtc < TimeSpan.FromMinutes(15))) {
    // file is fresh
    return;
}

FileStream fs;
try {
    fs = new FileStream(
        path, FileMode.Create, FileAccess.Write, FileShare.Read);
} catch (IOException) {
    // file is locked
    return;
}

using (fs) {
    // write to file
}
This will work across threads and processes.
You guys should consider a Mutex. It can synchronize between multiple threads and processes.