My team requires a bulletproof way to save a file (less than 100 KB) on Windows 10 IoT.
The file cannot be corrupted, but it is OK to lose the most recent version if the save failed because of a power cut etc.
Since the file IO API has changed significantly (there is no File.Replace any more), we are not sure how to achieve it.
We can see that:
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
await Windows.Storage.FileIO.WriteTextAsync(file, data);
is reliably unreliable (it repeatedly broke when we stopped debugging or reset the device), and we end up with a corrupted file (full of zeroes) and a .tmp file next to it. We can recover from this .tmp file, but I'm not confident that we should base our solution on undocumented behaviour.
One way we want to try is:
var tmpfile = await folder.CreateFileAsync(fileName+".tmp",
CreationCollisionOption.ReplaceExisting);
await Windows.Storage.FileIO.WriteTextAsync(tmpfile, data);
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
// can this end up with a corrupt or missing file?
await tmpfile.MoveAndReplaceAsync(file);
In summary, is there a safe way to save some text to a file that will never corrupt the file?
Not sure if there's a best practice for this, but if I needed to come up with something myself:
I would do something like calculating a checksum and saving it along with the file.
When saving the next time, don't overwrite the file but save the new version next to the previous one (which should be "known good"), and delete the previous one only after verifying that the new save completed successfully (together with the checksum).
Also, I would assume that a rename operation should not corrupt the file, but I haven't researched that.
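A rough sketch of that idea using plain .NET file APIs (SaveWithChecksum and the .new/.sha suffixes are purely illustrative, and this is an untested outline rather than a guaranteed-safe recipe):

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static void SaveWithChecksum(string path, string data)
{
    string newPath = path + ".new";
    byte[] bytes = Encoding.UTF8.GetBytes(data);

    using (var sha = SHA256.Create())
    {
        string checksum = Convert.ToBase64String(sha.ComputeHash(bytes));

        // Write the payload and its checksum next to the previous "known good" file.
        File.WriteAllBytes(newPath, bytes);
        File.WriteAllText(newPath + ".sha", checksum);

        // Verify the new file before touching the previous version.
        byte[] readBack = File.ReadAllBytes(newPath);
        if (Convert.ToBase64String(sha.ComputeHash(readBack)) != checksum)
            throw new IOException("New file failed verification; previous version kept.");
    }

    // Only now discard the previous version and move the verified file into place.
    if (File.Exists(path))
        File.Delete(path);
    File.Move(newPath, path);
    // The .sha file could be renamed alongside, or the checksum recomputed on load.
}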
The article Best practices for writing to files has a good explanation of the underlying processes involved when writing to files in UWP.
The following common issues are highlighted:
A file is partially written.
The app receives an exception when calling one of the methods.
The operations leave behind .TMP files with a file name similar to the target file name.
What is not easily deduced from the discussion of the convenience-vs-control trade-off is that while create or edit operations are more prone to failure, because they do a lot of things, rename operations are a lot more fault tolerant because they are not physically moving bits around the filesystem.
Your suggestion of creating a temp file first is on the right track and may serve you well, but using MoveAndReplaceAsync means that you are still susceptible to these known issues if the destination file already exists.
UWP will use a transactional pattern with the file system and may create various backup copies of the source and the destination files.
You can take control of the final step by deleting the original file before calling MoveAndReplaceAsync, or you could simply use RenameAsync if your temp file is in the same folder; these have fewer moving parts, which should reduce the surface area for failure.
@hansmbakker has an answer along these lines. How you identify that the file write was successful is up to you, but isolating the heavy write operation and verifying it before overwriting your original is a good idea if you need it to be bulletproof.
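For illustration, a minimal sketch of the delete-then-rename variant using the Windows.Storage APIs from your question (SaveAtomicallyAsync is an illustrative name, and verifying the temp file before the swap is left to you):

using System.Threading.Tasks;
using Windows.Storage;

public static async Task SaveAtomicallyAsync(StorageFolder folder, string fileName, string data)
{
    // The heavy write goes to a temp file first, so a failure here cannot touch the original.
    var tmpFile = await folder.CreateFileAsync(fileName + ".tmp",
        CreationCollisionOption.ReplaceExisting);
    await FileIO.WriteTextAsync(tmpFile, data);

    // Optionally re-read and verify the temp file here.

    // Remove the original so the rename does not have to replace an existing file,
    // then rename the temp file into place.
    var existing = await folder.TryGetItemAsync(fileName);
    if (existing != null)
    {
        await existing.DeleteAsync();
    }
    await tmpFile.RenameAsync(fileName, NameCollisionOption.FailIfExists);
}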
About Failure
I have observed the .TMP files a lot when using the Append variants of FileIO writing. The .TMP files have the content of the original file before the Append, but the actual file does not always have all of the original content; sometimes it is a mix of old and new content.
In my experience, UWP file writes are very reliable when your entire call structure to the write operation is asynchronous and correctly awaits the pipeline, AND you take steps to ensure that only one process is trying to access the same file at any point in time.
When you try to manipulate files from a synchronous context you start to see the "unreliable" behaviour you have identified; this happens a lot in code that is being transitioned from the old synchronous operations to the newer Async variants of the FileIO operations.
Make sure the code calling your write method is non-blocking and correctly awaits; this will also allow you to catch any exceptions that might be raised.
It is common for traditionally synchronous-minded developers to try to use a lock(){} pattern to ensure single access to the file, but you cannot easily await inside a lock, and attempts to do so often become the source of UWP file write issues.
If your code has a locking mechanism to ensure singleton access to the file, have a read over these articles for a different approach (a small SemaphoreSlim sketch follows the links); they're old, but a good resource covering the transition from traditional synchronous C# development into async and parallel development.
What’s New For Parallelism in .NET 4.5
Building Async Coordination Primitives, Part 6: AsyncLock
Building Async Coordination Primitives, Part 7: AsyncReaderWriterLock
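Since you cannot await inside lock(){}, here is a minimal sketch of the SemaphoreSlim-based approach those articles describe, applied to the write from the question (illustrative only, not a drop-in implementation):

using System.Threading;
using System.Threading.Tasks;
using Windows.Storage;

private static readonly SemaphoreSlim _fileGate = new SemaphoreSlim(1, 1);

private static async Task SaveAsync(StorageFolder folder, string fileName, string data)
{
    await _fileGate.WaitAsync();   // async equivalent of entering a lock
    try
    {
        var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.ReplaceExisting);
        await FileIO.WriteTextAsync(file, data);
    }
    finally
    {
        _fileGate.Release();       // always release, even if the write throws
    }
}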
Other times we encounter a synchronous constraint are when an Event, Timer, or Dispose context is the trigger for writing to the file in the first place. There are different techniques to use there; please post another question that covers that scenario specifically if you think it might be contributing to your issues. :)
Related
Is using the FileStream class to write to a file and the .NET File.Copy method to copy the file at the same time thread safe? It seems like the operating system should safely handle concurrent access to the file, but I cannot find any documentation on this. I've written a simple application to test this and am seeing weird results. The copy of the file shows as 2 MB, but when I inspect its contents with Notepad++ it's empty inside. The original file contains data.
using System;
using System.Threading.Tasks;
using System.Threading;
using System.IO;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = Environment.CurrentDirectory + @"\test.txt";
            using (FileStream fileStream = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
            {
                Task fileWriteTask = Task.Run(() =>
                {
                    for (int i = 0; i < 10000000; i++)
                    {
                        fileStream.WriteByte((Byte)i);
                    }
                });

                Thread.Sleep(50);
                File.Copy(filePath, filePath + ".copy", true);

                fileWriteTask.Wait();
            }
        }
    }
}
Thanks for the help!
It depends.
It depends on what you mean by "thread safe".
First of all, look at this constructor:
public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
Notice the last parameter: it states what you allow other threads and processes to do with the file. The default used by the constructors that don't have it is FileShare.Read, which means you allow others to view the file as read-only. This is of course unwise if you are writing to it.
That's basically what you did: you opened a file for writing while allowing others to read it, and "read" includes copying.
Also please note that without fileWriteTask.Wait(); at the end of your code, your entire function isn't thread safe, because the FileStream might be closed before you even start writing.
Windows does make file access thread safe, but in a pretty non-trivial manner. For example, if you had opened the file with FileShare.None, File.Copy would have failed, and to the best of my knowledge there isn't an elegant way to handle that with .NET. The general approach Windows uses to synchronise file access is called optimistic concurrency: assume your action is possible, and fail if it isn't.
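To illustrate the difference, a small sketch (illustrative, not production code): the copy succeeds when the writer's handle shares Read, but throws an IOException when the writer opens with FileShare.None.

using System;
using System.IO;

class ShareModeDemo
{
    static void Main()
    {
        string path = "test.txt";

        // FileShare.Read (the situation in the question): File.Copy is allowed to read
        // whatever happens to be on disk at that moment, flushed or not.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite, FileShare.Read))
        {
            File.Copy(path, path + ".copy", true);       // succeeds, possibly with partial data
        }

        // FileShare.None: nobody else may open the file, so File.Copy throws.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
        {
            try
            {
                File.Copy(path, path + ".copy2", true);
            }
            catch (IOException ex)
            {
                Console.WriteLine("Copy blocked as expected: " + ex.Message);
            }
        }
    }
}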
This question discusses waiting for a file lock in .NET.
Sharing files between processes is a common issue, and one of the ways to do it, mostly for inter-process communication, is memory-mapped files; this is the MSDN documentation.
If you are brave and willing to play around with WinAPI and overlapped IO, LockFileEx allows nice file locking, if I remember correctly...
Also, once there was a magical thing called Transactional NTFS, but it has moved on into the realm of Microsoft deprecated technologies.
It is thread-safe in the sense that no C# object will be corrupted.
The result of the operation will be more or less random (empty file, partial copy, access denied) and depends on the sharing mode used to open the file for each operation.
If carefully set up, this can produce sensible results; e.g. flushing the file after each line and specifying a compatible share mode will let you be reasonably sure that complete lines are copied.
The answer is no. You cannot in general operate on file system objects from different threads and achieve consistent or predictable results for the file contents.
Individual .NET Framework functions may or may not be thread-safe, but this is of little consequence. The timing and order in which data is read from, written to or copied between individual files on disk is essentially non-deterministic. By which I mean that if you do the same thing multiple times you will get different results, depending on factors outside your control such as machine load and disk layout.
The situation is made worse because the Windows API responsible for File.Copy is run on a system process and is only loosely synchronised with your program.
Bottom line is that if you want file level synchronisation you have no choice but to use file-level primitives to achieve it. That means things like open/close, flushing, locking. Finding combinations that work is non-trivial.
In general you are better off keeping all the operations on a file inside one thread, and synchronising access to that thread.
In answer to a comment, if you operate on a file by making it memory-mapped, the in-memory contents are not guaranteed to be consistent with the on-disk contents until the file is closed. The in-memory contents can be synchronised between processes or threads, but the on-disk contents cannot.
A named mutex locks as between processes, but does not guarantee anything as to consistency of file system objects.
File system locks are one of the ways I mentioned that could be used to ensure file system consistency, but in many situations there are still no guarantees. You are relying on the operating system to invalidate cached disk contents and flush to disk, and this is not guaranteed for all files at all times. For example, it may be necessary to use the FILE_FLAG_NO_BUFFERING, FILE_FLAG_OVERLAPPED and FILE_FLAG_WRITE_THROUGH flags, which may severely affect performance.
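For reference, FILE_FLAG_WRITE_THROUGH has a direct .NET counterpart in FileOptions.WriteThrough, while FILE_FLAG_NO_BUFFERING has no FileStream equivalent and needs P/Invoke. A minimal, illustrative sketch:

using System.IO;
using System.Text;

class WriteThroughExample
{
    static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("important data");

        // WriteThrough asks the OS to push writes past its cache, at a real performance cost.
        using (var fs = new FileStream("data.bin", FileMode.Create, FileAccess.Write,
                   FileShare.None, 4096, FileOptions.WriteThrough))
        {
            fs.Write(payload, 0, payload.Length);
            fs.Flush(true);   // Flush(true) also asks the OS to flush its own buffers to disk
        }
    }
}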
Anyone who thinks this is an easy problem with a simple one-size-fits-all solution has simply never tried to get it to work in practice.
I have read a number of closely related questions but not one that hits this exactly. If it is a duplicate, please send me a link.
I am using an angular version of the flowjs library for doing HTML5 file uploads (https://github.com/flowjs/ng-flow). This works very well and I am able to upload multiple files simultaneously in 1MB chunks. There is an ASP.Net Web API Files controller that accepts these and saves them on disk. Although I can make this work, I am not doing it efficiently and would like to know a better approach.
First, I used the MultipartFormDataStreamProvider in an async method that worked great as long as the file uploaded in a single chunk. Then I switched to just using the FileStream to write the file to disk. This also worked as long as the chunks arrived in order, but of course, I cannot rely on that.
Next, just to see it work, I wrote the chunks to individual file streams and combined them after the fact, hence the inefficiency. A 1GB file would generate a thousand chunks that needed to be read and rewritten after the upload was complete. I could hold all file chunks in memory and flush them after they are all uploaded, but I'm afraid the server would blow up.
It seems that there should be a nice asynchronous solution to this dilemma, but I don't know what it is. One possibility might be to use async/await to combine previous chunks while writing the current chunk. Another might be to use Begin/EndInvoke to create a separate thread so that the file manipulation on disk is handled independently of the thread reading from the HttpContext, but this would rely on the ThreadPool, and I'm afraid that the created threads will be unduly terminated when my MVC controller returns. I could create a FileWatcher that runs completely independently of ASP.NET, but that would be very kludgey.
So my questions are: 1) is there a simple solution already that I am missing? (it seems like there should be) and 2) if not, what is the best approach to solving this inside the Web API framework?
Thanks, bob
I'm not familiar with that kind of chunked upload, but I believe this should work:
Use flowTotalSize to pre-allocate the file when the first chunk comes in.
Have one SemaphoreSlim per file to serialize the asynchronous writes for that file.
Each chunk will write to its own offset (flowChunkSize * (flowChunkNumber - 1)) within the file.
This doesn't handle situations where the uploads are unexpectedly terminated. That kind of solution usually involves allocating/writing a temporary file (with a special extension) and then moving/renaming that file once the last chunk arrives.
Don't forget to ensure that your file writing is actually asynchronous.
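A rough sketch of those three points (ChunkWriter and its parameters are illustrative glue, not part of flow.js or Web API):

using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkWriter
{
    // One gate per target file so concurrent chunk uploads don't interleave their writes.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Gates =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static async Task WriteChunkAsync(
        string targetPath, long flowTotalSize, int flowChunkSize, int flowChunkNumber, byte[] chunkData)
    {
        var gate = Gates.GetOrAdd(targetPath, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            using (var fs = new FileStream(targetPath, FileMode.OpenOrCreate,
                       FileAccess.Write, FileShare.None, 4096, useAsync: true))
            {
                if (fs.Length < flowTotalSize)
                    fs.SetLength(flowTotalSize);                           // pre-allocate on the first chunk

                fs.Position = (long)flowChunkSize * (flowChunkNumber - 1); // flow chunk numbers are 1-based
                await fs.WriteAsync(chunkData, 0, chunkData.Length);
            }
        }
        finally
        {
            gate.Release();
        }
    }
}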
Using @Stephen Cleary's answer, and this thread: https://github.com/flowjs/ng-flow/issues/41, I was able to make an ASP.NET Web API implementation and uploaded it for those still wondering about this question, such as @Herb Caudill:
https://github.com/samhowes/NgFlowSample/tree/master.
The original answer is the real answer to this question, but I don't have enough reputation yet to comment. I did not use a SemaphoreSlim, but instead enabled file write sharing; I did, however, pre-allocate the file and make sure that each chunk is written to the right location by calculating an offset.
I will be contributing this to the Flow samples at: https://github.com/flowjs/flow.js/tree/master/samples
This is what I have done: upload the chunks, save them on the server, and save the location of each chunk in the database along with its order (not the order the chunks arrived in, but the order they belong in within the file).
Then I introduced another endpoint to merge those chunks. Since this can be a long process, I used a messaging service to run it in the background.
After the service is done merging the file, it sends a notification (or you can trigger an event).
Agreed, this doesn't fix the problem of having to save all those chunks, but once the merge is done we can delete them from disk. Note that some IIS configuration is required for the upload to work smoothly.
Here are my two cents on this old question. Nowadays most applications use Azure or AWS for storage, but I'm still sharing my thoughts in case they help someone.
I am relatively new to C# so please bear with me.
I am writing a business application (in C#, .NET 4) that needs to be reliable. Data will be stored in files. Files will be modified (rewritten) regularly, thus I am afraid that something could go wrong (power loss, application gets killed, system freezes, ...) while saving data which would (I think) result in a corrupted file. I know that data which wasn't saved is lost, but I must not lose data which was already saved (because of corruption or ...).
My idea is to have 2 versions of every file and each time rewrite the oldest file. Then in case of unexpected end of my application at least one file should still be valid.
Is this a good approach? Is there anything else I could do? (Database is not an option)
Thank you for your time and answers.
Rather than "always write to the oldest" you can use the "safe file write" technique of:
(Assuming you want to end up saving data to foo.data, and a file with that name contains the previous valid version.)
Write new data to foo.data.new
Rename foo.data to foo.data.old
Rename foo.data.new to foo.data
Delete foo.data.old
At any one time you've always got at least one valid file, and you can tell which is the one to read just from the filename. This is assuming your file system treats rename and delete operations atomically, of course.
When loading, you can work out what happened from which files exist:
If foo.data and foo.data.new exist, load foo.data; foo.data.new may be broken (e.g. power off during write)
If foo.data.old and foo.data.new exist, both should be valid, but something died very shortly afterwards - you may want to load the foo.data.old version anyway
If foo.data and foo.data.old exist, then foo.data should be fine, but again something went wrong, or possibly the file couldn't be deleted.
Alternatively, simply always write to a new file, including some sort of monotonically increasing counter - that way you'll never lose any data due to bad writes. The best approach depends on what you're writing though.
You could also use File.Replace for this, which basically performs the last three steps for you. (Pass in null for the backup name if you don't want to keep a backup.)
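A minimal sketch of that pattern using File.Replace (SafeSave is an illustrative name, and error handling is omitted):

using System.IO;

public static void SafeSave(string path, string newContents)
{
    string newPath = path + ".new";

    // Step 1: write the new data somewhere that cannot damage the current file.
    File.WriteAllText(newPath, newContents);

    if (File.Exists(path))
    {
        // Steps 2-4: rename current -> backup, new -> current, then drop the backup
        // (pass a backup path instead of null if you want to keep it).
        File.Replace(newPath, path, null);
    }
    else
    {
        // First ever save: there is nothing to replace yet.
        File.Move(newPath, path);
    }
}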
A lot of programs use this approach, but they usually keep more copies, also to guard against human error.
For example, Cadsoft Eagle (a program used to design circuits and printed circuit boards) keeps up to 9 backup copies of the same file, calling them file.b#1 ... file.b#9.
Another thing you can do to strengthen this is hashing: append a hash such as a CRC32 or MD5 at the end of the file.
When you open it you check the CRC or MD5; if they don't match, the file is corrupted.
This also protects you from people who accidentally or deliberately modify your file with another program.
It also gives you a way to know whether the hard drive or USB disk got corrupted.
Of course, the faster the save operation is, the lower the risk of losing data, but you can never be sure that nothing will happen during or after the write.
Consider that hard drives, USB drives, and the Windows OS all use caches, which means that even after you finish writing, the OS or the disk itself may still not have physically written the data to disk.
Another thing you can do is save to a temporary file and, if everything is OK, move the file into the real destination folder; this reduces the risk of ending up with half-written files.
You can mix all these techniques together.
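As an illustration of the hash-at-the-end idea (HashedFile is an illustrative name; MD5 is used here only as an integrity check, not for security):

using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class HashedFile
{
    public static void Save(string path, byte[] payload)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(payload);            // 16 bytes appended at the end
            File.WriteAllBytes(path, payload.Concat(hash).ToArray());
        }
    }

    public static byte[] Load(string path)
    {
        byte[] all = File.ReadAllBytes(path);
        if (all.Length < 16)
            throw new InvalidDataException("File too short to contain a hash.");

        byte[] payload = all.Take(all.Length - 16).ToArray();
        byte[] storedHash = all.Skip(all.Length - 16).ToArray();

        using (var md5 = MD5.Create())
        {
            if (!md5.ComputeHash(payload).SequenceEqual(storedHash))
                throw new InvalidDataException("File is corrupted: hash mismatch.");
        }
        return payload;
    }
}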
In principle there are two popular approaches to this:
Make your file format log-based, i.e. do not overwrite in the usual save case, just append changes or the latest versions at the end.
or
Write to a new file, rename the old file to a backup and rename the new file into its place.
The first requires (way) more development effort, but also has the advantage of making saves faster if you save small changes to large files (Word used to do this, AFAIK).
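A very rough sketch of the log-based idea, assuming a simple length-prefixed record format of my own invention (on load you read records until one is truncated and keep the last complete one):

using System.IO;
using System.Text;

public static void AppendRecord(string logPath, string data)
{
    byte[] payload = Encoding.UTF8.GetBytes(data);

    using (var fs = new FileStream(logPath, FileMode.Append, FileAccess.Write, FileShare.Read))
    using (var writer = new BinaryWriter(fs))
    {
        writer.Write(payload.Length);   // length prefix lets the reader detect a truncated tail
        writer.Write(payload);
        writer.Flush();
    }
}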
We have some C# code that reads data from a text file using a StreamReader. On one computer we can read data from the text file even after it has been deleted or replaced with a different text file - the File.Exists call reports that the file exists even when it doesn't in Windows Explorer. However, on another computer this behaviour doesn't happen. Both computers are running Vista Business and .NET 2.0.50727 SP2.
We have tried restarting the machine without a resolution.
Does anyone have any understanding on how this could be possible and information about possible solutions?
Thanks,
Alan
From MSDN
The Exists method should not be used for path validation, this method merely checks if the file specified in path exists.
Be aware that another process can potentially do something with the file in between the time you call the Exists method and perform another operation on the file, such as Delete. A recommended programming practice is to wrap the Exists method, and the operations you take on the file, in a try...catch block as shown in the example. This helps to narrow the scope for potential conflicts. The Exists method can only help to ensure that the file will be available, it cannot guarantee it.
Could this be a folder virtualization issue?
Is the file being opened for reading before it's being deleted? If it is, it's not unexpected to still be able to read from the opened file even after the filesystem has otherwise let it go.
RE: File.Exists():
File.Exists is inherently prone to race conditions. It should not be used as the sole means of verifying that a file does or doesn't exist before performing some operation. This mistake can frequently result in a security flaw in your software.
Rather, always handle the exceptions that can be thrown from the actual file operations (open, etc.), and verify your input once the file is open.
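A short illustration of that advice, opening the file and handling the failure instead of trusting an earlier File.Exists check:

using System;
using System.IO;

static string TryReadAllText(string path)
{
    try
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }
    catch (FileNotFoundException)
    {
        return null;   // the file vanished (or never existed); decide what that means for you
    }
    catch (IOException ex)
    {
        // locked, deleted mid-read, etc. - handle or rethrow as appropriate
        Console.WriteLine("Could not read {0}: {1}", path, ex.Message);
        return null;
    }
}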
Is there a way to bypass or remove the file lock held by another thread without killing the thread?
I am using a third-party library in my app that performs read-only operations on a file. I need a second thread to read the file at the same time, to extract some extra data the third-party library does not expose. Unfortunately, the third-party library opened the file using a read/write lock, and hence I am getting the usual "The process cannot access the file ... because it is being used by another process" exception.
I would like to avoid pre-loading the entire file with my thread because the file is large and that would cause unwanted delays in loading it and excess memory usage. Copying the file is not practical due to the size of the files. During normal operation, two threads hitting the same file would not cause any significant IO contention/performance problems. I don't need perfect time-synchronization between the two threads, but they need to be reading the same data within half a second of each other.
I cannot change the third-party library.
Are there any work-arounds to this problem?
If you start messing with the underlying file handle you may be able to unlock portions, the trouble is that the thread accessing the file is not designed to handle this kind of tampering and may end up crashing.
My strong recommendation would be to patch the third party library, anything you do can and probably will blow up in real world conditions.
In short, you cannot do anything about the locking of the file by a third party. You can get away with Richard E's answer above that mentions the utility Unlocker.
Once the third party opens a file and sets the lock on it, the underlying system will give that third party a lock to ensure no other process can access it. There are two trains of thought on this.
Using DLL injection to patch up the code to explicitly set or unset the lock. This can be dangerous, as you would be messing with another process's stability and could end up crashing it and causing grief. Think about it: the underlying system keeps track of the files opened by a process, so injecting a DLL and patching the code requires the technical knowledge to determine which process to inject into at run-time and to alter the flags upon interception of the Win32 API call OpenFile(...).
Since this was tagged as .NET, why not disassemble the source of the third-party library into .il files, alter the flag for the lock to shared, and rebuild the library by recompiling all the .il files back into a DLL? This would, of course, require rooting around the code for the class where the file is opened.
Have a look at the podcast here. And have a look at how to do the second option highlighted above, here.
Hope this helps,
Best regards,
Tom.
This doesn't address your situation directly, but a tool like Unlocker achieves what you're trying to do, via a Windows UI.
Any low-level hack to do this may result in a thread crashing, file corruption, etc.
Hence I thought I'd mention the next best thing: just wait your turn and poll until the file is not locked: https://stackoverflow.com/a/11060322/495455
I don't think this second piece of advice will help, but the closest thing (that I know of) would be DemandReadFileIO:
IntSecurity.DemandReadFileIO(filename);

internal static void DemandReadFileIO(string fileName)
{
    string full = fileName;
    full = UnsafeGetFullPath(fileName);
    new FileIOPermission(FileIOPermissionAccess.Read, full).Demand();
}
I do think that this is a problem that can be solved with C++. It is annoying, but at least it works (as discussed here: win32 C/C++ read data from a "locked" file).
The steps are:
Open the file before the third-party library does, with _fsopen and the _SH_DENYNO flag
Open the file with the third-library
Read the file within your code
You may be interested in these links as well:
Calling c++ from c# (Possible to call C++ code from C#?)
The inner link from this post with a sample (http://blogs.msdn.com/b/borisj/archive/2006/09/28/769708.aspx)
Have you tried making a dummy copy of the file before your third-party library gets hold of it, and then using the actual copy for your manipulations? Logically this would only be considered if the file we are talking about is fairly small, but it is a kind of a cheat :) good luck
If the file is locked and isn't being used, then you have a problem with the way your file locking/unlocking mechanism works. You should only lock a file when you are modifying it, and should then immediately unlock it to avoid situations like this.