I want to rename a file in a directory as an atomic transaction. The file will not be changing directories. The path is provided as a UNC Path to an NTFS file system, probably on either Server 03 or 08.
Is File.Move() atomic for these purposes? As in, it either completes successfully or fails such that the original file is still intact?
My gut says yes, but I wanted to make sure.
Yes, in NTFS. From here:
As an aside, if you are running under NTFS then file operations are atomic at the file system level. A rename will occur in a single operation as far as any higher code is concerned. The problem you are seeing almost appears to be an issue where the FileInfo object is being shared across applications. It is a MarshalByRef object and therefore can be used in remoting environments. Don't know if this applies to you.
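For the same-directory rename in the question, a minimal sketch (the UNC path and file names here are placeholders):
using System;
using System.IO;

class RenameExample
{
    static void Main()
    {
        string source = @"\\server\share\data\report.tmp";
        string target = @"\\server\share\data\report.txt";

        try
        {
            // On NTFS a same-directory rename is a single metadata
            // operation: it either completes or the source is untouched.
            File.Move(source, target);
        }
        catch (IOException ex)
        {
            // If the move fails (e.g. the target already exists),
            // the original file is still intact under its old name.
            Console.Error.WriteLine("Rename failed: " + ex.Message);
        }
    }
}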
My team requires a bulletproof way to save a file (less than 100kb) on Windows 10 IOT.
The file cannot be corrupted, but it's OK to lose the most recent version if a save failed because of power off etc.
Since the file IO API has changed significantly (there is no more File.Replace), we are not sure how to achieve this.
We can see that:
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
await Windows.Storage.FileIO.WriteTextAsync(file, data);
is reliably unreliable (it repeatedly broke when we stopped debugging or reset the device), and we end up with a corrupted file (full of zeroes) and a .tmp file next to it. We can recover this .tmp file, but I'm not confident that we should base our solution on undocumented behaviour.
One way we want to try is:
var tmpfile = await folder.CreateFileAsync(fileName+".tmp",
CreationCollisionOption.ReplaceExisting);
await Windows.Storage.FileIO.WriteTextAsync(tmpfile, data);
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
// can this end up with a corrupt or missing file?
await tmpfile.MoveAndReplaceAsync(file);
In summary, is there a safe way to save some text to a file that will never corrupt the file?
Not sure if there's a best practice for this, but if I needed to come up with something myself:
I would do something like calculating a checksum and saving it along with the file.
When saving the next time, don't overwrite the previous file but save the new one next to it (the previous one should be "known good"), and delete the previous one only after verifying that the new save completed successfully (checksum and all); see the sketch below.
Also, I would assume that a rename operation should not corrupt the file, but I haven't researched that.
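Not sure of the exact UWP API surface, but as a plain System.IO sketch of that idea (the .new/.md5 naming and the ChecksummedSave helper are mine; on UWP you would swap in the StorageFile equivalents):
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class ChecksummedSave
{
    // Write the new data next to the previous "known good" file and only
    // delete the old copy after the new one verifies against its checksum.
    public static void Save(string path, byte[] data)
    {
        string newPath = path + ".new";

        byte[] expected;
        using (var md5 = MD5.Create())
            expected = md5.ComputeHash(data);

        File.WriteAllBytes(newPath, data);
        File.WriteAllBytes(newPath + ".md5", expected);

        // Re-read and verify before discarding the previous copy.
        byte[] actual;
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(newPath))
            actual = md5.ComputeHash(stream);

        if (!expected.SequenceEqual(actual))
            throw new IOException("New file failed verification; previous copy kept.");

        if (File.Exists(path)) File.Delete(path);
        File.Move(newPath, path);
        if (File.Exists(path + ".md5")) File.Delete(path + ".md5");
        File.Move(newPath + ".md5", path + ".md5");
    }
}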
The article Best practices for writing to files has a good explanation of the underlying processes involved in writing to files in UWP.
The following common issues are highlighted:
A file is partially written.
The app receives an exception when calling one of the methods.
The operations leave behind .TMP files with a file name similar to the target file name.
What is not easily deduced from the discussion of the convenience-vs-control trade-off is that while create and edit operations are more prone to failure, because they do a lot of things, rename operations are far more fault tolerant when they do not have to physically move bits around the filesystem.
Your suggestion of creating a temp file first is on the right track and may serve you well, but using MoveAndReplaceAsync means that you are still susceptible to these known issues if the destination file already exists.
UWP will use a transactional pattern with the file system and may create various backup copies of the source and the destination files.
You can take control of the final step by deleting the original file before calling MoveAndReplaceAsync, or you could simply use RenameAsync if your temp file is in the same folder (as sketched below); these involve fewer moving parts, which should reduce the surface area for failure.
@hansmbakker has an answer along these lines. How you verify that the file write was successful is up to you, but isolating the heavy write operation and verifying it before overwriting your original is a good idea if you need it to be bulletproof.
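A sketch of the RenameAsync route, assuming the temp file is created in the same folder as the target (names are illustrative):
using System.Threading.Tasks;
using Windows.Storage;

public static class AtomicSave
{
    public static async Task SaveAsync(StorageFolder folder, string fileName, string data)
    {
        // Do the heavy write into a temp file first.
        StorageFile tmpFile = await folder.CreateFileAsync(
            fileName + ".tmp", CreationCollisionOption.ReplaceExisting);
        await FileIO.WriteTextAsync(tmpFile, data);

        // (Verify the temp file here if you need it bulletproof.)

        // Swap the temp file into place with a same-folder rename.
        await tmpFile.RenameAsync(fileName, NameCollisionOption.ReplaceExisting);
    }
}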
About Failure
I have observed the .TMP files a lot when using the Append variants of FileIO writing. The .TMP files have the content of the original file before the append, but the actual file does not always have all of the original content; sometimes it is a mix of old and new content.
In my experience, UWP file writes are very reliable when your entire call structure to the write operation is asynchronous and correctly awaits the pipeline. AND you take steps to ensure that only one process is trying to access the same file at any point in time.
When you try to manipulate files from a synchronous context, you can start to see the "unreliable" behaviour you have identified; this happens a lot in code that is being transitioned from the old synchronous operations to the newer async variants of FileIO operations.
Make sure the code calling your write method is non-blocking and correctly awaits; this will allow you to catch any exceptions that might be raised.
It is common for traditionally synchronous-minded developers to try to use a lock(){} pattern to ensure single access to the file, but you cannot easily await inside a lock, and attempts to do so often become the source of UWP file write issues (see the sketch after the list below).
If your code has a locking mechanism to ensure singleton access to the file, have a read over these articles for a different approach, they're old but a good resource that covers the transition for a traditional synchronous C# developer into async and parallel development.
What’s New For Parallelism in .NET 4.5
Building Async Coordination Primitives, Part 6: AsyncLock
Building Async Coordination Primitives, Part 7: AsyncReaderWriterLock
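In the meantime, a SemaphoreSlim used as an awaitable mutex is a common stand-in for lock(){} in async code; a minimal sketch:
using System.Threading;
using System.Threading.Tasks;
using Windows.Storage;

public class SerializedFileWriter
{
    // SemaphoreSlim(1, 1) allows exactly one caller in at a time
    // and, unlike lock(){}, can be awaited.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public async Task WriteAsync(StorageFile file, string data)
    {
        await _gate.WaitAsync();
        try
        {
            // Only one write to this file is in flight at a time.
            await FileIO.WriteTextAsync(file, data);
        }
        finally
        {
            _gate.Release();
        }
    }
}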
We also encounter synchronous constraints when an Event, Timer, or Dispose context is the trigger for writing to the file in the first place. There are different techniques to apply there; please post another question covering that scenario specifically if you think it might be contributing to your issues. :)
I am using the WriteFile function to write sectors directly to a disk. How does WriteFile interact with the other data on the drive or disk? How can I write my file without accidentally clobbering another file? And is it possible that the operating system could accidentally overwrite my file?
When you write directly to the disk you are bypassing the file system completely. Unless you re-implement the functionality required to read and respect the file system, you can expect to destroy the disk's contents. You will likely not only write over other files, but you will probably also overwrite the metadata: the information that describes the structure and location of the directories and files, their attributes, and so on.
If the disk already contains a functioning file system and you don't want to disturb that then there is practically no scenario that I can imagine where it makes sense to write to the disk directly. If you want to write files to the disk, do just that. I suspect that you have made a mistake in your reasoning somewhere that has led you to believe that you should be writing directly to the disk.
Do you really write sectors on the disk, and not to a file on the disk? Some background information would have been great, because if you are really writing to the raw disk surface instead of writing to a file on the disk through the operating system, using higher-level functions such as fopen() and fwrite() (or higher still), then you should be doing it for a good reason. Might we inquire as to what that reason is?
If you are writing disk sectors with no regards as to what filesystem the disk has, then all bets are off. Supposing that the operating system allows that, there's nothing to protect you from overwriting critical disk data or from the OS overwriting your files.
I've done that to access numbered sectors on an embedded system whose only contact with the "outside world" (the PC) was a custom hacked USB mass storage interface. And even then I broke a cold sweat every time I had to do it: if my code had accidentally written to the hard disk of the PC, it would probably have been good-bye to the OS installation and all the files on it.
In C# or JScript.NET, is it necessary (or better) to delete files created in the folder returned by System.IO.Path.GetTempPath(), or is it acceptable to wait for the system to delete them (if that ever happens)?
In terms of how your program will work, it makes no difference.
The issue is one of cluttering the filesystem with many files in a folder - the more files in a folder, the slower it will be to read it.
There is also a chance of running out of disk space if the files are not deleted.
In general, if your software creates temporary files, it should cleanup after itself (that is, delete the files once they are no longer in use).
In my opinion, you always clean up the mess you create. "Temp" files tend to hang around for a long time unless the user is savvy enough to use cleanup tools like CCleaner.
Note: This does add the complexity (and potential bugs) of having to remember all the temp files you've created in order to clean them up, but I think it's worth it. This answer by Mr. Gravell has an easy way to take care of that issue.
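Something along the lines of that linked answer, as a rough sketch (the TempFile wrapper name is mine):
using System;
using System.IO;

public sealed class TempFile : IDisposable
{
    public string Path { get; private set; }

    public TempFile()
    {
        Path = System.IO.Path.Combine(
            System.IO.Path.GetTempPath(),
            System.IO.Path.GetRandomFileName());
    }

    // Deletes the temp file when disposed, even if an exception occurred.
    public void Dispose()
    {
        try { File.Delete(Path); }
        catch (IOException) { /* best effort; the file may still be open */ }
    }
}

// Usage:
// using (var tmp = new TempFile())
// {
//     File.WriteAllText(tmp.Path, "scratch data");
// } // the file is removed here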
I am relatively new to C# so please bear with me.
I am writing a business application (in C#, .NET 4) that needs to be reliable. Data will be stored in files. Files will be modified (rewritten) regularly, thus I am afraid that something could go wrong (power loss, application gets killed, system freezes, ...) while saving data which would (I think) result in a corrupted file. I know that data which wasn't saved is lost, but I must not lose data which was already saved (because of corruption or ...).
My idea is to have 2 versions of every file and each time rewrite the oldest file. Then in case of unexpected end of my application at least one file should still be valid.
Is this a good approach? Is there anything else I could do? (Database is not an option)
Thank you for your time and answers.
Rather than "always write to the oldest" you can use the "safe file write" technique of:
(Assuming you want to end up saving data to foo.data, and a file with that name contains the previous valid version.)
Write new data to foo.data.new
Rename foo.data to foo.data.old
Rename foo.data.new to foo.data
Delete foo.data.old
At any one time you've always got at least one valid file, and you can tell which one to read just from the file name. This assumes your file system treats rename and delete operations atomically, of course. On loading, the recovery rules are:
If foo.data and foo.data.new exist, load foo.data; foo.data.new may be broken (e.g. power off during write)
If foo.data.old and foo.data.new exist, both should be valid, but something died very shortly afterwards - you may want to load the foo.data.old version anyway
If foo.data and foo.data.old exist, then foo.data should be fine, but again something went wrong, or possibly the file couldn't be deleted.
Alternatively, simply always write to a new file, including some sort of monotonically increasing counter - that way you'll never lose any data due to bad writes. The best approach depends on what you're writing though.
You could also use File.Replace for this, which basically performs the last three steps for you. (Pass in null for the backup name if you don't want to keep a backup.)
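A sketch of that sequence using File.Replace where a previous version exists (the file names are illustrative):
using System.IO;

static class SafeFileWriter
{
    public static void Write(string path, string contents)
    {
        string newPath = path + ".new";
        File.WriteAllText(newPath, contents);

        if (File.Exists(path))
        {
            // Renames path -> path.old and newPath -> path for you;
            // pass null as the third argument to skip the backup.
            File.Replace(newPath, path, path + ".old");
            File.Delete(path + ".old");
        }
        else
        {
            // First ever save: nothing to replace yet.
            File.Move(newPath, path);
        }
    }
}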
A lot of programs use this approach, but usually they keep more copies, to also guard against human error.
For example, Cadsoft Eagle (a program used to design circuits and printed circuit boards) makes up to 9 backup copies of the same file, calling them file.b#1 ... file.b#9.
Another thing you can do to improve safety is hashing: append a hash such as a CRC32 or MD5 at the end of the file.
When you open it, check the CRC or MD5; if they don't match, the file is corrupted.
This will also protect you from people who, accidentally or on purpose, modify your file with another program.
It also gives you a way to know if the hard drive or USB disk got corrupted.
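A rough sketch of the appended digest (MD5 shown, since .NET has no built-in CRC32; the format, payload followed by a 16-byte hash, is made up for illustration):
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashedFile
{
    // Writes the payload followed by its 16-byte MD5 digest.
    public static void Write(string path, byte[] payload)
    {
        byte[] hash;
        using (var md5 = MD5.Create())
            hash = md5.ComputeHash(payload);
        File.WriteAllBytes(path, payload.Concat(hash).ToArray());
    }

    // Returns the payload, or throws if the trailing digest doesn't match.
    public static byte[] Read(string path)
    {
        byte[] all = File.ReadAllBytes(path);
        if (all.Length < 16)
            throw new InvalidDataException("File too short to contain a digest.");

        byte[] payload = all.Take(all.Length - 16).ToArray();
        byte[] stored = all.Skip(all.Length - 16).ToArray();

        byte[] computed;
        using (var md5 = MD5.Create())
            computed = md5.ComputeHash(payload);

        if (!stored.SequenceEqual(computed))
            throw new InvalidDataException("Digest mismatch: the file is corrupted.");
        return payload;
    }
}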
Of course, the faster the save operation is, the lower the risk of losing data, but you cannot be sure that nothing will happen during or after writing.
Consider that hard drives, USB drives, and the Windows OS all use caches, which means that even after you finish writing, the OS or the disk itself may still not have physically written the data to the disk.
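If you need to push data past those caches, FileStream can ask the OS to write through; a sketch (even then, the drive's internal cache may delay the physical write):
using System.IO;
using System.Text;

static class WriteThroughSave
{
    public static void Save(string path, string contents)
    {
        byte[] payload = Encoding.UTF8.GetBytes(contents);

        // FileOptions.WriteThrough bypasses the OS write cache, and
        // Flush(true) also flushes the stream's buffer to the device.
        using (var stream = new FileStream(
            path, FileMode.Create, FileAccess.Write, FileShare.None,
            4096, FileOptions.WriteThrough))
        {
            stream.Write(payload, 0, payload.Length);
            stream.Flush(true);
        }
    }
}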
Another thing you can do is save to a temporary file and, if everything is OK, move the file to the real destination folder; this reduces the risk of ending up with half-written files.
You can mix all these techniques together.
In principle there are two popular approaches to this:
Make your file format log-based, i.e. do not overwrite in the usual save case, just append changes or the latest versions at the end.
or
Write to a new file, rename the old file to a backup and rename the new file into its place.
The first leaves you with (way) more development effort, but also has the advantage of making saves faster when you save small changes to large files (Word used to do this, AFAIK); a toy sketch follows.
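A toy sketch of the log-based idea: each record is length-prefixed so a torn final append can be detected and skipped on load (the format is invented for illustration and assumes records contain no newlines):
using System;
using System.IO;

static class AppendLog
{
    // Append one record; the length prefix lets the reader spot truncation.
    public static void Append(string path, string record)
    {
        File.AppendAllText(path, record.Length + ":" + record + Environment.NewLine);
    }

    // Return the last record that was completely written, or null.
    public static string ReadLastValid(string path)
    {
        string last = null;
        foreach (string line in File.ReadAllLines(path))
        {
            int colon = line.IndexOf(':');
            int length;
            if (colon < 0 || !int.TryParse(line.Substring(0, colon), out length))
                continue; // garbage or a torn line
            string body = line.Substring(colon + 1);
            if (body.Length == length)
                last = body; // a complete record
        }
        return last;
    }
}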
We have some C# code that reads data from a text file using a StreamReader. On one computer we can read data from the text file even after it has been deleted or replaced with a different text file - the File.Exists call reports that the file exists even when it doesn't in Windows Explorer. However, on another computer this behaviour doesn't happen. Both computers are running Vista Business and .NET 2.0.50727 SP2.
We have tried restarting the machine without a resolution.
Does anyone have any understanding on how this could be possible and information about possible solutions?
Thanks,
Alan
From MSDN
The Exists method should not be used for path validation, this method merely checks if the file specified in path exists.
Be aware that another process can potentially do something with the file in between the time you call the Exists method and perform another operation on the file, such as Delete. A recommended programming practice is to wrap the Exists method, and the operations you take on the file, in a try...catch block as shown in the example. This helps to narrow the scope for potential conflicts. The Exists method can only help to ensure that the file will be available, it cannot guarantee it.
Could this be a folder virtualization issue?
Is the file being opened for reading before it's being deleted? If it is, it's not unexpected to still be able to read from the opened file even after the filesystem has otherwise let it go.
RE: File.Exists():
File.Exists is inherently prone to race conditions. It should not be used as the exclusive way to verify that a file does or doesn't exist before performing some operation; this mistake can frequently result in a security flaw in your software.
Rather, always handle the exceptions that can be thrown by your actual file operations (open, etc.), and verify your input once the file is open.
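For example, a sketch of opening for read without a prior Exists check (the helper name is mine):
using System.IO;

static class SafeRead
{
    // Attempt the open directly and treat "not found" as an outcome,
    // avoiding the check-then-act race that File.Exists invites.
    public static string TryReadAllText(string path)
    {
        try
        {
            using (var reader = new StreamReader(path))
                return reader.ReadToEnd();
        }
        catch (FileNotFoundException)
        {
            return null; // the file vanished (or never existed)
        }
        catch (DirectoryNotFoundException)
        {
            return null;
        }
    }
}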