I'm developing a document-based desktop app which writes a fairly large and complex file to disk when the user saves their document. What is the best practice here to prevent data corruption? There are a number of things that can happen:
The save process may fail halfway through, which is of course a serious application error, but in this case one would rather be left with the old file than with a corrupted, half-written one. The same problem occurs if the application is terminated for some other reason halfway through writing the file.
The most robust approach I can think of is using a temporary file while saving and only replacing the original file once the new file has been written successfully. But there are several operations (creating the temp file, saving to it, deleting the original, moving the temp file over the original), each of which may fail, and I end up with quite a complicated mess of try/catch statements to handle them all correctly.
Is there a best practice/standard for this scenario? For example, is it better to copy the original to a temp file and then overwrite the original than to save to a temp file?
Also, how does one reason about the state of a file in a document-based application (on Windows)? Is it better to leave the file open for writing by the application until the user closes the document, or to quickly get in and read the file on open and close it again right away? What are the pros and cons?
Typically the file shuffling dance goes something like this, aiming to end up with file.txt containing the new data:
Write to file.txt.new
Move file.txt to file.txt.old
Move file.txt.new to file.txt
Delete file.txt.old
At any point you always have at least one valid file:
If only file.txt exists, you failed to start writing file.txt.new
If file.txt and file.txt.new exist, you probably failed during the write - file.txt should be the valid old copy. (If you can validate files, you could try loading the new file - it could be the move that failed)
If file.txt.old and file.txt.new exist, the second move operation failed. You can use either file, depending on whether you want new or old
If file.txt.old and file.txt exist, the delete operation failed. Again, you can use either file.
This is assuming you're on a file system with an atomic move operation. If that's not the case, I believe the procedure is the same but you'd need to be more careful about the recovery procedure.
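A minimal sketch of that dance in C#; the file names and the writeContent callback are placeholders rather than anything prescribed (on NTFS, File.Replace can also do the swap plus backup in a single call):
using System;
using System.IO;

static class SafeSave
{
    // Writes the new data to file.txt.new, then swaps it into place,
    // keeping file.txt.old around until the very last step.
    public static void Save(string path, Action<Stream> writeContent)
    {
        string newPath = path + ".new";
        string oldPath = path + ".old";

        // 1. Write the new data to a separate file first.
        using (var stream = File.Create(newPath))
        {
            writeContent(stream);
        }

        // 2. Move the current file out of the way (a leftover .old file
        //    from a crashed run should be handled by recovery before this).
        if (File.Exists(path))
            File.Move(path, oldPath);

        // 3. Move the new file into place.
        File.Move(newPath, path);

        // 4. Only now is it safe to drop the old copy.
        if (File.Exists(oldPath))
            File.Delete(oldPath);
    }
}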
Answering your last question first: since we are talking about fairly large and complex files, I would personally choose to keep the file open (locked) while the document is open, because I may not need to load all the data into the view at once, only what the user needs right now.
As for the first question:
Always save to a temp file first.
Then replace the old file with the new one. If this step fails, then given that your app is a document management app, your primary objective has failed, which is the worst case, but you still have the new temp file. So on this error you can close the app and ask the user to reopen it (critical error); on reopening, check whether a temp file exists, and if so, run data recovery, more or less like Visual Studio does after a crash.
Creating a temp file and then replacing the original file with the temp file (the latter being a cheap operation in terms of I/O) is the mechanism used by MFC's document persistence classes. I've NEVER seen it fail, nor have users reported such problems. And yes, back then the documents were large (they were complex as well, but that's irrelevant as far as I/O is concerned).
I have a file which I am copying to some location. Below is the code snippet -
// Document Status = Pending
using (var triggerFileWriter = new StringWriter())
{
    triggerFileWriter.WriteLine("Only for test");
    System.IO.File.WriteAllText(fullTriggerFilename, triggerFileWriter.ToString());
}

if (System.IO.File.Exists(fullTriggerFilename))
{
    // Document Status = Processed
}
Is the File.Exists check sufficient to update the document status?
I am not worried if the file is not copied over and the document status is not updated, because there is a timer job running every 10 minutes; 'Pending' items will be picked up automatically in the next run.
Is there any possibility of the file copy being interrupted, resulting in a file that exists but is not completely written?
What changes can I make to my code to handle that if it happens?
Thank you!
Well, the only way to know for sure is to compare the whole file, byte-by-byte, to the file you're trying to write. This is not exactly cheap, of course - you could have just as easily overwritten the file anyway.
On NTFS, files that weren't properly "committed" are basically deleted, so the File.Exists check is fine. This may not be the case when using e.g. FAT-32, or when saving over a networked file system.
File size might help in that case - unless you pre-allocate the file in advance (which is quite a good practice for performance). Even without pre-allocating, it's quite possible for the file to be sized properly, but still missing data.
You can use a hash function such as SHA or MD5 on the original file and store the result. Then apply the same hash function to the copied file and compare the two hashes; they must be identical.
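As a rough sketch of that comparison in .NET (the file paths are placeholders, and SHA-256 is just one possible choice of hash):
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class FileHash
{
    // Computes the SHA-256 hash of a file's contents.
    public static byte[] HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }

    // True if the copy has exactly the same content as the original.
    public static bool FilesMatch(string originalPath, string copyPath)
    {
        return HashFile(originalPath).SequenceEqual(HashFile(copyPath));
    }
}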
You're calling the File.WriteAllText method. That means either the job will be done or you'll get an exception, so you have a guarantee from the .NET I/O API that the file was written properly.
But you'll never have a guarantee that it still exists afterwards, so you don't need to call File.Exists. Just don't rely on it; anything can happen.
I am developing a WinForms application using C# 3.5. I have a requirement to save a file on a temporary basis. Let's just say, for argument's sake, that it's for a short duration of time while the user is viewing a particular tab in the app. After the user navigates away from the tab I am free to delete this file. Each time the user navigates to the tab (which is typically only done once), the file will be created (using a GUID name).
To get to my question - is it considered good practice to save a file to the temp directory? I'll be using the following logic:
Path.GetTempFileName();
My intention would be to create the file and leave it without deleting it. I'm going to assume here that the Windows OS cleans up the temp directory at some interval based on % of available space remaining.
Note: I had considered using the IsolatedStorage option to create the file and manually delete the file when I was finished using it, i.e. when the user navigates away from the tab. However, it's not going so well, as I have a requirement to get the absolute or relative path to the file, and this does not appear to be a straightforward/safe chore when interacting with IsolatedStorage. My opinion is that it's just not designed to allow this.
I write temp files quite frequently. In my humble opinion, the key is to clean up after oneself by deleting unneeded temp files.
In my opinion, it's a better practice to actually delete the temporary files when you don't need them. Consider the following remarks from Path.GetTempFileName() Method:
The GetTempFileName method will raise an IOException if it is used to create more than 65535 files without deleting previous temporary files.
The GetTempFileName method will raise an IOException if no unique temporary file name is available. To resolve this error, delete all unneeded temporary files.
Also, you should be aware of the following hotfix for Windows 7 and Windows Server 2008 R2.
Creating temp files in the temp directory is fine. It is considered good practice to clean up any temporary file when you are done using it.
Remember that temp files shouldn't persist any data you need on a long-term basis (defined as across user sessions). Examples of data needed long term are user settings or a saved data file.
Go ahead and save there, but clean up when you're done (closing the program). Keeping them until the end also allows re-use.
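A small sketch of that create-and-clean-up pattern; the class and method names here are purely illustrative:
using System.IO;

class TabScratchFile
{
    private string _tempPath;

    // Call this when the user navigates to the tab.
    public string Create()
    {
        // GetTempFileName creates a zero-byte file in %TEMP% and returns its full path.
        _tempPath = Path.GetTempFileName();
        return _tempPath;
    }

    // Call this when the user navigates away from the tab (or the app closes).
    public void CleanUp()
    {
        if (_tempPath != null && File.Exists(_tempPath))
        {
            File.Delete(_tempPath);
            _tempPath = null;
        }
    }
}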
I'm reading the contents of an XML file and parsing that into an object model.
When I modify the values in the object model, then use the following code to save it back to the xml:
XElement optionXml = _panelElement.Elements("options").FirstOrDefault();
optionXml.SetAttributeValue("arming", value.ToString());
_document.Save(_fileName);
This works, as far as I can see, because when I close the application and restart it, the values that I had saved are reflected in the object model the next time I view it.
However, when I load the actual XML file, the values are still as they were originally.
Why is this? What do I need to do to save the actual XML file with the new values?
You are most likely experiencing file system virtualisation, which was introduced in Windows Vista.
Basically what this means is that you are saving your file, just not where you think you're saving it. For example, you might think that you are saving to C:\Program Files\Your App\yourFile.xml, but what is happening under the hood is that the OS is silently redirecting that write to a per-user VirtualStore folder (e.g. %LOCALAPPDATA%\VirtualStore\Program Files\Your App\yourFile.xml). When you go to reload it, once again the OS silently redirects from that location.
This is a security measure designed to better encapsulate applications and their data and to prevent unauthorised writes to locations where damage can occur. You can still force a save to %PROGRAMFILES%\Your App, but to do that you either need to relax the ACLs applied to that folder, or you need to elevate the privilege level your application runs at.
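If you control where the file goes, the usual alternative is to write per-user data under the user's application-data folder instead of Program Files. A rough sketch, where the folder and file names are placeholders:
using System;
using System.IO;

static class SettingsLocation
{
    public static string GetSettingsPath()
    {
        // Resolves to something like C:\Users\<user>\AppData\Roaming\Your App
        string folder = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
            "Your App");

        Directory.CreateDirectory(folder); // no-op if it already exists
        return Path.Combine(folder, "yourFile.xml");
    }
}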
I wasn't sure whether to put this as a comment or as an answer, but I think it could be a potential answer. It sounds like the XML file is being saved, because the data is being persisted across instances of the application. It may be file system virtualization like slugster mentioned, but it might be as simple as the fact that you are looking at the wrong copy of the XML file. If you are using a relative path, the file may have been copied to the new location. I would suggest you do a quick file search for that file name and see what you get back.
It turns out the file was being copied to and read from the Output Directory. I can see that it's being updated as expected from there.
I need to read a text based log file to check for certain contents (the completion of a backup job). Obviously, the file is written to when the job completes.
My question is: how can I (or how SHOULD I write the code to) read the file, taking into account that the file may be locked by the writing process, or locked by my own process when the writer needs it, without causing any reliability concerns?
Assuming the writing process has at least specified System.IO.FileShare.Read when opening the file, you should be able to read the text file while it is still being written to.
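For instance, something along these lines should be able to read the log while it is still being written, as long as the writer allowed shared reads (the path is a placeholder):
using System.IO;

static class LogReader
{
    public static string ReadLog(string path)
    {
        // FileShare.ReadWrite: don't block the writer from continuing to write.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}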
In addition to the answer by #BrokenGlass:
Only open the file for reading. If you try to open it for read/write access, it's more likely (almost certain) to fail: you may not be able to open it, and/or you may stop the other process from being able to write to it.
Close the file when you aren't reading it to minimise the chance that you might cause problems for any other processes.
If the writing process denies read access while it is writing to the file, you may have to write some form of "retry loop", which allows your application to wait (keep retrying) until the file becomes readable. Just try to open the file (and catch errors); if it fails, Sleep() for a bit and then try again. (However, if you're monitoring a log file, you will probably want to keep checking it for more data anyway.)
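A sketch of such a retry loop; the attempt count and delay are arbitrary values, not anything prescribed:
using System.IO;
using System.Threading;

static class RetryingReader
{
    public static string ReadWithRetry(string path, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var stream = new FileStream(
                    path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                using (var reader = new StreamReader(stream))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (IOException)
            {
                if (attempt >= maxAttempts)
                    throw; // give up and let the caller decide what to do

                // Still locked by the writer: wait a bit and try again.
                Thread.Sleep(500);
            }
        }
    }
}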
When a file is being written to, it is locked for all other processes that try to open the file in Write-mode. Read-mode will always be available.
However, if your writing process saves changes while you have already opened the file in your reading process, the changes will not be reflected there until you refresh (Close-Open) the file again.
I have an application that is modifying 5 identical xml files, each located on a different network share. I am aware that this is needlessly redundant, but "it must be so."
Every time this application runs, exactly one element (no more, no less) will be added/removed/modified.
Initially, the application opens each xml file, adds/removes/modifies the element to the appropriate node and saves the file, or throws an error if it cannot (Unable to access the network share, timeout, etc...)
How do I make this atomic?
My initial assumption was to:
bool isAtomic = true;

foreach (var path in NetworkPaths)
{
    if (!File.Exists(path))
        isAtomic = false;
}

if (isAtomic)
{
    // Do things
}
But I can see that only goes so far. Is there another way to do this, or a direction I can be pointed in?
Unfortunately, making this truly "atomic" isn't really possible. My best advice would be to wrap up your own form of transaction for this, so you can at least undo the changes.
I'd do something like check for each file - if one doesn't exist, throw.
Back up each file, saving the state needed to undo the change, or keep a copy in memory if the files aren't huge. If you can't, throw.
Make your edits, then save the files. If you get a failure here, try to restore from each of the backups. You'll need to do some error handling here so you don't throw until all of the backups were restored. After restoring, throw your exception.
At least this way, you'll be more likely to not make a change to just a single file. Hopefully, if you can modify one file, you'll be able to restore it from your backup/undo your modification.
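A rough sketch of that manual "transaction"; the modifyFile callback stands in for whatever edit you actually apply, and the .bak naming is just illustrative:
using System;
using System.Collections.Generic;
using System.IO;

static class MultiFileEdit
{
    public static void ApplyToAll(IEnumerable<string> paths, Action<string> modifyFile)
    {
        var backups = new Dictionary<string, string>();

        // 1. Verify and back up every file before touching any of them.
        foreach (var path in paths)
        {
            if (!File.Exists(path))
                throw new FileNotFoundException("Missing file", path);

            string backupPath = path + ".bak";
            File.Copy(path, backupPath, true);
            backups[path] = backupPath;
        }

        try
        {
            // 2. Apply the edit to each file.
            foreach (var path in backups.Keys)
                modifyFile(path);
        }
        catch
        {
            // 3. Something failed: restore every file from its backup
            //    before letting the exception escape.
            foreach (var pair in backups)
            {
                try { File.Copy(pair.Value, pair.Key, true); }
                catch (IOException) { /* log it and keep restoring the others */ }
            }
            throw;
        }

        // 4. All edits succeeded: drop the backups.
        foreach (var backupPath in backups.Values)
            File.Delete(backupPath);
    }
}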
I suggest the following solution (a rough sketch of the locking step follows the list).
Try opening all files with a write lock.
If one or more fail, abort.
Modify and flush all files.
If one or more fail, roll the already modified ones back and flush them again.
Close all files.
If the rollback fails ... well ... try again, and try again, and try again ... and then give up in an inconsistent state.
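A rough sketch of the "open everything with a write lock first" part, with the rollback left out; writeNewContent is a placeholder for whatever produces the new file contents:
using System;
using System.Collections.Generic;
using System.IO;

static class LockAllFirst
{
    public static void ModifyAll(IEnumerable<string> paths, Action<FileStream> writeNewContent)
    {
        var streams = new List<FileStream>();
        try
        {
            // 1. Try to open every file exclusively; if any open fails, abort
            //    before anything has been modified.
            foreach (var path in paths)
                streams.Add(new FileStream(path, FileMode.Open,
                                           FileAccess.ReadWrite, FileShare.None));

            // 2. Only once all locks are held do we modify and flush the files.
            foreach (var stream in streams)
            {
                stream.SetLength(0); // overwrite in place
                writeNewContent(stream);
                stream.Flush();
            }
        }
        finally
        {
            // 3. Close all files, releasing the locks.
            foreach (var stream in streams)
                stream.Dispose();
        }
    }
}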
If you have control over all the processes writing these files, you could implement a simple locking mechanism using a lock file. You could even perform write-ahead logging and record the planned change in the lock file. If your process crashes, the next one attempting to modify the files would detect the incomplete operation and could complete it before doing its own modification.
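A minimal sketch of such a lock file; the path and the plannedChange string are placeholders. FileMode.CreateNew fails if the file already exists, which is what makes it usable as a lock:
using System.IO;
using System.Text;

static class LockFile
{
    // Tries to take the lock by creating a file that must not already exist.
    // Returns true on success; the caller calls Release when the work is done.
    public static bool TryAcquire(string lockPath, string plannedChange)
    {
        try
        {
            // CreateNew is atomic: only one process can create the file.
            using (var stream = new FileStream(lockPath, FileMode.CreateNew))
            {
                // Write-ahead log: record what we are about to do, so a later
                // process can finish the job if we crash halfway through.
                byte[] bytes = Encoding.UTF8.GetBytes(plannedChange);
                stream.Write(bytes, 0, bytes.Length);
            }
            return true;
        }
        catch (IOException)
        {
            // Someone else holds the lock, or a crashed run left it behind.
            return false;
        }
    }

    public static void Release(string lockPath)
    {
        File.Delete(lockPath);
    }
}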
I would introduce versioning of the files. You can do this easily by appending a suffix to the filename, e.g. a counter. The process for the writer is as follows (a sketch follows after the two lists):
Prepare the next version of the file
Write it to a temp file with a different name
Get the highest version number
Increment this version by one
Rename the temp file to the new file
Delete old files (you can keep e.g. 2 of them)
As the reader, you:
Find the file with the highest version
Read it
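A sketch of that scheme in C#; the base name, the ".tmp" suffix, and keeping the two most recent versions are all illustrative choices:
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class VersionedFile
{
    // Writer: write a temp file, then publish it as the next version.
    public static void WriteNextVersion(string directory, string baseName, string content)
    {
        int next = HighestVersion(directory, baseName) + 1;
        string tempPath = Path.Combine(directory, baseName + ".tmp");
        string finalPath = Path.Combine(directory, baseName + "." + next);

        File.WriteAllText(tempPath, content);
        File.Move(tempPath, finalPath); // the rename publishes the new version

        // Keep the two most recent versions, delete the rest.
        foreach (int stale in Versions(directory, baseName).OrderByDescending(v => v).Skip(2))
            File.Delete(Path.Combine(directory, baseName + "." + stale));
    }

    // Reader: read the file with the highest version number.
    public static string ReadLatest(string directory, string baseName)
    {
        int latest = HighestVersion(directory, baseName);
        return File.ReadAllText(Path.Combine(directory, baseName + "." + latest));
    }

    private static int HighestVersion(string directory, string baseName)
    {
        var versions = Versions(directory, baseName).ToList();
        return versions.Count == 0 ? 0 : versions.Max();
    }

    private static IEnumerable<int> Versions(string directory, string baseName)
    {
        // Version suffixes look like "data.1", "data.2", ...; ignore anything else.
        foreach (string path in Directory.GetFiles(directory, baseName + ".*"))
        {
            int version;
            if (int.TryParse(Path.GetExtension(path).TrimStart('.'), out version))
                yield return version;
        }
    }
}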