Ensuring a file is copied successfully - c#

I have a file which I am copying to some location. Below is the code snippet -
//Document Status = Pending
var triggerFileWriter = new StringWriter();
triggerFileWriter.WriteLine("Only for test");
System.IO.File.WriteAllText(fullTriggerFilename, triggerFileWriter.ToString());
triggerFileWriter.Dispose();
if (System.IO.File.Exists(fullTriggerFilename))
{
// Document Status = Processed
}
Is the File.Exists check sufficient to update the document status?
I am not worried about the case where the file is not copied over and the document status is not updated: a timer job runs every 10 minutes, so 'Pending' items will automatically be picked up in the next run.
Is there any possibility of the file copy being interrupted, resulting in a file that exists but was not written completely?
What changes can I make to my code to handle that case?
Thank you!

Well, the only way to know for sure is to compare the whole file, byte-by-byte, to the file you're trying to write. This is not exactly cheap, of course - you could have just as easily overwritten the file anyway.
On NTFS, files that weren't properly "committed" are basically deleted, so the File.Exists is fine. This may not be the case when using e.g. FAT-32, or when saving over a networked file system.
File size might help in that case - unless you pre-allocate the file in advance (which is quite a good practice for performance). Even without pre-allocating, it's quite possible for the file to be sized properly, but still missing data.

You can use a hash function such as SHA or MD5 on the original file and store the result. Then apply the same hash function to the copied file and compare the two hashes; they must be identical.
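A minimal sketch of that idea in C#, using SHA-256 from System.Security.Cryptography; the two paths below are placeholders for the original file and the copy:

using System.IO;
using System.Linq;
using System.Security.Cryptography;

static byte[] HashFile(string path)
{
    using (var sha = SHA256.Create())
    using (var stream = File.OpenRead(path))
        return sha.ComputeHash(stream);
}

// Placeholder paths for the source file and the copied file.
bool identical = HashFile(@"C:\source\file.dat").SequenceEqual(HashFile(@"\\target\share\file.dat"));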

You're calling the File.WriteAllText method. That means either the write completes or you get an exception, so the .NET I/O API gives you a guarantee that the file was properly written.
But you never have a guarantee that the file still exists at some later point, so the File.Exists call doesn't buy you much. Just don't rely on it; anything can happen.
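For example, a minimal sketch assuming the same fullTriggerFilename from the question and a hypothetical UpdateDocumentStatus helper; the write-or-throw behaviour can drive the status update directly:

try
{
    System.IO.File.WriteAllText(fullTriggerFilename, "Only for test");
    // Reaching this line means the write call completed without throwing.
    UpdateDocumentStatus("Processed");   // hypothetical helper, not part of the original code
}
catch (System.IO.IOException)
{
    // The write failed or was interrupted; leave the status as 'Pending'
    // so the 10-minute timer job picks the item up again.
}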

Related

FileSystemWatcher and write completion

I am implementing an event handler that must open and process the content of a file created by a third-party application over which I have no control. I am warned by a note in "C# 4.0 in a Nutshell" (page 495) about the risk of opening a file before it is fully populated, so I am wondering how to handle this case. To keep the load on the event handler to a minimum, I am considering having the handler simply insert the file names into a queue and then having a different thread manage the processing, but either way, how can I make sure that the write is completed and the file is safe to read? The file size could be arbitrary.
Any ideas? Thanks.
A reliable way to achieve what you want might be to use FileSystemWatcher + NTFS USN journal.
Maybe more complicated than you expected, but FileSystemWatcher alone won't tell you for sure that the newly created file has been closed.
First, use the FileSystemWatcher to know when a file is created. From there you have the complete file path, and you are one or two P/Invokes away from getting the file's unique ID (which can help you track it during its whole lifetime).
Then read the USN journal, which tracks everything that occurs on your drive. Filter on entries corresponding to your new file's ID, and read the journal until you reach the entry with the 'Close' event.
From there, unless your file is manipulated in special ways (opened and closed multiple times by the application that generates it), you can assume it is safe to read it and do whatever you wanted to do with it.
A really great C# implementation of a USN journal parser is StCroixSkipper's work, available here:
http://mftscanner.codeplex.com/
If you are interested I can give you more help about USN journal, as I use it in my project.
Our workaround is to watch for a specific extension. While a file is being uploaded, its extension is ".tmp". When it's done uploading, it's renamed to have the proper extension.
Another alternative is to have the server try to move the file in a try/catch block. If the file isn't done being uploaded, the attempt to move it will throw an exception, so we wait and try again.
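A rough sketch of that second approach, where incomingPath and processingPath are placeholder paths and ProcessFile is a hypothetical processing step:

try
{
    // If the uploader still has the file open, Move throws and we try again later.
    System.IO.File.Move(incomingPath, processingPath);
    ProcessFile(processingPath);   // hypothetical processing step
}
catch (System.IO.IOException)
{
    // Still being uploaded; wait and retry on the next pass.
}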
Realistically, you can't know. If the other application's "write" operation is to open the file denying write access to everyone else and then close it when it's done, then when you get a notification you can simply open the file requesting write access; if that fails, you know the operation isn't complete. But if the "write" operation is to open the file, write, close the file, open it again, write again, and so on, then you're pretty much out of luck.
The best solution I've seen is to set a timer after the last notification. When the timer elapses, try to open the file for writing; if you can, assume the "operation" is done and do what you need to do. If the open fails, assume the operation is still in progress and wait some more.
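A hedged sketch of that timer check, assuming the path comes from the queued FileSystemWatcher notification:

static bool IsFileReady(string path)
{
    try
    {
        // If the writer still holds the file open, this exclusive open fails.
        using (var stream = System.IO.File.Open(path, System.IO.FileMode.Open,
                                                System.IO.FileAccess.ReadWrite,
                                                System.IO.FileShare.None))
        {
            return true;
        }
    }
    catch (System.IO.IOException)
    {
        return false;   // still in use; reset the timer and check again later
    }
}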
Of course, nothing is foolproof. Despite the above, another operation could start while you're doing what you want with the file and cause interaction problems.

Why isn't my XML file being saved, even though the program sees the updated values next time I start it?

I'm reading the contents of an XML file and parsing that into an object model.
When I modify the values in the object model, then use the following code to save it back to the xml:
XElement optionXml = _panelElement.Elements("options").FirstOrDefault();
optionXml.SetAttributeValue("arming", value.ToString());
_document.Save(_fileName);
This works, as far as I can see, because when I close the application and restart it the values that I had saved are reflected in the object model next time I view it.
However, when I load the actual XML file, the values are still as they were originally.
Why is this? What do I need to do to save the actual XML file with the new values?
You are most likely experiencing file system virtualisation, which was introduced in Windows Vista.
Basically what this means is that you are saving your file, just not where you think you're saving it. For example, you might think that you are saving to C:\Program Files\Your App\yourFile.xml, but what is happening under the hood is that the OS is silently redirecting that to %APPDATA%\Your App\yourFile.xml. When you go to reload it, once again the OS silently redirects from that location.
This is a security measure designed to better encapsulate applications and their data and to prevent unauthorised writes to locations where damage can occur. You can still force a save to %PROGRAMFILES%\Your App, but to do that you either need to relax the ACLs applied to that folder, or you need to elevate the privilege level your application runs at.
I wasn't sure whether to put this as a comment or as an answer, but I think it could be a potential answer. It sounds like the XML file is being saved, because the data is being persisted across instances of the application. It may be file system virtualization like slugster mentioned, but it might be as simple as the fact that you are looking at the wrong copy of the XML file. If you are using a relative path, the file may have been copied to the new location. I would suggest you do a quick file search for that file name and see what you get back.
It turns out the file was being copied to and read from the Output Directory. I can see that it's being updated as expected from there.

Overwriting a document file

I'm developing a document based desktop app which writes a fairly large and complex file to disk when the user saves his document. What is the best practice to do here to prevent data corruption? There are a number of things that can happen:
The save process may fail half way, which is of course a serious application error, but in this case one would rather have the old file left than the corrupted half-written file. The same problem will occur if the application is terminated for some other reason half way through the file writing.
The most robust approach I can think of is using a temporary file while saving and only replace the original file once the new file has been successfully created. But I find there are several operations (creating tempfile, saving to tempfile, deleting original, moving tempfile to original) that may or may not fail, and I end up with quite a complicated mess of try/catch statements to handle them correctly.
Is there a best practice/standard for this scenario? For example is it better to copy the original to a temp file and then overwrite the original than to save to a temp file?
Also, how does one reason about the state of a file in a document-based application (in Windows)? Is it better to leave the file open for writing by the application until the user closes the document, or to just quickly get in and read the file on open and quickly close it again? Pros and cons?
Typically the file shuffling dance goes something like this, aiming to end up with file.txt containing the new data:
Write to file.txt.new
Move file.txt to file.txt.old
Move file.txt.new to file.txt
Delete file.txt.old
At any point you always have at least one valid file:
If only file.txt exists, you failed to start writing file.txt.new
If file.txt and file.txt.new exist, you probably failed during the write - file.txt should be the valid old copy. (If you can validate files, you could try loading the new file - it could be the move that failed)
If file.txt.old and file.txt.new exist, the second move operation failed. You can use either file, depending on whether you want new or old
If file.txt.old and file.txt exist, the delete operation failed. Again, you can use either file.
This is assuming you're on a file system with an atomic move operation. If that's not the case, I believe the procedure is the same but you'd need to be more careful about the recovery procedure.
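A minimal C# sketch of that dance, assuming local paths, that File.Move is atomic on the target file system, and that leftovers from a previous failed run have already been recovered; the names are illustrative:

static void SafeSave(string path, string newContent)
{
    string newPath = path + ".new";
    string oldPath = path + ".old";

    System.IO.File.WriteAllText(newPath, newContent);  // 1. write file.txt.new
    if (System.IO.File.Exists(path))
        System.IO.File.Move(path, oldPath);            // 2. move file.txt -> file.txt.old
    System.IO.File.Move(newPath, path);                // 3. move file.txt.new -> file.txt
    if (System.IO.File.Exists(oldPath))
        System.IO.File.Delete(oldPath);                // 4. delete file.txt.old
}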
Answering the last question first:
If we are talking about fairly complex and big files, I would personally choose to keep the file locked, since during reading I may not need to load all the data into the view, only what the user needs right now.
Now the first question:
Always save to a temp file.
Then replace the old file with the new one. If this fails, then considering that your app is a document management app, your primary objective has failed, which is the worst case, but you still have the new temp file. On this error you can close your app and reopen it (critical error); on reopening, check whether a temp file exists and, if so, run data recovery, more or less like Visual Studio does after a crash.
Creating a temp file and then replacing the original file by the temp file (the latter being a cheap operation in terms of I/O) is the mechanism used by MFC's document persistence classes. I've NEVER seen it fail. Neither have users reported such problems. And yes back then the documents were large (they were complex as well but that's irrelevant as far as I/O is concerned).

File Handling Issue

I am developing a tool in C#. At one point I start writing to an XML file continuously using my tool. When I suddenly restart my machine, that particular XML file gets corrupted. What is the reason, and how can I avoid it?
using System.Xml;

XmlDocument x = new XmlDocument();
x.Load(xmlFilePath);   // xmlFilePath: path to the XML file being updated
// change a value of the node every time
x.Save(xmlFilePath);
x = null;
this is my code
Use the "safe replace pattern". For example, to replace foo.txt
Write to foo.new
Move foo.txt to foo.old
Move foo.new to foo.txt
Delete foo.old
At any point, you have at least one complete, valid file.
(That helps if you want to write a new file periodically; for appending, I'd go with the answers suggesting that XML isn't the best way forward for you.)
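On .NET, File.Replace bundles steps 2-4 into a single call and keeps the backup file for you (the destination must already exist). A short sketch, where path and xmlContent are placeholders:

string path = "foo.txt";
System.IO.File.WriteAllText(path + ".new", xmlContent);        // write the complete new version first
if (System.IO.File.Exists(path))
    System.IO.File.Replace(path + ".new", path, path + ".old"); // swap in, keeping foo.old as backup
else
    System.IO.File.Move(path + ".new", path);                   // first save: nothing to replace yet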
Don't use XML.
XML has a syntax which doesn't lend itself well to writing continuously to the same file, as you always need a final end tag which you can't write unless the file is complete (which it never is with log files, for example).
That means you will always get an invalid XML file when you cancel the writing prematurely (by killing the process or restarting the computer, etc.).
We had a similar situation a while ago and settled on YAML as a nice format which allows for simply appending to the file.
Check that your file is properly closed before the application shuts down.
Also, as someone has pointed out, an XML file must be properly ended with closing tags.
Additional details would also be useful, such as the code that you use to open, write and close the file.
The reason for your file getting corrupted is that due to a crash, you never closed it.
I remember solving an issue like that once with a file overlapping flag, but that was in C++ using the CreateFile method.

Atomic modification of files across multiple networks

I have an application that is modifying 5 identical xml files, each located on a different network share. I am aware that this is needlessly redundant, but "it must be so."
Every time this application runs, exactly one element (no more, no less) will be added/removed/modified.
Initially, the application opens each xml file, adds/removes/modifies the element to the appropriate node and saves the file, or throws an error if it cannot (Unable to access the network share, timeout, etc...)
How do I make this atomic?
My initial assumption was to:
foreach (var path in NetworkPaths)
    if (!File.Exists(path))
        isAtomic = false;

if (isAtomic)
{
    // Do things
}
But I can see that only going so far. Is there another way to do this, or a direction I can be pointed to?
Unfortunately, for it to be truly "atomic" isn't really possible. My best advice would be to wrap up your own form of transaction for this, so you can at least undo the changes.
I'd do something like check for each file - if one doesn't exist, throw.
Back up each file: save the state needed to undo, or keep a copy in memory if they're not huge. If you can't, throw.
Make your edits, then save the files. If you get a failure here, try to restore from each of the backups. You'll need to do some error handling here so you don't throw until all of the backups have been restored. After restoring, throw your exception.
At least this way, you'll be more likely to not make a change to just a single file. Hopefully, if you can modify one file, you'll be able to restore it from your backup/undo your modification.
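A rough outline of that manual transaction, assuming the files are small enough to back up next to the originals; NetworkPaths is from the question and ApplyEdit is a hypothetical helper for the actual change:

using System.Collections.Generic;
using System.IO;

var backups = new Dictionary<string, string>();
try
{
    foreach (var path in NetworkPaths)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException(path);
        string backup = path + ".bak";
        File.Copy(path, backup, overwrite: true);   // remember how to undo this file
        backups[path] = backup;
    }

    foreach (var path in NetworkPaths)
        ApplyEdit(path);                            // hypothetical: add/remove/modify the element
}
catch
{
    foreach (var pair in backups)
    {
        try { File.Copy(pair.Value, pair.Key, overwrite: true); }  // best-effort restore
        catch { /* collect errors; keep restoring the remaining files */ }
    }
    throw;
}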
I suggest the following solution.
Try opening all files with a write lock.
If one or more fail, abort.
Modify and flush all files.
If one or more fail, roll the already modified ones back and flush them again.
Close all files.
If the rollback fails ... well ... try again, and try again, and try again ... and give up in an inconsistent state.
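A sketch of the locking step, opening every file with an exclusive lock before modifying any of them; NetworkPaths is the same collection as in the question:

using System.Collections.Generic;
using System.IO;

var streams = new List<FileStream>();
try
{
    foreach (var path in NetworkPaths)
    {
        // FileShare.None: fail fast if anyone else has the file open.
        streams.Add(new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None));
    }

    // ... modify and flush each stream here ...
}
finally
{
    foreach (var s in streams)
        s.Dispose();   // releases the locks
}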
If you have control over all the processes writing these files, you could implement a simple locking mechanism using a lock file. You could even perform write-ahead logging and record the planned change in the lock file. If your process crashes, the next one attempting to modify the files would detect the incomplete operation and could complete it before doing its own modification.
I would introduce versioning of the files. You can do this easily by appending a suffix to the filename, e.g. a counter variable. The process for the writer is as follows:
prepare the next version of the file
write it to a temp file with a different name.
Get the highest version number
increment this version by one
rename the temp file to the new file
delete old files (you can keep e.g. 2 of them)
As the reader you:
- find the file with the highest version
- read it
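A hedged sketch of both sides, assuming the version number is a zero-padded suffix such as data.0005.xml; all names are illustrative:

using System.IO;
using System.Linq;

// Writer: write a temp file, then rename it to the next version number.
static void WriteNextVersion(string directory, string baseName, string content)
{
    int highest = GetHighestVersion(directory, baseName);
    string tempPath = Path.Combine(directory, baseName + ".tmp");
    File.WriteAllText(tempPath, content);
    string newPath = Path.Combine(directory, $"{baseName}.{highest + 1:D4}.xml");
    File.Move(tempPath, newPath);          // readers only ever see complete files
    // optionally delete versions older than the last two here
}

// Reader: pick the file with the highest version number.
static string ReadLatest(string directory, string baseName)
{
    int highest = GetHighestVersion(directory, baseName);
    return File.ReadAllText(Path.Combine(directory, $"{baseName}.{highest:D4}.xml"));
}

static int GetHighestVersion(string directory, string baseName)
{
    return Directory.GetFiles(directory, baseName + ".*.xml")
                    .Select(f => Path.GetFileName(f).Split('.')[1])
                    .Select(s => int.TryParse(s, out int v) ? v : 0)
                    .DefaultIfEmpty(0)
                    .Max();
}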
