I have an application that is modifying 5 identical xml files, each located on a different network share. I am aware that this is needlessly redundant, but "it must be so."
Every time this application runs, exactly one element (no more, no less) will be added/removed/modified.
Initially, the application opens each XML file, adds/removes/modifies the element in the appropriate node, and saves the file, or throws an error if it cannot (unable to access the network share, timeout, etc.).
How do I make this atomic?
My initial assumption was to:
bool isAtomic = true;

foreach (var path in NetworkPaths)
{
    if (!File.Exists(path))
        isAtomic = false;
}

if (isAtomic)
{
    // Do things
}
But I can see that only going so far. Is there another way to do this, or a direction I can be pointed to?
Unfortunately, making this truly "atomic" isn't really possible. My best advice would be to wrap up your own form of transaction for this, so you can at least undo the changes.
I'd do something like check for each file - if one doesn't exist, throw.
Back up each file: save the state needed to undo, or keep a copy in memory if they're not huge. If you can't, throw.
Make your edits, then save the files. If you get a failure here, try to restore from each of the backups. You'll need to do some error handling here so you don't throw until all of the backups were restored. After restoring, throw your exception.
At least this way, you'll be more likely to not make a change to just a single file. Hopefully, if you can modify one file, you'll be able to restore it from your backup/undo your modification.
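A rough sketch of that backup/edit/restore sequence (NetworkPaths and ApplyEdit are placeholders for your own path collection and edit logic, and the error reporting is simplified; requires System.Collections.Generic and System.IO):

var backups = new Dictionary<string, byte[]>();

// 1. Check that every file exists and back it up before touching any of them.
foreach (var path in NetworkPaths)
{
    if (!File.Exists(path))
        throw new FileNotFoundException("Missing file, aborting.", path);
    backups[path] = File.ReadAllBytes(path);
}

try
{
    // 2. Apply the change to every file.
    foreach (var path in NetworkPaths)
        ApplyEdit(path);
}
catch
{
    // 3. On any failure, try to restore every file before rethrowing.
    foreach (var backup in backups)
    {
        try { File.WriteAllBytes(backup.Key, backup.Value); }
        catch { /* keep restoring the rest; collect and report these separately */ }
    }
    throw;
}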
I suggest the following solution.
Try opening all files with a write lock.
If one or more fail, abort.
Modify and flush all files.
If one or more fail, roll the already modified ones back and flush them again.
Close all files.
If the rollback fails ... well ... try again, and try again, and try again ... and give up in an inconsistent state.
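A minimal sketch of that approach, assuming a paths collection and a ModifyXml method that performs the edit and calls Flush() (requires System.Collections.Generic and System.IO):

var streams = new List<FileStream>();
try
{
    // Try to acquire an exclusive lock on every file up front; any failure aborts the whole operation.
    foreach (var path in paths)
        streams.Add(new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None));

    // All locks acquired - now modify and flush each file, keeping rollback data as described above.
    foreach (var stream in streams)
        ModifyXml(stream);
}
finally
{
    foreach (var stream in streams)
        stream.Dispose();
}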
If you have control over all processes writing these files, you could implement a simple locking mechanism using a lock file. You could even perform write-ahead logging and record the planned change in the lock file. If your process crashes, the next one attempting to modify the files would detect the incomplete operation and could continue it before doing its own modification.
I would introduce versioning of the files. You can do this easily by appending a suffix to the filename, e.g. a counter variable. The process for the writer is as follows (there is a rough sketch after the lists):
- prepare the next version of the file
- write it to a temp file with a different name
- get the highest existing version number
- increment this version by one
- rename the temp file to the new file name carrying that version
- delete old files (you can keep e.g. 2 of them)

As the reader you:

- find the file with the highest version
- read it
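A rough sketch of the writer side, assuming a "data.xml.<counter>" naming convention (folder and newContent are placeholders; requires System.IO and System.Linq):

// Find the highest existing version number.
int highest = Directory.GetFiles(folder, "data.xml.*")
    .Select(f => int.TryParse(Path.GetExtension(f).TrimStart('.'), out var n) ? n : 0)
    .DefaultIfEmpty(0)
    .Max();

// Prepare the next version in a temp file, then publish it under the next number.
string tempPath = Path.Combine(folder, "data.xml.tmp");
File.WriteAllText(tempPath, newContent);
File.Move(tempPath, Path.Combine(folder, "data.xml." + (highest + 1)));

// Prune old versions, keeping the two most recent.
foreach (var old in Directory.GetFiles(folder, "data.xml.*")
    .Where(f => int.TryParse(Path.GetExtension(f).TrimStart('.'), out _))
    .OrderByDescending(f => int.Parse(Path.GetExtension(f).TrimStart('.')))
    .Skip(2))
{
    File.Delete(old);
}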
I'm using a FileSystemWatcher to watch a directory. I created a _Created() event handler to fire when a file is moved to this folder. My problem is the following:
The files in this directory get created when the user hits a "real life" button (a physical button in our stock area, not one in the application). The FileSystemWatcher takes the file, does some work in the system and then deletes it. That wouldn't be a problem if the application only ran once, but it is used by 6 clients. So every application on every client tries to delete the file, and if one client is too slow, it throws an exception because the file has already been deleted.
What I'm asking for is: Is there a way to avoid this?
I tried using a loop to check whether the file still exists, but without any success.
while (File.Exists(file))
{
    File.Delete(file);
    Thread.Sleep(100);
}
Can someone give me a hint as to how this could work?
Design
If you want a file to be processed by a single instance only (for example, the first instance that reacts gets the job), then you should implement a locking mechanism. Only the instance that is able to obtain a lock on the file is allowed to process and remove it, all other instances should skip the file.
If you're fine with all instances processing the file, and only care that at least one of them succeeds, then you need to figure out which exceptions indicate a genuine failure and which ones indicate a failure caused by the actions of another instance.
Locking
To 'lock' a file, you can open it with share-mode FileShare.None. This prevents other processes from opening it until you close the file. However, you'll then need to close the file before you can delete it, which leaves a small gap during which another instance could open the file.
A better solution is to create a separate lock file for that purpose. Create it with file-mode FileMode.Create and share-mode FileShare.None and keep it open until the whole process is finished, including the removal of the processed file. Then the lock file can be closed and optionally removed.
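A sketch of that lock-file approach; ProcessFile and the ".lock" suffix are placeholders:

void ProcessIfLockAcquired(string dataFile)
{
    FileStream lockStream;
    try
    {
        // FileShare.None means only one instance can hold the lock file open at a time.
        lockStream = new FileStream(dataFile + ".lock", FileMode.Create,
                                    FileAccess.ReadWrite, FileShare.None);
    }
    catch (IOException)
    {
        return;   // another instance holds the lock - skip this file
    }

    using (lockStream)
    {
        ProcessFile(dataFile);    // your processing logic
        File.Delete(dataFile);    // remove the processed file while still holding the lock
    }
    // The lock file is closed at this point and can optionally be deleted (ignore failures,
    // since another instance may already have re-created it).
}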
Exception
As for the UnauthorizedAccessException you got, according to the documentation, that means one of 4 things:
1. You don't have the required permission
2. The file is an executable file that is in use
3. The path is a directory
4. The file is read-only
1 and 4 seem most likely in this case (if the file was open in another process you'd get an IOException).
If you want to synchronize access between multiple clients on the same computer you should use a Named Mutex.
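For example, a sketch of serializing the delete across clients on one machine; the mutex name and the surrounding logic are assumptions (requires System.IO and System.Threading):

using (var mutex = new Mutex(false, @"Global\StockButtonFileProcessing"))
{
    mutex.WaitOne();              // blocks until no other client holds the mutex
    try
    {
        if (File.Exists(file))    // only the first client to get here still sees the file
        {
            // ... process the file ...
            File.Delete(file);
        }
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}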
I am implementing an event handler that must open and process the content of a file created by a third-party application over which I have no control. I am warned by a note in "C# 4.0 in a Nutshell" (page 495) about the risk of opening a file before it is fully populated, so I am wondering how to manage this situation. To keep the load on the event handler to a minimum, I am considering having the handler simply insert the file names into a queue and having a different thread manage the processing; but in any case, how can I make sure that the write is complete and the file is safe to read? The file size could be arbitrary.
Any ideas? Thanks
A reliable way to achieve what you want might be to use FileSystemWatcher + NTFS USN journal.
Maybe more complicated than you expected, but FileSystemWatcher alone won't tell you for sure that the newly created file has been closed.

- First, use the FileSystemWatcher to know when a file is created. From there you have the complete file path, and you are 1 or 2 P/Invokes away from getting the file's unique ID (which can help you track it during its whole lifetime).
- Then, read the USN journal, which tracks everything that occurs on your drive. Filter on entries corresponding to your new file's ID, and read the journal until reaching the entry with the 'Close' event.
From there, unless your file is manipulated in special ways (opened and closed multiple times by the application that generates it), you can assume it is safe to read it and do whatever you wanted to do with it.
A really great C# implementation of a USN journal parser is StCroixSkipper's work, available here:
http://mftscanner.codeplex.com/
If you are interested I can give you more help about USN journal, as I use it in my project.
Our workaround is to watch for a specific extension. While a file is being uploaded, its extension is ".tmp". When it's done uploading, it's renamed to have the proper extension.
Another alternative is to have the server try to move the file in a try/catch block. If the file isn't done being uploaded, the attempt to move it will throw an exception, so we wait and try again.
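A sketch of that move-and-retry idea; the retry count and delay are arbitrary (requires System.IO and System.Threading):

bool MoveWhenReady(string source, string destination, int maxAttempts = 10)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        try
        {
            File.Move(source, destination);
            return true;            // the writer has released the file
        }
        catch (IOException)
        {
            Thread.Sleep(500);      // still being written/uploaded - wait and try again
        }
    }
    return false;                   // give up; the caller decides what to do
}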
Realistically, you can't know. If the other application's "write" operation is to open the file (denying write access to everyone else), write, and close the file when it's done, then when you get a notification you can simply open the file requesting write access; if that fails, you know the operation isn't complete. But if the "write" operation is to open the file, write, close the file, open the file again, write again, etc., then you're pretty much out of luck.
The best solution I've seen is to set a timer after the last notification. When the timer elapses, try to open the file for write--if you can, assume the "operation" is done and do what you need to do. If the open fails, assume the operation is still in progress and wait some more.
Of course, nothing is foolproof. Despite the above, another operation could start while you're doing what you want with the file and cause interaction problems.
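A sketch of that settle-timer idea; watcher, path, ProcessFile and the 2-second delay are all assumptions (requires System.IO and System.Timers):

var timer = new System.Timers.Timer(2000) { AutoReset = false };

watcher.Changed += (s, e) =>
{
    timer.Stop();
    timer.Start();                  // restart the countdown on every notification
};

timer.Elapsed += (s, e) =>
{
    try
    {
        // If we can open the file exclusively for writing, assume the writer is finished.
        using (File.Open(path, FileMode.Open, FileAccess.Write, FileShare.None)) { }
        ProcessFile(path);
    }
    catch (IOException)
    {
        timer.Start();              // still in use - wait another interval
    }
};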
I need to read a text based log file to check for certain contents (the completion of a backup job). Obviously, the file is written to when the job completes.
My question is, how can I (or how SHOULD I write the code to) read the file, taking into account the file may be locked, or locked by my process when it needs to be read, without causing any reliability concerns.
Assuming the writing process has at least specified System.IO.FileShare.Read when opening the file, you should be able to read the text file while it is still being written to.
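For example, a sketch of reading while the writer still has the file open; logPath is a placeholder (requires System.IO):

// FileShare.ReadWrite on our side means we also don't block the writer.
using (var stream = new FileStream(logPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(stream))
{
    string contents = reader.ReadToEnd();
    // ... check for the "backup completed" marker here ...
}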
In addition to the answer by #BrokenGlass:
Only open the file for reading. If you try to open it for Read/Write access, it's more likely (almost certain) to fail - you may not be able to open it, and/or you may stop the other process being able to write to it.
Close the file when you aren't reading it to minimise the chance that you might cause problems for any other processes.
If the writing process denies read access while it is writing to the file, you may have to write some form of "retry loop" that allows your application to wait (keep retrying) until the file becomes readable. Just try to open the file (and catch errors); if it fails, Sleep() for a bit and then try again. (However, if you're monitoring a log file, you will probably want to keep checking it for more data anyway.)
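A minimal retry loop along those lines; the delay and attempt limit are arbitrary (requires System.IO and System.Threading):

string ReadLogWithRetry(string path, int maxAttempts = 30)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (IOException)
        {
            if (attempt >= maxAttempts)
                throw;            // give up after too many attempts
            Thread.Sleep(1000);   // file still locked by the writer - wait and try again
        }
    }
}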
When a file is being written to, it is typically locked against other processes that try to open it in write mode. Read mode is usually still available, although that ultimately depends on the share mode the writer specified.
However, if your writing process saves changes while you have already opened the file in your reading process, the changes will not be reflected there until you refresh (Close-Open) the file again.
I'm developing a document based desktop app which writes a fairly large and complex file to disk when the user saves his document. What is the best practice to do here to prevent data corruption? There are a number of things that can happen:
The save process may fail half way, which is of course a serious application error, but in this case one would rather have the old file left than the corrupted half-written file. The same problem will occur if the application is terminated for some other reason half way through the file writing.
The most robust approach I can think of is using a temporary file while saving and only replace the original file once the new file has been successfully created. But I find there are several operations (creating tempfile, saving to tempfile, deleting original, moving tempfile to original) that may or may not fail, and I end up with quite a complicated mess of try/catch statements to handle them correctly.
Is there a best practice/standard for this scenario? For example is it better to copy the original to a temp file and then overwrite the original than to save to a temp file?
Also, how does one reason about the state of a file in a document-based application (in Windows)? Is it better to leave the file open for writing by the application until the user closes the document, or to just quickly get in and read the file on open and quickly close it again? Pros and cons?
Typically the file shuffling dance goes something like this, aiming to end up with file.txt containing the new data:
Write to file.txt.new
Move file.txt to file.txt.old
Move file.txt.new to file.txt
Delete file.txt.old
At any point you always have at least one valid file:
If only file.txt exists, you failed to start writing file.txt.new
If file.txt and file.txt.new exist, you probably failed during the write - file.txt should be the valid old copy. (If you can validate files, you could try loading the new file - it could be the move that failed)
If file.txt.old and file.txt.new exist, the second move operation failed. You can use either file, depending on whether you want new or old
If file.txt.old and file.txt exist, the delete operation failed. Again, you can use either file.
This is assuming you're on a file system with an atomic move operation. If that's not the case, I believe the procedure is the same but you'd need to be more careful about the recovery procedure.
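On Windows/NTFS, File.Replace performs the two moves as a single replace-with-backup call, so a sketch of the same dance might look like this (SaveDocument and writeContents are placeholders; requires System and System.IO):

void SaveDocument(string path, Action<Stream> writeContents)
{
    string newPath = path + ".new";
    string backupPath = path + ".old";

    using (var stream = new FileStream(newPath, FileMode.Create, FileAccess.Write))
    {
        writeContents(stream);      // write the full document to file.txt.new
        stream.Flush(true);         // flush to disk before swapping
    }

    if (File.Exists(path))
        File.Replace(newPath, path, backupPath);  // file.txt -> file.txt.old, file.txt.new -> file.txt
    else
        File.Move(newPath, path);                 // first save - nothing to replace yet

    // file.txt.old can be deleted here, or kept as a recovery copy.
}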
Answering your last question first:

If we are talking about fairly complex and big files, I would personally choose to keep the file open (locked), since while reading I may not need to load all of the data into the view, only what the user needs right now.

On the first question:

Always save to a temp file.
Then replace the old file with the new one. If this fails, then, considering that your app is a document management app, your primary objective has failed - the worst case - but you still have the new temp file. So on this error you can close the app and reopen it (critical error); on reopening, check whether a temp file exists and, if so, run data recovery, more or less like Visual Studio does after a crash.
Creating a temp file and then replacing the original file by the temp file (the latter being a cheap operation in terms of I/O) is the mechanism used by MFC's document persistence classes. I've NEVER seen it fail. Neither have users reported such problems. And yes back then the documents were large (they were complex as well but that's irrelevant as far as I/O is concerned).
I am working on an app that will keep a running index of work accomplished.
I could write once at the end of a work session, but I don't want to risk losing data if something blows up. Therefore, I rewrite to disk (XML) every time a new entry or a correction is made by the user.
private void WriteIndexFile()
{
    XmlDocument IndexDoc = new XmlDocument();
    // Build document here

    using (XmlTextWriter tw = new XmlTextWriter(_filePath, Encoding.UTF8))
    {
        tw.Formatting = Formatting.Indented;
        IndexDoc.Save(tw);
    }
}
It is possible for the writes to be triggered in rapid succession. If this happens, it tries to open the file for writing before the prior write is complete. (While it would not be normal, I suppose it is possible that the file gets opened for use by another program.)
How can I check if the file can be re-written?
Edit for clarification: This is part of an automated lab data collection system. The users will click a button to capture data (saved in separate files) and identify the sub-task that the data package is for. Typically, it will be 3-10 minutes between clicks.
If they make an error, they need to be able to go back and correct it, so it's not an append-only usage.
Finally, the files will be read by other automated tools and manually by humans. (XML/XSLT)
The size will be limited as each work session (worker shift or less) will have a new index file generated.
Further question: As the overwhelming consensus is to not use XML and write in an append-only mode, how would I solve the requirement of going back and correcting earlier entries?
I am considering having a "dirty" flag, and saving a few minutes after the flag is set and upon closing the work session. If multiple edits happen in that time, only one write will occur - no more rapid writes - and there would also be a retry/cancel dialog if the save fails. Thoughts?
XML is a poor choice in your case because new content has to be inserted before the closing tag. Use plain text instead: simply open the file for append and write the new content at the end of the file, see How to: Open and Append to a Log File.
You can also look into a simple logging framework like log4net and use that instead of handling the low-level file stuff yourself.
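A minimal append-only write along those lines; the path, the tab-separated format, and the subTaskId/dataFileName variables are arbitrary examples (requires System and System.IO):

using (StreamWriter writer = File.AppendText(@"C:\logs\worklog.txt"))
{
    writer.WriteLine("{0:o}\t{1}\t{2}", DateTime.Now, subTaskId, dataFileName);
}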
If all you want is a simple log of all operations, XML may be the wrong choice here as it is difficult to append to an XML document without rewriting the whole file, which will become slower and slower as the file grows.
I'd suggest instead File.AppendText, or even better: keeping the file open for the duration of the application's lifetime and using WriteLine.
(Oh, and as others have pointed out, you need to lock to ensure that only one thread writes to the file at a time. This is still true even with this solution.)
There are also logging frameworks that already solve this problem, such as log4net. Have you considered using an existing logging framework instead of rolling your own?
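A sketch of the keep-the-file-open variant with a lock around each write; the class and member names are placeholders (requires System and System.IO):

class SessionLog : IDisposable
{
    private readonly object _sync = new object();
    private readonly StreamWriter _writer;

    public SessionLog(string path)
    {
        // FileShare.Read lets other tools read the log while we keep it open for appending.
        _writer = new StreamWriter(new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read));
    }

    public void WriteLine(string line)
    {
        lock (_sync)              // only one thread writes at a time
        {
            _writer.WriteLine(line);
            _writer.Flush();      // so readers see new entries promptly
        }
    }

    public void Dispose()
    {
        _writer.Dispose();
    }
}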
I have a logger that uses System.Collections.Queue. Basically it waits until something is queued, then tries to write it. While writing items, which could be slow, more items could be added to the queue.
This will also help in just grouping messages rather than trying to keep up. It is running on a separate thread.
private AutoResetEvent ResetEvent { get; set; }

public void LogMessage(string fullMessage)
{
    this.logQueue.Enqueue(fullMessage);
    // Trigger the reset event so the processing thread wakes up.
    this.ResetEvent.Set();
}

private void ProcessQueueMessages()
{
    while (this.Running)
    {
        // This will process all the items currently in the queue.
        while (this.logQueue.Count > 0)
        {
            // This method logs and dequeues the item at the front of the queue.
            this.LogQueueItem();
        }

        // Once the queue is empty, wait for another message to be queued
        // before running again. This avoids sleeping and polling the queue,
        // i.e. no System.Threading.Thread.Sleep(1000) busy-waiting.
        this.ResetEvent.WaitOne();
    }
}
I handle write failures, but I don't dequeue an item until it has been written to the file with no errors; I just keep attempting until the write finally succeeds. This has saved me, because somebody removed permissions from one of our apps while it was running. Permission was given back without shutting down our app, and we didn't lose a single log statement.
Consider using a flat text file. I have a process that I wrote that uses an XML log... it was a poor choice. You can't just write out the state as you run without constantly rewriting the file to keep the tags correct. With flat entries written to a file, you get an automatic timeline that can give you details of what happened, without trying to figure out whether it was the XML writer/tag set that blew up, and you don't have to worry about your logs bloating as much.
I agree with others suggesting you avoid XML. Also, I would suggest you have one component (a "monitor") that is responsible for all access to the file. That component will have the job of handling multiple simultaneous requests and making the disk writes happen one after another.