Is there a built-in method for waiting for a file to be created in C#? How about waiting for a file to be completely written?
I've baked my own by repeatedly attempting File.OpenRead() on a file until it succeeds (and failing on a timeout), but spinning on a file doesn't seem like the right thing to do. I'm guessing there's a baked-in method in .NET to do this, but I can't find it.
What about using the FileSystemWatcher component?
This class 'watches' a given directory or file, and can raise events when something (you can define what) has happened.
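A minimal sketch of waiting for a file to appear with FileSystemWatcher plus a timeout, assuming a modern .NET (async/await, C# 8 `using` declarations). `FileCreationWaiter` and `WaitForCreatedAsync` are hypothetical names. The sketch subscribes first and only then checks `File.Exists`, so a file created between the two steps is not missed, and it also treats a rename into place as creation.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FileCreationWaiter
{
    // Returns true if 'path' exists, or is created/renamed into place, before 'timeout' elapses.
    public static async Task<bool> WaitForCreatedAsync(string path, TimeSpan timeout)
    {
        string fullPath = Path.GetFullPath(path);
        string directory = Path.GetDirectoryName(fullPath)!;
        var appeared = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Watch only for this file name inside its directory.
        using var watcher = new FileSystemWatcher(directory, Path.GetFileName(fullPath));
        watcher.Created += (sender, e) => appeared.TrySetResult(true);
        watcher.Renamed += (sender, e) => appeared.TrySetResult(true);
        watcher.EnableRaisingEvents = true;

        // Check only after subscribing, so a file created in between is not missed.
        if (File.Exists(fullPath))
            return true;

        Task finished = await Task.WhenAny(appeared.Task, Task.Delay(timeout));
        // The final File.Exists also covers a notification that was dropped.
        return finished == appeared.Task || File.Exists(fullPath);
    }
}
```

Note that this only answers the first question: "created" does not mean "completely written", which is what the rest of this thread is about.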
If you are creating the file yourself with File.Create, remember that it returns an open FileStream; you can just call its Close method so the file is not left locked.
Like this:
File.Create(savePath).Close();
FileSystemWatcher can notify you when a file is created, deleted, changed, renamed, or has its attributes changed. That solves your first problem of waiting for it to be created.
As for waiting for it to be written: once the file is created, you can start tracking its size in the background and wait until it stops changing, then add a settle period. You can also try to get an exclusive lock, but be careful: if the other process is also trying to lock the file, you could cause unexpected things to happen.
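FileSystemWatcher is not reliable on network paths: notifications can be missed, and some remote file systems do not support change notifications at all. In such cases you have to "crawl" the files in a directory manually, which can lead to the same error described above.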
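If you do have to crawl, the crawl can at least be incremental. A minimal sketch, assuming periodic polling (for example from a timer) is acceptable; `DirectoryPoller` and `PollNewFiles` are hypothetical names. It compares each directory listing with the previous one and reports only names that are new, which tells you a file has appeared but not that it is finished.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Polls a directory and reports files that were not present on the previous poll.
public sealed class DirectoryPoller
{
    private readonly string _directory;
    private readonly string _pattern;
    // Case-insensitive, matching Windows and SMB share semantics.
    private HashSet<string> _known = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public DirectoryPoller(string directory, string pattern = "*")
    {
        _directory = directory;
        _pattern = pattern;
    }

    // Call periodically; returns the paths that are new since the last call.
    public List<string> PollNewFiles()
    {
        var current = new HashSet<string>(Directory.EnumerateFiles(_directory, _pattern), StringComparer.OrdinalIgnoreCase);
        var added = new List<string>();
        foreach (string path in current)
        {
            if (!_known.Contains(path))
                added.Add(path);
        }
        _known = current; // files that disappeared are simply forgotten
        return added;
    }
}
```

Each path returned by `PollNewFiles` still has to go through some "is it complete yet?" check before it is opened for processing.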
Is there an alternative, so that we can be sure we don't open a file before it has been fully written to disk and closed?
Related
I am familiar with the FileSystemWatcher class and have tested it; alternatively, I have tested a fast loop that lists the files of a given type in a directory. In this particular case they are zip-compressed SDF files that I need to decompress, open, and query.
The problem is that when a large file is put in a directory, getting it there sometimes takes time, for example when it is being downloaded or copied from a network location.
When the FileSystemWatcher raises an OnChange event, I can inspect the ChangeType, and for these kinds of operations the Created event fires immediately, while the file is still not completely copied to the location.
Likewise using the loop, I see a file is there, before the whole file is there.
The FileSystemWatcher raises several change events, one after the create and then one or more during the copy, but nothing that says "this file is now complete".
So if I am expecting files of a type, to be placed in a directory ultimately to read and processed, with no knowledge of their transport mechanism, and no knowledge of their final size...
How do I know when the file is ready to actually be processed other than with using error control as a workflow control (albeit the error control is there anyway as it should be)? This just seems like a bad way to have to handle this, as sometimes the error control may actually be representing a legitimate issue, sometimes it may just be that the file is not completely written, and I do not see any real safe way to differentiate.
I despise anticipated errors, but I realize they have their place, as with sockets: nothing guarantees that a check for "open" does not change before an attempt to read or write. Still, I avoid them at all costs.
This particular one troubles me mostly because of the ambiguity of the message that will be produced. There is a conflict queue for files that legitimately error because they did not come across entirely or are otherwise corrupt, I do not want otherwise good files going there. Getting more granular to detect this specific case will be almost impossible.
edit:
I know I can do this, and I have read the other SO questions about people doing the same thing. (I know this method is both crude and blocking; it is just an example.)
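// requires: using System; using System.IO; using System.Threading;
private static void OnChanged(object source, FileSystemEventArgs e)
{
    if (e.ChangeType == WatcherChangeTypes.Created)
    {
        bool ready = false;
        while (!ready)
        {
            try
            {
                // Throws IOException while another process holds the file open in a conflicting mode.
                using (FileStream fs = new FileStream(e.FullPath, FileMode.Open))
                {
                    Console.WriteLine(String.Format("{0} - {1}", e.FullPath, fs.Length));
                }
                ready = true;
            }
            catch (IOException)
            {
                ready = false;
                Thread.Sleep(500); // wait before retrying instead of spinning at full CPU
            }
        }
    }
}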
What I am trying to find out is whether this is definitively the only way. Is there no other component, or some hook into the file system, that will actually do this with a proper event?
The only way to tell is to open the file with FileShare.Read. That will always fail if the process is still writing to the file and hasn't closed it yet. There is no other mechanism to learn anything about which particular process is doing the writing: FSW operates at the file system device driver level and doesn't know anything about what process is performing the operation. There could be more than one.
That will very often fail the first time you try, because FSW is very efficient. In general you have no idea how long the process will take; it depends on how it is written, and it might leave the file open for a while. That could be hours or days; a log file would be an example.
So you need a retry mechanism, and it should use an exponential back-off algorithm to increase the delay between attempts. Start it off at, say, a half-second delay and keep increasing that delay each time it fails. This needs to be done in a worker thread, not in the FSW callback. Use a thread-safe queue to pass the path of the file from the FSW callback to the worker thread. That is also, in general, a good strategy for dealing with the multiple FSW notifications you get.
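A minimal sketch of that design, assuming Windows sharing semantics (on Linux and macOS, .NET only emulates `FileShare` with advisory locks, so a writer that does not lock the file will not block the open). `ReadyFileWorker`, `Enqueue` and the delay parameters are hypothetical. The FSW callback only calls `Enqueue`; duplicate notifications for the same path collapse into one queue entry; the worker thread retries with a doubling delay and drops paths whose file has disappeared.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class ReadyFileWorker : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly ConcurrentDictionary<string, byte> _pending = new ConcurrentDictionary<string, byte>();
    private readonly Action<string> _process;
    private readonly TimeSpan _initialDelay;
    private readonly TimeSpan _maxDelay;
    private readonly Thread _thread;

    public ReadyFileWorker(Action<string> process, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        _process = process;
        _initialDelay = initialDelay;
        _maxDelay = maxDelay;
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    // Call this from the FileSystemWatcher callback; it only queues the path and returns.
    public void Enqueue(string path)
    {
        // Several notifications for the same file collapse into a single queue entry.
        if (_pending.TryAdd(path, 0))
            _queue.Add(path);
    }

    private void Run()
    {
        foreach (string path in _queue.GetConsumingEnumerable())
        {
            try
            {
                TimeSpan delay = _initialDelay;
                bool ready;
                // Retry while the file still exists but cannot be opened yet.
                while (!(ready = TryOpenForRead(path)) && File.Exists(path))
                {
                    Thread.Sleep(delay);
                    // Exponential back-off: double the delay after each failure, up to _maxDelay.
                    delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, _maxDelay.Ticks));
                }
                if (ready)
                    _process(path); // _process is expected to handle its own exceptions
            }
            finally
            {
                _pending.TryRemove(path, out _);
            }
        }
    }

    // Opening with FileShare.Read fails while another process still has the file open for writing.
    private static bool TryOpenForRead(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read)) { }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // the worker drains the queued items, then exits
        _thread.Join();
        _queue.Dispose();
    }
}
```

One file that stays open for hours blocks the paths queued behind it; if that matters, a possible refinement is to re-queue the path with its next retry time instead of sleeping on it.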
Watch out for startup effects: you missed any notifications raised before you started running, so there might be a load of files already waiting for work. And watch out for Heisenbugs: whatever you do with the file might cause another process to fall over, much like this process did to yours :)
Consider that a batch-style program that you periodically run with the task scheduler could be an easier alternative.
At one extreme, you could use a file system mini filter driver, which sees all activity on a file at the lowest level (and communicates with a user-mode application).
I wrote a proof-of-concept mini filter some time ago to detect MS Office file conversions. See below. This way, you can reliably check for every open handle to the file.
But even this would not be a universal solution to your problem.
Consider:
A tool (e.g. an FTP file transfer) could in theory write part of the file, close it, and reopen it to append new data. This may seem odd, but it means you cannot reliably conclude “no more open file handles” ==> “file is ready now”.
Alex K. provided a good link in his comment, and I myself would use a solution similar to the answer from Jon (https://stackoverflow.com/a/4278034/4547223)
If time is not critical (you can waste a few seconds for the decision):
Periodic timer (1 second seems reasonable)
Check file size in every timer tick
If the file size has not increased for, e.g., 10 seconds and there are no more FSWatcher change events either, try to open it. If you notice that the size increases unevenly or very slowly, you could adjust the “wait time” on the fly.
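A minimal sketch of the timer-and-size approach as a blocking loop, assuming it runs on a worker thread rather than in the FSW callback. `QuietFileDetector`, `WaitUntilQuiet` and its parameters are hypothetical. As a stand-in for "no more FSWatcher change events", it also watches `LastWriteTimeUtc`, and it finishes with an exclusive open attempt.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

public static class QuietFileDetector
{
    // Polls 'path' every 'pollInterval'. Once size and last-write time have been unchanged for
    // 'quietPeriod', tries an exclusive open. Returns false if 'timeout' expires first.
    public static bool WaitUntilQuiet(string path, TimeSpan pollInterval, TimeSpan quietPeriod, TimeSpan timeout)
    {
        var total = Stopwatch.StartNew();
        var quiet = Stopwatch.StartNew();
        long lastLength = -1;
        DateTime lastWrite = DateTime.MinValue;

        while (total.Elapsed < timeout)
        {
            var info = new FileInfo(path); // a fresh FileInfo each tick, so nothing is cached
            if (info.Exists)
            {
                if (info.Length != lastLength || info.LastWriteTimeUtc != lastWrite)
                {
                    // Something changed: remember the new state and restart the quiet clock.
                    lastLength = info.Length;
                    lastWrite = info.LastWriteTimeUtc;
                    quiet.Restart();
                }
                else if (quiet.Elapsed >= quietPeriod && CanOpenExclusively(path))
                {
                    return true;
                }
            }
            Thread.Sleep(pollInterval);
        }
        return false;
    }

    private static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None)) { }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }
}
```

With the values suggested above, a call might look like `QuietFileDetector.WaitUntilQuiet(path, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10), TimeSpan.FromHours(1))`; the one-hour timeout is an arbitrary example.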
Your big advantage is that you are processing ZIP files only, so you have a chance of detecting invalid (incomplete) files via a “checksum not valid” error.
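A minimal sketch of such a ZIP check with `System.IO.Compression.ZipArchive`; `ZipCheck.LooksComplete` is a hypothetical helper. A truncated archive usually lacks the end-of-central-directory record at the end of the file, so opening it throws `InvalidDataException`. Whether per-entry CRC-32 values are verified while reading depends on the .NET version, so treat this as a heuristic rather than a guarantee.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipCheck
{
    // Returns true if the file opens as a ZIP archive and every entry can be read to the end.
    // IOException (e.g. the file is still locked by the writer) is deliberately not caught,
    // so "still in use" stays distinguishable from "corrupt or incomplete".
    public static bool LooksComplete(string path)
    {
        try
        {
            using var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
            using var archive = new ZipArchive(stream, ZipArchiveMode.Read);
            var buffer = new byte[81920];
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                using Stream entryStream = entry.Open();
                while (entryStream.Read(buffer, 0, buffer.Length) > 0) { }
            }
            return true;
        }
        catch (InvalidDataException)
        {
            return false; // not a valid, or not a complete, ZIP archive
        }
    }
}
```

Keeping `InvalidDataException` and `IOException` apart is what lets a corrupt archive go to a conflict queue while a merely busy file is retried later.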
I do not expect official ways to detect this, since there is no universal notion of “file written completely”.
File System mini filter
This may be a sledgehammer solution to the problem.
Some time ago, I had to work around a weird bug in Office 2010, where it does not copy ADS metadata during Office file conversion (the ADS was needed for File Classification). We discussed this with Microsoft engineers (MS was not willing to fix the bug), and they agreed with our filter driver solution (in the end, it was stopped because the business preferred a manual workaround).
Nevertheless, if someone really wants to check whether this could be a possible solution:
I have written an explanation of the steps:
https://stackoverflow.com/a/29252665/4547223
I'm using a FileSystemWatcher to watch a directory. I created a _Created() event handler to fire when a file is moved to this folder. My problem is the following:
The files in this directory get created when the user presses a "real life button" (a physical button in our warehouse, not in the application). The FileSystemWatcher picks up the file, the application does some work in the system, and then it deletes the file. That wouldn't be a problem if only one instance of the application were running, but it is used by 6 clients, so every application on every client tries to delete it. If one client is too slow, it throws an exception because the file has already been deleted.
What I'm asking for is: Is there a way to avoid this?
I tried using a loop that checks whether the file still exists, but without any success:
while (File.Exists(file))
{
    File.Delete(file);
    Thread.Sleep(100);
}
Can someone give me a hint on how this could work?
Design
If you want a file to be processed by a single instance only (for example, the first instance that reacts gets the job), then you should implement a locking mechanism. Only the instance that is able to obtain a lock on the file is allowed to process and remove it, all other instances should skip the file.
If you're fine with all instances processing the file, and only care that at least one of them succeeds, then you need to figure out which exceptions indicate a genuine failure and which ones indicate a failure caused by the actions of another instance.
Locking
To 'lock' a file, you can open it with share-mode FileShare.None. This prevents other processes from opening it until you close the file. However, you'll then need to close the file before you can delete it, which leaves a small gap during which another instance could open the file.
A better solution is to create a separate lock file for that purpose. Create it with file-mode FileMode.Create and share-mode FileShare.None and keep it open until the whole process is finished, including the removal of the processed file. Then the lock file can be closed and optionally removed.
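A minimal sketch of the lock-file approach; `ExclusiveProcessor`, `TryProcessOnce` and the `.lock` suffix are hypothetical. Note the existence check after the lock is taken: a slower instance can win the lock only after the fast one has already deleted the file. Deleting the lock file afterwards is best-effort, since the answer calls it optional.

```csharp
using System;
using System.IO;

public static class ExclusiveProcessor
{
    // Returns true if this instance processed and deleted 'dataFile'; false if another
    // instance holds the lock or has already handled the file.
    public static bool TryProcessOnce(string dataFile, Action<string> process)
    {
        string lockPath = dataFile + ".lock"; // hypothetical naming convention
        FileStream lockStream;
        try
        {
            // FileShare.None: while we keep this open, nobody else can open the lock file.
            lockStream = new FileStream(lockPath, FileMode.Create, FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException)
        {
            return false; // another instance holds the lock
        }

        bool processed = false;
        using (lockStream)
        {
            // A slower instance may only get the lock after a faster one already removed the file.
            if (File.Exists(dataFile))
            {
                process(dataFile);
                File.Delete(dataFile);
                processed = true;
            }
        }

        // Optional, best-effort clean-up of the lock file.
        try { File.Delete(lockPath); }
        catch (IOException) { }
        catch (UnauthorizedAccessException) { }
        return processed;
    }
}
```

If the lock file lives in the watched folder, the FileSystemWatcher will also report it, so either filter `*.lock` out in the handler or keep lock files in a separate shared folder.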
Exception
As for the UnauthorizedAccessException you got, according to the documentation, that means one of 4 things:
You don't have the required permission
The file is an executable file that is in use
The path is a directory
The file is read-only
1 and 4 seem most likely in this case (if the file was open in another process you'd get an IOException).
If you want to synchronize access between multiple clients on the same computer, you should use a named Mutex.
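A minimal sketch of a named-mutex guard; `MachineWideLock` and `TryRun` are hypothetical names. A named mutex is visible only to processes on the same machine (on Windows, a `Global\` prefix makes it visible across user sessions), so it does not coordinate separate client computers; for those, a lock on the shared file system is still needed.

```csharp
using System;
using System.Threading;

public static class MachineWideLock
{
    // Runs 'action' while holding the named mutex; returns false if it could not be
    // acquired within 'wait'.
    public static bool TryRun(string mutexName, TimeSpan wait, Action action)
    {
        using var mutex = new Mutex(false, mutexName);
        bool acquired;
        try
        {
            acquired = mutex.WaitOne(wait);
        }
        catch (AbandonedMutexException)
        {
            // The previous owner exited without releasing it; this thread now owns the mutex.
            acquired = true;
        }
        if (!acquired)
            return false;
        try
        {
            action();
            return true;
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
}
```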
I am implementing an event handler that must open and process the content of a file created by a third-party application over which I have no control. A note in "C# 4.0 in a Nutshell" (page 495) warns about the risk of opening a file before it is fully populated, so I am wondering how to handle this. To keep the load on the event handler to a minimum, I am considering having the handler simply insert the file names into a queue and having a different thread do the processing. But in any case, how can I make sure that the write is complete and the file is safe to read? The file size could be arbitrary.
Any ideas? Thanks.
A reliable way to achieve what you want might be to use FileSystemWatcher + NTFS USN journal.
It may be more complicated than you expected, but FileSystemWatcher alone won't tell you for sure that the newly created file has been closed.
- First, use the FileSystemWatcher to know when a file is created. From there you have the complete file path, and you are one or two P/Invoke calls away from getting the file's unique ID (which can help you track it during its whole lifetime).
- Then, read the USN journal, which tracks everything that occurs on your drive. Filter on entries corresponding to your new file's ID, and read the journal until you reach the entry with the 'Close' event.
From there, unless your file is manipulated in special ways (opened and closed multiple times by the application that generates it), you can assume it is safe to read it and do whatever you wanted to do with it.
A really great C# implementation of a USN journal parser is StCroixSkipper's work, available here:
http://mftscanner.codeplex.com/
If you are interested, I can give you more help with the USN journal, as I use it in my project.
Our workaround is to watch for a specific extension. While a file is being uploaded, its extension is ".tmp"; when it's done uploading, it's renamed to the proper extension.
Another alternative is to have the server try to move the file in a try/catch block. If the file isn't done being uploaded, the attempt to move it will throw an exception, so we wait and try again.
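A minimal sketch of the consumer side of this convention; `UploadWatcher` and its parameters are hypothetical. A detail that is easy to miss: renaming `x.tmp` to `x.sdf` raises FileSystemWatcher's `Renamed` event, not `Created`, so the watcher has to listen for renames. Created events are ignored here because, under this convention, the final name only ever appears through a rename.

```csharp
using System;
using System.IO;

public static class UploadWatcher
{
    // Starts watching 'directory' and calls 'onReady' when a file is renamed to 'finalExtension'.
    // The caller disposes the returned watcher to stop watching.
    public static FileSystemWatcher Start(string directory, string finalExtension, Action<string> onReady)
    {
        var watcher = new FileSystemWatcher(directory);
        // "upload.tmp" -> "upload.sdf" raises Renamed; the new name is in e.FullPath.
        watcher.Renamed += (sender, e) =>
        {
            if (string.Equals(Path.GetExtension(e.FullPath), finalExtension, StringComparison.OrdinalIgnoreCase))
                onReady(e.FullPath);
        };
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```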
Realistically, you can't know. If the other application's "write" operation is to open the file denying write access to everyone else and close it when it's done, then when you get a notification you can simply try to open the file requesting write access; if that fails, you know the operation isn't complete. But if the "write" operation is to open the file, write, close it, open it again, write again, and so on, then you're pretty much out of luck.
The best solution I've seen is to set a timer after the last notification. When the timer elapses, try to open the file for writing; if you can, assume the "operation" is done and do what you need to do. If the open fails, assume the operation is still in progress and wait some more.
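A minimal sketch of the timer-after-last-notification idea; `NotificationDebouncer` and its members are hypothetical. Each notification restarts a one-shot `System.Threading.Timer` for that path; when the timer finally fires, the file is opened for writing with `FileShare.None`, and if that fails the timer is simply re-armed.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class NotificationDebouncer : IDisposable
{
    private readonly ConcurrentDictionary<string, Timer> _timers = new ConcurrentDictionary<string, Timer>();
    private readonly TimeSpan _quietPeriod;
    private readonly Action<string> _onReady;

    public NotificationDebouncer(TimeSpan quietPeriod, Action<string> onReady)
    {
        _quietPeriod = quietPeriod;
        _onReady = onReady; // invoked on a thread-pool thread; it should not throw
    }

    // Call from every FileSystemWatcher notification; each call restarts the quiet period.
    public void Notify(string path)
    {
        Timer timer = _timers.GetOrAdd(path, p => new Timer(OnElapsed, p, Timeout.Infinite, Timeout.Infinite));
        try
        {
            timer.Change(_quietPeriod, Timeout.InfiniteTimeSpan); // one-shot
        }
        catch (ObjectDisposedException)
        {
            Notify(path); // the timer was retired concurrently; start a fresh one
        }
    }

    private void OnElapsed(object state)
    {
        string path = (string)state;
        if (File.Exists(path) && !CanOpenForWrite(path))
        {
            // Still in use: assume the operation is in progress and wait another period.
            if (_timers.TryGetValue(path, out Timer pending))
            {
                try { pending.Change(_quietPeriod, Timeout.InfiniteTimeSpan); }
                catch (ObjectDisposedException) { }
            }
            return;
        }
        if (_timers.TryRemove(path, out Timer finished))
            finished.Dispose();
        if (File.Exists(path))
            _onReady(path); // deleted files are dropped silently
    }

    // Only IOException is treated as "in use"; other errors are not handled in this sketch.
    private static bool CanOpenForWrite(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None)) { }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }

    public void Dispose()
    {
        foreach (Timer timer in _timers.Values)
            timer.Dispose();
        _timers.Clear();
    }
}
```

As the answer warns, passing this check only means nobody had the file open at that instant; a writer that closes and reopens the file can still fool it.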
Of course, nothing is foolproof. Despite the above, another operation could start while you're doing what you want with the file and cause interaction problems.
I have a thread which polls a folder for new files. The problem is that it sees a new file and starts working on it even before the file has been completely copied by another process. Because of this, the poller gets a "file is being used by another process" error.
Is there a way to check that the file is free to use, or to get notified? We can certainly use exception-handling code, but is there a better way?
Tech: .NET 2.0/C#
Update:
Found out from other answers that if we have access to the app writing the file, a better design is to write it with some other extension such as .tmp and rename it after copying.
FileStream.Lock could be used if we don't control the source application.
We attempt to get a lock on the file before processing it and handle the IOException rather than a generic exception during the attempt to read the file.
See FileStream.Lock on MSDN.
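A minimal sketch of handling only the "file in use" `IOException`, which also helps separate a busy file from a genuinely broken one. It assumes Windows, where `IOException.HResult` carries the Win32 error code in its low 16 bits (32 = `ERROR_SHARING_VIOLATION`, 33 = `ERROR_LOCK_VIOLATION`); other platforms may report different values. `FileOpenErrors` and its methods are hypothetical, and the sketch uses an exclusive open rather than `FileStream.Lock`.

```csharp
using System;
using System.IO;

public static class FileOpenErrors
{
    // Win32 error codes found in the low 16 bits of IOException.HResult on Windows.
    private const int ErrorSharingViolation = 32; // ERROR_SHARING_VIOLATION
    private const int ErrorLockViolation = 33;    // ERROR_LOCK_VIOLATION

    // True when the exception means "another process has the file open or locked" (retry later),
    // as opposed to, say, a missing file or a real I/O failure (report it).
    public static bool IsSharingOrLockViolation(IOException ex)
    {
        int code = ex.HResult & 0xFFFF;
        return code == ErrorSharingViolation || code == ErrorLockViolation;
    }

    // Returns an exclusively opened stream, or null if the file is still in use.
    // Any other IOException propagates to the normal error handling.
    public static FileStream TryOpenExclusive(string path)
    {
        try
        {
            return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None);
        }
        catch (IOException ex) when (IsSharingOrLockViolation(ex))
        {
            return null;
        }
    }
}
```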
I have a requirement to move certain files after they have been processed. Another process accesses the files, and I am not sure when it releases them. Is there any way I can find out when the handle to a file has been released, so I can move it at that time?
I am using Microsoft C# and .Net framework 3.5.
Cheers,
Hamid
If you have control of both the producer of the file and the consumer, the old trick is to create the file under a different name and rename it once it is complete.
For example, say the producer always creates files called file_.txt, and your consumer scans for all files beginning with file_. Then the producer can do this:
1. Create the file called tmpfile_.txt
2. When the file is written, the producer simply renames the file to file_.txt
The rename operation is atomic (on the same volume), so once your consumer sees the file, it is safe to open it.
Of course, this answer depends on you writing both programs.
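A minimal sketch of the producer side; `AtomicPublisher.Publish` is a hypothetical helper, and the `tmp` prefix mirrors the `tmpfile_.txt` naming above. The temporary file is written in the same directory because a rename is only atomic within one volume; across volumes the move is not a simple rename.

```csharp
using System;
using System.IO;

public static class AtomicPublisher
{
    // Writes 'contents' under a temporary name, then renames it to 'finalPath', so consumers
    // scanning for the final name never see a partially written file.
    public static void Publish(string finalPath, byte[] contents)
    {
        string directory = Path.GetDirectoryName(Path.GetFullPath(finalPath))!;
        // Same directory (same volume), so the move below is a rename rather than a copy.
        string tempPath = Path.Combine(directory, "tmp" + Path.GetFileName(finalPath));
        using (var stream = new FileStream(tempPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            stream.Write(contents, 0, contents.Length);
            stream.Flush(true); // push the data to disk before it becomes visible under the final name
        }
        File.Move(tempPath, finalPath); // throws IOException if finalPath already exists
    }
}
```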
HTH
Dermot.
Just continually try to open the file for exclusive writing (e.g. pass FileShare.None to the FileStream constructor). Once you have opened it, you know no one else is using it. However, this might not be the best way to do what you're doing.
If you're after two way communication, see if the other program can be talked to via a pipe.
If you have control of both of the sources, use a named mutex (which works across processes) to control access to the files rather than locking the file at the filesystem level. This way, you don't have to catch the exception raised by attempting to lock a locked file and loop on that, which is rather inelegant.