I want to write an application that receives an event for every file change on the whole computer (to keep file locations and permissions synchronized with my application's database).
I was thinking of using the .NET FileSystemWatcher class, but after some tests I found the following limitations:
1) FileSystemWatcher has a buffer (http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.90).aspx):
If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
So across the whole computer I can get a large number of events at peak that I need to handle. Even if all my event handler does is add the event info to a queue, I can still miss events (a minimal sketch of that enqueue-only pattern follows this question).
2) FileSystemWatcher has memory leaks:
http://connect.microsoft.com/VisualStudio/feedback/details/654232/filesystemwatcher-memory-leak
I checked it myself and it's true: after a few days my process's memory grows from 20 MB to 250 MB.
3) Microsoft says we should use FileSystemWatcher only for specific folders (they don't say why):
Use FileSystemWatcher to watch for changes in a specified directory.
So for these reasons I need an alternative solution for my application. I know that I could write a driver, but I would prefer a .NET solution (based on the Win32 API, of course).
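To illustrate point 1, here is a minimal sketch of the enqueue-only handler; the watched path and the consumer's work are placeholders, not part of the actual application:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    class WatcherQueueSketch
    {
        // Unbounded queue: the handler never blocks, so the watcher's internal
        // buffer is drained as fast as the OS delivers notifications.
        static readonly BlockingCollection<FileSystemEventArgs> Queue =
            new BlockingCollection<FileSystemEventArgs>();

        static void Main()
        {
            var watcher = new FileSystemWatcher(@"C:\")   // hypothetical root to watch
            {
                IncludeSubdirectories = true,
                NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
            };
            // Handlers do nothing but enqueue.
            watcher.Created += (s, e) => Queue.Add(e);
            watcher.Changed += (s, e) => Queue.Add(e);
            watcher.Deleted += (s, e) => Queue.Add(e);
            watcher.EnableRaisingEvents = true;

            // A separate consumer does the slow work (e.g. the database sync).
            Task.Run(() =>
            {
                foreach (var e in Queue.GetConsumingEnumerable())
                    Console.WriteLine($"{e.ChangeType}: {e.FullPath}");   // placeholder
            });

            Console.ReadLine();   // keep the process alive while watching
        }
    }

Even with a handler this cheap, a large enough burst can still overflow the internal buffer, which is the crux of the question.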
Thank you for your help,
Omri
Monitoring (especially with synchronous notifications) will slow down the system. You could probably make use of our CallbackFilter product, which provides a driver and a handy .NET API for tracking file changes. CallbackFilter also supports asynchronous notifications, which are faster. Discounted and free licenses are possible.
Try doing this through WMI imo - the following link is relevant: http://www.codeproject.com/Articles/42212/WMI-and-File-System-Monitoring
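For a taste of what the WMI route looks like, here is a hedged sketch in the spirit of that article: it subscribes to CIM_DataFile operation events for one folder. The drive and path values are assumptions, and note that WMI polls (the WITHIN clause) rather than pushing events:

    using System;
    using System.Management;   // add a reference to System.Management.dll

    class WmiFileWatchSketch
    {
        static void Main()
        {
            // WITHIN 5 means WMI polls every 5 seconds; these events are not pushed.
            string query =
                "SELECT * FROM __InstanceOperationEvent WITHIN 5 " +
                "WHERE TargetInstance ISA 'CIM_DataFile' " +
                "AND TargetInstance.Drive = 'C:' " +
                @"AND TargetInstance.Path = '\\Temp\\'";

            using (var watcher = new ManagementEventWatcher(query))
            {
                watcher.EventArrived += (s, e) =>
                {
                    var file = (ManagementBaseObject)e.NewEvent["TargetInstance"];
                    // Class name distinguishes creation/modification/deletion events.
                    Console.WriteLine($"{e.NewEvent.ClassPath.ClassName}: {file["Name"]}");
                };
                watcher.Start();
                Console.ReadLine();
                watcher.Stop();
            }
        }
    }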
I would like to understand how System.IO.FileSystemWatcher works under the hood, because I have a requirement to watch all the files under 100 or more folders, where each folder holds around 1K files.
I am not sure whether using FileSystemWatcher will create many threads to monitor these files, which would impact the performance of my application. So could you please explain how exactly System.IO.FileSystemWatcher works under the hood, and whether it uses threads internally to monitor these directories?
Internally FileSystemWatcher uses the Windows API ReadDirectoryChangesW function (as can be seen from the FileSystemWatcher reference source). The underlying implementation of ReadDirectoryChangesW is not documented, but in answer to your specific question as to whether FileSystemWatcher creates separate threads to monitor files, the answer is therefore "no".
It is however worth highlighting the following from the remarks on FileSystemWatcher in the documentation given that your directories contain many files:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
There are two takeaways from this:
With many FileSystemWatcher instances, there will be many buffers created to store the change events, and these are allocated from non-paged memory which is a limited resource.
If there are a large number of changes taking place in the directories you're monitoring, you may miss events.
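One mitigation for both points, assuming the 100 folders share a common parent, is a single watcher over that parent; this is a hedged sketch, and both the path and the 64 KB buffer size are placeholders rather than recommendations:

    using System.IO;

    // One watcher on a common parent instead of 100+ individual watchers:
    // one non-paged buffer instead of 100, at the cost of filtering in code.
    var watcher = new FileSystemWatcher(@"C:\Data")
    {
        IncludeSubdirectories = true,
        InternalBufferSize = 64 * 1024,   // docs advise staying at or below 64 KB
        NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
    };
    watcher.EnableRaisingEvents = true;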
I'm creating a simple app in which the user can record and play television shows, with seasons and episodes, based on a pre-existing system of folders already present on the user's computer.
I would like to monitor whether there are any changes to the actual files/folders, so I want to store an instance of FileSystemWatcher in every season to listen for any changes that might occur in the corresponding folder. Depending on how many seasons each show has, this could add up to thousands of listeners active at the same time. Are there any performance/instability issues I should be aware of?
One of the answers here suggests that cranking the number of listeners up into the tens of thousands is definitely a problem, but since I'm not expecting that many records I might go with this answer here, which says that the performance hit of events in general is not that huge. Is there some kind of rule of thumb for how many listeners should be active at the same time? Any advice is most appreciated.
OP
so I want to store an instance of FileSystemWatcher in every season to listen for any changes that might occur in the corresponding folder
You could do that, but why have multiple watchers when you can have just one:
You can watch for changes in files and subdirectories of the specified directory - MSDN
e.g.
myMonitor.IncludeSubdirectories = true;
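Fleshed out a little, and assuming the shows live under a single hypothetical root folder, that one watcher might look like this (the event bodies are illustrative):

    using System.IO;

    // A single watcher covering every show/season folder beneath one root.
    // @"C:\TV Shows" is a hypothetical library root.
    var myMonitor = new FileSystemWatcher(@"C:\TV Shows")
    {
        IncludeSubdirectories = true,
        NotifyFilter = NotifyFilters.FileName | NotifyFilters.DirectoryName
    };
    myMonitor.Created += (s, e) => { /* refresh the affected season */ };
    myMonitor.Deleted += (s, e) => { /* refresh the affected season */ };
    myMonitor.Renamed += (s, e) => { /* refresh the affected season */ };
    myMonitor.EnableRaisingEvents = true;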
OP
Are there any performance/instability issues I should be aware of?
Possibly, but if you use just one it reduces the possibility without reducing functionality.
Anyway, the problem isn't so much the number of listeners (well, it is, but the other page goes into that) as that the FSW can't guarantee it will not miss events during high-volume disk activity:
MSDN:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
I really don't want to introduce any delays into my high-frequency trading software, and at the same time I need to store thousands of log lines every second. A 1 ms delay would be huge; I can only accept a 0.01-0.05 ms delay.
Right now I just allocate 500 MB of memory at start-up, store logs there, and write the log to disk when the application finishes.
However, I now realize that I want more logs, and I want them during application execution. So I now want to store logs while the application runs (probably flushing once per minute or once per 10 minutes). How slow is StreamWriter.WriteLine? Would it be slower than just adding to a preallocated collection?
Should I use StreamWriter.WriteLine directly (is it synchronous or asynchronous, and does the AutoFlush option affect performance)? I could also use a BlockingCollection to add items to the log and then have a dedicated thread process that collection and store the logs on disk.
Don't
Reinvent the wheel
Do
Use a logging framework
Properly configure loggers and levels for each logger
Use sync logging for in-memory targets (simple and fast, but event persistence to disk is not guaranteed) and async logging for I/O targets (slower, and harder to get right and to test)
If you haven't done so already, check out log4net and NLog; they are a good place to start.
You could probably store your logs in a circular buffer in shared memory and spawn a separate thread of execution that just writes data from that buffer to disk.
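Here is a minimal sketch of that producer/consumer idea, using the BlockingCollection the question mentions in place of a true circular buffer; the file path, queue bound, and flush policy are all assumptions:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    class AsyncLogSketch : IDisposable
    {
        // Bounded so a runaway producer trades blocking for unbounded memory use.
        readonly BlockingCollection<string> _lines =
            new BlockingCollection<string>(boundedCapacity: 1_000_000);
        readonly Thread _writerThread;

        public AsyncLogSketch(string path)   // e.g. @"C:\logs\trading.log" (placeholder)
        {
            _writerThread = new Thread(() =>
            {
                // AutoFlush=false keeps WriteLine an in-memory copy; data reaches
                // the OS only when the internal buffer fills and on Dispose.
                using (var writer = new StreamWriter(path) { AutoFlush = false })
                {
                    foreach (var line in _lines.GetConsumingEnumerable())
                        writer.WriteLine(line);
                }
            });
            _writerThread.Start();
        }

        // Hot path: an in-memory Add, no disk I/O on the calling thread.
        public void Log(string line) => _lines.Add(line);

        public void Dispose()
        {
            _lines.CompleteAdding();   // let the consumer drain, then exit
            _writerThread.Join();
        }
    }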
Use log4net, as Andre Calil suggests. It logs to SQL, disk, and whatnot, and is extremely customizable. It can seem a bit complicated at first, but it is worth the effort.
What you probably need is the RollingFileAppender. log4net is on NuGet, but you should read the documentation at the log4net site; start by looking at the appender config.
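For orientation, the code side of log4net is tiny. A hedged sketch, assuming a RollingFileAppender has been configured in App.config as the docs describe (the message strings are placeholders):

    using log4net;
    using log4net.Config;

    class TradingApp
    {
        static readonly ILog Log = LogManager.GetLogger(typeof(TradingApp));

        static void Main()
        {
            XmlConfigurator.Configure();      // reads the log4net section of App.config
            Log.Info("order accepted");       // placeholder messages
            Log.Debug("tick processed");      // dropped unless the level allows Debug
        }
    }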
I'm looking for a reliable way of looking for changes in a directory.
I have tried using FileSystemWatcher, but it's rather inaccurate when many small files are created, changed, or deleted: in my tests it misses about 1-2% of the files, which is quite a lot when you are adding or changing thousands of files rapidly.
I have tried polling for changes at different intervals (500 ms, 2000 ms, etc.). In this case I get too many hits, which might have something to do with the resolution of the timestamps on the FileInfo object.
So my question is: is it possible, using the .NET Framework, to get the changes to a directory reliably?
--
Christian
Have you tried increasing the InternalBufferSize?
What size have you set it to?
From MSDN:
Note that a FileSystemWatcher may miss an event when the buffer size is exceeded. To avoid missing events, follow these guidelines: Increase the buffer size by setting the InternalBufferSize property. Avoid watching files with long file names, because a long file name contributes to filling up the buffer. Consider renaming these files using shorter names. Keep your event handling code as short as possible.
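Putting those guidelines into code, a hedged sketch might look like this; the watched path is a placeholder, and 64 KB is the largest buffer the documentation suggests using:

    using System.Collections.Concurrent;
    using System.IO;

    var changes = new ConcurrentQueue<string>();
    var watcher = new FileSystemWatcher(@"C:\Watched")   // hypothetical directory
    {
        InternalBufferSize = 64 * 1024,   // up from the 8 KB default
        NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
    };
    // Short handlers: enqueue and return, nothing else.
    watcher.Created += (s, e) => changes.Enqueue(e.FullPath);
    watcher.Changed += (s, e) => changes.Enqueue(e.FullPath);
    // The Error event signals a buffer overflow, i.e. a rescan is needed.
    watcher.Error += (s, e) => changes.Enqueue("[overflow - rescan directory]");
    watcher.EnableRaisingEvents = true;

Handling the Error event is worth the few lines: it is the only signal you get that events were dropped, and it turns a silent 1-2% miss rate into an explicit cue to rescan.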
I'm writing a mini editor component, much like Notepad++ or UltraEdit, that needs to monitor the files the user opens. It's a bit slimy, but that's the way it needs to be.
Is it wise to use multiple instances of FileSystemWatcher to monitor the open files, again like Notepad++ or UltraEdit, or is there a better way to manage these?
They'll be properly disposed once the document has been closed.
Sorry, one other thing: would it be wiser to create a generic FileSystemWatcher for the drive and monitor that, then only show the user a message to reload the file once I know it's the right file? Or is that a bad idea?
You're not going to run into problems with multiple FileSystemWatchers, and there really isn't any other way to pull this off.
For performance, just be sure to specify filters as narrow as you can get away with.
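As a hedged sketch of that narrow-filter advice, each open document could get a watcher restricted to its own file name and to the one change type the editor cares about (the helper name and the reload prompt are illustrative):

    using System.IO;

    static class EditorWatchers
    {
        // One narrowly filtered watcher per open document.
        public static FileSystemWatcher WatchDocument(string fullPath)
        {
            var watcher = new FileSystemWatcher(
                Path.GetDirectoryName(fullPath),   // watch only the containing folder
                Path.GetFileName(fullPath))        // and only this one file
            {
                NotifyFilter = NotifyFilters.LastWrite   // only the change that matters here
            };
            watcher.Changed += (s, e) => { /* prompt the user to reload */ };
            watcher.EnableRaisingEvents = true;
            return watcher;   // call Dispose() on it when the document closes
        }
    }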
FileSystemWatcher has a drawback: it locks the watched folder. So, for example, if you are watching a file on removable storage, it prevents "safe device removal".
You can try using shell notifications via SHChangeNotifyRegister. In this case you will have one entry point for all changes (or several if you want), but you will need some native shell interop.
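For the interop-curious, a hedged P/Invoke sketch of the registration entry point follows. The declarations match the ShlObj.h signatures as I understand them, the constant values should be double-checked against the SDK headers, and a window with a message loop is required to actually receive the posted notifications:

    using System;
    using System.Runtime.InteropServices;

    static class ShellNotify
    {
        [StructLayout(LayoutKind.Sequential)]
        public struct SHChangeNotifyEntry
        {
            public IntPtr pidl;       // folder to watch (a PIDL, e.g. from SHParseDisplayName)
            public bool fRecursive;   // include subfolders
        }

        public const int SHCNE_ALLEVENTS = 0x7FFFFFFF;   // verify against ShlObj.h
        public const int SHCNRF_ShellLevel = 0x0002;     // verify against ShlObj.h

        [DllImport("shell32.dll", SetLastError = true)]
        public static extern uint SHChangeNotifyRegister(
            IntPtr hwnd,              // window that receives wMsg
            int fSources,             // SHCNRF_* flags
            int fEvents,              // SHCNE_* event mask
            uint wMsg,                // custom WM_* message to post
            int cEntries,             // number of entries in pshcne
            ref SHChangeNotifyEntry pshcne);

        [DllImport("shell32.dll")]
        public static extern bool SHChangeNotifyDeregister(uint ulID);
    }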
It depends on the likely use cases.
If a user is going to open several files in the same directory and likely not modify anything else, a single watcher for that directory may be less onerous than one per file when the number of files is large.
The only way you will find out is by benchmarking. Certainly, doing one watcher per file makes the lifespan of each watcher much simpler to manage, so that should be your first approach. Note that watchers fire their events on a system thread pool, so multiple watchers can fire at the same time (something that may influence your design).
I certainly wouldn't do a watcher per drive; you will incur far more overhead that way, even with aggressive filtering.
Using multiple watchers is fine if you have to. As ShuggyCoUk says, you can optimize by combining file watchers into one if all your files are in the same folder.
It's probably unwise to create a file watcher on a much higher folder (e.g. the root of the drive), because now your code has to handle many more events firing from other changes happening in the file system, and it's fairly easy to overflow the buffer if your code doesn't handle the events fast enough.
Another argument for fewer file watchers: a FileSystemWatcher is a native object, and it pins memory. So depending on the lifespan and size of your app, you might run into memory fragmentation issues. Here is how:
Your code runs for a long time (e.g. hours or days). Whenever you open a file, it creates some chunk of data in memory and instantiates a file watcher. You then clean up the temporary data, but the file watcher is still there. If you repeat that many times (and don't close the files, or forget to dispose of the watchers), you have created many objects in virtual memory that cannot be moved by the CLR, and you can potentially run into memory congestion. Note that this is not a big deal if you have a few watchers around, but if you suspect you might get into the hundreds or more, beware: it's going to become a major issue.
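To keep those native objects from outliving their usefulness, one approach (a sketch, assuming a hypothetical OpenDocument type that owns its watcher) is to tie disposal to the document's lifetime:

    using System;
    using System.IO;

    class OpenDocument : IDisposable
    {
        readonly FileSystemWatcher _watcher;

        public OpenDocument(string fullPath)
        {
            _watcher = new FileSystemWatcher(
                Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath));
            _watcher.Changed += (s, e) => { /* mark document as externally modified */ };
            _watcher.EnableRaisingEvents = true;
        }

        public void Dispose()
        {
            _watcher.EnableRaisingEvents = false;
            _watcher.Dispose();   // releases the native buffer promptly
        }
    }

With a `using`-style lifetime like this, the watcher's pinned buffer goes away as soon as the tab closes, rather than lingering until a garbage collection finally finalizes it.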