I'm creating a simple app in which the user can record and play Television Shows with Seasons and Episodes, based on a pre-existing system of folders already present on the user's computer.
I would like to monitor whether there are any changes to the actual files/folders, so I want to store an instance of FileSystemWatcher in every Season to listen for any changes that might occur in the corresponding folder. Depending on how many Seasons each Show has, this can reach thousands of listeners active at the same time. Are there any performance/instability issues I should be aware of?
One of the answers here suggests that cranking the number of listeners up into the tens of thousands is definitely a problem, but since I'm not expecting that many records I might go with this other answer, which says that the performance hit of events in general is not that huge. Is there some kind of rule of thumb as to how many listeners should be active at the same time? Any piece of advice is most appreciated.
OP
so I want to store an instance of FileSystemWatcher in every season to listen for any changes that might occur in the corresponding folder
You could do that, but why have multiple when you can have just one:
You can watch for changes in files and subdirectories of the specified directory - MSDN
e.g.
myMonitor.IncludeSubdirectories = true;
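For illustration, a fuller sketch of a single watcher covering every Season subfolder; the path and the handler bodies are hypothetical placeholders:

using System;
using System.IO;

// One watcher at the show's root folder covers all Season subfolders.
var myMonitor = new FileSystemWatcher(@"C:\TV Shows\My Show");   // hypothetical path
myMonitor.IncludeSubdirectories = true;   // one watcher instead of one per Season

myMonitor.Created += (s, e) => Console.WriteLine($"Created: {e.FullPath}");
myMonitor.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
myMonitor.Deleted += (s, e) => Console.WriteLine($"Deleted: {e.FullPath}");
myMonitor.Renamed += (s, e) => Console.WriteLine($"Renamed: {e.OldFullPath} -> {e.FullPath}");

myMonitor.EnableRaisingEvents = true;   // start listening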
OP
Are there any performance/instability issues I should be aware of?
Possibly, but if you use just one it reduces the possibility without reducing functionality.
Anyway, the problem isn't so much the number of listeners (well, it is, but the other page goes into that), but rather that the FSW can't guarantee it will not miss events during high-volume disk activity:
MSDN:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
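Continuing the myMonitor sketch above, this is roughly the filtering the quote recommends; the filter values here are illustrative, not a recommendation for every app:

// Only report the notifications you actually care about, to keep the
// buffer from filling with attribute/security churn.
myMonitor.NotifyFilter = NotifyFilters.FileName
                       | NotifyFilters.DirectoryName
                       | NotifyFilters.LastWrite;
myMonitor.Filter = "*.mkv";   // hypothetical: only watch video files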
Related
I would like to understand how System.IO.FileSystemWatcher works under the hood, because I have a requirement to watch all the files under 100 or more folders, where each folder will have around 1K files.
I am not sure whether, if I used FileSystemWatcher, it will create many threads to monitor these files, which would impact the performance of my application. So could you please let me know how exactly System.IO.FileSystemWatcher works under the hood, and whether it uses threads internally to monitor these directories?
Internally FileSystemWatcher uses the Windows API ReadDirectoryChangesW function (as can be seen from the FileSystemWatcher reference source). The underlying implementation of ReadDirectoryChangesW is not documented, but in answer to your specific question as to whether FileSystemWatcher creates separate threads to monitor files, the answer is therefore "no".
It is however worth highlighting the following from the remarks on FileSystemWatcher in the documentation given that your directories contain many files:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
There are two takeaways from this:
With many FileSystemWatcher instances, there will be many buffers created to store the change events, and these are allocated from non-paged memory, which is a limited resource.
If there are a large number of changes taking place in the directories you're monitoring, you may miss events.
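To make the second takeaway concrete, here is a sketch of detecting dropped events via the Error event; it assumes a FileSystemWatcher named watcher, and RescanDirectory is a hypothetical fallback you would supply:

watcher.Error += (s, e) =>
{
    if (e.GetException() is InternalBufferOverflowException)
    {
        // The buffer overflowed and events were lost; reconcile by
        // re-reading the directory contents instead.
        RescanDirectory(watcher.Path);   // hypothetical method
    }
};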
I want to write an application that gets events on every file change on the whole computer (to synchronize between file locations/permissions and my application's database).
I was thinking of using the .NET FileSystemWatcher class, but after some tests I found the following limitations:
1) The FileSystemWatcher has a buffer (http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.90).aspx):
If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
So across the whole computer I can get a large number of events (at peak) that I need to handle. Even if, inside the event handler, I only add the event info to a queue, I can still miss events.
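To illustrate, even with a handler as minimal as this sketch (where watcher is my already-configured FileSystemWatcher), events can still be lost when the buffer overflows:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// The handler does nothing but enqueue; a separate task does the slow work.
var queue = new BlockingCollection<FileSystemEventArgs>();

watcher.Created += (s, e) => queue.Add(e);
watcher.Changed += (s, e) => queue.Add(e);

Task.Run(() =>
{
    foreach (var evt in queue.GetConsumingEnumerable())
    {
        // Database writes and other slow work happen here, off the
        // watcher's callback thread.
        Console.WriteLine($"{evt.ChangeType}: {evt.FullPath}");
    }
});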
2) FileSystemWatcher has memory leaks:
http://connect.microsoft.com/VisualStudio/feedback/details/654232/filesystemwatcher-memory-leak
I checked it myself and it's true; after a few days my process memory grew from 20 MB to 250 MB.
3) Microsoft says that we should use FileSystemWatcher for specific folders (I don't know why):
Use FileSystemWatcher to watch for changes in a specified directory.
So for these reasons I need an alternative solution for my application. I know that I could write a driver, but I would prefer a .NET solution (based on the Win32 API, of course).
Thank you for your help,
Omri
Monitoring (especially with synchronous notifications) will slow down the system. You could make use of our CallbackFilter product, which provides a driver and a handy .NET API for tracking file changes. CallbackFilter also supports asynchronous notifications, which are faster. Discounted and free licenses are possible.
Try doing this through WMI imo - the following link is relevant: http://www.codeproject.com/Articles/42212/WMI-and-File-System-Monitoring
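A rough sketch of the WMI approach from that article (the drive, path, and polling interval here are hypothetical; WQL wants doubled backslashes in path strings, and you'll need a reference to System.Management):

using System;
using System.Management;

var query = new WqlEventQuery(
    "SELECT * FROM __InstanceModificationEvent WITHIN 5 " +
    "WHERE TargetInstance ISA 'CIM_DataFile' " +
    "AND TargetInstance.Drive = 'C:' AND TargetInstance.Path = '\\\\Data\\\\'");

var wmiWatcher = new ManagementEventWatcher(query);
wmiWatcher.EventArrived += (s, e) =>
{
    var file = (ManagementBaseObject)e.NewEvent["TargetInstance"];
    Console.WriteLine($"Modified: {file["Name"]}");
};
wmiWatcher.Start();   // call Stop() and Dispose() when done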
I'm looking for a reliable way of detecting changes in a directory.
I have tried using the FileSystemWatcher, but it's rather inaccurate when many small files are created, changed, or deleted. It misses about 1-2% of the files in my tests. That is quite a lot when you are adding or changing thousands of files rapidly.
I have tried polling for changes at different intervals (500 ms, 2000 ms, etc.), but then I get too many hits. That might have something to do with the resolution of timestamps on the FileInfo object.
So my question is: is it possible, using the .NET Framework, to get the changes to a directory reliably?
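For reference, this is roughly the polling approach I tried (a sketch; the directory is hypothetical). Even comparing timestamp plus size per path, the timestamp resolution gives me extra hits:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Take a snapshot keyed by path; a file counts as changed when its
// timestamp or size differs from the previous snapshot.
static Dictionary<string, (DateTime Write, long Length)> Snapshot(string dir) =>
    new DirectoryInfo(dir)
        .EnumerateFiles("*", SearchOption.AllDirectories)
        .ToDictionary(f => f.FullName, f => (f.LastWriteTimeUtc, f.Length));

var previous = Snapshot(@"C:\WatchedDir");   // hypothetical directory
// ... then, on every timer tick:
var current = Snapshot(@"C:\WatchedDir");
foreach (var kv in current.Where(kv =>
    !previous.TryGetValue(kv.Key, out var old) || old != kv.Value))
{
    Console.WriteLine($"New or changed: {kv.Key}");
}
previous = current;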
--
Christian
Have you tried increasing the InternalBufferSize?
What size have you set it to?
From MSDN:
Note that a FileSystemWatcher may miss an event when the buffer size is exceeded. To avoid missing events, follow these guidelines: Increase the buffer size by setting the InternalBufferSize property. Avoid watching files with long file names, because a long file name contributes to filling up the buffer. Consider renaming these files using shorter names. Keep your event handling code as short as possible.
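For example, a sketch (assuming an existing FileSystemWatcher named watcher; whether 64 KB is appropriate depends on your change volume, and the docs advise keeping the buffer as small as you can get away with):

// The default buffer is 8 KB; larger buffers come from non-paged
// memory, so don't raise this beyond what you actually need.
watcher.InternalBufferSize = 64 * 1024;   // 64 KB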
I'm interested in getting solution ideas for a problem we have.
Background:
We have software tools that run on laptops and flash data onto hardware components. This software reads in a series of data files in order to do the programming on the hardware. It's in a manufacturing environment and is running continuously throughout the day.
Problem:
Currently, there's a central repository that the software connects to in order to read the data files. The software reads the files and retains a lock on them throughout the entire flashing process. This is running all throughout the day on different hardware components, so it's feasible that these files could be "locked" for most of the day.
There are new requirements stating that the data files the software reads need to be updated in real time, with minimal impact to the end user who is doing the flashing. We will be writing the service that drops the files out there in real time.
The software is developed by a third party vendor and is not modifiable by us. However, it expects a location to look for the data files, so everything up until the point of flashing is our process that we're free to change.
Question:
What approach would you take to solve this from a solution programming standpoint? We're not sure how to drop files out there in real time given the locks that will be present on them throughout the day. We'll settle for an "as soon as possible" solution if that is significantly easier.
The only way out of this conundrum seems to be the introduction of an extra file repository, along with a service-like piece of logic in charge of keeping these repositories synchronized.
In other words, the file upload takes place in one of the repositories (call it the "input repository"), and the flashing process uses the other repository (call it the "output repository"). The synchronization logic continually polls the input repository for new files (based on file timestamp or otherwise), and when it finds such new files, it attempts to copy them to the output directory; each copy either takes place instantly, when the flashing logic hasn't locked the corresponding file in the output directory, or is deferred until the file gets unlocked.
Note: During the file copy, the synchronization logic can/should lock the file, hence very temporarily preventing the file from being overwritten by new uploads, but ensuring full integrity of the copied file. The difference with the existing system is that the lock is held for a much shorter amount of time.
The drawback of this system is the full duplication of the repository, and this could be a problem if the repository is very big. However, there don't appear to be many alternatives, since we do not have control over the flashing process.
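A sketch of the copy-when-unlocked step (the method name and the retry interval are hypothetical):

using System;
using System.IO;
using System.Threading;

static void TryPublish(string inputFile, string outputFile)
{
    while (true)
    {
        try
        {
            // Fails with IOException while the flashing tool holds its lock.
            File.Copy(inputFile, outputFile, overwrite: true);
            return;
        }
        catch (IOException)
        {
            Thread.Sleep(TimeSpan.FromSeconds(5));   // defer and retry
        }
    }
}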
"As soon as possible" is your only option. You can't update a file that's locked, that's the whole point of a lock.
Edit:
Would it be possible to put the new file in a different location and then tell the 3rd party service to look in that location the next time it needs the file?
I'm writing a mini editor component, much like Notepad++ or UltraEdit, that needs to monitor the files the user opens. It's a bit slimy, but that's the way it needs to be.
Is it wise to use multiple instances of FileSystemWatcher to monitor the open files (again, like Notepad++ or UltraEdit), or is there a better way to manage these?
They'll be properly disposed once the document has been closed.
Sorry, one other thing: would it be wiser to create a generic FileSystemWatcher for the whole drive and monitor that, then only show the user a message to reload the file once I know it's the right file? Or is that a bad idea?
You're not going to run into problems with multiple FileSystemWatchers, and there really isn't any other way to pull this off.
For performance, just be sure to specify filters as narrow as you can get away with.
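For instance, a per-open-file watcher can be narrowed down to exactly one file name (the path here is hypothetical):

using System.IO;

var file = new FileInfo(@"C:\docs\notes.txt");   // a hypothetical open document
var docWatcher = new FileSystemWatcher(file.DirectoryName, file.Name)
{
    NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
};
docWatcher.Changed += (s, e) => { /* prompt the user to reload */ };
docWatcher.EnableRaisingEvents = true;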
FileSystemWatcher has a drawback: it locks the watched folder, so, for example, if you are watching a file on removable storage, it prevents "safe device removal".
You can try using Shell Notifications via SHChangeNotifyRegister. In this case you will have one entry point for all changes (or several if you want), but you will need some native shell interop.
It depends on the likely use cases.
If a user is going to open several files in the same directory and likely not modify anything else, a single watcher for that directory may be less onerous than one per file if the number of files is large.
The only way you will find out is by benchmarking. Certainly, doing one per file makes the lifespan of the watcher much simpler, so that should be your first approach. Note that the watchers fire their events on a system thread pool, so multiple watchers can fire at the same time (something that may influence your design).
I certainly wouldn't do a watcher per drive; you will create far more work for yourself that way, even with aggressive filtering.
Using multiple watchers is fine if you have to. As the comment from ShuggyCoUk says, you can optimize by combining file watchers into one if all your files are in the same folder.
It's probably unwise to create a file watcher on a much higher folder (e.g. the root of the drive), because your code then has to handle many more events firing from other changes happening in the file system, and it's fairly easy to run into a buffer overflow if your code is not fast enough to handle the events.
Another argument for fewer file watchers: a FileSystemWatcher is a native object, and it pins memory. So depending on the lifespan and size of your app, you might run into memory fragmentation issues. Here is how:
Your code runs for a long time (e.g. hours or days). Whenever you open a file, it creates some chunk of data in memory and instantiates a file watcher. You then clean up this temporary data, but the file watcher is still there. If you repeat that multiple times (and don't close the files, or forget to dispose the watchers), you have created multiple objects in virtual memory that cannot be moved by the CLR, and you can potentially run into memory congestion. Note that this is not a big deal if you have a few watchers around, but if you suspect you might get into the hundreds or more, beware: that's going to become a major issue.
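A sketch of avoiding that: tie the watcher's lifetime to the document's, so that disposing the document releases the pinned buffer promptly (Document here is a hypothetical editor class):

using System;
using System.IO;

public sealed class Document : IDisposable
{
    private readonly FileSystemWatcher _watcher;

    public Document(string path)
    {
        var file = new FileInfo(path);
        _watcher = new FileSystemWatcher(file.DirectoryName, file.Name);
        _watcher.EnableRaisingEvents = true;
    }

    public void Dispose()
    {
        _watcher.EnableRaisingEvents = false;
        _watcher.Dispose();   // frees the pinned native buffer promptly
    }
}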