Are multiple FileSystemWatchers a good idea? (C#)

I'm writing a mini editor component, much like Notepad++ or UltraEdit, that needs to monitor the files the users open. It's a bit slimy, but that's the way it needs to be.
Is it wise to use multiple instances of FileSystemWatcher to monitor the open files (again, like Notepad++ or UltraEdit), or is there a better way to manage these?
They'll be properly disposed once the document has been closed.
Sorry, one other thing: would it be wiser to create a generic FileSystemWatcher for the drive and monitor that, then only show them a message to reload the file once I know it's the right file? Or is that a bad idea?

You're not going to run into problems with multiple FileSystemWatchers, and there really isn't any other way to pull this off.
For performance, just be sure to specify filters as narrow as you can get away with.
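A minimal sketch of what "as narrow as you can get away with" might look like for an editor, written as C# top-level statements with a local function. CreateDocumentWatcher is a hypothetical helper, not part of the framework; it scopes one watcher to a single file name and to the change kinds an editor is likely to care about.

```csharp
using System;
using System.IO;

// Hypothetical helper: a watcher scoped to exactly one open document.
// onChanged receives the document's full path whenever it may have changed on disk.
FileSystemWatcher CreateDocumentWatcher(string filePath, Action<string> onChanged)
{
    string fullPath = Path.GetFullPath(filePath);
    var watcher = new FileSystemWatcher(Path.GetDirectoryName(fullPath)!)
    {
        Filter = Path.GetFileName(fullPath),     // only this file name
        NotifyFilter = NotifyFilters.LastWrite   // content written
                     | NotifyFilters.Size        // length changed
                     | NotifyFilters.FileName,   // created, deleted, renamed
        IncludeSubdirectories = false,
    };
    // Some programs save by writing a temporary file and renaming it over the
    // original, so every event kind is treated as "the document may have changed".
    watcher.Changed += (_, _) => onChanged(fullPath);
    watcher.Created += (_, _) => onChanged(fullPath);
    watcher.Deleted += (_, _) => onChanged(fullPath);
    watcher.Renamed += (_, _) => onChanged(fullPath);
    watcher.EnableRaisingEvents = true;
    return watcher;
}
```

The editor would keep the returned watcher together with the document and call Dispose() on it when the document is closed.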

FileSystemWatcher has a drawback: it locks the watched folder, so, for example, if you are watching a file on removable storage, it prevents "safe device removal".
You can try using Shell Notifications via SHChangeNotifyRegister instead. This gives you one entry point for all changes (or several if you want), but you will need some native shell interop.

It depends on the likely use cases.
If a user is going to open several files in the same directory and likely not modify anything else, a single watcher for that directory may be less onerous than one per file if the number of files is large.
The only way you will find out is by benchmarking. Certainly doing one per file makes the lifespan of the watcher much simpler, so that should be your first approach. Note that the watchers fire their events on thread pool threads, so multiple watchers can fire at the same time (something that may influence your design).
I certainly wouldn't do a watcher per drive; you will create far more work that way, even with aggressive filtering.
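If many open files share a folder, the per-directory option can be sketched as a small reference-counted registry: the first document opened in a folder creates the watcher, and closing the last one disposes it, which keeps the lifetime almost as simple as one watcher per file. OpenDocument, CloseDocument, onDocumentChanged and the dictionary layout are hypothetical, and the case-insensitive comparer assumes Windows paths.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical callback: an open document's file changed on disk.
Action<string> onDocumentChanged = path => Console.WriteLine($"Changed on disk: {path}");

// One entry per watched directory: the shared watcher plus the open files in it.
// OrdinalIgnoreCase assumes Windows-style, case-insensitive paths.
var watchersByDir = new Dictionary<string, (FileSystemWatcher Watcher, HashSet<string> Files)>(
    StringComparer.OrdinalIgnoreCase);

// OpenDocument/CloseDocument are assumed to be called from one (UI) thread;
// only the per-directory file sets are read from watcher threads.
void OpenDocument(string path)
{
    string full = Path.GetFullPath(path);
    string folder = Path.GetDirectoryName(full)!;
    if (!watchersByDir.TryGetValue(folder, out var entry))
    {
        var files = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var watcher = new FileSystemWatcher(folder)
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size | NotifyFilters.FileName,
        };
        // Events arrive on thread pool threads, so read the set under a lock.
        watcher.Changed += (_, e) =>
        {
            bool isOpen;
            lock (files) isOpen = files.Contains(e.FullPath);
            if (isOpen) onDocumentChanged(e.FullPath);
        };
        watcher.EnableRaisingEvents = true;
        entry = (watcher, files);
        watchersByDir[folder] = entry;
    }
    lock (entry.Files) entry.Files.Add(full);
}

void CloseDocument(string path)
{
    string full = Path.GetFullPath(path);
    string folder = Path.GetDirectoryName(full)!;
    if (!watchersByDir.TryGetValue(folder, out var entry)) return;
    int remaining;
    lock (entry.Files)
    {
        entry.Files.Remove(full);
        remaining = entry.Files.Count;
    }
    if (remaining == 0)
    {
        // Last open document in this folder: release the native handle and buffer.
        entry.Watcher.Dispose();
        watchersByDir.Remove(folder);
    }
}
```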

Using multiple watchers is fine if you have to. As ShuggyCoUk says, you can optimize by combining file watchers into one if all your files are in the same folder.
It's probably unwise to create a file watcher on a much higher folder (e.g. the root of the drive), because then your code has to handle many more events caused by other changes happening in the file system, and it's fairly easy to overflow the watcher's buffer if your code is not fast enough to handle the events.
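One way to keep the handlers fast enough, sketched under the assumption that the real work can happen later on another thread: the handlers only enqueue the path into a ConcurrentQueue (they may run concurrently on thread pool threads), and the Error event, which is where an InternalBufferOverflowException is reported, just sets a flag telling the consumer to rescan. pending, rescanNeeded, Attach and Drain are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Paths reported by the watcher, waiting to be processed off the event thread.
var pending = new ConcurrentQueue<string>();
// Set to 1 when the watcher reports that its buffer overflowed.
int rescanNeeded = 0;

// Handlers may run concurrently on thread pool threads, so they only touch
// thread-safe state and return immediately.
void OnChanged(object sender, FileSystemEventArgs e) => pending.Enqueue(e.FullPath);

void OnError(object sender, ErrorEventArgs e)
{
    // Lost notifications: the consumer must re-check the files directly.
    if (e.GetException() is InternalBufferOverflowException)
        Interlocked.Exchange(ref rescanNeeded, 1);
}

void Attach(FileSystemWatcher watcher)
{
    watcher.Changed += OnChanged;
    watcher.Created += OnChanged;
    watcher.Deleted += OnChanged;
    watcher.Error += OnError;
}

// Called periodically by the consumer: processes queued paths and returns
// true if a full rescan is needed because events were dropped.
bool Drain(Action<string> process)
{
    while (pending.TryDequeue(out var path)) process(path);
    return Interlocked.Exchange(ref rescanNeeded, 0) == 1;
}
```

A UI timer could call Drain periodically; when it returns true, some notifications were lost and the open files should be re-checked directly (for example by comparing their last write times).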
Another argument for fewer file watchers: a FileSystemWatcher wraps a native object and pins memory. So, depending on the lifespan and size of your app, you might run into memory fragmentation issues. Here is how:
Your code runs for a long time (e.g. hours or days). Whenever you open a file, it creates some chunk of data in memory and instantiates a file watcher. You then clean up this temporary data, but the file watcher is still there. If you repeat that multiple times (and don't close the files, or forget to dispose the watchers), you have created multiple objects in virtual memory that cannot be moved by the CLR, and you can potentially run into memory congestion. Note that this is not a big deal if you have a few watchers around, but if you suspect you might get into the hundreds or more, beware: that's going to become a major issue.

Related

What algorithm does FileSystemWatcher in C# use to monitor file system changes? [duplicate]

I would like to understand how System.IO.FileSystemWatcher works under the hood, because I have a requirement where I need to watch all the files in 100 or more folders, where each folder will have around 1K files.
I am not sure whether FileSystemWatcher will create many threads to monitor these files, which would impact the performance of my application. So could you please let me know how exactly System.IO.FileSystemWatcher works under the hood, and whether it uses threads internally to monitor these directories?
Internally FileSystemWatcher uses the Windows API ReadDirectoryChangesW function (as can be seen from the FileSystemWatcher reference source). The underlying implementation of ReadDirectoryChangesW is not documented, but to answer your specific question of whether FileSystemWatcher creates separate threads to monitor files: no. Each watcher issues asynchronous ReadDirectoryChangesW requests against its directory, and its events are raised on thread pool threads when those requests complete.
It is however worth highlighting the following from the remarks on FileSystemWatcher in the documentation given that your directories contain many files:
The Windows operating system notifies your component of file changes in a buffer created by the FileSystemWatcher. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
There are two takeaways from this:
With many FileSystemWatcher instances, there will be many buffers created to store the change events, and these are allocated from non-paged memory, which is a limited resource.
If there are a large number of changes taking place in the directories you're monitoring, you may miss events.
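If the 100 or so folders share a common parent, one way to act on both points is a single watcher on that parent with IncludeSubdirectories enabled, so there is one change buffer rather than one per folder, combined with a narrow NotifyFilter. This is a sketch under that assumption; CreateTreeWatcher is a hypothetical helper. The documentation allows InternalBufferSize values from 4 KB up to 64 KB (the default is 8 KB).

```csharp
using System;
using System.IO;

// Hypothetical helper: one watcher for a whole tree of folders under `root`,
// instead of one watcher (and one non-paged buffer) per folder.
FileSystemWatcher CreateTreeWatcher(string root)
{
    return new FileSystemWatcher(root)
    {
        IncludeSubdirectories = true,
        // Report only what is needed; other change kinds would otherwise
        // fill the buffer with unwanted notifications.
        NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite,
        // Default is 8 KB; 64 KB is the documented upper limit.
        InternalBufferSize = 64 * 1024,
    };
}
```

The trade-off is that every change in the tree now goes through one buffer, so the handlers still need to be quick, and an Error event should still trigger a rescan.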

Filewatcher for the whole computer (alternative?)

I want to write an application that gets events on every file change on the whole computer (to synchronize between file locations/permissions and my application's database).
I was thinking of using the .NET FileSystemWatcher class, but after some tests I found the following limitations:
1) The filewatcher has a buffer (http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.90).aspx):
If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer with the InternalBufferSize property is expensive, as it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small yet large enough to not miss any file change events. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties so you can filter out unwanted change notifications.
So across the whole computer I can get a large number of events at peak times that I need to handle. Even if my event handler only adds the event info to a queue, I can still miss events.
2) Filewatcher has memory leaks:
http://connect.microsoft.com/VisualStudio/feedback/details/654232/filesystemwatcher-memory-leak
I checked it myself and it's true: after a few days my process memory grew from 20 MB to 250 MB.
3) Microsoft says that we should use filewatcher for specific folders (I don't know why):
Use FileSystemWatcher to watch for changes in a specified directory.
So for these reasons I need an alternative solution to create my application. I know that I can write a driver, but I'd prefer a .NET solution (based on the Win32 API, of course).
Thank you for your help,
Omri
Adding monitoring (especially with synchronous notifications) will slow down the system. You can probably make use of our CallbackFilter product, which provides a driver and a handy .NET API for tracking file changes. CallbackFilter also supports asynchronous notifications, which are faster. Discounted and free licenses are possible.
I'd try doing this through WMI; the following link is relevant: http://www.codeproject.com/Articles/42212/WMI-and-File-System-Monitoring

Most efficient way to search for files

I am writing a program that searches and copies mp3-files to a specified directory.
Currently I am using a List that is filled with all the mp3s in a directory (which, not surprisingly, takes a very long time). Then I use taglib-sharp to compare the ID3 tags with the artist and title entered. If they match, I copy the file.
Since this is my first program and I am very new to programming I figure there must be a better/more efficient way to do this. Does anybody have a suggestion on what I could try?
Edit: I forgot to add an important detail: I want to be able to specify what directories should be searched every time I start a search (the directory to be searched will be specified in the program itself). So storing all the files in a database or something similar isn't really an option (unless there is a way to do this every time which is still efficient). I am basically looking for the best way to search through all the files in a directory where the files are indexed every time. (I am aware that this is probably not a good idea but I'd like to do it that way. If there is no real way to do this I'll have to reconsider but for now I'd like to do it like that.)
You are mostly saddled with the bottleneck that is IO, a consequence of the hardware you are working with. Copying the files will be the dominant cost here; finding them is dwarfed by the copying.
There are other ways to go about file management, each exposing better interfaces for different purposes, such as NTFS Change Journals and low-level sector handling (not recommended), but if this is your first program in C#, maybe you don't want to venture into P/Invoking native calls.
Beyond changing the underlying process, you might consider mechanisms to minimise disk access, i.e. not redoing anything you have already done or don't need to do.
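Applied to the MP3 search, minimising disk access could look like the sketch below: Directory.EnumerateFiles streams paths as the directory walk proceeds instead of filling a List first, and a file already present in the destination is skipped before its tags are read. isMatch stands in for the taglib-sharp artist/title comparison, which is not shown; CopyMatches is a hypothetical name, and treating an existing file name in the destination as "already copied" is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper: copies matching .mp3 files from sourceDir (recursively)
// into destDir and returns how many were copied on this run.
// isMatch is a placeholder for the ID3 tag comparison (e.g. via taglib-sharp).
int CopyMatches(string sourceDir, string destDir, Func<string, bool> isMatch)
{
    Directory.CreateDirectory(destDir);
    int copied = 0;
    // EnumerateFiles yields paths lazily instead of building the whole list first.
    foreach (string file in Directory.EnumerateFiles(sourceDir, "*.mp3", SearchOption.AllDirectories))
    {
        string target = Path.Combine(destDir, Path.GetFileName(file));
        if (File.Exists(target)) continue; // assumed already copied on an earlier run
        if (!isMatch(file)) continue;      // tags are read only when still needed
        File.Copy(file, target);
        copied++;
    }
    return copied;
}
```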
Use a database (a simple binary-serialized file or an embedded database like RavenDB) to cache all files, and query that cache instead.
Also store modified time for each folder in the database. Compare the time in the database with the time on the folder each time you start your application (and sync changed folders).
That ought to give you much better performance. Threading will not really help searching folders since it's the disk IO that takes time, not your application.
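The folder-timestamp check can be sketched as follows, with a Dictionary standing in for the database. FindChangedFolders is a hypothetical name. One caveat worth knowing: a folder's last write time changes when entries in it are created, deleted or renamed, but not when an existing file's contents (such as its ID3 tags) are edited in place, so this detects new and removed files rather than edited ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: returns the folders under `root` (including root) whose
// last write time differs from the cached value, and updates the cache.
// `cache` stands in for the database table of folder path -> timestamp (UTC).
List<string> FindChangedFolders(string root, Dictionary<string, DateTime> cache)
{
    var changed = new List<string>();
    var folders = new List<string> { root };
    folders.AddRange(Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories));
    foreach (string folder in folders)
    {
        DateTime stamp = Directory.GetLastWriteTimeUtc(folder);
        if (!cache.TryGetValue(folder, out DateTime seen) || seen != stamp)
        {
            changed.Add(folder);   // only these folders need their files re-indexed
            cache[folder] = stamp;
        }
    }
    return changed;
}
```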

C#/WPF FileSystemWatcher on every extension on every path

I need a FileSystemWatcher that can observe some specific paths and specific extensions.
But there could be dozens, hundreds or maybe a thousand paths (hope not :P), and the same goes for extensions. The paths and extensions are added by the user.
Creating hundreds of FileSystemWatchers is not a good idea, is it?
So - how to do it?
Is it possible to watch every device (HDDs, SD flash, pendrives, etc.)?
Will it be efficient? I don't think so... Every changing Windows log file, every file scanned by an antivirus program: it could really slow down my program with FileSystemWatcher :(
Well, try it first and then you'll see whether you run into trouble.
Trying to optimize something where you don't even know if there is a problem is usually not very effective.
You're probably right that creating 10,000+ FileSystemWatchers may cause a problem. If it does (as Foxfire says - test it), start with the easy consolidations -- ignore the extensions when setting up your FileSystemWatchers, and filter the events after you get them.
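The first consolidation, dropping the extension filter and checking extensions after the event arrives, might look like this sketch. watchedExtensions, IsInteresting and CreateUnfilteredWatcher are hypothetical names; the extension set is assumed to hold user-entered values like ".txt" with the leading dot.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// User-configured extensions (with leading dot), compared case-insensitively.
// Handlers read this set from thread pool threads, so replace the whole set
// rather than mutating it while watchers are running.
var watchedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".log" };

// Filtering happens in code, after the event has been received.
bool IsInteresting(string fullPath) =>
    watchedExtensions.Contains(Path.GetExtension(fullPath));

// Hypothetical helper: no Filter is set, so the watcher reports every file in the folder.
FileSystemWatcher CreateUnfilteredWatcher(string folder, Action<string> onMatch)
{
    var watcher = new FileSystemWatcher(folder);
    FileSystemEventHandler handler = (_, e) =>
    {
        if (IsInteresting(e.FullPath)) onMatch(e.FullPath);
    };
    watcher.Created += handler;
    watcher.Changed += handler;
    watcher.EnableRaisingEvents = true;
    return watcher;
}
```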
If that still results in too much resource usage, try intelligently combining paths in the same manner, perhaps even going so far as to only create one FileSystemWatcher per drive letter, and perform the rest of your filtering after the event is received by your code.
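Combining paths can start with the cheap case: if a requested folder lies inside another requested folder, the ancestor's watcher with IncludeSubdirectories = true already covers it. The sketch below computes that reduced set of roots; GetWatchRoots is a hypothetical name, and the case-insensitive comparison assumes Windows paths. Mapping each folder through Path.GetPathRoot before calling it would give the one-watcher-per-drive extreme.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: drops every folder that lies inside another requested
// folder, leaving only the roots that need their own watcher.
List<string> GetWatchRoots(IEnumerable<string> folders)
{
    var roots = new List<string>();
    // Normalise, de-duplicate and sort so an ancestor always precedes its descendants.
    foreach (string folder in folders
        .Select(f => Path.TrimEndingDirectorySeparator(Path.GetFullPath(f)))
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .OrderBy(f => f, StringComparer.OrdinalIgnoreCase))
    {
        bool covered = roots.Any(r =>
        {
            // Drive roots such as "C:\" already end with a separator.
            string prefix = r.EndsWith(Path.DirectorySeparatorChar) ? r : r + Path.DirectorySeparatorChar;
            return folder.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
        });
        if (!covered) roots.Add(folder);
    }
    return roots;
}
```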

Looking for solution ideas on how to update files in real time that may be locked by other software

I'm interested in getting solution ideas for a problem we have.
Background:
We have software tools that run on laptops and flash data onto hardware components. This software reads in a series of data files in order to do the programming on the hardware. It's in a manufacturing environment and is running continuously throughout the day.
Problem:
Currently, there's a central repository that the software connects to in order to read the data files. The software reads the files and retains a lock on them throughout the entire flashing process. This runs all day on different hardware components, so it's feasible that these files could be "locked" for most of the day.
There are new requirements stating that these data files the software is reading need to be updated in real time, with minimal impact to the end user who is doing the flashing. We will be writing the service that drops the files out there in real time.
The software is developed by a third party vendor and is not modifiable by us. However, it expects a location to look for the data files, so everything up until the point of flashing is our process that we're free to change.
Question:
What approach would you take to solve this from a solution programming standpoint? We're not sure how to drop files out there in real time given the locks that will be present on them throughout the day. We'll settle for an "as soon as possible" solution if that is significantly easier.
The only way out of this conundrum seems to be the introduction of an extra file repository, along with a service-like piece of logic in charge of keeping these repositories synchronized.
In other words, the file upload takes place in one of the repositories (call it the "input repository"), and the flashing process uses the other repository (call it the "output repository"). The synchronization logic continuously polls the input repository for new files (based on file time stamp or other...) and, when it finds such new files, attempts to copy them to the output repository; the copy either takes place instantly, when the flashing logic hasn't locked the corresponding file in the output repository, or it is deferred until the file gets unlocked.
Note: During the file copy, the synchronization logic can/should lock the file, very briefly preventing it from being overwritten by new uploads but ensuring full integrity of the copied file. The difference from the existing system is that the lock is held for a much shorter amount of time.
The drawback of this system is the full duplication of the repository, and this could be a problem if the repository is very big. However there doesn't appear to be many alternatives since we do not have control over the flashing process.
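One polling pass of the synchronization logic described above might look like this sketch, assuming both repositories are ordinary folders and that a newer last-write time on the input side means a new upload. SyncOnce is a hypothetical name. The source file is opened with FileShare.None only for the duration of the copy, matching the short lock described in the note; if either side is locked, the resulting IOException defers that file to the next pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: one pass over inputDir. Files newer than their copy in
// outputDir are copied; files that cannot be opened (e.g. locked by the flashing
// tool or by an upload in progress) are returned so they can be retried later.
List<string> SyncOnce(string inputDir, string outputDir)
{
    var deferred = new List<string>();
    foreach (string source in Directory.EnumerateFiles(inputDir))
    {
        string target = Path.Combine(outputDir, Path.GetFileName(source));
        if (File.Exists(target) &&
            File.GetLastWriteTimeUtc(target) >= File.GetLastWriteTimeUtc(source))
            continue; // output copy is already up to date
        try
        {
            // Lock the source only while copying so an upload cannot change it mid-copy;
            // lock the target so the flashing tool cannot read a half-written file.
            using var input = new FileStream(source, FileMode.Open, FileAccess.Read, FileShare.None);
            using var output = new FileStream(target, FileMode.Create, FileAccess.Write, FileShare.None);
            input.CopyTo(output);
        }
        catch (IOException)
        {
            deferred.Add(source); // retry on the next pass
        }
    }
    return deferred;
}
```

The service would call SyncOnce on a timer. Because IOException also covers other failures (for example a full disk), a production version would likely log the deferred files rather than retry silently forever.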
"As soon as possible" is your only option. You can't update a file that's locked, that's the whole point of a lock.
Edit:
Would it be possible to put the new file in a different location and then tell the 3rd party service to look in that location the next time it needs the file?
