My case is the following:
Files are created inside a directory by some process (about 10 files per minute, with a maximum file size of 5 MB each). Let's call this folder MYROOT.
Those files need to be moved and categorized into subdirectories according to some specific logic based on the filename and external settings.
There will be 5 subdirectories under MYROOT; let's name them C1, C2, C3, C4, C5. Each of them contains further subdirectories, such as A1, A2, A3, ... An.
So, we will have the following categorization: C:\MYROOT\C1\A1, C:\MYROOT\C1\A2, C:\MYROOT\C2\B1 and so on...
All files are written to C:\MYROOT and have the same file type and the same naming convention. No renames, deletes, or changes are made in this folder.
The FileSystemWatcher is set as follows:
this._watcher = new FileSystemWatcher();
this._watcher.NotifyFilter = NotifyFilters.CreationTime;
this._watcher.Filter = string.Empty;
this._watcher.Path = @"C:\MYROOT";
this._watcher.IncludeSubdirectories = false;
this._watcher.Created += this._watcher_Created;
The event handler:
private void _watcher_Created(object sender, FileSystemEventArgs e)
{
    string filePath = string.Empty;
    if (e.ChangeType != WatcherChangeTypes.Created)
        return;
    lock (FileOrganizer.Manager.s_LockObject)
    {
        filePath = e.FullPath;
        //Further processing here like creating a custom object that
        //contains the file name and some other info, add that object inside a queue
        //that is processed async with logic applied to wait the file to be written,
        //validate file contents, etc.. and finally move it where it should be moved.
        //The code here is very fast, exception free and in a fire-and-forget manner.
    }
}
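The hand-off described in the handler comments (enqueue quickly, process asynchronously) could be sketched roughly as follows. This is only an illustration, not the original FileOrganizer code: the class name CreatedFileQueue and the processFile callback are hypothetical, and error handling is left out (an exception thrown by processFile would stop the worker).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: the watcher callback only enqueues the path;
// one background task dequeues paths and does the slow work.
public sealed class CreatedFileQueue : IDisposable
{
    private readonly BlockingCollection<string> _pending = new BlockingCollection<string>();
    private readonly Task _worker;

    public CreatedFileQueue(Action<string> processFile)
    {
        // GetConsumingEnumerable blocks until an item arrives and ends
        // once CompleteAdding has been called and the queue is drained.
        _worker = Task.Run(() =>
        {
            foreach (string path in _pending.GetConsumingEnumerable())
            {
                processFile(path);
            }
        });
    }

    // Signature matches FileSystemEventHandler, so it can be attached
    // to FileSystemWatcher.Created. Add is thread-safe; no lock needed.
    public void OnCreated(object sender, FileSystemEventArgs e)
    {
        _pending.Add(e.FullPath);
    }

    public void Dispose()
    {
        _pending.CompleteAdding(); // let the worker finish what is queued
        _worker.Wait();
        _pending.Dispose();
    }
}
```

It would be attached with something like `this._watcher.Created += queue.OnCreated;`, so the handler itself does nothing but an enqueue.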
When the MYROOT directory is empty and files are created by another process, they are moved into the folders as I described. The folders are created only once, if they do not exist.
The number of files inside the subfolders keeps increasing; we are talking about ~200 GB and counting.
Now hear this:
When no files are being created (the "creator" process is not running), nothing should trigger the watcher, and I can confirm that from the logs I enabled for debugging before posting this question. But the process containing the watcher sits at a constant 13 to 14% CPU on an octa-core processor, and that figure increases as the size of the subdirectories increases. During processing, even if I create (copy-paste) 2000 files at once, it goes only 1% higher than that.
The funny part is that when I change the monitored path to an empty one, or to the same one with less volume inside it, the CPU utilization of my process is 0%, and only when files are created does it go up to a maximum of 2% - 5%.
Again, as said, the observation is clear: when the subdirectories of the monitored path contain data, that data affects the watcher internals even if you set it not to monitor the subdirectories. And if that data volume is big, the file system watcher needs a lot of CPU. This holds even when no changes are taking place to trigger any watcher events.
Is this the normal, by-design behavior of the FileSystemWatcher?
PS.
When I monitor the MYROOT folder and move the files outside that folder, everything seems OK, but that is not an acceptable solution.
Thanx.
Marios
Related
I want to track file changes in a particular path, and I am pretty much done with the code, which is now working fine. It tracks file creation, renames and changes.
My problem is that when I launch the FileSystemWatcher it works fine, but after some time it stops working, i.e. it stops firing the Created, Deleted and Changed events.
Can anybody help me out?
Thank you in advance.
Here is my code.
lstFolder is my list of multiple paths.
this.listFileSystemWatcher = new List<FileSystemWatcher>();
// Loop the list to process each of the folder specifications found
if (lstFolder.Count > 0)// check if path is available to watch else exit file watcher
{
foreach (CustomFolderSettings customFolder in lstFolder)
{
DirectoryInfo dir = new DirectoryInfo(customFolder.FWPath);
// Checks whether the folder is enabled and
// also the directory is a valid location
if (dir.Exists)//customFolder.FolderEnabled &&
{
customFolder.AllowedFiles = customFolder.FWExtension;// use the configured extensions as the allowed file extensions to log
foreach (var strExt in customFolder.FWExtension.Split(','))
{
// Creates a new instance of FileSystemWatcher
//FileSystemWatcher fileSWatch = new FileSystemWatcher();
this.fileSWatch = new FileSystemWatcher();
// Sets the filter
fileSWatch.Filter = strExt;// customFolder.FolderFilter;
// Sets the folder location
fileSWatch.Path = customFolder.FWPath;
fileSWatch.InternalBufferSize = 64000;
// Sets the action to be executed
StringBuilder actionToExecute = new StringBuilder(customFolder.ExecutableFile);
// List of arguments
StringBuilder actionArguments = new StringBuilder(customFolder.ExecutableArguments);
// Subscribe to notify filters
fileSWatch.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
// Associate the events that will be triggered when a new file Created,Changed,Deleted,Renamed //
// is added to the monitored folder, using a lambda expression
fileSWatch.Created += (senderObj, fileSysArgs) => fileSWatch_Created(senderObj, fileSysArgs, actionToExecute.ToString(), customFolder.AllowedFiles);
fileSWatch.Changed += (senderObj, fileSysArgs) => fileSWatch_Changed(senderObj, fileSysArgs, actionToExecute.ToString(), customFolder.AllowedFiles);
fileSWatch.Deleted += (senderObj, fileSysArgs) => fileSWatch_Deleted(senderObj, fileSysArgs, actionToExecute.ToString(), customFolder.AllowedFiles);
fileSWatch.Renamed += (senderObj, fileSysArgs) => fileSWatch_Renamed(senderObj, fileSysArgs, actionToExecute.ToString(), customFolder.AllowedFiles);
fileSWatch.Error += (senderObj, fileSysArgs) => fileSWatch_Error(senderObj, fileSysArgs, actionToExecute.ToString(), customFolder.AllowedFiles);
// will track changes in sub-folders as well
fileSWatch.IncludeSubdirectories = customFolder.FWSubFolders;
// Begin watching
fileSWatch.EnableRaisingEvents = true;
// Add the systemWatcher to the list
listFileSystemWatcher.Add(fileSWatch);
GC.KeepAlive(fileSWatch);
GC.KeepAlive(listFileSystemWatcher);
}
}
}
}
else
{
Application.Exit();
}
Don't use
GC.KeepAlive(fileSWatch);
GC.KeepAlive(listFileSystemWatcher);
Create a List<FileSystemWatcher> and store each watcher in it instead.
Also have a look at:
Events and Buffer Sizes
Note that several factors can affect which file system change events
are raised, as described by the following:
Common file system operations might raise more than one event. For example, when a file is moved from one directory to another, several
OnChanged and some OnCreated and OnDeleted events might be raised.
Moving a file is a complex operation that consists of multiple simple
operations, therefore raising multiple events. Likewise, some
applications (for example, antivirus software) might cause additional
file system events that are detected by FileSystemWatcher.
The FileSystemWatcher can watch disks as long as they are not switched or removed. The FileSystemWatcher does not raise events for
CDs and DVDs, because time stamps and properties cannot change. Remote
computers must have one of the required platforms installed for the
component to function properly.
If multiple FileSystemWatcher objects are watching the same UNC path in Windows XP prior to Service Pack 1, or Windows 2000 SP2 or earlier,
then only one of the objects will raise an event. On machines running
Windows XP SP1 and newer, Windows 2000 SP3 or newer or Windows Server
2003, all FileSystemWatcher objects will raise the appropriate events.
Note that a FileSystemWatcher may miss an event when the buffer size
is exceeded. To avoid missing events, follow these guidelines:
Increase the buffer size by setting the InternalBufferSize property.
Avoid watching files with long file names, because a long file name contributes to filling up the buffer. Consider renaming these files
using shorter names.
Keep your event handling code as short as possible.
FileSystemWatcher.InternalBufferSize Property
Remarks
You can set the buffer to 4 KB or larger, but it must not exceed 64
KB. If you try to set the InternalBufferSize property to less than
4096 bytes, your value is discarded and the InternalBufferSize
property is set to 4096 bytes. For best performance, use a multiple of
4 KB on Intel-based computers.
The system notifies the component of file changes, and it stores those
changes in a buffer the component creates and passes to the APIs. Each
event can use up to 16 bytes of memory, not including the file name.
If there are many changes in a short time, the buffer can overflow.
This causes the component to lose track of changes in the directory,
and it will only provide blanket notification.
Increasing the size of the buffer can prevent missing file system
change events. However, increasing buffer size is expensive, because
it comes from non-paged memory that cannot be swapped out to disk, so
keep the buffer as small as possible. To avoid a buffer overflow, use
the NotifyFilter and IncludeSubdirectories properties to filter out
unwanted change notifications.
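The guidelines quoted above could be applied along these lines. This is a hedged sketch, not part of the quoted documentation: the class name OverflowAwareWatcher and the rescan callback are assumptions, and what a "rescan" should do (for example, listing the directory again) depends on the application.

```csharp
using System;
using System.IO;

// Hypothetical helper that configures a watcher conservatively and
// reacts when the internal buffer overflows.
public static class OverflowAwareWatcher
{
    // True when the reported error means change notifications were dropped.
    public static bool IsOverflow(ErrorEventArgs e)
    {
        return e.GetException() is InternalBufferOverflowException;
    }

    public static FileSystemWatcher Create(string path, Action<string> rescan)
    {
        var watcher = new FileSystemWatcher(path)
        {
            InternalBufferSize = 64 * 1024,        // 65536 bytes, a multiple of 4 KB
            NotifyFilter = NotifyFilters.FileName, // ask only for what is needed
            IncludeSubdirectories = false
        };
        watcher.Error += (sender, e) =>
        {
            if (IsOverflow(e))
            {
                // Events were lost, so the caller re-reads the directory.
                rescan(path);
            }
        };
        return watcher; // the caller sets EnableRaisingEvents = true
    }
}
```

Note that 64000, as used in the question, is not a multiple of 4096; 65536 is.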
I am currently implementing file content watchers for OpenFOAM output files. These files get written by OpenFOAM in a Unix environment and consumed by my applications in a Windows environment.
Please consider my first, working watcher for convergence files (these files get updated after each iteration of the solution):
FileSystemWatcher watcher;
watcher = new FileSystemWatcher(WatchPath, "convergenceUp*.out");
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Attributes | NotifyFilters.FileName | NotifyFilters.Size;
watcher.Changed += Watcher_Changed;
watcher.EnableRaisingEvents = true;
private void Watcher_Changed(object sender, FileSystemEventArgs e)
{
Files = Directory.GetFiles(WatchPath, "convergenceUp*.out").OrderBy(x => x).ToList(); // Update List of all files in the directory
ReadFiles(); // Do fancy stuff with the files
}
This works as expected. Every time a file matching the pattern is changed in the watched directory (Notepad++ notifies me that the file has changed as well), the files are processed.
Moving on from this simple "all files are in one directory" scenario, I started to build a watcher for a different type of file (force function objects, for those familiar with OpenFOAM). These files are saved in a hierarchical folder structure like this:
NameOfFunctionObject
|_StartTimeOfSolutionSetup#1
| |_forces.dat
|_StartTimeOfSolutionSetup#2
  |_forces.dat
My goal is to read every forces.dat under "NameOfFunctionObject" and do some trickery with all the contained data. Additionally, I would also like to be able to read and watch just one file. So my implementation (which borrows heavily from the above) currently looks like this:
FileSystemWatcher watcher;
if (isSingleFile)
watcher = new FileSystemWatcher(Directory.GetParent(WatchPath).ToString(), Path.GetFileName(WatchPath));
else
watcher = new FileSystemWatcher(WatchPath, "forces.dat");
watcher.IncludeSubdirectories = !isSingleFile;
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Attributes | NotifyFilters.FileName | NotifyFilters.Size | NotifyFilters.DirectoryName | NotifyFilters.LastAccess | NotifyFilters.CreationTime | NotifyFilters.Security;
watcher.Changed += Watcher_Changed;
watcher.Created += Watcher_Created;
watcher.Deleted += Watcher_Deleted;
watcher.Error += Watcher_Error;
watcher.Renamed += Watcher_Renamed;
watcher.EnableRaisingEvents = isWatchEnabled;
So depending on whether I want to watch just one file or multiple files, I set up the directory to watch and the file filter. If I watch multiple files, I set the watcher to watch subdirectories as well. For the sake of thorough testing, I filter for all notifications and catch all watcher events.
If I test the single-file option, everything works as expected: changes to the file are reported and processed correctly (again, the check with trusty old Notepad++ works).
On testing the multi-file option, though, things go pear-shaped.
The file paths are correct, and the initial read works as expected. But none of the watcher events fire. Here comes the curious bit: Notepad++ still beeps away, saying the file has changed, and Windows Explorer shows a new file date and a new file size. If I save the file within Notepad++, the watcher gets triggered. If I create a new file matching the pattern inside the watched directory (top level or below does not matter!), the watcher gets triggered. Even watching with a filter of *.* to catch the creation of temporary files does not trigger anything, so it is safe to assume that no temporary files are created.
In general, the watcher behaves as expected: it can detect changes to a single file, and it can detect the creation of files in the watched root folder and its subfolders. It just fails to recognise changes made from outside Windows to a file once that file is located in a subfolder. Is this behaviour by design? And more importantly: how can I work around it elegantly without resorting to a timer and polling by hand?
I think this might be relevant to you
FileSystemWatcher uses ReadDirectoryChangesW Winapi call with a few relevant flags
When you first call ReadDirectoryChangesW, the system allocates a
buffer to store change information. This buffer is associated with the
directory handle until it is closed and its size does not change
during its lifetime. Directory changes that occur between calls to
this function are added to the buffer and then returned with the next
call. If the buffer overflows, the entire contents of the buffer are
discarded
The analogue in FileSystemWatcher is the FileSystemWatcher.InternalBufferSize property
Remarks You can set the buffer to 4 KB or larger, but it must not
exceed 64 KB. If you try to set the InternalBufferSize property to
less than 4096 bytes, your value is discarded and the
InternalBufferSize property is set to 4096 bytes. For best
performance, use a multiple of 4 KB on Intel-based computers.
The system notifies the component of file changes, and it stores those
changes in a buffer the component creates and passes to the APIs. Each
event can use up to 16 bytes of memory, not including the file name.
If there are many changes in a short time, the buffer can overflow.
This causes the component to lose track of changes in the directory,
and it will only provide blanket notification. Increasing the size of
the buffer can prevent missing file system change events. However,
increasing buffer size is expensive, because it comes from non-paged
memory that cannot be swapped out to disk, so keep the buffer as small
as possible. To avoid a buffer overflow, use the NotifyFilter and
IncludeSubdirectories properties to filter out unwanted change
notifications.
If worst comes to worst, you can use a mix of polling and tracking; it has helped me out of trouble a few times.
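The polling half of such a mix could look roughly like the sketch below. It is an assumption-laden illustration, not the answerer's code: the class name PollingSnapshot is made up, and a file counts as "changed" when its last-write time or length differs from the previous poll.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical polling fallback: remembers (last write time, length) per
// file and reports the files that are new or differ since the last poll.
public sealed class PollingSnapshot
{
    private Dictionary<string, (DateTime Written, long Length)> _last =
        new Dictionary<string, (DateTime Written, long Length)>();

    public IReadOnlyList<string> Poll(string root, string pattern)
    {
        var current = new Dictionary<string, (DateTime Written, long Length)>();
        var changed = new List<string>();

        foreach (string file in Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories))
        {
            (DateTime Written, long Length) stamp;
            try
            {
                var info = new FileInfo(file);
                stamp = (info.LastWriteTimeUtc, info.Length);
            }
            catch (IOException)
            {
                continue; // file vanished between enumeration and inspection
            }

            current[file] = stamp;
            if (!_last.TryGetValue(file, out var previous) || !previous.Equals(stamp))
            {
                changed.Add(file);
            }
        }

        _last = current;
        return changed;
    }
}
```

Calling Poll(WatchPath, "forces.dat") from a timer, in addition to the watcher events, would pick up writes that never produce a notification. The first call reports every matching file, since nothing has been recorded yet.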
I've created a simple FileSystemWatcher service that's running on my PC:
public static void Run()
{
    var watcher = new FileSystemWatcher
    {
        Path = @"C:\Users\XXX\Google Drive",
        NotifyFilter = NotifyFilters.LastAccess
                       | NotifyFilters.LastWrite
                       | NotifyFilters.FileName
                       | NotifyFilters.DirectoryName,
        Filter = "*.*",
    };
    watcher.Created += OnChanged;
    watcher.EnableRaisingEvents = true;
}
private static void OnChanged(object source, FileSystemEventArgs e)
{
    FooPrintClass.SendToPrinter(e.FullPath);
}
As you can see, I'm watching a Google Drive folder. That folder is also synced on my server. From time to time a system on my server will create pairs of files with the same name but different types:
(Foo.pdf, Foo.txt)
Sometimes the system will create over 50 of those pairs and they all will be synced to my Google Drive folder.
So far so good, now to my problem:
My FileSystemWatcher service does work as expected, but it doesn't process the files in any particular order.
I need my service to process one pair at a time.
Expected Result:
Foo.pdf, Foo.txt
Bar.pdf, Bar.txt
Actual Result:
Bar.txt, Foo.pdf
Foo.txt, Bar.pdf
As the expected result shows, I need to print the files pair by pair.
There are many ways to implement a "queue" solution, but in my case I don't know in advance how many files there will be, which makes it harder to build a queue and a sorting algorithm.
Any tips?
As you use a third-party system for syncing files, you have no control over how it is done. You may run into problems: there is no control over the order in which files are synced, and no guarantee that a file is unlocked when you get a notification from your watcher.
To ease the problem with sync order, you can sync files in bundles.
If you can modify the system that creates these files, you can put both files into one ZIP file. With Foo.zip you can print both files in the order you want.
This doesn't solve the problem with possible locking. If you could somehow notify your service about a new pair of files, you could download these files directly from Google Drive using the API. In that case you would have full control over the files and the order in which you get them.
You could use Reactive Extensions to buffer a number of events and sort them before continuing.
An example would be something like this:
Observable
    .FromEventPattern<FileSystemEventArgs>(watcher, "Created")
    .Buffer(TimeSpan.FromSeconds(10))
    .Subscribe(onNext);
public void onNext(IList<EventPattern<FileSystemEventArgs>> events) { ... }
The example buffers all changes happening in 10 seconds and passes them to onNext as a list. This allows you to sort the files before doing anything else.
This ignores some edge cases like files being created right at the time when the buffer window ends. But there are multiple ways to solve those issues.
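One way around the window-boundary edge case, without knowing the total number of files in advance, is to hold each file until its partner arrives. The sketch below is only an illustration under stated assumptions: the class name PairCollector is hypothetical, pairs are matched purely by path with the extension swapped between .pdf and .txt, and files with other extensions are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: remembers files whose partner has not arrived yet
// and hands back a (pdf, txt) pair as soon as both halves have been seen.
public sealed class PairCollector
{
    private readonly HashSet<string> _waiting =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new object();

    // Returns (pdfPath, txtPath) when the pair is complete, otherwise null.
    public Tuple<string, string> Add(string fullPath)
    {
        string ext = Path.GetExtension(fullPath);
        bool isPdf = ext.Equals(".pdf", StringComparison.OrdinalIgnoreCase);
        bool isTxt = ext.Equals(".txt", StringComparison.OrdinalIgnoreCase);
        if (!isPdf && !isTxt)
        {
            return null; // not part of a pair
        }

        string partner = Path.ChangeExtension(fullPath, isPdf ? ".txt" : ".pdf");
        lock (_gate) // watcher events can arrive on different threads
        {
            if (_waiting.Remove(partner))
            {
                return isPdf
                    ? Tuple.Create(fullPath, partner)
                    : Tuple.Create(partner, fullPath);
            }
            _waiting.Add(fullPath);
            return null;
        }
    }
}
```

In the Created handler, a non-null result from Add(e.FullPath) would mean both files exist, so the pair can be sent to the printer together, PDF first.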
I need to detect when either of two file types is accessed in any way across an entire Windows file system.
As I understand it, the only way to do this without causing serious slowdowns for the operating system is to create a file system filter driver?
Essentially all I need to do is take a copy of any doc(x) and PDF files that are opened. I decided on this approach as it was either that or use file monitors in C#, which wouldn't be effective for an entire drive.
My question is twofold: is there an easier way, and how would I go about simply taking a copy of each doc(x)/PDF file as it's accessed?
The solution needs to be deployable with the package we're currently producing.
UPDATE
I'm going to benchmark the file system watcher. After discussing it with people here, I think it may be acceptable. My concern is that I need to monitor the common user directories where downloads will occur (so "C:\Users\SomeUser*") as well as the Outlook temporary folder.
You will need to create a file system watcher. Here is a code example that will watch for changes to docx files.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Security.Permissions;
namespace filewatchtest
{
class Program
{
static void Main(string[] args)
{
Run();
}
[PermissionSet(SecurityAction.Demand, Name="FullTrust")]
public static void Run()
{
string[] args = System.Environment.GetCommandLineArgs();
// if directory not specified then end program
if (args.Length != 2)
{
Console.WriteLine("Usage: filewatchtest.exe directory");
return;
}
// create a new fileSystemWatcher and set its properties
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.Path = args[1];
// set the notify filters
watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
// set the file extension filter
watcher.Filter = "*.docx";
// add event handlers
watcher.Changed += new FileSystemEventHandler(OnChanged);
watcher.Created += new FileSystemEventHandler(OnChanged);
watcher.Deleted += new FileSystemEventHandler(OnChanged);
watcher.Renamed += new RenamedEventHandler(OnRenamed);
// begin watching
watcher.EnableRaisingEvents = true;
// wait for the user to quit the program
Console.WriteLine("Press q to quit the program");
while (Console.Read()!='q');
}
static void OnRenamed(object sender, RenamedEventArgs e)
{
Console.WriteLine("File: {0} renamed to {1}", e.OldFullPath, e.FullPath);
}
static void OnChanged(object sender, FileSystemEventArgs e)
{
Console.WriteLine("File:" + e.FullPath + " " + e.ChangeType);
}
}
}
I think that creating a copy on read will cause a lot of problems. For instance: virus scanners. Consider the following:
I open file "test.pdf"
Your program creates "test_copy.pdf"
Virus scanner detects new file and checks (reads) "test_copy.pdf"
Your program detects read access, and creates "test_copy_copy.pdf"
Virus scanner...
Now, of course, you could create copies with a different extension to prevent this, but there will still be a lot of read actions on files. I sometimes open a file 10 times, just because I closed it accidentally or I want to recheck something I just read. Now you'll have 10 copies?
I would definitely go with Hans Passant's suggestion of creating a copy on change/create. That happens a lot less often by definition, because you always need to open a file to alter it, but you don't have to alter it when you open it.
The second problem would be to detect a read of a file. With docx you could check for the creation of hidden files like '~$_____.docx', but that doesn't work for PDF. Also, like you mentioned, you will have to check an entire disk. There is no way around it. If a file can be in any folder, you'll have to check all the folders. Keeping an internal list of docx and PDF files in a service could be faster, but as you'll have to loop through each file again at set intervals, it depends on how many files are on the system.
So if you really need to check read access, a file system driver is all you've got. But since it will be called on every file access, causing problems or slowing down the system would be a major concern.
If you still want to, check out this File System Filter Driver Tutorial to learn how to do it. Personally, I wouldn't go there.
From what I read in the comments, a File System Watcher would probably work well. I am not exactly sure whether Search Everything uses one, but if it does, I cannot notice any impact.
Another option might be ETW - Windows Event Tracing as used by Process Monitor. Even with millions of changes, I can also hardly notice the impact.
If you want to go for Volume Shadow Copies as proposed by Hans Passant, Alpha Volume Shadow Copies might be a suitable library offering support for it.
Conclusion: a filter driver is probably not needed and keeps you away from other problems, although I admit that the description of hierarchical storage management systems might match your approach, thinking of the upload store as the next hierarchy after hard disk.
I have a server application which receives packets of information (basically a file path on a network) from various client applications via WCF. When it receives the incoming packet, it adds the file path to a list and then launches another process in a backgroundworker thread. In my backgroundworker DoWork function, I call a function called ProcessFiles() - note I've simplified this function to make the sample easier.
private bool ProcessFiles()
{
    while (FileList.Count > 0)
    {
        var path = Path.Combine(FileList[0].TargetPath, @"\Translated Files");
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }
        FileList.RemoveAt(0);
    }
    return true;
}
The function above simply starts working through the FileList and right now the only action I'm trying to do is simply create a new directory inside the target path destination. Now, this path is likely going to be on a network drive (I'm not sure if that matters)... but the server has access to that location. When I run my server/client applications and send it a file to process... theoretically it should create the new "Translated Files" folder in the TargetPath destination location... however, nothing ever gets created. The odd thing is that my DoWork function in my background worker completes its process correctly. I know this because I have a print statement in my RunWorkerCompleted event handler and it appears to be processing normally. Does anyone have any ideas why this directory folder is not being created correctly?
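One thing worth checking in the code above is the leading backslash in @"\Translated Files". On Windows, Path.Combine returns its second argument unchanged when that argument is rooted, so the combined path would not be under TargetPath at all. A minimal sketch of the variant without the leading separator follows; the class and method names are hypothetical, and whether this is the actual cause here is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper showing the combine without a leading separator.
public static class TranslatedFolder
{
    public static string For(string targetPath)
    {
        // "Translated Files" is relative, so it is appended to targetPath.
        // With @"\Translated Files" (rooted on Windows), Path.Combine
        // would discard targetPath and return only the second argument.
        return Path.Combine(targetPath, "Translated Files");
    }
}
```

Logging the combined path before calling Directory.CreateDirectory would show which location is actually being used.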