FileSystemWatcher sort algorithm - c#

I've created a simple FileSystemWatcher service that's running on my PC:
public static void Run()
{
    var watcher = new FileSystemWatcher
    {
        Path = @"C:\Users\XXX\Google Drive",
        NotifyFilter = NotifyFilters.LastAccess
                     | NotifyFilters.LastWrite
                     | NotifyFilters.FileName
                     | NotifyFilters.DirectoryName,
        Filter = "*.*",
    };
    watcher.Created += OnChanged;
    watcher.EnableRaisingEvents = true;
}

private static void OnChanged(object source, FileSystemEventArgs e)
{
    FooPrintClass.SendToPrinter(e.FullPath);
}
As you see, I'm watching a Google Drive folder. That folder is also synced on my server. From time to time a system on my server will create pairs of files with the same name but different extensions:
(Foo.pdf, Foo.txt)
Sometimes the system will create over 50 of those pairs, and they all get synced to my Google Drive folder.
So far so good; now to my problem:
My FileSystemWatcher service does work as expected, but it doesn't process the files in any sorted order at all.
I need my service to process one pair at a time.
Expected Result:
Foo.pdf, Foo.txt
Bar.pdf, Bar.txt
Actual Result:
Bar.txt, Foo.pdf
Foo.txt, Bar.pdf
As the expected result shows, I need to print the pairs in order.
There are many ways to implement a "queue" solution, but in my case I don't know how many files there will be. Since I don't know the total number of files in advance, it is harder to build a queue and a sorting algorithm.
Any tips?

As you use a third-party system for syncing files, you have no control over how it is done. That gives you two problems: you have no control over the order in which the files are synced, and there is no guarantee that a file is not locked when your watcher notifies you about it.
To ease the ordering problem, you could sync the files in bundles.
If you can modify the system that creates these files, you could ZIP each pair into a single file. Having Foo.zip, you can print both contained files in whatever order you want.
That still doesn't solve the possible locking problem. If you can notify your service about a new pair of files some other way, you could download the files directly from Google Drive using its API. In that case you would have full control over the files and the order in which you get them.
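For illustration, a minimal sketch of the ZIP approach, assuming the System.IO.Compression API and reusing the asker's FooPrintClass; the temp extraction folder and the alphabetical print order are assumptions:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

private static void OnChanged(object source, FileSystemEventArgs e)
{
    // Only react to the bundle itself, never to its contents.
    if (!e.FullPath.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
        return;

    // Extract to a temp folder so the watched folder stays quiet.
    string extractDir = Path.Combine(Path.GetTempPath(), Path.GetFileNameWithoutExtension(e.FullPath));
    ZipFile.ExtractToDirectory(e.FullPath, extractDir);

    // Alphabetical order prints Foo.pdf before Foo.txt.
    foreach (string file in Directory.GetFiles(extractDir).OrderBy(f => f))
        FooPrintClass.SendToPrinter(file); // the asker's own printing class
}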

You could use Reactive Extensions to buffer a number of events and sort them before continuing.
An example would be something like this:
Observable
    .FromEventPattern<FileSystemEventArgs>(watcher, "Created")
    .Buffer(TimeSpan.FromSeconds(10))
    .Subscribe(OnNext);

public void OnNext(IList<EventPattern<FileSystemEventArgs>> events) { ... }
The example buffers all events raised within a 10-second window and passes them to OnNext as a list. This allows you to sort the files before doing anything else.
It ignores some edge cases, such as files being created right when the buffer window ends, but there are multiple ways to solve those issues.
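A hedged sketch of what the sorting step could look like, reusing the asker's watcher and FooPrintClass, and assuming each pair shares a base file name (the grouping key is an assumption):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reactive;
using System.Reactive.Linq;

Observable
    .FromEventPattern<FileSystemEventArgs>(watcher, "Created")
    .Buffer(TimeSpan.FromSeconds(10))
    .Subscribe(events =>
    {
        // Group the buffered paths by file name without extension,
        // so Foo.pdf and Foo.txt end up in the same pair.
        var pairs = events
            .Select(ev => ev.EventArgs.FullPath)
            .GroupBy(path => Path.GetFileNameWithoutExtension(path))
            .OrderBy(group => group.Key);

        foreach (var pair in pairs)
            foreach (var file in pair.OrderBy(f => f)) // .pdf sorts before .txt
                FooPrintClass.SendToPrinter(file);     // the asker's own printing class
    });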

Related

FileSystemWatcher unreliable for changes in subdirectory

I am currently implementing file content watchers for OpenFOAM output files. These files are written by OpenFOAM in a Unix environment and consumed by my applications in a Windows environment.
Please consider my first, working watcher for convergence files (these files are updated after each iteration of the solution):
FileSystemWatcher watcher;
watcher = new FileSystemWatcher(WatchPath, "convergenceUp*.out");
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Attributes | NotifyFilters.FileName | NotifyFilters.Size;
watcher.Changed += Watcher_Changed;
watcher.EnableRaisingEvents = true;

private void Watcher_Changed(object sender, FileSystemEventArgs e)
{
    Files = Directory.GetFiles(WatchPath, "convergenceUp*.out").OrderBy(x => x).ToList(); // Update list of all files in the directory
    ReadFiles(); // Do fancy stuff with the files
}
This works as expected. Every time a file matching the pattern is changed in the watched directory (Notepad++ notifies me that the file has changed as well), the files are processed.
Moving on from this simple "all files are in one directory" scenario, I started to build a watcher for a different type of file (force function objects, for those familiar with OpenFOAM). These files are saved in a hierarchical folder structure like this:
NameOfFunctionObject
|_StartTimeOfSolutionSetup#1
| |_forces.dat
|_StartTimeOfSolutionSetup#2
|_forces.dat
My goal is to read all forces.dat files under NameOfFunctionObject and do some trickery with all the contained data. Additionally, I would like the option of reading and watching just one file. So my implementation (which borrows heavily from the above) currently looks like this:
FileSystemWatcher watcher;
if (isSingleFile)
    watcher = new FileSystemWatcher(Directory.GetParent(WatchPath).ToString(), Path.GetFileName(WatchPath));
else
    watcher = new FileSystemWatcher(WatchPath, "forces.dat");
watcher.IncludeSubdirectories = !isSingleFile;
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Attributes | NotifyFilters.FileName | NotifyFilters.Size | NotifyFilters.DirectoryName | NotifyFilters.LastAccess | NotifyFilters.CreationTime | NotifyFilters.Security;
watcher.Changed += Watcher_Changed;
watcher.Created += Watcher_Created;
watcher.Deleted += Watcher_Deleted;
watcher.Error += Watcher_Error;
watcher.Renamed += Watcher_Renamed;
watcher.EnableRaisingEvents = isWatchEnabled;
So depending on whether I want to watch just one file or multiple files, I set up the directory to watch and the file filter. If I watch multiple files, I set the watcher to watch subdirectories as well. For the sake of thorough testing I filter for all notifications and handle all watcher events.
If I test the single-file option, everything works as expected: changes to the file are reported and processed correctly (again, the check with trusty old Notepad++ works).
On testing the multi-file option though, things get pear-shaped.
The file paths are correct and the initial read works as expected, but neither watcher event fires. Here comes the curious bit: Notepad++ still beeps away, saying the file has changed; Windows Explorer shows a new file date and a new file size. If I save the file from within Notepad++, the watcher gets triggered. If I create a new file matching the pattern inside the watched directory (top level or below does not matter!), the watcher gets triggered. Even watching with a *.* filter to catch the creation of temporary files does not trigger anything, so it is safe to assume that no temporary files are created.
In general, the watcher behaves as expected: it can detect changes to a single file, and it can detect the creation of files in the root watched folder and its subfolders. It just fails to recognise non-Windows changes to a file once it is located in a subfolder. Is this behaviour by design? And more importantly: how can I work around it elegantly without resorting to a timer and polling by hand?
I think this might be relevant to you:
FileSystemWatcher uses the ReadDirectoryChangesW WinAPI call with a few relevant flags:
When you first call ReadDirectoryChangesW, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime. Directory changes that occur between calls to this function are added to the buffer and then returned with the next call. If the buffer overflows, the entire contents of the buffer are discarded.
The analogue in FileSystemWatcher is the FileSystemWatcher.InternalBufferSize property
You can set the buffer to 4 KB or larger, but it must not exceed 64 KB. If you try to set the InternalBufferSize property to less than 4096 bytes, your value is discarded and the InternalBufferSize property is set to 4096 bytes. For best performance, use a multiple of 4 KB on Intel-based computers.
The system notifies the component of file changes, and it stores those changes in a buffer the component creates and passes to the APIs. Each event can use up to 16 bytes of memory, not including the file name. If there are many changes in a short time, the buffer can overflow. This causes the component to lose track of changes in the directory, and it will only provide blanket notification. Increasing the size of the buffer can prevent missing file system change events. However, increasing buffer size is expensive, because it comes from non-paged memory that cannot be swapped out to disk, so keep the buffer as small as possible. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties to filter out unwanted change notifications.
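If buffer overflow is the suspect, the mitigation is a one-liner; 64 KB is just the documented maximum, not a recommendation:
// Raise the watcher's internal buffer from the default 8 KB to the
// documented 64 KB maximum; this memory is non-paged, so only do it
// when events are actually being dropped.
watcher.InternalBufferSize = 64 * 1024;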
If worse comes to worst, you can use a mix of polling and tracking; it has helped me out of trouble a few times.
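A minimal polling sketch of that mix, reusing the asker's WatchPath and ReadFiles from the question; the five-second interval is arbitrary:
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Poll the tree and react only when a last-write time actually moves.
var lastSeen = new Dictionary<string, DateTime>();
var timer = new Timer(_ =>
{
    foreach (var file in Directory.EnumerateFiles(WatchPath, "forces.dat", SearchOption.AllDirectories))
    {
        DateTime stamp = File.GetLastWriteTimeUtc(file);
        if (!lastSeen.TryGetValue(file, out DateTime previous) || stamp > previous)
        {
            lastSeen[file] = stamp;
            ReadFiles(); // the asker's own processing
        }
    }
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));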

C# FileSystemWatcher watch changes on network drive which is only done by current system

I am following this example of FileSystemWatcher. On top of it, I have created a Windows Forms application which opens whenever any .txt file is created or renamed on the Z drive.
I have built the console application and deployed it to two systems, and both systems are listening to the same network drive (I have mapped a network drive as the Z drive on both systems).
However, the problem is that whenever I create or rename a .txt file on the network drive, both systems' forms open, which is logical, since both deployed console applications are listening to the same location.
But my requirement is: "The form should be opened only on the system
that performed the action of creating or renaming that .txt file."
Is there any way I can achieve this, or is this even possible with the FileSystemWatcher class?
Here is the code snippet.
public class Watcher
{
    public static void Main()
    {
        Run();
    }

    [PermissionSet(SecurityAction.Demand, Name = "FullTrust")]
    public static void Run()
    {
        string[] args = System.Environment.GetCommandLineArgs();
        FileSystemWatcher watcher = new FileSystemWatcher("Z:\\", "*.txt");
        watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
                             | NotifyFilters.FileName | NotifyFilters.DirectoryName;
        watcher.IncludeSubdirectories = true;

        // Add event handlers.
        //watcher.Changed += new FileSystemEventHandler(OnChanged); // Fires every time a file is changed (multiple times during a copy operation)
        watcher.Created += new FileSystemEventHandler(OnChanged);
        watcher.Deleted += new FileSystemEventHandler(OnChanged);
        watcher.Renamed += new RenamedEventHandler(OnRenamed);

        // Begin watching.
        watcher.EnableRaisingEvents = true;

        // Wait for the user to quit the program.
        Console.WriteLine("Press 'q' to quit the sample.");
        while (Console.Read() != 'q') ;
    }

    // Define the event handlers.
    private static void OnChanged(object source, FileSystemEventArgs e)
    {
        // Specify what is done when a file is changed, created, or deleted.
        Console.WriteLine("File: " + e.FullPath + " " + e.ChangeType);
        Application.EnableVisualStyles();
        Application.Run(new Feedback.Form1(e.FullPath)); // Here I am opening a new form for feedback
    }

    private static void OnRenamed(object source, RenamedEventArgs e)
    {
        // Specify what is done when a file is renamed.
        Console.WriteLine("File: {0} renamed to {1}", e.OldFullPath, e.FullPath);
        Application.EnableVisualStyles();
        Application.Run(new Feedback.Form1(e.FullPath)); // Here I am opening a new form for feedback
    }
}
FileSystemWatcher may notify you that something happened, and you might also be able to deduce what happened, but don't count on it. It's quite a limited and unreliable component in my (and others') experience. So if there is any chance of even moderate contention on the target folder, I would use some kind of polling solution instead of a file watcher.
That said, it won't tell you who made the change. Once you have deduced what has changed, you need to take additional steps for the "who" part. The filesystem stores quite sparse information; you won't find any source-machine info in it. You could try mapping the file shares that create these changes with different users, as you may be able to deduce the modifying system from that:
Finding the user who modified the shared drive folder files.
If that is not an option, the other solutions are much more complicated.
If you have access to the server hosting Z:, you could turn on the file audit log for that resource and deduce which machine it was from the event log (event IDs 4663 / 5145). The source machine name is logged in this case. It should be a breeze to enable if it's a Windows server (directory properties/Security/Advanced/Auditing), but reading and synchronizing logs is more complicated.
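For the log-reading half, a sketch using System.Diagnostics.Eventing.Reader, assuming auditing is already enabled and the code runs on the server; the XPath filter uses the standard event-query syntax:
using System;
using System.Diagnostics.Eventing.Reader;

// Pull file-access audit events (4663) and network-share access
// events (5145) from the Security log; the records carry the
// source account and machine details.
var query = new EventLogQuery("Security", PathType.LogName,
    "*[System[(EventID=4663 or EventID=5145)]]");
using (var reader = new EventLogReader(query))
{
    for (EventRecord record; (record = reader.ReadEvent()) != null; record.Dispose())
        Console.WriteLine(record.FormatDescription());
}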
If none of the solutions above is possible, you may be able to implement a user-space filesystem to proxy your file share, using something like Dokan. Source processes would map to your application instead of the file share; that way you could raise your own events, or just write a detailed audit log to a database or whatever, and then forward the actual commands to the file share. A very expensive and non-trivial solution, though. But probably very fun.
FileSystemWatcher gives you notifications of file changes.
If you want to use the file system for unique notifications, you'll need to create an isolated folder for each instance.
Something like:
Z:\Machine1\
Z:\Machine2\
Another option is to check who owns/created the file, but that can get really complicated in domain setups.
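A minimal sketch of the isolated-folder idea, assuming each machine only ever writes inside the folder named after it; using Environment.MachineName as the folder key is an assumption:
using System;
using System.IO;

// Each instance watches only its own machine-named subfolder,
// so it only sees files created by the local system.
string machineFolder = Path.Combine(@"Z:\", Environment.MachineName);
Directory.CreateDirectory(machineFolder); // no-op if it already exists

var watcher = new FileSystemWatcher(machineFolder, "*.txt");
watcher.Created += (s, e) => Console.WriteLine("Local create: " + e.FullPath);
watcher.Renamed += (s, e) => Console.WriteLine("Local rename: " + e.FullPath);
watcher.EnableRaisingEvents = true;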

File system filter driver for specific file types

I need to detect when either of two file types is accessed in any way across an entire Windows file system.
As I understand it, the only way to do this without causing serious slowdowns for the operating system is to create a file system filter driver?
Essentially all I need to do is take a copy of any doc(x) and PDF files that are opened. I decided on this approach as it was either that or use file monitors in C#, which wouldn't be effective for an entire drive.
My question is twofold: is there an easier way, and secondly, how would I go about simply taking a copy of each doc(x)/PDF file as it's accessed?
The solution needs to be deployable with the package we're currently producing.
UPDATE
I'm going to benchmark the file system watcher. After discussing it with people here, I think it may be acceptable; my concern is the fact that I need to monitor the common user directories where downloads will occur (so "C:\Users\SomeUser*") as well as the Outlook temporary folder.
You will need to create a file system watcher. Here is a code example that will watch for changes to docx files.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Security.Permissions;

namespace filewatchtest
{
    class Program
    {
        static void Main(string[] args)
        {
            Run();
        }

        [PermissionSet(SecurityAction.Demand, Name = "FullTrust")]
        public static void Run()
        {
            string[] args = System.Environment.GetCommandLineArgs();

            // If the directory is not specified, end the program.
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: filewatchtest.exe directory");
                return;
            }

            // Create a new FileSystemWatcher and set its properties.
            FileSystemWatcher watcher = new FileSystemWatcher();
            watcher.Path = args[1];

            // Set the notify filters.
            watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;

            // Set the file extension filter.
            watcher.Filter = "*.docx";

            // Add event handlers.
            watcher.Changed += new FileSystemEventHandler(OnChanged);
            watcher.Created += new FileSystemEventHandler(OnChanged);
            watcher.Deleted += new FileSystemEventHandler(OnChanged);
            watcher.Renamed += new RenamedEventHandler(OnRenamed);

            // Begin watching.
            watcher.EnableRaisingEvents = true;

            // Wait for the user to quit the program.
            Console.WriteLine("Press 'q' to quit the program");
            while (Console.Read() != 'q') ;
        }

        static void OnRenamed(object sender, RenamedEventArgs e)
        {
            Console.WriteLine("File: {0} renamed to {1}", e.OldFullPath, e.FullPath);
        }

        static void OnChanged(object sender, FileSystemEventArgs e)
        {
            Console.WriteLine("File:" + e.FullPath + " " + e.ChangeType);
        }
    }
}
I think that creating a copy on read will cause a lot of problems. For instance: virus scanners. Consider the following:
I open file "test.pdf"
Your program creates "test_copy.pdf"
Virus scanner detects new file and checks (reads) "test_copy.pdf"
Your program detects read access, and creates "test_copy_copy.pdf"
Virus scanner...
Now of course you could create the copies with a different extension to prevent this, but there will still be a lot of READ actions on files. I sometimes open a file ten times, just because I closed it accidentally or want to recheck something I just read. Would you then have ten copies?
I would definitely go with Hans Passant's suggestion of creating a copy on change/create. That happens a lot less by definition, because you always have to open a file to alter it, but you don't have to alter it when you open it.
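A hedged sketch of that copy-on-change idea, with the backup folder outside the watched tree so the copy cannot retrigger the watcher; the folder name is made up:
using System;
using System.IO;

const string backupDir = @"C:\DocBackup"; // hypothetical target folder

static void OnChanged(object sender, FileSystemEventArgs e)
{
    Directory.CreateDirectory(backupDir);
    string target = Path.Combine(backupDir, Path.GetFileName(e.FullPath));
    try
    {
        File.Copy(e.FullPath, target, overwrite: true);
    }
    catch (IOException)
    {
        // The file may still be locked by the writing process; retry later.
    }
}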
The second problem would be detecting a read of a file. With docx you could check for the creation of hidden lock files like '~$_____.docx', but that doesn't work for PDF. Also, as you mentioned, you would have to check an entire disk. There is no way around it: if a file can be in any folder, you have to check all the folders. Keeping an internal list of docx and PDF files in a service could be faster, but since you would have to loop through every file again at set intervals, it depends on how many files are on the system.
So if you really need to check read access, a file system filter driver is all you've got. But since it will be called on every file access, causing problems or slow systems would be a major concern.
If you still want to, check out this File System Filter Driver Tutorial to learn how to do it. Personally, I wouldn't go there.
From what I read in the comments, a FileSystemWatcher would probably work well. I am not exactly sure whether Search Everything uses one, but if it does, I cannot notice any impact.
Another option might be ETW (Event Tracing for Windows) as used by Process Monitor. Even with millions of changes, I can hardly notice any impact.
If you want to go for Volume Shadow Copies as proposed by Hans Passant, Alpha Volume Shadow Copies might be a suitable library offering support for it.
Conclusion: a filter driver is probably not needed, and avoiding it keeps you away from other problems, although I admit that the description of hierarchical storage management systems might match your approach, thinking of the upload store as the next hierarchy after the hard disk.

C# FileSystemWatcher high cpu load when no change happen to directory

My case is the following:
Files are created inside a directory by some process (about 10 files per minute, with a max file size of 5 MB each). Let's call this folder MYROOT.
Those files need to be moved and categorized into subdirectories according to some specific logic based on the file name and external settings.
There will be 5 subdirectories under MYROOT; let's name them C1, C2, C3, C4, C5. Each has some subdirectories inside it, such as A1, A2, A3, An...
So we will have the following categorization: C:\MYROOT\C1\A1, C:\MYROOT\C1\A2, C:\MYROOT\C2\B1, and so on...
All files are written to C:\MYROOT and have the same file type and naming convention. No renames, deletes, or changes are made in this folder.
The FileSystemWatcher is set as follows:
this._watcher = new FileSystemWatcher();
this._watcher.NotifyFilter = NotifyFilters.CreationTime;
this._watcher.Filter = string.Empty;
this._watcher.Path = @"C:\MYROOT";
this._watcher.IncludeSubdirectories = false;
this._watcher.Created += this._watcher_Created;
The event handler:
private void _watcher_Created(object sender, FileSystemEventArgs e)
{
    string filePath = string.Empty;
    if (e.ChangeType != WatcherChangeTypes.Created)
        return;

    lock (FileOrganizer.Manager.s_LockObject)
    {
        filePath = e.FullPath;
        // Further processing here, like creating a custom object that
        // contains the file name and some other info, adding that object to a
        // queue that is processed asynchronously with logic applied to wait for
        // the file to be written, validate file contents, etc., and finally
        // move it where it should be moved. The code here is very fast,
        // exception free, and runs in a fire-and-forget manner.
    }
}
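Not the asker's actual queue, but a sketch of the fire-and-forget handoff those comments describe, assuming a System.Collections.Concurrent.BlockingCollection drained by a single background consumer; ProcessFile is a placeholder:
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

// The handler stays cheap: it only enqueues the path.
private void _watcher_Created(object sender, FileSystemEventArgs e)
{
    if (e.ChangeType == WatcherChangeTypes.Created)
        _queue.Add(e.FullPath);
}

// One consumer does the slow work (wait for the write to finish, validate, move).
private void StartConsumer() => Task.Run(() =>
{
    foreach (string path in _queue.GetConsumingEnumerable())
        ProcessFile(path); // placeholder for the asker's categorization logic
});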
When the MYROOT directory is empty and files are created by another process, they are moved inside the folders as I described. The folders are only created once, if they do not already exist.
The number of files inside the subfolders keeps increasing, and we are talking about ~200 GB and counting.
Now hear this:
When no files are created (the "creator" process is not running), nothing should trigger the watcher, and I can see from the logs I enabled for debugging before posting this question that nothing does. But my process containing the watcher sits at a constant 13 to 14% CPU on an octa-core processor, and that figure increases as the size of the subdirectories increases. During processing, even if I create (copy-paste) 2000 files at once, it only goes 1% above that.
The funny part is that when I change the monitoring path to an empty one, or to the same one but with less volume inside it, the CPU utilization of my process is 0%, and only when files are created does it go to a max of 2% - 5%.
Again, as said, the observation is clear: when the monitored path's subdirectories contain data, that data affects the watcher internals even if you set it not to monitor the subdirectories. And if that data volume is big, the file system watcher needs a lot of CPU. That is the case even if no changes are taking place to trigger any watcher events.
Is this normal or by-design behavior of the FileSystemWatcher?
PS.
When I monitor the MYROOT folder and move the files outside that folder, everything seems OK, but that is not an acceptable solution.
Thanx.
Marios

Handle system folders event in windows

I am writing some C# code and I need to detect if a specific folder on my Windows file system has been opened while the application is running. Is there any way to do it? WinAPI, maybe?
There are three API things I think you should check out:
FindFirstChangeNotification() http://msdn.microsoft.com/en-us/library/aa364417%28VS.85%29.aspx
That gives you a handle you can wait on and use to find changes to files in a particular directory or tree of directories. It won't tell you when a directory is browsed, but it will tell you when a file is saved, renamed, and so on and so forth.
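A hedged P/Invoke sketch of that wait loop, using only the documented kernel32 signatures; the watched path is a placeholder, and cleanup (FindCloseChangeNotification) is omitted:
using System;
using System.Runtime.InteropServices;

class ChangeWait
{
    const uint FILE_NOTIFY_CHANGE_FILE_NAME = 0x00000001;
    const uint FILE_NOTIFY_CHANGE_LAST_WRITE = 0x00000010;
    const uint INFINITE = 0xFFFFFFFF;
    const uint WAIT_OBJECT_0 = 0;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstChangeNotification(string lpPathName, bool bWatchSubtree, uint dwNotifyFilter);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindNextChangeNotification(IntPtr hChangeHandle);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint WaitForSingleObject(IntPtr hHandle, uint dwMilliseconds);

    static void Main()
    {
        IntPtr handle = FindFirstChangeNotification(@"C:\WatchedFolder", true,
            FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_LAST_WRITE);

        // Block until something in the tree changes, then rearm and wait again.
        while (WaitForSingleObject(handle, INFINITE) == WAIT_OBJECT_0)
        {
            Console.WriteLine("Something changed.");
            FindNextChangeNotification(handle);
        }
    }
}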
SetWindowsHookEx() http://msdn.microsoft.com/en-us/library/ms644990%28v=VS.85%29.aspx
You can set that up to give you a callback when any number of events occur. In fact, I'm pretty positive that you CAN get this callback when a directory is opened, but it will probably be inordinately difficult, because you'll be intercepting messages to Explorer's window. So you'll be rebooting during debugging.
Windows Shells http://msdn.microsoft.com/en-us/library/bb776778%28v=VS.85%29.aspx
If that wasn't painful enough, you can try writing a shell program.
If you're trying to write a rootkit, I suppose you don't want me to spoil the details for you. If you're NOT trying to write a rootkit, I suggest you look it up - carefully. There are open source rootkits, and they all basically have to monitor file access this way to hide from the user / OS.
Go with the Windows Shell Extensions. You can use Shell Namespace Extensions to make a "virtual" folder that isn't there (or hides a real one), like the GAC (C:\Windows\assembly).
Here are several examples of Shell Extension coding in .Net 4.0.
A Column Handler would let you know when a folder is "Opened", and even let you provide extra data for each of the files (new details columns).
Check out the FileSystemWatcher class.
The closest thing I can think of that may be useful to you is using the static Directory class. It provides methods to determine the last time a file or directory was accessed. You could set up a BackgroundWorker to monitor whether the directory was accessed during a specified interval. Keep track of the start and end of the interval using DateTime, and if the last access time falls between those, use the BackgroundWorker's ProgressChanged event to notify the application.
BackgroundWorker folderWorker = new BackgroundWorker();
folderWorker.WorkerReportsProgress = true;
folderWorker.WorkerSupportsCancellation = true;
folderWorker.DoWork += FolderWorker_DoWork;
folderWorker.ProgressChanged += FolderWorker_ProgressChanged;
folderWorker.RunWorkerAsync();

void FolderWorker_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = (BackgroundWorker)sender;
    while (!worker.CancellationPending)
    {
        DateTime lastAccess = Directory.GetLastAccessTime(DIRECTORY_PATH);
        // Check to see if lastAccess falls between the last time the loop
        // started and came to an end.
        if (/* your check */)
        {
            object state; // modify this if you need to send back data
            worker.ReportProgress(0, state);
        }
    }
}

void FolderWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    // Take action here from the worker.ReportProgress being invoked.
}
You could use the FileSystemInfo's LastAccessTime property. The problem, though, is that it can be cached.
FileSystemInfo: http://msdn.microsoft.com/en-us/library/975xhcs9.aspx
LastAccessTime property: http://msdn.microsoft.com/en-us/library/system.io.filesysteminfo.lastaccesstimeutc.aspx
As noted, this value can be pre-cached:
"The value of the LastAccessTimeUtc property is pre-cached if the current instance of the FileSystemInfo object was returned from any of the following DirectoryInfo methods:
GetDirectories
GetFiles
GetFileSystemInfos
EnumerateDirectories
EnumerateFiles
EnumerateFileSystemInfos
To get the latest value, call the Refresh method."
Therefore, call the Refresh method; but even then the value might not be up to date, because Windows caches it. (According to the MSDN docs: "FileSystemInfo.Refresh takes a snapshot of the file from the current file system. Refresh cannot correct the underlying file system even if the file system returns incorrect or outdated information. This can happen on platforms such as Windows 98." Link: http://msdn.microsoft.com/en-us/library/system.io.filesysteminfo.refresh.aspx)
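A small sketch of the Refresh call in context; the path is a placeholder:
using System;
using System.IO;

var dir = new DirectoryInfo(@"C:\WatchedFolder"); // placeholder path
DateTime before = dir.LastAccessTimeUtc;          // possibly pre-cached

dir.Refresh();                                    // re-read from the file system
DateTime after = dir.LastAccessTimeUtc;           // as fresh as Windows reports it

Console.WriteLine($"cached: {before:o}, refreshed: {after:o}");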
I think the only way you can reliably achieve this is by monitoring the currently running processes and watching closely for new Explorer.exe instances and/or new threads spawned by Explorer.exe (the "Run every window on a separate process" setting gets in the way here).
I admit I don't have a clue about how to code this, but that's what I would look for.
