I am currently writing a small application in C# to process batches of images and put them into a PDF. Each batch of images is stored in its own folder on a network share. The application will enable users to perform QA checks on a random sample of images from a single batch before creating a PDF. There will be at most 4-6 users running this application on individual desktops, all with access to the location where the image batches are stored.
The problem I'm running into at the moment is how to prevent two users from processing the same batch. Initially I thought about using FileSystemWatcher to check for last access to each folder, but after reading up on how FileSystemWatcher raises events it didn't seem suitable. I've considered polling the images in each folder and checking for file access using a FileStream, but I don't think that will be suitable either (I may be wrong).
What would be the simplest solution?
I'd use a lock file with a package like this.
The code is quite simple:
var fileLock = new SimpleFileLock("networkFolder/file.lock", TimeSpan.FromMinutes(timeout));
where timeout is used to unlock the folder if the process using it crashed (so that it doesn't stay locked forever).
Then, every time a process needs to use that directory, you do a simple check:
if (fileLock.TryAcquireLock())
{
    // Lock acquired - do your work here
}
else
{
    // Failed to acquire lock - SpinWait or do something else
}
Code is taken from the samples on the repo, so that's the way the author suggests using his library.
I had the chance to use it and I found it both useful and reliable.
I'm using a FileSystemWatcher to watch a directory. I created a _Created() event handler to fire when a file is moved to this folder. My problem is the following:
The files in this directory get created when the user hits a "real life button" (a button in our stock, not in the application). The FileSystemWatcher handler takes this file, does some work in the system and then deletes it. That wouldn't be a problem if the application only ran once, but it is used by 6 clients. So every application on every client tries to delete the file, and if one client is too slow it throws an exception because the file has already been deleted.
What I'm asking for is: Is there a way to avoid this?
I tried using loops and check if the file still exists, but without any success.
while (File.Exists(file))
{
    // Another client can delete the file between the Exists check and the
    // Delete call, which is when the exception gets thrown.
    File.Delete(file);
    Thread.Sleep(100);
}
Can someone give me a hint how it could probably work?
Design
If you want a file to be processed by a single instance only (for example, the first instance that reacts gets the job), then you should implement a locking mechanism. Only the instance that is able to obtain a lock on the file is allowed to process and remove it; all other instances should skip the file.
If you're fine with all instances processing the file, and only care that at least one of them succeeds, then you need to figure out which exceptions indicate a genuine failure and which ones indicate a failure caused by the actions of another instance.
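A minimal sketch of that second approach (the variable name file is illustrative): treat "the file is already gone" as success by another instance and only surface genuine failures.

try
{
    // File.Delete is a no-op when the file no longer exists, so a file that
    // another instance has already removed is not an error here.
    File.Delete(file);
}
catch (IOException)
{
    // The file is still open elsewhere (most likely another instance is
    // processing it right now) - skip it and move on.
}
catch (UnauthorizedAccessException)
{
    // Permissions, a read-only attribute or a directory path - a genuine
    // failure that should be logged or rethrown.
    throw;
}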
Locking
To 'lock' a file, you can open it with share-mode FileShare.None. This prevents other processes from opening it until you close the file. However, you'll then need to close the file before you can delete it, which leaves a small gap during which another instance could open the file.
A better solution is to create a separate lock file for that purpose. Create it with file-mode FileMode.Create and share-mode FileShare.None and keep it open until the whole process is finished, including the removal of the processed file. Then the lock file can be closed and optionally removed.
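A rough sketch of that lock-file idea, assuming one lock file per work file and a hypothetical ProcessFile step:

static void TryProcess(string file)
{
    FileStream lockStream;
    try
    {
        // FileMode.Create + FileShare.None: only one instance can hold this open.
        lockStream = new FileStream(file + ".lock",
            FileMode.Create, FileAccess.ReadWrite, FileShare.None);
    }
    catch (IOException)
    {
        return; // another instance holds the lock file - skip this one
    }

    using (lockStream)
    {
        ProcessFile(file);   // hypothetical processing step
        File.Delete(file);   // remove the processed file while the lock is still held
    }
    File.Delete(file + ".lock"); // optional: clean up the lock file afterwards
}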
Exception
As for the UnauthorizedAccessException you got, according to the documentation, that means one of 4 things:
You don't have the required permission
The file is an executable file that is in use
The path is a directory
The file is read-only
1 and 4 seem most likely in this case (if the file was open in another process you'd get an IOException).
If you want to synchronize access between multiple clients on the same computer you should use a Named Mutex.
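A small sketch of the named-mutex idea, assuming all clients run on the same machine; the mutex name is arbitrary, and the "Global\" prefix makes it visible across sessions (System.Threading.Mutex):

using (var mutex = new Mutex(false, @"Global\MyApp.FileProcessing"))
{
    mutex.WaitOne();
    try
    {
        // process and delete the file here
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}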
I have a web application to return images to my frontend.
When a request is made for a particular image, the application checks whether the image already exists on disk; if it exists, the image is returned.
My problem starts when the image does not exist on disk and two requests arrive at the same time for the same missing image: two threads then try to create the same file on disk at once.
To solve the problem, I tried creating a Mutex around the creation of the image on disk. But that had a problem of its own: because the server load is enormous, with a large number of simultaneous requests, the server crashes.
I would like to ask for your ideas to solve this problem, or what you would do instead.
Thank you.
You could try the following pattern:
1. Try to read the image (if it succeeds, you're done)
2. Try to create the image with a write lock
3. Only on a "file in use" exception, wait a small delay (milliseconds)
4. Go back to step 1 (retry)
Make the delay really small, just a tiny bit larger than the time it should take to create an image.
Implement a retry limit, max 3 times or so.
This would allow you to make use of the already existing (file) locking mechanism.
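A rough sketch of that retry loop (the method name, the render delegate and the 3-attempt/50 ms limits are assumptions; requires System, System.IO and System.Threading):

static byte[] GetOrCreateImage(string path, Func<byte[]> render)
{
    for (var attempt = 0; attempt < 3; attempt++)
    {
        try
        {
            if (File.Exists(path))
                return File.ReadAllBytes(path);               // step 1: try to read

            // step 2: try to create the file with an exclusive write lock
            using (var fs = new FileStream(path, FileMode.CreateNew,
                                           FileAccess.Write, FileShare.None))
            {
                var bytes = render();                         // generate the image
                fs.Write(bytes, 0, bytes.Length);
                return bytes;
            }
        }
        catch (IOException)
        {
            Thread.Sleep(50);   // step 3: another request has the file - retry shortly
        }
    }
    throw new IOException("Image could not be read or created after 3 attempts.");
}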
You can call the open function with the O_CREAT and O_EXCL flags. The first process's open call will get exclusive access to create the file, and it will start downloading the image. The subsequent processes' open calls will fail because the file already exists, and errno will be set to EEXIST.
Based on your design, the subsequent processes can either wait for complete file creation or can return back.
fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
I need to write an application that polls a directory which contains images on a file server and display 4 at a time.
This application will be run up to 50 times across the network at the same time.
I'm trying to think of the best architecture to complete this requirement.
I was working on the idea of opening a file with read/write access and no file sharing allowed, so that if another PC came in to read it, it would get an error and have to move on to the next one. The problem is that I need to access all 4 images in sequence on the same PC while ensuring other PCs don't try to open them. For example, if PC1 opens 1.jpg it needs to be able to open 1, 2, 3 and 4.jpg; if another PC comes in at the same time to read them, I need a way for it to open 5, 6, 7 and 8.jpg instead, and so on.
It seems a simple requirement but a nightmare to try and build successfully.
You're basically dealing with a race condition here, and I don't see a way to handle it from separate instances of your application running on separate machines unless you can guarantee that your file naming always follows a standard convention that lets you work with a sequence of 4 files using only the name of the first.
The best way to handle this would be using a centralized resource to manage access to your files, either a database as was suggested in a comment or else a service (such as WCF) that would "hand out" each set of 4 files.
What about creating a 1.jpg.lock file? The presence of the file indicates that the images are locked and any other instance of the application should skip that set.
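A hedged sketch of that idea (the file names and the helper are illustrative): whichever PC manages to create the lock file first owns the set, and everyone else moves on to the next four images.

static bool TryClaimSet(string firstImagePath)
{
    try
    {
        // FileMode.CreateNew fails if the lock file already exists,
        // so creating it is an atomic "claim" of the set.
        using (new FileStream(firstImagePath + ".lock", FileMode.CreateNew,
                              FileAccess.Write, FileShare.None)) { }
        return true;
    }
    catch (IOException)
    {
        return false; // another instance already claimed this set
    }
}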
When I call FileInfo(path).LastAccessTime or FileInfo(path).LastWriteTime on a file that is in the process of being written, it returns the time the file was created, not the last time it was written to (i.e. now).
Is there a way to get this information?
Edit: To all the responses so far: I hadn't tried Refresh(), but that does not do it either. I am still returned the time the file write was started. The same goes for the static method, and for creating a new instance of FileInfo.
Codymanix might have the answer, but I'm not running Windows Server (using Windows 7), and I don't know where the setting is to test.
Edit 2: Nobody finds it interesting that this function doesn't seem to work?
The FileInfo values are only loaded once and then cached. To get the current value, call Refresh() before getting a property:
var f = new FileInfo(path);
f.Refresh();
t = f.LastAccessTime;
Another way to get the current value is by using the static methods on the File class:
t = File.GetLastAccessTime(path);
Starting in Windows Vista, last access time is not updated by default. This is to improve file system performance. You can find details here:
http://blogs.technet.com/b/filecab/archive/2006/11/07/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx
To reenable last access time on the computer, you can run the following command:
fsutil behavior set disablelastaccess 0
As James has pointed out, LastAccessTime is not updated.
LastWriteTime has also undergone a twist since Vista. While the writing process still has the file open, another process checking LastWriteTime will not see the new write time for a long time -- not until the writing process has closed the file.
As a workaround you can open and close the file from your external process. After you have done that, you can read LastWriteTime again and it will be up to date.
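A small sketch of that workaround (the path variable and the sharing flags are assumptions about the observing process):

// Opening and closing a handle from the observing process forces the
// metadata to be refreshed before LastWriteTime is read again.
using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
}
var lastWrite = File.GetLastWriteTime(path); // now reflects the recent writes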
File System Tunneling:
If an application implements something like a rolling logger, which closes the file and then renames it to a different file name, you will also run into issues, since the creation time and file size of the "old" file are remembered by the OS even though you created a new file. This includes wrong reports of the file size, even if you recreated log.txt from scratch and it is still 0 bytes in size. This feature is called OS file system tunneling, and it is still present on Windows 8.1. For an example of how to work around this issue, check out RollingFlatFileTraceListener from Enterprise Library.
You can see the effects of file system tunneling on your own machine from the cmd shell.
echo test > file1.txt
ren file1.txt file2.txt
Wait one minute
echo test > file1.txt
dir /tc file*.txt
...
05.07.2015 19:26 7 file1.txt
05.07.2015 19:26 7 file2.txt
The file system is a state machine. Keeping states correctly synchronized is hard if you care about performance and correctness.
This strange tunneling syndrome is apparently still relied upon by applications which, for example, autosave a file, move it to a safe location and then recreate the file at the same location. For these applications it makes sense to keep the original creation date, because the file was only copied around. Some installers also use such tricks, moving files temporarily to a different location and writing the contents back later, to get past file-exists checks in some install hooks.
Have you tried calling Refresh() just before accessing the property (to avoid getting a cached value)? If that doesn't work, have you looked at what Explorer shows at the same time? If Explorer is showing the wrong information, then it's probably something you can't really address - it might be that the information is only updated when the file handle is closed, for example.
There is a setting in Windows, often enabled on server systems, that stops the modified and accessed times of files from being updated, for better performance.
From MSDN:
When first called, FileSystemInfo calls Refresh and returns the cached information on APIs to get attributes and so on. On subsequent calls, you must call Refresh to get the latest copy of the information.
FileSystemInfo.Refresh()
If your application is the one doing the writing, I think you are going to have to "touch" the file by setting the LastWriteTime property yourself between each buffer of data you write. Some pseudocode:
while (bytesWritten < totalBytes)
{
    stream.Write(buffer, 0, buffer.Length);   // write the next chunk
    bytesWritten += buffer.Length;
    myFileInfo.LastWriteTime = DateTime.Now;  // "touch" the file
}
I'm not sure how severely this will affect write performance.
Tommy Carlier's answer got me thinking....
A good way to visualise the difference is to separately run the two snippets below (I just used LINQPad) while also running Sysinternals Process Monitor.
while (true)
    File.GetLastAccessTime([file path here]);
and
FileInfo bob = new FileInfo(path);
while (true)
{
    string accessed = bob.LastAccessTime.ToString();
}
If you look at Process Monitor while running the first snippet, you will see repeated and constant access attempts to the file from the LINQPad process. The second snippet does an initial access of the file, for which you will see activity in Process Monitor, and then very little afterwards.
However, if you go and modify the file (I just opened the text file I was monitoring using FileInfo, added a character and saved), you will see a series of access attempts by the LINQPad process to the file in Process Monitor.
This illustrates the non-cached and cached behaviour of the two approaches respectively.
Will the non-cached approach wear a hole in the hard drive?!
EDIT
I went away feeling all clever over my testing and then used the caching behaviour of FileInfo in my Windows service (basically to sit in a loop asking 'has the file changed, has the file changed...' before doing processing).
While this approach worked on my dev box, it did not work in the production environment, i.e. the process just kept running regardless of whether the file had changed or not. I ended up changing my approach and just used GetLastAccessTime as part of the check. I don't know why it behaves differently on the production server... but I am not too concerned at this point.
Recently I was working on displaying workflow diagram images in our web application. I managed to use the rehosted WF designer and create images on-the-fly on the server, but imagining how large the workflow diagrams can very quickly become, I wanted to give a better user experience by using some ajax control for displaying images that would support zoom & pan functionality.
I happened to come across the website of Seadragon, which seems to be just an amazing piece of work that I could use. There is just one disadvantage: in order to use their library for generating Deep Zoom versions of images I have to use the file structure on a server. Because of the temporary nature of the images I am using (workflow diagrams with progress indicators), it is important not only to be able to create such images but also to get rid of them after some time.
Now the question is how I can best ensure that the temporary image files and the folder hierarchy can be created on the server (ASP.NET web app) and later cleaned up. I was thinking of using the cache functionality and, on expiration of the cache item, deleting the corresponding image folder hierarchy, or simply deleting the content of the whole temporary folder in the Application_Start and Application_End of Global.asax, but I'm not really sure whether this is a good idea and whether there are security restrictions or file-system-related problems. What do you think?
We do something similar for creating PDF reports and found the easiest way is to use a timestamp check to determine how "old" files are, and then delete them based on a period of time, in our case more than 2 hours old. This is done before the next PDF document is created, as part of the creation process. We also created a specific folder and gave the ASP.NET user read/write access to it.
The only disadvantage is that if the PDF creation process is not used regularly there will be a build-up of files; however, they will be cleaned up eventually. In two years and close to 4,000 PDFs we have yet to have an error doing it this way.
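A minimal sketch of that age-based cleanup, run just before creating the next document (the tempFolder variable and the *.pdf filter are assumptions; the 2-hour threshold comes from the description above):

var cutoff = DateTime.Now.AddHours(-2);
foreach (var file in Directory.GetFiles(tempFolder, "*.pdf"))
{
    if (File.GetCreationTime(file) < cutoff)
    {
        try { File.Delete(file); }
        catch (IOException) { /* file still in use - it will be removed next time */ }
    }
}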
Use the App_Data folder. This folder is inside your application and writable by your app without having to go outside the context of the app, but it's also secured from casual browsing. It's meant to hold data files for your application.
Application_Start and Application_End will only fire once each, so if you need better cleanup than that, I would consider using a cache structure or a simple windows service to handle the cleanup.
First, you have to make sure your IIS worker process has rights to write/delete files in your cache directory (and NOT in the rest of your site, just in case).
Second, I would stay away from using Application_Start and Application_End; Application_End is not 100% guaranteed to fire, so relying on it to clean up files could leave you with a growing pile of orphaned images.
I would instead create a scheduled process that runs maybe once per hour or once a day, depending on what you want, and have it check how old each image in your cache is; if it's older than your arbitrary expiry time, delete it.
Other than that there's not much to it.