I have an interesting problem: I've inherited a large brownfield code base.
The application runs on a schedule, takes in a large number of text data files, processes them, exports a report, and then cleans up.
A bug has been discovered whereby, during cleanup, some files are left in a locked state even though all file activity has long since gone out of scope. This stops the application from being able to delete them.
There are literally hundreds of IO and stream objects in use in this application, and I want to know where to start looking so I can avoid reviewing every instance of their use.
What are some good tools for investigating file locks in C# managed code, and how do you use them?
This normally happens when you forget to dispose the parent object that owns a file handle, e.g. you forget to call Close/Dispose on a FileStream. The finalizer will then clean up the file handle during the next full GC, once it is no longer referenced.
You can check with WinDbg whether you have SafeFileHandles sitting in the finalization queue, ready for finalization. A profiler that can track such things is, e.g., YourKit: with probes enabled it can search for files closed in the finalizer and give you the creation call stack, which lets you find the offending line in your code.
Check out the process Inspection tab of YourKit to find the probe check.
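To make the failure mode concrete, here is a minimal sketch (method names are illustrative) of the undisposed-stream pattern to grep for, and its fix:

// The bug: the stream is never disposed, so the OS file handle stays open
// until the finalizer runs during some later full GC. Until then, deletes fail.
void ProcessLeaky(string path)
{
    var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    // ... read from stream, no Close/Dispose ...
}   // handle still open here; File.Delete(path) throws IOException

// The fix: 'using' disposes the stream deterministically at the closing brace.
void ProcessFixed(string path)
{
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        // ... read from stream ...
    }   // handle closed here; File.Delete(path) succeeds
}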
You can monitor file access (read/write) using ProcMon from SysInternals.
It's not specific to C# but a general tool that can be used for many other things. Note that you can export the results to CSV and investigate them later.
You can use one of the following guides:
Detailed Windows I/O: Process Monitor - How to do simple file monitoring.
Using Process Monitor to Monitor File Access - a more detailed guide explaining how to export the results into a CSV file you can investigate later.
Edit:
I didn't find anything for this purpose, so if I were you I would inherit from the stream type being used and wrap it with logging logic.
This logging stream object, named LogStream for example, would write a log entry on entering each method, call the base implementation, and write another entry when done.
This way you can monitor the file access however you wish: for example, give each stream instance an Id using Guid.NewGuid(), log the thread Id using System.Threading.Thread.CurrentThread.ManagedThreadId, and so on.
This way you can identify the instances and investigate the calls step by step.
A good place to start is checking whether the numbers of stream opens and closes are equal; an exception may have bypassed one of the Dispose() calls.
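A minimal sketch of such a wrapper, assuming the code uses FileStream (the class name and log format here are illustrative):

public class LogStream : FileStream
{
    // per-instance id so open/close pairs can be matched up in the log
    private readonly Guid _id = Guid.NewGuid();

    public LogStream(string path, FileMode mode)
        : base(path, mode)
    {
        Log("Open", path);
    }

    protected override void Dispose(bool disposing)
    {
        Log("Dispose", Name);   // Name is the path the stream was opened on
        base.Dispose(disposing);
    }

    private void Log(string action, string path)
    {
        Console.WriteLine("{0} {1} thread={2} {3}", action, _id,
            System.Threading.Thread.CurrentThread.ManagedThreadId, path);
    }
}

Swap new FileStream(...) for new LogStream(...) at the call sites; any Open entry in the log with no matching Dispose points straight at the leaking instance.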
Related
Imagine I have a DLL containing a class that writes data to a file. Imagine also that the class uses locks to make sure no two threads write to that file at the same time.
Now imagine I have two different programs, A and B, which separately use this DLL.
In both cases, the path of the file the data is written to is the same. Then this is not thread safe anymore, because both programs are writing to the same file, am I right? And the locks I mentioned in the DLL help only when it is used from program A ONLY, and not from program B simultaneously, am I right?
To put the question a different way: my point is basically that if there were ONLY ONE copy of the DLL loaded, regardless of how many different programs were using it, then I would be on the safe side, because the locks I had in the DLL would work, and not let threads from all these programs write to the file in an out-of-sync way. Am I right?
Well, the code runs thread-safe in the sense that a thread of program A cannot change the state of a thread in program B. You'll have no problems with variables and objects being accessed simultaneously.
But you do have the problem of accessing a global resource that cannot, or should not, be shared. There are ways of synchronizing processes as well, e.g. with an IpcChannel (inter-process communication, internally using a named pipe), a named Mutex, or an EventWaitHandle.
You can also use the file itself as a synchronization object, if only a single operation on the file is needed. See the File.Open() method, especially with the FileShare.None option.
Simply try to access the file. If that succeeds, this process has access to the file; if not, opening the file will throw an exception. Wait until the file lock is released, then try again. A sketch:
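This is a minimal sketch of that wait-and-retry approach, using FileShare.None as mentioned above (the retry count and delay are arbitrary):

static FileStream OpenExclusive(string path)
{
    for (int attempt = 0; attempt < 10; attempt++)
    {
        try
        {
            // FileShare.None: fail if any other process has the file open
            return File.Open(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException)
        {
            System.Threading.Thread.Sleep(500);   // lock still held elsewhere; back off
        }
    }
    throw new IOException("Could not get exclusive access to " + path);
}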
At the moment you are locking an object. A .NET object is created in (virtual) memory. Virtual memory is not shared between processes, so each process will have a different object that it locks on. Therefore, synchronization between processes is not possible with the lock statement.
The code in your DLL will be executed in each process that loads it.
I'm assuming you mean the C# lock statement, which is process-local, so nothing stops both processes writing to the file simultaneously.
Your best bet is to let the file system handle it. Open the file in a mode that prevents other processes from writing to it:
// keep the returned stream open for as long as the lock must be held
FileStream stream = File.Open("test.txt", FileMode.Open, FileAccess.Write, FileShare.Read);
Where FileShare.Read specifies that subsequent attempts to open the file will succeed only for read access.
Where to begin....
I've inherited an application that searches for strings within files from a previous programmer (who left no documentation); it's using the EPocalipse.IFilter namespace. It has a few issues, the first of which is that the VS project is missing FilterReader.cs, FilterLoader.cs, and others I believe are required for EPocalipse IFilters (based on my research). The second is that the app (when built) hangs on ReadToEnd() when run against certain files.
I found this thread here:
TextReader Read and ReadToEnd hangs without throwing exception
Which was awesome...except no posted solution was given =(
Since I have this issue and others, I figured I'd start a new thread, because I first want to ensure IFilter is installed properly. The project builds, but still hangs on certain files (usually MS Excel).
For example, if I try to "Go to Definition" in Visual Studio for my instantiation of FilterReader, it simply shows the tab "FilterReader [from metadata]". So I'm assuming the FilterReader.cs file is simply missing (it's nowhere in the project's Solution Explorer either), which may be the cause of the hanging problem as well?
Any help is greatly appreciated.
SK
For detailed info on the subject, take a look at this article [CodeProject]
As for the hanging issue, it cannot be easily solved. Basically, there are two possible approaches:
Apply infinite-cycle checks like those in the thread you've found. However, some extremely complex docs may still hang inside the IFilter, and you can do nothing about it (IFilters are COM components, usually closed-source).
Make your extraction two-threaded, as sketched below: one thread monitors the extraction process and stops it when it times out, and another does the actual extraction. Should you choose this path, remember that you'll likely run into access-violation exceptions, since the EPocalipse implementation has no COM protection for multi-threaded access to IFilters.
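A rough sketch of the second approach, where ExtractText is a hypothetical wrapper around the EPocalipse FilterReader call (the timeout handling is the point here, not the extraction itself):

static string ExtractWithTimeout(string path, TimeSpan timeout)
{
    string result = null;
    var worker = new System.Threading.Thread(() => result = ExtractText(path));
    worker.IsBackground = true;   // a hung worker won't keep the process alive
    worker.Start();
    if (!worker.Join(timeout))
        throw new TimeoutException("IFilter extraction hung on " + path);
    return result;
}

Note that the abandoned worker may still be stuck inside the COM component afterwards, which is exactly where the access-violation caveat above comes in.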
I have an application which can only have one instance running at a time; however, if a second instance is launched, that event needs to be logged to a common log file that the first instance could also be using.
I have the check for how many instances are running. I was initially planning to simply log it to the event logger, but the application can run in user or system context, and exceptions are thrown when attempting to query the event log source as a user, so that idea is scrapped: the security logs are inaccessible to the user.
So I want to find out the safest method of having two separate instances of the same application write to a log file, ensuring both get an opportunity to write to it.
I would prefer not to use an additional framework if avoidable.
Any help appreciated.
A Mutex can be used for interprocess synchronization of a shared resource such as a log file.
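Here's a minimal sketch of the idea (the mutex name and log format are illustrative, not anything standard):

static void WriteLog(string logPath, string message)
{
    // same name in both instances => same OS-level mutex;
    // the Global\ prefix makes it visible across sessions (user vs. system context)
    using (var mutex = new System.Threading.Mutex(false, @"Global\MyAppLogMutex"))
    {
        mutex.WaitOne();
        try
        {
            // FileShare.Read so the other instance can still open the file
            using (var stream = new FileStream(logPath, FileMode.Append,
                                               FileAccess.Write, FileShare.Read))
            using (var writer = new StreamWriter(stream))
            {
                writer.WriteLine("{0:o} [pid {1}] {2}", DateTime.Now,
                    System.Diagnostics.Process.GetCurrentProcess().Id, message);
            }
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
}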
You could always write to the system event log. No locking or anything is needed, and the Event Viewer is more robust than some give it credit for.
In response to your comment: another user asked about write permissions for the event log here on SO. The answer linked to the MSDN article that describes how to set that up.
See that question here.
You can dodge the problem if you prefer...
If this is a windows app, you can send the first instance a message and then just quit. On receiving the message, the original instance can write to the log file without any issues.
Why not use the syslog protocol? It lets you deliver the logs in a very standards-based and flexible manner. The protocol itself is quite simple, and there are plenty of examples on the net, e.g. here. If your app is destined for enterprise use, having a standard way of logging could be a big plus. (And you do not need to maintain the files either; that becomes the job of specialized software that does just that.)
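For illustration, a bare-bones RFC 3164-style sender over UDP (the host, port 514, the facility/severity, and the "myapp" tag are all assumptions to adjust for your environment):

static void SendSyslog(string message)
{
    using (var udp = new System.Net.Sockets.UdpClient("localhost", 514))
    {
        // <134> = facility local0 (16) * 8 + severity Informational (6)
        string line = string.Format("<134>{0:MMM dd HH:mm:ss} {1} myapp: {2}",
            DateTime.Now, Environment.MachineName, message);
        byte[] bytes = System.Text.Encoding.ASCII.GetBytes(line);
        udp.Send(bytes, bytes.Length);
    }
}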
One way to hack it would be to memory-map the log file. That way, both instances of the application are sharing the same virtual memory image of the file. Then there are a number of ways of implementing a mutex inside the file.
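A sketch of the mapping half (requires .NET 4's System.IO.MemoryMappedFiles; the map name, capacity, and offset handling are illustrative, and the in-file mutex is left out, since the named Mutex from the earlier answer is far simpler):

static void WriteViaMapping(string logPath, byte[] data, long offset)
{
    // both processes map the same file and see one shared image of it
    using (var map = System.IO.MemoryMappedFiles.MemoryMappedFile.CreateFromFile(
        logPath, FileMode.OpenOrCreate, "MyAppLogMap", 1024 * 1024))
    using (var view = map.CreateViewAccessor())
    {
        view.WriteArray(offset, data, 0, data.Length);
    }
}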
Is there a way to bypass or remove the file lock held by another thread without killing the thread?
I am using a third-party library in my app that performs read-only operations on a file. I need a second thread to read the file at the same time, to extract some extra data the third-party library does not expose. Unfortunately, the third-party library opened the file with a read/write lock, and hence I am getting the usual "The process cannot access the file ... because it is being used by another process" exception.
I would like to avoid pre-loading the entire file with my thread, because the file is large and that would cause unwanted delays in loading and excess memory usage. Copying the file is not practical due to the size of the files. During normal operation, two threads hitting the same file would not cause any significant IO contention/performance problems. I don't need perfect time-synchronization between the two threads, but they need to be reading the same data within half a second of each other.
I cannot change the third-party library.
Are there any work-arounds to this problem?
If you start messing with the underlying file handle you may be able to unlock portions; the trouble is that the thread accessing the file is not designed to handle this kind of tampering and may end up crashing.
My strong recommendation would be to patch the third party library, anything you do can and probably will blow up in real world conditions.
In short, you cannot do anything about the locking of the file by a third party. You could get away with Richard E's answer above, which mentions the Unlocker utility.
Once the third-party opens a file and sets the lock on it, the underlying system will give that third-party a lock to ensure no other process can access it. There are two trains of thought on this.
Use DLL injection to patch the code and explicitly set or unset the lock. This can be dangerous, as you would be messing with another process's stability and could end up crashing it. Bear in mind that the underlying system keeps track of the files opened by a process; injecting a DLL and patching the code requires the technical knowledge to determine which process to inject into at run time and to alter the flags when intercepting the Win32 API call OpenFile(...).
Since this was tagged .NET: disassemble the third-party assembly into .il files, alter the lock flag to shared, and rebuild the library by recompiling the .il files back into a DLL. This, of course, requires rooting around in the code to find where the file is opened, in some class somewhere.
Have a look at the podcast here, and at the post here that explains how to do the second option highlighted above.
Hope this helps,
Best regards,
Tom.
This doesn't address your situation directly, but a tool like Unlocker achieves what you're trying to do, albeit via a Windows UI.
Any low-level hack to do this may result in a crashed thread, file corruption, etc.
Hence I thought I'd mention the next best thing: just wait your turn and poll until the file is no longer locked: https://stackoverflow.com/a/11060322/495455
I don't think this second piece of advice will help, but the closest thing (that I know of) would be DemandReadFileIO:
IntSecurity.DemandReadFileIO(filename);
internal static void DemandReadFileIO(string fileName)
{
    // internal .NET Framework helper (IntSecurity): resolve the full path,
    // then demand read permission for it (UnsafeGetFullPath is also internal)
    string full = UnsafeGetFullPath(fileName);
    new FileIOPermission(FileIOPermissionAccess.Read, full).Demand();
}
I do think this is a problem that can be solved with C++. It is annoying, but at least it works (as discussed here: win32 C/C++ read data from a "locked" file).
The steps are:
Open the file, before the third-party library does, with _fsopen and the _SH_DENYNO flag
Open the file with the third-party library
Read the file within your code
You may be interested in these links as well:
Calling C++ from C# (Possible to call C++ code from C#?)
The inner link from that post, with a sample (http://blogs.msdn.com/b/borisj/archive/2006/09/28/769708.aspx)
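If you'd rather stay in C#, the managed approximation of step 1 is to grab your own handle with maximally permissive sharing before the library opens the file. Treat this as an experiment rather than a fix: whether the library's subsequent open still succeeds depends entirely on the share flags it requests.

// Open FIRST, denying nothing (the C# analogue of _fsopen with _SH_DENYNO);
// 'path' is the file in question. If the third-party library later demands
// exclusive access, its open will fail instead of yours, so this only helps
// when the library's own share flags are cooperative.
var early = new FileStream(path, FileMode.Open, FileAccess.Read,
                           FileShare.ReadWrite | FileShare.Delete);
// ... hand the path to the third-party library, then read via 'early' ...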
Have you tried making a dummy copy of the file before your third-party library gets hold of it, then using the actual copy for your manipulations? Logically, this would only be worth considering if the file in question is fairly small, but it is a kind of cheat. :) Good luck.
If the file is locked and isn't being used, then you have a problem with the way your file locking/unlocking mechanism works. You should only lock a file when you are modifying it, and should then immediately unlock it to avoid situations like this.
I'm writing a mini editor component, much like Notepad++ or UltraEdit, that needs to monitor the files the users open. It's a bit slimy, but that's the way it needs to be.
Is it wise to use multiple instances of FileSystemWatcher to monitor the open files, again like Notepad++ or UltraEdit, or is there a better way to manage them?
They'll be properly disposed once the document has been closed.
Sorry, one other thing: would it be wiser to create a generic FileSystemWatcher for the drive and monitor that, then only show the user a message to reload the file once I know it's the right file? Or is that a silly idea?
You're not going to run into problems with multiple FileSystemWatchers, and there really isn't any other way to pull this off.
For performance, just be sure to specify filters as narrow as you can get away with. For example:
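Here is a minimal per-file watcher sketch, narrowed to a single document ('path' is the current document's path, and PromptReload is a placeholder for however you ask the user to reload):

var watcher = new FileSystemWatcher(Path.GetDirectoryName(path))
{
    Filter = Path.GetFileName(path),                        // this file only
    NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
    EnableRaisingEvents = true
};
watcher.Changed += (s, e) => PromptReload(e.FullPath);
// dispose the watcher when the document is closed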
FileSystemWatcher has a drawback: it locks the watched folder, so, for example, if you are watching a file on removable storage, it prevents "safe device removal".
You can try using Shell Notifications via SHChangeNotifyRegister. In that case you will have one entry point for all changes (or several, if you want), but you will need some native shell interop.
It depends on the likely use cases.
If a user is going to open several files in the same directory, and likely not modify anything else, a single watcher for that directory may be less onerous than one per file when the number of files is large.
The only way you will find out is by benchmarking. Certainly, doing one per file makes the lifespan of the watcher much simpler, so that should be your first approach. Note that watchers fire their events on a system thread pool, so multiple watchers can fire at the same time (something that may influence your design).
I certainly wouldn't do a watcher per drive; you will cause far more overhead that way, even with aggressive filtering.
Using multiple watchers is fine if you have to. As ShuggyCoUk's comment says, you can optimize by combining file watchers into one if all your files are in the same folder.
It's probably unwise to create a file watcher on a much higher folder (e.g. the root of the drive), because now your code has to handle many more events fired by other changes happening in the file system, and it's fairly easy to overflow the watcher's internal buffer if your code is not fast enough to handle the events; see the sketch below.
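If you do watch a broad folder anyway, two mitigations worth knowing, reusing the watcher from the earlier sketch (64 KB is the documented maximum for the internal buffer):

watcher.InternalBufferSize = 64 * 1024;   // default is 8 KB; maximum is 64 KB
watcher.Error += (s, e) =>                // raised when the buffer overflows
    Console.WriteLine("Events were dropped, rescan needed: " + e.GetException());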
Another argument for fewer file watchers: a FileSystemWatcher is backed by a native object, and it pins memory. So, depending on the lifespan and size of your app, you might run into memory fragmentation issues. Here is how:
Your code runs for a long time (e.g. hours or days). Whenever you open a file, it creates some chunk of data in memory and instantiates a file watcher. You then clean up the temporary data, but the file watcher is still there. If you repeat that many times (and don't close the files, or forget to dispose the watchers), you have created multiple objects in virtual memory that cannot be moved by the CLR, and you can potentially run into memory congestion. This is not a big deal if you have a few watchers around, but if you suspect you might get into the hundreds or more, beware: it's going to become a major issue.