C# solution for analysing files as they are written/modified

I have several projects that require me to monitor files, and then edit them as they are getting written to disk. I have a feeling that what I am looking for is operationally the same as how anti-virus tools operate. Let me give more details:
1) I need to trap all files saved by Office applications, and then add specific company tags to the headers/footers of each document as they are getting written to disk.
2) I need to know immediately when an editable file (of pretty much any type) is written to disk, so that I can undertake some scanning operations to check whether the file's content meets certain company policies.
In short, you can see that I need to process any user files as they are being written to disk.
Here is my problem. I want to use C# for this task, but I am not sure if it has the ability to meet my requirements. Everything I have seen on the net is geared towards lower-level C programming, which I specifically want to avoid due to time constraints for this project. Is anyone aware of how to easily do this task in C#? Is it even feasible (i.e. is it too high-level a language, too slow a language, etc.)?

Performance won't be the issue. I guess I'd question the entire process- it sounds like a recipe for disaster. You can easily hack something together in C# using a FileSystemWatcher in a matter of minutes, but it will be fraught with issues. AV software is bad enough about locking files and screwing up various software, and it's not even trying to modify the file. How do you know when the other app is "done" writing the file? What do you do when you've got the file locked and something else breaks because it can't get access?

Have you looked at the FileSystemWatcher?

C# can easily do this. Look at the FileSystemWatcher class (http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx).
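As a rough sketch of what the FileSystemWatcher suggestions might look like in practice, the snippet below watches a directory tree and only hands a file to a callback once it can be opened exclusively. The class name `SavedFileWatcher`, the method names, and the retry counts are assumptions made up for illustration. The exclusive-open check is only a heuristic for "the other app is done writing", so it does not remove the locking concerns raised above.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper; the names and retry settings are illustrative only.
public static class SavedFileWatcher
{
    // Returns true once the file can be opened with no sharing. This is a
    // heuristic (not a guarantee) that the writing application has closed it.
    public static bool WaitUntilReleased(string path, int attempts, int delayMs)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                {
                    return true;
                }
            }
            catch (UnauthorizedAccessException)
            {
                return false; // a directory, or a file we are not allowed to read
            }
            catch (IOException)
            {
                Thread.Sleep(delayMs); // still locked (or not there yet); retry
            }
        }
        return false;
    }

    // Watches a directory tree and calls onReady for each created or changed
    // file that could be opened exclusively. Keep the returned watcher alive.
    public static FileSystemWatcher Watch(string directory, Action<string> onReady)
    {
        var watcher = new FileSystemWatcher(directory);
        watcher.IncludeSubdirectories = true;
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

        FileSystemEventHandler handler = (sender, e) =>
        {
            if (WaitUntilReleased(e.FullPath, 10, 200))
            {
                onReady(e.FullPath);
            }
        };
        watcher.Created += handler;
        watcher.Changed += handler;
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

A caller would do something like `var w = SavedFileWatcher.Watch(@"C:\Docs", path => Console.WriteLine(path));`. Retrying inside the event handler blocks the watcher's callback, so a real tool would more likely queue the paths and process them on another thread. A single save usually raises several Changed events as well, so expect duplicates.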

Related

*FASTEST* directory listing

I have massive directories, and I would like to read all the files as fast as I can. I mean, not DirectoryInfo.GetFiles fast, but 'get-clusters-from-disk-low-level' fast.
Of course, .NET 2.0, c#
Similar question was here, but this approach wasn't any good:
C# Directory listing massive directory
Someone suggested pInvoke on FindFirst/FindNext. Anybody tried that and is able to share results?
For a "normal" approach, basically everything boils down to FindFirstFile/FindNextFile, you don't really get much faster than that... and that isn't super-turbo-fast.
If you really need speed, look into reading the MFT manually - but know that this requires admin privileges, and is prone to break whenever NTFS gets updated (and, oh yeah, won't work for non-NTFS filesystems). You might want to have a look at this code which has USN and MFT stuff.
However, perhaps there's a different solution. If your app is running constantly and needs to pick up changes, you can start off by doing one slow FindFirstFile/FindNextFile pass, and then use directory change notification support to be informed of updates... that works for limited users, and doesn't depend on filesystem structures.
For the best performance, it is possible to P/Invoke NtQueryDirectoryFile, documented as ZwQueryDirectoryFile.
(That short of accessing the disk directly and reading the raw file system structures directly, which usually is not practical.)
Try using something like this DirectoryManager and refine it to suit your needs. It works faster than the .NET Framework GetDirectories() or GetFiles() because we omitted the cross-platform checks and adaptations.
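For reference, the P/Invoke route on FindFirstFile/FindNextFile mentioned in the answers can be sketched roughly as follows. This is a minimal Windows-only illustration, not a benchmarked implementation: the class name `NativeDirectoryLister` and the method `ListFileNames` are made up here, and only the kernel32 functions and the `WIN32_FIND_DATA` layout come from the Win32 API.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Hypothetical wrapper; Windows-only because it calls into kernel32.dll.
public static class NativeDirectoryLister
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Returns the names (not full paths) of the files directly inside 'directory'.
    public static List<string> ListFileNames(string directory)
    {
        var names = new List<string>();
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(directory, "*"), out data);
        if (handle == InvalidHandle)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        try
        {
            do
            {
                // Skip subdirectories, which also covers the "." and ".." entries.
                bool isDirectory = (data.dwFileAttributes & FileAttributes.Directory) != 0;
                if (!isDirectory)
                {
                    names.Add(data.cFileName);
                }
            } while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
        return names;
    }
}
```

Whether this beats the framework methods depends on the runtime version and on what is done with each entry, so it is worth measuring against your own directories before committing to it.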

How to determine who changed a file?

In Windows, how can I programmatically determine which user account last changed or deleted a file?
I know that setting up object access auditing may be an option, but if I use that I then have the problem of trying to match up audit log entries to specific files... sounds complex and messy! I can't think of any other way, so does anyone either have any tips for this approach or any alternatives?
You can divide your problem into two parts:
Write to a log whenever a file is accessed.
Parse, filter and present the relevant information of the log.
Of those two, part 1 (writing to the log) is a built-in function through auditing, as you mention. Reinventing that would be hard, and the result would probably never be as good as the built-in functionality.
I would use the built in functionality for logging by setting up an audit ACL on those files. Then I would focus my efforts on providing a good interface that reads the event log, filters out relevant events and presents them in a way that is suitable and relevant for your users.
You could always create a file system filter. This might be overkill, but it depends on your purposes. You can have it load at boot, and it sits behind pretty much every file access (it's what virus scanners usually use to scan files as they are accessed).
Simply need to log the "owner" of the application that is writing to the file.
Also see the MSDN documentation
The only way I know of to do this is to set up a FileSystemWatcher and keep it running. Oh, and if it's across a network drive, it may randomly lose connection, so it may be good to force a disconnect/reconnect every few hours just to make sure it has a fresh connection.

Shredding files in .NET

Is there a SDK that can be used in managed code to shred files securely?
EDIT: This is the only link I could find on Google that helps me
EDIT: Either SDK or some kind of COM based component.
This code from codeproject may be a good starting point.
Eraser has been around for years, you could call out to it by using System.Diagnostics.Process, or at least review the algorithm there.
Take a look at Windows.WinAny.Helper on CodePlex. It has a SecureDelete extension which allows you to shred files with different algorithms like Gutmann, DoD-7, DoD-3, Random or Quick.
Technology has changed in the past few years so when I happened to see this answer (why wasn't an answer accepted again?) I wanted to provide an update for others with similar questions.
Please note that shredding is very much filesystem and media dependent. Attempting to "shred" a file on a log-based filesystem or a filesystem stored on smart (wear-leveling) flash isn't going to get you very far. You would have to, at a minimum, write enough data to completely fill the device to hope that the old data might be overwritten one time.
More likely you would have to write several smaller files and when you get FS full, delete one and then keep writing a new one, to ensure that all reserved space has been overwritten as well. Then you will probably be fairly safe. Probably.
I say probably because the storage media/FS could decide that a block was failing (or used too much relatively) and map it away substituting some other part of the disk instead. This is a per-block thing of course, so any much larger file is unlikely to be reconstructed.
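For completeness, the basic overwrite-then-delete idea behind most of these tools can be sketched as below. This is a naive illustration only: the class name `NaiveShredder` and the pass count are assumptions, it is not one of the named algorithms (Gutmann, DoD), and for the reasons given above it offers no guarantee on journaling filesystems or wear-leveled flash.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper showing an in-place overwrite with random bytes.
public static class NaiveShredder
{
    public static void OverwriteAndDelete(string path, int passes)
    {
        long length = new FileInfo(path).Length;
        byte[] buffer = new byte[64 * 1024];

        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            for (int pass = 0; pass < passes; pass++)
            {
                stream.Position = 0;
                long remaining = length;
                while (remaining > 0)
                {
                    int chunk = (int)Math.Min(buffer.Length, remaining);
                    rng.GetBytes(buffer);           // fresh random data per chunk
                    stream.Write(buffer, 0, chunk); // overwrite in place, same length
                    remaining -= chunk;
                }
                stream.Flush(true); // ask the OS to push this pass to the device
            }
        }
        File.Delete(path);
    }
}
```

Calling `NaiveShredder.OverwriteAndDelete(path, 3)` overwrites the file's current extent three times and then deletes it. Whether the overwritten bytes land on the same physical blocks is up to the filesystem and the storage controller.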

C# Creating a log system

I was reading the following article:
http://odetocode.com/articles/294.aspx
This article raised a lot of questions for me regarding logs.
(I don’t know if I should have made this in separated questions… but I don’t want to spam stackoverflow.com with questions of mine)
The 1st one is if I should store it in a .txt, or .xml file… or even in a table inside the database.
Saving to a .txt file will probably be better for performance. But when someone needs to find something in the .txt file, it may become a pain in the... neck.
So… which one should I use, and why?
The second one, is there any specific class to deal with “log” thing?
I have read several threads about this subject, and I didn’t find the answers to my questions.
Thanks in advance.
The easiest approach I've taken in the past is using log4net. That way you can configure the logging in the config file. If you need it to go to a database, set it up as such. If you want to be notified when a major error occurs, set it up that way.
As far as sorting through the logs, it really depends on the approach you want to take, and how much you plan on logging. Normally I log to a flat text file as I don't enable a lot of logging in my applications. So parsing through them isn't a big deal.
Unless you want to write a system for education purposes, I honestly think that you'd be best off sticking with log4net or nlog.
And further, you would probably be better off studying the code to those systems instead of writing your own.
As to your question, I would stick to a text file and buffer the messages before spitting them to disk.
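If you do roll your own for learning purposes, the "text file plus buffering" idea might look roughly like this. It is a minimal sketch under stated assumptions: the class name `BufferedFileLogger`, the line format, and the flush threshold are all made up for illustration and are not taken from log4net or NLog.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical logger: keeps entries in memory and appends them to a text
// file in batches, one line per entry.
public sealed class BufferedFileLogger : IDisposable
{
    private readonly object _sync = new object();
    private readonly List<string> _pending = new List<string>();
    private readonly string _path;
    private readonly int _flushThreshold;

    public BufferedFileLogger(string path, int flushThreshold)
    {
        _path = path;
        _flushThreshold = flushThreshold;
    }

    public void Log(string level, string message)
    {
        string line = string.Format("{0:o} [{1}] {2}", DateTime.UtcNow, level, message);
        lock (_sync)
        {
            _pending.Add(line);
            if (_pending.Count >= _flushThreshold)
            {
                FlushLocked();
            }
        }
    }

    public void Flush()
    {
        lock (_sync)
        {
            FlushLocked();
        }
    }

    // Caller must hold _sync.
    private void FlushLocked()
    {
        if (_pending.Count == 0)
        {
            return;
        }
        File.AppendAllLines(_path, _pending);
        _pending.Clear();
    }

    // Writes out anything still buffered.
    public void Dispose()
    {
        Flush();
    }
}
```

The trade-off with buffering is that entries still in memory are lost if the process dies before a flush, which matters most for exactly the crash logs you care about. Flushing immediately on error-level messages is one common compromise.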
Why bother reinventing the wheel? You can check out the MS Enterprise Library Logging Block.
Definitely not XML.
With XML, you will need to read it all, parse it, add whatever, then generate the whole XML again and write it back to the hard disk, every single time you log something.
Unless, of course, you append the nodes to the XML file manually, in which case you lose most of XML's advantages.
Warnings to fatal errors, whatever will help you debug the application if it crashes: those logs I would store in a txt file.
Append a new line for every entry.
This way you can also ask your user to check it (if you are assisting them over the phone).
If it's not a meta log such as the one mentioned above, in other words, if it's anything related to the program itself that you may need to analyze, keep it in the db.
Regarding file vs database, it's up to you to choose.
File logs give greater performance but with pain of access.
If the logs are there just to rarely provide information (e.g. the app crashes and you need to know why), you're better off storing the logs in a file.
If you want to give access to those logs, analyze them, etc, you should store them in a database.
.net is really not my zone, but there are lots of reasons why you should use the framework's logging classes.
For my apps I have chosen to write to db. Its easier (for me) to read the logs this way. However I do not go log crazy as some people do, I only log what I need to log and nothing else.
I gave log4net a shot not too long ago and did not like it at all. It was a whole lot of junk just to write to a db and send an email. I ended up writing a custom logging class; it was a whole ~200 lines and took just a few hours. It works great, I don't have another dependency, and it can be easily changed.
If you're dealing with ASP.NET, ELMAH is another good logging tool. It's apparently what Microsoft's Scott Hanselman uses.
It does need some additional code to get it to work with ASP.NET MVC's HandleError attribute, though.
NLog and log4net both provide a rich logging API, but neither addresses the challenges you face managing and analyzing all the data in your log files.
If you're willing to consider a commercial tool, take a look at GIBRALTAR - it works with NLog and log4net and also collects useful performance metrics. Most importantly, GIBRALTAR provides great tools for managing and analyzing logs.

Encrypt my framework and code

I am creating my own CMS framework, because many of my clients have the same requirements, like a news module, a newsletter module, etc.
That part is going fine. The only thing bothering me is that if a client wants to move off my server, he will ask me to give him his files, and if I do, whoever takes over will see all my code, use it and benefit from it. That is bad for me: I spent all this time creating my system, and anyone could easily see the code and all the logic behind it, and could easily work out how my other clients' sites work, which is a threat to me. Finally, I am using third-party controls whose licenses I have paid for, and I don't want to hand them over on a golden plate.
What is the best way to solve this? I thought of encryption, but how can I do that, and how effective is it?
- Should I merge all my CS files and the DLLs in the bin folder into one DLL and encrypt it, and how can I do that?
I really appreciate any help on this matter, as it is crucial for me.
You should read these:
Best .NET obfuscation tools/strategy
How effective is obfuscation?
In my experience, this is rarely worth the effort. Lots of companies who provide libraries like this don't bother obfuscating their code (Telerik, etc).
Especially considering what you are writing (CMSes are everywhere), you'd likely see more benefit from your time spent implementing features that put your product/implementation in a competitive advantage and make companies see that the software you are capable of writing has value, rather than the code itself.
In the end, you want to ensure you are a key factor in making software work for a company, not the DLLs you give them.
You'll need to precompile your site and obfuscate dlls.
Visual Studio has something like Dotfuscator Community Edition shipped with it. You could give it a try.
Of course, HTML output, CSS declarations, database structure and stored procedures code cannot be encrypted.
You can, however, try to compress the CSS, which will also reduce its readability by humans.
Check here: The best approach to scramble CSS definitions to a human-unreadable state throughout an ASP.NET application
One other idea would be to use a frame in your HTML and put most of the site's pages inside it. This way, they will not be visible when doing "View source".
Or just state it clearly that you offer whatever you're doing as a service and do not provide source codes of your work. I somehow doubt salesforce would be willing to give their sources to anyone who asks.
