In C# or JScript.NET, is it necessary (or better) to delete files created in the folder returned by System.IO.Path.GetTempPath(), or is it acceptable to wait for the system to delete them (if that ever happens)?
In terms of how your program will work, it makes no difference.
The issue is one of cluttering the filesystem with many files in a folder - the more files in a folder, the slower it will be to read it.
There is also a chance of running out of disk space if the files are not deleted.
In general, if your software creates temporary files, it should clean up after itself (that is, delete the files once they are no longer in use).
In my opinion, you always clean up the mess you create. "Temp" files tend to hang around for a long time unless the user is savvy enough to use cleanup tools like CCleaner.
Note: This does add the complexity (and potential bugs) of having to remember all the temp files you've created in order to clean them up, but I think it's worth it. This answer by Mr. Gravell has an easy way to take care of that issue.
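For illustration, something along these lines works; the TempFileTracker class below is just a sketch (not the approach from the linked answer) that remembers every temp file it hands out and deletes them all when disposed:

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Illustrative helper: tracks every temp file it creates and
    // deletes them all when the application is done with them.
    sealed class TempFileTracker : IDisposable
    {
        private readonly List<string> _files = new List<string>();

        public string CreateTempFile()
        {
            // GetTempFileName creates a zero-byte file under GetTempPath() and returns its path.
            string path = Path.GetTempFileName();
            _files.Add(path);
            return path;
        }

        public void Dispose()
        {
            foreach (string path in _files)
            {
                try { File.Delete(path); }
                catch (IOException) { /* file may still be in use; ignore or log */ }
            }
            _files.Clear();
        }
    }

Another option, if the file only needs to live as long as a single stream, is to open it with FileOptions.DeleteOnClose so Windows removes it for you when the handle is closed.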
Related
I have file content in string and I need to put the same content in 3 different files.
So I am using C#'s File.WriteAllText() to put the content into the first file.
Now, for the other 2 files, I have two options:
Using File.Copy(firstFile, otherFile)
Using File.WriteAllText(otherFile, content)
Performance-wise, which option is better?
If the file is relatively small, it is likely to remain in the Windows disk cache, so the performance difference will be small; it might even be that File.Copy() is faster, since the data is already cached and File.Copy() calls a Windows API that is extremely optimised.
If you really care, you should instrument it and time things, although the timings are likely to be completely skewed because of Windows file caching.
One thing that might be important to you, though: if you use File.Copy(), the file attributes, including the creation time, will be copied. If you programmatically create all the files, the creation time is likely to differ between them.
If this is important to you, you might want to programmatically set the file attributes after the copy so that they are the same for all files.
Personally, I'd use File.Copy().
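For completeness, a rough sketch of both options plus lining up the creation times afterwards (the file names are placeholders):

    using System;
    using System.IO;

    string content = File.ReadAllText("first.txt");       // the string you already have in memory

    // Option 1: let Windows copy the existing file
    // (attributes and timestamps come along for the ride)
    File.Copy("first.txt", "second.txt", overwrite: true);

    // Option 2: write the same string again
    File.WriteAllText("third.txt", content);

    // If identical timestamps matter, align them explicitly afterwards
    DateTime created = File.GetCreationTimeUtc("first.txt");
    File.SetCreationTimeUtc("second.txt", created);
    File.SetCreationTimeUtc("third.txt", created);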
I am relatively new to C# so please bear with me.
I am writing a business application (in C#, .NET 4) that needs to be reliable. Data will be stored in files. Files will be modified (rewritten) regularly, thus I am afraid that something could go wrong (power loss, application gets killed, system freezes, ...) while saving data which would (I think) result in a corrupted file. I know that data which wasn't saved is lost, but I must not lose data which was already saved (because of corruption or ...).
My idea is to have 2 versions of every file and each time rewrite the oldest file. Then in case of unexpected end of my application at least one file should still be valid.
Is this a good approach? Is there anything else I could do? (Database is not an option)
Thank you for your time and answers.
Rather than "always write to the oldest" you can use the "safe file write" technique of:
(Assuming you want to end up saving data to foo.data, and a file with that name contains the previous valid version.)
1. Write new data to foo.data.new
2. Rename foo.data to foo.data.old
3. Rename foo.data.new to foo.data
4. Delete foo.data.old
At any one time you've always got at least one valid file, and you can tell which is the one to read just from the filename. This is assuming your file system treats rename and delete operations atomically, of course.
If foo.data and foo.data.new exist, load foo.data; foo.data.new may be broken (e.g. power off during write)
If foo.data.old and foo.data.new exist, both should be valid, but something died very shortly afterwards - you may want to load the foo.data.old version anyway
If foo.data and foo.data.old exist, then foo.data should be fine, but again something went wrong, or possibly the file couldn't be deleted.
Alternatively, simply always write to a new file, including some sort of monotonically increasing counter - that way you'll never lose any data due to bad writes. The best approach depends on what you're writing though.
You could also use File.Replace for this, which basically performs the last three steps for you. (Pass in null for the backup name if you don't want to keep a backup.)
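As a rough sketch (the SafeSave name and the byte[] payload are just illustrative), the whole dance could look like this:

    using System.IO;

    // Save routine following the rename scheme above; File.Replace does the
    // two renames and the delete in one call.
    static void SafeSave(string path, byte[] data)
    {
        string newFile = path + ".new";
        string backup  = path + ".old";

        File.WriteAllBytes(newFile, data);      // step 1: write the new version

        if (File.Exists(path))
        {
            // Steps 2-4; pass null instead of 'backup' if you don't want to keep foo.data.old.
            File.Replace(newFile, path, backup);
        }
        else
        {
            File.Move(newFile, path);           // first ever save: nothing to replace
        }
    }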
A lot of programs use this approach, but they usually keep more copies, to also guard against human error.
For example, Cadsoft Eagle (a program used to design circuits and printed circuit boards) keeps up to 9 backup copies of the same file, naming them file.b#1 ... file.b#9.
Another thing you can do to improve integrity is hashing: append a hash such as a CRC32 or MD5 at the end of the file.
When you open the file you check the CRC or MD5; if they don't match, the file is corrupted.
This will also protect you from people who accidentally or deliberately modify your file with another program.
It also gives you a way to tell whether the hard drive or USB disk got corrupted.
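As a sketch of the hashing idea (the helper names are made up, and a real implementation should also reject files shorter than the hash):

    using System;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;

    // Store the payload followed by its 16-byte MD5 hash, and refuse to load
    // the file if the hash no longer matches.
    static void SaveWithHash(string path, byte[] payload)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(payload);
            File.WriteAllBytes(path, payload.Concat(hash).ToArray());
        }
    }

    static byte[] LoadWithHash(string path)
    {
        byte[] all = File.ReadAllBytes(path);
        byte[] payload = all.Take(all.Length - 16).ToArray();
        byte[] stored  = all.Skip(all.Length - 16).ToArray();

        using (var md5 = MD5.Create())
        {
            if (!md5.ComputeHash(payload).SequenceEqual(stored))
                throw new InvalidDataException("File is corrupted: hash mismatch.");
        }
        return payload;
    }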
Of course, the faster the save operation is, the lower the risk of losing data, but you cannot be sure that nothing will happen during or after writing.
Consider that hard drives, USB drives and the Windows OS all use caches, which means that even after your write finishes, the OS or the disk itself may not have physically written the data yet.
Another thing you can do is save to a temporary file and, if everything is OK, move it into the real destination folder; this reduces the risk of ending up with half-written files.
You can mix all these techniques together.
In principle there are two popular approaches to this:
Make your file format log-based, i.e. do not overwrite in the usual save case, just append changes or the latest versions at the end.
or
Write to a new file, rename the old file to a backup and rename the new file into its place.
The first leaves you with (way) more development effort, but also has the advantage of making saves go faster if you save small changes to large files (Word used to do this AFAIK).
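A minimal sketch of the append-only idea, assuming each change can be serialized to a single line of text:

    using System;
    using System.IO;

    // Each change is appended as one line at the end of the log, so data that
    // was already saved is never rewritten. On load, a half-written last line
    // (from a crash mid-append) can be detected and discarded.
    static void AppendChange(string logPath, string serializedChange)
    {
        File.AppendAllText(logPath, serializedChange + Environment.NewLine);
    }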
I am writing a program that searches and copies mp3-files to a specified directory.
Currently I am using a List that is filled with all the mp3s in a directory (which takes - not surprisingly - a very long time.) Then I use taglib-sharp to compare the ID3Tags with the artist and title entered. If they match I copy the file.
Since this is my first program and I am very new to programming I figure there must be a better/more efficient way to do this. Does anybody have a suggestion on what I could try?
Edit: I forgot to add an important detail: I want to be able to specify what directories should be searched every time I start a search (the directory to be searched will be specified in the program itself). So storing all the files in a database or something similar isn't really an option (unless there is a way to do this every time which is still efficient). I am basically looking for the best way to search through all the files in a directory where the files are indexed every time. (I am aware that this is probably not a good idea but I'd like to do it that way. If there is no real way to do this I'll have to reconsider but for now I'd like to do it like that.)
You are mostly saddled with the bottleneck that is IO, a consequence of the hardware you are working with. It is the copying of files that dominates here (finding the files is dwarfed by the copying).
There are other ways to go about file management, and each exposing better interfaces for different purposes, such as NTFS Change Journals and low-level sector handling (not recommended) for example, but if this is your first program in C# then maybe you don't want to venture into p/invoking native calls.
Other than alternatives to actual processes, you might consider mechanisms to minimise disk access - i.e. not redoing anything you have already done, or don't need to do.
Use a database (a simple binary serialized file or an embedded database like RavenDb) to cache all files, and query that cache instead.
Also store modified time for each folder in the database. Compare the time in the database with the time on the folder each time you start your application (and sync changed folders).
That ought to give you much better performance. Threading will not really help searching folders since it's the disk IO that takes time, not your application.
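A rough sketch of that folder cache, assuming an in-memory dictionary that you serialize to disk between runs (note that a folder's last-write time only changes when files are added, removed or renamed directly inside it, so modified tags may need a per-file check):

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Only re-scan folders whose last-write time has changed since the cached scan.
    // 'cache' maps folder path -> (time of last scan, list of mp3 paths).
    static void RefreshFolder(string folder,
        Dictionary<string, Tuple<DateTime, List<string>>> cache)
    {
        DateTime lastWrite = Directory.GetLastWriteTimeUtc(folder);

        Tuple<DateTime, List<string>> entry;
        if (!cache.TryGetValue(folder, out entry) || entry.Item1 < lastWrite)
        {
            // Folder is new or has changed: re-read the mp3 list and update the cache.
            var files = new List<string>(Directory.GetFiles(folder, "*.mp3"));
            cache[folder] = Tuple.Create(lastWrite, files);
        }
    }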
I know how to wipe a file in C#, including its sectors and such.
But how do I overwrite the directories themselves?
Example: @"C:\mydirectory\" must be unrecoverably gone forever (all files inside are already wiped) so that it will be impossible to recover the directory structure or their names.
------------------- Update below (comment formatting is such a hassle so I post it here)---
For the file deletion I look up the partition's cluster and sector sizes and overwrite the file at least 40 times using 5 different algorithms, where the last algorithm is always the random one. The data is also actually written to disk each time (not just kept in memory or something). The only risk is that when I wipe something, the physical address on the disk of that file could theoretically have changed. The only solution I know for that is to also wipe the free disk space after the file has been wiped and hope that no other file currently partially occupies the old physical location of the wiped file. Or does Windows not do such a thing?
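For reference, a single overwrite pass in my code looks roughly like this (simplified sketch; the real code picks a different pattern per pass and sizes the buffer to the cluster size):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    // One overwrite pass: fill the file with random bytes, asking Windows to
    // write through its cache rather than just buffering in memory.
    static void OverwriteOnce(string path)
    {
        long remaining = new FileInfo(path).Length;
        byte[] noise = new byte[4096];

        using (var rng = RandomNumberGenerator.Create())
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write,
                                       FileShare.None, noise.Length, FileOptions.WriteThrough))
        {
            while (remaining > 0)
            {
                rng.GetBytes(noise);
                int chunk = (int)Math.Min(noise.Length, remaining);
                fs.Write(noise, 0, chunk);
                remaining -= chunk;
            }
            fs.Flush(true);   // push the data past the OS cache to the device
        }
    }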
http://www.sans.org/reading_room/whitepapers/incident/secure-file-deletion-fact-fiction_631 states:
"It is important to note the consensus that overwriting the data only reduces the
likelihood of data being recovered. The more times data is overwritten, the more
expensive and time consuming it becomes to recover the data. In fact Peter Guttman
states “…it is effectively impossible to sanitize storage locations by simple overwriting
them, no matter how many overwrite passes are made or what data patterns are written.”3 Overwritten data can be recovered using magnetic force microscopy, which
deals with imaging magnetization patterns on the platters of the hard disk. The actual
details of how this is accomplished are beyond the scope of this paper."
Personally, I believe that if I overwrite the data 100+ times using different (maybe unknown) algorithms, and if there is no copy of the data left elsewhere such as in the swap files, it will take even a very expensive team of professionals many, many years to get that data back. And if they do get the data back after all those years, then they deserve it, I guess... That must be a project for life.
So:
wiping unused data: use cipher (http://support.microsoft.com/kb/315672), fill the hard disk with 4 GB files, or use the Eraser command line executable.
wiping swap files: ?
wiping bad sectors: ?
wiping directories: use Eraser (as Teoman Soygul stated)
How do we know for sure that we overwrote the actual physical addresses?
wiping the most recently used files and the Windows log files should of course be a piece of cake for any programmer :)
Eraser solves most of the above problems but cannot wipe the page files. So a forensic analyst will still be able to recover the data if it was in those swap files at any moment.
AFAIK Eraser does not wipe the file allocation tables. But I'm not sure.
And the conclusion should then be: It's (near) impossible to secure wipe in C#?
There is no general approach for this... consider an SSD: you can't even be sure that your write operation will write to the same physical address, because of wear-levelling methods.
If all the files/folders inside the folder are already wiped (as you stated), all that is left is the directory entry itself. Rename the directory using a cryptographic random name generator and then delete it. It will be as good as wiped.
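A minimal sketch of that rename-then-delete idea (Path.GetRandomFileName is just one convenient way to get a meaningless name):

    using System;
    using System.IO;

    // Rename the directory to a meaningless random name before removing it,
    // so the original name no longer sits in the parent's directory entry.
    static void WipeDirectoryName(string directory)
    {
        string parent  = Path.GetDirectoryName(directory.TrimEnd(Path.DirectorySeparatorChar));
        string newName = Path.Combine(parent, Path.GetRandomFileName());

        Directory.Move(directory, newName);
        Directory.Delete(newName, recursive: true);
    }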
If this isn't enough for you, grab a copy of Eraser command line executable and execute the command:
Process.Start("eraserl.exe", #"-folder "C:\MyDirectory\" -subfolders -method DoD_E -silent");
Securely deleting is not straightforward, as you know. So it may be worth considering an alternative strategy.
Have you considered using something like TrueCrypt to create an encrypted volume? You could store the files there, then use standard delete routines. An adversary would then need to both decrypt the encrypted volume AND recover the deleted files.
I'm writing a mini editor component, much like Notepad++ or UltraEdit, that needs to monitor the files the users open - it's a bit slimy, but that's the way it needs to be.
Is it wise to use multiple instances of FileSystemWatcher to monitor the open files - again like Notepad++ or UltraEdit or is there a better way to manage these?
They'll be properly disposed once the document has been closed.
Sorry, one other thing: would it be wiser to create a generic FileSystemWatcher for the drive and monitor that, then only show them a message to reload the file once I know it's the right file? Or is that a stupid idea?
You're not going to run into problems with multiple FileSystemWatchers, and there really isn't any other way to pull this off.
For performance, just be sure to specify filters as narrow as you can get away with.
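Something along these lines, one watcher per open document, works (PromptUserToReload is a stand-in for whatever reload prompt your editor shows):

    using System.IO;

    // One watcher per open document, filtered down to just that file and
    // just the change types the editor cares about.
    static FileSystemWatcher CreateWatcherFor(string openFilePath)
    {
        var watcher = new FileSystemWatcher
        {
            Path = System.IO.Path.GetDirectoryName(openFilePath),
            Filter = System.IO.Path.GetFileName(openFilePath),
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
        };
        watcher.Changed += (s, e) => PromptUserToReload(e.FullPath);   // hypothetical reload prompt
        watcher.Renamed += (s, e) => PromptUserToReload(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }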
FileSystemWatcher has a drawback: it locks the watched folder, so, for example, if you are watching a file on removable storage, it prevents "safe device removal".
You can try using Shell Notifications via SHChangeNotifyRegister. In this case you will have one entry point for all changes (or several if you want to), but in this case you will need some native shell interop.
It depends on the likely use cases.
If a user is going to open several files in the same directory and likely not modify anything else a single watcher for that directory may be less onerous than one per file if the number of files is large.
The only way you will find out is by benchmarking. Certainly, doing one per file makes the lifespan of the watcher much simpler, so that should be your first approach. Note that the watchers fire their events on a system thread pool, so multiple watchers can fire at the same time (something that may influence your design).
I certainly wouldn't do a watcher per drive; you will cause far more work that way, even with aggressive filtering.
Using multiple watchers is fine if you have to. As ShuggyCoUk's comment says, you can optimise by combining file watchers into one if all your files are in the same folder.
It's probably unwise to create a file watcher on a much higher folder (e.g. the root of the drive), because now your code has to handle many more events firing from other changes happening in the file system, and it's fairly easy to overflow the watcher's internal buffer if your code is not fast enough to handle the events.
Another argument for fewer file watchers: a FileSystemWatcher wraps a native object and pins memory. So, depending on the life span and size of your app, you might run into memory fragmentation issues. Here is how:
Your code runs for a long time (e.g. hours or days). Whenever you open a file it creates some chunk of data in memory and instantiates a file watcher. You then clean up this temporary data, but the file watcher is still there. If you repeat that many times (and don't close the files, or forget to dispose the watchers), you have created multiple objects in virtual memory that cannot be moved by the CLR, and you can potentially run into memory congestion. Note that this is not a big deal if you have a few watchers around, but if you suspect you might get into the hundreds or more, beware: that's going to become a major issue.
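One way to avoid that is to tie each watcher's lifetime to its document, roughly like this sketch (OpenDocument is just an illustrative name):

    using System;
    using System.IO;

    // When the document is closed (disposed), the native watcher and its
    // pinned buffer are released along with it.
    sealed class OpenDocument : IDisposable
    {
        public string FilePath { get; private set; }
        private readonly FileSystemWatcher _watcher;

        public OpenDocument(string filePath)
        {
            FilePath = filePath;
            _watcher = new FileSystemWatcher(
                System.IO.Path.GetDirectoryName(filePath),
                System.IO.Path.GetFileName(filePath));
            _watcher.EnableRaisingEvents = true;
        }

        public void Dispose()
        {
            _watcher.EnableRaisingEvents = false;
            _watcher.Dispose();
        }
    }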