Scanning a drive with drilldowns using C#?

I'm trying to create an application which scans a drive. The tricky part, though, is that my drive contains a set of folders nested within folders, which in turn contain documents. I'm trying to scan the drive, take a "snapshot" of all documents & folders and dump it into a .txt file.
The first time I run this app, the output will be a text file with all the folders & files.
The second time I run this application, it will take the two text files (the one produced by this second run and the .txt file from the first run) and compare them, reporting what has been moved/overwritten/deleted.
Does anybody have any code for this? I'm a newbie at this C# stuff and any help would be greatly appreciated.
Thanks in advance.

One thing we learned back in the 80s is that, while it's really tempting to use recursion for file system walking, the moment you do, someone will make a file system with nesting levels deep enough to overflow your stack. It's far better to walk the file system using a heap-based work list.
Here is a class I knocked together which does just that. It's not super pretty, but it does the job quite well:
using System;
using System.IO;
using System.Collections.Generic;

namespace DirectoryWalker
{
    public class DirectoryWalker : IEnumerable<string>
    {
        private string _seedPath;
        Func<string, bool> _directoryFilter, _fileFilter;

        public DirectoryWalker(string seedPath) : this(seedPath, null, null)
        {
        }

        public DirectoryWalker(string seedPath, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
        {
            if (seedPath == null)
                throw new ArgumentNullException("seedPath");
            _seedPath = seedPath;
            _directoryFilter = directoryFilter;
            _fileFilter = fileFilter;
        }

        public IEnumerator<string> GetEnumerator()
        {
            Queue<string> directories = new Queue<string>();
            directories.Enqueue(_seedPath);
            Queue<string> files = new Queue<string>();
            while (files.Count > 0 || directories.Count > 0)
            {
                if (files.Count > 0)
                {
                    yield return files.Dequeue();
                }
                if (directories.Count > 0)
                {
                    string dir = directories.Dequeue();
                    string[] newDirectories = Directory.GetDirectories(dir);
                    string[] newFiles = Directory.GetFiles(dir);
                    foreach (string path in newDirectories)
                    {
                        if (_directoryFilter == null || _directoryFilter(path))
                            directories.Enqueue(path);
                    }
                    foreach (string path in newFiles)
                    {
                        if (_fileFilter == null || _fileFilter(path))
                            files.Enqueue(path);
                    }
                }
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Typical usage is this:
DirectoryWalker walker = new DirectoryWalker(@"C:\pathToSource\src", null, (x => x.EndsWith(".cs")));
foreach (string s in walker)
{
    Console.WriteLine(s);
}
This lists every file ending in ".cs", descending through all subdirectories (without using recursion).

A better approach than your text file comparisons would be to use the FileSystemWatcher Class.
Listens to the file system change notifications and raises events when a directory, or file in a directory, changes.
You could log the changes and then generate your reports as needed from that log.
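A minimal sketch of that idea (the watched root and log file path here are just placeholders): watch the folder, including subdirectories, and append each change notification to a log.

using System;
using System.IO;

class ChangeLogger
{
    static void Main()
    {
        // Placeholder paths: point the watcher at the folder you care about and
        // keep the log file outside the watched tree so you don't log your own writes.
        FileSystemWatcher watcher = new FileSystemWatcher(@"D:\WatchedDrive");
        watcher.IncludeSubdirectories = true;
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.DirectoryName | NotifyFilters.LastWrite;

        FileSystemEventHandler log = (s, e) =>
            File.AppendAllText(@"D:\changes.log",
                string.Format("{0}\t{1}\t{2}{3}", DateTime.Now, e.ChangeType, e.FullPath, Environment.NewLine));

        watcher.Created += log;
        watcher.Changed += log;
        watcher.Deleted += log;
        watcher.Renamed += (s, e) =>
            File.AppendAllText(@"D:\changes.log",
                string.Format("{0}\tRenamed\t{1} -> {2}{3}", DateTime.Now, e.OldFullPath, e.FullPath, Environment.NewLine));

        watcher.EnableRaisingEvents = true;
        Console.WriteLine("Watching... press Enter to stop.");
        Console.ReadLine();
    }
}

The report step then becomes a query over the log rather than a full rescan of the drive.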

You can easily use the DirectoryInfo/FileInfo classes for this.
Basically, instantiate a DirectoryInfo pointing at the C:\ folder, then use its members to walk the folder structure.
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx has code that could quite easily be translated.
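As a rough sketch (assuming one full path per line is enough for the snapshot; the root and output paths are placeholders), something like this walks the tree with DirectoryInfo and an explicit stack and dumps every file it can reach:

using System;
using System.Collections.Generic;
using System.IO;

class SnapshotWriter
{
    static void Main()
    {
        // Placeholder locations: the root to scan and where to write the snapshot.
        DirectoryInfo root = new DirectoryInfo(@"C:\");
        using (StreamWriter snapshot = new StreamWriter(@"C:\temp\snapshot.txt"))
        {
            Stack<DirectoryInfo> pending = new Stack<DirectoryInfo>();
            pending.Push(root);
            while (pending.Count > 0)
            {
                DirectoryInfo dir = pending.Pop();
                try
                {
                    foreach (FileInfo file in dir.GetFiles())
                        snapshot.WriteLine(file.FullName);
                    foreach (DirectoryInfo sub in dir.GetDirectories())
                        pending.Push(sub);
                }
                catch (UnauthorizedAccessException)
                {
                    // Skip folders we aren't allowed to read.
                }
            }
        }
    }
}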
Now, the other part of your question is insanity. You can find the differences between the two files relatively easily, but translating that into what has been moved/deleted/etc. will take some fairly advanced logic. After all, if I have two files, both named myfile.dat, and one is found at c:\foo and the other at c:\notfoo, how would the one at c:\notfoo be reported if I deleted the one at c:\foo? Another example: if I have a file myfile2.dat and copy it from c:\bar to c:\notbar, is that considered a move? What happens if I copy it on Tuesday, and then on Thursday I delete c:\bar\myfile2.dat--is that a move or a delete? And would the answer change if I ran the program every Monday as opposed to daily?
There's a whole host of questions, and corresponding logic, which you'd need to think through and code for in order to build that functionality, and even then it would not be 100% correct, because it isn't watching the file system as changes occur--there will always be the possibility of a scenario that does not get reported correctly due to timing, logic structure, processing time, when the app runs, or just the sheer perversity of computers.
Additionally, the processing time would grow rapidly with the size of your drive. After all, you'd need to check every file against every other file to determine its state as opposed to its previous state. I'd hate to have to run this against my 600+GB drive at home, let alone the 40TB drives I have on servers at work.
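That said, the easy part of the comparison (paths added vs. removed, with no attempt at move/rename/overwrite detection) is just two set differences over the snapshot files. A minimal sketch, assuming one full path per line and placeholder file names:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class SnapshotDiff
{
    static void Main()
    {
        // Placeholder file names for the two snapshot runs.
        var oldPaths = new HashSet<string>(File.ReadAllLines(@"C:\temp\snapshot_old.txt"), StringComparer.OrdinalIgnoreCase);
        var newPaths = new HashSet<string>(File.ReadAllLines(@"C:\temp\snapshot_new.txt"), StringComparer.OrdinalIgnoreCase);

        foreach (string added in newPaths.Except(oldPaths, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine("Added:   " + added);
        foreach (string removed in oldPaths.Except(newPaths, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine("Removed: " + removed);

        // Anything beyond this (moves, overwrites) needs extra data in the snapshot,
        // such as sizes, timestamps or hashes, for exactly the reasons given above.
    }
}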

Related

Spelling with textboxes and custom dictionaries slowing down my application in C# WPF

I am using WPF TextBoxes inside my WinForms application for spell checking. Each time I create one, I load the same file in as a CustomDictionary. All was fine until recently. Now they take a long time to load, up to a second. Some forms have 30 or more, meaning delays of nearly half a minute. This seems to be the case on Windows 10 (not Windows 8 as I originally posted). The application is running under .NET 4.0; I have tried 4.5 and 4.6 (not 4.6.1) and all versions are slow.
I have seen sfaust's question Spell check textbox in Win10 - Slow and am7zd's answer. Thanks to these, I looked at the _GLOBAL_ value under HKEY_CURRENT_USER\Software\Microsoft\Spelling\Dictionaries. I have 580 entries (after pruning out entries without matching files) and things are still slow.
At present, every time I create a TextBox and add a custom dictionary to it, a new entry seems to be generated in _GLOBAL_.
Is there a better way of doing things than loading the custom dictionary in from file every time?
Is there a way of re-using the same entry in _GLOBAL_ every time instead of creating a new one?
Is there a clean way of clearing previous entries in _GLOBAL_ created by my application, and their matching .dic files, when closing the application (or on restarting it)?
I could clear _GLOBAL_ completely each time I start my application. This brings back the speed I want, but what is the downside?
Any advice gratefully received.
No answers from anyone else, so this is what I have done:
I made sure I use CustomDictionaries.Remove on all textboxes with custom dictionaries before closing the form they are on. This gets rid of new entries in _GLOBAL_ and the related files in AppData\Local\Temp.
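For example, something along these lines for each spell-checking TextBox before its host form closes (a sketch only; the TextBox and the dictionary Uri are stand-ins for whatever your form actually uses):

using System;
using System.Windows.Controls;

static class SpellingCleanup
{
    // Call this for each spell-checking WPF TextBox before its host form closes.
    // "customDictionaryUri" is whatever Uri was originally passed to CustomDictionaries.Add.
    public static void DetachCustomDictionary(TextBox spellingTextBox, Uri customDictionaryUri)
    {
        System.Collections.IList dictionaries = spellingTextBox.SpellCheck.CustomDictionaries;

        // Removing what we added keeps _GLOBAL_ and the temp .dic files tidy.
        dictionaries.Remove(customDictionaryUri);
        // Or, to drop everything this TextBox registered: dictionaries.Clear();
    }
}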
But there will be times when things go wrong or the user just ends the task, leaving _GLOBAL_ entries and .dic files in place, so:
I decided to take things a stage further. When I start my application, I now not only clean entries in _GLOBAL_ that don't have matching files (as suggested in the answer referenced above), but also remove all entries referring to .dic files in AppData\Local\Temp. My theory is that anyone who has left entries there didn't mean to, otherwise they would probably have saved the .dic file in a different folder (as Microsoft Office does).
try
{
    string[] allDictionaries = (string[])Registry.GetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Spelling\Dictionaries", "_Global_", new string[0]);
    if (allDictionaries.Length > 0)
    {
        List<string> realDictionaries = new List<string>();
        bool changedSomething = false;
        foreach (string thisD in allDictionaries)
        {
            if (File.Exists(thisD))
            {
                if (thisD.Contains(@"\AppData\Local\Temp\"))
                {
                    // Assuming that anyone who wants to keep a permanent .dic file will not store it in \AppData\Local\Temp.
                    // So delete the file and don't copy the name of the dictionary into the list of good dictionaries.
                    File.Delete(thisD);
                    changedSomething = true;
                }
                else
                {
                    realDictionaries.Add(thisD);
                }
            }
            else
            {
                // File does not exist, so don't copy the name of the dictionary into the list of good dictionaries.
                changedSomething = true;
            }
        }
        if (changedSomething)
        {
            Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Spelling\Dictionaries", "_Global_", realDictionaries.ToArray());
        }
    }
}
catch (Exception ex)
{
    MessageBox.Show(this, "Error clearing up old dictionary files.\n\nFull message:\n\n" + ex.Message, "Unable to delete file", MessageBoxButtons.OK, MessageBoxIcon.Warning);
}
I am still wondering if it is totally safe to clear entries in _GLOBAL_ that refer to files in AppData\Local\Temp. Surely people shouldn't be leaving important stuff in a temp folder... should they?
What would be really nice would be an overload of CustomDictionaries.Add that lets us set the name and folder of the .dic file, allowing all the textboxes in the same application to share the same .dic file and making sure we don't leave a load of redundant entries and files with seemingly random names hanging around in the first place. Please, Microsoft.

Identifying bad ReparsePoints with GetDirectories() in .Net 3.5?

I am using Directory.GetDirectories() with a Linq statement to loop through all directories in a folder that aren't system folders; however, I am discovering a bunch of bad reparse points in the folder, which causes the method to take a long time as it times out on each bad reparse point.
The code I am currently using looks like this:
subdirectories = directory.GetDirectories("*", SearchOption.TopDirectoryOnly)
    .Where(d => ((d.Attributes & FileAttributes.Hidden) != FileAttributes.Hidden)
        && ((d.Attributes & FileAttributes.System) != FileAttributes.System));
I have also tried using code like this for testing, but it also hangs for a full minute or so on the bad folders:
foreach (var item in dir.GetDirectories("*", SearchOption.TopDirectoryOnly))
{
    Console.WriteLine(item.Name);
    Console.WriteLine(item.Attributes);
}
It should be noted that the above bit of code works fine in .Net 4.0, but in 3.5 it will hang for a minute on each bad reparse point.
Trying to open these folders manually in Windows Explorer results in a "Network Path Not Found" error.
Is there another way to loop through good subfolders inside a folder that doesn't use the Attributes property, or that bypasses the bad reparse points?
I have already tried using Directory.Exists(), and that is equally slow.
According to this answer: *FASTEST* directory listing
For the best performance, it is possible to P/Invoke NtQueryDirectoryFile, documented as ZwQueryDirectoryFile
From MSDN: FILE_REPARSE_POINT_INFORMATION structure
This information can be queried in either of the following ways:
Call ZwQueryDirectoryFile, passing FileReparsePointInformation as the value of FileInformationClass and passing a caller-allocated, FILE_REPARSE_POINT_INFORMATION-structured buffer as the value of FileInformation.
Create an IRP with major function code IRP_MJ_DIRECTORY_CONTROL and minor function code IRP_MN_QUERY_DIRECTORY.
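If full NtQueryDirectoryFile plumbing is more than you want to take on, a lighter option in the same spirit is to P/Invoke FindFirstFile/FindNextFile and read the attributes straight from WIN32_FIND_DATA, skipping reparse points without ever touching their targets. A rough sketch (an alternative to, not an implementation of, the ZwQueryDirectoryFile route above):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeDirectoryList
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Lists subdirectory paths, skipping anything flagged as a reparse point,
    // using only the attributes in the find data (the target is never opened).
    public static IEnumerable<string> GetPlainSubdirectories(string path)
    {
        WIN32_FIND_DATA findData;
        IntPtr handle = FindFirstFile(Path.Combine(path, "*"), out findData);
        if (handle == new IntPtr(-1))   // INVALID_HANDLE_VALUE
            yield break;
        try
        {
            do
            {
                if (findData.cFileName == "." || findData.cFileName == "..")
                    continue;
                bool isDir = (findData.dwFileAttributes & FileAttributes.Directory) != 0;
                bool isReparse = (findData.dwFileAttributes & FileAttributes.ReparsePoint) != 0;
                if (isDir && !isReparse)
                    yield return Path.Combine(path, findData.cFileName);
            } while (FindNextFile(handle, out findData));
        }
        finally
        {
            FindClose(handle);
        }
    }
}

Once a folder is known not to be a reparse point, the ordinary Directory.GetFiles/GetDirectories calls can take over for the contents.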

Is there anyway to get all files names without exceptions in C#?

Update: I'd be glad to drop the C# requirement and just see any program that can list all the files while running as Admin or System. My question is: has anyone seen such a thing?
There are numerous methods of enumerating files in a directory, but all suffer the same problems:
"The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters."
"Access to the path 'C:\Users\All Users\Application Data' is denied"
etc.
Even running as admin on a single-user machine, it seems impossible to list all the files without encountering exceptions/errors.
Is it really an impossible task just to get a list of all the files under Windows? Has anyone ever been able to obtain the complete list of all files on their machine using C# or any other method?
This link from MS, titled "Enumerate Directories and Files", does not show how to enumerate directories and files; it only shows how to enumerate a subset that will not throw DirectoryNotFoundException, UnauthorizedAccessException, or PathTooLongException.
Update: Here is sample code that runs over C:\ and attempts to enumerate all the files and errors. Even when running it as admin, there are folders that not only can't be accessed, but whose ownership I can't even change to Admin, for example "C:\Windows\CSC".
Just have a look at the "Errors {0}.csv" log file to see how many places are inaccessible even to admin.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static System.IO.StreamWriter logfile;
    static System.IO.StreamWriter errorfile;

    static void Main(string[] args)
    {
        string directory = @"C:\";
        logfile = new System.IO.StreamWriter(string.Format(@"E:\Files {0}.csv", DateTime.Now.ToString("yyyyMMddHHmm")));
        errorfile = new System.IO.StreamWriter(string.Format(@"E:\Errors {0}.csv", DateTime.Now.ToString("yyyyMMddHHmm")));
        TraverseTree(directory, OnGotFileInfo, OnGotException);
        logfile.Close();
        errorfile.Close();
    }

    public static void OnGotFileInfo(System.IO.FileInfo fileInfo)
    {
        logfile.WriteLine("{0},{1},", fileInfo.FullName, fileInfo.Length.ToString("N0"));
    }

    public static void OnGotException(Exception ex, string info)
    {
        errorfile.WriteLine("{0},{1}", ex.Message, info);
    }

    public static void TraverseTree(string root, Action<System.IO.FileInfo> fileAction, Action<Exception, string> errorAction)
    {
        // Data structure to hold names of subfolders to be
        // examined for files.
        Stack<string> dirs = new Stack<string>(20);
        if (!System.IO.Directory.Exists(root))
        {
            throw new ArgumentException();
        }
        dirs.Push(root);
        while (dirs.Count > 0)
        {
            string currentDir = dirs.Pop();
            string[] subDirs;
            try
            {
                subDirs = System.IO.Directory.GetDirectories(currentDir);
            }
            // An UnauthorizedAccessException exception will be thrown if we do not have
            // discovery permission on a folder or file. It may or may not be acceptable
            // to ignore the exception and continue enumerating the remaining files and
            // folders. It is also possible (but unlikely) that a DirectoryNotFound exception
            // will be raised. This will happen if currentDir has been deleted by
            // another application or thread after our call to Directory.Exists. The
            // choice of which exceptions to catch depends entirely on the specific task
            // you are intending to perform and also on how much you know with certainty
            // about the systems on which this code will run.
            catch (System.Exception e)
            {
                errorAction(e, currentDir);
                continue;
            }
            string[] files = null;
            try
            {
                files = System.IO.Directory.GetFiles(currentDir);
            }
            catch (System.Exception e)
            {
                errorAction(e, currentDir);
                continue;
            }
            // Perform the required action on each file here.
            // Modify this block to perform your required task.
            foreach (string file in files)
            {
                try
                {
                    // Perform whatever action is required in your scenario.
                    System.IO.FileInfo fi = new System.IO.FileInfo(file);
                    fileAction(fi);
                }
                catch (System.Exception e)
                {
                    // If file was deleted by a separate application
                    // or thread since the call to TraverseTree()
                    // then just continue.
                    errorAction(e, file);
                    continue;
                }
            }
            // Push the subdirectories onto the stack for traversal.
            // This could also be done before handling the files.
            foreach (string str in subDirs)
                dirs.Push(str);
        }
    }
}
Yes, it is at the very least hard to enumerate all files without exceptions.
There are several sets of issues here:
some paths (long ones: PathTooLongException) are not supported by the CLR
security restrictions on folders/files
junctions/hard links that introduce duplicates (and, in theory, cycles that would cause a StackOverflow in a recursive iteration)
basic sharing-violation restrictions (if you try to read the files).
For PathTooLongException: I think you'll need to deal with P/Invoke of the corresponding Win32 functions. All path-related methods in the CLR are restricted to roughly 260 characters.
Security restrictions: you may be able to enumerate everything if you run as SYSTEM (not sure) or with backup permissions, but any other account is guaranteed not to be able to access all files on a default-configured system.
Instead of getting exceptions you can P/Invoke the native versions and handle error codes instead. You may be able to decrease the number of exceptions when descending into directories by checking the ACL on the directory first.
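A rough sketch of that ACL pre-check, assuming .NET Framework's Directory.GetAccessControl is available (it's only a heuristic; keep the try/catch around the enumeration, because the ACL read itself can be denied):

using System;
using System.IO;
using System.Security.AccessControl;
using System.Security.Principal;

static class AccessProbe
{
    // Rough pre-check: returns false if the current user has an explicit Deny rule
    // for listing the directory, or if the ACL itself cannot be read.
    public static bool LooksListable(string path)
    {
        try
        {
            DirectorySecurity security = Directory.GetAccessControl(path);
            WindowsIdentity user = WindowsIdentity.GetCurrent();
            WindowsPrincipal principal = new WindowsPrincipal(user);

            foreach (FileSystemAccessRule rule in
                     security.GetAccessRules(true, true, typeof(SecurityIdentifier)))
            {
                bool appliesToUser =
                    user.User.Equals(rule.IdentityReference) ||
                    principal.IsInRole((SecurityIdentifier)rule.IdentityReference);

                if (appliesToUser &&
                    rule.AccessControlType == AccessControlType.Deny &&
                    (rule.FileSystemRights & FileSystemRights.ListDirectory) != 0)
                {
                    return false;
                }
            }
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
        catch (IOException)
        {
            return false;
        }
    }
}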

How can I check access rights on a given directory?

Currently I have a program that searches a user-set directory and its sub-directories for music files and adds them to a collection. However, if one of the directories it comes across is protected, the program falls over. I wanted to know how I can check whether the user has access to a directory before trying to search it, to avoid this problem.
Below is the code I'm using for the search. It currently contains a basic workaround for "System Volume Information", but since there may well be other protected directories, I wanted to change this to handle them too.
public void SearchForMusic()
{
    // Searches selected directory and its sub directories for music files and adds their path to ObservableCollection<string> MusicFound
    foreach (string ext in extentions)
    {
        foreach (string song in Directory.GetFiles(SearchDirectory, ext))
        {
            musicFound.Add(song);
        }
        foreach (string directory in Directory.GetDirectories(SearchDirectory))
        {
            if (directory.Contains("System Volume Information"))
            {
            }
            else
            {
                foreach (string song in Directory.GetFiles(directory, ext))
                {
                    musicFound.Add(song);
                }
                foreach (string subDirectory in Directory.GetDirectories(directory))
                {
                    foreach (string subSong in Directory.GetFiles(subDirectory, ext))
                    {
                        musicFound.Add(subSong);
                    }
                }
            }
        }
    }
}
Many thanks :)
By far the easiest way to be sure that you have access to a file system object is to attempt to access it. If it fails with an Access Denied error, then you don't have access. Just detect that error condition and proceed with the next item in the search.
In other words, delegate checking access to the system which is, after all, the ultimate arbiter of access rights.
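Concretely, that means wrapping each Directory.GetFiles / Directory.GetDirectories call in a try/catch and moving on when access is denied. A minimal sketch (the helper name is just illustrative):

using System;
using System.IO;

static class SafeEnumerate
{
    // Returns the matching files in a directory, or an empty array if the
    // directory can't be read (access denied, vanished, device not ready, ...).
    public static string[] FilesOrEmpty(string directory, string pattern)
    {
        try
        {
            return Directory.GetFiles(directory, pattern);
        }
        catch (UnauthorizedAccessException)
        {
            return new string[0];   // no rights: skip and carry on
        }
        catch (DirectoryNotFoundException)
        {
            return new string[0];   // directory disappeared between calls
        }
        catch (IOException)
        {
            return new string[0];   // e.g. device not ready
        }
    }
}

The same wrapper works for Directory.GetDirectories, which also removes the need for the hard-coded "System Volume Information" check.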
You can check this question by replacing the Write with Read permissions. Also, wrap your code in a try catch block and if the exception is thrown, you can assume (or properly check the exception type to be sure) that the directory cannot be traversed.

How do I find and open a file in a Visual Studio 2005 add-in?

I'm making an add-in for Visual Studio 2005 in C# to help easily toggle between source and header files, as well as script files that all follow a similar naming structure. However, the directory structure has all of the files in different places, even though they are all in the same project.
I've got almost all the pieces in place, but I can't figure out how to find and open a file in the solution based only on the file name alone. So I know I'm coming from, say, c:\code\project\subproject\src\blah.cpp, and I want to open c:\code\project\subproject\inc\blah.h, but I don't necessarily know where blah.h is. I could hardcode different directory paths but then the utility isn't generic enough to be robust.
The solution has multiple projects, which seems to be a bit of a pain as well. I'm thinking at this point that I'll have to iterate through every project, and iterate through every project item, to see if the particular file is there, and then get a proper reference to it.
But it seems to me there must be an easier way of doing this.
To work generically for any user's file structure, you'll need to enumerate all the files in all the projects. This should get you started. And, well, pretty much finished :-)
internal static string GetSourceOrInclude(bool openAndActivate)
{
    // Look in the project for a file of the same name with the opposing extension
    ProjectItem thisItem = Commands.Application.ActiveDocument.ProjectItem;
    string ext = Path.GetExtension(thisItem.Name);
    string searchExt = string.Empty;
    if (ext == ".cpp" || ext == ".c")
        searchExt = ".h";
    else if (ext == ".h" || ext == ".hpp")
        searchExt = ".cpp";
    else
        return (string.Empty);

    string searchItemName = thisItem.Name;
    searchItemName = Path.ChangeExtension(searchItemName, searchExt);
    Project proj = thisItem.ContainingProject;
    foreach (ProjectItem item in proj.ProjectItems)
    {
        ProjectItem foundItem = FindChildProjectItem(item, searchItemName);
        if (foundItem != null)
        {
            if (openAndActivate)
            {
                if (!foundItem.get_IsOpen(Constants.vsViewKindCode))
                {
                    Window w = foundItem.Open(Constants.vsViewKindCode);
                    w.Visible = true;
                    w.Activate();
                }
                else
                {
                    foundItem.Document.Activate();
                }
            }
            return (foundItem.Document.FullName);
        }
    }
    return (string.Empty);
}
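The FindChildProjectItem helper isn't included above; a hypothetical sketch of what it might look like (a depth-first search of an item's nested ProjectItems for a matching file name):

// Hypothetical helper, not part of the original answer: depth-first search
// through a ProjectItem's children for an item with the given file name.
internal static ProjectItem FindChildProjectItem(ProjectItem item, string name)
{
    if (string.Equals(item.Name, name, StringComparison.OrdinalIgnoreCase))
        return item;

    if (item.ProjectItems != null)
    {
        foreach (ProjectItem child in item.ProjectItems)
        {
            ProjectItem found = FindChildProjectItem(child, name);
            if (found != null)
                return found;
        }
    }
    return null;
}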
Note that it is possible for a header to be in the include path without being added to the project, so if the above fails, you could potentially look in the include paths for the containing project too. I'll leave that as an exercise for the reader.
