I have to create an app that drills into a specific drive, reads all file names and replaces illegal SharePoint characters with underscores.
The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""
Can someone provide either a link to code or the code itself showing how to do this? I am VERY new to C# and need all the help I can possibly get. I have researched code for recursively drilling through a drive, but I am not sure how to put the character replacement and the recursive looping together. Please help!
The advice for removing illegal characters is here:
How to remove illegal characters from path and filenames?
You just have to change the character set to your set of characters that you want to remove.
If you have figured out how to recurse the folders, you can get all of the files in each folder with:
var files = System.IO.Directory.EnumerateFiles(currentPath);
and then
foreach (string file in files)
{
System.IO.File.Move(file, ConvertFileName(file));
}
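You will write the ConvertFileName method yourself: it should accept a file name as a string and return that name with the bad characters stripped. Note that EnumerateFiles returns full paths, so sanitize only the file-name part (Path.GetFileName) and leave the directory part untouched. Otherwise characters such as \ and : in the path would be replaced too.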
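A minimal sketch of ConvertFileName, assuming you want underscores (as the question asks) and that only the file-name part of the path should change. The character list is copied from the question; the class name SharePointNames is just for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SharePointNames
{
    // Characters listed as illegal in the question.
    private static readonly char[] IllegalChars =
        { '~', '#', '%', '&', '*', '{', '}', '/', '\\', '|', ':', '<', '>', '?', '-', '"' };

    // Accepts a full path and returns the same path with every illegal
    // character in the file name replaced by '_'. The directory part is
    // left alone so the result still points into the same folder.
    public static string ConvertFileName(string fullPath)
    {
        string directory = Path.GetDirectoryName(fullPath);
        string name = Path.GetFileName(fullPath);
        string cleaned = new string(
            name.Select(c => Array.IndexOf(IllegalChars, c) >= 0 ? '_' : c).ToArray());
        return string.IsNullOrEmpty(directory) ? cleaned : Path.Combine(directory, cleaned);
    }
}
```

With this, File.Move(file, ConvertFileName(file)) turns My~Project#1.doc into My_Project_1.doc in the same folder. You may want to skip the Move when the name does not change.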
Note that if you are using .NET 3.5, you will need GetFiles() instead, since EnumerateFiles was only added in .NET 4. According to MSDN:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
How to recursively list directories
string path = @"c:\dev";
string searchPattern = "*.*";
string[] dirNameArray = Directory.GetDirectories(path, searchPattern, SearchOption.AllDirectories);
// Or, for better performance:
// (but breaks if you don't have access to a sub directory; see 2nd link below)
IEnumerable<string> dirNameEnumeration = Directory.EnumerateDirectories(path, searchPattern, SearchOption.AllDirectories);
How to: Enumerate Directories and Files
How to recursively list all the files in a directory in C#?
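With SearchOption.AllDirectories, a single folder you are not allowed to read typically aborts the whole listing with an UnauthorizedAccessException. If that is a problem, one option is to walk the tree yourself and skip such folders. The sketch below does that with an explicit stack; the class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeWalker
{
    // Yields every file under root, skipping directories that cannot be read
    // instead of aborting the whole enumeration.
    public static IEnumerable<string> EnumerateFilesSafely(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files, subDirs;
            try
            {
                files = Directory.GetFiles(dir);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // no permission: skip this folder
            catch (DirectoryNotFoundException) { continue; }  // deleted meanwhile: skip it

            foreach (string f in files)
                yield return f;
            foreach (string d in subDirs)
                pending.Push(d);
        }
    }
}
```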
Not really an answer, but consider both of the following:
The following characters are not valid in Windows file names anyway, so you don't have to worry about them: / \ : * ? " < > |
Make sure your algorithm handles duplicate names appropriately. For example, My~Project.doc and My#Project.doc would both be renamed to My_Project.doc.
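One possible policy for the duplicate-name problem is to append a counter before the extension until the name is free. The " (n)" suffix format and the MakeUnique name are arbitrary choices for this sketch, not anything SharePoint requires.

```csharp
using System.IO;

public static class UniqueNames
{
    // Returns desiredPath if nothing exists there yet; otherwise tries
    // "name (1).ext", "name (2).ext", ... until a free name is found.
    public static string MakeUnique(string desiredPath)
    {
        if (!File.Exists(desiredPath) && !Directory.Exists(desiredPath))
            return desiredPath;

        string dir = Path.GetDirectoryName(desiredPath) ?? "";
        string stem = Path.GetFileNameWithoutExtension(desiredPath);
        string ext = Path.GetExtension(desiredPath);
        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(dir, $"{stem} ({i}){ext}");
            if (!File.Exists(candidate) && !Directory.Exists(candidate))
                return candidate;
        }
    }
}
```

So if My_Project.doc already exists, renaming My#Project.doc would target My_Project (1).doc instead of failing or overwriting.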
A recursive method to rename files in folders is what you want. Just pass it the root folder and it will call itself for all subfolders found.
private void SharePointSanitize(string folder)
{
    // Rename the files in this directory
    foreach (string filePath in Directory.GetFiles(folder))
    {
        string newPath = SharePointRename(filePath);
        if (newPath != filePath)
            File.Move(filePath, newPath);
    }
    // Then recurse into each subdirectory
    foreach (string subFolder in Directory.GetDirectories(folder))
    {
        SharePointSanitize(subFolder);
    }
}
private string SharePointRename(string path)
{
    // Sanitize only the file name; the directory part must stay as-is,
    // or File.Move would point at a folder that does not exist.
    // (The string overload of Replace is used because '' is not a valid char literal.)
    string newName = Path.GetFileName(path);
    newName = newName.Replace("~", "");
    newName = newName.Replace("#", "");
    newName = newName.Replace("%", "");
    newName = newName.Replace("&", "");
    newName = newName.Replace("*", "");
    newName = newName.Replace("{", "");
    newName = newName.Replace("}", "");
    // .. and so on
    return Path.Combine(Path.GetDirectoryName(path), newName);
}
Notes:
You can change the empty replacement in the SharePointRename() method to whatever character you want to replace with, such as an underscore.
This does not check whether two files end up with the same name, e.g. thing~ and thing% would both become thing.
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    private static Regex _pattern = new Regex("[~#%&*{}/\\|:<>?\"-]+");
    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo("C:\\");
        RecursivelyRenameFilesIn(di);
    }
    public static void RecursivelyRenameFilesIn(DirectoryInfo root)
    {
        foreach (FileInfo fi in root.GetFiles())
            if (_pattern.IsMatch(fi.Name))
                // replace each run of illegal characters with a single underscore
                fi.MoveTo(Path.Combine(fi.DirectoryName, _pattern.Replace(fi.Name, "_")));
        foreach (DirectoryInfo di in root.GetDirectories())
            RecursivelyRenameFilesIn(di);
    }
}
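Note that this will not handle duplicate names, as Steven pointed out. Also, starting at C:\ will reach folders you are not allowed to read, so expect an UnauthorizedAccessException unless you catch it.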
Related
I am trying to write code that checks all files under a given directory and its subdirectories for a string passed from the web page. So far I have this code:
private void ProcessDirectory(string targetDirectory, string origDirectory, string ObjectName)
{
string[] fileEntries = Directory.GetFiles(targetDirectory);
string[] subdirectoryEntries = Directory.GetDirectories(targetDirectory);
foreach (string fileName in fileEntries)
{
ProcessFile(fileName, origDirectory, ObjectName);
}
foreach (string subdirectory in subdirectoryEntries)
ProcessDirectory(subdirectory, origDirectory, ObjectName);
}
private void ProcessFile(string path, string origDirectory, string ObjectName)
{
    if (ObjectName != "")
    {
        var fileLines = File.ReadAllLines(path);
        List<string> fileItems = new List<string>(fileLines);
        if (fileItems.Contains(ObjectName))
        {
            string sExt = Path.GetExtension(path).ToLower();
            if (sExt == ".txt")
            {
                listTextFiles.Items.Add(path.Replace(origDirectory, ""));
            }
        }
    }
}
It works, but the problem is that it only finds a complete word in the file. For example, if I search for the word 'Account' and the file contains the word 'Account', my code works. If the file contains the word 'AccountCode', my search won't find it. Is there a way to fix this?
Another question: how can I add a counter that shows, at the end of the process, how many files were checked under the given directory and all its subdirectories?
This is an awfully roundabout way of doing it. Just load the entire file content and use IndexOf:
var content = File.ReadAllText(path);
if (content.IndexOf(ObjectName) > -1) {
// rest of your code here
}
There is no need to load line-by-line, initialize a whole new list with those lines, and check each line.
This also gives the benefit of a partial search, as you've asked.
You could probably improve this immensely by carefully auditing how much memory you're consuming. Both your method and the one I provided here will likely allocate large blocks of memory, only for them to be useless after the conditional check. Consider using a StringBuilder and re-using it with each file.
if (fileItems.Contains(ObjectName)) checks whether the list fileItems contains an item that is equal to ObjectName, i.e. a whole line that matches exactly.
You probably want to check whether the list contains an item that contains ObjectName. So change it to this (Any requires using System.Linq;):
if (fileItems.Any(e => e.Contains(ObjectName)))
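To answer the second question: because you use recursion here, you need to declare a property or class-level variable and increment it in your ProcessFile method, e.g.:
public int NumberOfFilesChecked { get; set; }
public int NumberOfMatches { get; set; }
private void ProcessFile(string path, string origDirectory, string ObjectName)
{
    NumberOfFilesChecked++;   // counts every file visited
    // ... read the file into fileItems as before ...
    if (fileItems.Contains(ObjectName))
    {
        NumberOfMatches++;    // counts only the files that matched
    }
}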
As a side note, there is no reason to use recursion here; you could simply get all the files with a single call:
string[] allFiles = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
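Putting these answers together gives a single GetFiles call instead of recursion, a substring search instead of whole-line matching, and two counters. This is a sketch: the class and method names are invented, it only searches .txt files as the question's code does, and the case-insensitive comparison is an assumption you can drop.

```csharp
using System;
using System.IO;

public static class TextSearch
{
    // Scans every .txt file under root and returns how many files were
    // checked and how many contain the search term as a substring.
    public static (int Checked, int Matched) CountMatches(string root, string term)
    {
        int checkedFiles = 0, matchedFiles = 0;
        foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            if (!string.Equals(Path.GetExtension(path), ".txt", StringComparison.OrdinalIgnoreCase))
                continue;
            checkedFiles++;
            string content = File.ReadAllText(path);
            // Substring test, so "Account" also matches "AccountCode".
            if (content.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                matchedFiles++;
        }
        return (checkedFiles, matchedFiles);
    }
}
```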
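You may also consider multi-threading if performance is an issue (requires using System.Threading.Tasks;):
Parallel.ForEach(allFiles,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    file =>
    {
        // process one file here; if you update a shared counter,
        // use Interlocked.Increment instead of ++
    });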
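If you combine Parallel.ForEach with a counter, the counter becomes shared between threads and a plain ++ can lose updates. A sketch of a thread-safe count using Interlocked.Increment; the names are illustrative and the ordinal (case-sensitive) comparison is an assumption.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Counts how many of the given files contain term, reading up to four
    // files at a time. Interlocked.Increment makes the shared update atomic.
    public static int CountFilesContaining(string[] files, string term)
    {
        int matches = 0;
        Parallel.ForEach(files,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            file =>
            {
                string content = File.ReadAllText(file);
                if (content.IndexOf(term, StringComparison.Ordinal) >= 0)
                    Interlocked.Increment(ref matches);
            });
        return matches;
    }
}
```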
When checking the contents of a string, don't forget to specify how the strings should be compared, e.g. case-insensitively:
if (text.IndexOf(value, StringComparison.CurrentCultureIgnoreCase) >= 0)
// Apply logic...
It's very often left out...
Is there an easy way to get a list of file names that match a file name pattern including references to the parent directory? What I want is for "..\ThirdParty\dlls\*.dll" to return a collection like ["..\ThirdParty\dlls\one.dll", "..\ThirdParty\dlls\two.dll", ...]
I can find several questions about matching file names, including full paths and wildcards, but nothing that includes "..\" in the pattern. Directory.GetFiles explicitly disallows it.
What I want to do with the names is to include them in a zip archive, so if there is a zip library that can understand relative paths like this I am happier to use that.
The patterns come from an input file, so they are not known at compile time. They can get quite complex, e.g. ..\src\..\ThirdParty\win32\*.dll, so parsing them is probably not feasible.
Having to put it in zip is also the reason I am not very keen on converting the pattern to fullpath, I do want the relative paths in zip.
EDIT: What I am looking for really is a C# equivalent of /bin/ls.
static string[] FindFiles(string path)
{
    string directory = Path.GetDirectoryName(path); // separate the directory, e.g. ..\ThirdParty\dlls
    string filePattern = Path.GetFileName(path);    // separate the file pattern, e.g. *.dll
    // if path only contains a pattern, use the current directory
    if (String.IsNullOrEmpty(directory))
        directory = Directory.GetCurrentDirectory();
    // uncomment the following line if you need absolute paths
    //directory = Path.GetFullPath(directory);
    if (!Directory.Exists(directory))
        return new string[0];
    // GetFiles prefixes each result with the directory exactly as given,
    // so a relative directory yields relative results
    var files = Directory.GetFiles(directory, filePattern);
    return files;
}
There is the Path.GetFullPath() function that will convert from relative to absolute. You could use it on the path part.
string pattern = @"..\src\..\ThirdParty\win32\*.dll";
string relativeDir = Path.GetDirectoryName(pattern);
string absoluteDir = Path.GetFullPath(relativeDir);
string filePattern = Path.GetFileName(pattern);
foreach (string file in Directory.GetFiles(absoluteDir, filePattern))
{
    // file is an absolute path here
}
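The question wants relative paths back for the zip entries. One way, assuming you can target .NET Core 2.0 or later (where Path.GetRelativePath exists; it is not available on .NET Framework), is to resolve the pattern to an absolute directory as above and then turn each hit back into a path relative to a base directory of your choice. PatternExpander and Expand are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PatternExpander
{
    // Expands a pattern such as ..\src\..\ThirdParty\win32\*.dll relative to baseDir.
    // GetFullPath normalises the ".." segments so GetFiles accepts the directory;
    // GetRelativePath then re-expresses each hit relative to baseDir.
    public static string[] Expand(string baseDir, string pattern)
    {
        string dirPart = Path.GetDirectoryName(pattern) ?? "";
        string filePart = Path.GetFileName(pattern);
        string absoluteDir = Path.GetFullPath(Path.Combine(baseDir, dirPart));
        if (!Directory.Exists(absoluteDir))
            return Array.Empty<string>();
        return Directory.GetFiles(absoluteDir, filePart)
            .Select(f => Path.GetRelativePath(baseDir, f))
            .ToArray();
    }
}
```

Note that the result is normalised: ..\src\..\ThirdParty\win32\one.dll comes back as ..\ThirdParty\win32\one.dll.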
If I understand you correctly you could use Directory.EnumerateFiles in combination with a regular expression like this (I haven't tested it though):
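// yield return only works inside an iterator method, so the loop is wrapped in one
static IEnumerable<string> FindThirdPartyDlls()
{
    var matcher = new Regex(@"^\.\.\\ThirdParty\\dlls\\[^\\]+\.dll$");
    foreach (var file in Directory.EnumerateFiles("..", "*.dll", SearchOption.AllDirectories))
    {
        if (matcher.IsMatch(file))
            yield return file;
    }
}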
I'm trying to rename files that my program lists as having "illegal characters" for a SharePoint file import. The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""
What I'm trying to do is recurse through the drive, gather up a list of file names, and then, using regular expressions, pick out file names from the list and replace the invalid characters in the actual file names themselves.
Does anybody have any idea how to do this? So far I have this (please remember, I'm a complete n00b to this stuff):
class Program
{
    static void Main(string[] args)
    {
        string[] files = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
        foreach (string file in files)
        {
            Console.Write(file + "\r\n");
        }
        Console.WriteLine("Press any key to continue...");
        Console.ReadKey(true);
        string pattern = " *[\\~#%&*{}/:<>?|\"-]+ *";
        string replacement = " ";
        Regex regEx = new Regex(pattern);
        string[] fileDrive = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
        StreamWriter sw = new StreamWriter(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing\File_Renames.txt");
        foreach (string fileNames in fileDrive)
        {
            string sanitized = regEx.Replace(fileNames, replacement);
            sw.Write(sanitized + "\r\n");
        }
        sw.Close();
    }
}
So what I need to figure out is how to recursively search for these invalid characters and replace them in the actual file names themselves. Does anybody have any ideas?
When you are working recursively with files and directories, it is often easier to use the DirectoryInfo class and its members instead of the static methods. There is a pre-built tree structure for you, so you don't have to manage that yourself.
GetDirectories returns more DirectoryInfo instances so you can walk the tree, while GetFiles returns FileInfo objects.
This guy created a custom iterator to recursively yield file info which, combined with your existing regex work, would complete your solution.
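File.Move() effectively renames files. Basically, you just need to call
File.Move(fileNames, sanitized);
inside the latter loop. Be aware that your regex is applied to the full path, so it also rewrites the folder names (and the : in C:\), which makes the target folder nonexistent. Apply it to Path.GetFileName(fileNames) only, and rebuild the target with Path.Combine(Path.GetDirectoryName(fileNames), ...).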
ALERT: there may be duplicate file names, so you'll have to establish a policy to avoid collisions, like appending a counter to the end of the sanitized name. Also, add proper exception handling.
PS: You don't need to search file names for characters like : \ * since Windows doesn't allow them there anyway.
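Here is one way the asker's loop could look with the file-name-only fix and some exception handling. It keeps the question's pattern and space replacement; the collision policy (skip and log rather than add a counter), the Trim() call and the names SharePointRenamer/RenameAll are assumptions of this sketch. Folder names are not renamed.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class SharePointRenamer
{
    // Pattern and replacement taken from the question: a run of illegal
    // characters, with any spaces around it, becomes a single space.
    private static readonly Regex Illegal = new Regex(" *[\\~#%&*{}/:<>?|\"-]+ *");

    // Renames every file under root whose name contains illegal characters and
    // returns how many files were renamed. Only the file-name part is touched.
    public static int RenameAll(string root, TextWriter log)
    {
        int renamed = 0;
        // GetFiles returns the complete array up front, so renaming inside the loop is safe
        foreach (string oldPath in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            string oldName = Path.GetFileName(oldPath);
            // Trim() is an addition: it drops a leading/trailing space left by the replacement
            string newName = Illegal.Replace(oldName, " ").Trim();
            if (newName == oldName || newName.Length == 0)
                continue;
            string newPath = Path.Combine(Path.GetDirectoryName(oldPath), newName);
            if (File.Exists(newPath))
            {
                log.WriteLine("SKIPPED, name already taken: " + oldPath);
                continue;
            }
            try
            {
                File.Move(oldPath, newPath);
                log.WriteLine(oldPath + " -> " + newPath);
                renamed++;
            }
            catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
            {
                log.WriteLine("FAILED: " + oldPath + ": " + ex.Message);
            }
        }
        return renamed;
    }
}
```

Passing a StreamWriter as log gives you the same File_Renames.txt report the question writes, now listing old and new names.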
I have a directory that contains jpg, tif, pdf, doc and xls files. The client DB only contains the file names without extensions. My app has to pick up the file and upload it. One of the properties of the upload object is the file extension.
Is there a way of getting the file extension if all I have is the path and name?
eg:
C:\temp\somepicture.jpg is the file and the information i have through db is
c:\temp\somepicture
Use Directory.GetFiles(Path.GetDirectoryName(fileName), Path.GetFileName(fileName) + ".*"). If it returns just one file, you have found the file you need. If it returns more than one, you have to choose which to upload.
Something like this maybe:
DirectoryInfo D = new DirectoryInfo(path);
foreach (FileInfo fi in D.GetFiles())
{
if (Path.GetFileNameWithoutExtension(fi.FullName) == whatever)
// do something
}
You could obtain a list of all of the files with that name, regardless of extension:
public string[] GetFileExtensions(string path)
{
System.IO.DirectoryInfo directory =
new System.IO.DirectoryInfo(System.IO.Path.GetDirectoryName(path));
return directory.GetFiles(
System.IO.Path.GetFileNameWithoutExtension(path) + ".*")
.Select(f => f.Extension).ToArray();
}
Obviously, if you have no other information and there are 2 files with the same name and different extensions, you can't do anything (e.g. there is somepicture.jpg and somepicture.png at the same time).
On the other hand, usually that won't be the case so you can simply use a search pattern (e.g. somepicture.*) to find the one and only (if you're lucky) file.
Search for files named somepicture.* in that folder, and upload any that match?
Get the lowest level folder for each path. For your example, you would have:
'c:\temp\'
Then find any files that start with your filename in that folder, in this case:
'somepicture'
Finally, grab the extension off the matching filename. If you have duplicates, you would have to handle that in a unique way.
You would have to use System.IO.Directory.GetFiles() and iterate through all the filenames. You will run into issues when you have a collision like somefile.jpg and somefile.tif.
Sounds like you have bigger issues than just this and you may want to make an argument to store the file extension in your database as well to remove the ambiguity.
You could do something like this, perhaps:
DirectoryInfo di = new DirectoryInfo("c:/temp/");
FileInfo[] rgFiles = di.GetFiles("somepicture.*");
foreach (FileInfo fi in rgFiles)
{
    // GetFileNameWithoutExtension and FileInfo.Extension split at the last dot,
    // so names like some.picture.jpg are handled correctly
    string name = Path.GetFileNameWithoutExtension(fi.Name);
    string ext = fi.Extension.TrimStart('.');
    System.Console.WriteLine("Extension is: " + ext);
}
One more, with the assumption of no files with same name but different extension.
string[] files = Directory.GetFiles(@"c:\temp", @"testasdadsadsas.*");
if (files.Length >= 1)
{
string fullFilenameAndPath = files[0];
Console.WriteLine(fullFilenameAndPath);
}
From the extension-less file path you can get the directory path and the file name:
string path = Path.GetDirectoryName(filename);
string name = Path.GetFileName(filename);
Then you can get all files that matches the file name with any extension:
FileInfo[] found = new DirectoryInfo(path).GetFiles(name + ".*");
If the array contains one item, you have your match. If there is more than one item, you have to decide which one to use, or what to do with them.
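A sketch of that decision made explicit: return the single match, null when there is none, and throw when the name is ambiguous (e.g. somepicture.jpg and somepicture.png side by side). Throwing is just one possible policy; ExtensionLookup and FindSingle are illustrative names.

```csharp
using System;
using System.IO;

public static class ExtensionLookup
{
    // Given c:\temp\somepicture (no extension), returns the one file matching
    // somepicture.*, null if none does, and throws if several do.
    public static string FindSingle(string pathWithoutExtension)
    {
        string dir = Path.GetDirectoryName(pathWithoutExtension);
        string name = Path.GetFileName(pathWithoutExtension);
        if (string.IsNullOrEmpty(dir))
            dir = Directory.GetCurrentDirectory();

        string[] hits = Directory.GetFiles(dir, name + ".*");
        if (hits.Length > 1)
            throw new InvalidOperationException("Ambiguous: " + string.Join(", ", hits));
        return hits.Length == 1 ? hits[0] : null;
    }
}
```

The extension itself is then Path.GetExtension(match).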
All the pieces are here in the existing answers, but just trying to unify them into one answer for you - given the "guaranteed unique" declaration you're working with, you can toss in a FirstOrDefault since you don't need to worry about choosing among multiple potential matches.
static void Main(string[] args)
{
var match = FindMatch(args[0]);
Console.WriteLine("Best match for {0} is {1}", args[0], match ?? "[None found]");
}
private static string FindMatch(string pathAndFilename)
{
return FindMatch(Path.GetDirectoryName(pathAndFilename), Path.GetFileNameWithoutExtension(pathAndFilename));
}
private static string FindMatch(string path, string filename)
{
return Directory.GetFiles(path, filename + ".*").FirstOrDefault();
}
Output:
> ConsoleApplication10 c:\temp\bogus
Best match for c:\temp\bogus is [None found]
> ConsoleApplication10 c:\temp\7z465
Best match for c:\temp\7z465 is c:\temp\7z465.msi
> ConsoleApplication10 c:\temp\boot
Best match for c:\temp\boot is c:\temp\boot.wim
If I have lots of directory names, either as literal strings or contained in variables, what is the easiest way of combining them to make a complete path?
I know of Path.Combine, but it only takes 2 string parameters; I need a solution that can take any number of directory parameters.
e.g:
string folder1 = "foo";
string folder2 = "bar";
CreateAPath("C:", folder1, folder2, folder1, folder1, folder2, "MyFile.txt")
Any ideas?
Does C# support unlimited args in methods?
Yes, have a look at the params keyword. Will make it easy to write a function that just calls Path.Combine the appropriate number of times, like this (untested):
string CombinePaths(params string[] parts) {
string result = String.Empty;
foreach (string s in parts) {
result = Path.Combine(result, s);
}
return result;
}
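If you can use .NET 4.0 or later, Path.Combine itself has a params string[] overload, so the helper shrinks to a single call. One thing to watch with the question's example: on .NET Framework, Path.Combine("C:", "foo") gives the drive-relative path C:foo (no separator is added after a volume separator), so it is safer to start with "C:\\". The CreateAPath name is taken from the question; the wrapper itself is only illustrative.

```csharp
using System.IO;

public static class PathHelper
{
    // Thin wrapper mirroring the question's CreateAPath call;
    // Path.Combine(params string[]) does all the work.
    public static string CreateAPath(params string[] parts)
    {
        return Path.Combine(parts);
    }
}
```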
LINQ to the rescue again. The Aggregate extension function can be used to accomplish what you want. Consider this example:
string[] ary = new string[] { "c:\\", "Windows", "System" };
string path = ary.Aggregate((aggregation, val) => Path.Combine(aggregation, val));
Console.WriteLine(path); //outputs c:\Windows\System
I prefer to use DirectoryInfo vs. the static methods on Directory, because I think it's better OO design. Here's a solution with DirectoryInfo + extension methods, that I think is quite nice to use:
public static DirectoryInfo Subdirectory(this DirectoryInfo self, params string[] subdirectoryName)
{
Array.ForEach(
subdirectoryName,
sn => self = new DirectoryInfo(Path.Combine(self.FullName, sn))
);
return self;
}
I don't love the fact that I'm modifying self, but for this short method, I think it's cleaner than making a new variable.
The call site makes up for it, though:
DirectoryInfo di = new DirectoryInfo("C:\\")
.Subdirectory("Windows")
.Subdirectory("System32");
DirectoryInfo di2 = new DirectoryInfo("C:\\")
.Subdirectory("Windows", "System32");
Adding a way to get a FileInfo is left as an exercise (for another SO question!).
Try this one:
public static string CreateDirectoryName(string fileName, params string[] folders)
{
if(folders == null || folders.Length <= 0)
{
return fileName;
}
string directory = string.Empty;
foreach(string folder in folders)
{
directory = System.IO.Path.Combine(directory, folder);
}
directory = System.IO.Path.Combine(directory, fileName);
return directory;
}
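The params keyword lets you pass any number of strings.
What Path.Combine adds over plain concatenation is that it inserts a directory separator between two parts only when one is missing, and on .NET Framework it also checks for invalid path characters. Note that if a later part is rooted (for example it starts with a slash), the earlier parts are discarded.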
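A small demonstration of those separator rules; the CombineDemo class exists only for this example. The results depend on the platform's separator, so they are described in the comments rather than hard-coded.

```csharp
using System.IO;

public static class CombineDemo
{
    static readonly char Sep = Path.DirectorySeparatorChar;

    // Three calls showing what Path.Combine does (and does not do) with separators.
    public static string[] Examples()
    {
        return new[]
        {
            Path.Combine("foo", "bar"),        // separator inserted: foo\bar on Windows, foo/bar elsewhere
            Path.Combine("foo" + Sep, "bar"),  // trailing separator already present: not doubled
            Path.Combine("foo", Sep + "bar"),  // second part is rooted, so "foo" is dropped
        };
    }
}
```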