Changing FileNames using RegEx and Recursion - c#

I'm trying to rename files that my program lists as having "illegal characters" for a SharePoint file import. The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""
What I'm trying to do is recurse through the drive, gather a list of file names, and then use regular expressions to pick out the names that contain invalid characters and replace those characters in the actual file names.
Does anybody have any idea how to do this? So far I have this (please remember, I'm a complete n00b to this stuff):
class Program
{
    static void Main(string[] args)
    {
        string[] files = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
        foreach (string file in files)
        {
            Console.Write(file + "\r\n");
        }
        Console.WriteLine("Press any key to continue...");
        Console.ReadKey(true);
        string pattern = " *[\\~#%&*{}/:<>?|\"-]+ *";
        string replacement = " ";
        Regex regEx = new Regex(pattern);
        string[] fileDrive = Directory.GetFiles(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing", "*.*", SearchOption.AllDirectories);
        StreamWriter sw = new StreamWriter(@"C:\Documents and Settings\bob.smith\Desktop\~Test Folder for [SharePoint] %testing\File_Renames.txt");
        foreach (string fileNames in fileDrive)
        {
            string sanitized = regEx.Replace(fileNames, replacement);
            sw.Write(sanitized + "\r\n");
        }
        sw.Close();
    }
}
So what I need to figure out is how to recursively search for these invalid chars and replace them in the actual file names. Anybody have any ideas?

When you are working recursively with files and directories, it's often easier to use the DirectoryInfo class and its members instead of the static methods. It gives you a pre-built tree structure, so you don't have to manage that yourself.
GetDirectories returns more DirectoryInfo instances so you can walk the tree, while GetFiles returns FileInfo objects.
This guy created a custom iterator to recursively yield file info, which, when combined with your existing regex work, would complete your solution.
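The linked iterator is not reproduced here, so the following is only a minimal sketch of what such an iterator could look like. The names FileWalker and EnumerateFilesRecursively are made up for illustration, and it does not handle folders you lack permission to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileWalker
{
    // Hypothetical helper: lazily yields every file under root,
    // files of a folder first, then the files of each subfolder.
    public static IEnumerable<FileInfo> EnumerateFilesRecursively(DirectoryInfo root)
    {
        foreach (FileInfo file in root.GetFiles())
            yield return file;

        foreach (DirectoryInfo sub in root.GetDirectories())
            foreach (FileInfo file in EnumerateFilesRecursively(sub))
                yield return file;
    }
}
```

You would then loop over FileWalker.EnumerateFilesRecursively(new DirectoryInfo(startFolder)) and run your regex against each file's Name property.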

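File.Move() effectively renames files. Basically, you just need to call
File.Move(fileNames, sanitized);
inside the latter loop.
Note that fileNames holds the full path, so run the regex against the file name part only; otherwise the colon after the drive letter and the characters in folder names get rewritten too.
ALERT: there may be duplicate file names after sanitizing, so you'll have to establish a policy to avoid collisions, such as appending a counter to the end of the sanitized name. Also, add proper exception handling.
PS: You certainly don't need to search for characters like : \ * because Windows does not allow them in file names in the first place.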
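One possible way to implement the "append a counter" policy is sketched below. RenameHelper and GetAvailablePath are hypothetical names, and the " (1)" suffix format is just an assumption; pick whatever convention suits your import.

```csharp
using System.IO;

static class RenameHelper
{
    // Hypothetical helper: returns desiredPath if no file has that name yet,
    // otherwise "name (1).ext", "name (2).ext", ... until a free name is found.
    public static string GetAvailablePath(string desiredPath)
    {
        if (!File.Exists(desiredPath))
            return desiredPath;

        string dir = Path.GetDirectoryName(desiredPath) ?? "";
        string stem = Path.GetFileNameWithoutExtension(desiredPath);
        string ext = Path.GetExtension(desiredPath);

        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(dir, string.Format("{0} ({1}){2}", stem, i, ext));
            if (!File.Exists(candidate))
                return candidate;
        }
    }
}
```

The rename then becomes File.Move(fileNames, RenameHelper.GetAvailablePath(sanitized)), ideally wrapped in a try/catch for IOException and UnauthorizedAccessException.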

Related

How Do I get Files From C# Directory

I need to get all files with the prefix 009 from a server path.
But my code is retrieving all files with the 0000 prefix, not just the ones that start with 009.
For example, I have the files "000028447_ ghf.doc", "0000316647 abcf.doc", "009028447_ test2.doc" and "abcd.doc".
string [] files =Directory.GetFiles(filePath,"009*.doc)
is giving me all files except "abcd.doc". But I need only "009028447_ test2.doc".
If I call Directory.GetFiles(filePath,"ab*.doc) it retrieves "abcd.doc" and works fine. But when I give a pattern like "009" or "00002" it doesn't work as expected.
Your code snippet is missing a closing quote-character in the pattern. The code should be:
string[] files = Directory.GetFiles(filePath, "009*.doc");
Other than that, it seems to be working as intended. I've tested this by creating a folder with the files you mention in the question.
Next I created a console application, which uses your code to find the files, and prints all the results to the console. The output is the expected result:
C:\testfolder\009028447_ test2.doc
Here is the entire code for the console application:
using System;
using System.IO;
class Program
{
    static void Main(string[] args)
    {
        string filePath = @"C:\testfolder";
        string[] files = Directory.GetFiles(filePath, "009*.doc");
        // Creates a string with all the elements of the array, separated by ", "
        string matchingFiles = string.Join(", ", files);
        Console.WriteLine(matchingFiles);
        // Since there is only one matching file, the above line only prints:
        // C:\testfolder\009028447_ test2.doc
    }
}
In conclusion, the code works. If you are getting other results, there must be other differences in your setup or code that you haven't mentioned.
If (and I did not check) it is true that you are only receiving the wrong files, you could use a foreach loop or LINQ to check whether each file matches your criteria:
Foreach:
List<string> arrPaths = new List<string>();
foreach (string strPath in Directory.GetFiles(filePath, "*.doc"))
{
    // GetFiles returns full paths, so test the file name part only
    string name = Path.GetFileName(strPath);
    if (name.EndsWith(".doc") && name.StartsWith("009"))
        arrPaths.Add(strPath);
}
Linq:
List<string> arrPaths = Directory.GetFiles(filePath, "*.doc").Where(pths => Path.GetFileName(pths).StartsWith("009") && pths.EndsWith(".doc")).ToList();
Both ways are more a workaround than a real solution, but I hope they help :)
EDIT
If you only want the file names, I would strip the filePath from strPath like this:
Foreach:
arrPaths.Add(strPath.Replace(filePath + "\\", ""));
Linq:
List<string> arrPaths = Directory.GetFiles(filePath, "*.doc").Where(pths => Path.GetFileName(pths).StartsWith("009") && pths.EndsWith(".doc")).Select(pths => pths.Replace(filePath + "\\", "")).ToList();
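If you go the filtering route, it may help to pull the check into one small function that you can test without touching the disk. This is only a sketch: NameFilter and WithPrefix are invented names, and the case-insensitive comparison is an assumption that matches how Windows treats file names. An exact extension check like this can also be useful because, as far as I know, a three-character pattern such as "*.doc" may also return .docx files on some .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class NameFilter
{
    // Hypothetical helper: keeps the paths whose file name starts with prefix
    // and whose extension is exactly extension (e.g. ".doc"), ignoring case.
    public static List<string> WithPrefix(IEnumerable<string> paths, string prefix, string extension)
    {
        return paths
            .Where(p => Path.GetFileName(p).StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                     && Path.GetExtension(p).Equals(extension, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Usage would be something like NameFilter.WithPrefix(Directory.GetFiles(filePath), "009", ".doc").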

Finding files starting with variable given by users

System.IO.FileInfo[] fileNames = dir.GetFiles("*check*.*");
This returns any files with "check" in the name. But what I want is to return the files starting with a string given by the user.
Example
string word = Console.Readkey();
System.IO.FileInfo[] fileNames = dir.GetFiles("*word*.*");
I tried this
System.IO.FileInfo[] fileNames = dir.GetFiles("*"+word+"*".*");
But this didn't work.
First of all, Console.ReadKey() reads a single key press, which I don't think you intend, so change it to Console.ReadLine().
That being said, if you want to use the variable word, don't put it inside the string literal. Keep it outside and either concatenate it with the rest of the pattern or use String.Format().
So use this code:
string word = Console.ReadLine();
System.IO.FileInfo[] fileNames = dir.GetFiles(String.Format("{0}*.*", word));
You need to use this
public static string[] GetFiles(
string path,
string searchPattern
)
method from Directory class. The version with one string parameter accepts the path.
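Putting the pieces from the answers together, a prefix search on a DirectoryInfo could be sketched like this. PrefixSearch and FindByPrefix are hypothetical names; rejecting wildcard characters in the user's input is my own assumption about what you would want, since * and ? typed by the user would otherwise change the meaning of the pattern.

```csharp
using System;
using System.IO;

static class PrefixSearch
{
    // Hypothetical helper: returns the files directly inside dir
    // whose names start with prefix, whatever their extension.
    public static FileInfo[] FindByPrefix(DirectoryInfo dir, string prefix)
    {
        // Wildcards in the user's text would be interpreted as part of the pattern.
        if (prefix.IndexOfAny(new[] { '*', '?' }) >= 0)
            throw new ArgumentException("Prefix must not contain wildcards.", "prefix");

        return dir.GetFiles(prefix + "*");
    }
}
```

With string word = Console.ReadLine(); you would call PrefixSearch.FindByPrefix(dir, word). The pattern word + "*" also matches names that have no extension at all.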

File names matching "..\ThirdParty\dlls\*.dll"

Is there an easy way to get a list of file names that match a file name pattern that includes references to a parent directory? What I want is for "..\ThirdParty\dlls\*.dll" to return a collection like ["..\ThirdParty\dlls\one.dll", "..\ThirdParty\dlls\two.dll", ...]
I can find several questions about matching file names including full paths and wildcards, but nothing that includes "..\" in the pattern. Directory.GetFiles explicitly disallows it.
What I want to do with the names is include them in a zip archive, so if there is a zip library that understands relative paths like this, I am happy to use that instead.
The pattern(s) come from an input file; they are not known at compile time. They can get quite complex, e.g. ..\src\..\ThirdParty\win32\*.dll, so parsing is probably not feasible.
Having to put them in a zip is also the reason I am not very keen on converting the pattern to a full path: I do want the relative paths in the zip.
EDIT: What I am looking for really is a C# equivalent of /bin/ls.
static string[] FindFiles(string path)
{
    string directory = Path.GetDirectoryName(path); // separate directory, i.e. ..\ThirdParty\dlls
    string filePattern = Path.GetFileName(path); // separate file pattern, i.e. *.dll
    // if path only contains a pattern then use the current directory
    if (String.IsNullOrEmpty(directory))
        directory = Directory.GetCurrentDirectory();
    // uncomment the following line if you need absolute paths
    //directory = Path.GetFullPath(directory);
    if (!Directory.Exists(directory))
        return new string[0];
    var files = Directory.GetFiles(directory, filePattern);
    return files;
}
There is the Path.GetFullPath() function that will convert from relative to absolute. You could use it on the path part.
string pattern = @"..\src\..\ThirdParty\win32\*.dll";
string relativeDir = Path.GetDirectoryName(pattern);
string absoluteDir = Path.GetFullPath(relativeDir);
string filePattern = Path.GetFileName(pattern);
foreach (string file in Directory.GetFiles(absoluteDir, filePattern))
{
}
If I understand you correctly you could use Directory.EnumerateFiles in combination with a regular expression like this (I haven't tested it though):
var matcher = new Regex(@"^\.\.\\ThirdParty\\dlls\\[^\\]+\.dll$");
foreach (var file in Directory.EnumerateFiles("..", "*.dll", SearchOption.AllDirectories))
{
    if (matcher.IsMatch(file))
        yield return file;
}

Recursively looping through a drive and replacing illegal characters

I have to create an app that drills into a specific drive, reads all file names and replaces illegal SharePoint characters with underscores.
The illegal characters I am referring to are: ~ # % & * {} / \ | : <> ? - ""
Can someone provide either a link to code or the code itself for how to do this? I am VERY new to C# and need all the help I can get. I have researched code for recursively drilling through a drive, but I am not sure how to put the character replacement and the recursive looping together. Please help!
The advice for removing illegal characters is here:
How to remove illegal characters from path and filenames?
You just have to change the character set to your set of characters that you want to remove.
If you have figured out how to recurse the folders, you can get all of the files in each folder with:
var files = System.IO.Directory.EnumerateFiles(currentPath);
and then
foreach (string file in files)
{
System.IO.File.Move(file, ConvertFileName(file));
}
You will write the ConvertFileName method yourself: it accepts a file name as a string and returns that name stripped of the bad characters.
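As a rough idea of what ConvertFileName might look like, here is a sketch that uses the character list from the question and replaces each run of bad characters with a single underscore. The class name FileNameSanitizer is invented, and cleaning only the file name part (so the folder part of the path stays valid for File.Move) is my assumption about what you want.

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class FileNameSanitizer
{
    // The characters the question lists as illegal for SharePoint.
    private static readonly Regex Illegal = new Regex(@"[~#%&*{}/\\|:<>?""-]+");

    // Returns the same path with each run of illegal characters
    // in the file name replaced by one underscore.
    public static string ConvertFileName(string path)
    {
        string dir = Path.GetDirectoryName(path) ?? "";
        string name = Path.GetFileName(path);
        return Path.Combine(dir, Illegal.Replace(name, "_"));
    }
}
```

Because two different names can end up identical after cleaning, check whether the target already exists before calling File.Move.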
Note that EnumerateFiles requires .NET 4; if you are using .NET 3.5, GetFiles() works too. According to MSDN:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
How to recursively list directories
string path = @"c:\dev";
string searchPattern = "*.*";
string[] dirNameArray = Directory.GetDirectories(path, searchPattern, SearchOption.AllDirectories);
// Or, for better performance:
// (but breaks if you don't have access to a sub directory; see 2nd link below)
IEnumerable<string> dirNameEnumeration = Directory.EnumerateDirectories(path, searchPattern, SearchOption.AllDirectories);
How to: Enumerate Directories and Files
How to recursively list all the files in a directory in C#?
Not really an answer, but consider both of the following:
The following characters are not valid in filenames anyways so you don't have to worry about them: /\:*?"<>|.
Make sure your algorithm handles duplicate names appropriately. For example, My~Project.doc and My#Project.doc would both be renamed to My_Project.doc.
A recursive method to rename files in folders is what you want. Just pass it the root folder and it will call itself for all subfolders found.
private void SharePointSanitize(string _folder)
{
    // Process files in the directory
    string[] files = Directory.GetFiles(_folder);
    foreach (string fileName in files)
    {
        string newName = SharePointRename(fileName);
        // Only move when the name actually changes
        if (newName != fileName)
            File.Move(fileName, newName);
    }
    // Recurse into each subfolder
    string[] folders = Directory.GetDirectories(_folder);
    foreach (string folderName in folders)
    {
        SharePointSanitize(folderName);
    }
}
private string SharePointRename(string _name)
{
    // Clean only the file name so folder names in the path stay valid
    string newName = Path.GetFileName(_name);
    newName = newName.Replace("~", "");
    newName = newName.Replace("#", "");
    newName = newName.Replace("%", "");
    newName = newName.Replace("&", "");
    newName = newName.Replace("*", "");
    newName = newName.Replace("{", "");
    newName = newName.Replace("}", "");
    // .. and so on
    return Path.Combine(Path.GetDirectoryName(_name), newName);
}
Notes:
You can change the "" in the SharePointRename() method to whatever you want to replace with, such as an underscore ("_").
This does not check whether two files end up with the same name, like thing~ and thing%.
class Program
{
    private static Regex _pattern = new Regex("[~#%&*{}/\\|:<>?\"-]+");
    static void Main(string[] args)
    {
        DirectoryInfo di = new DirectoryInfo("C:\\");
        RecursivelyRenameFilesIn(di);
    }
    public static void RecursivelyRenameFilesIn(DirectoryInfo root)
    {
        foreach (FileInfo fi in root.GetFiles())
            if (_pattern.IsMatch(fi.Name))
                fi.MoveTo(string.Format("{0}\\{1}", fi.Directory.FullName, _pattern.Replace(fi.Name, "_")));
        foreach (DirectoryInfo di in root.GetDirectories())
            RecursivelyRenameFilesIn(di);
    }
}
Though this will not handle duplicate names, as Steven pointed out.

File extension - c#

I have a directory that contains jpg, tif, pdf, doc and xls files. The client DB only contains the file names without extensions. My app has to pick up each file and upload it. One of the properties of the upload object is the file extension.
Is there a way of getting the file extension if all I have is the path and name?
eg:
C:\temp\somepicture.jpg is the file and the information i have through db is
c:\temp\somepicture
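Use Directory.GetFiles(directory, fileName + ".*"), where directory is the folder part of the path and fileName is the name without extension. If it returns just one file, you have found the file you need. If it returns more than one, you have to choose which one to upload.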
Something like this maybe:
DirectoryInfo D = new DirectoryInfo(path);
foreach (FileInfo fi in D.GetFiles())
{
    if (Path.GetFileNameWithoutExtension(fi.FullName) == whatever)
    {
        // do something
    }
}
You could obtain a list of all of the files with that name, regardless of extension:
public string[] GetFileExtensions(string path)
{
    System.IO.DirectoryInfo directory =
        new System.IO.DirectoryInfo(System.IO.Path.GetDirectoryName(path));
    return directory.GetFiles(
        System.IO.Path.GetFileNameWithoutExtension(path) + ".*")
        .Select(f => f.Extension).ToArray();
}
Obviously, if you have no other information and there are 2 files with the same name and different extensions, you can't do anything (e.g. there is somepicture.jpg and somepicture.png at the same time).
On the other hand, usually that won't be the case so you can simply use a search pattern (e.g. somepicture.*) to find the one and only (if you're lucky) file.
Search for files named somepicture.* in that folder, and upload any that match?
Get the lowest level folder for each path. For your example, you would have:
'c:\temp\'
Then find any files that start with your filename in that folder, in this case:
'somepicture'
Finally, grab the extension off the matching filename. If you have duplicates, you would have to handle that in a unique way.
You would have to use System.IO.Directory.GetFiles() and iterate through all the filenames. You will run into issues when you have a collision like somefile.jpg and somefile.tif.
Sounds like you have bigger issues than just this and you may want to make an argument to store the file extension in your database as well to remove the ambiguity.
You could do something like this, perhaps:
DirectoryInfo di = new DirectoryInfo("c:/temp/");
FileInfo[] rgFiles = di.GetFiles("somepicture.*");
foreach (FileInfo fi in rgFiles)
{
    // fi.Extension includes the leading dot and copes with names containing several dots
    if (fi.Extension.Length > 0)
    {
        string name = Path.GetFileNameWithoutExtension(fi.Name);
        string ext = fi.Extension.TrimStart('.');
        System.Console.WriteLine("Extension is: " + ext);
    }
}
One more, with the assumption of no files with same name but different extension.
string[] files = Directory.GetFiles(@"c:\temp", @"testasdadsadsas.*");
if (files.Length >= 1)
{
    string fullFilenameAndPath = files[0];
    Console.WriteLine(fullFilenameAndPath);
}
From the crippled file path you can get the directory path and the file name:
string path = Path.GetDirectoryName(filename);
string name = Path.GetFileName(filename);
Then you can get all files that match the file name with any extension:
FileInfo[] found = new DirectoryInfo(path).GetFiles(name + ".*");
If the array contains one item, you have your match. If there is more than one item, you have to decide which one to use, or what to do with them.
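The three steps above could be wrapped up roughly as follows. ExtensionLookup and FindExtension are made-up names, and throwing on more than one match is just one possible policy for the ambiguous case; you may prefer to return all candidates instead.

```csharp
using System;
using System.IO;

static class ExtensionLookup
{
    // Hypothetical helper: returns the extension (with leading dot) of the one
    // file matching pathWithoutExtension + ".*", or null when nothing matches.
    public static string FindExtension(string pathWithoutExtension)
    {
        string dir = Path.GetDirectoryName(pathWithoutExtension);
        string name = Path.GetFileName(pathWithoutExtension);

        FileInfo[] found = new DirectoryInfo(dir).GetFiles(name + ".*");

        if (found.Length == 0)
            return null;
        if (found.Length > 1)
            throw new InvalidOperationException("More than one file matches " + name + ".*");

        return found[0].Extension;
    }
}
```

For the example in the question, ExtensionLookup.FindExtension(@"c:\temp\somepicture") would give ".jpg" as long as somepicture.jpg is the only match in that folder.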
All the pieces are here in the existing answers, but just trying to unify them into one answer for you - given the "guaranteed unique" declaration you're working with, you can toss in a FirstOrDefault since you don't need to worry about choosing among multiple potential matches.
static void Main(string[] args)
{
    var match = FindMatch(args[0]);
    Console.WriteLine("Best match for {0} is {1}", args[0], match ?? "[None found]");
}
private static string FindMatch(string pathAndFilename)
{
    return FindMatch(Path.GetDirectoryName(pathAndFilename), Path.GetFileNameWithoutExtension(pathAndFilename));
}
private static string FindMatch(string path, string filename)
{
    return Directory.GetFiles(path, filename + ".*").FirstOrDefault();
}
Output:
> ConsoleApplication10 c:\temp\bogus
Best match for c:\temp\bogus is [None found]
> ConsoleApplication10 c:\temp\7z465
Best match for c:\temp\7z465 is c:\temp\7z465.msi
> ConsoleApplication10 c:\temp\boot
Best match for c:\temp\boot is c:\temp\boot.wim
