Get files from specifically structured sub-folders? - c#

How do I get *.xml files from a specifically structured folder/sub-folder system into an array so that I can perform some operation on them?
Eg: The sample structure of the parent folders in user provided path (say, myPath) is
2017-36459-20124-301236\2017\36459\20124\301236\301236.xml
I cannot use things like string[] tarDir = Directory.GetDirectories(myPath, "foldernameinitial"); as the folder name is changeable.
Does anyone have any idea how to solve this issue?

Going by the clarification in your comments, this will get you all the sub-directories that contain only files, i.e. the last (leaf) sub-directory of each branch:
static IEnumerable<string> GetLastDirectory(string path) =>
Directory.GetDirectories(path, "*", SearchOption.AllDirectories)
.Where(dir => Directory.GetDirectories(dir).Length == 0);
Now use it as:
var MyDirectories = GetLastDirectory(@"D:\Softwares\Xtras"); //your path goes here
foreach (var subdir in MyDirectories)
{
var onlyXMLfiles = Directory.GetFiles(subdir, "*.xml");
foreach (var file in onlyXMLfiles)
{
//do your operation
}
}
To be frank, I don't know regex well; I tried this pattern at regex101. But since you said in the comments below that you also want to match the directory structure, you can do this:
string pattern = @"\d{4}-\d{4,10}-\d{4,10}-\d{4,10}\\\d{4}\\\d{4,10}\\\d{4,10}\\\d{4,10}";
//Now you won't have to use "GetLastDirectory", instead use "Directory.GetDirectories"
//SearchOption.AllDirectories is needed so that the nested folders the pattern describes are returned
var MyDirectories = Directory.GetDirectories("your path goes here", "*", SearchOption.AllDirectories);
foreach (var subdir in MyDirectories)
{
if (Regex.IsMatch(subdir, pattern))
{
var onlyXMLfiles = Directory.GetFiles(subdir, "*.xml");
foreach (var file in onlyXMLfiles)
{
//do your operations
}
}
}
Pattern explanation:
\ : escape character; it starts a class such as \d or makes the next character literal
- : hyphen as mentioned in the folder structure
\d : match a single digit
\d{4} : match exactly 4 digits
\d{4,10} : match at least 4 and at most 10 digits
\\ : match a literal \ as in the folder path
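To check the pattern against a sample path before running it over a real folder tree, a minimal sketch could look like the following. The class and method names (JobPathPattern, LooksLikeJobFolder) and the D:\jobs prefix are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JobPathPattern
{
    // Same pattern as in the answer; the verbatim string (@"...") keeps
    // every backslash exactly as the regex engine should see it.
    public const string Pattern =
        @"\d{4}-\d{4,10}-\d{4,10}-\d{4,10}\\\d{4}\\\d{4,10}\\\d{4,10}\\\d{4,10}";

    // True when the folder chain described by the pattern occurs
    // somewhere inside the given path (the match is not anchored).
    public static bool LooksLikeJobFolder(string path) =>
        Regex.IsMatch(path, Pattern);
}
```

For example, JobPathPattern.LooksLikeJobFolder(@"D:\jobs\2017-36459-20124-301236\2017\36459\20124\301236") matches, while a path that stops at the top-level folder 2017-36459-20124-301236 does not, because the pattern also requires the four nested folders.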

var job_folders = Directory.EnumerateDirectories(textBox1.Text, "*", SearchOption.TopDirectoryOnly);
if (!job_folders.Any())
{
MessageBox.Show("NO job folders are found...");
}
else
{
foreach (string job_folder in job_folders)
{
var target_xml_file = Directory.GetFiles(job_folder, "*.xml", SearchOption.AllDirectories).Where(x => Path.GetFileName(Path.GetDirectoryName(x)).ToLower() == "xml");
var target_meta_file = Directory.GetFiles(job_folder, "*.xml", SearchOption.AllDirectories).Where(x => Path.GetFileName(Path.GetDirectoryName(x)).ToLower() == "meta");
}
}

Related

How do I get all files from a directory with a variable extension of specified length?

I have a huge directory, including subdirectories, that I need to retrieve files from.
The folders contain various files, but I am only interested in specific proprietary files whose extension is 7 digits long.
For example, I have a folder that contains the following files:
abc.txt
def.txt
GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
summary.pdf
someinfo.zip
T7F4JUXA.0300600
vxy98796.csv
YJHLPLBO.0302300
YJHLPLUC.0302800
I have tried the following:
var fileList = Directory.GetFiles(someDir, "*.???????", SearchOption.AllDirectories);
and also
string searchSting = string.Empty;
for (int j = 0; j < 9999999; j++)
{
searchSting += string.Format(", *.{0} ", j.ToString("0000000"));
}
var fileList2 = Directory.GetFiles(someDir, searchSting, SearchOption.AllDirectories);
which errors because the string is too long obviously.
I want to only return the files with the specified length of the extension, in this case, 7 digits to avoid having to loop over the thousands I would have to process.
I have considered creating a variable string for the search criteria that would contain all 99,999,999 possible digits but d
How can I accomplish this?
I don't believe there's a way you can do this without looping through the files in the directory and its subfolders. The search pattern for GetFiles doesn't support regular expressions, so we can't really use something like [\d]{7} as a filter. I would suggest using Directory.EnumerateFiles and then return the files that match your criteria.
You can use this to enumerate the files:
private static IEnumerable<string> GetProprietaryFiles(string topDirectory)
{
Func<string, bool> filter = f =>
{
string extension = Path.GetExtension(f);
// is 8 characters long including the .
// all remaining characters are digits
return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
};
// EnumerateFiles allows us to step through the files without
// loading all of the filenames into memory at once.
IEnumerable<string> matchingFiles =
Directory.EnumerateFiles(topDirectory, "*", SearchOption.AllDirectories)
.Where(filter);
// Return each file as the enumerable is iterated
foreach (var file in matchingFiles)
{
yield return file;
}
}
Path.GetExtension includes the . so we check that the number of characters including the . is 8, and that all remaining characters are digits.
Usage:
List<string> fileList = GetProprietaryFiles(someDir).ToList();
I would just grab the list of files in the directory, and then check if the substring length after the '.' is equal to 7. (* As long as you know no other files would have that length extension)
EDITED to use Path instead:
Directory.GetFiles(@"C:\temp").Where(
fileName => Path.GetExtension(fileName).Length == 8
).ToList();
OLD:
Directory.GetFiles(someDir).Where(
fileName => fileName.Substring(fileName.LastIndexOf('.') + 1).Length == 7
).ToList();
Treat files below as the result of Directory.GetFiles().
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
List<string> files = new List<string>()
{"abc.txt", "def.txt", "GIWFJ1XA.0201000", "GIWFJ1UC.0501000", "NOOBO0XA.0100100", "summary.pdf", "someinfo.zip", "T7F4JUXA.0300600", "vxy98796.csv", "YJHLPLBO.0302300", "YJHLPLUC.0302800"};
Regex r = new Regex("^\\.\\d{7}$");
foreach (string file in files.Where(o => r.IsMatch(Path.GetExtension(o))))
{
Console.WriteLine(file);
}
}
}
Output:
GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
T7F4JUXA.0300600
YJHLPLBO.0302300
YJHLPLUC.0302800
Edit: I tried passing r.IsMatch directly instead of the lambda with o, but the dotnetfiddle compiler gives me this error:
Compilation error (line 14, col 27): The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,bool>)' and 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,int,bool>)'
I can't debug it right now since I'm busy; I'd be happy if anyone passing by could suggest a fix. The code above works as written.
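One likely reading of that error: Regex has both IsMatch(string) and IsMatch(string, int), so the bare method group r.IsMatch fits both overloads of Where. Naming the delegate type removes the ambiguity. The sketch below assumes that reading; the class and method names (ExtensionFilter, Filter) are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtensionFilter
{
    // A dot followed by exactly seven digits, e.g. ".0201000".
    private static readonly Regex SevenDigitExtension = new Regex(@"^\.\d{7}$");

    public static List<string> Filter(IEnumerable<string> files)
    {
        // Assigning to Func<string, bool> selects the IsMatch(string)
        // overload explicitly, so nothing is left for Where to guess.
        Func<string, bool> isSevenDigits = SevenDigitExtension.IsMatch;

        return files
            .Where(f => isSevenDigits(Path.GetExtension(f)))
            .ToList();
    }
}
```

The lambda is still needed here because the regex is applied to the extension, not to the whole file name.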

Use Foreach loop in C# to loop through files and remove a dash plus three extra characters before .pdf

I have several hundred scanned in files within a Desktop directory. I need to loop thru them and change the file names. The files must retain the first 6 characters of their name. Everything after and including the dash needs to be removed. The file extension (.pdf) is needed too.
The file names are like this:
000001-067.pdf
000034-003.pdf
000078-123.pdf
000009-011.pdf
What I need to do is remove the dash and the final three characters of the file name. So the results will be:
000001.pdf
000034.pdf
000078.pdf
000009.pdf
I wrote the following code but it throws an error on File.Move. Any ideas how to fix it?
DirectoryInfo d = new DirectoryInfo(@"C:\Users\BrewMaster\Desktop\ScannedFilesToProcess\");
FileInfo[] infos = d.GetFiles();
foreach (FileInfo f in infos)
{
string fileName = f.Name;
int indexOfDash = fileName.LastIndexOf('-'); // find the position of -
int indexOfPeriod = fileName.LastIndexOf('.'); // find the position of .
// find remove the text between - and .
string newFileName = fileName.Remove(indexOfDash, indexOfPeriod - indexOfDash);
//File.Move(f.FullName, f.FullName.Replace("-", "")); //This only removes the dash. The 3 characters after it remain
File.Move(f.Name, newFileName); //This throws an error: System.IO.FileNotFoundException 'Could not find file C:\Users\BrewMaster\source\repos\ChangeFileName\bin\Debug\000001-067.pdf'
}
Here was my solution:
DirectoryInfo d = new DirectoryInfo(@"C:\Users\BrewMaster\Desktop\ScannedFilesToProcess\");
FileInfo[] infos = d.GetFiles();
foreach (FileInfo f in infos)
{
string fileName = f.Name;
int indexOfDash = fileName.LastIndexOf('-'); // find the position of '-'
int indexOfPeriod = fileName.LastIndexOf('.'); // find the position of '.'
// find remove the text between '-' and '.'
string newFileName = fileName.Remove(indexOfDash, indexOfPeriod - indexOfDash);
File.Move(f.FullName, f.FullName.Replace(f.Name, newFileName));
}
I think the issue is that f.Name only returns the name of the file, not the full path needed to move it. An easy fix is to use both f.FullName and f.Name: pass the original full path as the source, and then combine the directory path with the new file name (which you have already computed) as the destination.
It is trying to rename/move the file in the current directory, and not in the directory which you originally scanned.
Try:
String path = @"C:\Users\BrewMaster\Desktop\ScannedFilesToProcess\";
DirectoryInfo d = new DirectoryInfo(path);
FileInfo[] infos = d.GetFiles();
foreach (FileInfo f in infos)
{
string fileName = f.Name;
int indexOfDash = fileName.LastIndexOf('-'); // find the position of -
int indexOfPeriod = fileName.LastIndexOf('.'); // find the position of .
// find remove the text between - and .
string newFileName = fileName.Remove(indexOfDash, indexOfPeriod - indexOfDash);
//File.Move(f.FullName, f.FullName.Replace("-", "")); //This only removes the dash. The 3 characters after it remain
File.Move(Path.Combine(path, f.Name), Path.Combine(path, newFileName));
}
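As a variation on the answers above, the name calculation can be separated from the file system work, which makes it easy to check on its own. This is only a sketch: ScanRenamer, StripDashSuffix and RenameAll are illustrative names, and skipping files whose target name already exists is an assumption about the desired behaviour, not something the question specifies.

```csharp
using System;
using System.IO;

public static class ScanRenamer
{
    // "000001-067.pdf" -> "000001.pdf"; names without a dash come back unchanged.
    public static string StripDashSuffix(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        int dash = stem.LastIndexOf('-');
        if (dash < 0) return fileName;
        return stem.Substring(0, dash) + Path.GetExtension(fileName);
    }

    public static void RenameAll(string directory)
    {
        foreach (FileInfo f in new DirectoryInfo(directory).GetFiles("*.pdf"))
        {
            // Build the destination from the file's own folder, not the
            // process's current directory.
            string target = Path.Combine(f.DirectoryName, StripDashSuffix(f.Name));

            // Assumption: leave the file alone if nothing changes or if the
            // target name is already taken.
            if (target != f.FullName && !File.Exists(target))
                File.Move(f.FullName, target);
        }
    }
}
```

Usage would be a single call such as ScanRenamer.RenameAll(path), with path pointing at the scanned-files folder.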

Check if a folder exists in the filepath

I want to loop through all sub folders and files in a folder and check whether a particular filename contains a folder, say "X", in its path (as an ancestor). I don't want to use string comparison. Is there a better way?
Answering your specific question (the one that is in the title of your question, not in the body), once you have the filename (which other answers tell you how to find), you can do:
bool PathHasFolder(string pathToFileName, string folderToCheck)
{
return Path.GetDirectoryName(pathToFileName)
.Split(Path.DirectorySeparatorChar)
.Any(x => x == folderToCheck);
}
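This only inspects the folders that appear in the string you pass in, so it is reliable only with absolute paths. If you have relative paths, you can resolve them first through FileInfo (relative paths are resolved against the current working directory):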
bool PathHasFolder(string pathToFileName, string folderToCheck)
{
return new FileInfo(pathToFileName)
.Directory
.FullName
.Split(Path.DirectorySeparatorChar)
.Any(x => x == folderToCheck);
}
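Another way to express the same check, without splitting the path string, is to walk up the chain of parent directories. The sketch below is one possible shape; PathAncestors and HasAncestorFolder are illustrative names, and the case-insensitive comparison is an assumption that suits Windows paths.

```csharp
using System;
using System.IO;

public static class PathAncestors
{
    public static bool HasAncestorFolder(string filePath, string folderName)
    {
        // FileInfo and DirectoryInfo only parse the path here; nothing
        // has to exist on disk for Directory and Parent to work.
        for (DirectoryInfo dir = new FileInfo(filePath).Directory;
             dir != null;
             dir = dir.Parent)
        {
            // Compare whole folder names, one ancestor at a time.
            if (string.Equals(dir.Name, folderName, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Because whole folder names are compared, a folder called "XY" does not count as a match for "X", which a plain substring search on the path would get wrong.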
You can use Directory.GetFiles()
// Only get files that begin with the letter "c."
string[] dirs = Directory.GetFiles(@"c:\", "c*");
Console.WriteLine("The number of files starting with c is {0}.", dirs.Length);
foreach (string dir in dirs)
{
Console.WriteLine(dir);
}
https://msdn.microsoft.com/en-us/library/6ff71z1w(v=vs.110).aspx
You can use recursive search like
// sourcedir = path where you start searching
public void DirSearch(string sourcedir)
{
try
{
foreach (string dir in Directory.GetDirectories(sourcedir))
{
DirSearch(dir);
}
// If you're looking for folders and not files take Directory.GetDirectories(string, string)
foreach (string filepath in Directory.GetFiles(sourcedir, "whatever-file*wildcard-allowed*"))
{
// _internalPath is a list (e.g. a List<string> field) that holds every path where a file/folder was found
_internalPath.Add(filepath);
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
So in your case you're looking for folder XYZ, use
// Takes all folders in sourcedir e.g. C:/ that starts with XYZ
foreach (string filepath in Directory.GetDirectories(sourcedir, "XYZ*")){...}
So if you would give sourcedir C:/ it would search in all folders available on C:/ which would take quite a while of course

In C# how can I prepare a string to be valid for windows directory name

I am writing a C# program which reads certain tags from files and based on tag values it creates a directory structure.
Now there could be anything in those tags,
If the tag name is not suitable for a directory name I have to prepare it to make it suitable by replacing those characters with anything suitable. So that directory creation does not fail.
I was using following code but I realised this is not enough..
path = path.Replace("/","-");
path = path.Replace("\\","-");
please advise what's the best way to do it..
thanks,
Import System.IO namespace and for path use
Path.GetInvalidPathChars
and for filename use
Path.GetInvalidFileNameChars
For example:
string filename = "salmnas dlajhdla kjha;dmas'lkasn";
foreach (char c in Path.GetInvalidFileNameChars())
filename = filename.Replace(System.Char.ToString(c), "");
foreach (char c in Path.GetInvalidPathChars())
filename = filename.Replace(System.Char.ToString(c), "");
Then you can use Path.Combine to join the tags into a path
string mypath = Path.Combine(@"C:\", "First_Tag", "Second_Tag");
//return C:\First_Tag\Second_Tag
You can use the full list of invalid characters here to handle the replacement as desired. These are available directly via the Path.GetInvalidFileNameChars and Path.GetInvalidPathChars methods.
The characters you must not use are: ? < > | : \ / * "
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
_forbiddenChars.Add("\"");
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
return path;
}
Tip: You can't include a double-quote character ("), but you can substitute two single quotes ('').
In this case:
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
//_forbiddenChars.Add("\""); Do not delete the double-quote character, so we could replace it with 2 quotes (before the return).
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
path = path.Replace("\"", "''"); //Replacement here
return path;
}
You'll of course use only one of those (or combine them to one function with a bool parameter for replacing the quote, if needed)
The correct answer of Nikhil Agrawal has some syntax errors.
Just for the reference, here is a compiling version:
public static string MakeValidFolderNameSimple(string folderName)
{
if (string.IsNullOrEmpty(folderName)) return folderName;
foreach (var c in System.IO.Path.GetInvalidFileNameChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
foreach (var c in System.IO.Path.GetInvalidPathChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
return folderName;
}
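Since the question asks about replacing unsuitable characters rather than deleting them, here is a small sketch that swaps each invalid character for a replacement string. FolderNames, Sanitize and the "_" fallback are assumptions made for illustration. It does not handle reserved Windows device names such as CON or NUL, and Path.GetInvalidFileNameChars returns a platform-dependent set.

```csharp
using System;
using System.IO;

public static class FolderNames
{
    public static string Sanitize(string tag, string replacement = "-")
    {
        // Assumption: an empty tag still needs some usable folder name.
        if (string.IsNullOrWhiteSpace(tag)) return "_";

        // Split on every invalid character and re-join with the replacement.
        string cleaned = string.Join(replacement, tag.Split(Path.GetInvalidFileNameChars()));

        // Windows does not allow names that end in a space or a dot.
        cleaned = cleaned.TrimEnd(' ', '.');

        return cleaned.Length == 0 ? "_" : cleaned;
    }
}
```

Each sanitized tag can then be passed to Path.Combine as one path segment, so a tag containing a slash can no longer create an unintended sub-folder.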

how to get specific file names using c#

I have 10 zip files in a folder having path like this
TargetDirectory = "C:\docs\folder\"
some of the zip files names are like this
abc-19870908.Zip
abc-19870908.zip
abc-12345678.zip
and some are like this...
doc.zip
doc123.zip
..
I am getting all file names by using following code...
string [] fileEntries = Directory.GetFiles(targetDirectory);
foreach(string fileName in fileEntries)
{
// here I need to compare ,
// I mean I want to get only these files which are having
// these type of filenames `abc-19870908.Zip`
if (filename == "")
{
// I want to do something
}
}
What do I have to put in the double quotes in the line if(filename == "") to select file names like abc-19870908.Zip?
Would any one please suggest any idea about this?
Many thanks...
If you're only interested in zip files containing a dash, you can provide a search pattern to Directory.GetFiles.
string [] fileEntries = Directory.GetFiles(targetDirectory, "*-*.zip");
Check out this link for more information on those search patterns: http://msdn.microsoft.com/en-us/library/wz42302f.aspx
I guess you can do (Directory.GetFiles returns full paths, so test only the file name part)
if (Path.GetFileName(filename).Contains("-"))
{
...
}
if the - is always present in the filenames you are interested in
or
if (Path.GetFileName(filename).StartsWith("abc-"))
{
...
}
if the filenames always start with abc- for the ones you are interested in.
You can do if (Path.GetFileName(filename).StartsWith("abc-")), or if (Path.GetFileName(filename).Contains("-")), or string[] fileEntries = Directory.GetFiles(targetDirectory, "abc-*.Zip");
// Consider using this overload:
// public static string[] GetFiles( string path, string searchPattern)
string [] fileEntries = Directory.GetFiles(targetDirectory, "abc*.zip");
Alternatively, you can use a regular expression as follows:
string [] fileEntries = Directory.GetFiles(targetDirectory);
foreach(string fileName in fileEntries)
{
if (Regex.IsMatch(fileName, @"abc.*?\.zip", RegexOptions.IgnoreCase))
{
// i want to do something
}
}
List<String> files = Directory.GetFiles(@"C:\docs\folder\").ToList();
// GetFiles returns full paths, so compare against the file name only
var g = from String s in files where Path.GetFileName(s).StartsWith("abc") select s;
foreach(var z in g)
{
//Do stuff in here as a replacement for your if
}
You could use a regular expression that matches your filenames, something along these lines:
string sPattern = @"abc-\d+\.zip";
string [] fileEntries = Directory.GetFiles(targetDirectory);
foreach(string fileName in fileEntries)
{
// here i need to compare , i mean i want to get only these files which are having these type of filenames `abc-19870908.Zip`
if(System.Text.RegularExpressions.Regex.IsMatch(fileName, sPattern, System.Text.RegularExpressions.RegexOptions.IgnoreCase))
{
// i want to do something
}
}
The regular expression "abc-\d+\.zip" means
the string "abc-" followed by any number of digits, followed by a . followed by the string "zip" (regular expression syntax)
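If only names shaped exactly like abc-19870908.Zip should pass, the pattern can be anchored and applied to the file name alone. The sketch below assumes the numeric part is always eight digits, as in the question's samples; ZipFilter and DatedZips are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ZipFilter
{
    // "abc-" + exactly eight digits + ".zip", ignoring case, whole name only.
    private static readonly Regex DatedZip =
        new Regex(@"^abc-\d{8}\.zip$", RegexOptions.IgnoreCase);

    // Directory.GetFiles returns full paths, so the regex is applied to the
    // file name part and the folder names cannot interfere.
    public static IEnumerable<string> DatedZips(IEnumerable<string> paths) =>
        paths.Where(p => DatedZip.IsMatch(Path.GetFileName(p)));
}
```

A typical call would be ZipFilter.DatedZips(Directory.GetFiles(targetDirectory)); files such as doc.zip or doc123.zip are filtered out.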
