Find Particular File Types - c#

I've encountered a peculiar issue when using System.IO. When you search a directory for a file whose type is shown as File, i.e. one without an extension, the file isn't detected.
// Successful:
var files = new DirectoryInfo(server.Path).GetFiles("sitemap*.*");
var contents = Directory.GetFiles(server.Path, "sitemap*.*", ...);
The above code suffices in most instances. However, if other file types share the same name, they'll be collected as well.
Our issue is encountered when you only want the sitemap.file.
// Invalid Code:
var files = new DirectoryInfo(server.Path).GetFiles("sitemap*.file");
var contents = Directory.GetFiles(server.Path, "sitemap*.file", ...);
var examples = new DirectoryInfo(server.Path).GetFiles("sitemap*");
The array is empty, it doesn't find any of the raw .file extension files. I'm assuming the issue occurs because it doesn't actually have an extension.
How do you circumvent this limitation?
Update: I know I could do something along these lines with FileInfo[], but I was hoping to find a simpler approach than iterating and then comparing extensions:
var files = new DirectoryInfo(server.Path).GetFiles("sitemap*.*");
foreach(var file in files)
if(file.Extension != ".gz" && file.Extension != ".xml")
{
// Do something with File.
}
This is especially awkward if you have a wide assortment of extensions within your directory. You would think the API would account for such a type, or the lack thereof.

I believe you are looking for files whose names start with sitemap and that don't have any extension. Use the "sitemap*." pattern.
var contents = Directory.GetFiles(server.Path, "sitemap*.");
Notice the last dot (.) in the pattern. It restricts the results to files that don't have any extension.
This will give you files like:
sitemap1
sitemap2
and will exclude files like:
sitemap1.gz
sitemap2.gz

The file doesn't have an extension. Configure your Explorer to not hide extensions and you'll see.
If you're only looking for extensionless files, change your if to:
if(string.IsNullOrEmpty(file.Extension))
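The extension check above can also be folded into a single query. What follows is only a minimal sketch, assuming a hypothetical helper named ExtensionlessFiles.Find that is not part of System.IO. It filters by extension in code rather than relying on how the trailing dot in a search pattern is interpreted:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ExtensionlessFiles
{
    // Returns the names of files in 'directory' that start with 'prefix'
    // and have no extension at all (e.g. "sitemap1" but not "sitemap1.gz").
    public static List<string> Find(string directory, string prefix)
    {
        return Directory.EnumerateFiles(directory, prefix + "*")
            // Path.HasExtension is false for names without a dot.
            .Where(path => !Path.HasExtension(path))
            .Select(path => Path.GetFileName(path))
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Calling ExtensionlessFiles.Find(server.Path, "sitemap") would then return only the extensionless sitemap files, however many other extensions exist in the directory.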

Related

Parsing string containing Special Folder

I've allowed for custom paths to be entered and wanted the default to be something along the lines of:
%UserProfile%/Documents/foo. Of course, the string needs to be parsed successfully. It works in Windows Explorer, but I was wondering whether I'm missing a library call or option for parsing it correctly.
DirectoryInfo's constructor certainly doesn't work, treating %UserProfile% like any other folder name.
If there is no good way, I'll manually parse it to substitute %foo% with the actual special folder location if it is in the Special Folders Enumeration.
Edit:
Code that does what I'm looking for (though would prefer a proper .NET library call):
var path = @"%UserProfile%/Documents/foo";
var specialFolders = Regex.Matches(path, "%(?<possibleSpecial>.+)%");
foreach (var spec in specialFolders.AsEnumerable())
{
if (Enum.TryParse<Environment.SpecialFolder>(spec.Groups["possibleSpecial"].Value, out var sf))
{
path = Regex.Replace(path, spec.Value, Environment.GetFolderPath(sf));
}
}
Use Environment.ExpandEnvironmentVariables on the path before using it.
var pathWithEnv = @"%UserProfile%/Documents/foo";
var path = Environment.ExpandEnvironmentVariables(pathWithEnv);
// your code...
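As a rough sketch of how that call might be wrapped, assuming a hypothetical helper named PathExpansion.Expand (not a framework API): it expands the variables and then normalizes the result with Path.GetFullPath. Note that a variable which is not defined is left in the string unchanged, so the caller may want to check for that.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class PathExpansion
{
    // Expands %NAME% placeholders and returns a normalized absolute path.
    public static string Expand(string pathWithVariables)
    {
        // Unknown variables are left as-is by ExpandEnvironmentVariables.
        string expanded = Environment.ExpandEnvironmentVariables(pathWithVariables);

        // GetFullPath normalizes the directory separators for the current platform.
        return Path.GetFullPath(expanded);
    }
}
```

For example, PathExpansion.Expand(@"%UserProfile%/Documents/foo") should yield the Documents/foo folder under the current user's profile on Windows.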

Configuration Builder - search through subfolders for JSON files

I'd like to know how to set up a ConfigurationBuilder so that it searches all subfolders for JSON files, not only the base path itself.
var testDataBuilder = new ConfigurationBuilder();
var basePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "myData");
testDataBuilder.SetBasePath(basePath);
testDataBuilder.AddJsonFile("myData.json");
testDataBuilder.AddJsonFile("myDataInSubCatalogue.json"); //HOW TO REACH THAT?
return testDataBuilder.Build();
First of all, if you know the names and locations of the files it's far safer to add them explicitly. Adding arbitrary configuration files without some way of checking their validity is a great way to get hacked.
AddJsonFile accepts a path but doesn't care where it points. There's no restriction that forces it to look into the current folder. That path could be a relative or absolute path. It could point to a remote shared folder too, allowing multiple clients to use a centralized configuration file.
You could enumerate all *.json files under the current folder and add them one after the other, e.g.:
var jsonFiles=Directory.EnumerateFiles(".","*.json", SearchOption.AllDirectories);
foreach(var path in jsonFiles)
{
builder.AddJsonFile(path);
}
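A variation on the loop above, offered only as a sketch: the hypothetical helper JsonConfigDiscovery.FindJsonFiles (an assumption, not part of the configuration library) enumerates below the same base path the question uses and returns absolute paths in a stable order. The order matters because sources added later override keys from sources added earlier.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class JsonConfigDiscovery
{
    // Returns absolute paths of all *.json files under 'basePath',
    // including subfolders, sorted so the load order is deterministic.
    public static List<string> FindJsonFiles(string basePath)
    {
        return Directory.EnumerateFiles(basePath, "*.json", SearchOption.AllDirectories)
            .Select(path => Path.GetFullPath(path))
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToList();
    }
}

// Each returned path could then be passed to builder.AddJsonFile(path),
// as in the loop above.
```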

Need to find list of exact FileNames based on an input file number

I need to make a list of image files based on a file number that I input. I need to find only the image files that have exactly that file number, even though other image files may share the same characters before and/or after it.
So, I am trying to locate some image files in a folder structure based on a file number.
So if the file number is '00441' I need to find files numbered like this:
IM00441_000A.jpg
IM00441_205A.jpg
IM00441_110D.jpg
IM00441_A11.jpg
So, this wouldn’t be a problem if every image file had an underscore like these examples, BUT there are some that do not have an underscore.
AND there are file numbers that have a suffix like we have '00441' AND '00441A'
So, in addition to the image files listed above, I might have:
IM00441A_000A.JPG
IM00441A_105A.JPG
IM00441A_110A.JPG
IM00441A_302A.JPG
So, because of these problems, I cannot break at the underscore and I can't use StartsWith (like the code below), because all of these examples start with '00441'; some of them just continue with 'A'.
This is the code that I have so far, but it doesn’t work because of what I said previously.
LocalFile = "IM" + FileNumber;
if (ValidateFilepath(di))
lstDocuments = di.GetFiles("*.jpg")
.Where(file => file.Name.ToUpperInvariant().StartsWith(LocalFile))
.Select(file => file.FullName).ToList(); // The ToUpperInvariant() makes the file name upper case, because the actual file names are not consistent.
I would recommend using Regular Expressions for this.
Here's an example (replace the 00441 in the pattern variable with the file number you are searching for):
using System.Text.RegularExpressions;
// ...
var files = new List<string>
{
"IM00441_000A.jpg",
"IM00441_205A.jpg",
"IM00441_110D.jpg",
"IM00441_A11.jpg",
"IM00441A_000A.JPG",
"IM00441A_105A.JPG",
"IM00441A_110A.JPG",
"IM00441A_302A.JPG",
"IM00123_123A.jpg",
"IM00123A_123B.jpg",
"AIM00123_123C.jpg",
"IM00456_123A.jpg",
"IM00456A_123B.jpg",
"AIM00456_123C.jpg"
};
var pattern = @".+00441.*\.(jpg|JPG)$";
foreach (var file in files)
{
if (Regex.IsMatch(file, pattern))
Console.WriteLine($"Found file: {file}");
}
// Output:
// Found file: IM00441_000A.jpg
// Found file: IM00441_205A.jpg
// Found file: IM00441_110D.jpg
// Found file: IM00441_A11.jpg
// Found file: IM00441A_000A.JPG
// Found file: IM00441A_105A.JPG
// Found file: IM00441A_110A.JPG
// Found file: IM00441A_302A.JPG
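Note that the pattern above also returns the 00441A files. If the two file numbers need to be kept apart, something along these lines might work. It is only a sketch, and it assumes that a file number is always followed by a character that is neither a letter nor a digit (an underscore or the dot before the extension); names where more letters or digits follow directly cannot be told apart this way. FileNumberMatcher is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class FileNumberMatcher
{
    // Builds a pattern matching "IM" + fileNumber at the start of the name,
    // not followed by another letter or digit, and ending in ".jpg".
    public static Regex BuildPattern(string fileNumber)
    {
        return new Regex(
            "^IM" + Regex.Escape(fileNumber) + @"(?![A-Za-z0-9]).*\.jpg$",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Keeps only the file names that belong to exactly this file number.
    public static List<string> Filter(IEnumerable<string> fileNames, string fileNumber)
    {
        Regex pattern = BuildPattern(fileNumber);
        return fileNames.Where(name => pattern.IsMatch(name)).ToList();
    }
}
```

With the sample list above, Filter(files, "00441") keeps the four IM00441_ files, and Filter(files, "00441A") keeps the four IM00441A_ files.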

Doc and docx filetypes returns the same file

I am trying to get the document files in the path mentioned below, so I set the file types to doc and docx.
In the documents folder, I have the file "issues.docx".
string path1 = "C:\\Documents\\";
var dir = new DirectoryInfo(path1);
string[] fileTypes = { "doc", "docx" };
var myFiles = fileTypes.SelectMany(dir.GetFiles);
This is the code I have used in my application. It returns the issues.docx file twice, but it should return each file only once. How can I achieve that without changing fileTypes?
NTFS isn't a search engine. Performing two separate searches will result in two scans of all the files, taking double the time.
It would be faster to call EnumerateFiles once with a pattern that covers both extensions and split the results by extension afterwards, e.g. with ToLookup, which allows several files per extension:
var filesByExtension=dir.EnumerateFiles("*.doc?")
.ToLookup(fi=>fi.Extension);
You can also group the results, if you want, eg to calculate statistics:
dir.EnumerateFiles("*.doc?")
.GroupBy(fi=>fi.Extension)
.Select(g=>new {
Extension=g.Key,
TotalSize=g.Sum(f=>f.Length),
Files=g.ToArray()
});
If you want accelerated searching, you can use the Windows Search service. Calling it isn't straightforward, though: you have to query it as if it were an OLEDB database. The results may not be accurate either if the indexer is still scanning the files.
UPDATE
If there is nothing common in the file types to search, filtering can be performed in a Where expression:
var extensions=new[]{".doc",".docx",".png",".jpg"};
dir.EnumerateFiles()
.Where(fi=>extensions.Contains(fi.Extension))
.GroupBy(fi=>fi.Extension)
.Select(g=>new {
Extension=g.Key,
TotalSize=g.Sum(f=>f.Length),
Files=g.ToArray()
});
Where can be used to filter out small or large files, eg:
var extensions=new[]{".doc",".docx",".png",".jpg"};
dir.EnumerateFiles()
.Where(fi=>extensions.Contains(fi.Extension) && fi.Length>1024)
.GroupBy(fi=>fi.Extension)
.Select(g=>new {
Extension=g.Key,
TotalSize=g.Sum(f=>f.Length),
Files=g.ToArray()
});
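One thing to watch with the Where filters above is that Contains compares extensions case-sensitively, so a file named notes.DOC would be skipped. A minimal sketch of a case-insensitive variant follows. ExtensionFilter is a hypothetical helper name, and the extensions are assumed to be passed with their leading dot:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ExtensionFilter
{
    // Scans the directory once and returns each matching file exactly once.
    // 'extensions' must include the leading dot, e.g. ".doc".
    public static List<FileInfo> Find(DirectoryInfo dir, IEnumerable<string> extensions)
    {
        // Case-insensitive set, so ".DOC" and ".doc" are treated alike.
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);

        return dir.EnumerateFiles()
            .Where(fi => wanted.Contains(fi.Extension))
            .OrderBy(fi => fi.Name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Because the directory is enumerated only once and each file is tested against the set, issues.docx cannot show up twice.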

Get directory name from full directory path regardless of trailing slash

I need to get the directory name from its path regardless of whether it has a trailing backslash. For example, the user may input either of the following two strings, and I need the name of the logs directory:
"C:\Program Files (x86)\My Program\Logs"
"C:\Program Files (x86)\My Program\Logs\"
None of the following gives correct answer ("Logs"):
Path.GetDirectoryName(m_logsDir);
new FileInfo(m_logsDir).Directory.Name;
They apparently analyze the path string and in the 1st example decide that Logs is a file while it's really a directory.
So it should check whether the last segment (Logs in our case) is really a directory; if yes, return it, and if not (Logs might be a file too), return the parent directory. That would require dealing with the actual filesystem rather than analyzing the string itself.
Is there any standard function to do that?
new DirectoryInfo(m_logsDir).Name;
This may help
var result = System.IO.Directory.Exists(m_logsDir) ?
m_logsDir:
System.IO.Path.GetDirectoryName(m_logsDir);
For this we have a snippet of code along the lines of:
File.GetAttributes(m_logsDir).HasFlag(FileAttributes.Directory); // .NET 4.0 and later
or
(File.GetAttributes(m_logsDir) & FileAttributes.Directory) == FileAttributes.Directory; // Before .NET 4.0
Let me rephrase my answer, because the two forms of input behave differently. If you do:
var additional = @"C:\Program Files (x86)\My Program\Logs\";
var path = Path.GetDirectoryName(additional);
The result is the path ending in Logs, as intended. However, if you do:
var additional = @"C:\Program Files (x86)\My Program\Logs";
var path = Path.GetDirectoryName(additional);
The result is the path ending in My Program, so the output differs. I would either enforce the trailing \ or split the path and take the last non-empty segment, like this:
var additional = @"C:\Program Files (x86)\My Program\Logs";
var filter = additional.Split('\\');
var getLast = filter.Last(i => !string.IsNullOrEmpty(i));
Hopefully this helps.
Along the lines of the previous answer, you could enforce the trailing slash like this:
Path.GetDirectoryName(m_logsDir + "\\");
Ugly but it seems to work - whether there's 0 or 1 slash at the end. The double-slash is treated like a single-slash by GetDirectoryName.
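Pulling the suggestions in this thread together, one possible sketch is shown below. DirectoryNameResolver is a hypothetical helper name, not a framework API. It trims any trailing separators, and, as the question asks, falls back to the parent directory when the path turns out to be an existing file. It does touch the filesystem for that check.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class DirectoryNameResolver
{
    // Returns the name of the directory that 'path' refers to, with or
    // without a trailing separator. If 'path' is an existing file, the
    // name of its containing directory is returned instead.
    public static string Resolve(string path)
    {
        string trimmed = path.TrimEnd(
            Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

        if (File.Exists(trimmed))
        {
            // The last segment is a file, so step up to its directory.
            trimmed = Path.GetDirectoryName(trimmed) ?? trimmed;
        }

        // After trimming, the last segment is the directory name.
        return Path.GetFileName(trimmed);
    }
}
```

On Windows, both inputs from the question would then resolve to Logs.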
