LINQ IN where query - c#

I want to write a foreach loop that gets all files with the specified extensions, which are read from an external txt file. For example, the variable read from the file contains:
extensions = "jpg,tif,bmp,png" or
extensions = "jpg,tif" and I only want to get those files.
So far I have something like this, but I don't know how to go on.
extensions = Extensions.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string sourceFile in Directory.GetFiles(SourcePath, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(extensions.)))
{
}
I don't know how to get to every element in the 'extensions' array. How can I solve that?

You can use Enumerable.Contains and System.IO.Path.GetExtension:
string[] extensions = {".jpg",".tif",".bmp",".png" };
var files = Directory.EnumerateFiles(SourcePath, "*.*", SearchOption.AllDirectories)
    .Where(s => extensions.Contains(Path.GetExtension(s), StringComparer.InvariantCultureIgnoreCase));
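The question keeps the extensions as a comma-separated string without leading dots, while Path.GetExtension returns the extension with the dot. A minimal sketch of normalizing that string before filtering could look like this; the class and method names (ExtensionFilter, ParseExtensions, FindFiles) are hypothetical and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionFilter
{
    // Hypothetical helper: turns "jpg,tif" into a case-insensitive set { ".jpg", ".tif" }.
    public static HashSet<string> ParseExtensions(string csv)
    {
        var parts = csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(e => e.Trim())
                       .Where(e => e.Length > 0)
                       .Select(e => e.StartsWith(".") ? e : "." + e);
        return new HashSet<string>(parts, StringComparer.OrdinalIgnoreCase);
    }

    // Hypothetical helper: lazily yields the files under sourcePath whose extension is in the set.
    public static IEnumerable<string> FindFiles(string sourcePath, string csv)
    {
        var extensions = ParseExtensions(csv);
        return Directory.EnumerateFiles(sourcePath, "*", SearchOption.AllDirectories)
                        .Where(f => extensions.Contains(Path.GetExtension(f)));
    }
}
```

With this in place the loop from the question could be written as foreach (string sourceFile in ExtensionFilter.FindFiles(SourcePath, Extensions)) { ... }.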

Related

Adding File and Directory to Array

I have this method that searches all files and folders in "C:\Sharing".
string[] fileArray = Directory.GetFiles(@"C:\Sharing", "*.*", SearchOption.AllDirectories);
And foreach shows me full path of each file. Great. However, since these are in a directory called "Sharing", I want to check and add files that are like
C:\Sharing\Jerry2022\wedding.jpg (array: 'wedding.jpg', 'Jerry2022')
C:\Sharing\snapshot.jpg (array: 'snapshot.jpg')
C:\Sharing\Newsletter\cover-june.webp (array: 'cover-june.webp', 'Newsletter')
So as you can see, I want to add the file name and the subdirectory name to a string array or List (it doesn't matter which), excluding "Sharing".
How can I split the results? I know I can use Substring and LastIndexOf("\\") + 1 to separate the part after the last '\', but I'm not sure how to match up the file name with the subdirectory name too.
Any help is appreciated
You can use DirectoryInfo to get the information you want:
C#:
var directoryInfo = new DirectoryInfo(@"C:\Sharing");
if (directoryInfo.Exists)
{
    foreach (var fileInfo in directoryInfo.GetFiles("*.*", SearchOption.AllDirectories))
    {
        var fileName = fileInfo.Name;
        Console.WriteLine(fileName);
        var directoryName = fileInfo.DirectoryName;
        // you can use split to get the directory name array
        Console.WriteLine(directoryName);
    }
}
Another way is to use Uri for this scenario:
C#:
string[] fileArray = Directory.GetFiles(@"C:\Sharing", "*.*", SearchOption.AllDirectories);
foreach (var s in fileArray)
{
    var uri = new Uri(s);
    // Segments is already a string[], one entry per path part
    string[] uriSegments = uri.Segments;
}
You will see each part of the full path, but you may need to call .Trim('/') on each part. Then you can use string.Equals to pick out the directories you want.
You could split the results using Split.
But of course you can also work with FileInfo instead.
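The Split suggestion can be sketched as follows, assuming a runtime that provides Path.GetRelativePath (.NET Core 2.0 or later). The helper name GetParts is hypothetical; it returns the file name first and then the subdirectory names below the root, which is the shape the question asks for:

```csharp
using System;
using System.IO;
using System.Linq;

public static class RelativeParts
{
    // Hypothetical helper: returns the file name first, then the sub-directory
    // names between root and the file, e.g. { "wedding.jpg", "Jerry2022" }.
    public static string[] GetParts(string root, string fullPath)
    {
        // Path of the file relative to root, e.g. "Jerry2022\wedding.jpg".
        string relative = Path.GetRelativePath(root, fullPath);
        string[] segments = relative.Split(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar },
            StringSplitOptions.RemoveEmptyEntries);
        // Put the file name (last segment) first, keeping the directory order after it.
        return new[] { segments[segments.Length - 1] }
            .Concat(segments.Take(segments.Length - 1))
            .ToArray();
    }
}
```

Each entry of fileArray can then be passed through RelativeParts.GetParts(@"C:\Sharing", s); a file directly inside the root yields a single-element array.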

How to get files from a specific sub-directories using c#?

I'm trying to get all *.html files that are inside sub-directories named abcd into an array.
The given path can contain multiple *.html files in multiple sub-directories and even in the root directory (i.e. immediately inside the user-given path), but I only want those *.html files which are inside the specifically named sub-directories (abcd), using LINQ.
This is what I tried
string workingPath = @"D:\Testing";
string[] myFiles = workingPath.Select(dirs => Directory.GetDirectories(workingPath)
.Select(folders => (from item in Directory.GetDirectories(folders, "abcd", SearchOption.AllDirectories)
.Select(item => Directory.GetFiles(item, "*.html"))
)));
I'm getting an error:
A query body must end with a select clause or a group clause (CS0742)
How do I fix this?
Your code does not look like it will compile. To start with, workingPath.Select iterates over the characters of the string, so the lambda runs once per char, which does not make sense considering your requirements.
You need something like this:
var files = new List<string>();
if (Directory.Exists(workingPath))
{
    foreach (var f in Directory.GetDirectories(workingPath, "abcd",
                                               SearchOption.AllDirectories))
    {
        files.AddRange(Directory.GetFiles(f, "*.html"));
    }
}
You can also do it as a one-liner using LINQ:
var files2 = Directory.GetDirectories(workingPath, "abcd", SearchOption.AllDirectories)
    .SelectMany(d => Directory.GetFiles(d, "*.html")).ToArray();

Getting subdirectories of a path that have specific file extensions

I was wondering if there was a way to only retrieve the directories that have certain extensions.
For example
List<string> directories = Directory.GetDirectories(sourceTextBox.Text, "*", SearchOption.AllDirectories).ToList();
would give me all of the directories and subdirectories inside the path I gave it. However I only want it to retrieve the directories that have a .jpg or .png file inside of them.
List<string> directories = Directory.GetDirectories(sourceTextBox.Text, "*.png", SearchOption.AllDirectories).ToList();
directories.addRange(Directory.GetDirectories(sourceTextBox.Text, "*.jpg", SearchOption.AllDirectories).ToList());
Any way I could do this?
No guarantees in terms of performance, but for each directory you could check its files to see if it contains any with the matching extension:
List<string> imageDirectories = Directory.GetDirectories(sourceTextBox.Text, "*", SearchOption.AllDirectories)
.Where(d => Directory.EnumerateFiles(d)
.Select(Path.GetExtension)
.Where(ext => ext == ".png" || ext == ".jpg")
.Any())
.ToList();
There is no built-in way of doing it. You can try something like this:
var directories = Directory
.GetDirectories(path, "*", SearchOption.AllDirectories)
.Where(x=> Directory.EnumerateFiles(x, "*.jpg").Any() || Directory.EnumerateFiles(x, "*.png").Any())
.ToList();
You can use the Directory.EnumerateFiles method to get the files matching the criteria, then get each file's path minus the file name using Path.GetDirectoryName and add it to a HashSet. The HashSet keeps only the unique entries.
HashSet<string> directories = new HashSet<string>();
foreach(var file in Directory.EnumerateFiles(sourceTextBox.Text,
"*.png",
SearchOption.AllDirectories))
{
directories.Add(Path.GetDirectoryName(file));
}
To check multiple file extensions you have to enumerate all files and then check the extension of each file, like this:
HashSet<string> directories = new HashSet<string>();
// Path.GetExtension returns the extension including the leading dot
string[] allowableExtension = new [] {".jpg", ".png"};
foreach(var file in Directory.EnumerateFiles(sourceTextBox.Text,
                                             "*",
                                             SearchOption.AllDirectories))
{
    string extension = Path.GetExtension(file);
    if (allowableExtension.Contains(extension))
    {
        directories.Add(Path.GetDirectoryName(file));
    }
}

Find in Files C#

I have a Folder which has multiple sub folders. Each sub folder has many .dot and .txt files in them.
Is there a simple solution in C# .NET that will iterate through each file and check the contents of that file for a key phrase or keyword?
Document Name Keyword1 Keyword2 Keyword3 ...
test.dot Y N Y
To summarise:
Select a folder
Enter a list of keywords to search for
The program will then search through each file and at the end output something like the above. I am not too worried about creating the datatable to show the datagrid, as I can do this. I just need to perform the find-in-files function, similar to Notepad++'s "Find in Files" option.
Thanks in advance
What you want is to recursively iterate the files in a directory (and maybe its subdirectories).
So your steps would be to loop over each file in the specified directory with Directory.GetFiles() from .NET, and then, for each subdirectory you encounter, repeat the same loop.
This can be easily done with this code sample:
public static IEnumerable<string> GetFiles(string path)
{
    foreach (string s in Directory.GetFiles(path, "*.extension_here"))
    {
        yield return s;
    }
    foreach (string s in Directory.GetDirectories(path))
    {
        foreach (string s1 in GetFiles(s))
        {
            yield return s1;
        }
    }
}
A more in-depth look at iterating through files in directories in .NET is located here:
http://blogs.msdn.com/b/brada/archive/2004/03/04/84069.aspx
Then you use the IndexOf method from String to check if your keywords are in the file (I discourage the use of ReadAllText, if your file is 5 MB big, your string will be too. Line-by-line will be less memory-hungry)
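The line-by-line approach can be sketched like this. It is only an illustration under stated assumptions: the names KeywordScan, FindKeywords and ReportRow are hypothetical, the keywords are assumed to be distinct, and the match is a case-insensitive substring test via IndexOf:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class KeywordScan
{
    // Hypothetical helper: reports, per keyword, whether it occurs anywhere in the given lines.
    // Assumes the keywords are distinct (ToDictionary throws on duplicate keys).
    public static Dictionary<string, bool> FindKeywords(
        IEnumerable<string> lines, IReadOnlyList<string> keywords)
    {
        var found = keywords.ToDictionary(k => k, k => false);
        int remaining = found.Count;
        foreach (string line in lines)
        {
            foreach (string kw in keywords)
            {
                if (!found[kw] && line.IndexOf(kw, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    found[kw] = true;
                    remaining--;
                }
            }
            if (remaining == 0) break; // every keyword has been seen, stop reading early
        }
        return found;
    }

    // Hypothetical helper: builds one "name Y N Y" row for a file.
    // File.ReadLines streams the file instead of loading it into one string.
    public static string ReportRow(string file, IReadOnlyList<string> keywords)
    {
        var found = FindKeywords(File.ReadLines(file), keywords);
        return Path.GetFileName(file) + " "
            + string.Join(" ", keywords.Select(k => found[k] ? "Y" : "N"));
    }
}
```

Calling ReportRow for every path returned by the recursive GetFiles method above would produce one row of the table from the question per file.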
You can use Directory.EnumerateFiles with a searchpattern and the recursive hint(SearchOption.AllDirectories). The rest is easy with LINQ:
var keyWords = new []{"Y","N","Y"};
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
from line in File.ReadLines(fn)
from kw in keyWords
where line.Contains(kw)
select new {
File = fn,
Line = line,
Keyword = kw
};
foreach (var matchInfo in allMatches)
Console.WriteLine("File => {0} Line => {1} Keyword => {2}"
, matchInfo.File, matchInfo.Line, matchInfo.Keyword);
Note that you need to add using System.Linq;
Is there a way just to get the line number?
If you just want the line numbers you can use this query:
var matches = allFiles.Select(fn => new
{
File = fn,
LineIndices = String.Join(",",
File.ReadLines(fn)
.Select((l,i) => new {Line=l, Index =i})
.Where(x => keyWords.Any(w => x.Line.Contains(w)))
.Select(x => x.Index)),
})
.Where(x => x.LineIndices.Any());
foreach (var match in matches)
Console.WriteLine("File => {0} Linenumber => {1}"
, match.File, match.LineIndices);
It's a little bit more difficult because LINQ's query syntax has no way to access the element index, so the method-syntax Select overload is used instead.
The first step: locate all files. It is easily done with System.IO.Directory.GetFiles() + System.IO.File.ReadAllText(), as others have mentioned.
The second step: find keywords in a file. This is simple if you have one keyword and it can be done with IndexOf() method, but iterating a file multiple times (especially if it is big) is a waste.
To quickly find multiple keywords in a text I think you should use the Aho-Corasick automaton (algorithm). See the C# implementation at CodeProject: http://www.codeproject.com/Articles/12383/Aho-Corasick-string-matching-in-C
Here's a way using Tim's original answer to get the line number:
var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
from line in File.ReadLines(fn).Select((item, index) => new { LineNumber = index, Line = item})
from kw in keyWords
where line.Line.Contains(kw)
select new
{
File = fn,
Line = line.Line,
LineNumber = line.LineNumber,
Keyword = kw
};
foreach (var matchInfo in allMatches)
Console.WriteLine("File => {0} Line => {1} Keyword => {2} Line Number => {3}"
, matchInfo.File, matchInfo.Line, matchInfo.Keyword, matchInfo.LineNumber);

c# Directory.GetFiles file structure from app root

I have the following piece of code:
string root = Path.GetDirectoryName(Application.ExecutablePath);
List<string> FullFileList = Directory.GetFiles(root, "*.*",
SearchOption.AllDirectories).Where(name =>
{
return !(name.EndsWith("dmp") || name.EndsWith("jpg"));
}).ToList();
Now this works very well; however, the file names it returns are quite long.
Is there a way I can strip the path up to the root but still show all the subfolders?
Root = C:\Users\\Desktop\Test\
But the code would return the whole path from C:
while I'd prefer if I could take out the root bit straight away. but still keep the file structure after it.
eg
C:\Users\\Desktop\Test\hi\hello\files.txt
would return
\hi\hello\files.txt
I know I can just iterate over the generated file list and remove the root from each entry one by one; I'm wondering if I can filter it out straight away.
Using the power of LINQ:
string root = Path.GetDirectoryName(Application.ExecutablePath);
List<string> FullFileList = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories)
    .Where(name =>
    {
        return !(name.EndsWith("dmp") || name.EndsWith("jpg"));
    })
    // strip the root from each path, keeping the sub-folder structure
    .Select(file => file.Replace(root, ""))
    .ToList();
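Note that Replace removes every occurrence of root in the path, not only the leading one, and it is case-sensitive. A small sketch of a prefix-only variant, using a hypothetical helper named StripRoot, might be:

```csharp
using System;

public static class RootTrimmer
{
    // Hypothetical helper: strips root only when it is a leading directory prefix of fullPath.
    public static string StripRoot(string root, string fullPath)
    {
        if (fullPath.StartsWith(root, StringComparison.OrdinalIgnoreCase))
        {
            string rest = fullPath.Substring(root.Length);
            bool rootEndsWithSeparator = root.EndsWith("\\") || root.EndsWith("/");
            // Only accept the match at a directory boundary, so that
            // "C:\Test" does not match "C:\Testing\file.txt".
            if (rest.Length == 0 || rest[0] == '\\' || rest[0] == '/' || rootEndsWithSeparator)
            {
                return rest;
            }
        }
        return fullPath;
    }
}
```

It would replace the Select step above as .Select(file => RootTrimmer.StripRoot(root, file)).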
