Search file by filter with regex - c#

I need to find files whose names begin with "prft", for example "prft0000.140", "prft2100.140", "prft1258.140", and so on, and verify that they exist in a specific directory. I have this regex to find them, but I don't know how to write the filter so that it matches.
List<string> prftFiles = (new DirectoryInfo(filePath))
.GetFiles(".", SearchOption.AllDirectories)
.Where(a => Regex.IsMatch(a.Name, "prft[^*]$"))
.Select(fi => fi.Name)
.ToList();
The pattern "prft[^*]$" does not work. What should it be?

Why not just use (new DirectoryInfo(filePath)).GetFiles("prft*", SearchOption.AllDirectories)? It returns a FileInfo[], so append .Select(fi => fi.Name).ToList() if you need a List<string>.

This is a regex you could use:
string pattern = @"^(prft\d{4}\.\d{3})$";
You can also find the files with the * wildcard, as the other answers say.
If you want an exact match for the pattern "prft", four digits, a dot, three digits, you should use the regex,
because prft* will find any file whose name starts with prft.
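The two approaches can also be combined: let the wildcard narrow the listing and let the regex confirm each name. The following is only a sketch; the class and method names (PrftFinder, FilterNames, Find) are made up for illustration, and the name filter is kept separate from the directory listing so it can be checked without touching the disk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrftFinder
{
    // "prft", four digits, a dot, three digits; nothing before or after.
    private static readonly Regex ExactName = new Regex(@"^prft\d{4}\.\d{3}$");

    // Pure filter over file names (no disk access).
    public static List<string> FilterNames(IEnumerable<string> names)
    {
        return names.Where(n => ExactName.IsMatch(n)).ToList();
    }

    // Hypothetical helper: "prft*" narrows the listing, the regex confirms each name.
    public static List<string> Find(string directory)
    {
        return FilterNames(
            Directory.EnumerateFiles(directory, "prft*", SearchOption.AllDirectories)
                     .Select(p => Path.GetFileName(p)));
    }
}
```

PrftFinder.Find(filePath).Count > 0 then answers the "does such a file exist" part of the question.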

You don't actually need a regex here, because the Directory class accepts a search pattern.
Directory.GetFiles(@"C:\SomeDirectory", "prft*");
The * wildcard matches zero or more characters.

Related

How to extract preceding strings in a given directory

The folder names are variable but I have this constant value in the directory - the "distributions" folder.
How can I extract the all the strings before the "distributions" folder?
> /<root>/win/<usr>/distributions/<dbms>/<repository>/<port type>/<remote system>/<port>
Currently I'm doing it in lengthy way (e.g. getting the length of the whole directory, finding the location of distributions word in the string, etc...).
I'm looking for a more elegant way. Could this be done using Regex, or a shorter version of my current implementation?
string.Split followed by TakeWhile can help you
var resultArray = str.Split(new []{@"/"},StringSplitOptions.RemoveEmptyEntries)
.TakeWhile(x=>!x.Equals("distributions"));
Output
<root>
win
<usr>
Update based on comments
If you need the entire path before "distributions", you can use
var result = str.Split(new []{@"distributions"},StringSplitOptions.RemoveEmptyEntries)
.First();
Output
/<root>/win/<usr>/
string.Split('/') puts each component of the path (or of any string) into an array, splitting on the delimiter (/ here). You can then loop through it.
Assuming you want the path up to that point, I would recommend a regex. Here is how I would do it:
Regex regex = new Regex(@".+?(?=distributions)");
Debug.WriteLine(regex.Match("/<root>/win/<usr>/distributions/<dbms>/<repository>").Value);
This outputs
/<root>/win/<usr>/
What is the problem with the good old way?
var s = "/<root>/win/<usr>/distributions/<dbms>/<repository>/<port.....";
var result = s.Substring(0, s.IndexOf("distributions"));
or s.Substring(0, s.IndexOf("/distributions/") + 1) if that text might also appear in another form.
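One thing the IndexOf approach leaves open is the case where the folder is missing: IndexOf returns -1 and Substring(0, -1) throws. A small sketch that guards against this, assuming the folder always appears as a complete segment between two slashes; PathPrefix and Before are hypothetical names, not an existing API.

```csharp
using System;

public static class PathPrefix
{
    // Returns the part of the path before "/marker/", keeping the trailing slash,
    // or null when the marker folder does not occur as a complete segment.
    public static string Before(string path, string marker)
    {
        int index = path.IndexOf("/" + marker + "/", StringComparison.Ordinal);
        return index < 0 ? null : path.Substring(0, index + 1);
    }
}
```

Searching for "/distributions/" rather than "distributions" avoids stopping early at a folder such as "distributions_old". A path that ends in "/distributions" with no trailing slash is not matched by this sketch.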

C# How to GetDirectories without 8.3 format names

I'm trying to figure out the best way to get a list of directories using a wildcard without matching 8.3 format names. I want all the directories that match a wildcard pattern like "3626*". The problem is that both Directory.GetDirectories and DirectoryInfo.GetDirectories match against long file names and 8.3 format names, so I get entries I don't want. For example, with the above wildcard I get both "3626 Burnt Chimney" and "3689 Lavista". You can see the same behavior at a command prompt with the command "dir 3626*". This is on Windows 7 32-bit. How can I get only the matches on long file names?
Perform the filter after you retrieve the directories, e.g.
var files = new DirectoryInfo(@"C:\Path\")
.GetDirectories()
.Select(f => f.Name)
.Where(name => name.StartsWith("3626"));
You can find some more information here: List files in folder which match pattern
The comment from RagtimeWilly is what I also want to suggest.
For a good explanation of 8.3 names see http://en.wikipedia.org/wiki/8.3_filename
I just wrote a small example, perhaps you can use this as a starting point:
using System.Text.RegularExpressions;
...
Regex re83 = new Regex(@"^[^.]{1,8}(\.[^.]{0,3})?$");
DirectoryInfo directoryInfo = new DirectoryInfo("C:\\windows");
foreach (string no83 in directoryInfo.GetDirectories("i*").Select(di => di.Name).Where(n => !re83.IsMatch(n)))
{
Console.WriteLine(no83);
}
Searching the web, you will find other regexes for matching 8.3 names, some of them a bit more complex. Some characters are not allowed in 8.3 names, and those regexes handle that. But maybe you are only interested in filtering out some unwanted path names.

Find files using wild card in C#

I am trying to find files from a directory:
String[] search1 = Directory.GetFiles(voiceSource, "85267-*.wav")
.Select(path => Path.GetFileName(path))
.ToArray();
String[] search2 = Directory.GetFiles(voiceSource, "85267 *.wav")
.Select(path => Path.GetFileName(path))
.ToArray();
But in search1, it selects both 85267-s.wav and 85267 -s.wav. But I want only 85267-s.wav to be selected.
search2 is doing well.
How can I do that?
The behaviour you are seeing is caused by short file names. The file 85267 -s.wav gets the short name 85267-~1.WAV, and since that matches your wildcard "85267-*.wav", you get both files back.
This is explained in Directory.GetFiles Method (String, String):
Because this method checks against file names with both the 8.3 file
name format and the long file name format, a search pattern similar
to "*1*.txt" may return unexpected file names. For example, using a
search pattern of "*1*.txt" will return "longfilename.txt" because the
equivalent 8.3 file name format would be "longf~1.txt".
As a workaround, you can use Directory.EnumerateFiles to select the files matching your pattern and then compare the actual (long) file name using StartsWith. Remember that EnumerateFiles evaluates lazily.
String[] search1 = Directory.EnumerateFiles(@"C:\test", "85267-*.wav")
.Where(file => Path.GetFileName(file).StartsWith("85267-"))
.Select(path => Path.GetFileName(path))
.ToArray();
Yes, this is a side effect of the MS-DOS 8.3 short name support that is still turned on today on most file systems. You can see it with the DIR /X command, which displays those short names. On my machine:
C:\temp>dir /x *.wav
01/21/2015 09:11 AM 6 85267-~1.WAV 85267 -s.wav
01/21/2015 09:11 AM 6 85267-s.wav
2 File(s) 12 bytes
0 Dir(s) 235,121,160,192 bytes free
Note how the short name for "85267 -s" is missing the space. It is not a valid character in a short name. What's left over now also matches your wildcard.
That's not where the trouble with those short names ends. A wildcard like *.wav will also match a file like foobar.wavx, a completely different file type.
Short-name generation is, frankly, a relic from the previous century that ought to be turned off today. But that is not typically anything you can control yourself. You have to deal with these accidental matches and double-check what you get back. With a Regex for example.
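The "double-check with a Regex" step could look roughly like the sketch below: translate the wildcard into an anchored regex and keep only the entries whose long name matches it. LongNameFilter and its methods are hypothetical names, and the translation covers only * and ?; it does not try to reproduce every Windows wildcard quirk.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongNameFilter
{
    // Translates a wildcard with * and ? into an anchored, case-insensitive regex.
    public static Regex WildcardToRegex(string wildcard)
    {
        string body = Regex.Escape(wildcard)
            .Replace(@"\*", ".*")   // * = any run of characters
            .Replace(@"\?", ".");   // ? = exactly one character
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }

    // True when the long file name itself matches the wildcard.
    public static bool Matches(string fileName, string wildcard)
    {
        return WildcardToRegex(wildcard).IsMatch(fileName);
    }

    // Lets the file system do the coarse search, then re-checks each long name.
    public static string[] Find(string directory, string wildcard)
    {
        Regex longName = WildcardToRegex(wildcard);
        return Directory.EnumerateFiles(directory, wildcard)
            .Select(p => Path.GetFileName(p))
            .Where(n => longName.IsMatch(n))
            .ToArray();
    }
}
```

With this, LongNameFilter.Find(voiceSource, "85267-*.wav") keeps 85267-s.wav and drops 85267 -s.wav, whose only match was its short name.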

File sequence > Find Name Pattern

I'm trying to figure out a solid way to solve multiple types of file sequences.
Consider these sequences
file_0000.jpg
file_0001.jpg
file_0002.jpg etc
&
new1File001.jpg
new1File002.jpg
new1File003.jpg
So it needs to find out where the first decimal of the sequence code starts.
FileInfo[] files = new DirectoryInfo(@"\\fileserver\").GetFiles("*.*", SearchOption.AllDirectories);
var grouped = files.OrderBy(f => f.Name).GroupBy(f => f.Name.Substring(0, f.Name.LastIndexOf("_")));
Obviously this only finds file sequences where the sequence number is separated by "_". I want to group by the position of the first digit of the last run of digits instead. My regex skills are not good, and even then I don't know how to use a regex in the lambda expression.
The main question is, how can I find out where the number string starts for the above mentioned cases.
Any pointers would be great!
Thanks,
-Johan
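Yes, regex to the rescue. Group on the text that precedes the last run of digits (capture group 1) rather than on the whole match, which is different for every file. Names without any digits all end up in the "" group.
var r = new Regex(@"^(.*?)\d+\D*$");
var grouped =
files.
OrderBy(f => f.Name).
GroupBy(f => r.Match(Path.GetFileNameWithoutExtension(f.Name)).Groups[1].Value);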

RegEx for extracting number from a string

I have a bunch of files in a directory, mostly labeled something like...
PO1000000100.doc or .pdf or .txt
Some of them are PurchaseOrderPO1000000109.pdf
What I need to do is extract the PO1000000109 part, so basically "PO" followed by 10 digits.
How can I do this with a regex?
(I'll run a foreach loop over the files in the directory, get each file name, and run it through the regex to get the PO number.)
I'm using C#, in case that is relevant.
Try this (the input string comes first, then the pattern):
String data =
Regex.Match("PurchaseOrderPO1000000109.pdf", @"PO\d{10}",
RegexOptions.IgnoreCase).Value;
You could also call Regex.IsMatch with the same arguments first.
If the PO part is always the same, you can just get the number without needing to use a regex:
new string(theString.Where(c => char.IsDigit(c)).ToArray());
Later you can prepend the PO part manually.
NOTE: I'm assuming that you have only one single run of numbers in your strings. If you have for example "abc12345def678" you will get "12345678", which may not be what you want.
Regex.Replace(fileName, @"^.*?PO(\d{10}).*$", "$1");
This returns only the ten digits; use "PO$1" as the replacement if you want to keep the prefix.
string data="PurchaseOrderPO1000000109.pdf\nPO1000000100.doc";
MatchCollection matches = Regex.Matches(data, @"PO[0-9]{10}");
foreach(Match m in matches){
Console.WriteLine(m.Value);
}
Results
PO1000000109
PO1000000100
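Applied to the foreach loop described in the question, this might look like the sketch below. PoNumbers, Extract and FromDirectory are names invented for the example, and it assumes each file name holds at most one PO number.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class PoNumbers
{
    // "PO" followed by ten digits, anywhere in the name.
    private static readonly Regex Po = new Regex(@"PO\d{10}");

    // Returns the PO number found in the file name, or null if there is none.
    public static string Extract(string fileName)
    {
        Match m = Po.Match(fileName);
        return m.Success ? m.Value : null;
    }

    // Walks the files of one directory and yields every PO number found.
    public static IEnumerable<string> FromDirectory(string directory)
    {
        foreach (string path in Directory.EnumerateFiles(directory))
        {
            string po = Extract(Path.GetFileName(path));
            if (po != null)
            {
                yield return po;
            }
        }
    }
}
```

Note that a name with more than ten digits after "PO" still matches on the first ten; add a (?!\d) lookahead to the pattern if such names should be rejected.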
The regex \d+ will pick up every run of digits in a string (\d* also works, but it additionally matches the empty string at every other position).
As described here.
A possible regexp could be:
^.*(\d{10})\.\D{3}$
var re = new System.Text.RegularExpressions.Regex("(?<=^PurchaseOrder)PO\\d{10}(?=\\.pdf$)");
Assert.IsTrue(re.IsMatch("PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("some PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("OrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("PurchaseOrderPO1234567890.pdf2"));
