I'm trying to figure out the best way to get a list of directories using a wildcard, without matching 8.3 format names. I want all the directories that match a pattern like "3626*". The problem is that both Directory.GetDirectories and DirectoryInfo.GetDirectories match against long file names and 8.3 format names, so I get entries I don't want. For example, with the above wildcard I get both "3626 Burnt Chimney" and "3689 Lavista". You can see the same behavior at a command prompt with the command "dir 3626*". This is on Windows 7 32-bit. How can I get matches on long file names only?
Perform the filter after you retrieve the directories, e.g.:
var files = new DirectoryInfo(@"C:\Path\")
.GetDirectories()
.Select(f => f.Name)
.Where(name => name.StartsWith("3626"));
You can find some more information here: List files in folder which match pattern
The comment from RagtimeWilly is what I also want to suggest.
For a good explanation of 8.3 names see http://en.wikipedia.org/wiki/8.3_filename
I just wrote a small example, perhaps you can use this as a starting point:
using System.Text.RegularExpressions;
...
Regex re83 = new Regex(@"^[^.]{1,8}(\.[^.]{0,3})?$");
DirectoryInfo directoryInfo = new DirectoryInfo("C:\\windows");
foreach (string no83 in directoryInfo.GetDirectories("i*").Select(di => di.Name).Where(n => !re83.IsMatch(n)))
{
Console.WriteLine(no83);
}
If you search the web, you will find other regular expressions for matching 8.3 names, some of them a bit more complex. Those also handle the characters that are not allowed in 8.3 names. But maybe you are only interested in filtering out some unwanted path names.
Related
I've read the docs about the Directory.GetFiles search pattern and how it is used, because I noticed that *.dll finds both test.dll and test.dll_20170206. That behavior is documented.
Now, I have a program that lists files in a folder based on a user-configured mask and processes them. I noticed that masks like *.txt lead to the above mentioned "problem" as expected.
However, the mask fixedname.txt also causes fixedname.txt_20170206 or the like to appear in the list, even though the documentation states this only occurs
When you use the asterisk wildcard character in a searchPattern such as "*.txt"
Why is that?
PS: I just checked: Changing the file mask to fixednam?.txt does not help even though the docs say
When you use the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, whereas a search pattern of "file*.txt" returns both files.
If you need a solution, you can transform the filter pattern into a regular expression by replacing * with .* and ? with a single dot (.). You also have to escape the other characters that are special in a regular expression, such as the literal dot. Then check each file name you get from Directory.GetFiles against this regular expression. Keep in mind that it is not enough to check that there is a match: the match length must also equal the length of the file name. Otherwise you get the same results as before.
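The conversion described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the class and method names (WildcardFilter, ToRegex, IsMatch, GetFiles) are made up for illustration, and the case-insensitive comparison is an assumption based on Windows file names. Anchoring the pattern with ^ and \z has the same effect as comparing the match length with the file name length.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative, not part of any library.
public static class WildcardFilter
{
    // Turns a file mask such as "*.txt" into an anchored regular expression.
    public static Regex ToRegex(string mask)
    {
        // Regex.Escape turns '*' into "\*" and '?' into "\?", and also
        // escapes the literal dot, so only the wildcards are translated back.
        string pattern = "^"
            + Regex.Escape(mask)
                .Replace(@"\*", ".*")  // '*' = any run of characters
                .Replace(@"\?", ".")   // '?' = exactly one character
            + @"\z";                   // must match up to the very end
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // True only if the whole file name matches the mask.
    public static bool IsMatch(string mask, string fileName)
    {
        return ToRegex(mask).IsMatch(fileName);
    }

    // Lets the file system pre-filter, then re-checks the long file name.
    public static IEnumerable<string> GetFiles(string directory, string mask)
    {
        Regex regex = ToRegex(mask);
        return Directory.EnumerateFiles(directory, mask)
            .Where(path => regex.IsMatch(Path.GetFileName(path)));
    }
}
```

With this sketch, WildcardFilter.IsMatch("fixedname.txt", "fixedname.txt_20170206") is false, because the anchored expression has to cover the complete name.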
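GetFiles uses a pattern search, and because of the way it matches extensions, a mask such as *.txt can also return files whose extension merely starts with .txt.
You can write code similar to the following to keep only files with a .txt extension:
foreach (string fileName in Directory.GetFiles(@"D:\test\", "*.txt"))
{
    // Skip files whose real extension is not exactly ".txt" (e.g. ".txt_20170206").
    string extension = Path.GetExtension(fileName);
    if (!string.Equals(extension, ".txt", StringComparison.OrdinalIgnoreCase))
        continue;

    // process the file
}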
I am trying to find files from a directory:
String[] search1 = Directory.GetFiles(voiceSource, "85267-*.wav")
.Select(path => Path.GetFileName(path))
.ToArray();
String[] search2 = Directory.GetFiles(voiceSource, "85267 *.wav")
.Select(path => Path.GetFileName(path))
.ToArray();
But in search1, it selects both 85267-s.wav and 85267 -s.wav. But I want only 85267-s.wav to be selected.
search2 is doing well.
How can I do that?
The behaviour you are experiencing is caused by short file names. The file 85267 -s.wav gets the short name 85267-~1.WAV, and since that matches your wildcard "85267-*.wav", you get both files back.
This is explained in Directory.GetFiles Method (String, String):
Because this method checks against file names with both the 8.3 file
name format and the long file name format, a search pattern similar
to "*1*.txt" may return unexpected file names. For example, using a
search pattern of "*1*.txt" will return "longfilename.txt" because the
equivalent 8.3 file name format would be "longf~1.txt".
As a workaround, you can use Directory.EnumerateFiles to first select the files matching your wildcard and then compare the actual (long) file name using StartsWith. Remember that EnumerateFiles is lazily evaluated.
String[] search1 = Directory.EnumerateFiles(@"C:\test", "85267-*.wav")
.Where(file => Path.GetFileName(file).StartsWith("85267-"))
.Select(path => Path.GetFileName(path))
.ToArray();
Yes, this is a side effect of the MS-DOS 8.3 short name support that is still turned on today on most file systems. You can see it with the DIR /X command, which displays those short names. On my machine:
C:\temp>dir /x *.wav
01/21/2015 09:11 AM 6 85267-~1.WAV 85267 -s.wav
01/21/2015 09:11 AM 6 85267-s.wav
2 File(s) 12 bytes
0 Dir(s) 235,121,160,192 bytes free
Note how the short name for "85267 -s" is missing the space. It is not a valid character in a short name. What's left over now also matches your wildcard.
That's not where the trouble with those short names ends. A wildcard like *.wav will also match a file like foobar.wavx, a completely different file type.
Short-name generation is, frankly, a relic from the previous century that ought to be turned off today. But that is not typically anything you can control yourself. You have to deal with these accidental matches and double-check what you get back. With a Regex for example.
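As an illustration of that double-check, the names returned for the wildcard can be run through an anchored regular expression that is applied to the long name only. This is a small sketch under assumptions: the class name LongNameCheck and the method KeepRealMatches are hypothetical, and the pattern is hand-written for the "85267-*.wav" wildcard from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LongNameCheck
{
    // Anchored equivalent of the wildcard "85267-*.wav".
    private static readonly Regex Wanted =
        new Regex(@"^85267-.*\.wav$", RegexOptions.IgnoreCase);

    // Keeps only names whose long form really matches the pattern,
    // dropping accidental hits that matched through a short name.
    public static List<string> KeepRealMatches(IEnumerable<string> fileNames)
    {
        return fileNames.Where(name => Wanted.IsMatch(name)).ToList();
    }
}
```

Passing the file names from Directory.GetFiles (via Path.GetFileName) through KeepRealMatches drops both "85267 -s.wav" and a name like "85267-s.wavx".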
I need to list files in directory which match some pattern.
I tried playing with Directory.GetFiles, but don't fully
get why it behaves in some way.
1) For example, this code:
string[] dirs = Directory.GetFiles(@"c:\test\", "*t");
foreach (string dir in dirs)
{
Debugger.Log(0, "", dir);
Debugger.Log(0, "", "\n");
}
outputs this:
c:\test\11.11.2007.txtGif
c:\test\12.1.1990.txt
c:\test\2.tGift
c:\test\2.txtGif
c:\test\test.txt
...others hidden
You can see some files end with f but were still returned by query, why?
2) Also, this:
string[] dirs = Directory.GetFiles(@"c:\test\", "*.*.*.txt");
foreach (string dir in dirs)
{
Debugger.Log(0, "", dir);
Debugger.Log(0, "", "\n");
}
returns this:
c:\test\1.1.1990.txt
c:\test\1.31.1990.txt
c:\test\12.1.1990.txt
c:\test\12.31.1990.txt
But according to the documentation (http://msdn.microsoft.com/en-us/library/07wt70x2(v=vs.110).aspx) I think it had to return also
this file which is in the directory:
11.11.2007.txtGif
since extension (in the query string) is 3 letters long, but it didn't. why?
(when query extension is 3 letters long, doc says it will return extensions which start with specified extensions too, e.g., see Remarks).
Am I the only one who finds these results strange?
Is there any other approach you would recommend for using when one wants to list files in folder which match certain pattern?
User in my case may arbitrarily type some pattern, and I don't want to rely on
method which I am unsure about the result (like it happened with GetFiles).
This is the way that the Windows API works - you will see the same results if you use the dir command in a command prompt. This does NOT use regular expressions! It's pretty obscure...
If you want to do your own filtering, you can do it like so:
var filesEndingInT = Directory.EnumerateFiles(@"c:\test\").Where(f => f.EndsWith("t"));
If you want to use regular expressions to match, you can do it thusly:
Regex regex = new Regex(".*t$");
var matches = Directory.EnumerateFiles(@"c:\test\").Where(f => regex.IsMatch(f));
I suspect that you will want to let the user type in a simplified form of pattern and turn it into a regular expression, e.g.
"*.t" -> ".*t$"
The regular expression to find all filenames ending in t is ".*t$".
All of this behavior is exactly as described in the documentation you've linked. Here's an excerpt of the pertinent bits:
When you use the asterisk wildcard character in a searchPattern such
as "*.txt", the number of characters in the specified extension
affects the search as follows:
If the specified extension is exactly three characters long, the method returns files with extensions that begin with the specified
extension. For example, "*.xls" returns both "book.xls" and
"book.xlsx".
In all other cases, the method returns files that exactly match the specified extension. For example, "*.ai" returns "file.ai" but not
"file.aif".
When you use the question mark wildcard character, this method returns
only files that match the specified file extension. For example, given
two files, "file1.txt" and "file1.txtother", in a directory, a search
pattern of "file?.txt" returns just the first file, whereas a search
pattern of "file*.txt" returns both files.
Note
Because this method checks against file names with both the 8.3 file
name format and the long file name format, a search pattern similar to
"*1*.txt" may return unexpected file names. For example, using a
search pattern of "*1*.txt" returns "longfilename.txt" because the
equivalent 8.3 file name format is "LONGFI~1.TXT".
http://msdn.microsoft.com/en-us/library/wz42302f%28v=vs.110%29.aspx
The last paragraph above clearly explains your results when searching for *t. You can see this by using the command dir C:\test /x to show the 8.3 filenames. Here, C:\test\11.11.2007.txtGif matches *t because its 8.3 filename is 111120~1.TXT.
For the treatment of *.*.*.txt, I think you're either mis-interpreting the first bit about three-letter file extensions or perhaps it wasn't written quite clearly. Note that they quite specifically mentioned wildcard usage 'in a searchPattern such as "*.txt"'. Your search pattern doesn't match that, so you have to read between the lines a bit to see why their comment about three-letter file extensions applies to the example they gave but not yours. Really, I think that whole top section can be ignored if you just put a bit of thought into the last bit about 8.3 filenames. The treatment of three-letter file extensions after wildcards is really just a side-effect of the 8.3 filename search behavior.
Consider the examples they gave:
"*.xls" returns both "book.xls" and "book.xlsx"
This is because the filename for "book.xls" (both 8.3 and long filename, since the name naturally complies with 8.3) and the 8.3 filename for "book.xlsx" ("BOOK~1.XLS") matches a query of "*.xls".
"*.ai" returns "file.ai" but not "file.aif"
This is because "file.ai" naturally matches "*.ai" while "file.aif" doesn't. 8.3 search behavior doesn't come into play here at all, because both filenames are already 8.3-compliant. However, even if they weren't, the same would still hold true because any 8.3 filename for a file with an extension of ".ai" is still going to end in just ".AI".
The only reason it matters whether or not the file extension in your search is exactly three characters is that the 8.3 filenames are included in the search, and 8.3 filename extensions for objects with long filenames always consist of just the first three characters after the last dot in the long filename. The key part missing from the documentation above is that the "first three characters" matching is done only against the 8.3 filename.
So, let's look at the anomalies you're asking about here. (If you want any other strange behaviors explained, beyond your results for *t and *.*.*.txt, please post them as separate questions.)
TL;DR:
Output of a search for *t includes 11.11.2007.txtGif and 2.txtGif.
This is because the 8.3 filenames match a pattern of *t.
11.11.2007.txtGif = 111120~1.TXT
2.txtGif = 2BEFD~1.TXT
(Both 8.3 filenames end in "T".)
Output of a search for *.*.*.txt does not include 11.11.2007.txtGif.
This is because neither the long filename, nor the 8.3 filename, match a pattern of *.*.*.txt.
11.11.2007.txtGif = 111120~1.TXT
(The long filename doesn't match because it doesn't end in ".txt", and the 8.3 filename doesn't match because it only has one dot.)
https://learn.microsoft.com/en-us/dotnet/api/system.io.directoryinfo.getfiles?view=netframework-4.5
The above Microsoft documentation is wrong, as usual.
It says that this code:
DirectoryInfo di = new DirectoryInfo(@"C:\Users\tomfitz\Documents\ExampleDir");
Console.WriteLine("No search pattern returns:");
Console.WriteLine();
Console.WriteLine("Search pattern *2* returns:");
foreach (var fi in di.GetFiles("*2*"))
{
Console.WriteLine(fi.Name);
Console.WriteLine(fi.FullName); // this reveals the bug
}
should return the output below, but it does not:
it still matches against the whole file path, not just the file name.
Search pattern *2* returns:
log2.txt
test2.txt
I am currently looking for a regex that can help validate a file path e.g.:
C:\test\test2\test.exe
I decided to post this answer which does use a regular expression.
^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$
Works for these:
\\test\test$\TEST.xls
\\server\share\folder\myfile.txt
\\server\share\myfile.txt
\\123.123.123.123\share\folder\myfile.txt
c:\folder\myfile.txt
c:\folder\myfileWithoutExtension
Edit: Added example usage:
if (Regex.IsMatch(text, @"^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$"))
{
// Valid
}
Edit: This is an approximation of the paths you could see. If possible, it is probably better to use the Path class or FileInfo class to see if a file or folder exists.
I would recommend using the Path class instead of a Regex if your goal is to work with filenames.
For example, you can call Path.GetFullPath to "verify" a path, as it will raise an ArgumentException if the path contains invalid characters, as well as other exceptions if the path is too long, etc. This will handle all of the rules, which would be difficult to get right with a Regex.
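A minimal sketch of that approach is shown below. The wrapper name PathCheck.TryNormalize is hypothetical, and the list of caught exceptions is an assumption based on the exceptions Path.GetFullPath is documented to throw. Which inputs are actually rejected depends on the runtime and the operating system, so treat a true result as "the path could be normalized", not as proof that the file exists.

```csharp
using System;
using System.IO;
using System.Security;

// Hypothetical wrapper for illustration only.
public static class PathCheck
{
    // Returns true and the normalized absolute path when Path.GetFullPath
    // accepts the input; returns false when it rejects it.
    public static bool TryNormalize(string candidate, out string fullPath)
    {
        try
        {
            fullPath = Path.GetFullPath(candidate);
            return true;
        }
        catch (Exception ex) when (ex is ArgumentException      // null, empty, or invalid characters
                                || ex is NotSupportedException   // unsupported path format
                                || ex is PathTooLongException    // exceeds the system limit
                                || ex is SecurityException)      // missing permissions
        {
            fullPath = null;
            return false;
        }
    }
}
```

A caller would use it like `if (PathCheck.TryNormalize(userInput, out string full)) { ... }` and then check File.Exists or Directory.Exists on the result if existence matters.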
This is a regular expression for Windows paths:
(^([a-z]|[A-Z]):(?=\\(?![\0-\37<>:"/\\|?*])|\/(?![\0-\37<>:"/\\|?*])|$)|^\\(?=[\\\/][^\0-\37<>:"/\\|?*]+)|^(?=(\\|\/)$)|^\.(?=(\\|\/)$)|^\.\.(?=(\\|\/)$)|^(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+))((\\|\/)[^\0-\37<>:"/\\|?*]+|(\\|\/)$)*()$
And this is for UNIX/Linux paths
^\/$|(^(?=\/)|^\.|^\.\.)(\/(?=[^/\0])[^/\0]+)*\/?$
Here are my tests:
Win Regex
Unix Regex
These work with JavaScript.
EDIT
I've added relative paths, (../, ./, ../something)
EDIT 2
I've added paths starting with tilde for unix, (~/, ~, ~/something)
The proposed one is not really good. This one, which I built for XSD, is Windows-specific:
^(?:[a-zA-Z]\:(\\|\/)|file\:\/\/|\\\\|\.(\/|\\))([^\\\/\:\*\?\<\>\"\|]+(\\|\/){0,1})+$
Try this one for Windows and Linux support: ((?:[a-zA-Z]\:){0,1}(?:[\\/][\w.]+){1,})
I use this regex for capturing valid file/folder paths in windows (including UNCs and %variables%), with the exclusion of root paths like "C:\" or "\\serverName"
^(([a-zA-Z]:|\\\\\w[ \w\.]*)(\\\w[ \w\.]*|\\%[ \w\.]+%+)+|%[ \w\.]+%(\\\w[ \w\.]*|\\%[ \w\.]+%+)*)
this regex does not match leading spaces in path elements, so
"C:\program files" is matched
"C:\ pathWithLeadingSpace" is not matched
variables are allowed at any level
"%program files%" is matched
"C:\my path with inner spaces\%my var with inner spaces%" is matched
regex CmdPrompt("^([A-Z]:[^\<\>\:\"\|\?\*]+)");
Basically we look for everything that's not in the list of forbidden Windows Path Characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
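The same check can be written without a regular expression. The sketch below assumes the drive prefix (such as "C:") has already been stripped, since the colon after the drive letter is legal. The class and method names are made up for illustration, and the character set is taken directly from the list above.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ForbiddenChars
{
    // The forbidden characters from the list above.
    private static readonly char[] Forbidden = { '<', '>', ':', '"', '|', '?', '*' };

    // True if the part of the path after the drive prefix
    // contains any of the forbidden characters.
    public static bool ContainsForbidden(string pathWithoutDrive)
    {
        return pathWithoutDrive.IndexOfAny(Forbidden) >= 0;
    }
}
```

For example, ContainsForbidden(@"\folder\file.txt") is false, while ContainsForbidden(@"\folder\fi*le.txt") is true.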
I know this is really old... but expanding on @agent-j's response I've added named groups, and support for period characters.
^(?<ParentPath>(?:[a-zA-Z]\:|\\\\[\w\s\.]+\\[\w\s\.$]+)\\(?:[\w\s\.]+\\)*)(?<BaseName>[\w\s\.]*?)$
I've saved this at Regexr
I found most of the answers here to be a little hit or miss.
Found a good solution here though:
https://social.msdn.microsoft.com/forums/vstudio/en-US/31d2bc84-c948-4914-8a9d-97b9e788b341/validate-a-network-folder-path
Note* - this is only for network shares - not local files
Answer:
string pattern = @"^\\{2}[\w-]+(\\{1}(([\w-][\w-\s]*[\w-]+[$$]?)|([\w-][$$]?$)))+";
string[] names = { @"\\my-network\somelocation", @"\\my-network\\somelocation",
@"\\\my-network\somelocation", @"my-network\somelocation",
@"\\my-network\\somelocation",@"\\my-network\somelocation\aa\dd",
@"\\my-network\somelocation\",@"\\my-network\\somelocation"};
foreach (string name in names)
{
if (Regex.IsMatch(name, pattern))
{
Console.WriteLine(name);
//Directory.Exists function to check if file exists
}
}
Alexander's Answer + Relative Paths
Alexander has the most correct answer thus far since it supports spaces in file names (i.e. C:\Program Files (x86)\ will match)... This aims to include relative paths as well.
For example, you can do cd / or cd \ and both do the same thing.
Furthermore, if you're currently in C:\some\path\to\some\place and you type either of those commands, you end up at C:\.
In addition, you should treat paths that start with '/' as root paths (relative to the current drive).
(?:[a-zA-Z]:(\|/)|file://|\\|.(/|\)|/)([^,\/:*\?\<>\"\|]+(\|/){0,1})
A Modified version of Alexander's answer, however, we include paths that are relative with no leading / or drive letter, as well as / with no leading drive letter (relative to the current drive as root).
I need to find files whose names begin with "prft", for example "prft0000.140", "prft2100.140", "prft1258.140", etc., and I need to verify whether such files exist in a specific directory. I have this code with a Regex to find them, but I don't know how to write the filter so that it matches.
List<string> prftFiles = (new DirectoryInfo(filePath))
.GetFiles(".", SearchOption.AllDirectories)
.Where(a => Regex.IsMatch(a.Name, "prft[^*]$"))
.Select(fi => fi.Name)
.ToList();
The pattern "prft[^*]$" does not work. What should it be?
Why not just do FileInfo[] prftFiles = new DirectoryInfo(filePath).GetFiles("prft*", SearchOption.AllDirectories);
This is a regex you could use:
string pattern = #"^(prft\d{4}\.\d{3})$";
You can also find the files with the * wildcard, as others have said.
But if you want an exact match for the pattern "prft", four digits, a dot, three digits, you should use the regex,
because prft* will find any file whose name starts with prft.
You actually don't need to use a Regex here, as the Directory class has a built-in search mechanism based on the pattern you pass in.
Directory.GetFiles(@"C:\SomeDirectory", "prft*");
The * wildcard matches anything.
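The two suggestions in this thread can be combined: let the file system narrow the candidates with the prft* wildcard, then apply the stricter regular expression to each name. This is a hedged sketch: the class name PrftFinder and its methods are hypothetical, and the pattern is the one proposed in the answer above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class PrftFinder
{
    // "prft", four digits, a dot, three digits, and nothing else.
    private static readonly Regex PrftName = new Regex(@"^prft\d{4}\.\d{3}$");

    public static bool IsPrftName(string fileName)
    {
        return PrftName.IsMatch(fileName);
    }

    // Pre-filters with the wildcard, then keeps only exact pattern matches.
    public static List<string> Find(string root)
    {
        return Directory.EnumerateFiles(root, "prft*", SearchOption.AllDirectories)
            .Select(path => Path.GetFileName(path))
            .Where(name => IsPrftName(name))
            .ToList();
    }
}
```

To verify whether any such file exists, a caller could check `PrftFinder.Find(filePath).Count > 0`.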