I need to be able to extract the full file path out of this string (without whatever is after the file extension):
$/FilePath/FilePath/KeepsGoing/Folder/Script.sql (CS: 123456)
A simple solution such as the following would work for this case, but it is limited to file extensions of exactly 3 characters:
(\$.*\..{3})
However, this breaks when the file name contains multiple dots:
$/FilePath/FilePath/File.Setup.Task.exe.config (CS: 123456)
I need to be able to capture the full file path (from $ to the end of whatever the file extension is, which can be any number of things). I need to be able to get this no matter how many dots are in the name of the file. In some cases there are spaces in the name of the file too, so I need to be able to incorporate that.
Edit: The ending (CS....) in this case is not standard. All kinds of stuff can follow the path so I cannot predict what will come after the path, but the path will always be first. Sometimes spaces do exist in the file name.
Any suggestions?
Try this:
(\$.*\.[\w.-]+)
Note, however, that it will not properly match files with spaces or special characters in the file extension. If you need to match such files, you'll need to elaborate on the input (is it quoted? is it escaped?).
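As a minimal sketch of how the pattern above could be applied in C#, the helper below returns the first capture group. The class and method names (PathExtractor, ExtractPath) are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class PathExtractor
{
    // Pattern from the answer above: ".*" is greedy, so the match runs to the
    // LAST dot in the line that is followed by word characters, dots, or hyphens.
    private static readonly Regex PathPattern = new Regex(@"(\$.*\.[\w.-]+)");

    // Returns the captured path, or null when the line contains no match.
    public static string ExtractPath(string line)
    {
        Match match = PathPattern.Match(line);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Because the match is greedy, a dot in the trailing text (for example a version number such as 1.2) would extend the match past the real extension, so this only holds when nothing after the path looks like ".something".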
I've read the docs about the Directory.GetFiles search pattern and how it is used, because I noticed that *.dll finds both test.dll and test.dll_20170206. That behavior is documented.
Now, I have a program that lists files in a folder based on a user-configured mask and processes them. I noticed that masks like *.txt lead to the above mentioned "problem" as expected.
However, the mask fixedname.txt also causes fixedname.txt_20170206 or the like to appear in the list, even though the documentation states this only occurs
When you use the asterisk wildcard character in a searchPattern such as "*.txt"
Why is that?
PS: I just checked: Changing the file mask to fixednam?.txt does not help even though the docs say
When you use the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, whereas a search pattern of "file*.txt" returns both files.
If you need a solution, you can transform the filter pattern into a regular expression by replacing * with (.*) and ? with a single dot (.). You also have to escape regex metacharacters in the pattern, such as the dot. Then check each file name returned by Directory.GetFiles against this regular expression. Make sure you check not only that there is a match, but also that the match length equals the length of the file name; otherwise you get the same results as before.
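The conversion described above might be sketched as follows. WildcardMatcher, ToRegex and IsMatch are hypothetical names, and the sketch assumes that ? should match exactly one character. Instead of comparing match lengths, it anchors the expression with \A and \z, which forces the whole file name to match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WildcardMatcher
{
    // Converts a file mask into an anchored regex: '*' becomes ".*", '?' becomes ".".
    public static Regex ToRegex(string mask)
    {
        // Regex.Escape turns '*' into "\*" and '?' into "\?", and escapes the dot.
        string body = Regex.Escape(mask)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
        // \A and \z anchor the pattern so the whole name must match.
        return new Regex(@"\A" + body + @"\z",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Pass the file name only (Path.GetFileName), not the full path.
    public static bool IsMatch(string fileName, string mask)
    {
        return ToRegex(mask).IsMatch(fileName);
    }
}
```

The results of Directory.GetFiles could then be filtered with something like files.Where(f => WildcardMatcher.IsMatch(Path.GetFileName(f), mask)).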
GetFiles uses pattern search; it returns all names in the path that end with the letters specified.
You can write code similar to the following to keep only files whose extension is exactly .txt:
foreach (string fileName in Directory.GetFiles(@"D:\test\", "*.txt"))
{
    // GetFiles can also return names such as "a.txt_20170206",
    // so check the actual extension of each result.
    string extension = Path.GetExtension(fileName);
    if (!string.Equals(extension, ".txt", StringComparison.OrdinalIgnoreCase))
        continue;

    // Process the file here.
}
I have a file that uses NUL's and SOH as markers. I need to look at the pattern of those special characters in order to parse out what I need. For example, when the file is viewed in Notepad++:
NULNULNUL Boom BoomSOHNULNULNULNULDLENULNULNULSIJohn Lee HookerSOH
I would like to extract the "Boom Boom" and the "John Lee Hooker". Those values will change (these are music files) with each file.
I was thinking of using the "NULNULNUL" pattern to find the first section and the "NULSI" pattern to find the second part.
I tried a FileStream to read in the bytes, but I don't know how to detect the special characters.
So, I feel lame for asking this, but I'm kinda stumped. I'm trying to get a list of files in a directory that end in tif ... only tif ... not tiff. So, I did this in C# ...
Directory.GetFiles(path, "*.tif", SearchOption.TopDirectoryOnly);
I would expect it to return only tif files, but that is not the case. I get tiff as well. I would think that the mask .tif? would get me both, but not the mask .tif. I tried it at a command prompt too, and I get both there as well. Am I missing something here? This just seems wrong to me. I guess I could sanitize the results afterwards, but I would rather not have to.
From MSDN:
When using the asterisk wildcard character in a searchPattern (for
example, "*.txt"), the matching behavior varies depending on the
length of the specified file extension. A searchPattern with a file
extension of exactly three characters returns files with an extension
of three or more characters, where the first three characters match
the file extension specified in the searchPattern. A searchPattern
with a file extension of one, two, or more than three characters
returns only files with extensions of exactly that length that match
the file extension specified in the searchPattern. When using the
question mark wildcard character, this method returns only files that
match the specified file extension. For example, given two files in a
directory, "file1.txt" and "file1.txtother", a search pattern of
"file?.txt" returns only the first file, while a search pattern of
"file*.txt" returns both files.
That's just how Directory.GetFiles works. From the manual:
When using the asterisk wildcard character in a searchPattern, such as
"*.txt", the matching behavior when the extension is exactly three
characters long is different than when the extension is more or less
than three characters long. A searchPattern with a file extension of
exactly three characters returns files having an extension of three or
more characters, where the first three characters match the file
extension specified in the searchPattern.
Directory.GetFiles internally uses the FindFirstFile function from the Win32 API.
From the documentation of FindFirstFile:
• The search includes the long and short file names.
A file that has a long file name of asd.tiff will have a short file name like asd~1.tif, and this is why it shows up in the results.
More than three character extensions are matched except when the path is on a network share (or mapped drive). For some reason the pattern only matches the long file name on remote drives.
Using C# in a Windows Form, I need to search the directory "C:\XML\Outbound" for the file that contains the order number 3860457 and return the path to that file, so I can then open it and display the contents to the user in a RichTextBox.
The end user will have the order number but will not know which file contains it, so I need to search all files until I find the file containing the order number and return its path (e.g. "C:\XML\Outbound\some_file_name_123.txt").
I am somewhat new to c# so I am not even sure where to start with this. Any direction for this?
Sorry the order number is inside the file so I need to search each file contents for the order number and once the file containing the order number is found return the path to that file. Order number is not part of the file name.
Straight answer:
// Requires: using System.IO; using System.Linq;
public string GetFileName(string search)
{
    // Enumerate lazily and stop at the first file that contains the search text.
    // Returns null when no file contains it.
    return Directory.EnumerateFiles(@"C:\XML\Outbound", "*.txt", SearchOption.AllDirectories)
        .FirstOrDefault(p => File.ReadLines(p).Any(line => line.Contains(search)));
}
Not-so straight answer:
Even though the above function will give you the path for given string (some handling of errors and edge cases may be nice) it will be terribly slow, especially if you have lots of files. If that's the case you need to tell us more about your environment because chances are you're doing it wrong (:
I am currently looking for a regex that can help validate a file path e.g.:
C:\test\test2\test.exe
I decided to post this answer which does use a regular expression.
^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$
Works for these:
\\test\test$\TEST.xls
\\server\share\folder\myfile.txt
\\server\share\myfile.txt
\\123.123.123.123\share\folder\myfile.txt
c:\folder\myfile.txt
c:\folder\myfileWithoutExtension
Edit: Added example usage:
if (Regex.IsMatch(text, @"^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$"))
{
// Valid
}
Edit: This is an approximation of the paths you could see. If possible, it is probably better to use the Path class or FileInfo class to see if a file or folder exists.
I would recommend using the Path class instead of a Regex if your goal is to work with filenames.
For example, you can call Path.GetFullPath to "verify" a path, as it will raise an ArgumentException if the path contains invalid characters, as well as other exceptions if the path is too long, etc. This will handle all of the rules, which are difficult to get right with a Regex.
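A minimal sketch of that approach, assuming a made-up wrapper (PathValidator.TryNormalize) that turns the exceptions into a boolean result. Which inputs are rejected depends on the runtime: as far as I know, newer .NET versions perform fewer invalid-character checks than .NET Framework, so treat this as a syntax check only, not proof that the file exists.

```csharp
using System;
using System.IO;
using System.Security;

// Hypothetical wrapper; the names are illustrative only.
public static class PathValidator
{
    // Returns true and the normalized absolute path when Path.GetFullPath accepts
    // the input; returns false when it throws one of its documented exceptions.
    public static bool TryNormalize(string candidate, out string fullPath)
    {
        try
        {
            fullPath = Path.GetFullPath(candidate);
            return true;
        }
        catch (Exception ex) when (ex is ArgumentException      // null, empty, or invalid characters
                                || ex is PathTooLongException   // exceeds the system limit
                                || ex is NotSupportedException  // e.g. a misplaced colon on .NET Framework
                                || ex is SecurityException)     // missing permissions
        {
            fullPath = null;
            return false;
        }
    }
}
```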
This is regular expression for Windows paths:
(^([a-z]|[A-Z]):(?=\\(?![\0-\37<>:"/\\|?*])|\/(?![\0-\37<>:"/\\|?*])|$)|^\\(?=[\\\/][^\0-\37<>:"/\\|?*]+)|^(?=(\\|\/)$)|^\.(?=(\\|\/)$)|^\.\.(?=(\\|\/)$)|^(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+))((\\|\/)[^\0-\37<>:"/\\|?*]+|(\\|\/)$)*()$
And this is for UNIX/Linux paths
^\/$|(^(?=\/)|^\.|^\.\.)(\/(?=[^/\0])[^/\0]+)*\/?$
Here are my tests:
Win Regex
Unix Regex
These work with JavaScript.
EDIT
I've added relative paths, (../, ./, ../something)
EDIT 2
I've added paths starting with tilde for unix, (~/, ~, ~/something)
The proposed one is not really good. This one, which I built for XSD, is Windows-specific:
^(?:[a-zA-Z]\:(\\|\/)|file\:\/\/|\\\\|\.(\/|\\))([^\\\/\:\*\?\<\>\"\|]+(\\|\/){0,1})+$
Try this one for Windows and Linux support: ((?:[a-zA-Z]\:){0,1}(?:[\\/][\w.]+){1,})
I use this regex for capturing valid file/folder paths in windows (including UNCs and %variables%), with the exclusion of root paths like "C:\" or "\\serverName"
^(([a-zA-Z]:|\\\\\w[ \w\.]*)(\\\w[ \w\.]*|\\%[ \w\.]+%+)+|%[ \w\.]+%(\\\w[ \w\.]*|\\%[ \w\.]+%+)*)
This regex does not match leading spaces in path elements, so
"C:\program files" is matched
"C:\ pathWithLeadingSpace" is not matched
variables are allowed at any level
"%program files%" is matched
"C:\my path with inner spaces\%my var with inner spaces%" is matched
regex CmdPrompt("^([A-Z]:[^<>:\"|?*]+)");
Basically we look for everything that's not in the list of forbidden Windows Path Characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
I know this is really old... but expanding on @agent-j's response, I've added named groups and support for period characters.
^(?<ParentPath>(?:[a-zA-Z]\:|\\\\[\w\s\.]+\\[\w\s\.$]+)\\(?:[\w\s\.]+\\)*)(?<BaseName>[\w\s\.]*?)$
I've saved this at Regexr
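As a rough illustration of how the named groups could be read in C#, here is a sketch with a hypothetical PathSplitter.TrySplit helper wrapped around the pattern above. The helper and its names are assumptions for the example, not part of the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class PathSplitter
{
    // Pattern from the answer above, with the ParentPath and BaseName groups.
    private static readonly Regex PathPattern = new Regex(
        @"^(?<ParentPath>(?:[a-zA-Z]\:|\\\\[\w\s\.]+\\[\w\s\.$]+)\\(?:[\w\s\.]+\\)*)(?<BaseName>[\w\s\.]*?)$");

    // Splits a path into its parent folder (with trailing backslash) and base name.
    // Both out values are null when the path does not match the pattern.
    public static bool TrySplit(string path, out string parentPath, out string baseName)
    {
        Match match = PathPattern.Match(path);
        parentPath = match.Success ? match.Groups["ParentPath"].Value : null;
        baseName = match.Success ? match.Groups["BaseName"].Value : null;
        return match.Success;
    }
}
```

For the input c:\folder\myfile.txt, the ParentPath group holds c:\folder\ and the BaseName group holds myfile.txt.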
I found most of the answers here to be a little hit or miss.
Found a good solution here though:
https://social.msdn.microsoft.com/forums/vstudio/en-US/31d2bc84-c948-4914-8a9d-97b9e788b341/validate-a-network-folder-path
Note* - this is only for network shares - not local files
Answer:
string pattern = @"^\\{2}[\w-]+(\\{1}(([\w-][\w-\s]*[\w-]+[$$]?)|([\w-][$$]?$)))+";
string[] names = { @"\\my-network\somelocation", @"\\my-network\\somelocation",
@"\\\my-network\somelocation", @"my-network\somelocation",
@"\\my-network\\somelocation",@"\\my-network\somelocation\aa\dd",
@"\\my-network\somelocation\",@"\\my-network\\somelocation"};
foreach (string name in names)
{
if (Regex.IsMatch(name, pattern))
{
Console.WriteLine(name);
// Call Directory.Exists here to check whether the folder actually exists
}
}
Alexander's Answer + Relative Paths
Alexander has the most correct answer thus far since it supports spaces in file names (i.e. C:\Program Files (x86)\ will match)... This aims to include relative paths as well.
For example, you can do cd / or cd \ and it does the same thing.
Furthermore, if you're currently in C:\some\path\to\some\place and you type either of those commands, you end up at C:\
In addition, you should treat paths that start with '/' as root paths (relative to the current drive).
(?:[a-zA-Z]:(\\|/)|file://|\\\\|\.(/|\\)|/)([^,\\/:*\?\<>\"\|]+(\\|/){0,1})
A Modified version of Alexander's answer, however, we include paths that are relative with no leading / or drive letter, as well as / with no leading drive letter (relative to the current drive as root).