Regular Expressions and Relative File Paths - c#

I am trying to match the folder name in a relative path using C#. I am using the expression: "/(.*)?/" and reversing the matching from left to right to right to left.
When I pass "images/gringo/" into the regular expression, it correctly gives me "gringo" in the first group - I'm only interested in what is between the brackets.
When I pass in "images/", it fails to pick up "images".
I have tried using [/^] and [/$] but neither work.
Thanks,
David

You're probably better off using the System.IO.DirectoryInfo class to interpret your relative path. You can then pick off folder or file names using its members:
DirectoryInfo di = new DirectoryInfo("images/gringo/");
Console.Out.WriteLine(di.Name);
This will be much safer than any regexps you could use.

Don't do this. Use System.IO.Path to break apart path parts and then compare them.

How about:
"([^/]+)/?$"
1 or more non / characters
Optional /
End of string
But as #Blair Conrad says - better to go with a class that encapsulates this for you....

Agreed with the "don't do it this way" answers, but, since it's tagged "regex"...
You don't need the ?. * already accepts 0 repetitions as a match, so (.*) is exactly equivalent to (.*)?
You rarely actually want to use .* anyhow. If you're trying to capture what's between a pair of slashes, use /([^/]*)/ or else testing against "foo/bar/baz/" will (on most regex implementations) return a single match for "bar/baz" instead of matching "bar" and "baz" separately.

Related

How do I substitute string and number values with a variable that says any string or any number?

I am using c# and need to rename a lot of files. They all follow the same naming convention. like AA-A0000-(1+)-A_words-sdsd_morewords. The only problem is the all follow this pattern but the A0000 and (1+) sections change file to file. How can I say if string follows that pattern than run my custom funciton on it?
How can I say if the file starts with two letters a hyphen the a letter followed by 4 numbers, another hyphen, a number, then another hyphen, then change the file name?
As the commenters have pointed out, Regular expressions are your answer. In .NET, this uses the Regex class. There are a number of tutorials for regular expressions that you can look at; the .NET version is documented at https://msdn.microsoft.com/en-us/library/az24scfc.aspx.
Depending on how the different sections of the file name change in your example above, you can alter your regular expression to fit. So for instance,
Regex.Replace(fileName, #"[a-z ]+-A(\d{4}-\(\d+)", "BB-B$1", RegexOptions.IgnoreCase);
Will match AA-A0000-(1+)..., AA-A3456-(72+)..., C D-A3456-(72+)..., etc, and replace the A's (and "C D") with B's. See https://dotnetfiddle.net/hFpUkW for an example of this in action.
You can use regex.
If your filenames look, for example, like this:
aB-C0101-2-some text that contains-Numbers_01987etc.ext
then the pattern to match it would be:
[a-zA-Z]{2}-[a-zA-Z]\d{4}-\d-[\s0-9a-zA-Z_-]+\.[a-zA-Z]{3}
Here are some additional resources:
tutorial: http://www.regular-expressions.info/tutorial.html
to test a regex online (there are a lot more):
http://www.regexr.com/
http://www.regexplanet.com/
example use of Regex.Replace() method in C#:
http://www.dotnetperls.com/regex-replace

Regular expression for filenames that doesn't exclude whitespaces

I have been using this regular expression to extract file names out of file path strings:
Regex r = new Regex(#"\w+[.]\w+$+");
This works, as long as there is no space in the file name. For example:
r.Match("c:\somestuff\myfile.doc").Value = "myfile.doc"
r.Match("c:\somestuff\my file.doc").Value = "file.doc"
I need my regular expression to give me "my file.doc", and not just "file.doc"
I tried messing around with the expression myself. In particular I tried adding \s+ after learning that that is for matching whitespaces. I didn't get the results I hoped for.
I did devise a solution just to get the job done: I started at the end of the string, went backwards until a backslash was reached. This gave me the file name in reverse order (i.e. cod.elifym) into an array of chars, then I used Array.Reverse() to turn it around. However I'd like to learn how to achieve this by simply modifying my original regular expression.
Does it have to be a regular expression? Use System.IO.Path.GetFileName() instead.
Regex r = new Regex(#"[\w ]+\.\w+$");
A working regex might simply look like:
[^\\]+$
Consider using:
System.IO.Path.GetFileName(path)

how to replace the exact word by another in a list?

I have a list like :
george fg
michel fgu
yasser fguh
I would like to replace fg, fgu, and fguh by "fguhCool" I already tried something like this :
foreach (var ignore in NameToPoulate)
{
tempo = ignore.Replace("fg", "fguhCool");
NameToPoulate_t.Add(tempo);
}
But then "fgu" become "fguhCoolu" and "fguh" become "fguhCooluh" is there are a better idea ?
Thanks for your help.
I assume that this is a homework assignment and that you are being tested for the specific algorihm rather than any code that does the job.
This is probably what your teacher has in mind:
Students will realize that the code should check for "fguh" first, then "fgu" then "fg". The order is important because replacing "fg" will, as you have noticed, destroy a "fguh".
This will by some students be implemented as a loop with if-else conditions in them. So that you will not replace a "fg" that is within an already replaced "fguhCool".
But then you will find that the algorithm breaks down if "fg" and "fgu" are both within the same string. You cannot then allow the presence of "fgu" prevent you to check for "fg" at a different part of the string.
The answer that your teacher is looking for is probably that you should first locate "fguh", "fgu" and "fg" (in that order) and replace them with an intermediary string that doesn't contain "fg". Then after you have done that, you can search for that intermediary string and replace it with "fguhCool".
You could use regular expressions:
Regex.Replace(#"\bfg\b", "fguhCool");
The \b matches a so-called word boundary which means it matches the beginnnig or end of a word (roughly, but for this purpose enough).
Use a regular expression:
Regex.Replace("fg(uh?)?", "fguhCool");
An alternative would be replacing the long words for the short ones first, then replacing the short for the end value (I'm assuming all words - "fg", "fgu" and "fguh" - would map to the same value "fguhCool", right?)
tempo = ignore
.Replace("fguh", "fg")
.Replace("fgu", "fg")
.Replace("fg", "fguhCool");
Obs.: That assumes those words can appear anywhere in the string. If you're worried about whole words (i.e. cases where those words are not substrings of a bigger word), then see #Joey's answer (in this case, simple substitutions won't do, regexes are really the best option).

Regex between, from the last to specific end

Today my wish is to take text form the string.
This string must be, between last slash and .partX.rar or .rar
First I tried to find edge's end of the word and then the beginning. After I get that two elements I merged them but I got empty results.
String:
http://hosting.xyz/1234/15-game.part4.rar.html
http://hosting.xyz/1234/16-game.rar.html
Regex:
Begin:(([^/]*)$) - start from last /
End:(.*(?=.part[0-9]+.rar|.rar)) stop before .partX.rar or .rar
As you see, if I merge that codes I won't get any result.
What is more, "end" select me only .partX instead of .partX.rar
All what I want is:
15-game.part4.rar and 16-game.rar
What i tried:
(([^/]*)$)(.*(?=.part[0-9]+.rar|.rar))
(([^/]*)$)
(.*(?=.part[0-9]+.rar|.rar))
I tried also
/[a-zA-Z0-9]+
but I do not know how select symbols.. This could be the easiest way. But this select only letters and numbers, not - or _.
If I could select symbols..
You don't really need a regex for this as you can merely split the url on / and then grab the part of the file name that you need. Since you didn't mention a language, here's an implementation in Perl:
use strict;
use warnings;
my $str1="http://hosting.xyz/1234/15-game.part4.rar.html";
my $str2="http://hosting.xyz/1234/16-game.rar.html";
my $file1=(split(/\//,$str1))[-1]; #last element of the resulting array from splitting on slash
my $file2=(split(/\//,$str2))[-1];
foreach($file1,$file2)
{
s/\.html$//; #for each file name, if it ends in ".html", get rid of that ending.
print "$_\n";
}
The output is:
15-game.part4.rar
16-game.rar
Nothing could be simpler! :-)
Use this:
new Regex("^.*\/(.*)\.html$")
You'll find your filename in the first captured group (don't have a c# compiler at hand, so can't give you working sample, but you have a working regex now! :-) )
See a demo here: http://rubular.com/r/UxFNtJenyF
I'm not a C# coder so can't write full code here but I think you'll need support of negative lookahead here like this:
new Regex("/(?!.*/)(.+?)\.html$");
Matched Group # 1 will have your string i.e. "16-game.rar" OR "15-game.part4.rar"
Use two regexes:
start to substitute .*/ with nothing;
then substitute \.html with nothing.
Job done!

C# - Regex for file paths e.g. C:\test\test.exe

I am currently looking for a regex that can help validate a file path e.g.:
C:\test\test2\test.exe
I decided to post this answer which does use a regular expression.
^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$
Works for these:
\\test\test$\TEST.xls
\\server\share\folder\myfile.txt
\\server\share\myfile.txt
\\123.123.123.123\share\folder\myfile.txt
c:\folder\myfile.txt
c:\folder\myfileWithoutExtension
Edit: Added example usage:
if (Regex.IsMatch (text, #"^(?:[a-zA-Z]\:|\\\\[\w\.]+\\[\w.$]+)\\(?:[\w]+\\)*\w([\w.])+$"))
{
// Valid
}
*Edit: * This is an approximation of the paths you could see. If possible, it is probably better to use the Path class or FileInfo class to see if a file or folder exists.
I would recommend using the Path class instead of a Regex if your goal is to work with filenames.
For example, you can call Path.GetFullPath to "verify" a path, as it will raise an ArgumentException if the path contains invalid characters, as well as other exceptiosn if the path is too long, etc. This will handle all of the rules, which will be difficult to get correct with a Regex.
This is regular expression for Windows paths:
(^([a-z]|[A-Z]):(?=\\(?![\0-\37<>:"/\\|?*])|\/(?![\0-\37<>:"/\\|?*])|$)|^\\(?=[\\\/][^\0-\37<>:"/\\|?*]+)|^(?=(\\|\/)$)|^\.(?=(\\|\/)$)|^\.\.(?=(\\|\/)$)|^(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+)|^\.\.(?=(\\|\/)[^\0-\37<>:"/\\|?*]+))((\\|\/)[^\0-\37<>:"/\\|?*]+|(\\|\/)$)*()$
And this is for UNIX/Linux paths
^\/$|(^(?=\/)|^\.|^\.\.)(\/(?=[^/\0])[^/\0]+)*\/?$
Here are my tests:
Win Regex
Unix Regex
These works with Javascript
EDIT
I've added relative paths, (../, ./, ../something)
EDIT 2
I've added paths starting with tilde for unix, (~/, ~, ~/something)
The proposed one is not really good, this one I build for XSD, it's Windows specific:
^(?:[a-zA-Z]\:(\\|\/)|file\:\/\/|\\\\|\.(\/|\\))([^\\\/\:\*\?\<\>\"\|]+(\\|\/){0,1})+$
Try this one for Windows and Linux support: ((?:[a-zA-Z]\:){0,1}(?:[\\/][\w.]+){1,})
I use this regex for capturing valid file/folder paths in windows (including UNCs and %variables%), with the exclusion of root paths like "C:\" or "\\serverName"
^(([a-zA-Z]:|\\\\\w[ \w\.]*)(\\\w[ \w\.]*|\\%[ \w\.]+%+)+|%[ \w\.]+%(\\\w[ \w\.]*|\\%[ \w\.]+%+)*)
this regex does not match leading spaces in path elements, so
"C:\program files" is matched
"C:\ pathWithLeadingSpace" is not matched
variables are allowed at any level
"%program files%" is matched
"C:\my path with inner spaces\%my var with inner spaces%" is matched
regex CmdPrompt("^([A-Z]:[^\<\>\:\"\|\?\*]+)");
Basically we look for everything that's not in the list of forbidden Windows Path Characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
I know this is really old... but expanding on #agent-j's response I've added named groups, and support for period characters.
^(?<ParentPath>(?:[a-zA-Z]\:|\\\\[\w\s\.]+\\[\w\s\.$]+)\\(?:[\w\s\.]+\\)*)(?<BaseName>[\w\s\.]*?)$
I've saved this at Regexr
I found most of the answers here to be a little hit or miss.
Found a good solution here though:
https://social.msdn.microsoft.com/forums/vstudio/en-US/31d2bc84-c948-4914-8a9d-97b9e788b341/validate-a-network-folder-path
Note* - this is only for network shares - not local files
Answer:
string pattern = #"^\\{2}[\w-]+(\\{1}(([\w-][\w-\s]*[\w-]+[$$]?)|([\w-][$$]?$)))+";
string[] names = { #"\\my-network\somelocation", #"\\my-network\\somelocation",
#"\\\my-network\somelocation", #"my-network\somelocation",
#"\\my-network\\somelocation",#"\\my-network\somelocation\aa\dd",
#"\\my-network\somelocation\",#"\\my-network\\somelocation"};
foreach (string name in names)
{
if (Regex.IsMatch(name, pattern))
{
Console.WriteLine(name);
//Directory.Exists function to check if file exists
}
}
Alexander's Answer + Relative Paths
Alexander has the most correct answer thus far since it supports spaces in file names (i.e. C:\Program Files (x86)\ will match)... This aims to include relative paths as well.
For example, you can do cd / or cd \ and it does the same thing.
Further more, if you're currently in C:\some\path\to\some\place and you type either of those commands, you end up at C:\
Even more, you should consider paths, that start with '/' as a root path (to the current drive).
(?:[a-zA-Z]:(\|/)|file://|\\|.(/|\)|/)([^,\/:*\?\<>\"\|]+(\|/){0,1})
A Modified version of Alexander's answer, however, we include paths that are relative with no leading / or drive letter, as well as / with no leading drive letter (relative to the current drive as root).

Categories

Resources