Matching optional slash in regex - c#

I need a regex that matches the first two segments between the first three "/" characters in a URL: e.g. in /en/help/test/abc/def it should match /en/help/.
I use this regex: /.*?/(.*?)/ but sometimes the URL has no trailing slash, like /en/help, which does not match because the last slash is missing.
Can you help me adjust the regex so that it matches only the "/en/help" part? Thanks

A simple way to solve it is to replace reluctant (.*?)/ with greedy ([^/]*):
/.*?/([^/]*)
This would stop at the third slash if there is one, or at the end of the string if the final slash is not there.
Note that you could replace .*? with the same [^/]* expression for consistency:
/[^/]*/([^/]*)
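A minimal sketch of how the second pattern behaves on both inputs from the question. The names SlashDemo and FirstTwoSegments are made up for illustration, and the sketch adds a ^ anchor and drops the capturing group, which is an assumption about the intended use rather than part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlashDemo
{
    // Hypothetical helper: returns the leading "/segment/segment" part of a path,
    // or an empty string when the path does not contain two leading segments.
    public static string FirstTwoSegments(string path)
    {
        // [^/]* stops at the next slash, or at the end of the string if there is none.
        Match m = Regex.Match(path, @"^/[^/]*/[^/]*");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstTwoSegments("/en/help/test/abc/def")); // /en/help
        Console.WriteLine(FirstTwoSegments("/en/help"));              // /en/help
    }
}
```

Because the negated character class cannot cross a slash, the same expression covers both the URL with a third slash and the one without.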

If the segments contain only word characters (letters, digits, underscore), you can use the following pattern:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string s1 = "/en/help/test/abc/def";
        string s2 = "/en/help ";
        string pattern =
            @"(?ix) # Options: ignore case, ignore pattern whitespace
            /       # Match the first slash
            \w+     # Match one or more word characters
            /       # Match the second slash
            \w+     # Match word characters again, up to the third slash (or the end)";
        foreach (string s in new[] { s1, s2 })
        {
            var match = Regex.Match(s, pattern);
            if (match.Success) Console.WriteLine($"Found: '{match.Value}'");
        }
    }
}

Related

Regex exclude ":" and a whitespace if they exist

So I have a regex here:
var text = new Regex(@"(?<=Paybacks).*", RegexOptions.IgnoreCase);
This matches everything that follows "Paybacks" on the line. Currently it prints ": blah".
The input can contain "Paybacks", "Paybacks:", "Paybacks " or even "Paybacks" followed by any number of whitespace characters. How can I modify this regex so that, after "Paybacks", it ignores a colon and any whitespace that may or may not be there?
I've been playing with it in regex101 and this seems to be working, but is there a better way?
(?<=Volatility(:\s)).*
In these situations, it is better to use a regex with a capturing group:
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
Then, you can use
var output = pattern.Match(text).Groups[1].Value;
See the C# demo:
var texts = new List<string> { "Paybacks: blah", "Paybacks:blah", "Paybacks blah" };
var pattern = new Regex(@"Paybacks[\s:]*(.*)", RegexOptions.IgnoreCase);
texts.ForEach(text => Console.WriteLine(pattern.Match(text).Groups[1].Value));
This prints blah three times.
You might also match optional colons and whitespace chars in the lookbehind, and start the match at the first char that is neither whitespace nor a colon:
(?<=Paybacks[:\s]*)[^\s:].*
The pattern matches:
(?<= Positive lookbehind, assert what is on the left is
Paybacks Match literally
[:\s]* Optionally match either : or a whitespace char using a character class
) Close lookbehind
[^\s:].* Match a single non whitespace char other than : and the rest of the line
C# demo:
var regex = new Regex(@"(?<=Paybacks[:\s]*)[^\s:].*", RegexOptions.IgnoreCase);
string[] strings = { "Paybacks: blah", "Paybacks blah", "Paybacks blah" };
foreach (string s in strings)
{
    Console.WriteLine(regex.Match(s).Value);
}
Output
blah
blah
blah
If the order should be a single optional colon and optional whitespace chars, you can make the colon optional and the quantifier for the whitespace chars 0 or more using :?\s*
(?<=Paybacks:?\s*)[^\s:].*

I think my regular expression pattern in C# is incorrect

I'm checking to see if my regular expression matches my string.
I have a filename that looks like somename_something.txt and I want to match it against somename_*.txt, but my code fails when I pass a name that should match. Here is my code.
string pattern = "somename_*.txt";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
using (ZipFile zipFile = ZipFile.Read(fullPath))
{
    foreach (ZipEntry e in zipFile)
    {
        Match m = r.Match("somename_something.txt");
        if (!m.Success)
        {
            throw new FileNotFoundException("A filename with format: " + pattern + " not found.");
        }
    }
}
The asterisk applies to the underscore (zero or more underscores) rather than acting as a wildcard, and that is throwing the match off.
Try:
somename_(\w+)\.txt
The (\w+) here matches one or more word characters and captures them in a group; the backslash makes the dot literal.
You can see it match here: https://regex101.com/r/qS8wA5/1
In General
The regex given in this code applies the * to the _, meaning zero or more underscores, instead of what you intended. The * denotes zero or more of the previous item. Instead try
^somename_(.*)\.txt$
This matches exactly the first part "somename_".
Then anything (.*)
And finally the end ".txt". The backslash escapes the 'dot'.
More Specific
If you only want letters, and not numbers or symbols, in the middle part of the match, use:
^somename_[a-z]*\.txt$
As written, your regular expression
somename_*.txt
matches (in a case-insensitive manner):
the literal text somename, followed by
zero or more underscore characters (_), followed by
any single character (other than newline), followed by
the literal text txt
And it will match that anywhere in the source text. You probably want to write something like
Regex myPattern = new Regex(@"
    ^         # anchor the match to start-of-text, followed by
    somename  # the literal 'somename', followed by
    _         # a literal underscore character, followed by
    .*        # zero or more of any character (except newline), followed by
    \.        # a literal period/fullstop, followed by
    txt       # the literal text 'txt'
    $         # with the match anchored at end-of-text
    ", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
Hi I think the pattern should be
string pattern = "somename_.*\\.txt";
Regards
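To make the difference between the original pattern and the anchored one from the answers above concrete, here is a small sketch. The class name WildcardDemo and the second test file name are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // The pattern from the question: "_*" means zero or more underscores,
    // and the unescaped "." means any single character.
    public static bool MatchesOriginal(string name) =>
        Regex.IsMatch(name, "somename_*.txt", RegexOptions.IgnoreCase);

    // Anchored pattern: ".*" is the wildcard and "\." is a literal dot.
    public static bool MatchesFixed(string name) =>
        Regex.IsMatch(name, @"^somename_.*\.txt$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        Console.WriteLine(MatchesOriginal("somename_something.txt")); // False
        Console.WriteLine(MatchesOriginal("somenameXtxt"));           // True
        Console.WriteLine(MatchesFixed("somename_something.txt"));    // True
        Console.WriteLine(MatchesFixed("somenameXtxt"));              // False
    }
}
```

The original pattern rejects the intended file name and accepts an unrelated one, which is why the answers replace the glob-style * with .* and escape the dot.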

REGEX: What is the meaning of . followed by +?

Sorry to ask this question, but I'm really stuck. This code belongs to someone who has already left the company, and it is causing problems.
protected override string CleanDataLine(string line)
{
    //the regular expression for GlobalSight log
    Regex regex = new Regex("\".+\"");
    Match match = regex.Match(line);
    if (match.Success)
    {
        string matchPart = match.Value;
        matchPart =
            matchPart.Replace(string.Format("\"{0}\"",
            Delimiter), string.Format("\"{0}\"", "*+*+"));
        matchPart = matchPart.Replace(Delimiter, '_');
        matchPart =
            matchPart.Replace(string.Format("\"{0}\"", "*+*+"),
            string.Format("\"{0}\"", Delimiter));
        line = line.Replace(match.Value, matchPart);
    }
    return line;
}
I've spent too much time researching this. What was he trying to accomplish?
Thanks for helping.
That regex matches
a quote ("),
followed by one or more (+) characters (any character except newlines (.), as many as possible),
followed by a quote ".
It's not a very good regex. For example, in the string foo "bar" baz "bam" boom, it will match "bar" baz "bam".
If the intention is to match a quoted string, a more appropriate regex would be "[^"]*".
. is any character except \n, + means 1 or more.
So: .+ is "1 or more characters"
The dot matches any character except line breaks.
+ is "one or more" (equal to {1,})
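The greedy behaviour described in the first answer can be checked with a short sketch. The class and method names (QuoteDemo, GreedyMatch, QuotedStrings) are hypothetical; the input string is the example from that answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // Greedy pattern from the question: spans from the first quote to the last one.
    public static string GreedyMatch(string input) =>
        Regex.Match(input, "\".+\"").Value;

    // Negated character class: each quoted string is matched separately.
    public static string[] QuotedStrings(string input)
    {
        MatchCollection matches = Regex.Matches(input, "\"[^\"]*\"");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        string input = "foo \"bar\" baz \"bam\" boom";
        Console.WriteLine(GreedyMatch(input));                       // "bar" baz "bam"
        Console.WriteLine(string.Join(" | ", QuotedStrings(input))); // "bar" | "bam"
    }
}
```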
protected override string CleanDataLine(string line)
{
    //the regular expression for GlobalSight log
    Regex regex = new Regex("\".+\"");
    Match match = regex.Match(line);
    if (match.Success)
    {
        string matchPart = match.Value;
        matchPart =
            matchPart.Replace(string.Format("\"{0}\"",
            Delimiter), string.Format("\"{0}\"", "*+*+"));
        matchPart = matchPart.Replace(Delimiter, '_');
        matchPart =
            matchPart.Replace(string.Format("\"{0}\"", "*+*+"),
            string.Format("\"{0}\"", Delimiter));
        line = line.Replace(match.Value, matchPart);
    }
    return line;
}
line is just some text; it could be Hello World, or anything really.
new Regex("\".+\""): the \" is an escaped quote, so the match must start with a double quote. .+ matches any character except the newline character, one or more times, and the final \" requires a closing double quote.
If the line matches, the code takes the matched text from match.Value.
From there it is ordinary search and replace on the matched text: a delimiter enclosed in quotes is temporarily swapped for the placeholder *+*+, every remaining delimiter is replaced with an underscore, and the placeholder is then swapped back to the quoted delimiter.

Get the matches of a pattern in a string

I have the following regular expression:
^[[][A-Za-z_1-9]+[\]]$
I want to be able to get all the matches of this regular expression in a string. The match should be of the form [Whatever]. Inside the brackets, there could also be an _ or numeric characters. So I wrote the following code:
private const String REGEX = @"^[[][A-Za-z_1-9]+[\]]$";
static void Main(string[] args)
{
    String expression = "([ColumnName] * 500) / ([AnotherColumn] - 50)";
    MatchCollection matches = Regex.Matches(expression, REGEX);
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }
    Console.ReadLine();
}
But unfortunately, matches always has a count of zero. Apparently, the regular expression is checking whether the whole String is a match instead of finding the matches inside the string. I'm not sure whether the regular expression is wrong or the way I'm using Regex.Matches() is incorrect.
Any thoughts?
You need to remove the start/end of string anchors (^ and $) from your pattern, since the matches you are looking for are not actually at the start and end of the string. You can also just use \[ and \] instead of [[] and [\]]:
private const String REGEX = @"\[[A-Za-z_1-9]+\]";
That should do the trick.
You're anchoring your regex to the beginning and end of the string so of course it won't match anything.
Removing the anchors (^ for beginning and $ for end) works fine:
[[][A-Za-z_1-9]+[\]]
It returns, as you would hopefully expect:
[ColumnName]
[AnotherColumn]
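A short sketch comparing the anchored and unanchored patterns on the expression from the question. The names BracketDemo and CountMatches are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // Hypothetical helper: counts how many times the pattern matches in the expression.
    public static int CountMatches(string expression, string pattern) =>
        Regex.Matches(expression, pattern).Count;

    public static void Main()
    {
        string expression = "([ColumnName] * 500) / ([AnotherColumn] - 50)";

        // With ^ and $ the whole string would have to be a single [Name] token.
        Console.WriteLine(CountMatches(expression, @"^\[[A-Za-z_1-9]+\]$")); // 0

        // Without the anchors each bracketed name is found on its own.
        Console.WriteLine(CountMatches(expression, @"\[[A-Za-z_1-9]+\]"));   // 2
    }
}
```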

Regexp skip pattern

Problem
I need to replace all asterisk symbols ('*') with the percent symbol ('%'). Asterisks inside square brackets should be ignored.
Example
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
    var input = "Hel[*o], w*rld!";
    var output = Regex.Replace(input, "What_pattern_should_be_there?", "%");
    Assert.AreEqual("Hel[*o], w%rld!", output);
}
Try using a look ahead:
\*(?![^\[\]]*\])
Here's a somewhat more robust solution, which handles [] blocks better, and even escaped \[ characters:
string text = @"h*H\[el[*o], w*rl\]d!";
string pattern = @"
    \\.                 # Match an escaped character. (to skip over it)
    |
    \[                  # Match a character class
        (?:\\.|[^\]])*  # which may also contain escaped characters (to skip over it)
    \]
    |
    (?<Asterisk>\*)     # Match `*` and add it to a group.
    ";
text = Regex.Replace(text, pattern,
    match => match.Groups["Asterisk"].Success ? "%" : match.Value,
    RegexOptions.IgnorePatternWhitespace);
If you don't care about escaped characters you can simplify it to:
\[ # Skip a character class
[^\]]* # until the first ']'
\]
|
(?<Asterisk>\*)
Which can be written without comments as: @"\[[^\]]*\]|(?<Asterisk>\*)".
To understand why it works we need to understand how Regex.Replace works: for every position in the string it tries to match the regex. If it fails, it moves one character. If it succeeds, it moves over the whole match.
Here, we have dummy matches for the [...] blocks so we may skip over the asterisks we don't want to replace, and match only the lonely ones. That decision is made in a callback function that checks if Asterisk was matched or not.
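The simplified pattern and the callback can be put together as follows. This is only a sketch: the class AsteriskDemo and the method ReplaceOutsideBrackets are hypothetical names, and escaped brackets are deliberately not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsteriskDemo
{
    // Either a whole [...] block (a dummy match that is skipped over)
    // or a single asterisk captured in the named group "Asterisk".
    private static readonly Regex Pattern =
        new Regex(@"\[[^\]]*\]|(?<Asterisk>\*)");

    public static string ReplaceOutsideBrackets(string input) =>
        // The callback keeps bracketed blocks unchanged and replaces lone asterisks.
        Pattern.Replace(input, m => m.Groups["Asterisk"].Success ? "%" : m.Value);

    public static void Main()
    {
        Console.WriteLine(ReplaceOutsideBrackets("Hel[*o], w*rld!")); // Hel[*o], w%rld!
    }
}
```

Because a bracketed block is consumed as one match, the asterisks inside it are never offered to the second alternative.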
I couldn't come up with a pure RegEx solution. Therefore I am providing you with a pragmatic solution. I tested it and it works:
[Test]
public void Replace_all_asterisks_outside_the_square_brackets()
{
    var input = "H*]e*l[*o], w*rl[*d*o] [o*] [o*o].";
    var actual = ReplaceAsterisksNotInSquareBrackets(input);
    var expected = "H%]e%l[*o], w%rl[*d*o] [o*] [o*o].";
    Assert.AreEqual(expected, actual);
}
private static string ReplaceAsterisksNotInSquareBrackets(string s)
{
    Regex rx = new Regex(@"(?<=\[[^\[\]]*)(?<asterisk>\*)(?=[^\[\]]*\])");
    var matches = rx.Matches(s);
    s = s.Replace('*', '%');
    foreach (Match match in matches)
    {
        s = s.Remove(match.Groups["asterisk"].Index, 1);
        s = s.Insert(match.Groups["asterisk"].Index, "*");
    }
    return s;
}
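EDITED
Okay, here is my final attempt ;)
Using negative lookbehind (?<!) and negative lookahead (?!).
var output = Regex.Replace(input, @"(?<!\[)\*(?!\])", "%");
This also passes the test from the comment on another answer, "Hel*o], w*rld!". Note that it only checks the characters immediately next to the asterisk, so an asterisk in the middle of a bracketed block such as [o*o] is still replaced.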
