C# Regular Expression

Can you explain what this regular expression means, and what strings would match it?
new Regex(@"/Type\s*/Page[^s]");
What is the @ symbol? Please provide a full explanation. Thanks in advance.

The @ symbol designates a verbatim string literal:
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
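To make the quoted rule concrete, here is a small sketch comparing a verbatim literal with the equivalent regular literal. The class and member names (VerbatimDemo, Verbatim, Regular, Quoted) are hypothetical and exist only for illustration.

```csharp
using System;

public static class VerbatimDemo
{
    // Verbatim literal: the backslash is kept as-is and reaches the regex engine untouched.
    public const string Verbatim = @"/Type\s*/Page[^s]";

    // Regular literal: the backslash must be doubled to produce the same characters.
    public const string Regular = "/Type\\s*/Page[^s]";

    // The only escape a verbatim literal knows: "" stands for one double quote.
    public const string Quoted = @"say ""hi""";

    public static void Main()
    {
        Console.WriteLine(Verbatim == Regular); // True: both hold the same 17 characters
        Console.WriteLine(Quoted);              // say "hi"
    }
}
```

Both spellings give the regex engine exactly the same pattern; the verbatim form just avoids doubling every backslash.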
As for the regular expression, it breaks down like this:
/Type - match this text exactly
\s* - match any whitespace character zero or more times
/Page - match this text exactly
[^s] - match exactly one character that isn't a lowercase "s" (so some character must follow "/Page")

The @ says that the string literal is verbatim.
The regex matches:
"/Type", followed by zero or more whitespace characters, followed by "/Page" and one character that is not "s".
It will match strings like /Type/Pagex, /Type /Page3 and /Type /Page?
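A quick way to check these claims is to run the pattern against a few sample inputs. This is a sketch; TypePageDemo and its methods are hypothetical helpers, and the samples are made up for illustration. Note that IsMatch searches anywhere in the input, so surrounding text does not prevent a match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypePageDemo
{
    // The pattern from the question, written as a verbatim literal.
    static readonly Regex TypePage = new Regex(@"/Type\s*/Page[^s]");

    public static bool Matches(string input) => TypePage.IsMatch(input);

    // Returns the matched text, or null when there is no match.
    public static string MatchedText(string input)
    {
        Match m = TypePage.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "/Type/Pagex",   // no whitespace is required between the parts
            "/Type /Page3",
            "/Type\t/Page?", // \s* accepts tabs (and newlines) as well as spaces
            "/Type /PageS",  // [^s] is case sensitive, so "S" is allowed
            "/Type /Pages",  // no match: the next character is "s"
            "/Type /Page",   // no match: [^s] needs one more character
        };

        foreach (string s in samples)
            Console.WriteLine($"{Matches(s),-5} {s}");
    }
}
```

The match includes the extra character consumed by [^s], so MatchedText("x /Type /Page3 y") returns "/Type /Page3".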

@ starts a C# verbatim string, in which the compiler doesn't process escape sequences, making it easier to write expressions with lots of \ characters.
Both of the following match:
/Type /Page4
/Type /Pagex

Your regular expression matches any string containing the following:
A "/" character
The word "Type" (case sensitive)
Optionally, some whitespace
Another "/"
The word "Page" (case sensitive)
Any character that isn't an "s"
Examples would be "/Type /Paged" or "/Type/Pager".
If you want to match either "Page" or "Pages" at the end, you probably want this instead:
new Regex(@"/Type\s*/Pages?");
Here is a good online C# regex tester.
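The difference between the two patterns is easiest to see side by side. This sketch (PagesDemo and Compare are hypothetical names) runs both on three inputs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PagesDemo
{
    static readonly Regex NotS = new Regex(@"/Type\s*/Page[^s]");    // pattern from the question
    static readonly Regex OptionalS = new Regex(@"/Type\s*/Pages?"); // suggested alternative

    // Reports whether each pattern finds a match somewhere in the input.
    public static (bool notS, bool optionalS) Compare(string input) =>
        (NotS.IsMatch(input), OptionalS.IsMatch(input));

    public static void Main()
    {
        foreach (string s in new[] { "/Type /Page", "/Type /Pages", "/Type /Page3" })
        {
            var (notS, optionalS) = Compare(s);
            Console.WriteLine($"{s,-14} [^s]: {notS,-5}  s?: {optionalS}");
        }
    }
}
```

Pages? accepts both "/Type /Page" and "/Type /Pages", but since the "s" is merely optional it also matches the "/Type /Page" prefix of "/Type /Page3".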

Roughly, it matches: /Type{optional whitespace}/Page{one character that is not an 's'}

Related

Make Regex Match word containing special characters

My code is like this:
string currentPageSlug = "securities/EBR#03L$ZZZ";
string patern = @"securities/(\w+)[\#\$]";
string res = Regex.Match(currentPageSlug, patern).Value;
Console.WriteLine(res);
which gives me this result:
securities/EBR#
but I want to get:
securities/EBR#03L$ZZZ
the whole word, including all special characters (# and $, and maybe others too).
My regex pattern does not seem to work.
Your regex matches words followed by a single special character. You need to include [#$] in the repeating construct +, like this:
string patern = @"securities/((?:\w|[#$])+)";
Note that since # and $ are used inside a character class, it is not necessary to escape them with a backslash \.
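The sketch below compares the original and fixed patterns on the slug from the question. SlugDemo and FirstMatch are hypothetical names, and the Compact pattern is an equivalent single-character-class spelling added for comparison, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugDemo
{
    // Original: \w+ stops at the first special character, which is then matched exactly once.
    public const string Original = @"securities/(\w+)[\#\$]";

    // Fixed: the special characters are inside the repeated group.
    public const string Fixed = @"securities/((?:\w|[#$])+)";

    // Equivalent shorter form with one character class (alternative, not from the answer).
    public const string Compact = @"securities/([\w#$]+)";

    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string currentPageSlug = "securities/EBR#03L$ZZZ";
        Console.WriteLine(FirstMatch(currentPageSlug, Original)); // securities/EBR#
        Console.WriteLine(FirstMatch(currentPageSlug, Fixed));    // securities/EBR#03L$ZZZ
        Console.WriteLine(FirstMatch(currentPageSlug, Compact));  // securities/EBR#03L$ZZZ

        // Groups[1] holds only the part after "securities/".
        Console.WriteLine(Regex.Match(currentPageSlug, Fixed).Groups[1].Value); // EBR#03L$ZZZ
    }
}
```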

Regex to identify escaped characters issue

Let's suppose we have the following string:
@"Hello m\u00e9 name is Mat\u00bfQu"
I am using the regex:
private static readonly Regex ESCAPING_REGEX = new Regex("\\+[^\"][a-zA-Z0-9]*", RegexOptions.Compiled);
However, this regex doesn't seem to return any matches:
MatchCollection matches = ESCAPING_REGEX.Matches(text);
// matches.Count == 0
I tried the regex on Regex101 and it does return the two matches that I was looking for.
How can I fix my regular expression to achieve expected behavior? (Any tips for improvement are gladly accepted.)
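Your regex declaration is faulty: the pattern is written as a regular (non-verbatim) string literal, so "\\+" reaches the regex engine as \+, which requires a literal + at the beginning of the match. This is what your regex looks like to the regex engine: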
\+ - Matches a literal +
[^"] - Matches any character other than "
[a-zA-Z0-9]* - Matches 0 or more characters that are digits or Latin letters.
If you use a verbatim string literal to create your regex, e.g.
Regex.Matches(str, @"\\+[^""][a-zA-Z0-9]*");
you'd get 2 matches. \\ in a verbatim string literal will match a literal \, and + will be treated as a quantifier.
Actually, you do not even need the + (it only lets the match begin with several consecutive backslashes, such as \\\\) or the [^""] (unless a " can appear right after \ and you do not want to match it), so you can use
@"\\[a-zA-Z0-9]+"
to match your substrings (\\ matches a \, and [a-zA-Z0-9]+ matches one or more characters from the ranges).
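The following sketch reproduces the behaviour described above with the text from the question. EscapeSequenceDemo and Find are hypothetical names; the three patterns are the original one, its verbatim version and the simplified one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeSequenceDemo
{
    // Verbatim literal: the text really contains backslash-u sequences, not the decoded characters.
    public const string Text = @"Hello m\u00e9 name is Mat\u00bfQu";

    // Returns the value of every match of the pattern in Text.
    public static string[] Find(string pattern) =>
        Regex.Matches(Text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Regular literal: the engine receives \+, a literal plus sign, so nothing matches.
        Console.WriteLine(Find("\\+[^\"][a-zA-Z0-9]*").Length);              // 0

        // Verbatim literal: \\+ is one or more backslashes.
        Console.WriteLine(string.Join(", ", Find(@"\\+[^""][a-zA-Z0-9]*"))); // \u00e9, \u00bfQu

        // Simplified pattern.
        Console.WriteLine(string.Join(", ", Find(@"\\[a-zA-Z0-9]+")));       // \u00e9, \u00bfQu
    }
}
```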

Trying to understand this regex

I have this regex
^(\\w|@|\\-| |\\[|\\]|\\.)+$
I'm trying to understand what it does exactly but I can't seem to get any result...
I just can't understand the double backslashes everywhere... Isn't double backslash supposed to be used to get a single backslash?
This regex is used to validate that a username doesn't contain weird characters.
Could someone please explain the double backslashes to me? @_@
Additional info: I got this regex in C#, where I use Regex.IsMatch to check whether my username string matches the regex. It's for an ASP.NET website.
My guess is that it's simply escaping the \, since backslash is the escape character in C# strings.
string pattern = "^(\\w|@|\\-| |\\[|\\]|\\.)+$";
Can be rewritten using a verbatim string as
string pattern = @"^(\w|@|\-| |\[|\]|\.)+$";
Now it's a bit easier to understand what's going on. It will match any word character, at-sign, hyphen, space, square bracket or period, repeated one or more times. The ^ and $ match the beginning and end of the string, respectively, so only those characters are allowed.
Therefore this pattern is equivalent to:
string pattern = @"^([\w@ \[\].-])+$";
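To check that the three spellings really behave the same, here is a sketch that runs them on a few made-up usernames. UsernameDemo and IsValid are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernameDemo
{
    // As written in the question: a regular literal with doubled backslashes.
    public const string Original = "^(\\w|@|\\-| |\\[|\\]|\\.)+$";

    // The same pattern as a verbatim literal.
    public const string Verbatim = @"^(\w|@|\-| |\[|\]|\.)+$";

    // Character-class form from the answer.
    public const string CharClass = @"^([\w@ \[\].-])+$";

    public static bool IsValid(string username, string pattern) =>
        Regex.IsMatch(username, pattern);

    public static void Main()
    {
        Console.WriteLine(Original == Verbatim); // True: identical pattern text

        foreach (string name in new[] { "john.doe", "[admin] joe-1", "joe@site", "joe!", "" })
            Console.WriteLine($"\"{name}\": {IsValid(name, Original)} {IsValid(name, CharClass)}");
    }
}
```

One detail worth knowing for validation: in .NET, $ also matches just before a final newline, so "joe\n" passes both patterns; using \z instead of $ would reject it.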
The double backslashes are supposed to be single backslashes. A backslash is doubled to escape the backslash itself, because in a C# string backslashes introduce other escape sequences, e.g. \n stands for a new line.
With the double backslashes sorted out, it becomes ^(\w|@|\-| |\[|\]|\.)+$
Breaking down this regex: | means OR, so \w|@|\-| |\[|\]|\. means \w or @ or \- or space or \[ or \] or \. That is, any word character (letter, digit or underscore), @, -, space, [, ] and . characters. Note that these backslashes are regex escapes for the -, [, ] and . characters, which can have special meanings in a regex.
And + means the previous token (i.e. the group (\w|@|\-| |\[|\]|\.)) repeated one or more times.
So the entire thing means one or more of any combination of word characters, @, -, space, [, ] and . characters.
There are online tools to analyze regexes. One such is at http://www.myezapp.com/apps/dev/regexp/show.ws
where it reports
Sequence: match all of the followings in order
BeginOfLine
Repeat
CapturingGroup
GroupNumber:1
OR: match either of the followings
WordCharacter
@
-
[
]
.
one or more times
EndOfLine
As others have noted, the double backslashes just escape a backslash so you can embed the regex in a string. For example, "\\w" will be interpreted as "\w" by the parser.
^ means the beginning of the string.
The parentheses are used for grouping.
\w is a word character.
| means OR.
@ matches the @ character.
\- matches the hyphen character.
\[ and \] match the square brackets.
\. matches a period.
+ means one or more.
$ means the end of the string.
So the regex is used to match a string which contains only word characters, @ signs, hyphens, spaces, square brackets or dots.
Here's what it means:
^(\\w|@|\\-| |\\[|\\]|\\.)+$
^ - Means the regex starts at the beginning of the string. The match shouldn't start in the middle of the string.
Here's the individual things in the parentheses:
\\w - Indicates a "word" character. Normally, this is shown as \w, but this is being escaped.
@ - Indicates an @ symbol is allowed
\\- - Indicates a - is allowed. This is escaped since the dash can have other meanings in regex. Since it's not in a character class, I don't believe this is technically needed.
(a space) - A space is allowed
\\[ and \\] - [ and ] are allowed.
\\. - A period is a valid character. Escaped because periods have special meanings in regex.
Now all of those characters have | as delimiters in the parentheses - this means OR. So any of those characters are valid.
The + at the end means one or more characters as described in parentheses are valid. The $ means the end of the regex must match the end of the string.
Note that the double backslashes aren't necessary if you just prefix the string with @, like this:
@"\w" is the same as "\\w"

convert any string to be used as match expression in regex

If you have a string with special characters that you want to match with:
System.Text.RegularExpressions.Regex.Matches(theTextToCheck, myString);
It will obviously give you wrong results if you have special characters like "%" or "\" inside myString.
The idea is to convert myString by replacing all occurrences of special characters like "%" with their escaped equivalents.
Does anyone know how to solve that, or does someone have a RegEx for that? :)
Update:
The following characters have a special meaning, which I should turn off by adding a leading backslash: \, &, ~, ^, %, [, ], {, }, ?, +, *, (, ), |, $
Are there any others I should replace?
As @Kobi links to in the comments, you need to use Regex.Escape to ensure that the regular expression string is properly escaped.
Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.
If you want to escape all characters that carry a special meaning in regex, you could simply escape every non-alphanumeric character with a backslash (there is no harm in escaping punctuation that doesn't need it, but escaping letters or digits changes their meaning, e.g. \d or \1).
But if you do, why are you using Regex at all instead of string.IndexOf?
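For a plain literal search, string.IndexOf is indeed enough. This sketch counts non-overlapping occurrences both ways; LiteralSearchDemo and its methods are hypothetical names, and the sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralSearchDemo
{
    // Counts non-overlapping occurrences of a literal substring with plain string search.
    public static int CountWithIndexOf(string text, string literal)
    {
        if (string.IsNullOrEmpty(literal))
            throw new ArgumentException("literal must be non-empty", nameof(literal));

        int count = 0;
        int index = text.IndexOf(literal, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Continue after the current occurrence so matches do not overlap.
            index = text.IndexOf(literal, index + literal.Length, StringComparison.Ordinal);
        }
        return count;
    }

    // The same count with Regex, after escaping the literal.
    public static int CountWithRegex(string text, string literal) =>
        Regex.Matches(text, Regex.Escape(literal)).Count;

    public static void Main()
    {
        string text = "f(a+b) = (a+b)";
        Console.WriteLine(CountWithIndexOf(text, "(a+b)"));    // 2
        Console.WriteLine(CountWithRegex(text, "(a+b)"));      // 2
        Console.WriteLine(Regex.Matches(text, "(a+b)").Count); // 0: unescaped it means "one or more a, then b"
    }
}
```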
Regex.Escape will do that for you. The MSDN documentation says:
Escape converts a string so that the regular expression engine will interpret any metacharacters that it may contain as character literals
which is much more informative than the function description.
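Here is a short sketch of what Regex.Escape actually produces, applied to the matching call from the question. RegexEscapeDemo and ContainsLiteral are hypothetical names; theTextToCheck and myString are the question's parameter names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeDemo
{
    // Matches myString literally inside theTextToCheck.
    public static bool ContainsLiteral(string theTextToCheck, string myString) =>
        Regex.IsMatch(theTextToCheck, Regex.Escape(myString));

    public static void Main()
    {
        // Only the documented metacharacters are escaped; "%" and the closing "]" are left alone.
        Console.WriteLine(Regex.Escape("[100%]"));   // \[100%]
        Console.WriteLine(Regex.Escape("1+1=2?"));   // 1\+1=2\?
        Console.WriteLine(Regex.Escape(@"C:\temp")); // C:\\temp

        // Unescaped, "[100%]" is a character class that matches a single "1".
        Console.WriteLine(Regex.IsMatch("save 1 now", "[100%]"));        // True
        Console.WriteLine(ContainsLiteral("save 1 now", "[100%]"));      // False
        Console.WriteLine(ContainsLiteral("save [100%] now", "[100%]")); // True
    }
}
```

Characters such as %, & and ~ from the question's list have no special meaning in .NET regexes, which is why Regex.Escape leaves them untouched.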
This is left for search/replace reference.
Use this as your regex:
(\\|\&|\~|\^|\%|\[|\]|\{|\}|\?|\+|\*|\(|\)|\||\$)
captures the characters of interest in a numbered group
And this as your replacement string:
\$1
replaces the matches with backslash plus the group content
Sample code:
Regex re = new Regex(@"(\\|\&|\~|\^|\%|\[|\]|\{|\}|\?|\+|\*|\(|\)|\||\$)");
string replaced = re.Replace(@"Look for (special {characters} and scape [100%] of them)", @"\$1");
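The search/replace approach works for the characters in its list, but that list (taken from the question) omits some metacharacters, notably ".". This sketch wraps the same pattern in a hypothetical helper, HandEscape, and compares it with Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HandRolledEscapeDemo
{
    // The search/replace pattern from the answer above.
    static readonly Regex Special =
        new Regex(@"(\\|\&|\~|\^|\%|\[|\]|\{|\}|\?|\+|\*|\(|\)|\||\$)");

    // In a .NET replacement string "\" is literal and $1 inserts group 1.
    public static string HandEscape(string s) => Special.Replace(s, @"\$1");

    public static void Main()
    {
        string sample = "Look for (special {characters} and scape [100%] of them)";
        Console.WriteLine(HandEscape(sample));
        // Look for \(special \{characters\} and scape \[100\%\] of them\)

        // "." is not in the hand-written list, so it stays a wildcard.
        Console.WriteLine(Regex.IsMatch("axb", HandEscape("a.b")));   // True
        Console.WriteLine(Regex.IsMatch("axb", Regex.Escape("a.b"))); // False
    }
}
```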

Regex for matching C# string literals

I am trying to write a regular expression that will match a string that contains name-value pairs of the form:
<name> = <value>, <name> = <value>, ...
Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:
regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");
This works well, but of course it fails to match when the string I am trying to match contains a <value> with an escaped quote. I am struggling to work out how to solve this; I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' named value below:
difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
I would appreciate a decent explanation with your answers, I want to learn, rather than copy ;-)
Try this to capture the key and value:
(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")
As a bonus, it also works on verbatim strings.
C# examples: https://dotnetfiddle.net/vQP4rn
Here's an annotated version:
string pattern = @"
(\w+)\s*=\s*          # key =
(                     # Capturing group for the string
    @""               # verbatim string - match literal at-sign and a quote
    (?:
        [^""]|""""    # match a non-quote character, or two quotes
    )*                # zero or more times
    ""                # literal quote
|                     # OR - regular string
    ""                # string literal - opening quote
    (?:
        \\.           # match an escaped character,
        |[^\\""]      # or a character that isn't a quote or a backslash
    )*                # zero or more times
    ""                # string literal - closing quote
)";
MatchCollection matches = Regex.Matches(s, pattern,
    RegexOptions.IgnorePatternWhitespace);
Note that the regular-string branch allows any character to be escaped (unlike C#) and allows newlines. It should be easy to correct if you need validation, but it should be fine for parsing.
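Here is a runnable sketch of the compact pattern on the example from the question, plus one verbatim value. LiteralPairsDemo and Parse are hypothetical names; the pattern is the one-line version rewritten as a C# verbatim literal, so every " in the regex appears as "".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralPairsDemo
{
    // (\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*") as a verbatim literal.
    public const string Pattern =
        @"(\w+)\s*=\s*(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")";

    public static MatchCollection Parse(string s) => Regex.Matches(s, Pattern);

    public static void Main()
    {
        // Text from the question: difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
        string input = @"difficult = ""\\\a\b\'\""\0\f \t\v"", easy = ""one""";
        foreach (Match m in Parse(input))
            Console.WriteLine($"{m.Groups[1].Value} -> {m.Groups[2].Value}");

        // A verbatim value containing a doubled quote: path = @"a""b"
        foreach (Match m in Parse(@"path = @""a""""b"""))
            Console.WriteLine($"{m.Groups[1].Value} -> {m.Groups[2].Value}");
    }
}
```

Group 1 holds the key and group 2 the literal including its quotes, so the escaped \" inside "difficult" no longer ends the value early.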
This should match only the string literal part (you can tack on whatever else you want to the beginning/end):
Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");
and if you want a pattern which doesn't allow "multi-line" string literals (C# regular string literals cannot span lines):
Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");
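These two literal-only patterns are easier to read as verbatim literals, where the regex engine clearly receives \\. (a backslash followed by any character); in a regular string literal that part has to be written "\\\\.". This sketch (StringLiteralDemo, AnyLine and SingleLine are hypothetical names) checks an escaped quote and a literal containing a raw newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringLiteralDemo
{
    // "((\\.)|[^\\"])*" : a quote, then escape pairs or anything except \ and ", then a quote.
    public static readonly Regex AnyLine = new Regex(@"""((\\.)|[^\\""])*""");

    // Same idea, but neither escaped nor plain characters may be CR or LF.
    public static readonly Regex SingleLine = new Regex(@"""((\\[^\n\r])|[^\\""\n\r])*""");

    public static void Main()
    {
        string code = "x = \"a\\\"b\";";                // x = "a\"b";
        Console.WriteLine(AnyLine.Match(code).Value);    // "a\"b"

        string broken = "\"line1\nline2\"";              // a quote, a raw newline, a quote
        Console.WriteLine(AnyLine.IsMatch(broken));      // True
        Console.WriteLine(SingleLine.IsMatch(broken));   // False
    }
}
```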
You can use this:
@" \s* = \s* (?<!\\)"" (.* ) (?<!\\)"""
It's almost like yours, but instead of using "", I used (?<!\\)"" to match a quote only when it is not preceded by a \, so it won't match escaped quotes.
