Disallowing Backslashes in a Regular Expression C#

For a Username field there are certain variations that cannot be chosen as an appropriate username, nor can certain characters be used.
For example: TIM1....TIM9 cannot be used, BIN1....BIN9 cannot be used, nor can the characters <>:\/|?* appear anywhere in the field.
The code I have so far is thus:
private bool ValidateId(string regexValue)
{
Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\"/|?*]");
return !regex.IsMatch(regexValue);
}
What I'm struggling to allow for however is the backslash character. Trying to escape it as I have done with the quotation character doesn't appear to work.
Thanks in advance.

You need to do a double escape. Try this:
Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\\\\\"/|?*]");
Explanation:
You need to escape the backslash in C# strings to get a backslash in the string. Additionally, the string needs to have two backslashes, because Regex also requires the backslashes to be escaped.
BTW, using verbatim strings makes it a bit more readable:
Regex regex = new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");
Both versions result in a Regex with this expression:
TIM[1-9]|BIN[1-9]|[<>:\\"/|?*]
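As a rough check of the explanation above, the following sketch builds the pattern from both literal forms and compares them. The class and method names (UsernameValidator, IsValid, PatternsAreIdentical) are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer above.
public static class UsernameValidator
{
    // Regular literal: \\\\ becomes \\ in the string, which the regex
    // engine then reads as one literal backslash.
    private static readonly Regex Escaped =
        new Regex("TIM[1-9]|BIN[1-9]|[<>:\\\\\"/|?*]");

    // Verbatim literal: backslashes are kept as typed; only the double
    // quote has to be doubled.
    private static readonly Regex Verbatim =
        new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");

    // A username is valid when nothing forbidden appears anywhere in it.
    public static bool IsValid(string username) => !Verbatim.IsMatch(username);

    // Regex.ToString() returns the pattern text given to the constructor.
    public static bool PatternsAreIdentical() =>
        Escaped.ToString() == Verbatim.ToString();

    public static void Main()
    {
        Console.WriteLine(PatternsAreIdentical()); // True
        Console.WriteLine(IsValid("alice"));       // True
        Console.WriteLine(IsValid(@"al\ice"));     // False (contains a backslash)
        Console.WriteLine(IsValid("TIM3"));        // False (reserved name)
    }
}
```

Note that IsMatch looks for the pattern anywhere in the input, so a name such as "xTIM3y" would also be rejected. Whether that is desired depends on the requirements.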

Related

Regex escape with \ or \\?

Can someone explain to me when using regular expressions when a double backslash or single backslash needs to be used to escape a character?
A lot of references online use a single backslash and online regex testers work with single backslashes, but in practice I often have to use a double backslash to escape a character.
For example:
"SomeString\."
Works in an online regex tester and matches "SomeString" followed by a dot.
However in practice I have to use a double escape:
if (Regex.IsMatch(myString, "SomeString\\."))
C# does not have a special syntax for construction of regular expressions, like Perl, Ruby or JavaScript do. It instead uses a constructor that takes a string. However, strings have their own escaping mechanism, because you want to be able to put quotes inside the string. Thus, there are two levels of escaping.
So, in a regular expression, w means the letter "w", while \w means a word character. However, if you write the string literal "\w", the backslash is consumed by the string's own escaping, not the regex's. \w is not a valid C# string escape, so the C# compiler rejects it with "Unrecognized escape sequence"; in languages that silently drop the backslash (JavaScript, for example), "\w" == "w", the regexp constructor receives only "w", and you end up matching the letter "w" instead of any word character. Thus, to pass the backslash to regexp, you need to put in two backslashes in the string literal (\\w): one will be removed when the string literal is interpreted, one will be used by the regular expression.
When working with regular expressions directly (such as on most online regexp testers, or when using verbatim strings @"..."), you don't have to worry about the interpretation of string literals, and you always write just one backslash (except when you want to match the backslash itself, but then you're escaping the backslash for the regexp, not for the string).
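A minimal sketch of the two levels, using the pattern from the question. The class and constant names are hypothetical; the point is only that both literals produce the same 12-character string before the regex engine ever sees it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the string literals come from the thread.
public static class EscapeLevels
{
    public const string Regular = "SomeString\\.";   // C# removes one backslash
    public const string Verbatim = @"SomeString\.";  // backslash kept as typed

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);  // True
        Console.WriteLine(Regular.Length);       // 12

        // The escaped dot matches only a literal dot.
        Console.WriteLine(Regex.IsMatch("SomeString.", Verbatim));  // True
        Console.WriteLine(Regex.IsMatch("SomeStringX", Verbatim));  // False

        // Without the backslash, the dot matches any character.
        Console.WriteLine(Regex.IsMatch("SomeStringX", "SomeString."));  // True
    }
}
```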
\ is also an escape character for string literals in C#, so the first \ escapes the second \ that is passed to the method, and that second one escapes the . in the regex.
Use:
if (Regex.IsMatch(myString, @"SomeString\."))
If you want to avoid double escaping.
If you use the verbatim symbol @ (verbatim string), you don't need to escape the backslash again.
if (Regex.IsMatch(myString, @"SomeString\."))
Old post but Regex.Escape may be useful
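For instance, Regex.Escape can take care of the regex level of escaping when the text to match is a plain literal. The sketch below assumes that use case; the Literal helper and the class name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class around the real Regex.Escape method.
public static class EscapeDemo
{
    // Hypothetical helper: builds a regex that matches `text` literally.
    public static Regex Literal(string text) => new Regex(Regex.Escape(text));

    public static void Main()
    {
        // Regex.Escape inserts the regex-level backslash for us.
        Console.WriteLine(Regex.Escape("SomeString."));  // SomeString\.

        // The escaped dot no longer matches an arbitrary character.
        Console.WriteLine(Literal("SomeString.").IsMatch("SomeStringX"));  // False

        // Backslashes in the input are escaped too.
        Console.WriteLine(Literal(@"C:\Test").IsMatch(@"C:\Test\file.txt"));  // True
    }
}
```

Regex.Escape handles only the regex level. The C# string-literal level still applies to whatever literal you pass in, which is why the path above is written as a verbatim string.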
In JavaScript you have to use a double escape character (\\) when the pattern is passed as a string:
let m = "My numer is [56]".match("\\[(.*)\\]");
alert(m[1]);//outputs 56
In C#, a single \ is enough only inside a verbatim string (@"..."); a regular string literal needs \\ as well.

Regex find string variable

How can I find something like fo32p_dasf[0] = (string)"random string here"; with Regex.Match? I have problem using (string) within the regex string.
Because parentheses have special meaning in Regex, you need to escape them with a backslash.
However, backslashes also have meaning in C# strings, so you need to escape the escape.
Square brackets also have a special meaning in regex, so they need escaping, and the quotes need escaping, so after all that, you end up with something like this:
var pattern = "fo32p_dasf\\[0\\] = \\(string\\)\".*\";";
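A quick way to sanity-check that pattern is to run it against the sample line from the question. The sketch below also shows a verbatim-string spelling of the same pattern and a capturing variant; the class name, the ExtractValue helper and the added capture group are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the Regular pattern is the one from the answer.
public static class AssignmentPattern
{
    public const string Regular = "fo32p_dasf\\[0\\] = \\(string\\)\".*\";";
    public const string Verbatim = @"fo32p_dasf\[0\] = \(string\)"".*"";";

    // Assumed variant with a capture group around the quoted text.
    public static string ExtractValue(string line)
    {
        Match m = Regex.Match(line, @"fo32p_dasf\[0\] = \(string\)""(.*)"";");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string line = "fo32p_dasf[0] = (string)\"random string here\";";

        Console.WriteLine(Regular == Verbatim);            // True
        Console.WriteLine(Regex.IsMatch(line, Verbatim));  // True
        Console.WriteLine(ExtractValue(line));             // random string here
    }
}
```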

C# Unrecognized escape sequence

I have the following Regex in C# and it's causing the error "Unrecognized escape sequence" on \w \. \/ .
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way the string "ends" at href=" "-char
Can anyone help me please?
Use "" to escape quotation marks when using the @ literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w specially. But C# thinks \w is meant for C#, doesn't understand it, and you get a compiler error.
For example take this example text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ and escape only the " by replacing them with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ will prevent C# from trying to interpret the \ in the string as escape characters, but since \" will not be recognized then either, they invented the "" to escape the double quote.
Here's a better regex, yours is filled with problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
Catches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the [ bracket was missing in the img part, and the ) that closes the group was in the wrong position.
However, as has been said many times, HTML is not structured enough to be caught by regex. But if you need to get something quick and dirty done, it will do.
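For the quick-and-dirty case, reading the named group could look roughly like this. The class name, the ExtractImgUrl helper and the sample HTML are illustrative assumptions; the pattern is the corrected one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for reading the named group "imgurl".
public static class ImgUrlDemo
{
    private static readonly Regex LinkWithImage = new Regex(
        @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""");

    // Returns the captured image URL, or an empty string when nothing matches.
    public static string ExtractImgUrl(string html)
    {
        Match m = LinkWithImage.Match(html);
        return m.Success ? m.Groups["imgurl"].Value : string.Empty;
    }

    public static void Main()
    {
        string html =
            @"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg"" />";

        Console.WriteLine(ExtractImgUrl(html));  // http://flickr.com/something.jpg
    }
}
```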
Here's the deal. C# strings recognize certain character combinations as special characters. Maybe you are familiar with inserting a \n in a string to work as an end-of-line character, for example?
When you put a single \ in a string, the compiler tries to read it, along with the next character, as one of these special sequences, and reports an error when it's not a valid combination.
Fortunately, that does not prevent you from using backslashes, as one of those sequences, \\, works for that purpose, being interpreted as a single backslash.
So, in practice, if you substitute every backslash in your string for a double backslash, it should work properly.

\w gives me an error while using regular expressions

I was using Regex and I tried to write:
Regex RegObj2 = new Regex("\w[a][b][(c|d)][(c|d)].\w");
Gives me this error twice, one for each appearance of \w:
unrecognized escape sequence
What am I doing wrong?
You are not escaping the backslashes in a non-verbatim string literal.
Solution: put a @ in front of the string or double the backslashes, as per the C# rules for string literals.
Try to escape the escape ;)
Regex RegObj2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
or add an @ (as @Dominic Kexel suggested)
There are two levels of potential escaping required when writing a regular expression:
The regular expression escaping (e.g. escaping brackets, or in this case specifying a character class)
The C# string literal escaping
In this case, it's the latter which is tripping you up. Either escape the \ so that it becomes part of the string, or use a verbatim string literal (with an @ prefix) so that \ doesn't have its normal escaping meaning. So either of these:
Regex regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
Regex regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
The two approaches are absolutely equivalent at execution time. In both cases you're trying to create a string constant with the value
\w[a][b][(c|d)][(c|d)].\w
The two forms are just different ways of expressing this in C# source code.
The backslashes are not being escaped e.g. \\ or
new Regex(@"\w[a][b][(c|d)][(c|d)].\w");

Regex Expressions for all non alphanumeric symbols

I am trying to make a regular expression for a string that has at least 1 non alphanumeric symbol in it
The code I am trying to use is
Regex symbolPattern = new Regex("?[!@#$%^&*()_-+=[{]};:<>|./?.]");
I'm trying to match only one of !@#$%^&*()_-+=[{]};:<>|./?. but it doesn't seem to be working.
If you want to match non-alphanumeric symbols then just use \W|_.
Regex pattern = new Regex(@"\W|_");
This will match anything except letters and digits (\W on its own does not match the underscore, hence the extra |_). Information on the \W character class and others is available here (C# Regex Cheat Sheet).
https://www.mikesdotnetting.com/article/46/c-regular-expressions-cheat-sheet
You could also avoid regular expressions if you want:
return s.Any(c => !char.IsLetterOrDigit(c));
Can you check for the opposite condition?
// input is the string being tested
Match match = Regex.Match(input, @"^([a-zA-Z0-9]+)$");
if (match.Success) {
// it's alphanumeric
} else {
// it has one of those characters in it.
}
I didn't get your entire question, but this regex will match strings that contain at least one non-alphanumeric character. That includes whitespace (I couldn't see that in your list, though).
[^\w]+
Your regex just needs a little tweaking. The hyphen is used to form ranges like A-Z, so if you want to match a literal hyphen, you either have to escape it with a backslash or move it to the end of the list. You also need to escape the square brackets because they're the delimiters for a character class. Then get rid of that question mark at the beginning and you're in business.
Regex symbolPattern = new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,-]");
If you only want to match ASCII punctuation characters, this is probably the simplest way. \W matches whitespace and control characters in addition to punctuation, and it matches them from the entire Unicode range, not just ASCII.
You seem to be missing a few characters, though: the backslash, apostrophe and quotation mark. Adding those gives you:
@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]"
Finally, it's a good idea to always use C#'s verbatim string literals (@"...") for regexes; it saves you a lot of hassle with backslashes. Quotation marks are escaped by doubling them.
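To compare the approaches suggested in this thread side by side, here is a small sketch. The class and method names are hypothetical; the patterns are the ones given in the answers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the three suggestions.
public static class SymbolCheck
{
    // Explicit ASCII punctuation list, including backslash and both quote types.
    private static readonly Regex AsciiSymbols =
        new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]");

    // Anything that is not a word character, plus the underscore.
    private static readonly Regex NonWordOrUnderscore = new Regex(@"\W|_");

    public static bool HasAsciiSymbol(string s) => AsciiSymbols.IsMatch(s);

    public static bool HasNonWordOrUnderscore(string s) => NonWordOrUnderscore.IsMatch(s);

    // Regex-free alternative using LINQ.
    public static bool HasNonAlphanumeric(string s) => s.Any(c => !char.IsLetterOrDigit(c));

    public static void Main()
    {
        Console.WriteLine(HasAsciiSymbol("abc123"));           // False
        Console.WriteLine(HasAsciiSymbol("abc-123"));          // True
        Console.WriteLine(HasAsciiSymbol(@"a\b"));             // True

        // A space is not in the explicit list, but \W and the LINQ check catch it.
        Console.WriteLine(HasAsciiSymbol("abc 123"));          // False
        Console.WriteLine(HasNonWordOrUnderscore("abc 123"));  // True
        Console.WriteLine(HasNonAlphanumeric("abc 123"));      // True
    }
}
```

Which of the three fits best depends on whether whitespace and non-ASCII characters should count as symbols.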
