Why can white-space character class not be used in Regex? [duplicate]

I have an application that needs to validate some fields. One of them is a last name, which can consist of two words. My regex has to accept these spaces, so I tried a lot of things, but I didn't find a solution.
Here is my regex:
@"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$"
The \s should match spaces, but it does not work, and I get this error message:
parsing "^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$" - Cannot include class \s in character range.
Any ideas?

- denotes a character range, just as A-Z describes any character between A and Z. Your regex contains ñ-\s, which the engine tries to interpret as any character between ñ and \s. It then notices that \s cannot be the end of a range, because \s is not a single character but shorthand for any whitespace character.
That's where the error comes from.
To get rid of it, put - at the end of the character class whenever you want to include a literal -:
@"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\s-]+$"
This way, the engine knows that \s- is not a character range but the two items \s and - separately.
The other way is to escape the - character:
@"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\-\s]+$"
Now the engine interprets ñ\-\s not as a character range but as any of the characters ñ, - or \s. Personally, though, I avoid escaping wherever possible, because IMHO it clutters the expression and needlessly lengthens it.
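
The difference between the three variants can be checked quickly. This is only a sketch in Python's re module (an assumption of this example, since the question is about .NET), but its character-class rules agree with the explanation above: a shorthand class cannot end a range, while a trailing or escaped hyphen is a literal. The helper name compiles is made up for the illustration.

```python
import re

LETTERS = "a-zA-Zàéèêçñ"  # the letters allowed in the question's pattern


def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# ñ-\s is read as a range whose end point is a class, so it is rejected.
broken = rf"^[{LETTERS}\s][{LETTERS}-\s]+$"
# A hyphen at the very end of the class is a literal hyphen.
hyphen_last = rf"^[{LETTERS}\s][{LETTERS}\s-]+$"
# An escaped hyphen is a literal hyphen wherever it appears.
hyphen_escaped = rf"^[{LETTERS}\s][{LETTERS}\-\s]+$"

for name, pattern in [("broken", broken),
                      ("hyphen_last", hyphen_last),
                      ("hyphen_escaped", hyphen_escaped)]:
    print(name, compiles(pattern))

# Both fixed patterns accept a two-word or hyphenated last name.
for last_name in ["Dupont Martin", "Lévêque-Muñoz"]:
    assert re.match(hyphen_last, last_name)
    assert re.match(hyphen_escaped, last_name)
```

Only the first pattern fails to compile; the other two are equivalent.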

You need to escape the last - character, because ñ-\s is parsed as a range, like a-z:
@"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\-\s]+$"
See also on Regex Storm: [a-\s] , [a\-\s]

[RegularExpression(@"^[a-zA-Z\s]+$", ErrorMessage = "Only alphabetic characters and spaces are allowed.")]
This works, although it no longer accepts the accented letters or the hyphen from the question.

Related

How to match string by using regular expression which will not allow same special character at same time?

I'm trying to match a string that does not allow two special characters in a row.
My regular expression is:
[RegularExpression(@"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
This meets all my requirements except for the two issues below.
My example string: bracks
Acceptable:
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (the regular expression above already handles these)
Not acceptable:
Issue 1: b.., bra..cks, b..racks, bra...cks (two or more special characters together)
Issue 2: bra  cks (two or more whitespace characters together)

You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
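
To see the lookahead at work on the strings from the question, here is a small sketch using Python's re module (an assumption of this example; the pattern itself is copied from the answer and uses nothing engine-specific).

```python
import re

# Pattern from the answer: reject any string with two adjacent special characters.
NO_ADJACENT_SPECIALS = re.compile(r"^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$")

accepted = ["bracks", "bra-cks", "b-r-a-c-ks", "b.r.a.c.ks", "bra cks"]
rejected = ["b..", "bra..cks", "b..racks", "bra...cks", "bra  cks"]

for text in accepted + rejected:
    verdict = "ok" if NO_ADJACENT_SPECIALS.match(text) else "rejected"
    print(f"{text!r}: {verdict}")
```

Note that this pattern does not require the string to start and end with an alphanumeric character, so an input such as -bracks is also accepted.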

The goal
From what I can tell from your description and pattern, you are trying to match text that starts and ends with an alphanumeric character (because of ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ in your original pattern) and that contains no two consecutive (adjacent) special characters, which, again guessing from the regex, are . & ' -
What was wrong
^+ - I think you wanted to ensure that the match starts at the beginning of the line/string, so you don't need the + here
[a-zA-Z0-9.&' '-] - in this character class you listed ' twice, which is unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more repetitions of the preceding pattern
$ - anchor, match the end of the string
Regex demo

Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
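
The two patterns from this answer can be compared side by side. The following is a sketch in Python's re module (an assumption of this example); the names STRICT and RELAXED are made up for the illustration.

```python
import re

# First correction: exactly one delimiter between runs of alphanumerics.
STRICT = re.compile(r"^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$")
# Follow-up: a delimiter may additionally be surrounded by single spaces.
RELAXED = re.compile(r"^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$")

samples = ["bracks", "bra-cks", "b.r.a.c.ks", "bra cks",
           "bra - cks", "bra..cks", "bra  cks", "b.."]

for text in samples:
    print(f"{text!r}: strict={bool(STRICT.match(text))} "
          f"relaxed={bool(RELAXED.match(text))}")
```

The only sample on which the two disagree is bra - cks, which the follow-up pattern accepts and the first one rejects.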

What does .* do in regex?

After an extensive search, I am unable to find an explanation of why .* is needed in a regex. For example, MSDN suggests a password regex of
@\"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})"
for length >= 6, at least one digit and at least one special character.
Why can't I just use:
@\"(?=.{6,})(?=(\d){1,})(?=(\W){1,})"

.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
In your example above, this is important, since they want to force the password to contain a special character and a number, while still allowing all other characters. If you used \d instead of .*\d, for example, that lookahead would only succeed when the character at the current position is a decimal digit (\d is shorthand for a decimal digit, such as [0-9]). Similarly, \W instead of .*\W would only succeed when that character is a non-word character.
A good reference containing many of these tokens for .NET can be found on MSDN: Regular Expression Language - Quick Reference
Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and beginner-friendly regex references I've seen online.

Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:
(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
..but it should be:
(?=.{8,})(?=.*\d)(?=.*\W)
The first lookahead tries to match eight or more of any characters. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
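
A short experiment makes the effect of the .* visible. This is a sketch in Python's re module (an assumption of this example; lookaheads, \d and \W behave the same way here), and the helper name passes is invented for the illustration.

```python
import re

# Simplified lookaheads from this answer, with and without the leading .*
WITH_DOT_STAR = re.compile(r"(?=.{8,})(?=.*\d)(?=.*\W)")
WITHOUT_DOT_STAR = re.compile(r"(?=.{8,})(?=\d)(?=\W)")


def passes(pattern: re.Pattern, password: str) -> bool:
    # search() tries every start position, since the pattern has no ^ anchor.
    return pattern.search(password) is not None


for password in ["passw0rd!", "passw0rd ", "password1", "pw0!"]:
    print(f"{password!r}: with={passes(WITH_DOT_STAR, password)} "
          f"without={passes(WITHOUT_DOT_STAR, password)}")
```

Without the .*, no password can pass at all, because the same character would have to be both a digit and a non-word character. The second sample also shows that a trailing space counts as the "special character".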

The .* portion just allows literally any combination of characters before the required one. It essentially lets the user add any amount of extra text to the password on top of the characters you require.
Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.

RegularExpressionValidator for TextBox

I had a question on here for a RegularExpressionValidator which I'm relatively new to. It was to accept all alphanumeric, apostrophe, hyphen, underscore, space, ampersand, comma, parentheses, full stop.
The answer I was given was:
"^([a-zA-Z0-9 '-_&,()\.])+$"
This seemed good at first, but it seems to accept, among other things, '*'.
Can anybody tell me what I have wrong here?

The problem appears to be the dash - inside a character class, if unescaped and not at the very end or very beginning of the character class, it denotes a range (A-Z would be a good example from your own regex).
Therefore '-_ is also interpreted as a range, and the characters between ASCII 39 (') and ASCII 95 (_) are ()*+,-./0-9:;<=>?@A-Z[\]^.
Put the dash at the end, and you should be fine:
^[a-zA-Z0-9 '_&,().-]+$
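
The accidental range is easy to list and test. The following is a sketch in Python's re module (an assumption of this example; the question is about an ASP.NET validator), with made-up names BUGGY, FIXED and accidental.

```python
import re

BUGGY = re.compile(r"^([a-zA-Z0-9 '-_&,()\.])+$")  # '-_ is a range
FIXED = re.compile(r"^[a-zA-Z0-9 '_&,().-]+$")      # - is last, so literal

# Every character the accidental range '-_ lets through (ASCII 39 to 95).
accidental = "".join(chr(code) for code in range(ord("'"), ord("_") + 1))
print(accidental)

for text in ["a*b", "a<b", "O'Neil & Co. (UK)-1_2"]:
    print(repr(text), bool(BUGGY.match(text)), bool(FIXED.match(text)))
```

The first two inputs pass only the buggy pattern, while the last one, which uses only the intended characters, passes both.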

Your character class is not quite correct. This part: '-_ creates a range from the apostrophe character to the underscore character. In the ASCII table, the * character falls in between. You need to either escape the hyphen:
^([a-zA-Z0-9 '\-_&,()\.])+$
Or move it somewhere "insignificant", such as the end of the character class:
^([a-zA-Z0-9 '_&,()\.-])+$

In addition to the '-_ issue touched on by other people you also have the + on the end in the wrong place.
With this regex:
^([a-zA-Z0-9 '-_&,()\.])+$
the value of the capture group (as shown in Expresso) is only the last character of the string.
If you want to capture the whole thing inside the regex then put the + straight after the ] like
^([a-zA-Z0-9 '-_&,()\.]+)$
If you are not bothered about extracting the value captured inside the ( ) then drop the ()
^[a-zA-Z0-9 '-_&,()\.]+$
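
The effect of the + position on the capture group can be reproduced as follows. This is a sketch in Python's re module (an assumption of this example, in place of Expresso), and it uses a character class with the hyphen moved to the end so that the range issue does not interfere.

```python
import re

text = "O'Neil"

# + outside the group: the group is re-captured on every repetition.
per_character = re.match(r"^([a-zA-Z0-9 '_&,().-])+$", text)
# + inside the group: the group captures the whole run once.
whole_string = re.match(r"^([a-zA-Z0-9 '_&,().-]+)$", text)

print(per_character.group(1))  # l
print(whole_string.group(1))   # O'Neil
```

Both patterns accept the same inputs; only the content of group 1 differs.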

As I also tripped up on the fact that this uses a character class in my initial answer, I dug around for more info. Found the following tutorial excerpt at http://www.regular-expressions.info/charclass.html
The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash.
Escaping the - with \- should solve your problem.

Regular Expression for alphanumeric and space

What is the regular exp for a text that can't contain any special characters except space?

Because Prajeesh only wants to match spaces, \s will not suffice as it matches all whitespace characters including line breaks and tabs.
A character set that should universally work across all RegEx parsers is:
[a-zA-Z0-9 ]
Further control depends on your needs. Word boundaries, multi-line support, etc... I would recommend visiting Regex Library which also has some links to various tutorials on how Regular Expression Parsing works.

[\w\s]*
\w will match [A-Za-z0-9_] and the \s will match whitespaces.
[\w ]* should match what you want.

Assuming "special characters" means anything that's not a letter or digit, and "space" means the space character (ASCII 32):
^[A-Za-z0-9 ]+$

You need @"^[A-Za-z0-9 ]+$". The \s character class matches things other than space (such as tab), and since you want to make sure that no part of the string contains other characters, you should anchor it with ^ and $.
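
Both points, the literal space versus \s and the need for anchors, can be checked with a few inputs. This is a sketch in Python's re module (an assumption of this example; the answer targets C#), with illustrative names.

```python
import re

SPACE_ONLY = re.compile(r"^[A-Za-z0-9 ]+$")       # literal space, anchored
ANY_WHITESPACE = re.compile(r"^[A-Za-z0-9\s]+$")  # \s also allows tab, newline
UNANCHORED = re.compile(r"[A-Za-z0-9 ]+")         # matches any valid substring

samples = ["John Smith", "John\tSmith", "John#Smith"]
for text in samples:
    print(repr(text),
          bool(SPACE_ONLY.search(text)),
          bool(ANY_WHITESPACE.search(text)),
          bool(UNANCHORED.search(text)))
```

The tab-separated input passes only the \s version, and the input containing # passes only the unanchored pattern, because that pattern is satisfied by the substring John alone.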

If you just want letters and spaces, you can use @"[A-Za-z\s]+" to match at least one letter or whitespace character. You could also use @"[A-Za-z ]+", which lists the space literally instead of using \s.
Otherwise please clarify.

In C#, I believe it's ^(\w|\s)*$
