I had a question on here for a RegularExpressionValidator which I'm relatively new to. It was to accept all alphanumeric, apostrophe, hyphen, underscore, space, ampersand, comma, parentheses, full stop.
The answer I was given was:
"^([a-zA-Z0-9 '-_&,()\.])+$"
This seemed good at first but it seems to accept amoung other things '*'.
Can anybody tell me what I have wrong here?
The problem appears to be the dash - inside a character class, if unescaped and not at the very end or very beginning of the character class, it denotes a range (A-Z would be a good example from your own regex).
Therefore '-_ is also interpreted as a range, and the characters between ASCII 39 (') and ASCII 95 (_) are ()*+,-./0-9:;<=>?#A-Z[\]^.
Put the dash at the end, and you should be fine:
^[a-zA-Z0-9 '_&,().-]+$
Your character class is not quite correct. This part: '-_ creates a range from the apostrophe character to the underscore character. In the ASCII table, the * character falls in between. You need to either escape the hyphen:
^([a-zA-Z0-9 '\-_&,()\.])+$
Or move it somewhere "insignificant", such as the end of the character class:
^([a-zA-Z0-9 '_&,()\.-])+$
In addition to the '-_ issue touched on by other people you also have the + on the end in the wrong place.
The value capture group in this regex:
^([a-zA-Z0-9 '-_&,()\.])+$
in Expresso is the last character in the string.
If you want to capture the whole thing inside the regex then put the + straight after the ] like
^([a-zA-Z0-9 '-_&,()\.]+)$
If you are not bothered about extracting the value captured inside the ( ) then drop the ()
^[a-zA-Z0-9 '-_&,()\.]+$
As I also tripped up on the fact that this uses a character class in my initial answer, I dug around for more info. Found the following tutorial excerpt at http://www.regular-expressions.info/charclass.html
The only special characters or
metacharacters inside a character
class are the closing bracket (]), the
backslash (), the caret (^) and the
hyphen (-). The usual metacharacters
are normal characters inside a
character class, and do not need to be
escaped by a backslash.
Escaping the - with \- should solve your problem.
Related
I m trying to matching a string which will not allow same special character at same time
my regular expression is:
[RegularExpression(#"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
this solve my all requirement except the below two issues
this is my string : bracks
acceptable :
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (by the way above regular expression solved this)
not acceptable:
issue 1: b.. or bra..cks, b..racks, bra...cks (two or more any special character together),
issue 2: bra cks (two ore more white space together)
You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
The goal
From what i can tell by your description and pattern, you are trying to match text, which start and end with alphanumeric (due to ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ inyour original pattern), and inside, you just don't want to have any two consecuive (adjacent) special characters, which, again, guessing from the regex, are . & ' -
What was wrong
^+ - i think here you wanted to assure that match starts at the beginning of the line/string, so you don't need + here
[a-zA-Z0-9.&' '-] - in this character class you doubled ' which is totally unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more text matching preceeding pattern
$ - anchor, match the end of the string
Regex demo
Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
I have an application which needs some verifications for some fields. One of them is for a last name which can be composed of 2 words. In my regex, I have to accept these spaces so I tried a lot of things but I did'nt find any solution.
Here is my regex :
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$"
The \s are normally for the spaces but it does not work and I got this error message :
parsing "^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$" - Cannot include class \s in character range.
ANy idea guys?
- denotes a character range, just as you use A-Z to describe any character between A and Z. Your regex uses ñ-\s which the engine tries to interpret as any character between ñ and \s -- and then notices, that \s doesn't make a whole lot of sense there, because \s itself is only an abbreviation for any whitespace character.
That's where the error comes from.
To get rid of this, you should always put - at the end of your character class, if you want to include the - literal character:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\s-]+$"
This way, the engine knows that \s- is not a character range, but the two characters \s and - seperately.
The other way is to escape the - character:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêç\-\s]+$"
So now the engine interprets ñ\-\s not as a character range, but as any of the characters ñ, - or \s. Personally, though I always try to avoid escaping as often as possible, because IMHO it clutters up and needlessly stretches the expression in length.
You need to escape the last - character - ñ-\s is parsed like the range a-z:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\-\s]+$"
See also on Regex Storm: [a-\s] , [a\-\s]
[RegularExpression(#"^[a-zA-Z\s]+$", ErrorMessage = "Only alphabetic characters and spaces are allowed.")]
This works
I have an application which needs some verifications for some fields. One of them is for a last name which can be composed of 2 words. In my regex, I have to accept these spaces so I tried a lot of things but I did'nt find any solution.
Here is my regex :
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$"
The \s are normally for the spaces but it does not work and I got this error message :
parsing "^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ-\s]+$" - Cannot include class \s in character range.
ANy idea guys?
- denotes a character range, just as you use A-Z to describe any character between A and Z. Your regex uses ñ-\s which the engine tries to interpret as any character between ñ and \s -- and then notices, that \s doesn't make a whole lot of sense there, because \s itself is only an abbreviation for any whitespace character.
That's where the error comes from.
To get rid of this, you should always put - at the end of your character class, if you want to include the - literal character:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\s-]+$"
This way, the engine knows that \s- is not a character range, but the two characters \s and - seperately.
The other way is to escape the - character:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêç\-\s]+$"
So now the engine interprets ñ\-\s not as a character range, but as any of the characters ñ, - or \s. Personally, though I always try to avoid escaping as often as possible, because IMHO it clutters up and needlessly stretches the expression in length.
You need to escape the last - character - ñ-\s is parsed like the range a-z:
#"^[a-zA-Zàéèêçñ\s][a-zA-Zàéèêçñ\-\s]+$"
See also on Regex Storm: [a-\s] , [a\-\s]
[RegularExpression(#"^[a-zA-Z\s]+$", ErrorMessage = "Only alphabetic characters and spaces are allowed.")]
This works
I've inherited some C# code with the following regular expression
Regex(#"^[a-zA-Z''-'\s]{1,40}$")
I understand this string except for the role of the single quotes. I've searched all over but can't seem to find an explanation. Any ideas?
From what I can tell, the expression is redundant.
It matches a-z or A-Z, or the ' character, or anything between ' and ' (which of course is only the ' character again, or any whitespace.
I've tested this using RegexPal and it doesn't appear to match anything but these characters. Perhaps the sequence was generated by code, or it used to match a wider range of characters in an earlier version?
UPDATE: From your comments (matching a name), I'm gonna go ahead and guess the author thought (s)he was escaping a hyphen by putting it in quotes, and wasn't the most stellar software tester. What they probably meant was:
Regex(#"^[a-zA-Z'\-\s]{1,40}$") //Escaped the hyphen
Which could also be written as:
Regex(#"^[a-zA-Z'\s-]{1,40}$") //Put the hyphen at the end where it's not ambiguous
The only way having the apostrophe / single quote three times makes sense is if the second and third instances are actually fancy curly single quotes such as ‘, ’, and ‛. If so a better (clearer) way to represent it would be to use the unicode escapes:
Regex(#"^[a-zA-Z'\u2018-\u201B\s]{1,40}$")
Incidentally some languages, such as PowerShell, explicitly allow these curly single quotes and treat them the same as the ASCII ' (0x27) character. From the PowerShell 2.0 Language Specification:
single-quote-character:
' (U+0027)
Left single quotation mark (U+2018)
Right single quotation mark (U+2019)
Single low-9 quotation mark (U+201A)
Single high-reversed-9 quotation mark (U+201B)
As it is the three single quote characters are redundant. They represent the single quote character (#1) and the range of characters which both begins and ends at the single quote (#2 and #3 separated by a hyphen).
It looks like it is an error, the writer seems to have meant to include the hyphen character in the class by "escaping" it in single quotes. Without escaping it the hyphen represents a character range, like in a-z and A-Z.
I'm guessing the original author meant [a-zA-Z'\-\s]
The extra apostrophes are redundant, so it doesn't make much sense. One possibility is that the author tried to escape the dash to include it in the pattern, but the correct way to do that would be to use a backslash:
Regex(#"^[a-zA-Z'\-\s]{1,40}$")
(Using apostrophes around a literal is for example used in custom format strings, where the author might have picked it up.)
Why this do not match and how to make it work?
Regex.Match("qwe", ".*?(?=([ $]))");
I should match everything to first space or to the end of line.
Your specific problem is that you need to use an alternation, not a character class, because inside a character class the $ symbol literally means "match a dollar symbol", and does not have its special meaning end-of-line in that context.
( |$)
It seems however that your example is a bit strange. It would be simpler to match any character except space, then you wouldn't need a lookahead at all.
Try with:
Regex.Match("qwe", "^([^ ]*)");