How can I write the following regular expression in C#/.NET?
I need a regex that matches non-alphabetic characters (anything other than a to z and A to Z) and non-numeric characters (anything other than 0 to 9).
In other words, I want the reverse of the usual case: a regular expression for everything except letters and digits (0 to 9).
Please suggest a solution.
You can use a negated character class here:
[^a-zA-Z0-9]
The above regex will match a single character that is not a Latin lowercase or uppercase letter or a digit.
The ^ at the start of the character class (the part between [ and ]) negates the complete class, so that it matches anything not in the class instead of following normal character class behavior.
To make it useful, you probably want one of these:
Zero or more such characters
[^a-zA-Z0-9]*
The asterisk (*) here signifies that the preceding part can be repeated zero or more times.
One or more such characters
[^a-zA-Z0-9]+
The plus (+) here signifies that the preceding part can be repeated one or more times.
A complete (possibly empty) string, consisting only of such characters
^[^a-zA-Z0-9]*$
Here the characters ^ and $ have a meaning as anchors, matching the start and end of the string, respectively. This ensures that the entire string consists of characters not in that character class and no other characters come before or after them.
A complete (non-empty) string, consisting only of such characters
^[^a-zA-Z0-9]+$
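As a rough illustration of how these variants behave in C#, here is a minimal sketch. The patterns are the ones listed above; the variable names and sample inputs are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns taken from the list above; the sample inputs are made up.
var oneOrMore     = new Regex(@"[^a-zA-Z0-9]+");    // a run anywhere in the input
var wholeOrEmpty  = new Regex(@"^[^a-zA-Z0-9]*$");  // entire string, may be empty
var wholeNonEmpty = new Regex(@"^[^a-zA-Z0-9]+$");  // entire string, at least one character

// First run of non-alphanumeric characters found in the input.
string firstRun = oneOrMore.Match("abc!?123").Value;
Console.WriteLine(firstRun);

Console.WriteLine(wholeOrEmpty.IsMatch(""));       // empty input is allowed by *
Console.WriteLine(wholeNonEmpty.IsMatch(""));      // but rejected by +
Console.WriteLine(wholeNonEmpty.IsMatch("#-+ "));  // only symbols and a space
Console.WriteLine(wholeNonEmpty.IsMatch("#a+"));   // contains a letter, so no match
```

The unanchored variants are the ones to use with Regex.Match or Regex.Replace; the anchored ones are meant for validating a whole input with Regex.IsMatch.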
Elaborating a bit: this class only excludes ASCII letters and digits, so it won't (and can't) keep out letters from other scripts. The string аеΒ would be completely valid with the above regular expression, because it uses letters from Greek and Cyrillic. Furthermore, there are other pitfalls. The precomposed string á will pass the above regular expression, while the string á (the letter a followed by a combining diacritical mark, U+0301) will not, because it contains a plain a.
So negated character classes have to be used with care at times.
Numerals from other scripts would pass as well: ١٢٣ :-)
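You can use the character class
[^\p{L}\p{Nd}]
if you need to take care of the above things. (\p{L} is the Unicode category for all letters and \p{Nd} the one for decimal digits. The \p{L&} shorthand for cased letters that some other engines offer is not recognized by .NET.)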
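To make the difference concrete, here is a small sketch comparing the ASCII-only class with a Unicode-aware one. It assumes the .NET category names \p{L} (any letter) and \p{Nd} (decimal digit); the variable names and sample strings are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

var asciiOnly    = new Regex(@"^[^a-zA-Z0-9]+$");     // excludes only ASCII letters and digits
var unicodeAware = new Regex(@"^[^\p{L}\p{Nd}]+$");   // excludes letters and digits of any script

string cyrillicGreek = "\u0430\u0435\u0392"; // Cyrillic а, Cyrillic е, Greek Β
string arabicDigits  = "\u0661\u0662\u0663"; // Arabic-Indic digits ١٢٣

// The ASCII-only class treats these as "not letters or digits"...
Console.WriteLine(asciiOnly.IsMatch(cyrillicGreek));
Console.WriteLine(asciiOnly.IsMatch(arabicDigits));

// ...while the Unicode-aware class recognizes them as letters and digits.
Console.WriteLine(unicodeAware.IsMatch(cyrillicGreek));
Console.WriteLine(unicodeAware.IsMatch(arabicDigits));
Console.WriteLine(unicodeAware.IsMatch("!?-"));
```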
Just negate the class:
[^A-Za-z0-9]
To respect the locale settings, in engines that support POSIX bracket expressions (the .NET engine does not), use:
[^[:alnum:]]
I'm trying to validate a string that must not contain special characters next to each other.
My regular expression is:
[RegularExpression(@"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
This meets all my requirements except for the two issues below.
This is my string: bracks
Acceptable:
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (the regular expression above already handles these)
Not acceptable:
Issue 1: b.., bra..cks, b..racks, bra...cks (two or more special characters together)
Issue 2: bra  cks (two or more whitespace characters together)
You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
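A quick way to try the lookahead pattern from C# is sketched below. The pattern is the one above; the variable name and sample inputs are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// (?!.*[.&' -]{2}) rejects the input if any two special characters are adjacent.
var noDoubleSpecials = new Regex(@"^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$");

Console.WriteLine(noDoubleSpecials.IsMatch("bra-cks"));     // single separators are fine
Console.WriteLine(noDoubleSpecials.IsMatch("b.r.a.c.ks"));
Console.WriteLine(noDoubleSpecials.IsMatch("bra cks"));
Console.WriteLine(noDoubleSpecials.IsMatch("bra..cks"));    // two dots in a row
Console.WriteLine(noDoubleSpecials.IsMatch("bra  cks"));    // two spaces in a row
Console.WriteLine(noDoubleSpecials.IsMatch("-bracks"));     // a leading special is still accepted
```

Note that this pattern, unlike the original one, does not require the string to start and end with a letter or digit.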
The goal
From what I can tell from your description and pattern, you are trying to match text that starts and ends with an alphanumeric character (due to ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ in your original pattern) and that does not contain two consecutive (adjacent) special characters, which, again guessing from the regex, are . & ' - and the space.
What was wrong
^+ - I think you wanted to make sure that the match starts at the beginning of the line/string, so you don't need the + here
[a-zA-Z0-9.&' '-] - in this character class you doubled the ', which is unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more repetitions of the preceding pattern
$ - anchor, match the end of the string
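The pattern explained above can be exercised from C# roughly as follows. This is only a sketch: the variable name and the sample inputs are invented, and the pattern is copied from the solution above.

```csharp
using System;
using System.Text.RegularExpressions;

// Starts and ends with a letter or digit; in between, the lookahead (?![.& '-]{2,})
// refuses to consume a special character that is followed by another one.
var bracketedByAlnum = new Regex(
    @"^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$");

Console.WriteLine(bracketedByAlnum.IsMatch("bra-cks"));
Console.WriteLine(bracketedByAlnum.IsMatch("b-r-a-c-ks"));
Console.WriteLine(bracketedByAlnum.IsMatch("bra..cks"));  // adjacent specials
Console.WriteLine(bracketedByAlnum.IsMatch("bra  cks"));  // adjacent spaces
Console.WriteLine(bracketedByAlnum.IsMatch("-bracks"));   // must start with a letter or digit
Console.WriteLine(bracketedByAlnum.IsMatch("b"));         // needs at least two characters
```

One consequence of having a separate first and last character is that a single-character input is rejected; whether that is desired depends on the requirements.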
Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
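To see how the two corrections differ, here is a minimal C# sketch. The two patterns are the ones given above; the names strict and relaxed and the sample inputs are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// First correction: exactly one delimiter between alphanumeric runs.
var strict = new Regex(@"^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$");

// Follow-up: a delimiter may additionally be padded with one space on either side.
var relaxed = new Regex(@"^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$");

Console.WriteLine(strict.IsMatch("bra-cks"));     // one delimiter: accepted by both
Console.WriteLine(relaxed.IsMatch("bra-cks"));
Console.WriteLine(strict.IsMatch("bra - cks"));   // padded delimiter: only the relaxed pattern accepts it
Console.WriteLine(relaxed.IsMatch("bra - cks"));
Console.WriteLine(relaxed.IsMatch("bra  cks"));   // two spaces: rejected
Console.WriteLine(relaxed.IsMatch("bra--cks"));   // two hyphens: rejected
```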
I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The pattern is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers? I tried a few variations, including -
[0-9().+\s?](\.[0-9][0-9]?)?
However, I can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any sequence of one or more digits, followed by an optional group (the (...)? part), which has to be a dot followed by one or two digits. In fact, this regex looks more like it's supposed to match decimal numbers with up to two digits after the dot, not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces, so when using this regex you have to remove all the spaces from your input first. In C#, that should be possible with yourString = yourString.Replace(" ", ""); (replacing every space with nothing, i.e. deleting the spaces; Replace returns a new string, so the result has to be assigned).
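Putting the two steps together, a minimal sketch might look like this. The helper name IsPhone is hypothetical, and the ^ and $ anchors are an addition made here so that the whole input has to match; the rest of the pattern is the one suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

// Suggested pattern, wrapped in ^...$ (an assumption of this sketch) to validate the whole input.
var phone = new Regex(@"^(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+$");

// Hypothetical helper: strip spaces first, then test the compacted string.
bool IsPhone(string input)
{
    string compact = input.Replace(" ", "");
    return phone.IsMatch(compact);
}

Console.WriteLine(IsPhone("+49 123 4567890"));
Console.WriteLine(IsPhone("+49 (0)123 4567890"));
Console.WriteLine(IsPhone("12ab34"));
```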
The + after the character set is a quantifier: the preceding character, character set or group is repeated at least once and up to an unlimited number of times. It is also greedy (it matches as much as possible).
So [0-9().+\s]+ will match one or more characters from the set.
Please find the regex below. I want to make sure that the address does not contain any special character just after the @ or just before it. In between, any combination is allowed.
The regex I have now:
@"^[^\W_](?:[\w.-]*[^\W_])?@(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
For example, the regex should not match
abc@.sj.com
abc@-.sj-.com
SSDFF-SAF@-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearranging the starting part a bit. To prevent a special char before the @, just make sure there is a letter or digit there. I also recommend removing redundant capturing groups or converting them into non-capturing ones:
@"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
Here is a demo of how this regex matches now.
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - 1 or 0 occurrences of:
[\w.-]* - 0+ letters, digits, _, . and -
[^\W_] - a digit or a letter only
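The local-part rule described above can be checked with a short C# sketch. The pattern is the one from this answer; the variable name and the sample addresses are invented. As far as I can tell, the domain part still uses [\w-]+, so a - or _ directly after the @ is not rejected by this pattern; the last line shows that.

```csharp
using System;
using System.Text.RegularExpressions;

var address = new Regex(
    @"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$");

Console.WriteLine(address.IsMatch("a.b-c@sj.com"));   // specials only inside the local part
Console.WriteLine(address.IsMatch("abc-@sj.com"));    // special right before the @
Console.WriteLine(address.IsMatch("_abc@sj.com"));    // special at the very start
Console.WriteLine(address.IsMatch("abc@.sj.com"));    // dot right after the @
Console.WriteLine(address.IsMatch("abc@-.sj-.com"));  // still accepted, because [\w-]+ allows the hyphen
```

If a special character right after the @ also has to be rejected, the domain part would need a similar [^\W_] guard; that is left out here because the answer above does not include it.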
Change the initial [\w-\.]+ for [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.
After extensive search, I am unable to find an explanation for the need to use .* in regex. For example, MSDN suggests a password regex of
@\"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})"
for length >= 6, 1+ digit and 1+ special character.
Why can't I just use:
@\"(?=.{6,})(?=(\d){1,})(?=(\W){1,})"
.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
In your example above, this is important, since they want to require a special character and a digit anywhere in the password, while still allowing all other characters. If you used \d instead of .*\d, for example, that lookahead would only succeed when the character at the current position is a decimal digit (\d matches a decimal digit; for ASCII input that is [0-9]). Similarly, \W instead of .*\W would require the character at the current position to be a non-word character.
A good reference containing many of these tokens for .NET can be found on the MSDN here: Regular Expression Language - Quick Reference
Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and beginner-friendly regex references I've seen online.
Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:
(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
..but it should be:
(?=.{8,})(?=.*\d)(?=.*\W)
The first lookahead tries to match eight or more of any characters. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
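The effect of the .* inside the lookaheads is easy to observe in C#. The following is a sketch only: it uses the six-character minimum from the question, the simplified lookaheads described above, and invented variable names and sample passwords.

```csharp
using System;
using System.Text.RegularExpressions;

// With .*: each lookahead may find its character anywhere after the current position.
var withDotStar = new Regex(@"(?=.{6,})(?=.*\d)(?=.*\W)");

// Without .*: both lookaheads inspect the same single character, which would have to be
// a digit and a non-word character at once, so this can never match.
var withoutDotStar = new Regex(@"(?=.{6,})(?=\d)(?=\W)");

Console.WriteLine(withDotStar.IsMatch("abc123!"));     // long enough, has a digit and a symbol
Console.WriteLine(withDotStar.IsMatch("abcdefg"));     // no digit, no symbol
Console.WriteLine(withDotStar.IsMatch("a1!"));         // too short
Console.WriteLine(withoutDotStar.IsMatch("abc123!"));  // fails at every position
Console.WriteLine(withoutDotStar.IsMatch("1bc123!"));  // even with a leading digit
```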
The .* portion just allows for literally any combination of characters to be entered. It's essentially allowing for the user to add any level of extra information to the password on top of the data you are requiring
Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.
I'm currently using the following line of code:
Regex Regex_Alpha = new Regex(@"[a-zA-Z]+('[a-zA-Z])?[a-zA-Z]*");
What I want to do is filter the input of text fields with the condition that input should only be letters and the apostrophe symbol (actually, I still want to add more, but I'm trying to resolve this first).
Right now, it is accepting ALL characters, even numbers.
With my understanding of Regex, I tried to formulate my own expression in the line of:
Regex Regex_Alpha = new Regex(@"^[a-zA-Z'-"+$);
It filters numbers, but doesn't accept the apostrophe symbol. I tried removing the @ sign and escaping the apostrophe with a backslash, but it still doesn't work.
What should be the best approach to filter the input so that it only accepts letters and apostrophe? (I'll do the rest of the symbols once I understand how this one should work)
As I've commented, your first regular expression is a pretty good shot at "letters, with a single apostrophe not at either end". However, it matches any string with even a single letter, because a regular expression looks for any match in the input, not for whether the entire input matches.
You can fix this by doing what you've done in your second regular expression - just put a ^ at the start and a $ at the end. This means the start and end of the expression have to match the start and end of the input, so it ensures the whole input is only made up of letters and a possible apostrophe.
Regarding your second regular expression, you have a few problems.
If you want a double-quote in a @"..." string literal, you need to put two double quotes. (I think this might just be a typing mistake in your question, as what you currently have wouldn't even compile.)
You need to close your character class with a ], otherwise the [ and everything inside just get treated as a sequence of characters to match, one after the other.
If you want a hyphen in a character class, it has to go at the start or end, or it gets mistaken for a "between" hyphen (as in A-Z).
The expression @"^[a-zA-Z'""-]+$" should match "any string entirely made of letters, apostrophes, quotes or hyphens".
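The points above can be tried out with a short C# sketch. The three patterns are the question's first expression, the same expression with anchors added, and the character-class version from this answer; the variable names and sample inputs are invented.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: succeeds as soon as any part of the input matches.
var unanchored = new Regex(@"[a-zA-Z]+('[a-zA-Z])?[a-zA-Z]*");

// Anchored: the whole input must be letters with at most one inner apostrophe.
var anchored = new Regex(@"^[a-zA-Z]+('[a-zA-Z])?[a-zA-Z]*$");

// Character class version; "" inside a verbatim string is a single double quote.
var lettersQuotesHyphens = new Regex(@"^[a-zA-Z'""-]+$");

Console.WriteLine(unanchored.IsMatch("123a456"));   // a single letter is enough
Console.WriteLine(anchored.IsMatch("123a456"));     // digits are now rejected
Console.WriteLine(anchored.IsMatch("O'Brien"));
Console.WriteLine(anchored.IsMatch("'Brien"));      // apostrophe at the start
Console.WriteLine(lettersQuotesHyphens.IsMatch("O'Brien-Smith"));
Console.WriteLine(lettersQuotesHyphens.IsMatch("abc1"));
```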