C# regex for English and non-English characters

I use the regex ^[a-zA-Z''-'\s]{1,40}$ as a name validator, as recommended on MSDN.
Now I want to add non-English characters to it.
How can I do this?

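To support letters from any script, you need both the \p{L} (all letters) and \p{M} (all combining marks, i.e. diacritics) Unicode category classes. Note that .NET regexes operate on UTF-16 code units, so this covers the BMP; astral-plane letters are stored as surrogate pairs, which \p{L} does not match: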
^[\p{L}\p{M}\s'-]{1,40}$
Note that \p{L} already includes [a-zA-Z], and all lower- and uppercase letters.
Or, since \s also matches newlines and tabs (I doubt you really need to match newline symbols), you can use \p{Zs}, the Unicode space separator category (various kinds of spaces):
^[\p{L}\p{M}\p{Zs}'-]{1,40}$
Placing the hyphen at the end is just best practice, although it would be handled as a literal hyphen in your regex, too.
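As a minimal sketch of how the \p{Zs} variant above might be wired into a validator (the class and method names are hypothetical, not taken from MSDN or the answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Letters, combining marks, space separators, apostrophe and hyphen;
    // between 1 and 40 UTF-16 code units in total.
    private static readonly Regex NamePattern =
        new Regex(@"^[\p{L}\p{M}\p{Zs}'-]{1,40}$");

    // Returns true when the whole string consists of allowed characters.
    public static bool IsValidName(string name) =>
        name != null && NamePattern.IsMatch(name);
}
```

In .NET, $ also matches just before a trailing newline, so "Anna\n" passes this check; swapping $ for \z is one way to close that gap if it matters.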

You can try this:
^[\p{L}'\s-]{1,40}$
Note that \p{L} is a Unicode property class; it matches any character that has the "letter" property.

Related

Regex to restrict special characters in the beginning of an email address

Please find the regex below. I want to make sure that the address does not contain any special character just after the @ or just before it. In between, any combination is allowed.
The regex I have now:
@"^[^\W_](?:[\w.-]*[^\W_])?@(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
For example, the regex should not match
abc@.sj.com
abc@-.sj-.com
SSDFF-SAF@-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearranging the starting part a bit. To prevent a special char before the @, just make sure there is a letter or digit there. I also recommend removing redundant capturing groups or converting them into non-capturing ones:
@"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - 1 or 0 occurrences of:
[\w.-]* - 0+ letters, digits, _, . and -
[^\W_] - a digit or a letter only
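The breakdown above can be tried out with a small sketch. It assumes the separator between local part and domain is @; the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternDemo
{
    // The pattern from the answer, split across lines for readability only.
    private static readonly Regex EmailPattern = new Regex(
        @"^[^\W_](?:[\w.-]*[^\W_])?@" +                               // local part: starts and ends with a letter or digit
        @"(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)" + // bracketed IP prefix or dotted labels
        @"(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$");                         // top-level domain or last IP octet

    public static bool IsMatch(string address) =>
        address != null && EmailPattern.IsMatch(address);
}
```

Under this pattern, abc@sj.com and a.b-c@sj.com match, while _abc@sj.com, abc-@sj.com and abc@.sj.com do not. The domain part is looser: [\w-]+ still lets a label begin or end with - or _, so an address such as abc@-.sj-.com is accepted.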
Change the initial [\w-\.]+ for [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.

At least one digit, minimum 8 chars length, with unicode

I know that regex questions have been asked many times before, but I just can't make it to work as I need. What I need is a regex, with a minimum of 8 characters, containing at least one digit (digits can appear in the start, end or after other characters), and supporting Unicode, so that Hebrew, Arabic etc. characters can be used.
Here's the basic regex:
^(?=.*?\d).{8}
^.{8} will match any string that has at least 8 characters. (?=.*?\d) will assert there's a digit in there.
As for the Unicode support, that's up to the regex engine. If Unicode is supported, . should match a Unicode character. If you want to match graphemes instead, your regex flavor may support \X, which you could use instead of the dot.
If you want to allow non-latin digits, you may need to replace \d with \p{N} depending on your regex engine.
Update for the .NET flavor:
\d already matches Unicode digits so you don't need to use \p{N}
\X is not supported so you'll have to stick with . or use a workaround like (?>\P{M}\p{M}*).
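A minimal sketch of the basic regex under the .NET flavor, assuming default options (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Requires at least 8 characters before any newline and at least one
    // decimal digit among them. With default options, \d covers Unicode
    // digits; RegexOptions.ECMAScript would restrict it to 0-9.
    private static readonly Regex Pattern = new Regex(@"^(?=.*?\d).{8}");

    public static bool IsAcceptable(string candidate) =>
        candidate != null && Pattern.IsMatch(candidate);
}
```

For instance, eight Hebrew letters followed by an Arabic-Indic digit (U+0663) pass the check without \p{N}, because that digit belongs to the Nd category that \d matches.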
Assuming you are using a C#- or Java-like regex flavor, and that by "characters" you mean characters of the Unicode category "letter", you can use:
(?=\p{L}*?\p{Nd})[\p{L}\p{Nd}]{8,}

Regex to allow non-ascii and foreign letters?

Is it possible to create a regular expression to allow non-ascii letters along with Latin alphabets, for example Chinese or Greek symbols(eg. A汉语AbN漢語 allowed)?
I currently have the following ^[\w\d][\w\d_\-\.\s]*$ which only allows Latin alphabets.
In .NET,
^[\p{L}\d_][\p{L}\d_.\s-]*$
is equivalent to your regex, additionally allowing other Unicode letters.
Explanation:
\p{L} is a shorthand for the Unicode property "Letter".
Caveat: I think you wanted to not allow the underscore as initial character (evidenced by its presence only in the second character class). Since \w includes the underscore, your regex did allow it, though. You might want to remove it from the first character class in my solution (it's not included in \p{L}, of course).
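To make the caveat concrete, here is a small sketch comparing the pattern as given with a variant whose first character class drops the underscore (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeNameCheck
{
    // Pattern from the answer: the first character may be a letter, digit or underscore.
    private static readonly Regex AsGiven =
        new Regex(@"^[\p{L}\d_][\p{L}\d_.\s-]*$");

    // Variant from the caveat: underscore removed from the first character class.
    private static readonly Regex NoLeadingUnderscore =
        new Regex(@"^[\p{L}\d][\p{L}\d_.\s-]*$");

    public static bool MatchesAsGiven(string s) => s != null && AsGiven.IsMatch(s);

    public static bool MatchesStrict(string s) => s != null && NoLeadingUnderscore.IsMatch(s);
}
```

Both variants accept the example A汉语AbN漢語 from the question; only the first accepts _abc, and neither accepts a string that starts with a hyphen.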
In ECMAScript, things are not so easy: before ES2018 (which added \p{L} under the u flag), you have to define your own Unicode character ranges. Fortunately, a fellow Stack Overflow user has already risen to the occasion and designed a JavaScript regex converter:
https://stackoverflow.com/a/8933546/20670

Regex Expressions for all non alphanumeric symbols

I am trying to make a regular expression for a string that has at least 1 non alphanumeric symbol in it
The code I am trying to use is
Regex symbolPattern = new Regex("?[!@#$%^&*()_-+=[{]};:<>|./?.]");
I'm trying to match only one of !@#$%^&*()_-+=[{]};:<>|./?. but it doesn't seem to be working.
If you want to match non-alphanumeric symbols then just use \W|_.
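Regex pattern = new Regex(@"\W|_");
This matches any character that is not a word character, plus the underscore, i.e. roughly anything except letters and digits (in .NET, \w covers Unicode letters and digits, not only 0-9, a-z and A-Z). Information on the \W character class and others is available here (C# Regex Cheat Sheet).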
https://www.mikesdotnetting.com/article/46/c-regular-expressions-cheat-sheet
You could also avoid regular expressions if you want:
return s.Any(c => !char.IsLetterOrDigit(c)); // Any() requires using System.Linq
Can you check for the opposite condition?
// input is the string under test (a placeholder name)
Match match = Regex.Match(input, @"^[a-zA-Z0-9]+$");
if (match.Success) {
    // it's alphanumeric
} else {
    // it has one of those characters in it.
}
I didn't get your entire question, but this regex will match those strings that contains at least one non alphanumeric character. That includes whitespace (couldn't see that in your list though)
[^\w]+
Your regex just needs a little tweaking. The hyphen is used to form ranges like A-Z, so if you want to match a literal hyphen, you either have to escape it with a backslash or move it to the end of the list. You also need to escape the square brackets because they're the delimiters for a character class. Then get rid of that question mark at the beginning and you're in business.
Regex symbolPattern = new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,-]");
If you only want to match ASCII punctuation characters, this is probably the simplest way. \W matches whitespace and control characters in addition to punctuation, and it matches them from the entire Unicode range, not just ASCII.
You seem to be missing a few characters, though: the backslash, apostrophe and quotation mark. Adding those gives you:
@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]"
Finally, it's a good idea to always use C#'s verbatim string literals (@"...") for regexes; it saves you a lot of hassle with backslashes. Quotation marks are escaped by doubling them.
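The difference between the explicit ASCII list and \W|_ can be seen in a short sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SymbolCheck
{
    // ASCII punctuation list from the answer, including backslash,
    // apostrophe and quotation mark; the hyphen is last so it stays literal.
    private static readonly Regex AsciiSymbol =
        new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]");

    // Anything outside \w, plus the underscore; this also covers
    // whitespace and control characters.
    private static readonly Regex NonWordOrUnderscore = new Regex(@"\W|_");

    public static bool HasAsciiSymbol(string s) => s != null && AsciiSymbol.IsMatch(s);

    public static bool HasNonWordOrUnderscore(string s) =>
        s != null && NonWordOrUnderscore.IsMatch(s);
}
```

A string such as "a b" contains no character from the ASCII list but is still flagged by \W|_, because the space is not a word character.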

Regular Expression for alphanumeric and space

What is the regular exp for a text that can't contain any special characters except space?
Because Prajeesh only wants to match spaces, \s will not suffice as it matches all whitespace characters including line breaks and tabs.
A character set that should universally work across all RegEx parsers is:
[a-zA-Z0-9 ]
Further control depends on your needs. Word boundaries, multi-line support, etc... I would recommend visiting Regex Library which also has some links to various tutorials on how Regular Expression Parsing works.
[\w\s]*
\w will match [A-Za-z0-9_] and the \s will match whitespaces.
[\w ]* should match what you want.
Assuming "special characters" means anything that's not a letter or digit, and "space" means the space character (ASCII 32):
^[A-Za-z0-9 ]+$
You need @"^[A-Za-z0-9 ]+$". The \s character class matches things other than space (such as tab), and since you want to make sure that no part of the string has other characters, you should anchor it with ^ and $.
If you just want letters and spaces, then you can use @"[A-Za-z\s]+" to match at least one letter or whitespace character. You could also use @"[A-Za-z ]+" with a literal space instead of \s.
Otherwise please clarify.
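A minimal sketch of the anchored pattern in use, assuming "space" means only the ASCII space character (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlainTextCheck
{
    // ASCII letters, digits and the space character only; at least one character.
    private static readonly Regex Pattern = new Regex(@"^[A-Za-z0-9 ]+$");

    public static bool IsPlain(string text) =>
        text != null && Pattern.IsMatch(text);
}
```

By contrast, a class built from \w and \s would also accept underscores, tabs and, in .NET, non-ASCII letters.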
In C#, I'd believe it's ^(\w|\s)*$
