C# Regular Expression to match letters, numbers and underscore - c#

I am trying to create a regular expression pattern in C#. The pattern can only allow for:
letters
numbers
underscores
So far I am having little luck (I'm not good at RegEx). Here is what I have tried thus far:
// Create the regular expression
string pattern = @"\w+_";
Regex regex = new Regex(pattern);
// Compare a string against the regular expression
return regex.IsMatch(stringToTest);

EDIT :
@"^[a-zA-Z0-9_]+$"
or
@"^\w+$"

@"^\w+$"
\w matches any "word character", defined as digits, letters, and underscores. It's Unicode-aware so it'll match letters with umlauts and such (better than trying to roll your own character class like [A-Za-z0-9_] which would only match English letters).
The ^ at the beginning means "match the beginning of the string here", and the $ at the end means "match the end of the string here". Without those, e.g. if you just had @"\w+", then "##Foo##" would match, because it contains one or more word characters. With the ^ and $, "##Foo##" would not match (which sounds like what you're looking for), because you don't have beginning-of-string followed by one-or-more-word-characters followed by end-of-string.
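The difference between the anchored and unanchored patterns can be sketched as follows. This is only an illustration: the class and method names (WordPatternDemo, IsWholeWord, ContainsWord) are made up for the example and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // Anchored: the whole string must consist of word characters.
    private static readonly Regex Anchored = new Regex(@"^\w+$");

    // Unanchored: succeeds if a run of word characters appears anywhere.
    private static readonly Regex Unanchored = new Regex(@"\w+");

    // Hypothetical helper names, chosen only for this sketch.
    public static bool IsWholeWord(string s) => Anchored.IsMatch(s);

    public static bool ContainsWord(string s) => Unanchored.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsWholeWord("Foo_123"));          // True
        Console.WriteLine(IsWholeWord("##Foo##"));          // False
        Console.WriteLine(ContainsWord("##Foo##"));         // True
        // "F\u00FC\u00DFe" contains u-umlaut and sharp s; \w is Unicode-aware.
        Console.WriteLine(IsWholeWord("F\u00FC\u00DFe"));   // True
    }
}
```

One thing to keep in mind: in .NET, $ also matches just before a trailing newline, so "Foo\n" passes the anchored check. If that matters, \z can be used in place of $.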

Try experimenting with something like http://www.weitz.de/regex-coach/ which lets you develop regex interactively.
It's designed for Perl, but helped me understand how a regex works in practice.

Regex packedasciiRegex = new Regex(@"^[!#$%&'()*+,-./:;?#[\]^_]*$");

Related

Using Regular Expressions to replace patterns in C#

I'm a little too new to RegEx's so this is mostly asking for help with specific pattern matching and a little with how to implement them in C#.
I have a large Excel file full of, among other things, repeated addresses that are written in different styles. Most are abbreviations of words like Avenue, etc.
For the simple ones I looked up the string.replace() function:
address = address.Replace("Av ", "Av. ");
That does the trick there and for some others. But if I want to replace the word "Ave", I run into the possibility of it being part of another word (some addresses are in Spanish, so this is likely to happen). I thought about including whitespace before and after (" ave "), but would that work if it's the first word in the string?
Or should I use a pattern like (this might be wrong too)
^[0-9a-zA-Z_#' ](Ave)\w //the word is **not** preceded by any character other than a whitespace and is followed by a whitespace
For Expressions such as those, I should use something along this pattern, right?
string replacement = "Av.";
Regex rgx = new Regex(@"^[0-9a-zA-Z_#' ](Ave)\w");
string result = rgx.Replace(input, replacement);
Thanks
Regular expressions have a nifty tool for this: the \b anchor. It matches at word boundaries, so Ave\b would only match Ave when it is followed by a space, a dot, the end of the string, or anything else that is not a word character.
Read all about word boundaries here: http://www.regular-expressions.info/wordboundaries.html
BTW, that site is THE place to go to learn about regular expressions.
Also, if you were to do it in the way you try, it could be something like this: [^\w]Ave[^\s]
That literally is: Not a word character (a-z, A-Z, 0-9 or _), then Ave, then not a space character (tab, space, linebreak etc.).
Also you could use the shorthand for [^\w] and [^\s] which are \W and \S so it would then become \WAve\S
But the \b way is better.
Add the word delimiter to your regex,
Regex.Match(content, @"\b(Ave)\b");
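Putting the word-boundary advice together with Regex.Replace, a replacement along these lines might work. It is a sketch under assumptions: the class name, the NormalizeAve helper and the sample addresses are invented, and the optional \.? is an addition (not in the answers above) so that an existing "Ave." does not turn into "Av..".

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressAbbreviationDemo
{
    // \b on both sides keeps "Ave" from matching inside longer words
    // such as "Avenida". The optional \.? (an assumption of this sketch)
    // absorbs a dot that is already present.
    private static readonly Regex AveWord = new Regex(@"\bAve\b\.?");

    // Hypothetical helper name, used only for illustration.
    public static string NormalizeAve(string address) =>
        AveWord.Replace(address, "Av.");

    public static void Main()
    {
        Console.WriteLine(NormalizeAve("Ave Maria 12"));        // Av. Maria 12
        Console.WriteLine(NormalizeAve("Ave. Libertad 3"));     // Av. Libertad 3
        Console.WriteLine(NormalizeAve("123 Avenida Central")); // 123 Avenida Central
    }
}
```

Because \b also matches at the start of the string, the first-word case from the question is covered without any special handling.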

Strip non ascii chars but allow currency symbols

I am using the regex below to strip all non-ASCII characters from a string.
String pattern = @"[^\u0000-\u007F]";
Regex rx = new Regex(pattern, RegexOptions.Compiled);
data = rx.Replace(data, " ");
However, I want to allow currency symbols (such as the pound sign) and trademark symbols.
I have modified the regex as shown below and it works for me. Can anyone confirm that the regex is valid?
String pattern = @"[^\u0000-\u007F \p{Sc}]";
Basically, I want to allow all currency symbols too.
Yes, your regex is correct.
What your code does is replace every character matched by the regular expression with a space.
Now, what characters does your regular expression match?
Anything except:
The range you specified: 0000-007F
Currency symbol characters: \p{Sc}. See http://regular-expressions.info/unicode.html#prop
If you want to allow some other characters as well, you can add them too (exactly like you did with \p{Sc}).
Edit:
Be careful when doing it in the future. The regex would really be [^\u0000-\u007F\p{Sc}] (no space), although in this case it doesn't matter since the space character was already in the ASCII range.
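As a rough sketch of the whole filter, the following keeps ASCII, currency symbols and the trademark sign, and replaces everything else with a space. The class and method names are made up, and adding \u2122 for the trademark sign is an assumption based on the question's wording (it is not covered by \p{Sc}).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilterDemo
{
    // Matches any character that is neither ASCII, nor a currency symbol
    // (\p{Sc}), nor the trademark sign U+2122 (added as an assumption).
    private static readonly Regex Disallowed =
        new Regex(@"[^\u0000-\u007F\p{Sc}\u2122]", RegexOptions.Compiled);

    // Replace returns a new string; the input itself is not modified.
    public static string Clean(string data) => Disallowed.Replace(data, " ");

    public static void Main()
    {
        // Pound sign, "5 caf", e-acute, trademark sign.
        string input = "\u00A35 caf\u00E9\u2122";
        string cleaned = Clean(input);
        // The e-acute becomes a space; the pound and trademark signs survive.
        Console.WriteLine(cleaned == "\u00A35 caf \u2122"); // True
    }
}
```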

Is there a lowercase expression for .Net regular expressions?

Perl has the \u operator to lowercase a match when using string replacement and regular expressions. Does .Net have anything similar? For example, uppercase all words that start with a <
s/<\(\w*\)/<\U\1/
The way to do this kind of thing in .NET is to use the MatchEvaluator parameter:
string pattern = @"<(\w*)";
string replaced = Regex.Replace(line, pattern,
x => "<" + x.Groups[1].ToString().ToUpper());
This reads: Whenever you find the regular expression, replace it with the first group uppercased.
You've got some errors in your Perl code. In both Perl and .NET regexes, \( and \) match the literal characters, ( and ); to use parentheses as grouping operators, leave the backslashes off. Also, \u does not lowercase a match, it titlecases (usually the same as uppercasing) the next character. What you're thinking of is \L, which lowercases all characters until the end of the string or \E, whichever comes first.
In Perl, \U, \L and such aren't really a regex feature, they're a string feature, like the more common escape sequences \n, \t, etc. They're listed redundantly in the regex docs because they're especially useful in regex substitutions. C# has no equivalent for them, either in string literals or the regex classes, but as @steinar pointed out, it does have MatchEvaluator and (since C# 3.0) lambda expressions:
string s = "ABC<XYZ!";
Console.WriteLine(Regex.Replace(s, @"<(\w+)", m => m.Value.ToLower()));
output:
ABC<xyz!
edit: The parentheses aren't really necessary in my example, but I left them in to demonstrate their proper use as grouping operators. I also changed the original \w* to \w+; there's no point matching zero word characters when your only goal is to change the case of word characters.
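For completeness, here is a small self-contained sketch that uses the capture group in the evaluator, covering both the uppercase case from the question and the lowercase case from the example. The class and method names are hypothetical, and the invariant-culture casing methods are a choice made for this sketch rather than something the answers prescribe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseReplaceDemo
{
    // Group 1 captures the word that follows "<".
    private static readonly Regex TagName = new Regex(@"<(\w+)");

    // The lambda is converted to a MatchEvaluator and runs once per match.
    public static string UppercaseTagNames(string line) =>
        TagName.Replace(line, m => "<" + m.Groups[1].Value.ToUpperInvariant());

    public static string LowercaseTagNames(string line) =>
        TagName.Replace(line, m => "<" + m.Groups[1].Value.ToLowerInvariant());

    public static void Main()
    {
        Console.WriteLine(UppercaseTagNames("abc<xyz!")); // abc<XYZ!
        Console.WriteLine(LowercaseTagNames("ABC<XYZ!")); // ABC<xyz!
    }
}
```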

Regex Expressions for all non alphanumeric symbols

I am trying to make a regular expression for a string that has at least 1 non alphanumeric symbol in it
The code I am trying to use is
Regex symbolPattern = new Regex("?[!@#$%^&*()_-+=[{]};:<>|./?.]");
I'm trying to match only one of !@#$%^&*()_-+=[{]};:<>|./?. but it doesn't seem to be working.
If you want to match non-alphanumeric symbols then just use \W|_.
Regex pattern = new Regex(@"\W|_");
This will match any character that is not a letter or a digit. Information on the \W character class and others is available here (C# Regex Cheat Sheet): https://www.mikesdotnetting.com/article/46/c-regular-expressions-cheat-sheet
You could also avoid regular expressions if you want:
return s.Any(c => !char.IsLetterOrDigit(c));
Can you check for the opposite condition?
// "input" is a placeholder for the string being tested
Match match = Regex.Match(input, @"^([a-zA-Z0-9]+)$");
if (match.Success) {
// it's alphanumeric
} else {
// it has one of those characters in it.
}
I didn't get your entire question, but this regex will match strings that contain at least one non-alphanumeric character. That includes whitespace (I couldn't see that in your list, though).
[^\w]+
Your regex just needs a little tweaking. The hyphen is used to form ranges like A-Z, so if you want to match a literal hyphen, you either have to escape it with a backslash or move it to the end of the list. You also need to escape the square brackets because they're the delimiters for a character class. Then get rid of that question mark at the beginning and you're in business.
Regex symbolPattern = new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,-]");
If you only want to match ASCII punctuation characters, this is probably the simplest way. \W matches whitespace and control characters in addition to punctuation, and it matches them from the entire Unicode range, not just ASCII.
You seem to be missing a few characters, though: the backslash, apostrophe and quotation mark. Adding those gives you:
@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]"
Finally, it's a good idea to always use C#'s verbatim string literals (@"...") for regexes; it saves you a lot of hassle with backslashes. Quotation marks are escaped by doubling them.
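To see how the explicit character class compares with the LINQ check suggested earlier, something like the following could be used. The class and method names are invented for this sketch; the pattern is the extended one given above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolCheckDemo
{
    // Explicit list of ASCII symbols; the hyphen is last so it is literal,
    // and "" is how a verbatim string spells a single quotation mark.
    private static readonly Regex Symbol =
        new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]");

    // True if the string contains at least one of the listed symbols.
    public static bool HasListedSymbol(string s) => Symbol.IsMatch(s);

    // Regex-free alternative: true for anything that is not a letter or
    // digit, which also includes whitespace.
    public static bool HasNonAlphanumeric(string s) =>
        s.Any(c => !char.IsLetterOrDigit(c));

    public static void Main()
    {
        Console.WriteLine(HasListedSymbol("abc123"));   // False
        Console.WriteLine(HasListedSymbol("hello!"));   // True
        Console.WriteLine(HasListedSymbol("a b"));      // False: space is not listed
        Console.WriteLine(HasNonAlphanumeric("a b"));   // True: space is not alphanumeric
    }
}
```

The two checks disagree on whitespace, so which one fits depends on whether a space should count as a "symbol".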

.NET RegEx for letters and spaces

I am trying to create a regular expression in C# that allows only alphanumeric characters and spaces. Currently, I am trying the following:
string pattern = @"^\w+$";
Regex regex = new Regex(pattern);
if (regex.IsMatch(value) == false)
{
// Display error
}
What am I doing wrong?
If you just need English, try this regex:
"^[A-Za-z ]+$"
The brackets specify a set of characters
A-Z: All capital letters
a-z: All lowercase letters
' ': Spaces
If you need unicode / internationalization, you can try this regex:
@"^[\p{L}\s]+$"
See https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#word-character-w
This regex will match all unicode letters and spaces, which may be more than you need, so if you just need English / basic Roman letters, the first regex will be simpler and faster to execute.
Note that in both regexes I have included the ^ and $ anchors, which mean match at the start and end. If you need to pull this out of a string and it doesn't need to be the entire string, you can remove those two anchors.
Try this for all letters and spaces:
@"^[\p{L} ]+$"
The character class \w does not match spaces. Try replacing it with [\w ] (there's a space after the \w, so the class matches word characters and spaces). You could also replace the space with \s if you want to match any whitespace.
If, other than 0-9, a-z and A-Z, you also need to cover accented letters like ï, é, æ, Ć or Ş, then you should use the Unicode categories \p{...} for matching, i.e. (note the space):
string pattern = @"^[\p{L}\p{Nd} ]+$";
This regex works great for me.
Regex rgx = new Regex("[^a-zA-Z0-9_ ]+");
if (rgx.IsMatch(yourstring))
{
var err = "Special characters are not allowed in Tags";
}
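The ASCII-only and Unicode-aware variants discussed in this thread can be compared side by side. This is a sketch only: the class and method names are hypothetical, and digits are included in both classes because the question asks for alphanumeric characters and spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LettersAndSpacesDemo
{
    // English letters, ASCII digits and the space character only.
    private static readonly Regex AsciiOnly = new Regex(@"^[A-Za-z0-9 ]+$");

    // Any Unicode letter (\p{L}) or decimal digit (\p{Nd}), plus the space.
    private static readonly Regex UnicodeAware = new Regex(@"^[\p{L}\p{Nd} ]+$");

    public static bool IsAsciiAlnumOrSpace(string value) => AsciiOnly.IsMatch(value);

    public static bool IsUnicodeAlnumOrSpace(string value) => UnicodeAware.IsMatch(value);

    public static void Main()
    {
        Console.WriteLine(IsAsciiAlnumOrSpace("Hello World 42"));   // True
        Console.WriteLine(IsAsciiAlnumOrSpace("caf\u00E9"));        // False: e-acute is not ASCII
        Console.WriteLine(IsUnicodeAlnumOrSpace("caf\u00E9"));      // True
        Console.WriteLine(IsUnicodeAlnumOrSpace("hello_world"));    // False: underscore not allowed
    }
}
```

Unlike \w, neither class accepts the underscore, which is usually what "alphanumeric" is meant to exclude.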
