Strip non ascii chars but allow currency symbols - c#

I am using the regex below to strip all non-ASCII characters from a string.
String pattern = @"[^\u0000-\u007F]";
Regex rx = new Regex(pattern, RegexOptions.Compiled);
rx.Replace(data," ");
However, I want to allow currency symbols (such as the pound symbol) and trademark symbols.
I have modified the regex as shown below and it works for me. Can anyone confirm that the regex is valid?
String pattern = @"[^\u0000-\u007F \p{Sc}]";
Basically, I want to allow all currency symbols too.

Yes, your regex is correct.
What your code does is replace every character matched by your regular expression with a space.
Now, what characters does your regular expression match?
Anything except:
The range you specified: 0000-007F
Currency symbol characters: \p{Sc}. See http://regular-expressions.info/unicode.html#prop
If you want to allow some other characters as well, yes, you can add them to the character class too (exactly like you did with \p{Sc}).
Edit:
Be careful when doing it in the future. The regex would really be [^\u0000-\u007F\p{Sc}] (no space), although in this case it doesn't matter since the space character was already in the ASCII range.
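A minimal sketch of the above, assuming you also want to keep the trademark sign mentioned in the question. The class name AsciiFilter and the method name Clean are made up for illustration. Note that \p{Sc} covers currency symbols only; the trademark sign (U+2122) belongs to a different Unicode category, so it is listed explicitly here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AsciiFilter
{
    // Matches anything that is NOT ASCII, NOT a currency symbol (\p{Sc}),
    // and NOT the trademark sign U+2122 (added explicitly, assumption).
    private static readonly Regex Disallowed =
        new Regex(@"[^\u0000-\u007F\p{Sc}\u2122]", RegexOptions.Compiled);

    // Returns a copy of data with every disallowed character replaced by a space.
    public static string Clean(string data)
    {
        return Disallowed.Replace(data, " ");
    }
}
```

Regex.Replace returns a new string and does not modify its argument, so the result has to be assigned, e.g. data = AsciiFilter.Clean(data);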

Related

Writing a proper regex to allow number and only combinations of letters and numbers mixed up

I have a string example which looks like this:
51925120851209567
The length of the string and the numbers may vary; however, I want the string to contain either only numbers or a combination of letters and numbers. For example, a valid one would be something like this:
B0031Y4M8S // contains combination of letters and numbers without white space
An invalid one would be:
Does not apply // this one contains white spaces and has only letters
To summarize things up, the regex should allow only these combinations:
51925120851209567 // contains only numbers and is valid
B0031Y4M8S // contains combination of numbers and letters and is valid as well
Everything else is invalid...
The current solution that I have only covers the case where the string is a set of digits and nothing else. However, I'm not sure how to also accept a combination of numbers and letters, without white space or special characters, as valid.
Regex regex = new Regex("^[0-9]+$");
if (regex.IsMatch(parameter))
{
// allow if statement to pass if the regex matches
}
Can someone help me out ?
You may use
^(?![A-Za-z]+$)[0-9A-Za-z]+$
It matches 1+ alphanumeric chars but fails the match if the whole string consists of just letters.
Details
^ - start of a string
(?![A-Za-z]+$) - a negative lookahead that fails the match if there are 1+ ASCII letters followed with the end of string immediately to the right of the current location
[0-9A-Za-z]+ - 1+ ASCII letters or digits
$ - end of string.
See the regex demo.
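As a rough sketch, the lookahead pattern can be wrapped like this. The class name CodeValidator and the method name IsValid are hypothetical; only the pattern comes from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer.
public static class CodeValidator
{
    // (?![A-Za-z]+$) rejects letters-only input; [0-9A-Za-z]+ allows
    // digits-only input or any mix of ASCII letters and digits.
    private static readonly Regex DigitsOrMixed =
        new Regex(@"^(?![A-Za-z]+$)[0-9A-Za-z]+$");

    public static bool IsValid(string parameter)
    {
        return DigitsOrMixed.IsMatch(parameter);
    }
}
```

With this, "51925120851209567" and "B0031Y4M8S" pass, while "Does not apply" is rejected (it contains spaces) and so is a letters-only string such as "ABC".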
@The fourth bird's answer will almost get you there. I'm no regex expert, but an easy way to get you what you want would be to use:
Regex regex = new Regex("^[a-zA-Z0-9]+$");
This will get you the first level of exclusion. If it passes that, then check with:
Regex regex = new Regex("^[a-zA-Z]+$");
If it matches that, then you know it's only alphabetical characters and you can skip it. I'm sure there's a better way to code golf this one out, but this should work for now if you're in a crunch.

Regex Expression Only Numbers and Characters

I created the following regex for my C# file. Basically, I want the user's input to contain only regular characters (A-Z, lower or upper case) and numbers (no spaces or symbols).
[a-zA-Z0-9]
For some reason it only fails when the input is a symbol on its own. If there are characters mixed in with it, the expression passes.
I can show you the code where I implement it, but I think the problem is my expression.
Thanks!
The problem is that it can match anywhere. You need anchors:
^[a-zA-Z0-9]+\z
^ matches the start of a string, and \z matches the end of a string.
(Note: in .NET regex, $ matches the end of a string with an optional newline.)
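To make the difference between $ and \z concrete, here is a small sketch. The class and method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two end anchors.
public static class AnchorDemo
{
    // $ also matches just before a single trailing newline.
    public static bool MatchesWithDollar(string input)
    {
        return Regex.IsMatch(input, @"^[a-zA-Z0-9]+$");
    }

    // \z matches only at the very end of the string.
    public static bool MatchesWithZ(string input)
    {
        return Regex.IsMatch(input, @"^[a-zA-Z0-9]+\z");
    }
}
```

For "abc123" both methods return true and for "abc!" both return false. The difference shows up with "abc123\n": the $ version accepts it, the \z version does not.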
This is because it will match any single alphanumeric character anywhere in the string. You need the following, which forces it to match the entire string, not just part of it:
^[0-9a-zA-Z]*$
That regex will match every single alphanumeric character in the string as separate matches.
If you want to make sure the whole string the user entered only has alphanumeric characters you need to do something like:
^[a-zA-Z0-9]+$
Are you making sure to check the whole string? That is, are you using an expression like
^[a-zA-Z0-9]*$
where ^ means the start of the string and $ means the end of the string?

C# regex for assembly style hex numbers

I'm new to regex and I want to highlight hexadecimal numbers in Assembly style. Like this:
$00
$FF
$1234
($00)
($00,x)
and even hexadecimal numbers that begin with #.
So far I wrote "$[A-Fa-f0-9]+" to see if it highlights numbers beginning with $ but it doesn't. Why? And can someone help me with what I'm doing? Thanks.
Put a backslash before the $ and your regex will work:
\$[A-Fa-f0-9]+
$ is a regex metacharacter that matches the end of the string. So if your pattern contains a literal dollar sign, you need to escape it. See a regex reference for details.
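A short sketch of the escaped pattern in use, collecting every match in a line so it can be highlighted. HexHighlighter and FindHexNumbers are hypothetical names, and the sample assembly line in the note below is made up.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that finds $-prefixed hex numbers in a line of text.
public static class HexHighlighter
{
    // \$ is a literal dollar sign; without the backslash, $ would be the
    // end-of-string anchor and the pattern could never match.
    private static readonly Regex HexNumber = new Regex(@"\$[A-Fa-f0-9]+");

    public static List<string> FindHexNumbers(string line)
    {
        var found = new List<string>();
        foreach (Match m in HexNumber.Matches(line))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the line "LDA ($00,x) ; then $FF and $1234" this finds $00, $FF and $1234. Each Match also exposes Index and Length, which is what a highlighter would normally use.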
This should cover all those cases, including the cases in which you get a # instead of a $
public Regex MyRegex = new Regex(
"^(\\()?[\\$#][0-9a-fA-F]+(,x)?(?(1)\\))[\\s]*$",
RegexOptions.Singleline
| RegexOptions.Compiled
);
The unescaped sequence for the single line: ^(\()?[\$#][0-9a-fA-F]+(,x)?(?(1)\))[\s]*$
That should validate on a per-line match.
By the way, I made this regex pretty quickly using Expresso

Regex Expressions for all non alphanumeric symbols

I am trying to make a regular expression for a string that has at least one non-alphanumeric symbol in it.
The code I am trying to use is
Regex symbolPattern = new Regex("?[!@#$%^&*()_-+=[{]};:<>|./?.]");
I'm trying to match only one of !@#$%^&*()_-+=[{]};:<>|./?. but it doesn't seem to be working.
If you want to match non-alphanumeric symbols then just use \W|_.
Regex pattern = new Regex(@"\W|_");
This will match any character that is not a letter or a digit. Information on the \W character class and others is available here (C# Regex Cheat Sheet).
https://www.mikesdotnetting.com/article/46/c-regular-expressions-cheat-sheet
You could also avoid regular expressions if you want:
return s.Any(c => !char.IsLetterOrDigit(c));
Can you check for the opposite condition?
Match match = Regex.Match(input, @"^[a-zA-Z0-9]+$"); // input is the string being tested
if (match.Success) {
// it's alphanumeric
} else {
// it has one of those characters in it.
}
I didn't get your entire question, but this regex will match strings that contain at least one non-alphanumeric character. That includes whitespace (I couldn't see that in your list, though).
[^\w]+
Your regex just needs a little tweaking. The hyphen is used to form ranges like A-Z, so if you want to match a literal hyphen, you either have to escape it with a backslash or move it to the end of the list. You also need to escape the square brackets because they're the delimiters of a character class. Then get rid of that question mark at the beginning and you're in business.
Regex symbolPattern = new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,-]");
If you only want to match ASCII punctuation characters, this is probably the simplest way. \W matches whitespace and control characters in addition to punctuation, and it matches them from the entire Unicode range, not just ASCII.
You seem to be missing a few characters, though: the backslash, apostrophe and quotation mark. Adding those gives you:
@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]"
Finally, it's a good idea to always use C#'s verbatim string literals (@"...") for regexes; it saves you a lot of hassle with backslashes. Quotation marks are escaped by doubling them.
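Here is a small sketch comparing the explicit ASCII punctuation class with the non-regex check suggested earlier in the thread. SymbolCheck and its method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing two ways of detecting symbols.
public static class SymbolCheck
{
    // Explicit list of ASCII punctuation; the hyphen is last so it is literal,
    // and "" inside a verbatim string stands for one quotation mark.
    private static readonly Regex AsciiSymbol =
        new Regex(@"[!@#$%^&*()_+=\[{\]};:<>|./?,\\'""-]");

    // True if s contains at least one character from the list above.
    public static bool HasAsciiSymbol(string s)
    {
        return AsciiSymbol.IsMatch(s);
    }

    // True if s contains any character that is not a letter or digit,
    // which includes whitespace and non-ASCII punctuation.
    public static bool HasNonAlphanumeric(string s)
    {
        return s.Any(c => !char.IsLetterOrDigit(c));
    }
}
```

The two checks disagree on input such as "a b": the space is not in the explicit list, so HasAsciiSymbol returns false, while HasNonAlphanumeric returns true. Which behaviour is wanted depends on whether whitespace should count as a symbol.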

C# Regular Expression to match letters, numbers and underscore

I am trying to create a regular expression pattern in C#. The pattern can only allow for:
letters
numbers
underscores
So far I am having little luck (I'm not good at regex). Here is what I have tried thus far:
// Create the regular expression
string pattern = @"\w+_";
Regex regex = new Regex(pattern);
// Compare a string against the regular expression
return regex.IsMatch(stringToTest);
EDIT :
@"^[a-zA-Z0-9\_]+$"
or
@"^\w+$"
@"^\w+$"
\w matches any "word character", defined as digits, letters, and underscores. It's Unicode-aware so it'll match letters with umlauts and such (better than trying to roll your own character class like [A-Za-z0-9_] which would only match English letters).
The ^ at the beginning means "match the beginning of the string here", and the $ at the end means "match the end of the string here". Without those, e.g. if you just had @"\w+", then "##Foo##" would match, because it contains one or more word characters. With the ^ and $, then "##Foo##" would not match (which sounds like what you're looking for), because you don't have beginning-of-string followed by one-or-more-word-characters followed by end-of-string.
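A quick sketch of the difference between \w and a hand-rolled ASCII class, with both patterns anchored. WordCheck and its method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper contrasting Unicode-aware \w with an ASCII-only class.
public static class WordCheck
{
    // \w is Unicode-aware in .NET by default.
    public static bool IsUnicodeWord(string s)
    {
        return Regex.IsMatch(s, @"^\w+$");
    }

    // ASCII-only equivalent: English letters, digits and underscore.
    public static bool IsAsciiWord(string s)
    {
        return Regex.IsMatch(s, @"^[A-Za-z0-9_]+$");
    }
}
```

"Foo_1" passes both checks and "##Foo##" fails both. A name like "Müller_1" passes IsUnicodeWord but fails IsAsciiWord. If you want \w itself to be ASCII-only, passing RegexOptions.ECMAScript is one way to get that behaviour.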
Try experimenting with something like http://www.weitz.de/regex-coach/ which lets you develop regex interactively.
It's designed for Perl, but helped me understand how a regex works in practice.
Regex packedasciiRegex = new Regex(@"^[!#$%&'()*+,-./:;?@[\]^_]*$");
