Regex to check that a string contains only these characters always returns true - C#

I've searched quite a bit but can't work out why my regex is always returning true.
I need to validate that a whole string contains only numbers, letters, spaces, - and _.
I have the ^ and $ to match from the start to the end and the + so it's at least one character.
But it always returns true when I test it with #[]<>/., and so on.
Regex rg = new Regex(@"^[a-zA-Z0-9 -_]+$");
return rg.IsMatch(strToCheck);

You need to escape the hyphen, because at that position inside the character class it forms a range (from space to _).
Regex rg = new Regex(@"^[a-zA-Z0-9 \-_]+$");
Note: Inside a character class the hyphen has special meaning. You can place it as the first or last character of the class. In some regex implementations, you can also place it directly after a range. If you place the hyphen anywhere else, you need to precede it with a backslash to add it to your character class.
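A minimal sketch of the difference, assuming C# 9 or later (top-level statements); `IsValid` is a hypothetical helper name, not something from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern: " -_" is read as a range from space (U+0020) to '_' (U+005F),
// which also covers # [ ] < > / . , and many other symbols.
var original = new Regex(@"^[a-zA-Z0-9 -_]+$");

// Escaped hyphen: space, '-' and '_' are three separate literal characters.
var escaped = new Regex(@"^[a-zA-Z0-9 \-_]+$");

// Hypothetical helper mirroring "return rg.IsMatch(strToCheck);" from the question.
bool IsValid(string strToCheck) => escaped.IsMatch(strToCheck);

Console.WriteLine(original.IsMatch("#[]<>/.,")); // True
Console.WriteLine(IsValid("#[]<>/.,"));          // False
Console.WriteLine(IsValid("my_file-name 2"));    // True
```

One .NET detail worth knowing: $ also matches just before a trailing \n, so "abc\n" passes both patterns; using \z instead of $ rejects it.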

It's because of the - symbol in the middle of the character class. In the middle, - acts as a range operator, i.e. it allows all characters that fall within the range from space to _. To stop - from acting as a range operator, put it first or last inside the character class, or escape it.
@"^[a-zA-Z0-9 _-]+$"
OR
@"^[-a-zA-Z0-9 _]+$"
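To see exactly what the accidental range lets through, a small sketch (C# 9+ top-level statements assumed) can test every printable ASCII character against the range on its own, then check both suggested fixes on made-up inputs:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The accidental range " -_" from the original class, isolated on its own.
var range = new Regex(@"^[ -_]$");

// Collect every printable ASCII character (U+0020..U+007E) the range accepts.
string accepted = new string(
    Enumerable.Range(0x20, 0x5F)
              .Select(c => (char)c)
              .Where(ch => range.IsMatch(ch.ToString()))
              .ToArray());

// Prints everything from space through '_', including # [ ] < > / . ,
Console.WriteLine(accepted);

// Both suggested fixes keep the hyphen literal.
var hyphenLast = new Regex(@"^[a-zA-Z0-9 _-]+$");
var hyphenFirst = new Regex(@"^[-a-zA-Z0-9 _]+$");
Console.WriteLine(hyphenLast.IsMatch("a-b_c"));  // True
Console.WriteLine(hyphenFirst.IsMatch("a#b"));   // False
```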

Related

How to match a string with a regular expression that does not allow two special characters in a row?

I'm trying to match a string that does not allow two special characters in a row.
My regular expression is:
[RegularExpression(@"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
This solves all my requirements except the two issues below.
This is my string: bracks
Acceptable:
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (by the way, the above regular expression handles these)
Not acceptable:
issue 1: b.. or bra..cks, b..racks, bra...cks (two or more special characters together)
issue 2: bra  cks (two or more whitespace characters together)
You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
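A quick way to try the lookahead version in C#, sketched with top-level statements (C# 9+ assumed); `IsAcceptable` is a hypothetical name and the inputs come from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: the lookahead rejects the whole string up front
// if any two characters from [.&' -] appear next to each other.
const string Pattern = @"^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$";

bool IsAcceptable(string s) => Regex.IsMatch(s, Pattern);

foreach (var s in new[] { "bra-cks", "b.r.a.c.ks", "bra cks", "bra..cks", "bra  cks", "b.-racks" })
    Console.WriteLine($"{s,-12} {IsAcceptable(s)}");
```

Unlike the question's original pattern, this one does not force the first and last characters to be letters or digits, so a string such as "-bracks" is accepted too.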
The goal
From what I can tell from your description and pattern, you are trying to match text that starts and ends with an alphanumeric character (due to ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ in your original pattern), and inside it you don't want any two consecutive (adjacent) special characters, which, again guessing from the regex, are . & ' -
What was wrong
^+ - I think here you wanted to ensure that the match starts at the beginning of the line/string, so you don't need + here
[a-zA-Z0-9.&' '-] - in this character class you doubled ' which is totally unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more occurrences of the preceding pattern
$ - anchor, match the end of the string
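The same sample inputs can be run against this pattern, as a sketch (C# 9+ top-level statements assumed; `Check` is a hypothetical helper):

```csharp
using System;
using System.Text.RegularExpressions;

// Alphanumeric first and last; in between, the lookahead refuses to consume a
// character when it and the next character are both special characters.
var pattern = new Regex(@"^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$");

bool Check(string s) => pattern.IsMatch(s);

foreach (var s in new[] { "bra-cks", "b.r.a.c.ks", "bra cks", "bra..cks", "bra  cks", "-bracks", "bracks." })
    Console.WriteLine($"{s,-12} {Check(s)}");
```

Because the pattern has one [a-zA-Z0-9] at each end, it needs at least two characters; a single letter such as "a" is rejected.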
Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
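A sketch for checking this corrected pattern against the question's examples (C# 9+ top-level statements assumed; the quote is left out, as in the pattern above):

```csharp
using System;
using System.Text.RegularExpressions;

// One or more alphanumerics, then any number of "single separator + alphanumerics" groups,
// so a separator can never be followed directly by another separator.
var rx = new Regex(@"^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$");

string[] good = { "bracks", "bra-cks", "b-r-a-c-ks", "b.r.a.c.ks", "bra cks" };
string[] bad  = { "b..", "bra..cks", "b..racks", "bra...cks", "bra  cks" };

foreach (var s in good) Console.WriteLine($"{s,-12} {rx.IsMatch(s)}"); // all True
foreach (var s in bad)  Console.WriteLine($"{s,-12} {rx.IsMatch(s)}"); // all False
```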
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
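A short sketch of how the follow-up pattern treats spaces around a delimiter (C# 9+ top-level statements assumed; the sample strings are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

// Between alphanumeric runs: either a single space, or one of . & - with an
// optional single space on each side.
var rx = new Regex(@"^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$");

foreach (var s in new[] { "Tom & Jerry", "bra - cks", "bra cks", "bra  cks", "bra .. cks", "bra & & cks" })
    Console.WriteLine($"{s,-12} {rx.IsMatch(s)}");
```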

Why does the regular expression [^%()*+-\/=?#[\\]ªº´`¿'.]* give me true with the comma (,)?

I have a problem with the regular expression [^%()*+-\/=?#[\\]ªº´`¿'.]*.
I want to disallow the characters inside it. The regular expression works, but when I enter something like DAVID, SC, I can save the form because it has a comma, even though that character is not inside the regular expression.
Could you help me please?
You are not accounting for the special meaning of - inside a character class [.....].
You must either place the dash at the very end, or else escape it with a backslash:
[^%()*+\/=?#\[\]ªº´¿'.-]*
In your original regex, +-\/ disallows any characters between + and / in the ASCII table; these are the comma, dot and dash. Your example input contains a comma so the regex did not match all of the input at once.
I have also fixed the escaping for the [] characters from [\\] to \[\], which I presume was a mistake.
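A sketch that makes the accidental range visible and compares it with the corrected class (C# 9+ top-level statements assumed). The ^ and $ anchors, and the \[\] escaping in both patterns, are added for this sketch, standing in for a check that must cover the whole input:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Just the accidental range from the original class: '+' (U+002B) through '/' (U+002F).
var accidental = new Regex(@"^[+-\/]$");
string covered = new string(Enumerable.Range(0x20, 0x5F)
    .Select(c => (char)c)
    .Where(ch => accidental.IsMatch(ch.ToString()))
    .ToArray());
Console.WriteLine(covered); // +,-./

// Whole-input checks (anchors added for this sketch).
var withRange = new Regex(@"^[^%()*+-\/=?#\[\]ªº´¿'.]*$"); // hyphen still forms a range
var corrected = new Regex(@"^[^%()*+\/=?#\[\]ªº´¿'.-]*$"); // hyphen moved to the end

Console.WriteLine(withRange.IsMatch("DAVID, SC")); // False: ',' lies inside + .. /
Console.WriteLine(corrected.IsMatch("DAVID, SC")); // True
```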
Because you're using * in [^%()*+\/=?#[\\]ªº´¿'.-]* without line start/end anchors. * means zero or more of the preceding item, so your regex can even match an empty string.
Use this regex:
^[^%()*+\/=?#\[\]ªº´¿'.-]+$
PS: The hyphen - should be either first or last in the character class to avoid escaping it.
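A sketch of why + and the anchors matter, assuming C# 9+ top-level statements and using the corrected character class (hyphen last, brackets escaped); the inputs are made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored with *: some (possibly empty) match is always found, so IsMatch is always true.
var loose = new Regex(@"[^%()*+\/=?#\[\]ªº´¿'.-]*");
// Anchored with +: every character must be allowed, and at least one is required.
var strict = new Regex(@"^[^%()*+\/=?#\[\]ªº´¿'.-]+$");

Console.WriteLine(loose.IsMatch("50%"));        // True
Console.WriteLine(strict.IsMatch("50%"));       // False
Console.WriteLine(strict.IsMatch(""));          // False
Console.WriteLine(strict.IsMatch("DAVID, SC")); // True
```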

Regular Expression C# IsMatch()

I try to use regular expression to check if a string contains only: 0-9, A-Z, a-z, \, / or -.
I used Regex validator = new Regex(@"[0-9a-zA-Z\-/]*"); and no matter what string I enter, it is valid.
The check looks like this: if(!validator.IsMatch(myString))
What's wrong?
If I understand what you want, I believe your pattern should be
new Regex(@"^[0-9a-zA-Z\\\-/]*$");
The ^ and $ symbols are anchors that match the beginning and end of the string, respectively. Without those, the pattern would match any strings that contain any character in that class. With them, it matches strings that only contain characters in that class.
You also specified you wanted to include backslash characters, but the original pattern had \- in the character class. This is simply an escape sequence for the hyphen within the character class. To also include backslash in the character class you need to specify that separately (escaped as well). So the resulting character class has \\ (backslash) followed by \- (hyphen).
Now, this will still match empty strings because * means "zero-or-more". If you want to match only non-empty strings, use:
new Regex(@"^[0-9a-zA-Z\\\-/]+$");
The + means "one-or-more".
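A sketch contrasting the question's unanchored pattern with the anchored one from this answer (C# 9+ top-level statements assumed; the sample strings are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Question's pattern: no anchors and *, so it matches (possibly empty) text anywhere.
var original = new Regex(@"[0-9a-zA-Z\-/]*");
// Answer's pattern: \\ is a backslash, \- a hyphen, / a slash; ^...$ cover the whole string.
var anchored = new Regex(@"^[0-9a-zA-Z\\\-/]+$");

Console.WriteLine(original.IsMatch("!!!"));             // True
Console.WriteLine(anchored.IsMatch(@"dir\sub/file-1")); // True
Console.WriteLine(anchored.IsMatch("!!!"));             // False
Console.WriteLine(anchored.IsMatch(""));                // False
```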
Use + instead of *
new Regex(@"[0-9a-zA-Z\-/]+");
If I write a Regex of the form
"[some character class]*"
it will match every string. Every string contains 0 to many of a character class.
Perhaps you wanted to use
new Regex(@"[0-9a-zA-Z\-/]+")
to specify 1 to many of your character class.
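The "matches every string" point can be checked directly: with *, the match succeeds at index 0 with length 0. A sketch (C# 9+ top-level statements assumed; the inputs are made up):

```csharp
using System;
using System.Text.RegularExpressions;

var star = Regex.Match("%%%", @"[0-9a-zA-Z\-/]*");
Console.WriteLine($"{star.Success} {star.Index} {star.Length}"); // True 0 0

var plus = Regex.Match("%%%", @"[0-9a-zA-Z\-/]+");
Console.WriteLine(plus.Success); // False

// + alone still finds a match inside mixed input; ^ and $ are what restrict the whole string.
Console.WriteLine(Regex.IsMatch("%%a", @"[0-9a-zA-Z\-/]+"));   // True
Console.WriteLine(Regex.IsMatch("%%a", @"^[0-9a-zA-Z\-/]+$")); // False
```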

Are these the proper regex expressions

I am trying to write a few regex strings to use in my syntax highlighter. This is the first time I have ever used them, and I am having a great deal of difficulty...
The first four are, I will have a specified character followed by any number of numbers, match it.
The best I have is "G[0-9]|G[0-9][0-9]|G[0-9][0-9][0-9]" to match either G#, G##, or G###
but I want to do G with any number of numbers after it.
The next three are, I will have a character (X,Y,Z, or P) and I want to match it if there is no letter or symbol behind it
"[X|Y|Z|P][0-9]"
These next few are harder: match "#11.11=11.11" where 1 is a digit and there can be any number of digits between the pound sign, the periods, and the equal sign. The periods are optional, so it can also be "#11=11" or "#1.1=11" or "#11=1.1".
I have no idea... "#[0-9][ |.] ..."
Anything after a " ' " and between a newline
"'[A-Za-z0-9]\n", but I know this only gives me one character...
And the easy one I think is anything between two () or []
"(*) | [*]"?
Quick and dirty, but tested using regexpal
1) G[0-9]{1,3} - the '{1,3}' specifies that the last symbol occurs one to three times.
2) ((.*|)) - you put a '\' before the '(' and ')' as escape characters
3) [0-9]1*(.|)1*=1*(.|)1 - this matches your three examples
4) \'.*\n - I think this should work... '\n' represents a new line char right?
5) (\(|\[).*(\)|\]) - this one has those escape characters again
Again...quick and dirty. Regexpal.com is your friend
1> G[0-9]{1,3}
2> No, it's WRONG.
The correct one is [XYZ][0-9]
(you do not use an OR operator (|); you just write the characters side by side within square brackets)
You should really look up how to use regexes. Having said that:
I will have a specified character followed by any number of numbers, match it
G\d+
I will have a character (X,Y,Z, or P) and I want to match it if there
is no letter or symbol behind it
(?<!\w)[XYZP][0-9]
These next few are harder, make "#11.11=11.11" blue
Huh?
Anything after a " ' " and between a newline
'(.+?)\n
And the easy one I think is anything between two () or []
\(.+?\)|\[.+?\]
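A sketch running these patterns over one made-up line of input (C# 9+ top-level statements assumed; `All` is a hypothetical helper that collects match values):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input for the highlighter.
string line = "G01 X10 Y2 P5 'comment\n(note) [tag]";

// Collects the text of every match of pattern in input.
string[] All(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", All(@"G\d+", line)));               // G01
Console.WriteLine(string.Join(" | ", All(@"(?<!\w)[XYZP][0-9]", line))); // X1 | Y2 | P5
Console.WriteLine(string.Join(" | ", All(@"\(.+?\)|\[.+?\]", line)));    // (note) | [tag]

// The comment text is in capture group 1, without the quote and newline.
Console.WriteLine(Regex.Match(line, @"'(.+?)\n").Groups[1].Value);       // comment
```

Note that (?<!\w)[XYZP][0-9] takes only one digit, so X10 yields X1; [0-9]+ would take the whole number.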
And the easy one I think is anything between two () or []
"(*) | [*]"?
@"\([^(]*\)" and @"\[[^\[]*\]"
It means: an open bracket, then any number of characters that are not an open bracket, then a close bracket.
Backslashes are required to tell the regex engine that the brackets should be treated literally.
@ (verbatim string) tells C#, in turn, that those backslashes should be taken literally and not as C# escape characters.
Anything after a " ' " and between a newline
Similarly: @"'[^']*\n"
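A sketch showing that the verbatim and regular C# string forms produce the same pattern, and what the patterns match on made-up input (C# 9+ top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim string the backslashes reach the regex engine unchanged;
// in a regular string each one must be doubled.
string verbatim = @"\([^(]*\)";
string regular = "\\([^(]*\\)";
Console.WriteLine(verbatim == regular); // True

foreach (Match m in Regex.Matches("f(a, b) + g(c)", verbatim))
    Console.WriteLine(m.Value); // (a, b) then (c)

// @"\n" is backslash + n, which the regex engine reads as a newline.
Console.WriteLine(Regex.Match("x 'note\nnext", @"'[^']*\n").Value == "'note\n"); // True
```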
G\d+
[XYZP](?=\d)
#(\d+(\.\d+)?)=(\d+(\.\d+)?)
'.*?\n
\(.*?\)|\[.*?\]
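A sketch showing how the capture groups in the pound-sign pattern split out the two numbers (C# 9+ top-level statements assumed; group numbers follow the order of the opening parentheses, and the inputs are the question's examples):

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1: left number, group 2: its optional fraction,
// group 3: right number, group 4: its optional fraction.
var pound = new Regex(@"#(\d+(\.\d+)?)=(\d+(\.\d+)?)");

foreach (var s in new[] { "#11.11=11.11", "#11=11", "#1.1=11", "#11=1.1" })
{
    Match m = pound.Match(s);
    Console.WriteLine($"{s,-14} left={m.Groups[1].Value} right={m.Groups[3].Value} leftHasFraction={m.Groups[2].Success}");
}
```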
The first one:
G[0-9]+
In regular expressions, + means one or more (repetitions of the previous character or group).
You could also use * for none or any number of repetitions.
The second might be something like this:
^(X|Y|Z|P)$
This actually matches only if it's at the beginning of a line and has no characters behind. If you want it to be anywhere and only exclude certain characters behind it, modify the following:
[XYZP][^0-9a-z]
This is X or Y or Z or P followed by NOT 0-9 and NOT a-z
Notice that I use the OR character | in the first example in brackets, but not in the square brackets.
For the third one:
#[0-9]+\.[0-9]+=[0-9]+\.[0-9]+
Might not be 100 percent correct; I always get confused about which characters need escaping. You might need to escape # and =.
For the last one:
(\(.*\)|\[.*\])
For the first one you can use this regex:
^G\d+
For G with any number of digits after it
\b([Gg]\d+)\b
This matches a word boundary (\b) followed by a lower- or uppercase G [Gg], followed by one or more (+) digits (\d), followed by a word boundary (\b).
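A sketch of what the word boundaries rule out, on a made-up string (C# 9+ top-level statements assumed):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \b needs a word/non-word transition, so "XG3" (G inside a word)
// and "G4a" (digits running into a letter) are skipped.
var rx = new Regex(@"\b([Gg]\d+)\b");
string[] found = rx.Matches("g1 G12 XG3 G4a").Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
Console.WriteLine(string.Join(", ", found)); // g1, G12
```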
The next three are, I will have a character (X,Y,Z, or P) and I want
to match it if there is no letter or symbol behind it
This is a little tougher
\b[XYZP]([\W]|_)
This matches an XYZ or P followed by a non-word character \W, (word characters are typically a-z, 0-9 and the underscore), so after saying we don't want a word character, we add in that the _ is allowed.
I use perl for regex, but it should hopefully be the same as what you're looking for.
For the first one, G[0-9]+ should work. The square brackets means that the regex looks for only one of the characters within the brackets (the characters being 0 through 9) and the + right after it means that it looks for one or more matches.
The second is a bit more tricky, but I would use \s[XYPZ]. The square brackets function the same as before, only matching one of X, Y, P or Z. Also the \s matches any whitespace character (tab, space, newline, etc.).
For the third one, I would try #[0-9]+\.?[0-9]+=[0-9]+\.?[0-9]+. If we go from left to right, we encounter \.? and it's new. \. matches a literal period (you have to escape it with the backslash, as just a period by itself means that it can match one of any character). The question mark means that the period can either be there or not (matches zero or one instance of a period).
The fourth one: '.*\n. The combination of the period by itself and the asterisk means that it'll match zero or more characters, the characters being anything at all. I'm not too sure if you need to escape the single quotes though.
And for the fifth one, (\(.*\)|\[.*\]) should do the trick. You need to escape the []() inside because they have special meanings on their own. Also, | means "or", so the regex can match either whatever is on the left side of the bar or whatever is on the right.
You can specify repetitions in different ways. A star "*" after a term means, repeat the term zero, one or several times. A plus sign "+" means, repeat the term one or several times. You can also specify a number range with {n,m}. In your case the expression would be
G\d{1,3}
where \d is a digit.
With this expression you can match the text find only where it is not followed by suffix:
find(?!suffix)
I am not sure what you mean by symbol
[XYZP](?![a-zA-Z specify your symbols here])
For the pound number
\#\d+(\.\d+)?=\d+(\.\d+)?
\# the pound sign
\d+ at least one digit
(\.\d+)? optionally (?) a period succeeded by at least one digit
finally an equal sign succeeded by another number
Everything between "'" and \n. Use this pattern here, which finds a position between a prefix and a suffix.
(?<=prefix)find(?=suffix)
(?<=').*(?=\n)
.* means any character as many times as possible. Alternatively you could use
(?<=').*?(?=\n)
.*? means any character as few times as possible, which helps if the greedy version takes in too many \n characters. Also be careful with RegexOptions.Multiline: depending on its setting, you will have to test for the end of line with $ instead of \n.
For the parentheses () or [] you can use the same pattern again
(?<=prefix)find(?=suffix)
(?<=\().*?(?=\))|(?<=\[).*?(?=])
where | is the alternative.
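A sketch of the lookaround patterns on made-up input (C# 9+ top-level statements assumed), including the RegexOptions.Multiline variant that uses $ instead of \n:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string code = "call(a, b) idx[3] 'note\nnext";

// Lookbehind/lookahead keep the delimiters out of the match value.
string[] inside = Regex.Matches(code, @"(?<=\().*?(?=\))|(?<=\[).*?(?=])")
    .Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(" | ", inside)); // a, b | 3

// Comment text: stop before \n explicitly...
Console.WriteLine(Regex.Match(code, @"(?<=').*?(?=\n)").Value); // note
// ...or let $ mark the line end, which requires RegexOptions.Multiline here.
Console.WriteLine(Regex.Match(code, @"(?<=').*$", RegexOptions.Multiline).Value); // note
```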

A regular expression for matching a simple word in C#?

I need a regular expression that matches only words meeting the following conditions. I am using it in my C# program:
Can be any case
Should not have any numbers
may contain - and ' characters, but are optional
Should start with a letter
I have tried using the expression ^([a-zA-Z][\'\-]?)+$ but it doesn't work.
Here are list of few words that are acceptable
London (Case insensitive)
Jackson's
non-profit
Here are a list of few words that are not acceptable
12london (contains a number and does not start with a letter)
-to (does not start with a letter)
to: (contains the : character; any special character other than - and ' is not allowed)
^[a-zA-Z][-'a-zA-Z]*$
This matches any word that starts with an alphabetical character, followed by any number of alphabetical characters, - or '.
Note that you don't need to escape - and ' inside the [] character class, as long as the dash is either the first or last character in it.
Note also that I've removed the round brackets from your example - if you don't want to capture the input, you'll get better performance by leaving them out.
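A sketch checking the question's sample words against this pattern (C# 9+ top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// First character a letter; after that only letters, hyphens or apostrophes.
// The hyphen comes first in the class, so it is literal.
var word = new Regex(@"^[a-zA-Z][-'a-zA-Z]*$");

foreach (var w in new[] { "London", "london", "Jackson's", "non-profit" })
    Console.WriteLine($"{w,-12} {word.IsMatch(w)}"); // all True
foreach (var w in new[] { "12london", "-to", "to:" })
    Console.WriteLine($"{w,-12} {word.IsMatch(w)}"); // all False
```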
Try this one:
^[A-Za-z]+[A-Za-z'-]*$
First of all, try your regexes against tools such as http://www.regextester.com/
You are testing strings that both start with AND end with your pattern (^ means start of line, $ is the end), thus leaving out all of the words contained between two spaces.
You should use \b or \B.
Instead of looking for [a-zA-Z] you can use character classes such as '\D' (not digit).
Let me know if the above is working in your scenario.
\b\D[^\c][a-zA-Z]+[^\c]
It says: word boundaries with no digits, no control characters, one or more alphabetical lower or uppercase character, with no following control characters.
