I am trying to make a few regex strings to use in my syntax highlighter, this if the first time I have ever used them and I am having a deal of difficulty...
The first four are, I will have a specified character followed by any number of numbers, match it.
The best I have is "G[0-9]|G[0-9][0-9]|G[0-9][0-9][0-9]" to match either G#, G##, or G###
but I want to do G with any number of numbers after it.
The next three are, I will have a character (X,Y,Z, or P) and I want to match it if there is no letter or symbol behind it
"[X|Y|Z|P][0-9]"
These next few are harder, match "#11.11=11.11" where 1 is a number and there can be any number of numbers between the pound sign, the periods, and the equal sign. And the periods do not have to be there can also be "#11=11" or " #1.1=11" or "#11=1.1"
I have no idea... "#[0-9][ |.] ..."
Anything after a " ' " and between a newline
'[A-Za-z0-9]\n" but I know this only gives me one character...
And the easy one I think is anything between two () or []
"(*) | [*]"?
Quick and dirty, but tested using regexpal
1) G[0-9]{1-3} - the '{1-3}' specifies the last symbol to occur one to three times.
2) ((.*|)) - you put a '\' before the '(' and ')' as escape characters
3) [0-9]1*(.|)1*=1*(.|)1 - this matches your three examples
4) \'.*\n - I think this should work... '\n' represents a new line char right?
5) ((|[).*()|]) - this one has those escape characters again
Again...quick and dirty. Regexpal.com is your friend
1> G[0-9]{1,3}
2> No, it's WRONG.
The correct one is [XYZ][0-9]
(you do not use an OR operator (|), but just write the characters side by side within square braces)
You should really look up how to use regexes. Having said that:
I will have a specified character followed by any number of numbers, match it
G\d+
I will have a character (X,Y,Z, or P) and I want to match it if there
is no letter or symbol behind it
(?<!\w)[XYZP][0-9]
These next few are harder, make "#11.11=11.11" blue
Huh?
Anything after a " ' " and between a newline
'(.+?)\n
And the easy one I think is anything between two () or []
\(.+?\)|\[.+?\]
And the easy one I think is anything between two () or []
"(*) | [*]"?
#"\([^(]*\)" and #"\[[^\[]*\]"
It means: an open bracket - then any number of characters which are not an open bracket - and a close bracket.
Slashes are required to indicate to the regex engine that brackets should be treated literally.
# - verbatim string - is to inform C#, in turn, that those slashes should be taken literally and not as C# escape characters.
Anything after a " ' " and between a newline
Similarly: #"'[^']*\n"
G\d+
[XYZP](?=\d)
#(\d+(\.\d+)?)=(\d+(\.\d+)?)
'.*?\n
\(.*?\)|\[.*?\]
Regex explanation here.
The first one:
G[0-9]+
In regular expressions + means at least 1 or more (repetitions of the previous "characters").
You could also use * for none or any number of repetitions.
The second might be something like this:
^(X|Y|Z|P)$
This actually matches only if it's at the beginning of a line and has no characters behind. If you want it to be anywhere and only exclude certain characters behind it, modify the following:
[XYZP][^0-9a-z]
This is X or Y or Z or P followed by NOT 0-9 and NOT a-z
Notice that I use the OR character | in the first example in brackets, but not in the square brackets.
For the third one:
#[0-9]+\.[0-9]+=[0-9]+\.[0-9]+
Might not be 100 percent correct, I always confuse when to escape which characters. You might need to escape # and =.
For the last one:
(\(.*\)|\[.*\])
For the first one you can use this Regex :
^G\d+
For G with any number of digits after it
\b([Gg]\d+)\b
This matches a wordboundary (\b) followed by a lower or upper G [Gg], followed by 1 or more (+) digits (\d), followed by a wordboundary (\b)
The next three are, I will have a character (X,Y,Z, or P) and I want
to match it if there is no letter or symbol behind it
This is a little tougher
\b[XYZP]([\W]|_)
This matches an XYZ or P followed by a non-word character \W, (word characters are typically a-z, 0-9 and the underscore), so after saying we don't want a word character, we add in that the _ is allowed.
I use perl for regex, but it should hopefully be the same as what you're looking for.
For the first one, G[0-9]+ should work. The square brackets means that the regex looks for only one of the characters within the brackets (the characters being 0 through 9) and the + right after it means that it looks for one or more matches.
The second is a bit more tricky, but I would use \s[XYPZ]. The square brackets function the same as before, only matching one of X, Y, P or Z. Also the \s matches any whitespace character (tab, space, newline, etc.).
For the third one, I would try #[0-9]+\.?[0-9]+=[0-9]+\.?[0-9]+. If we go from left to right, we encounter \.? and it's new. \. matches a literal period (you have to escape it with the backslash, as just a period by itself means that it can match one of any character). The question mark means that the period can either be there or not (matches zero or one instance of a period).
The fourth one: '.*\n. The combination of the period by itself and the asterisk means that it'll match zero or more characters, the characters being anything at all. I'm not too sure if you need to escape the single quotes though.
And for the fifth one, (\(.*\)|\[.*\]) should do the trick. You need to escape the []() inside the brackets because they mean something by themselves. Also, the | means or, so the regex can either matches whatever is on the left side of the bar, or on the right.
You can specify repetitions in different ways. A star "*" after a term means, repeat the term zero, one or several times. A plus sign "+" means, repeat the term one or several times. You can also specify a number range with {n,m}. In your case the expression would be
G\d{1-3}
where \d is a digit.
With this expression you can match a position that does not preceed a suffix
find(?!suffix)
I am not sure what you mean by symbol
[XYZP](?![a-zA-Z specify your symbols here])
For the pound number
\#\d+(\.\d+)?=\d+(\.\d+)?
\# the pound sign
\d+ at least one digit
(\.\d+)? optionally (?) a period succeeded by at least one digit
finally an equal sign succeeded by another number
Everything between "'" and \n. Use this pattern here, which finds a position between a prefix and a suffix.
(?<=prefix)find(?=suffix)
(?<=').*(?=\n)
.* means any character as many times as possible. Alternatively you could use
(?<=').*?(?=\n)
.* means any character as few times as possible, if too many \n are taken. Also be carefult with the RegexOption.Multiline. Depending on its setting you will have to test for the end of line with $ instead of \n.
For the parentheses () or [] you can use the same pattern again
(?<=prefix)find(?=suffix)
(?<=\().*?(?=\))|(?<=\[).*?(?=])
where | is the alternative.
Related
I m trying to matching a string which will not allow same special character at same time
my regular expression is:
[RegularExpression(#"^+[a-zA-Z0-9]+[a-zA-Z0-9.&' '-]+[a-zA-Z0-9]$")]
this solve my all requirement except the below two issues
this is my string : bracks
acceptable :
bra-cks, b-r-a-c-ks, b.r.a.c.ks, bra cks (by the way above regular expression solved this)
not acceptable:
issue 1: b.. or bra..cks, b..racks, bra...cks (two or more any special character together),
issue 2: bra cks (two ore more white space together)
You can use a negative lookahead to invalidate strings containing two consecutive special characters:
^(?!.*[.&' -]{2})[a-zA-Z0-9.&' -]+$
Demo: https://regex101.com/r/7j14bu/1
The goal
From what i can tell by your description and pattern, you are trying to match text, which start and end with alphanumeric (due to ^+[a-zA-Z0-9] and [a-zA-Z0-9]$ inyour original pattern), and inside, you just don't want to have any two consecuive (adjacent) special characters, which, again, guessing from the regex, are . & ' -
What was wrong
^+ - i think here you wanted to assure that match starts at the beginning of the line/string, so you don't need + here
[a-zA-Z0-9.&' '-] - in this character class you doubled ' which is totally unnecessary
Solution
Please try pattern
^[a-zA-Z0-9](?:(?![.& '-]{2,})[a-zA-Z0-9.& '-])*[a-zA-Z0-9]$
Pattern explanation
^ - anchor, match the beginning of the string
[a-zA-Z0-9] - character class, match one of the characters inside []
(?:...) - non capturing group
(?!...) - negative lookahead
[.& '-]{2,} - match 2 or more of characters inside character class
[a-zA-Z0-9.& '-] - character class, match one of the characters inside []
* - match zero or more text matching preceeding pattern
$ - anchor, match the end of the string
Regex demo
Some remarks on your current regex:
It looks like you placed the + quantifiers before the pattern you wanted to quantify, instead of after. For instance, ^+ doesn't make much sense, since ^ is just the start of the input, and most regex engines would not even allow that.
The pattern [a-zA-Z0-9.&' '-]+ doesn't distinguish between alphanumerical and other characters, while you want the rules for them to be different. Especially for the other characters you don't want them to repeat, so that + is not desired for those.
In a character class it doesn't make sense to repeat the same character, like you have a repeat of a quote ('). Maybe you wanted to somehow delimit the space, but realise that those quotes are interpreted literally. So probably you should just remove them. Or if you intended to allow for a quote, only list it once.
Here is a correction (add the quote if you still need it):
^[a-zA-Z0-9]+(?:[.& -][a-zA-Z0-9]+)*$
Follow-up
Based on a comment, I suspect you would allow a non-alphanumerical character to be surrounded by single spaces, even if that gives a sequence of more than one non-alphanumerical character. In that case use this:
^[a-zA-Z0-9]+(?:(?:[ ]|[ ]?[.&-][ ]?)[a-zA-Z0-9]+)*$
So here the space gets a different role: it can optionally occur before and after a delimiter (one of ".&-"), or it can occur on its own. The brackets around the spaces are not needed, but I used them to stress that the space is intended and not a typo.
I am quite new to regex thing and need regex for first name which satisfies following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
This short regex should do it ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$ ,
([a-zA-Z]+?) - Means the String should start with alphabets.
([-\s'][a-zA-Z]+)*? - Means the string must have hyphen,space or apostrophe followed by alphabets.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo
I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The patterns is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although i can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
The + after the character set is a quantifier (meaning the preceeding character, character set or group is repeated) at least one, and unlimited number of times and it's greedy (matched the most possible).
Then [0-9().+\s]+ will match any character in set one or more times.
I am creating regex for multiline string pattern but it's not work. this is my input pattern.
FXP/R,U
1.NWAMNKPA/UGONMA D 2.NWAMNKPA/AMAJINDI O
3.NWAMNKPA/AMAJINDI N A 4.NWAMNKPA/ADAUGOAMAJI C
5.NWAMNKPA/CHINAZAEKPERE N
Regular expression:
(FXP\S{3,20})|(\r\s{3}.\S+(.+))
but it's not take this line:
3.NWAMNKPA/AMAJINDI N A 4.NWAMNKPA/ADAUGOAMAJI C
it's take only this two only :
1.NWAMNKPA/UGONMA D 2.NWAMNKPA/AMAJINDI O
5.NWAMNKPA/CHINAZAEKPERE N
Desired o/p :-
NWAMNKPA/UGONMA D
NWAMNKPA/AMAJINDI O
NWAMNKPA/AMAJINDI N A
NWAMNKPA/ADAUGOAMAJI C
NWAMNKPA/CHINAZAEKPERE N
You can look into RegexOptions.MultiLine (and other options). (http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx)
I would advise you to use String.Split() instead and validate a line at a time. Regular expressions are hard readable and there is no need to match a pattern over more lines. It makes you code easier to understand.
I don't think your regular expression is doing what you think it's doing. The first part is ok, but the second part, \r\s{3}.\S+(.+), is looking for a carriage return, followed by exactly three whitespace characters, followed by any one character (whitespace or not), followed by any number of non-whitespace characters, followed by any number of characters which you capture.
There are a number of issues with this. First of all, not all text has carriage returns (\r) - checking for a newline (\n) instead is much safer. Even if your text does have \r, there's almost certainly going to be a \n afterwards (Windows ends lines with \r\n). The \n might be absorbed into the \s{3}, depending on your data, though.
Secondly, + is a greedy operator. That means that the first + in \S+(.+) will match everything it can - in other words, all non-whitespace characters until it reaches a whitespace. Only after finding a whitespace will the (.+) start capturing, and the first character it has will be whitespace. Alternatively, if there is no whitespace left in the string, the \S+ will "give back" one character so that the .+ has something to match, in which case it will simply be the last character of the string.
All things considered, I think you're going to be much better off with something simpler, like this:
RegEx.Split(myData, #"(?=\d)").Where(s => !string.IsNullOrEmpty(s))
That will split your data every time the next character is a number.
I am trying to capture a value out of a string. The string's format should be
01+XXXX
and I want to capture XXXX using a regular expression. This is what I came up with -
01+\\s*(?<1>[.0-9]*)
But that won't work. What DOES work is -
01+\\s*(?<1>[+.0-9]*)
The only difference is adding the + into the character class. My main question is - why does the second expression work and the first expression doesn't? In the first one, I look for 01+ and the rest of it should go to [.0-9]. It seems to me that the second one wants to read + twice - is that not what its doing? I am pretty new to regular expressions so I feel like I might be missing something small.
On this site http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial it says that + is used for "Repeat one or more times". So is it trying to read 01+ more than once?
It's reading the 1 one or more times. That is, the regex 01+ matches 01 or 011 or 0111 etc.
But it doesn't match the +. If you want to match a literal +, write 01\+ or 01[+] for the regex.
The + is a special character, meaning "one or more times." In this case, it means 01, 011, 0111, etc. instead of 01+. If you want to use it literally, you need to escape it, like this: \+
Note: It looks like you are using it with strings, so you would need to double-escape: \\+
It works inside a character class ([+]) because character classes take most characters literally, with exceptions including \ and ].
'+' is a special character in regex, it means "1 or more times". So what you have written means:
The character '0'
The character '1' one or more times
Whitespace 0 or more times
etc.
If you want to match a literal plus you need to escape it:
01\+\\s*(?<1>[.0-9]*)
The + is a quantifier, as explained in the tutorial you linked. So, your regex means "match a zero, then one or more ones, then zero or more whitespaces, then ...".
The plus needs to be escaped:
01\\+\\s*(?<1>[.0-9]*)
Your second regex worked, because the + there was part of a character class and does not need to be escaped there.
01\+(?<cap>[\d.]*)
explain:
01 '01'
\+ '+'
[\d.]* any character of: digits (0-9), '.'
(0 or more times, matching the most amount possible)