I am trying to create a regex to validate a string. The string could be of the following formats (to give an idea of what I am trying to do here):
145/1/3 or
748/57676/6765/454/345 or
45/234 45/235 45/236
So basically the string can contain numbers, spaces and forward slashes and the string can end with a number only. I am new at regex and have gone through many of the questions on the website. But please you have to admit that this is really confusing and difficult to master. And if someone could refer an author or any weblink that can teach regex, that would be really helpful. Thanks in advance mates!
I came up with this
^[0-9]( |[0-9]|\/)*[0-9]$
And used this to test it.
You can see it matches anything that begins (^) with a number, has zero or more (*) of either a space, a number or (|) a forward-slash (/) and ends ($) with a number.
Now that I am aware that the space and / cannot go together and multiple spaces and/or slashes are also not allowed, this RegEx is a better fit for you.
^[0-9]+([ \/][0-9]+)*$
This should work: ^[/\d\s]*\d$.
It is looking for the beginning of the string ^ , then 0 or more digits, spaces [/\d\s]* followed by a digit \d then the end of the string $.
You should use following regular expression:
(\d+(/\d+)*\s*)+
This mean: some digits (\d+) followed by optional repeating pattern of some digits and \ ((/\d+)*) followed by an optional number of whitespaces (\s*), all repeated at least once.
Try this:
^\d(\d|\s|\/)*\d$
\d = digit character (you can also use [0-9]).
\s = space character
The brackets followed by a star means to repeat a \d, \s, or / an infinite amount of times.
The final \d$ means the ending must match a digit.
Related
I am quite new to regex thing and need regex for first name which satisfies following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
This short regex should do it ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$ ,
([a-zA-Z]+?) - Means the String should start with alphabets.
([-\s'][a-zA-Z]+)*? - Means the string must have hyphen,space or apostrophe followed by alphabets.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo
I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The patterns is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although i can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
The + after the character set is a quantifier (meaning the preceeding character, character set or group is repeated) at least one, and unlimited number of times and it's greedy (matched the most possible).
Then [0-9().+\s]+ will match any character in set one or more times.
PFB the regex. I want to make sure that the regex should not contain any special character just after # and just before. In-between it can allow any combination.
The regex I have now:
#"^[^\W_](?:[\w.-]*[^\W_])?#(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"))"
For example, the regex should not match
abc#.sj.com
abc#-.sj-.com
SSDFF-SAF#-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearrange the starting part a bit. To prevent a special char before a #, just make sure there is a letter or digit there. I also recommend to remove redundant capturing groups/convert them into non-capturing:
#"^[^\W_](?:[\w.-]*[^\W_])?#(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
Here is a demo of how this regex matches now.
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - a 1 or 0 occurrences of:
[\w.-]* - 0+ letters, digits, _, . and -
[^\W_] - a digit or a letter only
Change the initial [\w-\.]+ for [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.
[SOME_WORDS:200:1000]
Trying to match just the last 1000 part. Both numbers are variable and can contain an unknown number of characters (although they are expected to contain digits, I cannot rule out that they may also contain other characters). The SOME_WORDS part is known and does not change.
So I begin by doing a positive lookbehind for [SOME_WORDS: followed by a positive lookahead for the trailing ]
That gives us the pattern (?<=\[SOME_WORDS:).*(?=])
And captures the part 200:1000
Now because I don't know how many characters are after SOME_WORDS:, but I know that it ends with another : I use .*: to indicate any character any amount of time followed by :
That gives us the pattern (?<=\[SOME_WORDS:.*:).*(?=])
However at this point the pattern no longer matches anything and this is where I become confused. What am I doing wrong here?
If I assume that the first number will always be 3 characters long I can replace .* with ... to get the pattern (?<=\[SOME_WORDS:...:).*(?=]) and this correctly captures just the 1000 part. However I don't understand why replacing ... with .* makes the pattern not capture anything.
EDIT:
It seems like the online tool I was using to test the regex pattern wasn't working correctly. The pattern (?<=\[SOME_WORDS:.*:).*(?=]) matches the 1000 with no issues when actually done in .net
You usually cannot use a + or a * in a lookbehind, only in a lookahead.
If c# does allow these than you could use a .*? instead of a .* as the .* will eat the second :
Try this:
(?<=\[SOME_WORDS:)(?=\d+:(\d+)])
The match wil be in the first capture group
Quote from http://www.regular-expressions.info/lookaround.html
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. The regular expression engine needs to be able to figure out how many characters to step back before checking the lookbehind. When evaluating the lookbehind, the regex engine determines the length of the regex inside the lookbehind, steps back that many characters in the subject string, and then applies the regex inside the lookbehind from left to right just as it would with a normal regex.
As Robert Smit mentions this is due to the * being a greedy operator. Greedy operators consume as many characters as they possibly can when they are matched first. They only give up characters if the match fails. If you make the greedy operator lazy(*?), then matching consumes as little number of characters as possible for the match to succeed, so the : is not consumed by *. You can also use [^:]* which is match any character other than :.
I am trying to make a few regex strings to use in my syntax highlighter, this if the first time I have ever used them and I am having a deal of difficulty...
The first four are, I will have a specified character followed by any number of numbers, match it.
The best I have is "G[0-9]|G[0-9][0-9]|G[0-9][0-9][0-9]" to match either G#, G##, or G###
but I want to do G with any number of numbers after it.
The next three are, I will have a character (X,Y,Z, or P) and I want to match it if there is no letter or symbol behind it
"[X|Y|Z|P][0-9]"
These next few are harder, match "#11.11=11.11" where 1 is a number and there can be any number of numbers between the pound sign, the periods, and the equal sign. And the periods do not have to be there can also be "#11=11" or " #1.1=11" or "#11=1.1"
I have no idea... "#[0-9][ |.] ..."
Anything after a " ' " and between a newline
'[A-Za-z0-9]\n" but I know this only gives me one character...
And the easy one I think is anything between two () or []
"(*) | [*]"?
Quick and dirty, but tested using regexpal
1) G[0-9]{1-3} - the '{1-3}' specifies the last symbol to occur one to three times.
2) ((.*|)) - you put a '\' before the '(' and ')' as escape characters
3) [0-9]1*(.|)1*=1*(.|)1 - this matches your three examples
4) \'.*\n - I think this should work... '\n' represents a new line char right?
5) ((|[).*()|]) - this one has those escape characters again
Again...quick and dirty. Regexpal.com is your friend
1> G[0-9]{1,3}
2> No, it's WRONG.
The correct one is [XYZ][0-9]
(you do not use an OR operator (|), but just write the characters side by side within square braces)
You should really look up how to use regexes. Having said that:
I will have a specified character followed by any number of numbers, match it
G\d+
I will have a character (X,Y,Z, or P) and I want to match it if there
is no letter or symbol behind it
(?<!\w)[XYZP][0-9]
These next few are harder, make "#11.11=11.11" blue
Huh?
Anything after a " ' " and between a newline
'(.+?)\n
And the easy one I think is anything between two () or []
\(.+?\)|\[.+?\]
And the easy one I think is anything between two () or []
"(*) | [*]"?
#"\([^(]*\)" and #"\[[^\[]*\]"
It means: an open bracket - then any number of characters which are not an open bracket - and a close bracket.
Slashes are required to indicate to the regex engine that brackets should be treated literally.
# - verbatim string - is to inform C#, in turn, that those slashes should be taken literally and not as C# escape characters.
Anything after a " ' " and between a newline
Similarly: #"'[^']*\n"
G\d+
[XYZP](?=\d)
#(\d+(\.\d+)?)=(\d+(\.\d+)?)
'.*?\n
\(.*?\)|\[.*?\]
Regex explanation here.
The first one:
G[0-9]+
In regular expressions + means at least 1 or more (repetitions of the previous "characters").
You could also use * for none or any number of repetitions.
The second might be something like this:
^(X|Y|Z|P)$
This actually matches only if it's at the beginning of a line and has no characters behind. If you want it to be anywhere and only exclude certain characters behind it, modify the following:
[XYZP][^0-9a-z]
This is X or Y or Z or P followed by NOT 0-9 and NOT a-z
Notice that I use the OR character | in the first example in brackets, but not in the square brackets.
For the third one:
#[0-9]+\.[0-9]+=[0-9]+\.[0-9]+
Might not be 100 percent correct, I always confuse when to escape which characters. You might need to escape # and =.
For the last one:
(\(.*\)|\[.*\])
For the first one you can use this Regex :
^G\d+
For G with any number of digits after it
\b([Gg]\d+)\b
This matches a wordboundary (\b) followed by a lower or upper G [Gg], followed by 1 or more (+) digits (\d), followed by a wordboundary (\b)
The next three are, I will have a character (X,Y,Z, or P) and I want
to match it if there is no letter or symbol behind it
This is a little tougher
\b[XYZP]([\W]|_)
This matches an XYZ or P followed by a non-word character \W, (word characters are typically a-z, 0-9 and the underscore), so after saying we don't want a word character, we add in that the _ is allowed.
I use perl for regex, but it should hopefully be the same as what you're looking for.
For the first one, G[0-9]+ should work. The square brackets means that the regex looks for only one of the characters within the brackets (the characters being 0 through 9) and the + right after it means that it looks for one or more matches.
The second is a bit more tricky, but I would use \s[XYPZ]. The square brackets function the same as before, only matching one of X, Y, P or Z. Also the \s matches any whitespace character (tab, space, newline, etc.).
For the third one, I would try #[0-9]+\.?[0-9]+=[0-9]+\.?[0-9]+. If we go from left to right, we encounter \.? and it's new. \. matches a literal period (you have to escape it with the backslash, as just a period by itself means that it can match one of any character). The question mark means that the period can either be there or not (matches zero or one instance of a period).
The fourth one: '.*\n. The combination of the period by itself and the asterisk means that it'll match zero or more characters, the characters being anything at all. I'm not too sure if you need to escape the single quotes though.
And for the fifth one, (\(.*\)|\[.*\]) should do the trick. You need to escape the []() inside the brackets because they mean something by themselves. Also, the | means or, so the regex can either matches whatever is on the left side of the bar, or on the right.
You can specify repetitions in different ways. A star "*" after a term means, repeat the term zero, one or several times. A plus sign "+" means, repeat the term one or several times. You can also specify a number range with {n,m}. In your case the expression would be
G\d{1-3}
where \d is a digit.
With this expression you can match a position that does not preceed a suffix
find(?!suffix)
I am not sure what you mean by symbol
[XYZP](?![a-zA-Z specify your symbols here])
For the pound number
\#\d+(\.\d+)?=\d+(\.\d+)?
\# the pound sign
\d+ at least one digit
(\.\d+)? optionally (?) a period succeeded by at least one digit
finally an equal sign succeeded by another number
Everything between "'" and \n. Use this pattern here, which finds a position between a prefix and a suffix.
(?<=prefix)find(?=suffix)
(?<=').*(?=\n)
.* means any character as many times as possible. Alternatively you could use
(?<=').*?(?=\n)
.* means any character as few times as possible, if too many \n are taken. Also be carefult with the RegexOption.Multiline. Depending on its setting you will have to test for the end of line with $ instead of \n.
For the parentheses () or [] you can use the same pattern again
(?<=prefix)find(?=suffix)
(?<=\().*?(?=\))|(?<=\[).*?(?=])
where | is the alternative.