This looks similar to the other lat/lon regex questions, but I need a format that I haven't found covered in any of them. I want to accept only the following format:
LAT: XX-XX.XX N|S
LON: XXX-XX.XX E|W
This is for a C# Windows text entry where latitude and longitude are entered in separate text boxes.
I want the format to accept exactly one dash (-) and one decimal point in those positions (so negative values are invalid), and to enforce the range so that every place value must be entered, for example:
LAT 0-90 North or South
00-00.00 N valid
5-00.00 N invalid
05-00.00 N valid
90-00.00 N valid
89-59.99 S valid
90-60.00 S invalid
91.00.00 N invalid
LON 0-180 East or West
0-0.0 E invalid
15-00.00 E invalid
015.00.00 E valid
180-00.00 E valid
180-01.00 E invalid
179-59.99 W valid
179-60.00 W invalid
181-00.00 W invalid
I know how to do it digit by digit such as for Latitude:
[0-9][0-9]-[0-5][0-9].[0-9][0-9] [N|S]
That is the extent of my knowledge of RegEx authoring.
As always, any help on this would be much appreciated.
My suggestion is
String patternLatitude = @"^((90\-00\.00)|([0-8]\d\-[0-5]\d\.\d\d)) (N|S)$";
String patternLongitude = @"^((180\-00\.00)|((1[0-7]\d)|(0\d\d))\-[0-5]\d\.\d\d) (W|E)$";
provided that the given example
015.00.00 E valid
should actually be invalid. More test examples (all invalid):
090-00.01 N
180-00.01 E
190-00.00 E
200-00.00 E
Explanation:
Latitude:
90-00.00 is a special case (the only possible value with 90 degrees); for the other degree values we can write [0-8]\d; minutes are [0-5]\d and the decimals are just two digits: \d\d.
Longitude: 180-00.00 is a special case (the only possible value with 180 degrees); the second case is the 1** longitudes: since there are no longitudes in the 180s or 190s apart from 180-00.00, we can write them as 1[0-7]\d; finally, if a longitude starts with 0, it can be followed by any two digits: 0\d\d. Minutes and their decimals are the same as in the latitude case.
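As a quick check, the two patterns can be exercised against the examples from the question. This is only a minimal sketch: the class name `CoordinateFormat` and its method names are placeholders, the redundant escapes and inner groups are dropped, and each alternation is wrapped in one group so that `^` and `$` apply to both branches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer): validates the
// DD-MM.mm H and DDD-MM.mm H formats described in the question.
public static class CoordinateFormat
{
    // The outer group keeps ^ and $ attached to both alternatives.
    private static readonly Regex Latitude =
        new Regex(@"^(90-00\.00|[0-8]\d-[0-5]\d\.\d\d) [NS]$");

    private static readonly Regex Longitude =
        new Regex(@"^(180-00\.00|(1[0-7]\d|0\d\d)-[0-5]\d\.\d\d) [EW]$");

    public static bool IsLatitude(string text) => Latitude.IsMatch(text);

    public static bool IsLongitude(string text) => Longitude.IsMatch(text);
}
```

For example, `CoordinateFormat.IsLatitude("05-00.00 N")` is true while `CoordinateFormat.IsLatitude("5-00.00 N")` is false. Note that in .NET `\d` also matches non-ASCII Unicode digits; write `[0-9]` instead if only ASCII digits should pass.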
The following gives you a regex that matches latitudes, with capturing groups for the degrees, minutes, decimal minutes, and N/S value:
(([0-8]\d)[-.]([0-5]\d)\.(\d\d)|(90[-.]00\.00)) ([NS])
And the same for longitude with E/W:
((0\d\d|1[0-7]\d)[-.]([0-5]\d)\.(\d\d)|(180[-.]00\.00)) ([EW])
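To illustrate how the captured parts might be used, here is a hedged sketch that reads the latitude groups and converts them to signed decimal degrees. The class `LatitudeParser`, the group names `deg`, `min`, and `hemi`, and the conversion itself are illustrative assumptions, not part of the answer; the sketch also accepts only `-` as the separator and anchors the pattern.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts degrees, minutes, and hemisphere by name.
public static class LatitudeParser
{
    // .NET allows the same group name in both alternatives; only the
    // branch that matched contributes a capture.
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<deg>[0-8]\d)-(?<min>[0-5]\d\.\d\d)|(?<deg>90)-(?<min>00\.00)) (?<hemi>[NS])$");

    // Returns signed decimal degrees (south is negative), or null when
    // the text is not in DD-MM.mm H form.
    public static double? ToDecimalDegrees(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success) return null;

        int degrees = int.Parse(m.Groups["deg"].Value, CultureInfo.InvariantCulture);
        double minutes = double.Parse(m.Groups["min"].Value, CultureInfo.InvariantCulture);
        double value = degrees + minutes / 60.0;
        return m.Groups["hemi"].Value == "S" ? -value : value;
    }
}
```

Parsing with `CultureInfo.InvariantCulture` keeps the `.` decimal separator working on machines whose regional settings use a comma.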
This may be hard to look at ... but it validates.
# @"(?:(?:(?:(?:0\d|[1-8]\d)(?=-\d\d\.\d\d[ ][NS])|(?:0\d\d|1[0-7]\d)(?=-\d\d\.\d\d[ ][EW]))-(?:(?:(?:0\d|[1-5]\d)\.\d\d)))|(?:(?:90(?=-\d\d\.\d\d[ ][NS])|180(?=-\d\d\.\d\d[ ][EW]))-00\.00))[ ][NSEW]"
(?: #
(?: # =============
(?: #
(?: # LAT: 00 to 89 North or South
0 \d #
| [1-8] \d #
) #
(?= - \d\d \. \d\d [ ] [NS] )
| # or,
(?: # LON: 000 to 179 East or West
0 \d\d #
| 1 [0-7] \d #
) #
(?= - \d\d \. \d\d [ ] [EW] )
) #
#
- # -
#
(?: #
(?: #
(?: # 00 to 59
0 \d #
| [1-5] \d #
) #
\. # .
\d\d # 00 to 99
) #
) #
) #
| # or,
(?: # =============
(?: #
90 # LAT: 90 North or South
(?= - \d\d \. \d\d [ ] [NS] )
| # or,
180 # LON: 180 East or West
(?= - \d\d \. \d\d [ ] [EW] )
) #
- 00 \. 00 # - 00.00
) #
) #
[ ] # =============
[NSEW] # N,S,E,W
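A rough sketch of how this combined pattern could be used from C# to validate either kind of coordinate with one check. The class name `LatLonPattern` is a placeholder, a few redundant non-capturing groups are dropped, and the `^`/`$` anchors are an addition so that the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the combined latitude/longitude pattern.
public static class LatLonPattern
{
    // The lookaheads tie 2-digit degrees to N/S and 3-digit degrees to E/W.
    private static readonly Regex Combined = new Regex(
        @"^(?:(?:(?:0\d|[1-8]\d)(?=-\d\d\.\d\d[ ][NS])|(?:0\d\d|1[0-7]\d)(?=-\d\d\.\d\d[ ][EW]))" +
        @"-(?:0\d|[1-5]\d)\.\d\d" +
        @"|(?:90(?=-\d\d\.\d\d[ ][NS])|180(?=-\d\d\.\d\d[ ][EW]))-00\.00)[ ][NSEW]$");

    public static bool IsValid(string text) => Combined.IsMatch(text);
}
```

With this, `LatLonPattern.IsValid("015-00.00 E")` is true, while `"15-00.00 E"` and `"015-00.00 N"` are rejected because the degree width does not fit the hemisphere letter.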
To limit degrees to 0-90/0-180 and minutes to 00.00-59.99, I would go for these regexes:
#Latitude:
(([0-8]\d)-(0\d|[1-5]\d)\.\d\d|90-00\.00)\s[NS]
#Longitude:
((0\d\d|1[0-7]\d)-(0\d|[1-5]\d)\.\d\d|180-00\.00)\s[EW]
I have the following strings that are valid...
" 1"
" 12"
" 123"
"1234"
" 123"
" 12A"
""
The following strings are NOT valid...
" 1234"
" 1234"
"0 12"
"0012"
Currently I use the following regex match to check if the string is valid...
"(|[0-9A-Z\-]{4}| {1}[0-9A-Z\-]{3}| {2}[0-9A-Z\-]{2}| {3}[0-9A-Z\-]{1})"
Note: To be clear, the above regex will NOT meet my requirements, that's why I'm asking this question.
I was hoping there was a simpler match I could use, something like the following...
"(| {0,3}[0-9A-Z\-]{1,4})"
The only problem is that the above will also match strings like " 1234", which is not acceptable. Is there a way to limit the whole capture group to only 4 characters?
If the match can not start with a zero, you could add a negative lookahead as Wiktor previously commented:
"(?="|.{4}")(?! *0)[0-9A-Z -]*"
Explanation
" Match literally
(?="|.{4}") If what is directly on the right is either " or 4 chars followed by "
(?! *0) If what is directly on the right is not 0+ spaces followed by a zero
[0-9A-Z -]* Match 0+ times what is listed in the character class
" Match literally
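A small C# sketch of the lookahead pattern, assuming the quotes are part of the input and the whole string must match. The class name `QuotedField` and the added `^`/`$` anchors are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: checks a quoted field that is either empty or
// exactly four characters long and does not start with a zero.
public static class QuotedField
{
    // In a verbatim string each literal double quote is written as "".
    private static readonly Regex Pattern =
        new Regex(@"^""(?=""|.{4}"")(?! *0)[0-9A-Z -]*""$");

    public static bool IsValid(string quoted) => Pattern.IsMatch(quoted);
}
```

For instance, `QuotedField.IsValid("\"  12\"")` is true, while a five-character body such as `"\" 1234\""` or a zero-led one such as `"\"0012\""` is rejected.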
If the spaces can only occur at the beginning you could use:
"(?="|.{4}")(?! *0) *[0-9A-Z-]+"
Regex demo
This would pass all your test cases:
"(|[1-9\s][0-9A-Z\s]{2}[0-9A-Z])"
Though I suspect there are cases you might not have mentioned.
Explanation: match either 0 or 4 characters between double quotes. First character may be a space or digit but not a zero. Next two characters are any digit or capital letter or space. Fourth character is a digit or capital but not a space.
To make it a bit more efficient:
"(?:[A-Z\d-]{4}|[ ](?:[A-Z\d-]{3}|[ ](?:[A-Z\d-]|[ ])[A-Z\d-]))"
https://regex101.com/r/1fr9tb/1
"
(?:
[A-Z\d-]{4}
| [ ]
(?:
[A-Z\d-]{3}
| [ ]
(?: [A-Z\d-] | [ ] )
[A-Z\d-]
)
)
"
Benchmarks
Regex1: "(?:[A-Z\d-]{4}|[ ](?:[A-Z\d-]{3}|[ ](?:[A-Z\d-]|[ ])[A-Z\d-]))"
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 7
Elapsed Time: 0.66 s, 663.84 ms, 663843 µs
Matches per sec: 527,233
Regex2: "(|[0-9A-Z\-]{4}|[ ]{1}[0-9A-Z\-]{3}|[ ]{2}[0-9A-Z\-]{2}|[ ]{3}[0-9A-Z\-]{1})"
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 7
Elapsed Time: 0.94 s, 938.44 ms, 938438 µs
Matches per sec: 372,960
Regex3: "(?="|.{4}")(?![ ]*0)[0-9A-Z -]*"
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 6
Elapsed Time: 0.73 s, 728.48 ms, 728484 µs
Matches per sec: 411,814
Regex4: "(|[1-9\s][0-9A-Z\s]{2}[0-9A-Z])"
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 6
Elapsed Time: 0.85 s, 851.48 ms, 851481 µs
Matches per sec: 352,327
According to the Regex documentation, using RegexOptions.ExplicitCapture makes the Regex capture only named groups like (?<groupName>...), but in practice it does something slightly different.
Consider these lines of code:
static void Main(string[] args) {
    Regex r = new Regex(
        @"(?<code>^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$))"
        , RegexOptions.ExplicitCapture
    );
    var x = r.Match("32/123/03");
    r.GetGroupNames().ToList().ForEach(gn => {
        Console.WriteLine("GroupName:{0,5} --> Value: {1}", gn, x.Groups[gn].Success ? x.Groups[gn].Value : "");
    });
}
When you run this snippet you'll see the result contains a group named 0 while I don't have a group named 0 in my regex!
GroupName: 0 --> Value: 32/123/03
GroupName: code --> Value: 32/123/03
GroupName: l1 --> Value: 32
GroupName: l2 --> Value: 123
GroupName: l3 --> Value: 03
Press any key to continue . . .
Could somebody please explain this behavior to me?
You always have group 0: that's the entire match. Numbered groups start at 1 and are ordered by the position of the opening parenthesis that defines each group. Your regular expression (formatted for clarity):
(?<code>
^
(?<l1> [\d]{2} )
/
(?<l2> [\d]{3} )
/
(?<l3> [\d]{2} )
$
|
^
(?<l1>[\d]{2})
/
(?<l2>[\d]{3})
$
|
(?<l1> ^[\d]{2} $ )
)
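To see the effect of group 0 and of RegexOptions.ExplicitCapture in isolation, a tiny sketch like the following can help. The class `GroupNamesDemo` and the sample pattern are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: lists the group names a Regex reports.
public static class GroupNamesDemo
{
    // Group "0" (the whole match) is always present, whatever the options.
    public static string[] Names(string pattern, RegexOptions options) =>
        new Regex(pattern, options).GetGroupNames();
}
```

For the pattern `(?<a>\d)(\w)`, `Names(pattern, RegexOptions.ExplicitCapture)` reports `0` and `a`, because the unnamed parentheses no longer capture. With `RegexOptions.None` it reports `0`, `1`, and `a`, since unnamed groups are numbered before named ones.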
Your expression will backtrack, so you might consider simplifying your regular expression. This is probably clearer and more efficient:
static Regex rxCode = new Regex(@"
^ # match start-of-line, followed by
(?<code> # a mandatory group ('code'), consisting of
(?<g1> \d\d ) # - 2 decimal digits ('g1'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g2> \d\d\d ) # - 3 decimal digits ('g2'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g3> \d\d ) # - 2 decimal digits ('g3')
)? # - END: optional group
)? # - END: optional group
) # - END: named group ('code'), followed by
$ # - end-of-line
" , RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );
Once you have that, something like this:
string[] texts = { "12" , "12/345" , "12/345/67" , } ;
foreach ( string text in texts )
{
    Match m = rxCode.Match( text ) ;
    Console.WriteLine("{0}: match was {1}" , text , m.Success ? "successful" : "NOT successful" ) ;
    if ( m.Success )
    {
        Console.WriteLine( " code: {0}" , m.Groups["code"].Value ) ;
        Console.WriteLine( " g1: {0}" , m.Groups["g1"].Value ) ;
        Console.WriteLine( " g2: {0}" , m.Groups["g2"].Value ) ;
        Console.WriteLine( " g3: {0}" , m.Groups["g3"].Value ) ;
    }
}
produces the expected
12: match was successful
code: 12
g1: 12
g2:
g3:
12/345: match was successful
code: 12/345
g1: 12
g2: 345
g3:
12/345/67: match was successful
code: 12/345/67
g1: 12
g2: 345
g3: 67
Try this (I removed the outer group from your regex):
^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$)
What would the regular expression be for this?
NN-ARID-NNN?
//N = Number
I have tried this ^[0-9/-0-9]+$
You're not matching the ARID at all and a character class will match in any order... You might want to use something more like this:
^[0-9]{2}-ARID-[0-9]{3}$
[Assuming that ? is not in the actual string...]
If you want the first two digits to be within the range of 00 to 13, then you can use the OR operator with | and a group:
^(?:0[0-9]|1[0-3])-ARID-[0-9]{3}$
 ^^^      ^      ^
  |      OR      |
  |              |
  +---- Group ---+
Breakdown:
^ Matches beginning of string
(?: Beginning of group
0[0-9] Matches 00 to 09 only
| OR
1[0-3] Matches 10 to 13 only
) End of group
-ARID- Matches -ARID- literally
[0-9]{3} Matches 3 digits
$ Matches end of line
When there is an option of matching 00-09 or 10-13, the pattern just cannot match a blank. There's no way it will match if the numbers are not there.
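A brief C# sketch that applies the range-limited pattern. The class name `AridCode` is a placeholder, and the sample codes are invented for the check.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: validates codes of the form NN-ARID-NNN where
// NN is limited to 00 through 13.
public static class AridCode
{
    private static readonly Regex Pattern =
        new Regex(@"^(?:0[0-9]|1[0-3])-ARID-[0-9]{3}$");

    public static bool IsValid(string code) => Pattern.IsMatch(code);
}
```

`AridCode.IsValid("07-ARID-123")` is true, whereas `"14-ARID-123"` (out of range) and `"-ARID-123"` (digits missing) are both false.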
I have this RegEx for C# ASP.NET MVC3 Model validation:
[RegularExpression(@"[0-9]*\,?[0-9]?[0-9]")]
This works for almost all cases, except if the number is bigger than 100.
Any number greater than 100 should show error.
I already tried use [Range], but it doesn't work with commas.
Valid: 0 / 0,0 / 0,00 - 100 / 100,0 / 100,00.
Invalid (Number > 100).
Not sure whether zeros are the only digits allowed at the end, but here is a version that allows any digits after the comma (and only zeros after 100):
# (?:100(?:,0{1,2})?|[0-9]{1,2}(?:,[0-9]{1,2})?)
(?:
100
(?: , 0{1,2} )?
|
[0-9]{1,2}
(?: , [0-9]{1,2} )?
)
If zeros are the only option at the end:
# (?:100|[0-9]{1,2})(?:,0{1,2})?
(?:
100
| [0-9]{1,2}
)
(?: , 0{1,2} )?
And the variants that disallow leading zeros, except for zero itself:
# (?:100(?:,0{1,2})?|(?:0|[1-9][0-9]?)(?:,[0-9]{1,2})?)
(?:
100
(?: , 0{1,2} )?
|
(?:
0
|
[1-9] [0-9]?
)
(?: , [0-9]{1,2} )?
)
# (?:100|0|[1-9][0-9]?)(?:,0{1,2})?
(?:
100
|
0
|
[1-9] [0-9]?
)
(?: , 0{1,2} )?
Here's a regex that matches your criteria:
^(?:(?:[0-9]|[1-9][0-9])(?:,[0-9]{1,2})?|(?:100)(?:,0{1,2})?)$
(Given your use case, I have assumed that your character sequence appears by itself and is not embedded within other content. Please let me know if that is not the case.)
And here's a Perl program that demonstrates that regex on a sample data set.
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    # A1 => An integer between 1 and 99, without leading zeros.
    #       (Although zero can appear by itself.)
    #
    # A2 => An optional fractional component that may contain no more
    #       than two digits.
    #
    # -OR-
    #
    # B1 => The integer 100.
    #
    # B2 => An optional fractional component following that may
    #       consist of one or two zeros only.
    #
    if (/^(?:(?:[0-9]|[1-9][0-9])(?:,[0-9]{1,2})?|(?:100)(?:,0{1,2})?)$/) {
    #        ^^^^^^^^A1^^^^^^^^^ ^^^^^^^A2^^^^^^^ ^^B1^^ ^^^^^B2^^^^^
        print "* [$_]\n";
    } else {
        print " [$_]\n";
    }
}
__DATA__
0
01
11
99
100
101
0,0
0,00
01,00
0,000
99,00
99,99
100,0
100,00
100,000
100,01
100,99
101,00
Expected Output
* [0]
[01]
* [11]
* [99]
* [100]
[101]
* [0,0]
* [0,00]
[01,00]
[0,000]
* [99,00]
* [99,99]
* [100,0]
* [100,00]
[100,000]
[100,01]
[100,99]
[101,00]
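Since the question is about C#, the same check might look like this there. It is only a sketch: the class name `PercentValue` is a placeholder, and the two-digit branch is written as `[1-9][0-9]` so that values such as 10 and 50 are accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: accepts 0 to 100 with an optional comma and up
// to two decimals; 100 may only be followed by zeros.
public static class PercentValue
{
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?:[0-9]|[1-9][0-9])(?:,[0-9]{1,2})?|100(?:,0{1,2})?)$");

    public static bool IsValid(string text) => Pattern.IsMatch(text);
}
```

`PercentValue.IsValid("99,99")` and `PercentValue.IsValid("100,0")` are true; `"100,01"`, `"101"`, and `"01"` are false.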
I have to write a function that receives a string in one of two forms:
XX..X,YY..Y where XX..X is at most 4 characters and YY..Y is at most 26 characters (X and Y are digits or A or B)
XX..X where XX..X is at most 8 characters (X is a digit or A or B)
e.g. 12A,784B52 or 4453AB
How can I use Regex grouping to match this behavior?
Thanks.
P.S. Sorry if this is too localized.
You can use named captures for this:
Regex regexObj = new Regex(
#"\b # Match a word boundary
(?: # Either match
(?<X>[AB\d]{1,4}) # 1-4 characters --> group X
, # comma
(?<Y>[AB\d]{1,26}) # 1-26 characters --> group Y
| # or
(?<X>[AB\d]{1,8}) # 1-8 characters --> group X
) # End of alternation
\b # Match a word boundary",
RegexOptions.IgnorePatternWhitespace);
Match m = regexObj.Match(subjectString);
string X = m.Groups["X"].Value;
string Y = m.Groups["Y"].Value;
If group Y did not take part in the match, Groups["Y"].Success is false and its Value is an empty string, so no exception is thrown; check Success if you need to tell the two forms apart.
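Putting this together, a hedged sketch of such a function could look as follows. The name `TwoFormParser.TryParse` is hypothetical, and the pattern uses `^`/`$` instead of `\b` on the assumption that the whole input must be in one of the two forms.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "X,Y" or "X" inputs using named groups.
public static class TwoFormParser
{
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<X>[AB\d]{1,4}),(?<Y>[AB\d]{1,26})|(?<X>[AB\d]{1,8}))$");

    // Returns false when the text matches neither form. When the input
    // has no comma part, y is set to the empty string.
    public static bool TryParse(string text, out string x, out string y)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            x = "";
            y = "";
            return false;
        }

        x = m.Groups["X"].Value;
        y = m.Groups["Y"].Value; // empty when group Y did not participate
        return true;
    }
}
```

Calling `TwoFormParser.TryParse("12A,784B52", out var x, out var y)` yields `x == "12A"` and `y == "784B52"`; for `"4453AB"` it yields `x == "4453AB"` and an empty `y`.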