Using C#
I need to create 8 regex patterns to check a bunch of arbitrary-length strings (one per line in a text file). I'm trying to write the patterns as strings, then convert them to Regex objects later.
The ones I believe I have:
1. All numbers (e.g. 128329983928): "^[0-9]{string.length}$"
2. All lowercase (e.g. aejksanikp): "^[a-z]{string.length}$"
3. All uppercase (e.g. AIJDJWIHJMNQ): "^[A-Z]{string.length}$"
The ones I need help with:
4. All lowercase, all one letter (e.g. aaaaaaaaaaaaaaa)
5. All uppercase, all one letter (e.g. AAAAAAAAAAAAAAA)
6. Any case, one letter (e.g. AAaaaAaaAAAaAAaa)
7. Any digits plus any single letter, in any case (e.g. 1420A843aa830a3237A)
8. A single digit, repeated (e.g. 22222222222222)
For numbers 4, 5, and 8, I could just do a bunch of | (or), but I was hoping there was a better way than "a|b|c|d|e...{string.length}". I really have no idea what to do for 6 and 7.
4). This matches from the start (^) to the end ($) of a string (or of each line, if the m modifier is set). It captures one alphabetical character (([a-z])) and then looks for that captured character 0+ times (\1*).
^([a-z])\1*$
5). Do the same thing, but capture [A-Z] instead: ^([A-Z])\1*$
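6). Capturing [a-zA-Z] is not enough on its own, because the back-reference \1 only matches the exact case that was captured (after capturing A, \1 will not match a, so AAaa fails). Use the i modifier instead (RegexOptions.IgnoreCase in .NET, or (?i) inline), which makes both [a-z] and \1 case-insensitive: (?i)^([a-z])\1*$. Avoid [a-Z], which is an invalid range, and [A-z], which also matches the ASCII characters [ \ ] ^ _ ` that sit between Z and a.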
7). This one is a little more tricky. We still use the anchors ^ and $ like before. But now we look for 0+ digits and capture an alphabetical character (this means the letter can come first or after some number of digits). After this, we look for either a digit or the captured letter 0+ times until the end. It sounds like a letter is required; however, if you want it to be optional, you can put a ? after the captured letter (([a-z])?). Make this one case-insensitive with the i modifier; as with 6, replacing the capture group with [a-zA-Z] is not enough, because \1 would then only match the letter in the case it was first seen.
^\d*([a-z])(\d|\1)*$
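8). Replace [a-z] in the pattern for 4 with [0-9] or \d: ^(\d)\1*$. In .NET, \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set, so [0-9] is the stricter choice.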
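Putting the pieces together, here is a minimal C# sketch of all eight checks from the question. The class and field names are hypothetical. It assumes each line is tested on its own, so the anchored patterns use + or * in place of the {string.length} placeholder, and it passes RegexOptions.IgnoreCase where the answer mentions the i modifier.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder for the eight line checks. Every pattern is anchored
// with ^ and $, so the whole line must match; + and * replace {string.length}.
public static class LinePatterns
{
    // 1-3: single-class checks from the question.
    public static readonly Regex AllDigits = new Regex(@"^[0-9]+$");
    public static readonly Regex AllLower = new Regex(@"^[a-z]+$");
    public static readonly Regex AllUpper = new Regex(@"^[A-Z]+$");

    // 4-5: capture one letter, then allow only repeats of that same letter.
    public static readonly Regex OneLowerLetter = new Regex(@"^([a-z])\1*$");
    public static readonly Regex OneUpperLetter = new Regex(@"^([A-Z])\1*$");

    // 6: IgnoreCase makes both [a-z] and the back-reference \1 case-insensitive.
    public static readonly Regex OneLetterAnyCase =
        new Regex(@"^([a-z])\1*$", RegexOptions.IgnoreCase);

    // 7: digits plus exactly one distinct letter, in either case.
    public static readonly Regex DigitsAndOneLetter =
        new Regex(@"^\d*([a-z])(\d|\1)*$", RegexOptions.IgnoreCase);

    // 8: one digit repeated; [0-9] avoids .NET's Unicode-aware \d.
    public static readonly Regex OneDigit = new Regex(@"^([0-9])\1*$");
}
```

Each line of the file (read, for example, with File.ReadLines) can then be passed to IsMatch on whichever patterns apply.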
References:
Shorthand Character Classes (e.g. \d)
Grouping and Capturing (we used this to capture the character and reference it with \1)
NOTE: Since this seemed like homework, please ask if you have questions on how things worked beyond my explanations so you learn a thing or two about expressions :)
4. All lowercase, all one letter (e.g. aaaaaaaaaaaaaaa)
5. All uppercase, all one letter (e.g. AAAAAAAAAAAAAAA)
Since these two will be:
"^[a]{string.length}$"
"^[A]{string.length}$"
you should be able to determine the rest (Hint: [] represents a set)
Edit: Changed the patterns to cater for the fact the input is "one per line in a text file" and the regexp needs to match the whole line.
1. ok
2. ok
3. ok
4. ^([a-z])\1{string.length-1}$
5. ^([A-Z])\1{string.length-1}$
6. ^(?i)([a-z])\1{string.length-1}$
7. ^[a-zA-Z0-9]{string.length}$, unless it must start with a number, in which case you'd go for: ^[0-9][a-zA-Z0-9]{string.length-1}$ (note that both of these accept any mix of letters, not just one repeated letter)
8. ^([0-9])\1{string.length-1}$
Note that:
^ means beginning of string and $ end of string. This prevents partial matches.
(?i) turns on case-insensitive matching from that point in the pattern onward.
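If you do want the exact-length form, the {string.length-1} placeholder has to be filled in before the pattern is compiled. A minimal C# sketch for pattern 4, assuming one line at a time; the class and method names are hypothetical. In an interpolated string, {{ and }} produce literal braces, so {{{repeats}}} turns into {3} for a four-character line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LengthPatterns
{
    // Hypothetical helper: builds ^([a-z])\1{n}$ with n = line.Length - 1,
    // i.e. one captured letter followed by exactly n repeats of it.
    public static bool IsOneLowerLetter(string line)
    {
        if (line.Length == 0) return false; // nothing to capture or repeat
        int repeats = line.Length - 1;
        // Verbatim interpolated string: \1 stays a back-reference,
        // {{ and }} are literal braces around the interpolated count.
        string pattern = $@"^([a-z])\1{{{repeats}}}$";
        return Regex.IsMatch(line, pattern);
    }
}
```

The ^([a-z])\1*$ form needs no length at all and avoids building a new pattern for every line.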
Related
I have a regex that I thought was working correctly until now. I need to match on an optional character. It may be there or it may not.
Here are two strings. The top string is matched while the lower is not. The absence of a single letter in the lower string is what is making it fail.
I'd like to get the single letter after the starting 5 digits if it's there and if not, continue getting the rest of the string. This letter can be A-Z.
If I remove ([A-Z]{1}) +.*? + from the regex, it will match everything I need except the letter but it's kind of important.
20000 K Q511195DREWBT E00078748521
30000 K601220PLOPOH Z00054878524
Here is the regex I'm using.
/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/
Use
[A-Z]?
to make the letter optional. {1} is redundant. (Of course you could also write [A-Z]{0,1} which would mean the same, but that's what the ? is there for.)
You could improve your regex to
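^([0-9]{5})+\s+(?:([A-Z])\s+)?([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
Here the optional letter and the whitespace after it are wrapped together in (?:...)?. With ([A-Z]?) alone between two \s+, a line without the letter would still need at least two whitespace characters in a row, which the lower string (as shown) does not have.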
And, since in most regex dialects \d is the same as [0-9] (in .NET it also matches other Unicode digits unless RegexOptions.ECMAScript is set):
^(\d{5})+\s+(?:([A-Z])\s+)?([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?
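A minimal C# sketch of the optional-letter idea, assuming the two sample lines exactly as shown above (single spaces). It uses a variant of the improved regex in which the optional letter and the whitespace after it sit together in an optional non-capturing group, (?:([A-Z])\s+)?, so the group numbering is unchanged. RecordParser and OptionalLetter are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Group 2 is the optional letter; group 3 is the required letter that
    // starts the next field. The optional part includes its trailing \s+.
    public static readonly Regex Record = new Regex(
        @"^(\d{5})+\s+(?:([A-Z])\s+)?([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})");

    // Returns the optional letter, or null when the line has none.
    public static string OptionalLetter(string line)
    {
        Match m = Record.Match(line);
        if (!m.Success) throw new FormatException("Line does not match: " + line);
        return m.Groups[2].Success ? m.Groups[2].Value : null;
    }
}
```

Checking Groups[2].Success, rather than comparing the value with an empty string, tells "letter absent" apart from a group that matched nothing.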
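You can make the single letter optional by adding a ? directly after the character class:
([A-Z]?)
Don't write ([A-Z]{1}?): there the ? makes the {1} quantifier lazy, and the letter is still required. The {1} is redundant anyway, so drop it.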
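To see why the ? belongs directly after the character class, here is a small C# sketch comparing [A-Z]?, [A-Z]{0,1} and [A-Z]{1}? on an empty string and on a single letter. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalDemo
{
    // ? after [A-Z] means zero or one letter; {0,1} says the same thing.
    public static bool Optional(string s) => Regex.IsMatch(s, @"^[A-Z]?$");
    public static bool ZeroOrOne(string s) => Regex.IsMatch(s, @"^[A-Z]{0,1}$");

    // ? after {1} only makes that quantifier lazy: still exactly one letter.
    public static bool LazyExactlyOne(string s) => Regex.IsMatch(s, @"^[A-Z]{1}?$");
}
```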
You have to mark the single letter as optional too:
([A-Z]{1})? +.*? +
or make the whole part optional
(([A-Z]{1}) +.*? +)?
You could also use a simpler regex designed for your case, like (.*)\/(([^\?\n\r])*), where $2 matches what you want.
Here is a regex for passwords which requires a minimum of 8 characters, including a number, a lowercase letter and an uppercase letter, with optional special characters:
/((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?![~##\$%\^&\*_\-\+=`|{}:;!\.\?\"()\[\]]).{8,25})/
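The pattern above is not anchored, so with Regex.IsMatch the .{8,25} part only has to match somewhere in the input, and a password longer than 25 characters that contains a digit, a lowercase and an uppercase letter still passes. A minimal C# sketch of the same lookahead technique with ^ and $ added, assuming ASCII letters and leaving out the (?![...]) special-character clause, whose exact intent is unclear from the post; PasswordCheck and IsValid are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Each lookahead scans from the start without consuming anything;
    // .{8,25} plus the anchors then limits the total length.
    private static readonly Regex Rule =
        new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,25}$");

    public static bool IsValid(string password) => Rule.IsMatch(password);
}
```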
Please find the regex below. I want to make sure the address does not contain a special character immediately after or immediately before the @. In between, any combination is allowed.
The regex I have now:
@"^[^\W_](?:[\w.-]*[^\W_])?@(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
For example, the regex should not match:
abc@.sj.com
abc@-.sj-.com
SSDFF-SAF@-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearranging the starting part a bit. To prevent a special char before the @, just make sure there is a letter or digit there. I also recommend removing redundant capturing groups or converting them into non-capturing ones:
@"^[^\W_](?:[\w.-]*[^\W_])?@(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - 1 or 0 occurrences of:
  [\w.-]* - 0+ letters, digits, _, . and -
  [^\W_] - a digit or a letter only
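The answer's local part now starts and ends with a letter or digit, but its domain part (?:[\w-]+\.)+ still lets a label begin with - or _, so an address like abc@-.sj-.com from the question is still accepted. A minimal C# sketch that applies the same [^\W_] rule to every domain label, assuming a 2-3 letter top-level domain and dropping the [IP address] alternative; EmailCheck and IsValid are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheck
{
    // Local part: starts and ends with a letter or digit ([^\W_]);
    // letters, digits, _, . and - are allowed in between.
    // Domain: each dot-terminated label follows the same start/end rule
    // (with _ and - allowed inside), then a 2-3 letter top-level domain.
    private static readonly Regex Address = new Regex(
        @"^[^\W_](?:[\w.-]*[^\W_])?@(?:[^\W_](?:[\w-]*[^\W_])?\.)+[a-zA-Z]{2,3}$");

    public static bool IsValid(string email) => Address.IsMatch(email);
}
```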
Change the initial [\w-\.]+ to [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.
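A small C# sketch of the difference, based on standard .NET behaviour: \w also matches _ and non-ASCII letters such as é, while [A-Za-z0-9] matches neither, and RegexOptions.ECMAScript narrows \w to [a-zA-Z_0-9]. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Default .NET \w: Unicode letters, digits, connector punctuation (_), etc.
    public static bool IsWordChar(string s) => Regex.IsMatch(s, @"^\w$");

    // Explicit ASCII letters and digits only.
    public static bool IsAsciiAlnum(string s) => Regex.IsMatch(s, @"^[A-Za-z0-9]$");

    // ECMAScript mode restricts \w to [a-zA-Z_0-9].
    public static bool IsEcmaWordChar(string s) =>
        Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);
}
```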
I have to validate a "project code" string in C#: the string length can be anywhere between 5 and 10 characters. The only other rules are as follows:
First character can only be a letter or number
Middle characters if they exist can be letter, number or a period (.)
Last character can only be a letter or number
*Avoid more than one period in a row in the middle
I can validate the 5-10 characters restriction like this:
^(?=.{5,10}$)
And part 1 and part 3 like this:
[a-zA-Z0-9]{1}
The middle rule is looking like this:
[a-zA-Z0-9.]{0,8}
And if I put it all together I have this:
^(?=.{5,10}$)[a-zA-Z0-9]{1}[a-zA-Z0-9.]{0,8}[a-zA-Z0-9]{1}$
It works fine, but with all that nearly identical code, it seems it could be condensed somehow. Any ideas? Thanks!
You can make it a bit shorter by matching the middle part 3 to 8 times, and a single time with the outer parts (You don't need {1}). This eliminates the need for the 5,10 part of your code because 1+3+1=5 and 1+8+1=10.
^[a-zA-Z0-9][a-zA-Z0-9.]{3,8}[a-zA-Z0-9]$
You can use
(?i)^(?!.*[.]{2})[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$
See demo
^[a-z0-9] - First character can only be a letter or number
[a-z0-9.]{3,8} - Middle characters if they exist can be letter, number or a period (.)
[a-z0-9]$ - Last character can only be a letter or number
^(?!.*[.]{2}) - *Avoid more than one period in a row in the middle
The (?i) inline modifier can be replaced with the RegexOptions.IgnoreCase option, passed to the Regex constructor (new Regex(pattern, RegexOptions.IgnoreCase)) or to the static Regex.IsMatch.
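A minimal C# sketch comparing the two answers, with RegexOptions.IgnoreCase in place of (?i). The class and method names are hypothetical. The 1 + {3,8} + 1 structure gives the 5-10 length on its own, and only the lookahead version rejects two periods in a row.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProjectCode
{
    // First char, 3-8 middle chars, last char: 1+3+1 = 5 up to 1+8+1 = 10.
    private static readonly Regex Shorter =
        new Regex(@"^[a-zA-Z0-9][a-zA-Z0-9.]{3,8}[a-zA-Z0-9]$");

    // Same shape, plus a negative lookahead that rejects ".." anywhere.
    private static readonly Regex NoDoublePeriod = new Regex(
        @"^(?!.*[.]{2})[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$", RegexOptions.IgnoreCase);

    public static bool IsValidShort(string code) => Shorter.IsMatch(code);
    public static bool IsValid(string code) => NoDoublePeriod.IsMatch(code);
}
```

In .NET, $ also matches just before a trailing \n, so \z is the stricter end anchor if the input may end with a newline.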
I'm going to piggy-back off Cyral's answer so +1 to him!
I'm using a case insensitive flag to get from [a-zA-Z0-9] to [a-z0-9]
^(?i)[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$
Thanks folks!
I have added the following regular expression for validating a mobile phone number:
(^07[1,2,3,4,5,7,8,9][0-9]{7,8}$)
I want to allow the user to enter a # character too and I'm not sure where to fit it in. They may need to enter # character after they have dialed a number, or at the beginning of a number to dial a direct number or an extension.
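First, your current regex also matches 'numbers' of the format 07,12345678, because a comma inside [...] is a literal character, not a separator. So you need to change [1,2,3,4,5,7,8,9] to [1-9] (when you have a - between two characters in a character class, it usually means that there's a range). Note that your list skips 6; if that is intentional, use [1-57-9] instead.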
If you want to accept an optional # character, you can use the ? quantifier which means 0 or 1 times.
^#?07[1-9][0-9]{7,8}#?$
Except that it will also match numbers with two hashes, one at the front and one at the end. One option to circumvent this is to use a conditional, which the .NET regex engine supports.
^(#)?07[1-9][0-9]{7,8}(?(1)|#?)$
(?(1)|#?) basically means that if the first hash was matched, then nothing more should be matched. Otherwise, if no hash was initially matched, then it can match a hash, if there is one at the end of the number.
In C#, it will be a bit like this:
Regex.Match(myString, @"^(#)?07[1-9][0-9]{7,8}(?(1)|#?)$");
Or you could use a negative lookahead to make sure there's never more than one hash in the number:
^(?!.*#.*#.*$)#?07[1-9][0-9]{7,8}#?$
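Both variants as a minimal C# sketch. It keeps the answer's [1-9] (the original pattern excluded 6); PhoneCheck and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneCheck
{
    // Conditional: if group 1 captured a leading #, the yes-branch is empty
    // (nothing more may follow); otherwise an optional trailing # is allowed.
    private static readonly Regex Conditional =
        new Regex(@"^(#)?07[1-9][0-9]{7,8}(?(1)|#?)$");

    // Negative lookahead: reject any input that contains two # characters.
    private static readonly Regex Lookahead =
        new Regex(@"^(?!.*#.*#.*$)#?07[1-9][0-9]{7,8}#?$");

    public static bool IsValid(string number) => Conditional.IsMatch(number);
    public static bool IsValidLookahead(string number) => Lookahead.IsMatch(number);
}
```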
Ok so I finally figured out that I need the following:
So the regular expression has to perform:
1. alphanumeric
2. at least 1 letter
3. must have between 4 and 8 characters (either a letter/digit).
i.e. an alphanumeric string, that must have at least 1 letter, and be between 4-8 in length. (so it can't be just all numbers or just all letters).
can all of this be done in a single regex?
I'm assuming you mean alphanumeric, at least one letter, and 4 to 8 characters long.
Try this:
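^(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}$
^ and $ - anchor the pattern to the whole string; without them, Regex.IsMatch would also accept longer strings that merely contain a valid 4-8 character run
(?= - we're using a lookahead, so we can check for something without affecting the rest of the match
.*[a-zA-Z] - match anything followed by a letter, i.e. check we have at least one letter
[a-zA-Z0-9]{4,8} - This will match a letter or a number 4 to 8 times.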
However, you say the intention is for "it can't be just all numbers or just all letters", but requirements 1, 2 and 3 don't accomplish this, since a string can be all letters and meet all three requirements. It's possible you want this, with an extra lookahead to confirm there's at least one digit:
^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{4,8}$
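A minimal C# sketch of both anchored patterns. The class and method names are hypothetical, and letters are ASCII-only as in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlnumCheck
{
    // Only ASCII letters and digits, 4-8 characters, at least one letter.
    private static readonly Regex OneLetter =
        new Regex(@"^(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}$");

    // Additionally at least one digit, so all-letter strings are rejected too.
    private static readonly Regex LetterAndDigit =
        new Regex(@"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{4,8}$");

    public static bool HasLetter(string s) => OneLetter.IsMatch(s);
    public static bool HasLetterAndDigit(string s) => LetterAndDigit.IsMatch(s);
}
```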
The use of a-zA-Z isn't very international-friendly, so you may be better off using an escape for "letter" if your flavour of regular expressions has one (in .NET, \p{L}).
Also, I hope this isn't matching for acceptable passwords, as 4 characters probably isn't long enough.
number 2 and 3 seem to contradict. The following will match alphanumeric between 4 and 8:
/[0-9a-zA-Z]{4,8}/
?Regex.IsMatch("sdf", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
?Regex.IsMatch("sdfd", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
true
?Regex.IsMatch("1234", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
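Warning on .* vs .+:
// With .+ the lookahead needs at least one character before the letter, so a string whose only letter is its first character (e.g. "a123") fails; .* has no such restriction.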
?Regex.IsMatch("1111", "(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
?Regex.IsMatch("1aaa", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
true
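The examples above don't actually show where .+ and .* differ. A minimal C# sketch with anchored versions of the two patterns, using an input whose only letter is the first character; LookaheadDemo and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // .+ demands at least one character before the letter inside the lookahead.
    public static bool WithPlus(string s) =>
        Regex.IsMatch(s, @"^(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}$");

    // .* lets the letter be the very first character.
    public static bool WithStar(string s) =>
        Regex.IsMatch(s, @"^(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}$");
}
```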
[a-zA-Z0-9]{4,8}
The first part specifies alphanumeric, and the 2nd part specifies from 4 to 8 characters.
Try: [a-zA-Z0-9]{4,8}