Regex pattern for validation - trying to condense repetitive patterns in regex - c#

I have to validate a "project code" string in C# - the string length can be anywhere between 5-10 characters. The only rules outside this are as follows:
First character can only be a letter or number
Middle characters if they exist can be letter, number or a period (.)
Last character can only be a letter or number
*Avoid more than one period in a row in the middle
I can validate the 5-10 characters restriction like this:
^(?=.{5,10}$)
And part 1 and part 3 like this:
[a-zA-Z0-9]{1}
The middle rule is looking like this:
[a-zA-Z0-9.]{0,8}
And if I put it all together I have this:
^(?=.{5,10}$)[a-zA-Z0-9]{1}[a-zA-Z0-9.]{0,8}[a-zA-Z0-9]{1}$
It works fine, but with all that nearly identical code, it seems it could be condensed somehow. Any ideas? Thanks!

You can make it a bit shorter by matching the middle part 3 to 8 times and each outer part exactly once (you don't need {1}). This eliminates the need for the (?=.{5,10}$) lookahead, because 1+3+1=5 and 1+8+1=10. Note that this pattern does not yet reject two periods in a row.
^[a-zA-Z0-9][a-zA-Z0-9.]{3,8}[a-zA-Z0-9]$

You can use
(?i)^(?!.*[.]{2})[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$
^[a-z0-9] - First character can only be a letter or number
[a-z0-9.]{3,8} - Middle characters if they exist can be letter, number or a period (.)
[a-z0-9]$ - Last character can only be a letter or number
^(?!.*[.]{2}) - *Avoid more than one period in a row in the middle
The (?i) inline modifier can be replaced with the RegexOptions.IgnoreCase flag when the pattern is passed to the new Regex() constructor.
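For anyone who wants to try this from C#, here is a minimal sketch of the pattern above using RegexOptions.IgnoreCase instead of (?i). The class and method names (ProjectCodeValidator, IsValid) are made up for illustration and are not from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ProjectCodeValidator
{
    // 5 to 10 characters in total (1 + 3..8 + 1), a letter or digit at both
    // ends, periods allowed in the middle, and the negative lookahead
    // rejects any two periods in a row.
    private static readonly Regex Pattern = new Regex(
        @"^(?!.*\.{2})[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$",
        RegexOptions.IgnoreCase);

    public static bool IsValid(string code) => Pattern.IsMatch(code);
}
```

One thing to keep in mind: in .NET, $ also matches just before a final newline, so swap it for \z if a trailing "\n" must be rejected as well.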

I'm going to piggy-back off Cyral's answer so +1 to him!
I'm using a case insensitive flag to get from [a-zA-Z0-9] to [a-z0-9]
^(?i)[a-z0-9][a-z0-9.]{3,8}[a-z0-9]$
Thanks folks!


Regex groups expression not capturing content

I'm trying to create a large regex where the plan is to capture 6 groups.
It's going to be used to parse Android logs that have the following format:
2020-03-10T14:09:13.3250000 VERB CallingClass 17503 20870 Whatever content: this log line had (etc)
The expression I've created so far is the following:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w{+})\t(\d{5})\t(\d{5})\t(.*$)
The lines in this case are tab-separated, although the application that I'm developing will be dynamic to the point where this is not always the case, so I feel regex is still the best option even if it is heavier than performing a split.
Breaking down the groups in more detail from my thought process:
Matches the date (I'm considering changing this to a x number of characters instead)
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})
Match a block of 4 characters
([A-Za-z]{4})
Match any number of characters until the next tab
(\w{+})
Match a block of 5 numbers 2 times
\t(\d{5})
At last, match everything else until the end of the line.
\t(.*$)
If I use a reduced expression to the following it works:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(.*$)
This doesn't include 3 of the groups, the word and the 2 numbers blocks.
Any idea why is this?
Thank you.
The problem is that \w{+} matches a word character followed by one or more { characters and then a final } character. If you want one or more word characters, just use the plus without the curly braces (braces are meant for specifying an exact count or a range, but they match literal curly braces when they do not follow that format). The . before \d{7} should also be escaped as \. so that it only matches a literal period rather than any character.
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*$)
I highly recommend using https://regex101.com/ for its explanation pane, to check that your expression matches what you would spell out in words. For testing expressions meant for C#, however, you should use a .NET-based tester such as http://regexstorm.net/tester
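As a rough sketch of how the six groups could be read out in C#, assuming tab-separated lines like the sample in the question: the LogLineParser class and its Parse method are hypothetical names, and the pattern escapes the period before the fractional seconds.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class LogLineParser
{
    // Six capturing groups: timestamp, level, class, two 5-digit ids, message.
    private static readonly Regex Line = new Regex(
        @"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*)$");

    // Returns the six captured fields, or an empty array when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return new string[0];

        var fields = new string[6];
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = m.Groups[i + 1].Value; // Groups[0] is the whole match
        }
        return fields;
    }
}
```

If the separators are not always tabs, \t could be replaced with \s+ for the first five fields, but that is a guess about the other log formats.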

Search for question mark with Regex in C# [duplicate]

I have a regex that I thought was working correctly until now. I need to match on an optional character. It may be there or it may not.
Here are two strings. The top string is matched while the lower is not. The absence of a single letter in the lower string is what is making it fail.
I'd like to get the single letter after the starting 5 digits if it's there and if not, continue getting the rest of the string. This letter can be A-Z.
If I remove ([A-Z]{1}) +.*? + from the regex, it will match everything I need except the letter but it's kind of important.
20000 K Q511195DREWBT E00078748521
30000 K601220PLOPOH Z00054878524
Here is the regex I'm using.
/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/
Use
[A-Z]?
to make the letter optional. {1} is redundant. (Of course you could also write [A-Z]{0,1} which would mean the same, but that's what the ? is there for.)
You could improve your regex to
^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
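And, since in most regex dialects, \d is the same as [0-9]:
^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
Note that \s+([A-Z]?)\s+ still needs at least two whitespace characters between the digits and the next field when the letter is missing. If a single space can separate them, make the letter and its trailing whitespace optional together: \s+(?:([A-Z])\s+)?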
But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?
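Here is a small C# sketch of the optional-letter idea, assuming the fields can be separated by a single space as in the two sample lines shown in the question. It wraps the letter and the whitespace after it in one optional non-capturing group and drops the stray + after the first group. RecordParser and OptionalLetter are invented names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class RecordParser
{
    // (?:([A-Z])\s+)? makes the letter AND the whitespace after it optional,
    // so a line without the letter needs only one run of whitespace.
    private static readonly Regex Record = new Regex(
        @"^(\d{5})\s+(?:([A-Z])\s+)?([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})");

    public static bool Matches(string line) => Record.IsMatch(line);

    // Returns the optional letter, or "" when the letter is absent
    // or the line does not match at all.
    public static string OptionalLetter(string line)
    {
        Match m = Record.Match(line);
        return m.Success ? m.Groups[2].Value : "";
    }
}
```

With the sample lines, the first one should yield K as the optional letter and the second one an empty string while still matching.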
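You can make the single letter optional by adding a ? after the character class:
([A-Z]?)
The quantifier {1} is redundant, so you can drop it. (Keeping it and writing {1}? would not work: a ? directly after {1} only makes that quantifier lazy, it does not make the letter optional.)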
You have to mark the single letter as optional too:
([A-Z]{1})? +.*? +
or make the whole part optional
(([A-Z]{1}) +.*? +)?
You could also use a simpler regex designed for your case, such as (.*)\/(([^\?\n\r])*), where $2 matches what you want.
Here is a regex for a password that requires a minimum of 8 characters (at most 25, per the pattern), including a number, a lowercase letter and an uppercase letter, with an optional special character:
/((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?![~##\$%\^&\*_\-\+=`|{}:;!\.\?\"()\[\]]).{8,25})/

Insert - after every second Character

So I'm developing a game with Unity 3D using C#. As a first step the user has to enter a personal code, which consists of 5 pairs, where each pair has 2 characters/numbers (I'm validating the characters and numbers separately). What I'm trying to achieve is that a minus appears after every second character, like the separator after every 4th digit when you enter a credit card number.
Example: 27-05-AB-CD-EF
So now I tried to use a regular expression and it's working for the first two letters, but somehow the regex sees the minus as a character too, and then it adds a minus infinitely often. I tried different versions where I thought I only allowed letters and numbers, but somehow that doesn't work.
Regex.Replace(codeText, "([A-Za-z0-9][^-]){2}", "$0-");
Any guess what I might be doing wrong?
Try the expression "([A-Za-z0-9]){2}(?!-)", where (?!-) is a zero-width negative lookahead assertion: it checks what follows the match without becoming part of the match value. So this expression matches two characters that aren't followed by -. Your original pattern, ([A-Za-z0-9][^-]){2}, consumes four characters per match (a letter or digit plus any non-dash character, twice) rather than two.
See https://www.regular-expressions.info/lookaround.html for more information.
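A slightly different take, sketched here rather than taken from the answer: strip the dashes that are already in the text first, then re-insert a dash after every pair that is not at the very end. That way re-running it on every keystroke cannot pile up dashes. CodeFormatter and Format are hypothetical names, and the sketch assumes the input otherwise contains only letters and digits.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class CodeFormatter
{
    public static string Format(string input)
    {
        // Drop dashes that are already there so they are never counted as characters.
        string raw = input.Replace("-", "");

        // Append '-' to every pair of letters/digits that is not followed by
        // the end of the string, so no trailing dash is produced.
        return Regex.Replace(raw, "[A-Za-z0-9]{2}(?!$)", "$0-");
    }
}
```

For example, Format("2705ABCDEF") gives 27-05-AB-CD-EF, and calling Format again on that result leaves it unchanged.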

repeat a group of characters

I have the following input to be matched by a regex:
1.1.1.1
1.01.1.1
01.01.091.01
1.10.100.0010
So I always have four groups consisting of digits. While the first three lines should match, the last one should not.
So I wrote this regex:
^(\d*[1-9]+\.){4}$
In general this regex should return all those strings where any of the digits in any of the groups is not followed by a zero. Or more easily: I want to not match all numbers with trailing zeros.
However, this doesn't match anything. regex101.com tells me this:
A repeated capturing group will only capture the last iteration. Put a
capturing group around the repeated group to capture all iterations or
use a non-capturing group instead if you're not interested in the data
But when I add a further capturing group I get the same message:
^((\d*[1-9]+\.)){4}$
The same applies to a non-capturing group:
^(?:\d*[1-9]+\.){4}$
Of course I could just write the same group four times, but that's fairly clumsy and hard to read.
As mentioned by others the dot is the point, so we have three identical groups and one without the dot.
So this regex does it for me:
(?:\d*[1-9]\.){3}(?:\d*[1-9])
You never specify the dot in your patterns. What you ask for is, in fact, not a repetition of four, it is a specific single pattern of four numbers separated with dots.
^(\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+)$
The only thing in there you could consider a repetition is the "number + dot" part, but then you repeat that three times and add another number. Then the regex would become this:
^((\d*[1-9]+\.){3}\d*[1-9]+)$
However, your third line contains a space at the end, so you may want to add extra checks to trim those off.
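The problem with your regex is that, by not including the ., it fails to find four groups of digits, because they always have dots in between.
Try this instead:
(?:(\d*[1-9])\.?){4}
Be aware that this pattern is unanchored and treats the dot as optional, so it also matches strings such as 1111. Add ^ and $ and require the dot if you need a strict check.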
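To check the "three groups with a dot plus one without" idea against the four sample lines, here is a short C# sketch. It adds the ^ and $ anchors so that only a whole string can match. DottedNumbers and HasNoTrailingZeros are made-up names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class DottedNumbers
{
    // Three "digits ending in 1-9, then a dot" groups,
    // followed by a final group of digits without a dot.
    private static readonly Regex NoTrailingZeros =
        new Regex(@"^(?:\d*[1-9]\.){3}\d*[1-9]$");

    public static bool HasNoTrailingZeros(string s) => NoTrailingZeros.IsMatch(s);
}
```

With the inputs from the question, 1.1.1.1, 1.01.1.1 and 01.01.091.01 match, while 1.10.100.0010 does not.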

regex patterns for seemingly simple task

Using C#
I need to create 8 regex patterns to check a bunch of arbitrary-length strings (one per line in a text file). I'm trying to write the patterns as strings, then convert them to Pattern later.
The ones I believe I have:
1 . All numbers (eg 128329983928): "^[0-9]{string.length}$"
2 . All lowercase (eg aejksanikp): "^[a-z]{string.length}$"
3 . All uppercase (eg AIJDJWIHJMNQ): "^[A-Z]{string.length}$"
The ones I need help with:
4 . All lowercase, all one letter (eg aaaaaaaaaaaaaaa)
5 . All uppercase, all one letter (eg AAAAAAAAAAAAAAA)
6 . Any case, one letter (eg AAaaaAaaAAAaAAaa)
7 . Any number plus any single letter, any case: (eg 1420A843aa830a3237A)
8 . Any single number (eg 22222222222222)
For numbers 4, 5, and 8, I could just do a bunch of | (or), but I was hoping there was a better way than "a|b|c|d|e...{string.length}". I really have no idea what to do for 6 and 7.
4). This looks from the start (^) to the end ($) of a string (or line, with the m modifier). It captures one lowercase letter (([a-z])), and then looks for that captured character 0+ times (\1*).
^([a-z])\1*$
5). Do the same thing, but capture [A-Z] instead.
6). Use the pattern from 4 with the i modifier (RegexOptions.IgnoreCase in C#), so that both the character class and the backreference \1 ignore case. Simply widening the class to [a-zA-Z] is not enough, because without the modifier \1 only matches the exact case that was captured. Avoid [A-z], which also matches the punctuation characters between Z and a, and [a-Z], which is not a valid range.
7). This one is a little more tricky. We still use the anchors ^ and $ like before. But now we look for 0+ digits and capture a letter (this means the letter can come first or after some number of digits). After this, we look for either a digit or the captured letter 0+ times until the end. It sounds like a letter is required; however, if you want it to be optional, you can put a ? after the captured letter (([a-z])?). Remember to make this one case-insensitive with the i modifier, so that \1 matches the captured letter in either case.
^\d*([a-z])(\d|\1)*$
8). Replace [a-z] in the pattern for 4 with either [0-9] or \d.
References:
Shorthand Character Classes (i.e. \d)
Grouping and Capturing (we used this to capture the character and reference it with \1)
NOTE: Since this seemed like homework, please ask if you have questions on how things worked beyond my explanations so you learn a thing or two about expressions :)
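In case it helps to see the backreference patterns for 4, 6, 7 and 8 running in C#, here is a minimal sketch. The class and method names are invented for this example, and RegexOptions.IgnoreCase plays the role of the i modifier.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class RepeatedCharPatterns
{
    // 4: one lowercase letter repeated for the whole string.
    public static bool IsSingleLowercaseLetter(string s) =>
        Regex.IsMatch(s, @"^([a-z])\1*$");

    // 6: one letter in any mix of cases; IgnoreCase also applies to \1.
    public static bool IsSingleLetterAnyCase(string s) =>
        Regex.IsMatch(s, @"^([a-z])\1*$", RegexOptions.IgnoreCase);

    // 7: digits mixed with one distinct letter, in any case.
    public static bool IsDigitsPlusOneLetter(string s) =>
        Regex.IsMatch(s, @"^\d*([a-z])(\d|\1)*$", RegexOptions.IgnoreCase);

    // 8: one digit repeated for the whole string.
    public static bool IsSingleDigit(string s) =>
        Regex.IsMatch(s, @"^(\d)\1*$");
}
```

Pattern 5 is the same as 4 with [A-Z] in the capture group.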
4 . All lowercase, all one letter (eg aaaaaaaaaaaaaaa)
5 . All uppercase, all one letter (eg AAAAAAAAAAAAAAA)
Since these two will be:
"^[a]{string.length}$"
"^[A]{string.length}$"
you should be able to determine the rest (Hint: [] represents a set)
Edit: Changed the patterns to cater for the fact the input is "one per line in a text file" and the regexp needs to match the whole line.
1 . ok
2 . ok
3 . ok
4 . ^([a-z])\1{string.length-1}$
5 . ^([A-Z])\1{string.length-1}$
6 . ^(?i)([a-z])\1{string.length-1}$
7 . ^([a-zA-Z0-9]){string.length-1}$ unless it must start with a number, in which case you'd go for: ^[0-9][a-zA-Z0-9]{string.length-1}$
8 . ^([0-9])\1{string.length-1}$
Note that:
^ means beginning of string and $ end of string. This prevents
partial matches.
(?i) turns on case-insensitive matching for the rest of the pattern.
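The {string.length-1} part in the patterns above is a placeholder, not regex syntax, so the number has to be written into the pattern before it is compiled. A minimal sketch of one way to do that in C#, with hypothetical names (LengthBoundPatterns, SingleLowercaseLetterPattern):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class LengthBoundPatterns
{
    // Builds ^([a-z])\1{n}$ with n = s.Length - 1, i.e. the pseudo-pattern
    // ^([a-z])\1{string.length-1}$ with the placeholder filled in.
    public static string SingleLowercaseLetterPattern(string s)
    {
        int repeats = s.Length - 1;
        return @"^([a-z])\1{" + repeats + "}$";
    }

    // An empty string has no first character to capture, so reject it up front.
    public static bool IsSingleLowercaseLetter(string s) =>
        s.Length > 0 && Regex.IsMatch(s, SingleLowercaseLetterPattern(s));
}
```

Since the anchors already force the whole string to match, \1* gives the same result without building a pattern per input.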
