C# RegEx to match specific strings

I need to match (using regex) strings that can be like this:
required: custodian_{number 1 - 9}_{fieldType either txt or ssn}
optional: _{fieldLength 1-999}
So for example:
custodian_1_ssn_1 is valid
custodian_1_ssn_1_255 is valid
custodian or custodian_ or custodian_1 or custodian_1_ or custodian_1_ssn or custodian_1_ssn_ or custodian_1_ssn_1_ are not valid
Currently I am working with this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?[0-9]?[0-9]?)?
as my regex, and my API is picking up:
custodian_1_txt_1
custodian_1_ssn_1
custodian_1_txt_1_255 <---- not matching the last "5"
Any thoughts?

You may use pattern:
^custodian(?:_[a-z0-9]+)+$
^ Assert position at the beginning of the line.
custodian Match the literal substring custodian.
(?:_[a-z0-9]+)+ Non-capturing group, repeated: one or more sequences of _ followed by lowercase letters or digits.
$ Assert position at the end of the line.
You can modify the pattern to also accept the substring signer by putting both alternatives in a non-capturing group:
^(?:custodian|signer)(?:_[a-z0-9]+)+$
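A minimal C# sketch of the loose pattern above, assuming .NET 5 or later (C# 9 top-level statements); LooksLikeField is a hypothetical helper name. Because the pattern only checks the general shape, it also accepts custodian_1, which the question lists as invalid, so it may suit a quick pre-filter better than full validation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: "custodian" or "signer" followed by one or more
// "_<lowercase letters/digits>" segments, anchored at both ends.
static bool LooksLikeField(string input) =>
    Regex.IsMatch(input, @"^(?:custodian|signer)(?:_[a-z0-9]+)+$");

string[] samples = { "custodian_1_ssn_1", "custodian_1_ssn_1_255", "custodian_1", "custodian_1_ssn_1_" };
foreach (var s in samples)
{
    // Prints True, True, True, False: only the trailing underscore is rejected.
    Console.WriteLine($"{s}: {LooksLikeField(s)}");
}
```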

I suggest using \d for the trailing digits instead of your character classes. Try this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?\d*)?
I replaced the [0-9]?[0-9]? at the end of your pattern with \d* so that it matches all of the trailing digits.

You could use anchors to assert the start (^) and the end ($) of the string, and in the last part make at least the first [1-9] non-optional; otherwise the group would also match a lone underscore at the end:
^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$
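The anchored pattern can be checked against every example from the question. This is a sketch assuming .NET 5 or later (top-level statements); IsValidField is a hypothetical name. Note that [1-9]?[0-9] accepts any number from 0 to 99, which is wider than the 1-9 range stated in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern from the answer above. The optional last group requires
// a [1-9] right after its underscore, so a trailing "_" cannot match.
static bool IsValidField(string input) =>
    Regex.IsMatch(input,
        @"^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$");

string[] valid = { "custodian_1_ssn_1", "custodian_1_ssn_1_255", "custodian_1_txt_1_255" };
string[] invalid =
{
    "custodian", "custodian_", "custodian_1", "custodian_1_",
    "custodian_1_ssn", "custodian_1_ssn_", "custodian_1_ssn_1_"
};

foreach (var s in valid)
    Console.WriteLine($"{s,-24} {IsValidField(s)}");   // True for each
foreach (var s in invalid)
    Console.WriteLine($"{s,-24} {IsValidField(s)}");   // False for each
```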

If you're only interested in the last digits, this super generic regex will do:
(?:.+)_(\d+)
If you do need to match the whole string, this worked:
^(?:custodian|signer)_\d+_(?:txt|ssn)(?:_\d+)?_(\d+)$
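Both patterns put the last number in capture group 1, which can be read through Match.Groups. A sketch assuming .NET 5 or later; the two helper names are hypothetical. The generic pattern does no validation at all (for custodian_1_ssn_1_ it still returns "1"), while the whole-string pattern returns nothing for invalid input.

```csharp
using System;
using System.Text.RegularExpressions;

// Generic: greedy .+ backtracks to the last "_" followed by digits; group 1 = those digits.
static string? LastNumberGeneric(string input)
{
    Match m = Regex.Match(input, @"(?:.+)_(\d+)");
    return m.Success ? m.Groups[1].Value : null;
}

// Whole-string form: checks the prefix and field type, then captures the final number.
static string? LastNumberStrict(string input)
{
    Match m = Regex.Match(input, @"^(?:custodian|signer)_\d+_(?:txt|ssn)(?:_\d+)?_(\d+)$");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(LastNumberGeneric("custodian_1_txt_1_255")); // 255
Console.WriteLine(LastNumberStrict("custodian_1_txt_1_255"));  // 255
Console.WriteLine(LastNumberStrict("custodian_1_txt_1"));      // 1
```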

Related

Why is Regex.Replace giving me weird result for last group

A simple example:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2})", "$1-$2-$3 $4")
This outputs to:
123-456-789 10999999999
But why? I have specifically set the group indexes I need, and those groups contain exactly the expected values (checked in the debugger).
Here is a fiddle:
https://dotnetfiddle.net/dkAPx3
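Regex.Replace only replaces the text that the pattern matched and leaves the rest of the input untouched, so the unmatched tail 999999999 is copied to the output after your replacement. Match the rest of the string with .* to truncate it: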
Regex.Replace("12345678910999999999", @"^(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4")
I'd also add ^ at the start to match the beginning of the string.
Your regex matched and replaced only the first part of the string; add .* to the end of the pattern:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4"); // results in "123-456-789 10"
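A side-by-side sketch of the two calls (assuming .NET 5 or later, top-level statements) makes the behavior visible: the replacement string is the same, only the extent of the match changes.

```csharp
using System;
using System.Text.RegularExpressions;

const string Input = "12345678910999999999";

// Matches only the first 11 digits; Regex.Replace copies the unmatched
// tail ("999999999") into the result unchanged.
string withoutTail = Regex.Replace(Input, @"(\d{3})(\d{3})(\d{3})(\d{2})", "$1-$2-$3 $4");

// .* extends the match to the end of the string, so the tail is replaced too.
string truncated = Regex.Replace(Input, @"^(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4");

Console.WriteLine(withoutTail); // 123-456-789 10999999999
Console.WriteLine(truncated);   // 123-456-789 10
```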

Regex - I need to include non-alphabetic character

I have this expression, and I need to make sure the input includes at least one non-alphabetic character:
^(?!.*(.)\1)\S{8,12}$
testhis invalid
testhis7 valid
testhis# valid
You could use a positive lookahead asserting at least one char other than a whitespace char or a-zA-Z:
^(?!.*(.)\1)(?=.*[^\sa-zA-Z])\S{8,12}$
Explanation
^ Start of string
(?!.*(.)\1) Assert there are no 2 identical consecutive chars
(?=.*[^\sa-zA-Z]) Assert 1 char other than a whitespace char and a-zA-Z
\S{8,12} Match 8-12 non whitespace chars
$ End of string
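The lookahead pattern can be tried against the question's examples in C#. This is a sketch assuming .NET 5 or later; IsAcceptable is a hypothetical name, and testhisa and tesst#ab are made-up inputs added to exercise the letters-only and repeated-character rules.

```csharp
using System;
using System.Text.RegularExpressions;

// 8-12 non-whitespace chars, no two identical chars in a row,
// and at least one char that is neither whitespace nor an ASCII letter.
static bool IsAcceptable(string input) =>
    Regex.IsMatch(input, @"^(?!.*(.)\1)(?=.*[^\sa-zA-Z])\S{8,12}$");

foreach (var s in new[] { "testhis", "testhis7", "testhis#", "testhisa", "tesst#ab" })
{
    // Expected: False (too short), True, True, False (letters only), False ("ss")
    Console.WriteLine($"{s}: {IsAcceptable(s)}");
}
```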
Another option is to use \P{L} to assert any char other than any kind of letter from any language
^(?!.*(.)\1)(?=.*\P{L})\S{8,12}$
You can just check for the special character (as matched by [\p{P}\p{S}]) in positive lookahead (?=.*[\p{P}\p{S}]), which gives you the regex:
^(?!.*(.)\1)(?=.*[\p{P}\p{S}])\S{8,12}$
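The \P{L} and [\p{P}\p{S}] variants differ on digits: a digit is "not a letter", but it is neither Unicode punctuation nor a symbol. So testhis7, which the question treats as valid, is rejected by the [\p{P}\p{S}] version. A small comparison sketch, assuming .NET 5 or later (testhis$ is a made-up input):

```csharp
using System;
using System.Text.RegularExpressions;

// Any char that is not a letter of any language counts (digits included).
const string NonLetter = @"^(?!.*(.)\1)(?=.*\P{L})\S{8,12}$";
// Only Unicode punctuation (\p{P}) or symbol (\p{S}) chars count; digits do not.
const string PunctOrSymbol = @"^(?!.*(.)\1)(?=.*[\p{P}\p{S}])\S{8,12}$";

foreach (var s in new[] { "testhis7", "testhis#", "testhis$" })
{
    bool a = Regex.IsMatch(s, NonLetter);
    bool b = Regex.IsMatch(s, PunctOrSymbol);
    // testhis7 -> True/False; "#" (punctuation) and "$" (currency symbol) -> True/True
    Console.WriteLine($"{s}: nonLetter={a}, punctOrSymbol={b}");
}
```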
You can also replace [\p{P}\p{S}] with [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~], or any other character set that lists all the characters you want to count as "special characters".
It's better to do this with separate if statements. That way you know exactly what is missing from the value. A regex only gives you a true/false result for whether the value matched the pattern; it tells you nothing about WHAT is missing.
For example:
// requires: using System; using System.Linq;
if (!value.Any(c => !char.IsLetter(c)))
{
    throw new ArgumentException("value must contain at least one non-letter");
}
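Extending the separate-checks idea to all of the question's rules might look like the sketch below (assuming .NET 5 or later). Validate and its messages are hypothetical, and collecting errors into a list instead of throwing is a design choice made here, not part of the original answer; it lets the caller report every failed rule at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical validator: one check per rule, so the caller learns
// exactly which requirements failed instead of a single true/false.
static List<string> Validate(string value)
{
    var errors = new List<string>();
    if (value.Length < 8 || value.Length > 12)
        errors.Add("value must be 8-12 characters long");
    if (value.Any(char.IsWhiteSpace))
        errors.Add("value must not contain whitespace");
    if (!value.Any(c => !char.IsLetter(c)))
        errors.Add("value must contain at least one non-letter");
    // Compare each char with the next one (same idea as (?!.*(.)\1)).
    if (value.Zip(value.Skip(1), (a, b) => a == b).Any(same => same))
        errors.Add("value must not repeat a character twice in a row");
    return errors;
}

foreach (var s in new[] { "testhis", "testhis7", "tesst#ab" })
{
    var problems = Validate(s);
    Console.WriteLine(problems.Count == 0 ? $"{s}: ok" : $"{s}: {string.Join("; ", problems)}");
}
```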

RegEx for ABC_XYZ_PPQRST-AA

I need to create a regex to test data like this:
xxx_yyy_zzz-aaa
I am able to verify the first two underscores (_), but I can't work out how to add the hyphen (-).
@"[a-zA-Z0-9]_[a-zA-Z0-9]_[a-zA-Z0-9]s/[^-][a-zA-Z0-9]"
I am using C#. The numbers of characters above are just an example.
The xxx_yyy_zzz-aaa string implies that the format is {alphanum}_{alphanum}_{alphanum}-{alphanum}. The pattern for the {alphanum} part has already been written by you.
Next, you want to quantify each alphanumeric part since just [A-Za-z0-9] matches a single alphanum char. Use + to match 1 or more occurrences, or {3} to match only 3, or {3,} to match 3 or more.
That is not all, since you expect the whole string to match the pattern. Hence, you need anchors, ^ to match the start of string and $ (or \z) to match the end of string.
Thus, I'd recommend
@"^[a-zA-Z0-9]+_[a-zA-Z0-9]+_[a-zA-Z0-9]+-[a-zA-Z0-9]+\z"
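A quick C# check of the recommended pattern, assuming .NET 5 or later. It also contrasts \z with $: in .NET, $ (without RegexOptions.Multiline) can also match just before a final newline, so a value with a trailing "\n" passes the $ version but not the \z version.

```csharp
using System;
using System.Text.RegularExpressions;

// {alphanum}+ _ {alphanum}+ _ {alphanum}+ - {alphanum}+, anchored at both ends.
const string WithZ = @"^[a-zA-Z0-9]+_[a-zA-Z0-9]+_[a-zA-Z0-9]+-[a-zA-Z0-9]+\z";
const string WithDollar = @"^[a-zA-Z0-9]+_[a-zA-Z0-9]+_[a-zA-Z0-9]+-[a-zA-Z0-9]+$";

foreach (var s in new[] { "xxx_yyy_zzz-aaa", "ABC_XYZ_PPQRST-AA", "xxx_yyy_zzz", "xxx_yyy_zzz-aaa\n" })
{
    string shown = s.Replace("\n", "\\n");
    // Last input: \z -> False, $ -> True (trailing newline tolerated by $)
    Console.WriteLine($"{shown}: \\z -> {Regex.IsMatch(s, WithZ)}, $ -> {Regex.IsMatch(s, WithDollar)}");
}
```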

RegEx: Find match based on first two chars

I am new to RegEx and thus have a question on RegEx. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible combinations of strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY, AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now, if I want to add a new matching condition like this:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?
You can modify your expression to the following and use the IsMatch() method:
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
In your pattern, the outer capturing group combined with . (any single character) tries to match a third character, and the range quantifier {2} then repeats that whole sequence twice.
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
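Both IsMatch calls, plus the original attempt, in one runnable sketch (assuming .NET 5 or later; XYQF44DT508755 and ZZZF44DT508755 are made-up negative inputs). It also shows why the original ^((XY|AB|PQ).){2} fails: it needs two "prefix plus one char" blocks in a row.

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs = { "XYZF44DT508755", "ABZF44DT508755", "PQZF44DT508755", "XYQF44DT508755", "ZZZF44DT508755" };

foreach (var input in inputs)
{
    bool prefix = Regex.IsMatch(input, "^(?:XY|AB|PQ)");     // starts with XY, AB or PQ
    bool prefixZf = Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF"); // ...and is followed by "ZF"
    Console.WriteLine($"{input}: prefix={prefix}, prefix+ZF={prefixZf}");
}

// The original attempt rejects the real data but accepts "XYKPQL".
Console.WriteLine(Regex.IsMatch("XYZF44DT508755", "^((XY|AB|PQ).){2}")); // False
Console.WriteLine(Regex.IsMatch("XYKPQL", "^((XY|AB|PQ).){2}"));         // True
```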
You want to test for just ^(XY|AB|PQ). Your RegEx means: Search for either XY, AB or PQ, then a random character, and repeat the whole sequence twice, for example "XYKPQL" would match your RegEx.
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is that the entire group is followed by {2}, which means exactly 2 occurrences: two repetitions of one of your two-letter prefixes plus . (any single character). So it would match strings like XY_AB_, where each _ could be any character.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
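To actually retrieve the matched "word", read capture group 1. A sketch assuming .NET 5 or later; LeadingCode is a hypothetical helper and the trailing text in the first input is made up. In .NET, \w covers Unicode letters, decimal digits and connector punctuation such as _, so the match stops at the first other character (here a space).

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 captures the leading code; \w* stops at the first non-word char.
static string? LeadingCode(string input)
{
    Match m = Regex.Match(input, @"^((XY|AB|PQ)ZF\w*)");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(LeadingCode("XYZF44DT508755 some trailing text")); // XYZF44DT508755
Console.WriteLine(LeadingCode("ABZF44DT508755"));                    // ABZF44DT508755
Console.WriteLine(LeadingCode("XYQF44DT508755") ?? "(no match)");    // (no match)
```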

Regex for special case

I need to create a regex expression for the following scenario.
It can contain only digits and at most one dot or comma.
First part can have one to three digits.
The second part can be a dot or a comma.
The third part can have one to two digits.
The valid inputs are:
123,12
123.12
123,1
123
12,12
12.12
1,12
1.12
1,1
1.1
1
So far I have come up with this expression:
\d{1,3}(?:[.,]\d{1,2})?
but it doesn't work well. For example, the input 11:11 is marked as valid.
You need to put anchors around your expression:
^\d{1,3}(?:[.,]\d{1,2})?$
^ will match the start of the string
$ will match the end of the string
Without those anchors the pattern can match just part of your string. Because the last part is optional, on "11:11" it matches the digits before the colon, and a second match finds the digits after the colon.
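The partial matches are easy to see with Regex.Matches. A sketch assuming .NET 5 or later (MatchCollection supports LINQ there); the rejected inputs in the check below are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string Unanchored = @"\d{1,3}(?:[.,]\d{1,2})?";
const string Anchored = @"^\d{1,3}(?:[.,]\d{1,2})?$";

// Unanchored, "11:11" yields two partial matches ("11" and "11"),
// so IsMatch reports true even though the whole string is invalid.
var parts = Regex.Matches("11:11", Unanchored).Select(m => m.Value).ToArray();
Console.WriteLine($"{parts.Length} matches: {string.Join(", ", parts)}");
Console.WriteLine(Regex.IsMatch("11:11", Unanchored)); // True
Console.WriteLine(Regex.IsMatch("11:11", Anchored));   // False

string[] valid = { "123,12", "123.12", "123,1", "123", "12,12", "12.12", "1,12", "1.12", "1,1", "1.1", "1" };
Console.WriteLine(valid.All(v => Regex.IsMatch(v, Anchored))); // True
```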
Try to use ^ and $:
^\d{1,3}(?:[.,]\d{1,2})?$
^ The match must start at the beginning of the string or line.
$ The match must occur at the end of the string or before \n at the end of the line or string.
