Regex matching recurring patterns - c#

I want to validate input in a C# TextBox by using regular expressions. The expected input is in this format:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
So I've got six groups of five characters, each followed by a hyphen, and a single character at the end.
Right now my regex matches any string of 5 to 255 characters: .{5,255}
How do I modify it to match the format above?

Update:
If you want to match any letter or digit, you can use:
^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$
Explanation:
(?: // Non-capturing group
[a-zA-Z0-9]{5} // Match exactly 5 letters or digits
- // Followed by a `-`
){6} // Repeat the group 6 times (e.g. ABCD4- six times)
[a-zA-Z0-9] // Finally, match a single letter or digit
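A minimal C# sketch of using that pattern to validate the text, assuming the check is done with Regex.IsMatch. IsValidKey is a hypothetical helper name, and hooking it up to the TextBox (for example in a validation event) is left out:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the whole input is six "XXXXX-" blocks
// of letters/digits followed by one final letter/digit.
bool IsValidKey(string input) =>
    Regex.IsMatch(input, @"^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$");

Console.WriteLine(IsValidKey("ABCD4-12345-ABCDE-00000-zzzzz-Q1W2E-X"));  // True
Console.WriteLine(IsValidKey("ABCD4-12345-ABCDE"));                      // False (too short)
Console.WriteLine(IsValidKey("ABCD4-12345-ABCDE-00000-zzzzz-Q1W2E-XY")); // False (extra char)
```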
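Note: The regex below only matches strings where each five-character block repeats a single character, like the pattern you posted:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
You can try this regex:
^(?:([a-zA-Z0-9])\1{4}-){6}\1$
Explanation:
(?: // Non-capturing group
( // First capture group
[a-zA-Z0-9] // Match any letter or digit, and capture it in group 1
)
\1{4} // Match the character from group 1 four more times
- // Followed by a `-`
){6} // Repeat the group 6 times (CCCCC- six times)
\1 // At the end, match the character captured by the last repetition of group 1
Because group 1 is re-captured on every repetition, the blocks do not have to use the same character as each other (AAAAA-BBBBB-CCCCC-DDDDD-EEEEE-FFFFF-F also matches); only the final character must equal the last block's character.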
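To see what the backreference actually enforces, here is a small C# sketch. PerBlock uses the pattern above; SameEverywhere is a hypothetical stricter variant (not from the answer) that captures the character once and reuses it for every block and for the final character:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: group 1 is re-captured on each repetition,
// so every block only has to repeat its own first character.
bool PerBlock(string s) => Regex.IsMatch(s, @"^(?:([a-zA-Z0-9])\1{4}-){6}\1$");

// Hypothetical stricter variant: capture once, then require the same
// character in the five remaining blocks and in the final position.
bool SameEverywhere(string s) => Regex.IsMatch(s, @"^([a-zA-Z0-9])\1{4}(?:-\1{5}){5}-\1$");

string same = "CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C";
string mixed = "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE-FFFFF-F";

Console.WriteLine($"{PerBlock(same)} {PerBlock(mixed)}");             // True True
Console.WriteLine($"{SameEverywhere(same)} {SameEverywhere(mixed)}"); // True False
```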

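Untested, but I think this will work:
([A-Za-z0-9]{5}-){6}[A-Za-z0-9]
Without the ^ and $ anchors, though, it also matches this pattern inside a longer string, so add them if the whole input has to match.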

For your example, in general, replace C with the character class you want:
^(C{5}-){6}C$
^([a-z]{5}-){6}[a-z]$ # Letters only; use the case-insensitive modifier
^([a-z0-9]{5}-){6}[a-z0-9]$ # Letters and digits

Try this:
^(C{5}-){6}C$
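The ^ and $ denote the beginning and end of the string, respectively, and make sure that no additional characters are entered. (In .NET, $ also matches just before a final newline, so use \z instead of $ if a trailing newline must be rejected.)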
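A quick C# check of that anchoring behaviour, assuming .NET's default (non-Multiline) options: $ tolerates one trailing newline, while \z does not. C is used literally here, exactly as in the pattern above:

```csharp
using System;
using System.Text.RegularExpressions;

string valid = "CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C";
string withNewline = valid + "\n";

// '$' matches at the very end or just before a final '\n'.
bool dollar = Regex.IsMatch(withNewline, @"^(C{5}-){6}C$");

// '\z' only matches at the absolute end of the string.
bool absoluteEnd = Regex.IsMatch(withNewline, @"^(C{5}-){6}C\z");

Console.WriteLine($"{dollar} {absoluteEnd}"); // True False
```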

Related

Regex C# - optional group in the middle

I have source text like this, with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
The regex has to match the whole string and capture all characters after the first "." in Group 1. I tried to make subgroups and mark Group 3 as optional, but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How should it be changed to capture the desired values?

This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.
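A minimal C# sketch applying this pattern to both sample strings (AfterFirstDot is a hypothetical helper name):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: lazy .*? stops at the first '.', and (.*) captures the rest.
string AfterFirstDot(string s) => Regex.Match(s, @".*?\.(.*)").Groups[1].Value;

Console.WriteLine(AfterFirstDot("GH22-O0-TFS-SFSD 00-1-006.19135"));   // 19135
Console.WriteLine(AfterFirstDot("GH22-O0-TFS-SFSD 00-1-006.1.19135")); // 1.19135
```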

You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than an LF char, as few as possible (*? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
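A short C# sketch showing what the groups of this pattern contain for the two samples. Note that the desired 1.19135 ends up split across Group 2 and Group 3 rather than in a single group:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above.
var regex = new Regex(@"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$");

foreach (string input in new[] { "GH22-O0-TFS-SFSD 00-1-006.19135",
                                 "GH22-O0-TFS-SFSD 00-1-006.1.19135" })
{
    Match m = regex.Match(input);
    // Group 3 only succeeds when the optional ".digits" part is present.
    string third = m.Groups[3].Success ? m.Groups[3].Value : "(none)";
    Console.WriteLine($"{m.Groups[2].Value} | {third}");
}
// Prints:
// 19135 | (none)
// 1 | 19135
```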

I hope I understand your goals and this should work:
.*?\.([\d.]+)
.*?\. - lazily match everything up to and including the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1

Regex start new match at specific pattern

Hello, I'm kind of new to regex and have a small, maybe simple, question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but creates 3 matches correctly.
But I need the "Additional test" text in the second group too.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*), but now I get only one huge match because the second group takes everything until the end.
How can I match everything until a new line starting with a date, and start a new match from there?

If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: as many chars other than a newline as possible, then an optional second line (a newline followed by as many non-newline chars as possible).
If there can be any number of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
Here:
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline followed by a date string, or up to the end of the string.
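A C# sketch of the second pattern in use. The input is the question's sample with one hypothetical extra line ("Another line") added to the second entry, to show that any number of follow-up lines ends up in Group 2:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's sample, plus one illustrative extra line in the second entry.
string log =
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "Another line\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test";

// Pattern from the answer; (?m) and (?s) are inline options.
const string pattern =
    @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)";

MatchCollection entries = Regex.Matches(log, pattern);
foreach (Match entry in entries)
{
    // Group 1: timestamp; Group 2: the message including its follow-up lines.
    Console.WriteLine($"[{entry.Groups[1].Value}] {entry.Groups[2].Value.Replace("\n", " / ")}");
}
```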

Your character class [,.:\w\s]* repeats \s with the * quantifier, and since \s also matches newlines, it matches too much.
You can match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group: (.*\r?\n.*)
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
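Note that in C#, ^ only matches at the start of the whole string unless you pass RegexOptions.Multiline (or start the pattern with (?m)); without it, you get a single match instead of one per entry.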
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (or of each line, when RegexOptions.Multiline is used)
( Capture group 1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
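A C# sketch (sample text from the question, "\n" line endings assumed) showing why the Multiline option matters for this pattern in .NET: without it, ^ only matches at the start of the whole string:

```csharp
using System;
using System.Text.RegularExpressions;

string log =
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test";

// Pattern from the answer, without any inline (?m).
const string pattern = @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)";

// Default options: '^' is the start of the whole string, so only the first entry matches.
int singleCount = Regex.Matches(log, pattern).Count;

// Multiline: '^' also matches right after each '\n', so every entry matches.
MatchCollection perEntry = Regex.Matches(log, pattern, RegexOptions.Multiline);

Console.WriteLine($"{singleCount} vs {perEntry.Count}"); // 1 vs 3
foreach (Match entry in perEntry)
{
    // Group 1: the date; Group 2: the time, the message and any follow-up lines.
    Console.WriteLine($"{entry.Groups[1].Value} -> {entry.Groups[2].Value.Replace("\n", " / ")}");
}
```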

C# regex string that is not another string

I want to match a word of at least 3 letters, preceded by any number of characters from the class [-_ :], that is not a specific 3-letter word, string2.
Ex:
if string2="VER"
in
" ODO VER7"
matched " ODO"
or
"_::ATTPQ VER7"
matched "_::ATTPQ"
but if
" VER7"
it shouldn't match " VER"
so I thought about
Regex.Match(inputString, @"[-_:]*[A-Z]{3,}[^(VER)]", RegexOptions.IgnoreCase);
where
[-_:]* checks for any character in class, appearing 0 or more times
[A-Z] the range of letters that could form the word
{3,} the minimum amount of letters to form the word
[^(VER)] the grouping construct that shouldn't appear
I believe, however, that [A-Z]{3,} results in any letter at least 3 times (not what I want),
and I'm not sure what [^(VER)] is doing.

Using [^(VER)] means a negated character class, which matches any single character except (, ), V, E or R.
For your example data, you could match 0+ spaces or tabs (or use \s to also match a newline).
Then, before matching A-Z 3 or more times, use a negative lookahead to assert that what is on the right is not VER.
If that is the case, match 3 or more times A-Z followed by a space and VER itself.
^[ \t]*[-_:]*(?!VER)[A-Z]{3,} VER

^\s*[-_:]*(?!VER)[A-Z]{3,}
This regex asserts that, at the start of the string, there are zero or more whitespace characters and zero or more of your characters, followed by at least 3 letters. It uses a negative lookahead to make sure that those letters do not start with VER (or whatever you want to exclude).

This matches one or more of the class characters [-_ :] followed by 3 or more letters/digits
that do not start with VER (as in the samples given):
[-_ :]+(?!VER)[^\W_]{3,}
https://regex101.com/r/wLw23I/1
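A C# sketch running this pattern against the three samples from the question, with RegexOptions.IgnoreCase as in the question's own code (FirstWord is a hypothetical helper). Keep in mind that (?!VER) rejects any word that merely starts with VER, such as VERB:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: one or more of [-_ :], then 3+ letters/digits
// ([^\W_] = word characters except '_') that do not start with "VER".
string FirstWord(string input)
{
    Match m = Regex.Match(input, @"[-_ :]+(?!VER)[^\W_]{3,}", RegexOptions.IgnoreCase);
    return m.Success ? m.Value : "(no match)";
}

Console.WriteLine(FirstWord(" ODO VER7"));     // prints " ODO" (leading space included)
Console.WriteLine(FirstWord("_::ATTPQ VER7")); // prints "_::ATTPQ"
Console.WriteLine(FirstWord(" VER7"));         // prints "(no match)"
```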

Regular expression for updating version number

I have version numbers like the ones given below:
020. 000. 1234. 43567 (note the whitespace after each dot)
020,000,1234,43567
20.0.1234.43567
20,0,1234,43567
I want a regular expression that updates the numbers after the last two dots (or commas) to, for example, 1298 and 45678 (any numbers):
020. 000. 1298. 43568 (note the whitespace after each dot)
020,000,1298,45678
20.0.1298.45678
20,0,1298,45678
Thanks,

resultString = Regex.Replace(subjectString,
@"(\d+) # any number
([.,]\s*) # dot or comma, optional whitespace
(\d+) # etc.
([.,]\s*)
\d+
([.,]\s*)
\d+",
"$1$2$3${4}1298${5}43568", RegexOptions.IgnorePatternWhitespace);
Note the ${4} instead of $4 because otherwise the following 1 would be interpreted as belonging to the group number ($41).
Also note the difference between (\d+) and (\d)+. While both match 1234, the first one will capture 1234 into the group created by the parentheses. The second one will capture only 4 because the previous captures will be overwritten by the next.
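A C# sketch of the same replacement, checked against the four input formats from the question, plus a quick look at (\d+) versus (\d)+. Bump is a hypothetical helper name; in .NET, the earlier captures of (\d)+ are still available through Group.Captures:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the answer; IgnorePatternWhitespace lets the # comments live inside it.
const string pattern = @"(\d+)      # first number
                         ([.,]\s*)  # dot or comma, optional whitespace
                         (\d+)      # second number
                         ([.,]\s*)
                         \d+        # third number (replaced)
                         ([.,]\s*)
                         \d+        # fourth number (replaced)";

// Hypothetical helper wrapping the answer's Replace call.
string Bump(string version) =>
    Regex.Replace(version, pattern, "$1$2$3${4}1298${5}43568",
                  RegexOptions.IgnorePatternWhitespace);

foreach (string v in new[] { "020. 000. 1234. 43567", "020,000,1234,43567",
                             "20.0.1234.43567", "20,0,1234,43567" })
{
    Console.WriteLine(Bump(v));
}

// (\d+) keeps the whole number; (\d)+ keeps only the last digit in Group.Value,
// although every digit is still recorded in Group.Captures.
Match whole = Regex.Match("1234", @"(\d+)");
Match each = Regex.Match("1234", @"(\d)+");
Console.WriteLine($"{whole.Groups[1].Value} {each.Groups[1].Value} {each.Groups[1].Captures.Count}"); // 1234 4 4
```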

To replace the version with 1298 and 43568:
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<separator>[.,]\s*)\d+$");
string result = regex.Replace(source, "1298${separator}43568");
This works because:
(?<=) is a lookbehind: the group inside it is not included in the match, but it must match immediately before the match
^ match the start of the string
(?:\d+[.,]\s*) non-capturing group: match at least one digit followed by a . or , followed by 0 or more spaces
{2} the previous group must occur twice
\d+ first part of the match: 1 or more digits
(?<separator>[.,]\s*) capture the separator (a . or , followed by optional spaces) into a named capture group called separator
\d+ match one or more digits
$ match the end of the string
In the replacement string, you just provide the replacement version and use ${separator} to insert the original separator. Note that Replace returns a new string; it does not modify source.
If you don't need to preserve the separator, you can just do:
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+[.,]\s*\d+$");
string result = regex.Replace(source, "1298.43568");
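A C# sketch of the lookbehind version with the separator preserved. Update and its parameters are hypothetical, and the return value is used because Replace produces a new string rather than modifying the input:

```csharp
using System;
using System.Text.RegularExpressions;

// Lookbehind from the answer: the first two "number + separator" pairs must precede
// the match, which then covers the last two numbers and the separator between them.
var versionTail = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<separator>[.,]\s*)\d+$");

// Hypothetical helper: ${separator} re-inserts whatever separator was matched.
string Update(string source, string third, string fourth) =>
    versionTail.Replace(source, third + "${separator}" + fourth);

Console.WriteLine(Update("020. 000. 1234. 43567", "1298", "43568")); // 020. 000. 1298. 43568
Console.WriteLine(Update("20,0,1234,43567", "1298", "45678"));       // 20,0,1298,45678
```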

How to make this regex allow spaces c#

I have a phone number field with the following regex:
[RegularExpression(@"^[0-9]{10,10}$")]
This checks that the input is exactly 10 numeric characters. How should I change this regex to allow spaces, so that all of the following examples validate?
1234567890
12 34567890
123 456 7890
cheers!

This works:
^(?:\s*\d\s*){10,10}$
Explanation:
^ - start of string
(?: - start non-capturing group
\s* - any whitespace
\d - a digit
\s* - any whitespace
) - end non-capturing group
{10,10} - repeat exactly 10 times (same as {10})
$ - end of string
This way of constructing the regex is also fairly extensible in case you need to ignore other characters too.
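A minimal C# check of this pattern against the examples from the question plus two inputs that should fail (IsPhone is a hypothetical helper). Note that in .NET, \s also accepts tabs and newlines and \d accepts any Unicode decimal digit, so [ ] and [0-9] are stricter choices if that matters:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer ({10} is the same as {10,10}).
bool IsPhone(string s) => Regex.IsMatch(s, @"^(?:\s*\d\s*){10}$");

string[] candidates = { "1234567890", "12 34567890", "123 456 7890", "123 456 789", "12-34567890" };
foreach (string c in candidates)
{
    // The first three print True; the last two (9 digits, a hyphen) print False.
    Console.WriteLine($"{c}: {IsPhone(c)}");
}
```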

Use this:
^([\s]*\d){10}\s*$
I cheated :) I just modified this regex here:
Regular expression to count number of commas in a string
I tested. It works fine for me.

Use this simple regex
var matches = Regex.Matches(inputString, @"([\s\d]{10})");
EDIT
var matches = Regex.Matches(inputString, @"^((?:\s*\d){10})$");
Explanation:
^ the beginning of the string
(?: ){10} group, but do not capture (10 times):
\s* whitespace (0 or more times, matching the most amount possible)
\d digits (0-9)
$ before an optional \n, and the end of the string

Depending on your problem, you might consider using a MatchEvaluator delegate, as described in http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx
That would make short work of counting digits and/or spaces.

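Something like this, I think: ^\d{2}\s?\d\s?\d{3}\s?\d{4}$
It covers these variants: 10 digits; 2 digits, a space, 8 digits; or 3 digits, a space, 3 digits, a space, 4 digits (it also accepts a few other spacings, such as 12 3 456 7890).
But if you want only these 3 variants, use something like this, with the alternation grouped so that ^ and $ apply to every alternative:
^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$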
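Since | has the lowest precedence in a regex, a pattern of the form ^A|B|C$ anchors only the first alternative to the start and only the last one to the end. A C# sketch comparing that form with a version where the alternatives are grouped inside the anchors:

```csharp
using System;
using System.Text.RegularExpressions;

// Ungrouped: '^' binds only to the first alternative and '$' only to the last.
const string ungrouped = @"^(?:\d{10})|(?:\d{2}\s\d{8})|(?:\d{3}\s\d{3}\s\d{4})$";
// Grouped: both anchors apply to every alternative.
const string grouped = @"^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$";

string input = "1234567890abc";
Console.WriteLine(Regex.IsMatch(input, ungrouped));        // True: ^\d{10} matches the prefix
Console.WriteLine(Regex.IsMatch(input, grouped));          // False
Console.WriteLine(Regex.IsMatch("123 456 7890", grouped)); // True
```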
