repeat a group of characters - c#

I have the following input to be matched by a regex:
1.1.1.1
1.01.1.1
01.01.091.01
1.10.100.0010
So I always have four groups of digits. The first three lines should match; the last one should not.
So I wrote this regex:
^(\d*[1-9]+\.){4}$
In general, this regex should return all strings in which no group of digits ends in a zero. Put more simply: I do not want to match numbers with trailing zeros.
However, this doesn't match anything. regex101.com tells me this:
A repeated capturing group will only capture the last iteration. Put a
capturing group around the repeated group to capture all iterations or
use a non-capturing group instead if you're not interested in the data
But when I add a further capturing group I get the same message:
^((\d*[1-9]+\.)){4}$
The same applies to a non-capturing group:
^(?:\d*[1-9]+\.){4}$
Of course I could just write the same group four times, but that's fairly clumsy and hard to read.

As others mentioned, the dot is the point: we have three identical groups that end with a dot and one group without it.
So this regex does it for me:
(?:\d*[1-9]\.){3}(?:\d*[1-9])

You never specify the dot in your patterns. What you ask for is, in fact, not a repetition of four, it is a specific single pattern of four numbers separated with dots.
^(\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+)$
The only thing in there you could consider a repetition is the "number + dot" part, but then you repeat that three times and add another number. Then the regex would become this:
^((\d*[1-9]+\.){3}\d*[1-9]+)$
However, your third line contains a space at the end, so you may want to add extra checks to trim those off.
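For anyone who wants to try this from C#, here is a minimal sketch of the three-plus-one pattern with the trim applied first. The names VersionCheck and IsValid are made up for illustration, and I am assuming that only leading and trailing whitespace needs to be removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionCheck
{
    // Three "digits ending in 1-9, then a dot" groups, followed by a fourth group without the dot.
    private static readonly Regex NoTrailingZeros =
        new Regex(@"^(?:\d*[1-9]\.){3}\d*[1-9]$");

    // Trim first so that stray whitespace around the input does not defeat the anchors.
    public static bool IsValid(string input) => NoTrailingZeros.IsMatch(input.Trim());

    public static void Main()
    {
        // The third sample deliberately carries a trailing space.
        foreach (var s in new[] { "1.1.1.1", "1.01.1.1", "01.01.091.01 ", "1.10.100.0010" })
            Console.WriteLine($"{s.Trim()} -> {IsValid(s)}");
    }
}
```

With the four lines from the question, the first three should print True and the last one False.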

The problem with your regex is that, by not including the ., it fails to find four groups of digits, because they always have dots in between.
Try this instead:
(?:(\d*[1-9])\.?){4}

Related

C# Regex Anomaly

I'm a bit perplexed here.
I have a regex that is supposed to limit a number to two decimal places.
My second and third captures work as expected. But including the 1st capture ($1) corrupts the string and includes all the decimal places (I get the original string).
var t = "553.17765";
var from = @"(\d+)(\.*)(\d{0,2})";
var to = "$1$2$3";
var rd = Regex.Replace(t, from,to);
var r = Regex.Match(t, from);
Why can't I get the 553 in the $1 variable?
What is happening is that you are matching the number multiple times, once before the . and once after. You could work around that by looking for the longest match, but it seems you could improve your regex instead:
(\d+\.?\d{0,2})
Steps are as follows
The capture group covers the whole number at once.
Look for digits, greedy match.
Look for a decimal point, either one or none.
Look for zero to two digits
Furthermore, if you want to replace using Regex.Replace you need something to match the rest of the string.
text = Regex.Replace(text, @".*?(\d+\.?\d{0,2}).*", "$1");
Your example does not work because, by definition, the pattern matches twice. The expression (\d+)(\.*)(\d{0,2}) will split the string 553.17765 as follows:
Match 1: 553.17
$1 = 553
$2 = .
$3 = 17
Replace 553.17 with 553.17
Match 2: 765
$1 = 765
Replace 765 with 765
The first match includes - as expected - only two of the decimal places. With this action, the match is complete and the regex starts looking for the next match, because Replace replaces all matches, not the first one only. As you can see, this regex does nothing by design.
By the way, Replace works by finding a match and replacing the whole match with the replacement pattern, so there is no need to include the surrounding text. The problem is that your regex matches too precisely: it stops after the first two decimal places, so the match only includes those two decimal places.
That means that whatever you will replace that with, will only replace 553.17 and nothing more. For finding decimal numbers this is good. For replacing not so much, here you want to find the whole number with all decimal places and then replace it.
So a working replace regex would look like this: (\d+\.\d{1,2})\d*. First, there is only one capture group, as we don't intend to reorder the digits. Second, the decimal point is required, as we are only interested in replacing numbers that actually have decimal places. For the same reason the group needs at least one and at most two decimal places. Every decimal place after that is optional, but is matched greedily so that the whole number becomes part of the match and is replaced completely.
Match 1: 553.17765
$1 = 553.17
Replace 553.17765 with 553.17
This regex does not handle thousands-separators btw, if that is required.
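To see both behaviours side by side, here is a small sketch. The names TruncateDecimals and ToTwoPlaces are hypothetical. Note that this truncates rather than rounds, which is my assumption about what is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TruncateDecimals
{
    // Capture the integer part plus up to two decimals; the trailing \d* swallows
    // the remaining decimals so they belong to the match and are replaced away.
    public static string ToTwoPlaces(string text) =>
        Regex.Replace(text, @"(\d+\.\d{1,2})\d*", "$1");

    public static void Main()
    {
        var t = "553.17765";

        // The pattern from the question matches twice, so "$1$2$3" rebuilds the input unchanged.
        foreach (Match m in Regex.Matches(t, @"(\d+)(\.*)(\d{0,2})"))
            Console.WriteLine($"match: {m.Value}");   // 553.17, then 765

        Console.WriteLine(ToTwoPlaces(t));            // 553.17
    }
}
```

Numbers without a decimal point are left alone, because the point is mandatory in the pattern.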

Is there a regular expression for matching a string that has no more than 2 repeating characters? [duplicate]

I want to match strings that do not contain more than 3 of the same character repeated in a row. So:
abaaaa [no match]
abawdasd [match]
abbbbasda [no match]
bbabbabba [match]
Yes, it would be much easier and neater to do a regex match for containing the consecutive characters, and then negate that in the code afterwards. However, in this case that is not possible.
I would like to open out the question to x consecutive characters so that it can be extended to the general case to make the question and answer more useful.
Negative lookahead is supported in this case.
Use a negative lookahead with back references:
^(?:(.)(?!\1\1))*$
(.) captures each character in group 1 and the negative look ahead asserts that the next 2 chars are not repeats of the captured character.
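If you need this for an arbitrary limit, the same lookahead can be written with a counted backreference. This is only a sketch: RunLimit and AtMost are names invented here, and it assumes the input has no line breaks, since . does not match a newline by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunLimit
{
    // Builds ^(?:(.)(?!\1{n}))*$ so that no character is followed by n more
    // copies of itself, i.e. runs of at most n identical characters pass.
    public static Regex AtMost(int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        return new Regex(@"^(?:(.)(?!\1{" + n + @"}))*$");
    }

    public static void Main()
    {
        Regex atMostTwo = AtMost(2);   // equivalent to ^(?:(.)(?!\1\1))*$
        Console.WriteLine(atMostTwo.IsMatch("aabbaa"));  // True
        Console.WriteLine(atMostTwo.IsMatch("aaab"));    // False
    }
}
```

Because of the *, the empty string also matches; swap it for + if that is not acceptable.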
To match strings not containing a character repeated more than 3 times consecutively:
^((.)\2?(?!\2\2))+$
How it works:
^ Start of string
(
(.) Match any character (not a new line) and store it for back reference.
\2? Optionally match one more exact copy of that character.
(?! Make sure the upcoming character(s) is/are not the same character.
\2\2 Repeat '\2' for as many times as you need
)
)+ Do ad nauseam
$ End of string
So, the number of \2 in your whole expression will be the number of times you allow a character to be repeated consecutively; any more and you won't get a match.
E.g.
^((.)\2?(?!\2\2\2))+$ will match all strings that don't repeat a character more than 4 times in a row.
^((.)\2?(?!\2\2\2\2))+$ will match all strings that don't repeat a character more than 5 times in a row.
Please be aware that this solution uses negative lookahead, and not all regex flavors support it.
I'm answering this question :
Is there a regular expression for matching a string that has no more than 2 repeating characters?
which was marked as an exact duplicate of this question.
It's much quicker to negate the match instead:
if (!Regex.Match("hello world", @"(.)\1{2}").Success) Console.WriteLine("No dups");
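Wrapped up as a helper, the negated check might look like the sketch below. RunCheck and HasRunLongerThan are hypothetical names, and max is the longest run you are willing to accept; the sample strings are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunCheck
{
    // True when some character occurs more than max times in a row:
    // (.) takes one character and \1{max} demands max further copies of it.
    public static bool HasRunLongerThan(string s, int max) =>
        Regex.IsMatch(s, @"(.)\1{" + max + "}");

    public static void Main()
    {
        foreach (var s in new[] { "abaaaa", "abawdasd", "abbbbasda", "bbabbabba" })
        {
            // Negate in code: a string is acceptable when no over-long run is found.
            string verdict = HasRunLongerThan(s, 3) ? "no match" : "match";
            Console.WriteLine($"{s} [{verdict}]");
        }
    }
}
```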

Advanced Regex - Capture Whole Group of Complex Statement inside Replace

I'm working on a project and need to parse related data. The tool I work with is fully command-based and returns all kinds of output, so a regex comes in handy instead of guessing which line is which. I need to parse a line like this:
1 QB 1283 /YR VC MC MO22AUG IFNTHR 2240 2335 100 0 S
Depending on the conditions it may appear in many shapes, but this should hopefully work:
.*((/)?(?<Class>(\w{2}\s+)+)(\w{2}\d{2}\w{3})?\s+\w{6}).*
There is just one issue. I need to capture only this part:
YR VC MC, and there's no guarantee that there are always three of them. I tried grouping with parentheses as well as naming, as you can see. I don't know how to capture a group in C#; I think it uses Regex.Replace and then replaces the whole input with the selected group (here, the 'Class' group), but it only matches the last iteration of the inner parentheses, not the whole thing. For example, for the line above it returns "MC" rather than all three. I also tried replacing (\w{2}\s+)+ with (\w{2}\s+|\w{2}\s+\w{2}\s+|\w{2}\s+\w{2}\s+\w{2}\s+), but that didn't work either.
Can anyone help me with this?
Thank you.
Capture Groups
Let's back up a bit. First, we need to understand what capture groups are. Everything put within parentheses will be a capturing group. So, for instance, the regex (\d)(\d) with the string 89 will capture 8 in the first group and 9 in the second group. Let's say you make the second digit optional, so (\d)(\d?). Now, if you try to match just 8, the first group will be 8, and the second group will just be an empty string. In this way, we can match all groups, even if some are 'missing'.
Non-Capture Groups
Your regular expression seems to have a ton of unnecessary capture groups. If you don't need one, don't use parentheses. For example, for (/)?, you can simply remove the parentheses. What if you want to match the string "123" ten times? You'd probably do something like (123){10}. But hey, that's another unneeded capture group! You can create a non-capture group by using (?:) instead of (). This way, you won't be capturing whatever is within the parentheses, but you'll still be using them for grouping.
Your Regex
Removing all unnecessary capture groups from your regex, we end up with:
.*/?(\w{2}\s+)+(?:\w{2}\d{2}\w{3})?\s+\w{6}.*
This includes the space within the capture group, so let's bring that out:
.*/?(\w{2})\s+(?:\w{2}\d{2}\w{3})?\s+\w{6}.*
At this point, the capture group (\w{2}) only matches the MC in your sample string, so let's do what you did and split it off into three different capture groups. Note that we can't do something like (\w{2}){1,3} (which will match \w{2} one to three times), because this still only has one single set of parentheses, so it only has one single capture group. As such, we will need to expand our (\w{2})\s+ to (\w{2})\s+(\w{2})\s+(\w{2})\s+. This regex will correctly capture your three strings.
Regex in C#
In C#, we have this handy Regex class in System.Text.RegularExpressions. This is how you would use it:
string regex = @".*/?(\w{2})\s+(\w{2})\s+(\w{2})\s+(?:\w{2}\d{2}\w{3})?\s+\w{6}.*";
string sample = "1 QB 1283 /YR VC MC MO22AUG IFNTHR 2240 2335 100 0 S";
Match matches = Regex.Match (sample, regex);
string[] stringGroups = matches.Groups
.Cast<Group> ()
.Select (el => el.Value)
.ToArray ();
Here, stringGroups will be a string array with all the capture groups. stringGroups[0] will be the entire match (so in this case, 1 QB 1283 /YR VC MC MO22AUG IFNTHR 2240 2335 100 0 S), stringGroups[1] will be the first capture group (YR in this case), stringGroups[2] the second, and stringGroups[3] the third.
PS: I highly recommend Debuggex for testing this type of stuff.
Make it un-greedy:
.*?((/)?(?<Class>(\w{2}\s+)+)(\w{2}\d{2}\w{3})?\s+\w{6}).*
  ^
Or remove both greedy dots from both ends. You don't need them:
/?(?<Class>(?:\w{2}\s+)+)(?:\w{2}\d{2}\w{3})?\s+\w{6}
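One more option, specific to .NET: a repeated group keeps every iteration in Group.Captures, so each two-letter code can be read separately without knowing how many there are. A rough sketch, assuming the line layout from the question; ClassCodes and Extract are illustrative names, and the named group has been moved inside the repetition so that the whitespace stays out of the captures.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClassCodes
{
    // Each two-character code is captured by the same named group;
    // the separating whitespace is matched outside of it.
    private static readonly Regex Line =
        new Regex(@"/?(?:(?<Class>\w{2})\s+)+(?:\w{2}\d{2}\w{3})?\s+\w{6}");

    public static string[] Extract(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return new string[0];

        // Group.Value only holds the last iteration, while Group.Captures
        // holds all of them in the order they were matched.
        return m.Groups["Class"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        var sample = "1 QB 1283 /YR VC MC MO22AUG IFNTHR 2240 2335 100 0 S";
        Console.WriteLine(string.Join(",", Extract(sample)));   // YR,VC,MC
    }
}
```

If a single string is enough, reading the Class group of the pattern above and calling Trim() on its value gives "YR VC MC" directly.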

regular expression not working: repeated strings of digits

I was trying to create a regular expression to find repeated strings of digits.
eg:
1 -not matching
11 -matching
122 -matching
1234 -not matching
What I used is \d+. The tutorial says:
the "+" is similar to "*", except it requires at least one repetition.
But when I tried it, it matched any number. Any idea why?
Update
The tutorial I tried: http://www.codeproject.com/Articles/9099/The-Minute-Regex-Tutorial
The repetition constructs in Regular Expressions, +, *, {x}, do not repeat "what you found the first time around", they repeat "the pattern that finds things".
So this:
\d+
Will not find one digit, then match a sequence of that digit, instead it will first find one digit, then try to find another digit, then another, etc.
If you want it to repeat "what it found" you have to explicitly say so:
(\d)\1+
The \1 here says "I will match whatever is in the first group again". This regular expression should match sequences of the same digit, instead of sequences of digits.
^\d*(\d)\1+\d*$
You can use this. See the demo. \d+ would match any integer that is one or more digits long. You need to use \1 to find repeated digits.
https://regex101.com/r/hI0qP0/4
It works properly. \d+ is not a repetition of a specific digit, it is a repetition of one or more \d. \d+ will match 1 (one or more digit), 12 (one or more digit), 122 (one or more digit)... you see the idea. If you want to see two or more repetitions, you'd need to say \d\d+ or \d{2,} - but this, too, says that you want two or more digits, not two or more of a same digit. To say that, you need backreferences: (\d)\1+ is two or more of a same digit: a digit we remember, then one or more of that remembered thing.
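A quick sketch to compare the two patterns on the inputs from the question (RepeatedDigits and HasRepeatedDigit are just illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedDigits
{
    // (\d) remembers one digit; \1+ then requires that same digit at least once more.
    public static bool HasRepeatedDigit(string s) => Regex.IsMatch(s, @"(\d)\1+");

    public static void Main()
    {
        foreach (var s in new[] { "1", "11", "122", "1234" })
        {
            // \d+ only asks for "one or more digits", so it succeeds on every sample.
            bool anyDigits = Regex.IsMatch(s, @"\d+");
            Console.WriteLine($"{s}: digits={anyDigits}, repeated={HasRepeatedDigit(s)}");
        }
    }
}
```

Only 11 and 122 should report repeated=True, while digits=True holds for all four.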

RegEx : Find match based on 1st two chars

I am new to regex and have a question about it. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now I want to try a new matching condition like this:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How do I write the regex for that?
You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group in conjunction with . (any single character) tries to match a third character, and the range quantifier {2} then repeats the whole sequence twice.
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
You want to test for just ^(XY|AB|PQ). Your regex means: search for either XY, AB or PQ, then any single character, and repeat the whole sequence twice. For example, "XYKPQL" would match your regex.
^ forces the start of line,
(...) creates a capturing group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
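Here is a short sketch putting the patterns from the answers side by side. PrefixCheck and its method names are invented for the example; the input strings come from the question, except for ZZZF44DT508755, which is a made-up negative case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Non-capturing alternation anchored at the start of the string.
    public static bool StartsWithKnownPrefix(string s) =>
        Regex.IsMatch(s, "^(?:XY|AB|PQ)");

    // Same prefix test, with Z and F required as 3rd and 4th characters.
    public static bool StartsWithKnownPrefixThenZF(string s) =>
        Regex.IsMatch(s, "^(?:XY|AB|PQ)ZF");

    public static void Main()
    {
        // The last entry is a hypothetical input that should fail both checks.
        foreach (var s in new[] { "XYZF44DT508755", "ABZF44DT508755", "PQZF44DT508755", "ZZZF44DT508755" })
            Console.WriteLine($"{s}: {StartsWithKnownPrefix(s)} / {StartsWithKnownPrefixThenZF(s)}");

        // The pattern from the question needs "prefix + any character" twice over.
        Console.WriteLine(Regex.IsMatch("XYKPQL", "^((XY|AB|PQ).){2}"));          // True
        Console.WriteLine(Regex.IsMatch("XYZF44DT508755", "^((XY|AB|PQ).){2}"));  // False
    }
}
```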
