Match optional pattern preceded by random content - c#

I need to capture two groups in the following sentence, one is I, the other is optional
I want to match random optional field.
I tried the following approach, but it's not yielding the expected result:
(I).*?(optional)?
Removing the parentheses around optional makes the pattern match correctly, but since I need the second group, I can't do that.
(I).*?optional?
So how can I match both groups correctly? Thanks!

The trick with your regex is that you need to consume (and discard) everything leading up to optional that is not itself the start of optional.
Use a negative lookahead inside a non-capturing group (the ?: prefix keeps the group from being captured):
(I)(?:(?!optional).)*(optional)?.*
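As a rough illustration, the sketch below compares this pattern with the lazy one from the question. The second sample sentence (without the word optional) and all variable names are assumptions added for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: consume one character at a time, but only
// while "optional" does not start at the current position.
string tempered = @"(I)(?:(?!optional).)*(optional)?.*";
// Pattern from the question: the lazy .*? is satisfied with zero characters,
// so (optional)? is tried right after "I" and matches nothing.
string lazy = @"(I).*?(optional)?";

string withWord = "I want to match random optional field.";
string withoutWord = "I want to match random field."; // hypothetical input

Match m1 = Regex.Match(withWord, tempered);
Match m2 = Regex.Match(withoutWord, tempered);
Match m3 = Regex.Match(withWord, lazy);

Console.WriteLine(m1.Groups[1].Value);   // I
Console.WriteLine(m1.Groups[2].Value);   // optional
Console.WriteLine(m2.Groups[2].Success); // False
Console.WriteLine(m3.Groups[2].Success); // False
```

Checking Groups[2].Success is one way to tell whether the optional word was present at all.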

You can try the following regex:
(I).*(optional)
Each pair of parentheses gives you a capturing group. Note that this pattern only matches when optional is actually present in the input.

Related

C# regex nth character not in list or string end

I'm trying to check if the 4th letter in a string is not s or S using the following regular expression.
Regex rx = new Regex(@"A[2-6][025][^sS].*");
In addition, I want the corresponding three-character strings to match (e.g. "A30").
Unfortunately the Match check returns false.
Does someone know what I'm doing wrong and how I can alter my regex?
rx.Match(test).Success
This should do what you want:
^A[2-6][025](?:[^sS].*|)$
Note the non-capturing group part:
(?:[^sS].*|)
This matches either a character that is not s or S followed by any number of characters, or the empty string.
Regex101
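A small sketch of how the anchored pattern compares with the one from the question. Apart from "A30", the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var original = new Regex(@"A[2-6][025][^sS].*");        // from the question
var anchored = new Regex(@"^A[2-6][025](?:[^sS].*|)$"); // from the answer

// "A30x", "A30s" and "A30S1" are hypothetical test inputs.
foreach (string test in new[] { "A30", "A30x", "A30s", "A30S1" })
{
    Console.WriteLine($"{test}: original={original.IsMatch(test)}, anchored={anchored.IsMatch(test)}");
}
// A30: original=False, anchored=True
// A30x: original=True, anchored=True
// A30s: original=False, anchored=False
// A30S1: original=False, anchored=False
```

The original pattern rejects "A30" because [^sS] always needs a fourth character to consume; the empty alternative removes that requirement.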
First you can check whether the fourth character is an s or S with the following regex:
^...[sS]
In a second step, you can check for the combination of A and a number, which your approach already handles:
A[2-6][025]

repeat a group of characters

I have the following input to be matched by a regex:
1.1.1.1
1.01.1.1
01.01.091.01
1.10.100.0010
So I always have four groups consisting of digits. The first three lines should match; the last one should not.
So I wrote this regex:
^(\d*[1-9]+\.){4}$
In general this regex should return all those strings where none of the groups ends in a zero. Or, more simply: I do not want to match any numbers with trailing zeros.
However, this doesn't match anything. regex101.com tells me this:
A repeated capturing group will only capture the last iteration. Put a
capturing group around the repeated group to capture all iterations or
use a non-capturing group instead if you're not interested in the data
But when I add a further capturing group I get the same message:
^((\d*[1-9]+\.)){4}$
The same applies to a non-capturing group:
^(?:\d*[1-9]+\.){4}$
Of course I could just write the same group four times, but that's fairly clumsy and hard to read.
As others mentioned, the dot is the key point: we have three identical groups followed by one without the dot.
So this regex does it for me:
(?:\d*[1-9]\.){3}(?:\d*[1-9])
You never specify the dot in your patterns. What you ask for is, in fact, not a repetition of four, it is a specific single pattern of four numbers separated with dots.
^(\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+\.\d*[1-9]+)$
The only thing in there you could consider a repetition is the "number + dot" part, but then you repeat that three times and add another number. Then the regex would become this:
^((\d*[1-9]+\.){3}\d*[1-9]+)$
However, your third line contains a space at the end, so you may want to add extra checks to trim those off.
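To try this out, here is a minimal sketch that runs the "three repetitions plus one number" pattern over the four sample lines. The trailing space on the third input is an assumption based on the remark above, and Trim() is just one possible way to handle it.

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"^((\d*[1-9]+\.){3}\d*[1-9]+)$");

// The third entry carries a trailing space (assumed, see the lead-in).
string[] inputs = { "1.1.1.1", "1.01.1.1", "01.01.091.01 ", "1.10.100.0010" };

foreach (string raw in inputs)
{
    string candidate = raw.Trim(); // strip surrounding whitespace first
    Console.WriteLine($"{candidate}: {rx.IsMatch(candidate)}");
}
// 1.1.1.1: True
// 1.01.1.1: True
// 01.01.091.01: True
// 1.10.100.0010: False
```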
The problem with your regex is that, by not including the ., it fails to find four groups of digits, because they always have dots in between.
Try this instead:
(?:(\d*[1-9])\.?){4}

Regular expression for specific combination of alphabets and numbers

I am trying to create regular expression for following type of strings:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ), numerical digits only, and either no ‘Z’ or a ‘Z’ suffix.
For example, XD35Z should pass but XD01HW should not pass.
So far I tried following:
@"XD\d+Z?" - XD35Z passes but unfortunately it also works for XD01HW
@"XD\d+$Z" - XD01HW fails which is what I want but XD35Z also fails
I have also tried @"XD\d{1,}Z"? but it did not work
I need a single regex which will give me appropriate results for both types of strings.
Try this regex:
^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$
I'm using quantifying braces to explicitly limit the allowed numbers of each character/group. And the ^ and $ anchors make sure that the regex matches only the whole line (string).
Broken into logical pieces this regex checks
^(XI|YV|XD|YQ|XZ){1} Starts with exactly one of the allowed prefixes
\d+ Is followed by one or more digits
Z{0,1}$ Ends with between 0 and 1 Z
You're misusing the $, which represents the end of the string in a regex.
It should be: @"^XD\d+Z?$" (notice that the $ appears at the end of the regex, after the Z?)
The regex following the behaviour you want is:
^(XI|YV|XD|YQ|XZ)\d+Z?$
Explanation:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ)
^(XI|YV|XD|YQ|XZ)
numerical digits only
\d+
either no ‘Z’ or a ‘Z’ suffix
Z?$
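A quick check of the anchored pattern, as a sketch. "XD35Z" and "XD01HW" come from the question; the other two inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"^(XI|YV|XD|YQ|XZ)\d+Z?$");

// "XZ7" and "AB12Z" are made-up inputs for illustration.
foreach (string code in new[] { "XD35Z", "XD01HW", "XZ7", "AB12Z" })
{
    Console.WriteLine($"{code}: {rx.IsMatch(code)}");
}
// XD35Z: True
// XD01HW: False
// XZ7: True
// AB12Z: False
```

Without the ^ and $ anchors, IsMatch would accept "XD01HW" because the substring "XD01" already satisfies the pattern.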

Why does Regex.Match include noncapturing groups in the result?

In matching a regular expression, I want to exclude noncapturing groups from the result. I incorrectly assumed that they'd be excluded by default since, well, they're called noncapturing groups.
For some reason, though, Regex.Match behaves as though I hadn't even specified a noncapturing group. Try running this in the Immediate window:
System.Text.RegularExpressions.Regex.Match("b3a",@"(?:\d)\w").Value
I expected the result to be
"a"
but it's actually
"3a"
This question suggested I look at the Groups, but there is only one Group in the result and it too is "3a". It contains one Capture, also "3a".
What's going on here? Is Regex bugged, or is there an option I need to set?
Matching is not the same thing as capturing. (?:\d) simply means match a subpattern containing \d, but don't bother putting it in a capture group. Your entire pattern (?:\d)\w looks for a (?:\d) followed by a \w; it's functionally equivalent to \d\w.
If you're trying to match a \w only when it is preceded by a \d, use a lookbehind assertion instead:
System.Text.RegularExpressions.Regex.Match("b3a", @"(?<=\d)\w").Value
A non-capturing group means that no capture group is created. The text it matches is still included in the overall match.
If you want to exclude that part, use something like a lookbehind assertion.
@"(?<=\d)\w"
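The difference between the two constructs can be seen side by side in this small sketch (variable names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "b3a";

// The non-capturing group still consumes the digit, so it is part of the match.
string nonCapturing = Regex.Match(input, @"(?:\d)\w").Value;
// The lookbehind only asserts that a digit precedes; it consumes nothing.
string lookbehind = Regex.Match(input, @"(?<=\d)\w").Value;

Console.WriteLine(nonCapturing); // 3a
Console.WriteLine(lookbehind);   // a
```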
You are misunderstanding the purpose of noncapturing groups.
In general, groups (defined by a pair of parentheses ()) mean two things:
The contained regular expression is grouped, so any quantifiers after the brackets apply to the whole expression rather than just the previous single character.
The substring matching the group is stored as a subcapture in the Groups property.
Sometimes, you do not want the second result for certain groups, which is why noncapturing groups were introduced: They allow you to group a sub-expression without having any matches of it stored in an item in the Groups property.
You have observed that your Groups property contains one item, though - which is true, as by default, the first group is always the capture of the complete expression. cf. in the docs:
If the regular expression engine can find a match, the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern.
You can still use groups to achieve what you want, by placing the string you want to capture into a group:
\d(\w)
(I have left out the noncapturing group again as it does not change anything in your above expression.)
With this modified expression, the Groups property in your match should have 2 items:
The complete match (of \d\w)
Only the part of the above string you seem to be interested in, matched by \w
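As a sketch of what the Groups property holds in each case (variable names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

Match plain = Regex.Match("b3a", @"(?:\d)\w"); // no capturing group
Match grouped = Regex.Match("b3a", @"\d(\w)"); // one capturing group

Console.WriteLine(plain.Groups.Count);      // 1 (only the whole match)
Console.WriteLine(grouped.Groups.Count);    // 2
Console.WriteLine(grouped.Groups[0].Value); // 3a
Console.WriteLine(grouped.Groups[1].Value); // a
```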

Regexp group matching one more character than the group should?

I'm guessing that I'm confused about how groups work in regexp.
My regexp replaces more characters than it should.
Here is my string:
...test - Copy\asd.test2\asd.keke
And here is my pattern:
.?(asd\.)
It matches "\asd." but I want it to match just "asd."
What am I getting wrong here?
What are you trying to achieve using .? if you don't want to match it?
To check for characters outside of your match, you can use a lookaround assertion. For example, to check for a backslash before the match, you'd use a lookbehind:
(?<=\\)asd\.
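Since the question is about replacing, here is a minimal sketch of both patterns with Regex.Replace. The replacement text "X" is a placeholder chosen for the example, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"...test - Copy\asd.test2\asd.keke";

// Pattern from the question: .? also consumes the backslash before "asd.".
string greedy = Regex.Replace(input, @".?(asd\.)", "X");
// Lookbehind version: the backslash is required but left in place.
string guarded = Regex.Replace(input, @"(?<=\\)asd\.", "X");

Console.WriteLine(greedy);  // ...test - CopyXtest2Xkeke
Console.WriteLine(guarded); // ...test - Copy\Xtest2\Xkeke
```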
