RegEx : Find match based on 1st two chars - c#

I am new to regular expressions. I am writing my code in C# and need a regex to find matching strings.
The possible strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
What I need to check is whether the string starts with XY, AB, or PQ.
I came up with this one, but it doesn't work:
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now I want to try a new matching condition:
If the string starts with "XY" or "AB" or "PQ", the 3rd character is "Z", and the 4th character is "F".
How do I write the regex for that?

You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group in conjunction with . (any single character) is trying to match a third character and then repeat the sequence twice because of the range quantifier {2} ...
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
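As a minimal sketch of the two checks above, the following runs both patterns against the sample strings from the question. The helper names HasPrefix and HasPrefixThenZF and the two non-matching inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the string starts with XY, AB, or PQ.
bool HasPrefix(string s) => Regex.IsMatch(s, "^(?:XY|AB|PQ)");

// Hypothetical helper: same prefix, then a literal "ZF" as 3rd and 4th characters.
bool HasPrefixThenZF(string s) => Regex.IsMatch(s, "^(?:XY|AB|PQ)ZF");

string[] samples =
{
    "XYZF44DT508755", "ABZF44DT508755", "PQZF44DT508755", // from the question
    "ZZZF44DT508755", "XYAB44DT508755"                    // made-up counterexamples
};

foreach (var s in samples)
{
    Console.WriteLine($"{s}: prefix={HasPrefix(s)}, prefix+ZF={HasPrefixThenZF(s)}");
}
```

The group is non-capturing (?:...) because IsMatch only needs a yes/no answer, so there is nothing to capture.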

You want to test for just ^(XY|AB|PQ). Your RegEx means: match either XY, AB or PQ, then any single character, and repeat that whole sequence twice; for example, "XYKPQL" would match your RegEx.
Breaking down ^(XY|AB|PQ):
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.

You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
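To see the difference between the original attempt and the corrected patterns, here is a small sketch. MatchWord is a hypothetical helper name, and the input with trailing text is made up to show where \w* stops.

```csharp
using System;
using System.Text.RegularExpressions;

// The original attempt: (prefix + any one character), exactly twice.
string original = "^((XY|AB|PQ).){2}";
Console.WriteLine(Regex.IsMatch("XYZF44DT508755", original)); // False
Console.WriteLine(Regex.IsMatch("XYKPQL", original));         // True
Console.WriteLine(Regex.IsMatch("XY_AB_", original));         // True

// Hypothetical helper: returns the whole matched "word", or "" when nothing matches.
string MatchWord(string s)
{
    Match m = Regex.Match(s, @"^((XY|AB|PQ)ZF\w*)");
    return m.Success ? m.Value : "";
}

// \w* stops at the first non-word character (here, the space).
Console.WriteLine(MatchWord("XYZF44DT508755 and more text"));
Console.WriteLine(MatchWord("XYAB44DT508755") == "" ? "(no match)" : "match");
```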

C# regex nth character not in list or string end

I'm trying to check if the 4th letter in a string is not s or S using the following regular expression.
Regex rx = new Regex(@"A[2-6][025][^sS].*");
In addition, I want the corresponding three-character strings to match (e.g. "A30").
Unfortunately the Match check returns false.
Does someone know what I'm doing wrong and how I can alter my regex?
rx.Match(test).Success
This should do what you want:
^A[2-6][025](?:[^sS].*|)$
Note the non-capturing group part:
(?:[^sS].*|)
This matches a character that is not s or S, followed by any number of characters or an empty string.
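A short sketch of how the anchored pattern behaves, assuming the whole string is being validated. IsAccepted and the test inputs other than "A30" are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: the 4th character, if present, must not be s or S.
var rx = new Regex(@"^A[2-6][025](?:[^sS].*|)$");

// Hypothetical helper wrapping the check.
bool IsAccepted(string s) => rx.IsMatch(s);

foreach (var s in new[] { "A30", "A30x", "A22b7", "A30s", "A30Sx", "A31" })
{
    Console.WriteLine($"{s}: {IsAccepted(s)}");
}

// The pattern from the question requires a 4th character, so "A30" fails.
Console.WriteLine(new Regex(@"A[2-6][025][^sS].*").Match("A30").Success); // False
```

The empty alternative in (?:[^sS].*|) is what lets the three-character strings through.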
First, you can check whether there is an s or S in the fourth position with the following regex:
^...[sS]
In a second step, you check for the combination of A and the digits, which your approach already handles:
A[2-6][025]

Is there a regular expression for matching a string that has no more than 2 repeating characters? [duplicate]

I want to match strings that do not contain more than 3 of the same character repeated in a row. So:
abaaaa [no match]
abawdasd [match]
abbbbasda [no match]
bbabbabba [match]
Yes, it would be much easier and neater to do a regex match for containing the consecutive characters, and then negate that in the code afterwards. However, in this case that is not possible.
I would like to open out the question to x consecutive characters so that it can be extended to the general case to make the question and answer more useful.
Negative lookahead is supported in this case.
Use a negative lookahead with back references:
^(?:(.)(?!\1\1))*$
See live demo using your examples.
(.) captures each character in group 1 and the negative look ahead asserts that the next 2 chars are not repeats of the captured character.
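As a quick check, this sketch runs the pattern against the four examples from the question. HasNoTripleRun is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Each character must not be followed by two more copies of itself,
// so any run of three or more identical characters makes the match fail.
var noTriple = new Regex(@"^(?:(.)(?!\1\1))*$");

// Hypothetical helper wrapping the check.
bool HasNoTripleRun(string s) => noTriple.IsMatch(s);

foreach (var s in new[] { "abaaaa", "abawdasd", "abbbbasda", "bbabbabba" })
{
    Console.WriteLine($"{s}: {(HasNoTripleRun(s) ? "match" : "no match")}");
}
```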
To match strings not containing a character repeated more than 3 times consecutively:
^((.)\2?(?!\2\2))+$
How it works:
^ Start of string
(
(.) Match any character (not a new line) and store it for back reference.
\2? Optionally match one more exact copy of that character.
(?! Make sure the upcoming character(s) is/are not the same character.
\2\2 Repeat '\2' for as many times as you need
)
)+ Do ad nauseam
$ End of string
So, the number of \2 backreferences in your whole expression is the number of times you allow a character to be repeated consecutively; any more and you won't get a match.
E.g.
^((.)\2?(?!\2\2\2))+$ will match all strings that don't repeat a character more than 4 times in a row.
^((.)\2?(?!\2\2\2\2))+$ will match all strings that don't repeat a character more than 5 times in a row.
Please be aware that this solution uses negative lookahead, and not all regex flavors support it.
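For the general case of x consecutive characters, one possible sketch builds the pattern from a number instead of hand-writing the backreferences. This is not taken from the answers: it uses a counted backreference \1{n} in place of repeating \1, and BuildMaxRunRegex is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a regex that accepts strings in which no
// character occurs more than maxRun times in a row.
Regex BuildMaxRunRegex(int maxRun)
{
    if (maxRun < 1) throw new ArgumentOutOfRangeException(nameof(maxRun));
    // A character followed by maxRun copies of itself would be a run of
    // maxRun + 1, so the negative lookahead rejects exactly that.
    return new Regex(@"^(?:(.)(?!\1{" + maxRun + @"}))*$");
}

Regex atMostThree = BuildMaxRunRegex(3);
Console.WriteLine(atMostThree.IsMatch("abaaa"));  // True: run of 3 is allowed
Console.WriteLine(atMostThree.IsMatch("abaaaa")); // False: run of 4

Regex atMostTwo = BuildMaxRunRegex(2);
Console.WriteLine(atMostTwo.IsMatch("bbabbabba")); // True
Console.WriteLine(atMostTwo.IsMatch("abbba"));     // False
```

With maxRun = 2 this produces the same behavior as ^(?:(.)(?!\1\1))*$.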
I'm answering this question:
Is there a regular expression for matching a string that has no more than 2 repeating characters?
which was marked as an exact duplicate of this question.
It's much quicker to negate the match instead:
if (!Regex.Match("hello world", @"(.)\1{2}").Success) Console.WriteLine("No dups");

Failing to recognize both keywords

Let me know if I'm asking the question in a wrong way. Not sure if I'm approaching it from the right angle.
My regex looks like this.
^.+(ef?)|(mn?).+$
I'm trying to match line 2 and 4 in the text below.
abcd
efgh
ijkl
mnop
qrst
As it seems, only the last one catches the editor's eye. What am I missing?
I've tried to follow some examples for detecting e.g. "ALPHA" and "BETA" words but, apparently, I'm too ignorant of how it works.
The regex engine splits the regex below into two parts at the top-level |.
^.+(ef?)|(mn?).+$
Part 1| Part 2
First, part 1 is tried:
^.+(ef?)
.+ requires at least one character before the e, but in the second string there isn't one, so it fails to match that string. It also fails for all the others, because none of the remaining strings contains an e.
| OR
Now the regex engine moves to the second part,
(mn?).+$
This matches a string that contains the letter m. m is present only in the fourth string, so it matches the m, the optional n, and the one or more characters that follow because of .+.
The correct approach to match the 2nd and 4th strings is:
^.*(ef?).*$|^.*(mn?).*$
OR
^.*(?:(ef?)|(mn?)).*$
Use ^.*(?:(ef?)|(mn?)).+$ if at least one character must follow the e (and optional f) or the m (and optional n).
If you want to match strings that start with e or m, use the regex below.
^(ef?|mn?).+$
Note:
.* matches any character zero or more times.
.+ matches any character one or more times.
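The precedence issue can be reproduced with a short sketch. It assumes the five lines are joined with "\n" and matched with RegexOptions.Multiline so that ^ and $ apply per line; MatchedLines is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = string.Join("\n", "abcd", "efgh", "ijkl", "mnop", "qrst");

// Hypothetical helper: collects every match of the pattern, line by line.
List<string> MatchedLines(string pattern)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Multiline))
    {
        result.Add(m.Value);
    }
    return result;
}

// Original: the | splits the pattern into "^.+(ef?)" and "(mn?).+$".
Console.WriteLine(string.Join(", ", MatchedLines(@"^.+(ef?)|(mn?).+$")));

// Alternation scoped inside a group, with .* on both sides.
Console.WriteLine(string.Join(", ", MatchedLines(@"^.*(?:(ef?)|(mn?)).*$")));

// Lines that start with e or m.
Console.WriteLine(string.Join(", ", MatchedLines(@"^(ef?|mn?).+$")));
```

Only the first pattern misses the efgh line, because its left alternative needs at least one character before the e.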

regular expression greedy on left side only (.net)

I am trying to capture matches between two strings.
For example, I am looking for all text that appears between Q and XYZ, using the "soonest" match (not continuing to expand outwards). This string:
circus Q hello there Q SOMETEXT XYZ today is the day XYZ okay XYZ
Should return:
Q SOMETEXT XYZ
But instead, it returns:
Q hello there Q SOMETEXT XYZ
Here is the expression I'm using:
Q.*?XYZ
It's going too far back to the left. It works fine on the right side when I use the question mark after the asterisk. How can I do the same for the left side, so that it stops at the nearest Q to the left, just as it stops at the nearest XYZ on the right? I've tried question marks and other symbols from http://msdn.microsoft.com/en-us/library/az24scfc.aspx, but there's something I'm just not figuring out.
I'm a regex novice, so any help on this would be appreciated!
Well, the non-greedy match is working: it gets the shortest string that satisfies the regex from the first position where a match can start. The thing to remember is that regex matching proceeds left to right. So it matches the first Q, then the fewest characters it can that are followed by an XYZ. If you don't want it to go past any Qs, you have to use a negated character class:
Q[^Q]*?XYZ
[^Q] matches any one character that is not a Q. Mind that this only works for a single-character delimiter. If your opening delimiter is multiple characters, you have to do it a different way. Why? Well, take the delimiter 'PQR' and the string is
foo PQR bar XYZ
If you try to use the regex from before, but you extended the character class to :
PQR[^PQR]*?XYZ
then you'll get
'PQR bar XYZ'
As you expected. But if your string is
foo PQR Party Time! XYZ
You'll get no matches. It's because [] delineates a "character class" - which matches exactly one character. Using these classes, you can match a range of characters, simply by listing them.
th[ae]n
will match both 'than' and 'then', but not 'thin'. Placing a caret ('^') at the beginning negates the class, meaning "match anything but these characters". So by turning our one-character delimiter into [^PQR], rather than saying "not 'PQR'", you're saying "not 'P', 'Q', or 'R'". You can still use this if you want, but only if you're 100% sure that the characters from your delimiter will only be in your delimiter. If that's the case, it's faster to use greedy matching and only negate the first character of your delimiter. The regex for that would be:
PQR[^P]*XYZ
But, if you can't make that guarantee, then match with:
PQR(?:.(?!PQR))*?XYZ
Regex doesn't directly support negative string matching (because it's impossible to define, when you think about it), so you have to use a negative lookahead.
(?!PQR)
is just such a lookahead. It means "Assert that the next few characters are not this internal regex", without matching any characters, so
.(?!PQR)
matches any character not followed by PQR. Wrap that in a group so that you can lazily repeat it,
(.(?!PQR))*?
and you have a match for "string that doesn't contain my delimiter". The only thing I did was add a ?: to make it a non-capturing group.
(?:.(?!PQR))*?
Depending on the language you use to parse your regex, it may try to pass back every matched group individually (useful for find and replace). This keeps it from doing that.
Happy regexing!
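The patterns discussed above can be compared side by side in a small sketch. First is a hypothetical helper name, and the third input and the last pattern are additions for illustration: a frequently seen variant places the lookahead before the dot, (?:(?!PQR).)*?, which also refuses to step onto a delimiter that starts immediately after the opening one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first match, or a marker when there is none.
string First(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : "(no match)";
}

string s1 = "circus Q hello there Q SOMETEXT XYZ today is the day XYZ okay XYZ";
string s2 = "foo PQR Party Time! XYZ";
string s3 = "PQRPQR b XYZ"; // made-up edge case: delimiter right after delimiter

Console.WriteLine(First(@"Q.*?XYZ", s1));     // starts at the first Q
Console.WriteLine(First(@"Q[^Q]*?XYZ", s1));  // Q SOMETEXT XYZ

Console.WriteLine(First(@"PQR[^PQR]*?XYZ", s2));       // (no match): 'P' in "Party"
Console.WriteLine(First(@"PQR(?:.(?!PQR))*?XYZ", s2)); // PQR Party Time! XYZ

// The two lookahead placements differ only on the edge case.
Console.WriteLine(First(@"PQR(?:.(?!PQR))*?XYZ", s3)); // PQRPQR b XYZ
Console.WriteLine(First(@"PQR(?:(?!PQR).)*?XYZ", s3)); // PQR b XYZ
```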
The concept of greediness only works on the right side.
To make the expression only match from the last Q before XYZ, make it not match Q between them:
Q[^Q]*?XYZ

A regular expression for matching a simple word in C#?

I need a regular expression that matches only words meeting the following conditions. I am using it in my C# program.
Can be any case
Should not have any numbers
may contain - and ' characters, but are optional
Should start with a letter
I have tried using the expression ^([a-zA-Z][\'\-]?)+$ but it doesn't work.
Here is a list of a few words that are acceptable:
London (Case insensitive)
Jackson's
non-profit
Here is a list of a few words that are not acceptable:
12london (contains a number and does not start with a letter)
-to (does not start with a letter)
to: (contains a : character; no special character other than - and ' is allowed)
^[a-zA-Z][-'a-zA-Z]*$
This matches any word that starts with an alphabetical character, followed by any number of alphabetical characters, - or '.
Note that you don't need to escape the - and ' inside a character class [], as long as the dash is either the first or last character in the class.
Note also that I've removed the round brackets from your example - if you don't want to capture the input, you'll get better performance by leaving them out.
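A minimal sketch that runs this pattern against the acceptable and unacceptable words listed in the question. IsSimpleWord is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// One letter first, then any mix of letters, hyphens, and apostrophes.
var word = new Regex(@"^[a-zA-Z][-'a-zA-Z]*$");

// Hypothetical helper wrapping the check.
bool IsSimpleWord(string s) => word.IsMatch(s);

foreach (var s in new[] { "London", "Jackson's", "non-profit", "12london", "-to", "to:" })
{
    Console.WriteLine($"{s}: {IsSimpleWord(s)}");
}
```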
Try this one:
^[A-Za-z]+[A-Za-z'-]*$
First of all, try your regexes against tools such as http://www.regextester.com/
You are testing strings that both start with AND end with your pattern (^ means start of line, $ is the end), thus leaving out all of the words contained between two spaces.
You should use \b or \B.
Instead of looking for [a-zA-Z] you can use character classes such as '\D' (not digit).
Let me know if the above is working in your scenario.
\b\D[^\c][a-zA-Z]+[^\c]
It says: word boundaries with no digits, no control characters, one or more alphabetical lower or uppercase character, with no following control characters.
