C# regex repeating group of the digit in pattern - c#

I use this regex pattern:
pattern = "ID\\d+.*?ID\\d+";
input = "ID1...sometxt1...ID1...sometxt2...ID3...sometxt3...ID50";
input = Regex.Replace(input, pattern, "");
Console.WriteLine(input);
The output is "...sometxt2...",
but I need the output
...sometxt2...ID3...sometxt3...ID50
I need the regex to match only groups that have equal digits after ID. ID3 != ID50, so this group must remain; ID1 == ID1, so this group must be replaced.
Thanks!

If you need to replace the whole substrings from ID having the same digits after them, you need to use a capturing group with a backreference:
var pattern = @"\bID(\d+).*?\bID\1\b";
See the regex demo
Explanation:
\bID - a whole word "ID"
(\d+) - one or more digits captured into Group 1
.*? - any characters but a newline, as few as possible up to the closest
\bID - whole word "ID" followed with....
\1 - backreference to the matched digits in Group 1
\b - followed with a word boundary (so that we do not match 10 if we have 1 in Group 1).
Note that you will need RegexOptions.Singleline modifier if you have newline characters in your input strings.
Also, do not forget to assign the replacement result to a variable:
var res = Regex.Replace(input, pattern, string.Empty);
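A minimal, self-contained sketch of the backreference approach, applied to the input from the question. The class and method names (SameIdRemover, RemoveMatchingIdSpans) are made up for illustration, and RegexOptions.Singleline is included only on the assumption that the input may contain newlines:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SameIdRemover
{
    // Removes every span that starts at "ID<digits>" and ends at the
    // closest following "ID" carrying exactly the same digits.
    public static string RemoveMatchingIdSpans(string input)
    {
        // (\d+) captures the digits; \1 requires the same digits again.
        // The trailing \b keeps ID1 from pairing with ID10.
        const string pattern = @"\bID(\d+).*?\bID\1\b";
        return Regex.Replace(input, pattern, string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        var input = "ID1...sometxt1...ID1...sometxt2...ID3...sometxt3...ID50";
        Console.WriteLine(RemoveMatchingIdSpans(input));
        // prints: ...sometxt2...ID3...sometxt3...ID50
    }
}
```

ID3 and ID50 stay in place because no later ID repeats their digits.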

Related

Get string after the last comma or the last number using Regex in C#

How can I get the string after the last comma or the last number using regex for these examples:
"Flat 1, Asker Horse Sports" -- get string after ",", result: "Asker Horse Sports"
"9 Walkers Barn" -- get string after "9", result: "Walkers Barn"
I need the regex to support both cases, or two different regex rules, one for each case.
I tried /,[^,]*$/ and (.*),[^,]*$ to get the strings after the last comma but no luck.
You can use
[^,\d\s][^,\d]*$
See the regex demo (and a .NET regex demo).
Details
[^,\d\s] - any char but a comma, digit and whitespace
[^,\d]* - 0 or more chars other than a comma and a digit
$ - end of string.
In C#, you may also tell the regex engine to search for the match from the end of the string with the RegexOptions.RightToLeft option (to make regex matching more efficient, although it might not be necessary in this case if the input strings are short):
var output = Regex.Match(text, @"[^,\d\s][^,\d]*$", RegexOptions.RightToLeft)?.Value;
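As a rough sketch, the pattern above can be wrapped in a helper and tried on both sample strings from the question. TrailingText and AfterLastCommaOrDigit are hypothetical names chosen for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingText
{
    // Returns the text that follows the last comma or digit,
    // or an empty string when nothing matches.
    public static string AfterLastCommaOrDigit(string text)
    {
        // RightToLeft starts the search at the end of the string.
        return Regex.Match(text, @"[^,\d\s][^,\d]*$", RegexOptions.RightToLeft).Value;
    }

    public static void Main()
    {
        Console.WriteLine(AfterLastCommaOrDigit("Flat 1, Asker Horse Sports")); // Asker Horse Sports
        Console.WriteLine(AfterLastCommaOrDigit("9 Walkers Barn"));             // Walkers Barn
    }
}
```

Regex.Match never returns null, so reading .Value directly is safe; a failed match yields an empty string.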
You were on the right track with the capture group in (.*),[^,]*$, but the group should be the part that you are looking for.
If there has to be a comma or digit present, you could match until the last occurrence of either of them, and capture what follows in the capturing group.
^.*[\d,]\s*(.+)$
^ Start of string
.* Match any char except a newline 0+ times
[\d,] Match either , or a digit
\s* Match 0+ whitespace chars
(.+) Capture group 1, match any char except a newline 1+ times
$ End of string
.NET regex demo | C# demo
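A short sketch of the capture-group variant, reading Group 1 instead of the whole match. The names LastPartCapture and AfterLastCommaOrDigit are illustrative only, and returning an empty string for non-matching input is an assumption of this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastPartCapture
{
    // Captures what follows the last comma or digit (plus optional whitespace).
    public static string AfterLastCommaOrDigit(string text)
    {
        Match m = Regex.Match(text, @"^.*[\d,]\s*(.+)$");
        // Without a comma or digit in the input the pattern cannot match.
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(AfterLastCommaOrDigit("Flat 1, Asker Horse Sports")); // Asker Horse Sports
        Console.WriteLine(AfterLastCommaOrDigit("9 Walkers Barn"));             // Walkers Barn
    }
}
```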

Regex: finding words that end with the same letter the next word begins with

I tried to get a regex to work but couldn't (probably because I'm fairly new to regex).
Here's what I want to do:
Consider this text: One word, duel. Limes said bye.
Wanted matches: word, duel, Limes, said.
As the title says, I want to match consecutive words where one ends with a letter (for example, "t") and the next one starts with the same letter, case insensitive.
The closest I got to the answer is this expression: [^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]
You may use
(?i)\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b
See the regex demo. The results are in Group "w" capture collection.
Details
\b - a word boundary
(?<w>\p{L}+) - Group "w" (word): 1 or more BMP Unicode letters
(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+ - 1 or more repetitions of
\P{L}+ - 1 or more chars other than BMP Unicode letters
(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*) - Group "w":
(\p{L}) - a letter captured into Group 1
(?<=\1\P{L}+\1) - immediately to the left of the current position, there must be the same letter as captured in Group 1, 1+ chars other than letters, and the letter in Group 1
\p{L}* - 0 or more letters
\b - a word boundary.
C# code demo:
var text = "One word, duel. Limes said bye.";
var pattern = @"\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b";
var result = Regex.Match(text, pattern, RegexOptions.IgnoreCase)?.Groups["w"].Captures
.Cast<Capture>()
.Select(x => x.Value);
Console.WriteLine(string.Join(", ", result)); // => word, duel, Limes, said
A C# demo version without using LINQ:
string text = "One word, duel. Limes said bye.";
string pattern = @"\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b";
Match result = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
List<string> output = new List<string>();
if (result.Success)
{
foreach (Capture c in result.Groups["w"].Captures)
output.Add(c.Value);
}
Console.WriteLine(string.Join(", ", output));
If a word consists of at least 2 characters a-z, you might use 2 capturing groups with an alternation in a positive lookahead to check if the next word starts with the last char or if the previous word ended and the current word starts with the last char.
With case insensitive match enabled:
\b([a-z])[a-z]*([a-z])\b(?:(?=[,.]? \2)|(?<=\1 \1[a-z]+))
\b Word boundary
([a-z]) Capture group 1 Match a-z
[a-z]* Match 0+ times a-z in between
([a-z]) Capture group 2 Match a-z
\b Word boundary
(?: Non capturing group
(?= Positive lookahead, assert what is on the right is
[,.]? \2 an optional . or , space and what is captured in group 2
) Close lookahead
| Or
(?<= Positive lookbehind, assert what is on the left is
\1 \1[a-z]+ Match what is captured in group 1, a space, group 1 again and 1+ times a char a-z
) Close lookbehind
) Close non capturing group
Regex demo
Note that matching [a-zA-Z] is a small range for a word. You might use \w or \p{L} instead.
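The lookaround pattern above could be exercised along these lines. This is a sketch rather than a tested library routine: ChainedWords and Find are names invented for the example, and it keeps the pattern's own assumptions (ASCII letters, words of at least 2 characters, a single space between words):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChainedWords
{
    // Returns the words that share a boundary letter with a neighbouring word.
    public static string[] Find(string text)
    {
        const string pattern =
            @"\b([a-z])[a-z]*([a-z])\b(?:(?=[,.]? \2)|(?<=\1 \1[a-z]+))";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        var words = Find("One word, duel. Limes said bye.");
        Console.WriteLine(string.Join(", ", words)); // word, duel, Limes, said
    }
}
```

Unlike the named-group version, each word is a separate match here, so Regex.Matches is used instead of a capture collection.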

Regex to take first set after Space and want to remove $ with same regex

My input string:-
" $440,765.12 12-108(e)\n3 "
Output String i want as:-
"440,765.12"
I have tried with below regex and it's working but I am not able to remove $ with the same regex so anyone knows how to do the same task with below regex?
Regex rx = new Regex(@"^(.*? .*?) ");
var match = rx.Match(" $440,765.12 12-108(e)\n3 ");
var text = match.Groups[1].Value;
output after using above regex:-
$440,765.12
I know I can do the same task using string.replace function but I want to do the same with regex only.
You may use
var result = Regex.Match(s, @"\$(\d[\d,.]*)")?.Groups[1].Value;
See the regex demo.
Details
\$ - matches a $ char
(\d[\d,.]*) - captures into Group 1 ($1) a digit and then any 0 or more digits, , or . chars.
If you want a more "precise" pattern (just in case the match may appear within some irrelevant dots or commas), you may use
\$(\d{1,3}(?:,\d{3})*(?:\.\d+)?)
See this regex demo. Here, \d{1,3}(?:,\d{3})*(?:\.\d+)? matches 1, 2 or 3 digits followed with 0 or more repetitions of , and 3 digits, followed with an optional sequence of a . char and 1 or more digits.
Also, if there can be any currency symbol other than $, replace \$ with the \p{Sc} Unicode category that matches any currency symbol:
\p{Sc}(\d{1,3}(?:,\d{3})*(?:\.\d+)?)
See yet another regex demo.
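For illustration, the \p{Sc} variant might be used as in the sketch below. AmountExtractor and FirstAmount are hypothetical names, and the euro input is an extra sample that does not come from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountExtractor
{
    // Returns the first amount that follows a currency symbol, without the symbol.
    public static string FirstAmount(string text)
    {
        Match m = Regex.Match(text, @"\p{Sc}(\d{1,3}(?:,\d{3})*(?:\.\d+)?)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstAmount(" $440,765.12 12-108(e)\n3 ")); // 440,765.12
        Console.WriteLine(FirstAmount("total: \u20AC1,234.50"));       // 1,234.50
    }
}
```

Only Group 1 is returned, which is how the currency symbol is dropped without a separate string replacement.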

Regular expression pattern for dates with different symbols and variable digit formats

I am trying to parse out some dates in a text field that could be in the following formats (note the text field has a bunch of other junk surrounding the dates):
//with dashes
10-10-16
1-5-16
10-1-16
1-10-16
//with periods
10.10.16
1.5.16
10.1.16
1.10.16
//with forward slashes
10/10/16
1/5/16
10/1/16
1/10/16
What I need is one pattern for all digit format scenarios. Here is what I tried:
//x.xx.xx
Regex reg1 = new Regex(@"\(?\d{1}\)?[-/.]? *\d{2}[-/.]? *[-/.]?\d{2}");
//xx.xx.xx
Regex reg2 = new Regex(@"\(?\d{2}\)?[-/.]? *\d{2}[-/.]? *[-/.]?\d{2}");
//x.x.xx
Regex reg3 = new Regex(@"\(?\d{1}\)?[-/.]? *\d{1}[-/.]? *[-/.]?\d{2}");
//xx.x.xx
Regex reg4 = new Regex(@"\(?\d{2}\)?[-/.]? *\d{1}[-/.]? *[-/.]?\d{2}");
I'm new to regular expressions, so I am looking for a single expression that will handle all these scenarios (ie., digit formats with single number and double digit numbers for -/. in between).
Is there one expression that could handle this?
Thanks,
I can suggest
Regex rx = new Regex(@"\(?(?<!\d)\d{1,2}\)?[-/.]?\d{1,2}[-/.]?\d{2}(?!\d)");
If your date separators are used consistently, use the backreference with a capturing group:
Regex rx = new Regex(@"\(?(?<!\d)\d{1,2}\)?([-/.])\d{1,2}\1\d{2}(?!\d)");
See the regex demo 1 and demo 2.
Details:
\(? - an optional (
(?<!\d) - there must be no digit before the current location
\d{1,2} - 1 or 2 digits
\)? - an optional )
[-/.]? - an optional -, /, or .
\d{1,2}[-/.]? - again, 1 or 2 digits and an optional -, /, or .
\d{2} - 2 digits
(?!\d) - there must be no digit after the current location.
The version with a capturing group/backreference contains ([-/.]) - a capturing group with ID=1 that matches the first separator, and \1 is the backreference that matches the same text captured into Group 1 (so the second separator must be identical to the first one).
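A small sketch of the consistent-separator version pulling dates out of surrounding text. DateFinder, FindDates and the sample sentence are assumptions made for this example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DateFinder
{
    // Finds dates whose two separators are identical (10-10-16, 1.5.16, 10/1/16).
    public static string[] FindDates(string text)
    {
        const string pattern = @"\(?(?<!\d)\d{1,2}\)?([-/.])\d{1,2}\1\d{2}(?!\d)";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        var text = "junk 10-10-16, more 1.5.16 and 10/1/16 but not 1-5.16";
        Console.WriteLine(string.Join(" | ", FindDates(text)));
        // prints: 10-10-16 | 1.5.16 | 10/1/16
    }
}
```

The mixed-separator value 1-5.16 is skipped because \1 must repeat the first separator.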
You can also try this: \d{1,2}([-./])\d{1,2}\1\d{2}
Regex regex = new Regex(@"\d{1,2}([-./])\d{1,2}\1\d{2}");
\d{1,2} between one and two digits
([-./]) any of ., - and /
\1 repeats the captured separator (to prevent matching 1.1/01 or 1-1.01)
\d{2} matches two digits
Try this:
\d{1,2}(-|\.|\/)\d{1,2}(-|\.|\/)\d{1,2}

Regex matching recurring patterns

I want to validate input in a C# TextBox by using regular expressions. The expected input is in this format:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
So I've got six elements of five separated characters and one separated character at the end.
Now my regex matches any character between five and 255 chars: .{5,255}
How do I need to modify it in order to match the format mentioned above?
Update:
If you want to match any letter or digit, you can use:
^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$
Explanation: -
(?: // Non-capturing group
[a-zA-Z0-9]{5} // Match 5 letters or digits
- // Followed by a `-`
){6} // Match the pattern 6 times (ABCD4-) -> 6 times
[a-zA-Z0-9] // At the end match any character or digit.
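One possible way to use this pattern for the TextBox validation, sketched with made-up names (KeyValidator, IsValid) and an invented sample key:

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValidator
{
    // Six blocks of five letters or digits, each followed by '-', then one final character.
    private static readonly Regex KeyPattern =
        new Regex(@"^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$");

    public static bool IsValid(string input)
    {
        return KeyPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("ABCDE-12345-FGHIJ-67890-KLMNO-PQRST-U")); // True
        Console.WriteLine(IsValid("ABCDE-12345"));                           // False
    }
}
```

In a form, IsValid would typically be called with the TextBox's Text value.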
Note: The regex below only matches input in which every block repeats a single character, like the sample you posted:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
You can try this regex:
^(?:([a-zA-Z0-9])\1{4}-){6}\1$
Explanation: -
(?: // Non-capturing group
( // First capture group
[a-zA-Z0-9] // Match any character or digit, and capture in group 1
)
\1{4} // Match the same character as in group 1 - 4 times
- // Followed by a `-`
){6} // Match the pattern 6 times (CCCCC-) -> 6 times
\1 // At the end match a single character.
Untested, but I think this will work:
([A-Za-z0-9]{5}-){6}[A-Za-z0-9]
For your example, in general replace C to the character class you want:
^(C{5}-){6}C$
^([a-z]{5}-){6}[a-z]$ # Just letter, use case insensitive modifier
^([a-z0-9]{5}-){6}[a-z0-9]$ # Letters and digits..
Try this:
^(C{5}-){6}C$
The ^ and $ denote the beginning and end of the string respectively and make sure that no additional characters are entered.
