Regex Match multiple occurrences with numbers in string C#

I've been searching for an answer to my problem but couldn't find one, so I'm asking here.
I want to take a string, for example "37513220102304920105590",
and find all matches for 11-digit numbers that start with 3 or 4.
Here is what I have tried:
string input = "37513220102304920105590";
var regex = new Regex("^[3-4][0-9]{10}$");
var matches = regex.Matches(input);
// I expect it to have 3 occurrences: "37513220102", "32201023049" and "30492010559"
// But my matches are empty.
foreach (Match match in matches)
{
    var number = match.Value;
    // do stuff
}
My question is: is my regex wrong, or am I doing something wrong with the matching?

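Remove the ^ and $ anchors: they force the whole input to be exactly 11 characters long, which is why you get no matches. Also, normal matches consume the characters they match and cannot overlap, so use capturing inside a positive lookahead instead. Note that the - between 3 and 4 is redundant; [34] is enough.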
(?=([34][0-9]{10}))
In C#, since the match itself is zero-width and the value is captured, you need to collect the .Groups[1].Value contents:
var s = "37513220102304920105590";
var result = Regex.Matches(s, @"(?=([34][0-9]{10}))")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
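A self-contained sketch of the same lookahead technique, wrapped in a hypothetical helper class `OverlappingNumbers` (the class and method names are illustrative, not from the original answer). Because each match is zero-width, the engine tries every start position, and group 1 holds the 11-digit candidate starting there, so overlapping numbers are all reported.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlappingNumbers
{
    // Zero-width lookahead: the match itself is empty, group 1 holds the digits.
    // No ^/$ anchors, so a candidate may start anywhere in the input.
    private static readonly Regex Pattern = new Regex(@"(?=([34][0-9]{10}))");

    // Hypothetical helper: returns every 11-digit run that starts with 3 or 4,
    // including runs that overlap each other, in order of their start position.
    public static List<string> Find(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

Calling `OverlappingNumbers.Find("37513220102304920105590")` returns the three numbers the question expects: "37513220102", "32201023049" and "30492010559".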

Related

Regex to get string between number and underscore C# [duplicate]

I'm trying to make a regex that gets the string between a number and an underscore. For example:
I have CP_01Ags_v5, so I need a regex that matches just Ags. Another example could be CP_13Hgo_v5, which should match Hgo.
Any ideas?
Based on the examples and matches you describe, you want something along the lines of:
[0-9]+(.*)[_]
To break it down:
The regex looks for one or more digits, then captures everything after the digit(s) up to the last [_] underscore (.* is greedy).
The downside is that this assumes your real inputs look like the examples you provided. If your input is
CP_13Hgo_v5asdf_
then it will match
Hgo_v5asdf
If there can be more than one candidate, you want the non-greedy (lazy) version of this regex:
[0-9]+(.*?)[_]
This finds two matches in the same example
CP_13Hgo_v5asdf_
and their group 1 values are:
Hgo
and
asdf
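To see the greedy and lazy behaviour side by side, here is a small sketch; `GreedyVsLazy.Group1Values` is a hypothetical helper that simply collects group 1 from every match of a pattern.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: runs the pattern over the whole input and
    // returns the value of capturing group 1 from each match, in order.
    public static List<string> Group1Values(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

With the input "CP_13Hgo_v5asdf_", the greedy pattern "[0-9]+(.*)[_]" yields a single match whose group is "Hgo_v5asdf", while the lazy pattern "[0-9]+(.*?)[_]" yields two matches with groups "Hgo" and "asdf".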
You can use lookarounds to match just the string between the digits and the underscore, e.g.:
(?<=\d)[A-Za-z]+(?=_)
In C# (note the need to escape the \ in the regex):
String s = @"CP_01Ags_v5 CP_13Hgo_v5";
Match m = Regex.Match(s, "(?<=\\d)[A-Za-z]+(?=_)");
while (m.Success) {
    Console.WriteLine(m.Value);
    m = m.NextMatch();
}
Output
Ags
Hgo
If the part you want is always at least two letters long, and no other run of two or more letters appears except at the very start of the string, you can apply the following:
var text = "CP_01Ags_v5";
var x = Regex.Match(text, @"(?<!^)[A-Za-z]{2,}");
Use named regex groups:
(?<leftPart>_\d{2})(?<YourTarget>[a-zA-Z]+)(?<rightPart>_[a-zA-Z0-9]{2})
C#:
Regex re = new Regex(@"(?<leftPart>_\d{2})(?<YourTarget>[a-zA-Z]+)(?<rightPart>_[a-zA-Z0-9]{2})");
/*
 * Loop over the matches
 * to get the value of the group you want
 */
foreach (Match item in re.Matches("CP_01Ags_v5 CP_13Hgo_v5"))
{
    Console.WriteLine(" Match: " + item.Value);
    Console.WriteLine(" Your Target you want: " + item.Groups["YourTarget"].Value);
}

Regex: Give priority to optional pattern

Let's say I have a string like this:
555 3553 666 555
And a regex like this
var pat = new Regex("3?553?");
When the string above is matched with pat.Match(mystring), the result returned is "55".
I need the result to be "3553" if possible, and only if that is not possible should it be "55". In other words, the 3? is optional and doesn't have to be there, but if it is there, it should always be matched first.
So this 555 3553 666 555 will return 3553
And this 222 5555 777 will return 55
Is this possible to achieve without using two separate regex definitions?
Thank you.
Regex engines always scan the string from left to right (assuming a left-to-right script). In your case, the first two characters already match the regex, so that match is returned.
So, instead of stopping after the first match, you need to find all the matches and choose the longest one. However, there is a caveat: regex matches can't overlap (every character can be consumed by only one match). Therefore, in a string like
55553553
your regex would return 55, 553, and 553.
The solution is to use a lookahead assertion, combined with a capturing group:
var pat = new Regex("(?=(3?553?))");
and get all its matches:
var captures = new List<string>();
foreach (Match match in pat.Matches(subject))
{
    // matched text: match.Groups[1].Value, add that to the list
    captures.Add(match.Groups[1].Value);
}
Then choose the longest match.
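Putting both steps together, a minimal sketch might look like this; `LongestMatch.Find` is a hypothetical helper name, and it assumes that when several candidates share the maximum length, the leftmost one is wanted (LINQ's OrderByDescending is a stable sort).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestMatch
{
    // The lookahead is zero-width, so a candidate is tried at every position
    // and overlapping candidates are all captured in group 1.
    private static readonly Regex Candidates = new Regex("(?=(3?553?))");

    // Hypothetical helper: returns the longest captured candidate
    // (leftmost on ties), or null if nothing matched.
    public static string Find(string input)
    {
        return Candidates.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .OrderByDescending(v => v.Length)
            .FirstOrDefault();
    }
}
```

This returns "3553" for "555 3553 666 555", "55" for "222 5555 777", and also "3553" for "55553553", where non-overlapping matching would only see 55, 553 and 553.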
If you want to give one match priority over another, the code below may help:
var matches = Regex.Matches(txt, @"(?<G1>3553)|(?<G2>55)").OfType<Match>();
var res = matches
    .GroupBy(x => x.Success)
    .Select(x =>
        new {
            Success = x.Key,
            G = !string.IsNullOrEmpty(x.Max(w => w.Groups["G1"].Value))
                ? x.Max(w => w.Groups["G1"].Value)
                : x.Max(w => w.Groups["G2"].Value)
        })
    .SingleOrDefault();
Your regex matches 55 simply because that is the first match it finds; it has nothing to do with priorities.
I think what you want here is to get the longest match. You should use Matches to get all the matches and get the longest one by checking Length.
var matches = Regex.Matches("555 3553 666 555", "3?553?");
var longestMatch = matches.Cast<Match>().OrderByDescending(x => x.Value.Length).First().Value;

In Perl you use brackets to extract your matches; what is the equivalent of that in C#?

For instance, in Perl I can do
$x=~/$(\d+)\s/ which basically says: in variable $x, find any number preceded by a $ sign and followed by a whitespace character. Then $1 holds the number.
In C# I tried:
Regex regex = new Regex(@"$(\d+)\s");
if (regex.IsMatch(text))
{
    // need to access matched number here?
}
First off, your regex $(\d+)\s actually means "find a number after the end of the string", so it can never match. You have to escape the $ because it's a metacharacter.
Anyway, the equivalent C# for this is:
var match = Regex.Match(text, @"\$(\d+)\s");
if (match.Success)
{
    var number = match.Groups[1].Value;
    // ...
}
And, for better maintainability, groups can be named:
var match = Regex.Match(text, @"\$(?<number>\d+)\s");
if (match.Success)
{
    var number = match.Groups["number"].Value;
    // ...
}
And in this particular case you don't even have to use groups in the first place:
var match = Regex.Match(text, @"(?<=\$)\d+(?=\s)");
if (match.Success)
{
    var number = match.Value;
    // ...
}
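A small sketch contrasting the unescaped and escaped patterns; the class `DollarAmount` and its method names are hypothetical. Without RegexOptions.Multiline, $ only matches at the end of the input or just before a final newline, so no digit can ever follow it.

```csharp
using System.Text.RegularExpressions;

public static class DollarAmount
{
    // Unescaped $ is an end-of-input anchor, so this pattern never matches.
    public static bool UnescapedMatches(string text) =>
        Regex.IsMatch(text, @"$(\d+)\s");

    // Escaped \$ matches a literal dollar sign; group "number" holds the digits.
    // Hypothetical helper: returns the number, or null when there is no match.
    public static string ExtractNumber(string text)
    {
        Match match = Regex.Match(text, @"\$(?<number>\d+)\s");
        return match.Success ? match.Groups["number"].Value : null;
    }
}
```

For example, with "Total: $250 due", `UnescapedMatches` returns false while `ExtractNumber` returns "250", which plays the role of Perl's $1.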
To get a matched result, use Match instead of IsMatch.
var regex = new Regex("^[^@]*@(?<domain>.*)$");
// accessible via the group name
regex.Match("foo@domain.com").Groups["domain"].Value
// or use an index
regex.Match("foo@domain.com").Groups[1].Value
Use the Match method instead of IsMatch. You also need to escape $ to match it literally, because it is a special character meaning "end of string".
Match m = Regex.Match(s, @"\$(\d+)\s");
if (m.Success) {
    Console.WriteLine(m.Groups[1].Value);
}

Regex to find and replace a year in a string

I have a string in my C# code:
The.Big.Bang.Theory.(2013).S07E05.Release.mp4
I need to find an occurrence of (2013) and replace the whole thing, including the brackets, with ___ (three underscores). So the output would be:
The.Big.Bang.Theory.___.S07E05.Release.mp4
Is there a regex that can do this? Or is there a better method?
I then do some processing on the new string, but later I need to report that '(2013)' was removed, so I need to store the value that was replaced.
I tried this with your string, and it works:
string pattern = @"\(\d{4}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var m = Regex.Replace(search, pattern, "___");
Console.WriteLine(m);
This finds any 4-digit number enclosed in opening/closing brackets.
If the year can change, I think a regex is the best approach.
This code, in turn, tells you whether there is a match for your pattern and prints it:
var k = Regex.Matches(search, pattern);
if (k.Count > 0)
    Console.WriteLine(k[0].Value);
Many of these answers forgot the original question in that you wanted to know what you are replacing.
string pattern = @"\((19|20)\d{2}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
string replaced = Regex.Match(search, pattern).Captures[0].ToString();
string output = Regex.Replace(search, pattern, "___");
Console.WriteLine("found: {0} output: {1}",replaced,output);
gives you the output
found: (2013) output: The.Big.Bang.Theory.___.S07E05.Release.mp4
Here is an explanation of my pattern too.
\( -- match the (
(19|20) -- match the numbers 19 or 20. This assumes a year for TV shows or movies between 1900 and 2099.
\d{2} -- match 2 more digits
\) -- match )
Here is a working snippet from a console application; note the regex \(\d{4}\):
var r = new System.Text.RegularExpressions.Regex(@"\(\d{4}\)");
var s = r.Replace("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", "___");
Console.WriteLine(s);
and the output from the console application:
The.Big.Bang.Theory.___.S07E05.Release.mp4
Below is a modified solution taking into consideration your additional requirement:
var m = r.Match("The.Big.Bang.Theory.(2013).S07E05.Release.mp4");
if (m.Success)
{
var s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4".Replace(m.Value, "___");
var valueReplaced = m.Value;
}
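If you want to do the replacement and remember the replaced text in a single pass, the `Regex.Replace` overload that takes a `MatchEvaluator` also fits; this is a sketch with a hypothetical `YearRemover.Replace` helper that appends each replaced value to a caller-supplied list.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearRemover
{
    // Hypothetical helper: replaces every "(dddd)" with "___" and records
    // each replaced value in 'removed', all in one Regex.Replace call.
    public static string Replace(string input, List<string> removed)
    {
        return Regex.Replace(input, @"\(\d{4}\)", match =>
        {
            removed.Add(match.Value); // remember what is being replaced
            return "___";             // text that takes its place
        });
    }
}
```

The evaluator runs once per match, so the list ends up holding every replaced value in order, e.g. "(2013)" for the example file name.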
Try this:
string s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var info = Regex.Split(
    Regex.Matches(s, @"\(.*?\)")
        .Cast<Match>().First().ToString(), @"[\s,]+");
s = s.Replace(info[0], "___");
Result
The.Big.Bang.Theory.___.S07E05.Release.mp4
Try this:
string str = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var matches = Regex.Matches(str, @"\([0-9]{4}\)");
List<string> removed = new List<string>();
foreach (Match match in matches)
{
    removed.Add(match.Value);
}
str = Regex.Replace(str, @"\([0-9]{4}\)", "___");
Console.WriteLine("Removed Strings are:");
foreach (string s in removed)
{
    Console.WriteLine(s);
}
output:
Removed Strings are:
(2013)
You don't need a regex for a simple replace (you can use one, but it's not needed) as long as the text to replace is always exactly (2013):
var name = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var replacedName = name.Replace("(2013)", "___");

Extracting Numbers from String RegEx

I am really struggling with regular expressions and can't seem to extract the number from this string:
"id":143331539043251,
I've tried this, but I'm getting compilation errors:
var regex = new Regex(@""id:"\d+,");
Note that the full string contains other numbers I don't want. I want the number between "id": and the trailing comma.
Try this code:
var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use the extracted yourNumber as a string, or parse it into a numeric type.
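A minimal sketch of the parsing step, assuming the id always fits in a `long` (`long.Parse` throws an OverflowException otherwise); `IdExtractor.ExtractId` is a hypothetical helper, and it uses [0-9] so that only ASCII digits reach the parser.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Hypothetical helper: finds "id":<digits> and parses the digits as a long.
    // Returns null when the input contains no "id": entry.
    public static long? ExtractId(string json)
    {
        Match match = Regex.Match(json, @"""id"":(?<num>[0-9]+)");
        if (!match.Success)
        {
            return null;
        }
        return long.Parse(match.Groups["num"].Value, CultureInfo.InvariantCulture);
    }
}
```

For the string from the question, `ExtractId` returns 143331539043251 as a number rather than a string.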
If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, because in the .NET regex engine it matches any Unicode decimal digit (such as Arabic-Indic digits), not just 0-9.
Update, following comments on the question and on this answer: the following regex will match the pattern and place the matched number in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(@"""id"":143331539043251,", @"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
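To illustrate the point about \d, here is a small sketch; the class `DigitClasses` and its method names are hypothetical. U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit that .NET's \d accepts but [0-9] does not; RegexOptions.ECMAScript restricts \d to the ASCII range.

```csharp
using System.Text.RegularExpressions;

public static class DigitClasses
{
    // Default .NET behaviour: \d matches any Unicode decimal digit (category Nd).
    public static bool BackslashD(string s) => Regex.IsMatch(s, @"^\d+$");

    // [0-9] only matches the ASCII digits 0 through 9.
    public static bool AsciiRange(string s) => Regex.IsMatch(s, "^[0-9]+$");

    // With RegexOptions.ECMAScript, \d is restricted to [0-9].
    public static bool EcmaScriptD(string s) =>
        Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
}
```

All three accept "143331539043251", but only `BackslashD` accepts "\u0663".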
If you are open to using LINQ, try the following (C#):
string stringVariable = "123cccccbb---556876---==";
var f = from a in stringVariable where char.IsDigit(a) select a;
var number = String.Join("", f);
