Regex: Find a string within different string variations - c#

I need to find a Regex that gets hold of the
81.03
part (varies, but always has the structure XX.XX) in following string variations:
Projects/75100/75120/75124/AR1/75124_AR1_HM2_81.03-testing-b405.tgz
Projects/75100/75130/75138/LM1/75138_LM1_HM2_81.03.tgz
I've come up with:
var regex = new Regex("(.*_)(.*?)-");
but this only matches up to the first example string whereas
var regex = new Regex("(.*_)(.*?)(.*\.)");
only matches the second string.
The path to the file constantly changes as does the "-testing..." postfix.
Any ideas to point me out in the right direction?

You can use
var result = Regex.Match(text, @".*_(\d+\.\d+)")?.Groups[1].Value;
Or, if the string can have more dot+number parts:
var result = Regex.Match(text, @".*_(\d+(?:\.\d+)+)")?.Groups[1].Value;
In general, the regex will extract dot-separated digit chunks after the last _.
Details
.* - any 0 or more chars other than a newline, as many as possible
_ - a _ char
(\d+(?:\.\d+)+) - Group 1: one or more digits followed with one or more occurrences of a dot followed with one or more digits
\d+\.\d+ - one or more digits, . and one or more digits.
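As a rough sketch of how the second pattern behaves on both sample paths, the snippet below wraps it in a helper. The name ExtractVersion is hypothetical (it is not part of the answer), and returning null on a failed match is an assumption about what the caller wants.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the dot-separated number that follows the
// last '_' in the input, or null when no such part exists.
static string? ExtractVersion(string path)
{
    Match m = Regex.Match(path, @".*_(\d+(?:\.\d+)+)");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(ExtractVersion("Projects/75100/75120/75124/AR1/75124_AR1_HM2_81.03-testing-b405.tgz")); // 81.03
Console.WriteLine(ExtractVersion("Projects/75100/75130/75138/LM1/75138_LM1_HM2_81.03.tgz"));              // 81.03
```

Checking m.Success explicitly makes the no-match case visible; Regex.Match itself never returns null, so the ?. in the one-liners above is harmless but not required.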

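To match the value 81.03, another option is to match digits with an optional decimal part right after an underscore, and to require that no forward slash follows, so the match always comes from the last path segment. Note that this takes the first underscore in that segment that is directly followed by a digit.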
_(\d+(?:\.\d+)?)[^/\r\n]*$
Explanation
_ Match literally
(\d+(?:\.\d+)?) Capture group 1, match 1+ digits with an optional decimal part
[^/\r\n]* Match 0+ chars except / or a newline
$ End of string
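A minimal sketch of this pattern in C#, assuming a hypothetical helper name VersionInFileName. The second input is made up (the build_12 directory is not from the question) to show that an underscore followed by digits in a directory name is skipped, because a / follows it.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: looks for "_<digits>[.<digits>]" in the last path
// segment only, since [^/\r\n]*$ forbids any '/' after the match.
static string? VersionInFileName(string path)
{
    Match m = Regex.Match(path, @"_(\d+(?:\.\d+)?)[^/\r\n]*$");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(VersionInFileName("Projects/75100/75130/75138/LM1/75138_LM1_HM2_81.03.tgz"));  // 81.03
// Made-up path: "_12" sits in a directory name and is not picked up.
Console.WriteLine(VersionInFileName("Projects/build_12/75138_LM1_HM2_81.03-testing-b405.tgz")); // 81.03
```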

Related

Regex match multiple digits after '-'

This seems like it should be easy, but I'm not so good with regex, and this doesn't seem to be easy to find on google.
I need a regex that starts with the string 'SP-multiple digits' and ends with the string '- multiple digits'
For example, I have to match '-12' in "Sp-1234-12".
My attempt was: [^*-]*$ -> This case matches everything after the minus but i need the minus included.
For that digit and hyphen format, you could use a capture group for the part of the string that you want:
^Sp(?:-\d+)*(-\d+)$
Explanation
^ Start of string
Sp Match literally
(?:-\d+)* Optionally repeat - and 1+ digits
(-\d+) Capture group 1, match - and 1+ digits
$ End of string
Note that in C#, \d also matches Unicode decimal digits from other scripts, so you can use [0-9] instead of \d to match only the digits 0-9.
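The capture-group approach could be used from C# roughly as follows. LastHyphenPart is a hypothetical name, and the sketch uses [0-9] rather than \d, following the note above.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the final "-digits" part of an "Sp-..." string,
// or null when the input does not have that shape.
static string? LastHyphenPart(string input)
{
    Match m = Regex.Match(input, @"^Sp(?:-[0-9]+)*(-[0-9]+)$");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(LastHyphenPart("Sp-1234-12"));          // -12
Console.WriteLine(LastHyphenPart("Sp-1234-12x") is null); // True
```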

C# Regex match the last digit after the last underscore

Using regex I'm trying to get only the last digit (can be only 2 or 3) after the last underscore.
What I have right now is getting the digit and characters.
I need to cut off the characters and only get the digit [2-3].
Here is my example -- I need to get only 2 after the last underscore. Currently getting both digit and characters
ABC_0000_DEFG_1I_23_45_HIJKL2.pdf
The output I want -- 2 (after HIJKL).
^.*_\K[^.]+
If I get rid of ^ with \d, d{2-3}, ... it still gets HIJKL.
The regular expression
_[^_]*([2-3])[^_]*$
should do it. It matches:
_ - an underscore, followed by
[^_]* - zero or more characters other than underscore, followed by
([2-3]) - the decimal digit 2 or 3 (the same as [23]), followed by
[^_]* - zero or more characters other than underscore, followed by
$ - end-of-text
You will need to get match group #1:
var rx = new Regex(@"_[^_]*([2-3])[^_]*$");
var m = rx.Match("ABC_0000_DEFG_1I_23_45_HIJKL2.pdf");
var s = m.Success ? m.Groups[1].Value : null;
At which point, s should be "2".
You can use
_[^_]*(\d)[^_]*$
Which matches the last underscore, followed by a digit surrounded by anything but underscores.
You can use [23] instead of \d if you want to ignore anything other than 2 or 3.
To get only the digit as the whole match (no capture group), you can use lookarounds; this relies on .NET supporting variable-length lookbehind:
(?<=_[^_]*)[23](?=[^_]*$)
The pattern matches:
(?<=_[^_]*) Positive lookbehind, assert _ followed by optional chars other than _
[23] Match either 2 or 3
(?=[^_]*$) Positive lookahead to assert no _ till the end of the string
Example code
Regex regex = new Regex(@"(?<=_[^_]*)[23](?=[^_]*$)");
Match match = regex.Match("ABC_0000_DEFG_1I_23_45_HIJKL2.pdf");
if (match.Success)
{
    Console.WriteLine(match.Value);
}
Output
2

Get string after the last comma or the last number using Regex in C#

How can I get the string after the last comma or last number using regex for this examples:
"Flat 1, Asker Horse Sports" -- get string after ",", result: "Asker Horse Sports"
"9 Walkers Barn" -- get string after "9", result: "Walkers Barn"
I need that regex to support both cases, or two different regex rules, one per case.
I tried /,[^,]*$/ and (.*),[^,]*$ to get the strings after the last comma but no luck.
You can use
[^,\d\s][^,\d]*$
Details
[^,\d\s] - any char but a comma, digit and whitespace
[^,\d]* - any char but a comma and digit
$ - end of string.
In C#, you may also tell the regex engine to search for the match from the end of the string with the RegexOptions.RightToLeft option. This can make matching more efficient, although it might not be necessary in this case if the input strings are short:
var output = Regex.Match(text, @"[^,\d\s][^,\d]*$", RegexOptions.RightToLeft)?.Value;
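To see the pattern on the two inputs from the question, with and without RegexOptions.RightToLeft, a small sketch follows. TailText is a hypothetical helper name; for these inputs both search directions should produce the same text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the trailing run of text that contains no
// comma or digit and does not start with whitespace ("" if there is none).
static string TailText(string input, RegexOptions options = RegexOptions.None)
{
    return Regex.Match(input, @"[^,\d\s][^,\d]*$", options).Value;
}

string[] inputs = { "Flat 1, Asker Horse Sports", "9 Walkers Barn" };
foreach (string input in inputs)
{
    // Prints the default (left-to-right) result and the right-to-left result.
    Console.WriteLine(TailText(input) + " | " + TailText(input, RegexOptions.RightToLeft));
}
// Asker Horse Sports | Asker Horse Sports
// Walkers Barn | Walkers Barn
```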
You were on the right track with the capture group in (.*),[^,]*$, but the group should be the part that you are looking for.
If there has to be a comma or digit present, you could match until the last occurrence of either of them, and capture what follows in the capturing group.
^.*[\d,]\s*(.+)$
^ Start of string
.* Match any char except a newline 0+ times
[\d,] Match either , or a digit
\s* Match 0+ whitespace chars
(.+) Capture group 1, match any char except a newline 1+ times
$ End of string
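A short sketch of the capture-group variant in C#, under the assumption that a null result is acceptable when the input contains neither a comma nor a digit. TextAfterLastCommaOrDigit is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the greedy .* backtracks to the last comma or digit,
// and group 1 captures whatever follows it (after optional whitespace).
static string? TextAfterLastCommaOrDigit(string input)
{
    Match m = Regex.Match(input, @"^.*[\d,]\s*(.+)$");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(TextAfterLastCommaOrDigit("Flat 1, Asker Horse Sports")); // Asker Horse Sports
Console.WriteLine(TextAfterLastCommaOrDigit("9 Walkers Barn"));             // Walkers Barn
```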

Regex to take first set after Space and want to remove $ with same regex

My input string:-
" $440,765.12 12-108(e)\n3 "
Output String i want as:-
"440,765.12"
I have tried with below regex and it's working but I am not able to remove $ with the same regex so anyone knows how to do the same task with below regex?
Regex rx = new Regex(@"^(.*? .*?) ");
var match = rx.Match(" $440,765.12 12-108(e)\n3 ");
var text = match.Groups[1].Value;
output after using above regex:-
$440,765.12
I know I can do the same task using string.replace function but I want to do the same with regex only.
You may use
var result = Regex.Match(s, @"\$(\d[\d,.]*)")?.Groups[1].Value;
Details
\$ - matches a $ char
(\d[\d,.]*) - captures into Group 1 ($1) a digit and then any 0 or more digits, , or . chars.
If you want a more "precise" pattern (just in case the match may appear within some irrelevant dots or commas), you may use
\$(\d{1,3}(?:,\d{3})*(?:\.\d+)?)
Here, \d{1,3}(?:,\d{3})*(?:\.\d+)? matches 1, 2 or 3 digits followed with 0 or more repetitions of , and 3 digits, followed with an optional sequence of a . char and 1 or more digits.
Also, if there can be any currency symbol other than $, replace \$ with the \p{Sc} Unicode category, which matches any currency symbol:
\p{Sc}(\d{1,3}(?:,\d{3})*(?:\.\d+)?)
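As an illustration, the \p{Sc} variant could be wrapped like this. ExtractAmount is a hypothetical name, and the euro input is made up to show a second currency symbol; only the first input comes from the question.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the amount that follows a currency symbol,
// without the symbol itself, or null when nothing matches.
static string? ExtractAmount(string input)
{
    Match m = Regex.Match(input, @"\p{Sc}(\d{1,3}(?:,\d{3})*(?:\.\d+)?)");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(ExtractAmount(" $440,765.12 12-108(e)\n3 ")); // 440,765.12
// Made-up input; \u20AC is the euro sign.
Console.WriteLine(ExtractAmount("Total: \u20AC1,234.50"));      // 1,234.50
```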

Regular expression for updating version number

I have version numbers as given below.
020. 000. 1234. 43567 (please note the whitespace after the dot(.))
020,000,1234,43567
20.0.1234.43567
20,0,1234,43567
I want a regular expression for updating the numbers after last two dots(.) to for example 1298 and 45678 (any number)
020. 000. 1298. 43568 (please note the whitespace after the dot(.))
020,000,1298,45678
20.0.1298.45678
20,0,1298,45678
Thanks,
resultString = Regex.Replace(subjectString,
    @"(\d+)     # any number
    ([.,]\s*)   # dot or comma, optional whitespace
    (\d+)       # etc.
    ([.,]\s*)
    \d+
    ([.,]\s*)
    \d+",
    "$1$2$3${4}1298${5}43568", RegexOptions.IgnorePatternWhitespace);
Note the ${4} instead of $4 because otherwise the following 1 would be interpreted as belonging to the group number ($41).
Also note the difference between (\d+) and (\d)+. While both match 1234, the first one will capture 1234 into the group created by the parentheses. The second one will capture only 4 because the previous captures will be overwritten by the next.
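The difference between (\d+) and (\d)+ can be checked with a small sketch. The variable names are illustrative; in .NET the earlier iterations of a repeated group remain available through Group.Captures, while Value holds only the last one.

```csharp
using System;
using System.Text.RegularExpressions;

Match whole = Regex.Match("1234", @"(\d+)");    // group wraps the whole run of digits
Match repeated = Regex.Match("1234", @"(\d)+"); // group is repeated once per digit

Console.WriteLine(whole.Groups[1].Value);             // 1234
Console.WriteLine(repeated.Groups[1].Value);          // 4 (last iteration only)
Console.WriteLine(repeated.Groups[1].Captures.Count); // 4 (one capture per digit)
```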
To replace the last two numbers of the version with 1298 and 43568:
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");
var result = regex.Replace(source, "1298${seperator}43568");
This works because
(?<=...) is a lookbehind: it requires its contents to appear before the match without including them in the match
^ matches the start of the string
(?:\d+[.,]\s*) non-capturing group: one or more digits followed by a . or , and then zero or more whitespace chars
{2} the previous group must occur twice
\d+ first part of the match, one or more digits
(?<seperator>[.,]\s*) captures the separator (a . or , followed by optional whitespace) into a named group called seperator
\d+ one or more digits
$ matches the end of the string
In the replacement string you just provide the new version numbers and use ${seperator} to insert the original separator.
If you are not bothered about preserving the separator, you can just do
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+[.,]\s*\d+$");
var result = regex.Replace(source, "1298.43568");
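If the new numbers are not fixed, one possible variation is to pass them in and build the replacement in a MatchEvaluator, which also sidesteps the $4 versus $41 ambiguity mentioned earlier. This is only a sketch: UpdateVersion and its parameters are hypothetical names, and the pattern is the lookbehind version from above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces the last two numbers of a four-part version
// while keeping whatever separator (and whitespace) the input already uses.
static string UpdateVersion(string version, int third, int fourth)
{
    var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");
    // The lambda rebuilds the tail: new number, original separator, new number.
    return regex.Replace(version, m => third + m.Groups["seperator"].Value + fourth);
}

Console.WriteLine(UpdateVersion("020. 000. 1234. 43567", 1298, 45678)); // 020. 000. 1298. 45678
Console.WriteLine(UpdateVersion("20,0,1234,43567", 1298, 45678));       // 20,0,1298,45678
```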
