This little challenge just screams regular expressions to me, but so far I am stumped.
I have an arbitrary string that contains two numbers embedded in it. I need to extract those two numbers, which will be n and m digits long (n,m are unknown in advance). The format of the string is always
FixedWord[n digits]anotherfixedword[m digits]alotmorestuffontheend
The first number is of the format 1.2.3.4 (the number of digits varying) eg 5.3.20 or 5.3.10.1 or 5.4.
and the second is a simpler 'm' digits (eg 25 or 2)
eg "AppName5.2.6dbVer44Oracle.Group"
It shouts 'pattern matching' and hence "extraction using regexes". Can anyone guide me further?
TIA
The following pattern:
(\d+(?>\.\d+)*)\w+?(\d+)
Will match this:
AppName5.2.6dbVer44Oracle.Group
       \__________/   <-- match
       \___/     \/   <-- captures
It captures the two values you're interested in as groups 1 and 2.
Use it like this:
var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
var first = match.Groups[1].Value;
var second = match.Groups[2].Value;
// ...
}
Pattern explanation:
( # Start of group 1
\d+ # a series of digits
(?> # start of atomic group
\.\d+ # dot followed by digits
)* # .. 0 to n times
)
\w+? # some word characters (as few as possible)
(\d+) # a series of digits captured in group 2
Try this:
\w*?([\d|\.]+)\w*?([\d{1,4}]+).*
You could start from the following:
^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$
I assumed that the fixed words are just made of letters and that you want to match the entire string. If you prefer, you could substitute the parts not in parentheses with the actual fixed words or change the character sets as desired. I recommend using a tool like https://regex101.com to fine-tune the expression.
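A minimal sketch of that anchored approach in C#, written as a top-level program. It assumes the fixed words contain only ASCII letters. ExtractVersions is a hypothetical helper name, and the last component of the first number is written as \d+ here so that multi-digit endings such as 5.3.20 are accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns both numbers, or null when the input
// does not fit the assumed "letters, dotted number, letters, number" shape.
static (string First, string Second)? ExtractVersions(string input)
{
    // Assumption: both fixed words consist of ASCII letters only.
    var match = Regex.Match(input, @"^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$");
    if (!match.Success)
        return null;
    return (match.Groups[1].Value, match.Groups[2].Value);
}

var result = ExtractVersions("AppName5.2.6dbVer44Oracle.Group");
Console.WriteLine(result is null
    ? "no match"
    : result.Value.First + " / " + result.Value.Second);
```

Because the pattern requires at least one dot in the first number, an input such as AppName5dbVer44 would be reported as "no match".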
Keep it basic: specify a capture group ( ) that looks for a digit \d, followed by zero or more * characters from the set [\d.] (the set is a digit or a literal period):
var data = "AppName5.2.6dbVer44Oracle.Group";
var pattern = @"(\d[\d.]*)";
// Outputs:
// 5.2.6
// 44
Console.WriteLine(string.Join(Environment.NewLine,
    Regex.Matches(data, pattern)
         .OfType<Match>()
         .Select(mt => mt.Groups[1].Value)));
Each match is one number from the input. If the count of numbers changes, the pattern does not fail; it simply reports however many numbers it finds.
Simply look for the numbers, since you only care for the numbers and don't want to check the syntax of the whole input string.
MatchCollection matches = Regex.Matches(input, @"\d+(\.\d+)*");
if (matches.Count >= 2) {
string number1 = matches[0].Value;
string number2 = matches[1].Value;
} else {
// Less than two numbers found
}
The expression \d+(\.\d+)* means:
\d+ one or more digits.
( )* repeat zero, one or more times.
\.\d+ one decimal point (escaped with \) followed by one or more digits.
and
\d one digit.
( ) grouping.
+ repeat the expression to the left one or more times.
* repeat the expression to the left zero, one or more times.
\ escapes characters that have a special meaning in regex.
. any character (without escaping).
\. period character (".").
I want to match an 8-digit number. Currently I have the following regex, but it fails in some cases.
(\d+)\1{6}
It matches only when a number is different at the end such as 44444445 or 54444444. However, I am looking to match cases where at least 7 digits are the same regardless of their position.
It is failing in cases like
44454444
44544444
44444544
What modification is needed here?
It's probably a bad idea to use this in a performance-sensitive location, but you can use a capture reference to achieve this.
The Regex you need is as follows:
(\d)(?:.*?\1){6}
Breaking it down:
(\d) Capture group of any single digit
.*? means match any character, zero or more times, lazily
\1 means match the first capture group
We enclose that in a non-capturing group (?: )
And add a quantifier {6} to match six times
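As a quick check, the pattern can be run against the sample numbers from the question. This is only a sketch written as a C# top-level program; the variable names and the extra non-matching sample 12345678 are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the breakdown above: one captured digit,
// then six later occurrences of that same digit.
var sameDigit = new Regex(@"(\d)(?:.*?\1){6}");

// Samples from the question, plus one input that should not match.
string[] samples =
{
    "44444445", "54444444", "44454444", "44544444", "44444544", "12345678"
};

foreach (var sample in samples)
{
    Console.WriteLine(sample + ": " + sameDigit.IsMatch(sample));
}
```

Note that the pattern does not check the overall length, and .*? allows any characters between the repeated digits, so validate the 8-digit shape separately if that matters.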
You can sort the digits before matching
string input = "44444445 54444444 44454444 44544444 44444544";
string[] numbers = input.Split(' ');
foreach (var number in numbers)
{
    // Sort the digits so that equal digits end up next to each other.
    var sorted = String.Concat(number.OrderBy(c => c));
    if (Regex.IsMatch(sorted, @"(\d+)\1{6}"))
    {
        // do something
    }
}
Still not a good idea to use regex for this though
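For comparison, here is a sketch of a regex-free check that counts how often each digit occurs. HasSevenEqualDigits is a hypothetical helper name, and restricting the input to exactly 8 ASCII digits is an assumption taken from the question.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when the input is exactly 8 ASCII digits
// and some digit occurs at least 7 times, in any position.
static bool HasSevenEqualDigits(string number)
{
    if (number.Length != 8 || !number.All(c => c >= '0' && c <= '9'))
        return false;
    return number.GroupBy(c => c).Any(g => g.Count() >= 7);
}

foreach (var number in "44444445 54444444 44454444 44544444 44444544".Split(' '))
{
    Console.WriteLine(number + ": " + HasSevenEqualDigits(number));
}
```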
The pattern that you tried, (\d+)\1{6}, matches a captured digit sequence followed by six repetitions of it, so for a single digit it needs seven identical digits in a row. If you want the same digits to be spread over the match, you have to allow optional digits in between.
Note that in .NET \d matches more digits than 0-9 only.
If you want to match only digits 0-9 using C# without matching other characters in between the digits:
([0-9])(?:[0-9]*?\1){6}
The pattern matches:
([0-9]) Capture group 1
(?: Non capture group
[0-9]*?\1 Match optional digits 0-9 and a backreference to group 1
){6} Close non capture group and repeat 6 times
See a .NET Regex demo
If you want to match only 8 digits, you can use a positive lookahead (?= to assert 8 digits and word boundaries \b
\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b
See another .NET Regex demo
I am trying to parse out some dates in a text field that could be in the following formats (note the text field has a bunch of other junk surrounding the dates):
//with dashes
10-10-16
1-5-16
10-1-16
1-10-16
//with periods
10.10.16
1.5.16
10.1.16
1.10.16
//with forward slashes
10/10/16
1/5/16
10/1/16
1/10/16
What I need is one pattern for all digit format scenarios. Here is what I tried:
//x.xx.xx
Regex reg1 = new Regex(@"\(?\d{1}\)?[-/.]? *\d{2}[-/.]? *[-/.]?\d{2}");
//xx.xx.xx
Regex reg2 = new Regex(@"\(?\d{2}\)?[-/.]? *\d{2}[-/.]? *[-/.]?\d{2}");
//x.x.xx
Regex reg3 = new Regex(@"\(?\d{1}\)?[-/.]? *\d{1}[-/.]? *[-/.]?\d{2}");
//xx.x.xx
Regex reg4 = new Regex(@"\(?\d{2}\)?[-/.]? *\d{1}[-/.]? *[-/.]?\d{2}");
I'm new to regular expressions, so I am looking for a single expression that will handle all these scenarios (ie., digit formats with single number and double digit numbers for -/. in between).
Is there one expression that could handle this?
Thanks,
I can suggest
Regex rx = new Regex(@"\(?(?<!\d)\d{1,2}\)?[-/.]?\d{1,2}[-/.]?\d{2}(?!\d)");
If your date separators are used consistently, use the backreference with a capturing group:
Regex rx = new Regex(@"\(?(?<!\d)\d{1,2}\)?([-/.])\d{1,2}\1\d{2}(?!\d)");
See the regex demo 1 and demo 2.
Details:
\(? - an optional (
(?<!\d) - there must be no digit before the current location
\d{1,2} - 1 or 2 digits
\)? - an optional )
[-/.]? - an optional -, /, or .
\d{1,2}[-/.]? - again 1 or 2 digits followed by an optional -, /, or .
\d{2} - 2 digits
(?!\d) - there must be no digit after the current location.
The version with a capturing group/backreference contains ([-/.]) - a capturing group with ID=1 that matches the first separator, and \1 is the backreference that matches the same text captured into Group 1 (making the second separator to be identical to the first one).
You can also try this: \d{1,2}([-./])\d{1,2}\1\d{2}
Regex regex = new Regex(@"\d{1,2}([-./])\d{1,2}\1\d{2}");
\d{1,2} between one and two digits
([-./]) any of ., - and /
\1 repeat this character another time (to prevent matching 1.1/01 or 1-1.01)
\d{2} matches two digits
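A sketch of how the backreference pattern could be used to pull dates out of surrounding text. FindDates is a hypothetical helper name, the sample sentence is made up, and the lookarounds (?<!\d) and (?!\d) are an added assumption so that a date is not cut out of a longer run of digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every date-like token whose
// two separators are the same character.
static string[] FindDates(string text)
{
    // \1 forces the second separator to equal the first one.
    var rx = new Regex(@"(?<!\d)\d{1,2}([-./])\d{1,2}\1\d{2}(?!\d)");
    return rx.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}

var dates = FindDates("due 10-10-16, sent 1.5.16, mixed 1/10-16, paid 10/1/16");
Console.WriteLine(string.Join(", ", dates));
```

The mixed-separator token 1/10-16 is skipped because its second separator differs from the first.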
try this
\d{1,2}(-|\.|\/)\d{1,2}(-|\.|\/)\d{1,2}
I have version numbers as given below.
020. 000. 1234. 43567 (please note the whitespace after the dot(.))
020,000,1234,43567
20.0.1234.43567
20,0,1234,43567
I want a regular expression for updating the numbers after last two dots(.) to for example 1298 and 45678 (any number)
020. 000. 1298. 43568 (please note the whitespace after the dot(.))
020,000,1298,45678
20.0.1298.45678
20,0,1298,45678
Thanks,
resultString = Regex.Replace(subjectString,
@"(\d+) # any number
([.,]\s*) # dot or comma, optional whitespace
(\d+) # etc.
([.,]\s*)
\d+
([.,]\s*)
\d+",
"$1$2$3${4}1298${5}43568", RegexOptions.IgnorePatternWhitespace);
Note the ${4} instead of $4 because otherwise the following 1 would be interpreted as belonging to the group number ($41).
Also note the difference between (\d+) and (\d)+. While both match 1234, the first one will capture 1234 into the group created by the parentheses. The second one will capture only 4 because the previous captures will be overwritten by the next.
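The difference between the two forms can be seen directly in a small sketch like the following (variable names are illustrative). In .NET the overwritten iterations are still available through the group's Captures collection.

```csharp
using System;
using System.Text.RegularExpressions;

// Quantifier inside the group: the group holds the whole number.
var whole = Regex.Match("1234", @"(\d+)");
Console.WriteLine(whole.Groups[1].Value);             // 1234

// Quantifier outside the group: Value is the last iteration only,
// but every iteration is recorded in Captures.
var repeated = Regex.Match("1234", @"(\d)+");
Console.WriteLine(repeated.Groups[1].Value);          // 4
Console.WriteLine(repeated.Groups[1].Captures.Count); // 4
```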
To replace version with 1298 and 43568
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");
var result = regex.Replace(source, "1298${seperator}43568");
This is because
(?<=) lookbehind: doesn't include the group in the match but requires it to exist before the match
^ match start of string
(?:\d+[.,]\s*) non-capturing group, match at least one digit followed by a . or , followed by 0 or more whitespace characters
{2} previous group should occur twice
\d+ first part of the match, 1 or more digits
(?<seperator>[.,]\s*) capture the separator (a . or , followed by optional whitespace) into a named group called seperator
\d+ match one or more digits
$ match end of string
In the replacement string you just provide the replacement version and use ${seperator} to insert the original separator.
If you are not bothered about preserving the separator you can just do
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+[.,]\s*\d+$");
regex.Replace(source, "1298.43568");
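Wrapped into a reusable function, the separator-preserving version might look like this sketch. ReplaceLastTwo and its parameter names are hypothetical, and a MatchEvaluator is used instead of a replacement string so that the new digits can never be read as part of a group reference.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: swaps the last two numbers of a four-part version
// and keeps whatever separator stood between them.
static string ReplaceLastTwo(string source, string third, string fourth)
{
    var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");
    return regex.Replace(source, m => third + m.Groups["seperator"].Value + fourth);
}

Console.WriteLine(ReplaceLastTwo("020. 000. 1234. 43567", "1298", "45678"));
Console.WriteLine(ReplaceLastTwo("20,0,1234,43567", "1298", "45678"));
```

Inputs that do not have exactly two leading components before the part to replace are returned unchanged.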
I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document)
How can I capture the full section (1.2.3.4.5) into a group?
I use C# but any regex is fine, I can translate it.
UPDATED
Use this Regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
Explanation:
\d a digit
[\.\d]* any character of: '.', digits (0 or more times, matching as many as possible)
(?<!\.) zero-width negative lookbehind assertion: the match must not end with a '.'
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*").Cast<Match>()
.Select(m => m.Value)
.ToArray();
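Since the question asks for the full section in a group, here is a sketch that uses named groups. The group names section and part are assumptions chosen for illustration; .NET records one capture per repetition of part, so the individual components are available as well.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc";

// "section" holds the whole number; "part" is captured once
// for every dot-separated component inside it.
var sectionPattern = new Regex(@"(?<section>(?<part>\d+)(?:\.(?<part>\d+))*)");

foreach (Match m in sectionPattern.Matches(text))
{
    string section = m.Groups["section"].Value;
    var parts = m.Groups["part"].Captures.Cast<Capture>().Select(c => c.Value);
    Console.WriteLine(section + " -> [" + string.Join(", ", parts) + "]");
}
```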
Well, if you know you can't go beyond 5, then you can do
@"1+((.2+)((.3+)((.4+)(.5+)?)?)?)?"
and you can expand on that pattern for every symbol, up to a finite number of symbols.
The + means any number of occurrences of the symbol, but at least 1. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the pattern to be captured,
for example: (?:abc)
I omitted them to make the regex more readable.
The ? after a closing parenthesis means 1 or 0 of the preceding group.
Now if you don't know how far your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n" then my intuition tells me that you can't do that with regex.
I'm going to put all decimal numbers in a text inside a span tag (<span>) but the numbers are not using period as decimal separator, they use slash (/)
The sample text is something like this:
There are 12/5 percent of students who...
And I want to convert it to
There are <span>12/5</span> percent of students who...
Actually I need the regular expression which matches.
Try this:
resultString = Regex.Replace(subjectString, @"(?<!/)\d+(?:/\d+)?(?!/)", "<span>$0</span>");
It will work with integers and decimals. Numbers like 1/ or /1 are not allowed, neither is something like 1/2/3.
Explanation:
(?<!/) # Assert that the previous character isn't a /
\d+ # Match one or more digits
(?: # Try to match...
/\d+ # a /, followed by one or more digits
)? # ...optionally.
(?!/) # Assert that the following character isn't a /
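Put together as a small runnable sketch, using the sample sentence from the question. WrapNumbers is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps each integer or slash-separated decimal in a <span>.
static string WrapNumbers(string text)
{
    // $0 in the replacement string stands for the entire match.
    return Regex.Replace(text, @"(?<!/)\d+(?:/\d+)?(?!/)", "<span>$0</span>");
}

Console.WriteLine(WrapNumbers("There are 12/5 percent of students who..."));
// There are <span>12/5</span> percent of students who...
```

One caveat worth testing against your data: a multi-digit number directly followed by a slash, such as 12/, can still match partially because the engine backtracks to a shorter run of digits. Tightening the lookarounds to (?<![/\d]) and (?![/\d]) would be one possible way to rule that out.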
The following regex will work for you, assuming that the number always contains a /:
[\d]+/[\d]+
The following code will do the trick:
string text = "12/5";
string pattern = @"\b[\d]+/[\d]+\b";
MatchEvaluator m = match => "<span>" + match.Groups[0].Value + "</span>";
string result = Regex.Replace(text, pattern, m);
RegEx:
\D+((\d+)\/(\d+))\D+
Capture groups:
\1 12/5
\2 12
\3 5