Regular Expression: Section names with unknown length?

I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document).
How can I capture the full section (1.2.3.4.5) into a group?
I use C#, but any regex is fine; I can translate it.

UPDATED
Use this Regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
explain:
\d digits (0-9)
[\.\d]* any character of: '.', digits (0-9)
(0 or more times, matching the most amount possible)
(?<!\.) zero-width negative lookbehind assertion: the match must not end with a '.'
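A minimal sketch of the lookbehind pattern in use, assuming section numbers can appear in ordinary prose where one may be followed by a sentence period. SectionFinder and Find are hypothetical names, not part of any library.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class SectionFinder
{
    // \d       : a section starts with a digit
    // [\.\d]*  : followed by any run of dots and digits (greedy)
    // (?<!\.)  : the match may not end on a dot, so the engine backtracks
    //            off a trailing sentence period such as the one in "1.2.3."
    static readonly Regex Section = new Regex(@"\d[\.\d]*(?<!\.)");

    public static string[] Find(string text) =>
        Section.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For example, Find("See 1.2.3. and 4.5") should return 1.2.3 and 4.5: the lookbehind makes the engine give back the trailing period instead of including it.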

using System.Linq;
using System.Text.RegularExpressions;
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*").Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Well, if you know you can't go beyond 5 levels, then you can do
@"1+((\.2+)((\.3+)((\.4+)(\.5+)?)?)?)?"
where 1 to 5 stand for whatever symbol each level contains (for example \d), and the dots are escaped so they only match a literal '.'.
You can expand on that pattern for every symbol, up to a finite number of symbols.
The + means one or more occurrences of the preceding symbol. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the group to be captured,
for example: (?:abc)
I omitted them to make the regex more readable.
The ? after a closing parenthesis means 0 or 1 of the preceding group.
Now if you don't know how far your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n", then a fixed-depth pattern like this one cannot cover it. Repeat a group with * instead, as in \d+(\.\d+)*, which matches any number of levels.

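To make the fixed-depth idea concrete, the sketch below substitutes \d+ for each placeholder symbol, escapes the dots, and uses non-capturing groups. The five-level limit, the anchors, and the names SectionDepth, IsSectionUpToFive and IsSectionAnyDepth are assumptions for illustration; the second pattern is the repeated-group form \d+(?:\.\d+)*, which accepts any depth.

```csharp
using System.Text.RegularExpressions;

static class SectionDepth
{
    // At most five levels: each nested optional group adds one ".digits" level.
    static readonly Regex UpToFive = new Regex(
        @"^\d+(?:\.\d+(?:\.\d+(?:\.\d+(?:\.\d+)?)?)?)?$");

    // Any depth: one number followed by zero or more ".digits" groups.
    static readonly Regex AnyDepth = new Regex(@"^\d+(?:\.\d+)*$");

    public static bool IsSectionUpToFive(string s) => UpToFive.IsMatch(s);

    public static bool IsSectionAnyDepth(string s) => AnyDepth.IsMatch(s);
}
```

With these definitions, 1.2.3.4.5.6 should fail the fixed-depth check but pass the repeated-group one, while a dangling 1. fails both.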
Related

Regex to match 7 same digits in a number regardless of position

I want to match an 8 digit number. Currently, I have the following regex, but it is failing in some cases.
(\d+)\1{6}
It matches only when a number is different at the end such as 44444445 or 54444444. However, I am looking to match cases where at least 7 digits are the same regardless of their position.
It is failing in cases like
44454444
44544444
44444544
What modification is needed here?

It's probably a bad idea to use this in a performance-sensitive location, but you can use a backreference to achieve this.
The Regex you need is as follows:
(\d)(?:.*?\1){6}
Breaking it down:
(\d) Capture group of any single digit
.*? means match any character, zero or more times, lazily
\1 means match the first capture group
We enclose that in a non-capturing group (?: ... )
And add a quantifier {6} to match it six times, so the digit occurs at least 7 times in total
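A small sketch of the backreference pattern applied to the examples from the question; SameDigits and HasSevenSameDigits are hypothetical names. Because the pattern is not anchored, it reports whether some digit occurs at least 7 times anywhere in the string.

```csharp
using System.Text.RegularExpressions;

static class SameDigits
{
    // (\d)        capture one digit
    // (?:.*?\1)   lazily skip any characters, then match that same digit again
    // {6}         six more times, i.e. at least 7 occurrences in total
    static readonly Regex SevenOfOneDigit = new Regex(@"(\d)(?:.*?\1){6}");

    public static bool HasSevenSameDigits(string number) =>
        SevenOfOneDigit.IsMatch(number);
}
```

All five numbers from the question should match, while something like 44445555 (four of each digit) should not.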

You can sort the digits before matching:
using System;
using System.Linq;
using System.Text.RegularExpressions;
string input = "44444445 54444444 44454444 44544444 44444544";
string[] numbers = input.Split(' ');
foreach (var number in numbers)
{
    // Sorting puts identical digits next to each other, so 7 equal digits become one run.
    string sorted = String.Concat(number.OrderBy(c => c));
    if (Regex.IsMatch(sorted, @"(\d+)\1{6}"))
    {
        // do something
    }
}
Still, regex is not a great fit for this.

The pattern that you tried, (\d+)\1{6}, matches the captured digits followed by 6 more copies of them, i.e. 7 of the same digit in a row. If you want to stretch the match over multiple same digits, you have to match optional digits in between.
Note that in .NET, \d matches any Unicode decimal digit, not only 0-9.
If you want to match only digits 0-9 using C# without matching other characters in between the digits:
([0-9])(?:[0-9]*?\1){6}
The pattern matches:
([0-9]) Capture group 1
(?: Non capture group
[0-9]*?\1 Match optional digits 0-9 and a backreference to group 1
){6} Close non capture group and repeat 6 times
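The sketch below checks both points from this answer under .NET's default options: [0-9]*? keeps the match from skipping over non-digit characters, and \d matches a non-ASCII digit such as U+0663 (ARABIC-INDIC DIGIT THREE) unless RegexOptions.ECMAScript is set. AsciiDigits and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class AsciiDigits
{
    // Same idea as (\d)(?:.*?\1){6}, but restricted to ASCII digits and
    // unable to skip over non-digit characters between the repeats.
    static readonly Regex SevenSame = new Regex(@"([0-9])(?:[0-9]*?\1){6}");

    public static bool HasSevenSameAsciiDigits(string s) => SevenSame.IsMatch(s);

    // By default, \d in .NET matches any Unicode decimal digit ...
    public static bool IsUnicodeDigit(string s) => Regex.IsMatch(s, @"^\d$");

    // ... while ECMAScript-compatible behavior limits \d to [0-9].
    public static bool IsAsciiDigitEcma(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
}
```

For instance, 4445-4444 contains seven 4s, but the [0-9] version should reject it because the hyphen cannot be crossed.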
If you want to match only 8 digits, you can use a positive lookahead (?= to assert 8 digits and word boundaries \b
\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b
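A sketch of the 8-digit variant, collecting every qualifying number from a space-separated string; the sample input and the names EightDigitNumbers and FindAll are assumptions. A 9-digit run is rejected because \d{8}\b in the lookahead needs a word boundary right after the eighth digit.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class EightDigitNumbers
{
    // \b(?=\d{8}\b)            at a word boundary, require exactly 8 digits ahead
    // [0-9]*                   skip any leading digits
    // ([0-9])(?:[0-9]*?\1){6}  a digit that occurs at least 7 times
    // \d*\b                    consume the rest of the number
    static readonly Regex Pattern =
        new Regex(@"\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b");

    public static string[] FindAll(string text) =>
        Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```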

Regex Match all characters until reach character, but also include last match

I'm trying to find all color hex codes using Regex.
I have this string value, for example, #FF0000FF#0038FFFF#51FF00FF#F400FFFF, and I use this:
#.+?(?=#)
pattern to match all characters until it reaches #, but it misses the last value, which should be the last match.
I'm kind of new to this Regex stuff. How could I also get the last match?

Your regex does not match the last value because your regex (with the positive lookahead (?=#)) requires a # to appear after an already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
A more precise pattern is #[A-Fa-f0-9]{8}: it matches a # followed by exactly 8 hex characters (digits 0-9 or letters a-f and A-F).
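A quick comparison on the string from the question, assuming each value is exactly eight hex digits: the lookahead pattern should find only three values, while #[A-Fa-f0-9]{8} finds all four. HexColors and Extract are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class HexColors
{
    // A '#' followed by exactly eight hex digits; no '#' is required afterwards,
    // so the last value in the string is found too.
    static readonly Regex Color = new Regex(@"#[A-Fa-f0-9]{8}");

    public static string[] Extract(string s) =>
        Color.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();
}
```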

Don't rely on whatever characters follow the #; match only hex characters, and it will work every time.
(?i)#[a-f0-9]+

How to extract numbers from a string using regular expressions?

This little challenge just screams regular expressions to me, but so far I am stumped.
I have an arbitrary string that contains two numbers embedded in it. I need to extract those two numbers, which will be n and m digits long (n,m are unknown in advance). The format of the string is always
FixedWord[n digits]anotherfixedword[m digits]alotmorestuffontheend
The first number is of the format 1.2.3.4 (the number of digits varying), e.g. 5.3.20 or 5.3.10.1 or 5.4,
and the second is a simpler 'm' digits (e.g. 25 or 2),
e.g. "AppName5.2.6dbVer44Oracle.Group"
It shouts 'pattern matching' and hence "extraction using regexes". Can anyone guide me further?
TIA

The following pattern:
(\d+(?>\.\d+)*)\w+?(\d+)
Will match this:
AppName5.2.6dbVer44Oracle.Group
       \__________/ <-- match
       \___/     \/ <-- captures
It captures the two values you're interested in as groups 1 and 2.
Use it like this:
var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
var first = match.Groups[1].Value;
var second = match.Groups[2].Value;
// ...
}
Pattern explanation:
( # Start of group 1
\d+ # a series of digits
(?> # start of atomic group
\.\d+ # dot followed by digits
)* # .. 0 to n times
)
\w+? # some word characters (as few as possible)
(\d+) # a series of digits captured in group 2
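The commented breakdown above is close to valid .NET syntax on its own: with RegexOptions.IgnorePatternWhitespace, whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A sketch under that assumption (VersionParts and Parse are hypothetical names):

```csharp
using System.Text.RegularExpressions;

static class VersionParts
{
    // The commented layout from the explanation, compiled with
    // IgnorePatternWhitespace so the spaces and # comments are skipped.
    static readonly Regex Pattern = new Regex(@"
        (            # Start of group 1
          \d+        # a series of digits
          (?>        # start of atomic group
            \.\d+    # dot followed by digits
          )*         # .. 0 to n times
        )
        \w+?         # some word characters (as few as possible)
        (\d+)        # a series of digits captured in group 2
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns both captured numbers, or null when the input has no match.
    public static (string First, string Second)? Parse(string input)
    {
        var m = Pattern.Match(input);
        if (!m.Success) return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

Keeping the comments inside the pattern is a design choice: the explanation stays next to the regex it describes.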

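Try this:
\w*?([\d.]+)\w*?(\d+).*
Inside a character class, | and {1,4} are literal characters, not alternation or a quantifier, so use [\d.] for "digits or dots" and \d+ for the second number.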

You could start from the following:
^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$
I assumed that the fixed words are just made of letters and that you want to match the entire string. The last part of the first number uses \d+ so that multi-digit parts such as 5.3.20 are accepted. If you prefer, you could substitute the parts not in parentheses with the actual fixed words or change the character sets as desired. I recommend using a tool like https://regex101.com to fine-tune the expression.
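A sketch of the anchored pattern checked against the sample from the question and a variant with a multi-digit last part (5.3.20). AnchoredParse and Parse are hypothetical names, and the pattern assumes the first number contains at least one dot.

```csharp
using System.Text.RegularExpressions;

static class AnchoredParse
{
    // Letters, then N.N...N (at least one dot), then letters, then digits, then anything.
    static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$");

    // Returns { first, second }, or null when the whole string does not match.
    public static string[] Parse(string input)
    {
        var m = Pattern.Match(input);
        return m.Success ? new[] { m.Groups[1].Value, m.Groups[2].Value } : null;
    }
}
```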

Keep it basic by specifying a match ( ) that looks for a digit \d, then zero or more * digits or periods in a set [\d.] (the set is \d -or- a literal period):
using System;
using System.Linq;
using System.Text.RegularExpressions;
var data = "AppName5.2.6dbVer44Oracle.Group";
var pattern = @"(\d[\d.]*)";
// Outputs:
// 5.2.6
// 44
Console.WriteLine(string.Join(Environment.NewLine,
    Regex.Matches(data, pattern)
        .OfType<Match>()
        .Select(mt => mt.Groups[1].Value)));
Each match is one number in the string, so if the count of numbers changes, the pattern will not fail and will dutifully report however many (1 to N) it finds.

Simply look for the numbers, since you only care for the numbers and don't want to check the syntax of the whole input string.
MatchCollection matches = Regex.Matches(input, @"\d+(\.\d+)*");
if (matches.Count >= 2) {
string number1 = matches[0].Value;
string number2 = matches[1].Value;
} else {
// Less than two numbers found
}
The expression \d+(\.\d+)* means:
\d+ one or more digits.
( )* repeat zero, one or more times.
\.\d+ one decimal point (escaped with \) followed by one or more digits.
More generally:
\d one digit.
( ) grouping.
+ repeat the expression to the left one or more times.
* repeat the expression to the left zero, one or more times.
\ escapes characters that have a special meaning in regex.
. any character except a newline (when not escaped).
\. period character (".").

How to make this regex allow spaces c#

I have a phone number field with the following regex:
[RegularExpression(@"^[0-9]{10,10}$")]
This checks that the input is exactly 10 numeric characters. How should I change this regex to allow spaces, so that all of the following examples validate?
1234567890
12 34567890
123 456 7890
cheers!

This works:
^(?:\s*\d\s*){10,10}$
Explanation:
^ - start of input
(?: - start noncapturing group
\s* - any whitespace (0 or more)
\d - a digit
\s* - any whitespace (0 or more)
) - end noncapturing group
{10,10} - repeat exactly 10 times
$ - end of input
This way of constructing this regex is also fairly extensible in case you will have to ignore any other characters.
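A minimal check of the pattern against the examples from the question, plus two inputs that should fail; PhoneCheck and IsValid are hypothetical names. Note that \d in .NET also accepts non-ASCII digits, so [0-9] may be preferable for a phone field.

```csharp
using System.Text.RegularExpressions;

static class PhoneCheck
{
    // Exactly ten digits, each optionally surrounded by whitespace.
    static readonly Regex TenDigits = new Regex(@"^(?:\s*\d\s*){10}$");

    public static bool IsValid(string input) => TenDigits.IsMatch(input);
}
```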

Use this:
^([\s]*\d){10}\s*$
I cheated :) I just modified this regex here:
Regular expression to count number of commas in a string
I tested. It works fine for me.

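Use this simple regex:
var matches = Regex.Matches(inputString, @"([\s\d]{10})");
EDIT
That first version counts spaces toward the 10 characters and is not anchored, so it also accepts input with fewer than 10 digits. Use this instead:
var matches = Regex.Matches(inputString, @"^((?:\s*\d){10})$");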
explain:
^ the beginning of the string
(?: ){10} group, but do not capture (10 times):
\s* whitespace (0 or more times, matching the most amount possible)
\d digits (0-9)
$ before an optional \n, and the end of the string

Depending on your problem, you might consider using a MatchEvaluator delegate, as described in http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx
That would make short work of the issue of counting digits and/or spaces.
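One possible reading of this suggestion, sketched as an assumption rather than the answerer's code: strip whitespace with Regex.Replace and a MatchEvaluator lambda, then require exactly ten ASCII digits. For a constant replacement an evaluator is overkill, but it is the place where per-match logic (such as counting) would go. PhoneEvaluator and IsTenDigits are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class PhoneEvaluator
{
    public static bool IsTenDigits(string input)
    {
        // The lambda converts to a MatchEvaluator; each whitespace run
        // is replaced with an empty string.
        string digitsOnly = Regex.Replace(input, @"\s+", m => string.Empty);
        return Regex.IsMatch(digitsOnly, @"^[0-9]{10}$");
    }
}
```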

Something like this, I think: ^\d{2}\s?\d\s?\d{3}\s?\d{4}$
The variants in the question are: 10 digits; or 2 digits, a space, and 8 digits; or 3 digits, a space, 3 digits, a space, and 4 digits.
But if you want only these 3 variants, use something like this (the outer group makes ^ and $ apply to every alternative):
^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$
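The sketch below shows why the outer group matters: without it, ^ binds only to the first alternative and $ only to the last, so an input that merely contains a valid-looking piece can still pass. The class name PhoneVariants and the sample input are assumptions.

```csharp
using System.Text.RegularExpressions;

static class PhoneVariants
{
    // Without an enclosing group, ^ applies only to \d{10} and $ only to the third alternative.
    const string Ungrouped = @"^(?:\d{10})|(?:\d{2}\s\d{8})|(?:\d{3}\s\d{3}\s\d{4})$";

    // Grouping all alternatives makes ^ and $ apply to each of them.
    const string Grouped = @"^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$";

    public static bool MatchesUngrouped(string s) => Regex.IsMatch(s, Ungrouped);

    public static bool MatchesGrouped(string s) => Regex.IsMatch(s, Grouped);
}
```

Here 12 345678901234 should pass the ungrouped version (the middle alternative finds 12 34567890 inside it) but fail the grouped one.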

Minimum Length Regular Expression

I'm trying to write a regular expression that will validate that user input contains more than X non-whitespace characters. I'm basically trying to filter out beginning and ending whitespace while still ensuring that the input is longer than X characters; the characters can be anything, just not whitespace (space, tab, return, newline).
This is the regex I've been using, but it doesn't work:
\s.{10}.*\s
I'm using C# 4.0 (Asp.net Regular Expression Validator) btw if that matters.

It may be easier to not use regex at all:
input.Count(c => !char.IsWhiteSpace(c)) > 10
If whitespace shouldn't count in the middle, then this will work:
(\s*(\S)\s*){10,}
If you don't care about whitespace in between non-whitespace characters, the other answers have that scenario covered.
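A sketch comparing the two approaches in this answer. Note that the LINQ check uses > 10 while the regex uses {10,} (10 or more); the hypothetical helpers below both use "at least n" so they agree, and the regex is rearranged as ^\s*(?:\S\s*){n,}$ so that each unit owns only the whitespace after it.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class MinLength
{
    // LINQ: count the characters that are not whitespace.
    public static bool HasAtLeast(string input, int n) =>
        input.Count(c => !char.IsWhiteSpace(c)) >= n;

    // Regex: optional leading whitespace, then n or more units of
    // "one non-whitespace character plus any whitespace after it".
    public static bool HasAtLeastRegex(string input, int n) =>
        Regex.IsMatch(input, @"^\s*(?:\S\s*){" + n + ",}$");
}
```

For "  abc def ghij  " (10 non-whitespace characters), both helpers should accept n = 10 and reject n = 11.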

This regular expression looks for eight or more characters between the first and last non-whitespace characters, ignoring leading and trailing whitespace:
\s*\S.{8,}\S\s*

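If you're checking something like a phone number that must contain at least 8 digits, note that {n,} already means "n or more", so the number in the quantifier is the minimum itself:
(\s*(\S)\s*){8,}
Use one more than X only when you need strictly more than X characters, as in the question.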

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
