Why does regex return one digit?

I want to get the trailing digits from strings.
For example: "Text11" gives 11, "Te1xt32" gives 32, and so on.
I wrote this regex:
var regex = new Regex(@"^(.+)(?<Number>(\d+))(\z)");
And I use it like this:
regex.Match(input).Groups["Number"].Value;
That returns 1 for "Text11" and 2 for "Te1xt32" instead of 11 and 32.
So the question is: why does \d+ match only the last digit?

Because the leading .+ is greedy by default. It first consumes the whole string, then backtracks one character at a time until the rest of the pattern can match. Since \d+ needs only one digit, backtracking stops as soon as the last digit is given back, so the Number group holds just that digit. Add the non-greedy modifier ? after the + so that the engine takes the shortest possible match for that part:
var regex = new Regex(@"^(.+?)(?<Number>(\d+))(\z)");
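To see the greedy and the non-greedy variants side by side, here is a minimal sketch. The class and method names are made up for illustration, and the redundant capturing groups are dropped:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class TrailingDigits
{
    // Greedy prefix: .+ gives back as little as possible, so Number ends up with one digit.
    public static string Greedy(string input) =>
        Regex.Match(input, @"^(.+)(?<Number>\d+)\z").Groups["Number"].Value;

    // Lazy prefix: .+? takes as little as possible, so \d+ can take the whole trailing run.
    public static string Lazy(string input) =>
        Regex.Match(input, @"^(.+?)(?<Number>\d+)\z").Groups["Number"].Value;
}
```

TrailingDigits.Greedy("Te1xt32") should give "2", while TrailingDigits.Lazy("Te1xt32") should give "32".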

As an alternative, you can use the same regex in RightToLeft mode:
var input = "Te1xt32";
// I removed some unnecessary capturing groups from your regex
var regex = new Regex(@"^(.+)(?<Number>\d+)\z", RegexOptions.RightToLeft);
// Start scanning at the end of the string (in RightToLeft mode, Match(input) also starts there by default)
Match m = regex.Match(input, input.Length);
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups["Number"].Value);
Since what you want is at the end of the string and the part in front has no particular pattern, going from right to left avoids some backtracking, though the difference, if any, will be insignificant here.
RightToLeft mode, as the name suggests, performs the match from right to left, so the digits at the end of the string are greedily consumed by \d+ before the rest is consumed by .+.
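Wrapped into a small helper, the right-to-left match could look like the sketch below. The class name, method name, and tuple shape are assumptions made for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class RightToLeftDemo
{
    static readonly Regex Pattern =
        new Regex(@"^(.+)(?<Number>\d+)\z", RegexOptions.RightToLeft);

    // Returns the text before the trailing digits, and the trailing digits themselves.
    public static (string Prefix, string Number) Split(string input)
    {
        // With RegexOptions.RightToLeft, Match(input) starts scanning at the end of the string.
        Match m = Pattern.Match(input);
        return (m.Groups[1].Value, m.Groups["Number"].Value);
    }
}
```

RightToLeftDemo.Split("Te1xt32") should return ("Te1xt", "32") even though .+ is still greedy, because \d+ gets to consume first.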

You can simply do:
var regex = new Regex(@"(?<Number>\d+)\z");

Related

regular expression: The beginning and end of the string with a letter with a specified length

I use this pattern but it does not match. Regex reg = new Regex(@"^([A-Za-z][A-Za-z0-9\.]*(?:[A-Za-z])){6,30}@mydomain.com$");
I want the string to start with a letter and end with a letter, with any combination of letters, digits, and dots in between, and to be between 6 and 30 characters long.
For example: a.124b@mydomain.com or abc.1e@mydomain.com, and so on.

string pattern = @"^[a-z][a-z0-9.]{4,28}[a-z]@mydomain\.com$";
string input = @"a.124b@mydomain.com";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
    Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
Now the explanation:
^[a-z] start of the string, then one letter
[a-z0-9.]{4,28} letters, digits, and the dot character (no need to escape it inside square brackets), repeated between 4 and 28 times
[a-z] another single letter
(together these amount to 6 to 30 characters)
@mydomain\.com$ the rest of your mail address, then the end of the string.
Notice also RegexOptions.IgnoreCase: when you know you don't care about case, it makes the letter classes a bit more readable.
The error in your regex was putting the quantifier on the whole capture group, which means repeating the entire group 6 to 30 times.
I also recommend https://regex101.com/ for all your regex needs.
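To make the difference between the two quantifier placements concrete, here is a small sketch that runs both patterns. The class and member names are invented for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original posts.
public static class MailCheck
{
    // Quantifier on the whole group: the group itself must repeat 6 to 30 times,
    // and each repetition needs at least two characters.
    static readonly Regex GroupRepeated = new Regex(
        @"^([A-Za-z][A-Za-z0-9\.]*(?:[A-Za-z])){6,30}@mydomain.com$");

    // Quantifier on the middle class only: 1 + (4 to 28) + 1 gives 6 to 30 characters.
    static readonly Regex LengthLimited = new Regex(
        @"^[a-z][a-z0-9.]{4,28}[a-z]@mydomain\.com$", RegexOptions.IgnoreCase);

    public static bool OriginalMatches(string s) => GroupRepeated.IsMatch(s);
    public static bool FixedMatches(string s) => LengthLimited.IsMatch(s);
}
```

For "a.124b@mydomain.com", OriginalMatches should return false and FixedMatches should return true. A five-character local part such as "abcde@mydomain.com" should be rejected by FixedMatches.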

Here is one option:
^(?![0-9.]|.*[0-9.]@)[a-zA-Z0-9.]{6,30}@mydomain\.com$
^ - Start line anchor.
(?![0-9.]|.*[0-9.]@) - Negative lookahead that rejects a dot/digit at the start, or a dot/digit right before the "@".
[a-zA-Z0-9.]{6,30} - 6 to 30 characters from the specified class.
@mydomain\.com - Literally match "@mydomain.com". Notice the backslash before the dot to make it literal (outside a character class).
$ - End line anchor.
I was going to mention a case-insensitive alternative, but it looks like @FranzGleichmann got you covered =)
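In C#, the lookahead variant could be tried out with something like the following sketch. The class and method names are assumptions:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LookaheadMailCheck
{
    // The lookahead rejects a leading digit/dot and a digit/dot right before '@';
    // the character class then enforces the 6 to 30 length.
    static readonly Regex Pattern = new Regex(
        @"^(?![0-9.]|.*[0-9.]@)[a-zA-Z0-9.]{6,30}@mydomain\.com$");

    public static bool IsValid(string address) => Pattern.IsMatch(address);
}
```

LookaheadMailCheck.IsValid("a.124b@mydomain.com") should be true, while ".a124b@mydomain.com" and "a.1245@mydomain.com" should both be rejected.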

Regex: Find a string within different string variations

I need a regex that extracts the
81.03
part (it varies, but always has the structure XX.XX) from the following string variations:
Projects/75100/75120/75124/AR1/75124_AR1_HM2_81.03-testing-b405.tgz
Projects/75100/75130/75138/LM1/75138_LM1_HM2_81.03.tgz
I've come up with:
var regex = new Regex("(.*_)(.*?)-");
but this only matches the first example string, whereas
var regex = new Regex(@"(.*_)(.*?)(.*\.)");
only matches the second string.
The path to the file constantly changes, as does the "-testing..." suffix.
Any ideas to point me in the right direction?

You can use
var result = Regex.Match(text, @".*_(\d+\.\d+)")?.Groups[1].Value;
Or, if the string can have more dot+number parts:
var result = Regex.Match(text, @".*_(\d+(?:\.\d+)+)")?.Groups[1].Value;
In general, the regex extracts the dot-separated digit chunks that come right after the last _ followed by such a number.
Details
.* - any 0 or more chars other than a newline, as many as possible
_ - a _ char
\d+\.\d+ - (first pattern) one or more digits, a . and one or more digits
(\d+(?:\.\d+)+) - (second pattern) Group 1: one or more digits followed by one or more occurrences of a dot and one or more digits
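As a quick check against both sample paths, here is a minimal sketch using the second pattern. The class and method names are made up for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class VersionExtractor
{
    // Returns the dot-separated number after the last matching underscore,
    // or an empty string when nothing matches (Groups[1].Value is "" on failure).
    public static string Extract(string path) =>
        Regex.Match(path, @".*_(\d+(?:\.\d+)+)").Groups[1].Value;
}
```

Both "Projects/75100/75120/75124/AR1/75124_AR1_HM2_81.03-testing-b405.tgz" and "Projects/75100/75130/75138/LM1/75138_LM1_HM2_81.03.tgz" should yield "81.03".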

To match the value 81.03, another option is to match an underscore followed by digits with an optional decimal part, and then require that no forward slash follows up to the end of the string, which confines the match to the file name.
_(\d+(?:\.\d+)?)[^/\r\n]*$
Explanation
_ Match literally
(\d+(?:\.\d+)?) Capture group 1, match 1+ digits with an optional decimal part
[^/\r\n]* Match 0+ chars except / or a newline
$ End of string

Regex - how do i match the first part of an indexed path

For the lines
Tester[0]/Test[4]/testId
Tester[0]/Test[4]/testId
Test[1]/Test[4]/testId
Test[2]/Test[4]/testId
I want to match the first part of the path, including the first [, n and ], and the first /,
so for the lines above I would get
Tester[0]
Tester[0]
Test[1]
Test[2]
I have tried using
var rx = new Regex(@"^\[.*\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);
I get
res == "testId";
rather than
res == "Test[4]/testId"
which is what I'm hoping for,
so it's matching the first opening square bracket and the last closing bracket.
I need it to match only up to the first closing bracket.
Update:
Sorry, I am trying to match the first forward slash as well.
Tester[0]/
Tester[0]/
Test[1]/
Test[2]/
Solution:
Remove only the first match by making the quantifiers lazy with "?":
var rx = new Regex(@"^.*?\[.*?\]\/");
var res = rx.Replace("Tester[0]/Test[4]/testId", "", 1 /*only one occurrence */);

I'm assuming this was your original regex pattern: ^.*\[.*\]/ (the pattern in your question does not match the lines).
This pattern uses greedy quantifiers (*), so, even though we only requested one match, the pattern itself matches more than we'd like. As you noticed, it matched until the second occurrence of the square brackets.
We can make this pattern non-greedy by adding question marks to the quantifiers: ^.*?\[.*?\]/.
Although this works for your use case, a better pattern may be: ^[^/]+/. This removes every character up to and including the first forward slash. The [^ ... ] is a negated character class (its brackets are unrelated to the brackets in the strings we're matching against). In this case, it matches any character that isn't a forward slash.
For this simple text manipulation, though, we could just use a String.Substring() instead of regular expressions:
line.Substring(line.IndexOf('/') + 1);
This is faster and easier to understand than a regular expression pattern.
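For comparison, both approaches could be sketched like this. The class and method names are invented for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class PathHead
{
    // Regex variant: remove everything up to and including the first '/'.
    public static string RemoveWithRegex(string line) =>
        Regex.Replace(line, @"^[^/]+/", "");

    // Substring variant: IndexOf returns -1 when there is no '/',
    // so the whole line is returned unchanged in that case.
    public static string RemoveWithSubstring(string line) =>
        line.Substring(line.IndexOf('/') + 1);
}
```

Both methods should turn "Tester[0]/Test[4]/testId" into "Test[4]/testId".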

You can use a lookahead and lookbehind approach to find the match and replace accordingly.
With lookarounds, your regex would look like this:
(?=/).*(?<=])

Is this the sort of thing you are looking for?
updated
var str="Tester[0]/Test[4]/testId\nTester[0]/Test[4]/testId\nTest[1]/Test[4]/testId\nTest[2]/Test[4]/testId"
console.log(str)
// Tester[0]/Test[4]/testId
// Tester[0]/Test[4]/testId
// Test[1]/Test[4]/testId
// Test[2]/Test[4]/testId
var str2=str.replace(/\/.+/mg,"")
console.log(str2)
// Tester[0]
// Tester[0]
// Test[1]
// Test[2]
This works by starting the match at the first '/' on each line, extending it to the end of that line, and replacing the match with an empty string. The m flag enables multi-line mode and the g flag makes the replacement global.

Split credit card number into 4 chunks using Regex lookahead?

I want to split a credit card number (in my case always 16 digits) into 4 chunks of 4 digits.
I've succeeded in doing it with a positive lookahead:
var s="4581458245834584";
var t=Regex.Split(s,"(?=(?:....)*$)");
Console.WriteLine(t);
But I don't understand why the result is padded with two empty cells.
I already know that I can use the "Remove Empty Entries" flag, but I'm not after that.
However, if I change the regex to (?=(?:....)+$), then I get a different result.
Question
Why does the regex emit empty cells, and how can I fix my regex so that it produces 4 chunks in the first place (without having to 'trim' those empty entries)?

But I don't understand why the result is with two padded empty cells:
Let's try breaking down your regex.
Regex: (?=(?:....)*$)
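Explanation: A lookahead (?= ) for any 4 characters (?:....), repeated zero or more times, up to the end of the string. A lookahead consumes nothing, so each match is zero-width.
Since the * quantifier allows zero repetitions, the lookahead also succeeds at the very end of the string, and it succeeds at the very beginning too, because all 16 characters ahead form four groups. Splitting at those two edge positions is what produces the empty cells.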
So how can I select only those 3 splitters in the middle?
I don't know C# very well, but this 3-step method might work for you.
Search with (\d{4}) and replace with -\1 (written -$1 in .NET). The result will be -4581-4582-4583-4584.
Now remove the first - by searching with ^-. The result will be 4581-4582-4583-4584.
Finally, search for - and split on it.
Alternative solution, inspired by Royi's answer.
Regex: (?=(?!^)(?:\d{4})+$)
Explanation:
(?= // Look ahead for
(?!^) // Not the start of string
(?:\d{4})+$ // Multiple group of 4 digits till end of string
)
Since nothing is consumed and only lookaround assertions are used, it pinpoints the zero-width positions between the groups of 4 digits.
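Here is a small sketch that puts the original split next to the alternative one. The class and method names are assumptions, not from the posts:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CardChunks
{
    // Original pattern: the lookahead also succeeds at index 0 and at the end,
    // which is where the empty entries reported in the question come from.
    public static string[] SplitWithStar(string digits) =>
        Regex.Split(digits, @"(?=(?:....)*$)");

    // Alternative pattern: (?!^) excludes the start, + excludes the end.
    public static string[] SplitInner(string digits) =>
        Regex.Split(digits, @"(?=(?!^)(?:\d{4})+$)");
}
```

string.Join("-", CardChunks.SplitInner("4581458245834584")) should give "4581-4582-4583-4584", with no empty entries to filter out.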

It seems I've found an answer.
Looking at those splitters, I needed to get rid of the ones at the edges.
So I thought: how can I tell the regex engine "not at the start of the string"?
Which is exactly what (?!^) does.
So here is the new regex:
var s="4581458245834584";
var t=Regex.Split(s,"(?!^)(?=(?:....)+$)");
Console.WriteLine(t);
Result: the four 4-digit chunks, with no empty entries.

Umm, I don't know why you need a regex for this; it overcomplicates things. A better way is to just split it manually:
var values = new List<int>();
for (int i = 0; i < 4; i++)
{
    // Note: int.Parse drops leading zeros ("0012" becomes 12)
    var value = int.Parse(s.Substring(i * 4, 4));
    values.Add(value);
}
Regex solution:
var s = "4581458245834584";
var separated = Regex.Match(s, "(.{4}){4}")
    .Groups[1].Captures.Cast<Capture>()
    .Select(x => x.Value)
    .ToArray();

It has been mentioned already that with the * quantifier the lookahead also matches at the end of the string, where zero groups lie ahead. To avoid matching at the start and the end, you can use \B, a non-word boundary. Here it matches only between two word characters (the digits), so it yields no match at the start or the end.
\B(?=(?:.{4})+$)
Because \B already prevents a match at the start or end of the string, you could even use * here.
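A minimal sketch of the \B variant, with both quantifiers, might look like this. The class and method names are invented for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class BoundaryChunks
{
    // \B fails at index 0 and at the end of an all-digit string,
    // so only the inner split points remain.
    public static string[] SplitPlus(string digits) =>
        Regex.Split(digits, @"\B(?=(?:.{4})+$)");

    // Same idea with *: the end position is still excluded by \B.
    public static string[] SplitStar(string digits) =>
        Regex.Split(digits, @"\B(?=(?:.{4})*$)");
}
```

For "4581458245834584", both methods should return the same four chunks. Note that this relies on the input consisting of word characters only, as the digits here do.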

Regex.Match() won't match a substring

This is something simple but I cannot figure it out. I want to find a substring with this regex. It will match "M4N 3M5", but doesn't match the text below:
const string text = "asdf M4N 3M5 adsf";
Regex regex = new Regex(@"^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$", RegexOptions.None);
Match match = regex.Match(text);
string value = match.Value;

Try removing ^ and $:
Regex regex = new Regex(@"[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}", RegexOptions.None);
^ : The match must start at the beginning of the string or line.
$ : The match must occur at the end of the string or before \n at the end of the line or string.
If you want to match only at word boundaries, you can use \b as suggested by Mike Strobel:
Regex regex = new Regex(@"\b[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}\b", RegexOptions.None);
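The effect of the anchors can be checked with a sketch like the one below. The class and member names are made up for illustration, and the {1} quantifiers are left out, which does not change what the pattern matches:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class PostalCodeFinder
{
    // Same pattern as in the question, without the redundant {1} quantifiers.
    const string Core = @"[ABCEGHJKLMNPRSTVXY]\d[A-Z] *\d[A-Z]\d";

    // Anchored: the whole input must be a postal code.
    public static string FindAnchored(string text) =>
        Regex.Match(text, "^" + Core + "$").Value;

    // Word-bounded: the postal code may appear anywhere in the input.
    public static string FindAnywhere(string text) =>
        Regex.Match(text, @"\b" + Core + @"\b").Value;
}
```

For "asdf M4N 3M5 adsf", FindAnchored should return an empty string while FindAnywhere should return "M4N 3M5".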

I know this question has been answered, but I noticed two things in your pattern that I want to highlight:
There is no need to spell out a single instance of a token.
For example (notice the missing {1}):
\d{1} = \d
[A-Z]{1} = [A-Z]
Also, I wouldn't recommend typing a literal <space> in your pattern. Use \s instead (note that it matches any whitespace character, not only a space), because if the space is deleted by mistake you might not notice, and working code will stop working.
Personally, for this case I would recommend \b, since it is the best fit here.
