Split credit card number into 4 chunks using Regex lookahead?


I want to split a credit card number (in my case always 16 digits) into 4 chunks of 4 digits.
I've managed to do it with a positive lookahead:
var s="4581458245834584";
var t=Regex.Split(s,"(?=(?:....)*$)");
Console.WriteLine(string.Join(",", t));
But I don't understand why the result is padded with two empty cells, one at each end.
I already know that I can use a "Remove Empty Entries" flag, but that's not what I'm after.
However, if I change the regex to (?=(?:....)+$), only the empty cell at the start remains.
Question
Why does the regex emit empty cells, and how can I fix it so that it produces 4 chunks in the first place (without having to trim the empty entries)?

But I don't understand why the result is with two padded empty cells:
Let's try breaking down your regex.
Regex: (?=(?:....)*$)
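Explanation: (?= ... ) is a lookahead, (?:....) matches any four characters, and * repeats that group zero or more times up to the end of the string ($). A lookahead consumes nothing, so every match is zero width.
Because * also allows zero repetitions, the lookahead succeeds at the very end of the string as well as at the beginning (16 is a multiple of 4). The string is therefore also split at position 0 and at position 16, which produces the two empty cells.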
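To see where the zero-width matches fall, the match indexes can be listed directly. This is a minimal sketch using the 16-digit sample from the question; the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// Index of every zero-width match of the original pattern (the * version).
int[] starPositions = Regex.Matches(s, "(?=(?:....)*$)")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();
Console.WriteLine(string.Join(",", starPositions)); // 0,4,8,12,16

// Split cuts at each of those positions; the cuts at 0 and 16 leave
// an empty string at either end of the array.
string[] starParts = Regex.Split(s, "(?=(?:....)*$)");
Console.WriteLine(string.Join("|", starParts)); // |4581|4582|4583|4584|

// With + the end of the string no longer matches, so only the leading
// empty entry remains.
string[] plusParts = Regex.Split(s, "(?=(?:....)+$)");
Console.WriteLine(string.Join("|", plusParts)); // |4581|4582|4583|4584
```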
Visualize it from this snapshot of Regex101 Demo
So how can I select only those 3 splitters in the middle?
I don't know C# very well, but this 3-step method might work for you.
Step 1: search for (\d{4}) and replace with -\1 (written -$1 in .NET). The result is -4581-4582-4583-4584.
Step 2: remove the leading - by searching for ^- and replacing it with nothing. The result is 4581-4582-4583-4584.
Step 3: split on -. (The demo used \n as the separator for display purposes.)
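The three steps could be written in C# roughly as follows. This is only a sketch: the variable names are made up for illustration, and it uses the .NET replacement syntax $1 in place of \1.

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// Step 1: prefix every group of four digits with a dash.
string dashed = Regex.Replace(s, @"(\d{4})", "-$1");
Console.WriteLine(dashed); // -4581-4582-4583-4584

// Step 2: drop the dash at the start of the string.
string trimmed = Regex.Replace(dashed, "^-", "");
Console.WriteLine(trimmed); // 4581-4582-4583-4584

// Step 3: split on the remaining dashes.
string[] chunks = trimmed.Split('-');
Console.WriteLine(string.Join(" ", chunks)); // 4581 4582 4583 4584
```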
Alternative solution, inspired by Royi's answer.
Regex: (?=(?!^)(?:\d{4})+$)
Explanation:
(?= // Look ahead for
(?!^) // Not the start of string
(?:\d{4})+$ // Multiple group of 4 digits till end of string
)
Since nothing is consumed and only lookaround assertions are used, each match is the zero-width position in front of a trailing run of 4-digit groups, excluding the start of the string.
Regex101 Demo
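Applied with Regex.Split, this pattern should yield exactly four chunks. A short sketch, assuming the same 16-digit sample string as in the question:

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// (?!^) rejects position 0, and + rejects the end of the string,
// so only positions 4, 8 and 12 act as split points.
string[] chunks = Regex.Split(s, @"(?=(?!^)(?:\d{4})+$)");

Console.WriteLine(chunks.Length);            // 4
Console.WriteLine(string.Join(" ", chunks)); // 4581 4582 4583 4584
```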

It seems like I've found an answer.
Looking at those splitters, I needed to get rid of the ones at the edges.
So I thought: how can I tell the regex engine "not at the start of the line"?
That is exactly what (?!^) does.
So here is the new regex:
var s="4581458245834584";
var t=Regex.Split(s,"(?!^)(?=(?:....)+$)");
Console.WriteLine(string.Join(" ", t));
Result: 4581 4582 4583 4584

Umm, I don't know why you need Regex for this; it just overcomplicates things. A better way is to split it manually:
var values = new List<int>();
for (int i = 0; i < 4; i++)
{
    var value = int.Parse(s.Substring(i * 4, 4));
    values.Add(value);
}
Regex solution:
var s = "4581458245834584";
var separated = Regex.Match(s, "(.{4}){4}").Groups[1].Captures.Cast<Capture>().Select(x => x.Value).ToArray();
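One thing to watch with the manual loop is that int.Parse drops leading zeros, which may matter for card numbers. The sketch below keeps the chunks as strings; the sample number with zero-padded chunks is made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical 16-digit input whose first and last chunks start with zeros.
var s = "0042458245830007";

// Take four characters at a time and keep them as strings.
string[] chunks = Enumerable.Range(0, s.Length / 4)
    .Select(i => s.Substring(i * 4, 4))
    .ToArray();
Console.WriteLine(string.Join(" ", chunks)); // 0042 4582 4583 0007

// Parsing to int loses the padding.
int[] asNumbers = chunks.Select(c => int.Parse(c)).ToArray();
Console.WriteLine(string.Join(" ", asNumbers)); // 42 4582 4583 7
```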

It has been mentioned already that the * quantifier also matches at the end of the string, where there are zero groups ahead. To avoid matching at the start and end you can use \B, the non-word-boundary assertion, which in a string of digits matches only between two digits and therefore never at the start or end.
\B(?=(?:.{4})+$)
See demo at regex101
Because the lookahead is then never tried at the start or end of the string, you could even use *: \B(?=(?:.{4})*$)
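A quick check of the \B variant with both quantifiers, as a sketch on the question's sample string (names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var s = "4581458245834584";

// \B fails at position 0 and at the end of the string, because only one
// side of those positions is a word character.
string[] withPlus = Regex.Split(s, @"\B(?=(?:.{4})+$)");
string[] withStar = Regex.Split(s, @"\B(?=(?:.{4})*$)");

Console.WriteLine(string.Join(" ", withPlus)); // 4581 4582 4583 4584
Console.WriteLine(string.Join(" ", withStar)); // 4581 4582 4583 4584
```

Note that this relies on the input consisting of word characters only; a string containing spaces or punctuation would behave differently.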

Related

C# Regex Anomaly

I'm a bit perplexed here.
I have a regex that is meant to limit a number to two decimal places.
My second and third captures work as expected. But including the first capture ($1) corrupts the result: it contains all the decimal places (I get the original string back).
var t = "553.17765";
var from = @"(\d+)(\.*)(\d{0,2})";
var to = "$1$2$3";
var rd = Regex.Replace(t, from,to);
var r = Regex.Match(t, from);
Why can't I get the 553 in the $1 variable?
LinqPad

What is happening is that your pattern matches the number more than once: first 553.17 and then the remaining 765. You could work around that by looking for the longest match, but it seems you could improve your regex instead:
(\d+\.?\d{0,2})
Steps are as follows
The capture group covers the whole number at once.
Look for digits, greedy match.
Look for a decimal point, either one or none.
Look for zero to two digits
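Furthermore, if you want to do this with Regex.Replace, the pattern also needs to match the rest of the string so that it gets replaced too. Using .* (rather than .+) on both sides keeps this working when nothing precedes or follows the number.
text = Regex.Replace(text, @".*?(\d+\.?\d{0,2}).*", "$1");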
dotnetfiddle
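A small sketch of the single-group pattern in use. The second input with surrounding text is made up for illustration, and the Replace call uses .*? and .* around the group so that a bare number keeps its first digit.

```csharp
using System;
using System.Text.RegularExpressions;

var t = "553.17765";

// Match alone already yields the truncated number.
string firstMatch = Regex.Match(t, @"\d+\.?\d{0,2}").Value;
Console.WriteLine(firstMatch); // 553.17

// Replace needs the pattern to cover the whole string.
string replaced = Regex.Replace(t, @".*?(\d+\.?\d{0,2}).*", "$1");
Console.WriteLine(replaced); // 553.17

// Hypothetical input with text around the number.
string embedded = Regex.Replace("total: 553.17765 EUR", @".*?(\d+\.?\d{0,2}).*", "$1");
Console.WriteLine(embedded); // 553.17
```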

Your example does not work because the pattern matches twice. The pattern (\d+)(\.*)(\d{0,2}) splits the string 553.17765 as follows:
Match 1: 553.17
$1 = 553
$2 = .
$3 = 17
Replace 553.17 with 553.17
Match 2: 765
$1 = 765
Replace 765 with 765
The first match includes, as expected, only two of the decimal places. With that the match is complete and the regex engine starts looking for the next match, because Replace replaces all matches, not only the first one. As you can see, this regex effectively does nothing.
By the way, Replace works by finding a match and replacing the whole match with the replacement pattern, so there is no need to include the surrounding text. The problem is that your regex matches too narrowly: it stops after the first two decimal places, so the match only includes those two decimal places.
That means that whatever you replace it with will only replace 553.17 and nothing more. For finding decimal numbers this is fine. For replacing it is not: here you want to match the whole number with all its decimal places and then replace it.
So a working replace regex would look like this: (\d+\.\d{1,2})\d*. First, there is only one capture group, as we don't intend to reorder the digits. Second, the point is required, as we are only interested in replacing numbers that actually have decimal places; for the same reason we need at least one and at most two decimal places inside the group. Every digit after that is optional, but is matched greedily outside the group so that the whole number is part of the match and gets replaced completely.
Match 1: 553.17765
$1 = 553.17
Replace 553.17765 with 553.17
By the way, this regex does not handle thousands separators, if that is required.
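The two-match behaviour and the corrected pattern can be checked with a few lines. This is a sketch using the question's input; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var t = "553.17765";

// The original pattern matches twice, so Replace with "$1$2$3"
// just reassembles the input.
string[] originalMatches = Regex.Matches(t, @"(\d+)(\.*)(\d{0,2})")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(" | ", originalMatches)); // 553.17 | 765

// The corrected pattern swallows the extra decimals outside the group.
string truncated = Regex.Replace(t, @"(\d+\.\d{1,2})\d*", "$1");
Console.WriteLine(truncated); // 553.17
```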

RegEx : Find match based on 1st two chars

I am new to regex and have a question about it. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY, AB or PQ.
I came up with this one, and it doesn't work:
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now suppose I want to try a new matching condition like this:
The string starts with "XY" or "AB" or "PQ", the 3rd character is "Z" and the 4th character is "F".
How do I write the regex for that?

You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
In your pattern, the outer capturing group together with . (any single character) matches a third character, and the range quantifier {2} then requires that whole sequence to occur twice.
For your updated question, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
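Both checks can be tried on a handful of inputs. In this sketch the first three strings come from the question and the last two are made-up negative cases.

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs =
{
    "XYZF44DT508755", "ABZF44DT508755", "PQZF44DT508755", // from the question
    "XZZF44DT508755", "XYAB44DT508755"                    // hypothetical counterexamples
};

foreach (var input in inputs)
{
    bool prefixOk = Regex.IsMatch(input, "^(?:XY|AB|PQ)");
    bool prefixAndZf = Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF");
    Console.WriteLine($"{input}: prefix={prefixOk}, prefix+ZF={prefixAndZf}");
}
```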

You want to test for just ^(XY|AB|PQ). Your regex means: match XY, AB or PQ followed by any one character, and repeat that whole sequence twice; for example, "XYKPQL" would match it.
In ^(XY|AB|PQ):
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
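The difference between the original pattern and the simplified one shows up directly with IsMatch. A minimal sketch, with illustrative variable names:

```csharp
using System;
using System.Text.RegularExpressions;

const string original = "^((XY|AB|PQ).){2}";
const string simplified = "^(XY|AB|PQ)";

// The original needs prefix + any char, twice: "XY" + "Z" succeeds,
// but "F4" is not one of the prefixes, so the second round fails.
bool originalOnSample = Regex.IsMatch("XYZF44DT508755", original);
bool originalOnXykpql = Regex.IsMatch("XYKPQL", original);
bool simplifiedOnSample = Regex.IsMatch("XYZF44DT508755", simplified);

Console.WriteLine(originalOnSample);   // False
Console.WriteLine(originalOnXykpql);   // True
Console.WriteLine(simplifiedOnSample); // True
```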

You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes; one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
Debuggex Demo
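With \w* appended, the match covers the whole leading "word" rather than just the prefix. In this sketch the trailing text in the input is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the code from the question followed by other text.
var input = "XYZF44DT508755 and more text";

Match m = Regex.Match(input, @"^((XY|AB|PQ)ZF\w*)");
string word = m.Value;             // whole matched word
string prefix = m.Groups[2].Value; // inner group: the two-letter prefix

Console.WriteLine(word);   // XYZF44DT508755
Console.WriteLine(prefix); // XY

// A string whose 3rd and 4th characters are not "ZF" does not match.
bool rejects = !Regex.IsMatch("XYZA44DT508755", @"^((XY|AB|PQ)ZF\w*)");
Console.WriteLine(rejects); // True
```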

Why does regex return one digit?

I want to get the trailing digits from strings.
For example: "Text11" gives 11, "Te1xt32" gives 32, etc.
I wrote this regex:
var regex = new Regex(@"^(.+)(?<Number>(\d+))(\z)");
And use it:
regex.Match(input).Groups["Number"].Value;
That returns 1 for "Text11" and 2 for "Te1xt32" instead of 11 and 32.
So the question is: why does \d+ get only the last digit?

Because the leading .+ is greedy by default, it first consumes the whole string and then backtracks one character at a time until \d+ can match, which happens as soon as a single digit has been given back. Add the non-greedy modifier ? after the + to make the engine take the shortest possible match for that group instead.
var regex = new Regex(@"^(.+?)(?<Number>(\d+))(\z)");
DEMO
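The effect of the greedy and lazy quantifier can be compared side by side. A small sketch on the two inputs from the question; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var greedy = new Regex(@"^(.+)(?<Number>\d+)\z");
var lazy = new Regex(@"^(.+?)(?<Number>\d+)\z");

// Greedy .+ gives back only one digit to \d+.
string greedy1 = greedy.Match("Text11").Groups["Number"].Value;
string greedy2 = greedy.Match("Te1xt32").Groups["Number"].Value;
Console.WriteLine($"{greedy1} {greedy2}"); // 1 2

// Lazy .+? stops as early as the rest of the pattern allows.
string lazy1 = lazy.Match("Text11").Groups["Number"].Value;
string lazy2 = lazy.Match("Te1xt32").Groups["Number"].Value;
Console.WriteLine($"{lazy1} {lazy2}"); // 11 32
```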

As an alternative, you can use the same regex in RightToLeft mode:
var input = "Te1xt32";
// I removed some unnecessary capturing groups in your regex
var regex = new Regex(@"^(.+)(?<Number>\d+)\z", RegexOptions.RightToLeft);
// You need to specify the starting index as the end of the string
Match m = regex.Match(input, input.Length);
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups["Number"].Value);
Demo on ideone
Since what you want to find is at the end of the string and the part in front has no particular pattern, going from right to left avoids some backtracking, though the difference, if any, is going to be insignificant here.
RightToLeft mode, as the name suggests, performs the match from right to left, so the numbers at the end of the string will be greedily consumed by \d+ before the rest is consumed by .+.

You can simply do:
var regex = new Regex(@"(?<Number>\d+)\z");
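A brief sketch of this shorter pattern on a few inputs. The inputs "11" and "Text" are made-up edge cases: a string that is all digits, and one with no trailing digits.

```csharp
using System;
using System.Text.RegularExpressions;

var trailing = new Regex(@"(?<Number>\d+)\z");

string a = trailing.Match("Text11").Groups["Number"].Value;  // 11
string b = trailing.Match("Te1xt32").Groups["Number"].Value; // 32
string c = trailing.Match("11").Groups["Number"].Value;      // 11, no leading text required
bool d = trailing.Match("Text").Success;                     // False, nothing to capture

Console.WriteLine($"{a} {b} {c} {d}"); // 11 32 11 False
```

Without the ^(.+) prefix, the engine takes the first position from which digits run to the end of the string, so the full trailing number is captured.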

Regex - find every occurrence of an integer surrounded by a space and a comma

I have the following string:
"121 fd412 4151 3213, 421, 423 41241 fdsfsd"
And I need to get 3213 and 421, because they both have a space in front of them and a comma behind.
The results should end up in a string array. How can I do that?
"\\d+" catches every integer.
"\s\\d+(,)" throws some memory errors.
EDIT.
space to the left (<-) of the number, comma to the right (->)
EDIT 2.
string mainString = "Tests run: 5816, 8346, 28364 iansufbiausbfbabsbo3 4";
MatchCollection c = Regex.Matches(mainString, @"\d+(?=\,)");
var myList = new List<String>();
foreach(Match match in c)
{
myList.Add(match.Value);
}
Console.Write(myList[1]);
Console.ReadKey();

Your regex syntax is incorrect. If you want to match both numbers as separate results, you could do:
@"\s(\d+),\s(\d+)\s"
Live Demo
Edit
@"\s(\d+),"
Live Demo

\s\\d+(,):
\s is not properly escaped, should be \\s, same as for \\d
\\d matches single digit, you need \\d+ - one or more consecutive digits
(,) captures comma, do you really need this? seems like you need to capture a number, so \\s(\\d+),
you said "because they both have space behind them, and a comma in front", so probably ,\\s(\\d+)

How about this expression :
" \d+," // expression without the quotes
it should find what you need.
You can read about how to work with regular expressions on MSDN.
Hope it helps

Another solution
\s(\d+), // or maybe you'll need a double slash \\
Output:
3213
421
Demo
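In C#, collecting the captured numbers into a string array might look like the sketch below, which uses the sample string from the question (the variable names are illustrative).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "121 fd412 4151 3213, 421, 423 41241 fdsfsd";

// Group 1 holds the digits between the whitespace and the comma.
string[] numbers = Regex.Matches(input, @"\s(\d+),")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(" ", numbers)); // 3213 421
```

The verbatim string (@"...") avoids having to double the backslashes.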

I think you mean you're looking for something like ,<space><digit> not ,<digit><space>
If so, try this:
, (\d+) //you might need to add another backslash as the others have noted
Well, based on your new edit
\s(\d+),
Test it here

It's all you need, only the numbers
\d+(?=\,)
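The lookahead alone only checks for the comma on the right. If the space on the left matters as well, a lookbehind can be added; that variant is an assumption about the intent, not part of the answer above. A sketch, where the second input is made up to show the difference:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var mainString = "Tests run: 5816, 8346, 28364 iansufbiausbfbabsbo3 4";
string[] fromQuestion = Regex.Matches(mainString, @"\d+(?=,)")
    .Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(" ", fromQuestion)); // 5816 8346

// Hypothetical input where digits inside a word are followed by a comma.
var tricky = "121 fd412, 3213, 421";
string[] lookaheadOnly = Regex.Matches(tricky, @"\d+(?=,)")
    .Cast<Match>().Select(m => m.Value).ToArray();
string[] withLookbehind = Regex.Matches(tricky, @"(?<=\s)\d+(?=,)")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" ", lookaheadOnly));  // 412 3213
Console.WriteLine(string.Join(" ", withLookbehind)); // 3213
```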

Regex to isolate a specific substring

I have this string I have retrieved from a File.ReadAllText:
6 11 rows processed
As you can see there is always an integer specifying the line number in this document. What I am interested in is the integer that comes after it and the words "rows processed". So in this case I am only interested in the substring "11 rows processed".
So, knowing that each line will start with an integer and then some white space, I need to be able to isolate the integer that follows it and the words "rows processed" and return that to a string by itself.
I have been told this is easy to do with Regex, but so far I haven't the faintest clue how to build it.

You don't need regular expressions for this. Just split on the whitespace:
var fields = s.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(String.Join(" ", fields.Skip(1)));
Here, I am using the fact that if you pass an empty array as the char [] parameter to String.Split, it splits on all whitespace.

This should work for what you need:
\d+(.*)
This searches for 1 or more digits (\d+) and then it puts everything afterwards in a group:
. = any character
* = repeater (zero or more of the preceding value, which is any character in the above)
() = grouping
However, Jason is correct in that you only need to use a split function

If you need to use a Regex it would be like this:
string result = null;
Match match = Regex.Match(row, #"^\s*\d+\s*(.*)");
if (match.Success)
result = match.Groups[1].Value;
The regex matches from the start of the row: leading spaces if any, then digits, then more spaces. Finally it captures the rest of the line, which is returned as the result.

This is done easily with Regex.Replace() using the following regular expression...
^\d+\s+
So it'd be something like this:
return Regex.Replace(text, #"^\d+\s+", "");
Basically you're just trimming the first number (\d+) and the whitespace (\s+) that follows.
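Since the text comes from File.ReadAllText, it may contain several such lines. The sketch below is one way to handle that, assuming a made-up two-line input: RegexOptions.Multiline lets ^ match at each line start, and [ \t]+ is used instead of \s+ so the match cannot run across a line break.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical file content with two numbered lines.
var text = "6 11 rows processed\n7 12 rows processed";

// Strip the leading line number and the blanks after it on every line.
string cleaned = Regex.Replace(text, @"^\d+[ \t]+", "", RegexOptions.Multiline);

Console.WriteLine(cleaned);
// 11 rows processed
// 12 rows processed
```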

Example in PHP(C# regex should be compatible):
$line = "6 11 rows processed";
$resp = preg_match("/[0-9]+\s+(.*)/",$line,$out);
echo $out[1];
I hope I caught your point.
