C# RegEx. Replace credit card in text

I have some text with a credit card number inside, like:
"Your credit card number is 4321432143219999, this is really your credit card!"
I need to find this number with a regex and replace it with ************9999, so the resulting text should be:
"Your credit card number is ************9999, this is really your credit card!"
How can I do it in C#?
Thanks!

var str = "Your credit card number is 4321432143219999, this is really your credit card!";
var res = Regex.Replace(str, "[0-9](?=[0-9]{4})", "*");
This searches for every digit that is followed by at least 4 more digits and replaces it with *. It can be fooled by other long numbers, though: 123456 would become **3456.
If your credit card numbers are 16 digits long:
var res2 = Regex.Replace(str, @"\b[0-9]{12}(?=[0-9]{4}\b)", new string('*', 12));
This replaces a block of 12 digits that is followed by 4 more digits (16 digits in total) with 12 asterisks. The number must be separated from the surrounding text by spaces or other non-word characters, so neither A1234567890123456 nor 1234567890123456A will match, while 1234567890123456 followed by a comma is fine, because the comma is a non-word character.
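To compare the two patterns side by side, they can be wrapped in a small helper class. This is a minimal sketch, assuming plain text input and card numbers written without spaces or dashes; the names CardMasking, MaskAnyLongNumber and MaskSixteenDigitCard are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CardMasking
{
    // Masks every digit that still has at least four digits after it.
    // Any run of five or more digits is affected, not only card numbers.
    public static string MaskAnyLongNumber(string text) =>
        Regex.Replace(text, "[0-9](?=[0-9]{4})", "*");

    // Masks only standalone 16-digit numbers (word boundaries on both sides),
    // keeping the last four digits visible.
    public static string MaskSixteenDigitCard(string text) =>
        Regex.Replace(text, @"\b[0-9]{12}(?=[0-9]{4}\b)", new string('*', 12));
}
```

With the question's sentence, MaskSixteenDigitCard yields "Your credit card number is ************9999, this is really your credit card!", while MaskAnyLongNumber("order 123456") yields "order **3456".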

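It's best to avoid regular expressions when you can, because they are slower and harder to troubleshoot than ordinary substring operations. If you already have the card number on its own, this is enough:
public static string HideNumber(string number)
{
    // Numbers of 4 characters or fewer have nothing to hide (and Substring would throw below 4).
    if (number.Length <= 4) return number;
    // Keep the last four characters and left-pad with '*' back to the original length.
    return number.Substring(number.Length - 4).PadLeft(number.Length, '*');
}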
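HideNumber expects the card number on its own. If the number is still embedded in a sentence, one option is to let a regex locate it and pass each match to HideNumber through a MatchEvaluator. This is a sketch, not part of the answer above: the class name CardTextMasker and the assumption that cards are standalone runs of 13 to 19 digits are both illustrative.

```csharp
using System.Text.RegularExpressions;

public static class CardTextMasker
{
    // Same idea as HideNumber: keep the last four characters, pad the rest with '*'.
    public static string HideNumber(string number)
    {
        if (number.Length <= 4) return number;
        return number.Substring(number.Length - 4).PadLeft(number.Length, '*');
    }

    // Assumption: a card number is a standalone run of 13-19 digits.
    // The lambda is converted to a MatchEvaluator and called once per match.
    public static string MaskCardsInText(string text) =>
        Regex.Replace(text, @"\b[0-9]{13,19}\b", m => HideNumber(m.Value));
}
```

The regex is only used to find the number; the masking itself stays a plain substring operation, as the answer suggests.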

Related

How to Regex Split an expression to accept decimal number?

I am trying to make a calculator. I am using Regex.Split() to get the numbers from the expression. It works well with integers, but now I am looking for a way to get decimal numbers as well.
string mExp = "1.50 + 2.50";
string[] strNum = (Regex.Split(mExp, @"[\D+]"));
num1 = double.Parse(strNum[0]);
num2 = double.Parse(strNum[1]);

You can change your regex to split on an arithmetic operator surrounded by any number of spaces:
string[] strNum = (Regex.Split(mExp, @"\s*[+/*-]\s*"));
Console.WriteLine(string.Join("\n", strNum));
Output:
1.50
2.50
To deal with negative numbers, you have to make the regex a bit more sophisticated and add a lookbehind for a digit and a lookahead for either a digit or a -:
string mExp = "-1.50 + 2.50 -3.0 + -1";
string[] strNum = (Regex.Split(mExp, @"(?<=\d)\s*[+*/-]\s*(?=-|\d)"));
Console.WriteLine(string.Join("\n", strNum));
Output:
-1.50
2.50
3.0
-1
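The split can be combined with parsing. One caveat: double.Parse uses the current culture, so in a culture whose decimal separator is a comma, "1.50" may be misread. A minimal sketch that parses with CultureInfo.InvariantCulture (ExpressionSplitter and SplitOperands are hypothetical names). As in the output above, "2.50 -3.0" is treated as a subtraction, so the operand is 3.0 rather than -3.0.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExpressionSplitter
{
    // Splits on an operator that follows a digit and precedes a digit or '-',
    // then parses every operand independently of the current culture.
    public static double[] SplitOperands(string expression) =>
        Regex.Split(expression, @"(?<=\d)\s*[+*/-]\s*(?=-|\d)")
             .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
             .ToArray();
}
```

For example, SplitOperands("-1.50 + 2.50 -3.0 + -1") returns -1.5, 2.5, 3.0 and -1.0.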

You can use the following regex to split expressions containing both integers and decimal numbers:
[^\d.]+
string[] strNum = (Regex.Split(mExp, @"[^\d.]+"));
It matches one or more characters that are neither digits nor dots, and splits on each such match.
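One detail of this approach: when the expression starts or ends with a non-digit character, Regex.Split returns empty strings at the edges, and a leading minus sign is dropped along with the other separators. A sketch that filters out the empty pieces (NumberExtractor and ExtractNumbers are illustrative names):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Splits on runs of characters that are neither digits nor '.', and drops the
    // empty pieces produced when the input starts or ends with such a run.
    // Signs are separators here, so "-1.50" comes back as "1.50".
    public static string[] ExtractNumbers(string expression) =>
        Regex.Split(expression, @"[^\d.]+")
             .Where(s => s.Length > 0)
             .ToArray();
}
```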

Extracting dollar prices and numbers with comma as thousand separator from PDF converted to text format

I am trying to redact dollar amounts from some PDFs using C#. Below is what I have tried:
@"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"\d+\.\d{2}"
Here are some test cases that it needs to match:
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not match:
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note that the C# code interfaces with iText 7 pdfSweep:
Guid g = Guid.NewGuid(); // new Guid() would always produce the all-zero GUID
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = @"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = @"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid + ".pdf";
string regex = @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom

You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
Bonus: to match only prices (amounts with a dollar sign), remove the ? after \$ to make the dollar sign obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
// there is a match
}
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match at the start and end of each line, and \r? is required because in .NET regex $ matches only before LF (\n), not before CRLF (\r\n).
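The three patterns from this answer can be checked against some of the test cases outside of pdfSweep. A minimal sketch (PriceMatcher and its methods are hypothetical names); note that with these patterns a four-digit amount written without a thousands separator, such as 1000, does not match.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PriceMatcher
{
    // Whole-string check: optional '$', 1-3 leading digits, comma-separated
    // thousands groups, optional decimal part.
    private static readonly Regex WholePrice =
        new Regex(@"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$");

    // '$' required; group 1 captures the amount without the dollar sign.
    private static readonly Regex DollarAmount =
        new Regex(@"\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)");

    // Per-line variant for CRLF text: (?m) anchors each line, \r? absorbs the CR before $.
    private static readonly Regex PriceLine =
        new Regex(@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$");

    public static bool IsPrice(string line) => WholePrice.IsMatch(line);

    public static List<string> FindDollarAmounts(string text) =>
        DollarAmount.Matches(text).Cast<Match>().Select(m => m.Groups[1].Value).ToList();

    public static int CountPriceLines(string text) => PriceLine.Matches(text).Count;
}
```

For example, FindDollarAmounts("Total $1,800.0000 and $99,249.25 due") returns "1,800.0000" and "99,249.25", and CountPriceLines("7.09\r\nBolt, Locating, M8 x 1.25 x 30 L\r\n$9.0000") returns 2.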

RegEx to find numbers sequence in string separated by space with predefined maximum length

Sorry for the confusing title; I'll try to explain with an example. Currently we have this expression to find a number sequence in a string:
\b((\d[ ]{0,1}){13,19})\b
Now I'd like to modify it so it fulfills these rules:
- The length should be between 13 and 19 characters, excluding whitespace
- Each number cluster must have at least 3 digits
The expression should mark these as matched:
1234567890123
1234 5678 9012 345
Should not match:
123456789012 3
123 12 123 1 23134
My current expression matches all of them.

This is possible using look-around.
The regex can be changed to the following:
\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)
This works by looking ahead to make sure the number is between 13 and 19 digits long. It then matches groups of 3 or more digits. After it has found all such groups, it uses a negative lookahead to make sure there are no digits left; if there are, there must be a group shorter than 3 digits. This works on the examples you've provided.
\b makes sure that it's the start of a "word".
(?<!\d ) makes sure the match is not preceded by a digit and a space, so it cannot start in the middle of a spaced number.
(?=(?:\d ?){13,19}(?! ?\d)) looks ahead to make sure the number is between 13 and 19 digits long:
(?:\d ?){13,19} comes from the original pattern; ?: was added to make the group non-capturing.
(?! ?\d) is a negative lookahead: if there are still digits left after 19 digits, the number is too long, so the current match is discarded.
(?:\d{3,} ?)+ matches any number of clusters of 3 or more digits (the 13 to 19 digit total is enforced by the first lookahead).
\b(?! ?\d) looks for the end of a cluster. If there are still digits left after it, there must be a cluster that is too small.
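A small harness makes it easy to rerun the question's examples against this pattern. A sketch only; SpacedNumberValidator and ContainsValidNumber are hypothetical names, and the pattern is used unchanged.

```csharp
using System.Text.RegularExpressions;

public static class SpacedNumberValidator
{
    // Lookaround pattern from the answer above, compiled once and reused.
    private static readonly Regex Pattern = new Regex(
        @"\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)");

    // True if the input contains a 13-19 digit number whose space-separated
    // groups are all at least 3 digits long.
    public static bool ContainsValidNumber(string input) => Pattern.IsMatch(input);
}
```

The first two examples from the question match; "123456789012 3" and "123 12 123 1 23134" do not, and neither does a 20-digit run.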

I suggest the following solution, also based on lookarounds:
\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b
The main point is that we only match the next digit if it is not a part of a 1- or 2-digit group.
Pattern explanation
\b - starting word boundary
\d(?!\d?\b) - a digit that is not followed by 1 or 0 digits and then a trailing word boundary (that is, the match fails if the first group is a 1- or 2-digit group such as 1 or 12)
(?: ?\d(?!(?<= \d)\d?\b)){12,18} - 12 to 18 occurrences of:
" ?" (a space followed by ?) - 1 or 0 spaces
\d(?!(?<= \d)\d?\b) - a single digit that is rejected if it starts a new group (it is preceded by a space, which the (?<= \d) lookbehind checks) and that group ends after 0 or 1 more digits (checked by \d?\b)
\b - a trailing word boundary.
Note that if you want to match these strings only in a non-numeric context (that is, if you do not want to allow any digits on the left or the right), you might also consider adding (?<!\d *) at the front and (?! *\d) at the end of the pattern.
Note that to match any whitespace, you may replace a literal space with \s in the pattern.
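The effect of the optional guards is easiest to see on input that has other digits in front of the number. A sketch (DigitClusterMatcher, Basic, Isolated and FirstMatch are illustrative names): without the guards, "99 1234567890123" yields the match "1234567890123"; with them, nothing matches.

```csharp
using System.Text.RegularExpressions;

public static class DigitClusterMatcher
{
    // Pattern from the answer above: 13-19 digits, optional single spaces,
    // no 1- or 2-digit groups.
    public const string Basic = @"\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b";

    // Same pattern with the optional guards that reject digits on either side.
    // .NET supports the variable-length lookbehind (?<!\d *).
    public const string Isolated = @"(?<!\d *)" + Basic + @"(?! *\d)";

    // Returns the first match, or null when there is none.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```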

If you can use LINQ, this will be much easier to maintain:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var myList = new List<string>
{
    "1234567890123",
    "1234 5678 9012 345",
    "123456789012 3",
    "123 12 123 1 23134"
};
foreach (var input in myList)
{
    var splitted = Regex.Split(input.Trim(), @"\s+"); // Split on whitespace (Trim avoids empty groups at the ends)
    var length = splitted.Sum(x => x.Length); // Compute the total length
    var smallestGroupSize = splitted.Min(x => x.Length); // Compute the length of the smallest chunk
    Console.WriteLine($"Total length: {length}, smallest group size: {smallestGroupSize}");
    if (length < 13 || length > 19 || smallestGroupSize < 3)
    {
        Console.WriteLine($"Input '{input}' is incorrect{Environment.NewLine}");
        continue;
    }
    Console.WriteLine($"Input '{input}' is correct!{Environment.NewLine}");
}
which produces:
Total length: 13, smallest group size: 13
Input '1234567890123' is correct!
Total length: 15, smallest group size: 3
Input '1234 5678 9012 345' is correct!
Total length: 13, smallest group size: 1
Input '123456789012 3' is incorrect
Total length: 14, smallest group size: 1
Input '123 12 123 1 23134' is incorrect
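The loop above only measures group lengths, so an input like "abcd efgh ijkl mno" would also be reported as correct. A sketch of the same check as a reusable method that additionally requires every group to consist of ASCII digits (GroupedNumberValidator and IsValid are hypothetical names; the digit check is an addition, not part of the answer):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupedNumberValidator
{
    // Assumption: valid input is digit groups separated by whitespace, 13-19 digits
    // in total, with every group at least 3 digits long.
    public static bool IsValid(string input)
    {
        // Trim first so leading/trailing whitespace does not produce empty groups.
        string[] groups = Regex.Split(input.Trim(), @"\s+");
        int totalLength = groups.Sum(g => g.Length);
        bool groupsOk = groups.All(g => g.Length >= 3 && g.All(c => c >= '0' && c <= '9'));
        return groupsOk && totalLength >= 13 && totalLength <= 19;
    }
}
```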

Regular expression match all numbers after the last dash?

I am trying to find the last number after the last dash in a string, so:
test-123-2-456 would return 456
123-test would return ""
123-test-456 would return 456
123-test-456sdfsdf would return 456
123-test-asd456 would return 456
The expression @"[^-]*$" does not match the numbers, though, and I have tried using [\d] but to no avail.

Sure, the simplest solution would be something like this:
(\d+)[^-]*$
This will match one or more digits, captured in group 1, followed by zero or more of any character other than a hyphen, followed by the end of the string. In other words, it will match any sequence of digits as long as there are no hyphens between that sequence and the end of the string. You then just have to extract group 1 from the match. For example:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach(var str in inputs)
{
var m = Regex.Match(str, @"(\d+)[^-]*$");
Console.WriteLine("{0} --> {1}", str, m.Groups[1].Value);
}
Produces:
test-123-2-456 --> 456
123-test -->
123-test-456 --> 456
123-test-456sdfsdf --> 456
123-test-asd456 --> 456
Alternatively, you could use a negative lookahead like this:
\d+(?!.*-)
This will match one or more digit characters as long as they are not followed by a hyphen anywhere later in the string. Only the digits will be included in the match.
Note that these two options behave differently if there are two or more sets of numbers after the last -, e.g. foo-123bar456. In this case it's not entirely clear what you want to happen, but the first pattern will simply match everything from the first sequence of digits to the end (123bar456), with group 1 containing only the first sequence of digits (123). If you'd like it to capture only the last sequence of digits, put \d inside the character class (i.e. (\d+)[^\d-]*$). The second pattern would produce a separate match for each sequence of digits (in this example, 123 and 456), but the Regex.Match method will only give you the first match.
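Both refinements can be sketched side by side: the capture-group version with \d added to the character class, and the lookahead version using Regex.Matches and keeping the last match instead of relying on Regex.Match. The class and method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LastNumberAfterDash
{
    // Capture-group version: digits followed only by characters that are neither
    // digits nor hyphens up to the end of the string.
    public static string ByCapture(string input)
    {
        Match m = Regex.Match(input, @"(\d+)[^\d-]*$");
        return m.Success ? m.Groups[1].Value : "";
    }

    // Lookahead version: every digit run with no hyphen after it; keep the last one.
    public static string ByLookahead(string input)
    {
        Match last = Regex.Matches(input, @"\d+(?!.*-)").Cast<Match>().LastOrDefault();
        return last?.Value ?? "";
    }
}
```

Both return 456 for foo-123bar456 and an empty string for 123-test.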

I suggest applying two regexes, using the result of the first one as the input for the second one.
The first regex is:
-[0-9]+[^-]+$ // Take the last piece of your string, led by a minus (-)
// followed by digits ([0-9]+)
// and some ugly rest that doesn't contain another minus ([^-]+$)
The second regex is:
-[0-9]+ // Separate the relevant digits from the ugly rest
// You know that there can only be one minus + digits part in it
Tested here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

The latest group from this RegEx can get the last number for you:
[^-A-z][0-9]+[^A-z]
To get the last number in code, you can collect every run of digits and take the last one:
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach (var str in inputs)
{
    var matches = Regex.Matches(str, @"[0-9]+");
    // Skip inputs that contain only one number, such as "123-test" (this would also skip e.g. "test-456").
    if (matches.Count > 1)
        Console.WriteLine("{0} --> {1}", str, matches[matches.Count - 1].Value);
}

How to validate using Regex

My requirement is that the first two digits of the entered number must be in the range 00-32.
How can I check this with a regex in C#?
I could not figure it out!

Do you really need a regex?
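using System.Globalization;

string input = "00ABFSSDF";
int val;
// NumberStyles.None rejects signs and whitespace, so only two plain digits parse.
if (input.Length >= 2 &&
    int.TryParse(input.Substring(0, 2), NumberStyles.None, CultureInfo.InvariantCulture, out val))
{
    if (val >= 0 && val <= 32)
    {
        // valid
    }
}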

Since this is almost certainly a learning exercise, here are some hints:
Your regex will be an "OR" (|) of two parts, both validating the first two characters.
The first part will match if the first character is a digit 0..2 and the second character is a digit 0..9.
The second part will match if the first character is the digit 3 and the second character is a digit 0..2.
To match a range of digits, use an [A-B] range, where A is the lower and B is the upper bound of the digits to match (both bounds are inclusive).
Anchor the alternation to the start of the string with ^, and wrap the two parts in a group so that ^ applies to both of them.

Try something like
Regex reg = new Regex(@"^([0-2]?[0-9]|3[0-2])$");
Console.WriteLine(reg.IsMatch("00")); // True
Console.WriteLine(reg.IsMatch("22")); // True
Console.WriteLine(reg.IsMatch("33")); // False
Console.WriteLine(reg.IsMatch("42")); // False
The [0-2]?[0-9] part matches all numbers from 0 to 29, and 3[0-2] matches 30-32.
This validates a number from 0 to 32 and also allows a leading zero, e.g. 08. Because of the $ anchor, it checks the whole input rather than just its first two digits.

You should split the range into two parts, grouped so that ^ applies to both of them:
^(?:[012]\d|3[012])

if(Regex.IsMatch("123456789","^([0-2][0-9]|3[0-2])"))
// match
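A sketch of the anchored, grouped pattern as a reusable check (PrefixRangeValidator and StartsWithZeroToThirtyTwo are hypothetical names). It uses [0-9] rather than \d, because in .NET \d also matches non-ASCII decimal digits unless RegexOptions.ECMAScript is set.

```csharp
using System.Text.RegularExpressions;

public static class PrefixRangeValidator
{
    // 00-29 via [0-2][0-9], 30-32 via 3[0-2]; the non-capturing group makes
    // ^ apply to both alternatives. Only the first two characters are checked.
    private static readonly Regex FirstTwoDigits = new Regex("^(?:[0-2][0-9]|3[0-2])");

    public static bool StartsWithZeroToThirtyTwo(string input) => FirstTwoDigits.IsMatch(input);
}
```

Without the group, ^[012]\d|3[012] would also accept "4431", because the second alternative is not anchored and finds "31" in the middle.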
