Regex (.NET) to validate any real number in several formats - c#

I am trying to validate any real number in different formats using one regex rule in .NET. The formats I mean are the following:
Dots (thousands) and comma (decimal)
123 ; 1.234.567 ; 12.345.678 ; 123.456.789 ; 1.234.567,89 ; 1.234,56789 ; 1,2 ; 0,123
Commas (thousands) and dot (decimal)
1,234,567 ; 12,345,678 ; 123,456,789 ; 1,234,567.89 ; 1,234.56789 ; 1.2 ; 0.123
White space (thousands) and dot or comma (decimal)
1 234 567 ; 12 345 678 ; 123 456 789 ; 1 234 567,89 ; 1 234 567.89 ; 1 234,56789 ; 1 234.56789
I know a bit more than the basics about regex, so I have tried this, with no success so far:
(^|\s)(-|\+|±|\+/-)?(?:(([1-9]{1,3})([,]\d{3})*|[0]?)([\.]\d+)?)|(?:(([1-9]{1,3})([\.]\d{3})*|[0]?)([,]\d+)?)|(?:(([1-9]{1,3})([\s]\d{3})*|[0]?)([\.|,]\d+)?)(\s|$)
Can anyone help me, or link me to the solution if it is out there?

Well, this may not be the optimal regex:
^\d*$|^(?:\d{1,3}(?:\.\d{3})*(?:,\d{1,5})?)$|^(?:\d{1,3}(?:,\d{3})*(?:\.\d{1,5})?)$|^(?:\d{1,3}(?: \d{3})*(?:[,.]\d{1,5})?)$
But it does the job. I'll look into how to make a better one in the near future. Here's a Live Demo
If your input is not that dirty (i.e. a number never mixes separator conventions the way 1 032,354.12 does, with a space as thousands separator followed by a comma and then a dot), you can use this simpler version:
^\d{1,3}(?:[., ]\d{3})*(?:[.,]\d{1,5})?$
Which means:
\d{1,3} <= start with 1 to 3 digits;
(?:[., ]\d{3})* <= a thousands separator followed by 3 digits, repeated 0 to n times;
(?:[.,]\d{1,5})? <= a decimal separator followed by 1 to 5 digits, 0 or 1 time.
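To make the difference between the two patterns concrete, here is a minimal C# sketch that wraps both of them. The class and method names (NumberFormatCheck, IsStrict, IsSimple) are made up for illustration; only the two patterns come from the answer above.

```csharp
using System.Text.RegularExpressions;

public static class NumberFormatCheck
{
    // Strict pattern from the answer: one alternative per separator convention.
    private static readonly Regex Strict = new Regex(
        @"^\d*$|^(?:\d{1,3}(?:\.\d{3})*(?:,\d{1,5})?)$|^(?:\d{1,3}(?:,\d{3})*(?:\.\d{1,5})?)$|^(?:\d{1,3}(?: \d{3})*(?:[,.]\d{1,5})?)$");

    // Simple pattern from the answer: separators may be mixed freely.
    private static readonly Regex Simple = new Regex(
        @"^\d{1,3}(?:[., ]\d{3})*(?:[.,]\d{1,5})?$");

    public static bool IsStrict(string s) => Strict.IsMatch(s);

    public static bool IsSimple(string s) => Simple.IsMatch(s);
}
```

Both methods should accept inputs such as 1.234.567,89 or 1 234 567.89, while the "dirty" input 1 032,354.12 should only pass IsSimple. Note that the strict pattern also accepts an empty string, because its first alternative is ^\d*$.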

Related

Extracting dollar prices and numbers with comma as thousand separator from PDF converted to text format

I am trying to redact some PDFs with dollar amounts using C#. Below is what I have tried:
@"/ (\d)(?= (?:\d{ 3})+(?:\.|$))| (\.\d\d ?)\d *$/ g"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"(?<=each)(((\d*[,|.]\d{2,3}))*)"
@"\d+\.\d{2}"
Here are some test cases that it needs to match
76,249.25
131,588.00
7.09
21.27
420.42
54.77
32.848
3,056.12
0.009
0.01
32.85
2,948.59
$99,249.25
$9.0000
$1,800.0000
$1,000,000
Here are some test cases that it should not target
666-257-6443
F1A 5G9
Bolt, Locating, M8 x 1.25 x 30 L
Precision Washer, 304 SS, 0.63 OD x 0.31
Flat Washer 300 Series SS; Pack of 50
U-SSFAN 0.63-L6.00-F0.75-B0.64-T0.38-SC5.62
U-CLBUM 0.63-D0.88-L0.875
U-WSSS 0.38-D0.88-T0.125
U-BGHK 6002ZZ - H1.50
U-SSCS 0.38-B0.38
6412K42
Std Dowel, 3/8" x 1-1/2" Lg, Steel
2019.07.05
2092-002.0180
SHCMG 0.25-L1.00
280160717
Please note the C# portion is interfacing with iText 7 pdfSweep.
Guid g = Guid.NewGuid(); // new Guid() would always produce the all-zero GUID
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
string guid = g.ToString();
string input = @"C:\Users\JM\Documents\pdftest\61882 _280011434 (1).pdf";
string output = @"C:\Users\JM\Documents\pdftest\61882 _2800011434 (1) x2" + guid + ".pdf";
string regex = @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$";
strategy.Add(new RegexBasedCleanupStrategy(regex));
PdfDocument pdf = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Please share your wisdom
You may use
\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?
Or, if the prices occur on whole lines:
^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$
See the regex demo
Bonus: To obtain only price values, you need to remove the ? after \$ to make it obligatory:
\$([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)
(I added a capturing group in case you need to access the number value separately from the $ char).
If you need to support any currency char, not just $, replace \$ with \p{Sc}.
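As a rough illustration of the capturing-group variant, the sketch below collects only the numeric part of every amount that is preceded by a currency symbol. It uses \p{Sc} instead of \$, as suggested above; the class and method names (PriceExtractor, Amounts) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PriceExtractor
{
    // \p{Sc} matches any Unicode currency symbol; group 1 holds the number only.
    private static readonly Regex Price = new Regex(
        @"\p{Sc}([0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?)");

    // Returns the numeric part of every currency-prefixed amount found in text.
    public static List<string> Amounts(string text)
    {
        var result = new List<string>();
        foreach (Match m in Price.Matches(text))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For a text like "Total $1,800.0000 or €99,249.25; call 666-257-6443" this should return 1,800.0000 and 99,249.25 and skip the phone number, because a currency symbol is mandatory in this variant.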
Details
^ - start of string
\$? - an optional dollar symbol
[0-9]{1,3} - one to three digits
(?:,[0-9]{3})* - any 0 or more repetitions of a comma and then three digits
(?:\.[0-9]+)? - an optional sequence of a dot and then any 1 or more digits
$ - end of string.
C# check for a match:
if (Regex.IsMatch(str, @"^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?$"))
{
    // there is a match
}
pdfSweep notice:
Apply the fix from this answer. The point is that the line breaks are lost when parsing the text. The regex you need then is
@"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$"
where (?m) makes ^ and $ match start/end of lines and \r? is required as $ only matches before LF, not before CRLF in .NET regex.
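The effect of (?m) and \r? can be checked outside of pdfSweep with a small sketch like the following. It only uses System.Text.RegularExpressions; the class and method names (LinePrices, Find) are assumptions made for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LinePrices
{
    // (?m) lets ^ and $ match at line starts/ends; \r? absorbs the CR of a CRLF line ending.
    private static readonly Regex WholeLinePrice = new Regex(
        @"(?m)^\$?[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?\r?$");

    // Returns every line that consists of a price only, without the trailing CR.
    public static string[] Find(string text) =>
        WholeLinePrice.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.TrimEnd('\r'))
            .ToArray();
}
```

Given the CRLF-separated lines 76,249.25, 666-257-6443, $9.0000, 2019.07.05 and 0.01, Find should return only 76,249.25, $9.0000 and 0.01. Without \r? in the pattern, the first and third lines would not match, because $ does not match in front of the CR.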

RegEx to find numbers sequence in string separated by space with predefined maximum length

Sorry for the confusing title; I'll try to explain this with an example. Currently we have this expression to find a number sequence in a string:
\b((\d[ ]{0,1}){13,19})\b
Now I'd like to modify it so it fulfills these rules:
- The length should be between 13 and 19 digits, excluding whitespace
- Each number cluster must have a minimum of 3 digits
The expression should mark these as matched:
1234567890123
1234 5678 9012 345
Not match:
123456789012 3
123 12 123 1 23134
The expression I currently have marks all of them as matches.
Example
This is possible using look-around.
The regex can be changed to the following:
\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)
This works by looking ahead to make sure the number is between 13 and 19 digits long. It then matches groups of 3 or more digits. Finally, once it has found all groups of 3 or more digits, it uses a negative lookahead to make sure there aren't any digits left. If there are, we've found a group smaller than 3. This works on the examples you've provided.
\b Makes sure that it's the start of a "word".
(?<!\d ) Makes sure there is no digit followed by a space directly behind.
(?=(?:\d ?){13,19}(?! ?\d)) Looks ahead to make sure the number is between 13 and 19 digits long
(?:\d ?){13,19} From the original. ?: added to make the group non-capturing
(?! ?\d) Negative lookahead: if there are still digits left after getting 19 digits, the number is too long, so the current match is discarded
(?:\d{3,} ?)+ Matches any number of clusters of 3 or more digits (the overall minimum of 13 and maximum of 19 digits are handled by the first lookahead)
\b(?! ?\d) Looks for the end of a cluster. If there are still digits left after the end of the cluster, there must be a cluster that is too small.
Test here
I suggest the following solution also based on lookarounds:
\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b
See the regex demo
The main point is that we only match the next digit if it is not a part of a 1- or 2-digit group.
Pattern explanation
\b - starting word boundary
\d(?!\d?\b) - a digit that is not followed by 1 or 0 digits and then a word boundary (that is, a 1- or 2-digit group such as 1 or 12 fails the match)
(?: ?\d(?!(?<= \d)\d?\b)){12,18} - 12 to 18 occurrences of:
a space followed by ? - 1 or 0 spaces
\d(?!(?<= \d)\d?\b) - any single digit, unless it is preceded by a space (the (?<= \d) lookbehind checks that) and followed by 1 or 0 digits and then a word boundary (the \d?\b part), that is, unless it starts a 1- or 2-digit group
\b - a trailing word boundary.
NOTE that in case you want to match these strings in a non-numeric context (that means, if you do not want to allow any digits on the left and on the right) you might also consider adding (?<!\d *) at the front and (?! *\d) at the end of the pattern.
Note that to match any whitespace, you may replace a literal space with \s in the pattern.
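Both lookaround patterns can be tried against the four sample strings from the question with a small sketch like this one. The patterns are copied verbatim from the two answers above; the class and method names (DigitSequence, MatchesClusterPattern, MatchesDigitPattern) are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class DigitSequence
{
    // First answer: look ahead for 13 to 19 digits, then consume clusters of 3 or more digits.
    private static readonly Regex ClusterPattern = new Regex(
        @"\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)");

    // Second answer: consume digit by digit, refusing to enter a 1- or 2-digit group.
    private static readonly Regex DigitPattern = new Regex(
        @"\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b");

    public static bool MatchesClusterPattern(string s) => ClusterPattern.IsMatch(s);

    public static bool MatchesDigitPattern(string s) => DigitPattern.IsMatch(s);
}
```

With either method, 1234567890123 and 1234 5678 9012 345 should be reported as matches, while 123456789012 3 and 123 12 123 1 23134 should not.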
If you can use LINQ, this will be way easier to maintain:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var myList = new List<string>
{
    "1234567890123",
    "1234 5678 9012 345",
    "123456789012 3",
    "123 12 123 1 23134"
};
foreach (var input in myList)
{
    var splitted = Regex.Split(input, @"\s+"); // Split on whitespace
    var length = splitted.Sum(x => x.Length); // Compute the total length
    var smallestGroupSize = splitted.Min(x => x.Length); // Compute the length of the smallest chunk
    Console.WriteLine($"Total length: {length}, smallest group size: {smallestGroupSize}");
    if (length < 13 || length > 19 || smallestGroupSize < 3)
    {
        Console.WriteLine($"Input '{input}' is incorrect{Environment.NewLine}");
        continue;
    }
    Console.WriteLine($"Input '{input}' is correct!{Environment.NewLine}");
}
which produces:
Total length: 13, smallest group size: 13
Input '1234567890123' is correct!
Total length: 15, smallest group size: 3
Input '1234 5678 9012 345' is correct!
Total length: 13, smallest group size: 1
Input '123456789012 3' is incorrect
Total length: 14, smallest group size: 1
Input '123 12 123 1 23134' is incorrect

Regex for validation of many types of number

I'm new to regex and I would like to know how to detect numbers with a regex in C# so that they are always displayed in the format #,###
Ex : 2 000,000 into 2,000
Ex : 15 000.000 into 15,000
Ex : 6.700 into 6,700
Ex : .3.3.3 into 0,300
These are some examples that I'm doing for validation
As the comments suggest, the question is not very clear.
To get your examples working, you can use e.g.
(?:(?<int>\d+)[ .,]?|[.,])
(?<frac>\d+)?
(?:[ .,]\d+)*
to match the "integer part" and the "fractional part" separated by ., , or a space (weird, but that is what I read out of your examples: since 15 000.000 => 15,000 and 6.700 => 6,700, I assume a comma separator everywhere).
I'm pretty sure I did not get it right! At least not entirely. The examples you provide look like numbers with different thousands separators, but there seems to be no system.
However, this is what you match with the regular expression above:
int | sep | frac | anything else
----+-----+------+--------------
2   | " " | 000  | ,000
15  | " " | 000  | .000
6   | .   | 700  |
    | .   | 3    | .3.3
In addition, it matches numbers without fractional part.
In Detail
(?:(?<int>\d+)[ .,]?|[.,])
Match one or more digits and store them in a group named int. Match an optional space, . or , thereafter.
OR
Match . or ,.
(?<frac>\d+)?
Optionally match the fraction part (one or more digits).
(?:[ .,]\d+)*
Match a space, . or , and one or more digits (repeat this zero or more times).
This last one is to prevent the last parts of e.g. .3.3.3 from matching in subsequent calls.
Next
Then you can use a MatchEvaluator function (here in the form of a delegate) to replace the values.
var rx = new Regex(@"
    (?:(?<int>\d+)[ .,]?|[.,])
    (?<frac>\d+)?
    (?:[ .,]\d+)*
    ",
    RegexOptions.IgnorePatternWhitespace
);
var deDE = new System.Globalization.CultureInfo("de-DE");
text = rx.Replace(text, delegate(Match match) {
    int integral;
    int fraction;
    // Number of digits in the fraction part, needed to scale it below 1
    int fraclen = match.Groups["frac"].Length;
    // A group that did not participate has an empty Value; TryParse then leaves 0
    int.TryParse(match.Groups["int"].Value, out integral);
    int.TryParse(match.Groups["frac"].Value, out fraction);
    var val = integral + fraction / Math.Pow(10, fraclen);
    return String.Format(deDE, "{0:0.000}", val);
});
The function is called for every match. Inside, I read out the groups, convert them into integers and then create the matched value with integral + fraction / Math.Pow(10, fraclen) (integral part + fraction part divided by 10^len, where len is the string length of the fraction part; thus "70" becomes 0.7 by calculating 70/10^2 == 70/100 == 0.7).
At the end, I return String.Format with CultureInfo de-DE. This is done because in Germany you use , as the decimal separator. There are other cultures that do this too, and there are many other ways to output such a number.
This is just an example.

Percentage Regex with comma

I have this RegEx for C# ASP.NET MVC3 Model validation:
[RegularExpression(@"[0-9]*\,?[0-9]?[0-9]")]
This works for almost all cases, except if the number is bigger than 100.
Any number greater than 100 should show error.
I already tried use [Range], but it doesn't work with commas.
Valid: 0 / 0,0 / 0,00 - 100 / 100,0 / 100,00.
Invalid (Number > 100).
I'm not sure whether zeros are the only digits allowed at the end, but:
# (?:100(?:,0{1,2})?|[0-9]{1,2}(?:,[0-9]{1,2})?)
(?:
     100
     (?: , 0{1,2} )?
  |
     [0-9]{1,2}
     (?: , [0-9]{1,2} )?
)
With zeros as the only option at the end:
# (?:100|[0-9]{1,2})(?:,0{1,2})?
(?:
     100
  |  [0-9]{1,2}
)
(?: , 0{1,2} )?
And the permutations for no leading zeros, except for zero itself:
# (?:100(?:,0{1,2})?|(?:0|[1-9][0-9]?)(?:,[0-9]{1,2})?)
(?:
     100
     (?: , 0{1,2} )?
  |
     (?:
          0
       |
          [1-9] [0-9]?
     )
     (?: , [0-9]{1,2} )?
)
# (?:100|0|[1-9][0-9]?)(?:,0{1,2})?
(?:
     100
  |
     0
  |
     [1-9] [0-9]?
)
(?: , 0{1,2} )?
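The patterns above are written without anchors, so for a standalone check with Regex.IsMatch they need to be wrapped in ^ and $. A minimal sketch, assuming the no-leading-zeros variant that allows arbitrary fraction digits below 100 (the class and method names PercentageCheck and IsValid are made up for illustration):

```csharp
using System.Text.RegularExpressions;

public static class PercentageCheck
{
    // No-leading-zeros pattern from above, anchored so the whole input must match.
    private static readonly Regex Percentage = new Regex(
        @"^(?:100(?:,0{1,2})?|(?:0|[1-9][0-9]?)(?:,[0-9]{1,2})?)$");

    public static bool IsValid(string s) => Percentage.IsMatch(s);
}
```

IsValid should accept values such as 0, 5, 99,99 and 100,00 and reject 01, 101, 100,01 and 100,000.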
Here's a RegEx that matches your criteria:
^(?:(?:[0-9]|[1-9][0-9])(?:,[0-9]{1,2})?|(?:100)(?:,0{1,2})?)$
(Given your use case, I have assumed that your character sequence appears by itself and is not embedded within other content. Please let me know if that is not the case.)
And here's a Perl program that demonstrates that RegEx on a sample data set. (Also see live demo.)
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    # A1 => An integer between 1 and 99, without leading zeros.
    #       (Although zero can appear by itself.)
    #
    # A2 => An optional fractional component that may contain no more
    #       than two digits.
    #
    # -OR-
    #
    # B1 => The integer 100.
    #
    # B2 => An optional fractional component following that may
    #       consist of one or two zeros only.
    #
    if (/^(?:(?:[0-9]|[1-9][0-9])(?:,[0-9]{1,2})?|(?:100)(?:,0{1,2})?)$/) {
    #        ^^^^^^^^^A1^^^^^^^^ ^^^^^^^A2^^^^^^  ^^B1^^ ^^^^^B2^^^^^
        print "* [$_]\n";
    } else {
        print " [$_]\n";
    }
}
__DATA__
0
01
11
99
100
101
0,0
0,00
01,00
0,000
99,00
99,99
100,0
100,00
100,000
100,01
100,99
101,00
Expected Output
* [0]
[01]
* [11]
* [99]
* [100]
[101]
* [0,0]
* [0,00]
[01,00]
[0,000]
* [99,00]
* [99,99]
* [100,0]
* [100,00]
[100,000]
[100,01]
[100,99]
[101,00]

regular expression for 2 string arguments having numeric values with range constraint

I need to validate console input arguments. The user can pass only 2 arguments separated by a space.
The first argument should be between 1 and 100.
The second argument should be between 1 and 750.
I need a regular expression to validate the input. Please help.
Description
This regex matches a number from 1 to 100, whitespace, and a number from 1 to 750:
^\b([1-9][0-9]?|100)\b\s+\b([1-9][0-9]?|[1-6][0-9]{2}|7[0-4][0-9]|750)\b$
Expanded
^ match the start of the string
\b match the word boundary
( open capture group 1
[1-9] match any single digit not including zero followed by
[0-9]? match any single digit or no digit
| or
100 match the number one hundred
) close the capture group 1
\b\s+\b require a word break, space, and word break.
( start capture group 2
[1-9] match any single digit not including zero followed by
[0-9]? match any single digit or no digit
| or
[1-6] match any digits 1 thru 6 followed by
[0-9]{2} match two of any digits
| or
7 match a seven followed by
[0-4] match digits 0 thru 4 followed by
[0-9] match any single digit
| or
750 match the number seven hundred and fifty
) close the capture group
\b$ require a word break and end of string.
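The expanded explanation above can be exercised with a few boundary values. The sketch below wraps the pattern from this answer unchanged; the class and method names (ArgumentCheck, IsValid) are assumptions for the example.

```csharp
using System.Text.RegularExpressions;

public static class ArgumentCheck
{
    // Group 1 accepts 1-100, group 2 accepts 1-750, separated by whitespace.
    private static readonly Regex TwoRanges = new Regex(
        @"^\b([1-9][0-9]?|100)\b\s+\b([1-9][0-9]?|[1-6][0-9]{2}|7[0-4][0-9]|750)\b$");

    public static bool IsValid(string input) => TwoRanges.IsMatch(input);
}
```

The boundary pairs "1 1" and "100 750" should pass, while "101 5", "5 751" and "0 10" should be rejected.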
It sounds like you want a pattern like this:
^([1-9]\d?|100)\s+([1-9]\d?|[1-6]\d\d|7[0-4]\d|750)$
However, you are probably better off verifying the inputs via normal integer comparison:
int int1, int2;
if (int.TryParse(param1, out int1) && int.TryParse(param2, out int2))
{
    if (int1 >= 1 && int1 <= 100 && int2 >= 1 && int2 <= 750)
    {
        ...
    }
}
As others have said, regex isn't the best option, but if you really want to use it, this seems to work...
^(?:100|[1-9]\d?) (?:750|7[0-4]\d|[1-6]\d{2}|[1-9]\d?)$
I would rather recommend not using regex, but something like this:
int a = 0, b = 0;
if (args.Length != 2) {
    // not 2 arguments
} else {
    if (!int.TryParse(args[0], out a) || !int.TryParse(args[1], out b)) {
        // not numbers
    } else {
        if (a < 1 || a > 100 || b < 1 || b > 750) {
            // out of ranges
        } else {
            // everything fine
        }
    }
}
and you'll have your numbers right there.
