Regex query to validate a currency string in C#?

I am in no way a Regex expert, which is why I am here. I currently have this:
\s?[^0-9]?\s?[0-9,]+([\\.]{1})?[0-9]+\s?
To explain my validation attempt: I am trying to validate that a string matches the correct formatting structure.
I only want to match strings such as:
£1.00
£10.00
£100.00
£1000.00
£10,000.00
£100,000.00
£1,234,546.00
Validation rules:
String must include a '£' at the start.
String should always have 2 digits following the decimal point.
Following the '£', only digits between 0-9 should be accepted.
If the amount has more than four digits before the decimal point (i.e. from £10,000.00 upwards), commas need to be entered at the appropriate points (e.g. £10,000.00, £100,000.00, £1,000,000.00, £10,000,000.00, etc.)
For example, strings that shouldn't be accepted:
£1
£10.000
£1,00.00
£1,000.00
10,000.00
£100,000,00
£1.234.546
Really hoping that one of you amazing people can help me out. If you need any more info, please let me know!

You can try the pattern below:
^£(?:[0-9]{1,4}|[0-9]{2,3},[0-9]{3}|[0-9]{1,3}(?:,[0-9]{3}){2,})\.[0-9]{2}$
Since the pence part \.[0-9]{2} is mandatory, we have 3 options for the pounds:
[0-9]{1,4} for sums in the range £0 .. £9999
[0-9]{2,3},[0-9]{3} for sums in the range £10,000 .. £999,999
[0-9]{1,3}(?:,[0-9]{3}){2,} for huge sums from £1,000,000 upwards
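The three alternatives can also be kept as named pieces and concatenated, which makes it easy to test or adjust each range on its own. A sketch, assuming a .NET console app with top-level statements; the constant names and IsValidAmount are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// The three pound-part alternatives from the answer as named pieces
// (the constant names are illustrative, not part of the answer).
const string UpTo9999 = "[0-9]{1,4}";                  // £0 .. £9999, no comma
const string TensOfThousands = "[0-9]{2,3},[0-9]{3}";  // £10,000 .. £999,999
const string Millions = "[0-9]{1,3}(?:,[0-9]{3}){2,}"; // £1,000,000 and up
const string Pence = @"\.[0-9]{2}";                    // mandatory pence

// Gluing the pieces together gives the same pattern as in the answer.
string pattern = "^£(?:" + UpTo9999 + "|" + TensOfThousands + "|" + Millions + ")" + Pence + "$";

bool IsValidAmount(string s) => Regex.IsMatch(s, pattern);

Console.WriteLine(pattern);
foreach (var amount in new[] { "£1000.00", "£10,000.00", "£1,234,546.00", "£1,000.00", "£10000.00" })
{
    Console.WriteLine($"{amount,-15} {IsValidAmount(amount)}");
}
```

A five-digit amount without a comma (£10000.00) is rejected because none of the three pieces accepts it.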
Demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
// ...
Regex regex = new Regex(
@"^£(?:[0-9]{1,4}|[0-9]{2,3},[0-9]{3}|[0-9]{1,3}(?:,[0-9]{3}){2,})\.[0-9]{2}$");
string[] tests = new string[] {
"£1.00",
"£10.00",
"£100.00",
"£1000.00",
"£10,000.00",
"£100,000.00",
"£1,234,546.00",
"£1",
"£10.000",
"£1,00.00",
"£1,000.00",
"10,000.00",
"£100,000,00",
"£1.234.546",
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"{test,15} : {(regex.IsMatch(test) ? "Match" : "Fail")}"));
Console.Write(report);
Outcome:
£1.00 : Match
£10.00 : Match
£100.00 : Match
£1000.00 : Match
£10,000.00 : Match
£100,000.00 : Match
£1,234,546.00 : Match
£1 : Fail
£10.000 : Fail
£1,00.00 : Fail
£1,000.00 : Fail
10,000.00 : Fail
£100,000,00 : Fail
£1.234.546 : Fail

What about this?
new Regex(@"£\d{1,3}(\,\d{3})*\.\d{2}\b");
Edit:
new Regex(@"£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}\b");
https://regex101.com/r/zSRw2B/1
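Unlike the first answer, this pattern has no ^ and $ anchors, and the trailing \b only requires a word boundary, so IsMatch also succeeds when a valid amount is embedded in a longer string. A small comparison sketch, assuming whole-string validation is what the question wants (Unanchored and Anchored are illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the edit above: no ^/$ anchors, so it can match part of a longer string.
const string Unanchored = @"£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}\b";
// Same alternatives, anchored so the whole string must be the amount.
const string Anchored = @"^£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}$";

foreach (var s in new[] { "£1,234,546.00", "x£1.00", "£1.00;" })
{
    Console.WriteLine($"{s,-15} unanchored: {Regex.IsMatch(s, Unanchored),-5} anchored: {Regex.IsMatch(s, Anchored)}");
}
```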

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have some regex code written in C# that basically adds a space between a number and a unit, with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code while keeping the same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of Replace, check whether capture group 2 exists. If it does, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capturing group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Because in .NET \d can also match digits from other scripts, \s can also match a newline, and the pattern can start matching in the middle of a word (e.g. the 10 in a10), a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
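A small sketch that runs both patterns from this answer through the same MatchEvaluator, on inputs where they disagree. It also shows that a number at the very end of a string, such as the 5 in 40 e-5, gets a trailing space with either pattern, which may or may not be wanted. Fix is an illustrative helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns from the answer above: the first version and the tighter one.
const string Original = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
const string Tighter = @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?";

// Same callback as the answer: keep the symbol if group 2 matched, else add a space.
string Fix(string input, string pattern) =>
    Regex.Replace(input, pattern,
        m => m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

foreach (var s in new[] { "a10mg", "10\n%", "40 e-5" })
{
    // Brackets make leading/trailing whitespace visible.
    Console.WriteLine($"[{Fix(s, Original)}] vs [{Fix(s, Tighter)}]");
}
```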
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number, like 123 or 1241.23
Group 3 (used as $3 in the replacement) - ((E|e|%|:)+)
One or more of the special symbols E, e, % or :
Groups 1 and 3 can be separated by any amount of whitespace.
If it's not working as you expect, please provide some samples to test.
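A small harness, assuming a .NET console app, that runs this one-liner over the question's sample inputs so the results can be compared with the expected list; the brackets in the output make trailing spaces visible (Apply is an illustrative name):

```csharp
using System;
using System.Text.RegularExpressions;

// The one-line replacement from this answer.
string Apply(string input) =>
    Regex.Replace(input, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", "$1$3 ");

string[] samples = { "10ANYUNIT", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" };
foreach (var s in samples)
{
    Console.WriteLine($"{s,-16} -> [{Apply(s)}]");
}
```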
For me it's too complex to be handled by just one regex, so I suggest splitting it into separate checks. In the code example below I used four different regexes; the first one is described in detail, and the rest can be deduced from that explanation.
using System;
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespaces again.
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It will match a lowercase or uppercase 'e', an optional (due to the ? operator) minus/plus sign, and one or more digits.
// Then match zero or more whitespaces.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Regex for ClassName.PropertyName

I don't know Regex, but I need a regex expression to validate ClassName.PropertyName.
I need to check that some values from appSettings comply with the ClassName.PropertyName convention.
"ClassName.PropertyName" - this is the only format that is valid, the rest below is invalid:
"Personnel.FirstName1" <- the only string that should match
"2Personnel.FirstName1"
"Personnel.33FirstName"
"Personnel..FirstName"
"Personnel.;FirstName"
"Personnel.FirstName."
"Personnel.FirstName "
" Personnel.FirstName"
" Personnel. FirstName"
" 23Personnel.3FirstName"
I have tried this (from the link posted as duplicate):
^\w+(.\w+)*$
but it doesn't work: I get false positives, e.g. 2Personnel.FirstName1 as well as Personnel.33FirstName pass the check when both should have been rejected.
Can someone help me with that?
Let's start with a single identifier:
Its first character must be a letter or an underscore
It can contain letters, underscores and digits
So the regular expression for an identifier is
[A-Za-z_][A-Za-z0-9_]*
Next, we chain identifiers with . (do not forget to escape the .): an identifier followed by zero or more . + identifier pairs:
^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$
If there must be exactly two identifiers (and not, say, abc.def.hi, which has three):
^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$
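If the class and property names are also needed separately, named groups in the exactly-two-identifiers pattern can capture them during validation. A sketch; the group names cls and prop and the TrySplit helper are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Exactly two identifiers; "cls" and "prop" are illustrative group names.
const string TwoPart = @"^(?<cls>[A-Za-z_][A-Za-z0-9_]*)\.(?<prop>[A-Za-z_][A-Za-z0-9_]*)$";

// Hypothetical helper: validates a setting and returns its two halves.
bool TrySplit(string setting, out string className, out string propertyName)
{
    Match m = Regex.Match(setting, TwoPart);
    className = m.Success ? m.Groups["cls"].Value : "";
    propertyName = m.Success ? m.Groups["prop"].Value : "";
    return m.Success;
}

foreach (var s in new[] { "Personnel.FirstName1", "abc.def.hi", "2Personnel.FirstName1" })
{
    Console.WriteLine(TrySplit(s, out var c, out var p) ? $"{s}: {c} / {p}" : $"{s}: invalid");
}
```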
Tests:
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] tests = new string[] {
"Personnel.FirstName1", // the only string that should be matched
"2Personnel.FirstName1",
"Personnel.33FirstName",
"Personnel..FirstName",
"Personnel.;FirstName",
"Personnel.FirstName.",
"Personnel.FirstName ",
" Personnel.FirstName",
" Personnel. FirstName",
" 23Personnel.3FirstName",
} ;
string pattern = @"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$";
var results = tests
.Select(test =>
$"{"\"" + test + "\"",-25} : {(Regex.IsMatch(test, pattern) ? "matched" : "failed")}");
Console.WriteLine(String.Join(Environment.NewLine, results));
Outcome:
"Personnel.FirstName1" : matched
"2Personnel.FirstName1" : failed
"Personnel.33FirstName" : failed
"Personnel..FirstName" : failed
"Personnel.;FirstName" : failed
"Personnel.FirstName." : failed
"Personnel.FirstName " : failed
" Personnel.FirstName" : failed
" Personnel. FirstName" : failed
" 23Personnel.3FirstName" : failed
Edit: If culture-specific names (like äöü.FirstName) should be accepted (see Rand Random's comments), then the [A-Za-z] range should be changed to \p{L}, i.e. any letter. The more exotic case of culture-specific digits (e.g. Persian ones: ۰۱۲۳۴۵۶۷۸۹) can be handled by changing 0-9 into \d.
// culture specific letters, but not digits
string pattern = @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";
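A quick comparison of the three variants discussed here: ASCII only, any letter with ASCII digits, and any letter with any decimal digit. The test strings are built with \u escapes to avoid source-encoding issues:

```csharp
using System;
using System.Text.RegularExpressions;

// ASCII-only letters and digits (the main answer's pattern).
const string Ascii = @"^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$";
// Any Unicode letter, ASCII digits only.
const string Letters = @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";
// Any Unicode letter and any Unicode decimal digit (0-9 replaced by \d).
const string LettersAndDigits = @"^[\p{L}_][\p{L}\d_]*(?:\.[\p{L}_][\p{L}\d_]*)*$";

string umlauts = "\u00E4\u00F6\u00FC.FirstName";    // äöü.FirstName
string persianDigit = "Personnel.FirstName\u06F1";  // ends with Persian digit one

foreach (var pattern in new[] { Ascii, Letters, LettersAndDigits })
{
    Console.WriteLine($"{Regex.IsMatch(umlauts, pattern),-5} {Regex.IsMatch(persianDigit, pattern)}");
}
```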
If each identifier should not exceed a certain length (say, 16), we should redesign the initial identifier pattern: a mandatory letter or underscore followed by {0,15} (that is, 0 to 16 - 1) letters, digits or underscores
[A-Za-z_][A-Za-z0-9_]{0,15}
And we get
string pattern = @"^[A-Za-z_][A-Za-z0-9_]{0,15}(?:\.[A-Za-z_][A-Za-z0-9_]{0,15})*$";
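To confirm the arithmetic, a sketch that builds identifiers of exactly 16 and 17 characters (the Identifier helper exists only for this test):

```csharp
using System;
using System.Text.RegularExpressions;

const string Limited = @"^[A-Za-z_][A-Za-z0-9_]{0,15}(?:\.[A-Za-z_][A-Za-z0-9_]{0,15})*$";

// Build an identifier of an exact length: one leading letter plus (length - 1) more.
string Identifier(int length) => "A" + new string('b', length - 1);

string sixteen = Identifier(16) + ".Name";
string seventeen = Identifier(17) + ".Name";

Console.WriteLine($"16 chars: {Regex.IsMatch(sixteen, Limited)}");
Console.WriteLine($"17 chars: {Regex.IsMatch(seventeen, Limited)}");
```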
^[A-Za-z]*\.[A-Za-z]*[0-9]$
or
^[A-Za-z]*\.[A-Za-z]*[0-9]+$
if you need more than one numerical character in the number suffix
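A quick check of both variants, assuming they are used exactly as written. Note that [A-Za-z]* also accepts an empty name (so .1 passes) and that the trailing digit is mandatory, which fits the question's single valid example but rejects a name like Personnel.FirstName:

```csharp
using System;
using System.Text.RegularExpressions;

const string OneDigit = @"^[A-Za-z]*\.[A-Za-z]*[0-9]$";
const string ManyDigits = @"^[A-Za-z]*\.[A-Za-z]*[0-9]+$";

foreach (var s in new[] { "Personnel.FirstName1", "Personnel.FirstName12", "Personnel.FirstName", ".1" })
{
    Console.WriteLine($"{s,-22} {Regex.IsMatch(s, OneDigit),-5} {Regex.IsMatch(s, ManyDigits)}");
}
```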

Need Regex expression to allow only either numbers or letters separated by comma and it should not allow alpha numeric

I need a regex that allows only numbers or letters separated by commas; it should not allow alphanumeric combinations (like "abc123").
Some examples:
Valid:
123,abc
abc,123
123,123
abc,abc
Invalid:
abc,abc123
abc133,abc
abc123,abc123
Since the valid and invalid examples have changed, I've rewritten my answer from scratch.
The suggested pattern is
^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$
Demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] tests = new string[] {
"123,abc",
"abc,123",
"123,123",
"abc,abc",
"abc,abc123",
"abc133,abc",
"abc123,abc123",
// More tests
"123abc", // invalid (digits first, then letters)
"123", // valid (one item)
"a,b,c,1,2,3", // valid (more than two items)
"1e4", // invalid (floating point number)
"1,,2", // invalid (empty part)
"-3", // invalid (minus sign)
"۱۲۳", // invalid (Persian digits)
"число" // invalid (Russian letters)
};
string pattern = @"^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$";
var report = string.Join(Environment.NewLine, tests
.Select(item => $"{item,-20} : {(Regex.IsMatch(item, pattern) ? "valid" : "invalid")}"));
Console.WriteLine(report);
Outcome:
123,abc : valid
abc,123 : valid
123,123 : valid
abc,abc : valid
abc,abc123 : invalid
abc133,abc : invalid
abc123,abc123 : invalid
123abc : invalid
123 : valid
a,b,c,1,2,3 : valid
1e4 : invalid
1,,2 : invalid
-3 : invalid
۱۲۳ : invalid
число : invalid
Pattern's explanation:
^ - string beginning (anchor)
([0-9]+)|([a-zA-Z]+) - either a group of digits (1+) or a group of letters (1+)
(,(([0-9]+)|([a-zA-Z]+)))* - followed by zero or more such groups, each preceded by a comma
$ - string ending (anchor)
If you use Dmitry Bychenko's regex with RegexOptions.IgnoreCase, you can shrink it down to Regex.IsMatch(test, @"^([0-9]+|[a-z]+)(,([0-9]+|[a-z]+))*$", RegexOptions.IgnoreCase)
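A sketch comparing Dmitry Bychenko's pattern with a shorter case-insensitive version, ^([0-9]+|[a-z]+)(,([0-9]+|[a-z]+))*$ with RegexOptions.IgnoreCase, on a subset of the answer's tests plus one mixed-case example (the constant names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Dmitry Bychenko's pattern, with explicit upper- and lowercase ranges.
const string Explicit = @"^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$";
// Same structure, lowercase range only, relying on RegexOptions.IgnoreCase.
const string Shorter = @"^([0-9]+|[a-z]+)(,([0-9]+|[a-z]+))*$";

string[] tests = { "123,abc", "abc,123", "ABC,Def", "abc,abc123", "123abc", "1,,2", "a,b,c,1,2,3" };

bool[] explicitResults = tests.Select(t => Regex.IsMatch(t, Explicit)).ToArray();
bool[] shorterResults = tests.Select(t => Regex.IsMatch(t, Shorter, RegexOptions.IgnoreCase)).ToArray();

for (int i = 0; i < tests.Length; i++)
{
    Console.WriteLine($"{tests[i],-12} {explicitResults[i],-5} {shorterResults[i]}");
}
```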
An alternative way to check it without regex (performs worse):
using System;
using System.Linq;
public class Program1
{
public static void Main()
{
var mydata = new[] {"1,3,4,5,1,3,a,s,r,3", "2, 4 , a", " 2,3,as"};
// function that checks it - performs worse than the regex, because an internal string array
// is built and analyzed
Func<string,bool> isValid =
data => data.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.All(aChar => aChar.Length == 1 && char.IsLetterOrDigit(aChar[0]));
foreach (var d in mydata)
{
Console.WriteLine(string.Format("{0} => is {1}",d, isValid(d) ? "Valid" : "Invalid"));
}
}
}
Output:
1,3,4,5,1,3,a,s,r,3 => is Valid
2, 4 , a => is Valid
2,3,as => is Invalid
To match words separated by commas, where the words consist either of digits or of letters:
^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$
Explanation
\d+ matches a string of at least one digit.
[a-zA-Z]+ matches a string of at least one upper or lower case letter.
(\d+|[a-zA-Z]+) matches either a string of digits or a string of letters.
C#
Regex regex = new Regex(@"^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$");
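One difference from the [0-9] version: in .NET, \d matches any Unicode decimal digit, so this pattern accepts the Persian ۱۲۳ that Dmitry Bychenko's demo lists as invalid. Writing [0-9], or passing RegexOptions.ECMAScript, restricts it to ASCII digits. A sketch (constant names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// This answer's pattern uses \d; Dmitry Bychenko's uses [0-9].
const string WithD = @"^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$";
const string WithRange = @"^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$";

string persian = "\u06F1\u06F2\u06F3"; // ۱۲۳, Persian digits

Console.WriteLine($"\\d:              {Regex.IsMatch(persian, WithD)}");
Console.WriteLine($"[0-9]:           {Regex.IsMatch(persian, WithRange)}");
Console.WriteLine($"\\d + ECMAScript: {Regex.IsMatch(persian, WithD, RegexOptions.ECMAScript)}");
```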

Regex to not match a certain sequence

I have a text file as below:
1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..
So I want a regex that gets 4 matches in this case, one for each point. My regex doesn't work as I wish. Please advise:
private readonly Regex _reactionRegex = new Regex(@"(\d+)\.(\d+)\s*-\s*(.+)", RegexOptions.Compiled | RegexOptions.Singleline);
Even this regex isn't very helpful:
(\d+)\.(\d+)\s*-\s*(.+)(?<!\d+\.\d+)
Alex, this regex will do it:
(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)
This assumes that you want to capture the point without the numbers, for instance just Hello.
If you want to also capture the digits, for instance 1.1 - Hello, you can use the same regex and display the entire match, not just Group 1. The online demo below will show you both.
How does it work?
The idea is to capture the text you want into Group 1 using (parentheses).
We match in multi-line mode m to allow the anchor ^ to work on each line.
We match in dotall mode s to allow the dot to eat up strings on multiple lines.
We use a negative lookahead (?! to stop eating characters when what follows is the beginning of a line with your digit marker.
Here is full working code and an online demo.
using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
class Program {
static void Main() {
string yourstring = @"1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..";
var resultList = new StringCollection();
try {
var yourRegex = new Regex(@"(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)");
Match matchResult = yourRegex.Match(yourstring);
while (matchResult.Success) {
resultList.Add(matchResult.Groups[1].Value);
Console.WriteLine("Whole Match: " + matchResult.Value);
Console.WriteLine("Group 1: " + matchResult.Groups[1].Value + "\n");
matchResult = matchResult.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Console.WriteLine("\nPress Any Key to Exit.");
Console.ReadKey();
} // END Main
} // END Program
This may do what you're looking for, though there is some ambiguity in the expected result.
(\d+)\.(\d+)\s*-\s*(.+?)(\n)(?>\d|$)
For example, what would you expect to match if the data looked like this:
1.1 - Hello
1.2 - world!
2.1 - Some
data here and it contains some
32 digits so i cannot use \D+
2.2 - Etc..
It is not clear whether 32 here starts a new record or not.
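To see which reading the first answer's pattern takes on this ambiguous data, a small check (assuming \n line endings): a new record has to start with digits, a dot and digits at the beginning of a line, so the line starting with 32 stays part of record 2.1.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The ambiguous sample from above, with explicit "\n" line endings.
string data = "1.1 - Hello\n1.2 - world!\n2.1 - Some\ndata here and it contains some\n32 digits so i cannot use \\D+\n2.2 - Etc..";

// Pattern from the first answer: a record runs until a line starts with digits.digits.
var recordRegex = new Regex(@"(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)");

string[] bodies = recordRegex.Matches(data).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
foreach (string body in bodies)
{
    Console.WriteLine($"[{body}]");
}
```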

Regex match 2 out of 4 groups

I want a single regex expression that requires characters from at least 2 of 4 groups: lowercase letters, uppercase letters, digits or special characters. The length also needs to be greater than 7.
I currently have this expression
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$
However, it forces the string to have a lowercase letter, an uppercase letter, and a digit or special character.
I currently have this implemented using 4 different regex expressions that I interrogate with some C# code.
I plan to reuse the same expression in JavaScript.
This is a sample console app that shows the difference between the 2 approaches.
using System;
using System.Linq;
using System.Text.RegularExpressions;
class Program
{
private static readonly Regex[] Regexs = new[] {
new Regex("[a-z]", RegexOptions.Compiled), //Lowercase Letter
new Regex("[A-Z]", RegexOptions.Compiled), // Uppercase Letter
new Regex(@"\d", RegexOptions.Compiled), // Numeric
new Regex(@"[^a-zA-Z\d\s:]", RegexOptions.Compiled) // Non AlphaNumeric
};
static void Main(string[] args)
{
Regex expression = new Regex(@"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$", RegexOptions.ECMAScript | RegexOptions.Compiled);
string[] testCases = new[] { "P#ssword", "Password", "P2ssword", "xpo123", "xpo123!", "xpo123!123##", "Myxpo123!123##", "Something_Really_Complex123!#43#2*333" };
Console.WriteLine("{0}\t{1}\t", "Single", "C# Hack");
Console.WriteLine("");
foreach (var testCase in testCases)
{
Console.WriteLine("{0}\t{2}\t : {1}", expression.IsMatch(testCase), testCase,
(testCase.Length >= 8 && Regexs.Count(x => x.IsMatch(testCase)) >= 2));
}
Console.ReadKey();
}
}
Result Proper Test String
------- ------- ------------
True True : P#ssword
False True : Password
True True : P2ssword
False False : xpo123
False False : xpo123!
False True : xpo123!123##
True True : Myxpo123!123##
True True : Something_Really_Complex123!#43#2*333
For JavaScript you can use this pattern, which looks for boundaries between different character classes:
^(?=.*(?:.\b.|[a-zA-Z]\d|\d[a-zA-Z]|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$
If a boundary is found, you can be sure there are characters from two different classes.
Pattern details:
\b      # a zero-width assertion: the boundary between a member of
        # the \w class and another character that is not in this class.
.\b.    # the two characters on either side of the word boundary.
Boundary between a letter and a digit:
(?:
    [a-zA-Z]\d    # a letter followed by a digit
  |               # OR
    \d[a-zA-Z]    # a digit followed by a letter
)
Boundary between an uppercase and a lowercase letter:
[a-z][A-Z] | [A-Z][a-z]
Since every alternative contains two characters from two different character classes, you can be sure to get the result you want.
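A quick C# check of this boundary idea, assuming .NET's Regex, which accepts the same syntax here; the letter/digit alternative is written with explicit [a-zA-Z] ranges, which JavaScript also accepts. Digits and special characters count as two different classes, so 12345678! passes, as it does in the question's four-regex check:

```csharp
using System;
using System.Text.RegularExpressions;

// Boundary-based check: the lookahead needs two adjacent characters from different classes,
// the main part needs 8+ characters that are neither ':' nor whitespace.
var twoClasses = new Regex(@"^(?=.*(?:.\b.|[a-zA-Z]\d|\d[a-zA-Z]|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$");

string[] candidates = { "P#ssword", "Password", "password", "xpo123!", "xpo123!123##", "12345678!", "#$%^&*()!" };
foreach (var c in candidates)
{
    Console.WriteLine($"{twoClasses.IsMatch(c),-5} : {c}");
}
```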
You could use possessive quantifiers (emulated using atomic groups), something like this:
((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}
Since using possessive matching will prevent backtracking, you won't run into the two groups being two consecutive groups of lowercase letters, for instance. So the full regex would be something like:
^(?=.*((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}).{8,}$
Though, were it me, I'd cut the lookahead, just use the expression ((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}, and check the length separately.
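A sketch of that last suggestion in C#, with the length checked in plain code instead of a lookahead (IsAcceptable is a hypothetical name). Two caveats: [^a-zA-Z] lumps digits and special characters into one class, so 12345678! counts as a single run here although the question's four-regex check accepts it, and JavaScript regexes do not support atomic groups, so this pattern only helps on the .NET side.

```csharp
using System;
using System.Text.RegularExpressions;

// Two or more adjacent runs from different classes; the atomic groups stop a run
// such as "pass" from being split into two runs of the same class.
var twoClasses = new Regex(@"((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}");

// Length is checked in plain C#, as the answer suggests.
bool IsAcceptable(string password) => password.Length >= 8 && twoClasses.IsMatch(password);

string[] testCases = { "P#ssword", "Password", "P2ssword", "xpo123", "xpo123!",
                       "xpo123!123##", "Myxpo123!123##", "Something_Really_Complex123!#43#2*333", "12345678!" };
foreach (string t in testCases)
{
    Console.WriteLine($"{IsAcceptable(t),-5} : {t}");
}
```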
