I don't know Regex,
but I need a regular expression that validates the ClassName.PropertyName format.
I need to check that some values from appSettings comply with the ClassName.PropertyName convention.
"ClassName.PropertyName" - this is the only format that is valid, the rest below is invalid:
"Personnel.FirstName1" <- the only string that should match
"2Personnel.FirstName1"
"Personnel.33FirstName"
"Personnel..FirstName"
"Personnel.;FirstName"
"Personnel.FirstName."
"Personnel.FirstName "
" Personnel.FirstName"
" Personnel. FirstName"
" 23Personnel.3FirstName"
I have tried this (from the link posted as duplicate):
^\w+(.\w+)*$
but it doesn't work: I have false positives, e.g. 2Personnel.FirstName1 as well as Personnel.33FirstName passes the check when both should have been rejected.
Can someone help me with that?
Let's start with a single identifier:
Its first character must be a letter or an underscore
It can contain letters, underscores and digits
So the regular expression for an identifier is
[A-Za-z_][A-Za-z0-9_]*
Next, we chain identifiers with . (do not forget to escape it as \.): an identifier followed by zero or more groups of . + identifier:
^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$
In case it must be exactly two identifiers (and not, say, abc.def.hi, which has three):
^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$
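To make the difference between the two patterns concrete, here is a minimal sketch that builds both from the same identifier sub-pattern. The class and member names (IdentifierPatterns, Chain, Pair) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdentifierPatterns
{
    // One identifier: a letter or underscore, then letters, digits or underscores.
    private const string Identifier = "[A-Za-z_][A-Za-z0-9_]*";

    // One or more identifiers joined by dots (abc, abc.def, abc.def.hi, ...).
    public static readonly Regex Chain =
        new Regex("^" + Identifier + @"(?:\." + Identifier + ")*$");

    // Exactly two identifiers joined by a single dot (ClassName.PropertyName).
    public static readonly Regex Pair =
        new Regex("^" + Identifier + @"\." + Identifier + "$");

    public static void Main()
    {
        foreach (var s in new[] { "Personnel", "Personnel.FirstName1", "abc.def.hi" })
        {
            Console.WriteLine($"{s,-22} chain: {Chain.IsMatch(s),-5} pair: {Pair.IsMatch(s)}");
        }
    }
}
```

One thing to keep in mind: in .NET, $ also matches just before a trailing newline, so "Personnel.FirstName\n" would pass. If that matters for your appSettings values, replacing $ with \z is a stricter option.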
Tests:
string[] tests = new string[] {
"Personnel.FirstName1", // the only string that should be matched
"2Personnel.FirstName1",
"Personnel.33FirstName",
"Personnel..FirstName",
"Personnel.;FirstName",
"Personnel.FirstName.",
"Personnel.FirstName ",
" Personnel.FirstName",
" Personnel. FirstName",
" 23Personnel.3FirstName",
} ;
string pattern = @"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$";
var results = tests
.Select(test =>
$"{"\"" + test + "\"",-25} : {(Regex.IsMatch(test, pattern) ? "matched" : "failed")}");
Console.WriteLine(String.Join(Environment.NewLine, results));
Outcome:
"Personnel.FirstName1" : matched
"2Personnel.FirstName1" : failed
"Personnel.33FirstName" : failed
"Personnel..FirstName" : failed
"Personnel.;FirstName" : failed
"Personnel.FirstName." : failed
"Personnel.FirstName " : failed
" Personnel.FirstName" : failed
" Personnel. FirstName" : failed
" 23Personnel.3FirstName" : failed
Edit: If culture-specific names (like äöü.FirstName) should be accepted (see Rand Random's comments), the [A-Za-z] range should be changed into \p{L}, which matches any letter. The more exotic possibility of culture-specific digits (e.g. Persian ones, ۰۱۲۳۴۵۶۷۸۹) can be handled by changing 0-9 into \d
// culture-specific letters, but not digits
string pattern = @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";
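As a quick illustration of what the \p{L} change buys you, the sketch below runs the ASCII-only and the \p{L} variants side by side. The class name and the sample strings other than äöü.FirstName are assumptions chosen for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeIdentifierDemo
{
    // ASCII-only letters.
    public const string AsciiPattern =
        @"^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$";

    // Any Unicode letter (\p{L}), but still only ASCII digits.
    public const string LetterPattern =
        @"^[\p{L}_][\p{L}0-9_]*(?:\.[\p{L}_][\p{L}0-9_]*)*$";

    public static void Main()
    {
        foreach (var s in new[] { "Personnel.FirstName1", "äöü.FirstName", "2äöü.FirstName" })
        {
            bool ascii = Regex.IsMatch(s, AsciiPattern);
            bool letters = Regex.IsMatch(s, LetterPattern);
            Console.WriteLine($"{s,-22} ascii: {ascii,-5} letters: {letters}");
        }
    }
}
```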
If each identifier should not exceed a certain length (say, 16), we should redesign the initial identifier pattern: a mandatory letter or underscore followed by [0..16-1] == {0,15} letters, digits or underscores
[A-Za-z_][A-Za-z0-9_]{0,15}
And we have
string pattern = @"^[A-Za-z_][A-Za-z0-9_]{0,15}(?:\.[A-Za-z_][A-Za-z0-9_]{0,15})*$";
^[A-Za-z]*\.[A-Za-z]*[0-9]$
or
^[A-Za-z]*\.[A-Za-z]*[0-9]+$
if you need more than one numerical character in the number suffix
Related
I try to filter some strings I streamed for some useful information in C#.
I got two possible string structures:
string examplestring1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
string examplestring2 = "e ne 570 - 590 ne 5390 - 5410 ne 11380 - 11390 e";
I'd like to get an array or a List of strings in the format of "xxx - xxx". Like:
string[] example = new string[]{"570 - 590","5390 - 5410","11380 - 11390"};
I tried to use Regex:
List<string> numbers = new List<string>();
numbers.AddRange(Regex.Split(examplestring2, @"\D+"));
At least I get a list containing only the numbers. But that doesn't work for examplestring1, since there is a date in it.
I also tried to play around with the regex pattern, but things like the following do not work.
Regex.Split(examplestring1, @"\D+" + " - " + @"\D+");
I'd be grateful for a solution or at least some hint how to solve that matter.
You can use
var results = Regex.Matches(text, @"\d+\s*-\s*\d+").Cast<Match>().Select(x => x.Value);
If there must be a single regular space on both sides of the -, you can use the \d+ - \d+ regex.
If you want to match any kind of dash, you can use [\p{Pd}\xAD] instead of -.
Note that \d in .NET matches any Unicode digit. To match only ASCII digits, use the RegexOptions.ECMAScript option: Regex.Matches(text, @"\d+\s*-\s*\d+", RegexOptions.ECMAScript).
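Putting this together with the two sample strings from the question, a minimal sketch could look like the following. The RangeExtractor class and the ExtractRanges helper are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeExtractor
{
    // Returns every "digits - digits" fragment found in the text.
    // RegexOptions.ECMAScript restricts \d to ASCII digits.
    public static string[] ExtractRanges(string text) =>
        Regex.Matches(text, @"\d+\s*-\s*\d+", RegexOptions.ECMAScript)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string examplestring1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
        string examplestring2 = "e ne 570 - 590 ne 5390 - 5410 ne 11380 - 11390 e";

        Console.WriteLine(string.Join(" | ", ExtractRanges(examplestring1)));
        Console.WriteLine(string.Join(" | ", ExtractRanges(examplestring2)));
    }
}
```

The dates such as 18.12.20 are skipped because they contain no hyphen between two digit runs.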
I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of replace, check if capture group 2 exists. If it does, use it in the replacement, else add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d can also match digits from other scripts, \s can also match a newline, and the match might start in the middle of a word, so a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
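Here is a small sketch that wires the more precise pattern into the same replacement callback. The DosageFormatter class and Normalize method are illustrative names, and the final TrimEnd() is an addition of this sketch: a bare number at the very end of the input (for example the 5 in e-5) has no group 2, so the callback appends a space there that would otherwise be left trailing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DosageFormatter
{
    // Group 1: the number. Group 2 (optional): one of % : e E after optional spaces/tabs.
    private static readonly Regex NumberWithSuffix =
        new Regex(@"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?");

    public static string Normalize(string input) =>
        NumberWithSuffix.Replace(input, m =>
            // Glue the special symbol to the number, otherwise add a single space.
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "))
        .TrimEnd(); // assumption of this sketch: trailing whitespace is unwanted

    public static void Main()
    {
        foreach (var s in new[] { "10ANYUNIT", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" })
        {
            Console.WriteLine($"'{s}' -> '{Normalize(s)}'");
        }
    }
}
```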
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 3 - ((E|e|%|:)+)
Any of the special symbols like E e % :
Group 2, (\.\d*)?, is the optional fractional part nested inside group 1, which is why the replacement uses $3.
Group 1 and Group 3 can be separated by any amount of whitespace.
If it's not working as you expect, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespace characters \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespace characters.
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It matches a lower- or uppercase 'e' with an optional (due to the ? operator) minus/plus sign and one or more digits.
// Then match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'
I am in no way a master of regex, which is why I am here. I currently have this:
\s?[^0-9]?\s?[0-9,]+([\\.]{1})?[0-9]+\s?
To explain my validation attempt I am trying to validate that a string matches the correct formatting structure.
I only want to match strings such as:
£1.00
£10.00
£100.00
£1000.00
£10,000.00
£100,000.00
£1,234,546.00
Validation rules:
String must include a '£' at the start.
String should always have 2 digits following a decimal place.
Following the '£' only digits between 0-9 should be accepted
If the string length is greater than 6 (after £1000.00) then commas need to be entered at the appropriate points (I.E. £10,000.00 - £100,000.00 - £1,000,000.00 - £10,000,000.00 etc)
For example, strings that shouldn't be accepted:
£1
£10.000
£1,00.00
£1,000.00
10,000.00
£100,000,00
£1.234.546
Really hoping that one of you amazing people can help me out, if you need anymore info then please let me know!
You can try the pattern below:
^£(?:[0-9]{1,4}|[0-9]{2,3},[0-9]{3}|[0-9]{1,3}(?:,[0-9]{3}){2,})\.[0-9]{2}$
Since pence are mandatory (\.[0-9]{2}), we have 3 options for pounds:
[0-9]{1,4} for sums in range £0 .. £9999
[0-9]{2,3},[0-9]{3} for sums in range £10,000 .. £999,999
[0-9]{1,3}(?:,[0-9]{3}){2,} for huge sums £1,000,000 ..
Demo:
using System.Linq;
using System.Text.RegularExpressions;
...
Regex regex = new Regex(
@"^£(?:[0-9]{1,4}|[0-9]{2,3},[0-9]{3}|[0-9]{1,3}(?:,[0-9]{3}){2,})\.[0-9]{2}$");
string[] tests = new string[] {
"£1.00",
"£10.00",
"£100.00",
"£1000.00",
"£10,000.00",
"£100,000.00",
"£1,234,546.00",
"£1",
"£10.000",
"£1,00.00",
"£1,000.00",
"10,000.00",
"£100,000,00",
"£1.234.546",
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"{test,15} : {(regex.IsMatch(test) ? "Match" : "Fail")}"));
Console.Write(report);
Outcome:
£1.00 : Match
£10.00 : Match
£100.00 : Match
£1000.00 : Match
£10,000.00 : Match
£100,000.00 : Match
£1,234,546.00 : Match
£1 : Fail
£10.000 : Fail
£1,00.00 : Fail
£1,000.00 : Fail
10,000.00 : Fail
£100,000,00 : Fail
£1.234.546 : Fail
What about this?
new Regex(@"£\d{1,3}(\,\d{3})*\.\d{2}\b");
Edit:
new Regex(@"£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}\b");
https://regex101.com/r/zSRw2B/1
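One caveat when using this for validation rather than searching: the pattern has no ^ and $ anchors, so IsMatch succeeds whenever a valid amount appears anywhere in the input. The sketch below contrasts the pattern as posted with an anchored variant. The anchored version and the CurrencyCheck names are assumptions of this sketch, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyCheck
{
    // The edited pattern exactly as posted: finds an amount anywhere in the text.
    public static readonly Regex Search =
        new Regex(@"£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}\b");

    // Same alternatives, anchored so the whole string must be an amount.
    public static readonly Regex Whole =
        new Regex(@"^£((\d{1,4})|(\d{2,3}(\,\d{3})+)|(\d(\,\d{3}){2,}))\.\d{2}$");

    public static void Main()
    {
        foreach (var s in new[] { "£10,000.00", "£1,000.00", "Total: £1.00 due" })
        {
            Console.WriteLine($"{s,-18} search: {Search.IsMatch(s),-5} whole: {Whole.IsMatch(s)}");
        }
    }
}
```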
Need Regex expression to allow only either numbers or letters separated by comma and it should not allow alpha numeric combinations (like "abc123").
Some examples:
Valid:
123,abc
abc,123
123,123
abc,abc
Invalid:
abc,abc123
abc133,abc
abc123,abc123
Since valid and invalid are changed, I've rewritten my answer from scratch.
The suggested pattern is
^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$
Demo:
string[] tests = new string[] {
"123,abc",
"abc,123",
"123,123",
"abc,abc",
"abc,abc123",
"abc133,abc",
"abc123,abc123",
// More tests
"123abc", // invalid (digits first, then letters)
"123", // valid (one item)
"a,b,c,1,2,3", // valid (more than two items)
"1e4", // invalid (floating point number)
"1,,2", // invalid (empty part)
"-3", // invalid (minus sign)
"۱۲۳", // invalid (Persian digits)
"число" // invalid (Russian letters)
};
string pattern = @"^(([0-9]+)|([a-zA-Z]+))(,(([0-9]+)|([a-zA-Z]+)))*$";
var report = string.Join(Environment.NewLine, tests
.Select(item => $"{item,-20} : {(Regex.IsMatch(item, pattern) ? "valid" : "invalid")}"));
Console.WriteLine(report);
Outcome:
123,abc : valid
abc,123 : valid
123,123 : valid
abc,abc : valid
abc,abc123 : invalid
abc133,abc : invalid
abc123,abc123 : invalid
123abc : invalid
123 : valid
a,b,c,1,2,3 : valid
1e4 : invalid
1,,2 : invalid
-3 : invalid
۱۲۳ : invalid
число : invalid
Pattern's explanation:
^ - string beginning (anchor)
([0-9]+)|([a-zA-Z]+) - either group of digits (1+) or group of letters
(,(([0-9]+)|([a-zA-Z]+)))* - followed by zero or more such groups, each preceded by a comma
$ - string ending (anchor)
If you specify Dmitry Bychenko's regex with RegexOptions.IgnoreCase, you can shrink it down to Regex.IsMatch(test, @"^[0-9a-z](,[0-9a-z])*$", RegexOptions.IgnoreCase). Note that this shortened pattern accepts only single-character items.
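A short sketch of how the shortened pattern behaves next to a multi-character variant that also uses RegexOptions.IgnoreCase. The second pattern and the ItemListPatterns names are assumptions made for this comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ItemListPatterns
{
    // Shortened pattern from the text above: every item is exactly one letter or digit.
    public static bool IsSingleCharList(string s) =>
        Regex.IsMatch(s, @"^[0-9a-z](,[0-9a-z])*$", RegexOptions.IgnoreCase);

    // Multi-character variant: every item is all digits or all letters.
    public static bool IsWordList(string s) =>
        Regex.IsMatch(s, @"^([0-9]+|[a-z]+)(,([0-9]+|[a-z]+))*$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        foreach (var s in new[] { "1,a,B", "123,abc", "abc,abc123" })
        {
            Console.WriteLine($"{s,-12} single: {IsSingleCharList(s),-5} words: {IsWordList(s)}");
        }
    }
}
```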
Alternate way to check it w/o regex (performs worse):
using System;
using System.Linq;
public class Program1
{
public static void Main()
{
var mydata = new[] {"1,3,4,5,1,3,a,s,r,3", "2, 4 , a", " 2,3,as"};
// Function that checks it; performs worse than the regex because an internal string array
// is built and analyzed
Func<string,bool> isValid =
data => data.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.All(aChar => aChar.Length == 1 && char.IsLetterOrDigit(aChar[0]));
foreach (var d in mydata)
{
Console.WriteLine(string.Format("{0} => is {1}",d, isValid(d) ? "Valid" : "Invalid"));
}
}
}
Output:
1,3,4,5,1,3,a,s,r,3 => is Valid
2, 4 , a => is Valid
2,3,as => is Invalid
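The check above works on single-character items. For the multi-character rule from the question (each item is either all digits or all letters), a regex-free sketch might look like this. ItemListChecker and its helpers are hypothetical names, and unlike the code above it does not trim spaces around items.

```csharp
using System;
using System.Linq;

public static class ItemListChecker
{
    // ASCII-only checks, so Persian digits or Cyrillic letters are rejected.
    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    private static bool IsAsciiLetter(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // Valid when every comma-separated item is non-empty and is
    // made up entirely of digits or entirely of letters.
    public static bool IsValid(string data) =>
        data.Split(',').All(item =>
            item.Length > 0 &&
            (item.All(IsAsciiDigit) || item.All(IsAsciiLetter)));

    public static void Main()
    {
        foreach (var s in new[] { "123,abc", "abc,abc123", "1,,2", "123" })
        {
            Console.WriteLine($"{s} => is {(IsValid(s) ? "Valid" : "Invalid")}");
        }
    }
}
```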
To match words separated by commas, where the words consist either of digits or of letters:
^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$
Explanation
\d+ matches a string of at least one digit.
[a-zA-Z]+ matches a string of at least one upper or lower case letter.
(\d+|[a-zA-Z]+) matches either a string of digits or a string of letters.
C#
Regex regex = new Regex(@"^(\d+|[a-zA-Z]+)(,(\d+|[a-zA-Z]+))*$");
Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
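The whitespace-removing lambda mentioned above could be sketched roughly as follows. AccountMasker, MaskTail and Mask are made-up names. One extra assumption of this sketch: if the match begins with whitespace (the pattern's \s? can capture the space in front of the first word), that whitespace is put back so the surrounding words stay separated.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AccountMasker
{
    private const string Pattern = @"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+";

    public static string MaskTail(string matched)
    {
        // Remember leading whitespace so it can be restored in front of the mask.
        string leading = matched.Substring(0, matched.Length - matched.TrimStart().Length);

        // Drop every whitespace character so the kept tail has no gaps.
        string compact = new string(matched.Where(c => !char.IsWhiteSpace(c)).ToArray());

        // Keep at most the last four characters.
        string tail = compact.Substring(Math.Max(0, compact.Length - 4));
        return leading + "xxxx" + tail;
    }

    public static string Mask(string input) =>
        Regex.Replace(input, Pattern, m => MaskTail(m.Value));

    public static void Main()
    {
        foreach (var s in new[] { "123456789", "111 2233 33", "this is a a1234 b5678 test string" })
        {
            Console.WriteLine(Mask(s));
        }
    }
}
```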
I don't think that regex is the best way to solve this problem, which is why I am posting this answer. For situations this complex, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below these lines delivers the exact functionality you are after, it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
if (word.Any(char.IsDigit))
{
previousNum = true;
tempOutput = tempOutput + word;
}
else
{
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
tempOutput = ""; // reset so the next run of digit-bearing words starts fresh
}
output = output + " " + word;
}
}
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
}
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now, as for the C# code, you'll need to check each match to see whether there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected).
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));