Validate String With Regex - c#

I'm trying to validate a string with this regex
var regexAgencia = new Regex("^(?!0000)([0-9]{4})");
var result = regexAgencia.IsMatch(agencia);
Valid Options:
N-X
NN-X
NNN-X
NNNN-X
N
NN
NNN
NNNN
Invalid Options:
0-X
00-X
000-X
0000-X
0
00
000
0000
Where N is any number 0-9 and X can be X or 0-9
When I validate "014777417", the regex returns true.
I need help writing a regex that validates this string according to these rules.

This should do it for you:
^(?=\d*[1-9])\d{1,4}(?:-[X\d])?$
It starts with a positive look ahead to ensure a digit other than zero is present ((?=\d*[1-9])). Thereafter it matches 1-4 digits, optionally followed by a hyphen and a digit or X.
See it here at regex101.
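To try the pattern against the cases from the question, a small helper along these lines could be used. This is only a sketch: the class and method names (`AgenciaValidator`, `IsValid`) are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from the answer above.
public static class AgenciaValidator
{
    // Lookahead: the leading digit run must contain a non-zero digit.
    // Then 1-4 digits, optionally followed by "-" and a digit or X.
    private static readonly Regex Pattern =
        new Regex(@"^(?=\d*[1-9])\d{1,4}(?:-[X\d])?$");

    public static bool IsValid(string agencia)
    {
        return Pattern.IsMatch(agencia);
    }
}
```

With this, `AgenciaValidator.IsValid("1-X")` and `AgenciaValidator.IsValid("0123-4")` return true, while `"0-X"`, `"0000"` and `"014777417"` return false. Keep in mind that in .NET `\d` matches any Unicode decimal digit and `$` also matches before a final newline; use `[0-9]` and `\z` if you need to be strict about that.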

You certainly can do this through just Regex, however, I always have this lingering fear of creating code that either:
1) only I understand or remember
2) even I don't understand when looking back
In that spirit, it seems if you do a simple split, your string might be easier to evaluate:
string[] parts = agencia.Split('-');
if ((parts.Length == 1 && Regex.IsMatch(agencia, @"^\d{1,4}$")) ||
(parts.Length == 2 && Regex.IsMatch(parts[0], @"^\d{1,4}$")) &&
Regex.IsMatch(parts[1], @"^[0-9X]$"))
{
}
-- EDIT --
I can't tell if you want 0 or not, so if you don't, change \d to [1-9].
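The question lists all-zero values such as 0000-X as invalid. One way to cover that with the split approach is an extra check that the numeric part contains a non-zero digit. The following is a sketch under that reading of the rules; `AgenciaSplitValidator` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split-based validation plus a non-zero check.
public static class AgenciaSplitValidator
{
    public static bool IsValid(string agencia)
    {
        string[] parts = agencia.Split('-');
        if (parts.Length > 2)
        {
            return false; // more than one hyphen
        }

        // 1-4 ASCII digits, at least one of them non-zero.
        bool numberOk = Regex.IsMatch(parts[0], @"^[0-9]{1,4}\z")
                        && parts[0].Any(c => c != '0');

        // The suffix is optional; when present it is one digit or X.
        bool suffixOk = parts.Length == 1
                        || Regex.IsMatch(parts[1], @"^[0-9X]\z");

        return numberOk && suffixOk;
    }
}
```

Each rule sits on its own line, so changing the allowed length or the suffix characters later does not require touching the other checks.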

It would be easier to have two tests: one to check if it could be valid, followed by one to exclude the special case of all leading zeros being invalid:
static void Main(string[] args)
{
string[] agencias = { "", "1234-5", "0-9", "014777417", "0000", "1-23", "01-0", "1-234 abcd", "0123-4" };
var regexAgenciaValid = new Regex("^(([0-9]{1,4})(-[0-9])?)$");
var regexAgenciaInvalid = new Regex("^((0{1,4})(-[0-9])?)$");
foreach (string agencia in agencias)
{
var result = regexAgenciaValid.IsMatch(agencia) && !regexAgenciaInvalid.IsMatch(agencia);
Console.WriteLine(agencia + " " + result);
}
Console.ReadLine();
}
Output:
False
1234-5 True
0-9 False
014777417 False
0000 False
1-23 False
01-0 True
1-234 abcd False
0123-4 True
This has the bonus of being easier to modify in the future.

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of replace, check if capture group 2 exists. If it does, use it in the replacement, else add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capturing group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d can also match digits from other scripts, \s can also match a newline, and without a boundary the pattern can start matching in the middle of a word. A slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
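The stricter pattern can be combined with the same callback as before. A minimal sketch, with `DosageFormatter` and `Normalize` as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper applying the stricter pattern with the callback.
public static class DosageFormatter
{
    // Group 1: the number (ASCII digits, optional decimal part).
    // Group 2: one of % : e E, if it follows after optional spaces/tabs.
    private static readonly Regex Pattern =
        new Regex(@"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?");

    public static string Normalize(string input)
    {
        return Pattern.Replace(input, m =>
            // Glue the exception character to the number,
            // otherwise append a single space after the number.
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));
    }
}
```

For example, `DosageFormatter.Normalize("10ANYUNIT")` gives "10 ANYUNIT" and `DosageFormatter.Normalize("10 %")` gives "10%". Every number that is not followed by one of the exception characters gets a space appended, so "40 e-5" becomes "40e-5 " with a trailing space after the 5; call TrimEnd() on the result if that matters.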
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 3 - ((E|e|%|:)+)
Any of the special symbols E e % :
Group 1 and Group 3 can be separated by any amount of whitespace.
If it doesn't work the way you need, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern: match zero or more whitespace characters (\s*),
// then match one or more digits and store them in the first capturing group (\d+),
// then match one or more whitespace characters (\s+),
// then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// That part matches a lowercase or uppercase 'e', an optional (due to the ? operator) minus/plus sign, and one or more digits.
// Finally, match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

String validations c#

I have a 9-character string that I need to run multiple checks on. First I want to check whether the first 1 to 7 characters are digits. Then, if for example the first 3 characters are digits, how would I check the next character for a letter in the range G through T?
I am using c# and have tried this so far...
string checkString = "123H56789";
Regex charactorSetOne = new Regex("[G-T]");
Match matchSetOne = charactorSetOne.Match(checkString, 3);
if (Char.IsNumber(checkString[0]) && Char.IsNumber(checkString[1]) && Char.IsNumber(checkString[2]))
{
if (matchSetOne.Success)
{
Console.WriteLine("3th char is a letter");
}
}
But am not sure if this is the best way to handle the validations.
UPDATE:
The digits can be 0 - 9, and the leading run of digits can be anywhere from one to seven characters long, like "12345T789" or "1R3456789" etc.
It's easy with LINQ:
check if the first 1 - 7 characters are numbers :
var condition1 = input.Take(7).All(c => Char.IsDigit(c));
check the 5th character for a letter range of G through T
var condition2 = input.ElementAt(4) >= 'G' && input.ElementAt(4) <= 'T';
As it is, both conditions can't be true at the same time (if the first 7 chars are digits, then the 5th char can't be a letter).
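Going by the update in the question, one reading is that the string starts with one to seven digits and the character right after that run must be a letter from G to T. Under that assumption (it is my interpretation, not something the question states outright), the two checks can be combined with TakeWhile. `CodeChecker` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: leading digit run followed by a letter G..T.
public static class CodeChecker
{
    public static bool IsValid(string input)
    {
        if (input == null || input.Length != 9)
        {
            return false;
        }

        // Count the leading ASCII digits.
        int digitCount = input.TakeWhile(c => c >= '0' && c <= '9').Count();
        if (digitCount < 1 || digitCount > 7)
        {
            return false;
        }

        // digitCount is at most 7, so this index is inside the string.
        char letter = input[digitCount];
        return letter >= 'G' && letter <= 'T';
    }
}
```

`CodeChecker.IsValid("123H56789")` and `CodeChecker.IsValid("1R3456789")` return true, while "123456789" (no letter) and "123A56789" (letter outside G-T) return false. The characters after the letter are not checked here.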

Regex condition in C#

I have state numbers and state letters of vehicles according to States in DB. State numbers can be old and new type.
Example of new types of state number.
273KL01
002UK02
098KZ03
120US04
...
Example of old types of state number.
R575KMM
A887KDN
M784LKA
X647DUA
...
Bold characters indicate the specified State.
The user will input his car's state number and choose a State. I need to validate whether the state number can be registered in the chosen State. If it is not possible (wrong user input), I will show a message like "You entered wrong state number or State".
I have done this with If-Else statement. But I want to know another way with regex.
As I think, here will be two steps of condition.
Check if number is old type(starts with letter), if true get from DB state letter and check with regex statements.
If case 1 is false, I get from DB state digits and check with regex statements.
I have regex statement for the first condition:
^(?i)f - Where state letter is f.
What would the regex statement be for my second condition?
Or can both steps be done with one regex statement?
As you further explained that you actually do want to match any letter at the beginning, and any two digits at the end of the string, using a regular expression is indeed the shortest way to solve this.
Regex re = new Regex("^[a-z].*[0-9]{2}$", RegexOptions.IgnoreCase);
Console.WriteLine(re.IsMatch("Apple02")); // true
Console.WriteLine(re.IsMatch("Arrow")); // false
Console.WriteLine(re.IsMatch("45Alty12")); // false
Console.WriteLine(re.IsMatch("Basci98")); // true
Otherwise, if your requirement is simple, e.g. just the letter A or a at the beginning, and 12 or 02 at the end, then you can also solve this easily without regular expressions:
bool Match(string s)
{
if (string.IsNullOrWhiteSpace(s))
return false;
if (s[0] != 'a' && s[0] != 'A')
return false;
return s.EndsWith("02") || s.EndsWith("12");
}
Examples:
Console.WriteLine(Match("Apple02")); // true
Console.WriteLine(Match("Arrow")); // false
Console.WriteLine(Match("45Alty12")); // false
Console.WriteLine(Match("a12")); // true
Console.WriteLine(Match("a")); // false
Console.WriteLine(Match("12")); // false
Of course you can also expand this to fit your more complex requirement. In your case, you could use char.IsLetter and char.IsDigit to make the checks:
bool Match(string s)
{
if (string.IsNullOrWhiteSpace(s))
return false;
return s.Length > 2 && char.IsLetter(s[0]) &&
char.IsDigit(s[s.Length - 1]) && char.IsDigit(s[s.Length - 2]);
}
Note that the IsLetter method also accepts letters from non-English alphabets, so you might need to change that. You could alternatively make a comparison like this:
bool Match(string s)
{
if (string.IsNullOrWhiteSpace(s))
return false;
return s.Length > 2 &&
((s[0] >= 'a' && s[0] <= 'z') || (s[0] >= 'A' && s[0] <= 'Z')) &&
char.IsDigit(s[s.Length - 1]) && char.IsDigit(s[s.Length - 2]);
}
Here's what you need:
^[Aa].*[01][2]$
With a few explanations:
^ assert position at start of a line
[Aa] match a single character present in the list below
Aa a single character in the list Aa literally
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[01] match a single character present in the list below
01 a single character in the list 01 literally
[2] match a single character present in the list below
2 the literal character 2
$ assert position at end of a line
If you need it to start with any letter :
^[A-Za-z].*[01][2]$
Given your edit:
I would use this regex:
^(?:[A-Z].{6}|.{5}\d{2})$
Which guarantees that the input:
Is of length 7;
Starts with a capital letter OR finishes with two digits.
The alternation is wrapped in a group so that both anchors apply to each branch.

How do I check the data type for each char in a string?

I'm new to C# so expect some mistakes ahead. Any help / guidance would be greatly appreciated.
I want to limit the accepted inputs for a string to just:
a-z
A-Z
hyphen
Period
If the character is a letter, a hyphen, or period, it's to be accepted. Anything else will return an error.
The code I have so far is
string foo = "Hello!";
foreach (char c in foo)
{
/* Is there a similar way
To do this in C# as
I am basing the following
Off of my Python 3 knowledge
*/
if (c.IsLetter == true) // *Q: Can I cut out the == true part ?*
{
// Do what I want with letters
}
else if (c.IsDigit == true)
{
// Do what I want with numbers
}
else if (c.Isletter == "-") // Hyphen | If there's an 'or', include period as well
{
// Do what I want with symbols
}
}
I know that's a pretty poor set of code.
I had a thought whilst writing this:
Is it possible to create a list of the allowed characters and check the variable against that?
Something like:
foreach (char c in foo)
{
if (c != list)
{
// Unaccepted message here
}
else if (c == list)
{
// Accepted
}
}
Thanks in advance!
Easily accomplished with a Regex:
using System.Text.RegularExpressions;
var isOk = Regex.IsMatch(foo, @"^[A-Za-z0-9\-\.]+$");
Rundown:
^[A-Za-z0-9\-\.]+$
^ -> match from the start
[A-Za-z0-9\-\.] -> set of possible matches:
A-Z -> match A-Z (uppercase)
a-z -> match a-z (lowercase)
0-9 -> match 0 to 9
\- -> match hyphen
\. -> match dot
+ -> any number of matches is ok
$ -> match until the end of the string
You can do this in a single line with regular expressions:
Regex.IsMatch(myInput, @"^[a-zA-Z0-9\.\-]*$")
^ -> match start of input
[a-zA-Z0-9\.\-] -> match any of a-z , A-Z , 0-9, . or -
* -> 0 or more times (you may prefer + which is 1 or more times)
$ -> match the end of input
You can use Regex.IsMatch function and specify your regular expression.
Or define manually chars what you need. Something like this:
string foo = "Hello!";
char[] availableSymbols = {'-', ',', '!'};
char[] availableLetters = {'A', 'a', 'H'}; //etc.
char[] availableNumbers = {'1', '2', '3'}; //etc
foreach (char c in foo)
{
if (availableLetters.Contains(c))
{
// Do what I want with letters
}
else if (availableNumbers.Contains(c))
{
// Do what I want with numbers
}
else if (availableSymbols.Contains(c))
{
// Do what I want with symbols
}
}
Possible solution
You can use the CharUnicodeInfo.GetUnicodeCategory(char) method. It returns the UnicodeCategory of a character. The following Unicode categories might be what you're looking for:
UnicodeCategory.DecimalDigitNumber
UnicodeCategory.LowercaseLetter and UnicodeCategory.UppercaseLetter
An example:
string foo = "Hello!";
foreach (char c in foo)
{
UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
if (cat == UnicodeCategory.LowercaseLetter || cat == UnicodeCategory.UppercaseLetter)
{
// Do what I want with letters
}
else if (cat == UnicodeCategory.DecimalDigitNumber)
{
// Do what I want with numbers
}
else if (c == '-' || c == '.')
{
// Do what I want with symbols
}
}
Answers to your other questions
Can I cut out the == true part?:
Yes, you can cut the == true part, it is not required in C#
If there's an 'or', include period as well.:
To create or expressions use the 'barbar' (||) operator as I've done in the above example.
Whenever you have some kind of collection of similar things, an array, a list, a string of characters, whatever, you'll see at the definition of the collection that it implements IEnumerable<T>:
public class String : ..., IEnumerable<char>, ...
Here T is char. It means that you can ask the class: "give me your first T", "give me your next T", "give me your next T" and so on until there are no more elements.
This is the basis for all LINQ. LINQ has about 40 functions that act upon sequences. If you need to do something with a sequence of the same kind of items, consider using LINQ.
The functions in LINQ can be found in class Enumerable. One of the function is Contains. You can use it to find out if a sequence contains a character.
char[] allowedChars = "abcdefgh....XYZ.-".ToCharArray();
Now you have a sequence of allowed characters. Suppose you have a character x and want to know if x is allowed:
char x = ...;
bool xIsAllowed = allowedChars.Contains(x);
Now Suppose you don't have one character x, but a complete string and you want only the characters in this string that are allowed:
string str = ...
var allowedInStr = str
.Where(characterInString => allowedChars.Contains(characterInString));
If you are going to do a lot with sequences of things, consider spending some time to familiarize yourself with LINQ:
Linq explained
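Applied to the original question (accept only letters, hyphen and period), the same idea can be sketched with All, which checks that every character passes a test. The names `AllowedCharsValidator` and `IsAllowed` are hypothetical, and a HashSet is used here only so that each lookup is cheap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: whitelist check for every character of a string.
public static class AllowedCharsValidator
{
    // a-z, A-Z, period and hyphen, as listed in the question.
    private static readonly HashSet<char> Allowed = new HashSet<char>(
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-");

    public static bool IsAllowed(string input)
    {
        // True only if every character is in the allowed set.
        return input.All(c => Allowed.Contains(c));
    }
}
```

`AllowedCharsValidator.IsAllowed("Hello!")` returns false because of the exclamation mark, and `AllowedCharsValidator.IsAllowed("Hello-World.")` returns true. Note that All returns true for an empty string, so add a length check if empty input should be rejected.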
You can use Regex.IsMatch with "^[a-zA-Z_.]*$" to check for valid characters.
string foo = "Hello!";
if (!Regex.IsMatch(foo, @"^[a-zA-Z_.]*$"))
{
throw new ArgumentException("Exception description here");
}
Other than that you can create a list of chars and use string.Contains method to check if it is ok.
string validChars = "abcABC./";
foreach (char c in foo)
{
if (!validChars.Contains(c))
{
// Throw exception
}
}
Also, you don't need to check for == true/false in the if line. The two expressions below are equivalent:
if (boolvariable) { /* do something */ }
if (boolvariable == true) { /* do something */ }

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.
This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
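The multi-line lambda mentioned above could look roughly like this. It is a sketch, not a definitive implementation: `AccountMasker` and `Mask` are hypothetical names, and keeping the leading whitespace of a match is my own addition so that the masked text stays separated from the word before it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the pattern from the answer above.
public static class AccountMasker
{
    // A run of whitespace-separated words that each contain a digit.
    private static readonly Regex DigitWords =
        new Regex(@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+");

    public static string Mask(string input)
    {
        return DigitWords.Replace(input, m =>
        {
            // The match can start with the whitespace character that
            // precedes the first word; keep it in the output.
            string lead = char.IsWhiteSpace(m.Value[0])
                ? m.Value.Substring(0, 1)
                : "";

            // Remove all whitespace, then keep only the last four characters.
            string compact = Regex.Replace(m.Value, @"\s", "");
            return lead + "xxxx" + compact.Substring(Math.Max(0, compact.Length - 4));
        });
    }
}
```

With this version, `AccountMasker.Mask("111 22 3333")` gives "xxxx3333" and `AccountMasker.Mask("this is a a1234 b5678 test string")` gives "this is a xxxx5678 test string".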
I don't think regex is the best way to solve this problem, which is why I am posting this answer. For a situation this complex, building the corresponding regex is difficult and, worse, the result is less clear and less adaptable than a longer-code approach.
The code below delivers the functionality you are after; it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
if (word.ToCharArray().Where(x => char.IsDigit(x)).Count() > 0)
{
previousNum = true;
tempOutput = tempOutput + word;
}
else
{
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
tempOutput = ""; // reset so the next run of digit words starts fresh
previousNum = false;
}
output = output + " " + word;
}
}
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
}
Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)
Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));
