C# Regex Validation Rule using Regex.Match() - c#

I've written a Regular expression which should validate a string using the following rules:
The first four characters must be alphanumeric.
The alpha characters are followed by 6 or 7 numeric values for a total length of 10 or 11.
So the string should look like this if its valid:
CCCCNNNNNN or CCCCNNNNNNN
C being any character and N being a number.
My expression is written: #"^[0-9A-Za-z]{3}[0-9A-Za-z-]\d{0,21}$";
My regex match code looks like this:
var cc1 = "FOOBAR"; // should fail.
var cc2 = "AAAA1111111111"; // should succeed
var regex = #"^[0-9A-Za-z]{3}[0-9A-Za-z-]\d{0,21}$";
Match match = Regex.Match( cc1, regex, RegexOptions.IgnoreCase );
if ( cc1 != string.Empty && match.Success )
{
//"The Number must start with 4 letters and contain no numbers.",
Error = SeverityType.Error
}
I'm hoping someone can take a look at my expression and offer some feedback on improvements to produce a valid match.
Also, am I use .Match() correctly? If Match.Success is true, then does that mean that the string is valid?

The regex for 4 alphanumeric characters follows by 6 to 7 decimal digits is:
var regex = #"^\w{4}\d{6,7}$";
See: Regular Expression Language - Quick Reference
The Regex.Match Method returns a Match object. The Success Property indicates whether the match is successful or not.
var match = Regex.Match(input, regex, RegexOptions.IgnoreCase);
if (!match.Success)
{
// does not match
}

The following code demonstrates the regex usage:
var cc1 = "FOOBAR"; // should fail.
var cc2 = "AAAA1111111"; // should succeed
var r = new Regex(#"^[0-9a-zA-Z]{4}\d{6,7}$");
if (!r.IsMatch(cc2))
{
Console.WriteLine("cc2 doesn't match");
}
if (!r.IsMatch(cc1))
{
Console.WriteLine("cc1 doesn't match");
}
The output will be cc1 doesn't match.

The following code is using a regular expression and checks 4 different patterns to test it, see output below:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var p1 = "aaaa999999";
CheckMatch(p1);
p1 = "aaaa9999999";
CheckMatch(p1);
p1 = "aaaa99999999";
CheckMatch(p1);
p1 = "aaa999999";
CheckMatch(p1);
}
public static void CheckMatch(string p1)
{
var reg = new Regex(#"^\w{4}\d{6,7}$");
if (!reg.IsMatch(p1))
{
Console.WriteLine($"{p1} doesn't match");
}
else
{
Console.WriteLine($"{p1} matches");
}
}
}
Output:
aaaa999999 matches
aaaa9999999 matches
aaaa99999999 doesn't match
aaa999999 doesn't match
Try it as DotNetFiddle

Your conditions give:
The first four characters must be alphanumeric: [A-Za-z\d]{4}
Followed by 6 or 7 numeric values: \d{6,7}
Put it together and anchor it:
^[A-Za-z\d]{4}\d{6,7}\z
Altho that depends a bit how you define "alphanumeric". Also if you are using ignore case flag then you can remove the A-Z range from the expression.

Try the following pattern:
#"^[A-za-z\d]{4}\d{6,7}$"

Related

Find pattern to solve regex in one step

I have a problem to find the pattern that solves the problem in onestep.
The string looks like this:
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$Text5$Text6 etc.
What i want to get is: Take up to 4x Text. If there are more than "4xText" take only the last sign.
Example:
Text1$Text2$Text3$Text4$Text5$Text6 -> Text1$Text2$Text3$Text4&56
My current solution is:
First pattern:
^([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?
After this i will do a substitution with the first pattern
New string: Text5$Text6
second pattern is:
([^\$])\b
result: 56
combine both and get the result:
Text1$Text2$Text3$Text4$56
For me it is not clear why i cant easily put the second pattern after the first pattern into one pattern. Is there something like an anchor that tells the engine to start the pattern from here like it would do if is would be the only pattern ?
You might use an alternation with a positive lookbehind and then concatenate the matches.
(?<=^(?:[^$]+\$){0,3})[^$]+\$?|[^$](?=\$|$)
Explanation
(?<= Positive lookbehind, assert what is on the left is
^(?:[^$]+\$){0,3} Match 0-3 times any char except $ followed by an optional $
) Close lookbehind
[^$]+\$? Match 1+ times any char except $, then match an optional $
| Or
[^$] Match any char except $
(?=\$|$) Positive lookahead, assert what is directly to the right is either $ or the end of the string
.NET regex demo | C# demo
Example
string pattern = #"(?<=^(?:[^$]*\$){0,3})[^$]*\$?|[^$](?=\$|$)";
string[] strings = {
"Text1",
"Text1$Text2$Text3",
"Text1$Text2$Text3$Text4$Text5$Text6"
};
Regex regex = new Regex(pattern);
foreach (String s in strings) {
Console.WriteLine(string.Join("", from Match match in regex.Matches(s) select match.Value));
}
Output
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$56
I strongly believe regular expression isn't the way to do that. Mostly because of the readability.
You may consider using simple algorithm like this one to reach your goal:
using System;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var result = "";
for(var i=0; i<parts.Length; i++){
result += (i <= 4 ? parts[i] + "$" : parts[i].Substring(4));
}
Console.WriteLine(result);
}
}
There are also linq alternatives :
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var first4 = parts.Take(4);
var remainings = parts.Skip(4);
var result2 = string.Join("$", first4) + "$" + string.Join("", remainings.Select( r=>r.Substring(4)));
Console.WriteLine(result2);
}
}
It has to be adjusted to the actual needs but the idea is there
Try this code:
var texts = new string[] {"Text1", "Text1$Text2$Text3", "Text1$Text2$Text3$Text4$Text5$Text6" };
var parsed = texts
.Select(s => Regex.Replace(s,
#"(Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)",
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
)).ToArray();
// parsed is now: string[3] { "Text1$", "Text1$Text2$Text3$", "Text1$Text2$Text3$Text4$56" }
Explanation:
solution uses regex pattern: (Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)
(...) - first capturing group
(?:...) - non-capturing group
Text\d{1,3}(?:\$Text\d{1,3} - match Text literally, then match \d{1,3}, which is 1 up to three digits, \$ matches $ literally
Rest is just repetition of it. Basically, first group captures first four pieces, second group captures the rest, if any.
We also use MatchEvaluator here which is delegate type defined as:
public delegate string MatchEvaluator(Match match);
We define such method:
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
We use it to evaluate match, so takee first capturing group and concatenate with second, removing unnecessary text.
It's not clear to me whether your goal can be achieved using exclusively regex. If nothing else, the fact that you want to introduce a new character '&' into the output adds to the challenge, since just plain matching would never be able to accomplish that. Possibly using the Replace() method? I'm not sure that would work though...using only a replacement pattern and not a MatchEvaluator, I don't see a way to recognize but still exclude the "$Text" portion from the fifth instance and later.
But, if you are willing to mix regex with a small amount of post-processing, you can definitely do it:
static readonly Regex regex1 = new Regex(#"(Text\d(?:\$Text\d){0,3})(?:\$Text(\d))*", RegexOptions.Compiled);
static void Main(string[] args)
{
for (int i = 1; i <= 6; i++)
{
string text = string.Join("$", Enumerable.Range(1, i).Select(j => $"Text{j}"));
WriteLine(KeepFour(text));
}
}
private static string KeepFour(string text)
{
Match match = regex1.Match(text);
if (!match.Success)
{
return "[NO MATCH]";
}
StringBuilder result = new StringBuilder();
result.Append(match.Groups[1].Value);
if (match.Groups[2].Captures.Count > 0)
{
result.Append("&");
// Have to iterate (join), because we don't want the whole match,
// just the captured text.
result.Append(JoinCaptures(match.Groups[2]));
}
return result.ToString();
}
private static string JoinCaptures(Group group)
{
return string.Join("", group.Captures.Cast<Capture>().Select(c => c.Value));
}
The above breaks your requirement into three different capture groups in a regex. Then it extracts the captured text, composing the result based on the results.

Compare if input string is in correct format

Check if input string entered by user is in format like IIPIII, where I is integer number, any one digit number can be used on place of I and P is a character.
Example if input is 32P125 it is valid string else N23P33 is invalid.
I tried using string.Length or string.IndexOf("P") but how to validate other integer values?
I'm sure someone can offer a more succinct answer but pattern matching is the way to go.
using System.Text.RegularExpressions;
string test = "32P125";
// 2 integers followed by any upper cased letter, followed by 3 integers.
Regex regex = new Regex(#"\d{2}[A-Z]\d{3}", RegexOptions.ECMAScript);
Match match = regex.Match(test);
if (match.Success)
{
//// Valid string
}
else
{
//// Invalid string
}
Considering that 'P' has to matched literally -
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string st1 = "32P125";
string st2 = "N23P33";
Regex rg = new Regex(#"\d{2}P\d{3}");
// If 'P' is not to be matched literally, reeplace above line with below one
// Regex rg = new Regex(#"\d{2}[A-Za-z]\d{3}");
Console.WriteLine(rg.IsMatch(st1));
Console.WriteLine(rg.IsMatch(st2));
}
}
OUTPUT
True
False
It can be encapsulated in one simple if:
string testString = "12P123";
if(
// check if third number is letter
Char.IsLetter(testString[2]) &&
// if above succeeds, code proceeds to second condition (short-circuiting)
// remove third character, after that it should be a valid number (only digits)
int.TryParse(testString.Remove(2, 1), out int i)
) {...}
I would encourage the usage of MaskedTextProvided over Regex.
Not only is this looking cleaner but it's also less error prone.
Sample code would look like the following:
string Num = "12P123";
MaskedTextProvider prov = new MaskedTextProvider("##P###");
prov.Set(Num);
var isValid = prov.MaskFull;
if(isValid){
string result = prov.ToDisplayString();
Console.WriteLine(result);
}
Use simple Regular expression for this kind of stuff.

match pattern for startswith

I want to do a Regex Match in c# to check whether a string starts with part of pattern.
Say if the pattern is "ABC...GHI" then valid strings can be in the format "A","AB","ABCDEF","ABCXYXGHI"
This is a sample code. What exactly regex has to be in the pattern to make it work
string pattern = "ABC...GHI"
code = "A" //valid
code = "ABC" valid
code = "ABCDE" //valid
code = "ABCXXX" //valid
code = "ABCXXXGHI" //valid
code = "ABCXXXGHIAA" //invalid
code = "B" //invalid
Regex.IsMatch(code, pattern)
You can use ? and make optional part of regexp. The final regexp string could be
A(B(C(.(.(.(G(H(I?)?)?)?)?)?)?)?)?
The final string is quite messy but you can create it automatically
The visualization of above regexp is here http://www.regexper.com/#A(B(C(.(.(.(G(H(I%3F)%3F)%3F)%3F)%3F)%3F)%3F)%3F)%3F
Are you looking for something like this?
var pat = new Regex(#"^A(B(C(.(Z)?)?)?)?");
var testStrings = new string[]
{
"ALPHA",
"ABGOOF",
"ABCblah",
"ABCbZ",
"FOOBAR"
};
foreach (var s in testStrings)
{
var m = pat.Match(s);
if (m.Success)
{
Console.WriteLine("{0} matches {1}", s, m.Value);
}
else
{
Console.WriteLine("No match found for {0}", s);
}
}
Results from that are:
ALPHA matches A
ABGOOF matches AB
ABCblah matches ABCb
ABCbZ matches ABCbZ
No match found for FOOBAR
The key is that everything after the A is optional. So if you wanted strings that start with A or AB, you'd have:
AB?
If you wanted to add ABC, you need:
A(BC?)?
Another character:
A(B(CZ?)?)?
Messy, but you could write code to generate the expression automatically if you had to.
Additional info
It's possible that you want the strings to be no longer than the pattern, and all characters must match the pattern. That is, given the pattern I showed above, "ABCxZ" would be valid, but "ABCblah" would not be valid because the "lab" part doesn't match the pattern. If that's the case, then you need to add a "$" to the end of the pattern to say that the string ends there. So:
var pat = new Regex(#"^A(B(C(.(Z)?)?)?)?$");
Or, in your example case:
"^A(B(C(.(.(.(G(H(I)?)?)?)?)?)?)?)?$"

Why my \p{L} returns underscore?

I have the following code to parse by Regex:
const string patern = #"^(\p{L}+)_";
var rgx = new Regex(patern);
var str1 = "library_log_12312_12.log";
var m = rgx.Matches(str1);
It returns only one match and it is "library_". I have read a lot of resources and it should not contain underscore, should it?
Your pattern includes the _, so the match does too. If you only want the group, you need to specify that. It'll be in group 1 (as group 0 is always the whole match):
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
var regex = new Regex(#"^(\p{L}+)_");
var input = "library_log_12312_12.log";
var matches = regex.Matches(input);
var match = matches[0];
Console.WriteLine(match.Groups[0]); // library_
Console.WriteLine(match.Groups[1]); // library
}
}
Your regex ends with _ so basically, it matches on one or more Unicode letters, followed by an underscore (which is not a Unicode letter).
The captured group will not contain the _.
Works as expected.
It should contain the underscore as it is in your regular expression.
If you only want to have library as the result, you need to access the first sub-group in the result:
var m = rgx.Matches(str1).Cast<Match>().Select(x => x.Groups[1].Value);

C# Regular Expression: {4} and more then 4 values possible

I'm in need of a regular expression that checks if the input is exactly 4 numbers.
I'm using "\d{4}" (also tried "\d\d\d\d").
But if you enter 5 numbers, it also says the input is valid.
[TestMethod]
public void RegexTest()
{
Regex expr = new Regex("\\d{4}");
String a = "4444", b = "4l44", c = "55555", d = "5 55";
Match mc = expr.Match(a);
Assert.IsTrue(mc.Success);
mc = expr.Match(b);
Assert.IsFalse(mc.Success);
***mc = expr.Match(c);
Assert.IsFalse(mc.Success)***;
mc = expr.Match(d);
Assert.IsFalse(mc.Success);
}
(it's the c that is 'true' but should be false, the others work)
Thanks in advance,
~Sir Troll
If it must be exactly 4, then you need to use $ and ^ to mark end and start of the input:
Regex expr = new Regex(#"^\d{4}$");
Note I'm also using verbatim string literals here, so save on sanity - then you don't need to C#-escape all your regex-escape-characters. Only " needs escaping in the C# (to "").
"55555" contains "5555", which is valid... So the string contains a match, even though the match isn't the complete string. See Marc's answer for the solution
a string with 5 digits also contains 4 digits, so you have to make sure that you also add start and end of string contraints.
This should work as your Regex:
Regex expr = new Regex(#"^\d{4}$");

Categories

Resources