String format Check gives unexpected result (Regex) - c#

I want to check that a string has the format ABC-nnn, where ABC stands for three capital English letters and nnn for a three-digit number such as 123, 001 or 012. A complete example would be FBI-026. I used a regex for that; my code is below.
public bool IsSubstringMatchingFormat(int numberOfLettersBeforeHyphen, int numberOfNumbersAfterHyphen, string stringToMatch)
{
Regex regex = new Regex($@"^[A-Z]{numberOfLettersBeforeHyphen}-\d{numberOfNumbersAfterHyphen}");
return regex.IsMatch(stringToMatch);
}
I call it IsSubstringMatchingFormat(3, 3, "SMB-123") but it returns false. Please provide your insight.

Have you actually checked what the string you are passing into the regex looks like? That is, evaluate $@"^[A-Z]{numberOfLettersBeforeHyphen}-\d{numberOfNumbersAfterHyphen}" and see whether it is the regex you want. I can tell you that it isn't: it ends up being ^[A-Z]3-\d3, which matches one capital letter, a literal 3, a hyphen, one digit and another literal 3.
What I think you'll want is:
$@"^[A-Z]{{{numberOfLettersBeforeHyphen}}}-\d{{{numberOfNumbersAfterHyphen}}}"
This adds the escaped curly braces back into your regex to give:
^[A-Z]{3}-\d{3}
The equivalent of this using String.Format would be:
String.Format(
@"^[A-Z]{{{0}}}-\d{{{1}}}",
numberOfLettersBeforeHyphen,
numberOfNumbersAfterHyphen);
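Putting the escaped-brace fix back into the method from the question gives a minimal sketch like the one below. The class name FormatCheck and the shortened parameter names are hypothetical. As in the question, the pattern has no trailing $, so extra characters after the digits are still accepted.

```csharp
using System.Text.RegularExpressions;

public static class FormatCheck
{
    // Builds a pattern such as ^[A-Z]{3}-\d{3}.
    // {{ and }} emit literal braces; {letters} and {digits} are interpolated.
    public static bool IsSubstringMatchingFormat(int letters, int digits, string stringToMatch)
    {
        string pattern = $@"^[A-Z]{{{letters}}}-\d{{{digits}}}";
        return Regex.IsMatch(stringToMatch, pattern);
    }
}
```

With this version, FormatCheck.IsSubstringMatchingFormat(3, 3, "SMB-123") should return true. Appending $ to the pattern would reject inputs such as "SMB-1234".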

Your curly braces are being consumed by the string interpolation and are not making it into the regex. If you try to print the regex, you'll see it's something like
^[A-Z]3-\d3
Which is something else entirely.

The {} are being removed when you format. Try this:
public static bool IsSubstringMatchingFormat(int numberOfLettersBeforeHyphen, int numberOfNumbersAfterHyphen, string stringToMatch)
{
var expre = @"^[A-Z]{numberOfLettersBeforeHyphen}-\d{numberOfNumbersAfterHyphen}";
expre = expre.Replace("numberOfLettersBeforeHyphen", numberOfLettersBeforeHyphen.ToString())
.Replace("numberOfNumbersAfterHyphen", numberOfNumbersAfterHyphen.ToString());
Regex regex = new Regex(expre);
return regex.IsMatch(stringToMatch);
}

Related

Compare String with Regex in C#

I have a list of strings taken from a file. Some of these strings are in the format "Q" + number + "null" (e.g. Q98null, Q1null, Q24null, etc.).
With a foreach loop I must check if a string is just like the one shown before.
I use this right now
string a = "Q9null"; // just for testing
if(a.Contains("Q") && a.Contains("null"))
MessageBox.Show("ok");
but I'd like to know if there is a better way to do this with regex.
Thank you!
Your method will produce a lot of false positives - for example, it would recognize some invalid strings, such as "nullQ" or "Questionable nullability".
A regex to test for the match would be "^Q\\d+null$". The structure is very simple: it says that the target string must start with a Q, followed by one or more decimal digits, and must end with null.
Console.WriteLine(Regex.IsMatch("Q123null", "^Q\\d+null$")); // Prints True
Console.WriteLine(Regex.IsMatch("nullQ", "^Q\\d+null$")); // Prints False
public static bool Check(string s)
{
Regex regex = new Regex(@"^Q\d+null$");
Match match = regex.Match(s);
return match.Success;
}
Apply the above method in your code:
string a = "Q9null"; // just for testing
if(Check(a))
MessageBox.Show("ok");
First way: Using the Regex
Use this Regex ^Q\d+null$
Second way: Using the SubString
string s = "Q1123null";
string First,Second,Third;
First = s[0].ToString();
Second = s.Substring(1,s.Length-5);
Third = s.Substring(s.Length-4);
Console.WriteLine (First);
Console.WriteLine (Second);
Console.WriteLine (Third);
then you can check everything after this...
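The Substring approach stops before the actual checks. One way to finish it, as a rough sketch only, is shown below. The class and method names are hypothetical, and it assumes that only the ASCII digits 0-9 count as the number part.

```csharp
using System;
using System.Linq;

public static class QNullCheck
{
    // Returns true for strings shaped like "Q" + one or more digits + "null".
    public static bool IsQNumberNull(string s)
    {
        const string suffix = "null";

        // Shortest valid input is "Q" + one digit + "null".
        if (s == null || s.Length < 1 + 1 + suffix.Length) return false;
        if (s[0] != 'Q') return false;
        if (!s.EndsWith(suffix, StringComparison.Ordinal)) return false;

        // Everything between the leading Q and the trailing "null".
        string middle = s.Substring(1, s.Length - 1 - suffix.Length);
        return middle.All(c => c >= '0' && c <= '9');
    }
}
```

This rejects the false positives mentioned earlier, such as "nullQ", without using a regex.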

How to find out if a string contains digits followed by alphabet characters?

How can I find out whether a given string contains at least one digit followed by letters, using Regex in C#?
For example :
var strInput="Test123Test";
The function should return a bool value.
result = Regex.IsMatch(subjectString, @"\d\p{L}");
will return True for your sample string. Currently, this regex also considers non-ASCII digits and letters as valid matches; if you don't want that, use @"[0-9][A-Za-z]" instead.
If you want to match 123 only then:-
Match m = Regex.Match("Test123Test", @"(\p{L}+)(\d+)");
string result = m.Groups[2].Value;
If you want to get the bool value then do this:-
Console.WriteLine(!String.IsNullOrEmpty(result));
or simply use:-
Regex.IsMatch("Test123Test", @"\p{L}+\d+");
Try this:
if(Regex.IsMatch(strInput, @"[0-9][A-Za-z]"))
...
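To make the difference between the Unicode-aware and the ASCII-only patterns concrete, here is a small sketch with both checks side by side. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class DigitLetterCheck
{
    // \d and \p{L} are Unicode-aware in .NET by default,
    // so digits and letters from any script count.
    public static bool HasDigitThenLetterUnicode(string s)
    {
        return Regex.IsMatch(s, @"\d\p{L}");
    }

    // Restricts the check to ASCII digits and ASCII letters.
    public static bool HasDigitThenLetterAscii(string s)
    {
        return Regex.IsMatch(s, @"[0-9][A-Za-z]");
    }
}
```

Both methods accept "Test123Test". An input such as "\u0663\u00e9" (an Arabic-Indic digit followed by an accented letter) should pass only the Unicode-aware check.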

Validate filename in c# through regex

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
return isValid = true;
}
However, I can't make it to work.
Is there anything I missed here?
You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with alphanumeric string before first _ symbol so you need to check both (letters and digits).
After the Enrollment_ part you have 8 digits, not 6.
No need for the double backslash. You need to escape just the dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
static void Main() {
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
if (pattern.IsMatch(filename))
{
Console.WriteLine("Matched");
}
}
}
Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used @ on your string: with @ you don't need to escape the backslash.
Try this instead:
@"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
@"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
You can try the following; it matches letters and digits at the beginning and also checks that the year, month and day fall in plausible ranges (it still accepts impossible dates such as 20130231).
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/
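A regex can only check digit ranges, so if real calendar validation matters, one option is to let the regex check the overall shape and hand the timestamp to DateTime.TryParseExact. The sketch below assumes the HHMM part is a 24-hour time; the class name and the group name "stamp" are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EnrollmentFileName
{
    // Shape check only: the named group captures YYYYMMDD_HHMM.
    private static readonly Regex Pattern = new Regex(
        @"^[A-Za-z0-9]+_Enrollment_(?<stamp>[0-9]{8}_[0-9]{4})\.xml$");

    public static bool IsValid(string fileName)
    {
        Match m = Pattern.Match(fileName);
        if (!m.Success) return false;

        // Rejects impossible dates and times such as 20130231 or 2500.
        DateTime parsed;
        return DateTime.TryParseExact(
            m.Groups["stamp"].Value,
            "yyyyMMdd_HHmm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
    }
}
```

EnrollmentFileName.IsValid("Try123_Enrollment_20130102_1200.xml") should return true, while a name containing 20130231 should be rejected.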

What's the regular expression to find the last digit in a string?

My brain must be frazzled but I can't get this to work and I've very little experience with regular expressions.
I have a string such as "gfdsgfd354gfdgfd55gfdgfdgfs9gfdgsf".
I need a regular expression to find the last digit in the string - in this case the "9".
EDIT
I should have been clearer but, as I say, I'm frazzled.
It's to insert a hyphen character before the final digit. I'm using C# Regex.Replace. Using the idea already suggested by Dave Sexton I tried the following without success:
private string InsertFinalDigitHyphen(string data)
{
return Regex.Replace(data, @"(\d)[^\d]*$", "-$1");
}
With this I can process "ABCDE1FGH" with the intention of getting "ABCDE-1FGH" but I actually get "ABCDE-1".
I always find regular expressions hard to read, so you could do it in an alternative way with the following LINQ statement:
string str = "gfdsgfd354gfdgfd55gfdgfdgfs9gfdgsf";
var lastDigit = str.Last(char.IsDigit);
Output:
9
To insert a hyphen before this one, you can use LastIndexOf instead of Last and use that index to insert the hyphen at the correct location in the string.
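A minimal sketch of that index-based idea follows. It uses Array.FindLastIndex, since string.LastIndexOf itself does not take a predicate; the class name is hypothetical. Note that char.IsDigit is Unicode-aware, so non-ASCII decimal digits also count.

```csharp
using System;

public static class LastDigitHyphen
{
    // Inserts "-" in front of the last digit; returns the input unchanged
    // when it contains no digit at all.
    public static string InsertFinalDigitHyphen(string data)
    {
        int index = Array.FindLastIndex(data.ToCharArray(), c => char.IsDigit(c));
        return index < 0 ? data : data.Insert(index, "-");
    }
}
```

For "ABCDE1FGH" this should give "ABCDE-1FGH", keeping the trailing letters intact.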
You can use this one :
(\d)[^\d]*$
EDIT :
You initially mentioned only a match, no language and no replacement. For your C# replacement, you should use
private string InsertFinalDigitHyphen(string data) {
return Regex.Replace(data, @"(\d)(\D*)$", "-$1$2");
}
If it is the last single digit you want, then use this:
(\d)[^\d]*$
If it is the last set of digits then use this:
(\d+)[^\d]*$
EDIT:
You need to pick out the capture group (the bit in the brackets). For C# I think it would be something like this:
Regex.Matches("gfdsgfd354gfdgfd55gfdgfdgfs9gfdgsf", @"(\d)[^\d]+$")[0].Groups[1].Captures[0].Value
Or alternatively, to avoid capture groups, you could use a lookahead regex like so:
\d(?=[^\d]+$)
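An assertion of this kind can also drive the replacement itself. The sketch below is a variation, not the pattern above: the whole expression is a zero-width lookahead, so the match is the empty position just before the last digit and nothing is consumed. The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadHyphen
{
    // (?=\d\D*$) matches the empty position that is followed by a digit
    // and then only non-digits up to the end of the string.
    public static string InsertFinalDigitHyphen(string data)
    {
        return Regex.Replace(data, @"(?=\d\D*$)", "-");
    }
}
```

Because nothing is consumed, there is no need to put captured groups back into the replacement string.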
You can also do this with String.LastIndexOfAny:
private string InsertFinalDigitHyphen(string data)
{
int lastDigit = data.LastIndexOfAny(new char[] {'0','1','2','3','4','5','6','7','8','9'});
return (lastDigit == -1) ? data : data.Insert(lastDigit, "-");
}

How to remove non-ASCII word from a string in C#

I want to filter some string which has some wrong letters (non-ASCII). It looks different in Notepad, Visual Studio 2010 and MySQL.
How can I check if a string has non-ASCII letters and how I can remove them?
You could use a regular expression to filter non ASCII characters:
string input = "AB £ CD";
string result = Regex.Replace(input, "[^\x0d\x0a\x20-\x7e\t]", "");
You could use Regular Expressions.
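Regex.Replace(input, "[^a-zA-Z0-9]+", "");
Note that this also strips spaces and punctuation, not just non-ASCII characters. You could also use \W+ as the pattern to remove any non-word character, but in .NET \w is Unicode-aware by default, so \W+ does not remove non-ASCII letters unless you pass RegexOptions.ECMAScript.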
This has been a God-send:
Regex.Replace(input, @"[^\u0000-\u007F]", "");
I think I got it elsewhere originally, but the same answer appears here:
How can you strip non-ASCII characters from a string? (in C#)
string testString = Regex.Replace(OldString, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");
First, you need to determine what you mean by a "word". If non-ascii, this probably implies non-english?
Personally, I'd ask why you need to do this and what fundamental assumption has your application got that conflicts with your data? Depending on the situation, I suggest you either re-encode the text from the source encoding, although this will be a lossy conversion, or alternatively, address that fundamental assumption so that your application handles data correctly.
I think something as simple as this would probably work, wouldn't it?
public static string AsciiOnly(this string input, bool includeExtendedAscii)
{
int upperLimit = includeExtendedAscii ? 255 : 127;
char[] asciiChars = input.Where(c => (int)c <= upperLimit).ToArray();
return new string(asciiChars);
}
Example usage:
string input = "AB£ȼCD";
string asciiOnly = input.AsciiOnly(false); // returns "ABCD"
string extendedAsciiOnly = input.AsciiOnly(true); // returns "AB£CD"
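The question also asks how to check for non-ASCII characters before removing them. A minimal sketch, assuming "ASCII" means code points U+0000 to U+007F, could look like this; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AsciiTools
{
    // True when at least one character lies outside U+0000..U+007F.
    public static bool HasNonAscii(string s)
    {
        return s.Any(c => c > '\u007F');
    }

    // Removes every character outside the ASCII range.
    public static string StripNonAscii(string s)
    {
        return Regex.Replace(s, @"[^\u0000-\u007F]", "");
    }
}
```

For example, AsciiTools.HasNonAscii("AB £ CD") should be true, and AsciiTools.StripNonAscii("AB£ȼCD") should give "ABCD".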
