Regular expressions to match these pdf file names - c#

I am looking for a regular expression to match the fileNamePattern.
The files are PDFs whose names have this form: 8 alphanumeric chars, -, 4 alphanumeric chars, -, 4 alphanumeric chars, -, 4 alphanumeric chars, -, 12 alphanumeric chars, then .pdf.
Examples:
5b7f991f-0726-4dd5-856e-7cea820f02c5.pdf
138bcee6-db7f-47a7-97bf-69c0b3989698.pdf
e988315b-ade7-48e5-9733-35bb59a3c28d.pdf
I am using
^[A-Z][0-9]{8}[-][A-Z][0-9]{4}[-][A-Z][0-9]{4}[-][A-Z][0-9]{4}[-][A-Z][0-9]{12}[.]pdf
However, I am not sure it is correct as I get no matches.

Regex r = new Regex(@"^\w{8}-\w{4}-\w{4}-\w{4}-\w{12}\.pdf$");
If you explicitly don't want underscores, you can use:
^[a-zA-Z\d]{8}-[a-zA-Z\d]{4}-[a-zA-Z\d]{4}-[a-zA-Z\d]{4}-[a-zA-Z\d]{12}\.pdf$
Regex is case-sensitive (unless you tell it to ignore case using RegexOptions.IgnoreCase).
Right now the main issue is that your regex says to match a letter and then n digits, instead of matching n alphanumeric characters.
With the case-insensitive flag set, your regex can be simplified to:
^[A-Z\d]{8}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{12}\.pdf$

You only match uppercase letters in your regex, and when you use [A-Z][0-9]{4} you do not match four alphanumeric chars; you match one letter followed by four digits.
So, you need to merge [A-Z][0-9] into single character classes [A-Z0-9] and then use a case insensitive flag.
Also, you need to use $ at the end of the regex to make the pattern match the entire string.
Use
^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}[.]pdf$
See the regex demo
In C#,
var rx = new Regex(@"^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}[.]pdf$", RegexOptions.IgnoreCase);
Note that case insensitivity can be set with an inline modifier (?i):
var pattern = @"(?i)^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}[.]pdf$";
// check if the string matches the pattern
if (Regex.IsMatch(s, pattern))
{
    // The string matches...
}
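As a quick check of the pattern above, here is a minimal sketch that runs it against one of the sample names from the question. The helper name IsGuidPdfName is made up for illustration, and the second and third inputs are assumed test strings, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above; IgnoreCase lets [A-Z] cover lowercase hex digits too.
var guidPdf = new Regex(
    @"^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}[.]pdf$",
    RegexOptions.IgnoreCase);

// Hypothetical helper: true when the whole file name has the 8-4-4-4-12 shape.
bool IsGuidPdfName(string fileName) => guidPdf.IsMatch(fileName);

Console.WriteLine(IsGuidPdfName("5b7f991f-0726-4dd5-856e-7cea820f02c5.pdf")); // True
Console.WriteLine(IsGuidPdfName("5B7F991F-0726-4DD5-856E-7CEA820F02C5.PDF")); // True
Console.WriteLine(IsGuidPdfName("5b7f991f-0726-4dd5-856e.pdf"));              // False
```

If the names are always real GUIDs, Guid.TryParse on the name without its extension may be a stricter alternative, since the regex also accepts letters beyond a-f.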

This should work
/(\w{8})-(\w{4})-(\w{4})-(\w{4})-(\w{12})(\.pdf)/i
I tried it at RegEx Testing.
\w = a word character (alphanumeric or underscore)
{n} = the number of times the preceding token should appear
i = a flag: case insensitive.

Related

How to split Alphanumeric with Symbol in C#

I want to split an alphanumeric string into two parts, numeric and alphabetic, separated by a special character such as -
string mystring = "1- Any Thing";
I want to store like:
numberPart = 1
alphaPart = Any Thing
For this I am using Regex:
Regex re = new Regex(@"([a-zA-Z]+)(\d+)");
Match result = re.Match("1- Any Thing");
string alphaPart = result.Groups[1].Value;
string numberPart = result.Groups[2].Value;
If there is no space between the parts it works fine, but with a space and a symbol both alphaPart and numberPart come back empty. What am I doing wrong? The regex may be wrong for this kind of input; please advise.
Try this:
(\d+)(?:[^\w]+)?([a-zA-Z\s]+)
Demo
Explanation:
(\d+) - captures one or more digits
[^\w]+ - matches one or more non-word characters (anything except letters, digits, and the underscore)
? - makes that separator optional, so the pattern also works when nothing sits between the number and the text
[a-zA-Z\s]+ - matches letters, including any whitespace between them
Start of string is matched with ^.
Digits are matched with \d+.
Any non-alphanumeric characters are matched with [\W_] or \W.
Anything is matched with .*.
Use
(?s)^(\d+)\W*(.*)
See proof
(?s) makes . match linebreaks. So, it literally matches everything.
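A minimal sketch of the second pattern applied to the string from the question. The helper name SplitNumbered and the fallback of returning an empty number part on no match are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "1- Any Thing" into the leading number and the rest.
// Falls back to ("", input) when the input does not start with a digit (an assumption).
(string Number, string Text) SplitNumbered(string input)
{
    Match m = Regex.Match(input, @"(?s)^(\d+)\W*(.*)");
    return m.Success ? (m.Groups[1].Value, m.Groups[2].Value) : ("", input);
}

var parts = SplitNumbered("1- Any Thing");
Console.WriteLine(parts.Number); // 1
Console.WriteLine(parts.Text);   // Any Thing
```

Because \W* can match nothing, the same call also handles input with no separator, such as "12AnyThing".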

C# How to filter DataTable rows containing alphanumeric values with special characters using Regex

I have below data in my C# Datatable
I want to keep only the values that are alphanumeric and also contain special characters, such as:
HOAUD039#
HOAUD00$
So I tried the regex below in my LINQ query:
var matches =
    dt.AsEnumerable()
      .Where(row => Regex.IsMatch(row["Empolyee_CRC"].ToString(),
                                  "^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"))
      .CopyToDataTable();
which returns both the purely alphanumeric values and the alphanumeric values with special characters.
My question is simple: what is the right way to return only the values that are alphanumeric and contain special characters?
I've also tried this regex, but it does not work either:
^(?:[\d,\/().]*[a-zA-Z][a-zA-Z\d,\/().]*)?$
Based on your example patterns, this should do:
^(?=.*\d)(?=.*[A-Za-z])(?=.*[!@#$&()\-`.+,\/\"]).*$
Explanation
^ - Anchor to start of string.
(?=.*\d) - Condition checking that at least one digit is present.
(?=.*[A-Za-z]) - Condition checking that at least one letter is present.
(?=.*[!@#$&()\-`.+,\/\"]) - Condition checking that at least one special character is present.
.* - Match anything except newline.
$ - End of string.
Demo
In your regex you are using a single character class, which matches any one of the listed characters at each position, but you have 3 requirements.
In your second regex, everything is optional due to the * and the ?.
You could use 3 positive lookaheads to assert your requirements:
^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[A-Z])[A-Z\d!@#$&()`.+,\/\-]+$
In C#:
string pattern = @"^(?=.*\d)(?=.*[!@#$&()`.+,\/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,\/-]+$";
That will match:
^ Start of string
(?=.*\d) Assert a digit
(?=.*[!@#$&()`.+,/-]) Assert a special character
(?=.*[A-Za-z]) Assert a lowercase or uppercase character
[A-Za-z\d!@#$&()`.+,/-]+ Match 1+ times only the allowed characters
$ End of the string
Regex demo | C# Demo
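A minimal sketch of the three-lookahead pattern used as a LINQ filter. To keep it self-contained it filters a plain string array instead of a DataTable column, and every value except the two from the question is an assumed test input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Requires at least one digit, one special character, and one letter,
// and allows nothing outside the listed characters.
var mixed = new Regex(
    @"^(?=.*\d)(?=.*[!@#$&()`.+,/\-])(?=.*[a-zA-Z])[a-zA-Z\d!@#$&()`.+,/\-]+$");

// Stand-in for the column values; only the first two come from the question.
string[] values = { "HOAUD039#", "HOAUD00$", "HOAUD039", "12345", "ABC#" };

string[] matches = values.Where(v => mixed.IsMatch(v)).ToArray();
Console.WriteLine(string.Join(", ", matches)); // HOAUD039#, HOAUD00$
```

With a DataTable, the same predicate would go inside the Where call on dt.AsEnumerable(), as in the question.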

Why does this Regex match?

I have written a regular expression to match the following criteria
any digit (0-9)
hyphen
whitespace
in any order
length between 10 and 25
([0-9\-\w]{10,25})
I am using it to detect payment card numbers, so this works:
Regex.IsMatch("34343434343434", "([0-9\\-\\w]{10,25})"); // true
But this also works:
Regex.IsMatch("LogMethodComplete", "([0-9\\-\\w]{10,25})"); // true
What am I doing wrong?
This is C#
Take a look at Regular Expression Language - Quick Reference, section Character Classes.
\w matches any word character including underscore, not whitespace.
To match whitespace, you can use \s.
To match digits, you can use \d.
Instead of \w you can use \d, which means digit, together with \s for whitespace. A regex like
@"[\d\-\s]{10,25}" matches your criteria.
You don't need to check for word characters, which is what \w does.
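A small sketch comparing the pattern from the question with the corrected one. The helper names and the card-like test string are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

const string original = @"([0-9\-\w]{10,25})"; // \w also accepts letters and the underscore
const string corrected = @"[\d\-\s]{10,25}";   // digits, hyphens, and whitespace only

// Hypothetical helpers wrapping the two patterns.
bool MatchesOriginal(string s) => Regex.IsMatch(s, original);
bool MatchesCorrected(string s) => Regex.IsMatch(s, corrected);

Console.WriteLine(MatchesOriginal("LogMethodComplete"));  // True
Console.WriteLine(MatchesCorrected("LogMethodComplete")); // False
Console.WriteLine(MatchesCorrected("3434 3434-3434 34")); // True
```

Neither pattern is anchored, so each finds a qualifying run anywhere inside a longer string; adding ^ and $ would require the whole input to qualify.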

Regex filter string number and number not working

I am trying to extract a value from a string in this format: "[\r\n \"MG480612230220150018\"\r\n]". I am trying to match digits and letters with a minimum length of 5 characters, so that I am guaranteed to extract this data (MG480612230220150018), but it is not working.
Regex regex = new Regex(@"^[0-9a-zA-Z]{5,}$");
Match match = regex.Match(availability.Id.ToString());
if (match.Success)
{
    var myid = match.Value;
}
This will work for you:
Regex regex = new Regex(@"[a-z\d]{5,}", RegexOptions.IgnoreCase);
Regex Explanation:
[a-z\d]{5,}
Options: Case insensitive
Match a single character present in the list below «[a-z\d]{5,}»
Between 5 and unlimited times, as many times as possible, giving back as needed (greedy) «{5,}»
A character in the range between “a” and “z” (case insensitive) «a-z»
A “digit” (any decimal number in any Unicode script) «\d»
Currently, you are matching at the beginning and end of string. As you say, the input string is longer [\r\n \"MG480612230220150018\"\r\n]. So, you need to remove the anchors:
Regex regex = new Regex(@"[0-9a-zA-Z]{5,}");
And you will get the match (MG480612230220150018).
Have a look at the demo.
As an alternative, in C#, I would use Unicode classes to match characters:
Regex regex = new Regex(@"[\p{N}\p{L}]{5,}");
\p{N} stands for a Unicode number, and \p{L} for any Unicode letter, case-insensitive.
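A minimal sketch of the unanchored pattern applied to the input from the question. The helper name ExtractId and the choice to return an empty string when nothing matches are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first run of 5 or more ASCII letters/digits,
// or an empty string when there is none (an assumed convention).
string ExtractId(string raw)
{
    Match match = Regex.Match(raw, @"[0-9a-zA-Z]{5,}");
    return match.Success ? match.Value : string.Empty;
}

string input = "[\r\n  \"MG480612230220150018\"\r\n]";
Console.WriteLine(ExtractId(input)); // MG480612230220150018
```

Without the ^ and $ anchors, the brackets, quotes, and line breaks around the value no longer prevent the match.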

.NET RegEx for letters and spaces

I am trying to create a regular expression in C# that allows only alphanumeric characters and spaces. Currently, I am trying the following:
string pattern = @"^\w+$";
Regex regex = new Regex(pattern);
if (regex.IsMatch(value) == false)
{
// Display error
}
What am I doing wrong?
If you just need English, try this regex:
"^[A-Za-z ]+$"
The brackets specify a set of characters
A-Z: All capital letters
a-z: All lowercase letters
' ': Spaces
If you need unicode / internationalization, you can try this regex:
@"^[\p{L}\s]+$"
See https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#word-character-w
This regex will match all unicode letters and spaces, which may be more than you need, so if you just need English / basic Roman letters, the first regex will be simpler and faster to execute.
Note that for both regex I have included the ^ and $ operator which mean match at start and end. If you need to pull this out of a string and it doesn't need to be the entire string, you can remove those two operators.
Try this for all letters plus spaces:
@"^[\p{L} ]+$"
The character class \w does not match spaces. Try replacing it with [\w ] (there's a space after the \w) to match word characters and spaces. You could also replace the space with \s if you want to match any whitespace.
If, other than 0-9, a-z and A-Z, you also need to cover accented letters like ï, é, æ, Ć or Ş, then you are better off using the Unicode categories \p{...} for matching, i.e. (note the space):
string pattern = @"^[\p{L}\p{Nd} ]+$";
This regex works great for me.
Regex rgx = new Regex("[^a-zA-Z0-9_ ]+");
if (rgx.IsMatch(yourstring))
{
    var err = "Special characters are not allowed in Tags";
}
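To compare the approaches above, here is a minimal sketch that validates a few strings with the Unicode-category pattern, which allows letters, decimal digits, and spaces. The helper name IsLettersDigitsSpaces and the test strings are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// \p{L} = any Unicode letter, \p{Nd} = any decimal digit; the space is literal.
var allowed = new Regex(@"^[\p{L}\p{Nd} ]+$");

// Hypothetical helper: true when the whole value uses only the allowed characters.
bool IsLettersDigitsSpaces(string value) => allowed.IsMatch(value);

Console.WriteLine(IsLettersDigitsSpaces("Hello World 42"));    // True
Console.WriteLine(IsLettersDigitsSpaces("Caf\u00E9 au lait")); // True
Console.WriteLine(IsLettersDigitsSpaces("tag_name"));          // False
Console.WriteLine(Regex.IsMatch("Hello World", @"^\w+$"));     // False
```

The last line shows why the pattern from the question rejects valid input: \w covers letters, digits, and the underscore, but not the space.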
