Validate filename in c# through regex - c#

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
return isValid = true;
}
However, I can't make it work.
Is there anything I missed here?

You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with an alphanumeric part before the first _ symbol, so you need to allow both letters and digits.
After the Enrollment_ part you have 8 digits, not 6.
There is no need for a double \. You only need to escape the dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
static void Main() {
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
if (pattern.IsMatch(filename))
{
Console.WriteLine("Matched");
}
}
}

Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used @ on your string: with @ you don't need to escape.
Try this instead:
@"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
@"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"

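You can try the following. It matches letters and digits at the beginning and also restricts the date to a plausible range (years 1900-2099, months 01-12, days 01-31), although it still accepts impossible dates such as 20130231.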
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/
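A regex on its own can only constrain digit ranges, so a name containing 20130231 or 2460 may still slip through. One option is to let the regex check the overall shape and hand the timestamp to DateTime.TryParseExact. This is only a minimal sketch: the class name EnrollmentFileName and the helper IsValid are made up for illustration, and it assumes the timestamp is meant to be a real calendar date and 24-hour time.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EnrollmentFileName
{
    // Shape check: alphanumeric prefix, literal _Enrollment_, then the timestamp.
    // The "stamp" group name is an assumption made for this sketch.
    private static readonly Regex Pattern = new Regex(
        @"^[A-Za-z0-9]+_Enrollment_(?<stamp>[0-9]{8}_[0-9]{4})\.xml$");

    public static bool IsValid(string fileName)
    {
        if (fileName == null) return false;

        Match m = Pattern.Match(fileName);
        if (!m.Success) return false;

        // Let the framework decide whether the digits form a real date and time.
        DateTime parsed;
        return DateTime.TryParseExact(
            m.Groups["stamp"].Value,
            "yyyyMMdd'_'HHmm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
    }
}
```

Calling EnrollmentFileName.IsValid("Try123_Enrollment_20130102_1200.xml") accepts the file name from the question, while a name with February 31 as the date is rejected.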

Related

Regex to match path

I am trying to validate whether a user-entered string is a correct relative path.
It should not start with assets/
It should not end with /
It should not end with any file extension like .html or .php or .jpg
It should not contain dot .
I am trying with below regex, but it is not working correctly.
^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))
My test cases
Valid path
sample/hello/images
sample/hello_vid/user/data
test/123/user_live/images
Invalid path
assets/sample/hello/images
sample/hello_vid/user/data/
test/123/user_live/images/index.html
hii/sk.123/data
ok/bye/last.exe
Alternative "readable" approach ;)
public static bool IsValidPath(this string path)
{
if (string.IsNullOrEmpty(path)) return false;
if (path.StartsWith("assets/")) return false;
if (path.EndsWith("/")) return false;
if (path.Contains(".")) return false;
return true;
}
// Usage
var value = "sample/hello/images";
if (value.IsValidPath())
{
// use the value...
}
If you also want to match the underscore, you could add it to the character class. To prevent matching assets/ at the start, you could use a negative lookahead.
^(?!assets/)[a-z0-9_]+(?:/[a-z0-9_]+)+$
^ Start of string
(?!assets/) Assert what is directly to the right is not assets/
[a-z0-9_]+ Repeat 1+ times any of the listed, including the underscore
(?:/[a-z0-9_]+)+ Repeat 1+ times a / and 1+ times any of the listed
$ End of string
Regex demo
Or you could use \w instead of the character class
^(?!assets/)\w+(?:/\w+)+$
This works for your test cases.
Regex explained:
^(?!assets/) # It should not start with assets/
[^\.]+ # It should not contain dot (file extensions contain a dot)
(?<!/)$ # It should not end with /
Regex in one line:
^(?!assets/)[^\.]+(?<!/)$
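To see how the one-line pattern behaves on the test cases from the question, it can be wrapped in a small helper. This is a sketch only; the names PathRule and IsValidRelativePath are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class PathRule
{
    // Not starting with assets/, containing no dot, and not ending with a slash.
    private static readonly Regex Pattern = new Regex(@"^(?!assets/)[^\.]+(?<!/)$");

    public static bool IsValidRelativePath(string path)
    {
        return path != null && Pattern.IsMatch(path);
    }
}
```

With this helper, sample/hello/images passes, while assets/sample/hello/images, sample/hello_vid/user/data/ and hii/sk.123/data are all rejected.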
You may try below regex to achieve your purpose:
^(?!assets\/)[^.]*?$(?<!\/\r?$)
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
(?!assets\/) - A negative lookahead that rejects test strings starting with assets/.
[^.]*? - Lazily matches any characters other than a dot. This also covers file extensions, so there is no need to check for them separately.
(?<!\/\r?$) - A negative lookbehind that rejects the test string if / is its last character.
You can find the demo of the above regex in here.
Implementation in C#
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"^(?!assets\/)[^.]*?$(?<!\/\r?$)";
string input = @"sample/hello/images
sample/hello_vid/user/data
test/123/user_live/images
assets/sample/hello/images
sample/hello_vid/user/data/
test/123/user_live/images/index.html
hii/sk.123/data
ok/bye/last.exe";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine(m.Value);
}
}
}
You can find the sample run of the above code in here.
The following regex should be fine for the scenarios you have mentioned:
^((?!(assets\/))(?! )([a-zA-Z0-9_ ]+(?<! )\/(?! ))+[a-zA-Z0-9_ ]+)$
In addition to the scenarios in the questions, I have also taken care of a folder name not starting or ending with spaces.

Use RegEx to uppercase and lowercase the string

I am trying to convert a string to uppercase and lowercase based on the index.
My string is a LanguageCode like cc-CC where cc is the language code and CC is the country code. The user can enter in any format like "cC-Cc". I am using the regular expression to match whether the data is in the format cc-CC.
var regex = new Regex("^[a-z]{2}-[A-Z]{2}$", RegexOptions.IgnoreCase);
// I could use CultureInfo from the .NET Framework to check whether the code is valid,
// but the requirement is that invalid language codes are also allowed as long as
// the entered code is in cc-CC format.
Now when the user enters something like cC-Cc, I'm trying to lowercase the first two characters and uppercase the last two.
I can split the string using - and then concatenate them.
var languageDetails = languageCode.Split('-');
var languageCodeUpdated = $"{languageDetails[0].ToLowerInvariant()}-{languageDetails[1].ToUpperInvariant()}";
I wondered whether I could avoid creating multiple strings and use Regex itself to change the case.
While searching, I found some solutions that use \L and \U, but I can't use them because the C# compiler reports an error. Also, Regex.Replace() has an overload that takes a MatchEvaluator delegate, which I don't understand.
Is there any way in C# to use Regex to replace uppercase with lowercase and vice versa?
.NET regex does not support case modifying operators.
You may use MatchEvaluator:
var result = Regex.Replace(s, @"(?i)^([a-z]{2})-([a-z]{2})$", m =>
$"{m.Groups[1].Value.ToLower()}-{m.Groups[2].Value.ToUpper()}");
See the C# demo.
Details
(?i) - the inline version of the RegexOptions.IgnoreCase modifier
^ - start of the string
([a-z]{2}) - Capturing group #1: 2 ASCII letters
- - a hyphen
([a-z]{2}) - Capturing group #2: 2 ASCII letters
$ - end of string.
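Since the question mentions that MatchEvaluator is hard to understand, it may help to see it written as an ordinary named method instead of a lambda: Regex.Replace calls the method once per match and inserts whatever string it returns. A minimal sketch, where LanguageCode, FixCase and Normalize are names chosen for illustration:

```csharp
using System.Text.RegularExpressions;

public static class LanguageCode
{
    private static readonly Regex Shape =
        new Regex(@"^([a-z]{2})-([a-z]{2})$", RegexOptions.IgnoreCase);

    // Called by Regex.Replace for each match; the return value replaces the match.
    private static string FixCase(Match m)
    {
        return m.Groups[1].Value.ToLowerInvariant()
            + "-"
            + m.Groups[2].Value.ToUpperInvariant();
    }

    public static string Normalize(string code)
    {
        // Input that does not have the cc-CC shape is returned unchanged.
        return Shape.Replace(code, new MatchEvaluator(FixCase));
    }
}
```

LanguageCode.Normalize("cC-Cc") gives cc-CC, and a string that does not fit the pattern comes back as it went in.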
TLDR: This is Regex.Replace with \U and \L support.
private static string EnhancedReplace(string input, string pattern, string replacement, RegexOptions options)
{
replacement = Regex.Replace(replacement, @"(?<mode>\\[UL])(?<group>\$((\d+)|({[^}]+})))", @"<!<mode:${mode}>%&${group}&%>");
var output = Regex.Replace(input, pattern, replacement, options);
output = Regex.Replace(output, @"<!<mode:\\L>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToLower());
output = Regex.Replace(output, @"<!<mode:\\U>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToUpper());
return output;
}
How To Use
Call the function with \U followed by the group to be uppercase
var result = EnhancedReplace(input, @"(public \w+ )(\w)", @"$1\U$2", RegexOptions.None);
Will replace this:
public string test12 { get; set; } = "test3";
With that:
public string Test12 { get; set; } = "test3";
Details
I'm currently working on an app which allows the user to define a batch of Regex Replace operations.
For example the user enters json and the batch converts it to a C#-Class.
Therefore, speed is no key requirement. But it would be very handy to be able to use \U and \L.
This method will apply Regex.Replace 3 times to the whole content and one time to the replacement string. Therefore it’s at least three times slower than Regex.Replace without \U \L support.
Step by Step
1. The first Regex.Replace enhances the replacement string. It replaces \U$1 with <!<mode:\U>%&$1&%> (this also works for named groups: ${groupName}).
2. The new replacement is applied to the content.
3. & 4. The inserted placeholder is now relatively unique. That allows you to search only for <!<mode:\U>%&Actual Value&%> and use the MatchEvaluator to replace it with its uppercase version. The same is done for \L.
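The intermediate strings are easier to follow when the first two steps are run on their own. The sketch below reuses the same patterns as EnhancedReplace; the class PlaceholderTrace and its method names are made up for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class PlaceholderTrace
{
    // Step 1: wrap every \U$n or \L$n in the replacement string in a placeholder.
    public static string Enhance(string replacement)
    {
        return Regex.Replace(
            replacement,
            @"(?<mode>\\[UL])(?<group>\$((\d+)|({[^}]+})))",
            @"<!<mode:${mode}>%&${group}&%>");
    }

    // Step 2: run the ordinary replacement with the enhanced string.
    // The placeholders now surround the text of the referenced group.
    public static string Apply(string input, string pattern, string replacement)
    {
        return Regex.Replace(input, pattern, Enhance(replacement));
    }
}
```

For the replacement $1\U$2, Enhance returns $1<!<mode:\U>%&$2&%>. Applying it to "public string test12" with the pattern (public \w+ )(\w) leaves the marker around the letter t, which steps 3 and 4 then resolve to T.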
Regex101 Demo:
Step 1: Enhance pattern with placeholder
https://regex101.com/r/ZtqigN/1
Step 2 Use new replacement pattern
https://regex101.com/r/PWLTFD/1
Step 3&4 Resolve new placeholders
https://regex101.com/r/5DIIUo/1
Answer
var result = EnhancedReplace(input, @"(cc)(-)(cc)", @"\L$1$2\U$3", RegexOptions.IgnoreCase);

Regex to text before set of numbers

I have text like this
Inc12345_Month
Ted12345_Month
J8T12345_Month
What I need to do is extract the 12345 and also remove everything before it. This will be done in C#
.+?(?=\d_Monthly) was working in a regex tester online but when I put it in my code it only returned 5_Month.
Edit: the 12345 part can vary in length, so I cannot just repeat [0-9] a fixed number of times.
Edit 2: this code was just an attempt to remove everything before the 12345:
string text = /* the above text pulled in from a file */;
Regex reg = new Regex(@".+?(?=\d+_Monthly)");
text = reg.Replace(text, "");
You can use this function to strip it:
private static Regex getNumberAndBeyondRegex = new Regex(@"^.{2}\D+(\d.*)$", RegexOptions.Compiled);
public static string GetNumberAndBeyond(string input)
{
var match = getNumberAndBeyondRegex.Match(input);
if (!match.Success) throw new ArgumentException("String isn't in the correct format.", "input");
return match.Groups[1].Value;
}
The regex at work is ^.{2}\D+(\d.*)$
It works by grabbing anything that's a number, after at least one character that isn't a number. It'll not only match _Month but also other endings.
The regex consists of a few parts:
^ matches the beginning of the string
.{2} matches any two characters, which prevents a digit from being captured when it is the first or second character; you can increase this number to the minimum prefix length - 1
\D+ matches at least one character that isn't a digit
( starts a capturing group
\d.* matches one digit and everything after it
) closes the capturing group
$ matches the end of the string
There are a lot of different regex flavors, many of them have slight differences in terms of escaping, capturing, replacing and quite surely some others.
For testing .NET regexes online I use the free version of RegexHero. It shows a popup every now and then, but it makes up for that by showing live results, capture groups and instant replacing, next to quite a lot of other features.
If you want to match anywhere within the string, you can use the regex \d+_Month, which is very similar to your original regex. In code:
new Regex(@"\d+_Month").Match(input).Value
Edit:
Based on the format you supplied in the comment I've created a regex and function to parse the entire file name:
private static Regex parseFileNameRegex = new Regex(#"^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$", RegexOptions.Compiled);
public static bool TryParseFileName(string fileName, out int id, out string month, out string fileExtension)
{
id = 0; month = null; fileExtension = null;
if (fileName == null) return false;
var match = parseFileNameRegex.Match(fileName);
if (!match.Success) return false;
if (!int.TryParse(match.Groups[1].Value, out id) || id < 1) return false; // Convert the ID into a number
month = match.Groups[2].Value;
fileExtension = match.Groups[3].Value;
return true;
}
In the parse function it requires the ID to be at least 1, 0 isn't accepted (and negative numbers won't match the regex), if you don't want this restriction, simply remove || id < 1 from the function.
Using the function would look like:
int id; string month, fileExtension;
if (!TryParseFileName("CompanyName_ClientName12345_Month_Nov.pdf", out id, out month, out fileExtension))
throw new FormatException("File name is incorrectly formatted."); // Do whatever you want when you get an invalid filename
// Use id, month and fileExtension here :)
The regex ^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$ works like:
^ matches the beginning of the string
.*\D matches any characters followed by one non-digit character, so the ID must be preceded by at least one non-digit
(\d+) captures at least 1 number, this is the ID
_Month_ is the literal text in between
([a-zA-Z]+) matches and captures at least 1 letter, this is the month
\. matches a . character
(\w+) matches and captures one or more word characters (letters, digits and underscore), this is the file extension
$ matches the end of the string
Using:
Regex reg = new Regex(@"\D+(?=(\d+)_Monthly)");
is more explicit; the number is in Groups[1].
Part by part:
.+?
Lazily match one or more of any character, i.e. as few characters as possible while still letting the rest of the pattern match. It is not equivalent to ".*", which is greedy and also matches zero characters.
(?=
Start a lookahead group
\d
Match exactly one decimal digit, which explains what you are seeing: the rest of the number is matched by .+?, which is outside the group
_Monthly
Match the literal text
)
End the group
I think what you want is:
.*(?=\d+_Monthly)
I guess you are missing the + sign after \d
.+?(?=\d+_Monthly)
This should ask for one or more digits.
If you don't need anything before the number, this should work:
(\d+_Month)
I use Derek Slager's regex tester when I'm working with C# regex.
Better dotnet regular expression tester

Replace all characters and first 0's (zeroes)

I am trying to use a regular expression to remove every character except the number, and the number should not start with 0.
How can I achieve this with a regular expression?
I have tried several things like @"^([1-9]+)(0+)(\d*)" and "(?<=[1-9])0+", but those do not work.
Some examples of the text are hej:\\\\0.0.0.22, hej:22, hej:\\\\?022 and hej:\\\\?22, and the result should be 22 in every case.
Rather than replace, try and match against [1-9][0-9]*$ on your string. Grab the matched text.
Note that as .NET regexes match Unicode number characters if you use \d, here the regex restricts what is matched to a simple character class instead.
(note: regex assumes matches at end of line only)
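In C# the match-instead-of-replace idea could look like the sketch below. The names TrailingNumber and Extract are hypothetical, and it assumes, as noted, that the number sits at the end of the string.

```csharp
using System.Text.RegularExpressions;

public static class TrailingNumber
{
    // A non-zero digit followed by any digits, anchored at the end of the input.
    private static readonly Regex Pattern = new Regex(@"[1-9][0-9]*$");

    // Returns the trailing number without leading zeros, or "" when there is none.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input ?? "");
        return m.Success ? m.Value : "";
    }
}
```

All four sample inputs from the question come back as 22 with this helper.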
According to one of your comments hej:\\\\0.011.0.022 should yield 110022. First select the relevant string part from the first non zero digit up to the last number not being zero:
([1-9].*[1-9]\d*)|[1-9]
[1-9] is the first non zero digit
.* are any number of any characters
[1-9]\d* are numbers, starting at the first non-zero digit
|[1-9] includes cases consisting of only one single non zero digit
Then remove all non digits (\D)
Match match = Regex.Match(input, #"([1-9].*[1-9]\d*)|[1-9]");
if (match.Success) {
result = Regex.Replace(match.Value, @"\D", "");
} else {
result = "";
}
Use following
[1-9][0-9]*$
You don't need to do any recursion, just match that.
Here is something you can try, The87Boy. You can play around with the pattern or add to it as you like.
string strTargetString = @"hej:\\\\*?0222\";
string pattern = "[\\\\hej:0.?*]";
string replacement = " ";
Regex regEx = new Regex(pattern);
string newRegStr = Regex.Replace(regEx.Replace(strTargetString, replacement), @"\s+", " ");
Result from the above example: " 222 " (the remaining digits with a single space on each side; call Trim() to drop the spaces)

Regular Expression For Alphanumeric String With At Least One Alphabet Or Atleast One Numeric In The String

To test one alphanumeric string we usually use the regular expression "^[a-zA-Z0-9_]*$" (or most preferably "^\w+$" for C#). But this regex accepts numeric only strings or alphabet only strings, like "12345678" or "asdfgth".
I need a regex that accepts only alphanumeric strings with at least one letter and at least one digit. For example, "ar56ji" should be accepted, but the numeric-only and letter-only strings above should not.
Thanks in advance.
This should do it:
if (Regex.IsMatch(subjectString, @"
# Match string having one letter and one digit (min).
\A # Anchor to start of string.
(?=[^0-9]*[0-9]) # at least one number and
(?=[^A-Za-z]*[A-Za-z]) # at least one letter.
\w+ # Match string of alphanums.
\Z # Anchor to end of string.
",
RegexOptions.IgnorePatternWhitespace)) {
// Successful match
} else {
// Match attempt failed
}
EDIT 2012-08-28 Improved efficiency of lookaheads by changing the lazy dot stars to specific greedy char classes.
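Note that \w also admits the underscore and non-ASCII letters and digits. If only ASCII letters and digits should be accepted, the same two lookaheads can be combined with an explicit character class. A sketch under that assumption, with AlphaNumericCheck and HasLetterAndDigit as invented names:

```csharp
using System.Text.RegularExpressions;

public static class AlphaNumericCheck
{
    // Two lookaheads require a digit and a letter somewhere in the string;
    // the character class then limits the whole string to ASCII alphanumerics.
    private static readonly Regex Pattern = new Regex(
        @"\A(?=[^0-9]*[0-9])(?=[^A-Za-z]*[A-Za-z])[A-Za-z0-9]+\z");

    public static bool HasLetterAndDigit(string value)
    {
        return value != null && Pattern.IsMatch(value);
    }
}
```

With this helper "ar56ji" is accepted, while "12345678" and "asdfgth" from the question are rejected.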
Try this out:
"^\w*(?=\w*\d)(?=\w*[a-zA-Z])\w*$"
There is a good article about it here:
http://nilangshah.wordpress.com/2007/06/26/password-validation-via-regular-expression/
This should work:
"^[a-zA-Z0-9_]*([a-zA-Z][0-9]|[0-9][a-zA-Z])[a-zA-Z0-9_]*$"
This will match:
<zero-or-more-stuff>
EITHER <letter-followed-by-digit> OR <digit-followed-by-letter>
<zero-or-more-stuff>
By ensuring you have either a digit followed by letter or a letter followed by digit, you are enforcing the requirement to have at least one digit and at least one letter. Note that I've left out the _ above, because it wasn't clear whether you would accept that as a letter, a digit, or neither.
Try this one: ^(?:[a-zA-Z]+[0-9][a-zA-Z0-9]*|[0-9]+[a-zA-Z][a-zA-Z0-9]*)$
Simple is better. If you had a hard time writing it originally, you (or some other poor sap) are going to have a hard time maintaining or modifying it. (And I think I see some possible holes in the approaches listed above.)
using System.Text.RegularExpressions;
static bool IsGoodPassword(string pwd)
{
int minPwdLen = 8;
int maxPwdLen = 12;
bool allowableChars = false;
bool oneLetterOneNumber = false;
bool goodLength = false;
string allowedCharsPattern = "^[a-z0-9]*$";
// Does it contain only allowed characters?
allowableChars = Regex.IsMatch(pwd, allowedCharsPattern, RegexOptions.IgnoreCase);
// Does it contain at least one digit and at least one letter?
oneLetterOneNumber = Regex.IsMatch(pwd, "[0-9]") && Regex.IsMatch(pwd, "[a-z]", RegexOptions.IgnoreCase);
// Does it meet the length requirements?
goodLength = pwd.Length >= minPwdLen && pwd.Length <= maxPwdLen;
return allowableChars && oneLetterOneNumber && goodLength;
}
