Regex to match path - c#

I am trying to validate whether a user-entered string is a valid relative path.
It should not start with assets/
It should not end with /
It should not end with any file extension like .html or .php or .jpg
It should not contain dot .
I am trying with below regex, but it is not working correctly.
^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))
My test cases
Valid path
sample/hello/images
sample/hello_vid/user/data
test/123/user_live/images
Invalid path
assets/sample/hello/images
sample/hello_vid/user/data/
test/123/user_live/images/index.html
hii/sk.123/data
ok/bye/last.exe

Alternative "readable" approach ;)
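// Extension methods must be declared in a static class; the name PathExtensions is arbitrary.
public static class PathExtensions
{
    public static bool IsValidPath(this string path)
    {
        if (string.IsNullOrEmpty(path)) return false;
        if (path.StartsWith("assets/")) return false;
        if (path.EndsWith("/")) return false;
        if (path.Contains(".")) return false; // also rules out file extensions
        return true;
    }
}
// Usage
var value = "sample/hello/images";
if (value.IsValidPath())
{
    // use the value...
}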

If you also want to match the underscore, you could add it to the character class. To prevent matching assets/ at the start, you could use a negative lookahead.
^(?!assets/)[a-z0-9_]+(?:/[a-z0-9_]+)+$
^ Start of string
(?!assets/) Assert what is directly to the right is not assets/
[a-z0-9_]+ Repeat 1+ times any of the listed, including the underscore
(?:/[a-z0-9_]+)+ Repeat 1+ times a / and 1+ times any of the listed
$ End of string
Regex demo
Or you could use \w instead of the character class
^(?!assets/)\w+(?:/\w+)+$
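To check a pattern like this against the test cases from the question, a small harness can help. The following is a minimal sketch rather than a definitive implementation: the class name PathPatternCheck and the method IsValid are made up for illustration, and it uses the first pattern from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathPatternCheck
{
    // First pattern from the answer above. The class and method names are
    // hypothetical and exist only for this sketch.
    private static readonly Regex RelativePath =
        new Regex(@"^(?!assets/)[a-z0-9_]+(?:/[a-z0-9_]+)+$");

    public static bool IsValid(string path)
    {
        return path != null && RelativePath.IsMatch(path);
    }

    public static void Main()
    {
        // Test cases copied from the question.
        string[] samples =
        {
            "sample/hello/images",
            "sample/hello_vid/user/data",
            "test/123/user_live/images",
            "assets/sample/hello/images",
            "sample/hello_vid/user/data/",
            "test/123/user_live/images/index.html",
            "hii/sk.123/data",
            "ok/bye/last.exe"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsValid(sample));
        }
    }
}
```

Note that this pattern requires at least one /, so a single-segment path such as images is rejected, and uppercase letters are rejected unless RegexOptions.IgnoreCase is added.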

This works for your test cases.
Regex explained:
^(?!assets/) # It should not start with assets/
[^\.]+ # It should not contain dot (file extensions contain a dot)
(?<!/)$ # It should not end with /
Regex in one line:
^(?!assets/)[^\.]+(?<!/)$

You may try below regex to achieve your purpose:
^(?!assets\/)[^.]*?$(?<!\/\r?$)
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
(?!assets\/) - A negative lookahead that rejects test strings starting with assets/.
[^.]*? - Lazily matches any characters other than a dot. This also covers file extensions, so there is no need to check for them separately.
(?<!\/\r?$) - A negative lookbehind that rejects the test string if / is its last character (an optional \r before the line end is allowed for).
You can find the demo of the above regex in here.
Implementation in C#
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"^(?!assets\/)[^.]*?$(?<!\/\r?$)";
        // The lines of the verbatim string stay unindented so that no leading spaces end up in the input.
        string input = @"sample/hello/images
sample/hello_vid/user/data
test/123/user_live/images
assets/sample/hello/images
sample/hello_vid/user/data/
test/123/user_live/images/index.html
hii/sk.123/data
ok/bye/last.exe";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine(m.Value);
        }
    }
}
You can find the sample run of the above code in here.

The following regex should be fine for the scenarios you have mentioned:
^((?!(assets\/))(?! )([a-zA-Z0-9_ ]+(?<! )\/(?! ))+[a-zA-Z0-9_ ]+)$
In addition to the scenarios in the question, I have also made sure that a folder name cannot start or end with a space.

Related

Get only integer value from a string which contains bracket { in C#

I have a simple, very simple regex pattern like:
private static string FORMAT_REGEX = @"\{(\d)\}";
I have a string like I have {323} dollars and I want to get only 323
When I used:
Regex regex = new Regex(FORMAT_REGEX);
Match match = regex.Match(format);
if (match.Success)
{
return match.Groups[0].Value; // here comes {323} instead of 323
}
I'm sure that my pattern is wrong. What is the correct pattern ?
Only a small mistake.
You need a + sign after \d like this: \d+ to capture all digits.
And you need to get the first group: match.Groups[1].Value
Edit:
Here is a .NETFiddle
Groups[0] always returns the whole match. You need to get the value of Groups[1].
Also, you need to capture multiple digits:
@"\{(\d+)\}";
// not
@"\{(\d)\}";
See the example at MSDN: Match.Groups Property for an example of just this, where you can capture multiple groups as well as the whole string. In that example they use \d{n} to capture exactly n digits.
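Putting both fixes together (\d+ and Groups[1]), a minimal sketch could look like the following. The class name BraceNumber and the method Extract are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceNumber
{
    // \d+ captures all digits between the braces, not just one.
    private static readonly Regex Format = new Regex(@"\{(\d+)\}");

    // Returns the digits inside the first {...}, or null if there is none.
    public static string Extract(string input)
    {
        Match match = Format.Match(input);
        // Groups[0] is the whole match ("{323}"); Groups[1] is the first
        // capturing group ("323").
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("I have {323} dollars")); // prints 323
    }
}
```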

Regex to text before set of numbers

I have text like this
Inc12345_Month
Ted12345_Month
J8T12345_Month
What I need to do is extract the 12345 and also remove everything before it. This will be done in C#
.+?(?=\d_Monthly) was working in a regex tester online but when I put it in my code it only returned 5_Month.
Edit: the 12345 could be a variable length so I cannot [0-9] multiple times.
Edit2: Code this was just to try and remove everything before the 12345
string text = /* the above text pulled in from a file */;
Regex reg = new Regex(@".+?(?=\d+_Monthly)");
text = reg.Replace(text, "");
You can use this function to strip it:
private static Regex getNumberAndBeyondRegex = new Regex(@"^.{2}\D+(\d.*)$", RegexOptions.Compiled);
public static string GetNumberAndBeyond(string input)
{
var match = getNumberAndBeyondRegex.Match(input);
if (!match.Success) throw new ArgumentException("String isn't in the correct format.", "input");
return match.Groups[1].Value;
}
The regex at work is ^.{2}\D+(\d.*)$
It works by grabbing anything that's a number, after at least one character that isn't a number. It'll not only match _Month but also other endings.
The regex consists of a few parts:
^ matches the beginning of the string
.{2} matches any two characters, which prevents a digit from matching if it is the first or second character; you can increase this number to the minimum prefix length minus 1
\D+ matches at least one character that isn't a digit
( starts a capturing group
\d.* matches one digit and anything after it
) closes the capturing group
$ matches the end of the string
There are many regex flavors, and they differ slightly in escaping, capturing, replacing, and likely other areas as well.
For testing .NET regexes online I use the free version of RegexHero. It shows a popup every now and then, but it makes up for that by showing live results, capture groups, and instant replacing, along with quite a lot of other features.
If you want to match anywhere within the string, you can use the regex \d+_Month, which is very similar to your original regex. In code:
new Regex(@"\d+_Month").Match(input).Value
Edit:
Based on the format you supplied in the comment I've created a regex and function to parse the entire file name:
private static Regex parseFileNameRegex = new Regex(@"^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$", RegexOptions.Compiled);
public static bool TryParseFileName(string fileName, out int id, out string month, out string fileExtension)
{
id = 0; month = null; fileExtension = null;
if (fileName == null) return false;
var match = parseFileNameRegex.Match(fileName);
if (!match.Success) return false;
if (!int.TryParse(match.Groups[1].Value, out id) || id < 1) return false; // Convert the ID into a number
month = match.Groups[2].Value;
fileExtension = match.Groups[3].Value;
return true;
}
The parse function requires the ID to be at least 1; 0 isn't accepted (and negative numbers won't match the regex). If you don't want this restriction, simply remove || id < 1 from the function.
Using the function would look like:
int id; string month, fileExtension;
if (!TryParseFileName("CompanyName_ClientName12345_Month_Nov.pdf", out id, out month, out fileExtension))
throw new FormatException("File name is incorrectly formatted."); // Do whatever you want when you get an invalid filename
// Use id, month and fileExtension here :)
The regex ^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$ works like:
^ matches the beginning of the string
.*\D matches any characters, followed by one non-digit character
(\d+) captures at least 1 number, this is the ID
_Month_ is the literal text in between
([a-zA-Z]+) matches and captures at least 1 letter, this is the month
\. matches a . character
(\w+) matches and captures one or more word characters (letters, digits and underscores), this is the file extension
$ matches the end of the string
Using:
Regex reg = new Regex(@"\D+(?=(\d+)_Monthly)");
is more explicit; the result is in Groups[1].
Part by part:
.+?
Match anything, maybe. This doesn't make any sense to me. It would be equivalent to ".*", which may or may not be what you meant.
(?=
start a group
\d
Match exactly 1 decimal, which explains what you are seeing, and the rest of the number is matched by .+? which is outside the group
_Monthly
match the literal text
)
end group
I think what you want is:
.*(?=\d+_Monthly)
I guess you are missing the + sign after \d
.+?(?=\d+_Monthly)
This should ask for one or more digits.
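One possible explanation for the 5_Month result reported in the question is that Regex.Replace replaces every match, not just the first one. The sketch below illustrates this under some assumptions: it uses _Month (as in the sample text) rather than _Monthly, it processes one name at a time, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixStripper
{
    // Unanchored: after the letters are removed, each leading digit that is
    // still followed by more digits and _Month matches on its own and is
    // removed as well, so only the last digit survives.
    public static string StripUnanchored(string input)
    {
        return Regex.Replace(input, @".+?(?=\d+_Month)", "");
    }

    // Anchored with ^: only one match is possible, at the start of the string.
    public static string StripAnchored(string input)
    {
        return Regex.Replace(input, @"^.+?(?=\d+_Month)", "");
    }

    // Only the digits directly in front of _Month.
    public static string ExtractNumber(string input)
    {
        return Regex.Match(input, @"\d+(?=_Month)").Value;
    }

    public static void Main()
    {
        Console.WriteLine(StripUnanchored("Inc12345_Month")); // prints 5_Month
        Console.WriteLine(StripAnchored("Inc12345_Month"));   // prints 12345_Month
        Console.WriteLine(StripAnchored("J8T12345_Month"));   // prints 12345_Month
        Console.WriteLine(ExtractNumber("J8T12345_Month"));   // prints 12345
    }
}
```

When the whole file content is processed in one call, the anchored version would additionally need RegexOptions.Multiline so that ^ matches at the start of each line.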
If you don't need anything before the number, this should work:
(\d+_Month)
I use Derek Slager's regex tester when I'm working with C# regex.
Better dotnet regular expression tester

Validate filename in c# through regex

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
return isValid = true;
}
However, I can't make it work.
Is there anything I missed here?
You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with an alphanumeric string before the first _ symbol, so you need to allow both letters and digits.
After the Enrollment_ part you have 8 digits, not 6.
There is no need for the double \. You only need to escape the dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
static void Main() {
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
if (pattern.IsMatch(filename))
{
Console.WriteLine("Matched");
}
}
}
Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used @ on your string: with @ you don't need to escape.
Try this instead:
@"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
@"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
You can try the following. It matches letters and digits at the beginning and also checks that the year, month and day fall within plausible ranges (it cannot rule out dates such as February 31).
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/
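If the date and time need to be genuinely valid, one option is to let the regex check only the overall shape and hand the timestamp to DateTime.TryParseExact. This is a sketch under assumptions, not part of the answers above: the class name EnrollmentFileName, the method IsValid, and the choice of the invariant culture are all illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EnrollmentFileName
{
    // Same overall shape as the patterns above, with the timestamp captured.
    private static readonly Regex Shape =
        new Regex(@"^[A-Za-z0-9]+_Enrollment_(\d{8}_\d{4})\.xml$");

    public static bool IsValid(string fileName)
    {
        if (fileName == null) return false;

        Match match = Shape.Match(fileName);
        if (!match.Success) return false;

        // TryParseExact rejects impossible values such as February 31
        // or an hour of 25.
        DateTime parsed;
        return DateTime.TryParseExact(
            match.Groups[1].Value,
            "yyyyMMdd'_'HHmm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("Try123_Enrollment_20130102_1200.xml")); // prints True
        Console.WriteLine(IsValid("Try123_Enrollment_20130231_1200.xml")); // prints False
    }
}
```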

Regex instance can not find more than one match even though it exists

I was using Regex and I wrote this:
static void Main(string[] args)
{
string test = "this a string meant to test long space recognition    n    a";
Regex regex = new Regex(@"[a-z][\s]{4,}[a-z]$");
MatchCollection matches = regex.Matches(test);
if (matches.Count > 1)
Console.WriteLine("yes");
else
{
Console.WriteLine("no");
Console.WriteLine("the number of matches is "+matches.Count);
}
}
In my opinion, the Matches method should find both "n    n" and "n    a". Nevertheless, it only manages to find "n    n", and I just do not understand why that is.
The $ in your regular expression means that the pattern must occur at the end of the line. If you want to find all the long spaces, this simple expression suffices:
\s{4,}
If you really need to know whether the spaces are enclosed by a-z, you can search like this
(?<=[a-z])\s{4,}(?=[a-z])
This uses the pattern...
(?<=prefix)find(?=suffix)
...and finds positions enclosed between a prefix and a suffix. The prefix and suffix are not part of the match, i.e. match.Value contains only the contiguous spaces. Therefore you don't get the "n" is consumed problem mentioned by Jon Skeet.
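The difference between the lookaround pattern and a pattern that consumes the surrounding letters can be seen in a small sketch. The input string and the names LongSpaceRuns, CountWithLookarounds and CountConsuming are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LongSpaceRuns
{
    // The letters are only asserted, not consumed, so a single letter can
    // sit between two matches.
    public static int CountWithLookarounds(string text)
    {
        return Regex.Matches(text, @"(?<=[a-z])\s{4,}(?=[a-z])").Count;
    }

    // The letters are part of each match, so a letter shared by two runs
    // is used up by the first match.
    public static int CountConsuming(string text)
    {
        return Regex.Matches(text, @"[a-z]\s{4,}[a-z]").Count;
    }

    public static void Main()
    {
        // Hypothetical input: two runs of four spaces around a single "n".
        string text = "recognition" + new string(' ', 4) + "n" + new string(' ', 4) + "a";

        Console.WriteLine(CountWithLookarounds(text)); // prints 2
        Console.WriteLine(CountConsuming(text));       // prints 1
    }
}
```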
You have two problems:
1) You're anchoring the match to the end of the string. So actually, the value that's matched is "n...a", not "n...n"
2) The middle "n" is consumed by the first match, so can't be part of the second match. If you change that "n" to "nx" (and remove the $) you'll see "n...n" and "x...a"
Short but complete example:
using System;
using System.Text.RegularExpressions;
public class Test
{
    static void Main(string[] args)
    {
        string text = "ignored a    bc    d";
        Regex regex = new Regex(@"[a-z][\s]{4,}[a-z]");
        foreach (Match match in regex.Matches(text))
        {
            Console.WriteLine(match);
        }
    }
}
Result:
a    b
c    d
I just do not understand why is that..
I think the 'why' it is consumed by the first match is to prevent regexes like "\\w+s", designed to get every word that ends with an 's' from returning "ts", "ats" and "cats" when matched against "cats".
The Regex machinery does one match; if you want more, you have to restart it yourself after the first match.

Regex to extract Variable Part

I have a string containing this: @[User::RootPath]+"Dim_MyPackage10.dtsx" and I need to extract the [User::RootPath] part using a regex. So far I have this regex: [a-zA-Z0-9]*\.dtsx but I don't know how to proceed further.
For the variable, why not consume what is needed by using the not set [^ ] to extract everything except in the set?
The ^ inside the brackets negates the set, so it matches whatever is not listed, such as here, where it seeks everything that is not a ] or a quote (").
Then we can place the actual matches in named capture groups (?<{NameHere}> ) and extract accordingly
string pattern = @"(?:@\[)(?<Path>[^\]]+)(?:\]\+\"")(?<File>[^\""]+)(?:"")";
// Pattern is (?:@\[)(?<Path>[^\]]+)(?:\]\+\")(?<File>[^\"]+)(?:")
// w/o the "'s escapes for the C# parser
string text = @"@[User::RootPath]+""Dim_MyPackage10.dtsx""";
var result = Regex.Match(text, pattern);
Console.WriteLine ("Path: {0}{1}File: {2}",
result.Groups["Path"].Value,
Environment.NewLine,
result.Groups["File"].Value
);
/* Outputs
Path: User::RootPath
File: Dim_MyPackage10.dtsx
*/
(?: ) matches but doesn't capture. We use those groups as de facto anchors for our pattern and keep them out of the match capture groups.
Use this regex pattern:
\[[^[\]]*\]
Check this demo.
Your regex will match any number of alphanumeric characters, followed by .dtsx. In your example, it would match MyPackage10.dtsx.
If you want to match Dim_MyPackage10.dtsx, you need to add an underscore to your list of allowed characters in the regex: [a-zA-Z0-9_]*\.dtsx
If you want to match the [User::RootPath], you need a regex that will stop at the last / (or \, depends on which type of slashes you use in the paths): something like this: .*\/ (or .*\\)
From the answers and comments - and the fact that none has been 'accepted' so far - it appears to me that the question/problem is not completely clear. If you're looking for the pattern [User::SomeVariable] where only 'SomeVariable' is, well, variable, then you may try:
\[User::\w+]
to capture the full expression.
Furthermore, if you wish to detect that pattern, but then need only the "SomeVariable" part, you may try:
(?<=\[User::)\w+(?=])
which uses look-arounds.
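Both patterns can be tried against the sample expression from the question with a short sketch like the one below. The class name UserVariable and its method names are hypothetical, chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserVariable
{
    // Full expression including the brackets, e.g. [User::RootPath].
    public static string ExtractFull(string expression)
    {
        Match match = Regex.Match(expression, @"\[User::\w+]");
        return match.Success ? match.Value : null;
    }

    // Only the variable name; the look-arounds keep "[User::" and "]"
    // out of the match itself.
    public static string ExtractName(string expression)
    {
        Match match = Regex.Match(expression, @"(?<=\[User::)\w+(?=])");
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string expression = "@[User::RootPath]+\"Dim_MyPackage10.dtsx\"";

        Console.WriteLine(ExtractFull(expression)); // prints [User::RootPath]
        Console.WriteLine(ExtractName(expression)); // prints RootPath
    }
}
```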
Here it is bro
using System;
using System.Text.RegularExpressions;
namespace myapp
{
class Class1
{
static void Main(string[] args)
{
String sourcestring = "source string to match with pattern";
Regex re = new Regex(@"\[\S+\]");
MatchCollection mc = re.Matches(sourcestring);
int mIdx=0;
foreach (Match m in mc)
{
for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
{
Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
}
mIdx++;
}
}
}
}
