need to create a C# Regex similar to this perl expression - c#

I was wondering if it is possible to build equivalent C# regular expression for finding this pattern in a filename. For example, this is the expr in perl /^filer_(\d{10}).txt(.gz)?$/i Could we find or extract the \d{10} part so I can use it in processing?

To create a Regex object that will ignore character casing and match your filter try the following:
Regex fileFilter = new Regex(#"^filter_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase),
To perform the match:
Match match = fileFilter.Match(filename);
And to get the value (number here):
if(match.Success)
string id = match.Groups[1].Value;
The matched groups work similar to Perl's matches, [0] references the whole match, [1] the first sub pattern/match, etc.
Note: In your initial perl code you didn't escape the . characters so they'd match any character, not just real periods!

Yes, you can. See the Groups property of the Match class that is returned by a call to Regex.Match.
In your case, it would be something along the lines of the following:
Regex yourRegex = new Regex("^filer_(\d{10}).txt(.gz)?$");
Match match = yourRegex.Match(input);
if(match.Success)
result = match.Groups[1].Value;
I don't know, what the /i means at the end of your regex, so I removed it in my sample code.

As daniel shows, you can access the content of the matched input via groups. But instead of using default indexed groups you can also use named groups. In the following i show how and also that you can use the static version of Match.
Match m = Regex.Match(input, #"^(?i)filer_(?<fileID>\d{10}).txt(?:.gz)?$");
if(m.Success)
string s = m.Groups["fileID"].Value;
The /i in perl means IgnoreCase as also shown by Mario. This can also be set inline in the regex statement using (?i) as shown above.
The last part (?:.gz) creates a non-capturing group, which means that it’s used in the match but no group is created.
I'm not sure if that's what you want, this is how you can do that.

Related

c# regex matches example to extract result

I am trying to extract a character/digit from a string that is between single quotes and seems like i am failing to write the correct pattern.
Test string - only value that changes is the single character/digit in single quotes
[+] Random session part: 'm'
I am using the following pattern but it returns empty
var line = "[+] Random session part: 'm'";
Regex pattern = new Regex(#"(?<=\')(.*?)(?=\')");
Match match = pattern.Match(line);
Debug.Log($"{match.Groups["postfix"].Value}");
int postFix = int.Parse(match.Groups["postfix"].Value);
what am i missing?
You have an overly complicated regex, and looking for a group named 'postfix' in you match, while your regex does not have such a named group.
A simpler regex would be:
'(.)'
This looks for a single character between two single quotes, and has that character wrapped in a capture group. Put a breakpoint after your match row, and you can explore the matched object.
You can explore the regex above with your match here:
https://regexr.com/77b0m
BTW: Your code tries to parse the string "m" into an int, this will throw and error, your should probably handle that case with int.TryParse
you can use this regX :
'(.)' // match any string between single quotes
show result
or
(?<=\')(.*?)(?=\') //containing a non-greedy match
show result

Regex including what is supposed to be non-capturing group in result

I have the following simple test where i'm trying to get the Regex pattern such that it yanks the executable name without the ".exe" suffix.
It appears my non-capturing group setting (?:\\.exe) isn't working or i'm misunderstanding how its intended to work.
Both regex101 and regexstorm.net show the same result and the former confirms that "(?:\.exe)" is a non-capturing match.
Any thoughts on what i'm doing wrong?
// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?"
var asmName = Regex.Match(testEcl, #"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase).Value;
// expecting "MyApp" but I get "MyApp.exe"
I have been able to extract the value i wanted by using a matching pattern with group names defined, as shown in the following, but would like to understand why non-capturing group setting approach didn't work the way i expected it to.
// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?"
var asmName = Regex.Match(Environment.CommandLine, #"(?<fname>[^\\]+)(?<ext>\.exe)",
RegexOptions.IgnoreCase).Groups["fname"].Value;
// get the desired "MyApp" result
/eoq
A (?:...) is a non-capturing group that matches and still consumes the text. It means the part of text this group matches is still added to the overall match value.
In general, if you want to match something but not consume, you need to use lookarounds. So, if you need to match something that is followed with a specific string, use a positive lookahead, (?=...) construct:
some_pattern(?=specific string) // if specific string comes immmediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
If you need to match but "exclude from match" some specific text before, use a positive lookbehind:
(?<=specific string)some_pattern // if specific string comes immmediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
Note that .*? or .* - that is, patterns with *, +, ?, {2,} or even {1,3} quantifiers - in lookbehind patterns are not always supported by regex engines, however, C# .NET regex engine luckily supports them. They are also supported by Python PyPi regex module, Vim, JGSoft software and now by ECMAScript 2018 compliant JavaScript environments.
In this case, you may capture what you need to get and just match the context without capturing:
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty;
var m = Regex.Match(testEcl, #"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
asmName = m.Groups[1].Value;
}
Console.WriteLine(asmName);
See the C# demo
Details
([^\\]+) - Capturing group 1: one or more chars other than \
\. - a literal dot
exe - a literal exe substring.
Since we are only interested in capturing group #1 contents, we grab m.Groups[1].Value, and not the whole m.Value (that contains .exe).
You're using a non-capturing group. The emphasis is on the word group here; the group does not capture the .exe, but the regex in general still does.
You're probably wanting to use a positive lookahead, which just asserts that the string must meet a criteria for the match to be valid, though that criteria is not captured.
In other words, you want (?=, not (?:, at the start of your group.
The former is only if you are enumerating the Groups property of the Match object; in your case, you're just using the Value property, so there's no distinction between a normal group (\.exe) and a non-capturing group (?:\.exe).
To see the distinction, consider this test program:
static void Main(string[] args)
{
var positiveInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
Test(positiveInput, #"[^\\]+(\.exe)");
Test(positiveInput, #"[^\\]+(?:\.exe)");
Test(positiveInput, #"[^\\]+(?=\.exe)");
var negativeInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.dll\" /?";
Test(negativeInput, #"[^\\]+(?=\.exe)");
}
static void Test(String input, String pattern)
{
Console.WriteLine($"Input: {input}");
Console.WriteLine($"Regex pattern: {pattern}");
var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
if (match.Success)
{
Console.WriteLine("Matched: " + match.Value);
for (int i = 0; i < match.Groups.Count; i++)
{
Console.WriteLine($"Groups[{i}]: {match.Groups[i]}");
}
}
else
{
Console.WriteLine("No match.");
}
Console.WriteLine("---");
}
The output of this is:
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
Groups[1]: .exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?:\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?=\.exe)
Matched: MyApp
Groups[0]: MyApp
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.dll" /?
Regex pattern: [^\\]+(?=\.exe)
No match.
---
The first regex (#"[^\\]+(\.exe)") has \.exe as just a normal group.
When we enumerate the Groups property, we see that .exe is indeed a group captured in our input.
(Note that the entire regex is itself a group, hence Groups[0] is equal to Value).
The second regex (#"[^\\]+(?:\.exe)") is the one provided in your question.
The only difference compared to the previous scenario is that the Groups property doesn't contain .exe as one of its entries.
The third regex (#"[^\\]+(?=\.exe)") is the one I'm suggesting you use.
Now, the .exe part of the input isn't captured by the regex at all, but a regex won't match a string unless it ends in .exe, as the fourth scenario illustrates.
It would match the non capturing group but won't capture it, so if you want the non captured part you should access the capture group instead of the whole match
you can access groups in
var asmName = Regex.Match(testEcl, #"([^\\]+)(?:\.exe)", RegexOptions.IgnoreCase);
asmName.Groups[1].Value
the demo for the regex can be found here

RegEx - Match using symbols but don't replace them

I would like to use a symbols in a RegEx pattern to find matches, but I don't want them replaced. This is for class and namespace manipulation in C#.
For example:
MyNamespaceLib.EntityDataModelTests.TestsMyClassTests+MyInnerClassTests
must be replaced as:
MyNamespaceLib.EntityDataModel.TestsMyClass+MyInnerClass
(Note, only "Tests" is replace when it appears at the end of the namespace part, and not when it's part of the class/namespace name)
I've managed to get the first part right in finding the matches, but I'm battling to keep the symbols in the replaced match.
So far I have:
var input = "MyNamespaceLib.EntityDataModelTests.TestsMyClassTests+MyInnerClassTests";
var output = Regex.Replace(input, "Tests[.+]|$", "");
I've tried using a non-capturing group, but I suspect it's not meant for the way I'm trying to use it.
Thanks!
So what you want to do is replace matches not followed by a . or a +? Use a lookahead:
#"Tests(?![.+])"
You can use the MatchEvaluator overload of the Regex.Replace method, where the string to replace the match with is generated on the fly. I get the special simbol in a capturing group (and the first capturing group is always Group1 of the match), and replace the match with the value, like this:
var output = Regex.Replace(input, #"Tests([.+]|$)", m => m.Groups[1].Value);
Also, per minitech's comment, you can also use $1 for the first capturing group in the (string, string) overload of Regex.Replace, like:
var output = Regex.Replace(input, #"Tests([.+]|$)", "$1");
That said, a regex is often write-only code, so you can always do a dumb and simple replace:
var output = input.Replace("Tests+","").Replace("Tests.","") ...;

C# regex match, match.Success returns false even after following the rules

Friends,
I want to match a string like
"int lnum[];" so I am trying to match it with a pattern like this
[A-Za-z_0-9] [A-Za-z_0-9]\[\]
but it does not seem to work.
I looked up rules at http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
string pJavaLine = "int lnum[]";
match = Regex.Match(pJavaLine, #"[A-Za-z_0-9] [A-Za-z_0-9]\[\] ", RegexOptions.IgnoreCase);
if (match.Success) {
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
the match.Success returns false.
Would anybody please let me know a possible way to get this.
Each of your character classes, like [A-Za-z_0-9], matches only a single character. If you want to match more than one character, you need to add something to the end. For example, [A-Za-z_0-9]+ -- the + means 1 or more of these. You could also use * for 0 or more, or specify a range, like {2,5} for 2-5 characters.
That said, you can use this pattern to match that string:
[A-Za-z_0-9]+ [A-Za-z_0-9]+\[\]
The \w is loosely equivalent to [A-Za-z_0-9] (see link in jessehouwing's comment below), so you can probably simply use:
\w+ \w+\[\]
Check here for more info on the standard Character Classes.

Extending [^,]+, Regular Expression in C#

Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
Which does my work by splitting the words on every occurance of a ",". What i want to know is say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their statemachine implementation. Doesn't it work in the same way.
If a string starts with Value_name followed by one or more whitespaces. Go to Next State. In That State read a word until a "," comes. Then do it again! And each word will be grouped!
Am i wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modified (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the ,, and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In you trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a strings, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(#"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
for (int i = 1; i < m.Groups.Count; i++) {
Group g = m.Groups[i];
if (g.Success) {
// matched text: g.Value
// match start: g.Index
// match length: g.Length
}
}
m = m.NextMatch();
}
I would expect it only to get v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes, first grab all the fields seperated by commas and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoreticly grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);

Categories

Resources