RegEx - Match using symbols but don't replace them - c#

I would like to use symbols in a RegEx pattern to find matches, but I don't want them replaced. This is for class and namespace manipulation in C#.
For example:
MyNamespaceLib.EntityDataModelTests.TestsMyClassTests+MyInnerClassTests
must be replaced as:
MyNamespaceLib.EntityDataModel.TestsMyClass+MyInnerClass
(Note: "Tests" is replaced only when it appears at the end of a namespace or class name, not when it occurs elsewhere in the name.)
I've managed to get the first part right in finding the matches, but I'm battling to keep the symbols in the replaced match.
So far I have:
var input = "MyNamespaceLib.EntityDataModelTests.TestsMyClassTests+MyInnerClassTests";
var output = Regex.Replace(input, "Tests[.+]|$", "");
I've tried using a non-capturing group, but I suspect it's not meant for the way I'm trying to use it.
Thanks!

So what you want is to remove "Tests" only when it is followed by a ., a +, or the end of the string, without consuming that character? Use a positive lookahead:
@"Tests(?=[.+]|$)"

You can use the MatchEvaluator overload of the Regex.Replace method, where the replacement string is generated on the fly. I put the special symbol in a capturing group (the first capturing group is always Groups[1] of the match) and replace the match with that group's value, like this:
var output = Regex.Replace(input, @"Tests([.+]|$)", m => m.Groups[1].Value);
Also, per minitech's comment, you can use $1 for the first capturing group in the (string, string) overload of Regex.Replace, like:
var output = Regex.Replace(input, @"Tests([.+]|$)", "$1");
That said, a regex is often write-only code, so you can always do a dumb and simple replace:
var output = input.Replace("Tests+", "+").Replace("Tests.", ".") ...;
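As a minimal sketch of the two regex approaches, the snippet below runs a positive lookahead and a `$1` substitution against the sample input from the question. The variable names `viaLookahead` and `viaGroup` are made up for illustration, and the end-of-string alternative (`|$`) is included on the assumption that a trailing "Tests" should be removed as well, as in the expected output.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "MyNamespaceLib.EntityDataModelTests.TestsMyClassTests+MyInnerClassTests";

// Lookahead: the '.', '+' or end of string is checked but never consumed,
// so only "Tests" itself is replaced.
var viaLookahead = Regex.Replace(input, @"Tests(?=[.+]|$)", "");

// Capturing group: the symbol is consumed by the match, then written back with $1.
var viaGroup = Regex.Replace(input, @"Tests([.+]|$)", "$1");

Console.WriteLine(viaLookahead);
Console.WriteLine(viaGroup);
```

Both calls should produce the string the question asks for; the lookahead version avoids having to put the symbol back.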

Related

How to replace all occurrences of `someObject.ToString()` with `Convert.ToString(someObject);`

I want to search through my code and replace all instances of someObject.ToString() with Convert.ToString(someObject).
For example if I have:
var x = someClassInstance.ToString()
I want to replace it with:
var x = Convert.ToString(someClassInstance)
Is it possible to do this through regular expression?
The solution will differ slightly based on your environment, but for example in Notepad++:
Search for: ([0-9a-zA-Z_]+)\.ToString\(\)
Replace with: Convert.ToString\($1\)
In C#, you can use the following regex:
\b(\w+)\.ToString\(\)
It starts by matching a word boundary and then grabs all word characters before the dot and ToString(). Note the escaped characters: the dot and the parentheses have special meaning in regex.
You then need to replace it with:
Convert.ToString($1)
Here '$1' will be replaced by the matched group 1 from the regex (the name of the object ToString() was called on).
Edit:
The above regex will fail if the expression before the dot is itself a method call, like 'myMethod(param).ToString()'.
I have changed the regex to accept a wider set of characters before '.ToString' (since the code already compiles, there's no need for further syntax checking):
\b((?!\.ToString)(?:[\w.()+*/-])*?)\.ToString\(\)
Now it should include function calls.
Example of match: 'SomeFunction(Int32.MaxValue-1).ToString()'
It will fail if there are spaces in the match.
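A short sketch of both patterns applied with Regex.Replace in C#. The sample source lines and variable names are invented for illustration; a real code base would need to be read from files and checked by hand afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

// Simple case: a plain identifier before .ToString()
var simpleSource = "var x = someClassInstance.ToString();";
var simpleResult = Regex.Replace(
    simpleSource, @"\b(\w+)\.ToString\(\)", "Convert.ToString($1)");

// Extended case: the expression before .ToString() is a method call without spaces
var callSource = "var y = SomeFunction(Int32.MaxValue-1).ToString();";
var callResult = Regex.Replace(
    callSource,
    @"\b((?!\.ToString)(?:[\w.()+*/-])*?)\.ToString\(\)",
    "Convert.ToString($1)");

Console.WriteLine(simpleResult);
Console.WriteLine(callResult);
```

Neither pattern understands C# syntax, so expressions containing spaces, nested quotes, or chained ToString() calls are outside what this sketch handles.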

Regex including what is supposed to be non-capturing group in result

I have the following simple test where I'm trying to get a Regex pattern that yanks the executable name without the ".exe" suffix.
It appears my non-capturing group setting (?:\.exe) isn't working, or I'm misunderstanding how it's intended to work.
Both regex101 and regexstorm.net show the same result, and the former confirms that "(?:\.exe)" is a non-capturing match.
Any thoughts on what I'm doing wrong?
// test variable for what I would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase).Value;
// expecting "MyApp" but I get "MyApp.exe"
I have been able to extract the value I wanted by using a matching pattern with group names defined, as shown in the following, but would like to understand why the non-capturing group approach didn't work the way I expected it to.
// test variable for what I would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(Environment.CommandLine, @"(?<fname>[^\\]+)(?<ext>\.exe)",
    RegexOptions.IgnoreCase).Groups["fname"].Value;
// get the desired "MyApp" result
A (?:...) is a non-capturing group that matches and still consumes the text. It means the part of text this group matches is still added to the overall match value.
In general, if you want to match something but not consume, you need to use lookarounds. So, if you need to match something that is followed with a specific string, use a positive lookahead, (?=...) construct:
some_pattern(?=specific string) // if specific string comes immediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
If you need to match but "exclude from match" some specific text before, use a positive lookbehind:
(?<=specific string)some_pattern // if specific string comes immediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
Note that .*? or .* - that is, patterns with *, +, ?, {2,} or even {1,3} quantifiers - in lookbehind patterns are not always supported by regex engines, however, C# .NET regex engine luckily supports them. They are also supported by Python PyPi regex module, Vim, JGSoft software and now by ECMAScript 2018 compliant JavaScript environments.
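To make the lookaround forms concrete, here is a small sketch against the command line from the question. It combines a lookbehind with a lookahead so that neither the preceding context nor the ".exe" suffix ends up in the match value. The variable names and the choice of "repos" as the "anywhere before" context are illustrative assumptions only.

```csharp
using System;
using System.Text.RegularExpressions;

var commandLine = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// Fixed-width lookbehind: a backslash must sit right before the name.
// Lookahead: ".exe" must follow, but it is not part of the match.
var afterSlash = Regex.Match(
    commandLine, @"(?<=\\)[^\\]+(?=\.exe)", RegexOptions.IgnoreCase).Value;

// Variable-width lookbehind (supported by .NET): "repos" must appear
// somewhere earlier in the input.
var afterRepos = Regex.Match(
    commandLine, @"(?<=repos.*?)\w+(?=\.exe)", RegexOptions.IgnoreCase).Value;

Console.WriteLine(afterSlash);
Console.WriteLine(afterRepos);
```

In both cases the Value property holds only the file name, because lookarounds assert context without consuming it.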
In this case, you may capture what you need to get and just match the context without capturing:
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty;
var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
asmName = m.Groups[1].Value;
}
Console.WriteLine(asmName);
See the C# demo
Details
([^\\]+) - Capturing group 1: one or more chars other than \
\. - a literal dot
exe - a literal exe substring.
Since we are only interested in capturing group #1 contents, we grab m.Groups[1].Value, and not the whole m.Value (that contains .exe).
You're using a non-capturing group. The emphasis is on the word group here; the group does not capture the .exe, but the regex in general still does.
You probably want to use a positive lookahead, which just asserts that the string must meet a criterion for the match to be valid, though the text satisfying that criterion is not included in the match.
In other words, you want (?=, not (?:, at the start of your group.
The latter only matters if you are enumerating the Groups property of the Match object; in your case, you're just using the Value property, so there's no distinction between a normal group (\.exe) and a non-capturing group (?:\.exe).
To see the distinction, consider this test program:
static void Main(string[] args)
{
    var positiveInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
    Test(positiveInput, @"[^\\]+(\.exe)");
    Test(positiveInput, @"[^\\]+(?:\.exe)");
    Test(positiveInput, @"[^\\]+(?=\.exe)");
    var negativeInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.dll\" /?";
    Test(negativeInput, @"[^\\]+(?=\.exe)");
}
static void Test(String input, String pattern)
{
    Console.WriteLine($"Input: {input}");
    Console.WriteLine($"Regex pattern: {pattern}");
    var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
    if (match.Success)
    {
        Console.WriteLine("Matched: " + match.Value);
        for (int i = 0; i < match.Groups.Count; i++)
        {
            Console.WriteLine($"Groups[{i}]: {match.Groups[i]}");
        }
    }
    else
    {
        Console.WriteLine("No match.");
    }
    Console.WriteLine("---");
}
The output of this is:
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
Groups[1]: .exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?:\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?=\.exe)
Matched: MyApp
Groups[0]: MyApp
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.dll" /?
Regex pattern: [^\\]+(?=\.exe)
No match.
---
The first regex (@"[^\\]+(\.exe)") has \.exe as just a normal group.
When we enumerate the Groups property, we see that .exe is indeed a group captured in our input.
(Note that the entire regex is itself a group, hence Groups[0] is equal to Value.)
The second regex (@"[^\\]+(?:\.exe)") is the one provided in your question.
The only difference compared to the previous scenario is that the Groups property doesn't contain .exe as one of its entries.
The third regex (@"[^\\]+(?=\.exe)") is the one I'm suggesting you use.
Now, the .exe part of the input isn't included in the match at all, but the regex still won't match unless the text is immediately followed by .exe, as the fourth scenario illustrates.
The regex matches the text of the non-capturing group but doesn't capture it, and that text is still part of the whole match. If you want only the part before it, wrap that part in a capturing group and read the group instead of the whole match:
var match = Regex.Match(testEcl, @"([^\\]+)(?:\.exe)", RegexOptions.IgnoreCase);
var asmName = match.Groups[1].Value;
The demo for the regex can be found here.

Regex working in Regexr but not C#, why?

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern in C# code does not work. I have also added \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString is "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach(var match in matchCollection)
{
// Output the match
Console.WriteLine("Collection Found : {0}", match);
}
You can see a working example of this in action here.
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression, which is a lookbehind that only matches values preceded by s:ds="{ :
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
You can see an example of this approach here (notice it doesn't match the s:id element).
Another Consideration
Currently you are using \w to match "word" characters within your expression, and while this might work for your uses, it will match all digits \d, letters a-zA-Z and underscores _. It's unlikely that you need all of these, so you may want to consider revising your character sets to use just what you expect, like [A-Z\d] to match only uppercase letters and numbers, or [0-9A-Fa-f] if you are only expecting GUID values (i.e. hex).
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
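Putting the suggestions from these answers together, a minimal sketch might look like the following. It assumes the input contains plain double quotes (no literal backslashes), uses a hex-only character class, and reads the GUID from capturing group 1. The names `pattern` and `ids` are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" " +
            "s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" " +
            "s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";

// In a verbatim string, "" stands for one double quote; \{ and \} are literal braces.
// Group 1 holds the GUID without the surrounding braces.
var pattern = @"s:ds=""\{([0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12})\}""";

var ids = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (var id in ids)
{
    Console.WriteLine(id);
}
```

Only the two s:ds values are returned; the s:id entry is skipped because the literal prefix does not match.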

Using Regular Expression to match fields with an arbitrary delimiter

I suppose this should be an old question, however, I didn't find suitable solution in the forums after several hours searching.
I'm using C# and I know the Regex.Split and String.Split methods can be used to achieve the expected results. For some reason, I need to use a regular expression to match the required fields by specifying an arbitrary delimiter. For example, here is the string:
#DIV#This#DIV#is#DIV#"A "#DIV#string#DIV#
Here, #DIV# is the delimiter and is going to be split as:
This
is
"A "
string
How can I use a regular expression to match these values?
By the way, the leading and trailing #DIV# should be ignored, so the source strings below should give the same result as above:
#DIV#This#DIV#is#DIV#"A "#DIV#string
This#DIV#is#DIV#"A "#DIV#string#DIV#
This#DIV#is#DIV#"A "#DIV#string
UPDATE:
I think I found a way (mind it is not efficient!) to get rid of empty values with a regex.
var splits = Regex.Matches(strIn, @"(?<=#DIV#|^)(?:(?!#DIV#).)+?(?=$|#DIV#)");
See demo on regexstorm (mind the \r? is only to demo in Multiline mode, you do not need it when using in real life)
ORIGINAL ANSWER
Here is another approach using a regular Split:
var strIn = "#DIV#This#DIV#is#DIV#\"A # \"#DIV#string#DIV#";
var splitText = strIn.Split(new[] {"#DIV#"}, StringSplitOptions.RemoveEmptyEntries);
Or else, you can use a regex to match the fields you need and then remove empty items with LINQ:
var spltsTxt2 = Regex.Matches(strIn, @"(?<=#DIV#|^).*?(?=#DIV#|$)").Cast<Match>().Where(p => !string.IsNullOrEmpty(p.Value)).Select(p => p.Value).ToList();
#DIV#|(.+?)(?=#DIV#|$)
Try this. Grab the captures or groups. See the demo.
https://www.regex101.com/r/fJ6cR4/21
You can use the following to match:
/#?DIV#?/g
And replace with ' ' (space)
But this will give trailing and leading spaces sometimes.. which can be removed by using String.Trim()
Edit1: If you want to match the field values you can use the following:
(?<=(#?DIV#?)|^)[^#]*?(?=(#?DIV#?)|$)
See DEMO
Edit2: More generalized regex for matching # in fields:
(?m)(?<=(^(?!#?DIV#)|(#?DIV#)))(.*?)(?=($|(#DIV#?)))
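As a hedged sketch of the lookaround approach for an arbitrary delimiter, the helper below builds the pattern from the delimiter at run time. `MatchFields` is a hypothetical name, not a framework method; it escapes the delimiter with Regex.Escape and uses a tempered dot so that a lone `#` inside a field is still accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the non-empty fields between delimiters.
List<string> MatchFields(string text, string delimiter)
{
    var d = Regex.Escape(delimiter);
    // Start after a delimiter or at the start of the input, take one or more
    // characters that do not begin a delimiter, stop before a delimiter or the end.
    var pattern = $@"(?<={d}|^)(?:(?!{d}).)+?(?={d}|$)";
    return Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
}

var withEdges = MatchFields("#DIV#This#DIV#is#DIV#\"A \"#DIV#string#DIV#", "#DIV#");
var withoutEdges = MatchFields("This#DIV#is#DIV#\"A \"#DIV#string", "#DIV#");
var withHash = MatchFields("#DIV#This#DIV#is#DIV#\"A # \"#DIV#string#DIV#", "#DIV#");

Console.WriteLine(string.Join("|", withEdges));
Console.WriteLine(string.Join("|", withoutEdges));
Console.WriteLine(string.Join("|", withHash));
```

Because the quantifier requires at least one character, empty fields produced by leading or trailing delimiters are dropped without a separate filtering step.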

need to create a C# Regex similar to this perl expression

I was wondering if it is possible to build equivalent C# regular expression for finding this pattern in a filename. For example, this is the expr in perl /^filer_(\d{10}).txt(.gz)?$/i Could we find or extract the \d{10} part so I can use it in processing?
To create a Regex object that will ignore character casing and match your filter try the following:
Regex fileFilter = new Regex(@"^filer_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase);
To perform the match:
Match match = fileFilter.Match(filename);
And to get the value (number here):
if (match.Success)
{
    string id = match.Groups[1].Value;
}
The matched groups work similar to Perl's matches, [0] references the whole match, [1] the first sub pattern/match, etc.
Note: In your initial perl code you didn't escape the . characters so they'd match any character, not just real periods!
Yes, you can. See the Groups property of the Match class that is returned by a call to Regex.Match.
In your case, it would be something along the lines of the following:
Regex yourRegex = new Regex(@"^filer_(\d{10}).txt(.gz)?$");
Match match = yourRegex.Match(input);
if(match.Success)
result = match.Groups[1].Value;
I don't know, what the /i means at the end of your regex, so I removed it in my sample code.
As daniel shows, you can access the content of the matched input via groups. But instead of using the default indexed groups, you can also use named groups. In the following I show how, and also that you can use the static version of Match.
Match m = Regex.Match(input, @"^(?i)filer_(?<fileID>\d{10}).txt(?:.gz)?$");
if (m.Success)
{
    string s = m.Groups["fileID"].Value;
}
The /i in perl means IgnoreCase as also shown by Mario. This can also be set inline in the regex statement using (?i) as shown above.
The last part (?:.gz) creates a non-capturing group, which means that it’s used in the match but no group is created.
I'm not sure if that's what you want, this is how you can do that.
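Combining the points from the answers above (escaped dots, case-insensitive matching, a named group), a minimal sketch could look like this. `TryGetFileId` and the sample file names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Dots are escaped so they match only a literal '.', unlike the original Perl pattern.
var fileFilter = new Regex(
    @"^filer_(?<fileID>\d{10})\.txt(?:\.gz)?$", RegexOptions.IgnoreCase);

// Hypothetical helper: returns true and the ten-digit id when the name matches.
bool TryGetFileId(string fileName, out string fileId)
{
    var m = fileFilter.Match(fileName);
    fileId = m.Success ? m.Groups["fileID"].Value : string.Empty;
    return m.Success;
}

var okPlain = TryGetFileId("filer_0123456789.txt", out var plainId);
var okGzip = TryGetFileId("FILER_9876543210.TXT.GZ", out var gzipId);
var okBad = TryGetFileId("filer_0123456789xtxt", out var badId);

Console.WriteLine($"{okPlain} {plainId}");
Console.WriteLine($"{okGzip} {gzipId}");
Console.WriteLine($"{okBad} '{badId}'");
```

The third name is rejected here, whereas the unescaped pattern from the question would accept it because an unescaped dot matches any character.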
