C# Regular Expressions - c#

I have a string that has multiple regular expression groups, and some parts of the string that aren't in the groups. I need to replace a character (in this case, remove ^) only within the groups, but not in the parts of the string that aren't in a regex group.
Here's the input string:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
Here's what the output string should look like:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
I need to do it using C# and can use regular expressions.
I can match the string into groups of those that should and shouldn't be replaced, but am struggling on how to return the final output string.

I'm not sure I get exactly what you're having trouble with, but it didn't take long to come up with this result:
string strRegex = @"STARTREPLACEME(.+)ENDREPLACEME";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
string strReplace = "STARTREPLACEMEENDREPLACEME";
return myRegex.Replace(strTargetString, strReplace);
By using my favorite online Regex tool: http://regexhero.net/tester/
Is that helpful?

Regex rgx = new Regex(
@"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");
string s1 = rgx.Replace(s0, String.Empty);
Explanation: Each time a ^ is found, the lookahead scans ahead for an ending delimiter (ENDREPLACEME). If it finds one without seeing any of the other delimiters first, the match must have occurred inside a REPLACEME group. If the lookahead reports failure, it indicates that the ^ was found either between groups or within a DONTREPLACEME group.
Because lookaheads are zero-width assertions, only the ^ will actually be consumed in the event of a successful match.
Be aware that this will only work if delimiters are always properly balanced and groups are never nested within other groups.
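For anyone who wants to try the lookahead approach, here is a minimal sketch that wraps the same pattern in a helper. The class and method names (CaretStripper, Strip) are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretStripper
{
    // Same pattern as in the answer: a '^' matches only if ENDREPLACEME
    // is the next delimiter ahead of it (no other delimiter comes first).
    private static readonly Regex CaretInsideGroup = new Regex(
        @"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");

    // Removes '^' inside REPLACEME groups and leaves every other '^' alone.
    public static string Strip(string input)
    {
        return CaretInsideGroup.Replace(input, string.Empty);
    }
}
```

Calling CaretStripper.Strip on the input string from the question should give the expected output string from the question, assuming the delimiters are balanced and not nested.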

If you are able to separate into groups that should be replaced and those that shouldn't, then instead of providing a single replacement string, you should be able to use a MatchEvaluator (a delegate that takes a Match and returns a string) to make the decision of which case it is currently dealing with and return the replacement string for that group alone.
You may also use an additional regex inside the MatchEvaluator. This solution produces the expected output:
// The lazy .+? keeps each match inside a single REPLACEME group.
Regex outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME", RegexOptions.Compiled);
Regex inner = new Regex(@"\^", RegexOptions.Compiled);
string replaced = outer.Replace(start, m =>
{
return inner.Replace(m.Value, String.Empty);
});
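The idea from the first paragraph of this answer (match both kinds of group, then let the MatchEvaluator decide per match) could be sketched roughly as follows. The group names keep and strip and the class name GroupAwareReplacer are assumptions for the example, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupAwareReplacer
{
    // One alternative per kind of group; the named group that succeeded
    // tells the evaluator which case it is dealing with.
    private static readonly Regex AnyGroup = new Regex(
        @"(?<keep>STARTDONTREPLACEME.*?ENDDONTREPLACEME)|(?<strip>STARTREPLACEME.*?ENDREPLACEME)");

    public static string Replace(string input)
    {
        return AnyGroup.Replace(input, new MatchEvaluator(DecidePerGroup));
    }

    // Called once per match; returns the replacement text for that match only.
    private static string DecidePerGroup(Match m)
    {
        return m.Groups["strip"].Success
            ? m.Value.Replace("^", string.Empty)  // REPLACEME group: drop the carets
            : m.Value;                            // DONTREPLACEME group: return unchanged
    }
}
```

Text between groups is never matched, so Regex.Replace copies it through untouched.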

Related

Extract groups with regex and construct URL in a single line

I am currently trying to extract values from a string and construct a URL that includes those values. I went through a dozen regex questions, but I am not quite satisfied with the answers.
I have custom-encoded strings that carry more than one piece of information, and I want to construct a new URL that contains that information.
For example 35afe06d-8393-4559-b6d7-74d35ce131d8|Master should become http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master. My first assumption was
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = @"((?:[a-f0-9]+-?){5})|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
However, this replaces each group with the full URL. The limitation is that my code knows nothing about input, pattern, replacement, or output in advance: pattern and replacement are two config values, and I don't want to turn them into x pairs of config values; input comes from somewhere else in the application and could use any custom encoding (pipe, colon, ...); output depends on the use case. The pattern can have any number of groups, and the result doesn't even have to be a URL.
I can think of different ways to do this, like parsing the string myself, or trying to create a replacement dictionary, or using regex to find the groups and then string replace for $1 => match.Groups[0]. I just feel like there must be an obvious 1-liner solution for that in .NET since I even remember doing that in PHP.
Answer: It's not a .NET limitation, it was simply the unescaped pipe.
In your pattern (([a-f0-9]+-?){5})|\w+ the second group should be capturing the word characters after the pipe (escape the pipe to match it literally).
If you repeat this part ([a-f0-9]+-?) 5 times, the match could also end on a hyphen.
To match the values separated by dashes, match the character class [a-f0-9]+ once and then repeat it 4 times, each time preceded by a -:
([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)
.NET Regex demo | C# demo
var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = @"([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
Console.WriteLine(output);
Result
http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master
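Since pattern and replacement are config values, one possible variation is worth sketching: Regex.Replace copies any unmatched text from the input into the output, whereas Match.Result expands the replacement string for a single match and returns only that. The class and method names below (ConfiguredFormatter, Format) are hypothetical, and returning null for "no match" is just one choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfiguredFormatter
{
    // pattern and replacement are assumed to come from configuration.
    // Returns the expanded replacement for the first match, or null if
    // the input does not match at all.
    public static string Format(string input, string pattern, string replacement)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Result(replacement) : null;
    }
}
```

With the pattern and replacement from this answer, an input such as [35afe06d-8393-4559-b6d7-74d35ce131d8|Master] would yield just the URL, without the surrounding brackets that Regex.Replace would keep.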
This expression might also work here:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$";
// .NET replacement strings reference groups as $1 and $2, not \1.
string substitution = @"http://my-server/media/guid/$1?v=$2";
string input = @"35afe06d-8393-4559-b6d7-74d35ce131d8|Master
35afe06d-8393-4559-b6d7-74d35ce131d8| Master ";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.WriteLine(result);
}
}
Reference
Searching for UUIDs in text with regex

Use RegEx to uppercase and lowercase the string

I am trying to convert a string to uppercase and lowercase based on the index.
My string is a LanguageCode like cc-CC where cc is the language code and CC is the country code. The user can enter in any format like "cC-Cc". I am using the regular expression to match whether the data is in the format cc-CC.
var regex = new Regex("^[a-z]{2}-[A-Z]{2}$", RegexOptions.IgnoreCase);
// I could use the CultureInfo class from the .NET Framework to check whether the code is valid,
// but the requirement is that invalid language codes are also allowed, as long as
// the entered code is in cc-CC format.
Now when the user enters something like cC-Cc, I'm trying to lowercase the first two characters and then uppercase the last two characters.
I can split the string using - and then concatenate them.
var languageDetails = languageCode.Split('-');
var languageCodeUpdated = $"{languageDetails[0].ToLowerInvariant()}-{languageDetails[1].ToUpperInvariant()}";
I wondered whether I could avoid creating multiple strings and use Regex itself to change the case accordingly.
While searching, I found some solutions that use \L and \U, but I can't use them because the C# compiler reports an error. Also, Regex.Replace() has an overload that takes a MatchEvaluator delegate, which I don't understand.
Is there any way in C# to use Regex to replace uppercase with lowercase and vice versa?
.NET regex does not support case modifying operators.
You may use MatchEvaluator:
var result = Regex.Replace(s, @"(?i)^([a-z]{2})-([a-z]{2})$", m =>
$"{m.Groups[1].Value.ToLower()}-{m.Groups[2].Value.ToUpper()}");
See the C# demo.
Details
(?i) - the inline version of the RegexOptions.IgnoreCase modifier
^ - start of the string
([a-z]{2}) - Capturing group #1: 2 ASCII letters
- - a hyphen
([a-z]{2}) - Capturing group #2: 2 ASCII letters
$ - end of string.
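Because the question mentions that MatchEvaluator is hard to understand, here is the same idea sketched with a named method instead of a lambda. A MatchEvaluator is simply a method that receives each Match and returns the text to put in its place. The names LanguageCodeNormalizer, FixCase, and Normalize are invented for this sketch, and the invariant-culture casing methods are used as in the question's own Split-based code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LanguageCodeNormalizer
{
    private static readonly Regex CodePattern =
        new Regex(@"^([a-z]{2})-([a-z]{2})$", RegexOptions.IgnoreCase);

    // Matches the MatchEvaluator signature: string Method(Match m).
    private static string FixCase(Match m)
    {
        return m.Groups[1].Value.ToLowerInvariant()
            + "-"
            + m.Groups[2].Value.ToUpperInvariant();
    }

    // Regex.Replace calls FixCase for the match; input that does not
    // match the cc-CC shape is returned unchanged.
    public static string Normalize(string code)
    {
        return CodePattern.Replace(code, new MatchEvaluator(FixCase));
    }
}
```

For example, LanguageCodeNormalizer.Normalize("cC-Cc") should give cc-CC.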
TLDR: This is Regex.Replace with \U and \L support.
private static string EnhancedReplace(string input, string pattern, string replacement, RegexOptions options)
{
replacement = Regex.Replace(replacement, @"(?<mode>\\[UL])(?<group>\$((\d+)|({[^}]+})))", @"<!<mode:${mode}>%&${group}&%>");
var output = Regex.Replace(input, pattern, replacement, options);
output = Regex.Replace(output, @"<!<mode:\\L>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToLower());
output = Regex.Replace(output, @"<!<mode:\\U>%&(?<value>[\w\W]*?)&%>", x => x.Groups["value"].Value.ToUpper());
return output;
}
How To Use
Call the function with \U followed by the group to be uppercase
var result = EnhancedReplace(input, @"(public \w+ )(\w)", @"$1\U$2", RegexOptions.None);
Will replace this:
public string test12 { get; set; } = "test3";
With that:
public string Test12 { get; set; } = "test3";
Details
I'm currently working on an app which allows the user to define a batch of Regex Replace operations.
For example the user enters json and the batch converts it to a C#-Class.
Therefore, speed is no key requirement. But it would be very handy to be able to use \U and \L.
This method will apply Regex.Replace 3 times to the whole content and one time to the replacement string. Therefore it’s at least three times slower than Regex.Replace without \U \L support.
Step by Step
1. The first Regex.Replace enhances the replacement string.
It replaces: \U$1 with <!<mode:\U>%&$1&%>
(Also works for named groups: ${groupName})
2. The new replacement will be applied to the content.
3. & 4. The inserted placeholder is now relatively unique. That allows you to search only for <!<mode:\U>%&Actual Value&%> and use the MatchEvaluator to replace it with its uppercase version. The same will be done for \L
Regex101 Demo:
Step 1: Enhance pattern with placeholder
https://regex101.com/r/ZtqigN/1
Step 2 Use new replacement pattern
https://regex101.com/r/PWLTFD/1
Step 3&4 Resolve new placeholders
https://regex101.com/r/5DIIUo/1
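To make the intermediate strings of the steps visible, the first two steps can be pulled out into small helpers. This is only an illustrative sketch: the class name PlaceholderDemo and its method names are made up, while the pattern and placeholder format are the ones from EnhancedReplace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Step 1: wrap every \U$n or \L$n (or ${name}) reference in the
    // replacement string in the placeholder markup.
    public static string EnhanceReplacement(string replacement)
    {
        return Regex.Replace(
            replacement,
            @"(?<mode>\\[UL])(?<group>\$((\d+)|({[^}]+})))",
            @"<!<mode:${mode}>%&${group}&%>");
    }

    // Step 2: apply the enhanced replacement to the content. The output
    // still contains the placeholders that steps 3 and 4 resolve.
    public static string ApplyEnhanced(string input, string pattern, string replacement)
    {
        return Regex.Replace(input, pattern, EnhanceReplacement(replacement));
    }
}
```

For the replacement $1\U$2 from the usage example, step 1 should produce $1<!<mode:\U>%&$2&%>, so after step 2 the captured letter sits between %& and &%> until it is uppercased.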
Answer
var result = EnhancedReplace(input, @"(cc)(-)(cc)", @"\L$1$2\U$3", RegexOptions.IgnoreCase);

Regular Expression : Check string expression and then filter out value

I'm trying to figure out how to match a string against a regular expression and then get a value from that string.
The string values would be something like this:
computerFileHardware20131211.pdf
computerFileSoftware20131322.pdf
computerFileEngineering20232.pdf
Regex regex = new Regex(@"computerFile[^[A-Za-z]+$]([^0-9]+)\.pdf");
Match match = regex.Match("computerFileHardware20131211.pdf");
if (match.Success)
{
Console.WriteLine(match.Value);
}
So what I'm trying to do is make sure I can match to the regular expression and then be able to filter out the number value. So for example for computerFileHardware20131211.pdf the number value would be 20131211.
I'm not very good at regular expressions. I think my first hurdle is figuring out the regular expression. I read somewhere that you put parentheses around the part of the string you want to filter out. So that is why I have ([^0-9]+).
Try something like https://regex101.com/r/KWiAg0/1
Regex regex = new Regex(@"computerFile[A-Za-z]+([0-9]+)\.pdf");
Match match = regex.Match("computerFileHardware20131211.pdf");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
}
Regular expressions can contain "subexpressions" that are enclosed in parentheses.
Every subexpression forms a group. With the Groups property you can access the various groups captured by the regular expression.
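As an illustration of accessing groups, here is a small sketch that gives the groups names so they can be read by name instead of by index. The group names category and number, the anchors ^ and $, and the FileNameParser class are assumptions added for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // (?<name>...) defines a named group; it can be read via Groups["name"].
    private static readonly Regex FilePattern = new Regex(
        @"^computerFile(?<category>[A-Za-z]+)(?<number>[0-9]+)\.pdf$");

    // Returns false (and null outputs) when the file name does not match.
    public static bool TryParse(string fileName, out string category, out string number)
    {
        Match match = FilePattern.Match(fileName);
        category = match.Success ? match.Groups["category"].Value : null;
        number = match.Success ? match.Groups["number"].Value : null;
        return match.Success;
    }
}
```

For computerFileHardware20131211.pdf this should yield the category Hardware and the number 20131211.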
If you only want to replace the number:
string fileName = "computerFileHardware20131211";
string pattern = "[0-9]{1,}";
string replacement = "123";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(fileName , replacement);

Regular expression match substring

I tried to create a regular expression which pulls everything that matches:
[aA-zZ]{2}[0-9]{5}
The problem is that I want to exclude from matching when I have eg. ABCD12345678
Can anyone help me resolve this?
EDIT1:
I am looking for two letters and five digits in the string, but I want to exclude from matching when I have a string like ABCD12345678, because when I use the above regular expression it will return CD12345.
EDIT2:
I didn't check everything but I think I found answer:
WHEN field is null then field
WHEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}') = 'N/A' THEN field
WHEN field like '%[^a-z][a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' or field like '[a-z][a-z][0-9][0-9][0-9][0-9][0-9][^0-9]%' THEN fnRegExMatch(field, '[a-zA-Z]{2}[0-9]{5}')
ELSE field
First, [aA-zZ] doesn't make sense: it contains the range A-z, which also matches [, \, ], ^, _ and `. Second, use word boundaries:
\b[a-zA-Z]{2}[0-9]{5}\b
You could also use case insensitive modifier:
(?i)\b[a-z]{2}[0-9]{5}\b
According to your comment, it seems you may have an underscore after the five digits. In this case, a word boundary doesn't work; you have to use this instead:
(?i)(?<![a-z])([a-z]{2}[0-9]{5})(?![0-9])
(?<![a-z]) is a negative lookbehind that asserts there is no letter before the two mandatory letters
(?![0-9]) is a negative lookahead that asserts there is no digit after the five mandatory digits
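The difference between the word-boundary pattern and the lookaround pattern can be checked with a short sketch like the one below. The class and method names (CodeFinder, FindWithBoundaries, FindWithLookarounds) are hypothetical; both patterns are the ones given in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeFinder
{
    private static readonly Regex WordBoundary =
        new Regex(@"(?i)\b[a-z]{2}[0-9]{5}\b");

    private static readonly Regex Lookaround =
        new Regex(@"(?i)(?<![a-z])([a-z]{2}[0-9]{5})(?![0-9])");

    // Each method returns the first match, or null when nothing matches.
    public static string FindWithBoundaries(string text)
    {
        Match m = WordBoundary.Match(text);
        return m.Success ? m.Value : null;
    }

    public static string FindWithLookarounds(string text)
    {
        Match m = Lookaround.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

Both variants reject ABCD12345678. Only the lookaround variant finds AB12345 in AB12345_x, because the underscore is a word character and so there is no word boundary after the last digit.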
This would be the code, along with usage samples.
public static Regex regex = new Regex(
"\\b[a-zA-Z]{2}\\d{5}\\b",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);
//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);
//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);
//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);
//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);
//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();
//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();

Regular Expression For JSON

I have a string -
xyz":abc,"lmn
I want to extract abc. what will be the regular expression for this ?
I am trying this -
/xyz\":(.*?),\"lmn/
But it is not fetching any result.
In c# you could use
var regex = new Regex(@"(?<=xyz\"":).*?(?=,\""lmn)");
var value = regex.Match(@"xyz"":abc,""lmn").Value;
Note this makes use of the C# verbatim string prefix @, which means that \ is not treated as an escape character. You will need to use a double " so that a single " will be included in the string.
This regex makes use of prefix and suffix matching rules so that you can get the match without having to select the specific group from the result.
Alternatively you can use group matching
var regex = new Regex(@"xyz\"":(.*?),\""lmn");
var value = regex.Match(@"xyz"":abc,""lmn").Groups[1].Value;
You can check for the existence of a match by doing the following
var match = regex.Match(@"xyz"":abc,""lmn");
var isMatch = match.Success;
and then follow up with either match.Value or match.Groups[1].Value depending on which regex you used.
EDIT
Actually, escaping the " is not needed in a C# regex, so you could use either of the following instead.
var regex = new Regex("(?<=xyz\":).*?(?=,\"lmn)");
var regex = new Regex("xyz\":(.*?),\"lmn");
These two do not use the verbatim string prefix, so the \" is translated into just " in the regex, giving a regex of (?<=xyz":).*?(?=,"lmn) or xyz":(.*?),"lmn
Additionally, if this is an entire-string match rather than a substring match, you would want one of the following.
var regex = new Regex("(?<=^xyz\":).*?(?=,\"lmn$)");
var regex = new Regex("^xyz\":(.*?),\"lmn$");
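To see that the verbatim and regular string forms behave the same, a quick sketch like this can help. The class and method names (QuoteEscapingDemo, ExtractVerbatim, ExtractRegular) are invented for the example; the patterns are the group-matching ones from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteEscapingDemo
{
    public static string ExtractVerbatim(string input)
    {
        // Verbatim literal: "" is one quote, and the backslash reaches the
        // regex engine, where \" simply means a literal quote.
        return Regex.Match(input, @"xyz\"":(.*?),\""lmn").Groups[1].Value;
    }

    public static string ExtractRegular(string input)
    {
        // Regular literal: \" is one quote, so the regex engine never
        // sees a backslash at all.
        return Regex.Match(input, "xyz\":(.*?),\"lmn").Groups[1].Value;
    }
}
```

Both methods should return abc for the input xyz":abc,"lmn.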
