C# Regex.Replace - search for matches in replacement string as well

Given the following:
string input = "xSxSx";
var result = Regex.Replace(input, "xSx", "xTx");
// result == "xTxSx"
It looks like the replacement string is not used for further matching, which is why we don't get the expected "xTxTx".
How can this be solved? Is there some special notation that tells the engine to look for a second match in the string with the first replacement already in place?

That's because once a character has been consumed by a match, it cannot be part of another match. After the first match, the remaining text to search is Sx, not xSx.
You need to use lookarounds here:
Regex.Replace(input, "(?<=x)S(?=x)", "T");
This replaces every S that is both preceded and followed by x with a T. Since lookarounds are zero-width assertions, they do not consume the x.
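The difference between the consuming pattern and the lookaround pattern can be seen side by side in a small sketch. The class and method names (`OverlapDemo`, `ReplaceConsuming`, `ReplaceWithLookarounds`) are made up for illustration; only `Regex.Replace` and the two patterns come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Consuming pattern: each match swallows both x's,
    // so the next search starts after the second x.
    public static string ReplaceConsuming(string input) =>
        Regex.Replace(input, "xSx", "xTx");

    // Lookaround pattern: only the S is consumed;
    // the surrounding x's are asserted but left available.
    public static string ReplaceWithLookarounds(string input) =>
        Regex.Replace(input, "(?<=x)S(?=x)", "T");

    public static void Main()
    {
        Console.WriteLine(ReplaceConsuming("xSxSx"));       // xTxSx
        Console.WriteLine(ReplaceWithLookarounds("xSxSx")); // xTxTx
    }
}
```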

For anything more complex that you can't reduce to a lookahead or lookbehind, just use a loop:
string test = "xSxSx";
string result = string.Empty;
while(true) {
result = Regex.Replace(test, "xSx", "xTx");
if(result != test)
test = result;
else
break;
}

Related

Regular expression split string, extract string value before and numeric value between square brackets

I need to parse a string that looks like "Abc[123]". The numerical value between the brackets is needed, as well as the string value before the brackets.
Most of the examples I tested work fine, but some special cases are not parsed correctly.
This code seems to work fine for "normal" cases, but has some problems handling "special" cases:
var pattern = @"\[(.*[0-9])\]";
var query = "Abc[123]";
var numVal = Regex.Matches(query, pattern).Cast<Match>().Select(m => m.Groups[1].Value).FirstOrDefault();
var stringVal = Regex.Split(query, pattern)
.Select(x => x.Trim())
.FirstOrDefault();
How should the code be adjusted to also handle some special cases?
For instance, for the string "Abc[]" the parser should correctly return "Abc" as the string value and indicate an empty numeric value (which could eventually default to 0).
For the string "Abc[xy33]" the parser should return "Abc" as the string value and indicate an invalid numeric value.
For the string "Abc" the parser should return "Abc" as the string value and indicate a missing numeric value. Blanks before, after, or inside the brackets should be trimmed, as in "Abc [ 123 ] ".
Try this pattern: ^([^\[]+)\[([^\]]*)\]
Explanation of the pattern:
^ - match the beginning of the string
([^\[]+) - match one or more of any character except [ and store it inside the first capturing group
\[ - match [ literally
([^\]]*) - match zero or more of any character except ] and store inside second capturing group
\] - match ] literally
Here's tested code:
var pattern = @"^([^\[]+)\[([^\]]*)\]";
var queries = new string[]{ "Abc[123]", "Abc[xy33]", "Abc[]", "Abc[ 33 ]", "Abc" };
foreach (var query in queries)
{
string beforeBrackets;
string insideBrackets;
var match = Regex.Match(query, pattern);
if (match.Success)
{
beforeBrackets = match.Groups[1].Value;
insideBrackets = match.Groups[2].Value.Trim();
if (insideBrackets == "")
insideBrackets = "0";
else if (!int.TryParse(insideBrackets, out int i))
insideBrackets = "incorrect value!";
}
else
{
beforeBrackets = query;
insideBrackets = "no value";
}
Console.WriteLine($"Input string {query} : before brackets: {beforeBrackets}, inside brackets: {insideBrackets}");
}
Console.ReadKey();
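Output:
Input string Abc[123] : before brackets: Abc, inside brackets: 123
Input string Abc[xy33] : before brackets: Abc, inside brackets: incorrect value!
Input string Abc[] : before brackets: Abc, inside brackets: 0
Input string Abc[ 33 ] : before brackets: Abc, inside brackets: 33
Input string Abc : before brackets: Abc, inside brackets: no value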
We can try doing a regex replacement on the input, for a one-liner solution:
string input = "Abc[123]";
string letters = Regex.Replace(input, "\\[.*\\]", "");
string numbers = Regex.Replace(input, ".*\\[(\\d+)\\]", "$1");
Console.WriteLine(letters);
Console.WriteLine(numbers);
This prints:
Abc
123
There are probably language-level techniques for this that I'm not aware of, but with a regular expression we could capture everything in capturing groups and check the parts one by one, perhaps:
^([A-Za-z]+)\s*(\[?)\s*([A-Za-z]*)(\d*)\s*(\]?)\s*$
If you wish to explore, simplify, or modify the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.
You can achieve that easily without using a regex:
string temp = "Abc[123]";
string[] arr = temp.Split('[');
string name = arr[0];
string value = arr[1].TrimEnd(']');
Output: name = Abc and value = 123

Exclude first and last quotation of string in regex result

I'm running a little C# program where I need to extract the escape-quoted words from a string.
Sample code from linqpad:
string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
var pattern = "\".*?\"";
var result = Regex.Split(s, pattern);
result.Dump();
Input (actual input contains many more escaped even-number-of quotes):
"action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult"
expected result
"C:\\folder\\"
actual result (2 items)
"action = 0;
dir = "
_____
";
result"
I get exactly the opposite of what I require. How can I make the regex ignore the starting (and ending) quote of the actual string? Why does it include them in the search? I've used the regex from similar SO questions but still don't get the intended result. I only want to filter by escape quotes.
Instead of using Regex.Split, try Regex.Match.
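A minimal sketch of that suggestion, assuming the quoted values never contain a quote character themselves. It uses `Regex.Matches` (the multi-match sibling of `Regex.Match`) with a capturing group, so the surrounding quotes are matched but not returned. `QuotedExtractor` and `ExtractQuoted` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedExtractor
{
    // Returns the text between each pair of double quotes, without the quotes.
    public static List<string> ExtractQuoted(string input)
    {
        var values = new List<string>();
        // Group 1 holds only the content between the quotes.
        foreach (Match m in Regex.Matches(input, "\"(.*?)\""))
            values.Add(m.Groups[1].Value);
        return values;
    }

    public static void Main()
    {
        string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
        foreach (string value in ExtractQuoted(s))
            Console.WriteLine(value); // C:\\folder\\
    }
}
```

`Regex.Split` returns the pieces around the matches, which is why the original code produced the opposite of what was wanted; matching returns the matched text itself.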
You don't need a regex. Simply use String.Split(';') and the second array element will contain the path you need. You can then use Trim() and Remove() to strip the leading line break and the dir = part, and a final Trim('"') to get rid of the quotes. Something like:
string path = s.Split(';')[1].Trim("\r ".ToCharArray()).Remove(0, 7).Trim('"');

match pattern for startswith

I want to do a Regex match in C# to check whether a string starts with part of a pattern.
Say the pattern is "ABC...GHI"; then valid strings can be in the format "A", "AB", "ABCDEF", "ABCXYXGHI".
This is sample code. What exactly does the regex in pattern have to be to make it work?
string pattern = "ABC...GHI";
code = "A" //valid
code = "ABC" //valid
code = "ABCDE" //valid
code = "ABCXXX" //valid
code = "ABCXXXGHI" //valid
code = "ABCXXXGHIAA" //invalid
code = "B" //invalid
Regex.IsMatch(code, pattern)
You can use ? to make part of the regexp optional. The final regexp string could be
A(B(C(.(.(.(G(H(I?)?)?)?)?)?)?)?)?
The final string is quite messy, but you can generate it automatically.
The visualization of above regexp is here http://www.regexper.com/#A(B(C(.(.(.(G(H(I%3F)%3F)%3F)%3F)%3F)%3F)%3F)%3F)%3F
Are you looking for something like this?
var pat = new Regex(@"^A(B(C(.(Z)?)?)?)?");
var testStrings = new string[]
{
"ALPHA",
"ABGOOF",
"ABCblah",
"ABCbZ",
"FOOBAR"
};
foreach (var s in testStrings)
{
var m = pat.Match(s);
if (m.Success)
{
Console.WriteLine("{0} matches {1}", s, m.Value);
}
else
{
Console.WriteLine("No match found for {0}", s);
}
}
Results from that are:
ALPHA matches A
ABGOOF matches AB
ABCblah matches ABCb
ABCbZ matches ABCbZ
No match found for FOOBAR
The key is that everything after the A is optional. So if you wanted strings that start with A or AB, you'd have:
AB?
If you wanted to add ABC, you need:
A(BC?)?
Another character:
A(B(CZ?)?)?
Messy, but you could write code to generate the expression automatically if you had to.
Additional info
It's possible that you want the strings to be no longer than the pattern, with every character matching the pattern. That is, given the pattern I showed above, "ABCxZ" would be valid, but "ABCblah" would not be valid because the "lah" part doesn't match the pattern. If that's the case, you need to add a "$" to the end of the pattern to say that the string ends there. So:
var pat = new Regex(@"^A(B(C(.(Z)?)?)?)?$");
Or, in your example case:
"^A(B(C(.(.(.(G(H(I)?)?)?)?)?)?)?)?$"
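Generating the nested optional pattern automatically could look roughly like this. `BuildPrefixPattern` is a hypothetical helper, and the sketch assumes every character of the template is a single regex token (a literal letter or `.`), so nothing is escaped.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PrefixPattern
{
    // Turns "ABC" into "^A(B(C)?)?$": the first character is required,
    // every later character is optional but only if all earlier ones matched.
    public static string BuildPrefixPattern(string template)
    {
        if (string.IsNullOrEmpty(template))
            throw new ArgumentException("Template must not be empty.", nameof(template));

        var sb = new StringBuilder("^");
        sb.Append(template[0]);
        for (int i = 1; i < template.Length; i++)
            sb.Append('(').Append(template[i]);   // open one group per remaining token
        for (int i = 1; i < template.Length; i++)
            sb.Append(")?");                      // close each group and make it optional
        return sb.Append('$').ToString();
    }

    public static void Main()
    {
        string pattern = BuildPrefixPattern("ABC...GHI");
        Console.WriteLine(pattern); // ^A(B(C(.(.(.(G(H(I)?)?)?)?)?)?)?)?$
        Console.WriteLine(Regex.IsMatch("ABCXXX", pattern));      // True
        Console.WriteLine(Regex.IsMatch("ABCXXXGHIAA", pattern)); // False
    }
}
```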

Compare String with Regex in C#

I have a list of strings taken from a file. Some of these strings are in the format "Q" + number + "null" (e.g. Q98null, Q1null, Q24null, etc.).
With a foreach loop I must check whether a string matches that format.
I use this right now:
string a = "Q9null"; //just for testing
if(a.Contains("Q") && a.Contains("null"))
MessageBox.Show("ok");
but I'd like to know if there is a better way to do this with regex.
Thank you!
Your method will produce a lot of false positives - for example, it would recognize some invalid strings, such as "nullQ" or "Questionable nullability".
A regex to test for the match would be "^Q\\d+null$". The structure is very simple: it says that the target string must start with a Q, followed by one or more decimal digits, and end with null.
Console.WriteLine(Regex.IsMatch("Q123null", "^Q\\d+null$")); // Prints True
Console.WriteLine(Regex.IsMatch("nullQ", "^Q\\d+null$")); // Prints False
public static bool Check(string s)
{
Regex regex = new Regex(@"^Q\d+null$");
Match match = regex.Match(s);
return match.Success;
}
Apply the above method in your code:
string a = "Q9null"; //just for testing
if(Check(a))
MessageBox.Show("ok");
First way: Using the Regex
Use this Regex ^Q\d+null$
Second way: Using the SubString
string s = "Q1123null";
string First,Second,Third;
First = s[0].ToString();
Second = s.Substring(1,s.Length-5);
Third = s.Substring(s.Length-4);
Console.WriteLine (First);
Console.WriteLine (Second);
Console.WriteLine (Third);
then you can check everything after this...

Replace all alphanumeric characters in a string except pattern

I'm trying to obfuscate a string, but need to preserve a couple patterns. Basically, all alphanumeric characters need to be replaced with a single character (say 'X'), but the following (example) patterns need to be preserved (note that each pattern has a single space at the beginning)
QQQ"
RRR"
I've looked through a few samples on negative lookahead/lookbehind, but still haven't had any luck with this (only testing QQQ).
var test = @"""SOME TEXT AB123 12XYZ QQQ""""empty""""empty""1A2BCDEF";
var regex = new Regex(@"((?!QQQ)(?<!\sQ{1,3}))[0-9a-zA-Z]");
var result = regex.Replace(test, "X");
The correct result should be:
"XXXX XXXX XXXXX XXXXX QQQ""XXXXX""XXXXX"XXXXXXXX
This works for an exact match, but will fail with something like ' QQR"', which returns
"XXXX XXXX XXXXX XXXXX XQR""XXXXX""XXXXX"XXXXXXXX
You can use this:
var regex = new Regex(@"((?> QQQ|[^A-Za-z0-9]+)*)[A-Za-z0-9]");
var result = regex.Replace(test, "$1X");
The idea is to match all that must be preserved first and to put it in a capturing group.
Since the target characters are always preceded by zero or more things that must be preserved, you only need to write this capturing group before [A-Za-z0-9]
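A variant of the same "match what must be preserved first" idea, sketched with a `MatchEvaluator` instead of a `$1` substitution. This is a swapped-in technique, not the code from the answer above: the preserved patterns (here ` QQQ"` and ` RRR"`, each with its leading space) are tried first at every position, and any other alphanumeric character becomes X. The names `Obfuscator`, `Obfuscate`, and the group name `keep` are made up for illustration. The sketch does not depend on an alphanumeric character following the preserved text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Obfuscator
{
    // The named group lists what must survive; the second branch is one character to mask.
    private static readonly Regex Pattern =
        new Regex(@"(?<keep> QQQ""| RRR"")|[A-Za-z0-9]");

    public static string Obfuscate(string input) =>
        Pattern.Replace(input, m => m.Groups["keep"].Success ? m.Value : "X");

    public static void Main()
    {
        var test = @"""SOME TEXT AB123 12XYZ QQQ""""empty""""empty""1A2BCDEF";
        Console.WriteLine(Obfuscate(test));
        // "XXXX XXXX XXXXX XXXXX QQQ""XXXXX""XXXXX"XXXXXXXX
    }
}
```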
Here's a non-regex solution. It works quite nicely, although it fails when a pattern occurs more than once in the input; it would need a better algorithm for finding occurrences. You can compare it with a regex solution for large strings.
public static string ReplaceWithPatterns(this string input, IEnumerable<string> patterns, char replacement)
{
    // Locate the first occurrence of each pattern (IndexOf returns -1 when absent).
    var patternsPositions = patterns.Select(p =>
            new { Pattern = p, Index = input.IndexOf(p, StringComparison.Ordinal) })
        .Where(i => i.Index >= 0)
        .ToList();
    // Start from a fully masked string; note that this masks every character, not only alphanumerics.
    var result = new string(replacement, input.Length);
    foreach (var p in patternsPositions)
        // Overwrite the masked span with the pattern so the overall length is unchanged.
        result = result.Remove(p.Index, p.Pattern.Length).Insert(p.Index, p.Pattern);
    return result;
}
