.NET regex replace using backreference - c#

I have a fairly long string that contains sub strings with the following format:
project[1]/someword[1]
project[1]/someotherword[1]
There will be about 10 or so instances of this pattern in the string.
What I want to do is to be able to replace the second integer in square brackets with a different one. So the string would look like this for instance:
project[1]/someword[2]
project[1]/someotherword[2]
I'm thinking that regular expressions are what I need here. I came up with the regex:
project\[1\]/.*\[([0-9])\]
Which should capture the group [0-9] so I can replace it with something else. I'm looking at MSDN Regex.Replace() but I'm not seeing how to replace part of a string that is captured with a value of your choosing. Any advice on how to accomplish this would be appreciated. Thanks much.
Edit: After working with @Tharwen for a while, I have changed my approach a bit. Here is the new code I am working with:
String yourString = @"<element w:xpath=""/project[1]/someword[1]""/> <anothernode></anothernode> <another element w:xpath=""/project[1]/someotherword[1]""/>";
int yourNumber = 2;
string anotherString = string.Empty;
anotherString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\]"")", yourNumber.ToString());

Matched groups are referenced in the replacement string using the $1, $2 syntax, as follows:
csharp> Regex.Replace("Meaning of life is 42", @"([^\d]*)(\d+)", "$1($2)");
"Meaning of life is (42)"
If you are new to regular expressions in .NET I recommend http://www.ultrapico.com/Expresso.htm
Also http://www.regular-expressions.info/dotnet.html has some good stuff for quick reference.

I've adapted yours to use a lookbehind and a lookahead so that it only matches a digit which is preceded by 'project[1]/xxxxx[' and followed by ']"' (the closing bracket plus the quote that ends the xpath attribute):
(?<=project\[1\]/.*\[)\d(?=\]")
Then, you can use:
String yourString = @"<element w:xpath=""/project[1]/someword[1]""/>";
int yourNumber = 2;
yourString = Regex.Replace(yourString, @"(?<=project\[1\]/.*\[)\d(?=\]"")", yourNumber.ToString());
I think maybe you were confused because Regex.Replace has lots of overloads which do slightly different things. I've used the one that takes the input string, the pattern, and a replacement string.

If you want to process the value of a captured group before replacing it, you'll have to separate the different parts of the string, make your modifications and put them back together.
string test = "project[1]/someword[1]\nproject[1]/someotherword[1]\n";
string result = string.Empty;
foreach (Match match in Regex.Matches(test, @"(project\[1\]/.*\[)([0-9])(\]\n)"))
{
    result += match.Groups[1].Value;
    result += (int.Parse(match.Groups[2].Value) + 1).ToString();
    result += match.Groups[3].Value;
}
If you just want to replace text verbatim, it's easier: Regex.Replace(test, @"abc(.*)cba", @"cba$1abc").
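As a possible alternative to splitting the matches by hand, Regex.Replace also has an overload that takes a MatchEvaluator delegate, which lets you compute each replacement from the captured groups. The following is only a sketch: the helper name IncrementSecondIndex is made up for illustration, the pattern drops the trailing \n from the one above so that unmatched text is left in place, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: adds 1 to the digit inside the second pair of brackets.
string IncrementSecondIndex(string input)
{
    return Regex.Replace(
        input,
        @"(project\[1\]/.*\[)([0-9])(\])",
        // The lambda is converted to a MatchEvaluator and runs once per match.
        m => m.Groups[1].Value
             + (int.Parse(m.Groups[2].Value) + 1)
             + m.Groups[3].Value);
}

string test = "project[1]/someword[1]\nproject[1]/someotherword[1]\n";
Console.WriteLine(IncrementSecondIndex(test));
```

Because Replace copies everything outside the matches unchanged, there is no need to rebuild the string piece by piece.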

You can also use String.Replace(String, String), for example:
yourString = yourString.Replace("someword[1]", "someword[2]");

Related

Passing a dynamic value to quantifier in C# regex

I have a regex that I am trying to pass a variable to:
int i = 0;
Match match = Regex.Match(strFile, "(^.{i})|(godness\\w+)(?<=\\2(\\d+).*?\\2)(\\d+)");
I'd like the regex engine to parse {i} as the number that the i variable holds.
The way I am doing that does not work as I get no matches when the text contains matching substrings.
It is not clear which strings you want to match with your regex, but if you need to use a variable in the pattern, you can use string interpolation inside a verbatim string literal. Verbatim string literals are preferred when declaring regex patterns because they avoid over-escaping.
String interpolation was only introduced in C# 6.0, so in earlier versions you can use string.Format (note the doubled braces, which produce the literal { and } of the quantifier):
string.Format(@"(^.{{{0}}})|(godness\w+)(?<=\2(\d+).*?\2)(\d+)", i)
Beginning with C# 6.0, this seems a better alternative:
int i = 0;
Match match = Regex.Match(strFile, $@"(^.{{{i}}})|(godness\w+)(?<=\2(\d+).*?\2)(\d+)");
The regex pattern will look like
(^.{0})|(godness\w+)(?<=\2(\d+).*?\2)(\d+)
   ^^^
You may try this approach, where i is passed as a parameter and can hold any value:
int i = 0;
string Value = string.Format("(^.{{{0}}})|(godness\\w+)(?<=\\2(\\d+).*?\\2)(\\d+)", i);
Match match = Regex.Match(strFile, Value);
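To see the brace escaping in isolation, here is a minimal sketch that uses only the `^.{i}` part of the pattern. The helper name TakePrefix and the sample input are assumptions made for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first `count` characters using a {count} quantifier.
string TakePrefix(string text, int count)
{
    // {{ and }} become literal braces; {count} is replaced by the variable's value.
    string pattern = $@"^.{{{count}}}";
    return Regex.Match(text, pattern).Value;
}

Console.WriteLine(TakePrefix("godness42", 3)); // prints "god"
```

If the input is shorter than the requested count, the match fails and Value is an empty string.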

match first digits before # symbol

How can I match the leading digits before the first # in this line?
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
I'm trying to get this number: 26909578
My attempt:
string text = @"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, @"(.+?)#", RegexOptions.Singleline);
but it outputs all of the text.
Make it explicit that it has to start at the beginning of the string:
@"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
@"^\d+"
Alternatively, use the Match method instead of Matches. Matches explicitly says "give me all the matches", while Match only returns the first one.
Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.
If you don't want to use a regex, use a StringBuilder and just loop until you hit the #, like this:
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
// Stop at the first '#', or at the end of the string if there is none.
while (i < yourdata.Length && yourdata[i] != '#')
{
    sb.Append(yourdata[i]);
    i++;
}
// When you reach the '#', the StringBuilder holds the number you want, so return it with .ToString().
string answer = sb.ToString();
The entire string (except the final URL) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#).
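The IndexOf() approach mentioned above might look roughly like this. It is a sketch only: the helper name BeforeFirstHash is invented for illustration, the sample input is a shortened version of the line in the question, and the top-level statements assume C# 9 or later.

```csharp
using System;

// Hypothetical helper: returns everything before the first '#',
// or the whole string if it contains no '#'.
string BeforeFirstHash(string text)
{
    int index = text.IndexOf('#');
    return index >= 0 ? text.Substring(0, index) : text;
}

Console.WriteLine(BeforeFirstHash("26909578#Sbrntrl_7x06-lilla.avi#356028416")); // prints "26909578"
```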

How can I use RegEx (Or Should I) to extract a string between the starting string '__' and ending with '__' or 'nothing'

RegEx has always confused me.
I have a string like this:
IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0
Or Sometimes stored like this:
IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0
I want to get the 'A3VS' or 'PG33S' string. It's my firmware and is varied in length and type. I used to use:
string[] split = PNP.Split('\\'); // where PNP is my string name
var start = split[1].LastIndexOf('_');
string mystring = split[1].Substring(start + 1);
But that only works for strings that don't end with __ after the firmware string. I noticed that some have an additional random '_' after it.
Is RegEx the way to solve this, or is there a better way?
Without RegEx, it can be expressed like this:
var firmware = PNP.Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries)[1].Split('\\')[0];
Alternatively, keep your original split and trim the trailing underscores first:
string s = split[1].TrimEnd('_');
string mystring = s.Substring(s.LastIndexOf('_') + 1);
If you want the RegEx way to do it, here it is:
Regex regex = new Regex(@"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
var m1 = regex.Match(@"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0");
var g1 = m1.Groups["firmware"].Value;
//g1 == "PG33S"
Keep in mind you have to use [A-Za-z0-9] instead of \w in the capture subexpression since \w also matches an underscore (_).
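To check the pattern against both formats from the question, with and without trailing underscores, something like the following could be used. The helper name ExtractFirmware is an assumption for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the firmware segment, or an empty string if nothing matches.
string ExtractFirmware(string pnp)
{
    Match m = Regex.Match(pnp, @"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
    return m.Success ? m.Groups["firmware"].Value : string.Empty;
}

// Verbatim literals keep the backslashes in the device IDs intact.
Console.WriteLine(ExtractFirmware(@"IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0"));
Console.WriteLine(ExtractFirmware(@"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0"));
```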

regex and string

Consider the following:
string keywords = "(load|save|close)";
Regex x = new Regex(@"\b"+keywords+"\b");
I get no matches. However, if I do this:
Regex x = new Regex(@"\b(load|save|close)\b");
I get matches. How come the former doesn't work, and how can I fix this? Basically, I want the keywords to be configurable so I placed them in a string.
The last \b in the first code snippet needs a verbatim string specifier (@) in front of it as well, since it is a separate string literal. Without it, "\b" is the backspace character rather than a regex word boundary.
string keywords = "(load|save|close)";
Regex x = new Regex(@"\b"+keywords+@"\b");
You're missing another verbatim string specifier (@ prefixed to the last \b):
Regex x = new Regex(@"\b" + keywords + @"\b");
Regex x = new Regex(@"\b"+keywords+@"\b");
You forgot the additional @ before the second "\b".
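The difference between the two literals can be seen directly in a small sketch like this one. The sample sentence is an assumption made for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

string keywords = "(load|save|close)";

// In a regular literal, "\b" is the single backspace character (U+0008).
// In a verbatim literal, @"\b" is a backslash followed by 'b', which the regex engine reads as a word boundary.
Console.WriteLine("\b".Length);  // prints 1
Console.WriteLine(@"\b".Length); // prints 2

// The first pattern ends with a literal backspace, so ordinary text cannot match it.
bool broken = new Regex(@"\b" + keywords + "\b").IsMatch("please save the file");
bool working = new Regex(@"\b" + keywords + @"\b").IsMatch("please save the file");

Console.WriteLine(broken);  // prints False
Console.WriteLine(working); // prints True
```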

Find and Insert

I have a string that looks like (the * is literal):
clp*(seven digits)1*
I want to change it so that it looks like:
clp*(seven digits)(space)(space)1*
I'm working in C# and built my search pattern like this:
Regex regAddSpaces = new Regex(@"CLP\*.......1\*");
I'm not sure how to tell regex to keep the first 11 characters, add two spaces and then cap it with 1*.
Any help is appreciated.
No need to use regex here. Simple string manipulation will do the job perfectly well.
var input = "clp*01234561*";
var output = input.Substring(0, 11) + "  " + input.Substring(11, 2);
I agree with Noldorin. However, here's how you could do it with regular expressions if you really wanted:
var result = Regex.Replace("clp*12345671*", @"(clp\*\d{7})(1\*)", @"$1  $2");
If you just want to replace this anywhere in the text, you can use the excluded prefix and suffix operators (lookbehind and lookahead):
pattern = @"(?<=clp\*[0-9]{7})(?=1\*)"
Handing this off to the regex replace with a replacement value of "  " (two spaces) will insert the spaces.
Thus, the following one-liner does the trick:
string result = Regex.Replace(inputString, @"(?<=clp\*[0-9]{7})(?=1\*)", "  ", RegexOptions.IgnoreCase);
Here is the regex, but if your problem is as simple as you stated above, Noldorin's answer would be a clearer and more maintainable solution. But since you wanted regex, here you go:
// Not a fan of the 'out' usage, but I am not sure if you care about whether the match succeeded
public static bool AddSpacesToMyRegexMatch(string input, out string output)
{
    Regex reg = new Regex(@"(^clp\*[0-9]{7})(1\*$)");
    Match match = reg.Match(input);
    // Groups[0] is the whole match; the two captured parts are Groups[1] and Groups[2].
    output = match.Success ?
        string.Format("{0}  {1}", match.Groups[1], match.Groups[2]) :
        input;
    return match.Success;
}
