How to rewrite a string by pattern - c#

I have a string, where the "special areas" are enclosed in curly braces:
{intIncG}/{intIncD}/02-{yy}
I need to iterate through all of these elements in between {} and replace them based on their content. What is the best code structure to do this in C#?
I can't just do a replace, since I need to know the index of each "special area {}" in order to replace it with the correct value.

Regex rgx = new Regex(@"\{[^}]*\}");
string output = rgx.Replace(input, new MatchEvaluator(DoStuff));
static string DoStuff(Match match)
{
    // Here you have access to match.Index and match.Value, so you can do something different for Match1, Match2, etc.
    // You can easily strip the {'s off the value with:
    string value = match.Value.Substring(1, match.Value.Length - 2);
    // Then call a function that takes value and match.Index and returns the string to substitute.
    return value; // placeholder: return the computed replacement here
}
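The evaluator approach above can be made self-contained as follows. This is a sketch: the class name BraceRewriter and the resolve callback (area name, ordinal, character index) are illustrative assumptions, not part of the original answer. It relies on Regex.Replace invoking the MatchEvaluator once per match, left to right, so a captured counter gives each area's ordinal position.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceRewriter
{
    // Matches "{", captures any run of characters other than "}", then "}".
    private static readonly Regex Placeholder = new Regex(@"\{([^}]*)\}");

    // Replaces every {name} area with resolve(name, ordinal, index), where
    // ordinal counts the areas from 0 (left to right) and index is the
    // position of the "{" in the original input.
    public static string Rewrite(string input, Func<string, int, int, string> resolve)
    {
        int ordinal = 0;
        return Placeholder.Replace(input, m => resolve(m.Groups[1].Value, ordinal++, m.Index));
    }
}
```

For the question's input, the callback receives ("intIncG", 0, 0), ("intIncD", 1, 10) and ("yy", 2, 23), and can switch on the name and/or position to produce the correct value.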

string.Replace will do just fine.
var updatedString = myString.Replace("{intIncG}", "something");
Do this once for every distinct placeholder.
Update:
Since you need the index of { in order to produce the replacement string (as you commented), you can use Regex.Matches to find the indices of {: each Match object in the returned MatchCollection exposes its position in the string through Match.Index.
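A minimal sketch of the Regex.Matches suggestion, assuming the areas never contain a nested "}". The class and method names are hypothetical; the point is only that Match.Index gives the offset of each "{".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AreaFinder
{
    // Returns each {...} area together with the index of its opening "{",
    // read from Match.Index on the results of Regex.Matches.
    public static List<(int Index, string Text)> FindAreas(string input)
    {
        var areas = new List<(int Index, string Text)>();
        foreach (Match m in Regex.Matches(input, @"\{[^}]*\}"))
        {
            areas.Add((m.Index, m.Value));
        }
        return areas;
    }
}
```

Note that if you then substitute the areas one at a time with string.Replace or Substring, later indices shift whenever a replacement has a different length; Regex.Replace with a MatchEvaluator avoids that bookkeeping.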

Use Regex.Replace, which (from MSDN):
"Replaces all occurrences of a character pattern defined by a regular expression with a specified replacement character string."

You can define a function and join its output, so you only need to traverse the parts once rather than once for every replacement rule.
private IEnumerable<string> Traverse(string input)
{
    int index = 0;
    string[] parts = input.Split('/');
    foreach (var part in parts)
    {
        index++;
        string retVal;
        switch (part)
        {
            case "{intIncG}":
                retVal = "a"; // or something based on index!
                break;
            case "{intIncD}":
                retVal = "b"; // or something based on index!
                break;
            // ... further cases ...
            default:
                retVal = part; // leave unrecognized parts (e.g. "02-{yy}") unchanged
                break;
        }
        yield return retVal;
    }
}
string replaced = string.Join("/", Traverse(inputString));

Related

Replace regular expression with regular expression

Consider two regular expressions:
var regex_A = @"Main\.(.+)\.Value";
var regex_B = @"M_(.+)_Sp";
I want to be able to replace a string using regex_A as input, and regex_B as the replacement string. But also the other way around. And without supplying additional information like a format string per regex.
Specifically I want to create a replaced_B string from an input_A string. So:
var input_A = "Main.Rotating.Value";
var replaced_B = input_A.RegEx_Awesome_Replace(regex_A, regex_B);
Assert.AreEqual("M_Rotating_Sp", replaced_B);
And this should also work in reverse (that's the reason I can't use a simple string.Format for regex_B), because I don't want to supply a format string for every regular expression (I'm lazy).
var input_B = "M_Skew_Sp";
var replaced_A = input_B.RegEx_Awesome_Replace(regex_B, regex_A);
Assert.AreEqual("Main.Skew.Value", replaced_A);
I have no clue whether this exists, or what it is called. Google search finds me all kinds of other regex replaces, but not this one.
Update:
So basically I need a way to convert a regular expression to a format string.
var regex_A_format = Regex2Format(regex_A);
Assert.AreEqual("Main.$1.Value", regex_A_format);
and
var regex_B_format = Regex2Format(regex_B);
Assert.AreEqual("M_$1_Sp", regex_B_format);
So what should the RegEx_Awesome_Replace and/or Regex2Format function look like?
Update 2:
I guess the RegEx_Awesome_Replace should look something like (using some code from answers below):
public static class StringExtenstions
{
    public static string RegExAwesomeReplace(this string inputString, string searchPattern, string replacePattern)
    {
        return Regex.Replace(inputString, searchPattern, Regex2Format(replacePattern));
    }
}
Which would leave the Regex2Format as an open question.
There is no defined way for one regex to refer to a match found in another regex. Regexes are not format strings.
What you can do is to use Tuples of a format string together with its regex. e.g.
var a = new Tuple<Regex, string>(new Regex(@"(?<=Main\.).+(?=\.Value)"), @"Main.{0}.Value");
var b = new Tuple<Regex, string>(new Regex(@"(?<=M_).+(?=_Sp)"), @"M_{0}_Sp");
Then you can pass these objects to a common replacement method in any order, like this:
private string RegEx_Awesome_Replace(string input, Tuple<Regex, string> toFind, Tuple<Regex, string> replaceWith)
{
    return string.Format(replaceWith.Item2, toFind.Item1.Match(input).Value);
}
You will notice that I have used zero-width positive lookahead and zero-width positive lookbehind assertions in my regexes, to ensure that Value contains exactly the text that I want to replace.
You may also want to add error handling for cases where the match cannot be found: in that case Match.Success is false and Match.Value is an empty string. Maybe read about Regex.Match.
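A self-contained sketch of the tuple approach with the suggested error handling added. The class name, the field names A and B, and the choice to throw ArgumentException on a failed match are assumptions for illustration; Tuple.Create is used as shorthand for new Tuple<Regex, string>(...).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TuplePairReplace
{
    // Each pair: a regex whose lookarounds make the match exactly the variable
    // part, plus a composite format string that rebuilds the text around {0}.
    public static readonly Tuple<Regex, string> A =
        Tuple.Create(new Regex(@"(?<=Main\.).+(?=\.Value)"), "Main.{0}.Value");
    public static readonly Tuple<Regex, string> B =
        Tuple.Create(new Regex(@"(?<=M_).+(?=_Sp)"), "M_{0}_Sp");

    public static string Translate(string input, Tuple<Regex, string> toFind, Tuple<Regex, string> replaceWith)
    {
        Match m = toFind.Item1.Match(input);
        if (!m.Success)
        {
            // Without this check m.Value would be "" and the result would silently be e.g. "M__Sp".
            throw new ArgumentException("Input does not match " + toFind.Item1, nameof(input));
        }
        return string.Format(replaceWith.Item2, m.Value);
    }
}
```

Because each pair carries its own format string, the same Translate call works in either direction: Translate(input_A, A, B) and Translate(input_B, B, A).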
Since you have already reduced your problem to changing a Regex into a format string (implementing Regex2Format), I will focus my answer on just that part. Note that my answer is incomplete because it doesn't address the full breadth of parsing regex capturing groups; however, it works for simple cases.
The first thing needed is a Regex that matches Regex capture groups. It uses a negative lookbehind so that escaped parentheses are not matched. There are other cases that break this regex, e.g. non-capturing groups, wildcard symbols, and things between square brackets.
private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");
The implementation of Regex2Format here basically writes everything outside of capture groups into the output string, and replaces each capture group with {x}, where x is the group's position (0, 1, ...), so the result can be used with string.Format.
static string Regex2Format(string pattern)
{
    var targetBuilder = new StringBuilder();
    int previousEndIndex = 0;
    int formatIndex = 0;
    foreach (Match match in CaptureGroupMatcher.Matches(pattern))
    {
        var group = match.Groups[0];
        int endIndex = group.Index;
        AppendPart(pattern, previousEndIndex, endIndex, targetBuilder);
        targetBuilder.Append('{');
        targetBuilder.Append(formatIndex++);
        targetBuilder.Append('}');
        previousEndIndex = group.Index + group.Length;
    }
    AppendPart(pattern, previousEndIndex, pattern.Length, targetBuilder);
    return targetBuilder.ToString();
}
This helper function copies pattern text into the output; it currently copies everything except \ characters used to escape something.
static void AppendPart(string pattern, int previousEndIndex, int endIndex, StringBuilder targetBuilder)
{
    for (int i = previousEndIndex; i < endIndex; i++)
    {
        char c = pattern[i];
        if (c == '\\' && i < pattern.Length - 1 && pattern[i + 1] != '\\')
        {
            // backslash not followed by another backslash - it's an escape char
        }
        else
        {
            targetBuilder.Append(c);
        }
    }
}
Test cases
static void Test()
{
    var cases = new Dictionary<string, string>
    {
        { @"Main\.(.+)\.Value", @"Main.{0}.Value" },
        { @"M_(.+)_Sp(.*)", "M_{0}_Sp{1}" },
        { @"M_\(.+)_Sp", @"M_(.+)_Sp" },
    };
    foreach (var kvp in cases)
    {
        string actual = Regex2Format(kvp.Key);
        if (actual != kvp.Value)
        {
            Console.WriteLine("Test failed for {0} - expected {1} but got {2}", kvp.Key, kvp.Value, actual);
        }
    }
}
To wrap up, here is the usage:
private static string AwesomeRegexReplace(string input, string sourcePattern, string targetPattern)
{
    var targetFormat = Regex2Format(targetPattern);
    return Regex.Replace(input, sourcePattern, match =>
    {
        var args = match.Groups.OfType<Group>().Skip(1).Select(g => g.Value).ToArray<object>();
        return string.Format(targetFormat, args);
    });
}
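The question's Update 2 asked for Regex2Format to produce a Regex.Replace substitution string (like Main.$1.Value) rather than a composite format string. The following is a sketch of that variant using the same simplified capture-group matcher, so it inherits the same limitations. It emits ${1} instead of $1: both refer to group 1 in .NET substitutions, but the braced form stays unambiguous if a digit follows. The class and method names are hypothetical, and literal "$" characters are doubled so Regex.Replace treats them literally.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class RegexSubstitution
{
    // Same simplification as the answer above: an unescaped "(" up to the next ")".
    // Nested groups, non-capturing groups, character classes etc. are not handled.
    private static readonly Regex CaptureGroup = new Regex(@"(?<!\\)\([^)]+\)");

    // Turns e.g. @"Main\.(.+)\.Value" into the substitution "Main.${1}.Value".
    public static string Regex2Substitution(string pattern)
    {
        var sb = new StringBuilder();
        int last = 0, groupNumber = 1;
        foreach (Match m in CaptureGroup.Matches(pattern))
        {
            AppendLiteral(pattern, last, m.Index, sb);
            sb.Append("${").Append(groupNumber++).Append('}');
            last = m.Index + m.Length;
        }
        AppendLiteral(pattern, last, pattern.Length, sb);
        return sb.ToString();
    }

    // Copies literal text, dropping escape backslashes and doubling "$"
    // so that Regex.Replace does not treat it as a substitution.
    private static void AppendLiteral(string pattern, int start, int end, StringBuilder sb)
    {
        for (int i = start; i < end; i++)
        {
            char c = pattern[i];
            if (c == '\\' && i + 1 < end)
            {
                c = pattern[++i]; // keep the escaped character, drop the backslash
            }
            if (c == '$') sb.Append('$');
            sb.Append(c);
        }
    }

    // The extension method from the question's Update 2, using the substitution directly.
    public static string RegExAwesomeReplace(this string input, string searchPattern, string replacePattern)
        => Regex.Replace(input, searchPattern, Regex2Substitution(replacePattern));
}
```

With this, "Main.Rotating.Value".RegExAwesomeReplace(regex_A, regex_B) yields "M_Rotating_Sp", and swapping the two patterns converts in the other direction.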
Something like this might work:
var replaced_B = Regex.Replace(input_A, @"Main\.(.+)\.Value", @"M_$1_Sp");
Are you looking for something like this?
public static class StringExtenstions
{
    public static string RegExAwesomeReplace(this string inputString, string searchPattern, string replacePattern)
    {
        Match searchMatch = Regex.Match(inputString, searchPattern);
        Match replaceMatch = Regex.Match(inputString, replacePattern);
        if (!searchMatch.Success || !replaceMatch.Success)
        {
            return inputString;
        }
        return inputString.Replace(searchMatch.Value, replaceMatch.Value);
    }
}
The extension method finds the first match of each pattern in the input string and replaces the text matched by the search pattern with the text matched by the replace pattern; if either pattern does not match the input, the input is returned unchanged.
This is how you call it:
input_A.RegExAwesomeReplace(regex_A, regex_B);

Replace Multiple References of a pattern with Regex

I have a string which is in the following form
$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#
Basically this string is made up of the following parts
$ = Delimiter Start
(Some Text)
# = Delimiter End
(all of this n times)
I would now like to replace each of these sections with some meaningful text. Therefore I need to extract these sections, do something based on the text inside each section and then replace the section with the result. So the resulting string should look something like this:
12V, 0603, +20%, -20%
The commas and everything else that is not contained within the section stays as it is, the sections get replaced by meaningful values.
My question: can you help me with a regex pattern that finds where these sections are, so I can replace them?
You need to use the Regex.Replace method and use a MatchEvaluator delegate to decide what the replacement value should be.
The pattern you need can be $ then anything except #, then #. We put the middle bit in brackets so it is stored as a separate group in the result.
\$([^#]+)#
The full thing can be something like this (up to you to do the correct appropriate replacement logic):
string value = @"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
string result = Regex.Replace(value, @"\$([^#]+)#", m =>
{
    // This method takes the matching value and needs to return the correct replacement
    // m.Value is e.g. "$KL\U#", m.Groups[1].Value is the bit in ()s between $ and #
    switch (m.Groups[1].Value)
    {
        case @"KL\U":
            return "12V";
        case @"AS\gehaeuse":
            return "0603";
        case @"KL\tol_plus":
            return "+20%";
        case @"KL\tol_minus":
            return "-20%";
        default:
            return m.Groups[1].Value;
    }
});
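If the set of sections is data-driven rather than fixed, the switch can be swapped for a dictionary lookup. This is a sketch: the hard-coded table is a stand-in for wherever your real values come from, and unlike the default branch above it leaves unknown sections completely untouched, delimiters included.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionReplacer
{
    // Hypothetical lookup table; in practice this would come from your data source.
    private static readonly Dictionary<string, string> Values = new Dictionary<string, string>
    {
        [@"KL\U"] = "12V",
        [@"AS\gehaeuse"] = "0603",
        [@"KL\tol_plus"] = "+20%",
        [@"KL\tol_minus"] = "-20%",
    };

    // Replaces every $...# section whose inner text is a known key;
    // unknown sections are returned exactly as they were matched.
    public static string Replace(string input) =>
        Regex.Replace(input, @"\$([^#]+)#",
            m => Values.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);
}
```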
As far as matching the pattern, you're wanting:
\$[^#]+#
The rest of your question isn't very clear. If you need to replace the original string with some meaningful values, just loop through your matches:
var str = @"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
foreach (Match match in Regex.Matches(str, @"\$[^#]+#"))
{
    str = str.Replace(match.ToString(), "something meaningful");
}
Beyond that, you'll have to provide more context.
Are you sure you don't want to do just plain string manipulation?
var str = @"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
string ReturnManipulatedString(string input)
{
    var list = input.Split('$');
    string newValues = list[0]; // any text before the first $
    for (int i = 1; i < list.Length; i++)
    {
        var temp = list[i].Split('#');
        newValues += ManipulateStuff(temp[0]); // ManipulateStuff: your own lookup for the section text
        if (temp.Length > 1)
            newValues += temp[1]; // text after the #, e.g. ", "
    }
    return newValues;
}

replacing characters in a single field of a comma-separated list

I have a string in my C# code:
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f", i.e. a ',' that is inside double quotes should be replaced by a space.
I tried using string.Split but it is not working for me.
OK, I can't be bothered to think of a regex approach, so I am going to offer an old-fashioned loop approach which will work:
string DoReplace(string input)
{
    bool isInner = false; // flag to detect if we are in the inner string or not
    string result = ""; // result to return
    foreach (char c in input) // loop each character in the input string
    {
        if (isInner && c == ',') // if we are in an inner string and it is a comma, append space
            result += " ";
        else // otherwise append the character
            result += c;
        if (c == '"') // if we have hit an inner quote, toggle the flag
            isInner = !isInner;
    }
    return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern: it matches any instance of a quoted "X,X" sequence, where X is a single word character (letter, digit or underscore), and replaces it with the very same sequence, with a space between the two characters instead of a comma.
You could easily extend this if you needed it to match more than one character, etc.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This works like the first one, but removes any comma between word characters inside quotes, one per pass, and repeats until no such comma remains.
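A regex alternative that avoids the while loop is to match each whole quoted field and fix its commas in a MatchEvaluator, so any number of commas is handled in one pass. This is a sketch, assuming CSV-style quoting where a literal quote inside a field is written as two quotes (""); the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuotedFieldFixer
{
    // Raw pattern: "(?:[^"]|"")*"  i.e. a double quote, then any mix of
    // non-quote characters or doubled ("") escaped quotes, then a closing quote.
    private static readonly Regex QuotedField = new Regex("\"(?:[^\"]|\"\")*\"");

    // Replaces commas with spaces, but only inside quoted fields.
    public static string SpaceCommasInQuotes(string line) =>
        QuotedField.Replace(line, m => m.Value.Replace(',', ' '));
}
```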
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("", str3);
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for (int i = 1; i < pieces.Length; i += 2)
{
    pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h

Split function in c#

I have a program which accepts a URL (example: care.org), gets the page source of the URL and does some calculation.
string text = <the page source of care.org>
string separator = "car";
var cnt = text.ToLower().Split(separator,StringSplitOptions.None);
My aim is to count the number of occurrences of "car" in the page source.
My code treats care as 'car' + 'e' and splits it that way, but I want it to treat the whole separator as one word when splitting.
Please help me with this
You should use regular expressions instead of the Split() method:
Regex regex = new Regex(@"\bcar\b"); // note: \b also finds car inside car's or car-pool; adjust if that matters
Match match = regex.Match(text);
int cnt = 0;
while (match.Success)
{
    cnt++;
    match = match.NextMatch();
}
// here you get the count of `car` in `cnt`
This is how you can achieve what you want by using regular expressions:
string text = "the page source of care.org";
string separator = @"\bcar\b";
MatchCollection resultsarray = Regex.Matches(text, separator);
Now resultsarray contains your matches. You can count them using
resultsarray.Count
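Putting the two regex answers together, a small sketch of a whole-word, case-insensitive counter. RegexOptions.IgnoreCase stands in for the question's ToLower() call, Regex.Escape guards against special characters in the word, and the class and method names are hypothetical. It assumes the word itself starts and ends with word characters, since \b behaves differently next to punctuation.

```csharp
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts whole-word, case-insensitive occurrences of word in text.
    // \b is a word boundary, so "care" and "cars" do not count, while "car." and "CAR," do.
    public static int CountWord(string text, string word) =>
        Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase).Count;
}
```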
Split returns a string array, so you could just count the pieces; the number of occurrences is one less than the number of pieces.
var cnt = text.ToLower().Split(new[] { separator }, StringSplitOptions.None).Length - 1;
I don't think you need to split, since you are not going to do anything with the substrings. You only want a count, so look into using Regex.Matches(text, "car[^a-zA-Z0-9]") or similar to define the patterns you are interested in. Good luck!

Regex: only replace non-nested matches

Given text such as:
This is my [position].
Here are some items:
[items]
[item]
Position within the item: [position]
[/item]
[/items]
Once again, my [position].
I need to match the first and last [position], but not the [position] within [items]...[/items]. Is this doable with a regular expression? So far, all I have is:
Regex.Replace(input, @"\[position\]", "replacement value")
But that is replacing more than I want.
As Wug mentioned, regular expressions aren't great at counting. An easier option is to find the locations of all the tokens you're looking for, then iterate over them and construct your output accordingly. Perhaps something like this:
public string Replace(string input, string replacement)
{
    // find all the tags
    var regex = new Regex(@"(\[(?:position|/?item)\])");
    var matches = regex.Matches(input);
    // loop through the tags and build up the output string
    var builder = new StringBuilder();
    int lastIndex = 0;
    int nestingLevel = 0;
    foreach (Match match in matches)
    {
        // append everything since the last tag
        builder.Append(input.Substring(lastIndex, match.Index - lastIndex));
        switch (match.Value)
        {
            case "[item]":
                nestingLevel++;
                builder.Append(match.Value);
                break;
            case "[/item]":
                nestingLevel--;
                builder.Append(match.Value);
                break;
            case "[position]":
                // Append the replacement text if we're outside of any [item]/[/item] pairs
                // Otherwise append the tag
                builder.Append(nestingLevel == 0 ? replacement : match.Value);
                break;
        }
        lastIndex = match.Index + match.Length;
    }
    builder.Append(input.Substring(lastIndex));
    return builder.ToString();
}
(Disclaimer: Have not tested. Or even attempted to compile. Apologies in advance for inevitable bugs.)
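A shorter sketch of the same depth-counting idea lets Regex.Replace do the copying and keeps the nesting counter inside the MatchEvaluator. This relies on the evaluator being called once per match, left to right. Unlike the answer above, it also counts [items]/[/items] as nesting, which is an assumption about what the question wants; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TopLevelReplacer
{
    // Matches [item], [/item], [items], [/items] (group 1 is "/" for closing tags) or [position].
    private static readonly Regex Tags = new Regex(@"\[(/?)items?\]|\[position\]");

    public static string ReplaceTopLevel(string input, string replacement)
    {
        int depth = 0; // how many [item]/[items] blocks are currently open
        return Tags.Replace(input, m =>
        {
            if (m.Value == "[position]")
            {
                return depth == 0 ? replacement : m.Value;
            }
            depth += m.Groups[1].Value == "/" ? -1 : 1;
            return m.Value; // keep the tag itself unchanged
        });
    }
}
```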
You could maaaaaybe get away with:
Regex.Replace(input, @"(?=\[position\])(!(\[item\].+\[position\].+\[/item\]))", "replacement value");
I dunno, I hate ones like this. But this is a job for XML parsing, not regex. If your brackets are really just brackets, search and replace them with angle brackets, then parse it as XML.
What if you do it in two passes? Like:
s1 = Regex.Replace(input, @"(\[items\])(\w|\W)*(\[\/items\])", "")
This will give you:
This is my [position].
Here are some items:
Once again, my [position].
As you can see, the items section is removed. Then, on s1, you can replace your desired positions:
s2 = Regex.Replace(s1, @"\[position\]", "replacement_value")
This might not be the best solution. I tried very hard to solve it with a single regex but was not successful.
