C#: How to replace a shorter string than the matched string?

How can I replace only part of a matched regex string? I need to find some strings that are inside angle brackets (< >). In this example I need to match 23 characters and replace only 3 of them:
string input = "<tag abc=\"hello world\"> abc=\"whatever\"</tag>";
string output = Regex.Replace(input, ???, "def");
// wanted output: <tag def="hello world"> abc="whatever"</tag>
So I either need to find abc in <tag abc="hello world"> or find <tag abc="hello world"> and replace just abc. Do regular expressions or C# allow that? And even if I solve the problem differently, is it possible to match a big string but replace only a small part of it?

I'd have to look up the .NET regex dialect, but in general you want to capture the parts you don't want to replace and refer to them in your replacement string.
string output = Regex.Replace(input, "(<tag )abc(=\"hello world\">)", "$1def$2");
Another option would be to use lookaround to match "abc" only where it is preceded by "<tag " and followed by "="hello world"":
string output = Regex.Replace(input, "(?<=<tag )abc(?==\"hello world\")", "def");
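A minimal sketch of both answers as a self-contained helper. The class and method names (PartialReplaceDemo, WithGroups, WithLookaround) are hypothetical. It uses the braced ${1}/${2} substitution form, which .NET also accepts and which stays unambiguous if the replacement text ever starts with a digit:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the two approaches above.
public static class PartialReplaceDemo
{
    // Capture the text around "abc" and put it back with ${1} and ${2};
    // only the literal "abc" between them is swapped for "def".
    public static string WithGroups(string input) =>
        Regex.Replace(input, "(<tag )abc(=\"hello world\">)", "${1}def${2}");

    // Match only "abc". The lookbehind and lookahead check the context
    // without consuming it, so nothing else needs to be re-inserted.
    public static string WithLookaround(string input) =>
        Regex.Replace(input, "(?<=<tag )abc(?==\"hello world\")", "def");
}
```

Both calls leave the second abc="whatever" untouched, because it is not preceded by "<tag ".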

Instead of Regex.Replace, use Regex.Match. You can then use the properties of the Match object (Index and Length, or those of a Group) to figure out where the match occurred, and use the regular string functions (String.Substring) to replace only the part you want replaced.
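A minimal sketch of the Match-then-Substring approach. It assumes the part to replace is captured by the first group of the pattern. ReplaceFirstGroup is a hypothetical helper name, not a framework API:

```csharp
using System.Text.RegularExpressions;

public static class MatchThenSubstring
{
    // Hypothetical helper: replaces the text captured by group 1 of `pattern`
    // with `replacement` and leaves the rest of `input` untouched.
    // Returns `input` unchanged when there is no match.
    public static string ReplaceFirstGroup(string input, string pattern, string replacement)
    {
        Match match = Regex.Match(input, pattern);
        if (!match.Success || !match.Groups[1].Success)
            return input;

        Group group = match.Groups[1];
        // Group.Index and Group.Length are positions in the original input string.
        return input.Substring(0, group.Index)
             + replacement
             + input.Substring(group.Index + group.Length);
    }
}
```

For example, ReplaceFirstGroup(input, "<tag (abc)=", "def") matches the whole "<tag abc=" prefix but rewrites only the three captured characters.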

Working sample with named groups:
string input = @"<tag abc=""hello world""> abc=whatever</tag>";
Regex regex = new Regex(@"<(?<Tag>\w+)\s+(?<Attr>\w+)=.*?>.*?</\k<Tag>>");
string output = regex.Replace(input, match =>
{
    var attr = match.Groups["Attr"];
    var value = match.Value;
    // attr.Index is relative to the whole input; convert it to an offset within the match
    int start = attr.Index - match.Index;
    var left = value.Substring(0, start);
    var right = value.Substring(start + attr.Length);
    return left + attr.Value.Replace("abc", "def") + right;
});

Related

Text between 2 optional strings with OR condition using Regex

I have a string with 2 possibilities:
var desc = "Keyword1: That text I want \r\n Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4";
var desc = "Keyword1: That text I want Keyword2: Value2 \r\n Keyword3: Value3 \r\n Keyword4: Value4";
The keywords that follow the text "That text I want" (Keyword2, Keyword3, Keyword4) can appear in any order, and all of them are optional.
I tried the regex Keyword1:(\s+)(.*)(\W+?)(\r\n?)(?=Keyword2:|Keyword3:|Keyword4:)
It does not work, and I'm not sure what is wrong with my regex.
Any help is highly appreciated.
Thanks in advance!
In your case you could simply use (regex between two strings):
(?<=Keyword1:)(.*)(?=Keyword2)
Hope it helps.
Assuming those \r\n are actual special characters in the string and not the literals, this should work:
Keyword1: (.*?)(Keyword2:|Keyword3:|Keyword4:|\r\n)
You need the first capture group from the match, i.e. match.Groups[1] (Groups[0] is the whole match).
This regex matches Keyword1:, followed by the fewest characters necessary, followed by one of Keyword2:, Keyword3:, Keyword4: or \r\n (the actual special characters). If those are literals in your input string, you will need to double those backslashes.
Group 1 contains your text in both cases.
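A quick sketch applying the regex above to both sample strings. It assumes \r\n are real CR/LF characters. KeywordExtractor and Extract are hypothetical names. Trim() removes the space left before the terminator:

```csharp
using System.Text.RegularExpressions;

public static class KeywordExtractor
{
    // Lazily capture everything after "Keyword1: " up to the first of
    // Keyword2:, Keyword3:, Keyword4: or an actual CR/LF pair.
    // In a verbatim string, \r\n reaches the regex engine as escapes for CR and LF.
    static readonly Regex Pattern =
        new Regex(@"Keyword1: (.*?)(Keyword2:|Keyword3:|Keyword4:|\r\n)");

    public static string? Extract(string desc)
    {
        Match match = Pattern.Match(desc);
        // Groups[1] is the first capture group; Groups[0] would be the whole match.
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```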
var pattern = keywordName + @":\s+(.+?)\r?\n";
var regex = new Regex(pattern);
var match = regex.Match(description);
if (!match.Success) return null;
var firstMatch = match.Groups[1].Value;
//Find if there's another keyword in the extracted Value
var lstKeywords = Enum.GetValues(typeof(Keywords)).Cast<Keywords>().Where(k => k != keywordName);
//Add : to the last value so that it's recognized as a keyword
var sOtherKeywords = string.Join(":|", lstKeywords) + ":";
var pattern2 = @"(" + sOtherKeywords + @")(\s+)";
regex = new Regex(pattern2);
match = regex.Match(firstMatch);
//If there's no other keyword in the same line then return the expression that is extracted from the first regex
if (!match.Success) return firstMatch;
var secondMatch = match.Groups[1].Value;
var pattern3 = keywordName + @":\s+(.+)(\r?\n?)" + secondMatch;
regex = new Regex(pattern3);
match = regex.Match(description);
return match.Success ? match.Groups[1].Value.TrimEnd() : null;

Replacing a portion of a string with an exact match

I just want to replace a portion of a string only if it matches the given text.
My use case is as follows:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response", "response");
/*
* expecting the below text
<response><wd:response-data></wd:response-data></response>
*
*/
I followed the following answers:
Way to have String.Replace only hit "whole words"
Regular expression for exact match of a string
But I failed to achieve what I want.
Please share your thoughts/solutions.
Sample: https://dotnetfiddle.net/pMkO8Q
In general, you should really be parsing and manipulating XML as XML, using functions that know how XML works and what's legal in the language. Regex and other naive text manipulation will often lead you into trouble.
That said, for a very simple solution to this specific problem, you can do this with two replaces:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response>", "response>").Replace("wd:response ", "response ");
(Note the spaces at the end of the parameters to the second replace.)
Alternatively, use a regex similar to "wd:response\s*>".
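A sketch of the regex route that uses a lookahead instead of consuming the >, so only the tag name is replaced and any whitespace before > is preserved. It assumes the tags carry no attributes; a tag like <wd:response id="1"> would not match. TagRenamer is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class TagRenamer
{
    // Replace "wd:response" only when it is followed by optional whitespace and ">",
    // i.e. when it is the complete tag name. "wd:response-data" is skipped because
    // "-" is neither whitespace nor ">". The lookahead is not part of the match,
    // so the whitespace and ">" stay as they are.
    public static string Rename(string text) =>
        Regex.Replace(text, @"wd:response(?=\s*>)", "response");
}
```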
The easiest way to achieve your result, as per your .NET fiddle, is to use Replace as below:
string result = text.Replace("wd:response>", "response>");
But the proper way to achieve this is to parse the input as XML.
You can capture the string wd:response in a capturing group and replace it using Regex.Replace with a MatchEvaluator, like this.
Regex explanation - </?(wd:response)\s*>
Match < literally
Match / optionally, hence the ?
Match the string wd:response and place it in a capturing group enclosed with ()
Match zero or more whitespace characters with \s*
Match > literally
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main(string[] args)
    {
        string text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
        string replacePattern = "response";
        string pattern = @"</?(wd:response)\s*>";
        string replacedPattern = Regex.Replace(text, pattern, match =>
        {
            // Extract the first capture group
            Group group = match.Groups[1];
            // group.Index is relative to the whole input, so subtract match.Index to get
            // its offset inside match.Value, then splice replacePattern in its place
            int start = group.Index - match.Index;
            return string.Format("{0}{1}{2}", match.Value.Substring(0, start), replacePattern, match.Value.Substring(start + group.Length));
        });
        Console.WriteLine(replacedPattern);
    }
}
Outputting:
<response><wd:response-data></wd:response-data></response >

Regex - extract rest of string after specific sequence

I have a long string with random letters, numbers, and spaces.
I need a regex to pull out the part of the string that comes after the character sequence AQ102.
For example:
string t = "kjdsjsk158dfdd 125.196.168.210helloAQ102Lab101 section2";
desired output:
Lab101 section2
Why not use
string s = t.Split("AQ102").Last();
Or, a regular expression as originally asked for:
Regex regEx = new Regex(@".*(AQ102.*)");
OR
Regex regEx = new Regex(@".*(AQ102)(.*)");
And you can get the match by doing the following:
Match match = regEx.Match(t);
And you can get the captured text by referencing the group index. With the first pattern, Groups[1] still includes AQ102; with the second pattern, Groups[2] is just the text after it:
match.Groups[2].Value
OR, if you're really confident there is a match:
string val = regEx.Match(t).Groups[2].Value;
Don't need Regex for this. A simple split should suffice:
string output = input.Split(new string[] { "AQ102" }, StringSplitOptions.None)[1];
Depending on how sure you are of your input, you may want to check that AQ102 exists first, or even count how many times it occurs; it depends on your scenario.
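A sketch of the "check that AQ102 exists first" idea using IndexOf and Substring. TextAfter is a hypothetical helper. Unlike Split(...)[1], it returns everything after the first AQ102 even if the marker occurs again later in the string:

```csharp
using System;

public static class AfterMarker
{
    // Hypothetical helper: returns the text after the first occurrence of
    // `marker`, or null if the marker does not occur in `input`.
    public static string? TextAfter(string input, string marker)
    {
        // Ordinal comparison: plain character-by-character search, no culture rules.
        int index = input.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
            return null;
        return input.Substring(index + marker.Length);
    }
}
```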

Regex to match only numbers, no apostrophes

I want to match only the digits in the following string:
String : "40’000"
Match : "40000"
Basically, I'm trying to ignore the apostrophe.
I am using C#, in case it matters.
I can't use any other C# methods; I need to use only Regex.
Replace like this; it replaces every character except digits:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d]", "");
Since you said you just want to pick up numbers only, how about doing it without regex?
var s = "40’000";
var result = new string(s.Where(char.IsDigit).ToArray());
Console.WriteLine(result); // 40000
I suggest using a regex to find the special characters rather than the digits, and then replacing them with an empty string.
So a simple (?=\S)\D should be enough; the (?=\S) lookahead skips whitespace, so a space at the end of the number is not removed.
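A sketch showing what (?=\S)\D does when used with Regex.Replace in .NET. The class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class DigitsOnly
{
    // (?=\S) requires the next character to be non-whitespace, and \D then
    // consumes it only if it is also a non-digit. So apostrophes are removed,
    // but spaces are left in place.
    public static string StripNonDigitsKeepSpaces(string input) =>
        Regex.Replace(input, @"(?=\S)\D", "");
}
```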
Replace like this; it replaces every character except digits and decimal points:
string input = "40’000";
string result = Regex.Replace(input, @"[^\d.]", "");
Don't complicate your life; use Regex.Replace:
string s = "40'000";
string replaced = Regex.Replace(s, @"\D", "");
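A sketch comparing the two Regex.Replace answers. The names are hypothetical. Note that in .NET, \d matches any Unicode decimal digit (not only 0-9) unless RegexOptions.ECMAScript is used:

```csharp
using System.Text.RegularExpressions;

public static class NumberCleanup
{
    // \D matches every non-digit character, including the typographic
    // apostrophe ’ (U+2019) and the plain apostrophe '.
    public static string DigitsOnly(string input) =>
        Regex.Replace(input, @"\D", "");

    // [^\d.] keeps digits and the decimal point. A second ^ inside the class,
    // as in [^\d^.], would be a literal caret and keep '^' characters too.
    public static string DigitsAndPoint(string input) =>
        Regex.Replace(input, @"[^\d.]", "");
}
```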

Regular Expression For JSON

I have a string:
xyz":abc,"lmn
I want to extract abc. What would the regular expression for this be?
I am trying this:
/xyz\":(.*?),\"lmn/
But it does not return any result.
In C# you could use:
var regex = new Regex(@"(?<=xyz\"":).*?(?=,\""lmn)");
var value = regex.Match(@"xyz"":abc,""lmn").Value;
Note that this uses the C# verbatim string prefix @, which means that \ is not treated as an escape character. You need to use a double "" so that a single " is included in the string.
This regex uses lookbehind and lookahead assertions, so the match itself is exactly the text you want and you don't have to select a specific group from the result.
Alternatively, you can use a capturing group:
var regex = new Regex(@"xyz\"":(.*?),\""lmn");
var value = regex.Match(@"xyz"":abc,""lmn").Groups[1].Value;
You can check for the existence of a match by doing the following:
var match = regex.Match(@"xyz"":abc,""lmn");
var isMatch = match.Success;
and then follow up with either match.Value or match.Groups[1].Value depending on which regex you used.
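A sketch that wraps both patterns with a Success check, so a missing match yields null instead of an empty string. It uses the non-verbatim forms from the edit that follows. BetweenMarkers and its methods are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class BetweenMarkers
{
    // Lookaround version: the match itself is the wanted text.
    static readonly Regex Lookaround = new Regex("(?<=xyz\":).*?(?=,\"lmn)");

    // Capturing-group version: the wanted text is in Groups[1].
    static readonly Regex Grouped = new Regex("xyz\":(.*?),\"lmn");

    public static string? ViaLookaround(string s)
    {
        Match m = Lookaround.Match(s);
        return m.Success ? m.Value : null;
    }

    public static string? ViaGroup(string s)
    {
        Match m = Grouped.Match(s);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```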
EDIT
Actually, escaping the " is not needed in a C# regex, so you could use either of the following instead:
var regex = new Regex("(?<=xyz\":).*?(?=,\"lmn)");
var regex = new Regex("xyz\":(.*?),\"lmn");
These two do not use the verbatim string prefix, so each \" is translated into just " in the regex, giving a regex of (?<=xyz":).*?(?=,"lmn) or xyz":(.*?),"lmn
Additionally, if this is an entire-string match rather than a substring match, you would want one of the following:
var regex = new Regex("(?<=^xyz\":).*?(?=,\"lmn$)");
var regex = new Regex("^xyz\":(.*?),\"lmn$");
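A sketch of the anchored variant showing that it rejects input with extra text around it. WholeStringMatch is a hypothetical name. In .NET, $ also matches just before a final newline, so use \z instead if that matters:

```csharp
using System.Text.RegularExpressions;

public static class WholeStringMatch
{
    // ^ and $ anchor the pattern, so the input must consist of exactly
    // xyz": ... ,"lmn with nothing before or after it.
    static readonly Regex Anchored = new Regex("^xyz\":(.*?),\"lmn$");

    public static string? Extract(string s)
    {
        Match m = Anchored.Match(s);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```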
