C# Regex Replace similar words

I need to replace operands named [WORD, WORD1, WORD2, ..., WORDnnn] in an expression like:
WORD-WORD1+WORD11
with operands named:
[WORD_NEW, WORD1_NEW, WORD2_NEW, ..., WORDnnn_NEW]
Some of the operands are not mapped, and those should not be replaced:
WORD-WORD1+WORD11 => WORD_NEW-WORD1_NEW+WORD11_NEW if all three operands are mapped.
WORD-WORD1+WORD11 => WORD_NEW-WORD1_NEW+WORD11 if WORD11 is not mapped.

Since you already have a map (presumably a Dictionary<string,string>), call the Replace overload that takes a delegate and check whether a mapping exists for each match:
var mapping = new Dictionary<string,string>{{"WORD1", "WORD_NEW1"}};
// Verbatim string so \d reaches the regex engine; \d* (rather than \d+) lets a bare "WORD" be looked up too
var result = Regex.Replace("WORD-WORD1+WORD11", @"WORD\d*",
    match => mapping.ContainsKey(match.Value) ? mapping[match.Value] : match.Value);
// result is "WORD-WORD_NEW1+WORD11"
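A slightly more defensive variant of the same idea, as a minimal sketch: it assumes the operands are delimited by non-word characters (operators, spaces), so word boundaries can be used, and it looks each match up once with TryGetValue. The class and method names (OperandRenamer, Rename) are hypothetical, not from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OperandRenamer
{
    // Hypothetical helper: renames only the operands that appear in the map.
    public static string Rename(string expression, IReadOnlyDictionary<string, string> map)
    {
        // \b...\b keeps each match aligned to a whole operand;
        // \d* also matches the bare operand "WORD".
        return Regex.Replace(expression, @"\bWORD\d*\b",
            m => map.TryGetValue(m.Value, out var renamed) ? renamed : m.Value);
    }
}
```

With a map containing WORD and WORD1 but not WORD11, Rename("WORD-WORD1+WORD11", map) leaves WORD11 untouched.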

This should work as long as every operand is mapped (the pattern appends _NEW to every match).
Regexp:
(WORD\d*)
Replace with:
$1_NEW
C# Code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(WORD\d*)";
string substitution = @"$1_NEW";
string input = @"WORD, WORD1, WORD2";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.WriteLine(result);
}
}
OUTPUT:
WORD_NEW, WORD1_NEW, WORD2_NEW
See: https://regex101.com/r/2uTCjD/1
Test it: http://ideone.com/01Yxng

Related

Read a string with a particular format

I have some strings like
string text =
"these all are strings, 000_00_0 and more strings are there I have here 649_17_8, and more with this format 975_63_7."
So here I want to read only 000_00_0, 649_17_8, 975_63_7, ... that is, every substring with this format.
Please help me with this.
You can use Regex class and its Matches method.
var matches = System.Text.RegularExpressions.Regex.Matches(input, "([_0-9]+)");
var numberishs1 = matches
.Select(m => m.Groups[1].ToString())
.ToList();
var s1 = string.Join(", ", numberishs1);
// Output on .NETCoreApp,Version=v3.0 (the input, then s1):
// these all are strings, 000_00_0 and more strings are there I have here 649_17_8, and more with this format 975_63_7.
// 000_00_0, 649_17_8, 975_63_7
You have to use a regular expression.
In your case you want to find the pattern "..._.._.", where each dot matches any character.
If you only want digits, the pattern would be "([0-9]{3}_[0-9]{2}_[0-9]{1})".
Try this:
using System;
using System.Text.RegularExpressions;
namespace SyntaxTest
{
class Program
{
static void Main(string[] args)
{
var input =
"these all are strings, 000_00_0 and more strings are there I have here 649_17_8, and more with this format 975_63_7.";
var pattern = "..._.._.";
Match result = Regex.Match(input, pattern);
if (result.Success)
{
while (result.Success)
{
Console.WriteLine("Match: {0}", result.Value);
result = result.NextMatch();
}
Console.ReadLine();
}
}
}
}
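For comparison, a minimal sketch that uses the digits-only pattern mentioned above together with Regex.Matches and returns the results as a list. The helper name (CodeExtractor.Extract) is hypothetical, and the word boundaries are an added assumption that each code is surrounded by spaces or punctuation.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeExtractor
{
    // Hypothetical helper: collects every NNN_NN_N token found in the text.
    public static List<string> Extract(string text)
    {
        var codes = new List<string>();
        // Three digits, underscore, two digits, underscore, one digit.
        foreach (Match m in Regex.Matches(text, @"\b[0-9]{3}_[0-9]{2}_[0-9]\b"))
        {
            codes.Add(m.Value);
        }
        return codes;
    }
}
```

Unlike "..._.._.", this pattern will not pick up non-numeric text that merely happens to have underscores in the same positions.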

Get Removed characters from string

I am using Regex to remove unwanted characters from string like below:
str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");
How can I retrieve distinct characters which will be removed in efficient way?
EDIT:
Sample input : str = "This☺ contains Åüsome æspecialæ characters"
Sample output : str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"
string pattern = @"[^\u0020-\u007E]"; // negated class: matches exactly the characters that Replace removes
Regex rgx = new Regex(pattern);
List<string> matches = new List<string> ();
foreach (Match match in rgx.Matches(str))
{
if (!matches.Contains (match.Value))
{
matches.Add (match.Value);
}
}
Here is an example how you can do it with a callback method inside the Regex.Replace overload with an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
public static List<string> characters = new List<string>();
public static void Main()
{
var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);//""
Console.WriteLine(str); // => My string 123
Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
}
public static string Repl(Match m)
{
characters.Add(m.Value);
return string.Empty;
}
}
See IDEONE demo
In short, declare a "global" variable (here, a list of strings named characters) and initialize it. Add the Repl method to handle the replacement; each time Regex.Replace calls it, the matched value is added to the characters list.
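Since the question asks for distinct characters, the same evaluator approach can be combined with a HashSet so that repeats are recorded only once. This is a minimal sketch; the names (Sanitizer, Clean) are hypothetical, and it assumes the removed characters are single UTF-16 code units (a surrogate pair would be recorded as two entries).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Sanitizer
{
    // Hypothetical helper: returns the cleaned string and reports each
    // removed character once, in order of first appearance.
    public static string Clean(string input, out List<string> removed)
    {
        var seen = new HashSet<string>();
        var ordered = new List<string>();
        string cleaned = Regex.Replace(input, @"[^\u0020-\u007E]", m =>
        {
            // HashSet.Add returns false for a value that is already present.
            if (seen.Add(m.Value))
            {
                ordered.Add(m.Value);
            }
            return string.Empty;
        });
        removed = ordered;
        return cleaned;
    }
}
```

Keeping the state local to the method avoids the shared static list, so repeated calls do not accumulate characters from earlier inputs.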

C# Regex to remove appropriate part of text

I have two consecutive texts, each enclosed in "<>". The text is dynamic and has the following pattern:
"some text <text1> <text2> some text"
Based on the condition I need to remove either first or second text in "<>". I also need to remove any brackets.
Example:
"The company <is> <is not> a co-owner of other accounts in the Bank."
if true condition:
"The company is a co-owner of other accounts in the Bank."
if false condition:
"The company is not a co-owner of other accounts in the Bank."
I'd appreciate your help with regex pattern.
var isTrue = true;
var str = "The company <is> <is not> a co-owner of other accounts in the Bank.";
var segment = Regex.Match(str, @"<(.*?)>\W<(.*?)>");
var replacement = str.Replace(segment.Value, isTrue ? segment.Groups[1].Value : segment.Groups[2].Value);
I have to assume one prerequisite: "some text" doesn't contain a matching pair of "<>" brackets.
You can use this code to do your job:
var input = @"some text <truestring> <falsestring> some text";
bool someCondition = true; // placeholder: substitute your own condition
MatchEvaluator replacement = match => someCondition ? match.Groups["truestring"].Value : match.Groups["falsestring"].Value;
var regex = new Regex(@"(?:(?:(?<topen>\<)[^\<\>]*)+(?:(?<truestring-topen>\>)(?(topen)[^\<\>]*))+)+(?(topen)(?!))\s*(?:(?:(?<fopen>\<)[^\<\>]*)+(?:(?<falsestring-fopen>\>)(?(fopen)[^\<\>]*))+)+(?(fopen)(?!))");
var result = regex.Replace(input, replacement);
Replace someCondition with your condition. This regex uses balancing group definitions to find the matching bracket of truestring or falsestring, so truestring and falsestring can themselves contain matching brackets.
Try with this example:
bool condition = true;
string str = (condition) ? "$2" : "$5";
var text = "some text <trueString> <falseString> some text";
var result = Regex.Replace(text, "(<)([^<>]*)(>).*?(<)([^<>]*)(>)", str);
result will be: some text trueString some text
One more way:
public static string Substitute(string input, bool condition)
{
return Regex.Replace(input, @"<(?<true>.*?)>\s*<(?<false>.*?)>", m => condition ? m.Groups["true"].Value : m.Groups["false"].Value);
}
usage:
string input = "The company <is> <is not> a co-owner of other accounts in the Bank.";
string output = Substitute(input, false);
Use Regex
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input = "some text <trueString> <falseString> some text";
string pattern = @"(?'beginning'[^\<]+)\<(?'true'[^\>]+)\>\s*<(?'false'[^\>]+)\>(?'ending'[^$]*)";
Match match = Regex.Match(input, pattern);
bool condition = true;
Regex expr = new Regex(pattern);
string output = "";
if (condition)
{
output = expr.Replace(input, "${beginning}${true}${ending}");
}
else
{
output = expr.Replace(input, "${beginning}${false}${ending}");
}
}
}
}

Why my \p{L} returns underscore?

I have the following code to parse by Regex:
const string patern = @"^(\p{L}+)_";
var rgx = new Regex(patern);
var str1 = "library_log_12312_12.log";
var m = rgx.Matches(str1);
It returns only one match and it is "library_". I have read a lot of resources and it should not contain underscore, should it?
Your pattern includes the _, so the match does too. If you only want the group, you need to specify that. It'll be in group 1 (as group 0 is always the whole match):
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
var regex = new Regex(@"^(\p{L}+)_");
var input = "library_log_12312_12.log";
var matches = regex.Matches(input);
var match = matches[0];
Console.WriteLine(match.Groups[0]); // library_
Console.WriteLine(match.Groups[1]); // library
}
}
Your regex ends with _ so basically, it matches on one or more Unicode letters, followed by an underscore (which is not a Unicode letter).
The captured group will not contain the _.
Works as expected.
It should contain the underscore as it is in your regular expression.
If you only want to have library as the result, you need to access the first sub-group in the result:
var m = rgx.Matches(str1).Cast<Match>().Select(x => x.Groups[1].Value);
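An alternative to reading group 1 is to keep the underscore out of the match altogether with a lookahead, so that the whole match is just the letters. A minimal sketch, with hypothetical names (PrefixReader, LeadingLetters):

```csharp
using System.Text.RegularExpressions;

public static class PrefixReader
{
    // Hypothetical helper: returns the leading run of letters that is
    // immediately followed by an underscore, or an empty string.
    public static string LeadingLetters(string fileName)
    {
        // (?=_) asserts that an underscore follows without consuming it.
        Match m = Regex.Match(fileName, @"^\p{L}+(?=_)");
        return m.Success ? m.Value : string.Empty;
    }
}
```

For "library_log_12312_12.log" the match value is then "library", with no group lookup needed.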

C# Regex Validation Rule using Regex.Match()

I've written a Regular expression which should validate a string using the following rules:
The first four characters must be alphanumeric.
The alpha characters are followed by 6 or 7 numeric values for a total length of 10 or 11.
So the string should look like this if its valid:
CCCCNNNNNN or CCCCNNNNNNN
C being any character and N being a number.
My expression is written: @"^[0-9A-Za-z]{3}[0-9A-Za-z-]\d{0,21}$";
My regex match code looks like this:
var cc1 = "FOOBAR"; // should fail.
var cc2 = "AAAA1111111111"; // should succeed
var regex = @"^[0-9A-Za-z]{3}[0-9A-Za-z-]\d{0,21}$";
Match match = Regex.Match( cc1, regex, RegexOptions.IgnoreCase );
if ( cc1 != string.Empty && match.Success )
{
//"The Number must start with 4 letters and contain no numbers.",
Error = SeverityType.Error
}
I'm hoping someone can take a look at my expression and offer some feedback on how to improve it so that it produces a valid match.
Also, am I using .Match() correctly? If Match.Success is true, does that mean that the string is valid?
The regex for 4 alphanumeric characters followed by 6 to 7 decimal digits is (note that \w also accepts the underscore):
var regex = @"^\w{4}\d{6,7}$";
See: Regular Expression Language - Quick Reference
The Regex.Match Method returns a Match object. The Success Property indicates whether the match is successful or not.
var match = Regex.Match(input, regex, RegexOptions.IgnoreCase);
if (!match.Success)
{
// does not match
}
The following code demonstrates the regex usage:
var cc1 = "FOOBAR"; // should fail.
var cc2 = "AAAA1111111"; // should succeed
var r = new Regex(@"^[0-9a-zA-Z]{4}\d{6,7}$");
if (!r.IsMatch(cc2))
{
Console.WriteLine("cc2 doesn't match");
}
if (!r.IsMatch(cc1))
{
Console.WriteLine("cc1 doesn't match");
}
The output will be cc1 doesn't match.
The following code is using a regular expression and checks 4 different patterns to test it, see output below:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var p1 = "aaaa999999";
CheckMatch(p1);
p1 = "aaaa9999999";
CheckMatch(p1);
p1 = "aaaa99999999";
CheckMatch(p1);
p1 = "aaa999999";
CheckMatch(p1);
}
public static void CheckMatch(string p1)
{
var reg = new Regex(@"^\w{4}\d{6,7}$");
if (!reg.IsMatch(p1))
{
Console.WriteLine($"{p1} doesn't match");
}
else
{
Console.WriteLine($"{p1} matches");
}
}
}
Output:
aaaa999999 matches
aaaa9999999 matches
aaaa99999999 doesn't match
aaa999999 doesn't match
Try it as DotNetFiddle
Your conditions give:
The first four characters must be alphanumeric: [A-Za-z\d]{4}
Followed by 6 or 7 numeric values: \d{6,7}
Put it together and anchor it:
^[A-Za-z\d]{4}\d{6,7}\z
Although that depends a bit on how you define "alphanumeric". Also, if you are using the ignore-case flag, you can remove the A-Z range from the expression.
Try the following pattern:
@"^[A-Za-z\d]{4}\d{6,7}$"
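The answers above can be combined into a small validation helper. This is a minimal sketch under two assumptions: "alphanumeric" means ASCII letters and digits only, and a trailing newline should be rejected (hence \z rather than $). The names (CodeValidator, IsValid) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CodeValidator
{
    // Four ASCII alphanumerics followed by six or seven ASCII digits.
    // [0-9] is used instead of \d, which in .NET also matches non-ASCII
    // Unicode digits unless RegexOptions.ECMAScript is set.
    private static readonly Regex Pattern =
        new Regex(@"^[A-Za-z0-9]{4}[0-9]{6,7}\z");

    // Hypothetical helper: true only when the whole string fits the rule.
    public static bool IsValid(string value)
    {
        return value != null && Pattern.IsMatch(value);
    }
}
```

In the original validation code, the error would then be raised when IsValid returns false, not when the match succeeds.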
