Get removed characters from a string - C#

I am using Regex to remove unwanted characters from a string, like this:
str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");
How can I efficiently retrieve the distinct characters that will be removed?
EDIT:
Sample input: str = "This☺ contains Åüsome æspecialæ characters"
Sample output: str = "This contains some special characters"
removedchar = "☺,Å,ü,æ"

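// Same class as in the question: the characters that Regex.Replace removes.
// (Without the ^ the class would match the characters that are kept.)
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string>();
foreach (Match match in rgx.Matches(str))
{
    // record each removed character only once
    if (!matches.Contains(match.Value))
    {
        matches.Add(match.Value);
    }
}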
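The loop above calls List.Contains for every match. A shorter sketch of the same idea, assuming LINQ is available (the class and method names are illustrative, not from the answer), lets Enumerable.Distinct() do the de-duplication and then joins the result the way the question's removedchar sample does:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RemovedChars
{
    // Same class as in the question: every character outside printable ASCII (U+0020..U+007E).
    public const string Unwanted = @"[^\u0020-\u007E]";

    // The distinct characters that Regex.Replace(str, Unwanted, "") would remove.
    public static string[] DistinctRemoved(string str) =>
        Regex.Matches(str, Unwanted)
             .Cast<Match>()
             .Select(m => m.Value)
             .Distinct()
             .ToArray();

    public static void Main()
    {
        // The question's sample input, written with \u escapes so the source file encoding does not matter.
        string str = "This\u263A contains \u00C5\u00FCsome \u00E6special\u00E6 characters";
        string removedchar = string.Join(",", DistinctRemoved(str));
        str = Regex.Replace(str, Unwanted, "");
        Console.WriteLine(str); // This contains some special characters
        Console.WriteLine(removedchar);
    }
}
```

This scans the string twice (once for Matches, once for Replace). The documentation for Distinct() describes its result as an unordered sequence, so sort the array if the order of removedchar matters to you.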

Here is an example of how you can do it with a callback method, using the Regex.Replace overload that takes an evaluator:
evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.
C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
    public static List<string> characters = new List<string>();
    public static void Main()
    {
        var str = Regex.Replace("§My string 123”˝", "[^\u0020-\u007E]", Repl);
        Console.WriteLine(str); // => My string 123
        Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
    }
    // MatchEvaluator: records each removed character and replaces it with nothing
    public static string Repl(Match m)
    {
        characters.Add(m.Value);
        return string.Empty;
    }
}
See the IDEONE demo.
In short, declare a "global" variable (here, a list of strings named characters) and initialize it. Then add the Repl method to handle the replacement: every time Regex.Replace calls it for a match, Repl adds the matched value to the characters list and returns an empty string, which removes the character from the result.
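Note that characters collects every removed character, duplicates included, and because it is a static field it keeps growing across calls. Since the question asks for the distinct characters, here is a hedged variant (not part of the original answer; the names StripAndCollect and Strip are made up for illustration) that passes a lambda as the evaluator and uses a local HashSet<string> so each character is recorded once, in first-seen order, in a single pass:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StripAndCollect
{
    // Removes every character outside printable ASCII and returns, via 'removed',
    // the distinct removed characters in the order they first appeared.
    public static string Strip(string input, out List<string> removed)
    {
        var seen = new HashSet<string>();
        var order = new List<string>();
        string result = Regex.Replace(input, @"[^\u0020-\u007E]", m =>
        {
            // HashSet.Add returns false for a value it already holds, so duplicates are skipped.
            if (seen.Add(m.Value))
                order.Add(m.Value);
            return string.Empty; // the evaluator's return value replaces the match
        });
        removed = order;
        return result;
    }

    public static void Main()
    {
        string cleaned = Strip("\u00A7My string 123\u201D\u02DD", out var removed);
        Console.WriteLine(cleaned); // My string 123
        Console.WriteLine(string.Join(", ", removed));
    }
}
```

Because the state lives in locals captured by the lambda, repeated calls do not share or accumulate results.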

Related

Struggling to match exact regex to my strings

I have these strings
string1 = CD.TR.DRC/TF8
string2 = CD.TR.DRC/TF8/A8
string3 = CD.TR.DRC/TF8.PB
string4 = DRC/TF8
string5 = DDRC/TF8
I am trying to match DRC/TF8 exactly, so only string1, string3 and string4 should return true. Could someone please suggest how I could obtain that using a regex?
I would say this will work:
\bDRC\/TF8(?=\.|$)
\b asserts a word boundary, so DRC cannot be preceded by a word character (this rejects DDRC/TF8)
(?=\.|$) is a positive lookahead which asserts that DRC/TF8 is followed by a . or the end of the string
See example: https://regexr.com/634a3
Detailed syntax for C# can be found in this post.
Based on your current examples you can use this pattern: (?<=\.|^)DRC\/TF8(?=\.|$)
Code:
using System;
using System.Text.RegularExpressions;
public class Test {
    public static void Main() {
        string pattern = @"(?<=\.|^)DRC\/TF8(?=\.|$)";
        Regex re = new Regex(pattern);
        string[] text = {"CD.TR.DRC/TF8", "CD.TR.DRC/TF8/A8", "CD.TR.DRC/TF8.PB", "DRC/TF8", "DDRC/TF8"};
        foreach (string str in text) {
            if (re.IsMatch(str)) {
                Console.WriteLine(str);
            }
        }
    }
}
Output:
CD.TR.DRC/TF8
CD.TR.DRC/TF8.PB
DRC/TF8
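Both answers use the same lookahead and differ only in how the left edge of DRC is checked. The sketch below (the class name DrcMatch and the extra test string X-DRC/TF8 are illustrative additions) runs the two patterns side by side. It drops the \/ escape, which .NET does not need because / is not a regex metacharacter. On the five sample strings the patterns agree; they diverge only when DRC is preceded by a non-word character other than ., such as -:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DrcMatch
{
    // First answer: word boundary on the left, "." or end of string on the right.
    public static readonly Regex WordBoundary = new Regex(@"\bDRC/TF8(?=\.|$)");
    // Second answer: "." or start of string on the left, same lookahead on the right.
    public static readonly Regex Lookbehind = new Regex(@"(?<=\.|^)DRC/TF8(?=\.|$)");

    public static void Main()
    {
        string[] text = { "CD.TR.DRC/TF8", "CD.TR.DRC/TF8/A8", "CD.TR.DRC/TF8.PB", "DRC/TF8", "DDRC/TF8", "X-DRC/TF8" };
        foreach (string s in text)
        {
            // "-" creates a word boundary but is not ".", so only the \b version accepts X-DRC/TF8.
            Console.WriteLine($"{s,-18} \\b: {WordBoundary.IsMatch(s),-5} lookbehind: {Lookbehind.IsMatch(s)}");
        }
    }
}
```

Pick the lookbehind form if . really is the only allowed separator; the \b form is more permissive.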

How to replace a string at a particular position with the regular expression match at that position

So, I have this basic program which matches a particular pattern in a string and then stores all the matches in an array. Then I append each matched element to a string.
Now my question is, how can I replace the matched pattern in the original string with the modified string that I generated based on my regex.
A sample working program can be found here: https://dotnetfiddle.net/UvgOVc
As you can see, the final string contains only the last modified string. How can I replace each match with its corresponding replacement?
Code:
using System;
using System.Text.RegularExpressions;
using System.Linq;
public class Program
{
    public static void Main()
    {
        string url = "Please ab123456 this is and also bc789456 and also de456789 ";
        string[] queryString = getMatch(url, @"\w{2}\d{6}");
        string[] formatted = new string[10000];
        string finalurl = string.Empty;
        for (int i = 0; i < queryString.Length; i++)
        {
            formatted[i] = "replace " + queryString[i];
            Console.WriteLine(formatted[i] + "\n");
            finalurl = Regex.Replace(url, @"\w{2}\d{6}", formatted[i]);
        }
        Console.WriteLine(finalurl);
    }
    private static string[] getMatch(string text, string expr)
    {
        string matched = string.Empty;
        string[] matches = new string[100];
        var mc = Regex.Matches(text, expr);
        if (text != string.Empty && mc.Count > 0)
        {
            matches = mc.OfType<Match>()
                .Select(x => x.Value)
                .ToArray();
        }
        return matches;
    }
}
Output:
replace ab123456
replace bc789456
replace de456789
Please replace de456789 this is and also replace de456789 and also replace de456789
Your code can be reduced to:
string url = "Please ab123456 this is and also bc789456 and also de456789 ";
string[] queryString = Regex.Matches(url, @"\w{2}\d{6}").Cast<Match>().Select(x => x.Value).ToArray();
string finalurl = Regex.Replace(url, @"\w{2}\d{6}", "replace $&");
Console.WriteLine(finalurl); // => Please replace ab123456 this is and also replace bc789456 and also replace de456789
See the online C# demo.
Here, Regex.Matches(url, @"\w{2}\d{6}").Cast<Match>().Select(x => x.Value).ToArray() collects all matches into the queryString variable (in case you need the array of match values).
Regex.Replace(url, @"\w{2}\d{6}", "replace $&") finds every match of two word characters followed by six digits and prepends "replace " to it (note that $& is the substitution token for the whole match value).
If you plan to do more processing with the found matches, consider using a match evaluator. Say you have defined SomeMethod(string s) somewhere; then you can use:
string finalurl = Regex.Replace(url, @"\w{2}\d{6}", m => SomeMethod(m.Value));
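A runnable version of that evaluator idea is sketched below. SomeMethod here is a hypothetical stand-in (it upper-cases the match and prefixes "replace "); substitute whatever transformation you need. The lambda is converted to a MatchEvaluator and called once per match, so each match gets its own replacement instead of the last one overwriting all of them:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Hypothetical stand-in for SomeMethod: any string -> string transformation works here.
    public static string SomeMethod(string s) => "replace " + s.ToUpperInvariant();

    // The lambda's return value replaces the match it was called for.
    public static string Rewrite(string url) =>
        Regex.Replace(url, @"\w{2}\d{6}", m => SomeMethod(m.Value));

    public static void Main()
    {
        string url = "Please ab123456 this is and also bc789456 and also de456789 ";
        Console.WriteLine(Rewrite(url));
        // Please replace AB123456 this is and also replace BC789456 and also replace DE456789
    }
}
```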

C# Regex Replace similar words

I need to replace operands named [WORD, WORD1, WORD2, ..., WORDnnn] in an expression like:
WORD-WORD1+WORD11
with operands named as:
[WORD_NEW, WORD1_NEW, WORD2_NEW, WORDnnn_NEW]
Some of the operands are not mapped, and those should not be replaced.
WORD-WORD1+WORD11 => WORD_NEW-WORD1_NEW+WORD11_NEW
WORD-WORD1+WORD11 => WORD_NEW-WORD1_NEW+WORD11 if WORD11 is not mapped.
Since you already have a map (presumably in the form of a Dictionary<string,string>), just call the Replace overload that takes a delegate and check whether a mapping is present for each match:
var mapping = new Dictionary<string,string>{{"WORD1", "WORD_NEW1"}};
// verbatim string: "\d" is not a valid C# escape in a regular string literal;
// \d* (rather than \d+) also lets the bare operand WORD be looked up
var result = Regex.Replace("WORD-WORD1+WORD11", @"WORD\d*",
    match => mapping.ContainsKey(match.Value) ? mapping[match.Value] : match.Value);
// result is "WORD-WORD_NEW1+WORD11"
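A self-contained sketch of the same dictionary approach, with one assumption beyond the answer: the pattern is wrapped in \b so an operand cannot match inside a longer identifier (\d* lets the bare operand WORD be mapped as well). TryGetValue replaces the ContainsKey-then-index pair with a single lookup. OperandRenamer and Rename are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OperandRenamer
{
    // WORD followed by optional digits, as a whole word.
    static readonly Regex Operand = new Regex(@"\bWORD\d*\b");

    public static string Rename(string expression, IDictionary<string, string> mapping) =>
        Operand.Replace(expression, m =>
            // Unmapped operands are returned unchanged.
            mapping.TryGetValue(m.Value, out var renamed) ? renamed : m.Value);

    public static void Main()
    {
        // WORD11 is deliberately left unmapped.
        var mapping = new Dictionary<string, string>
        {
            ["WORD"] = "WORD_NEW",
            ["WORD1"] = "WORD1_NEW"
        };
        Console.WriteLine(Rename("WORD-WORD1+WORD11", mapping)); // WORD_NEW-WORD1_NEW+WORD11
    }
}
```

Because \d* is greedy and the match must end at a word boundary, WORD11 is looked up as a whole and is never split into WORD1 plus 1.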
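This should work when every operand is to be renamed (it appends _NEW to every match, so it does not skip unmapped operands):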
Regexp:
(WORD\d*)
Replace with:
$1_NEW
C# Code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(WORD\d*)";
        string substitution = @"$1_NEW";
        string input = @"WORD, WORD1, WORD2";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}
OUTPUT:
WORD_NEW, WORD1_NEW, WORD2_NEW
See: https://regex101.com/r/2uTCjD/1
Test it: http://ideone.com/01Yxng

Validate and pass only valid characters against regex expression in c#

I am working on a solution where I need to validate a string in C# and pass on only its valid characters.
For example, my regular expression is: "^\\S(|(.|\\n)*\\S)\\Z"
and the text I want to validate is:
127 Finchfield Lane
Now I know it's invalid. But how do I remove the characters that are invalid against the regex, and pass the string on only if it validates successfully against the regex?
If I understand you correctly, you are looking for Regex.IsMatch:
if (Regex.IsMatch(str, "^\\S(|(.|\\n)*\\S)\\Z"))
{
    // do something with the valid string
}
else
{
    // strip invalid characters from the string
}
using System;
using System.Text.RegularExpressions;
namespace PatternMatching
{
    class Program
    {
        static void Main()
        {
            string pattern = @"(\d+) (\w+)";
            string[] strings = { "123 ABC", "ABC 123", "CD 45678", "57998 DAC" };
            foreach (var s in strings)
            {
                Match result = Regex.Match(s, pattern);
                if (result.Success)
                {
                    Console.WriteLine("Match: {0}", result.Value);
                }
            }
            Console.ReadKey();
        }
    }
}
This seems to do what you require. Hope I haven't misunderstood.
To validate the string against a regex, you can use Regex.IsMatch:
Regex.IsMatch(input, pattern) // returns true if input matches pattern
If you only want the matched value, you can use Match:
Match match = new Regex(@"\d+").Match(str);
string value = match.Value; // only the first match; the unmatched text is not included
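Match returns only the first match, so for input with several invalid characters the snippet above would lose text. A sketch of a "strip, then validate" flow follows. It assumes valid characters are letters, digits and spaces; the question does not say which characters are valid, so adjust the Allowed class. It concatenates all matches and then checks the result with the question's pattern (KeepValid, Clean and IsValid are illustrative names):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepValid
{
    // Assumption: valid = ASCII letters, digits and spaces.
    const string Allowed = @"[A-Za-z0-9 ]+";
    // The validation pattern from the question.
    const string Validation = "^\\S(|(.|\\n)*\\S)\\Z";

    // Concatenates every run of allowed characters, i.e. drops everything else.
    public static string Clean(string input) =>
        string.Concat(Regex.Matches(input, Allowed).Cast<Match>().Select(m => m.Value));

    public static bool IsValid(string input) => Regex.IsMatch(input, Validation);

    public static void Main()
    {
        string cleaned = Clean("12#7 Finch$field Lane");
        Console.WriteLine(cleaned);          // 127 Finchfield Lane
        Console.WriteLine(IsValid(cleaned)); // True
    }
}
```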

Substitute only one group when dealing with an unknown number of capturing groups

Assuming I have this input:
/green/blah/agriculture/apple/blah/
I'm only trying to capture and replace the occurrence of apple (need to replace it with orange), so I have this regex
var regex = new Regex("^/(?:green|red){1}(?:/.*)+(apple){1}(?:/.*)");
So I'm grouping sections of the input, but as non-capturing, and only capturing the one I'm concerned with. According to this $` will retrieve everything before the match in the input string, and $' will get everything after, so theoretically the following should work:
"$`Orange$'"
But it only retrieves the match ("apple").
Is it possible to do this with just substitutions and NOT match evaluators and looping through groups?
The issue is that apple can occur anywhere in that url scheme, hence an unknown number of capture groups.
Thanks.
To achieve what you want, I slightly changed your regex (the current version is in the complete example below; the update at the end of the answer explains why it changed).
The idea is to turn the parts around apple into capturing groups, so that they can be used in the replacement:
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
Group 1 holds everything before apple and group 2 everything after it; Orange goes in the middle, where apple was.
A complete example looks like this:
using System;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        String text = "/green/blah/agriculture/apple/blah/hallo/apple";
        var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
        // $1 = everything before the first apple, $2 = everything after it
        String replacement = "$1Orange$2";
        string result = regex.Replace(text, replacement);
        Console.WriteLine(result); // => /green/blah/agriculture/Orange/blah/hallo/apple
    }
}
A running example is also available here.
Update: I needed to change the regex again to handle input like this:
/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
With the previous regex, the last apple was matched instead of the first one, as intended. I changed the regex to this:
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
I updated the code as well as the running example.
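Regarding the original question (substitutions only, no evaluator): because .NET supports variable-length lookbehind, the URL prefix can be asserted instead of captured, so the match is just apple and the replacement is the plain string Orange. This is a sketch, not the answerer's regex. The lookahead (?=/|$) is an added assumption that apple must be a whole path segment, and the count argument of the instance Replace method limits it to the first occurrence:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstApple
{
    // Lookbehind: the text before "apple" must be /green/ or /red/ followed by zero or more whole segments.
    // Lookahead: "apple" must be a whole segment (followed by "/" or the end of the string).
    static readonly Regex AppleSegment =
        new Regex(@"(?<=^/(?:green|red)/(?:[^/]+/)*)apple(?=/|$)");

    // count = 1: only the first (leftmost) match is replaced.
    public static string ReplaceFirst(string url) => AppleSegment.Replace(url, "Orange", 1);

    public static void Main()
    {
        string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
        Console.WriteLine(ReplaceFirst(text));
        // /green/blah/agriculture/Orange/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
    }
}
```

The count overloads exist only on Regex instances, not on the static Regex.Replace, which is why the pattern is held in a field here.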
If you really only want to replace the first occurrence of apple and don't care about the URL structure, you can use one of the following methods:
First, simply use apple as the regex and call the Replace overload that takes a maximum number of replacements:
using System;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        String text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
        var regex = new Regex(Regex.Escape("apple"));
        String replacement = "Orange";
        // the third argument limits the number of replacements to 1
        string result = regex.Replace(text, replacement, 1);
        Console.WriteLine(result);
    }
}
See the working example.
Second, use IndexOf and Substring, which may be faster than going through the regex classes.
See the following example:
using System;

class Program
{
    static void Main(string[] args)
    {
        string search = "apple";
        string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
        // ordinal: plain character-by-character search, no culture-specific rules
        int idx = text.IndexOf(search, StringComparison.Ordinal);
        if (idx != -1)
        {
            string first = text.Substring(0, idx);
            // may be empty if "apple" is at the very end of the string
            string second = text.Substring(idx + search.Length);
            string result = first + "Orange" + second;
            Console.WriteLine(result);
        }
    }
}
Working Example
