Regular expressions: How to remove all "R.G(*******)" from a string - c#

There are several strings, and I want to remove all "R.G(**)" from these strings. For example:
1. Original string:
Push("Command", string.Format(R.G("#{0} this is a string"), accID));
Result:
Push("Command", string.Format("#{0} this is a string", accID));
2. Original string:
Select(Case(T["AccDirect"]).WhenThen(1, R.G("input")).Else(R.G("output")).As("Direct"));
Result:
Select(Case(T["AccDirect"]).WhenThen(1, "input").Else("output").As("Direct"));
3. Original string:
R.G("this is a \"string\"")
Result:
"this is a \"string\""
4. Original string:
R.G("this is a (string)")
Result:
"this is a (string)"
5. Original string:
AppendLine(string.Format(R.G("[{0}] Error:"), str) + R.G("Contains one of these symbols: \\ / : ; * ? \" \' < > | & +"));
Result:
AppendLine(string.Format("[{0}] Error:", str) + "Contains one of these symbols: \\ / : ; * ? \" \' < > | & +");
6. Original string:
R.G(@"this is the ""1st"" string.
this is the (2nd) string.")
Result:
@"this is the ""1st"" string.
this is the (2nd) string."
Please Help.

Use this pattern. The whole match (group 0) is the text to replace, and capture group 1 is what you keep.
R[.]G[(]"(.*?[^\\])"[)]
Example applied to your strings #2 and #4, plus a new edge case, R.G("this is a (\"string\")"):
var pattern = @"R[.]G[(]\""(.*?[^\\])\""[)]";
var str = "Select(Case(T[\"AccDirect\"]).WhenThen(1, R.G(\"input\")).Else(R.G(\"output\")).As(\"Direct\"));";
var str2 = "R.G(\"this is a (string)\")";
var str3 = "R.G(\"this is a (\\\"string\\\")\")";
var res = Regex.Replace(str,pattern, "\"$1\"");
var res2 = Regex.Replace(str2,pattern, "\"$1\"");
var res3 = Regex.Replace(str3,pattern, "\"$1\"");

Try this:
var result = Regex.Replace(input, @"(.*)R\.G\(([^)]*)\)(.*)", "$1$2$3");
explanation:
(.*) # capture any characters
R\.G\( # then match 'R.G('
([^)]*) # then capture anything that isn't ')'
\) # match end parenthesis
(.*) # and capture any characters after
The $1$2$3 replaces your entire match with capture groups 1, 2, and 3, which effectively removes everything that isn't part of those groups, namely the "R.G(" and ")" wrapper.
Note that you will run into problems if your strings contain 'R.G' or a right parenthesis somewhere, but depending on your input data, maybe this will do the trick well enough.
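One way around the right-parenthesis problem is to match the string literal itself rather than "anything up to the next )". The following is only a sketch under an assumption: every R.G call takes exactly one C# string literal, either regular ("..." with backslash escapes) or verbatim (@"..." with doubled quotes). The helper name StripRG is hypothetical and does not come from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: unwraps R.G(<string literal>) and keeps the literal.
// Group 1 is either a verbatim literal  @"..."  (where "" is an escaped quote)
// or a regular literal  "..."  (where a backslash escapes the next character).
string StripRG(string source)
{
    const string pattern =
        @"R\.G\((@""(?:[^""]|"""")*""|""(?:[^""\\]|\\.)*"")\)";
    return Regex.Replace(source, pattern, "$1");
}

// Example 2 from the question: two calls on one line.
Console.WriteLine(StripRG(
    @"Select(Case(T[""AccDirect""]).WhenThen(1, R.G(""input"")).Else(R.G(""output"")).As(""Direct""));"));

// Edge case with escaped quotes and parentheses inside the literal.
Console.WriteLine(StripRG(@"R.G(""this is a (\""string\"")"")"));

// Example 6 from the question: a multi-line verbatim literal.
Console.WriteLine(StripRG(
    "R.G(@\"this is the \"\"1st\"\" string.\nthis is the (2nd) string.\")"));
```

Because the closing quote is found by walking the literal, parentheses and escaped quotes inside the string do not end the match early. Calls whose argument is not a single string literal are left untouched.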

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!
For your example data, you could use two capture groups, with the second group inside an optional part.
In the replacement callback, check whether capture group 2 took part in the match. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capturing group, matched as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d also matches digits from other scripts and \s also matches a newline, and the pattern can start matching in the middle of a longer token. A somewhat more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
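The difference between \d and [0-9], and between \s and [\p{Zs}\t], can be checked directly. This is a small sketch; the variable names are illustrative, and the test character (U+0663, ARABIC-INDIC DIGIT THREE) is just one example of a non-ASCII decimal digit.

```csharp
using System;
using System.Text.RegularExpressions;

string arabicIndicThree = "\u0663"; // a Unicode decimal digit outside 0-9

// \d matches any Unicode decimal digit by default in .NET.
bool unicodeDigit = Regex.IsMatch(arabicIndicThree, @"\d");
// [0-9] only matches the ASCII digits.
bool asciiClass = Regex.IsMatch(arabicIndicThree, @"[0-9]");
// RegexOptions.ECMAScript restricts \d to [0-9].
bool ecmaDigit = Regex.IsMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript);

// \s matches a newline; [\p{Zs}\t] (space separators and tab) does not.
bool newlineIsSpace = Regex.IsMatch("\n", @"\s");
bool newlineIsHorizontal = Regex.IsMatch("\n", @"[\p{Zs}\t]");

Console.WriteLine($"{unicodeDigit} {asciiClass} {ecmaDigit} {newlineIsSpace} {newlineIsHorizontal}");
// prints: True False False True False
```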
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 1241.23
Group 3 - ((E|e|%|:)+)
One or more of the special symbols E e % :
Group 1 and Group 3 can be separated by any amount of whitespace.
If it's not working the way you need, please provide some samples to test.
For me it's too complex to be handled just by one regex. I suggest splitting into separate checks. See below code example - I used four different regexes, first is described in detail, the rest can be deduced based on first explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store it in first capturing group (\d+)
// Then match one or more whitespaces again.
// Then match part with exponent ([eE][-+]?\d+) and store it in second capturing group.
// It will match lower or uppercase 'e' with optional (due to ? operator) dash/plus sign and one or more digits.
// Then match zero or more white spaces.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

String Replace & Split

How can I split these values using a single Replace and Split call?
Tel-0190 Texas 2020-12-31 9 890,00 $ 4,00 $ 8 690,00 $
I want the string split like this:
"Tel-0190" "Texas" "2020-12-31" "9 890,00 $" "4,00 $" "8 690,00 $"
I tried:
str.Replace(" ","_")
.Replace("\d* ","\d* ")
.Replace(" €"," €")
.Split("_"C)
Here's a try using Regex:
private const string Source = "Tel-0190 Texas 2020-12-31 9 890,00 $ 4,00 $ 8 690,00 $";
private const string RegexPattern =
@"(?<tel>Tel-\d+) (?<state>Texas) (?<date>\d{4}-\d{1,2}-\d{1,2}) (?<num1>[0-9, ]+[$€]) (?<num2>[0-9, ]+[$€]) (?<num3>[0-9, ]+[$€])";
I'm using "named groups" in the regex. I tried to guess your rules. This code will find the groups:
var regex = new Regex(RegexPattern);
var match = regex.Match(Source);
if (match.Success) // Regex.Match never returns null; check Success instead
{
var groups = match.Groups;
Debug.WriteLine(groups["tel"]);
Debug.WriteLine(groups["state"]);
Debug.WriteLine(groups["date"]);
Debug.WriteLine(groups["num1"]);
Debug.WriteLine(groups["num2"]);
Debug.WriteLine(groups["num3"]);
}
The result looks like:
Tel-0190
Texas
2020-12-31
9 890,00 $
4,00 $
8 690,00 $
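If the field contents are not known in advance, an alternative is to tokenize instead of describing the whole line. The sketch below assumes a format the question does not state explicitly: amounts use a space as the thousands separator, a comma with two decimals, then a space and a currency symbol; every other field contains no spaces. The names source, tokenPattern, and tokens are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = "Tel-0190 Texas 2020-12-31 9 890,00 $ 4,00 $ 8 690,00 $";

// First alternative: an amount such as "9 890,00 $" (assumed format).
// Second alternative: any other run of non-whitespace characters.
string tokenPattern = @"\d{1,3}(?: \d{3})*,\d{2} [$€]|\S+";

string[] tokens = Regex.Matches(source, tokenPattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var token in tokens)
{
    Console.WriteLine($"\"{token}\"");
}
```

The amount alternative is listed first so that it wins whenever both alternatives could match at the same position; the date falls through to \S+ because it contains no comma.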

Regular expression replace (C#)

How to make Regex.Replace for the following texts:
1) "Name's", "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
Result:
1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State
Spaces between brackets can be any size (Example 1). There may also be incorrect spaces in brackets (Example 2). Also, instead of spaces, the "_" sign can be used (Example 1). And instead of double quotes, single quotes can be used (Example 2).
As a result, words must be separated with a comma and a space.
Snippet of my code
StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');
I don't know C# syntax well, but this regex does the job (\h is PCRE syntax for horizontal whitespace; .NET does not support it, so use [ \t] there):
Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
Replace: A space
Explanation:
(?: # non capture group
_ # underscore
| # OR
^["']\h* # beginning of line, quote or apostrophe, 0 or more horizontal spaces
| # OR
\h*["']$ # 0 or more horizontal spaces, quote or apostrophe, end of line
| # OR
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
, # a comma
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
) # end group
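Since .NET's regex engine has no \h, a C# adaptation has to spell out the horizontal whitespace. The following is a sketch of one possible adaptation, not the answer's exact regex: it splits the work into three steps so that the separators become ", " as the question asks. The function name CleanRow is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: [ \t] stands in for PCRE's \h.
string CleanRow(string row)
{
    // 1. quote, comma, quote (with optional spaces around each) becomes ", "
    row = Regex.Replace(row, @"[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*", ", ");
    // 2. drop the opening quote at the start and the closing quote at the end
    row = Regex.Replace(row, @"^[ \t]*[""'][ \t]*|[ \t]*[""'][ \t]*$", "");
    // 3. underscores become spaces
    return row.Replace('_', ' ');
}

string row1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string row2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";

Console.WriteLine(CleanRow(row1));
Console.WriteLine(CleanRow(row2));
```

The apostrophe in Name's survives because a quote character only counts as a separator when a comma and another quote follow it, or when it sits at the very start or end of the row.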
How about a simple straight string manipulation way?
using System;
using System.Linq;
static void Main(string[] args)
{
string dirty1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
Console.WriteLine(Clean(dirty1));
Console.WriteLine(Clean(dirty2));
Console.ReadKey();
}
private static string Clean(string dirty)
{
// Trim quotes and spaces from each field, turn underscores into spaces, rejoin with ", "
return string.Join(", ", dirty.Split(',').Select(item => item.Trim(' ', '"', '\'').Replace('_', ' ')));
}
private static string CleanNoLinQ(string dirty)
{
string[] items = dirty.Split(',');
for(int i = 0; i < items.Length; i++)
{
items[i] = items[i].Trim(' ', '"', '\'').Replace('_', ' ');
}
return String.Join(", ", items);
}
You can even replace the LINQ with a loop and then string.Join().
Easier to understand - easier to maintain.

Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment which needs to be cleaned up - the HTML has been removed and, as a result of a table being removed, there are some strange line ends where they shouldn't be :-)
the characters as they appear are
a space at the beginning of a line
a colon, carriage return and linefeed at the end of the line - which needs to be replaced simply with the colon;
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4313
The regex above was my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
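As a quick check, the pattern can be run against a shortened version of the sample from the question. The input below is an assumption about the raw text (CRLF line endings and a leading space on some lines); the variable names raw and cleaned are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample, assuming CRLF line endings and leading spaces.
string raw = " Tomas Adams\r\n Solicitor\r\np:\r\n1800 995 718\r\nf:\r\n07 3102 9135";

// Leading whitespace and line breaks after ':' are removed (group 1 is empty there);
// runs of two or more line breaks collapse to the last one captured in group 1.
string cleaned = Regex.Replace(raw, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");

Console.WriteLine(cleaned);
// Tomas Adams
// Solicitor
// p:1800 995 718
// f:07 3102 9135
```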
A basic solution without Regex:
// Split on both CRLF and LF so no stray '\r' is left at the end of a line
var lines = input.Split(new []{"\r\n", "\n"}, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
if (lines[i].EndsWith(":") && i + 1 < lines.Length) // merge into the next line, if there is one
{
lines[i + 1] = lines[i] + lines[i + 1];
continue;
}
output.AppendLine(lines[i].Trim()); // remove space before or after a line
}
I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression. This removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A LINQ solution without Regex:
var tmp = string.Empty;
var output = input.Split(new []{"\r\n", "\n"}, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a,b) => {
if (b.EndsWith(":")) { // feel free to also check for the size
tmp = b;
}
else {
a.AppendLine((tmp + b).Trim()); // remove space before or after a line
tmp = string.Empty;
}
return a;
});

Matching balanced parentheses before a character

I need to match a string within balanced parentheses before a literal period in c#. My regex with balanced groups works except when there are extra open parens in the string. According to my understanding, this requires a conditional fail pattern to ensure the stack is empty on match, yet something is not quite right.
Original regex:
@"(?<Par>[(]).+(?<-Par>[)])\."
With fail-pattern:
@"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\."
Test-code (last 2 fail):
string[] tests = {
"a.c", "",
"a).c", "",
"(a.c", "",
"a(a).c", "(a).",
"a(a b).c", "(a b).",
"a((a b)).c", "((a b)).",
"a(((a b))).c", "(((a b))).",
"a((a) (b)).c", "((a) (b)).",
"a((a)(b)).c", "((a)(b)).",
"a((ab)).c", "((ab)).",
"a)((ab)).(c", "((ab)).",
"a(((a b)).c", "((a b)).",
"a(((a b)).)c", "((a b))."
};
Regex re = new Regex(@"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\.");
for (int i = 0; i < tests.Length; i += 2)
{
var result = re.Match(tests[i]).Groups[0].Value;
if (result != tests[i + 1]) throw new Exception
("Expecting: " + tests[i + 1] + ", got " + result);
}
You may use a well-known regex to match balanced parentheses and just append a \. to it:
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.
|---------- balanced parens part ----------|.|
See the regex demo.
Details
\( - a (
(?> - start of an atomic group
[^()]+ - 1 or more chars other than ( and )
| - or
(?<o>)\( - an opening ( is pushed on to the Group o stack
| - or
(?<-o>)\) - a closing ) pops the last capture off the Group o stack
)* - 0 or more repetitions of the atomic group
(?(o)(?!)) - fail the match if Group o stack is not empty
\) - a )
\. - a dot.
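To see how this pattern behaves on the question's inputs, the test array can be run against it. This is a sketch that reuses the question's test pairs unchanged; the names balanced, cases, and failures are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var balanced = new Regex(@"\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.");

// Pairs of (input, expected match), copied from the question.
string[] cases = {
    "a.c", "",
    "a).c", "",
    "(a.c", "",
    "a(a).c", "(a).",
    "a(a b).c", "(a b).",
    "a((a b)).c", "((a b)).",
    "a(((a b))).c", "(((a b))).",
    "a((a) (b)).c", "((a) (b)).",
    "a((a)(b)).c", "((a)(b)).",
    "a((ab)).c", "((ab)).",
    "a)((ab)).(c", "((ab)).",
    "a(((a b)).c", "((a b)).",
    "a(((a b)).)c", "((a b))."
};

int failures = 0;
for (int i = 0; i < cases.Length; i += 2)
{
    // Match.Value is the empty string when there is no match.
    string actual = balanced.Match(cases[i]).Value;
    if (actual != cases[i + 1])
    {
        failures++;
        Console.WriteLine($"'{cases[i]}': expected '{cases[i + 1]}', got '{actual}'");
    }
}
Console.WriteLine(failures == 0 ? "all cases pass" : failures + " case(s) failed");
```

The last two inputs, which defeat the .+ version, work here because a closing parenthesis can only be consumed inside the loop when there is a matching entry on the stack, so an extra leading ( simply makes the engine retry from the next position.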
