Regular expression replace (C#) - c#

How can I use Regex.Replace on the following texts:
1) "Name's", "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
Result:
1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State
Spaces between the quoted fields can be any size (Example 1). There may also be stray spaces inside the quotes (Example 2). An underscore may appear where a space is expected (Example 1), and single quotes may be used instead of double quotes (Example 2).
In the result, the words must be separated by a comma and a space.
A snippet of my code:
StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');

I don't know C# syntax well, but this regex does the job:
Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
Replace: A space
Explanation:
(?: # non-capturing group
_ # underscore
| # OR
^["']\h* # beginning of line, quote or apostrophe, 0 or more horizontal spaces
| # OR
\h*["']$ # 0 or more horizontal spaces, quote or apostrophe, end of line
| # OR
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
, # a comma
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
) # end group
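For C# specifically, note that the .NET regex flavor has no \h escape, so the pattern above needs [ \t] in its place. The following is only a sketch of the same idea, not the answerer's code: it splits the work into two replacements so that the separators become ", " while the outer quotes are simply dropped. The names HeaderCleaner and Clean are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderCleaner
{
    public static string Clean(string row)
    {
        // Drop the opening quote at the start and the closing quote at the end,
        // together with any spaces or tabs next to them.
        string s = Regex.Replace(row, @"^[ \t]*[""'][ \t]*|[ \t]*[""'][ \t]*$", "");
        // Collapse each quote-comma-quote separator into ", ".
        s = Regex.Replace(s, @"[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*", ", ");
        // Underscores stand in for spaces.
        return s.Replace('_', ' ');
    }

    public static void Main()
    {
        Console.WriteLine(Clean("\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\""));
        Console.WriteLine(Clean("\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\""));
    }
}
```

The apostrophe inside Name's survives because a quote is only removed when it sits at the start of the row, at the end of the row, or next to a comma.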

How about a simple straight string manipulation way?
using System;
using System.Linq;
static void Main(string[] args)
{
string dirty1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
Console.WriteLine(Clean(dirty1));
Console.WriteLine(Clean(dirty2));
Console.ReadKey();
}
private static string Clean(string dirty)
{
    // Trim quotes and spaces from each field, turn underscores into spaces, then rejoin.
    return dirty.Split(',').Select(item => item.Trim(' ', '"', '\'').Replace('_', ' ')).Aggregate((a, b) => string.Join(", ", a, b));
}
private static string CleanNoLinQ(string dirty)
{
    string[] items = dirty.Split(',');
    for (int i = 0; i < items.Length; i++)
    {
        items[i] = items[i].Trim(' ', '"', '\'').Replace('_', ' ');
    }
    return String.Join(", ", items);
}
As CleanNoLinQ shows, you can even replace the LINQ with a plain loop and then string.Join().
Easier to understand - easier to maintain.

Related

Insert space when needed with Regex in c#

I have to write a function that looks for a string and checks whether it is preceded and followed by a blank space, adding one where it is missing. Here is my attempt:
public string AddSpaceIfNeeded(string originalValue, string targetValue)
{
if (originalValue.Contains(targetValue))
{
if (!originalValue.StartsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex - 1]))
originalValue.Insert(targetValueIndex - 1, " ");
}
if (!originalValue.EndsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex + targetValue.Length + 1]) && !originalValue[targetValueIndex + targetValue.Length + 1].Equals("(s)"))
originalValue.Insert(targetValueIndex + targetValue.Length + 1, " ");
}
}
return originalValue;
}
I would like to do this with Regex instead.
I tried the following for adding a space after the targetValue:
Regex spaceRegex = new Regex("(" + targetValue + ")(?!,)(?!!)(?!(s))(?= )");
originalValue = spaceRegex.Replace(originalValue, (Match m) => m.ToString() + " ");
But it does not work, and I don't really know how to add a space before the word.
Example adding space after:
AddSpaceIfNeeded("Hello my nameis ElBarto", "name")
=> Output Hello my name is ElBarto
Example adding space before:
AddSpaceIfNeeded("Hello myname is ElBarto", "name")
=> Output Hello my name is ElBarto
You may match your word in all three contexts, capturing the first two in separate groups, and test which group matched in the match evaluator:
public static string AddSpaceIfNeeded(string originalValue, string targetValue)
{
return Regex.Replace(originalValue,
$@"(?<=\S)({targetValue})(?=\S)|(?<=\S)({targetValue})(?!\S)|(?<!\S){targetValue}(?=\S)", m =>
m.Groups[1].Success ? $" {targetValue} " :
m.Groups[2].Success ? $" {targetValue}" :
$"{targetValue} ");
}
Note you may need to use Regex.Escape(targetValue) to escape any special chars in the string used as a dynamic pattern.
Pattern details
(?<=\S)({targetValue})(?=\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and followed with a non-whitespace ((?=\S))
| - or
(?<=\S)({targetValue})(?!\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and not followed with a non-whitespace ((?!\S))
| - or
(?<!\S){targetValue}(?=\S) - a targetValue that is not preceded with a non-whitespace ((?<!\S)) and followed with a non-whitespace ((?=\S))
When m.Groups[1].Success is true, the whole value should be enclosed with spaces. When m.Groups[2].Success is true, we need to add a space before the value. Else, we add a space after the value.
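As a sketch of the Regex.Escape note above, the same function can escape targetValue before building the pattern, so that a value such as "C++" is matched literally. The class name SpaceFixer and the "C++" input are assumptions added for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceFixer
{
    public static string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        // Escape regex metacharacters so the target is matched as plain text.
        string t = Regex.Escape(targetValue);
        string pattern = $@"(?<=\S)({t})(?=\S)|(?<=\S)({t})(?!\S)|(?<!\S){t}(?=\S)";
        return Regex.Replace(originalValue, pattern, m =>
            m.Groups[1].Success ? $" {targetValue} " :   // glued on both sides
            m.Groups[2].Success ? $" {targetValue}" :    // glued on the left only
                                  $"{targetValue} ");    // glued on the right only
    }

    public static void Main()
    {
        Console.WriteLine(AddSpaceIfNeeded("Hello my nameis ElBarto", "name"));
        Console.WriteLine(AddSpaceIfNeeded("Hello myname is ElBarto", "name"));
        Console.WriteLine(AddSpaceIfNeeded("I like C++a lot", "C++"));
    }
}
```

An occurrence that already has whitespace (or a string boundary) on both sides matches none of the three alternatives, so it is left untouched.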

Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment that needs to be cleaned up. The HTML has been removed, and because a table was removed there are some strange line ends where they shouldn't be :-)
The characters as they appear are:
a space at the beginning of a line
a colon, carriage return and linefeed at the end of the line, which needs to be replaced simply with the colon
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The regex above is my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
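To see the three alternatives at work, here is a small self-contained sketch that applies the pattern to a shortened version of the sample from the question. It assumes the input uses \r\n line endings and a single leading space on the affected lines, as the question describes; FragmentCleaner is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragmentCleaner
{
    public static string Clean(string s)
    {
        // 1) whitespace at the start of a line is removed,
        // 2) line breaks right after a colon are removed,
        // 3) runs of two or more line breaks collapse to the last one ($1).
        return Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
    }

    public static void Main()
    {
        string input = "Tomas Adams\r\n Solicitor\r\n p:\r\n 1800 995 718\r\n f:\r\n 07 3102 9135";
        Console.WriteLine(Clean(input));
    }
}
```

When the first or second alternative matches, Group 1 did not take part in the match, so $1 expands to an empty string and the matched text is simply deleted.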
A basic solution without Regex:
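// Split on both Windows and Unix line endings so no stray '\r' is left on a line.
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    // A line ending in ':' is glued onto the next line, if there is one.
    if (lines[i].TrimEnd().EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = lines[i].Trim() + lines[i + 1].Trim();
        continue;
    }
    output.AppendLine(lines[i].Trim()); // remove space before or after a line
}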
I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression. This removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A Linq solution without Regex:
var tmp = string.Empty;
// Split on both Windows and Unix line endings so no stray '\r' is left on a line.
var output = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a, b) => {
    var line = b.Trim(); // remove space before or after a line
    if (line.EndsWith(":")) { // feel free to also check for the size
        tmp = line; // hold the label until the next line arrives
    }
    else {
        a.AppendLine(tmp + line);
        tmp = string.Empty;
    }
    return a;
});

string manipulation : how to split and join a string with delimiters include

I want to split a string and keep certain parts of it joined at the same time. The string to be split is an SQL query.
I set the split delimiters: {". ", ",", ", ", " "}
for example:
select id, name, age, status from tb_test where age > 20 and status = 'Active'
I want it to produce a result something like this:
select
id
,
name
,
age
,
status
from
tb_test
where
age > 20
and
status = 'Active'
But what I get from string.Split is only word by word.
What should I do to get a result like the above?
Thanks in advance.
First create a list of all the SQL keywords you want to split on:
List<string> sql = new List<string>() {
"select",
"where",
"and",
"or",
"from",
","
};
After that, loop over this list and replace each keyword with itself surrounded by $.
The $ sign will be the character to split on later.
string query = "select id, name, age, status from tb_test where age > 20 and status = 'Active'";
foreach (string s in sql)
{
//Use ToLower() so that all strings don't have capital characters
query = query.Replace(s.ToLower(), "$" + s.ToLower() + "$");
}
Now do the split and remove the spaces in front and end using Trim():
string[] splits = query.Split(new char[] { '$' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in splits) Console.WriteLine(s.Trim() + "\r\n");
This will split on the SQL commands. Now you can further customize it to your needs.
Result:
select
id
,
name
,
age
,
status
from
tb_test
where
age > 20
and
status = 'Active'
Here's a pure-regex solution:
(?:(?=,)|(?<![<>=]) +(?! *[<>=])|(?:(?<=,)))(?=(?:(?:[^'"]*(?P<s>['"])(?:(?!(?P=s)).)*(?P=s)))*[^'"]*$)
I made it so it can deal with the usual pitfalls, like strings, but there's probably still some stuff that'll break it.
Explanation:
(?:
(?=,) # split before a comma.
|
(?<! # if not preceded by an operator, ...
[<>=]
)
+ #...split at a space...
(?! *[<>=]) #...unless there's an operator behind the space.
|
(?: # also split after a comma.
(?<=,)
)
)
# HOWEVER, make sure this isn't inside of a string.
(?= # assert that there's an even number of quotes left in the text.
(?: # consume pairs of quotes.
[^'"]* # all text up to a quote
(?P<s>['"]) # capture the quote
(?: # consume everything up to the next quote.
(?!
(?P=s)
)
.
)*
(?P=s)
)*
[^'"]* # then make sure there are no more quotes until the end of the text.
$
)
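The pattern above uses (?P<s>...) and (?P=s), which is Python-style named-group syntax that .NET does not accept. Below is a hedged adaptation for Regex.Split in C#. It is a sketch rather than the answerer's exact regex: because Regex.Split also returns the text of capturing groups, the named group and backreference are swapped for a plain alternation over single-quoted and double-quoted strings. SqlSplitter is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SqlSplitter
{
    private static readonly Regex Splitter = new Regex(
        // Split before a comma, at spaces not adjacent to an operator, or after a comma...
        @"(?:(?=,)|(?<![<>=]) +(?! *[<>=])|(?<=,))" +
        // ...but only where the rest of the text contains complete quoted strings.
        @"(?=(?:[^'""]*(?:'[^']*'|""[^""]*""))*[^'""]*$)");

    public static string[] Split(string sql)
    {
        return Splitter.Split(sql);
    }

    public static void Main()
    {
        string query = "select id, name, age, status from tb_test where age > 20 and status = 'Active'";
        foreach (string part in Split(query))
            Console.WriteLine(part);
    }
}
```

Spaces around the comparison operators are left alone, which is what keeps age > 20 and status = 'Active' together as single items.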
The first split separates the keywords SELECT, FROM and WHERE.
The second split separates all columns using your delimiters.
One approach using regex:
string strRegex = @"(select|from|where|[,.])";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
string strTargetString = @"select id, name, age, status from tb_test where age > 20 and status = 'Active'";
string strReplace = "$1\r\n";
return myRegex.Replace(strTargetString, strReplace);
All alternatives share one capture group, so $1 always holds the matched text. This should output:
select
 id,
 name,
 age,
 status from
 tb_test where
 age > 20 and status = 'Active'
You may want to perform another replacement to trim the leading space on each line.
You can also use "\r\n$1\r\n" for the SQL keywords only (select, from, where, ...).
Hope this helps.

Regular expressions: How to remove all "R.G(*******)" from a string

There are several strings, and I want to remove all "R.G(**)" wrappers from them. For example:
1、Original string:
Push("Command", string.Format(R.G("#{0} this is a string"), accID));
Result:
Push("Command", string.Format("#{0} this is a string", accID));
2、Original string:
Select(Case(T["AccDirect"]).WhenThen(1, R.G("input")).Else(R.G("output")).As("Direct"));
Result:
Select(Case(T["AccDirect"]).WhenThen(1, "input").Else("output").As("Direct"));
3、Original string:
R.G("this is a \"string\"")
Result:
"this is a \"string\""
4、Original string:
R.G("this is a (string)")
Result:
"this is a (string)"
5、Original string:
AppendLine(string.Format(R.G("[{0}] Error:"), str) + R.G("Contains one of these symbols: \\ / : ; * ? \" \' < > | & +"));
Result:
AppendLine(string.Format("[{0}] Error:", str) + "Contains one of these symbols: \\ / : ; * ? \" \' < > | & +");
6 、Original string:
R.G(@"this is the ""1st"" string.
this is the (2nd) string.")
Result:
@"this is the ""1st"" string.
this is the (2nd) string."
Please Help.
Use this; capture group 0 is your target, group 1 is your replacement.
R[.]G[(]"(.*?[^\\])"[)]
Example acting on your #2 and #4 string and a new edge case R.G("this is a (\"string\")")
var pattern = @"R[.]G[(]\""(.*?[^\\])\""[)]";
var str = "Select(Case(T[\"AccDirect\"]).WhenThen(1, R.G(\"input\")).Else(R.G(\"output\")).As(\"Direct\"));";
var str2 = "R.G(\"this is a (string)\")";
var str3 = "R.G(\"this is a (\\\"string\\\")\")";
var res = Regex.Replace(str,pattern, "\"$1\"");
var res2 = Regex.Replace(str2,pattern, "\"$1\"");
var res3 = Regex.Replace(str3,pattern, "\"$1\"");
Try this:
var result = Regex.Replace(input, @"(.*)R\.G\(([^)]*)\)(.*)", "$1$2$3");
explanation:
(.*) # capture any characters
R\.G\( # then match 'R.G('
([^)]*) # then capture anything that isn't ')'
\) # match end parenthesis
(.*) # and capture any characters after
The $1$2$3 replaces your entire match with capture group 1, 2, and 3. Which effectively removes everything that isn't part of those matches, namely the "R.G(*)" part.
Note that you will run into problems if your strings contain 'R.G' or a right parenthesis somewhere, but depending on your input data, maybe this will do the trick well enough.
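Neither pattern above covers verbatim strings such as example 6, and the second one stops at the first right parenthesis inside the string. One possible way around both, offered only as a sketch, is to match the whole string literal explicitly: either a verbatim literal where "" is the escaped quote, or a regular literal where a backslash escapes the next character. RGStripper is a made-up name, and the pattern assumes R.G( is always followed directly by a single string literal and a closing parenthesis.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RGStripper
{
    // Group 1 is the literal itself:
    //   @"..."  with "" as an escaped quote, or
    //   "..."   with \x as an escaped character.
    private static readonly Regex Call = new Regex(
        @"R\.G\((@""(?:[^""]|"""")*""|""(?:[^""\\]|\\.)*"")\)");

    public static string Strip(string source)
    {
        // Keep only the string literal and drop the surrounding R.G( ... ).
        return Call.Replace(source, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("R.G(\"this is a (string)\")"));
        Console.WriteLine(Strip("R.G(\"this is a \\\"string\\\"\")"));
        Console.WriteLine(Strip("R.G(@\"the \"\"1st\"\" (2nd)\")"));
    }
}
```

Because the literal is consumed as a unit, parentheses and escaped quotes inside it no longer end the match early.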

String Manipulation Using RegEx

Given the following scenario, I am wondering whether a better solution could be written with regular expressions, which I am not very familiar with yet. I am seeing holes in my basic C# string manipulation even though it somewhat works. Your thoughts and ideas are most appreciated.
Thanks much,
Craig
Given the string "story" below, write a script to do the following:
Variable text is enclosed by { }.
If the variable text is blank, remove any other text enclosed in [ ].
Text to be removed can be nested deep with [ ].
Format:
XYZ Company [- Phone: [({404}) ]{321-4321} [Ext: {6789}]]
Examples:
All variable text filled in.
XYZ Company - Phone: (404) 321-4321 Ext: 6789
No Extension entered, remove "Ext:".
XYZ Company - Phone: (404) 321-4321
No Extension and no area code entered, remove "Ext:" and "( ) ".
XYZ Company - Phone: 321-4321
No extension, no phone number, and no area code, remove "Ext:" and "( ) " and "- Phone: ".
XYZ Company
Here is my solution with plain string manipulation.
private string StoryManipulation(string theStory)
{
// Loop through story while there are still curly brackets
while (theStory.IndexOf("{") > 0)
{
// Extract the first curly text area
string lcCurlyText = StringUtils.ExtractString(theStory, "{", "}");
// Look for surrounding brackets and blank all text between
if (String.IsNullOrWhiteSpace(lcCurlyText))
{
for (int lnCounter = theStory.IndexOf("{"); lnCounter >= 0; lnCounter--)
{
if (theStory.Substring(lnCounter - 1, 1) == "[")
{
string lcSquareText = StringUtils.ExtractString(theStory.Substring(lnCounter - 1), "[", "]");
theStory = StringUtils.ReplaceString(theStory, ("[" + lcSquareText + "]"), "", false);
break;
}
}
}
else
{
// Replace current curly brackets surrounding the text
theStory = StringUtils.ReplaceString(theStory, ("{" + lcCurlyText + "}"), lcCurlyText, false);
}
}
// Replace all brackets with blank (-1 all instances)
theStory = StringUtils.ReplaceStringInstance(theStory, "[", "", -1, false);
theStory = StringUtils.ReplaceStringInstance(theStory, "]", "", -1, false);
return theStory.Trim();
}
Dealing with nested structures is generally beyond the scope of regular expressions. But I think there is a solution, if you run the regex replacement in a loop, starting from the inside out. You will need a callback-function though (a MatchEvaluator):
string ReplaceCallback(Match match)
{
    // Groups[n] is a Group object; .Value gives the captured text.
    if (String.IsNullOrWhiteSpace(match.Groups[2].Value))
        return "";
    else
        return match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
}
Then you can create the evaluator:
MatchEvaluator evaluator = new MatchEvaluator(ReplaceCallback);
And then you can call this in a loop until the replacement does not change anything any more:
newString = Regex.Replace(
oldString,
@"
\[ # a literal [
( # start a capturing group. this is what we access with match.Groups[1]
[^{}[\]]
# a negated character class, that matches anything except {, }, [ and ]
* # arbitrarily many of those
) # end of the capturing group
\{ # a literal {
([^{}[\]]*)
# the same thing as before, we will access this with match.Groups[2]
} # a literal }
([^{}[\]]*)
# match.Groups[3]
] # a literal ]
",
evaluator,
RegexOptions.IgnorePatternWhitespace
);
Here is the whitespace-free version of the regex:
\[([^{}[\]]*)\{([^{}[\]]*)}([^{}[\]]*)]
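The loop described above can be sketched as follows, using the whitespace-free pattern and a lambda in place of the named callback. StoryFormatter and Format are hypothetical names, and the sketch assumes each bracketed section contains exactly one { } variable once its inner sections have been resolved, as in the format from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StoryFormatter
{
    // Matches an innermost [ ... {variable} ... ] section.
    private static readonly Regex Innermost =
        new Regex(@"\[([^{}[\]]*)\{([^{}[\]]*)}([^{}[\]]*)]");

    public static string Format(string story)
    {
        string previous;
        do
        {
            previous = story;
            // Drop the section if its variable is blank, otherwise unwrap it.
            story = Innermost.Replace(story, m =>
                string.IsNullOrWhiteSpace(m.Groups[2].Value)
                    ? ""
                    : m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value);
        } while (story != previous); // stop once a pass changes nothing
        return story.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Format("XYZ Company [- Phone: [({404}) ]{321-4321} [Ext: {6789}]]"));
        Console.WriteLine(Format("XYZ Company [- Phone: [({}) ]{321-4321} [Ext: {}]]"));
    }
}
```

Each pass resolves the sections that no longer contain nested brackets, so the outer section is handled one pass after its inner ones. A { } variable that sits outside any brackets is left as-is by this sketch.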
