Insert space when needed with Regex in C#

I have to write a function that looks for a string, checks whether it is preceded and followed by a blank space, and adds the space where it is missing. Here is my attempt:
public string AddSpaceIfNeeded(string originalValue, string targetValue)
{
if (originalValue.Contains(targetValue))
{
if (!originalValue.StartsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex - 1]))
originalValue.Insert(targetValueIndex - 1, " ");
}
if (!originalValue.EndsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex + targetValue.Length + 1]) && !originalValue[targetValueIndex + targetValue.Length + 1].Equals("(s)"))
originalValue.Insert(targetValueIndex + targetValue.Length + 1, " ");
}
}
return originalValue;
}
I would like to do this with Regex instead.
I tried the following to add a space after the targetValue:
Regex spaceRegex = new Regex("(" + targetValue + ")(?!,)(?!!)(?!(s))(?= )");
originalValue = spaceRegex.Replace(originalValue, (Match m) => m.ToString() + " ");
But it does not work, and I don't really know how to add a space before the word.
Example adding a space after:
AddSpaceIfNeeded("Hello my nameis ElBarto", "name")
=> Output: Hello my name is ElBarto
Example adding a space before:
AddSpaceIfNeeded("Hello myname is ElBarto", "name")
=> Output: Hello my name is ElBarto

You may match your word in all three contexts, capturing two of them in separate groups, and then test which group matched in the match evaluator:
public static string AddSpaceIfNeeded(string originalValue, string targetValue)
{
return Regex.Replace(originalValue,
$@"(?<=\S)({targetValue})(?=\S)|(?<=\S)({targetValue})(?!\S)|(?<!\S){targetValue}(?=\S)", m =>
m.Groups[1].Success ? $" {targetValue} " :
m.Groups[2].Success ? $" {targetValue}" :
$"{targetValue} ");
}
See the C# demo
Note you may need to use Regex.Escape(targetValue) to escape any special chars in the string used as a dynamic pattern.
Pattern details
(?<=\S)({targetValue})(?=\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and followed with a non-whitespace ((?=\S))
| - or
(?<=\S)({targetValue})(?!\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and not followed with a non-whitespace ((?!\S))
| - or
(?<!\S){targetValue}(?=\S) - a targetValue that is not preceded with a non-whitespace ((?<!\S)) and followed with a non-whitespace ((?=\S))
When m.Groups[1].Success is true, the whole value should be enclosed with spaces. When m.Groups[2].Success is true, we need to add a space before the value. Else, we add a space after the value.
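The note about Regex.Escape can be folded into the method itself. The following is a minimal sketch, not the answerer's exact code: the class name SpacePadder and the local variable word are made up for illustration, and the pattern is the same three-alternative one described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacePadder
{
    // Adds a space before and/or after targetValue wherever it touches
    // a non-whitespace character in originalValue.
    public static string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        // Escape the target so characters such as '.' or '(' are matched literally.
        string word = Regex.Escape(targetValue);

        string pattern =
            $@"(?<=\S)({word})(?=\S)|(?<=\S)({word})(?!\S)|(?<!\S){word}(?=\S)";

        return Regex.Replace(originalValue, pattern, m =>
            m.Groups[1].Success ? $" {targetValue} " :  // glued on both sides
            m.Groups[2].Success ? $" {targetValue}" :   // glued on the left only
            $"{targetValue} ");                         // glued on the right only
    }
}
```

Calling SpacePadder.AddSpaceIfNeeded("Hello my nameis ElBarto", "name") and SpacePadder.AddSpaceIfNeeded("Hello myname is ElBarto", "name") should both give "Hello my name is ElBarto".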

Related

Regular expression replace (C#)

How to make Regex.Replace for the following texts:
1) "Name's", "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
Result:
1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State
Spaces between brackets can be any size (Example 1). There may also be incorrect spaces in brackets (Example 2). Also, instead of spaces, the "_" sign can be used (Example 1). And instead of double quotes, single quotes can be used (Example 2).
As a result, words must be separated with a comma and a space.
Snippet of my code
StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');
I don't know C# syntax well, but this regex does the job:
Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
Replace: A space
Explanation:
(?: # non capture group
_ # underscore
| # OR
^["']\h* # beginning of line, quote or apostrophe, 0 or more horizontal spaces
| # OR
\h*["']$ # 0 or more horizontal spaces, quote or apostrophe, end of line
| # OR
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
, #
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
) # end group
Demo
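For readers who want to try this idea in C#, here is a hedged sketch rather than a direct port. .NET regular expressions do not recognise \h, so [ \t] is used instead, and the work is split into three steps so that the separators become ", " instead of a single space. The class name HeaderCleaner and the method name Clean are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderCleaner
{
    public static string Clean(string row)
    {
        // 1. Collapse each closing-quote, comma, opening-quote run
        //    (with any stray spaces or tabs) into ", ".
        string result = Regex.Replace(
            row, @"[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*", ", ");

        // 2. Drop the quote at the very start and the quote at the very end.
        result = Regex.Replace(
            result, @"^[ \t]*[""'][ \t]*|[ \t]*[""'][ \t]*$", "");

        // 3. Underscores stand in for spaces.
        return result.Replace('_', ' ');
    }
}
```

The apostrophe inside Name's survives because step 1 only matches a quote that is followed by a comma, and step 2 only matches a quote at the start or end of the row.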
How about a simple straight string manipulation way?
using System;
using System.Linq;
static void Main(string[] args)
{
string dirty1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
Console.WriteLine(Clean(dirty1));
Console.WriteLine(Clean(dirty2));
Console.ReadKey();
}
private static string Clean(string dirty)
{
return dirty.Split(',').Select(item => item.Trim(' ', '"', '\'')).Aggregate((a, b) => string.Join(", ", a, b));
}
private static string CleanNoLinQ(string dirty)
{
string[] items = dirty.Split(',');
for(int i = 0; i < items.Length; i++)
{
items[i] = items[i].Trim(' ', '"', '\'');
}
return String.Join(", ", items);
}
You can even replace the LINQ with a foreach and then string.Join().
Easier to understand, easier to maintain.

Removal of colon and carriage returns and replace with colon

I'm working on a project where I have an HTML fragment which needs to be cleaned up. The HTML has been removed and, as a result of a table being removed, there are some strange line ends where there shouldn't be :-)
The characters as they appear are:
a space at the beginning of a line
a colon, carriage return and linefeed at the end of the line, which needs to be replaced simply with the colon
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The regex above is my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
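As a quick check of the pattern described above, here is a small self-contained sketch. The wrapper class ContactCleaner and its Clean method are names invented for this example; the pattern and the "$1" replacement are the ones from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContactCleaner
{
    // Removes leading whitespace on each line, joins "label:" lines with the
    // line that follows, and collapses runs of line breaks into one.
    public static string Clean(string s)
    {
        return Regex.Replace(
            s,
            @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}",
            "$1"); // Group 1 is empty unless the third alternative matched
    }
}
```

With the input " Tomas Adams\r\n Solicitor\r\np:\r\n1800 995 718\r\nf:\r\n07 3102 9135", the result should be "Tomas Adams\r\nSolicitor\r\np:1800 995 718\r\nf:07 3102 9135".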
A basic solution without Regex:
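// Split on both Windows and Unix line breaks so no stray '\r' is left behind
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    // Merge a line ending in ':' into the next one, if there is a next one
    if (lines[i].EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = lines[i] + lines[i + 1];
        continue;
    }
    output.AppendLine(lines[i].Trim()); // remove space before or after a line
}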
Try it Online!
I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression. This removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A LINQ solution without Regex:
var tmp = string.Empty;
var output = input.Split(new []{"\n"}, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a,b) => {
if (b.EndsWith(":")) { // feel free to also check for the size
tmp = b;
}
else {
a.AppendLine((tmp + b).Trim()); // remove space before or after a line
tmp = string.Empty;
}
return a;
});
Try it Online!

Regex with optional matching groups

I'm trying to parse a given string which is a kind of path separated with /. I need to write a regex that matches each segment in the path to the corresponding regex group.
Example 1:
input:
/EAN/SomeBrand/appliances/refrigerators/RF444
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444
Example 2:
input:
/EAN/SomeBrand/appliances
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:
I tried the following code. It works fine when the path is full (as in the first example) but fails to find the groups when the input string is partial (as in example 2).
static void Main()
{
var pattern = @"^" + @"/EAN"
    + @"/" + @"(?<producer>.+)"
    + @"/" + @"(?<category>.+)"
    + @"/" + @"(?<subcategory>.+)"
    + @"/" + @"(?<product>.+)?"
    + @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
var result = rgx.Match(@"/EAN/SomeBrand/appliances/refrigerators/RF444");
foreach (string groupName in rgx.GetGroupNames())
{
Console.WriteLine(
"Group: {0}, Value: {1}",
groupName,
result.Groups[groupName].Value);
}
Console.ReadLine();
}
Any suggestion is welcome. Unfortunately I cannot simply split the string since the framework I'm using expects regex object.
You can use optional groups (...)? and replace the .+ greedy dot matching patterns with negated character classes [^/]+:
^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
                                           ^                      ^^^                  ^^
See the regex demo
This is how you need to declare your regex in the C# code:
var pattern = @"^" + @"/EAN"
    + @"/(?<producer>[^/]+)"
    + @"/(?<category>[^/]+)"
    + @"(/(?<subcategory>[^/]+))?"
    + @"(/(?<product>[^/]+))?"
    + @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
Note I am using regular capturing groups as optional ones, but the RegexOptions.ExplicitCapture flag turns all non-named capturing groups into non-capturing and thus, they do not appear among the Match.Groups. So, we only have 5 groups all the time even without using non-capturing optional groups (?:...)?.
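To see how the optional groups behave on both inputs from the question, here is a small sketch. The class name PathParser and the Describe helper (which joins the four named groups with '|') are invented for this example; the pattern and options are the ones shown above, minus RegexOptions.Compiled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathParser
{
    private static readonly Regex PathRegex = new Regex(
        @"^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)" +
        @"(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns "producer|category|subcategory|product"; groups that did not
    // take part in the match contribute an empty string.
    public static string Describe(string path)
    {
        Match m = PathRegex.Match(path);
        if (!m.Success)
        {
            return "no match";
        }
        return string.Join("|",
            m.Groups["producer"].Value,
            m.Groups["category"].Value,
            m.Groups["subcategory"].Value,
            m.Groups["product"].Value);
    }
}
```

Describe("/EAN/SomeBrand/appliances/refrigerators/RF444") should give "SomeBrand|appliances|refrigerators|RF444", while Describe("/EAN/SomeBrand/appliances") should give "SomeBrand|appliances||".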
Try
var pattern = @"^" + @"/EAN"
    + @"(?:/" + @"(?<producer>[^/]+))?"
    + @"(?:/" + @"(?<category>[^/]+))?"
    + @"(?:/" + @"(?<subcategory>[^/]+))?"
    + @"(?:/" + @"(?<product>[^/]+))?";
Note how I replaced the . with [^/], because you want to use the / to split strings. Note also the optional quantifier (?) on each sub-part.

String Manipulation Using RegEx

Given the following scenario, I am wondering if a better solution could be written with regular expressions, with which I am not yet very familiar. I am seeing holes in my basic C# string manipulation even though it somewhat works. Your thoughts and ideas are most appreciated.
Thanks much,
Craig
Given the string "story" below, write a script to do the following:
Variable text is enclosed by { }.
If the variable text is blank, remove any other text enclosed in [ ].
Text to be removed can be nested deep with [ ].
Format:
XYZ Company [- Phone: [({404}) ]{321-4321} [Ext: {6789}]]
Examples:
All variable text filled in.
XYZ Company - Phone: (404) 321-4321 Ext: 6789
No Extension entered, remove "Ext:".
XYZ Company - Phone: (404) 321-4321
No Extension and no area code entered, remove "Ext:" and "( ) ".
XYZ Company - Phone: 321-4321
No extension, no phone number, and no area code, remove "Ext:" and "( ) " and "- Phone: ".
XYZ Company
Here is my solution with plain string manipulation.
private string StoryManipulation(string theStory)
{
// Loop through story while there are still curly brackets
while (theStory.IndexOf("{") > 0)
{
// Extract the first curly text area
string lcCurlyText = StringUtils.ExtractString(theStory, "{", "}");
// Look for surrounding brackets and blank all text between
if (String.IsNullOrWhiteSpace(lcCurlyText))
{
for (int lnCounter = theStory.IndexOf("{"); lnCounter >= 0; lnCounter--)
{
if (theStory.Substring(lnCounter - 1, 1) == "[")
{
string lcSquareText = StringUtils.ExtractString(theStory.Substring(lnCounter - 1), "[", "]");
theStory = StringUtils.ReplaceString(theStory, ("[" + lcSquareText + "]"), "", false);
break;
}
}
}
else
{
// Replace current curly brackets surrounding the text
theStory = StringUtils.ReplaceString(theStory, ("{" + lcCurlyText + "}"), lcCurlyText, false);
}
}
// Replace all brackets with blank (-1 all instances)
theStory = StringUtils.ReplaceStringInstance(theStory, "[", "", -1, false);
theStory = StringUtils.ReplaceStringInstance(theStory, "]", "", -1, false);
return theStory.Trim();
}
Dealing with nested structures is generally beyond the scope of regular expressions. But I think there is a solution, if you run the regex replacement in a loop, starting from the inside out. You will need a callback-function though (a MatchEvaluator):
string ReplaceCallback(Match match)
{
    // Group 2 is the variable text between { and }
    if (String.IsNullOrWhiteSpace(match.Groups[2].Value))
        return "";
    else
        return match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
}
Then you can create the evaluator:
MatchEvaluator evaluator = new MatchEvaluator(ReplaceCallback);
And then you can call this in a loop until the replacement does not change anything any more:
newString = Regex.Replace(
    oldString,
    @"
    \[ # a literal [
    ( # start a capturing group; this is what we access with match.Groups[1]
    [^{}[\]]
    # a negated character class that matches anything except {, }, [ and ]
    * # arbitrarily many of those
    ) # end of the capturing group
    \{ # a literal {
    ([^{}[\]]*)
    # the same thing as before; we will access this with match.Groups[2]
    } # a literal }
    ([^{}[\]]*)
    # match.Groups[3]
    ] # a literal ]
    ",
    evaluator,
    RegexOptions.IgnorePatternWhitespace
);
Here is the whitespace-free version of the regex:
\[([^{}[\]]*)\{([^{}[\]]*)}([^{}[\]]*)]
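Putting the pieces together, the loop described above might look like the following sketch. The class name StoryCleaner and the method name Clean are assumptions for illustration, and a lambda is used in place of the separate ReplaceCallback method. It assumes every { } pair sits inside at least one [ ] pair, as in the format from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StoryCleaner
{
    // Matches an innermost [ ... {variable} ... ] block (no nested brackets).
    private static readonly Regex Innermost =
        new Regex(@"\[([^{}[\]]*)\{([^{}[\]]*)}([^{}[\]]*)]");

    public static string Clean(string story)
    {
        string previous;
        do
        {
            previous = story;
            story = Innermost.Replace(story, m =>
                string.IsNullOrWhiteSpace(m.Groups[2].Value)
                    ? "" // blank variable: drop the whole bracketed block
                    : m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value);
        } while (story != previous); // stop once a pass changes nothing

        return story.Trim();
    }
}
```

For "XYZ Company [- Phone: [({404}) ]{321-4321} [Ext: {}]]" this should give "XYZ Company - Phone: (404) 321-4321". Each pass resolves the innermost blocks, so the nesting depth bounds the number of passes.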

Can I write this regex in one step?

This is the input string "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespace, but only where it is not already surrounded on the left or right side. So my output string would look like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code, which does the job:
Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });
Explanation:
reg1 // Match '+' not followed by ' ' (a space), and the same for '-'
reg2 // Same thing, except that I match '+' or '-' not preceded by ' ' (a space)
Delegates 1 and 2 just insert " " after and before m.Value (the match value), respectively.
The question is: is there a way to do this with just one regex and one delegate, i.e. in one step? I am new to regex and I want to learn the efficient way.
I don't see the need of lookarounds or delegates here. Just replace
\s*([-+])\s*
with
" $1 "
(See http://ideone.com/r3Oog.)
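In C# the replacement above comes down to a single Regex.Replace call. The sketch below wraps it in a hypothetical OperatorSpacer.Normalize method so it can be tried on the input from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OperatorSpacer
{
    // Swallows any whitespace around '+' or '-' and puts back exactly
    // one space on each side.
    public static string Normalize(string input)
    {
        return Regex.Replace(input, @"\s*([-+])\s*", " $1 ");
    }
}
```

OperatorSpacer.Normalize("23x +y-34 x + y+21x - 3y2-3x-y+2") should give "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". Note that a sign at the very start or end of the input would also get a space on its outer side, so a Trim() may be wanted in that case.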
I'd try
Regex.Replace(input, @"\s*[+-]\s*", m => " " + m.ToString().Trim() + " ");
