This is the input string "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with spaces, but only on the sides where there isn't one already. So my output string should look like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job:
Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });
Explanation:
reg1 // Match '+' or '-' that is not followed by ' ' (a space)
reg2 // Same thing, except that it matches '+' or '-' that is not preceded by ' ' (a space)
The two delegates just insert " " after and before m.Value (the match value), respectively.
The question is: is there a way to do this with just one regex and just one delegate, i.e. in one step? I am new to regex and I want to learn an efficient way.
I don't see the need of lookarounds or delegates here. Just replace
\s*([-+])\s*
with
" $1 "
(See http://ideone.com/r3Oog.)
I'd try
Regex.Replace(input, @"\s*[+-]\s*", m => " " + m.ToString().Trim() + " ");
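As a minimal sketch of the single-replace idea from the first answer, the snippet below wraps it in a helper and runs it on the input from the question. The name SpaceOutSigns is made up for illustration, and the code assumes C# 9 or later for top-level statements. Note that a sign at the very start or end of the string would also get a space on its outer side.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: swallow any whitespace around '+' or '-' and
// re-emit the sign with exactly one space on each side.
static string SpaceOutSigns(string text) =>
    Regex.Replace(text, @"\s*([-+])\s*", " $1 ");

string expression = "23x +y-34 x + y+21x - 3y2-3x-y+2";
Console.WriteLine(SpaceOutSigns(expression));
// prints: 23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2
```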
Related
How to make Regex.Replace for the following texts:
1) "Name's", "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"
Result:
1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State
Spaces between brackets can be any size (Example 1). There may also be incorrect spaces in brackets (Example 2). Also, instead of spaces, the "_" sign can be used (Example 1). And instead of double quotes, single quotes can be used (Example 2).
As a result, words must be separated with a comma and a space.
Snippet of my code
StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');
I don't know C# syntax well, but this regex does the job (\h is PCRE syntax for a horizontal space; .NET does not support it, so use [ \t] there):
Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
Replace: A space
Explanation:
(?: # non capture group
_ # underscore
| # OR
^["']\h* # beginning of line, quote or apostrophe, 0 or more horizontal spaces
| # OR
\h*["']$ # 0 or more horizontal spaces, quote or apostrophe, end of line
| # OR
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
, # a comma
\h*["']\h* # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
) # end group
Demo
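A rough .NET adaptation of the same idea, as a sketch only: it uses [ \t] in place of \h and, unlike the single find/replace above, it runs in steps so that separators become ", " and the outer quotes are dropped. CleanHeader is a hypothetical name, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the pattern from the answer above.
static string CleanHeader(string row)
{
    // Underscores stand in for spaces.
    string s = row.Replace('_', ' ');
    // A closing quote, a comma and the next opening quote
    // (with any spaces or tabs around them) become ", ".
    s = Regex.Replace(s, @"[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*", ", ");
    // Drop the opening quote at the start and the closing quote at the end.
    return Regex.Replace(s, @"^[ \t]*[""'][ \t]*|[ \t]*[""'][ \t]*$", "");
}

Console.WriteLine(CleanHeader("\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\""));
// prints: Name's, Sex, Age, Height (in), Weight (lbs)
```

The apostrophe inside Name's survives because it is not followed by a comma and is not at the end of the line.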
How about a simple straight string manipulation way?
using System;
using System.Linq;
static void Main(string[] args)
{
string dirty1 = "\"Name's\", \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
Console.WriteLine(Clean(dirty1));
Console.WriteLine(Clean(dirty2));
Console.ReadKey();
}
private static string Clean(string dirty)
{
return dirty.Split(',').Select(item => item.Trim(' ', '"', '\'')).Aggregate((a, b) => string.Join(", ", a, b));
}
private static string CleanNoLinQ(string dirty)
{
string[] items = dirty.Split(',');
for(int i = 0; i < items.Length; i++)
{
items[i] = items[i].Trim(' ', '"', '\'');
}
return String.Join(", ", items);
}
You can even replace the LinQ with a foreach and then string.Join().
Easier to understand - easier to maintain.
I have to write a function that looks for a string and checks whether it is followed/preceded by a blank space, adding one if it is not. Here is my attempt:
public string AddSpaceIfNeeded(string originalValue, string targetValue)
{
if (originalValue.Contains(targetValue))
{
if (!originalValue.StartsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex - 1]))
originalValue.Insert(targetValueIndex - 1, " ");
}
if (!originalValue.EndsWith(targetValue))
{
int targetValueIndex = originalValue.IndexOf(targetValue);
if (!char.IsWhiteSpace(originalValue[targetValueIndex + targetValue.Length + 1]) && !originalValue[targetValueIndex + targetValue.Length + 1].Equals("(s)"))
originalValue.Insert(targetValueIndex + targetValue.Length + 1, " ");
}
}
return originalValue;
}
I want to try with Regex :
I tried like this for adding spaces after the targetValue :
Regex spaceRegex = new Regex("(" + targetValue + ")(?!,)(?!!)(?!(s))(?= )");
originalValue = spaceRegex.Replace(originalValue, (Match m) => m.ToString() + " ");
But it is not working, and I don't really know how to add a space before the word.
Example adding space after:
AddSpaceIfNeeded(Hello my nameis ElBarto, name)
=> Output Hello my name is ElBarto
Example adding space before:
AddSpaceIfNeeded(Hello myname is ElBarto, name)
=> Output Hello my name is ElBarto
You may match your word in all three contexts, capturing it in separate groups, and then test which group matched in the match evaluator:
public static string AddSpaceIfNeeded(string originalValue, string targetValue)
{
return Regex.Replace(originalValue,
$@"(?<=\S)({targetValue})(?=\S)|(?<=\S)({targetValue})(?!\S)|(?<!\S){targetValue}(?=\S)", m =>
m.Groups[1].Success ? $" {targetValue} " :
m.Groups[2].Success ? $" {targetValue}" :
$"{targetValue} ");
}
See the C# demo
Note you may need to use Regex.Escape(targetValue) to escape any special chars in the string used as a dynamic pattern.
Pattern details
(?<=\S)({targetValue})(?=\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and followed with a non-whitespace ((?=\S))
| - or
(?<=\S)({targetValue})(?!\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and not followed with a non-whitespace ((?!\S))
| - or
(?<!\S){targetValue}(?=\S) - a targetValue that is not preceded with a non-whitespace ((?<!\S)) and followed with a non-whitespace ((?=\S))
When m.Groups[1].Success is true, the whole value should be enclosed with spaces. When m.Groups[2].Success is true, we need to add a space before the value. Else, we add a space after the value.
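The sketch below is one way to combine the function above with the Regex.Escape advice, run on the two examples from the question. It assumes C# 9 or later for top-level statements, and the "C++" input is an extra made-up case to show why escaping matters.

```csharp
using System;
using System.Text.RegularExpressions;

static string AddSpaceIfNeeded(string originalValue, string targetValue)
{
    // Escape the target so characters such as '+' or '.' are matched literally.
    string t = Regex.Escape(targetValue);
    return Regex.Replace(originalValue,
        $@"(?<=\S)({t})(?=\S)|(?<=\S)({t})(?!\S)|(?<!\S){t}(?=\S)",
        m => m.Groups[1].Success ? $" {targetValue} "   // glued on both sides
           : m.Groups[2].Success ? $" {targetValue}"    // glued on the left only
           : $"{targetValue} ");                        // glued on the right only
}

Console.WriteLine(AddSpaceIfNeeded("Hello my nameis ElBarto", "name"));
Console.WriteLine(AddSpaceIfNeeded("Hello myname is ElBarto", "name"));
// both print: Hello my name is ElBarto
Console.WriteLine(AddSpaceIfNeeded("IuseC++daily", "C++"));
// prints: Iuse C++ daily
```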
I am using the following regex to tokenize:
reg = new Regex("([ \\t{}%$^&*():;_–`,\\-\\d!\"?\n])");
The regex is supposed to filter out everything later; however, the input string format that I am having a problem with has the following form:
; "string1"; "string2"; "string...n";
As far as I know, the result for the string ; "social life"; "city life"; "real life" should look like the following:
; White " social White life " ; White " city White life " ; White " real White life "
However, the problem is that I get the output in the following form:
; empty White empty " social White life " empty ; empty White empty " city White life " empty ; empty White empty " real White life " empty
White: means White-Space,
empty: means empty entry in the split array.
My code for split is as following:
string[] ret = reg.Split(input);
for (int i = 0; i < ret.Length; i++)
{
if (ret[i] == "")
Response.Write("empty<br>");
else
if (ret[i] == " ")
Response.Write("White<br>");
else
Response.Write(ret[i] + "<br>");
}
Why do I get these empty entries? In particular, when there is a ; followed by a space followed by ", the result looks like the following:
; empty White empty "
Can I get an explanation of why the split adds empty entries, and how to remove them without any additional O(n) pass or another data structure besides ret?
In my experience, splitting at regex matches is almost always not the best idea. You'll get much better results through plain matching.
And regexes are very well suited for tokenization purposes, as they let you implement a state machine really easily, just take a look at that:
\G(?:
(?<string> "(?>[^"\\]+|\\.)*" )
| (?<separator> ; )
| (?<whitespace> \s+ )
| (?<invalid> . )
)
Demo - use this with RegexOptions.IgnorePatternWhitespace of course.
Here, each match will have the following properties:
It will start at the end of the previous match, so there will be no unmatched text
It will contain exactly one matching group
The name of the group tells you the token type
You can ignore the whitespace group, and you should raise an error if you ever encounter a matching invalid group.
The string group will match an entire quoted string, it can handle escapes such as \" inside the string.
The invalid group should always be last in the pattern. You may add rules for other token types.
Some example code:
var regex = new Regex(@"
\G(?:
(?<string> ""(?>[^""\\]+|\\.)*"" )
| (?<separator> ; )
| (?<whitespace> \s+ )
| (?<invalid> . )
)
", RegexOptions.IgnorePatternWhitespace);
var input = "; \"social life\"; \"city life\"; \"real life\"";
var groupNames = regex.GetGroupNames().Skip(1).ToList();
foreach (Match match in regex.Matches(input))
{
var groupName = groupNames.Single(name => match.Groups[name].Success);
var group = match.Groups[groupName];
Console.WriteLine("{0}: {1}", groupName, group.Value);
}
This produces the following:
separator: ;
whitespace:
string: "social life"
separator: ;
whitespace:
string: "city life"
separator: ;
whitespace:
string: "real life"
See how much easier it is to deal with these results rather than using split?
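As for why the split produces empty entries in the first place: Regex.Split returns the text between consecutive matches, and when two delimiters are adjacent (or one sits at the very start or end of the input) that text is the empty string. Because the pattern is wrapped in a capturing group, the delimiters themselves are returned as well. A minimal sketch with a cut-down, hypothetical delimiter set, assuming C# 9 or later for top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Cut-down delimiter set (space, ';' and '"') for illustration only.
string[] parts = Regex.Split("; \"a\"", "([ ;\"])");

foreach (string part in parts)
{
    // Same labelling as in the question.
    Console.WriteLine(part.Length == 0 ? "empty" : part == " " ? "White" : part);
}

// One simple way to drop the empty entries: a single filtering pass.
string[] tokens = parts.Where(p => p.Length > 0).ToArray();
Console.WriteLine(tokens.Length);
```

The filter is an extra pass over the array, which is what the question hoped to avoid; matching tokens directly, as shown above, never produces the empty entries at all.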
I'm trying to parse given string which is kind a of path separated with /. I need to write regex that would match each segment in the path to corresponding regex group.
Example 1:
input:
/EAN/SomeBrand/appliances/refrigerators/RF444
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444
Example 2:
input:
/EAN/SomeBrand/appliances
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:
I tried the following code. It works fine when the path is full (like in the first example) but fails to find the groups when the input string is partial (like in example 2).
static void Main()
{
var pattern = @"^" + @"/EAN"
+ @"/" + @"(?<producer>.+)"
+ @"/" + @"(?<category>.+)"
+ @"/" + @"(?<subcategory>.+)"
+ @"/" + @"(?<product>.+)?"
+ @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
var result = rgx.Match(#"/EAN/SomeBrand/appliances/refrigerators/RF444");
foreach (string groupName in rgx.GetGroupNames())
{
Console.WriteLine(
"Group: {0}, Value: {1}",
groupName,
result.Groups[groupName].Value);
}
Console.ReadLine();
}
Any suggestion is welcome. Unfortunately I cannot simply split the string since the framework I'm using expects regex object.
You can use optional groups (...)? and replace the .+ greedy dot matching patterns with negated character classes [^/]+:
^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
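(The optional parts are the two trailing groups followed by ?, namely (/(?<subcategory>[^/]+))? and (/(?<product>[^/]+))?.)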
See the regex demo
This is how you need to declare your regex in the C# code:
var pattern = @"^" + @"/EAN"
+ @"/(?<producer>[^/]+)"
+ @"/(?<category>[^/]+)"
+ @"(/(?<subcategory>[^/]+))?"
+ @"(/(?<product>[^/]+))?"
+ @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
Note I am using regular capturing groups as optional ones, but the RegexOptions.ExplicitCapture flag turns all non-named capturing groups into non-capturing and thus, they do not appear among the Match.Groups. So, we only have 5 groups all the time even without using non-capturing optional groups (?:...)?.
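A small sketch that runs this pattern against both inputs from the question; it assumes C# 9 or later for top-level statements and skips group 0 (the whole match) when printing. An optional group that did not participate in the match has Success == false and an empty Value.

```csharp
using System;
using System.Text.RegularExpressions;

var rgx = new Regex(
    @"^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$",
    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

string[] paths =
{
    "/EAN/SomeBrand/appliances/refrigerators/RF444",
    "/EAN/SomeBrand/appliances"
};

foreach (string path in paths)
{
    Match result = rgx.Match(path);
    Console.WriteLine(path);
    foreach (string groupName in rgx.GetGroupNames())
    {
        if (groupName == "0") continue; // group 0 is the entire match
        Console.WriteLine("Group: {0}, Value: {1}", groupName, result.Groups[groupName].Value);
    }
}
```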
Try
var pattern = @"^" + @"/EAN"
+ @"(?:/" + @"(?<producer>[^/]+))?"
+ @"(?:/" + @"(?<category>[^/]+))?"
+ @"(?:/" + @"(?<subcategory>[^/]+))?"
+ @"(?:/" + @"(?<product>[^/]+))?";
Note how I replaced the . with [^/], because the / is what separates the segments. Note also the optional quantifier (?) on each sub-part.
I've got a regex...
internal static readonly Regex _parseSelector = new Regex(@"
(?<tag>" + _namePattern + @")?
(?:\.(?<class>" + _namePattern + @"))*
(?:\#(?<id>" + _namePattern + @"))*
(?<attr>\[\s*
(?<name>" + _namePattern + @")\s*
(?:
(?<op>[|*~$!^%<>]?=|[<>])\s*
(?<quote>['""]?)
(?<value>.*?)
(?<!\\)\k<quote>\s*
)?
\])*
(?::(?<pseudo>" + _namePattern + @"))*
", RegexOptions.IgnorePatternWhitespace);
For which I grab the match object...
var m = _parseSelector.Match("tag.class1.class2#id[attr1=val1][attr2=\"val2\"][attr3]:pseudo");
Now is there a way to do something akin to m.Groups["attr"]["name"]? Or somehow get the groups inside the attr group?
Group names aren't nested in regular expressions - it's a flat structure. You can just use this:
m.Groups["name"]
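Since the attr group repeats, m.Groups["name"].Value only returns the last name that was captured; the earlier ones are kept in the group's Captures collection. The sketch below uses a simplified, hypothetical selector pattern (with [\w-]+ standing in for _namePattern, which is not shown in the question) and assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified stand-in for the selector regex; [\w-]+ replaces _namePattern.
var selector = new Regex(@"^(?<tag>[\w-]+)?(?<attr>\[(?<name>[\w-]+)(?:=(?<value>[^\]]*))?\])*$");
Match sel = selector.Match("tag[attr1=val1][attr2=val2][attr3]");

// Value holds only the last capture of a repeated group.
string lastName = sel.Groups["name"].Value;

// Captures holds every capture, in order of appearance.
var attrNames = sel.Groups["name"].Captures.Cast<Capture>().Select(c => c.Value).ToList();
var attrValues = sel.Groups["value"].Captures.Cast<Capture>().Select(c => c.Value).ToList();

Console.WriteLine(lastName);                       // attr3
Console.WriteLine(string.Join(", ", attrNames));   // attr1, attr2, attr3
Console.WriteLine(string.Join(", ", attrValues));  // val1, val2
```

Because the structure is flat, the name and value lists are not paired up automatically (attr3 has no value here); if pairing is needed, the Index of each capture can be compared against the captures of the attr group.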