Regex with optional matching groups

I'm trying to parse a given string, which is a kind of path separated by /. I need to write a regex that matches each segment of the path to a corresponding named group.
Example 1:
input:
/EAN/SomeBrand/appliances/refrigerators/RF444
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444
Example 2:
input:
/EAN/SomeBrand/appliances
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:
I tried the following code. It works fine when the path is complete (as in the first example) but fails to find the groups when the input string is partial (as in example 2).
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var pattern = @"^" + @"/EAN"
            + @"/" + @"(?<producer>.+)"
            + @"/" + @"(?<category>.+)"
            + @"/" + @"(?<subcategory>.+)"
            + @"/" + @"(?<product>.+)?"
            + @"$";
        var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        var result = rgx.Match(@"/EAN/SomeBrand/appliances/refrigerators/RF444");
        foreach (string groupName in rgx.GetGroupNames())
        {
            Console.WriteLine(
                "Group: {0}, Value: {1}",
                groupName,
                result.Groups[groupName].Value);
        }
        Console.ReadLine();
    }
}
Any suggestion is welcome. Unfortunately, I cannot simply split the string, since the framework I'm using expects a Regex object.

You can use optional groups (...)? and replace the .+ greedy dot matching patterns with negated character classes [^/]+:
^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
This is how you need to declare your regex in the C# code:
var pattern = @"^" + @"/EAN"
    + @"/(?<producer>[^/]+)"
    + @"/(?<category>[^/]+)"
    + @"(/(?<subcategory>[^/]+))?"
    + @"(/(?<product>[^/]+))?"
    + @"$";
var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
Note that I am using regular capturing groups for the optional parts, but the RegexOptions.ExplicitCapture flag turns all unnamed capturing groups into non-capturing ones, so they do not appear in Match.Groups. As a result there are always exactly 5 groups (group 0 plus the four named groups), even without writing the optional groups as non-capturing (?:...)?.
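A minimal self-contained sketch that applies the pattern above to both inputs from the question. The class name PathParser and its members Parse and GroupCount are hypothetical, chosen for illustration; GroupCount is there only to show that ExplicitCapture leaves exactly 5 groups.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PathParser
{
    // Optional trailing segments are wrapped in (...)? and every segment uses
    // [^/]+ so it can never swallow a '/' belonging to the next segment.
    private static readonly Regex Rgx = new Regex(
        @"^/EAN"
        + @"/(?<producer>[^/]+)"
        + @"/(?<category>[^/]+)"
        + @"(/(?<subcategory>[^/]+))?"
        + @"(/(?<product>[^/]+))?"
        + @"$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Number of groups the regex defines (group 0 plus the named groups).
    public static int GroupCount => Rgx.GetGroupNumbers().Length;

    // Returns each named segment; a group that did not participate yields "".
    public static Dictionary<string, string> Parse(string path)
    {
        var segments = new Dictionary<string, string>();
        Match m = Rgx.Match(path);
        if (!m.Success)
            return segments;

        foreach (string name in new[] { "producer", "category", "subcategory", "product" })
            segments[name] = m.Groups[name].Value;
        return segments;
    }
}
```

For the partial path, subcategory and product come back as empty strings, which is exactly the output requested in example 2.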

Try
var pattern = @"^" + @"/EAN"
    + @"(?:/" + @"(?<producer>[^/]+))?"
    + @"(?:/" + @"(?<category>[^/]+))?"
    + @"(?:/" + @"(?<subcategory>[^/]+))?"
    + @"(?:/" + @"(?<product>[^/]+))?";
Note how I replaced the . with [^/], because you want the / to act as the separator between segments. Note also the optional quantifier (?) on each sub-part.
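A small sketch of this all-optional variant, checked against the partial path from the question. The class name OptionalSegments and the helper Get are hypothetical. Because every segment is optional, the pattern also succeeds on a bare /EAN with all four groups empty, and because there is no trailing $, anything after the fourth segment is ignored rather than causing a failed match; add $ at the end if that matters for your input.

```csharp
using System.Text.RegularExpressions;

public static class OptionalSegments
{
    // Each "/segment" is wrapped in a non-capturing optional group (?:...)?.
    // There is deliberately no trailing $, matching the answer as written.
    public static readonly Regex Rgx = new Regex(
        @"^/EAN"
        + @"(?:/(?<producer>[^/]+))?"
        + @"(?:/(?<category>[^/]+))?"
        + @"(?:/(?<subcategory>[^/]+))?"
        + @"(?:/(?<product>[^/]+))?",
        RegexOptions.IgnoreCase);

    // Value of one named group; "" when that segment is absent.
    public static string Get(string input, string group) =>
        Rgx.Match(input).Groups[group].Value;
}
```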

Related

Insert space when needed with Regex in c#

I have to write a function that looks for a string and checks whether it is preceded/followed by a blank space, and if not, adds one. Here is my try:
public string AddSpaceIfNeeded(string originalValue, string targetValue)
{
    if (originalValue.Contains(targetValue))
    {
        if (!originalValue.StartsWith(targetValue))
        {
            int targetValueIndex = originalValue.IndexOf(targetValue);
            // The character right before the target is at targetValueIndex - 1.
            if (!char.IsWhiteSpace(originalValue[targetValueIndex - 1]))
                // Strings are immutable: Insert returns a new string.
                originalValue = originalValue.Insert(targetValueIndex, " ");
        }
        if (!originalValue.EndsWith(targetValue))
        {
            int targetValueIndex = originalValue.IndexOf(targetValue);
            // The character right after the target is at targetValueIndex + targetValue.Length.
            int afterIndex = targetValueIndex + targetValue.Length;
            if (!char.IsWhiteSpace(originalValue[afterIndex]) && !originalValue.Substring(afterIndex).StartsWith("(s)"))
                originalValue = originalValue.Insert(afterIndex, " ");
        }
    }
    return originalValue;
}
I want to try it with Regex.
I tried this for adding spaces after the targetValue:
Regex spaceRegex = new Regex("(" + targetValue + ")(?!,)(?!!)(?!(s))(?= )");
originalValue = spaceRegex.Replace(originalValue, (Match m) => m.ToString() + " ");
But it's not working, and I don't really know how to add a space before the word.
Example of adding a space after:
AddSpaceIfNeeded("Hello my nameis ElBarto", "name")
=> Output: "Hello my name is ElBarto"
Example of adding a space before:
AddSpaceIfNeeded("Hello myname is ElBarto", "name")
=> Output: "Hello my name is ElBarto"

You may match your word in all three contexts while capturing them in separate groups, and then test which group matched in the match evaluator:
public static string AddSpaceIfNeeded(string originalValue, string targetValue)
{
    return Regex.Replace(originalValue,
        $@"(?<=\S)({targetValue})(?=\S)|(?<=\S)({targetValue})(?!\S)|(?<!\S){targetValue}(?=\S)", m =>
            m.Groups[1].Success ? $" {targetValue} " :
            m.Groups[2].Success ? $" {targetValue}" :
            $"{targetValue} ");
}
Note that you may need to use Regex.Escape(targetValue) to escape any special characters in the string used as a dynamic pattern.
Pattern details
(?<=\S)({targetValue})(?=\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and followed with a non-whitespace ((?=\S))
| - or
(?<=\S)({targetValue})(?!\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and not followed with a non-whitespace ((?!\S))
| - or
(?<!\S){targetValue}(?=\S) - a targetValue that is not preceded with a non-whitespace ((?<!\S)) and followed with a non-whitespace ((?=\S))
When m.Groups[1].Success is true, the whole value should be enclosed with spaces. When m.Groups[2].Success is true, we need to add a space before the value. Else, we add a space after the value.
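A sketch of the answer's function with Regex.Escape applied to the dynamic part, as the note above suggests. The class name SpaceInserter is hypothetical. Only the pattern is escaped; the replacement strings still use the raw targetValue, which is safe because the string a MatchEvaluator returns is inserted literally (no $ substitutions are processed).

```csharp
using System.Text.RegularExpressions;

public static class SpaceInserter
{
    public static string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        // Escape the target so characters like '.', '(' or '+' match literally.
        string t = Regex.Escape(targetValue);

        // Alternative 1: glued on both sides; 2: glued on the left only;
        // 3: glued on the right only. A word already surrounded by spaces
        // (or string edges) matches none of them and is left unchanged.
        string pattern = $@"(?<=\S)({t})(?=\S)|(?<=\S)({t})(?!\S)|(?<!\S){t}(?=\S)";

        return Regex.Replace(originalValue, pattern, m =>
            m.Groups[1].Success ? $" {targetValue} " :
            m.Groups[2].Success ? $" {targetValue}" :
            $"{targetValue} ");
    }
}
```

Without the escaping, a target such as "1.5" would also match "1x5", because the unescaped dot matches any character.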

How to trim whitespaces inside regex replacement string

I have this regex match string:
public static string RegExMatchString = "(?<NVE>.{20})(?<SN>.{20})(?<REGION>.{4})(?<YY>\\d{4})(?<Mo" +
"n>\\d{2})(?<DD>\\d{1,2})(?<HH>\\d{2})(?<Min>\\d{2})(?<SS>\\d" +
"{2}).{6}(?<USER>.{10})(?<SCANTYPE>.{2})(?<IN>.{4})(?<OU" +
"T>.{4})(?<DISPO>.{2})(?<ROUTE>.{7})(?<LP>.{16})(?<POOL>.{3})" +
"(?<CONT>.{9})(?<REGION_L>.{18})(?<CAT>.{2})";
And I'm replacing it with:
public string RegExReplacementString = "LogBarcodeID ( \"${NVE}\", ID2: \"${SN}\", Scanner: \"${USER}" +
"\", AreaName: \"${REGION_L}${CAT}${SCANTYPE}\", TimeStamp: \"${YY}/${Mon}/${D" +
"D} ${HH}:${Min}:${SS} \") ";
I need to remove all leading and trailing whitespace from these three variables:
${REGION_L}
${CAT}
${SCANTYPE}
How should I change RegExReplacementString (or maybe RegExMatchString) so that this can be achieved?
Sample input is:
0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F
Currently I'm getting the related part as
AreaName: "135 F61" however I need to get AreaName: "135F61"
EDIT:
I'm reading the regex match string from a text file and initializing the regex:
RegExMatchString = File.ReadAllText(regexMatchStringPath);
regex = new Regex( RegExMatchString ,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
);
string replaced = regex.Replace("0034025876080795786104041811071 135 20150304111404 DFRANZ 61 9990020569910 DA ST6007 135 F", RegExReplacementString);

I think the fixed length of each field is what you can use to solve the problem here.
Use a regex like "^(.{20})(.{10})(.{2})(.{2})(.{2})$" to isolate each field.
This is an example with 5 fields whose lengths are known to be 20, 10, 2, 2 and 2.
Then use some LINQ to get a list of (trimmed) fields.
Example:
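using System.Linq;
using System.Text.RegularExpressions;

var testRegex = "^(.{20})(.{10})(.{2})(.{2})(.{2})$";
// The first field must be exactly 20 characters, so "Field of length 20" is padded with two spaces.
var testData = "Field of length 20  FieldLen10123456";
var fields = Regex.Match(testData, testRegex).Groups.Cast<Group>().Skip(1).Select(i => i.Value.Trim());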
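The replacement-string syntax ($1, ${name} and so on) has no way to trim a captured value, so if the code that calls Replace can be changed, another route is a MatchEvaluator that trims the three groups before building the output. This sketch uses a shortened, hypothetical record layout (only REGION_L, CAT and SCANTYPE, with assumed widths of 18, 2 and 2 characters) rather than the full pattern from the question; the class name AreaNameBuilder is also hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AreaNameBuilder
{
    // Hypothetical fixed-width layout for illustration only:
    // REGION_L = 18 chars, CAT = 2 chars, SCANTYPE = 2 chars.
    private static readonly Regex Rgx =
        new Regex(@"^(?<REGION_L>.{18})(?<CAT>.{2})(?<SCANTYPE>.{2})$");

    // A MatchEvaluator builds the output in code, so each padded field
    // can be trimmed before the pieces are concatenated.
    public static string Build(string record) =>
        Rgx.Replace(record, m =>
            "AreaName: \"" +
            m.Groups["REGION_L"].Value.Trim() +
            m.Groups["CAT"].Value.Trim() +
            m.Groups["SCANTYPE"].Value.Trim() +
            "\"");
}
```

Since the question reads the pattern from a file, this only helps if the C# side that performs the replacement is under your control.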

Regular expressions: How to remove all "R.G(*******)" from a string

I have several strings, and I want to remove all "R.G(**)" wrappers from them. For example:
1. Original string:
Push("Command", string.Format(R.G("#{0} this is a string"), accID));
Result:
Push("Command", string.Format("#{0} this is a string", accID));
2. Original string:
Select(Case(T["AccDirect"]).WhenThen(1, R.G("input")).Else(R.G("output")).As("Direct"));
Result:
Select(Case(T["AccDirect"]).WhenThen(1, "input").Else("output").As("Direct"));
3. Original string:
R.G("this is a \"string\"")
Result:
"this is a \"string\""
4. Original string:
R.G("this is a (string)")
Result:
"this is a (string)"
5. Original string:
AppendLine(string.Format(R.G("[{0}] Error:"), str) + R.G("Contains one of these symbols: \\ / : ; * ? \" \' < > | & +"));
Result:
AppendLine(string.Format("[{0}] Error:", str) + "Contains one of these symbols: \\ / : ; * ? \" \' < > | & +");
6. Original string:
R.G(@"this is the ""1st"" string.
this is the (2nd) string.")
Result:
@"this is the ""1st"" string.
this is the (2nd) string."
Please Help.

Use this pattern: capture group 0 is the whole target, and group 1 is what you keep in the replacement.
R[.]G[(]"(.*?[^\\])"[)]
An example acting on your strings #2 and #4, plus a new edge case, R.G("this is a (\"string\")"):
var pattern = @"R[.]G[(]\""(.*?[^\\])\""[)]";
var str = "Select(Case(T[\"AccDirect\"]).WhenThen(1, R.G(\"input\")).Else(R.G(\"output\")).As(\"Direct\"));";
var str2 = "R.G(\"this is a (string)\")";
var str3 = "R.G(\"this is a (\\\"string\\\")\")";
var res = Regex.Replace(str,pattern, "\"$1\"");
var res2 = Regex.Replace(str2,pattern, "\"$1\"");
var res3 = Regex.Replace(str3,pattern, "\"$1\"");
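The same pattern wrapped in a small self-contained helper, so the results for examples 2 and 4 and the escaped-quote edge case can be checked directly. The class name RgStripper is hypothetical. As written, the pattern needs at least one character between the quotes (R.G("") is not matched), and it does not handle verbatim @"..." literals such as example 6, because it expects a quote right after "R.G(".

```csharp
using System.Text.RegularExpressions;

public static class RgStripper
{
    // R.G(" ... ") where the closing quote is not preceded by a backslash.
    // Group 1 is the literal's content, which is re-wrapped in quotes.
    private static readonly Regex Rgx = new Regex(@"R[.]G[(]\""(.*?[^\\])\""[)]");

    public static string Strip(string code) => Rgx.Replace(code, "\"$1\"");
}
```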

Try this:
var result = Regex.Replace(input, @"(.*)R\.G\(([^)]*)\)(.*)", "$1$2$3");
explanation:
(.*) # capture any characters
R\.G\( # then match 'R.G('
([^)]*) # then capture anything that isn't ')'
\) # match end parenthesis
(.*) # and capture any characters after
The $1$2$3 replacement rebuilds the match from capture groups 1, 2 and 3, which effectively removes everything that isn't part of those groups, namely the "R.G(" and its closing ")".
Note that you will run into problems if a line contains more than one R.G(...) (the greedy leading (.*) means only the last one on each line is replaced) or if the string literal itself contains a right parenthesis (as in example 4), but depending on your input data, this may do the trick well enough.
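A sketch that runs this pattern on example 1, example 4 and a one-line version of example 2 shows where it works and where it falls short. The class name GreedyRgStripper is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class GreedyRgStripper
{
    // The leading (.*) is greedy, so it backtracks only as far as the LAST
    // "R.G(" on the line; ([^)]*) then stops at the first ')' after it.
    private const string Pattern = @"(.*)R\.G\(([^)]*)\)(.*)";

    public static string Strip(string input) => Regex.Replace(input, Pattern, "$1$2$3");
}
```

Example 1 comes out as intended, but in the example-2 line only the last R.G(...) is unwrapped, and in example 4 the inner ")" ends the capture early, leaving a misplaced quote.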

C# Regex: Get sub-capture?

I've got a regex...
internal static readonly Regex _parseSelector = new Regex(@"
    (?<tag>" + _namePattern + @")?
    (?:\.(?<class>" + _namePattern + @"))*
    (?:\#(?<id>" + _namePattern + @"))*
    (?<attr>\[\s*
        (?<name>" + _namePattern + @")\s*
        (?:
            (?<op>[|*~$!^%<>]?=|[<>])\s*
            (?<quote>['""]?)
            (?<value>.*?)
            (?<!\\)\k<quote>\s*
        )?
    \])*
    (?::(?<pseudo>" + _namePattern + @"))*
    ", RegexOptions.IgnorePatternWhitespace);
For which I grab the match object...
var m = _parseSelector.Match("tag.class1.class2#id[attr1=val1][attr2=\"val2\"][attr3]:pseudo");
Now is there a way to do something akin to m.Group["attr"]["name"]? Or somehow get the groups inside the attr group?

Group names aren't nested in regular expressions; it's a flat structure. You can just use this:
m.Groups["name"]
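Because the attr group is repeated with *, m.Groups["name"].Value only holds the last capture; every capture is still available in Group.Captures. One way to recover which name belongs to which attr is to check whether a name capture's span lies inside an attr capture's span. This sketch uses a simplified, hypothetical stand-in for the selector regex (only repeated [name=value] blocks, with \w+ in place of _namePattern); the class NestedCaptures and its methods are also hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedCaptures
{
    // Simplified stand-in for the selector regex: repeated [name] or [name=value] blocks.
    private static readonly Regex Rgx =
        new Regex(@"^(?<attr>\[(?<name>\w+)(?:=(?<value>\w+))?\])*$");

    // Group.Value of a repeated group is its last capture only.
    public static string LastName(string input) => Rgx.Match(input).Groups["name"].Value;

    // Pairs each "attr" capture with the "name" capture(s) lying inside its span.
    public static List<string> AttrNames(string input)
    {
        var result = new List<string>();
        Match m = Rgx.Match(input);
        if (!m.Success) return result;

        foreach (Capture attr in m.Groups["attr"].Captures)
        {
            foreach (Capture name in m.Groups["name"].Captures)
            {
                bool inside = name.Index >= attr.Index &&
                              name.Index + name.Length <= attr.Index + attr.Length;
                if (inside)
                    result.Add(name.Value);
            }
        }
        return result;
    }
}
```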

Can I write this regex in one step?

This is the input string: "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with spaces, but only where they are not already surrounded on the left or right side. So the result should look like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job:
Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });
explanation:
reg1 // Match '+' followed by any character not ' ' (whitespace) or same thing for '-'
reg2 // Same thing, except that I match '+' or '-' not preceded by ' ' (whitespace)
delegates 1 and 2 just insert " " after and before m.Value (the match value)
The question is: is there a way to create just one regex and just one delegate, i.e. do this job in one step? I am new to regex and want to learn the efficient way.

I don't see the need for lookarounds or delegates here. Just replace
\s*([-+])\s*
with
" $1 "
(See http://ideone.com/r3Oog.)
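The same one-step replacement as a self-contained C# helper, checked against the input and expected output from the question. The class name OperatorSpacer is hypothetical. Because \s* swallows any whitespace already present, operators that were partly or fully spaced end up with exactly one space on each side.

```csharp
using System.Text.RegularExpressions;

public static class OperatorSpacer
{
    // Consume any whitespace around '+' or '-' and re-emit the operator
    // (group 1) with exactly one space on each side.
    public static string Space(string input) =>
        Regex.Replace(input, @"\s*([-+])\s*", " $1 ");
}
```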

I'd try
Regex.Replace(input, @"\s*[+-]\s*", m => " " + m.ToString().Trim() + " ");
