I've got a problem. I'm using the following regular expression:
Pattern =
    (?'name'\w+(?:\w|\s)*), \s*
    (?'category'\w+(?:\w|\s)*), \s*
    (?:
        \{ \s*
            [yY]: (?'year'\d+), \s*
            [vV]: (?'volume'(?:([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+), \s*
        \} \s*
        ,? \s*
    )*
with the RegexOptions.IgnorePatternWhitespace option.
Everything seemed fine in my application until I debugged it and ran into a problem:
var Year = default(UInt32);
// ...
if ((Match = Regex.Match(Line, Pattern, Options)).Success)
{
    // Getting Product header information
    Name = Match.Groups["name"].Value;
    // Gathering Product statistics
    for (var ix = default(Int32); ix < Match.Groups["year"].Captures.Count; ix++)
    {
        // never get here
        Year = UInt32.Parse(Match.Groups["year"].Captures[ix].Value, NumberType, Culture);
    }
}
In my case the Match is always successful, and I get the proper value for Name, but when it comes to the for loop, program flow just skips it. Debugging shows there are no Captures in Match.Groups["year"], so the behavior is logical, but it isn't obvious to me where I'm going wrong. Help!
This is connected to a previous post of mine, "Extract number values enclosed inside curly brackets".
Thanks!
EDIT: Input sample
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
I need to capture 2008, 5528.35, 2009, 8653.89, 2010, 4290.51 values and operate with them as named groups.
2nd EDIT
I tried using the ExplicitCapture option and the following expression:
(?<name>\w+(w\| )*), (?<category>\w+(w\| )*), (\{[yY]:(?<year>\d+), *[vV]:(?<volume>(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+)\}(, )?)+
But that didn't help.
Edit: You could simplify by matching everything up to the next comma with [^,]*. Here's a full code snippet that matches your source data:
// Note: because [^,]* stops only at a comma, 'volume' also picks up the closing "}".
var testRegex = new Regex(@"
    (?'name'[^,]*),\s*
    (?'category'[^,]*),\s*
    ({y:(?'year'[^,]*),\s*
    V:(?'volume'[^,]*),?\s*)*",
    RegexOptions.IgnorePatternWhitespace);
var testMatches = testRegex.Matches(
    "Sherwood, reciev, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}");
foreach (Match testMatch in testMatches)
{
    Console.WriteLine("Name = {0}", testMatch.Groups["name"].Value);
    foreach (var capture in testMatch.Groups["year"].Captures)
        Console.WriteLine(" Year = {0}", capture);
}
This prints:
Name = Sherwood
Year = 2008
Year = 2009
Year = 2010
I think the problem is a comma:
, \s* \}
which should be optional (or omitted?):
,? \s* \}
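A minimal runnable sketch of that fix, assuming it is the only change needed: the question's pattern with the comma before \} dropped, applied to the sample input. Because the {...} block is a repeated group, every year and volume lands in Group.Captures, while Group.Value would only hold the last one. Parsing with the invariant culture is an assumption standing in for the question's NumberType and Culture.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// The question's pattern with the comma between the volume and "\}" removed (assumed fix);
// the trailing ",? \s*" still allows an optional comma between the {...} blocks.
var pattern = @"
    (?'name'\w+(?:\w|\s)*), \s*
    (?'category'\w+(?:\w|\s)*), \s*
    (?:
        \{ \s*
            [yY]: (?'year'\d+), \s*
            [vV]: (?'volume'(?:([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+) \s*
        \} \s*
        ,? \s*
    )*";

var line = "Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}";
Match match = Regex.Match(line, pattern, RegexOptions.IgnorePatternWhitespace);

// Each iteration of the repeated (?: ... )* group is recorded in Group.Captures.
uint[] years = match.Groups["year"].Captures.Cast<Capture>()
    .Select(c => uint.Parse(c.Value, CultureInfo.InvariantCulture))
    .ToArray();
double[] volumes = match.Groups["volume"].Captures.Cast<Capture>()
    .Select(c => double.Parse(c.Value, CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine("Name = {0}", match.Groups["name"].Value);
for (int i = 0; i < years.Length; i++)
    Console.WriteLine("  {0}: {1}", years[i], volumes[i].ToString(CultureInfo.InvariantCulture));
```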
To expound on what MRAB said:
(?'name'
    \w+
    (?:
        \w|\s
    )*
),
\s*
(?'category'
    \w+
    (?:
        \w|\s
    )*
),
\s*
(?:
    \{
    \s*
    [yY]:
    (?'year'
        \d+
    ),
    \s*
    [vV]:
    (?'volume'
        (?:
            (                  # Why do you need capturing parentheses here?
                [1-9][0-9]*
                \.?
                [0-9]*
            )
          |
            (
                \.[0-9]+
            )
        )+
    ),                         # I'm just guessing this comma doesn't match the input samples
    \s*
    \}
    \s*
    ,?
    \s*
)*
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
I'm working with a pipe-delimited file. Currently I use Notepad++ find-and-replace with the regex ^(?:[^|]*\|){5}\K[^|]*, which replaces the text between the 5th and 6th | on every line with an empty string. I'm trying to do this programmatically, but .NET does not support \K. I've tried a few variations of lookbehind, but I can't seem to grasp it.
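string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
    string line2 = null;
    string finalLine = line;
    string[] col = line.Split('|');
    // Split never returns null elements; check the length so col[5] can't go out of range
    if (col.Length > 5)
    {
        // \K is not supported by .NET, so this line throws an ArgumentException
        line2 = Regex.Replace(line, @"^(?:[^|]*\|){5}\K[^|]*", "");
        // ...
    }
}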
\K is a workaround for regex engines whose look-behind assertions are restricted (typically to fixed-length patterns), so a variable-length prefix such as ^(?:[^|]*\|){5} can't be put inside one.
.NET's regex engine supports variable-length look-behind assertions (syntax (?<=subexpression)), so use one instead:
Regex.Replace(line, @"(?<=^(?:[^|]*\|){5})[^|]*", "")
In the context of .NET, this pattern now describes:
(?<= # begin (positive) look-behind assertion
^ # match start of string
(?: # begin non-capturing group
[^|]*\| # match (optional) field value + delimiter
){5} # end of group, repeat 5 times
) # end of look-behind assertion
[^|]* # match any non-delimiters (will only occur where the lookbehind is satisfied)
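A small runnable sketch of the look-behind approach. The sample lines are hypothetical stand-ins for File.ReadAllLines(path), and the Regex is built once and reused for every line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Matches the 6th field: the look-behind requires exactly five "field|" pairs
// between the start of the string and the current position.
var sixthField = new Regex(@"(?<=^(?:[^|]*\|){5})[^|]*");

// Hypothetical sample lines; in the question they come from File.ReadAllLines(path).
string[] lines =
{
    "a|b|c|d|e|DROP|g",
    "1|2|3|4|5|x",
    "too|short"
};

string[] cleaned = lines.Select(s => sixthField.Replace(s, "")).ToArray();
foreach (string row in cleaned)
    Console.WriteLine(row);
```

Lines with fewer than five | are left unchanged, because the look-behind can never be satisfied on them.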
There's no need for lookbehinds; use a capturing group and write it back with the $1 substitution:
line2 = Regex.Replace(line, @"^((?:[^|]*\|){5})[^|]*", "$1");
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?: group, but do not capture (5 times):
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
){5} end of grouping
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more times
(matching the most amount possible))
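For comparison, a sketch that runs both versions on the same hypothetical line; the capture-group version needs no look-behind at all.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 captures the first five "field|" pairs; "$1" writes them back,
// so only the 6th field (matched by the trailing [^|]*) is removed.
var keepPrefix = new Regex(@"^((?:[^|]*\|){5})[^|]*");
var viaLookbehind = new Regex(@"(?<=^(?:[^|]*\|){5})[^|]*");

string line = "a|b|c|d|e|DROP|g";   // hypothetical sample line
string withGroup = keepPrefix.Replace(line, "$1");
string withLookbehind = viaLookbehind.Replace(line, "");

Console.WriteLine(withGroup);                    // prints a|b|c|d|e||g
Console.WriteLine(withGroup == withLookbehind);  // prints True
```

If you ever need literal digits right after the group in the replacement, write ${1} instead of $1 so the group number is unambiguous.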
I want a regular expression that matches anything as a parameter in the string concat(1st,2nd) and extracts three groups as below:
Group1: concat
Group2: 1st
Group3: 2nd
I tried ^\s*(concat)\(\s*(.*?)\s*\,\s*(.*)\)\s*$, and it worked fine until a parameter contained a comma, like this:
concat(regex(3,4),regex(3,4)). It seems the comma breaks it. How can I ignore the parameter's content and capture each parameter as a separate group?
You may use
^\s*(concat)\(\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\s*,\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\)\s*$
Details
^ - start of string
\s* - 0+ whitespaces
(concat) - Group 1: concat word
\( - a ( char
\s* - 0+ whitespaces
({arg}) - Group 2: arg pattern:
\w* - 0+ word chars
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - (, then any amount of nested parentheses or chars other than ( and ) and then )
|\w+ - or just 1+ word chars
\s*,\s* - a comma enclosed with 0+ whitespaces
({arg}) - Group 3: arg pattern
\) - a ) char
\s* - 0+ whitespaces
$ - end of string.
C# example:
var arg = @"(?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+)";
var pattern = $@"^\s*(concat)\(\s*({arg})\s*,\s*({arg})\)\s*$";
var match = Regex.Match("concat(regex(3,4),regex(3,4))", pattern);
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    Console.WriteLine(match.Groups[2].Value);
    Console.WriteLine(match.Groups[3].Value);
}
// => concat, regex(3,4), regex(3,4) (one per line)
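A self-contained sketch that wraps the same pattern in a small helper (Parse is a hypothetical name, not part of the answer) and tries it on a deeper nesting level and on an input with a third argument, which should be rejected.

```csharp
using System;
using System.Text.RegularExpressions;

// Same argument sub-pattern as above: name(...) with balanced parentheses, tracked by
// the balancing group "o", or else a plain run of word characters.
const string Arg = @"(?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+)";
var concat = new Regex($@"^\s*(concat)\(\s*({Arg})\s*,\s*({Arg})\)\s*$");

// Hypothetical helper: returns the three groups, or null when the input is not
// a two-argument concat(...) call.
(string Func, string First, string Second)? Parse(string input)
{
    Match m = concat.Match(input);
    if (!m.Success)
        return null;
    return (m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
}

foreach (var s in new[] { "concat(regex(3,4),regex(3,4))", "concat(f(g(1,2)), x)", "concat(a,b,c)" })
    Console.WriteLine("{0} -> {1}", s, Parse(s)?.ToString() ?? "no match");
```

The atomic groups (?>...) keep an argument from being split again once it has matched, which is why the three-argument input fails instead of being forced into two groups.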
Here is my input:
#
tag1, tag with space, !##%^, 🦄
I would like to match it with a regex and yield the following elements easily:
tag1
tag with space
!##%^
🦄
I know I could do it this way:
var match = Regex.Match(input, @"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());
But that's cheating, as it involves messing around with C#. There must be a neat way to do this with a regex. Just must be... right? ;D
The question is: how to write a regular expression that would allow me to iterate through captures and extract tags, without the need of splitting and trimming?
This works: (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses .NET's CaptureCollection to collect a variable number of fields in a single record.
You could extend the regex further to get all records at once, where each record contains its own variable number of fields.
The regex also trims the fields.
Expanded:
(?ms)                  # Inline modifiers: multi-line, dot-all
^ \# \s+               # Beginning of record
(?:                    # Quantified group, 1 or more times, get all fields of record at once
    \s*                # Trim leading wsp
    (                  # (1 start), Capture collector for variable fields
        (?:            # One char at a time, but not comma or begin of record
            (?!
                ,
              | ^ \# \s+
            )
            .
        )*?
    )                  # (1 end)
    \s*
    (?: , | $ )        # End of this field, comma or EOL
)+
C# code:
string sOL = @"
#
tag1, tag with space, !##%^, 🦄";
Regex RxOL = new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
    CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
    Console.WriteLine("-------------------------");
    for (int i = 0; i < ccOL1.Count; i++)
        Console.WriteLine(" '{0}'", ccOL1[i].Value);
    _mOL = _mOL.NextMatch();
}
Output:
-------------------------
'tag1'
'tag with space'
'!##%^'
'??'
''
Press any key to continue . . .
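The output above ends with an empty '' capture, produced by one extra, empty iteration of the quantified group at the end of the string. A possible variant (an assumption, not part of the answer above) requires at least one character per tag, so the group stops after the last tag; trimming still happens because the lazy [^,\r\n]+? leaves trailing spaces to \s*.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "\U0001F984" is the unicorn emoji from the question.
string input = "#\ntag1, tag with space, !##%^, \U0001F984";

// Assumed variant: each tag needs at least one character, so the repeated group
// cannot record a trailing empty capture at the end of the line.
var rx = new Regex(@"(?m)^\#\s+(?:(?<tag>[^,\r\n]+?)\s*(?:,\s*|$))+");

Match match = rx.Match(input);
string[] tags = match.Groups["tag"].Captures.Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

foreach (string tag in tags)
    Console.WriteLine("'{0}'", tag);
```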
Nothing wrong with cheating ;]
string input = @"#
tag1, tag with space, !##%^, 🦄";
string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
You can do it without a regex altogether. Just split it like this:
var result = input.Split(new []{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries).Skip(1).SelectMany(x=> x.Split(new []{','},StringSplitOptions.RemoveEmptyEntries).Select(y=> y.Trim()));
I'm working with the MPXJ library. I want to get the predecessor IDs from the string below, but it's complex for me. Please help me get all the predecessor IDs. Thanks.
Task predecessor string:
Task Predecessors:[[Relation [Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]],
[Relation [Task id=12 uniqueID=145 name=Alibaba3] -> [Task id=11 uniqueID=144 name=Alibaba4]], [Relation [Task id=12 uniqueID=145 name=Alibaba5] -> [Task id=9 uniqueID=142 name=Alibaba6]]]
I need to get the predecessor IDs: 10, 11, 9
Pattern:
[Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]]
To grab those IDs, look for the Task id that comes after ->. You can try the following, using the Matches method:
Regex rgx = new Regex(@"->\s*\[Task\s*id=(\d+)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[1].Value);
Explanation:
-> # '->'
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
\[ # '['
Task # 'Task'
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
id= # 'id='
( # group and capture to \1:
\d+ # digits (0-9) (1 or more times)
) # end of \1
No need for capture groups. My original answer used them, but you can use this regex instead:
(?<=-> \[Task id=)\d+
Output:
10
11
9
The (?<=-> \[Task id=) lookbehind ensures that the match is preceded by everything from the arrow to the equals sign.
\d+ matches the id.
This C# code adds all the IDs to resultList:
// s1 is the input string; resultList is a List<string>
var myRegex = new Regex(@"(?<=-> \[Task id=)\d+");
Match matchResult = myRegex.Match(s1);
while (matchResult.Success)
{
    resultList.Add(matchResult.Value);
    Console.WriteLine(matchResult.Value);
    matchResult = matchResult.NextMatch();
}
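A self-contained version of the same idea using Regex.Matches and LINQ, with the input string from the question. Relaxing the single spaces to \s* and \s+ inside the look-behind is an assumption about how strictly MPXJ formats the string; .NET accepts variable-length look-behinds, so the pattern is legal.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input =
    "Task Predecessors:[[Relation [Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]], " +
    "[Relation [Task id=12 uniqueID=145 name=Alibaba3] -> [Task id=11 uniqueID=144 name=Alibaba4]], " +
    "[Relation [Task id=12 uniqueID=145 name=Alibaba5] -> [Task id=9 uniqueID=142 name=Alibaba6]]]";

// Only ids that follow "->" are predecessors; the ids after "[Relation" are skipped
// because the look-behind requires the arrow.
int[] predecessorIds = Regex.Matches(input, @"(?<=->\s*\[Task\s+id=)\d+")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToArray();

Console.WriteLine(string.Join(", ", predecessorIds)); // prints 10, 11, 9
```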
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
Part of my application needs to find occurrences of "1p/sec", "22p/sec" or "22p/ sec" (i.e. [00-99]p/sec, and also [00-99]p/ sec) in a string.
So far I can only get the first occurrence, and only if it is a single digit (like the one in the string below). I need to get all n occurrences.
Could someone please provide some guidance?
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"(\d)[p/sec]",
    RegexOptions.IgnorePatternWhitespace);
// input.IndexOf("p/sec");
// Here we check the Match instance.
if (match.Success)
{
    // Finally, we get the Group value and display it.
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
You need to quantify your \d in the regex, for example by adding a + quantifier, which will then cause \d+ to match at least one but possibly more digits. To restrict to a specific number of digits, you can use the {n,m} quantifier, e.g. \d{1,2} which will then match either one or two digits.
Note also that [p/sec], as you use it in the regex, is a character class matching a single character from the set { c, e, p, s, / }, which is probably not what you want, since you want to match p/sec literally.
A more robust option would probably be the following:
(\d+)\s*p\s*/\s*sec
which a) matches p/sec literally and b) allows for whitespace between the number and the unit, as well as around the /.
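To get every occurrence rather than only the first, the pattern can be combined with Regex.Matches. A sketch using the question's input string:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";

// Regex.Matches scans the whole string and returns every non-overlapping match,
// whereas Regex.Match stops at the first one.
int[] rates = Regex.Matches(input, @"(\d+)\s*p\s*/\s*sec")
    .Cast<Match>()
    .Select(m => int.Parse(m.Groups[1].Value))
    .ToArray();

Console.WriteLine(string.Join(", ", rates)); // prints 1, 11
```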
Use
Match match = Regex.Match(input, @"(\d{1,2})p/sec" ...
instead.
\d matches a single digit. If you append {1,2} to it, you match one or two digits instead. \d* would match zero or more digits, \d+ one or more, and \d{1,10} between 1 and 10.
If you need to know whether it was surrounded by brackets, you could do
Match match = Regex.Match(input, @"((\[\d{1,2}\])|(\d{1,2}))p/sec"
...
bool hasBrackets = match.Groups[1].Value[0] == '[';
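A runnable sketch of the bracket check. HasBrackets is a hypothetical helper; the brackets are escaped as \[ and \] so they are matched literally, since an unescaped [...] would be a character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 is either "[nn]" (escaped brackets, matched literally) or a bare "nn".
var rate = new Regex(@"(\[\d{1,2}\]|\d{1,2})p/sec");

// Hypothetical helper: true/false for a bracketed/bare rate, null if there is no rate.
bool? HasBrackets(string text)
{
    Match match = rate.Match(text);
    if (!match.Success)
        return null;
    return match.Groups[1].Value[0] == '[';
}

Console.WriteLine(HasBrackets("rate [12]p/sec")); // prints True
Console.WriteLine(HasBrackets("rate 12p/sec"));   // prints False
```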
How about this regex:
^\[?\d\d?\]?p/\s*sec$
Explanation:
The regular expression:
^\[?\d\d?\]?p/\s*sec$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\[? '[' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
\]? ']' (optional (matching the most amount
possible))
----------------------------------------------------------------------
p/ 'p/'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
sec 'sec'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
Because of the ^ and $ anchors, this only matches when the whole string is the rate (for example "[22]p/ sec"); drop them to search inside a longer string.
If I understand you correctly, you want to find all occurrences. Use a MatchCollection:
string input = "US Canada calling # 1p/sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Regex.Matches returns every match, not just the first one.
Regex regex = new Regex(@"(\d{1,2})p/sec");
MatchCollection matchCollection = regex.Matches(input);
// Iterate over each Match instance.
foreach (Match match in matchCollection)
{
    Console.WriteLine(match.Groups[0].Value);
}
You can do this:
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). 91p/ sec , 123p/ sec Validity : 30 Days.";
MatchCollection matches = Regex.Matches(input, @"\b(\d{1,2})p/\s?sec");
foreach (Match m in matches) {
string key = m.Groups[1].Value;
Console.WriteLine(key);
}
Output:
1
11
91
\b is a word boundary, to anchor the match to the start of a "word" (notice that it will not match the "123p/ sec" in the test string!)
\d{1,2} Will match one or two digits. See Quantifiers
p/\s?sec matches a literal "p/sec" with an optional whitespace before "sec"
Regex:
((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)
In C#, loop over the captures of the price group:
if (match.Success)
{
    for (int i = 0; i < match.Groups["price"].Captures.Count; i++)
    {
        string key = match.Groups["price"].Captures[i].Value;
        Console.WriteLine(key);
    }
}
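Note that price is not inside a repeated group, so a single Match holds at most one capture of it; finding every occurrence in a string means iterating Regex.Matches. A sketch with a hypothetical input string:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "Calls 1p/sec, promo [00-99]p/sec, roaming 22p/sec";  // hypothetical sample
var price = new Regex(@"((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)");

// Both alternatives fill the same named group "price"; each occurrence in the
// string is a separate Match returned by Regex.Matches.
string[] prices = price.Matches(input).Cast<Match>()
    .Select(m => m.Groups["price"].Value)
    .ToArray();

foreach (string p in prices)
    Console.WriteLine(p); // prints 1, 00-99 and 22 on separate lines
```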