How to get predecessor id from task predecessor string? - c#

I'm working with the MPXJ library. I want to get the predecessor IDs from the string below, but it's too complex for me. Please help me extract all of the predecessor IDs. Thanks.
Task predecessor string:
Task Predecessors:[[Relation [Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]],
[Relation [Task id=12 uniqueID=145 name=Alibaba3] -> [Task id=11 uniqueID=144 name=Alibaba4]], [Relation [Task id=12 uniqueID=145 name=Alibaba5] -> [Task id=9 uniqueID=142 name=Alibaba6]]]
I need to get the predecessor IDs: 10, 11, 9
Pattern:
[Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]]

To grab those IDs, you need to look for the Task id that follows each ->. You can try the following, using the Matches method.
Regex rgx = new Regex(@"->\s*\[Task\s*id=(\d+)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[1].Value);
Explanation:
-> # '->'
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
\[ # '['
Task # 'Task'
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
id= # 'id='
( # group and capture to \1:
\d+ # digits (0-9) (1 or more times)
) # end of \1
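As a minimal, self-contained sketch of the capture-group approach described above: the class and method names (`PredecessorIds`, `Extract`) are made up for illustration, and the input is the sample string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PredecessorIds
{
    // Matches "-> [Task id=<digits>" and captures the digits in group 1.
    private static readonly Regex Arrow = new Regex(@"->\s*\[Task\s*id=(\d+)");

    // Returns the id of every task that appears on the right-hand side of "->".
    public static List<int> Extract(string input)
    {
        var ids = new List<int>();
        foreach (Match m in Arrow.Matches(input))
            ids.Add(int.Parse(m.Groups[1].Value));
        return ids;
    }

    public static void Main()
    {
        // Sample predecessor string taken from the question.
        string input =
            "Task Predecessors:[[Relation [Task id=12 uniqueID=145 name=Alibaba1] -> [Task id=10 uniqueID=143 name=Alibaba2]], " +
            "[Relation [Task id=12 uniqueID=145 name=Alibaba3] -> [Task id=11 uniqueID=144 name=Alibaba4]], " +
            "[Relation [Task id=12 uniqueID=145 name=Alibaba5] -> [Task id=9 uniqueID=142 name=Alibaba6]]]";

        Console.WriteLine(string.Join(", ", Extract(input))); // 10, 11, 9
    }
}
```

Only the task after the arrow is captured, so the repeated `id=12` on the left-hand side of each relation is ignored.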

No need for capture groups. See full C# online demo
My original answer used capture groups. But we don't need them.
You can use this regex:
(?<=-> \[Task id=)\d+
See the output at the very bottom of this C# online demo:
10
11
9
The (?<=-> \[Task id=) lookbehind ensures that we are preceded by the section from the arrow to the equal sign
\d+ matches the id
This C# code adds all the IDs to resultList (s1 is the input string):
var resultList = new List<string>();
var myRegex = new Regex(@"(?<=-> \[Task id=)\d+");
Match matchResult = myRegex.Match(s1);
while (matchResult.Success) {
resultList.Add(matchResult.Value);
Console.WriteLine(matchResult.Value);
matchResult = matchResult.NextMatch();
}
Original Version with Capture Groups
To give you a second option, here is my original demo using a capture group.
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind

Related

Converting Notepad++ REGEX to .NET C#

I'm working with a pipe-delimited file. Currently, I use the Notepad++ find-and-replace regex pattern ^(?:[^|]*\|){5}\K[^|]* to replace the field between the 5th and 6th | on every line with an empty string. I'm trying to do this programmatically, but .NET does not support \K. I've tried a few variants with a lookbehind, but I cannot seem to grasp it.
string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
string line2 = null;
string finalLine = line;
string[] col = line.Split('|');
if (col[5] != null)
{
line2 = Regex.Replace(line, @"^(?:[^|]*\|){5}\K[^|]*", "");
\K resets the start of the reported match, so everything matched before it is left out of the result. It is mainly a "workaround" for regex engines that don't support variable-length look-behind assertions.
.NET's regex grammar supports look-behind assertions of arbitrary length (using the syntax (?<=subexpression)), so use one:
Regex.Replace(line, @"(?<=^(?:[^|]*\|){5})[^|]*", "")
In the context of .NET, this pattern now describes:
(?<= # begin (positive) look-behind assertion
^ # match start of string
(?: # begin non-capturing group
[^|]*\| # match (optional) field value + delimiter
){5} # end of group, repeat 5 times
) # end of look-behind assertion
[^|]* # match any non-delimiters (will only occur where the lookbehind is satisfied)
There is no need for lookbehinds; use a capturing group and a backreference in the replacement instead:
line2 = Regex.Replace(line, @"^((?:[^|]*\|){5})[^|]*", "$1");
See proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?: group, but do not capture (5 times):
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
){5} end of grouping
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more times
(matching the most amount possible))
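The two answers above can be compared side by side. The following is only a sketch: the class name `FieldBlanker`, its method names, and the sample line are invented for illustration, while both patterns are the ones given in the answers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldBlanker
{
    // Look-behind version: only the 6th field is part of the match.
    public static string WithLookbehind(string line) =>
        Regex.Replace(line, @"(?<=^(?:[^|]*\|){5})[^|]*", "");

    // Capture-group version: the first five fields are matched too,
    // then put back through the $1 replacement reference.
    public static string WithCaptureGroup(string line) =>
        Regex.Replace(line, @"^((?:[^|]*\|){5})[^|]*", "$1");

    public static void Main()
    {
        string line = "a|b|c|d|e|f|g"; // hypothetical sample row

        Console.WriteLine(WithLookbehind(line));   // a|b|c|d|e||g
        Console.WriteLine(WithCaptureGroup(line)); // a|b|c|d|e||g
    }
}
```

Lines with fewer than six fields do not match either pattern and are returned unchanged.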

How to capture groups

Using C# and the .NET regex engine, I have an input line like this, terminated by \n:
1ROSS/SVETA/JAMIE MRS T02XT 2WHITE/VIKA MS 3GREEN/ANDYMR
I have to obtain
First capture
1. num=1
2. surname=ROSS
3. name=SVETA
4. name=JAMIE
5. title=MRS
6. other=T02XT
Second capture
1. num=2
2. surname=WHITE
3. name=VIKA
4. title=MS
Third capture
1. num=3
2. surname=GREEN
3. name=ANDY
4. title=MR
The first group has two names, and there is no space between ANDY and MR in the third group. I am unable to solve this problem. I started using
(^\d|\s\d)
to detect the groups and it works, but after I do not know how to capture till the end of each group and split into subgroups the inside data.
If the title values are set to MR, MRS or MS, you may use
\b(?<num>\d)(?<surname>\p{L}+)(?:/(?<name>\p{L}+?))+(?:\s*(?<title>M(?:RS?|S)))?\b\s*(?<other>.*?)(?=\b\d\p{L}+/\p{L}|$)
See the regex demo
Details
\b - word boundary
(?<num>\d) - Group "num": a digit (replace with \d+ if there can be more than 1)
(?<surname>\p{L}+) - Group "surname": 1+ letters
(?:/(?<name>\p{L}+?))+ - one or more sequences of / followed with Group "name": 1+ letters, as few as possible
(?:\s*(?<title>M(?:RS?|S)))? - an optional sequence of
\s* - 0+ whitespaces
(?<title>M(?:RS?|S)) - Group "title": M followed with R and optional S or followed with S
\b - word boundary
\s* - 0+ whitespaces
(?<other>.*?) - Group "other": 0 or more chars, as few as possible
(?=\b\d\p{L}+/\p{L}|$) - up to the first occurrence of the initial pattern (word boundary, digit, 1+ letters, / and a letter) or end of string.
C# demo:
var text = "1ROSS/SVETA/JAMIE MRS T02XT 2WHITE/VIKA MS 3GREEN/ANDYMR";
var pattern = @"\b(?<num>\d)(?<surname>\p{L}+)(?:/(?<name>\p{L}+?))+(?:\s*(?<title>M(?:RS?|S)))?\b\s*(?<other>.*?)(?=\b\d\p{L}+/\p{L}|$)";
var result = Regex.Matches(text, pattern);
foreach (Match m in result) {
Console.WriteLine("Num: {0}", m.Groups["num"].Value);
Console.WriteLine("Surname: {0}", m.Groups["surname"].Value);
Console.WriteLine("Names: {0}", string.Join(", ", m.Groups["name"].Captures.Cast<Capture>().Select(x => x.Value)));
Console.WriteLine("Title: {0}", m.Groups["title"].Value);
Console.WriteLine("Other: {0}", m.Groups["other"].Value);
Console.WriteLine("===== NEXT MATCH ======");
}
Output:
Num: 1
Surname: ROSS
Names: SVETA, JAMIE
Title: MRS
Other: T02XT
===== NEXT MATCH ======
Num: 2
Surname: WHITE
Names: VIKA
Title: MS
Other:
===== NEXT MATCH ======
Num: 3
Surname: GREEN
Names: ANDY
Title: MR
Other:
===== NEXT MATCH ======

Regex with balancing groups

I need to write a regex that captures the generic arguments (which can themselves be generic) of a type name in a special notation like this:
System.Action[Int32,Dictionary[Int32,Int32],Int32]
Let's assume the type name is [\w.]+ and a parameter is [\w.,\[\]]+,
so I need to grab only Int32, Dictionary[Int32,Int32] and Int32.
Basically, I need to take something only when the balancing group stack is empty, but I don't really understand how.
UPD
The answer below helped me solve the problem fast (but without proper validation and with depth limitation = 1), but I've managed to do it with group balancing:
^[\w.]+ #Type name
\[(?<delim>) #Opening bracket and first delimiter
[\w.]+ #Minimal content
(
[\w.]+
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))* #Cutting param if balanced before comma and placing delimiter
((?<open>\[))* #Counting [
((?<-open>\]))* #Counting ]
)*
(?(open)|(?<param-delim>))\] #Cutting last param if balanced
(?(open)(?!) #Checking balance
)$
Demo
UPD2 (Last optimization)
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
(?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
(?:(?<open>\[)[\w.]+)?
(?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
I suggest capturing those values using
\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*
See the regex demo.
Details:
\w+(?:\.\w+)* - match 1+ word chars followed with 0 or more sequences of . + 1+ word chars
\[ - a literal [
(?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
,? - an optional comma
(?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
\w+ - one or more word chars (perhaps, you would like [\w.]+)
(?:\[[^][]*])? - 1 or 0 (change ? to * to match 0 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].
A C# demo below:
var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = @"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
var result = Regex.Matches(line, pattern)
.Cast<Match>()
.SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
.Select(t => t.Value))
.ToList();
foreach (var s in result) // DEMO
Console.WriteLine(s);
UPDATE: To account for unknown depth [...] substrings, use
\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*
See the regex demo
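To see how the unknown-depth pattern behaves, here is a small sketch that collects the captures of the "res" group. The class name `GenericArgs`, the method name `Extract`, and the second, more deeply nested input are assumptions made for illustration; the pattern itself is the one from the update above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GenericArgs
{
    // Group "o" works as a stack: each nested "[" pushes, each "]" pops,
    // and (?(o)(?!)) fails the match if anything is left unbalanced.
    private static readonly Regex Pattern = new Regex(
        @"\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*");

    // Returns the top-level generic arguments of the type name.
    public static List<string> Extract(string typeName)
    {
        var args = new List<string>();
        Match m = Pattern.Match(typeName);
        if (m.Success)
            foreach (Capture c in m.Groups["res"].Captures)
                args.Add(c.Value);
        return args;
    }

    public static void Main()
    {
        // Input from the question, followed by a hypothetical deeper nesting.
        foreach (var arg in Extract("System.Action[Int32,Dictionary[Int32,Int32],Int32]"))
            Console.WriteLine(arg);
        foreach (var arg in Extract("System.Func[Int32,Dictionary[Int32,List[String]],Boolean]"))
            Console.WriteLine(arg);
    }
}
```

Because "res" sits inside a repeated group, its values are read from `Captures` rather than from `Value`, which would only hold the last argument.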

Regular expression [0-99]p/sec

Part of my application needs to find occurrences of "1p/sec", "22p/sec" or "22p/ sec" (that is, [00-99]p/sec and also [00-99]p/ sec) in a string.
So far I am only able to get the first occurrence, and only when it is a single digit, as in the string below. I need to be able to get any number of occurrences.
Could someone please provide guidance?
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"(\d)[p/sec]",
RegexOptions.IgnorePatternWhitespace);
// input.IndexOf("p/sec");
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
You need to quantify your \d in the regex, for example by adding a + quantifier, which will then cause \d+ to match at least one but possibly more digits. To restrict to a specific number of digits, you can use the {n,m} quantifier, e.g. \d{1,2} which will then match either one or two digits.
Note also that [p/sec] as you use it in the regex is a character class, matching a single character from the set { c, e, p, s, / }, which is probably not what you want because you'd want to match the p/sec literally.
A more robust option would probably be the following
(\d+)\s*p\s*/\s*sec
which matches p/sec literally and also allows for whitespace between the number and the unit as well as around the /.
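A short sketch of that more robust pattern applied with Regex.Matches, so that every occurrence is returned instead of only the first one. The class and method names (`RateFinder`, `FindRates`) are illustrative; the input is the string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RateFinder
{
    // Collects the numeric part of every "<n>p/sec" occurrence,
    // tolerating whitespace around the "p" and the "/".
    public static List<int> FindRates(string input)
    {
        var rates = new List<int>();
        foreach (Match m in Regex.Matches(input, @"(\d+)\s*p\s*/\s*sec"))
            rates.Add(int.Parse(m.Groups[1].Value));
        return rates;
    }

    public static void Main()
    {
        string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
        Console.WriteLine(string.Join(", ", FindRates(input))); // 1, 11
    }
}
```

Replace `\d+` with `\d{1,2}` if the number must be limited to one or two digits.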
Use
Match match = Regex.Match(input, @"(\d{1,2})p/sec" ...
instead.
\d matches a single digit. If you append {1,2} to it, you match one or two digits instead. \d* would match zero or more and \d+ would match one or more. \d{1,10} would match 1-10 digits.
If you need to know if it was surrounded by brackets or not you could do
Match match = Regex.Match(input, @"((\[\d{1,2}\])|(\d{1,2}))p/sec"
...
bool hasBrackets = match.Groups[1].Value[0] == '[';
How about this regex:
^\[?\d\d?\]?p/\s*sec$
Note that ^ and $ anchor the pattern to the whole string, so it validates a string that contains nothing but the rate; remove them to search inside longer text.
Explanation:
The regular expression:
^\[?\d\d?\]?p/\s*sec$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\[? '[' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
\]? ']' (optional (matching the most amount
possible))
----------------------------------------------------------------------
p 'p'
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
sec 'sec'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
If I understand you correctly, you want to find all occurrences. Use a MatchCollection:
string input = "US Canada calling # 1p/sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Matches to collect every occurrence.
Regex regex = new Regex(@"(\d{1,2})p/sec", RegexOptions.IgnorePatternWhitespace);
MatchCollection matchCollection = regex.Matches(input);
// input.IndexOf("p/sec");
// Here we check the Match instance.
foreach (Match match in matchCollection)
{
Console.WriteLine(match.Groups[0].Value);
}
You can do this:
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). 91p/ sec , 123p/ sec Validity : 30 Days.";
MatchCollection matches = Regex.Matches(input, @"\b(\d{1,2})p/\s?sec");
foreach (Match m in matches) {
string key = m.Groups[1].Value;
Console.WriteLine(key);
}
Output:
1
11
91
\b is a word boundary, to anchor the match to the start of a "word" (notice that it will not match the "123p/ sec" in the test string!)
\d{1,2} Will match one or two digits. See Quantifiers
p/\s?sec matches a literal "p/sec" with an optional whitespace before "sec"
Regex Expression:
((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)
In C# you need to run a for loop to check multiple captures:
if(match.Success)
{
for(int i = 0; i< match.Groups["price"].Captures.Count;i++)
{
string key = match.Groups["price"].Captures[i].Value;
Console.WriteLine(key);
}
}

Captures count is always zero

I've got a problem. I use following regular expression:
Pattern =
(?'name'\w+(?:\w|\s)*), \s*
(?'category'\w+(?:\w|\s)*), \s*
(?:
\{ \s*
[yY]: (?'year'\d+), \s*
[vV]: (?'volume'(?:([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+), \s*
\} \s*
,? \s*
)*
with the IgnorePatternWhitespace option.
Everything seemed fine in my application until I debugged it & encountered a problem.
var Year = default(UInt32);
// ...
if((Match = Regex.Match(Line, Pattern, Options)).Success)
{
// Getting Product header information
Name = Match.Groups["name"].Value;
// Gathering Product statistics
for(var ix = default(Int32); ix < Match.Groups["year"].Captures.Count; ix++)
{
// never get here
Year = UInt32.Parse(Match.Groups["year"].Captures[ix].Value, NumberType, Culture);
}
}
In the code above, Match is always successful and I get the proper value for Name, but when it comes to the for loop, program flow just skips it. In the debugger I can see that there are no Captures in Match.Groups["year"], so the behavior is logical, but it is not obvious to me where I went wrong. Help!
This is connected to a previous post of mine, "Extract number values enclosed inside curly brackets".
Thanks!
EDIT. Input Samples
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
I need to capture 2008, 5528.35, 2009, 8653.89, 2010, 4290.51 values and operate with them as named groups.
2nd EDIT
I tried using ExplicitCapture Option and following expression:
(?<name>\w+(w\| )*), (?<category>\w+(w\| )*), (\{[yY]:(?<year>\d+), *[vV]:(?<volume>(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+)\}(, )?)+
But that didn't help.
Edit: You could simplify by matching everything until the next comma: [^,]*. Here's a full code snippet to match your source data:
var testRegex = new Regex(@"
(?'name'[^,]*),\s*
(?'category'[^,]*),\s*
({y:(?'year'[^,]*),\s*
V:(?'volume'[^,]*),?\s*)*",
RegexOptions.IgnorePatternWhitespace);
var testMatches = testRegex.Matches(
"Sherwood, reciev, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}");
foreach (Match testMatch in testMatches)
{
Console.WriteLine("Name = {0}", testMatch.Groups["name"].Value);
foreach (var capture in testMatch.Groups["year"].Captures)
Console.WriteLine(" Year = {0}", capture);
}
This prints:
Name = Sherwood
Year = 2008
Year = 2009
Year = 2010
I think the problem is a comma:
, \s* \}
which should be optional (or omitted?):
,? \s* \}
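The original Match still succeeds because the whole {y:...,V:...} group is quantified with *, which is allowed to match zero times; the mandatory comma before \} simply makes every repetition fail, leaving "year" with no captures. The sketch below drops that comma. It is an illustration under assumptions: the class name `ProductLine`, the method `Parse`, and the simplified name, category and volume sub-patterns are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ProductLine
{
    // Same layout as the question's pattern, but without a comma between
    // the volume and the closing brace (the sub-patterns are simplified).
    private static readonly Regex Pattern = new Regex(@"
        (?'name'\w[\w\s]*), \s*
        (?'category'\w[\w\s]*), \s*
        (?:
            \{ \s*
            [yY]: (?'year'\d+), \s*
            [vV]: (?'volume'\d+(?:\.\d+)?) \s*
            \} \s*
            ,? \s*
        )*",
        RegexOptions.IgnorePatternWhitespace);

    // Returns one (year, volume) pair per {y:...,V:...} block.
    public static List<(int Year, decimal Volume)> Parse(string line)
    {
        var stats = new List<(int Year, decimal Volume)>();
        Match m = Pattern.Match(line);
        if (!m.Success) return stats;

        CaptureCollection years = m.Groups["year"].Captures;
        CaptureCollection volumes = m.Groups["volume"].Captures;
        for (int i = 0; i < years.Count; i++)
            stats.Add((int.Parse(years[i].Value),
                       decimal.Parse(volumes[i].Value, CultureInfo.InvariantCulture)));
        return stats;
    }

    public static void Main()
    {
        string line = "Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}";
        foreach (var s in Parse(line))
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "{0}: {1}", s.Year, s.Volume));
    }
}
```

Each repetition of the group adds one capture to both "year" and "volume", so the two collections can be read in parallel by index.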
To expound on what MRAB said:
(?'name'
\w+
(?:
\w|\s
)*
),
\s*
(?'category'
\w+
(?:
\w|\s
)*
),
\s*
(?:
\{
\s*
[yY]:
(?'year'
\d+
),
\s*
[vV]:
(?'volume'
(?:
( # Why do you need capturing parentheses here?
[1-9][0-9]*
\.?
[0-9]*
)
|
(
\.[0-9]+
)
)+
), # I'm just guessing this comma doesn't match the input samples
\s*
\}
\s*
,?
\s*
)*
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
