C# regular expression to NSRegularExpression - c#

I have the following regular expression that gets me the table name and column details of a create index statement:
Regex r = new Regex(#"create\s*index.*?\son\s*\[?(?<table>[\s\w]*\w)\]?\s*\((?:(?<cname>[\s\d\w\[\]]*),?)*\)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
I would like to use this in Objective-C. I have tried the following:
NSError * error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern: #"create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\((?:([\\s\\d\\w\\[\\]]*),?)*\\)"
options: NSRegularExpressionCaseInsensitive | NSRegularExpressionSearch
error: &error];
if(nil != error)
{
NSLog(#"Error is: %#. %#", [error localizedDescription], error);
}
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString: createStatement options:0 range: NSMakeRange(0, [createStatement length])];
NSArray *matches = [regex matchesInString: createStatement
options: 0
range: NSMakeRange(0, [createStatement length])];
Which partially works. It gives me three ranges. The first one contains the entire string and the second one contains the table name. The problem is that the third one is empty.
Anyone have any ideas where I'm going wrong?
Edit: The string I'm trying to parse is: CREATE INDEX cardSetIndex ON [card] (cardSetId ASC)

The problem you seem to have is that the second capture group is being overwritten by the last itteration of (?: )*. Since its optional, its always blank.
Your regex:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?\s*
\(
(?:
( [\s\d\w\[\]]* )
,?
)*
\)
Change it to:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?
\s*
\(
(
(?: [\s\d\w\[\]]* ,? )*
)
\)
Compressed and escaped:
create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\(((?:[\\s\\d\\w\\[\\]]*,?)*)\\)

Your question is sort of vague about what you actually want to grab, and that regex is quite gnarly, so I wasn't able to glean it from that.
CREATE\s*INDEX\s*(\w+)\s*ON\s*(\[\w+\])\s*\((.+)\)
That will grab the table name in group 1, the ON property in group 2, and the following property after that in group 3.

Related

How to write a regular expression that captures tags in a comma-separated list?

Here is my input:
#
tag1, tag with space, !##%^, 🦄
I would like to match it with a regex and yield the following elements easily:
tag1
tag with space
!##%^
🦄
I know I could do it this way:
var match = Regex.Match(input, #"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());
But that's cheating, as it involves messing around with C#. There must be a neat way to do this with a regex. Just must be... right? ;D
The question is: how to write a regular expression that would allow me to iterate through captures and extract tags, without the need of splitting and trimming?
This works (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses C#'s Capture Collection to find a variable amount of field data
in a single record.
You could extend the regex further to get all records at once.
Where each record contains its own variable amount of field data.
The regex has built-in trimming as well.
Expanded:
(?ms) # Inline modifiers: multi-line, dot-all
^ \# \s+ # Beginning of record
(?: # Quantified group, 1 or more times, get all fields of record at once
\s* # Trim leading wsp
( # (1 start), # Capture collector for variable fields
(?: # One char at a time, but not comma or begin of record
(?!
,
| ^ \# \s+
)
.
)*?
) # (1 end)
\s*
(?: , | $ ) # End of this field, comma or EOL
)+
C# code:
string sOL = #"
#
tag1, tag with space, !##%^, 🦄";
Regex RxOL = new Regex(#"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
Console.WriteLine("-------------------------");
for (int i = 0; i < ccOL1.Count; i++)
Console.WriteLine(" '{0}'", ccOL1[i].Value );
_mOL = _mOL.NextMatch();
}
Output:
-------------------------
'tag1'
'tag with space'
'!##%^'
'??'
''
Press any key to continue . . .
Nothing wrong with cheating ;]
string input = #"#
tag1, tag with space, !##%^, 🦄";
string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
You can pretty much make it without regex. Just split it like this:
var result = input.Split(new []{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries).Skip(1).SelectMany(x=> x.Split(new []{','},StringSplitOptions.RemoveEmptyEntries).Select(y=> y.Trim()));

Regex with balancing groups

I need to write regex that capture generic arguments (that also can be generic) of type name in special notation like this:
System.Action[Int32,Dictionary[Int32,Int32],Int32]
lets assume type name is [\w.]+ and parameter is [\w.,\[\]]+
so I need to grab only Int32, Dictionary[Int32,Int32] and Int32
Basically I need to take something if balancing group stack is empty, but I don't really understand how.
UPD
The answer below helped me solve the problem fast (but without proper validation and with depth limitation = 1), but I've managed to do it with group balancing:
^[\w.]+ #Type name
\[(?<delim>) #Opening bracet and first delimiter
[\w.]+ #Minimal content
(
[\w.]+
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))* #Cutting param if balanced before comma and placing delimiter
((?<open>\[))* #Counting [
((?<-open>\]))* #Counting ]
)*
(?(open)|(?<param-delim>))\] #Cutting last param if balanced
(?(open)(?!) #Checking balance
)$
Demo
UPD2 (Last optimization)
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
(?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
(?:(?<open>\[)[\w.]+)?
(?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
I suggest capturing those values using
\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*
See the regex demo.
Details:
\w+(?:\.\w+)* - match 1+ word chars followed with . + 1+ word chars 1 or more times
\[ - a literal [
(?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
,? - an optional comma
(?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
\w+ - one or more word chars (perhaps, you would like [\w.]+)
(?:\[[^][]*])? - 1 or 0 (change ? to * to match 1 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].
A C# demo below:
var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = #"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
var result = Regex.Matches(line, pattern)
.Cast<Match>()
.SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
.Select(t => t.Value))
.ToList();
foreach (var s in result) // DEMO
Console.WriteLine(s);
UPDATE: To account for unknown depth [...] substrings, use
\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*
See the regex demo

Regex replace between and including tags

I have the following line of text (META Title):
Buy [ProductName][Text] at a great price [/Text] from [ShopName] today.
I am replacing depending on what values I have.
I have it working as I require however I can't find the correct regex to replace:
[Text] at a great price [/Text]
The words (in a nd between square brackets) change so the only thing that will remain the same is:
[][/]
i.e I may also want to replace
[TestText]some test text[/TestText] with nothing.
I have this working:
System.Text.RegularExpressions.Regex.Replace(SEOContent, #"\[Text].*?\[/Text]", #"");
I presumed the regex of:
[.*?].*?\[/.*?]
Would work but it didn't! - I'm coding in ASP.NET C#
Thanks in advance,
Dave
Use a named capture to get the node name of [..], then find it again using \k<..>.
(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\])
Broken down using Ignore Pattern Whitespace and an example program.
string pattern = #"
( # Begin our Match
\[ # Look for the [ escape anchor
(?<Tag>[^\]]+) # Place anything that is not antother ] into the named match Tag
\] # Anchor of ]
[^\[]+ # Get all the text to the next anchor
\[/ # Anchor of the closing [...] tag
\k<Tag> # Use the named capture subgroup Tag to balance it out
\] # Properly closed end tag/node.
) # Match is done";
string text = "[TestText]some test text[/TestText] with nothing.";
Console.WriteLine (Regex.Replace(text, pattern, "Jabberwocky", RegexOptions.IgnorePatternWhitespace));
// Outputs
// Jabberwocky with nothing.
As an aside, I would actually create a tokenizing regex (using a regex If with the above pattern) and replace within matches by identify the sections by named captures. Then in the replace using a match evaluator replace the identified tokens such as:
string pattern = #"
(?(\[(?<Tag>[^\]]+)\][^\[]+\[/\k<Tag>\]) # If statement to check []..[/] situation
( # Yes it is, match into named captures
\[
(?<Token>[^\]]+) # What is the text inside the [ ], into Token
\]
(?<TextOptional>[^\[]+) # Optional text to reuse
\[
(?<Closing>/[^\]]+) # The closing tag info
\]
)
| # Else, let is start a new check for either [] or plain text
(?(\[) # If a [ is found it is a token.
( # Yes process token
\[
(?<Token>[^\]]+) # What is the text inside the [ ], into Token
\]
)
| # Or (No of the second if) it is just plain text
(?<Text>[^\[]+) # Put it into the text match capture.
)
)
";
string text = #"Buy [ProductName] [Text]at a great price[/Text] from [ShopName] today.";
Console.WriteLine (
Regex.Replace(text,
pattern,
new MatchEvaluator((mtch) =>
{
if (mtch.Groups["Text"].Success) // If just text, return it.
return mtch.Groups["Text"].Value;
if (mtch.Groups["Closing"].Success) // If a Closing match capture group reports success, then process
{
return string.Format("Reduced Beyond Comparison (Used to be {0})", mtch.Groups["TextOptional"].Value);
}
// Otherwise its just a plain old token, swap it out.
switch ( mtch.Groups["Token"].Value )
{
case "ProductName" : return "Jabberwocky"; break;
case "ShopName" : return "StackOverFlowiZon"; break;
}
return "???"; // If we get to here...we have failed...need to determine why.
}),
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture));
// Outputs:
// Buy Jabberwocky Reduced Beyond Comparison (Used to be at a great price) from StackOverFlowiZon today.

Captures count is always zero

I've got a problem. I use following regular expression:
Pattern =
(?'name'\w+(?:\w|\s)*), \s*
(?'category'\w+(?:\w|\s)*), \s*
(?:
\{ \s*
[yY]: (?'year'\d+), \s*
[vV]: (?'volume'(?:([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+), \s*
\} \s*
,? \s*
)*
with IgnorePatternWhitespaces option.
Everything seemed fine in my application until I debugged it & encountered a problem.
var Year = default(UInt32);
// ...
if((Match = Regex.Match(Line, Pattern, Options)).Success)
{
// Getting Product header information
Name = Match.Groups["name"].Value;
// Gathering Product statistics
for(var ix = default(Int32); ix < Match.Groups["year"].Captures.Count; ix++)
{
// never get here
Year = UInt32.Parse(Match.Groups["year"].Captures[ix].Value, NumberType, Culture);
}
}
So in the code above.. In my case Match is always successful. I get proper value for Name but when turn comes to for loop program flow just passes it by. I debugged there's no Captures in Match.Groups["year"]. So it is logical behavior. But not obvious to me where I'm wrong. Help!!
There is a previous connected post Extract number values enclosed inside curly brackets I made.
Thanks!
EDIT. Input Samples
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}
I need to capture 2008, 5528.35, 2009, 8653.89, 2010, 4290.51 values and operate with them as named groups.
2D EDIT
I tried using ExplicitCapture Option and following expression:
(?<name>\w+(w\| )*), (?<category>\w+(w\| )*), (\{[yY]:(?<year>\d+), *[vV]:(?<volume>(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))+)\}(, )?)+
But that didn't help.
Edit: You could simplify by matching everything until the next comma: [^,]*. Here's a full code snippet to match your source data:
var testRegex = new Regex(#"
(?'name'[^,]*),\s*
(?'category'[^,]*),\s*
({y:(?'year'[^,]*),\s*
V:(?'volume'[^,]*),?\s*)*",
RegexOptions.IgnorePatternWhitespace);
var testMatches = testRegex.Matches(
"Sherwood, reciev, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}");
foreach (Match testMatch in testMatches)
{
Console.WriteLine("Name = {0}", testMatch.Groups["name"].Value);
foreach (var capture in testMatch.Groups["year"].Captures)
Console.WriteLine(" Year = {0}", capture);
}
This prints:
Name = Sherwood
Year = 2008
Year = 2009
Year = 2010
I think the problem is a comma:
, \s* \}
which should be optional (or omitted?):
,? \s* \}
To expound on what MRAB said:
(?'name'
\w+
(?:
\w|\s
)*
),
\s*
(?'category'
\w+
(?:
\w|\s
)*
),
\s*
(?:
\{
\s*
[yY]:
(?'year'
\d+
),
\s*
[vV]:
(?'volume'
(?:
( # Why do you need capturing parenth's here ?
[1-9][0-9]*
\.?
[0-9]*
)
|
(
\.[0-9]+
)
)+
), # I'm just guessing this comma doesent match input samples
\s*
\}
\s*
,?
\s*
)*
Sherwood, reciever, {y:2008,V:5528.35}, {y:2009,V:8653.89}, {y:2010, V:4290.51}

Need help with regex to parse expression

I have an expression:
((((the&if)|sky)|where)&(end|finish))
What I need is to put a space between symbols and words so that it ends up like:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )
The regex I came up with is (\w)*[(\&*)(\|*)] which only gets me:
( ( ( ( the& if) | sky) | where) & ( end| finish) )
Could I get a little help here from a resident regex guru please? I will be using this in C#.
Edit: Since you're using C#, try this:
output = Regex.Replace(input, #"([^\w\s]|\w(?!\w))(?!$)", "$1 ");
That inserts a space after any character that matches the following conditions:
Is neither a letter, number, underscore, or whitespace
OR is a word character that is NOT followed by another word character
AND is not at the end of a line.
resultString = Regex.Replace(subjectString, #"\b|(?<=\W)(?=\W)", " ");
Explanation:
\b # Match a position at the start or end of a word
| # or...
(?<=\W) # a position between two
(?=\W) # non-word characters
(and replace those with a space).
You could just add a space after each word and after each non-word character (so look for \W|\w+ and replace it with the match and a space. e.g. in Vim:
:s/\W\|\w\+/\0 /g
You could use:
(\w+|&|\(|\)|\|)(?!$)
which means a word, or a & symbol, or a ( symbol, or a ) symbol, or a | symbol not followed by an end of string; and then replace a match with a match + space symbol. By using c# this could be done like:
var result = Regex.Replace(
#"((((the&if)|sky)|where)&(end|finish))",
#"(\w+|&|\(|\)|\|)(?!$)",
"$+ "
);
Now result variable contains a value:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )

Categories

Resources