Need help with regex to parse expression - c#

I have an expression:
((((the&if)|sky)|where)&(end|finish))
What I need is to put a space between symbols and words so that it ends up like:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )
The regex I came up with is (\w)*[(\&*)(\|*)] which only gets me:
( ( ( ( the& if) | sky) | where) & ( end| finish) )
Could I get a little help here from a resident regex guru please? I will be using this in C#.

Edit: Since you're using C#, try this:
output = Regex.Replace(input, @"([^\w\s]|\w(?!\w))(?!$)", "$1 ");
That inserts a space after any character that matches the following conditions:
Is not a letter, digit, underscore, or whitespace character
OR is a word character that is NOT followed by another word character
AND is not at the end of a line.

resultString = Regex.Replace(subjectString, @"\b|(?<=\W)(?=\W)", " ");
Explanation:
\b # Match a position at the start or end of a word
| # or...
(?<=\W) # a position between two
(?=\W) # non-word characters
(and replace those with a space).

You could just add a space after each word and after each non-word character (so look for \W|\w+ and replace it with the match and a space), e.g. in Vim:
:s/\W\|\w\+/\0 /g
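Since the question targets C#, the Vim substitution above could be translated roughly as follows. This is only a sketch: the names TokenSpacer and AddSpaces are made up for illustration, and the TrimEnd call is an addition that drops the space that would otherwise follow the last token.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenSpacer
{
    // Hypothetical helper: appends a space to every word (\w+) and to every
    // single non-word character (\W), then removes the trailing space.
    public static string AddSpaces(string input)
    {
        return Regex.Replace(input, @"\W|\w+", "$0 ").TrimEnd();
    }

    public static void Main()
    {
        Console.WriteLine(AddSpaces("((((the&if)|sky)|where)&(end|finish))"));
    }
}
```

Note that \W also matches whitespace, so this assumes the input contains no spaces to begin with; otherwise existing spaces would be doubled.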

You could use:
(\w+|&|\(|\)|\|)(?!$)
which matches a word, or a & symbol, or a ( symbol, or a ) symbol, or a | symbol, as long as it is not followed by the end of the string; each match is then replaced with the match plus a space. In C# this could be done like this:
var result = Regex.Replace(
@"((((the&if)|sky)|where)&(end|finish))",
@"(\w+|&|\(|\)|\|)(?!$)",
"$+ "
);
Now result variable contains a value:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )

Regex to replace a symbol & within quotes C#

I'm trying to replace '&' inside quotes.
Input
"I & my friends are stuck here", & we can't resolve
Output
"I and my friends are stuck here", & we can't resolve
Replace '&' by 'and' and only inside quotes, could you please help?
By far the quickest way is to use the \G construct and do it with a single regex.
C# code
var str =
"\"I & my friends are stuck here & we can't get up\", & we can't resolve\n" +
"=> \"I and my friends are stuck here and we can't get up\", & we can't resolve\n";
var rx = @"((?:""(?=[^""]*"")|(?<!""|^)\G)[^""&]*)(?:(&)|(""))";
var res = Regex.Replace(str, rx, m =>
// Replace the ampersands inside double quotes with 'and'
m.Groups[1].Value + (m.Groups[2].Value.Length > 0 ? "and" : m.Groups[3].Value));
Console.WriteLine(res);
Output
"I and my friends are stuck here and we can't get up", & we can't resolve
=> "I and my friends are stuck here and we can't get up", & we can't resolve
Regex ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
https://regex101.com/r/db8VkQ/1
Explained
( # (1 start), Preamble
(?: # Block
" # Begin of quote
(?= [^"]* " ) # One-time check for close quote
| # or,
(?<! " | ^ ) # If not a quote behind or BOS
\G # Start match where last left off
)
[^"&]* # Many non-quote, non-ampersand
) # (1 end)
(?: # Body
( & ) # (2), Ampersand, replace with 'and'
| # or,
( " ) # (3), End of quote, just put back "
)
Benchmark
Regex1: ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 10
Elapsed Time: 2.21 s, 2209.03 ms, 2209035 µs
Matches per sec: 226,343
Use
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, @"\B&\B", "and"))
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var s = "\"I & my friends are stuck here\", & we can't resolve";
Console.WriteLine(
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, @"\B&\B", "and"))
);
}
}
Output: "I and my friends are stuck here", & we can't resolve
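To see what the \B anchors contribute in the nested-Replace approach above, here is a small sketch with an input of my own (not from the question). The helper name ReplaceInQuotes is hypothetical. The point is that an ampersand touching a word character, as in R&D, is left alone even inside quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedAmpersand
{
    // Hypothetical helper wrapping the nested-Replace approach.
    public static string ReplaceInQuotes(string s)
    {
        // Outer regex: each double-quoted run.
        // Inner regex: an ampersand with no word character directly on either side.
        return Regex.Replace(s, "\"[^\"]*\"",
            m => Regex.Replace(m.Value, @"\B&\B", "and"));
    }

    public static void Main()
    {
        // Sample input is an assumption made for this illustration.
        Console.WriteLine(ReplaceInQuotes("\"R&D & sales\", a & b"));
    }
}
```

If ampersands glued to words should be replaced as well, dropping the \B anchors from the inner pattern would be the obvious variation.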

How to write a regular expression that captures tags in a comma-separated list?

Here is my input:
#
tag1, tag with space, !##%^, 🦄
I would like to match it with a regex and yield the following elements easily:
tag1
tag with space
!##%^
🦄
I know I could do it this way:
var match = Regex.Match(input, @"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());
But that's cheating, as it involves messing around with C#. There must be a neat way to do this with a regex. Just must be... right? ;D
The question is: how to write a regular expression that would allow me to iterate through captures and extract tags, without the need of splitting and trimming?
This works: (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses .NET's CaptureCollection to collect a variable number of fields from a single record.
You could extend the regex further to get all records at once, where each record contains its own variable number of fields.
The regex has built-in trimming as well.
Expanded:
(?ms) # Inline modifiers: multi-line, dot-all
^ \# \s+ # Beginning of record
(?: # Quantified group, 1 or more times, get all fields of record at once
\s* # Trim leading wsp
( # (1 start), # Capture collector for variable fields
(?: # One char at a time, but not comma or begin of record
(?!
,
| ^ \# \s+
)
.
)*?
) # (1 end)
\s*
(?: , | $ ) # End of this field, comma or EOL
)+
C# code:
string sOL = @"
#
tag1, tag with space, !##%^, 🦄";
Regex RxOL = new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
Console.WriteLine("-------------------------");
for (int i = 0; i < ccOL1.Count; i++)
Console.WriteLine(" '{0}'", ccOL1[i].Value );
_mOL = _mOL.NextMatch();
}
Output:
-------------------------
'tag1'
'tag with space'
'!##%^'
'??'
''
Press any key to continue . . .
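The output above ends with an empty capture, because the repeated group can match one last time at the end of the line without consuming anything. One possible way to hide it is to filter the captures, as sketched below. The helper name ExtractTags and the ASCII-only sample input are assumptions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCaptures
{
    // Hypothetical helper: runs the regex from the answer and returns the
    // non-empty captures of group 1.
    public static string[] ExtractTags(string input)
    {
        var rx = new Regex(@"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
        Match m = rx.Match(input);
        if (!m.Success) return new string[0];

        return m.Groups[1].Captures
            .Cast<Capture>()            // works on older and newer frameworks alike
            .Select(c => c.Value)
            .Where(v => v.Length > 0)   // skip the empty trailing capture
            .ToArray();
    }

    public static void Main()
    {
        foreach (string tag in ExtractTags("#\ntag1, tag with space, last"))
            Console.WriteLine("'{0}'", tag);
    }
}
```

A side effect of the filter is that genuinely empty fields (for example two consecutive commas) are dropped too, which may or may not be what you want.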
Nothing wrong with cheating ;]
string input = @"#
tag1, tag with space, !##%^, 🦄";
string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
You can pretty much do it without regex. Just split it like this:
var result = input.Split(new []{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries).Skip(1).SelectMany(x=> x.Split(new []{','},StringSplitOptions.RemoveEmptyEntries).Select(y=> y.Trim()));

How to match string that contains ^ in regular expression?

I tried to build a regular expression using an online tool but did not succeed. Here is the string I need to match:
27R4FF^27R4FF Text until end
always starts with alphanumeric (case-insensitive)
then always caret sign ^ (no space before & after)
then alphanumeric string
then always one white space
then string until end.
Here is the regular expression that is not working for me:-
((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))
c# code:-
string txt = "784SFS^784SFS Value is here";
var regs = @"((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))";
Regex r = new Regex(regs, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
Console.Write(m.Success ? "matched" : "didn't match");
Console.ReadLine();
Help appreciated. Thanks
Verbatim ^[^\W_]+\^[^\W_]+[ ].*$
^ # BOS
[^\W_]+ # Alphanum
\^ # Caret
[^\W_]+ # Alphanum
[ ] # Space
.* # Anything
$ # EOS
Output
** Grp 0 - ( pos 0 , len 28 )
27R4FF^27R4FF Text until end
I wasn't sure whether the trailing 'until end' part should be matched as well.
This works for
27R4FF^27R4FF Text
^\w+\^\w+\s\w+$
If the text at the end contains spaces, try
^\w+\^\w+\s[\w\s]+$
Try this: https://regex101.com/r/hD0hV0/2
^[\da-z]+\^[\da-z]+\s.*$
...or commented (assumes RegexOptions.IgnorePatternWhitespace if you're using the format in code):
^ # always starts...
[\da-z]+ # ...with alphanumeric (case-insensitive)
\^ # then always caret sign ^ (no space before & after)
[\da-z]+ # then alphanumeric string
\s # then always one white space
.* # then string...
$ # ...until end.
The other answers don't actually match what you describe (at the time of this writing) because \w matches underscore and you didn't mention any limitations on "the string at the end".
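For reference, this is roughly how the commented pattern above could be used from C# with RegexOptions.IgnorePatternWhitespace. Treat it as a sketch: the class and method names are illustrative, and the second test string is my own example of an input containing an underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretPattern
{
    // IgnorePatternWhitespace makes the engine skip the layout whitespace
    // and the # comments inside the pattern.
    private static readonly Regex Rx = new Regex(@"
        ^           # start of string
        [\da-z]+    # alphanumeric run (case-insensitive via IgnoreCase)
        \^          # literal caret
        [\da-z]+    # alphanumeric run
        \s          # one whitespace character
        .*          # the rest of the string
        $           # end of string
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    public static bool IsValid(string s)
    {
        return Rx.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("27R4FF^27R4FF Text until end"));
        Console.WriteLine(IsValid("27R4_F^27R4FF Text"));   // underscore is rejected
    }
}
```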

How do I make Regex capture only named groups

According to Regex documentation, using RegexOptions.ExplicitCapture makes the Regex only match named groups like (?<groupName>...); but in action it does something a little bit different.
Consider these lines of code:
static void Main(string[] args) {
Regex r = new Regex(
@"(?<code>^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$))"
, RegexOptions.ExplicitCapture
);
var x = r.Match("32/123/03");
r.GetGroupNames().ToList().ForEach(gn => {
Console.WriteLine("GroupName:{0,5} --> Value: {1}", gn, x.Groups[gn].Success ? x.Groups[gn].Value : "");
});
}
When you run this snippet you'll see the result contains a group named 0 while I don't have a group named 0 in my regex!
GroupName: 0 --> Value: 32/123/03
GroupName: code --> Value: 32/123/03
GroupName: l1 --> Value: 32
GroupName: l2 --> Value: 123
GroupName: l3 --> Value: 03
Press any key to continue . . .
Could somebody please explain this behavior to me?
You always have group 0: that's the entire match. Numbered groups start at 1 and are assigned by the ordinal position of the opening parenthesis that defines the group. Your regular expression (formatted for clarity):
(?<code>
^
(?<l1> [\d]{2} )
/
(?<l2> [\d]{3} )
/
(?<l3> [\d]{2} )
$
|
^
(?<l1>[\d]{2})
/
(?<l2>[\d]{3})
$
|
(?<l1> ^[\d]{2} $ )
)
Your expression will backtrack, so you might consider simplifying your regular expression. This is probably clearer and more efficient:
static Regex rxCode = new Regex(@"
^ # match start-of-line, followed by
(?<code> # a mandatory group ('code'), consisting of
(?<g1> \d\d ) # - 2 decimal digits ('g1'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g2> \d\d\d ) # - 3 decimal digits ('g2'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g3> \d\d ) # - 2 decimal digits ('g3')
)? # - END: optional group
)? # - END: optional group
) # - END: named group ('code'), followed by
$ # - end-of-line
" , RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );
Once you have that, something like this:
string[] texts = { "12" , "12/345" , "12/345/67" , } ;
foreach ( string text in texts )
{
Match m = rxCode.Match( text ) ;
Console.WriteLine("{0}: match was {1}" , text , m.Success ? "successful" : "NOT successful" ) ;
if ( m.Success )
{
Console.WriteLine( " code: {0}" , m.Groups["code"].Value ) ;
Console.WriteLine( " g1: {0}" , m.Groups["g1"].Value ) ;
Console.WriteLine( " g2: {0}" , m.Groups["g2"].Value ) ;
Console.WriteLine( " g3: {0}" , m.Groups["g3"].Value ) ;
}
}
produces the expected
12: match was successful
code: 12
g1: 12
g2:
g3:
12/345: match was successful
code: 12/345
g1: 12
g2: 345
g3:
12/345/67: match was successful
code: 12/345/67
g1: 12
g2: 345
g3: 67
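Because group 0 always exists, any code that enumerates GetGroupNames() will see it. One possible way to list only the named groups is to filter it out, as in the sketch below. It assumes no group is explicitly named "0"; the helper name NamedGroups and the shortened pattern are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NamedGroupsOnly
{
    // Hypothetical helper: the regex's group names minus the implicit group "0".
    public static string[] NamedGroups(Regex r)
    {
        return r.GetGroupNames().Where(n => n != "0").ToArray();
    }

    public static void Main()
    {
        // With ExplicitCapture the unnamed parentheses do not create a group.
        var r = new Regex(@"^(?<l1>\d{2})(/(?<l2>\d{3}))?$", RegexOptions.ExplicitCapture);
        Match m = r.Match("32/123");

        foreach (string name in NamedGroups(r))
            Console.WriteLine("GroupName:{0,5} --> Value: {1}", name, m.Groups[name].Value);
    }
}
```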
Try this (I removed the outer named group, code, from your regex):
^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$)

C# regular expression to NSRegularExpression

I have the following regular expression that gets me the table name and column details of a create index statement:
Regex r = new Regex(@"create\s*index.*?\son\s*\[?(?<table>[\s\w]*\w)\]?\s*\((?:(?<cname>[\s\d\w\[\]]*),?)*\)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
I would like to use this in Objective-C. I have tried the following:
NSError * error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern: @"create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\((?:([\\s\\d\\w\\[\\]]*),?)*\\)"
options: NSRegularExpressionCaseInsensitive | NSRegularExpressionSearch
error: &error];
if(nil != error)
{
NSLog(@"Error is: %@. %@", [error localizedDescription], error);
}
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString: createStatement options:0 range: NSMakeRange(0, [createStatement length])];
NSArray *matches = [regex matchesInString: createStatement
options: 0
range: NSMakeRange(0, [createStatement length])];
Which partially works. It gives me three ranges. The first one contains the entire string and the second one contains the table name. The problem is that the third one is empty.
Anyone have any ideas where I'm going wrong?
Edit: The string I'm trying to parse is: CREATE INDEX cardSetIndex ON [card] (cardSetId ASC)
The problem you seem to have is that the second capture group is being overwritten by the last iteration of (?: )*. Since that iteration is optional and can match nothing, the group always ends up blank.
Your regex:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?\s*
\(
(?:
( [\s\d\w\[\]]* )
,?
)*
\)
Change it to:
create\s*index.*?\son\s*
\[?
( [\s\w]*\w )
\]?
\s*
\(
(
(?: [\s\d\w\[\]]* ,? )*
)
\)
Compressed and escaped:
create\\s*index.*?\\son\\s*\\[?([\\s\\w]*\\w)\\]?\\s*\\(((?:[\\s\\d\\w\\[\\]]*,?)*)\\)
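For comparison, the same two patterns can be tried in C#, where each iteration of a repeated group is still reachable through Group.Captures even though Group.Value only reports the last one. The sketch below uses a two-column statement of my own as input, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IndexColumns
{
    // Restructured pattern: group 2 wraps the whole repeated section.
    private const string Fixed =
        @"create\s*index.*?\son\s*\[?([\s\w]*\w)\]?\s*\(((?:[\s\d\w\[\]]*,?)*)\)";

    // Original shape: the capturing group sits inside the repetition.
    private const string Original =
        @"create\s*index.*?\son\s*\[?([\s\w]*\w)\]?\s*\((?:([\s\d\w\[\]]*),?)*\)";

    // Whole column list as a single string, via the restructured pattern.
    public static string ColumnList(string sql)
    {
        return Regex.Match(sql, Fixed, RegexOptions.IgnoreCase).Groups[2].Value;
    }

    // One entry per iteration, via the original pattern and Group.Captures.
    public static string[] PerColumn(string sql)
    {
        return Regex.Match(sql, Original, RegexOptions.IgnoreCase)
            .Groups[2].Captures
            .Cast<Capture>()
            .Select(c => c.Value.Trim())
            .Where(v => v.Length > 0)   // drop any empty final iteration
            .ToArray();
    }

    public static void Main()
    {
        // Sample statement is an assumption made for this illustration.
        string sql = "CREATE INDEX idx ON [card] (a ASC, b DESC)";
        Console.WriteLine(ColumnList(sql));
        foreach (string column in PerColumn(sql))
            Console.WriteLine(column);
    }
}
```

As far as I know, NSRegularExpression exposes only one range per group, so on the Objective-C side the wrapped group plus a split on commas is the closer equivalent.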
Your question is sort of vague about what you actually want to grab, and that regex is quite gnarly, so I wasn't able to glean it from that.
CREATE\s*INDEX\s*(\w+)\s*ON\s*(\[\w+\])\s*\((.+)\)
That will grab the index name in group 1, the bracketed table name in group 2, and everything between the parentheses (the column list) in group 3.
