Regex to replace a symbol & within quotes C# - c#

I'm trying to replace '&' inside quotes.
Input
"I & my friends are stuck here", & we can't resolve
Output
"I and my friends are stuck here", & we can't resolve
Replace '&' by 'and' and only inside quotes, could you please help?

By far the quickest way is to use the \G construct and do it with a single regex.
C# code
var str =
"\"I & my friends are stuck here & we can't get up\", & we can't resolve\n" +
"=> \"I and my friends are stuck here and we can't get up\", & we can't resolve\n";
var rx = #"((?:""(?=[^""]*"")|(?<!""|^)\G)[^""&]*)(?:(&)|(""))";
var res = Regex.Replace(str, rx, m =>
// Replace the ampersands inside double quotes with 'and'
m.Groups[1].Value + (m.Groups[2].Value.Length > 0 ? "and" : m.Groups[3].Value));
Console.WriteLine(res);
Output
"I and my friends are stuck here and we can't get up", & we can't resolve
=> "I and my friends are stuck here and we can't get up", & we can't resolve
Regex ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
https://regex101.com/r/db8VkQ/1
Explained
( # (1 start), Preamble
(?: # Block
" # Begin of quote
(?= [^"]* " ) # One-time check for close quote
| # or,
(?<! " | ^ ) # If not a quote behind or BOS
\G # Start match where last left off
)
[^"&]* # Many non-quote, non-ampersand
) # (1 end)
(?: # Body
( & ) # (2), Ampersand, replace with 'and'
| # or,
( " ) # (3), End of quote, just put back "
)
Benchmark
Regex1: ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 10
Elapsed Time: 2.21 s, 2209.03 ms, 2209035 ยตs
Matches per sec: 226,343

Use
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, #"\B&\B", "and"))
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var s = "\"I & my friends are stuck here\", & we can't resolve";
Console.WriteLine(
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, #"\B&\B", "and"))
);
}
}
Output: "I and my friends are stuck here", & we can't resolve

Related

Matching balanced parentheses before a character

I need to match a string within balanced parentheses before a literal period in c#. My regex with balanced groups works except when there are extra open parens in the string. According to my understanding, this requires a conditional fail pattern to ensure the stack is empty on match, yet something is not quite right.
Original regex:
#"(?<Par>[(]).+(?<-Par>[)])\."
With fail-pattern:
#"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\."
Test-code (last 2 fail):
string[] tests = {
"a.c", "",
"a).c", "",
"(a.c", "",
"a(a).c", "(a).",
"a(a b).c", "(a b).",
"a((a b)).c", "((a b)).",
"a(((a b))).c", "(((a b))).",
"a((a) (b)).c", "((a) (b)).",
"a((a)(b)).c", "((a)(b)).",
"a((ab)).c", "((ab)).",
"a)((ab)).(c", "((ab)).",
"a(((a b)).c", "((a b)).",
"a(((a b)).)c", "((a b))."
};
Regex re = new Regex(#"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\.");
for (int i = 0; i < tests.Length; i += 2)
{
var result = re.Match(tests[i]).Groups[0].Value;
if (result != tests[i + 1]) throw new Exception
("Expecting: " + tests[i + 1] + ", got " + result);
}
You may use a well-known regex to match balanced parentheses and just append a \. to it:
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.
|---------- balanced parens part ----------|.|
See the regex demo.
Details
\( - a (
(?> - start of an atomic group
[^()]+ - 1 or more chars other than ( and )
| - or
(?<o>)\( - an opening ( is pushed on to the Group o stack
| - or
(?<-o>)\) - a closing ( is popped off the Group o stack
)* - 0 or more repetitions of the atomic group
(?(o)(?!)) - fail the match if Group o stack is not empty
\) - a )
\. - a dot.

How to write a regular expression that captures tags in a comma-separated list?

Here is my input:
#
tag1, tag with space, !##%^, ๐Ÿฆ„
I would like to match it with a regex and yield the following elements easily:
tag1
tag with space
!##%^
๐Ÿฆ„
I know I could do it this way:
var match = Regex.Match(input, #"^#[\n](?<tags>[\S ]+)$");
// if match is a success
var tags = match.Groups["tags"].Value.Split(',').Select(x => x.Trim());
But that's cheating, as it involves messing around with C#. There must be a neat way to do this with a regex. Just must be... right? ;D
The question is: how to write a regular expression that would allow me to iterate through captures and extract tags, without the need of splitting and trimming?
This works (?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+
It uses C#'s Capture Collection to find a variable amount of field data
in a single record.
You could extend the regex further to get all records at once.
Where each record contains its own variable amount of field data.
The regex has built-in trimming as well.
Expanded:
(?ms) # Inline modifiers: multi-line, dot-all
^ \# \s+ # Beginning of record
(?: # Quantified group, 1 or more times, get all fields of record at once
\s* # Trim leading wsp
( # (1 start), # Capture collector for variable fields
(?: # One char at a time, but not comma or begin of record
(?!
,
| ^ \# \s+
)
.
)*?
) # (1 end)
\s*
(?: , | $ ) # End of this field, comma or EOL
)+
C# code:
string sOL = #"
#
tag1, tag with space, !##%^, ๐Ÿฆ„";
Regex RxOL = new Regex(#"(?ms)^\#\s+(?:\s*((?:(?!,|^\#\s+).)*?)\s*(?:,|$))+");
Match _mOL = RxOL.Match(sOL);
while (_mOL.Success)
{
CaptureCollection ccOL1 = _mOL.Groups[1].Captures;
Console.WriteLine("-------------------------");
for (int i = 0; i < ccOL1.Count; i++)
Console.WriteLine(" '{0}'", ccOL1[i].Value );
_mOL = _mOL.NextMatch();
}
Output:
-------------------------
'tag1'
'tag with space'
'!##%^'
'??'
''
Press any key to continue . . .
Nothing wrong with cheating ;]
string input = #"#
tag1, tag with space, !##%^, ๐Ÿฆ„";
string[] tags = Array.ConvertAll(input.Split('\n').Last().Split(','), s => s.Trim());
You can pretty much make it without regex. Just split it like this:
var result = input.Split(new []{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries).Skip(1).SelectMany(x=> x.Split(new []{','},StringSplitOptions.RemoveEmptyEntries).Select(y=> y.Trim()));

How do I make Regex capture only named groups

According to Regex documentation, using RegexOptions.ExplicitCapture makes the Regex only match named groups like (?<groupName>...); but in action it does something a little bit different.
Consider these lines of code:
static void Main(string[] args) {
Regex r = new Regex(
#"(?<code>^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$))"
, RegexOptions.ExplicitCapture
);
var x = r.Match("32/123/03");
r.GetGroupNames().ToList().ForEach(gn => {
Console.WriteLine("GroupName:{0,5} --> Value: {1}", gn, x.Groups[gn].Success ? x.Groups[gn].Value : "");
});
}
When you run this snippet you'll see the result contains a group named 0 while I don't have a group named 0 in my regex!
GroupName: 0 --> Value: 32/123/03
GroupName: code --> Value: 32/123/03
GroupName: l1 --> Value: 32
GroupName: l2 --> Value: 123
GroupName: l3 --> Value: 03
Press any key to continue . . .
Could somebody please explain this behavior to me?
You always have group 0: that's the entire match. Numbered groups are relative to 1 based on the ordinal position of the opening parenthesis that defines the group. Your regular expression (formatted for clarity):
(?<code>
^
(?<l1> [\d]{2} )
/
(?<l2> [\d]{3} )
/
(?<l3> [\d]{2} )
$
|
^
(?<l1>[\d]{2})
/
(?<l2>[\d]{3})
$
|
(?<l1> ^[\d]{2} $ )
)
Your expression will backtrack, so you might consider simplifying your regular expression. This is probably clearer and more efficient:
static Regex rxCode = new Regex(#"
^ # match start-of-line, followed by
(?<code> # a mandatory group ('code'), consisting of
(?<g1> \d\d ) # - 2 decimal digits ('g1'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g2> \d\d\d ) # - 3 decimal digits ('g2'), followed by
( # - an optional group, consisting of
/ # - a literal '/', followed by
(?<g3> \d\d ) # - 2 decimal digits ('g3')
)? # - END: optional group
)? # - END: optional group
) # - END: named group ('code'), followed by
$ # - end-of-line
" , RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );
Once you have that, something like this:
string[] texts = { "12" , "12/345" , "12/345/67" , } ;
foreach ( string text in texts )
{
Match m = rxCode.Match( text ) ;
Console.WriteLine("{0}: match was {1}" , text , m.Success ? "successful" : "NOT successful" ) ;
if ( m.Success )
{
Console.WriteLine( " code: {0}" , m.Groups["code"].Value ) ;
Console.WriteLine( " g1: {0}" , m.Groups["g1"].Value ) ;
Console.WriteLine( " g2: {0}" , m.Groups["g2"].Value ) ;
Console.WriteLine( " g3: {0}" , m.Groups["g3"].Value ) ;
}
}
produces the expected
12: match was successful
code: 12
g1: 12
g2:
g3:
12/345: match was successful
code: 12/345
g1: 12
g2: 345
g3:
12/345/67: match was successful
code: 12/345/67
g1: 12
g2: 345
g3: 67
named group
^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$)
try this (i remove first group from your regex) - see demo

C# regular expression

Please give a suggestion here:
I try to do a regex in C#. Here is the text
E1A: pop(+)
call T
call E1
E1B: return
TA: call F
call T1
I want to split it like this:
1)
E1A: pop(+)
call T
call E1
2)
E1B: return
3)
TA: call F
call T1
I tought at lookbehind, but it's not working because of the .+
Here is what I hope to work but it doesn't:
"[A-Z0-9]+[:].+(?=([A-Z0-9]+[:]))"
Does anyone have a better ideea?
EDIT: The "E1A","E1B","TA" are changing. All it remains the same is that they are made by letter and numbers follow by ":"
Regex regexObj = new Regex(
#"^ # Start of line
[A-Z0-9]+: # Match identifier
(?: # Match...
(?!^[A-Z0-9]+:) # (unless it's the start of the next identifier)
. # ... any character,
)* # repeat as needed.",
RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
allMatchResults = regexObj.Matches(subjectString);
Now allMatchResults.Count will contain the number of matches in subjectString, and allMatchResults.Item[i] will contain the ith match.

Need help with regex to parse expression

I have an expression:
((((the&if)|sky)|where)&(end|finish))
What I need is to put a space between symbols and words so that it ends up like:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )
The regex I came up with is (\w)*[(\&*)(\|*)] which only gets me:
( ( ( ( the& if) | sky) | where) & ( end| finish) )
Could I get a little help here from a resident regex guru please? I will be using this in C#.
Edit: Since you're using C#, try this:
output = Regex.Replace(input, #"([^\w\s]|\w(?!\w))(?!$)", "$1 ");
That inserts a space after any character that matches the following conditions:
Is neither a letter, number, underscore, or whitespace
OR is a word character that is NOT followed by another word character
AND is not at the end of a line.
resultString = Regex.Replace(subjectString, #"\b|(?<=\W)(?=\W)", " ");
Explanation:
\b # Match a position at the start or end of a word
| # or...
(?<=\W) # a position between two
(?=\W) # non-word characters
(and replace those with a space).
You could just add a space after each word and after each non-word character (so look for \W|\w+ and replace it with the match and a space. e.g. in Vim:
:s/\W\|\w\+/\0 /g
You could use:
(\w+|&|\(|\)|\|)(?!$)
which means a word, or a & symbol, or a ( symbol, or a ) symbol, or a | symbol not followed by an end of string; and then replace a match with a match + space symbol. By using c# this could be done like:
var result = Regex.Replace(
#"((((the&if)|sky)|where)&(end|finish))",
#"(\w+|&|\(|\)|\|)(?!$)",
"$+ "
);
Now result variable contains a value:
( ( ( ( the & if ) | sky ) | where ) & ( end | finish ) )

Categories

Resources