Regular expression to parse [A][B] into A and B

Regular expression to parse [A][B] into A and B - c#

I am trying to separate the following string into a separate lines with regular expression
[property1=text1][property2=text2]
and the desired result should be
property1=text1
property2=text2
here is my code
string[] attrs = Regex.Split(attr_str, #"\[(.+)\]");
Results is incorrect, am probably doing something wrong
UPDATE: after applying the suggested answers. Now it shows spaces and empty string

.+ is a greedy match, so it grabs as much as possible.
Use either
\[([^]]+)\]
or
\[(.+?)\]
In the first case, matching ] is not allowed, so "as much as possible" becomes shorter. The second uses a non-greedy match.

Your dot is grabbing the braces as well. You need to exclude braces:
\[([^]]+)\]
The [^]] matches any character except a close brace.

Try adding the 'lazy' specifier:
Regex.Split(attr_str, #"\[(.+?)\]");

Try:
var s = "[property1=text1][property2=text2]";
var matches = Regex.Matches(s, #"\[(.+?)\]")
.Cast<Match>()
.Select(m => m.Groups[1].Value);

Related

Using Regular Expression to match fields with an arbitrary delimiter

I suppose this should be an old question, however, I didn't find suitable solution in the forums after several hours searching.
I'm using C# and I know the Regex.Split and String.Split methods can be used to achieve the expected results. For some reason, I need to use a regular expression to match the required fields by specifying an arbitrary delimiter. For example, here is the string:
#DIV#This#DIV#is#DIV#"A "#DIV#string#DIV#
Here, #DIV# is the delimiter and is going to be split as:
This
is
"A "
string
How can I use a regular expression to match these values?
By the way, the leading and trailing #DIV# could also be ignored, for example, below source string should also be same result with above:
#DIV#This#DIV#is#DIV#"A "#DIV#string
This#DIV#is#DIV#"A "#DIV#string#DIV#
This#DIV#is#DIV#"A "#DIV#string

UPDATE:
I think I found a way (mind it is not efficient!) to get rid of empty values with a regex.
var splits = Regex.Matches(strIn, #"(?<=#DIV#|^)(?:(?!#DIV#).)+?(?=$|#DIV#)");
See demo on regexstorm (mind the \r? is only to demo in Multiline mode, you do not need it when using in real life)
ORIGINAL ANSWER
Here is another approach using a regular Split:
var strIn = "#DIV#This#DIV#is#DIV#\"A # \"#DIV#string#DIV#";
var splitText = strIn.Split(new[] {"#DIV#"}, StringSplitOptions.RemoveEmptyEntries);
Or else, you can use a regex to match the fields you need and then remove empty items with LINQ:
var spltsTxt2 = Regex.Matches(strIn, #"(?<=#DIV#|^).*?(?=#DIV#|$)").Cast<Match>().Where(p => !string.IsNullOrEmpty(p.Value)).Select(p => p.Value).ToList();
Output:

#DIV#|(.+?)(?=#DIV#|$)
Try this.Grab the captures or groups.See demo.
https://www.regex101.com/r/fJ6cR4/21

You can use the following to match:
/#?DIV#?/g
And replace with ' ' (space)
But this will give trailing and leading spaces sometimes.. which can be removed by using String.Trim()
Edit1: If you want to match the field values you can use the following:
(?<=(#?DIV#?)|^)[^#]*?(?=(#?DIV#?)|$)
See DEMO
Edit2: More generalized regex for matching # in fields:
(?m)(?<=(^(?!#?DIV#)|(#?DIV#)))(.*?)(?=($|(#DIV#?)))

Regular Expression - Remove zeroes inside an expression

I need to remove leading zeroes from the numerical part of an expression (using .net 2.0 C# Regex class).
Ex:
PAR0000034 -> PAR34
WP0003204 -> WP3204
I tried the following:
//keep starting characters, get rid of leading zeroes, keep remaining digits
string result = Regex.Replace(inputStr, "^(.+)(0+)(/d*)", "$1$3", RegexOptions.IgnoreCase)
Obviously, it did not work. I need a bit of help to find the mistake.

You don't need a regular expression for that, the Split method can do that for you.
Splitting on '0', removing empty entries (i.e. between the mulitple zeroes), and limiting the result to two strings will give you the two strings before and after the leading zeroes. Then you just put those two strings together again:
string result = String.Concat(
input.Split(new char[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries)
);

In your expression the .* part is greedy, so it catches full string. Further
use backslash instead of slash for digit \d
string result = Regex.Replace(inputStr, #"^([^0]+)(0+)(\d*)", "$1$3");
Or use look behind instead:
string result = Regex.Replace(inputStr, "(?<=[a-zA-Z])0+", "");

This works for me:
Regex.Replace("PPP00001001", "([^0]*)0+(.*)", "$1$2");

The phrase "leading zeroes" is confusing, since the zeroes you're talking about aren't actually at the beginning of the string. But if I understand you correctly, you want this:
string result = Regex.Replace(inputStr, "^(.*?)0+", "$1");
There are actually several ways to do it, with and without regex, but the above is probably the shortest and easiest to understand. The important part is the .*? lazy quantifier. This will ensure that it a) finds only the first string of zeroes, and b) deletes all the "leading" zeroes in the string.

Filtering out non-matching characters when using a regular expression

I have a string which has to be matching #"^[\w*$][\w\s-$]*((\d{1,})){0,1}$".
If it doesn't match this regular expression, I want the characters that do not match to be deleted from the string. How can I set this up?

s = Regex.Replace(s, #"^[^[\w*\$][\w\s-\$]*((\d{1,})){0,1}]$", "")

You probably want something like (but I am not sure of the actual question). Maybe you want to remove the whole regex if it does not match; that's not what the code below does:
s = Regex.Replace(s, #"^[^\w*\$]([\w*\$])[^\w*\$\s-]*([\w\s-\$]*).*$", "$1$2")
The idea is to interleave each wanted character blocks with list of forbidden characters and keep only those that you want. The end of your regex was a bit strange, so I simplified it.

Extract string between braces using RegEx, ie {{content}}

I am given a string that has place holders in the format of {{some_text}}. I would like to extract this into a collection using C# and believe RegEx is the best way to do this. RegEx is a little over my head but it seems powerful enough to work in this case. Here is my example:
<a title="{{element='title'}}" href="{{url}}">
<img border="0" alt="{{element='title'}}" src="{{element='photo' property='src' maxwidth='135'}}" width="135" height="135" /></a>
<span>{{element='h1'}}</span>
<span><strong>{{element='price'}}<br /></strong></span>
I would like to end up with something like this:
collection[0] = "element='title'";
collection[1] = "url";
collection[2] = "element='photo' property='src' maxwidth='135'";
collection[3] = "element='h1'";
collection[4] = "element='price'";
Notice that there are no duplicates either, but I do not want to complicate things if it is difficult to do.
I saw this post that does something similar but within brackets:
How to extract the contents of square brackets in a string of text in c# using Regex
My problem here is that I have double braces instead of just one character. How can I do this?

Taking exactly from the question you linked:
ICollection<string> matches =
Regex.Matches(s.Replace(Environment.NewLine, ""), #"\{\{([^}]*)\}\}")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
foreach (string match in matches)
Console.WriteLine(match);
I've changed the [ and ] to {{ and }} (escaped). This should make the collection you need. Be sure to read the first answer to the other question for the regex breakdown. It's important to understand it if you use it.

RegEx is more than powerful enough for what you need.
Try this regular expression:
\{\{.*?\}\}
That will match expressions between double brackets, lazily.
Edit: that will give you the strings, including the double brackets. You can parse them manually, but if the regex engine supports lookahead and lookbehind, you can get what's inside directly with something like:
(?<=\{\{).*?(?=\}\})

You will need to get rid of the duplicates after you have the matches.
\{\{(.*?)}}
Result 1
element='title'
Result 2
url
Result 3
element='title'
Result 4
element='photo' property='src' maxwidth='135'
Result 5
element='h1'
Result 6
element='price'

C# regex replace unexpected behavior

Given $displayHeight = "800";, replace whatever number is at 800 with int value y_res.
resultString = Regex.Replace(
im_cfg_contents,
#"\$displayHeight[\s]*=[\s]*""(.*)"";",
Convert.ToString(y_res));
In Python I'd use re.sub and it would work. In .NET it replaces the whole line, not the matched group.
What is a quick fix?

Building on a couple of the answers already posted. The Zero-width assertion allows you to do a regular expression match without placing those characters in the match. By placing the first part of the string in a group we've separated it from the digits that you want to be replaced. Then by using a zero-width lookbehind assertion in that group we allow the regular expression to proceed as normal but omit the characters in that group in the match. Similarly, we've placed the last part of the string in a group, and used a zero-width lookahead assertion. Grouping Constructs on MSDN shows the groups as well as the assertions.
resultString = Regex.Replace(
im_cfg_contents,
#"(?<=\$displayHeight[\s]*=[\s]*"")(.*)(?="";)",
Convert.ToString(y_res));
Another approach would be to use the following code. The modification to the regular expression is just placing the first part in a group and the last part in a group. Then in the replace string, we add back in the first and third groups. Not quite as nice as the first approach, but not quite as bad as writing out the $displayHeight part. Substitutions on MSDN shows how the $ characters work.
resultString = Regex.Replace(
im_cfg_contents,
#"(\$displayHeight[\s]*=[\s]*"")(.*)("";)",
"${1}" + Convert.ToString(y_res) + "${3}");

Try this:
resultString = Regex.Replace(
im_cfg_contents,
#"\$displayHeight[\s]*=[\s]*""(.*)"";",
#"\$displayHeight = """ + Convert.ToString(y_res) + #""";");

It replaces the whole string because you've matched the whole string - nothing about this statement tells C# to replace just the matched group, it will find and store that matched group sure, but it's still matching the whole string overall.
You can either change your replacer to:
#"\$displayHeight = """ + Convert.ToString(y_res) + #""";"
..or you can change your pattern to just match the digits, i.e.:
#"[0-9]+"
..or you could see if C# regex supports lookarounds (I'm not sure if it does offhand) and change your match accordingly.

You could also try this, though I think it is a little slower than my other method:
resultString = Regex.Replace(
im_cfg_contents,
"(?<=\\$displayHeight[\\s]*=[\\s]*\").*(?=\";)",
Convert.ToString(y_res));

Check this pattern out
(?<=(\$displayHeight\s*=\s*"))\d+(?=";)
A word about "lookaround".

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regular expression to parse [A][B] into A and B - c#

.+ is a greedy match, so it grabs as much as possible. Use either \[([^]]+)\] or \[(.+?)\] In the first case, matching ] is not allowed, so "as much as possible" becomes shorter. The second uses a non-greedy match.

Your dot is grabbing the braces as well. You need to exclude braces: \[([^]]+)\] The [^]] matches any character except a close brace.

Try adding the 'lazy' specifier: Regex.Split(attr_str, #"\[(.+?)\]");

Try: var s = "[property1=text1][property2=text2]"; var matches = Regex.Matches(s, #"\[(.+?)\]") .Cast<Match>() .Select(m => m.Groups[1].Value);

Related

Using Regular Expression to match fields with an arbitrary delimiter

Regular Expression - Remove zeroes inside an expression

Filtering out non-matching characters when using a regular expression

Extract string between braces using RegEx, ie {{content}}

C# regex replace unexpected behavior

Categories

Resources