Select all words inside brackets (multiple matches) - c#

I need to split a string in C#. An example shows it best:
string formula = "[[A]]*[[B]]";
string split = Regex.Match(formula, @"\[\[([^)]*)\]\]").Groups[1].Value;
I would like to get a list of strings with the words contained between '[[' and ']]'. In this case I should get 'A' and 'B', but I am getting this: A]]*[[B

Your main problem is that Regex.Match will match the first occurrence, and stop. From the documentation:
Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.
You want Regex.Matches to get them all. This regex will work:
\[\[(.+?)\]\]
It captures anything between [[ and ]], so your code could look like:
string formula = "[[A]]*[[B]]";
var matches = Regex.Matches(formula, @"\[\[(.+?)\]\]");
// The query syntax below needs "using System.Linq;"
var results = (from Match m in matches select m.Groups[1].ToString()).ToList();
// results contains "A" and "B"

The * quantifier is greedy: it matches as much as possible of the expression before it. Use *? to get the shortest possible match instead.
See http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx#quantifiers
So your regex should be @"\[\[([^)]*?)\]\]"
Also, use Regex.Matches rather than Regex.Match to get all of the matches.
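To make the difference between the greedy and the lazy quantifier concrete, here is a minimal sketch that runs both against the formula from the question. The class and method names (BracketWords, Extract, GreedyFirst) are made up for illustration; only Regex.Matches, Regex.Match and the LINQ calls are real .NET APIs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketWords
{
    // Hypothetical helper: returns every word wrapped in [[ ]].
    public static List<string> Extract(string formula)
    {
        // .+? is lazy, so each match stops at the nearest "]]".
        return Regex.Matches(formula, @"\[\[(.+?)\]\]")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    // Same pattern with a greedy quantifier, for comparison only.
    public static string GreedyFirst(string formula)
    {
        // .+ is greedy, so the group runs up to the LAST "]]".
        return Regex.Match(formula, @"\[\[(.+)\]\]").Groups[1].Value;
    }
}
```

With the input "[[A]]*[[B]]", Extract should return the two items "A" and "B", while GreedyFirst reproduces the unwanted A]]*[[B from the question.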

Related

c# regex matches example to extract result

I am trying to extract a character/digit that sits between single quotes in a string, and it seems I am failing to write the correct pattern.
Test string - only value that changes is the single character/digit in single quotes
[+] Random session part: 'm'
I am using the following pattern but it returns empty
var line = "[+] Random session part: 'm'";
Regex pattern = new Regex(@"(?<=\')(.*?)(?=\')");
Match match = pattern.Match(line);
Debug.Log($"{match.Groups["postfix"].Value}");
int postFix = int.Parse(match.Groups["postfix"].Value);
What am I missing?
Your regex is overly complicated, and you are looking for a group named 'postfix' in your match while your regex does not define such a named group.
A simpler regex would be:
'(.)'
This looks for a single character between two single quotes, and has that character wrapped in a capture group. Put a breakpoint after your match row, and you can explore the matched object.
You can explore the regex above with your match here:
https://regexr.com/77b0m
BTW: Your code tries to parse the string "m" into an int. This will throw an error, so you should probably handle that case with int.TryParse.
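The two suggestions above (read capture group 1, and guard the conversion with int.TryParse) can be sketched together as follows. SessionPart, QuotedChar and TryGetPostfix are hypothetical names chosen for this example; they are not part of the question's code.

```csharp
using System.Text.RegularExpressions;

public static class SessionPart
{
    // Hypothetical helper: returns the single quoted character, or null if there is none.
    public static string QuotedChar(string line)
    {
        Match match = Regex.Match(line, "'(.)'");
        return match.Success ? match.Groups[1].Value : null;
    }

    // Returns true and sets value only when the quoted character parses as an integer.
    public static bool TryGetPostfix(string line, out int value)
    {
        value = 0;
        string part = QuotedChar(line);
        return part != null && int.TryParse(part, out value);
    }
}
```

For the test string from the question, QuotedChar yields "m" and TryGetPostfix returns false instead of throwing, because "m" is not a number.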
You can use this regex:
'(.)' // matches a single character between single quotes
or
(?<=\')(.*?)(?=\') // lookarounds with a non-greedy match in between

Is it possible to use the backreference of match result as a match of another capture group?

I have the following string fragments where I want to match the contents of the key attribute and replace all its occurrences with ***:
name="prefix1 - key_string suffix1" displayName="prefix1 - key_string suffix2" key="key_string" name2="prefix2 - key_string suffix1" desc="prefix1 - key_string suffix1"
I can easily match the attribute value key_string with something like (?<=key=")\b([^"]+) and replace it with *** so it will read like key="***", but can't seem to figure out how to replace other key_string occurrences using backreference.
Is it possible or do I need to split this into 2 regex passes: 1 to get the match result and another to replace the occurrences of match result?
It is not possible to find the string between double quotes after key= and then replace all occurrences of that match in the whole input string using a single Regex.Replace operation.
That would require saving the value in a buffer, seeking back to the beginning of the string, and rescanning the whole input. This is not possible: the regular expression engine scans from left to right (by default) or from right to left (with the RegexOptions.RightToLeft option), but it never rewinds to the position where the scan started.
The closest pattern would be (?<=key=\"([^\"]+)\".*)\1|(?=.*key=\"([^\"]+)\".*)\2 (see its demo online), but it is useless because the value inside key="..." itself will remain: it is the "pivot" for all matches (if it is removed first, the lookarounds will not match, and it cannot be removed later because the regex index will be long past the match).
So, use a two-step approach like
var match = Regex.Match(text, @"\bkey=""([^""]+)""")?.Groups[1].Value;
if (!string.IsNullOrEmpty(match))
{
    text = text.Replace(match, ""); // If you just want to remove the found match anywhere inside the string
}
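Since the question asks for the value to be replaced with *** rather than removed, the same two-step idea can be sketched like this. KeyMasker and Mask are hypothetical names used for illustration, and the sketch assumes a plain, case-sensitive string replacement is acceptable.

```csharp
using System.Text.RegularExpressions;

public static class KeyMasker
{
    // Hypothetical helper: masks every occurrence of the key attribute's value.
    public static string Mask(string text)
    {
        // Pass 1: capture the value of key="...".
        Match m = Regex.Match(text, @"\bkey=""([^""]+)""");
        if (!m.Success)
        {
            return text; // no key attribute, nothing to mask
        }

        // Pass 2: plain string replacement of the captured value everywhere.
        return text.Replace(m.Groups[1].Value, "***");
    }
}
```

Note that string.Replace also hits the value inside key="..." itself, so the attribute ends up as key="***", which is what the question asks for.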

get number from a string after trimming 0 using Regex c#

I have a name string like this (it is a file name stored on a server):
Offer_2018-06-05_PROSP000033998_20180413165327.02155000.NML.050618.1040.67648.0
The file name format is given above. I need to get the number out of
PROSP000033998
and remove the leading zeros (giving 33998) using Regex in C#. Other values can appear in place of PROSP, so I want to use a regex to get the number instead of splitting the string. I tried (0|[1-9]\d*), but I am not sure whether this is correct, as I got 2018 as the output.
Regex regexLetterOfOffer = new Regex(@"0|[1-9]\d*");
Match match = regexLetterOfOffer.Match(fileInfo.Name);
if (match.Success)
{
    Console.WriteLine(match.Value);
}
A generalized regular expression that matches an uppercase letter, optionally followed by zeros, and then captures the digits that are followed by an underscore could be
[A-Z]0*([1-9]\d*)(?=_)
That is:
Regex regexLetterOfOffer = new Regex(@"[A-Z]0*([1-9]\d*)(?=_)");
Match match = regexLetterOfOffer.Match("Offer_2018-06-05_PROSP000033998_20180413165327.02155000.NML.050618.1040.67648.0");
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
}
This will match similar strings whose digit sequences start with something other than PROSP.
Putting (0|[1-9]\d*) into https://java-regex-tester.appspot.com/ shows that it does match the number you want; it just also matches all the other numbers in the string. The Match method only returns the first one, 2018 in this case. To match only the part you're after, you could use PROSP0*([1-9]\d*) as the regex. The parentheses () around the last part make it a capturing group, which you can retrieve using the Groups property of the Match object:
Console.WriteLine(match.Groups[1].Value);
(Group 0 is the whole match, hence we want group 1.)

Regex to extract string between quotes

I'm trying to extract a string between two quotes, and I thought I had my regex working, but it's giving me two strings in my GroupCollection, and I can't get it to ignore the first one, which includes the first quote and ID=
The string that I want to parse is
Test ID="12345" hello
I want to return 12345 in a group, so that I can manipulate it in code later. I've tried the following regex: http://regexr.com/3bgtl, with this code:
nodeValue = "Test ID=\"12345\" hello";
GroupCollection ids = Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups;
The problem is that the GroupCollection contains two entries:
ID="12345
12345
I just want it to return the second one.
Use a positive lookbehind:
GroupCollection ids = Regex.Match(nodeValue, "(?<=ID=\")[^\"]*").Groups;
You also used a capturing group (the parentheses), which is why you get 2 results: group 0 is the whole match and group 1 is the captured part.
There are a few ways to accomplish this. I like named capture groups for readability.
Regex with named capture group:
"(?<capture>.*?)"
And your code would be:
match.Groups["capture"].Value
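As a rough sketch of the named-group approach applied to the input from the question: the pattern below anchors the group to ID=, which is an assumption made for this example (the pattern shown above matches the text between any pair of quotes). QuotedId and Extract are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class QuotedId
{
    // Hypothetical helper: returns the text between the quotes after ID=, or "" if absent.
    public static string Extract(string nodeValue)
    {
        // (?<capture>...) names the group so it can be read by name instead of by index.
        Match match = Regex.Match(nodeValue, "ID=\"(?<capture>[^\"]*)\"");
        return match.Success ? match.Groups["capture"].Value : "";
    }
}
```

Called with "Test ID=\"12345\" hello", Extract should return 12345 without the ID=" prefix.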
Your code is totally OK and is the most efficient of all the solutions suggested here. Capturing groups are the quickest and least resource-consuming way to match substrings inside larger texts.
All you need to do with your regex is just access the captured group 1 that is defined by the round brackets. Like this:
var nodeValue = "Test ID=\"12345\" hello";
GroupCollection ids = Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups;
Console.WriteLine(ids[1].Value);
// or just on one line
// Console.WriteLine(Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups[1].Value);
See IDEONE demo
Please have a look at Grouping Constructs in Regular Expressions:
Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string. You can use grouping constructs to do the following:
Match a subexpression that is repeated in the input string.
Apply a quantifier to a subexpression that has multiple regular expression language elements. For more information about quantifiers, see Quantifiers in Regular Expressions.
Include a subexpression in the string that is returned by the Regex.Replace and Match.Result methods.
Retrieve individual subexpressions from the Match.Groups property and process them separately from the matched text as a whole.
Note that if you do not need overlapping matches, the capturing group mechanism is the best solution here.

Extending [^,]+, Regular Expression in C#

Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
It does the job by splitting the words on every occurrence of a ",". What I want to know is this: say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state machine implementation. Doesn't it work in the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the comma and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also to match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
    for (int i = 1; i < m.Groups.Count; i++) {
        Group g = m.Groups[i];
        if (g.Success) {
            // matched text: g.Value
            // match start: g.Index
            // match length: g.Length
        }
    }
    m = m.NextMatch();
}
I would expect it to get only v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
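In .NET specifically, a group that sits inside a repeated construct keeps every value it captured in its Captures collection, so the pattern suggested above can be read in a single pass. The following is a minimal sketch under that assumption; ValueList and Parse are hypothetical names, and trimming each value is a choice made for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ValueList
{
    // Hypothetical helper: returns the comma-separated values that follow "value_name".
    public static List<string> Parse(string line)
    {
        Match m = Regex.Match(line, @"^value_name\s+(?:([^,]+),?)*");
        if (!m.Success)
        {
            return new List<string>();
        }

        // Groups[1].Value holds only the last capture; Captures holds all of them.
        return m.Groups[1].Captures
                .Cast<Capture>()
                .Select(c => c.Value.Trim())
                .ToList();
    }
}
```

For "value_name v1,v2,v3,v4" this should give the four items v1, v2, v3 and v4. Most other regex flavors keep only the last capture of a repeated group, which is why the answers here fall back to multiple matches or a split.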
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
