I am trying to extract a character/digit from a string that is between single quotes, but I seem to be failing to write the correct pattern.
Test string - only value that changes is the single character/digit in single quotes
[+] Random session part: 'm'
I am using the following pattern but it returns empty
var line = "[+] Random session part: 'm'";
Regex pattern = new Regex(@"(?<=\')(.*?)(?=\')");
Match match = pattern.Match(line);
Debug.Log($"{match.Groups["postfix"].Value}");
int postFix = int.Parse(match.Groups["postfix"].Value);
What am I missing?
Your regex is overly complicated, and you are looking for a group named 'postfix' in your match, while your regex does not have such a named group.
A simpler regex would be:
'(.)'
This looks for a single character between two single quotes, and has that character wrapped in a capture group. Put a breakpoint after your match row, and you can explore the matched object.
You can explore the regex above with your match here:
https://regexr.com/77b0m
BTW: Your code tries to parse the string "m" into an int. This will throw an error, so you should probably handle that case with int.TryParse.
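As a minimal sketch of the two points above (index the unnamed group by number, and guard the parse with int.TryParse), assuming the input always has the shape shown in the question. The class and method names here are made up for illustration:

```csharp
using System.Text.RegularExpressions;

static class SessionPart
{
    // Extracts the single character between single quotes.
    // Returns false (and an empty string) when the line has no such part.
    public static bool TryExtract(string line, out string value)
    {
        Match match = Regex.Match(line, "'(.)'");
        value = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    // Tries to read the quoted character as a number.
    // Returns false for non-digits such as 'm' instead of throwing.
    public static bool TryGetPostfix(string line, out int postfix)
    {
        postfix = 0;
        return TryExtract(line, out string value) && int.TryParse(value, out postfix);
    }
}
```

With the test string from the question, TryExtract yields "m" and TryGetPostfix returns false; with '7' in the quotes, TryGetPostfix returns true and sets postfix to 7.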
You can use this regex:
'(.)' // matches a single character between single quotes
or
(?<=\')(.*?)(?=\') // lookarounds around a non-greedy match
Related
I have the following string fragments where I want to match the contents of the key attribute and replace all its occurrences with ***:
name="prefix1 - key_string suffix1" displayName="prefix1 - key_string suffix2" key="key_string" name2="prefix2 - key_string suffix1" desc="prefix1 - key_string suffix1"
I can easily match the attribute value key_string with something like (?<=key=")\b([^"]+) and replace it with *** so it will read like key="***", but can't seem to figure out how to replace other key_string occurrences using backreference.
Is it possible or do I need to split this into 2 regex passes: 1 to get the match result and another to replace the occurrences of match result?
It is not possible to find a string between double quotes after key= string and then replace all occurrences of the found match in the whole input string using a single Regex.Replace operation.
This would imply saving the value to some buffer, then seek to the string beginning point and re-scan the whole input string. This is not possible since regular expression engine searches from left to right (by default) or from right to left (with the RegexOptions.RightToLeft option) but never allows to re-wind to the string start scan position.
The closest pattern would be (?<=key=\"([^\"]+)\".*)\1|(?=.*key=\"([^\"]+)\".*)\2 (see its demo online) but it is useless, as the found match will remain: it is the "pivot" for all matches (if it is removed first, the lookarounds will not match, and it cannot be removed later as the regex index will be long past the match).
So, use a two-step approach like
var match = Regex.Match(text, @"\bkey=""([^""]+)""")?.Groups[1].Value;
if (!string.IsNullOrEmpty(match))
{
    text = text.Replace(match, ""); // If you just want to remove the found match anywhere inside the string
}
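For the question's actual goal (masking with *** rather than removing), the same two-step idea can be sketched like this. KeyMasker and MaskKey are hypothetical names, and the sketch assumes the key value should be replaced wherever it appears as a literal substring:

```csharp
using System.Text.RegularExpressions;

static class KeyMasker
{
    public static string MaskKey(string text)
    {
        // Step 1: capture the value of the key="..." attribute.
        Match m = Regex.Match(text, @"\bkey=""([^""]+)""");
        if (!m.Success)
        {
            return text; // no key attribute, nothing to mask
        }

        // Step 2: replace every literal occurrence of that value,
        // including the one inside key="..." itself.
        return text.Replace(m.Groups[1].Value, "***");
    }
}
```

Because the second step is a plain string.Replace, no escaping of the captured value is needed; a regex-based second pass would have to run it through Regex.Escape first.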
I have a name string like this (it is a file name stored on the server):
Offer_2018-06-05_PROSP000033998_20180413165327.02155000.NML.050618.1040.67648.0
The file name format is given above. I need to get the number out of
PROSP000033998
and remove the leading zeros (giving 33998) using a regex in C#. Other prefixes can appear in place of PROSP, so I want to use a regex to get the number rather than a string split. I tried (0|[1-9]\d*), but I am not sure whether this is correct, as I got 2018 as the output:
Regex regexLetterOfOffer = new Regex (@"0|[1-9]\d*");
Match match = regexLetterOfOffer.Match (fileInfo.Name);
if (match.Success)
{
Console.WriteLine (match.Value);
}
A generalized regular expression for alphabetical characters, possibly followed by zeros, then capturing digits with an underscore afterwards could be
[A-Z]0*([1-9]\d*)(?=_)
That is:
Regex regexLetterOfOffer = new Regex (@"[A-Z]0*([1-9]\d*)(?=_)");
Match match = regexLetterOfOffer.Match("Offer_2018-06-05_PROSP000033998_20180413165327.02155000.NML.050618.1040.67648.0");
if (match.Success)
{
Console.WriteLine (match.Groups[1].Value);
}
This will also match similar strings in which the letters before the digit sequence are something other than PROSP.
Putting (0|[1-9]\d*) into https://java-regex-tester.appspot.com/ shows that it is actually matching the number you want, it's just also matching all the other numbers in the string. The Match method only returns the first one, 2018 in this case. To only match the part you're after, you could use PROSP0*([1-9]\d*) as the regex. The brackets () around the last part make it a capturing group, which you can retrieve using the Groups property of the Match object:
Console.WriteLine(match.Groups[1].Value);
(Group 0 is the whole match, hence we want group 1.)
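To see the difference side by side, here is a small sketch that runs both patterns against the file name from the question. The helper names are invented for this example:

```csharp
using System.Text.RegularExpressions;

static class OfferNumber
{
    // The asker's pattern: Match returns only the first digit run it meets.
    public static string FirstNumber(string fileName) =>
        Regex.Match(fileName, @"0|[1-9]\d*").Value;

    // Anchoring on the PROSP prefix and consuming the zeros outside the
    // capture group leaves only the significant digits in group 1.
    public static string ProspNumber(string fileName) =>
        Regex.Match(fileName, @"PROSP0*([1-9]\d*)").Groups[1].Value;
}
```

For the sample name, FirstNumber returns "2018" (the year in the date) while ProspNumber returns "33998".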
I need to split a string in C#. I think it is better to see the next example:
string formula = "[[A]]*[[B]]";
string split = Regex.Match(formula, @"\[\[([^)]*)\]\]").Groups[1].Value;
I would like to get a list of strings with the word contained between '[[' and ']]' so, in this case, I should get 'A' and 'B', but I am getting this: A]]*[[B
Your main problem is that Regex.Match will match the first occurrence, and stop. From the documentation:
Searches the specified input string for the first occurrence of the regular expression specified in the Regex constructor.
You want Regex.Matches to get them all. This regex will work:
\[\[(.+?)\]\]
It will capture anything between [[ and ]]
so your code could look like:
string formula = "[[A]]*[[B]]";
var matches = Regex.Matches(formula, @"\[\[(.+?)\]\]");
var results = (from Match m in matches select m.Groups[1].ToString()).ToList();
// results contains "A" and "B"
The * matches as much as possible of the expression before it. Use a *? to match the smallest possible match.
See http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx#quantifiers
So your regex should be @"\[\[([^)]*?)\]\]"
Also, use Regex.Matches rather than Regex.Match, to get them all.
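The greedy versus lazy behaviour described above can be checked with a short sketch. The class and method names are only for illustration; the patterns differ in nothing but the ? after the quantifier:

```csharp
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Greedy: .* runs to the end and backtracks to the LAST "]]".
    public static string Greedy(string s) =>
        Regex.Match(s, @"\[\[(.*)\]\]").Groups[1].Value;

    // Lazy: .*? stops at the FIRST "]]" it can reach.
    public static string Lazy(string s) =>
        Regex.Match(s, @"\[\[(.*?)\]\]").Groups[1].Value;
}
```

For "[[A]]*[[B]]", Greedy returns "A]]*[[B" (the output reported in the question) and Lazy returns "A".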
I'm trying to split a string into tokens (via regular expressions)
in the following way:
Example #1
input string: 'hello'
first token: '
second token: hello
third token: '
Example #2
input string: 'hello world'
first token: '
second token: hello world
third token: '
Example #3
input string: hello world
first token: hello
second token: world
i.e., only split up the string if it is NOT in single quotation marks, and single quotes should be in their own token.
This is what I have so far:
string pattern = @"'|\s";
Regex RE = new Regex(pattern);
string[] tokens = RE.Split("'hello world'");
This will work for example #1 and example #3 but it will NOT work for example #2.
I'm wondering if there's theoretically a way to achieve what I want with regular expressions
You could build a simple lexer, which would involve consuming each of the tokens one by one. So you would have a list of regular expressions and would attempt to match one of them at each point. That is the easiest and cleanest way to do this if your input is anything beyond the very simple.
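A rough sketch of such a lexer, under the assumption that the only token kinds are a single quote, the text inside quotes, and whitespace-separated words. All names are hypothetical; the \G anchor forces each pattern to match exactly at the current position:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SimpleLexer
{
    static readonly Regex Quote = new Regex(@"\G'");
    static readonly Regex QuotedText = new Regex(@"\G[^']+");
    static readonly Regex Space = new Regex(@"\G\s+");
    static readonly Regex Word = new Regex(@"\G[^\s']+");

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        bool inQuote = false;
        int pos = 0;
        while (pos < input.Length)
        {
            Match m;
            if ((m = Quote.Match(input, pos)).Success)
            {
                tokens.Add(m.Value);      // the quote is its own token
                inQuote = !inQuote;
            }
            else if (inQuote)
            {
                m = QuotedText.Match(input, pos); // everything up to the next quote
                tokens.Add(m.Value);
            }
            else if ((m = Space.Match(input, pos)).Success)
            {
                // whitespace outside quotes only separates tokens
            }
            else
            {
                m = Word.Match(input, pos);
                tokens.Add(m.Value);
            }
            pos += m.Length; // every branch consumes at least one character
        }
        return tokens;
    }
}
```

This handles all three examples from the question: 'hello world' yields the tokens ', hello world, ' while hello world without quotes yields hello and world.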
Use a token parser to split the input into tokens. Use a regex to find string patterns.
'[^']+' will match text inside single quotes. If you want it grouped, (')([^']+)('). If no matches are found, then just use a regular string split. I don't think it makes sense to try to do the whole thing in one regular expression.
EDIT: It seems from your comments on the question that you actually want this applied over a larger block of text rather than just simple inputs like you indicated. If that's the case, then I don't think a regular expression is your answer.
While it would be possible to match ' and the text inside separately, and alternatively match the text alone, a regular expression does not allow an indefinite number of groups. Or better said, you can only reference those groups you explicitly state in the expression. So ((\w+)+\b) could theoretically match all words one by one. The outer group will correctly match the whole text, and the inner group will match each word in turn, but in most regex flavors you will only be able to reference its last match (.NET is an exception: it keeps every capture in the group's Captures collection).
There is no way to match a group of matched matches (weird sentence). The only possible way would be to match the string and then split it into separate words.
Not exactly what you are trying to do, but regular expression conditions might help out as you look for a solution:
(?<quot>')?(?<words>(?(quot)[^']|\w)+)(?(quot)')
If an opening quote is found, it matches everything up to the next quote. Otherwise it matches word characters. Your results are in the groups named "quot" and "words".
You'll have a hard time using Split here, but you can use a MatchCollection to find all matches in your string:
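A small sketch of how the conditional pattern might be consumed from C#. The wrapper class is invented for this example, and it only collects the "words" group of each match:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ConditionalTokens
{
    static readonly Regex Pattern =
        new Regex(@"(?<quot>')?(?<words>(?(quot)[^']|\w)+)(?(quot)')");

    // Returns the content of the "words" group for every match:
    // the quoted text if the match started with a quote, else one word.
    public static List<string> Words(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Groups["words"].Value);
        }
        return result;
    }
}
```

Whether a given match was quoted can be checked with m.Groups["quot"].Success, which is the information needed to emit the quote characters as separate tokens.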
string str = "hello world, 'HELLO WORLD': we'll be fine.";
MatchCollection matches = Regex.Matches(str, @"(')([^']+)(')|(\w+)");
The regex searches for a string between single quotes. If it cannot find one, it takes a single word.
Now it gets a little tricky - .NET returns a collection of Match objects. Each Match has several Groups - the first Group has the whole match ('hello world', quotes included), but the rest have sub-matches (', hello world, '). Also, you get many empty, unsuccessful Groups.
You can still iterate easily and get your matches. Here's an example using LINQ:
var tokens = from match in matches.Cast<Match>()
from g in match.Groups.Cast<Group>().Skip(1)
where g.Success
select g.Value;
tokens is now a collection of strings:
hello, world, ', HELLO WORLD, ', we, ll, be, fine
You can first split on quoted string, and then further tokenize.
foreach (String s in Regex.Split(input, @"('[^']+')")) {
// Check first if s is a quote.
// If so, split out the quotes.
// If not, do what you intend to do.
}
(Note: you need the brackets in the pattern to make sure Regex.Split returns those too)
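One way the commented steps could be filled in, as a sketch only. It assumes a quoted part should become three tokens (quote, content, quote) and that everything else is split on whitespace; the names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SplitTokenizer
{
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        // The capturing group makes Regex.Split return the quoted parts too.
        foreach (string part in Regex.Split(input, @"('[^']+')"))
        {
            bool isQuoted = part.Length >= 3
                && part[0] == '\''
                && part[part.Length - 1] == '\'';
            if (isQuoted)
            {
                tokens.Add("'");
                tokens.Add(part.Substring(1, part.Length - 2));
                tokens.Add("'");
            }
            else
            {
                // Unquoted text: every run of non-whitespace is a token.
                foreach (Match word in Regex.Matches(part, @"\S+"))
                {
                    tokens.Add(word.Value);
                }
            }
        }
        return tokens;
    }
}
```

Regex.Split also returns empty strings around a quoted part at the start or end of the input; the \S+ loop simply finds nothing in them, so they produce no tokens.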
Try this Regular Expression:
([']*)([a-z]+)([']*)
This matches zero or more single quotes, then one or more characters in the a-z set (unless you make it case-insensitive it will only find lowercase characters), then zero or more single quotes again. Group 1 holds the leading quote(s), if any; group 2 holds a word (words are split by anything that is not a character in a-z); and the last group holds the trailing quote(s), if any.
Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
Which does my work by splitting the words on every occurrence of a ",". What I want to know is, say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1,v2,v3,v4..
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state machine implementation. Doesn't it work in the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the ,, and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your trials, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also to match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
for (int i = 1; i < m.Groups.Count; i++) {
Group g = m.Groups[i];
if (g.Success) {
// matched text: g.Value
// match start: g.Index
// match length: g.Length
}
}
m = m.NextMatch();
}
I would expect it only to get v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas, and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
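In .NET specifically, a repeated group like the one in that last pattern keeps every value it captured, not just the final one, in Group.Captures. A minimal sketch, with an invented wrapper class:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CsvValues
{
    public static List<string> Extract(string line)
    {
        Match m = Regex.Match(line, @"^value_name\s+(?:([^,]+),?)*");
        var values = new List<string>();
        // Groups[1].Value would only give the last field; Captures holds
        // one entry per iteration of the repeated group, in order.
        foreach (Capture c in m.Groups[1].Captures)
        {
            values.Add(c.Value);
        }
        return values;
    }
}
```

For "value_name v1,v2,v3,v4" this returns v1, v2, v3 and v4; for a line that does not start with value_name the list is empty. Other regex flavors generally expose only the last capture, so this approach does not carry over to them.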
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
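The same validate-then-split idea translates to C# roughly as follows. This is a sketch with hypothetical names, using the pattern from the line above without the JavaScript delimiters:

```csharp
using System.Text.RegularExpressions;

static class ValueList
{
    public static string[] Parse(string line)
    {
        // Pass 1: check the overall shape and capture the whole list.
        Match m = Regex.Match(line, @"^value_name\s+([^,]+(?:,\s*[^,]+)*)$");
        if (!m.Success)
        {
            return new string[0];
        }

        // Pass 2: split the captured list on commas and optional whitespace.
        return Regex.Split(m.Groups[1].Value, @",\s*");
    }
}
```

For "value_name v1, v2,v3" this returns v1, v2 and v3, and it returns an empty array when the line does not have the expected shape.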