Trying to regex a string with backslashes and quotes - c#

I am trying to regex a string in C#. I am expecting to pass a string with the following format:
<%=Application(\"DisplayName\")%>
and get back:
DisplayName
I am using the regex class to accomplish this:
var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(<\%=Application[\>\(\)][\\][""](.*?)[\\][""][\>\(\)k]%\>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();
I am expecting s to contain the output string, but it is coming back as "". I tried building the regex string step by step, but I can't get the \ or " to process correctly. Any help would be greatly appreciated. Thanks!

var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(?:<%=Application[>()][""](.*?)[""][>()k]%>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();

Your pattern is very close. The backslashes are not actually part of the string; they appear in the C# source only to escape the double quotes, so they need to be left out of the regex pattern. Notice I removed the [\\] before both of the double quotes [""].
Now, you expect DisplayName in Groups[1]. Since Regex puts the entire match in Groups[0], your outer capture group (the whole pattern in parentheses) became the first actual capture group, which made DisplayName Groups[2]. As a best practice, I changed the outer capture group to a non-capturing group by adding ?: after the opening parenthesis. The group is then not captured, and DisplayName becomes Groups[1]. Hope this helps.
Full test code:
var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(?:<\%=Application[\>\(\)][""](.*?)[""][\>\(\)k]%\>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();

Related

Exclude first and last quotation of string in regex result

I'm running a little c# program where I need to extract the escape-quoted words from a string.
Sample code from linqpad:
string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
var pattern = "\".*?\"";
var result = Regex.Split(s, pattern);
result.Dump();
Input (actual input contains many more escaped even-number-of quotes):
"action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult"
expected result
"C:\\folder\\"
actual result (2 items)
"action = 0;
dir = "
_____
";
result"
I get exactly the opposite of what I require. How can I make the regex ignore the starting (and ending) quote of the actual string? Why does it include them in the search? I've used the regex from similar SO questions but still don't get the intended result. I only want to filter by escape quotes.
Instead of using Regex.Split, try Regex.Match.
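A minimal sketch of that suggestion, assuming the goal is the first double-quoted value in the input. The names QuotedValueDemo and FirstQuoted are made up for illustration, and the capture group is an addition that drops the surrounding quotes:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedValueDemo
{
    // Hypothetical helper: returns the text of the first "..." span without
    // its quotes, or null when the input contains no quoted span.
    public static string FirstQuoted(string input)
    {
        Match m = Regex.Match(input, "\"(.*?)\"");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Same sample input as in the question.
        string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
        Console.WriteLine(FirstQuoted(s));
    }
}
```

If the input holds several quoted values, Regex.Matches with the same pattern returns all of them rather than only the first.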
You don't need Regex. Simply use String.Split(';') and the second array element will contain the path you need. You can then Trim() the leading carriage return, Remove() the 7-character "\ndir = " prefix, and Trim('"') the surrounding quotes. Something like:
result = s.Split(';')[1].Trim("\r ".ToCharArray()).Remove(0, 7).Trim('"');

Regular Expression For JSON

I have a string -
xyz":abc,"lmn
I want to extract abc. what will be the regular expression for this ?
I am trying this -
/xyz\":(.*?),\"lmn/
But it is not fetching any result.
In c# you could use
var regex = new Regex(@"(?<=xyz\"":).*?(?=,\""lmn)");
var value = regex.Match(@"xyz"":abc,""lmn").Value;
Note this makes use of the C# verbatim string prefix @, which means that \ is not treated as an escape character. To include a single " in a verbatim string, you need to write it as a doubled "".
This regex makes use of prefix and suffix matching rules so that you can get the match without having to select the specific group from the result.
Alternatively you can use group matching
var regex = new Regex(@"xyz\"":(.*?),\""lmn");
var value = regex.Match(@"xyz"":abc,""lmn").Groups[1].Value;
You can check for the existence of a match by doing the following
var match = regex.Match(@"xyz"":abc,""lmn");
var isMatch = match.Success;
and then follow up with either match.Value or match.Groups[1].Value depending on which regex you used.
EDIT
Actually, escaping the " is not needed in a .NET regex, so you could use either of the following instead.
var regex = new Regex("(?<=xyz\":).*?(?=,\"lmn)");
var regex = new Regex("xyz\":(.*?),\"lmn");
These two do not use the verbatim string prefix, so the \" is translated into just " in the regex, giving a regex of (?<=xyz":).*?(?=,"lmn) or xyz":(.*?),"lmn
Additionally, if this is an entire-string match rather than a substring match, you would want one of the following.
var regex = new Regex("(?<=^xyz\":).*?(?=,\"lmn$)");
var regex = new Regex("^xyz\":(.*?),\"lmn$");

Extracting Numbers from String RegEx

I am really struggling with Regular Expressions and can't seem to extract the number from this string
"id":143331539043251,
I've tried with this ... but I'm getting compilation errors
var regex = new Regex(@""id:"\d+,");
Note that the full string contains other numbers I don't want. I want numbers between id: and the ending ,
Try this code:
var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use extracted number yourNumber as a string or parse it to number type.
If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, as in the .NET regex engine that would match any Unicode decimal digit (such as Eastern Arabic numerals), not just 0 through 9.
Update, following comments on the question and on this answer - the following regex will match the pattern and place the matched numbers in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(@"""id"":143331539043251,", @"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
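The earlier point about \d versus [0-9] can be checked with a short sketch. The class name DigitClassDemo and the choice of test character are illustrative assumptions; RegexOptions.ECMAScript is shown only as another way to restrict \d to ASCII digits:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    public static void Main()
    {
        // U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside 0-9.
        string arabicIndicThree = "\u0663";

        // By default \d matches any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"\d"));   // True

        // An explicit ASCII range does not match it.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, "[0-9]")); // False

        // With ECMAScript-compliant behavior, \d is limited to 0-9 as well.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript)); // False
    }
}
```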
If you are open to using LINQ try the following (c#):
string stringVariable = "123cccccbb---556876---==";
var f = (from a in stringVariable.ToCharArray() where Char.IsDigit(a) == true select a);
var number = String.Join("", f);

How can I use RegEx (Or Should I) to extract a string between the starting string '__' and ending with '__' or 'nothing'

RegEx has always confused me.
I have a string like this:
IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0
Or Sometimes stored like this:
IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0
I want to get the 'A3VS' or 'PG33S' string. It's my firmware and is varied in length and type. I used to use:
string[] split = PNP.Split('\\'); //where PNP is my string name
var start = split[1].LastIndexOf('_');
string mystring = split[1].Substring(start + 1);
But that only works for strings that don't end with __ after the firmware string. I noticed that some have an additional random '_' after it.
Is RegEx the way to solve this? Or is there a better way?
Without RegEx, it can be expressed like this:
var firmware = PNP.Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries)[1].Split('\\')[0];
string s = split[1].TrimEnd('_');
string mystring = s.Substring(s.LastIndexOf('_') + 1);
If you want the RegEX way to do it here it is:
Regex regex = new Regex(@"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
var m1 = regex.Match(@"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0");
var g1 = m1.Groups["firmware"].Value;
//g1 == "PG33S"
Keep in mind you have to use [A-Za-z0-9] instead of \w in the capture subexpression since \w also matches an underscore (_).

C# Regex.Split - Subpattern returns empty strings

Hey, first time poster on this awesome community.
I have a regular expression in my C# application to parse an assignment of a variable:
NewVar = 40
which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:
var r = new Regex(#"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);
My goal was to do the trimming of whitespace in the Regex and not use the Trim() method of the returned values. Currently, it works but it returns an empty string at the beginning of the MatchCollection and an empty string at the end.
Using the above input example, this is what's returned from Regex.Split:
mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""
So my question is: why does it return an empty string at the beginning and the end?
Thanks.
The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:
All the text before your match, which is ""
All () groups within your match, which are "NewVar" and "40"
All the text after your match, which is ""
Regex.Split's primary purpose is to extract the text between regex matches; for example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. In .NET Framework 1.0 and 1.1, Regex.Split only returned the split values, in this case "" and "", but in .NET Framework 2.0 it was modified to also include values matched by () within the Regex, which is why you are seeing "NewVar" and "40" at all.
What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:
var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
var varName = match.Groups[1].Value;   // Groups[0] is the entire match
var valueText = match.Groups[2].Value;
Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern - it has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
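The difference between Split and Match described above can be seen side by side in a small sketch. The class name SplitVersusMatchDemo is hypothetical, and the \s* around the = sign is an added assumption so that the sample input "NewVar = 40" (which contains spaces) actually matches:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitVersusMatchDemo
{
    // Assumed pattern: the one from the question plus \s* around '='.
    public static readonly Regex Assignment = new Regex(@"^(\w+)\s*=\s*(\d+)$");

    public static void Main()
    {
        string command = "NewVar = 40";

        // Split returns the text before the match, each captured group,
        // and the text after the match.
        string[] parts = Assignment.Split(command);
        Console.WriteLine(parts.Length); // 4

        // Match exposes the captured groups directly; Groups[0] is the whole match.
        Match match = Assignment.Match(command);
        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value); // NewVar
            Console.WriteLine(match.Groups[2].Value); // 40
        }
    }
}
```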
From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, IgnorePatternWhitespace ignores unescaped whitespace in your pattern, not in the input.
Instead, try the following:
var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);
Note that the whitespace is actually consumed as a part of the delimiter.
