Regular Expression For JSON - c#

I have a string -
xyz":abc,"lmn
I want to extract abc. What would the regular expression for this be?
I am trying this -
/xyz\":(.*?),\"lmn/
But it is not fetching any result.

In C# you could use
var regex = new Regex(@"(?<=xyz\"":).*?(?=,\""lmn)");
var value = regex.Match(@"xyz"":abc,""lmn").Value;
Note that this uses the C# verbatim string prefix @, which means that \ is not treated as an escape character. Inside a verbatim string you need to write a doubled "" to include a single " in the string.
This regex uses a lookbehind and a lookahead (prefix and suffix assertions), so the match itself is the value and you do not have to select a specific group from the result.
Alternatively you can use group matching
var regex = new Regex(@"xyz\"":(.*?),\""lmn");
var value = regex.Match(@"xyz"":abc,""lmn").Groups[1].Value;
You can check for the existence of a match by doing the following
var match = regex.Match(@"xyz"":abc,""lmn");
var isMatch = match.Success;
and then follow up with either match.Value or match.Groups[1].Value depending on which regex you used.
EDIT
Actually, escaping the " is not needed in a C# regex, so you could use either of the following instead.
var regex = new Regex("(?<=xyz\":).*?(?=,\"lmn)");
var regex = new Regex("xyz\":(.*?),\"lmn");
These two do not use the verbatim string prefix, so the \" is translated into just " in the regex, giving a regex of (?<=xyz":).*?(?=,"lmn) or xyz":(.*?),"lmn
Additionally, if this is an entire-string match rather than a substring match, you would want one of the following.
var regex = new Regex("(?<=^xyz\":).*?(?=,\"lmn$)");
var regex = new Regex("^xyz\":(.*?),\"lmn$");
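As a quick check of the two approaches above, here is a minimal sketch that runs both patterns against the string from the question. It assumes C# 9 or later (top-level statements); the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question: xyz":abc,"lmn
string input = "xyz\":abc,\"lmn";

// Approach 1: lookbehind and lookahead, so the whole match is the value.
string viaLookaround = Regex.Match(input, "(?<=xyz\":).*?(?=,\"lmn)").Value;

// Approach 2: capture group, so the value is in Groups[1].
Match m = Regex.Match(input, "xyz\":(.*?),\"lmn");
string viaGroup = m.Success ? m.Groups[1].Value : string.Empty;

Console.WriteLine(viaLookaround); // abc
Console.WriteLine(viaGroup);      // abc
```

Both print abc; which one to use is mostly a matter of whether you prefer reading match.Value or match.Groups[1].Value.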

Related

Trying to regex a string with backslashes and quotes

I am trying to regex a string in csharp. I am expecting to pass a string with the following format:
<%=Application(\"DisplayName\")%>
and get back:
DisplayName
I am using the regex class to accomplish this:
var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(<\%=Application[\>\(\)][\\][""](.*?)[\\][""][\>\(\)k]%\>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();
I am expecting s to contain the output string, but it is coming back as "". I tried building the regex string step by step, but I can't get the \ or " to process correctly. Any help would be greatly appreciated. Thanks!
var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(?:<%=Application[>()][""](.*?)[""][>()k]%>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();
Your pattern is very close. Since the backslashes are not actually a part of the string, rather only in the string to escape the double quotes, they need to be left out of the regex pattern. Notice I removed the [\\] from before both of the double quotes [""].
Now, you expect DisplayName in Groups[1]. Since Regex puts the entire match in Groups[0], your outer capture group (the whole pattern in parentheses) became the first actual capture group, which made DisplayName Groups[2]. As a matter of best practice, I changed the outer capture group to a non-capturing group by adding ?: after the opening parenthesis. That group is then no longer numbered, so DisplayName becomes Groups[1]. Hope this helps.
Full test code:
var text = "<%=Application(\"DisplayName\")%>";
Regex regex = new Regex(@"(?:<\%=Application[\>\(\)][""](.*?)[""][\>\(\)k]%\>)");
var v = regex.Match(text);
var s = v.Groups[1].ToString();
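To see the group numbering described above side by side, here is a small sketch. It assumes C# 9 or later (top-level statements) and, for brevity, simplifies the character classes from the answer to plain \( and \); that simplification is mine, not part of the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "<%=Application(\"DisplayName\")%>";

// Outer parentheses capture: group 1 is the whole tag, group 2 is the name.
Match capturing = Regex.Match(text, @"(<%=Application\(""(.*?)""\)%>)");

// Outer parentheses are non-capturing: the name becomes group 1.
Match nonCapturing = Regex.Match(text, @"(?:<%=Application\(""(.*?)""\)%>)");

Console.WriteLine(capturing.Groups[1].Value);    // <%=Application("DisplayName")%>
Console.WriteLine(capturing.Groups[2].Value);    // DisplayName
Console.WriteLine(nonCapturing.Groups[1].Value); // DisplayName
```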

Regular Expression : Check string expression and then filter out value

Trying to figure out how to match a string against a regular expression and then get a value from that string.
The string values would be something like this:
computerFileHardware20131211.pdf
computerFileSoftware20131322.pdf
computerFileEngineering20232.pdf
Regex regex = new Regex(@"computerFile[^[A-Za-z]+$]([^0-9]+)\.pdf");
Match match = regex.Match("computerFileHardware20131211.pdf");
if (match.Success)
{
Console.WriteLine(match.Value);
}
So what I'm trying to do is make sure I can match to the regular expression and then be able to filter out the number value. So for example for computerFileHardware20131211.pdf the number value would be 20131211.
I'm not very good at regular expressions. I think my first hurdle is figuring out the regular expression. I read somewhere that you put parentheses around the part of the string you want to filter out. So that is why I have ([^0-9]+).
Try something like https://regex101.com/r/KWiAg0/1
Regex regex = new Regex(@"computerFile[A-Za-z]+([0-9]+)\.pdf");
Match match = regex.Match("computerFileHardware20131211.pdf");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
}
Regular expressions can contain "subexpressions" that are enclosed in parentheses.
Every subexpression forms a group. With the Groups property you can access the various groups captured by the regular expression.
If you only want to replace the number:
string fileName = "computerFileHardware20131211";
string pattern = "[0-9]{1,}";
string replacement = "123";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(fileName , replacement);
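As a variation on the capture-group answer above, the following sketch runs the pattern over all three file names from the question. It assumes C# 9 or later (top-level statements); the named group "number", the ^ and $ anchors, and the extra "notes.pdf" entry are illustrative additions, not part of the original answers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string[] fileNames =
{
    "computerFileHardware20131211.pdf",
    "computerFileSoftware20131322.pdf",
    "computerFileEngineering20232.pdf",
    "notes.pdf", // hypothetical non-matching name, skipped below
};

// Anchored so the whole file name has to fit the pattern.
var regex = new Regex(@"^computerFile[A-Za-z]+(?<number>[0-9]+)\.pdf$");

var numbers = new List<string>();
foreach (string name in fileNames)
{
    Match match = regex.Match(name);
    if (match.Success)
    {
        // A named group can be read by name instead of by index.
        numbers.Add(match.Groups["number"].Value);
    }
}

Console.WriteLine(string.Join(", ", numbers)); // 20131211, 20131322, 20232
```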

Extracting Numbers from String RegEx

I am really struggling with Regular Expressions and can't seem to extract the number from this string
"id":143331539043251,
I've tried with this ... but I'm getting compilation errors
var regex = new Regex(@""id:"\d+,");
Note that the full string contains other numbers I don't want. I want numbers between id: and the ending ,
Try this code:
var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use the extracted number yourNumber as a string, or parse it to a numeric type.
If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, because in the .NET regex engine \d matches any Unicode decimal digit (such as Arabic-Indic digits), not just 0-9, unless RegexOptions.ECMAScript is specified.
Update, following comments on the question and on this answer - the following regex will match the pattern and place the matched numbers in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(@"""id"":143331539043251,", @"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
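The difference between \d and [0-9] mentioned above can be checked with a short sketch like this one. It assumes C# 9 or later (top-level statements); the surrounding JSON fragment is a made-up sample, and only the "id" value comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside 0-9.
string arabicIndicThree = "\u0663";

bool backslashD = Regex.IsMatch(arabicIndicThree, @"\d");                          // true
bool asciiClass = Regex.IsMatch(arabicIndicThree, "[0-9]");                        // false
bool ecmaScript = Regex.IsMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript); // false

// Hypothetical input with other numbers around the one we want.
string json = "{\"count\":7,\"id\":143331539043251,\"size\":12}";
Match m = Regex.Match(json, "\"id\":([0-9]+),");

// The value is too large for int, so parse it as long.
long id = long.Parse(m.Groups[1].Value);

Console.WriteLine($"{backslashD} {asciiClass} {ecmaScript}");
Console.WriteLine(id); // 143331539043251
```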
If you are open to using LINQ try the following (c#):
string stringVariable = "123cccccbb---556876---==";
var f = (from a in stringVariable.ToCharArray() where Char.IsDigit(a) == true select a);
var number = String.Join("", f);

C# Regular Expressions

I have a string that has multiple regular expression groups, and some parts of the string that aren't in the groups. I need to replace a character, in this case ^ only within the groups, but not in the parts of the string that aren't in a regex group.
Here's the input string:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
Here's what the output string should look like:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
I need to do it using C# and can use regular expressions.
I can match the string into groups of those that should and shouldn't be replaced, but am struggling on how to return the final output string.
I'm not sure I get exactly what you're having trouble with, but it didn't take long to come up with this result:
string strRegex = @"STARTREPLACEME(.+)ENDREPLACEME";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
string strReplace = "STARTREPLACEMEENDREPLACEME";
return myRegex.Replace(strTargetString, strReplace);
By using my favorite online Regex tool: http://regexhero.net/tester/
Is that helpful?
Regex rgx = new Regex(
@"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");
string s1 = rgx.Replace(s0, String.Empty);
Explanation: Each time a ^ is found, the lookahead scans ahead for an ending delimiter (ENDREPLACEME). If it finds one without seeing any of the other delimiters first, the match must have occurred inside a REPLACEME group. If the lookahead reports failure, it indicates that the ^ was found either between groups or within a DONTREPLACEME group.
Because lookaheads are zero-width assertions, only the ^ will actually be consumed in the event of a successful match.
Be aware that this will only work if delimiters are always properly balanced and groups are never nested within other groups.
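For reference, here is the lookahead approach applied to the input string from the question, as a minimal sketch assuming C# 9 or later (top-level statements). Only the pattern comes from the answer; the surrounding scaffolding is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string s0 =
    "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~" +
    "STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

// Match a ^ only if ENDREPLACEME is the next delimiter that follows it.
var rgx = new Regex(
    @"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");

string s1 = rgx.Replace(s0, string.Empty);

string expected =
    "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~" +
    "STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

Console.WriteLine(s1 == expected); // True
```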
If you are able to separate into groups that should be replaced and those that shouldn't, then instead of providing a single replacement string, you should be able to use a MatchEvaluator (a delegate that takes a Match and returns a string) to make the decision of which case it is currently dealing with and return the replacement string for that group alone.
You may also use an additional regex inside the MatchEvaluator. This solution produces the expected output:
Regex outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME", RegexOptions.Compiled);
Regex inner = new Regex(@"\^", RegexOptions.Compiled);
string replaced = outer.Replace(start, m =>
{
return inner.Replace(m.Value, String.Empty);
});
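A self-contained version of the MatchEvaluator idea might look like the sketch below. It assumes C# 9 or later (top-level statements). Two details are my own choices rather than the answer's: the lazy quantifier .+?, so that each REPLACEME group is matched separately, and string.Replace in place of a second regex.

```csharp
using System;
using System.Text.RegularExpressions;

string input =
    "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~" +
    "STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

// Lazy .+? stops at the nearest ENDREPLACEME, so each group is handled on its own.
var outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME");

// The lambda acts as the MatchEvaluator: it receives each match
// and returns the text that should replace it.
string output = outer.Replace(input, m => m.Value.Replace("^", string.Empty));

string expected =
    "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~" +
    "STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";

Console.WriteLine(output == expected); // True
```

Text outside the matched groups is never passed to the evaluator, which is why the ^ characters in the DONTREPLACEME groups survive.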

C# Regex.Split - Subpattern returns empty strings

Hey, first time poster on this awesome community.
I have a regular expression in my C# application to parse an assignment of a variable:
NewVar = 40
which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:
var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);
My goal was to do the trimming of whitespace in the Regex and not use the Trim() method of the returned values. Currently, it works but it returns an empty string at the beginning of the MatchCollection and an empty string at the end.
Using the above input example, this is what's returned from Regex.Split:
mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""
So my question is: why does it return an empty string at the beginning and the end?
Thanks.
The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:
All the text before your match, which is ""
All () groups within your match, which are "NewVar" and "40"
All the text after your match, which is ""
Regex.Split's primary purpose is to extract the text between matches of the regex; for example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. In .NET Framework 1.0 and 1.1, Regex.Split only returned the split values, in this case "" and "", but in .NET Framework 2.0 it was modified to also include values matched by () within the Regex, which is why you are seeing "NewVar" and "40" at all.
What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:
var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
var varName = match.Groups[1].Value;
var valueText = match.Groups[2].Value;
Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern - it has nothing to do with the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, IgnorePatternWhitespace ignores unescaped whitespace in your pattern, not in the input.
Instead, try the following:
var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);
Note that the whitespace is actually consumed as a part of the delimiter.
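The three behaviours discussed in these answers can be compared with a sketch along these lines. It assumes C# 9 or later (top-level statements); the whitespace-tolerant Match pattern at the end is my own variation on the patterns above, not something either answer proposed verbatim.

```csharp
using System;
using System.Text.RegularExpressions;

string command = "NewVar = 40";

// Split on the delimiter; surrounding whitespace is consumed with the "=".
string[] parts = Regex.Split(command, @"\s*=\s*");
// parts: "NewVar", "40"

// Capture groups in a Split pattern are added to the result, and a pattern
// that matches the whole input leaves an empty string on each side.
string[] withGroups = Regex.Split(command, @"^(\w+)\s*=\s*(\d+)$");
// withGroups: "", "NewVar", "40", ""

// Match with optional whitespace returns the two values directly.
Match match = Regex.Match(command, @"^\s*(\w+)\s*=\s*(\d+)\s*$");
string name = match.Groups[1].Value;
string value = match.Groups[2].Value;

Console.WriteLine(string.Join("|", parts));      // NewVar|40
Console.WriteLine(string.Join("|", withGroups)); // |NewVar|40|
Console.WriteLine($"{name}={value}");            // NewVar=40
```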
