Simple question here guys. I'm attempting to create a pattern to use with a Regex in C#.
Here is my attempt:
"(value\":\[\[\"([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\"\]\])"
However for some reason when I go to compile this I get "Unrecognized escape sequence" on the brackets. Can I not simply use \ to escape the brackets?
The strings I'm searching for have the form of
value":[["AB-AB"]]
or
value":[["ABAB"]]
and I'd like to grab group[1] from the results.
The C# compiler by default disallows escape sequences it does not recognize. You can override this behavior by using "@" like this:
@"(value\"":\[\[\""([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\""\]\])"
Edit:
The @ sign is a little more complicated than that. To quote @Guffa:
An @ delimited string simply doesn't use backslash for escape sequences.
Furthermore it should be noted that the replacement for \" in such a string is ""
I would recommend placing your pattern inside a verbatim string literal while implementing a negated character class to match the context; then reference the first group to grab the match results.
String s = @"I have value"":[[""AB-AB""]] and value"":[[""ABAB""]]";
foreach (Match m in Regex.Matches(s, @"value"":\[\[""([^""]+)""]]"))
Console.WriteLine(m.Groups[1].Value);
Output
AB-AB
ABAB
Related
I need to escape all single quotes strings with literals and I am using the following regular expression:
'[^']*'
It is working fine, except when I have escaped single quotes in the string that must be replaced itself. For example, for the following string:
[COMPUTE:{IIF([Client Name] LIKE '%Happy%', 'Happy\'s', 'Other\'s Jr.')}]
I have these matches:
%Happy%
Happy\
,
s Jr.
I can replace the \' with some other sequence of characters (for example internal_value) and then perform the string replacement, but it would be clearer if I could do this with the regular expression instead.
You just need a negative look behind. Basically just use a lazy match to anything .*? then you can put a negative look behind for a backslash (?<!\\) before the end single quote.
var reg = new Regex(@"'.*?(?<!\\)'");
foreach(Match m in reg.Matches(@"[COMPUTE:{IIF([Client Name] LIKE '%Happy%', 'Happy\'s', 'Other\'s Jr.')}]"))
Console.WriteLine(m);
outputs
'%Happy%'
'Happy\'s'
'Other\'s Jr.'
You can use
'((?:[^'\\]+|\\.)*)'
(modified a bit from the bible of regex, Mastering Regular Expressions)
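To see how that pattern behaves on the string from the question, here is a minimal sketch. The class and method names (`QuotedStrings`, `Extract`) are made up for illustration; only `Regex`, `Match` and `Groups` come from System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStrings
{
    // Between the quotes, accept either a run of characters that are neither
    // a quote nor a backslash, or a backslash followed by any one character
    // (an escape pair such as \').
    private static readonly Regex Quoted = new Regex(@"'((?:[^'\\]+|\\.)*)'");

    // Returns the contents of each quoted string, without the surrounding quotes.
    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match m in Quoted.Matches(input))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        string input = @"[COMPUTE:{IIF([Client Name] LIKE '%Happy%', 'Happy\'s', 'Other\'s Jr.')}]";
        foreach (string s in Extract(input))
            Console.WriteLine(s);
        // Prints:
        // %Happy%
        // Happy\'s
        // Other\'s Jr.
    }
}
```

Because the backslash and the character after it are consumed as a pair, an escaped quote can never be mistaken for the closing quote.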
I have following Regex on C# and its causing Error: C# Unrecognized escape sequence on \w \. \/ .
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way the string "ends" at href=" "-char
Can anyone help me please?
Use "" to escape quotations when using the @ literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w specially. But C# thinks \w is meant for C#, doesn't understand it, and you get a compiler error.
For example take this example text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ and escape only the " by replacing them with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ will prevent C# from trying to interpret the \ in the string as escape characters, but since \" will not be recognized then either, they invented the "" to escape the double quote.
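As a quick check of the two escaping styles above, this sketch writes the same text once as a regular literal and once as a verbatim literal and compares the results. The names `LiteralStyles`, `Regular` and `Verbatim` are illustrative only.

```csharp
using System;

public static class LiteralStyles
{
    // Regular literal: every backslash and double quote is escaped with a backslash.
    public static string Regular() =>
        "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";

    // Verbatim literal: backslashes are taken as-is; only the double quote is doubled.
    public static string Verbatim() =>
        @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";

    public static void Main()
    {
        Console.WriteLine(Regular());
        // Both literals describe the same sequence of characters.
        Console.WriteLine(Regular() == Verbatim()); // True
    }
}
```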
Here's a better regex, yours is filled with problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
Catches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the opening [ is missing in the img part, and the closing ) of the group is in the wrong position.
However, as has been said many times, HTML is not structured enough to be caught by regex. But if you need to get something quick and dirty done, it will do.
Here's the deal. C# strings treat certain character combinations as escape sequences that stand for special characters. Maybe you are familiar with inserting a \n in a string to act as an end-of-line character, for example?
When you put a single \ in a string, the compiler tries to read it, together with the next character, as one of these escape sequences, and reports an error when it's not a valid combination.
Fortunately, that does not prevent you from using backslashes, as one of those sequences, \\, works for that purpose, being interpreted as a single backslash.
So, in practice, if you substitute every backslash in your string for a double backslash, it should work properly.
I was using Regex and I tried to write:
Regex RegObj2 = new Regex("\w[a][b][(c|d)][(c|d)].\w");
Gives me this error twice, one for each appearance of \w:
unrecognized escape sequence
What am I doing wrong?
You are not escaping the backslashes in a non-verbatim string literal.
Solution: put an @ in front of the string or double the backslashes, as per the C# rules for string literals.
Try to escape the escape ;)
Regex RegObj2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
or add an @ (as @Dominic Kexel suggested)
There are two levels of potential escaping required when writing a regular expression:
The regular expression escaping (e.g. escaping brackets, or in this case specifying a character class)
The C# string literal escaping
In this case, it's the latter which is tripping you up. Either escape the \ so that it becomes part of the string, or use a verbatim string literal (with an @ prefix) so that \ doesn't have its normal escaping meaning. So either of these:
Regex regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
Regex regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
The two approaches are absolutely equivalent at execution time. In both cases you're trying to create a string constant with the value
\w[a][b][(c|d)][(c|d)].\w
The two forms are just different ways of expressing this in C# source code.
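A small sketch that checks this equivalence directly. The class name `SamePattern` and the test input "xabcd-y" are assumptions chosen for illustration, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SamePattern
{
    // The same pattern written as a verbatim literal and as a regular literal.
    public const string Verbatim = @"\w[a][b][(c|d)][(c|d)].\w";
    public const string Escaped = "\\w[a][b][(c|d)][(c|d)].\\w";

    public static void Main()
    {
        // The compiler produces identical string constants from both forms.
        Console.WriteLine(Verbatim == Escaped);                // True
        // So the regex engine sees the same pattern either way.
        Console.WriteLine(Regex.IsMatch("xabcd-y", Verbatim)); // True
        Console.WriteLine(Regex.IsMatch("xabcd-y", Escaped));  // True
    }
}
```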
The backslashes are not being escaped e.g. \\ or
new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
I'm using C# and want to use the following regular expression in my code:
sDatabaseServer\s*=\s*"([^"]*)"
I have placed it in my code as:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*"([^"]*)"", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
I know you have to escape all parentheses and quotes inside the string quotes but for some reason the following does still not work:
Working Version:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*""([^""]*)""", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Any ideas how to get C# to see my regex as just a string? I know i know....easy question...Sorry im still somewhat of an amateur to C#...
SOLVED: Thanks guys!
You went one step too far when you escaped the parentheses. If you want them to be regex meta-characters (i.e. a capturing group), then you must not escape them. Otherwise they will match literal parentheses.
So this is probably what you are looking for:
@"sDatabaseServer\s*=\s*""([^""]*)"""
string regex = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";
In your first try, you forgot to escape your quotes. But since it's a verbatim string literal, escaping with a \ doesn't work.
In your second try, you escaped the quotes, but you didn't escape the \ that's needed for your whitespace token \s
Use \x22 instead of quotes:
string pattern = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";
But
RegexOptions.IgnorePatternWhitespace lets you put comments in the regex pattern (introduced by the # sign) or split the pattern over multiple lines. Your pattern does neither, so remove the option.
A better pattern for what you seek is
string pattern = @"(?:sDatabaseServer\s*=\s*\x22)([^\x22]+)(?:\x22)";
(?: ) matches without capturing, so the surrounding text anchors the match without creating extra groups. The pattern also assumes there will be at least one character between the quotes, hence + instead of *.
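A minimal sketch of the \x22 approach in use. The helper `TryExtract`, the class name `ServerSetting`, and the sample line with the server name db01 are hypothetical; \x22 is the regex hex escape for the double quote character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServerSetting
{
    // \x22 stands for a double quote inside the regex, so the C# literal
    // needs no quote escaping at all.
    private static readonly Regex Pattern =
        new Regex(@"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22");

    // Returns true and the quoted value when the line contains the setting.
    public static bool TryExtract(string line, out string value)
    {
        Match m = Pattern.Match(line);
        value = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // Hypothetical input line, made up for this example.
        if (TryExtract("sDatabaseServer = \"db01\"", out string server))
            Console.WriteLine(server); // db01
    }
}
```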
How can I use "/show_name=(.?)&show_name_exact=true\">(.?)
Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i", RegexOptions.IgnoreCase);
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[1].Value;
Console.WriteLine(key);
// alternate-1
}
Error: Unterminated string literal (CS1039)
Error: Newline in constant (CS1010)
What am I doing wrong?
I think you're mixing up .NET's regex syntax with PHP's. PHP requires you to use a regex delimiter in addition to the quotes that are required by the C# string literal. For instance, if you want to match "foo" case-insensitively in PHP you would use something like this:
'/foo/i'
...but C# doesn't require the extra regex delimiters, which means it doesn't support the /i style for adding match modifiers (that would have been redundant anyway, since you're also using the RegexOptions.IgnoreCase flag). I think this is what you're looking for:
@"show_name=(.*?)&show_name_exact=true"">(.*?)<"
Note also how I escaped the internal quotation mark using another quotation mark instead of a backslash. In a verbatim string (the leading '@', which is highly recommended for writing regexes) that is the only way to write a quotation mark; \" works only in a regular string literal. A \" inside a verbatim string ends the literal early, which is a common way to end up with the unterminated string error.
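Here is a minimal sketch of that pattern in use. The class name `ShowName`, the helper `Find`, and the sample HTML fragment are assumptions made up for illustration; the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShowName
{
    // No /.../i delimiters: case-insensitivity comes from RegexOptions.IgnoreCase,
    // and the quotation mark inside the verbatim literal is written as "".
    public static Match Find(string input) =>
        Regex.Match(input,
                    @"show_name=(.*?)&show_name_exact=true"">(.*?)<",
                    RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Hypothetical input, made up for this example.
        string input = @"<a href=""/search?show_name=Lost&show_name_exact=true"">Lost</a>";
        Match m = Find(input);
        if (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value); // Lost (from the query string)
            Console.WriteLine(m.Groups[2].Value); // Lost (the link text)
        }
    }
}
```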