Regex error in C#?

How can I use "/show_name=(.*?)&show_name_exact=true\">(.*?)</i" as a regex in C#?
Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i", RegexOptions.IgnoreCase);
// Check Match instance
if (m.Success)
{
    // Get Group value
    string key = m.Groups[1].Value;
    Console.WriteLine(key);
    // alternate-1
}
Error: Unterminated string literal (CS1039)
Error: Newline in constant (CS1010)
What am I doing wrong?

I think you're mixing up .NET's regex syntax with PHP's. PHP requires you to use a regex delimiter in addition to the quotes that are required by the string literal itself. For instance, if you want to match "foo" case-insensitively in PHP, you would use something like this:
'/foo/i'
...but C# doesn't require the extra regex delimiters, which means it doesn't support the /i style for adding match modifiers (that would have been redundant anyway, since you're also using the RegexOptions.IgnoreCase flag). I think this is what you're looking for:
@"show_name=(.*?)&show_name_exact=true"">(.*?)<"
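Note also how I escaped the internal quotation mark using another quotation mark instead of a backslash. That is the rule in C#'s verbatim strings, the ones with the leading '@' (which are highly recommended for writing regexes): a backslash is not an escape character there, so \" ends the string at the quotation mark. In an old-fashioned (regular) string literal, \" is the correct escape. A backslash-escaped quotation mark inside a verbatim string is the most likely cause of the unterminated string error you were getting.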
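A minimal sketch of the corrected call, assuming top-level statements (C# 9 or later) and a made-up HTML snippet as input, since the question doesn't show the real page. It uses the verbatim pattern above together with RegexOptions.IgnoreCase, and also shows that a PHP-style /.../i pattern is just literal text to .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the real HTML from the question is not shown.
string input = "<a href=\"/search?show_name=lost-tv&show_name_exact=true\">Lost</a>";

// Verbatim string: "" stands for one quotation mark; no /.../i delimiters are needed,
// because case-insensitivity comes from RegexOptions.IgnoreCase.
Match m = Regex.Match(input, @"show_name=(.*?)&show_name_exact=true"">(.*?)<", RegexOptions.IgnoreCase);

if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // lost-tv
    Console.WriteLine(m.Groups[2].Value); // Lost
}

// In .NET the slashes and the trailing i of a PHP-style pattern are ordinary characters.
bool phpStyle = Regex.IsMatch("FOO", "/foo/i");
Console.WriteLine(phpStyle); // False
```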

Related

Regex Pattern Using Brackets

Simple question here guys. I'm attempting to create a pattern to use with a Regex in C#.
Here is my attempt:
"(value\":\[\[\"([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\"\]\])"
However for some reason when I go to compile this I get "Unrecognized escape sequence" on the brackets. Can I not simply use \ to escape the brackets?
The strings I'm searching for have the form of
value":[["AB-AB"]]
or
value":[["ABAB"]]
and I'd like to grab group[1] from the results.
The C# compiler rejects escape sequences it does not recognize, such as \[. You can avoid this by using "@" like this:
@"(value\"":\[\[\""([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\""\]\])"
Edit:
The @ sign is a little more complicated than that. To quote @Guffa:
An @ delimited string simply doesn't use backslash for escape sequences.
Furthermore, it should be noted that the replacement for \" in such a string is ""
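A short sketch (top-level statements, sample strings taken from the question) showing that \" in a regular literal and "" in a verbatim literal produce the same characters, and that the verbatim pattern above compiles and matches both forms. Because of the outer parentheses, group 1 is the whole value":[[...]] text, so the inner id ends up in group 2.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The same text written both ways: \" in a regular literal, "" in a verbatim literal.
string regular = "value\":\\[\\[\"";
string verbatim = @"value"":\[\[""";
Console.WriteLine(regular == verbatim); // True
Console.WriteLine(verbatim);            // value":\[\["

// The verbatim pattern from this answer; \"" becomes \" in the regex, which matches a quote.
var pattern = new Regex(@"(value\"":\[\[\""([A-Za-z0-9]+(?:-{0,1})[A-Za-z0-9]+)\""\]\])");

var results = new List<string>();
foreach (string s in new[] { "value\":[[\"AB-AB\"]]", "value\":[[\"ABAB\"]]" })
{
    Match m = pattern.Match(s);
    // Group 1 is the whole value":[[...]] text; group 2 is the id inside the quotes.
    results.Add(m.Groups[2].Value);
    Console.WriteLine(m.Groups[2].Value);
}
```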
I would recommend placing your pattern inside a verbatim string literal while implementing a negated character class to match the context; then reference the first group to grab the match results.
string s = @"I have value"":[[""AB-AB""]] and value"":[[""ABAB""]]";
foreach (Match m in Regex.Matches(s, @"value"":\[\[""([^""]+)""]]"))
    Console.WriteLine(m.Groups[1].Value);
Output
AB-AB
ABAB

\w gives me an error while using regular expressions

I was using Regex and I tried to write:
Regex RegObj2 = new Regex("\w[a][b][(c|d)][(c|d)].\w");
It gives me this error twice, once for each appearance of \w:
unrecognized escape sequence
What am I doing wrong?
You are not escaping the backslashes in a non-verbatim string literal.
Solution: put an @ in front of the string or double the backslashes, as per the C# rules for string literals.
Try to escape the escape ;)
Regex RegObj2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
or add an @ in front of the string (as @Dominic Kexel suggested)
There are two levels of potential escaping required when writing a regular expression:
The regular expression escaping (e.g. escaping brackets, or in this case specifying a character class)
The C# string literal escaping
In this case, it's the latter which is tripping you up. Either escape the \ so that it becomes part of the string, or use a verbatim string literal (with an @ prefix) so that \ doesn't have its normal escaping meaning. So either of these:
Regex regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
Regex regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
The two approaches are absolutely equivalent at execution time. In both cases you're trying to create a string constant with the value
\w[a][b][(c|d)][(c|d)].\w
The two forms are just different ways of expressing this in C# source code.
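To see that the two spellings really are the same string at run time, a quick check (top-level statements; the input "zabcd!q" is an arbitrary sample):

```csharp
using System;
using System.Text.RegularExpressions;

// Two spellings of the same string constant: a verbatim literal and a regular literal with doubled backslashes.
string verbatim = @"\w[a][b][(c|d)][(c|d)].\w";
string escaped = "\\w[a][b][(c|d)][(c|d)].\\w";

Console.WriteLine(verbatim == escaped); // True
Console.WriteLine(escaped);             // \w[a][b][(c|d)][(c|d)].\w

// Either string produces the same Regex.
bool matched = new Regex(verbatim).IsMatch("zabcd!q");
Console.WriteLine(matched); // True
```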
The backslashes are not being escaped. Either double them (\\w) or use a verbatim string:
new Regex(@"\w[a][b][(c|d)][(c|d)].\w");

How do I escape a RegEx?

I have a Regex that I now need to move into C#. I'm getting errors like this:
Unrecognized escape sequence
I am using Regex.Escape -- but obviously incorrectly.
string pattern = Regex.Escape("^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
hiddenRegex.Attributes.Add("value", pattern);
How is this correctly done?
The error you're getting is coming at compile time, correct? That means the C# compiler is not able to make sense of your string. Prepend an @ sign to the string and you should be fine. You don't need Regex.Escape.
See What's the @ in front of a string in C#?
var pattern = new Regex(@"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
pattern.IsMatch("Your input string to test the pattern against");
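A quick way to sanity-check the verbatim pattern, assuming the intent is "at least 7 characters, at least one letter, and at least one digit or listed symbol"; the sample passwords are made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim literal: \d, \?, \( and the other escapes reach the regex engine untouched.
var pattern = new Regex(@"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");

// Hypothetical inputs: long enough with digits, too short, no letter, no digit/symbol, symbol only.
string[] inputs = { "abc1234", "abc12", "1234567", "abcdefg", "abcdef!" };
foreach (string s in inputs)
    Console.WriteLine($"{s}: {pattern.IsMatch(s)}");
```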
The error you are getting is due to the fact that your string contains invalid escape sequences (e.g. \d). To fix this, either escape the backslashes manually or write a verbatim string literal instead:
string pattern = @"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$";
Regex.Escape would be used when you want to embed dynamic content into a regular expression, not when you want to construct a fixed regex. For example, you would use it here:
string name = "this comes from user input";
string pattern = string.Format("^{0}$", Regex.Escape(name));
You do this because name could very well include characters that have special meaning in a regex, such as dots or parentheses. When name is hardcoded (as in your example) you can escape those characters manually.
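A small sketch of that use of Regex.Escape; the value "1+1=2?" is a hypothetical user input chosen because + and ? are regex operators. The last line shows why escaping a pattern you meant as a regex (as in the question) doesn't work: its operators become literal characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical user input containing the metacharacters + and ?.
string name = "1+1=2?";
string pattern = string.Format("^{0}$", Regex.Escape(name));

Console.WriteLine(pattern);                          // ^1\+1=2\?$
Console.WriteLine(Regex.IsMatch("1+1=2?", pattern)); // True
// Unescaped, ^1+1=2?$ would also accept "11=2"; escaped, it does not.
Console.WriteLine(Regex.IsMatch("11=2", pattern));   // False

// Escaping a real regex turns ^ . * \ into literals, so it no longer works as a pattern.
Console.WriteLine(Regex.Escape(@"^.*\d"));           // \^\.\*\\d
```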

Disallowing Backslashes in a Regular Expression C#

For a Username field, there are certain variations that cannot be chosen as an appropriate username, nor can certain characters be used.
For example: TIM1....TIM9 cannot be used BIN1....BIN9 cannot be used, nor can the characters <>:\/|?* appear anywhere in the field.
The code I have so far is thus:
private bool ValidateId(string regexValue)
{
    Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\"/|?*]");
    return !regex.IsMatch(regexValue);
}
What I'm struggling to allow for however is the backslash character. Trying to escape it as I have done with the quotation character doesn't appear to work.
Thanks in advance.
You need to do a double escape. Try this:
Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\\\\\"/|?*]");
Explanation:
You need to escape the backslash in C# strings to get a backslash in the string. Additionally, the string needs to have two backslashes, because Regex also requires the backslashes to be escaped.
BTW, using verbatim strings makes it a bit more readable:
Regex regex = new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");
Both versions result in a Regex with this pattern:
TIM[1-9]|BIN[1-9]|[<>:\\"/|?*]
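A self-contained sketch of the check using the verbatim pattern; ValidateId is rewritten here as a local function in top-level statements, and the test usernames are made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim pattern: \\ is one escaped backslash for the regex engine, "" is one quotation mark.
var forbidden = new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");

// Local version of the question's ValidateId: true when nothing forbidden appears.
bool ValidateId(string value) => !forbidden.IsMatch(value);

Console.WriteLine(ValidateId("alice"));   // True
Console.WriteLine(ValidateId("TIMOTHY")); // True (TIM not followed by 1-9)
Console.WriteLine(ValidateId("TIM5"));    // False
Console.WriteLine(ValidateId(@"ali\ce")); // False (backslash)
Console.WriteLine(ValidateId("ali\"ce")); // False (quotation mark)
```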

Trying to understand this line of Java, as C# code

See this Java code:
s = s.replaceAll( "\\\\", "\\\\\\\\" ).replaceAll( "\\$", "\\\\\\$" );
I sorta don't understand it. It's a regex replace all.
I've tried the following C# code...
text = text.RegexReplace("\\\\", "\\\\\\\\");
text = text.RegexReplace("\\$", "\\\\\\$");
But if I have the following unit test input:
} ul[id$=foo] label:hover {
The java code returns: } ul[id\$=foo] label:hover {
My c# code returns: } ul[id\\\$=foo] label:hover {
So I'm not sure why my C# code is putting in more backslashes, mainly with regard to how these control characters are being represented.
Update:
So, when I use XXX's idea of just using text.Replace(..), this works.
e.g.:
text = text.Replace("\\\\", "\\\\\\\\");
text = text.Replace("\\$", "\\\\\\$");
But I was hoping to stick with Regex, to keep it as close to the Java code as possible.
The extension method being used is...
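using System.Text.RegularExpressions;

// Extension methods must live in a static class; the class name here is illustrative.
public static class StringExtensions
{
    public static string RegexReplace(this string input, string pattern, string replacement)
    {
        return Regex.Replace(input, pattern, replacement);
    }
}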
hmm...
Java needs all $ signs escaped in its replacement string: "\\\\\\$" means \\ and \$. Without the escape, it throws an error: http://www.regular-expressions.info/refreplace.html (look for "$ (unescaped dollar as literal text)").
Remember that $1, $0, etc. are replaced with the text of captured groups, so they are part of the syntax of the second argument to replaceAll. C# has a slightly different syntax: it doesn't require the extra backslash, and it copies any backslash in the replacement string literally.
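To make the difference concrete, this sketch runs the question's C# calls with the Java replacement strings unchanged (the backslash input a\b is a made-up sample). Because .NET copies backslashes in the replacement literally, all of them end up in the output, which reproduces the extra backslashes from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's calls, with the Java replacement strings copied unchanged.
string css = "} ul[id$=foo] label:hover {";
string dollar = Regex.Replace(css, "\\$", "\\\\\\$");   // replacement text is \\\$
Console.WriteLine(dollar); // } ul[id\\\$=foo] label:hover {

string path = @"a\b";
string slash = Regex.Replace(path, "\\\\", "\\\\\\\\"); // replacement text is \\\\
Console.WriteLine(slash);  // a\\\\b
```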
You could write:
text = text.RegexReplace(@"\\", @"\\");
text = text.RegexReplace(@"\$", @"\$");
Or,
text = text.RegexReplace(@"[$\\]", @"\$&");
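A sketch of the one-pass version; Quote is a hypothetical helper name, and it calls Regex.Replace directly rather than the question's RegexReplace extension:

```csharp
using System;
using System.Text.RegularExpressions;

// [$\\] matches a dollar sign or a backslash; the replacement \$& writes
// a literal backslash followed by the whole match ($&).
string Quote(string text) => Regex.Replace(text, @"[$\\]", @"\$&");

Console.WriteLine(Quote("} ul[id$=foo] label:hover {")); // } ul[id\$=foo] label:hover {
Console.WriteLine(Quote(@"a\b$c"));                      // a\\b\$c
```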
I think it's the equivalent of this C# code:
text = text.Replace(@"\", @"\\");
text = text.Replace("$", @"\$");
The @ indicates a verbatim string in C#, meaning that the backslashes in strings don't have to be escaped with more backslashes. In other words, the code replaces a single backslash with a double backslash and then replaces a dollar sign with a backslash followed by a dollar sign.
If you were to use the regex function, it would be something like this:
text = text.RegexReplace(@"\\", @"\\");
text = text.RegexReplace(@"\$", @"\$$");
Note that in the regex pattern (the first parameter), backslashes are special, while in the replacement (the second parameter) it is the dollar signs that are special.
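A quick check that the plain string.Replace version and the Regex.Replace version agree (the input C:\temp $HOME is a made-up sample containing both special characters):

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"C:\temp $HOME";

// Plain string replacement: nothing is special in either argument.
string plain = input.Replace(@"\", @"\\").Replace("$", @"\$");

// Regex replacement: backslash is special in the pattern, $ in the replacement ($$ = one literal $).
string viaRegex = Regex.Replace(input, @"\\", @"\\");
viaRegex = Regex.Replace(viaRegex, @"\$", @"\$$");

Console.WriteLine(plain);             // C:\\temp \$HOME
Console.WriteLine(plain == viaRegex); // True
```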
The code quotes the backslashes and '$' characters in the original string.
Java regex parsing: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
C#: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
I think that in Java, you have to escape the \ character by using \, but in C#, you don't. Try taking out half of the \ in your C# version.
