I have the following regex in C#, and it causes the error "Unrecognized escape sequence" on \w, \. and \/:
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way the string "ends" at the " character right after href=.
Can anyone help me please?
Use "" to escape quotation marks when using an @ (verbatim) literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w specially. But C# thinks \w is meant for C#, doesn't recognize it, and you get a compiler error.
Take this example text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ (a verbatim string literal) and escape only the " by replacing each one with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ prevents C# from interpreting the \ in the string as an escape character. Since \" is then no longer recognized either, a doubled "" is used to escape the double quote.
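A minimal sketch to check that the two literal forms above produce the same string at run time. The class and method names (EscapeDemo, Regular, Verbatim) are made up for illustration.

```csharp
using System;

public static class EscapeDemo
{
    // Regular literal: every " and \ is escaped with a backslash.
    public static string Regular() =>
        "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";

    // Verbatim literal: backslashes are taken literally, each " is doubled.
    public static string Verbatim() =>
        @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";

    public static void Main()
    {
        Console.WriteLine(Regular());               // the text with single \ and "
        Console.WriteLine(Regular() == Verbatim()); // True
    }
}
```

Which form to prefer is mostly a readability question; for patterns with many backslashes the verbatim form tends to be easier to check by eye.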
Here's a better regex; yours has several problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
This matches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the [ bracket is missing in the img part, and the ) that closes the group is in the wrong position.
However, as has been said many times, HTML is not structured enough to be caught by regex. But if you need to get something quick and dirty done, it will do.
Here's the deal. C# string literals recognize certain character combinations (escape sequences) as special characters. Maybe you are familiar with inserting \n in a string to act as an end-of-line character, for example?
When you put a single \ in a string literal, the compiler tries to interpret it, together with the next character, as one of these escape sequences, and reports an error when it's not a valid combination.
Fortunately, that does not prevent you from using backslashes, because one of those sequences, \\, exists for exactly that purpose: it is interpreted as a single backslash.
So, in practice, if you replace every backslash that is meant for the regex with a double backslash (leaving the \" escapes as they are), it should work properly.
Related
I'm writing a program in C# using Microsoft Visual Studio. I need the program to match the vertical bar, but when I try to escape it like this: "\|", it gives me an unrecognized escape sequence error. What am I doing wrong?
In C#
string test = "\|";
is going to fail because \| is read as a C# string escape sequence, and no such escape exists. Because you are trying to include a backslash in the string, you need to escape the backslash so the string actually contains one:
string test = "\\|";
What will actually be stored in this string is \|
The reason you get an unrecognized escape sequence is that backslash is used as an escape character in C# string literals as well as in regex.
You have several choices to fix this:
Use verbatim literals, i.e. @"\|", or
Use a second escape inside a regular literal, i.e. "\\|", or
Use a character class, i.e. [|]
The third one is my personal favorite, because it does not require counting backslashes.
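A short sketch comparing the three options listed above; each pattern should match the single | in a test string. The names PipeDemo, Patterns and MatchesPipe are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeDemo
{
    public static readonly string[] Patterns =
    {
        @"\|",  // verbatim literal: the regex sees \|
        "\\|",  // regular literal with a doubled backslash: the regex sees \|
        "[|]"   // character class: no backslash needed at all
    };

    // True when the pattern matches exactly the pipe character in "a|b".
    public static bool MatchesPipe(string pattern) =>
        Regex.Match("a|b", pattern).Value == "|";

    public static void Main()
    {
        foreach (string p in Patterns)
            Console.WriteLine($"{p} -> {MatchesPipe(p)}");
    }
}
```

Note that the first two entries are the same string at run time; only the third is a genuinely different pattern.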
C# is treating "\|" as an escaped pipe in the string literal. Try "\\|" to escape the \ so that the regex actually sees the \| you want.
I'm trying to resolve tokens in a string.
What I would like is given input like this:
string input = "asdf %(text) %(123) %(a\)a) asdf";
That I could run that through regex.Replace() and have it replace on "%(text)", "%(123)" and "%(a\)a)".
That is, that it would match everything between a starting "%(" and a closing ")" unless the closing ")" was escaped. (But of course, then you could escape the slash with another slash, which would prevent it from escaping the end paren...)
I'm pretty sure standard regular expressions can't do this, but I'm wondering whether any of the extended capabilities of the C# regular expression library could, rather than iterating across the string manually. Or is there some other method that could do this? I feel like it's a common enough problem that there has to be some way to solve it without implementing the solution from scratch, given the immensity of the .NET Framework. If I do have to iterate through the string and replace with string.Replace(), I will, but it just seems so inelegant.
How about
var regex = new Regex(@"%\(.*?(?<!\\)(?:\\\\)*\)");
var result = regex.Replace(source,"");
%\( match literal %(
.*? match anything non-greedy
(?<!\\) preceding character to next match must not be \
(?:\\\\)* match zero or more literal \\ (i.e. match escaped backslashes)
\) match literal )
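The pattern broken down above can be tried against the sample input from the question. This is only a sketch: the names TokenDemo and FindTokens are made up, and the input is written as a verbatim literal because "\)" in a regular C# literal would itself be an unrecognized escape sequence.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Same pattern as in the answer, written as a verbatim literal.
    static readonly Regex Token = new Regex(@"%\(.*?(?<!\\)(?:\\\\)*\)");

    // Returns every %(...) token found in the input, in order.
    public static string[] FindTokens(string input) =>
        Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = @"asdf %(text) %(123) %(a\)a) asdf";
        foreach (string token in FindTokens(input))
            Console.WriteLine(token);
        // Prints:
        // %(text)
        // %(123)
        // %(a\)a)
    }
}
```

Because . does not match a newline by default, a token that spans several lines would not be found with this pattern as written.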
This works for me (Java):
String something = "\"asdf %(text) %(123) %(a\\)a) asdf\";";
String change = something.replaceAll("%\\(.*\\)", "");
System.out.println(change);
The output
"asdf asdf";
I was using Regex and I tried to write:
Regex RegObj2 = new Regex("\w[a][b][(c|d)][(c|d)].\w");
Gives me this error twice, one for each appearance of \w:
unrecognized escape sequence
What am I doing wrong?
You are not escaping the backslashes in a non-verbatim string literal.
Solution: put an @ in front of the string or double the backslashes, as per the C# rules for string literals.
Try to escape the escape ;)
Regex RegObj2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
or add an @ (as @Dominic Kexel suggested)
There are two levels of potential escaping required when writing a regular expression:
The regular expression escaping (e.g. escaping brackets, or in this case specifying a character class)
The C# string literal escaping
In this case, it's the latter which is tripping you up. Either escape the \ so that it becomes part of the string, or use a verbatim string literal (with an @ prefix) so that \ doesn't have its normal escaping meaning. So either of these:
Regex regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
Regex regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
The two approaches are absolutely equivalent at execution time. In both cases you're trying to create a string constant with the value
\w[a][b][(c|d)][(c|d)].\w
The two forms are just different ways of expressing this in C# source code.
The backslashes are not being escaped. Either double them (\\) or use a verbatim literal:
new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
I'm using C# and want to use the following regular expression in my code:
sDatabaseServer\s*=\s*"([^"]*)"
I have placed it in my code as:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*"([^"]*)"", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
I know you have to escape all parentheses and quotes inside the string quotes, but for some reason the following still does not work:
Working Version:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*""([^""]*)""", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Any ideas how to get C# to see my regex as just a string? I know, I know... easy question. Sorry, I'm still somewhat of an amateur at C#.
SOLVED: Thanks guys!
You went one step too far when you escaped the parentheses. If you want them to be regex meta-characters (i.e. a capturing group), then you must not escape them. Otherwise they will match literal parentheses.
So this is probably what you are looking for:
@"sDatabaseServer\s*=\s*""([^""]*)"""
string regex = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";
In your first try, you forgot to escape your quotes. And since it's a verbatim string literal, escaping them with a \ doesn't work; they have to be doubled.
In your second try, you escaped the quotes, but you didn't escape the \ that's needed for your whitespace token \s.
Use \x22 instead of quotes:
string pattern = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";
However, RegexOptions.IgnorePatternWhitespace is only needed for comments in the regex pattern (introduced by the # sign) or for a pattern split over multiple lines. You have neither, so remove it.
A better pattern for what you seek is
string pattern = @"(?:sDatabaseServer\s*=\s*\x22)([^\x22]+)(?:\x22)";
(?: ) means match but don't capture; it acts like an anchor for the parser. The pattern also assumes there will be at least one character between the quotes, hence + instead of *.
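As a rough check, the three ways of writing the quote discussed in this thread (doubled "" in a verbatim literal, \" in a regular literal, and the regex escape \x22) can be run against the same input. The sample line and the names QuoteDemo, Patterns and ServerName are assumptions made for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    public static readonly string[] Patterns =
    {
        @"sDatabaseServer\s*=\s*""([^""]*)""",       // verbatim literal, quotes doubled
        "sDatabaseServer\\s*=\\s*\"([^\"]*)\"",      // regular literal, \ and " escaped
        @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22"  // \x22 is the regex hex escape for "
    };

    // Returns the text captured between the quotes (group 1), or "" if no match.
    public static string ServerName(string pattern, string line) =>
        Regex.Match(line, pattern).Groups[1].Value;

    public static void Main()
    {
        string line = "sDatabaseServer = \"db01\""; // hypothetical config line
        foreach (string p in Patterns)
            Console.WriteLine(ServerName(p, line)); // db01 each time
    }
}
```

The first two entries are the same string at run time; the third differs as a string but describes the same match.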
I asked another question poorly, so I'll ask something else.
According to http://www.c-point.com/javascript_tutorial/special_characters.htm there are a few escape characters such as \n and \b. However, / is not one of them. What happens in this case (\/)? Is the \ ignored?
I have a string in JavaScript: 'http:\/\/www.site.com\/user'. Note that this is a literal with '; with " it would look like \\/. Anyway, I would like to escape this string, hence the question about what happens with non-'special' escape characters.
Another question: if I had name:\t me (or "name:\\t me"), is there a function to escape it so there is a tab? I am using C#, and these strings come from a JSON file.
According to Mozilla:
For characters not listed [...] a preceding backslash is ignored, but this usage is deprecated and
should be avoided.
https://developer.mozilla.org/en/JavaScript/Guide/Values%2c_Variables%2c_and_Literals#section_19
The \/ sequence is not listed but there're at least two common usages:
<1> It's required to escape literal slashes in regular expressions that use the /foo/ syntax:
var re = /^http:\/\//;
<2> It's required to avoid invalid HTML when you embed JavaScript code inside HTML:
<script type="text/javascript"><!--
alert('</p>')
//--></script>
... triggers: end tag for element "P" which is not open
<script type="text/javascript"><!--
alert('<\/p>')
//--></script>
... doesn't.
If a backslash is found before a character which is not meaningful as an escape sequence, the backslash is ignored, i.e. "\/" and "/" are the same string in JavaScript.
The / character is the regular expression delimiter, so it only has to be escaped in a regex context:
/[a-z]/[0-9]/ // Invalid.
/[a-z]\/[0-9]/ // Matches a lowercase letter, followed by a slash,
// followed by a digit.
Finally, if you want to collapse a backslash followed by a character into the corresponding escape sequence, you'll have to replace the whole expression:
string expr = "name:\\t me"; // Backslash followed by `t`.
expr = expr.Replace("\\t", "\t"); // Tab character.
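As an alternative to replacing each sequence by hand, .NET's Regex.Unescape can collapse such sequences in one call. This is a different technique from the Replace call above, and it follows regex escape rules rather than JSON's, so treat it as a sketch only. The names UnescapeDemo and Unescape are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Thin wrapper around Regex.Unescape, which converts escaped
    // characters such as \t or \n in the input into the real characters.
    public static string Unescape(string raw) => Regex.Unescape(raw);

    public static void Main()
    {
        string raw = "name:\\t me";               // backslash followed by t
        string cooked = Unescape(raw);

        Console.WriteLine(cooked.Contains("\t")); // True: a real tab is present
        Console.WriteLine(cooked.Contains("\\")); // False: the backslash is gone
    }
}
```

If the strings really come from a JSON file, parsing them with a JSON library is probably the more reliable route, since the parser applies JSON's own escape rules.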
\ is evaluated as \ if \ + next character is not an escape sequence.
examples:
\t -> escape sequence t -> tab
\\t -> escape \ and t -> \t
\\ -> escape sequence \ -> \
\c -> \c (not an escape sequence)
\a -> escape sequence a -> ???
Note that there are also escape sequences for some completely weird symbols, so be careful. IMHO there is no good standard across languages and operating systems.
And actually, it's even more non-standard: in plain C, '\y' -> y plus a warning, not \y. So this is very language dependent; be careful. (Disregard my comment below.)
br,
Juha
Edit: What language are you using? Java and C have slightly different behavior.
C# and Java seem to have the same escapes, and Python has different ones:
http://en.csharp-online.net/CSharp_FAQ:_What_are_the_CSharp_character_escape_sequences
http://www.cerritos.edu/jwilson/cis_182/language_resources/java_escape_sequences.htm
http://www.java2s.com/Code/Python/String/EscapeCodesbtnar.htm
In C# you can use the backslash character to tell the compiler what you really want. After compiling though, these escape characters do not exist.
If you use string myString = "\t"; the string will actually contain a TAB character, not just represent one. You can test this by checking myString.Length which is 1.
If you want to send the characters "backslash" and "t" to your JSON client, however, you'll have to tell the compiler to keep its hands off the backslash by escaping the backslash:
string myString = "\\t"; will result in a string of two characters, the "backslash" and the "t".
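The length claims above are easy to confirm with a small sketch; the names TabDemo, Tab, BackslashT and Verbatim are made up for illustration.

```csharp
using System;

public static class TabDemo
{
    public const string Tab = "\t";          // one character: TAB (U+0009)
    public const string BackslashT = "\\t";  // two characters: '\' and 't'
    public const string Verbatim = @"\t";    // the same two characters

    public static void Main()
    {
        Console.WriteLine(Tab.Length);             // 1
        Console.WriteLine(BackslashT.Length);      // 2
        Console.WriteLine(BackslashT == Verbatim); // True
    }
}
```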
Things get messy if you have to cross multiple layers of escaping and unescaping. Try debugging through these layers to see what's really happening under the hood.