I have data with several occurrences of the following string:
<a href="default.asp?itemID=987">
in which the itemID is always different. I am using C# and I want to get all those itemIDs with a Regular Expression.
At first I tried this
"<a href=\"default.asp?itemID=([0-9]*)\">"
But the question mark is a reserved character. I considered using the @ prefix (a verbatim string) to disable escaping of characters, but there are still some double quotes that really need escaping. So then I would go for
"<a href=\"default.asp\\?itemID=([0-9]*)\">"
which should be translated (as a string) to
<a href="default.asp\?itemID=([0-9]*)">
But the Regex.Match method does not succeed. I tried the very same regex in an online tester and it worked. What am I doing wrong?
? and . are special characters in a regex, but they can't be escaped "as is" in a regular string literal.
A single \ in front of them is an invalid escape for the C# string, and with no backslash at all they keep their special meaning in the regex. So the backslash itself has to be doubled:
"<a href=\"default\\.asp\\?itemID=([0-9]*)\">";
When using the @ prefix, you can still write a double quote by doubling it: "".
You also need to escape certain special characters in the regex, in this case . and ?
Try this:
@"<a href=""default\.asp\?itemID=([0-9]*)"">"
Try escaping the dot '.' character with \.
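To tie the answers above together, here is a minimal sketch of collecting every itemID with Regex.Matches and the verbatim-string pattern. The class name, method name and sample input are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemIdExtractor
{
    // Verbatim string: backslashes reach the regex engine unchanged,
    // and each literal double quote is written as "".
    private static readonly Regex LinkPattern =
        new Regex(@"<a href=""default\.asp\?itemID=([0-9]*)"">");

    // Returns the captured itemID of every matching link, in order of appearance.
    public static List<string> ExtractItemIds(string html)
    {
        var ids = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string html = "<a href=\"default.asp?itemID=987\">x</a> <a href=\"default.asp?itemID=12\">y</a>";
        Console.WriteLine(string.Join(",", ExtractItemIds(html))); // prints 987,12
    }
}
```

Regex.Matches is used rather than Regex.Match because the question asks for all occurrences, not just the first one.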
I have a JSON string in which I would like to remove all white spaces that are not within quotes. I searched online and I already found a solution, which is the following:
aidstring = Regex.Replace(aidstring, "\\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)", "");
However, I am now dealing with a string that contains escaped quotes:
"boolean": "k near/3 \"funds private\""
and the above regular expression solution turns it into:
"boolean":"k near/3 \"fundsprivate\""
Since escaped quotes are treated as normal quotes.
Could anyone post a regex in which escaped quotes are ignored?
I'd suggest using
aidstring = Regex.Replace(aidstring, @"(""[^""\\]*(?:\\.[^""\\]*)*"")|\s+", "$1");
The regex matches every C-style quoted string into capture group 1, and $1 restores those strings in the result, while any whitespace caught by \s+ is removed.
Regex explanation:
Alternative 1:
("[^"\\]*(?:\\.[^"\\]*)*"):
" - a literal "
[^"\\]* - zero or more characters other than \ or "
(?:\\.[^"\\]*)* - zero or more sequences of...
\\. - \ and any character but a newline
[^"\\]* - zero or more characters other than \ or "
" - a literal "
Alternative 2:
\s+ - 1 or more whitespace (in .NET, any Unicode whitespace)
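As a rough check of the explanation above, this sketch runs the same pattern against the string from the question. The class and method names are hypothetical; only the pattern and the replacement string come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonWhitespace
{
    // Alternative 1 captures a whole quoted string (escaped quotes included) into group 1;
    // alternative 2 matches whitespace outside of quotes. Replacing with $1 keeps the
    // quoted strings and drops the whitespace, because group 1 is empty for alternative 2.
    public static string StripOutsideQuotes(string json)
    {
        return Regex.Replace(json, @"(""[^""\\]*(?:\\.[^""\\]*)*"")|\s+", "$1");
    }

    public static void Main()
    {
        // The sample from the question: "boolean" : "k near/3 \"funds private\""
        string input = "\"boolean\" : \"k near/3 \\\"funds private\\\"\"";
        Console.WriteLine(StripOutsideQuotes(input));
        // prints "boolean":"k near/3 \"funds private\""
    }
}
```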
Just a thought... And this doesn't immediately look legit because there are obvious possible flaws. But if you think about it, the scenarios where it would fail have a nearly zero chance of happening:
Regex.Replace(aidstring, @"""\s*:\s*""", "\":\"");
Long story short, look for the spaces you WANT to replace, instead of looking for all of the spaces you Don't Want to replace:
"boolean" : "k near/3 \"funds private\""
        ^^^^^
The only time it would fail is if the actual value-content of the json object were literally a colon... let me know how often that happens. :)
But Skeet is most-right. Use a Json Parser to clean it up.
Can someone explain when a double backslash and when a single backslash is needed to escape a character in a regular expression?
A lot of references online use a single backslash and online regex testers work with single backslashes, but in practice I often have to use a double backslash to escape a character.
For example:
"SomeString\."
Works in an online regex tester and matches "SomeString" followed by a dot.
However in practice I have to use a double escape:
if (Regex.IsMatch(myString, "SomeString\\."))
C# does not have a special syntax for construction of regular expressions, like Perl, Ruby or JavaScript do. It instead uses a constructor that takes a string. However, strings have their own escaping mechanism, because you want to be able to put quotes inside the string. Thus, there are two levels of escaping.
So, in a regular expression, w means the letter "w", while \w means a word character. However, if you write the string literal "\w", the compiler tries to read \w as a string escape sequence, which makes no sense, since "w" is not a quote or a backslash. Some languages (JavaScript, for example) silently drop the backslash, so that "\w" == "w" and you end up matching the letter "w" instead of any word character; C# rejects "\w" outright with an "Unrecognized escape sequence" compile error. Thus, to pass the backslash to the regexp, you need to put two backslashes in the string literal ("\\w"): one will be removed when the string literal is interpreted, one will be used by the regular expression.
When working with regular expressions directly (such as on most online regexp testers, or when using verbatim strings @"..."), you don't have to worry about the interpretation of string literals, and you always write just one backslash (except when you want to match the backslash itself, but then you're escaping the backslash for the regexp, not for the string).
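The two levels of escaping can be seen in a short sketch like the one below. The class and member names are invented for the example; the pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeLevels
{
    // Regular literal: C# consumes one backslash, the regex engine gets the other.
    public const string Regular = "SomeString\\.";

    // Verbatim literal: C# leaves the backslash alone, so one is enough.
    public const string Verbatim = @"SomeString\.";

    // True only when the input contains "SomeString" followed by a literal dot.
    public static bool HasLiteralDot(string input)
    {
        return Regex.IsMatch(input, Verbatim);
    }

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);                          // True: both hold SomeString\.
        Console.WriteLine(HasLiteralDot("SomeString."));                 // True
        Console.WriteLine(HasLiteralDot("SomeStringX"));                 // False
        Console.WriteLine(Regex.IsMatch("SomeStringX", "SomeString.")); // True: unescaped . matches any character
    }
}
```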
\ is also the escape character for string literals in C#, so the first \ escapes the second \ in the string passed to the method, and that remaining \ escapes the . in the regex.
Use:
if (Regex.IsMatch(myString, @"SomeString\."))
If you want to avoid double escaping.
If you use the verbatim string prefix @, you don't need to escape the backslash again.
if (Regex.IsMatch(myString, @"SomeString\."))
Old post but Regex.Escape may be useful
In JavaScript, when the pattern is passed as a string, you have to double the escape character: \\
let m = "My numer is [56]".match("\\[(.*)\\]");
alert(m[1]);//outputs 56
In C#, a single \ is enough inside a verbatim string (@"...").
I have the following regex in C# and it's causing the error "Unrecognized escape sequence" on \w, \. and \/.
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way the string ends at the " character right after href=.
Can anyone help me please?
Use "" to escape quotation marks when using the @ literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w specially. But C# thinks \w is meant for C#, doesn't understand it, and you get a compiler error.
For example take this example text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ and escape only the " by replacing each one with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ prevents C# from interpreting the \ in the string as an escape character, but since \" is then no longer recognized either, they invented "" to escape the double quote.
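A small sketch of the two styles side by side, using a simplified href pattern. The names and the example.com URL are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteEscaping
{
    // Regular literal: \" for each quote, \\ for each backslash meant for the regex.
    public const string Regular = "<a href=\"[\\w./:]+\">";

    // Verbatim literal: "" for each quote, backslashes written once.
    public const string Verbatim = @"<a href=""[\w./:]+"">";

    public static void Main()
    {
        // Both literals produce the same characters: <a href="[\w./:]+">
        Console.WriteLine(Regular == Verbatim); // True

        // Hypothetical sample input.
        Console.WriteLine(Regex.IsMatch("<a href=\"http://example.com/a\">", Verbatim)); // True
    }
}
```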
Here's a better regex, yours is filled with problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
Catches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the [ bracket was missing in the img part, and the ) closing the group was in the wrong position.
However, as has been said many times, HTML is not structured enough to be caught by regex. But if you need to get something quick and dirty done, it will do.
Here's the deal. C# strings recognize certain character combinations as special characters. Maybe you are familiar with inserting a \n in a string to work as an end-of-line character, for example?
When you put a single \ in a string, the compiler tries to read it, together with the next character, as one of these special sequences, and reports an error when it's not a valid combination.
Fortunately, that does not prevent you from using backslashes, as one of those sequences, \\, works for that purpose, being interpreted as a single backslash.
So, in practice, if you substitute every backslash in your string for a double backslash, it should work properly.
I'm using C# and want to use the following regular expression in my code:
sDatabaseServer\s*=\s*"([^"]*)"
I have placed it in my code as:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*"([^"]*)"", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
I know you have to escape all parentheses and quotes inside the string quotes, but for some reason it still does not work.
Working Version:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*""([^""]*)""", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Any ideas how to get C# to see my regex as just a string? I know, I know... easy question. Sorry, I'm still somewhat of an amateur at C#.
SOLVED: Thanks guys!
You went one step too far when you escaped the parentheses. If you want them to be regex meta-characters (i.e. a capturing group), then you must not escape them. Otherwise they will match literal parentheses.
So this is probably what you are looking for:
@"sDatabaseServer\s*=\s*""([^""]*)"""
string regex = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";
In your first try, you forgot to escape your quotes. But since it's a verbatim string literal, escaping them with a \ doesn't work.
In your second try, you escaped the quotes, but you didn't escape the \ that's needed for your whitespace token \s.
Use \x22 instead of quotes:
string pattern = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";
But RegexOptions.IgnorePatternWhitespace is meant for comments in the regex pattern (the # sign) or for a pattern split over multiple lines. You don't have either, so remove it.
A better pattern for what you seek is
string pattern = @"(?:sDatabaseServer\s*=\s*\x22)([^\x22]+)(?:\x22)";
(?: ) is a non-capturing group: it matches but does not capture, and here it acts like an anchor around the value. The pattern also assumes there will be at least 1 character inside the quotes, hence the + instead of the *.
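For reference, a minimal sketch of the \x22 approach in use. The class name, method name and the sample value db01 are hypothetical; the pattern is the first \x22 pattern from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfigReader
{
    // \x22 is the regex escape for the double quote character (hex 22),
    // so the verbatim string needs no "" doubling at all.
    private static readonly Regex ServerPattern =
        new Regex(@"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22");

    // Returns the quoted value, or an empty string when the line does not match.
    public static string ReadServer(string line)
    {
        Match m = ServerPattern.Match(line);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical configuration line.
        Console.WriteLine(ReadServer("sDatabaseServer = \"db01\"")); // prints db01
    }
}
```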
If you have a string with special characters that you want to match with:
System.Text.RegularExpressions.Regex.Matches(theTextToCheck, myString);
It will obviously give you wrong results, if you have special characters inside myString like "%" or "\".
The idea is to convert myString by replacing all occurrences of special characters like "%" with their escaped equivalents.
Does anyone know how to solve that or does someone have a RegEx for that? :)
Update:
The following characters have a special meaning that I should turn off by adding a leading backslash: \, &, ~, ^, %, [, ], {, }, ?, +, *, (, ), |, $
are there any others I should replace?
As @Kobi links to in the comments, you need to use Regex.Escape to ensure that the regular expression string is properly escaped.
Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes. This instructs the regular expression engine to interpret these characters literally rather than as metacharacters.
If you want to escape all characters that carry a special meaning in regex, you could simply escape every non-alphanumeric character with a backslash (there is no harm in escaping punctuation that doesn't need to be escaped, but escaping letters or digits does change their meaning, as in \d, \w, \b or \1).
But if you do, why are you using Regex at all instead of string.IndexOf?
Regex.Escape will do that for you. Somewhere in the MSDN docs it reads:
Escape converts a string so that the regular expression engine will interpret any metacharacters that it may contain as character literals
which is much more informative than the function description.
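A short sketch of how Regex.Escape could be applied to the original problem. The helper name CountOccurrences and the sample strings are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralSearch
{
    // Counts how often 'literal' occurs in 'text', treating every character
    // of 'literal' as plain text rather than as a regex metacharacter.
    public static int CountOccurrences(string text, string literal)
    {
        string pattern = Regex.Escape(literal);
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        // Without escaping, the . in "a.b" matches any character, so "axb" counts too.
        Console.WriteLine(Regex.Matches("a.b a.b axb", "a.b").Count); // 3
        Console.WriteLine(CountOccurrences("a.b a.b axb", "a.b"));    // 2
    }
}
```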
This is left for search/replace reference.
Use this as your regex:
(\\|\&|\~|\^|\%|\[|\]|\{|\}|\?|\+|\*|\(|\)|\||\$)
gets your chars of interest in a numbered group
And this as your replacement string:
\$1
replaces the matches with backslash plus the group content
Sample code:
Regex re = new Regex(@"(\\|\&|\~|\^|\%|\[|\]|\{|\}|\?|\+|\*|\(|\)|\||\$)");
string replaced = re.Replace(@"Look for (special {characters} and scape [100%] of them)", @"\$1");