Replace single backslash with double backslash - c#

It seems simple enough, right? Well, I don't know.
Here's the code I'm trying:
input = Regex.Replace(input, "\\", "\\\\\\");
However, I'm receiving an error,
ArgumentException was unhandled - parsing "\" - Illegal \ at end of pattern.
How do I do this?

The first one should be "\\\\", not "\\". It works like this:
You have written "\\".
This translates to the single character \ in the string.
The regex engine then reads that as a backslash that isn't escaping anything, so it throws an error.
With regex, it's much easier to use a "verbatim string". In this case the verbatim string would be @"\\". When using verbatim strings you only have to consider escaping for the regex engine, as backslashes are treated literally. The second string will also be @"\\", as it will not be interpreted by the regex engine.

If you want to replace one backslash with two, it might be clearer to eliminate one level of escaping in the regular expression by using @"..." as the format for your string literals, also known as a verbatim string. It is then easier to see that
string output = Regex.Replace(input, @"\\", @"\\");
is a replacement from \ to \\.
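
As a rough check of the explanation above, the sketch below runs the replacement three ways: with regular string literals, with verbatim (@"...") literals, and with plain string.Replace. The sample path is a made-up input, and the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the verbatim literal keeps each \ as a single character.
string input = @"C:\temp\file.txt";

// Regular literal: "\\\\" is the two characters \\, which the regex engine
// reads as one literal backslash. The replacement is also the two characters \\.
string viaRegular = Regex.Replace(input, "\\\\", "\\\\");

// Verbatim literal: @"\\" is the same two characters, with one less level of escaping.
string viaVerbatim = Regex.Replace(input, @"\\", @"\\");

// No regex at all: replace the one-character string \ with the two-character string \\.
string viaPlain = input.Replace(@"\", @"\\");

Console.WriteLine(viaRegular);                 // C:\\temp\\file.txt
Console.WriteLine(viaRegular == viaVerbatim);  // True
Console.WriteLine(viaRegular == viaPlain);     // True
```

All three calls produce the same string, so the choice is mostly about which literal is easiest to read.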

I know it's too late to help you, maybe someone else will benefit from this. Anyway this worked for me:
text = text.Replace(@"\",@"\\");
and I find it even simpler.
Cheers!

var result = Regex.Replace(@"afd\tas\asfd\", @"\\", @"\\");
The pattern is the string \\, which the regex engine reads as a single \.
The replacement is not parsed as a regex pattern, so it is inserted as is when replacing.

If you intend to use the input in a regex pattern later, it can be a good idea to use Regex.Escape.
input = Regex.Escape(input);
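
A small sketch of what Regex.Escape does with backslashes, assuming a made-up file path as input. The escaped string can then be used as a pattern that matches the original text literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing regex metacharacters (\ and .).
string input = @"C:\temp\file.txt";

// Regex.Escape prefixes metacharacters with a backslash.
string escaped = Regex.Escape(input);
Console.WriteLine(escaped);  // C:\\temp\\file\.txt

// Used as a pattern, the escaped string matches the original text literally.
bool found = Regex.IsMatch(@"copy C:\temp\file.txt now", escaped);
Console.WriteLine(found);    // True
```

Note that this also escapes characters other than the backslash, so it is not a drop-in substitute when only backslashes should be doubled.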

Related

Regex-like construction to match %([text]) where [text] can contain escaped parens

I'm trying to resolve tokens in a string.
What I would like is given input like this:
string input = "asdf %(text) %(123) %(a\)a) asdf";
That I could run that through regex.Replace() and have it replace on "%(text)", "%(123)" and "%(a\)a)".
That is, that it would match everything between a starting "%(" and a closing ")" unless the closing ")" was escaped. (But of course, then you could escape the slash with another slash, which would prevent it from escaping the end paren...)
I'm pretty sure standard regular expressions can't do this, but I'm wondering if any of the various fancy expanded capabilities of the C# regular expression library could, rather than just iterating across the string totally manually? Or some other method that could do this? I feel like it's a common enough problem that there has to be some way to solve it without implementing the solution from scratch, given the immensity of the .NET Framework? If I do have to implement iterating through the string and replacing with string.Replace(), I will, but it just seems so inelegant.
How about
var regex = new Regex(@"%\(.*?(?<!\\)(?:\\\\)*\)");
var result = regex.Replace(source,"");
%\( match literal %(
.*? match anything non-greedy
(?<!\\) preceding character to next match must not be \
(?:\\\\)* match zero or more literal \\ (i.e. match an escaped \)
\) match literal )
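
To see the pattern broken down above in action, here is a minimal sketch that lists the tokens it finds in the question's sample input instead of replacing them. The variable names are illustrative, and the input is written as a verbatim string so that \) stays a backslash followed by a paren.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The sample input from the question, as a verbatim string.
string source = @"asdf %(text) %(123) %(a\)a) asdf";

// %( ... ) where the closing paren must not be preceded by an unescaped backslash.
var tokenRegex = new Regex(@"%\(.*?(?<!\\)(?:\\\\)*\)");

// Cast<Match>() keeps this working on older frameworks where MatchCollection is non-generic.
string[] tokens = tokenRegex.Matches(source)
                            .Cast<Match>()
                            .Select(m => m.Value)
                            .ToArray();

foreach (string token in tokens)
{
    Console.WriteLine(token);
}
// %(text)
// %(123)
// %(a\)a)
```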
This is working for me :
String something = "\"asdf %(text) %(123) %(a\\)a) asdf\";";
String change = something.replaceAll("%\\(.*\\)", "");
System.out.println(change);
The output
"asdf asdf";

C# Unrecognized escape sequence

I have the following regex in C#, and it's causing the error "Unrecognized escape sequence" on \w, \. and \/.
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way the string "ends" at the " character in href=".
Can anyone help me please?
Use "" to escape quotations when using the @ literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w as special. But C# thinks \w is meant for C#, doesn't understand it, and you get a compiler error.
For example take this example text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ and escape only the " by replacing them with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ will prevent C# from trying to interpret the \ in the string as escape characters, but since \" will not be recognized then either, they invented the "" to escape the double quote.
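
As a quick sanity check of the two escaping styles just described, this sketch compares the regular literal and the verbatim (@"...") literal for the same example text. The variable names are only for illustration.

```csharp
using System;

// Regular literal: both " and \ are escaped with a backslash.
string escaped = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";

// Verbatim literal: backslashes are literal, and " is written as "".
string verbatim = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";

Console.WriteLine(escaped == verbatim);  // True
Console.WriteLine(verbatim);             // <a href="file://C:\Test\Test2\[\w\.\/:]+">
```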
Here's a better regex, yours is filled with problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
Catches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the [ bracket was missing in the img part, and the ) closing the group was in the wrong position.
However, as has been said many times, HTML is not structured enough to be caught by regex. But if you need to get something quick and dirty done, it will do.
Here's the deal. C# strings recognize certain character combinations as special characters. Maybe you are familiar with inserting a \n in a string to act as an end-of-line character, for example?
When you put a single \ in a string, the compiler tries to read it, along with the next character, as one of these escape sequences, and reports an error when it's not a valid combination.
Fortunately, that does not prevent you from using backslashes, as one of those sequences, \\, works for that purpose, being interpreted as a single backslash.
So, in practice, if you substitute every backslash in your string for a double backslash, it should work properly.

replace unicode character

String jData="Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar"
+ "\u0131ndan KPSS \u0 131 ";
jData = Regex.Replace(jData, @"\\u0 ", @"\\u0", RegexOptions.Compiled).Trim();
I have to replace "\u0 " in jData with "\u0" (i.e. remove the trailing whitespace character if there is one) but the method I used isn't working. What should I do?
So you've got some malformed Unicode escapes in the string and you want to fix them by removing any whitespace after the 0. That's simple enough:
jData = Regex.Replace(jData, @"(\\u0)\s+(\w+)", "$1$2");
The hardest part of all this is figuring out what all the backslashes are supposed to mean. C# helps you with that by supporting an alternative string literal syntax, the verbatim string (@"..."). When you write a regex as a verbatim string, the only character that you have to escape with a backslash is the backslash itself. (You have to escape quotation marks too, but you do that with another quote, i.e. "".)
With that out of the way, the real reason I answered this question was to advise you not to use RegexOptions.Compiled. I'm sure you've heard many people say it makes the regex work faster. That's true, but it's an oversimplification. Read this article for a good discussion of this issue. Do yourself a favor and forget RegexOptions.Compiled even exists until you run into a problem you can't solve without it.
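
A minimal sketch of the capture-group replacement from this answer, assuming the input really contains literal backslashes (hence the verbatim literal) and using only the tail of the question's string for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Tail of the question's data, with literal backslashes and one malformed escape.
string jData = @"sorunar\u0131ndan KPSS \u0 131 ";

// Group 1 is the literal text \u0, \s+ is the stray whitespace, group 2 is the rest
// of the escape. "$1$2" glues the two groups back together without the whitespace.
string repaired = Regex.Replace(jData, @"(\\u0)\s+(\w+)", "$1$2").Trim();

Console.WriteLine(repaired);  // sorunar\u0131ndan KPSS \u0131
```

The well-formed \u0131 earlier in the string is left alone, because \s+ requires at least one whitespace character after \u0.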
find: @"\\u0 "
replace: @"\\u0"
They are the same. Try it with a capital O or a normal o.
I think I got it working
string jData= @"Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar\u0131ndan KPSS \u0 131 ";
jData = Regex.Replace(jData, @"\\u0 ", @"\u0", RegexOptions.Compiled).Trim();
Notice I added an extra '@' in front of the input string. And in the regex part I changed the third argument to @"\u0".
There's a problem with your example string. I'm supposing that you actually wanted the backslashes in the string, in which case the simplest approach is to put @ before the string literals. And then I believe you have the opposite problem in the second line, where you should have either used just one backslash in each string, or omitted the @.
There's no reason to use Regex.Replace() here. jData.Replace() would suffice just fine:
String jData=@"Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar"
+ @"\u0131ndan KPSS \u0 131 ";
jData = jData.Replace(@"\u0 ", @"\u0").Trim();

C# doesn't recognize my regular expression as a string even though i have tried escaping everything

I'm using C# and want to use the following regular expression in my code:
sDatabaseServer\s*=\s*"([^"]*)"
I have placed it in my code as:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*"([^"]*)"", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
I know you have to escape all parentheses and quotes inside the string quotes, but for some reason this still does not work.
Working Version:
Regex databaseServer = new Regex(@"sDatabaseServer\s*=\s*""([^""]*)""", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
Any ideas how to get C# to see my regex as just a string? I know, I know... easy question... Sorry, I'm still somewhat of an amateur at C#.
SOLVED: Thanks guys!
You went one step too far when you escaped the parentheses. If you want them to be regex meta-characters (i.e. a capturing group), then you must not escape them. Otherwise they will match literal parentheses.
So this is probably what you are looking for:
@"sDatabaseServer\s*=\s*""([^""]*)"""
string regex = "sDatabaseServer\\s*=\\s*\"([^\"]*)\""
In your first try, you forgot to escape your quotes. But since it's a verbatim string literal, escaping with a \ doesn't work.
In your second try, you escaped the quotes, but you didn't escape the \ that's needed for your whitespace token \s.
Use \x22 instead of quotes:
string pattern = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";
But RegexOptions.IgnorePatternWhitespace is meant for patterns that contain comments (introduced by the # sign) or that are split over multiple lines. You have neither, so remove it.
A better pattern for what you seek is
string pattern = @"(?:sDatabaseServer\s*=\s*\x22)([^\x22]+)(?:\x22)";
(?: ) means match but don't capture, and acts like an anchor for the parser. The pattern also assumes there will be at least one character inside the quotes, hence + instead of *.
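
The answers above spell the same pattern three ways. The sketch below, which assumes a made-up configuration line as input, checks that the verbatim and regular literals are the same string and that the \x22 variant captures the same value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical line of the kind the pattern is meant to match.
string line = "sDatabaseServer = \"myhost\"";

// Verbatim literal: quotes doubled, backslashes left alone.
string verbatim = @"sDatabaseServer\s*=\s*""([^""]*)""";

// Regular literal: quotes and backslashes both escaped with a backslash.
string regular = "sDatabaseServer\\s*=\\s*\"([^\"]*)\"";

// Verbatim literal that sidesteps quote escaping by using the regex escape \x22 for ".
string hexEscaped = @"sDatabaseServer\s*=\s*\x22([^\x22]*)\x22";

Console.WriteLine(verbatim == regular);  // True

string fromVerbatim = Regex.Match(line, verbatim).Groups[1].Value;
string fromHex = Regex.Match(line, hexEscaped).Groups[1].Value;
Console.WriteLine(fromVerbatim);         // myhost
Console.WriteLine(fromHex);              // myhost
```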

Regex : replace a string

I'm currently facing a (little) blocking issue. I'd like to replace a substring by one another using regular expression. But here is the trick : I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically I'd like to replace whatever is between the " by "test". I tried ".{*}" as a pattern but it doesn't work.
Could you give me some tips, I'd appreciate it!
There are several issues you need to take care of.
1. You are using special characters in your regex (., parens, quotes) -- you need to escape these with a backslash. And you need to escape the backslashes with another backslash as well because we're in a C# string literal, unless you prefix the string with @, in which case the escaping rules are different.
2. The expression to match "any number of whatever characters" is .*. In this case, you would want to match any number of non-quote characters, which is [^"]*.
3. In contrast to (1) above, the replacement string is not a regular expression, so you don't want any backslashes there.
4. You need to store the return value of the replace somewhere.
The end result is
var result = Regex.Replace(contenu,
@"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
ie - match a " then match everything that's not a " then match another "
You may need to escape the double-quotes to make your regex-parser actually match on them... that's what I don't know about C#
Try to avoid where you can the '.*' in regex, you can usually find what you want to get by avoiding other characters, for example [^"]+ not quoted, or ([^)]+) not in parenthesis. So you may just want "([^"]+)" which should give you the whole thing in [0], then in [1] you'll find 'test'.
You could also just replace '"' with '' I think.
Taryn East's regex includes the *. You should remove it if it is just a placeholder for any value:
"[^"]"
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM
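
To tie the answers to this question together, here is a minimal sketch of the escaped pattern with [^"]* between the quotes, run against a made-up line of source text. The variable name contenu comes from the question; the sample content is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical content to rewrite.
string contenu = "host = Request.ServerVariables(\"HTTP_HOST\");";

// The dot and parens are escaped for the regex engine; the quotes are doubled
// for the verbatim literal; [^""]* matches everything up to the closing quote.
string result = Regex.Replace(
    contenu,
    @"Request\.ServerVariables\(""[^""]*""\)",
    "Request.ServerVariables('test')");

Console.WriteLine(result);  // host = Request.ServerVariables('test');
```

The return value has to be captured, since strings are immutable and Regex.Replace does not modify contenu.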
