Characters are not escaped properly in a Dictionary - c#

I have a string such as this:
Hello[00]
I want to replace the [00] with 00. (I don't want to do it by deleting the brackets, because that won't be useful to me later.) I want a direct replacement from [00] to 00. To do so, I have the following code:
var conversionRegex = new Regex(string.Join("|", conversion.Keys));
var textConverted = conversionRegex.Replace(allLines, n => conversion[n.Value]);
"conversion" is a Dictionary<string, string>, and one of its entries is this one:
{@"\[00\]", "00"}
According to my knowledge and experience, that should work, but it doesn't. It throws an exception saying the key can't be found in the dictionary. However, when the exception is thrown, the debugger shows that n.Value equals "[00]". So it should be found in the dictionary, because it's there!
I have more elements in this Dictionary, but the only ones that are throwing exceptions are the ones with characters that should be escaped. Somehow they are not escaped properly...
Any ideas on this? Thank you very much!

I think you are confusing escaping for regex with escaping for C# string literals. Square brackets ([]) have no special meaning in C# string literals and thus do not need to be escaped. However, they do have special meaning in regex so they do need to be escaped in the regex string if you wish to match those chars. Your key is properly escaped for regex but that means your C# string literal contains literal backslash chars.
Here is how C# interprets the following string literals:
"[00]" is a 4-char string containing the chars [00].
"\[00\]" is invalid C# due to invalid \[ and \] C# string literal escape sequences. It will not compile.
@"\[00\]" is a 6-char string containing the chars \[00\]. This is the proper format for regex escaping, but it's important to recognize that the backslashes are part of the string value itself and not C# escape sequences. This string is not equal to "[00]" because they are different strings.
"\\[00\\]" is the same as the previous one. Instead of using @, it uses the C# \\ escape sequence, which emits a literal backslash char.
When you use @"\[00\]" as a dictionary key, your dictionary key includes those backslash chars. Therefore, your dictionary does not contain the key "[00]".
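To see the difference concretely, a small console sketch like the following prints the lengths of these literals and shows that a dictionary keyed by the regex-escaped form has no "[00]" key. The class and constant names are illustrative, not from the question.

```csharp
using System;
using System.Collections.Generic;

public static class LiteralDemo
{
    public const string Plain = "[00]";        // 4 chars: [00]
    public const string Verbatim = @"\[00\]";  // 6 chars: \[00\]
    public const string Doubled = "\\[00\\]";  // the same 6 chars, written with \\ escapes

    public static void Main()
    {
        Console.WriteLine(Plain.Length);         // 4
        Console.WriteLine(Verbatim.Length);      // 6
        Console.WriteLine(Verbatim == Doubled);  // True

        // Keyed by the regex-escaped form, the dictionary has no "[00]" key,
        // which is exactly the text the regex match returns in n.Value.
        var conversion = new Dictionary<string, string> { { Verbatim, "00" } };
        Console.WriteLine(conversion.ContainsKey(Plain));    // False
        Console.WriteLine(conversion.ContainsKey(Verbatim)); // True
    }
}
```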
There are a few different ways you could rewrite your code to accomplish this. Here's an easy one: use the plain string representation, without regex escaping, as the dictionary keys, and then use Regex.Escape to escape those keys when building the regex pattern.
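using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Keys hold the literal text to find; Regex.Escape turns each key into regex syntax.
var conversion = new Dictionary<string, string> {
    { @"[00]", "00" }
};
var allLines = "Hello[00]\r\nWorld[00]";
var conversionRegex = new Regex(string.Join("|", conversion.Keys.Select(key => Regex.Escape(key))));
var textConverted = conversionRegex.Replace(allLines, n => conversion[n.Value]);
Console.WriteLine(textConverted);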

Related

Replace broken characters

I have a small program that replaces strings containing umlauts, apostrophes, etc.
But sometimes I have broken strings that contain, for example, A¶ for ü, A¼ (or ü) for ö, and so on.
Is there a way to fix these strings?
I tried adding more Replace statements:
str = str.Replace("A¶", "ü");
str = str.Replace("A¼", "ö");
str = str.Replace("ü", "ö");
But this does not work for me.
It looks like it is having trouble matching because they are non-standard characters. You will probably have to use Regex.Replace and reference the Unicode values of the characters in your regex. See: How can you strip non-ASCII characters from a string? (in C#)
Unicode/UTF8 reference: http://www.utf8-chartable.de/
Complete Unicode character set: http://www.unicode.org/charts/
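A minimal sketch of what referencing Unicode values in a regex can look like, assuming the goal is the "A¶ for ü" replacement from the question. The pattern names U+00B6 (¶) with a \uXXXX escape that the regex engine understands, so the source text stays plain ASCII. The second helper strips non-ASCII characters, in the spirit of the linked question. Both method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeRegexDemo
{
    // "A" followed by U+00B6 ("¶") is replaced with U+00FC ("ü"),
    // mirroring the "A¶ for ü" case from the question (assumed mapping).
    public static string FixBrokenUmlaut(string input) =>
        Regex.Replace(input, @"A\u00B6", "\u00FC");

    // Removes every character outside the ASCII range U+0000..U+007F.
    public static string StripNonAscii(string input) =>
        Regex.Replace(input, @"[^\u0000-\u007F]", string.Empty);

    public static void Main()
    {
        Console.WriteLine(FixBrokenUmlaut("fA\u00B6r"));
        Console.WriteLine(StripNonAscii("caf\u00E9!"));
    }
}
```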

C# Unrecognized escape sequence

I have the following regex in C#, and it causes the error "Unrecognized escape sequence" on \w, \. and \/:
string reg = "<a href=\"[\w\.\/:]+\" target=\"_blank\">.?<img src=\"(?<imgurl>\w\.\/:])+\"";
Regex regex = new Regex(reg);
I also tried
string reg = @"<a href="[w./:]+" target=\"_blank\">.?<img src="(?<imgurl>w./:])+"";
But this way, the string "ends" at the " character after href=.
Can anyone help me please?
Use "" to escape double quotes inside an @ (verbatim) string literal.
There are two escaping mechanisms at work here, and they interfere. For example, you use \" to tell C# to escape the following double quote, but you also use \w to tell the regular expression parser to treat the following w as special. But C# thinks \w is meant for C#, doesn't understand it, and you get a compiler error.
For example, take this text:
<a href="file://C:\Test\Test2\[\w\.\/:]+">
There are two ways to escape it such that C# accepts it.
One way is to escape all characters that are special to C#. In this case the " is used to denote the end of the string, and \ denotes a C# escape sequence. Both need to be prefixed with a C# escape \ to escape them:
string s = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";
But this often leads to ugly strings, especially when used with paths or regular expressions.
The other way is to prefix the string with @ and escape only the " characters, replacing each one with "":
string s = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";
The @ prevents C# from interpreting the \ in the string as the start of an escape sequence, but since \" is then not recognized either, "" was introduced to escape the double quote.
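As a quick check, both literals from this answer compile to the same string, which the sketch below confirms by comparing them. The class and field names are made up for the example.

```csharp
using System;

public static class VerbatimDemo
{
    // Regular literal: every \ and " that belongs to the text is escaped with \.
    public const string Escaped = "<a href=\"file://C:\\Test\\Test2\\[\\w\\.\\/:]+\">";

    // Verbatim literal: \ is taken literally, and "" stands for a single ".
    public const string Verbatim = @"<a href=""file://C:\Test\Test2\[\w\.\/:]+"">";

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim); // True
        Console.WriteLine(Verbatim);            // <a href="file://C:\Test\Test2\[\w\.\/:]+">
    }
}
```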
Here's a better regex; yours has several problems:
string reg = @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""";
Regex regex = new Regex(reg);
var m = regex.Match(@"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg""");
This matches <a href="http://www.yahoo.com" target="_blank"><img src="http://flickr.com/something.jpg".
Problems with yours: forward slashes don't need to be escaped, the img part is missing its opening [ bracket, and the ) that closes the group is in the wrong position.
However, as has been said many times, HTML cannot be parsed reliably with regular expressions. But if you need to get something quick and dirty done, this will do.
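For a self-contained version of the snippet above, here is a sketch that also reads the captured URL from the named group imgurl via Match.Groups. ExtractImgUrl is a hypothetical helper name, and the same caveat about HTML applies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgUrlDemo
{
    private static readonly Regex Pattern = new Regex(
        @"<a href=""[\w./:]+"" target=""_blank"">.?<img src=""(?<imgurl>[\w./:]+)""");

    // Returns the text captured by the named group "imgurl", or null when nothing matches.
    public static string ExtractImgUrl(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups["imgurl"].Value : null;
    }

    public static void Main()
    {
        string html = @"<a href=""http://www.yahoo.com"" target=""_blank""><img src=""http://flickr.com/something.jpg"">";
        Console.WriteLine(ExtractImgUrl(html)); // http://flickr.com/something.jpg
    }
}
```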
Here's the deal. C# string literals recognize certain character combinations (escape sequences) as special characters. You may be familiar with inserting \n in a string to produce an end-of-line character, for example.
When you put a single \ in a string literal, the compiler tries to interpret it, together with the next character, as one of these escape sequences, and reports an error when the combination is not valid.
Fortunately, that does not prevent you from using backslashes, because one of those sequences, \\, is interpreted as a single backslash.
So, in practice, if you replace every backslash in your string with a double backslash, it should work properly.
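A minimal sketch of those rules in action: \n and \\ each produce a single character, and doubling every backslash yields the same string as the verbatim form, which the regex engine then sees as the pattern \w\. (a word character followed by a literal dot). The class and field names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // "\\w\\." in a regular literal is the four characters \w\. at run time.
    public static readonly Regex WordThenDot = new Regex("\\w\\.");

    public static void Main()
    {
        Console.WriteLine("\n".Length);               // 1: \n is a single newline character
        Console.WriteLine("\\".Length);               // 1: \\ is a single backslash
        Console.WriteLine("\\w\\." == @"\w\.");       // True: same string, two notations
        Console.WriteLine(WordThenDot.IsMatch("a.")); // True
        Console.WriteLine(WordThenDot.IsMatch("ab")); // False: no literal dot
    }
}
```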

\w gives me an error while using regular expressions

I was using Regex and I tried to write:
Regex RegObj2 = new Regex("\w[a][b][(c|d)][(c|d)].\w");
It gives me this error twice, once for each occurrence of \w:
unrecognized escape sequence
What am I doing wrong?
You are not escaping the backslashes in a non-verbatim string literal.
Solution: put an @ in front of the string or double the backslashes, as per the C# rules for string literals.
Try to escape the escape ;)
Regex RegObj2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
or add an @ (as @Dominic Kexel suggested)
There are two levels of potential escaping required when writing a regular expression:
The regular expression escaping (e.g. escaping brackets, or in this case specifying a character class)
The C# string literal escaping
In this case, it's the latter which is tripping you up. Either escape the \ so that it becomes part of the string, or use a verbatim string literal (with an @ prefix) so that \ doesn't have its normal escaping meaning. So either of these:
Regex regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
Regex regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");
The two approaches are absolutely equivalent at execution time. In both cases you're trying to create a string constant with the value
\w[a][b][(c|d)][(c|d)].\w
The two forms are just different ways of expressing this in C# source code.
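As a sketch, the two forms can be compared directly: Regex.ToString() returns the pattern each Regex was constructed with, and both give the same text. Note in passing that [(c|d)] is a character class, so it matches any one of the characters (, c, |, d or ) rather than "c or d"; if alternation was intended, [cd] or (c|d) would be the usual way to write it. The class name is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    public static readonly Regex Regex1 = new Regex(@"\w[a][b][(c|d)][(c|d)].\w");
    public static readonly Regex Regex2 = new Regex("\\w[a][b][(c|d)][(c|d)].\\w");

    public static void Main()
    {
        Console.WriteLine(Regex1.ToString() == Regex2.ToString()); // True
        Console.WriteLine(Regex1.IsMatch("xabcd-y"));              // True
        // [(c|d)] is a character class, so "(" and "|" are accepted too:
        Console.WriteLine(Regex1.IsMatch("xab(|-y"));              // True
    }
}
```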
The backslashes are not being escaped. Either double them (\\) or use a verbatim string:
new Regex(@"\w[a][b][(c|d)][(c|d)].\w");

How do I escape a RegEx?

I have a regex that I now need to move into C#. I'm getting errors like this:
Unrecognized escape sequence
I am using Regex.Escape -- but obviously incorrectly.
string pattern = Regex.Escape("^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
hiddenRegex.Attributes.Add("value", pattern);
How is this correctly done?
The error you're getting occurs at compile time, correct? That means the C# compiler cannot make sense of your string literal. Prepend an @ sign to the string and you should be fine. You don't need Regex.Escape.
See What's the @ in front of a string in C#?
var pattern = new Regex(@"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");
pattern.IsMatch("Your input string to test the pattern against");
The error you are getting is due to the fact that your string contains invalid escape sequences (e.g. \d). To fix this, either escape the backslashes manually or write a verbatim string literal instead:
string pattern = @"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$";
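A quick sketch to confirm that the verbatim pattern compiles and behaves as a password rule: at least 7 characters, at least one letter, and at least one digit or listed symbol. The class name and sample inputs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordPatternDemo
{
    // Verbatim literal, so \d, \?, \( etc. reach the regex engine unchanged.
    public static readonly Regex Pattern = new Regex(
        @"^.*(?=.{7,})(?=.*[a-zA-Z])(?=.*(\d|[!@#$%\?\(\)\*\&\^\-\+\=_])).*$");

    public static void Main()
    {
        Console.WriteLine(Pattern.IsMatch("abc1234")); // True
        Console.WriteLine(Pattern.IsMatch("abcdefg")); // False: no digit or listed symbol
        Console.WriteLine(Pattern.IsMatch("ab1"));     // False: fewer than 7 characters
    }
}
```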
Regex.Escape is used when you want to embed dynamic content into a regular expression, not when you want to construct a fixed regex. For example, you would use it here:
string name = "this comes from user input";
string pattern = string.Format("^{0}$", Regex.Escape(name));
You do this because name could very well include characters that have special meaning in a regex, such as dots or parentheses. When name is hardcoded (as in your example) you can escape those characters manually.
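Here is a small sketch of that use case with a made-up name containing ., ( and ): Regex.Escape turns them into \., \( and \), so the anchored pattern matches only the literal text. MatchesExactly is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeUserInputDemo
{
    // Builds an anchored pattern that matches the given name literally.
    public static bool MatchesExactly(string input, string name)
    {
        string pattern = string.Format("^{0}$", Regex.Escape(name));
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string name = "a.b(c)";                              // contains regex metacharacters . ( )
        Console.WriteLine(Regex.Escape(name));               // a\.b\(c\)
        Console.WriteLine(MatchesExactly("a.b(c)", name));   // True
        Console.WriteLine(MatchesExactly("axb(c)", name));   // False: "." is no longer a wildcard
    }
}
```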

Disallowing Backslashes in a Regular Expression C#

For a Username field, there are certain variations that cannot be chosen as a username, and certain characters cannot be used.
For example, TIM1...TIM9 and BIN1...BIN9 cannot be used, nor can the characters <>:\/|?* appear anywhere in the field.
The code I have so far is:
private bool ValidateId(string regexValue)
{
Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\"/|?*]");
return !regex.IsMatch(regexValue);
}
What I'm struggling to handle, however, is the backslash character. Escaping it the way I escaped the quotation mark doesn't appear to work.
Thanks in advance.
You need to do a double escape. Try this:
Regex regex = new Regex("TIM[1-9]|BIN[1-9]|[<>:\\\\\"/|?*]");
Explanation:
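You need to escape the backslash in a C# string literal to get a single backslash in the string. Additionally, the string itself needs to contain two backslashes, because the regex engine also requires a literal backslash to be escaped. That is why the regular literal above contains four backslashes before the \" for the quote.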
BTW, using verbatim strings makes it a bit more readable:
Regex regex = new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");
Both codes will result in a Regex with this expression:
TIM[1-9]|BIN[1-9]|[<>:\\"/|?*]
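Putting it together, here is a sketch of the asker's ValidateId using the verbatim pattern. The static class, the cached Regex field, and the sample usernames are illustrative additions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernameValidator
{
    // Matches TIM1-TIM9, BIN1-BIN9, or any single character from < > : \ " / | ? *
    private static readonly Regex Forbidden = new Regex(@"TIM[1-9]|BIN[1-9]|[<>:\\""/|?*]");

    public static bool ValidateId(string value) => !Forbidden.IsMatch(value);

    public static void Main()
    {
        Console.WriteLine(ValidateId("alice"));      // True
        Console.WriteLine(ValidateId("TIM3"));       // False
        Console.WriteLine(ValidateId(@"dom\alice")); // False: contains a backslash
    }
}
```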
