I've noticed that C# adds additional slashes (\) to paths. Consider the path C:\Test. When I inspect the string with this path in the text visualiser, the actual string is C:\\Test.
Why is this? It confuses me, as sometimes I may want to split the path up (using string.Split()), but have to wonder which string to use (one or two slashes).
The \\ is used because \ is an escape character, so two are needed to represent a single \.
The first \ is treated as the escape character and the second \ is taken as the actual value. Otherwise, the character after the first \ would be parsed as part of an escape sequence.
Here is a list of available escape characters:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Null
\a - Alert
\b - Backspace
\f - Form feed
\n - New line
\r - Carriage return
\t - Horizontal tab
\v - Vertical tab
\u - Unicode escape sequence for character
\U - Unicode escape sequence for surrogate pairs.
\x - Unicode escape sequence similar to "\u" except with variable length.
EDIT: To answer your question regarding Split, it should be no issue. Use Split as you would normally. The \\ in the literal is treated as a single \ character.
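A minimal sketch of the point about Split, using the C:\Test path from the question. The variable names are illustrative only; the code uses C# top-level statements.

```csharp
using System;

// Regular literal: each \\ in source code is one backslash in the string.
string path = "C:\\Test";

// Verbatim literal: backslashes are taken exactly as written.
string verbatim = @"C:\Test";

Console.WriteLine(path);             // prints C:\Test
Console.WriteLine(path.Length);      // prints 7
Console.WriteLine(path == verbatim); // prints True

// Split on the single backslash character.
string[] parts = path.Split('\\');
Console.WriteLine(string.Join(" | ", parts)); // prints C: | Test
```

Both literals produce the same seven-character string, so there is only ever one backslash to split on.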
.NET is not adding anything to your string here. What you're seeing is an effect of how the debugger chooses to display strings. C# string literals can be written in 2 forms:
Verbatim Strings: Prefixed with an @ sign, which removes the need to escape \ characters
Normal Strings: Standard C style strings where \ characters need to be escaped
The debugger displays a string as a normal string literal rather than a verbatim one. It's just an issue of display though; it doesn't affect the string's underlying value.
Debugger visualizers display strings in the form in which they would appear in C# code. Since \ is used to escape characters in non-verbatim C# strings, \\ is the correct escaped form.
Okay, so the answers above are not wholly correct. As such I am adding my findings for the next person who reads this post.
You cannot split a string using any of the chars in the table above if you are reading said string(s) from an external source.
i.e.,
string[] splitStrings = File.ReadAllText([path]).Split((char)7);
will not split by those chars. However internally created strings work fine.
i.e.,
string[] splitStrings = "hello\agoodbye".Split((char)7);
This may not hold true for other methods of reading text from a file. I am unsure as I have not tested with other methods. With that in mind, it is probably best not to use those chars for delimiting strings!
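One possible explanation, offered as an assumption because the contents of the file are not shown in the post: escape sequences are only interpreted by the C# compiler inside string literals. If the file contains the two characters \ and a, reading it yields exactly those two characters, not the alert character (char 7). The sketch below writes both variants to a temporary file and splits each one.

```csharp
using System;
using System.IO;

string tempFile = Path.GetTempFileName(); // scratch file for the demonstration

// Case 1: the file holds the two characters '\' and 'a' (text, not an alert character).
File.WriteAllText(tempFile, @"hello\agoodbye");
string[] literalText = File.ReadAllText(tempFile).Split((char)7);
Console.WriteLine(literalText.Length); // prints 1 (nothing to split on)

// Case 2: the file holds a real alert character (U+0007), written via the \a escape.
File.WriteAllText(tempFile, "hello\agoodbye");
string[] realBell = File.ReadAllText(tempFile).Split((char)7);
Console.WriteLine(realBell.Length); // prints 2

File.Delete(tempFile);
```

If that assumption matches the original situation, the behaviour depends on what the file contains rather than on the method used to read it.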
I'm writing a program in C# using Microsoft Visual Studio. I need the program to match the vertical bar, but when I try to escape it like this "\|" it gives me an unrecognized escape sequence error. What am I doing wrong?
In C#
string test = "\|";
is going to fail because the compiler treats \| as a C# string escape sequence, and no such escape exists. Because you are trying to include a backslash in the string, you need to escape the backslash so that the string actually contains one:
string test = "\\|";
What will actually be stored in this string is \|
The reason you get an unrecognized escape sequence is that backslash is used as an escape character in C# string literals as well as in regex.
You have several choices to fix this:
Use verbatim literals, i.e. @"\|", or
Use a second escape inside a regular literal, i.e. "\\|", or
Use a character class, i.e. [|]
The third one is my personal favorite, because it does not require counting backslashes.
C# is treating "\|" as a string escape sequence, which does not exist. Try "\\|" instead: escaping the \ means the regex actually sees the \| you want.
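A short sketch comparing the three ways of writing the pattern, assuming the goal is to split on a vertical bar with Regex.Split. The input string and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "left|right"; // illustrative input

string escapedLiteral = "\\|";   // regular literal with the backslash escaped
string verbatimLiteral = @"\|";  // verbatim literal
string characterClass = "[|]";   // character class, no backslash at all

// The first two literals are the same two-character string: \ and |
Console.WriteLine(escapedLiteral == verbatimLiteral); // prints True

foreach (string pattern in new[] { escapedLiteral, verbatimLiteral, characterClass })
{
    string[] pieces = Regex.Split(input, pattern);
    Console.WriteLine(string.Join(",", pieces)); // prints left,right each time
}
```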
I am trying to get the length of a string that has \\ values.
e.g. "C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp"
The length of the example is 38 characters.
When I use the Length property the length is 33; basically it is treating \\ as one character.
I have tried using StringInfo.LengthInTextElements and various other ways to try and get this working but with no joy.
Since the character \ is used to escape characters in a string, \\ actually represents the \ character literally.
Try a verbatim string if you want \\ to be treated as two characters:
@"C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp"
MSDN Reference
My gut says you have a more fundamental problem, but have you tried writing it as a verbatim string literal?
string myString = @"C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp";
It is one char. If you want it to be 2 chars, either use @ at the beginning or maybe write \\ twice (haven't tried.. checking now).
That's because \\ in a C# string is known as an escape sequence. Your string in code:
"C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp"
becomes this string on disk and in memory when the program is loaded.
"C:\Dir1\Dir2\Dir3\Dir4\flower.bmp"
So, the length of your example really is 33 characters. The original string, while it may be 38 characters in code, only represents 33 real characters.
33 is correct - \\ is indeed only one character, namely \. It's only the debugger that shows it escaped (\ has a special meaning for \n or \r, line feed and carriage return, respectively, for example).
The backslash \ is an escape character used to put special characters in your string, like \t for a tab and \n for a newline. A double backslash \\ inserts one backslash into the compiled string instead of the 2 you expected. The answer is either to use the C# @ prefix in front of your string, which prevents escape processing, or to escape all of your backslashes, which would look like "C:\\\\Dir1\\\\Dir2\\\\Dir3\\\\Dir4\\\\flower.bmp".
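A quick check of the lengths discussed in this thread, written as a sketch with illustrative variable names. It compares the regular literal, the verbatim literal, and the literal with every backslash escaped.

```csharp
using System;

// Regular literal: each \\ becomes one backslash.
string regular = "C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp";

// Verbatim literal: both backslashes of every pair are kept.
string verbatim = @"C:\\Dir1\\Dir2\\Dir3\\Dir4\\flower.bmp";

// Regular literal with every backslash escaped: same content as the verbatim one.
string doubled = "C:\\\\Dir1\\\\Dir2\\\\Dir3\\\\Dir4\\\\flower.bmp";

Console.WriteLine(regular.Length);      // prints 33
Console.WriteLine(verbatim.Length);     // prints 38
Console.WriteLine(verbatim == doubled); // prints True
```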
I asked another question poorly so i'll ask something else.
According to http://www.c-point.com/javascript_tutorial/special_characters.htm there are a few escape characters such as \n and \b. However, / is not one of them. What happens in this case (\/)? Is the \ ignored?
I have a string in JavaScript: 'http:\/\/www.site.com\/user'. Note that this is a literal written with '; with " it would look like \\/. Anyway, I would like to escape this string, hence the question of what happens with non-'special' escape characters.
Another question: if I had name:\t me (or "name:\\t me"), is there a function to escape it so there is a tab? I am using C# and these strings come from a JSON file.
According to Mozilla:
For characters not listed [...] a preceding backslash is ignored, but this usage is deprecated and
should be avoided.
https://developer.mozilla.org/en/JavaScript/Guide/Values%2c_Variables%2c_and_Literals#section_19
The \/ sequence is not listed but there're at least two common usages:
<1> It's required to escape literal slashes in regular expressions that use the /foo/ syntax:
var re = /^http:\/\//;
<2> It's required to avoid invalid HTML when you embed JavaScript code inside HTML:
<script type="text/javascript"><!--
alert('</p>')
//--></script>
... triggers: end tag for element "P" which is not open
<script type="text/javascript"><!--
alert('<\/p>')
//--></script>
... doesn't.
If a backslash is found before a character which is not meaningful as an escape sequence, it will be ignored, i.e. "\/" and "/" are the same string in Javascript.
The / character is the regular expression delimiter, so it only has to be escaped in a regex context:
/[a-z]/[0-9]/ // Invalid.
/[a-z]\/[0-9]/ // Matches a lowercase letter, followed by a slash,
// followed by a digit.
Finally, if you want to collapse a backslash followed by a character into the corresponding escape sequence, you'll have to replace the whole expression:
string expr = "name:\\t me"; // Backslash followed by `t`.
expr = expr.Replace("\\t", "\t"); // Tab character.
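As a hedged alternative to replacing each sequence by hand, .NET ships Regex.Unescape, which converts backslash escapes into the characters they stand for. It follows regular-expression escape rules, which overlap with but are not identical to C# or JSON rules, so for data that really is JSON a JSON parser is probably the safer route. A minimal sketch:

```csharp
using System;
using System.Text.RegularExpressions;

string raw = "name:\\t me";             // backslash followed by 't' (10 characters)
string unescaped = Regex.Unescape(raw); // the \t pair becomes one tab (9 characters)

Console.WriteLine(raw.Length);               // prints 10
Console.WriteLine(unescaped.Length);         // prints 9
Console.WriteLine(unescaped.Contains("\t")); // prints True
```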
\ is evaluated as \ if \ + next character is not an escape sequence.
examples:
\t -> escape sequence t -> tab
\\t -> escape \ and t -> \t
\\ -> escape sequence \ -> \
\c -> \c (not an escape sequence)
\a -> escape sequence a -> ???
Note that some escape sequences use quite unexpected symbols, so be careful. IMHO there is no good standard across languages and operating systems.
And actually, it's even more non-standard: in plain C, '\y' -> y plus a warning, not \y. So this is very language dependent; be careful. (disregard my comment below).
br,
Juha
edit: What language are you using? Java and C have slightly different behavior.
C# and Java seem to have the same escapes and Python has different ones:
http://en.csharp-online.net/CSharp_FAQ:_What_are_the_CSharp_character_escape_sequences
http://www.cerritos.edu/jwilson/cis_182/language_resources/java_escape_sequences.htm
http://www.java2s.com/Code/Python/String/EscapeCodesbtnar.htm
In C# you can use the backslash character to tell the compiler what you really want. After compiling though, these escape characters do not exist.
If you use string myString = "\t"; the string will actually contain a TAB character, not just represent one. You can test this by checking myString.Length which is 1.
If you want to send the characters "backslash" and "t" to your JSON client however, you'll have to tell the compiler to keep his hands off the backslash, by escaping the backslash:
string myString = "\\t"; will result in a string of two characters, the "backslash" and the "t".
Things get messy if you have to cross multiple layers of escaping and unescaping. Try to debug through these layers to see what's really happening under the hood.
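The difference between the two literals can be checked directly. This is a small sketch with illustrative variable names:

```csharp
using System;

string tab = "\t";       // one character: the TAB character, U+0009
string twoChars = "\\t"; // two characters: a backslash and the letter t

Console.WriteLine(tab.Length);          // prints 1
Console.WriteLine(twoChars.Length);     // prints 2
Console.WriteLine((int)tab[0]);         // prints 9
Console.WriteLine(twoChars[0] == '\\'); // prints True
Console.WriteLine(twoChars[1]);         // prints t
```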
I want to clean strings that are retrieved from a database.
I ran into this issue where a property value (a name from a database) had an embedded TAB character, and Chrome gave me an invalid TOKEN error while trying to load the JSON object.
So now, I went to http://www.json.org/ and on the side it has a specification. But I'm having trouble understanding how to write a cleanser using this spec:
string
    ""
    " chars "
chars
    char
    char chars
char
    any-Unicode-character-
        except-"-or-\-or-
        control-character
    \"
    \\
    \/
    \b
    \f
    \n
    \r
    \t
    \u four-hex-digits
Given a string, how can I "clean" it such that I conform to this spec?
Specifically, I am confused: does the spec allow TAB (0x09) characters? If so, why did Chrome give an invalid TOKEN error?
Tab characters (actual 0x09, not escapes) cannot appear inside of quotes in JSON (though they are valid whitespace outside of quotes). You'll need to escape them with \t or \u0009 (the former being preferable).
json.org says an unescaped character of a string must be:
Any UNICODE character except " or \ or
control character
Tab counts as a control character.
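A minimal sketch of a cleanser that follows the rule quoted above. EscapeJsonString is a hypothetical helper written for this illustration, not a library function. It assumes control characters means everything below U+0020, escapes the quote and the backslash, uses the short escapes where the grammar defines them, and falls back to \u plus four hex digits otherwise. In practice a JSON serializer would normally do this for you.

```csharp
using System;
using System.Text;

// Hypothetical helper: escapes a value so it can be placed between JSON double quotes.
string EscapeJsonString(string value)
{
    var sb = new StringBuilder(value.Length);
    foreach (char c in value)
    {
        switch (c)
        {
            case '"':  sb.Append("\\\""); break; // quote -> \"
            case '\\': sb.Append("\\\\"); break; // backslash -> \\
            case '\b': sb.Append("\\b"); break;
            case '\f': sb.Append("\\f"); break;
            case '\n': sb.Append("\\n"); break;
            case '\r': sb.Append("\\r"); break;
            case '\t': sb.Append("\\t"); break;  // the TAB from the question
            default:
                if (c < 0x20)
                    // Remaining control characters: \u followed by four hex digits.
                    sb.Append("\\u").Append(((int)c).ToString("x4"));
                else
                    sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

string name = "Ann\tLee"; // illustrative value with an embedded TAB
Console.WriteLine("\"" + EscapeJsonString(name) + "\""); // prints "Ann\tLee"
```

The printed value contains a backslash followed by t rather than a raw TAB, which is the form a JSON parser accepts inside a string.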
This may be what you are looking for; it shows how to use the JavaScriptSerializer class in C#.
How to create JSON String in C#