C# Reading a file adds special characters - c#

I am reading a text file with this structure:
20150218;"C7";"B895";00101;"FTBCCAL16"
I read the line and split like this:
System.IO.StreamReader fichero = new System.IO.StreamReader(ruta, Encoding.Default);
while ((linea = fichero.ReadLine()) != null)
{
// Split by ";"
String[] separador = linea.Split(';');
}
But when I inspect the content of "linea", I see this:
"20150218";\"C7\";\"B895\";"00101";\"FTBCCAL16\"
As you can see, the StreamReader adds special characters such as "" and \ to the output. I want to obtain this:
20150218;"C7";"B895";00101;"FTBCCAL16"
Is there a way to obtain this?
Thanks in advance! Regards!

You are looking at it in the Visual Studio debugger, which simply displays strings this way. Write the result to the console or to a file and you will see the normal text, without the extra characters.

StreamReader is not adding or modifying the strings read from the file at all.
If you are viewing the contents of separador in the Visual Studio debugger, it displays any special characters as escape sequences (for display purposes only).
The displayed format matches how you would have to type the string in the code editor if you were creating a string constant.
For example, a string whose contents are "C7" (quotes included) is displayed as "\"C7\"".
However, the real contents of these strings (in memory) are not escaped. They are exactly as you expect them to be in your question.
If you output them or try to manipulate them in code they will have the correct contents.
So, your code is correct. You just have to understand escape sequences and how strings appear in the Visual Studio debugger.
Update:
See this question for an explanation of how to display unquoted strings in the debugger.
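A minimal sketch of the point made above, assuming the line from the question is held in a string rather than read from a file (the sample is only an illustration). The backslashes exist only in the source literal; printing the value shows the real contents:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Same text as one line of the file; the \" escapes are source syntax only.
        string linea = "20150218;\"C7\";\"B895\";00101;\"FTBCCAL16\"";

        // Writing to the console shows the characters actually held in memory.
        Console.WriteLine(linea);               // 20150218;"C7";"B895";00101;"FTBCCAL16"

        string[] separador = linea.Split(';');
        Console.WriteLine(separador[1]);        // "C7"
        Console.WriteLine(separador[1].Length); // 4: two quotes plus C and 7
    }
}
```

The length of 4 confirms that no backslash characters are stored in the string.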

Okay here is the quotation from MSDN
At compile time, verbatim strings are converted to ordinary strings with all the same escape sequences. Therefore, if you view a verbatim string in the debugger watch window, you will see the escape characters that were added by the compiler, not the verbatim version from your source code. For example, the verbatim string @"C:\files.txt" will appear in the watch window as "C:\\files.txt".
In your case each " is displayed as \" (the ordinary escaped form), and this is what you see while debugging.
Why this happens ?
The double quotation mark " must be written as the escape sequence \" inside an ordinary string literal
Escape sequences are typically used to specify actions such as carriage returns and tab movements on terminals and printers. They are also used to provide literal representations of nonprinting characters and characters that usually have special meanings, such as the double quotation mark (")
So when a string contains such a character, the debugger represents it with its escape sequence, exactly as it would appear in an ordinary (non-verbatim) string literal. That is what you see in the debugger.

Related

How to express a string literal containing double quotes without escape characters?

Is there a C# syntax with which I can express strings containing double quotes without having to escape them? I frequently copy and paste strings between C# source code to other apps, and it's frustrating to keep adding and removing backslashes.
Eg. presently for the following string (simple example)
"No," he said.
I write in C# "\"No,\" he said."
But I'd rather write something like Python '"No," he said.', or Ruby %q{"No," he said.}, so I can copy and paste the contents verbatim to other apps.
I frequently copy and paste strings between C# source code to other apps, and it's frustrating to keep adding and removing backslashes.
Then it sounds like you probably shouldn't have the strings within source code.
Instead, create text files which are embedded in your assembly, and load them dynamically... or create resource files so you can look up strings by key.
There's no form of string literal in C# which would allow you to express a double-quote as just a single double-quote character in source code.
You could try this but you're still effectively escaping:
string s = @"""No,"" he said.";
Update 2022: C# 11 in Visual Studio 2022 version 17.2 (or later) supports raw string literals, delimited by """: https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-11#raw-string-literals
Raw string literals are a new format for string literals. Raw string literals can contain arbitrary text, including whitespace, new lines, embedded quotes, and other special characters without requiring escape sequences. A raw string literal starts with at least three double-quote (""") characters. It ends with the same number of double-quote characters. Typically, a raw string literal uses three double quotes on a single line to start the string, and three double quotes on a separate line to end the string. The newlines following the opening quote and preceding the closing quote aren't included in the final content:
Example (note that StackOverflow doesn't yet highlight correctly)
string longMessage = """
    This is a long message.
    It has several lines.
        Some are indented
                more than others.
    Some should start at the first column.
    Some have "quoted text" in them.
    """;

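Applied to the string from the question, a raw string literal might look like the sketch below. It assumes a C# 11 (or later) compiler. The multi-line form is used because the content starts with a double quote, which would otherwise merge with the opening delimiter:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Raw string literal (C# 11+): the content is pasted without any escaping.
        string raw = """
            "No," he said.
            """;

        string escaped = "\"No,\" he said.";    // ordinary literal with backslash escapes
        string verbatim = @"""No,"" he said.";  // verbatim literal with doubled quotes

        Console.WriteLine(raw);                 // "No," he said.
        Console.WriteLine(raw == escaped);      // True
        Console.WriteLine(raw == verbatim);     // True
    }
}
```

All three literals produce the same string; only the source notation differs.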
How to decode special characters (blank square unreadable) in c#?

I have a problem that I can't manage to resolve: I need to replace unreadable characters (I can't paste them here because they aren't rendered, but they show up as a blank square in the Visual C# debugger).
When they are inserted into the SQL database they are replaced by a ?, which I don't want. I tried a simple replace on the string, but Visual C# makes it impossible to paste such characters.
You can't. When you see the red squares, you've already lost the relevant character info and you can't convert them back.
You can replace any character if you only know the unicode character code:
s = s.Replace('\u0080', ' ');
You can use a regular expression to replace any character outside of a set of allowed characters:
s = Regex.Replace(s, @"[^0-9A-Za-z]", " ");
Another alternative is to use a unicode data type in the database, so that it can handle any character that you have in your string.
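As a self-contained sketch of the two replacement approaches above: the input string is hypothetical, and the \p{Cc} pattern (Unicode control characters) is swapped in here as one possible alternative to the allow-list regex, on the assumption that the unreadable characters are control characters.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input containing a control character (U+0080) and a tab.
        string s = "abc\u0080def\tghi";

        // Replace one known code point.
        string a = s.Replace('\u0080', ' ');

        // Replace every Unicode control character (category Cc) with a space.
        string b = Regex.Replace(s, @"\p{Cc}", " ");

        Console.WriteLine(a == "abc def\tghi"); // True
        Console.WriteLine(b == "abc def ghi");  // True
    }
}
```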

Backslashes not working properly in my web service

I have a simple line of code in a web service:
instance = @"\instanceNameHere";
Yet the output is always the same.
\\instanceNameHere
If I remove the @ and use two slashes, I get the same result. I've never seen this before and my Google-fu has failed me. I even wrote a simple app and the result was correct. So why is it acting up in the web service?
It's escaping the slash for you in the debugger so you know that it's a slash and not an escape sequence like \t. If the debugger did not do this, how could you distinguish the string
\t
from the string
<tab>
in the debugger, since the latter is represented as an escape sequence by \t? Therefore the former is shown as
\\t
and the latter as
\t
Write it to a stream or the console and you'll see that it only has one slash, or do instance.Length and compare to a count of the characters. You'll see 17 on the console, whereas \\instanceNameHere has eighteen characters.
The debugger displays strings as C# literals. So it's displaying them with characters escaped. It would also show carriage returns as \r and tabs as \t. This is purely for visualization -- the string does not literally contain these escape characters. If you write it out to a log, it will not include the escape characters -- it will look as you expect.
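The length check suggested above can be sketched as follows. The variable names other than instance are illustrative only:

```csharp
using System;

class Program
{
    static void Main()
    {
        string instance = @"\instanceNameHere";

        Console.WriteLine(instance);          // \instanceNameHere
        Console.WriteLine(instance.Length);   // 17

        // A backslash followed by 't' (two characters) versus a single tab character.
        string backslashT = "\\t";
        string tab = "\t";
        Console.WriteLine(backslashT.Length); // 2
        Console.WriteLine(tab.Length);        // 1
    }
}
```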
A UNC name of any format, which always start with two backslash characters ("\\").
Link
Update: Please see @Jason's post above! I didn't realise he was checking in the debugger.

string.replace seriously broken with \

"C://test/test/test.png" -> blub
blub = blub.Replace(@"/", @"\");
result = "C:\\\\test\\test\\test.png"
how does that make sense? It replaces a single / with two \
?
It's actually working:
string blub = "C://test/test/test.png";
string blub2 = blub.Replace(@"/", @"\");
Console.WriteLine(blub);
Console.WriteLine(blub2);
Output:
C://test/test/test.png
C:\\test\test\test.png
BUT viewing the string in the debugger does show the effect you describe (and is how you would write the string literal in code without the @).
I've noticed this before but never found out why the debugger chooses this formatting.
No, it doesn't.
What you're seeing is the properly formatted string according to C# rules, and since the output you're seeing is shown as though you haven't prefixed it with the @ character, every backslash is doubled up, because that's what you would have to write if you wanted that string in the first place.
Create a new console app and write the result to the console, and you'll see that the string looks like you wanted it to.
So this is just an artifact of how you look at the string (I assume the debugger).
The \ character in C# is the escape character, so if you are going to use it as a \ character you need two - otherwise the next character gets treated specially (new line etc).
See What character escape sequences are available? (C#)
The character \ is a special character, which changes the meaning of the character after it in string literals. So when you refer to \ itself, it needs to be escaped: \\.
Look up "escape characters".
It's done what it should.
"\\" is the same as @"\"
"\" is an escape character. Without the verbatim indicator "@" before a string, a single \ is shown as "\\"
You should think twice before saying something like that....
The string.Replace function is basic functionality that has been around for a long time.... Whenever you find you have a problem with something like that, it's probably not the function that is broken, but your understanding or use of it.
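A short sketch of the equivalence described in the answers above. Comparing lengths shows that Replace swaps each / for exactly one \ character:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Escaped and verbatim notation denote the same one-character string.
        Console.WriteLine("\\" == @"\");   // True
        Console.WriteLine("\\".Length);    // 1

        string blub = "C://test/test/test.png";
        string replaced = blub.Replace("/", @"\");

        // One character in, one character out: the length is unchanged.
        Console.WriteLine(replaced.Length == blub.Length); // True
    }
}
```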

something like a python's triple-quote in F# (or C#)?

I want to assign a xml code into a string variable.
I can do this without escaping single or double-quotes by using triple-quote in python.
Is there a similar way to do this in F# or C#?
F# 3.0 supports triple quoted strings. See Visual Studio F# Team Blog Post on 3.0 features.
The F# 3.0 Spec Strings and Characters section specifically mentions the XML scenario:
A triple-quoted string is specified by using three quotation marks
(""") to ensure that a string that includes one or more escaped
strings is interpreted verbatim. For example, a triple-quoted string
can be used to embed XML blobs:
As far as I know, there is no syntax corresponding to this in C# / F#. If you use @"str" then you have to replace each quote with two quotes, and if you just use "str" then you need to add a backslash.
In any case, there is some encoding of ":
var str = @"allows
multiline, but still need to encode "" as two chars";
var str = "need to use backslash \" here";
However, the best thing to do when you need to embed large strings (such as XML data) into your application is probably to use .NET resources (or store the data somewhere else, depending on your application). Embedding large string literals in program is generally not very recommended. Also, there used to be a plugin for pasting XML as a tree that constructs XElement objects for C#, but I'm not sure whether it still exists.
Although, I would personally vote to add """ as known from Python to F# - it is very useful, especially for interactive scripting.
In case someone ran into this question when looking for triple quote strings in C# (rather than F#), C#11 now has raw string literals and they're (IMO) better than Python's (due to how indentation is handled)!
Raw string literals are a new format for string literals. Raw string literals can contain arbitrary text, including whitespace, new lines, embedded quotes, and other special characters without requiring escape sequences. A raw string literal starts with at least three double-quote (""") characters. It ends with the same number of double-quote characters. Typically, a raw string literal uses three double quotes on a single line to start the string, and three double quotes on a separate line to end the string. The newlines following the opening quote and preceding the closing quote are not included in the final content:
string longMessage = """
    This is a long message.
    It has several lines.
        Some are indented
                more than others.
    Some should start at the first column.
    Some have "quoted text" in them.
    """;
Any whitespace to the left of the closing double quotes will be removed from the string literal. Raw string literals can be combined with string interpolation to include braces in the output text. Multiple $ characters denote how many consecutive braces start and end the interpolation:
var location = $$"""
   You are at {{{Longitude}}, {{Latitude}}}
   """;
The preceding example specifies that two braces start and end an interpolation. The third repeated opening and closing brace are included in the output string.
https://devblogs.microsoft.com/dotnet/csharp-11-preview-updates/#raw-string-literals
https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-11
As shoosh said, you want to use verbatim string literals in C#, where the string starts with @ and is enclosed in double quotation marks. The only exception is if you need to put a double quotation mark in the string, in which case you need to double it:
System.Console.WriteLine(@"Hello ""big"" world");
would output
Hello "big" world
http://msdn.microsoft.com/en-us/library/362314fe.aspx
In C# the syntax is @"some string"
see here
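For the XML case in the question, the two C# notations might compare as in the sketch below. The XML fragment is made up for illustration, and the raw literal assumes C# 11 or later:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Verbatim literal: double quotes inside the XML must be doubled.
        string verbatim = @"<item id=""1"" name='a' />";

        // Raw literal (C# 11+): the XML is pasted unchanged.
        string raw = """<item id="1" name='a' />""";

        Console.WriteLine(raw);             // <item id="1" name='a' />
        Console.WriteLine(raw == verbatim); // True
    }
}
```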
