Handle filenames with en dash (–) using Uri.EscapeDataString() in C#

We are trying to fix an error that occurs when downloading a file whose filename contains an en dash (–). Our current solution uses Uri.EscapeDataString() to escape the special characters in the filename:
Uri.EscapeDataString(file.FileName.Replace("&", "~|").Replace("#", "~^").Replace("%", "~)").Replace("+", "~("))
When this line runs, the special characters listed are first replaced with the substitute sequences, and the result is then escaped.
We tried the same approach for the en dash, replacing it with "~]", but unlike the other special characters it was not converted:
Uri.EscapeDataString(file.FileName.Replace("&", "~|").Replace("#", "~^").Replace("%", "~)").Replace("+", "~(").Replace("–", "~]"))
Does anyone know how to handle this scenario? Thank you!
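The question does not say why the en dash replacement fails, so the following is only a sketch under one assumption: that the "–" typed into the source file is not guaranteed to be the same character as the one in the filename (for example because of the source file's encoding, or a look-alike dash). Writing the character as the Unicode escape \u2013 removes that ambiguity. The class and method names are hypothetical; the replacement chain is copied from the question. Note that Uri.EscapeDataString() on its own percent-encodes an en dash as its UTF-8 bytes, %E2%80%93.

```csharp
using System;

public static class FileNameEscaper
{
    // Hypothetical helper: the same replacement chain as in the question,
    // with the en dash spelled as \u2013 instead of a pasted character.
    public static string Escape(string fileName)
    {
        string replaced = fileName
            .Replace("&", "~|")
            .Replace("#", "~^")
            .Replace("%", "~)")
            .Replace("+", "~(")
            .Replace("\u2013", "~]"); // U+2013 EN DASH

        // The substitute sequences are themselves escaped here,
        // e.g. ']' becomes %5D while '~' is left as-is.
        return Uri.EscapeDataString(replaced);
    }
}
```

If the replacement still has no effect, printing the code points of the filename (for example with (int)c for each char c) would show which dash character is actually present.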

Related

C# Reading a file add special characters

I am reading a text file with this structure:
20150218;"C7";"B895";00101;"FTBCCAL16"
I read the line and split like this:
System.IO.StreamReader fichero = new System.IO.StreamReader(ruta, Encoding.Default);
while ((linea = fichero.ReadLine()) != null)
{
// Split by ";"
String[] separador = linea.Split(';');
}
But when I inspect the content of "linea", I see this:
"20150218";\"C7\";\"B895\";"00101";\"FTBCCAL16\"
As you can see, the StreamReader seems to add some special characters to the output, like "" and \. I want to obtain this:
20150218;"C7";"B895";00101;"FTBCCAL16"
Is there a way to obtain this?
Thanks in advance! Regards!

You are viewing it in the Visual Studio debugger, which simply displays strings this way. Write the result to the console or to a file and you will see the normal text without the extra characters.

StreamReader is not adding or modifying the strings read from the file at all.
If you are viewing the contents of separador in the Visual Studio debugger, it will add an escape sequence to any special characters (for display purposes).
The displayed format matches how you would have to enter them in the code editor if you were creating a string constant.
However, the real contents of these strings (in memory) are not escaped. They are exactly as you expect them to be in your question.
If you output them or try to manipulate them in code they will have the correct contents.
So, your code is correct. You just have to understand escape sequences and how strings appear in the Visual Studio debugger.
Update:
See this question for an explanation of how to display unquoted strings in the debugger.
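A minimal sketch of the point made above, using the line from the question written as a C# literal. The class and member names are made up for illustration. The debugger would display the backslash-escaped form, while the string in memory contains only the quote characters.

```csharp
using System;

public static class DebuggerVersusRealContent
{
    // The line from the question. In source code each \" stands for a single " character.
    public const string Linea = "20150218;\"C7\";\"B895\";00101;\"FTBCCAL16\"";

    // Splits on ';' exactly as the question does.
    public static string[] Split(string linea) => linea.Split(';');

    public static void Print()
    {
        Console.WriteLine(Linea);            // 20150218;"C7";"B895";00101;"FTBCCAL16"
        Console.WriteLine(Split(Linea)[1]);  // "C7"
    }
}
```

The second field has four characters (quote, C, 7, quote), and the string contains no backslash at all.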

Okay, here is the quotation from MSDN:
At compile time, verbatim strings are converted to ordinary strings with all the same escape sequences. Therefore, if you view a verbatim string in the debugger watch window, you will see the escape characters that were added by the compiler, not the verbatim version from your source code. For example, the verbatim string @"C:\files.txt" will appear in the watch window as "C:\\files.txt".
In your case, each " is shown as \", and this is what you see at debugging time.
Why does this happen?
Inside an ordinary string literal, a double quotation mark has to be written as the escape sequence \".
Escape sequences are typically used to specify actions such as carriage returns and tab movements on terminals and printers. They are also used to provide literal representations of nonprinting characters and characters that usually have special meanings, such as the double quotation mark (").
So when a string contains such a character, the debugger displays it in its escaped form, the way you would write it in an ordinary string literal. That is what you see in the debugger; the string itself is unchanged.

ASP.NET MVC Regular Expression for Username is not working correctly. or Convert Regex->JS

I added a regular expression from a site to validate user names. It should work, but it produces an error at compile time. I googled and learned that a few character classes like '\w' are not going to work because JS does not support them. Now I don't know how to convert it. Can anyone please help convert this into something that works with ASP.NET MVC data annotations?
[RegularExpression("^([a-zA-Z])[a-zA-Z_-]*[\w_-]*[\S]$|^([a-zA-Z])[0-9_-]*[\S]$|^[a-zA-Z]*[\S]$")]
Thank you all in advance.

Make your string a verbatim literal by adding the @ sign before the opening quote. Otherwise you would need to escape all the backslashes that the string contains, which would make a regular expression like this even less readable.
[RegularExpression(@"^([a-zA-Z])[a-zA-Z_-]*[\w_-]*[\S]$|^([a-zA-Z])[0-9_-]*[\S]$|^[a-zA-Z]*[\S]$")]
A literal string enables you to use special characters such as a
backslash or double-quotes without having to use special codes or
escape characters. This makes literal strings ideal for file paths
that naturally contain many backslashes. To create a literal string,
add the at-sign @ before the string's opening quote.
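As a rough illustration of the answer above, the sketch below writes the same pattern once as a verbatim literal and once as an ordinary literal with doubled backslashes, and checks that both produce the identical string. It uses Regex directly rather than the RegularExpression attribute so that it stays self-contained; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatternDemo
{
    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim =
        @"^([a-zA-Z])[a-zA-Z_-]*[\w_-]*[\S]$|^([a-zA-Z])[0-9_-]*[\S]$|^[a-zA-Z]*[\S]$";

    // Ordinary literal: every backslash has to be doubled.
    public const string Escaped =
        "^([a-zA-Z])[a-zA-Z_-]*[\\w_-]*[\\S]$|^([a-zA-Z])[0-9_-]*[\\S]$|^[a-zA-Z]*[\\S]$";

    // Applies the pattern from the question to a candidate user name.
    public static bool IsValidUserName(string candidate) =>
        Regex.IsMatch(candidate, Verbatim);
}
```

Both constants hold the same characters, so either form can be passed to the attribute; the verbatim form is simply easier to read.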

Backslashes not working properly in my web service

I have a simple line of code in a web service:
instance = @"\instanceNameHere";
Yet the output is always the same.
\\instanceNameHere
If I remove the @ and use two slashes, I get the same result. I've never seen this before and my Google-fu has failed me. I even wrote a simple app and the result was correct. So why is it acting up in the web service?

It's escaping the slash for you in the debugger so you know that it's a slash and not an escape sequence like \t. If the debugger did not do this, how could you distinguish the string
\t
from the string
<tab>
in the debugger, since the latter is represented by the escape sequence \t? Therefore the former is shown as
\\t
and the latter as
\t
Write it to a stream or the console and you'll see that it only has one slash, or do instance.Length and compare to a count of the characters. You'll see 17 on the console, whereas \\instanceNameHere has eighteen characters.
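The check suggested above can be sketched as follows. The class name is invented for the example; the string is the one from the question.

```csharp
using System;

public static class BackslashDemo
{
    // Verbatim literal: one backslash followed by the name.
    public const string Verbatim = @"\instanceNameHere";

    // Ordinary literal: the doubled backslash is an escape for a single backslash.
    public const string Escaped = "\\instanceNameHere";

    public static void Print()
    {
        Console.WriteLine(Verbatim);         // \instanceNameHere
        Console.WriteLine(Verbatim.Length);  // 17
    }
}
```

Both literals produce the same 17-character string; only the debugger's display adds the second backslash.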

The debugger displays strings as C# literals. So it's displaying them with characters escaped. It would also show carriage returns as \r and tabs as \t. This is purely for visualization -- the string does not literally contain these escape characters. If you write it out to a log, it will not include the escape characters -- it will look as you expect.

A UNC name of any format, which always starts with two backslash characters ("\\").
Update: Please see @Jason's post above! I didn't realise he was checking in the debugger.

Having trouble grokking CSS 2.1 grammar

I am writing a hand-coded CSS 2.1 parsing engine (in C#), and I'm working directly off the W3C CSS 2.1 grammar (http://www.w3.org/TR/CSS21/grammar.html). However, there's a token that I just don't quite get:
url ([!#$%&*-~]|{nonascii}|{escape})*
...
"url("{w}{url}{w}")" {return URI;}
"url("{w}{string}{w}")" {return URI;}
I don't get what the URL production is supposed to do. It appears to be a string of only !#$%&*-~, non-ascii, or escaped unicode code points. How is that a URL? Is this production just really badly named, and what purpose is it supposed to serve?
Any help appreciated. FYI, I've added the C# tag only to increase the audience to actual programmers who might have encountered this or have insights - I apologize if you think I shouldn't apply.

Dude, did you read the CONTEXT surrounding that expression?
baduri1 url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}
baduri2 url\({w}{string}{w}
baduri3 url\({w}{badstring}
Hmmm... Bad, bad, bad. Bit of a giveaway, eh what? Generally, if something in the doco doesn't make sense to you, or appears just plain wrong, maybe it shouldn't make sense? Yes? So you read around it... to acquire the correct context.

[!#$%&*-~] breaks down to:
!, #, $, %, &, plus the character range * - ~.
This takes in most printable ASCII characters, including uppercase, lowercase, digits and a range of punctuation characters.
It's easier to list the printable ASCII characters which this regex doesn't match:
Space, double quote ", single quote ', and parentheses (, ); i.e. printable ASCII characters minus whitespace and delimiters. This makes it possible to parse URLs that are not wrapped in quotation marks, e.g. url(http://example.com) instead of url("http://example.com").
Concise, but tricky!
P.S. The token name is confusing as well. A better name would have been something like: url_string or url_arg.
EDIT Feb 2015 The latest CSS3 Syntax Spec names the token url-unquoted
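The breakdown above can be checked with a small sketch. It covers only the ASCII character class from the production; the {nonascii} and {escape} alternatives are deliberately left out, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class UnquotedUrlChars
{
    // ASCII part of the url production only: !, #, $, %, & and the range * through ~.
    // \A and \z anchor the match to the whole input.
    private static readonly Regex AsciiPart = new Regex(@"\A[!#$%&*-~]*\z");

    // True when every character of the text is allowed in an unquoted url(...) body.
    public static bool IsAllowed(string text) => AsciiPart.IsMatch(text);
}
```

With this, IsAllowed("http://example.com") is true, while any input containing a space, a quote character or a parenthesis is rejected.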

I don't get what the URL production is supposed to do. It appears to be a string of only !#$%&*-~, non-ascii, or escaped unicode code points. How is that a URL? Is this production just really badly named, and what purpose is it supposed to serve?
The first line defines url as a regular expression:
url ([!#$%&*-~]|{nonascii}|{escape})*
The second line defines URI as a token which can be produced/returned by the lexer:
"url("{w}{url}{w}")" {return URI;}
The second line says that if the lexer sees url( then {w} then {url} then {w} then ) then it has found a URI.
The {w} expression is optional whitespace.
So according to the definition, {url} is a regular expression which defines what characters are allowed inside a URI token, between the initial url( and the final ).

How to merge or inject the "@" character into a string that includes escape characters without defining the string variable from scratch in C#

Hi, I have two related questions.
1) Suppose we have:
string strMessage="\nHellow\n\nWorld";
Console.WriteLine(strMessage);
The result is:
Hellow
World
Now, if we want to show the string in its original format on one line, we must redefine the variable from scratch:
string strOrignelMessage=@"\nHellow\n\nWorld";
Console.WriteLine(strOrignelMessage);
The result is:
\nHellow\n\nWorld --------------------->and everything is OK.
I am wondering whether there is a way to avoid defining the new variable (strOrignelMessage) for this purpose, and instead use only the first string variable (strMessage) and apply some trick to print it on one line.
At first I tried the following workaround, but it has a bug. Suppose we have:
string strMessage="a\aa\nbb\nc\rccc";
string strOrigenalMessage=strMessage.Replace("\n","\\n").Replace("\r","\\r");
Console.WriteLine(strOrigenalMessage);
The result is: aa\nbb\nc\rccc
Notice that the first "\" is not printed. So my second question is:
2) How can we fix this new problem with the single "\" in the string?
I hope I have titled this issue correctly and that my explanation is sufficient. Thanks.

No, because the compiler has already converted all of the escape sequences in the original string literal to the characters they represent. After the fact, it is too late to convert them back to non-special characters. You can do a search and replace, converting "\n" to the literal @"\n", but that is wacky and you're better off defining the string correctly in the first place. If you wanted literal backslashes in the first place, why not put an extra backslash character in front of each of them:
Instead of "\n" use "\\n".
Updated in response to your comment:
If the string is coming from user input, you don't need to escape the backslash, because it will be stored as a backslash in the input string. The escape character only works as an escape character in string literals in code (and not when the literal is preceded by @, which makes it a verbatim string literal).

If you want "\n\n\a\a\r\blah" to print as \n\n\a\a\r\blah without @, just replace every \ with \\ in the literal.
\ is the escaper in a non-verbatim string. So you simply need to escape the escaper, as it were.

If you want to use both strings, but want to have only one in the code, then write the string with @ and construct the other one with Replace(@"\n","\n").
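A minimal sketch of this last suggestion, assuming the text contains only \n escapes as in the first example from the question. The class and method names are invented for illustration.

```csharp
using System;

public static class EscapeRoundTrip
{
    // The only string written in code: the one-line, verbatim form.
    public const string OneLine = @"\nHellow\n\nWorld";

    // Turns each backslash-n pair into a real line feed.
    public static string ToMultiLine(string oneLine) => oneLine.Replace(@"\n", "\n");

    // Turns each real line feed back into the two characters \ and n.
    public static string ToOneLine(string multiLine) => multiLine.Replace("\n", @"\n");
}
```

ToMultiLine(OneLine) gives the same string as the ordinary literal "\nHellow\n\nWorld". The round trip is only exact when the original text never contains a literal backslash followed by n that was meant to stay as two characters.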

An explanation for Anthony Pegram (if I understand you correctly) and for anyone else who finds it useful:
I think I have found my answer to question 2.
At first, unfortunately, I thought that the escape characters were limited to \n, \t, \r and \v.
This confused me, because my sample string used \a and \b, and the compiler's behaviour did not make sense to me.
But I finally found that \a and \b are in the set of escape sequences too, and that using "\" without a valid escape character after it raises a compile-time error (it is funny when I think about my mistake again).
Please refer to this useful MSDN article for more info:
2.4.4.5 String literals
You cannot replace \ (a single backslash) with \\, because fundamentally you cannot have a single \ in an ordinary string literal without an escape character after it. So we could not write a string like this in code:
string strTest="abc\pwww"; ------> compile time error
To retrieve a version of a string with its escape characters deactivated, we can simply use the string.Replace method as I did before.
Excuse the long story; thank you all for your cooperation.
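To make the point about \a concrete, here is a short sketch using the string from the question. The class and method names are hypothetical, and the Replace chain handles only the three escapes that occur in this particular string.

```csharp
using System;

public static class AlertEscapeDemo
{
    // "\a" is the alert (bell) character U+0007, so it is one char, not a backslash plus 'a'.
    public const string Message = "a\aa\nbb\nc\rccc";

    // Makes the alert, line feed and carriage return characters visible again.
    public static string Visible(string s) =>
        s.Replace("\a", "\\a").Replace("\n", "\\n").Replace("\r", "\\r");
}
```

Message has 12 characters and its second character has code 7, which is why the workaround in the question appeared to lose the first backslash. Visible(Message) yields the same text as the verbatim literal @"a\aa\nbb\nc\rccc".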
