Uri class doesn't encode special characters correctly - C#

I'm using the Uri class to request data from a PHP script. In my case the URL has to contain special characters such as é or '. Here is my code:
string NomArret = "Université";
Uri uri = new Uri("http://localhost/getdata.php?aarret=" + NomArret);
But this returns 0 results. I debugged and noticed that Uri encodes the URL as:
http://84.75.112.69/getdata.php?aarret=Universit%C3%A9
So it converts the character é to %C3%A9. On this website (www.degraeve.com/reference/urlencoding.php) I've seen that é should be converted to %E9.
When I try manually using this encoding:
http://84.75.112.69/getdata.php?aarret=Universit%E9
It works! So how can I adapt my code to encode the special characters correctly?

Can you use Uri.EscapeDataString? (I'm not a C# dev, so I can't verify it.)
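A note on the two encodings in the question: %C3%A9 is é encoded as UTF-8, while %E9 is the single ISO-8859-1 byte for é. Uri.EscapeDataString also produces the UTF-8 form, so it would not change the result here. The following is only a sketch, assuming the PHP script really does expect ISO-8859-1 bytes; PercentEncode is a hypothetical helper, not part of the .NET API.

```csharp
using System;
using System.Text;

static class PercentEncodingDemo
{
    // Hypothetical helper: percent-encodes every byte outside the RFC 3986
    // unreserved set, using the bytes produced by the given encoding.
    public static string PercentEncode(string value, Encoding encoding)
    {
        const string unreserved =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
        var result = new StringBuilder();
        foreach (byte b in encoding.GetBytes(value))
        {
            char c = (char)b;
            if (unreserved.IndexOf(c) >= 0)
                result.Append(c);                              // safe ASCII, copy as-is
            else
                result.Append('%').Append(b.ToString("X2"));   // everything else as %XX
        }
        return result.ToString();
    }

    static void Main()
    {
        string nomArret = "Universit\u00E9"; // "Université"
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        Console.WriteLine(PercentEncode(nomArret, Encoding.UTF8)); // Universit%C3%A9
        Console.WriteLine(PercentEncode(nomArret, latin1));        // Universit%E9
    }
}
```

The encoded value can then be appended to the query string. It is worth checking uri.AbsoluteUri afterwards to confirm that the %E9 reaches the server unchanged. If you control the PHP side, making the script accept UTF-8 may be the simpler long-term fix.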

Related

Characters after "#" are not recognized by the System.Net.WebClient.DownloadFile method [duplicate]

How do you properly encode a path that includes a hash (#) in it? Note the hash is not the fragment (bookmark?) indicator but part of the path name.
For example, if there is a path like this:
http://www.contoso.com/code/c#/somecode.cs
It causes problems when you, for example, try to do this:
Uri myUri = new Uri("http://www.contoso.com/code/c#/somecode.cs");
It would seem that it interprets the hash as the fragment indicator.
It feels wrong to manually replace # with %23. Are there other characters that should be replaced?
There are some escaping methods in Uri and HttpUtility but none seem to do the trick.
There are a few characters you are not supposed to use. You can try to work your way through this very dry documentation, or refer to this handy URL summary on Stack Overflow.
If you check out this very website, you'll see that their C# questions are encoded %23.
Stack Overflow C# Questions
You can do this using either (for ASP.NET):
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
Server.UrlEncode("c#")
);
Or for class libraries / desktop:
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
HttpUtility.UrlEncode("c#")
);
Did some more digging, friends, and found a duplicate question for Java:
HTTP URL Address Encoding in Java
The .NET Uri class does not offer the constructor we need, but UriBuilder does.
So, in order to construct a proper URI where the path contains illegal characters, do this:
// Build Uri by explicitly specifying the constituent parts. This way, the hash is not confused with fragment identifier
UriBuilder uriBuilder = new UriBuilder("http", "www.contoso.com", 80, "/code/c#/somecode.cs");
Debug.WriteLine(uriBuilder.Uri);
// This outputs: http://www.contoso.com/code/c%23/somecode.cs
Notice how it does not unnecessarily escape parts of the URI that do not need escaping (like the :// part), which is what happens with HttpUtility.UrlEncode. It would seem that the purpose of that method is actually to encode the query string/fragment part of the URL, not the scheme or hostname.
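Another way to get the same effect, shown here only as a sketch: escape just the path segment that contains the hash with Uri.EscapeDataString and leave the rest of the URL alone. BuildUrl is a hypothetical helper; the host and path are the ones from the question.

```csharp
using System;

static class HashInPathDemo
{
    // Hypothetical helper: escapes a single path segment so that characters
    // such as '#', '?' and '/' inside it are treated as data, not delimiters.
    public static string BuildUrl(string segment)
    {
        return "http://www.contoso.com/code/" + Uri.EscapeDataString(segment) + "/somecode.cs";
    }

    static void Main()
    {
        var uri = new Uri(BuildUrl("c#"));
        Console.WriteLine(uri.AbsoluteUri);      // http://www.contoso.com/code/c%23/somecode.cs
        Console.WriteLine(uri.Fragment.Length);  // 0, the hash is no longer read as a fragment
    }
}
```

Escaping per segment keeps the "/" separators and the scheme intact, which is the part that goes wrong when the whole URL is passed to an encoder.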
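Use UrlEncode: System.Web.HttpUtility.UrlEncode(string)
using System;
using System.Web; // needs a reference to System.Web on .NET Framework

class Program
{
    static void Main(string[] args)
    {
        string url = "http://www.contoso.com/code/c#/somecode.cs";
        // Note: this encodes the whole string, including "://" and "/".
        string enc = HttpUtility.UrlEncode(url);
        Console.WriteLine("Original: {0} ... Encoded {1}", url, enc);
        Console.ReadLine();
    }
}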

System.Uri.AbsoluteURI charset encoding

I use the System.Uri class to generate a URL for an HTML link. The problem is that it encodes special characters as UTF-8 and my browser doesn't recognize them.
Sample code :
Uri uri = new Uri(@"\\computer\Temp\Réunion.txt");
Console.WriteLine(uri.AbsoluteUri);
Output :
file://computer/Temp/R%C3%A9union.txt
Expected :
file://computer/Temp/R%E9union.txt
How can I choose the encoding used by the System.Uri.AbsoluteUri property?
Is there an alternative way to convert any path to a valid URL?

Navigate to uri with umlaut using wpf webbrowser control

I'm using a WPF WebBrowser control to navigate to a URI that points to a PDF file, like this:
XAML
<WebBrowser x:Name="Browser" Loaded="Browser_OnLoaded"/>
Code behind
url = @"file:///c:\A.pdf"; // This works
url = @"file:///c:\Ä.pdf"; // This shows error
Browser.Navigate(url);
Error with Ä.pdf
Question
How can I navigate to the file with umlaut?
I tried UrlEncoding, changing to ASCII encoding, and using extended ASCII, all without success. Is it possible?
Edit
Using WebUtility.UrlEncode("Ä") produces %C3%84. Why?
I think the solution here is not to try fancy encoding, but use the Uri class.
This works for me in the WPF WebBrowser control:
var uri = new Uri(@"c:\users\täto\AppData\Roaming\MarkdownMonster\_preview.html");
PreviewBrowser.Navigate(uri);
It appears the Uri class handles all the encoding with no fuss.
To make it work with WebBrowser, you must update the encoding. By referencing the System.Web assembly you can use:
System.Web.HttpUtility.UrlEncode("Ä", Encoding.GetEncoding("ISO-8859-1"));
For below examples, I have used the character Ä.
All non-ASCII characters in a URL must be percent-encoded. This is explained in the following RFC (page 21, last paragraph of section 3.2.2): https://www.rfc-editor.org/rfc/rfc3986.
Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
So the UTF-8 encoding of Ä is the two bytes C3 84, which corresponds to the percent-encoded value %C3%84.
You can use the following code to encode your file name:
System.Net.WebUtility.UrlEncode("Ä");
or
Uri.EscapeUriString("Ä");
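To see where %C3%84 and %C4 come from, here is a small sketch that prints the bytes of Ä under UTF-8 and under ISO-8859-1. The class and the Hex helper are illustrative names only.

```csharp
using System;
using System.Text;

static class UmlautBytesDemo
{
    // Hypothetical helper: returns the bytes of the text in the given
    // encoding as hex pairs separated by dashes.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        const string aUmlaut = "\u00C4"; // Ä as a single precomposed character

        Console.WriteLine(Hex(aUmlaut, Encoding.UTF8));                       // C3-84
        Console.WriteLine(Hex(aUmlaut, Encoding.GetEncoding("ISO-8859-1")));  // C4
        Console.WriteLine(Uri.EscapeDataString(aUmlaut));                     // %C3%84
    }
}
```

Percent-encoding only writes each byte as %XX, so the difference between %C3%84 and %C4 is entirely the character encoding chosen before that step.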
Have you tried: @"file:///c:\Certific%E4te.pdf" instead of @"file:///c:\Certificäte.pdf" (as an example)?
More umlauts:
Ä = %C4
Ö = %D6
Ü = %DC
ä = %E4
ö = %F6
ü = %FC
ß = %DF
€ = %u20AC
$ = %24
% = %25

Unable to encode Url properly using HttpUtility.UrlEncode() method

I have created an application in which I need to encode/decode special characters in a URL entered by the user.
For example: if the user enters http://en.wikipedia.org/wiki/Å, then the corresponding URL should be http://en.wikipedia.org/wiki/%C3%85.
I made console application with following code.
string value = "http://en.wikipedia.org/wiki/Å";
Console.WriteLine(System.Web.HttpUtility.UrlEncode(value));
It encodes the character Å correctly, but it also encodes the :// characters. After running the code I get output like http%3a%2f%2fen.wikipedia.org%2fwiki%2f%c3%85, but I want http://en.wikipedia.org/wiki/%C3%85
What should I do?
Uri.EscapeUriString(value) returns the value that you expect. But it might have other problems.
There are a few URL encoding functions in the .NET Framework which all behave differently and are useful in different situations:
Uri.EscapeUriString
Uri.EscapeDataString
WebUtility.UrlEncode (.NET 4.5 and later)
HttpUtility.UrlEncode (in System.Web.dll, so intended for web applications, not desktop)
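As a rough illustration of how two of these functions differ, the sketch below runs the same input through Uri.EscapeDataString and WebUtility.UrlEncode. The input string and the wrapper class are made up for the example.

```csharp
using System;
using System.Net;

static class EncoderComparison
{
    // Thin wrappers so the two results can be compared side by side.
    public static string WithEscapeDataString(string input)
    {
        return Uri.EscapeDataString(input);
    }

    public static string WithWebUtility(string input)
    {
        return WebUtility.UrlEncode(input);
    }

    static void Main()
    {
        string input = "a b/\u00C5"; // "a b/Å"

        Console.WriteLine(WithEscapeDataString(input)); // a%20b%2F%C3%85
        Console.WriteLine(WithWebUtility(input));       // a+b%2F%C3%85
    }
}
```

Both treat the input as a single data value, so the "/" is escaped too. The visible difference here is the space: %20 in one case, + in the other.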
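You could use a regular expression to split off the host name and then URL-encode only the rest of the string:
// Requires: using System.Linq; using System.Text.RegularExpressions;
var inputString = "http://en.wikipedia.org/wiki/Å";
string encodedString = inputString; // fall back to the input if the pattern does not match
var regex = new Regex("^(?<host>https?://.+?/)(?<path>.*)$");
var match = regex.Match(inputString);
if (match.Success)
{
    // Encode each path segment separately so the "/" separators survive.
    var segments = match.Groups["path"].Value.Split('/').Select(s => System.Web.HttpUtility.UrlEncode(s));
    encodedString = match.Groups["host"].Value + string.Join("/", segments);
}
Console.WriteLine(encodedString);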
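A possibly simpler route, sketched here for this particular input: let the Uri class parse the string and read back AbsoluteUri, which returns the escaped form. Encode is a hypothetical wrapper name.

```csharp
using System;

static class AbsoluteUriDemo
{
    // Hypothetical wrapper: parses the URL and returns its escaped form.
    public static string Encode(string url)
    {
        return new Uri(url).AbsoluteUri;
    }

    static void Main()
    {
        Console.WriteLine(Encode("http://en.wikipedia.org/wiki/\u00C5"));
        // http://en.wikipedia.org/wiki/%C3%85
    }
}
```

This leaves the scheme, host and path separators untouched. Note that a "#" or "?" in the input is still interpreted as a delimiter, so this only helps when those characters are meant as delimiters.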

A socket message from Python to C# comes through garbled

I'm trying to set up a very basic ZeroMQ-based socket link between Python server and C# client using simplejson and Json.NET.
I try to send a dict from Python and read it into an object in C#. Python code:
message = {'MessageType':"None", 'ContentType':"None", 'Content':"OK"}
message_blob = simplejson.dumps(message).encode(encoding = "UTF-8")
alive_socket.send(message_blob)
The message is sent as normal UTF-8 string or, if I use UTF-16, as "'\xff\xfe{\x00"\x00..." etc.
Code in C# is where my problem is:
string reply = client.Receive(Encoding.UTF8);
The UTF-8 message is received as "≻潃瑮湥≴›..." etc.
I tried to use UTF-16 and the message comes through OK, but the first symbols are still the little-endian \xFF \xFE BOM so when I try to feed it to the deserializer,
PythonMessage replyMessage = JsonConvert.DeserializeObject<PythonMessage>(reply);
//PythonMessage is just a very simple class with properties,
//not relevant to the problem
I get an error (obviously occurring at the first symbol, \xFF):
Unexpected character encountered while parsing value: .
Something is obviously wrong in the way I'm using encoding. Can you please show me the right way to do this?
Python's "UTF-16" codec always writes a byte-order mark, because the name alone does not say which byte order is used. Use UTF-16LE or UTF-16BE to state the byte order explicitly, and no BOM will be generated. That is, use:
message_blob = simplejson.dumps(message).encode(encoding = "UTF-16le")
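If the sender cannot be changed, the BOM can also be dropped on the C# side before deserializing. This is only a sketch: it assumes the raw bytes are available as a byte[] (the payload variable is made up, and the client.Receive call from the question is not used here).

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: decodes UTF-16LE bytes and removes a leading
    // byte-order mark (U+FEFF) if one is present.
    public static string DecodeUtf16(byte[] payload)
    {
        string text = Encoding.Unicode.GetString(payload); // keeps the BOM as U+FEFF
        return text.TrimStart('\uFEFF');
    }

    static void Main()
    {
        // FF FE is the little-endian BOM, followed by "{}" in UTF-16LE.
        byte[] payload = { 0xFF, 0xFE, 0x7B, 0x00, 0x7D, 0x00 };
        Console.WriteLine(DecodeUtf16(payload)); // {}
    }
}
```

The resulting string can then be passed to JsonConvert.DeserializeObject as in the question.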
