Navigate to uri with umlaut using wpf webbrowser control - c#

I'm using a WPF WebBrowser control to navigate to a URI that points to a PDF file, like this:
XAML
<WebBrowser x:Name="Browser" Loaded="Browser_OnLoaded"/>
Code behind
url = @"file:///c:\A.pdf"; // This works
url = @"file:///c:\Ä.pdf"; // This shows an error
Browser.Navigate(url);
Error with Ä.pdf
Question
How can I navigate to the file with umlaut?
I tried UrlEncoding, changing to ASCII encoding, using extended ASCII all without success. Is it possible?
Edit
Using WebUtility.UrlEncode("Ä") produces %C3%84. Why?

I think the solution here is not to try fancy encoding, but use the Uri class.
This works for me in the WPF WebBrowser control:
var uri = new Uri(@"c:\users\täto\AppData\Roaming\MarkdownMonster\_preview.html");
PreviewBrowser.Navigate(uri);
It appears the Uri class handles all the encoding with no fuss.

To make it work with WebBrowser, you must update the encoding. By referencing the System.Web assembly you can use:
System.Web.HttpUtility.UrlEncode("Ä", Encoding.GetEncoding("ISO-8859-1"));
For below examples, I have used the character Ä.
All non-ASCII characters in a URL must be percent-encoded. This is explained in the following RFC (page 21, last paragraph of section 3.2.2): https://www.rfc-editor.org/rfc/rfc3986.
Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
So the UTF-8 byte sequence for Ä is C3 84, which corresponds to the percent-encoded value %C3%84.
You can use the following code to encode your file name:
System.Net.WebUtility.UrlEncode("Ä");
or
Uri.EscapeUriString("Ä");
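To see where both %C3%84 and the shorter %C4 form come from, here is a minimal sketch that percent-encodes the bytes of Ä by hand under two encodings. The helper name PercentEncode is made up for illustration, and it escapes every byte, so it is only meant for non-ASCII input like this one.

```csharp
using System;
using System.Linq;
using System.Text;

static class PercentEncodingDemo
{
    // Hypothetical helper: percent-encodes every byte of the text in the given encoding.
    public static string PercentEncode(string text, Encoding encoding)
    {
        return string.Concat(encoding.GetBytes(text).Select(b => "%" + b.ToString("X2")));
    }

    static void Main()
    {
        // UTF-8 needs two bytes for Ä, which is what RFC 3986 asks for.
        Console.WriteLine(PercentEncode("Ä", Encoding.UTF8));                       // %C3%84
        // ISO-8859-1 needs a single byte, which gives the short legacy form.
        Console.WriteLine(PercentEncode("Ä", Encoding.GetEncoding("ISO-8859-1"))); // %C4
        // The built-in escaping always uses UTF-8.
        Console.WriteLine(Uri.EscapeDataString("Ä"));                               // %C3%84
    }
}
```

Which of the two forms a given consumer accepts depends on that consumer, so it is worth trying both against the actual control.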

Have you tried: @"file:///c:\Certific%E4te.pdf" instead of @"file:///c:\Certificäte.pdf" (as an example)?
More umlauts:
Ä = %C4
Ö = %D6
Ü = %DC
ä = %E4
ö = %F6
ü = %FC
ß = %DF
€ = %u20AC
$ = %24
% = %25

Related

Characters after "#" is not recognized by system.net.webclient.DownloadFile method [duplicate]

How do you properly encode a path that includes a hash (#) in it? Note the hash is not the fragment (bookmark?) indicator but part of the path name.
For example, if there is a path like this:
http://www.contoso.com/code/c#/somecode.cs
It causes problems when you try, for example, to do this:
Uri myUri = new Uri("http://www.contoso.com/code/c#/somecode.cs");
It would seem that it interprets the hash as the fragment indicator.
It feels wrong to manually replace # with %23. Are there other characters that should be replaced?
There are some escaping methods in Uri and HttpUtility but none seem to do the trick.
There are a few characters you are not supposed to use. You can try to work your way through this very dry documentation, or refer to this handy URL summary on Stack Overflow.
If you check out this very website, you'll see that their C# questions are encoded %23.
Stack Overflow C# Questions
You can do this using either (for ASP.NET):
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
Server.UrlEncode("c#")
);
Or for class libraries / desktop:
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
HttpUtility.UrlEncode("c#")
);
I did some more digging, friends, and found a duplicate question for Java:
HTTP URL Address Encoding in Java
However, the .Net Uri class does not offer the constructor we need, but the UriBuilder does.
So, in order to construct a proper URI where the path contains illegal characters, do this:
// Build Uri by explicitly specifying the constituent parts. This way, the hash is not confused with fragment identifier
UriBuilder uriBuilder = new UriBuilder("http", "www.contoso.com", 80, "/code/c#/somecode.cs");
Debug.WriteLine(uriBuilder.Uri);
// This outputs: http://www.contoso.com/code/c%23/somecode.cs
Notice how it does not unnecessarily escape parts of the URI that do not need escaping (like the :// part), which is what HttpUtility.UrlEncode does. It would seem that the purpose of HttpUtility.UrlEncode is to encode the query string or fragment part of the URL, not the scheme or hostname.
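As an alternative sketch, the offending segment can be escaped on its own with Uri.EscapeDataString before the URL is assembled. The class and method names below are made up for illustration; the contoso URL is the one from the question.

```csharp
using System;

static class HashInPathDemo
{
    // Escapes a single path segment so that reserved characters such as '#' are treated as data.
    public static string BuildUrl(string segment)
    {
        return "http://www.contoso.com/code/" + Uri.EscapeDataString(segment) + "/somecode.cs";
    }

    static void Main()
    {
        // Unescaped: everything from '#' onwards is parsed as the fragment.
        var naive = new Uri("http://www.contoso.com/code/c#/somecode.cs");
        Console.WriteLine(naive.AbsolutePath); // /code/c
        Console.WriteLine(naive.Fragment);     // #/somecode.cs

        // Escaped segment: the hash stays inside the path.
        var escaped = new Uri(BuildUrl("c#"));
        Console.WriteLine(escaped.AbsolutePath); // /code/c%23/somecode.cs
        Console.WriteLine(escaped.Fragment);     // (empty string)
    }
}
```

Escaping segment by segment keeps the slashes between segments intact, which a single call over the whole path would not.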
Use UrlEncode: System.Web.HttpUtility.UrlEncode(string)
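using System;
using System.Web; // HttpUtility lives in the System.Web assembly

class Program
{
    static void Main(string[] args)
    {
        string url = "http://www.contoso.com/code/c#/somecode.cs";
        // Encodes the whole string, including the scheme separator and slashes.
        string enc = HttpUtility.UrlEncode(url);
        Console.WriteLine("Original: {0} ... Encoded {1}", url, enc);
        Console.ReadLine();
    }
}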

Uri Class doesn't convert correctly the special char

I'm using the Uri class to request data from a PHP script. In my case I need to use a URL containing special characters such as é or '. Here is my piece of code:
string NomArret = "Université";
uri = new Uri("http://localhost/getdata.php?aarret=" + NomArret);
But this returns 0 results. I debugged and noticed that Uri encodes this URL as:
http://84.75.112.69/getdata.php?aarret=Universit%C3%A9
So it converts the character é to %C3%A9. On this website (www.degraeve.com/reference/urlencoding.php) I've seen that é should be converted to %E9.
When I try manually using this encoding:
http://84.75.112.69/getdata.php?aarret=Universit%E9
It works! So how can I adapt my code to convert the special character correctly?
Can you use Uri.EscapeDataString? (I'm not a C# dev, so I can't verify it.)
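For comparison, here is a small sketch of both encodings of the value from the question. Uri.EscapeDataString always works on UTF-8 bytes, so it yields %C3%A9; getting %E9 means encoding with ISO-8859-1, shown here with the HttpUtility.UrlEncode(string, Encoding) overload from System.Web. That the PHP script expects ISO-8859-1 is an assumption based on %E9 working; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;
using System.Web;

static class Latin1QueryDemo
{
    // Percent-encodes the value using ISO-8859-1 bytes instead of UTF-8.
    public static string EncodeLatin1(string value)
    {
        return HttpUtility.UrlEncode(value, Encoding.GetEncoding("ISO-8859-1"));
    }

    static void Main()
    {
        string nomArret = "Université";

        // UTF-8 based escaping: é becomes the two-byte sequence %C3%A9.
        Console.WriteLine(Uri.EscapeDataString(nomArret)); // Universit%C3%A9

        // ISO-8859-1 based escaping: é becomes a single escaped byte (E9).
        Console.WriteLine("http://localhost/getdata.php?aarret=" + EncodeLatin1(nomArret));
    }
}
```

If the script can be changed, having it accept UTF-8 would avoid the special case on the client side.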

Unable to encode Url properly using HttpUtility.UrlEncode() method

I have created an application in which I need to encode/decode special characters from the url which is entered by user.
For example: if the user enters http://en.wikipedia.org/wiki/Å, then its respective URL should be http://en.wikipedia.org/wiki/%C3%85.
I made console application with following code.
string value = "http://en.wikipedia.org/wiki/Å";
Console.WriteLine(System.Web.HttpUtility.UrlEncode(value));
It encodes the character Å successfully, but it also encodes the :// and / characters. After running the code I get output like http%3a%2f%2fen.wikipedia.org%2fwiki%2f%c3%85, but I want http://en.wikipedia.org/wiki/%C3%85
What should I do?
Uri.EscapeUriString(value) returns the value that you expect. But it might have other problems.
There are a few URL encoding functions in the .NET Framework which all behave differently and are useful in different situations:
Uri.EscapeUriString
Uri.EscapeDataString
WebUtility.UrlEncode (only in .NET 4.5 and later)
HttpUtility.UrlEncode (in System.Web.dll, so intended for web applications, not desktop)
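A short sketch of how some of these differ on the same input. It uses the Wikipedia URL from the question and, as an added illustration, a value containing a space; letting the Uri class escape the whole URL is included as a further option that is not in the list above.

```csharp
using System;
using System.Net;

static class EncoderComparison
{
    static void Main()
    {
        const string url = "http://en.wikipedia.org/wiki/Å";

        // The Uri class escapes only what is not allowed in each component.
        Console.WriteLine(new Uri(url).AbsoluteUri);   // http://en.wikipedia.org/wiki/%C3%85

        // These two treat the whole string as data, so ':' and '/' are escaped too.
        Console.WriteLine(Uri.EscapeDataString(url));  // http%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85
        Console.WriteLine(WebUtility.UrlEncode(url));  // http%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85

        // They disagree on spaces: %20 versus the form-style '+'.
        Console.WriteLine(Uri.EscapeDataString("a b")); // a%20b
        Console.WriteLine(WebUtility.UrlEncode("a b")); // a+b
    }
}
```

As a rule of thumb, the data-oriented functions suit individual values such as a query parameter, not a complete URL.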
You could use regular expressions to select hostname and then urlencode only other part of string:
var inputString = "http://en.wikipedia.org/wiki/Å";
// Fall back to the unmodified input if the pattern does not match.
string encodedString = inputString;
var regex = new Regex("^(?<host>https?://.+?/)(?<path>.*)$");
var match = regex.Match(inputString);
if (match.Success)
{
    // Encode only the path; the scheme and host stay as they are.
    encodedString = match.Groups["host"].Value + System.Web.HttpUtility.UrlEncode(match.Groups["path"].Value);
}
Console.WriteLine(encodedString);

Read txt files (in unicode and utf8) by means of C#

I created two txt files (windows notepad) with the same content "thank you - спасибо" and saved them in utf8 and unicode. In notepad they look fine. Then I tried to read them using .Net:
...File.ReadAllText(utf8FileFullName, Encoding.UTF8);
and
...File.ReadAllText(unicodeFileFullName, Encoding.Unicode);
But in both cases I got this "thank you - ???????". What's wrong?
Upd:
code for utf8
static void Main(string[] args)
{
var encoding = Encoding.UTF8;
var file = new FileInfo(@"D:\encodes\enc.txt");
Console.OutputEncoding = encoding;
var content = File.ReadAllText(file.FullName, encoding);
Console.WriteLine("encoding: " + encoding);
Console.WriteLine("content: " + content);
Console.ReadLine();
}
Result:
thanks ÑпаÑибо
Edited as UTF8 should support the characters. It seems that you're outputting to a console or a location which hasn't had its encoding set. If so, you need to set the encoding. For the console you can do this
string allText = File.ReadAllText(unicodeFileFullName, Encoding.UTF8);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(allText);
Use the Encoding type Default
File.ReadAllText(unicodeFileFullName, Encoding.Default);
It will fix the ???? characters.
When outputting Unicode or UTF-8 encoded multi-byte characters to the console you will need to set the encoding as well as ensure that the console has a font set that supports the multi-byte character in order to display the corresponding glyph. With your existing code a MessageBox.Show(content) or display on a Windows or Web Form would appear correctly.
Have a look at http://msdn.microsoft.com/en-us/library/system.console.aspx for an explanation on setting fonts within the console window.
"Support for Unicode characters requires the encoder to recognize a particular Unicode character, and also requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a non-raster or TrueType font such as Consolas or Lucida Console."
As a side note, you can use the FileStream class to read the first three bytes of the file and look for the byte order mark indicator to automatically set the encoding when reading the file. For example, if byte[0] == 0xEF && byte[1] == 0xBB && byte[2] == 0xBF then you have a UTF-8 encoded file. Refer to http://en.wikipedia.org/wiki/Byte_order_mark for more information.
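The byte order mark check described above could be sketched roughly as follows. The class and method names are made up for illustration, only the UTF-8 and UTF-16 marks are checked, and falling back to UTF-8 when no mark is present is an assumption, not something the file itself guarantees.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark, or null if none is found.
    // Only UTF-8 and UTF-16 are checked; UTF-32 marks are ignored in this sketch.
    public static Encoding DetectFromBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16, little endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16, big endian
        return null;
    }

    static void Main(string[] args)
    {
        // args[0] is the path of the text file to inspect.
        var head = new byte[3];
        int read;
        using (var stream = new FileStream(args[0], FileMode.Open, FileAccess.Read))
        {
            read = stream.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);

        var encoding = DetectFromBom(head) ?? Encoding.UTF8; // assumed fallback
        Console.WriteLine(File.ReadAllText(args[0], encoding));
    }
}
```

A file saved without a byte order mark gives no hint this way, so the fallback has to come from knowledge of where the file originated.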

C#: bytes to UTF-8 string conversion. Why doesn't it work?

There is a Chinese character 𤭢 which is presented in UTF-8 as F0 A4 AD A2. This character is described here: http://en.wikipedia.org/wiki/UTF-8
𤭢 U+24B62 F0 A4 AD A2
When I run this code in C# ...
byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
string abc = Encoding.UTF8.GetString(data);
Console.WriteLine("Test: description = {0}", abc);
... I redirect the output to the text file and then open it with notepad.exe choosing UTF-8 encoding. I expect to get 𤭢 in the output, but do get two question marks (??).
The byte sequence is right. It works in Perl:
print "\xF0\xA4\xAD\xA2";
In the output, I get 𤭢
So my question is: why do I get "??" instead of "𤭢" in C#?
P.S. Nothing special with this character: I got the same thing for any character (2, 3 or 4 byte long).
By default the console cannot display most Unicode characters; anything its output encoding cannot represent is written as a question mark. To enable Unicode output, use:
Console.OutputEncoding = System.Text.Encoding.Unicode;
before writing to it.
Even then it can fail on many systems, because the Windows command line itself has limited Unicode support.
So, for testing purposes, it would be better to write the output to a file.
You need to write to a file using UTF8. The code below shows how you may do it. When opening the resulting file in Notepad, the character 𤭢 is shown correctly:
string c = "𤭢";
var bytes = Encoding.UTF8.GetBytes(c);
var cBack = Encoding.UTF8.GetString(bytes);
using (var writer = new StreamWriter(@"c:\temp\char.txt", false, Encoding.UTF8))
{
writer.WriteLine(cBack);
}
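A quick way to check that the decoding step itself is fine is to inspect the resulting string instead of printing it. The sketch below (names made up for illustration) shows that the four bytes decode to a single code point stored as two UTF-16 code units, which would be consistent with seeing exactly two question marks when the output encoding cannot represent them.

```csharp
using System;
using System.Text;

static class SurrogatePairDemo
{
    // Decodes a UTF-8 byte sequence into a .NET (UTF-16) string.
    public static string Decode(byte[] utf8)
    {
        return Encoding.UTF8.GetString(utf8);
    }

    static void Main()
    {
        string abc = Decode(new byte[] { 0xF0, 0xA4, 0xAD, 0xA2 });

        Console.WriteLine(abc.Length);                                // 2
        Console.WriteLine(char.IsSurrogatePair(abc, 0));              // True
        Console.WriteLine(char.ConvertToUtf32(abc, 0).ToString("X")); // 24B62
    }
}
```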
