Strange characters when consuming JSON web service - c#

I'm consuming a JSON web service by using WebClient.DownloadStringAsync. The returned string contains a strange character pair:
"start_address" : "GoethestraÃŸe 7-9, Monaco di Baviera, Germania",
in place of an extended character. How can I get the correct one? In the example it should be: ß

Solved Myself:
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8; // Specify the encoding here

That pair is the UTF8 encoding of the German "Double S" character (ß), still used in the word Strasse in parts of Germany, with its two bytes decoded as two separate characters by a single-byte encoding. Switching to UTF8 should solve your problem.
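To see where the pair comes from, here is a minimal sketch (not part of the original answer) that encodes a string as UTF8 and then decodes the same bytes with a different encoding. The class and method names are hypothetical, and which legacy encoding the client actually used is an assumption; ISO-8859-1 is a convenient stand-in.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingDemo
{
    // Returns the UTF8 bytes of the text formatted as hex, e.g. "C3-9F".
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Decodes the UTF8 bytes of the text with another encoding,
    // which reproduces the corruption seen in the question.
    public static string DecodeUtf8As(string text, Encoding wrong)
    {
        return wrong.GetString(Encoding.UTF8.GetBytes(text));
    }
}
```

EncodingDemo.Utf8Hex("\u00DF") returns "C3-9F": ß occupies two bytes in UTF8, so a decoder that maps one byte to one character produces two characters, the first of which is Ã.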

Related

C# Get site source code with letters other than english

I'm trying to get a site's source in C# using
WebClient client = new WebClient();
string content = client.DownloadString(url);
And it gets it just fine.
However, the source code contains Hebrew characters, which show up as gibberish in the content variable.
What do I need to do for it to recognize it?
WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added
string content = client.DownloadString(url);
You have to specify the encoding. Unless told otherwise, WebClient typically decodes with the system default encoding (Encoding.Default) rather than UTF8, and the content could be in UTF8. This is an example where the encoding is set to UTF8. If you are not sure what it is, check the source manually first and then specify the encoding accordingly. For more info see Remarks in the documentation.
The problem is the Encoding of your WebClient. MSDN says:
... the method uses the encoding specified in the Encoding property to convert the resource to a String.
Solution: Set a specific Encoding like
client.Encoding = Encoding.UTF8;
and try it again
string content = client.DownloadString(url);
UTF8 should do the trick and decode the Hebrew characters correctly as well.
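If you are not sure which encoding a page uses, one option is to read the charset from the response's Content-Type header instead of hard-coding it. The following is only a sketch under that assumption: CharsetSniffer and FromContentType are hypothetical names, and the helper does not look at BOMs or meta tags.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CharsetSniffer
{
    // Returns the encoding named in a Content-Type header value such as
    // "text/html; charset=utf-8", or the fallback when no usable charset is given.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentType))
        {
            return fallback;
        }
        try
        {
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return fallback; // the header value could not be parsed
        }
        catch (ArgumentException)
        {
            return fallback; // the charset name is not a known encoding
        }
    }
}
```

With WebClient, this could be combined with DownloadData: download the raw bytes, pass client.ResponseHeaders["Content-Type"] to FromContentType with Encoding.UTF8 as the fallback, and call GetString on the resulting encoding.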

Unable to encode Url properly using HttpUtility.UrlEncode() method

I have created an application in which I need to encode/decode special characters in a URL entered by the user.
For example: if the user enters http://en.wikipedia.org/wiki/Å then its respective URL should be http://en.wikipedia.org/wiki/%C3%85.
I made a console application with the following code.
string value = "http://en.wikipedia.org/wiki/Å";
Console.WriteLine(System.Web.HttpUtility.UrlEncode(value));
It encodes the character Å successfully, but it also encodes the :// and / characters. After running the code I get output like http%3a%2f%2fen.wikipedia.org%2fwiki%2f%c3%85, but I want http://en.wikipedia.org/wiki/%C3%85
What should I do?
Uri.EscapeUriString(value) returns the value that you expect. But it might have other problems.
There are a few URL encoding functions in the .NET Framework which all behave differently and are useful in different situations:
Uri.EscapeUriString
Uri.EscapeDataString
WebUtility.UrlEncode (.NET 4.5 and later)
HttpUtility.UrlEncode (in System.Web.dll, so intended for web applications, not desktop)
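You could use a regular expression to select the host name and then URL-encode only the rest of the string:
var inputString = "http://en.wikipedia.org/wiki/Å";
// Fall back to the unmodified input if the URL does not match the pattern.
string encodedString = inputString;
// "host" captures the scheme and host name; "path" captures everything after them.
var regex = new System.Text.RegularExpressions.Regex("^(?<host>https?://.+?/)(?<path>.*)$");
var match = regex.Match(inputString);
if (match.Success)
{
    // Encode only the path so that "://" and the host name are left untouched.
    encodedString = match.Groups["host"].Value + System.Web.HttpUtility.UrlEncode(match.Groups["path"].Value);
}
Console.WriteLine(encodedString);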
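A variation on the same idea, sketched here with Uri.EscapeDataString in place of HttpUtility.UrlEncode: it emits uppercase hex digits (%C3%85) and needs no reference to System.Web. The class name UrlPathEncoder is hypothetical, and the sketch assumes the URL has no query string or fragment, since ? and # would be escaped as well.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class UrlPathEncoder
{
    // Percent-encodes each path segment separately, leaving the scheme,
    // the host name and the "/" separators untouched.
    public static string EncodePath(string url)
    {
        var match = Regex.Match(url, "^(?<host>https?://[^/]+/)(?<path>.*)$");
        if (!match.Success)
        {
            return url; // not an http(s) URL with a path; return it unchanged
        }
        var segments = match.Groups["path"].Value
            .Split('/')
            .Select(segment => Uri.EscapeDataString(segment));
        return match.Groups["host"].Value + string.Join("/", segments);
    }
}
```

Escaping segment by segment keeps the slashes inside the path intact, which a single call on the whole path would not.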

C# decoding "â„¢" to "TM"

On a web page there is the following string:
"Qualcomm Snapdragon™ S4"
When I get this string in my .NET code, it is converted to "Qualcomm Snapdragonâ„¢ S4":
the "™" (TM) character changes to "â„¢".
How can I decode "â„¢" back to "™"?
Update
The following is the code that downloads the string using the webproxy:
wc is the webproxy
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8");
string html = Server.HtmlEncode(wc.DownloadString(url));
You should read the webpage in its proper encoding in the first place. In this case it seems you are reading with Encoding.Default (i.e. probably CP1252) and the page is really in UTF-8. This should be apparent either by reading the Content-Type header of the response or by looking for a <meta http-equiv="Content-Type" content='text/html; charset=utf-8'> in the content.
If you still need to do this after the fact, then use
var bytes = Encoding.Default.GetBytes(myString);
var correctString = Encoding.UTF8.GetString(bytes);
In any case you would need to know the exact encodings that were used on the page and for reading the malformed string in the first place. Furthermore I'd generally advise explicitly against using Encoding.Default because its value isn't fixed. It's just the legacy encoding on a Windows system for use in non-Unicode applications and also gets used as the default non-Unicode text file encoding. It should have no place whatsoever in handling external resources.
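The after-the-fact repair can be sketched with the wrongly applied encoding passed in explicitly rather than taken from Encoding.Default. This is an illustration under assumptions, not a definitive implementation: MojibakeRepair is a hypothetical name, and the caller has to know which encoding was applied by mistake. A strict UTF8 decoder is used so that text which was never mis-decoded is returned unchanged.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class MojibakeRepair
{
    // UTF8 decoder that throws on invalid byte sequences instead of substituting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Re-encodes the garbled text with the encoding that was applied by mistake,
    // then decodes the recovered bytes as UTF8.
    public static string Repair(string garbled, Encoding wrong)
    {
        byte[] originalBytes = wrong.GetBytes(garbled);
        try
        {
            return StrictUtf8.GetString(originalBytes);
        }
        catch (DecoderFallbackException)
        {
            return garbled; // the bytes are not valid UTF8, so leave the text as it is
        }
    }
}
```

For example, with Encoding.GetEncoding("iso-8859-1") as the wrong encoding, "SociÃ©tÃ©" is repaired to "Société". The round trip is only lossless if the wrong encoding maps every byte to a distinct character.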

Accented vowels come out as strange characters in C# WebClient [duplicate]

The following code:
var text = (new WebClient()).DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
results in a variable text that contains, among many other things, the string
"$Îº$-Minkowski space, scalar field, and the issue of Lorentz invariance"
However, when I visit that URL in Firefox, I get
$κ$-Minkowski space, scalar field, and the issue of Lorentz invariance
which is actually correct. I also tried
var data = (new WebClient()).DownloadData("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
var text = System.Text.UTF8Encoding.Default.GetString(data);
but this gave the same problem.
I'm not sure where the fault lies here. Is the feed lying about being UTF8-encoded, and the browser is smart enough to figure that out, but not WebClient? Is the feed properly UTF8-encoded, but WebClient is failing in some other way? What can I do to mitigate this?
It's not lying. You should set the WebClient's encoding before calling DownloadString.
using(WebClient webClient = new WebClient())
{
webClient.Encoding = Encoding.UTF8;
string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}
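As for why your alternative isn't working, it's because the usage is incorrect: System.Text.UTF8Encoding.Default is just the static Encoding.Default property inherited from the base class, not UTF8. It should be:
System.Text.Encoding.UTF8.GetString(data)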

Google Translate Api and Special Characters

I've recently started using the Google Translate API inside a C# project. I am trying to translate some text from English to French. I am having issues with some special characters though.
For example the word Company comes through as SociÃ©tÃ© instead of Société as it should. Is there some way in code I can convert these to the correct special characters? i.e. (Ã© to é)
Thanks
If you need any more info let me know.
I ran into this exact same issue. If you're using the WebClient class to download the JSON response from Google, try setting the Encoding property to UTF8.
using(var webClient = new WebClient { Encoding = Encoding.UTF8 })
{
string json = webClient.DownloadString(someUri);
...
}
I have reproduced your problem, and it looks like you are using the UTF7 encoding. UTF8 is the way you need to go.
I use Google's API by creating a WebRequest to get an HTTP response from the server, then I read the response stream with a StreamReader. StreamReader defaults to UTF8, but to reproduce your problem, I passed Encoding.UTF7 into the StreamReader's constructor.
