WebClient.DownloadString uses wrong encoding


I'm downloading XML files from SharePoint Online using WebClient.
However, when I use the WebClient.DownloadString(string url) method, some characters are not decoded correctly.
When I use WebClient.DownloadFile(string url, string file) and then read the file, all characters are correct.
The XML itself does not contain an encoding declaration.
string wrongXml = webClient.DownloadString(url);
// wrongXml contains Ä™ instead of ę
webClient.DownloadFile(url, @"C:\temp\file1.xml");
string correctXml = File.ReadAllText(@"C:\temp\file1.xml");
// correctXml contains ę, like it should.
Also, when I open the URL in Internet Explorer, it is shown correctly.
Why is that? Is it because of the default Windows encoding on my machine, or does WebClient handle responses differently in DownloadString than in DownloadFile?

WebClient is probably decoding the response with a different encoding from the one the service actually sends.
You can set the encoding you expect before you make the request:
webClient.Encoding = Encoding.UTF8;
string previouslyWrongXml = webClient.DownloadString(url);
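As to why the two calls differ: DownloadFile writes the response bytes to disk untouched, and File.ReadAllText then looks for a byte order mark and otherwise assumes UTF-8, while DownloadString decodes with WebClient.Encoding, which on .NET Framework defaults to the system ANSI code page. Ä™ is exactly how the UTF-8 bytes of ę (0xC4 0x99) look when read as Windows-1252. The following is a minimal sketch of an alternative, assuming the service really sends UTF-8: fetch the raw bytes and decode them explicitly. The class and method names are made up for illustration.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of WebClient.
public static class Utf8Download
{
    // Decodes raw response bytes as UTF-8 (an assumption about the service).
    // If the bytes start with a byte order mark, StreamReader uses it to pick
    // the encoding and leaves the mark out of the returned string.
    public static string Decode(byte[] raw)
    {
        using (var reader = new StreamReader(new MemoryStream(raw), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetches the bytes without any text conversion, then decodes them.
    public static string DownloadString(string url)
    {
        using (var client = new WebClient())
        {
            return Decode(client.DownloadData(url));
        }
    }
}
```

Calling Utf8Download.DownloadString(url) in place of webClient.DownloadString(url) keeps the decoding step independent of the machine's default code page.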

Related

C# Get site source code with letters other than english

I'm trying to get a site's source in C# using
WebClient client = new WebClient();
string content = client.DownloadString(url);
And it gets it just fine.
However, the source code contains Hebrew characters, which show up as gibberish in the content variable.
What do I need to do for them to be decoded correctly?

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added
string content = client.DownloadString(url);
You have to specify the encoding. WebClient decodes the response with its Encoding property, which is probably still set to your system's default encoding, while the content could be in UTF-8. This example sets the encoding to UTF-8. If you are not sure what the page uses, check the source manually first and then specify the encoding accordingly. For more info see Remarks in the documentation.

The problem is the Encoding of your WebClient. MSDN says:
... the method uses the encoding specified in the Encoding property to convert the resource to a String.
Solution: Set a specific Encoding like
client.Encoding = Encoding.UTF8;
and try it again
string content = client.DownloadString(url);
UTF-8 should also decode the Hebrew characters correctly, provided the page is actually served as UTF-8.

How to read javascript file using HttpWebRequest in c#

I have a JavaScript file on a remote server, and when I fetch it with HttpWebRequest it returns some weird characters.
The URL is http://goo.gl/0Ug5QI
Is the content compressed in some way?
static string GetScriptSource(string _url)
{
    HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create(_url);
    hwr.Method = "GET";
    // Dispose the response and the reader once the body has been read.
    using (HttpWebResponse res = (HttpWebResponse)hwr.GetResponse())
    using (StreamReader sr = new StreamReader(res.GetResponseStream()))
    {
        return sr.ReadToEnd();
    }
}
My code to read that script file's content is very simple.

Looking at the JS source that you linked to, it could be that it has been gzipped. Try saving the response as a file and use 7-Zip or a similar tool to see if you can unzip it. .NET includes GZipStream in System.IO.Compression, so if it has been gzipped you should be able to decompress it easily enough.
Then again, it's a Korean web site, so maybe the encoding is not correct.
Either way, it's not a problem with the code that you posted.
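Both possibilities can be checked in code. The sketch below is an illustration under stated assumptions rather than a diagnosis of that particular URL: DecodeBody tests for the gzip magic bytes (0x1F 0x8B) and decompresses if they are present, and GetScriptSource asks HttpWebRequest to decompress automatically. The class name is hypothetical, and UTF-8 is assumed for the text; a Korean site may well use a different charset such as EUC-KR.

```csharp
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical helper class for illustration.
public static class ScriptFetcher
{
    // A gzip stream always starts with the two bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decompresses the body if it looks gzipped, then decodes it as UTF-8 (assumed).
    public static string DecodeBody(byte[] body)
    {
        Stream source = new MemoryStream(body);
        if (LooksGzipped(body))
        {
            source = new GZipStream(source, CompressionMode.Decompress);
        }
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Lets the framework handle Content-Encoding: gzip or deflate transparently.
    public static string GetScriptSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the output is still unreadable after decompression, the remaining suspect is the charset, and the Encoding passed to StreamReader is the place to change it.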

C# decoding "â„¢" to "TM"

A web page contains the following string:
"Qualcomm Snapdragon™ S4"
When I get this string in my .NET code, it is converted to "Qualcomm Snapdragonâ„¢ S4".
The "TM" character changes to â„¢.
How can I decode "â„¢" back to "TM"?
Update
The following is the code that downloads the string using webproxy:
wc is webproxy
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8");
string html = Server.HtmlEncode(wc.DownloadString(url));

You should read the webpage in its proper encoding in the first place. In this case it seems you are reading with Encoding.Default (i.e. probably CP1252) and the page is really in UTF-8. This should be apparent either by reading the Content-Type header of the response or by looking for a <meta http-equiv="Content-Type" content='text/html; charset=utf-8'> in the content.
If you still need to do this after the fact, then use
var bytes = Encoding.Default.GetBytes(myString);
var correctString = Encoding.UTF8.GetString(bytes);
In any case you would need to know the exact encodings that were used on the page and for reading the malformed string in the first place. Furthermore I'd generally advise explicitly against using Encoding.Default because its value isn't fixed. It's just the legacy encoding on a Windows system for use in non-Unicode applications and also gets used as the default non-Unicode text file encoding. It should have no place whatsoever in handling external resources.
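Reading the charset from the Content-Type header, as suggested above, could look roughly like this. It is a sketch only: the class and method names are invented for illustration, and the fallback encoding is whatever the caller decides to assume when the header names no charset.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper class for illustration.
public static class CharsetResolver
{
    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or the fallback if the header is missing, malformed,
    // has no charset, or names an encoding this runtime does not know.
    public static Encoding FromContentType(string contentTypeHeader, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return fallback;
        }
        try
        {
            string charset = new ContentType(contentTypeHeader).CharSet;
            return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return fallback; // header could not be parsed
        }
        catch (ArgumentException)
        {
            return fallback; // unknown charset name
        }
    }
}
```

With a WebClient named wc, one would call wc.DownloadData(url) first, then pass wc.ResponseHeaders["Content-Type"] to FromContentType and decode the downloaded bytes with the returned encoding's GetString method.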

Copy whole xml file in other xml in c#

I have this xml file: http://www.studiovincent.net/list.xml
I need to copy its whole content into another XML file.
I tried this code:
string sourcefile = "http://www.studiovincent.net/list.xml";
string destinationfile = "test.xml";
System.IO.File.Copy(sourcefile, destinationfile);
But it does not work; I get this error: URI formats are not supported.
How can I solve this problem?

File.Copy() does not support the http:// protocol, hence the URI formats are not supported error.
You can work around this by reading in the contents of the page into a string, and then writing it to a file.
WebClient client = new WebClient();
string contents = client.DownloadString("http://www.studiovincent.net/list.xml");
// write contents to test.xml
System.IO.File.WriteAllText("test.xml", contents);
Note that WriteAllText() will create test.xml if it doesn't exist, and overwrite it if it does. You will also want to wrap the above code in a try / catch block and handle the appropriate exceptions.

I would recommend using WebClient.DownloadFile. Downloading a string and then saving it could cause problems with character set mapping.
WebClient client = new WebClient();
client.DownloadFile("http://www.studiovincent.net/list.xml", "test.xml");
This copies the bytes of the file directly. Downloading to a string first means the data is decoded with whatever encoding WebClient assumes (for example, the file is UTF-16, which .NET calls Unicode, and WebClient thinks it's UTF-8), and any damage from that conversion is then written to the file.

Google Translate Api and Special Characters

I've recently started using the Google Translate API inside a C# project. I am trying to translate some text from English to French. I am having issues with some special characters though.
For example, the word Company comes through as SociÃ©tÃ© instead of Société as it should. Is there some way in code I can convert these to the correct special characters, i.e. Ã© to é?
Thanks
If you need any more info, let me know.

I ran into this exact same issue. If you're using the WebClient class to download the JSON response from Google, try setting the Encoding property to UTF8.
using (var webClient = new WebClient { Encoding = Encoding.UTF8 })
{
    string json = webClient.DownloadString(someUri);
    ...
}

I have reproduced your problem, and it looks like you are using the UTF7 encoding. UTF8 is the way you need to go.
I use Google's API by creating a WebRequest to get an HTTP response from the server, then I read the response stream with a StreamReader. StreamReader defaults to UTF8, but to reproduce your problem, I passed Encoding.UTF7 into the StreamReader's constructor.
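The symptom in the question can be reproduced without any network call, which may help when checking a fix. The sketch below assumes the wrong decoder was a single-byte one; ISO-8859-1 is used here because it maps every byte to a character, so the round trip is lossless. The class and method names are invented for illustration.

```csharp
using System.Text;

// Hypothetical helper class for illustration.
public static class Mojibake
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates the bug: UTF-8 bytes read back one byte per character.
    public static string Garble(string text)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Undoes it: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        return Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
    }
}
```

Mojibake.Garble("Société") yields SociÃ©tÃ©, because é is the two UTF-8 bytes 0xC3 0xA9, which ISO-8859-1 reads as Ã and ©. Repair reverses this, but only when the wrong decoder really was ISO-8859-1 and no bytes were lost along the way, so setting the right encoding before downloading remains the safer fix.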
