Google Translate Api and Special Characters - c#

I've recently started using the google translate API inside a c# project. I am trying to translate some text from english to french. I am having issues with some special characters though.
For example the word Company comes thru as Société instead of Société as it should. Is there some way in code I can convert these to the correct special characters? ie (é to é)
Thanks
If you need anymore info let me know.

I ran into this same exact issue. If you're using the WebClient class to download the json response from google, try setting the Encoding property to UTF8.
using(var webClient = new WebClient { Encoding = Encoding.UTF8 })
{
string json = webClient.DownloadString(someUri);
...
}

I have reproduced your problem, and it looks like you are using the UTF7 encoding. UTF8 is the way you need to go.
I use Google's API by creating a WebRequest to get an HTTP response from the server, then I read the response stream with a StreamReader. StreamReader defaults to UTF8, but to reproduce your problem, I passed Encoding.UTF7 into the StreamReader's constructor.

Related

Get subtitles from youtube video using video.google.com - text format

I want to get subtitles from a youtube video. When I write in the url "http://video.google.com/timedtext?lang=en&v=Dceyy0cX6J4&fmt=srv3" the text is as expected, but when I use C# the text has some characters with the &#39 ; (example)
The c# code is pretty simple:
using (HttpClient client = new HttpClient)
{
var response = client.GetString("http://video.google.com/timedtext?lang=en&v=Dceyy0cX6J4&fmt=srv3")
}
Is there any way to add a format header? How could I fix it ?
What you are seeing is url encoded content.
You will need to decode this.
Luckily you can use HttpUtility.HtmlDecode(response) from System.Web and this will give you a perfect readable response
Check out the URLEncode method:
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=netframework-4.7.2

WebClient.DownloadString uses wrong encoding

I'm downloading XML files from sharepoint online using webclient.
However, when I use WebClient.DownloadString(string url) method, some characters are not correctly decoded.
When I use WebClient.DownloadFile(string url, string file) and then I read the file all characters are correct.
The xml itself does not contain encoding declaration.
string wrongXml = webClient.DownloadString(url);
//wrongXml contains Ä™ instead of ę
webClient.DownloadFile(url, #"C:\temp\file1.xml");
string correctXml = File.ReadAllText(#"C:\temp\file1.xml");
//contains ę, like it should.
Also, when open the url in Internet Explorer, it is shown correctly.
Why is that? Is it because of the default windows encoding on my machine or webclient handles responses differently when using DownloadString, resp DownloadFile?
Probably the encoding it is using now is not the one the service returns.
You can set the encoding you expect before you make the request:
webClient.Encoding = Encoding.UTF8;
string previouslyWrongXml = webClient.DownloadString(url);

A socket message from Python to C# comes through garbled

I'm trying to set up a very basic ZeroMQ-based socket link between Python server and C# client using simplejson and Json.NET.
I try to send a dict from Python and read it into an object in C#. Python code:
message = {'MessageType':"None", 'ContentType':"None", 'Content':"OK"}
message_blob = simplejson.dumps(message).encode(encoding = "UTF-8")
alive_socket.send(message_blob)
The message is sent as normal UTF-8 string or, if I use UTF-16, as "'\xff\xfe{\x00"\x00..." etc.
Code in C# is where my problem is:
string reply = client.Receive(Encoding.UTF8);
The UTF-8 message is received as "≻潃瑮湥≴›..." etc.
I tried to use UTF-16 and the message comes through OK, but the first symbols are still the little-endian \xFF \xFE BOM so when I try to feed it to the deserializer,
PythonMessage replyMessage = JsonConvert.DeserializeObject<PythonMessage>(reply);
//PythonMessage is just a very simple class with properties,
//not relevant to the problem
I get an error (obviously occurring at the first symbol, \xFF):
Unexpected character encountered while parsing value: .
Something is obviously wrong in the way I'm using encoding. Can you please show me the right way to do this?
The byte-order-mark is obligatory in UTF-16. You can use UTF-16LE or UTF-16BE to assume a particular byte order and the BOM will not be generated. That is, use:
message_blob = simplejson.dumps(message).encode(encoding = "UTF-16le")

C# decoding "â„¢" to "TM"

on a web page there is following string
"Qualcomm Snapdragon™ S4"
when i get this string in my .net code the string convert to "Qualcomm Snapdragonâ„¢ S4"
the character "TM" change to â„¢
how can i decode "â„¢" back to "TM"
Update
follwoing is the code for downloaded string using webproxy
wc is webproxy
wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8");
string html = Server.HtmlEncode(wc.DownloadString(url));
You should read the webpage in its proper encoding in the first place. In this case it seems you are reading with Encoding.Default (i.e. probably CP1252) and the page is really in UTF-8. This should be apparent either by reading the Content-Type header of the response or by looking for a <meta http-equiv="Content-Type" content='text/html; charset=utf-8'> in the content.
If you still need to do this after the fact, then use
var bytes = Encoding.Default.GetBytes(myString);
var correctString = Encoding.UTF8.GetString(bytes);
In any case you would need to know the exact encodings that were used on the page and for reading the malformed string in the first place. Furthermore I'd generally advise explicitly against using Encoding.Default because its value isn't fixed. It's just the legacy encoding on a Windows system for use in non-Unicode applications and also gets used as the default non-Unicode text file encoding. It should have no place whatsoever in handling external resources.

Strange characters when consuming JSON web service

I'm consuming a JSON WebService by using the WebClient.DOwnloadStringAsync. The returning string contains some strange character pair:
"start_address" : "Goethestraße 7-9, Monaco di Baviera, Germania",
In place of some extended charachter. How can I see the correct one? In the example it sould be: ß
Solved Myself:
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8; // Specify the encoding here
That is the encoding of the German "Double S" character, still used in the word Strasse in parts of Germany. Switching to UTF8 should solve your problem.

Categories

Resources