Accented vowels come out as strange characters in C# WebClient [duplicate] - c#

The following code:
var text = (new WebClient()).DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
results in a variable text that contains, among many other things, the string
"$Îº$-Minkowski space, scalar field, and the issue of Lorentz invariance"
However, when I visit that URL in Firefox, I get
$κ$-Minkowski space, scalar field, and the issue of Lorentz invariance
which is actually correct. I also tried
var data = (new WebClient()).DownloadData("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
var text = System.Text.UTF8Encoding.Default.GetString(data);
but this gave the same problem.
I'm not sure where the fault lies here. Is the feed lying about being UTF8-encoded, and the browser is smart enough to figure that out, but not WebClient? Is the feed properly UTF8-encoded, but WebClient is failing in some other way? What can I do to mitigate this?

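It's not lying. Set the WebClient's Encoding property to UTF-8 before calling DownloadString. DownloadString uses that property to convert the downloaded bytes to a string, and it defaults to Encoding.Default (on .NET Framework, the system's ANSI code page, such as Windows-1252), not UTF-8.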
using System.Net;
using System.Text;

using (WebClient webClient = new WebClient())
{
    webClient.Encoding = Encoding.UTF8;
    string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}
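As for why your alternative doesn't work: UTF8Encoding.Default is not a UTF-8 encoding. It is the static Encoding.Default property inherited from Encoding, which on .NET Framework returns the system's ANSI code page. It should be:
System.Text.Encoding.UTF8.GetString(data);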
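To see why the default decoding goes wrong, here is a minimal offline sketch (no network access; the class name MojibakeDemo is hypothetical). It encodes the title as UTF-8, which is what the feed sends, then decodes the same bytes once as UTF-8 and once as ISO-8859-1. ISO-8859-1 stands in for a Western ANSI code page such as Windows-1252, which is an assumption about the asker's machine; both map the UTF-8 bytes 0xCE 0xBA of κ to the two characters Îº.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // The title as it appears in the arXiv feed.
    public const string Title = "$κ$-Minkowski space";

    // Simulates the raw bytes WebClient.DownloadData would return for a UTF-8 response.
    public static byte[] Utf8Bytes() => Encoding.UTF8.GetBytes(Title);

    // Correct: decode UTF-8 bytes with the UTF-8 decoder.
    public static string DecodeAsUtf8(byte[] data) => Encoding.UTF8.GetString(data);

    // Wrong: decode the same bytes with a single-byte Western encoding.
    // ISO-8859-1 is built into every .NET runtime; Windows-1252 gives the
    // same characters for the bytes 0xCE and 0xBA.
    public static string DecodeAsLatin1(byte[] data) =>
        Encoding.GetEncoding("iso-8859-1").GetString(data);
}
```

Here DecodeAsUtf8(Utf8Bytes()) returns the original title, while DecodeAsLatin1(Utf8Bytes()) returns "$Îº$-Minkowski space": each byte of the two-byte UTF-8 sequence for κ becomes a character of its own.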

Related

Getting wrong translation from Google translate using C#

I was using this method to translate some text in my program with Google Translate, and it was working perfectly until this week:
public string TranslateText(string input, string languagePair)
{
string r = WebUtility.HtmlDecode(input);
r = WebUtility.UrlEncode(r);
string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", r, languagePair);
WebClient webClient = new WebClient();
webClient.Encoding = Encoding.GetEncoding("Windows-1252");
byte[] resultbyte = webClient.DownloadData(url);
string result = Encoding.Default.GetString(resultbyte);
result = result.Substring(result.IndexOf("TRANSLATED_TEXT=") + 16);
result = result.Replace("\\x26", "&");
result = result.Replace("\\x3d", "=");
result = WebUtility.HtmlDecode(result);
result = result.Remove(result.IndexOf(";"));
result = result.Replace("'", string.Empty);
return result;
}
But now, running the program exactly as before, I always get this as the translation:
<html lang="en"> <head> <style>@import url(https://fonts.googleapis.com/css?lang=en&family=Product+Sans|Roboto:400,700)
I don't know what could have changed. Does anyone know what the problem is?
A quick Google search suggests that the Google Translate web page hasn't been designed to work like that for a while; the fact that it lasted this long for you is probably sheer luck.
The way you are using the Google Translate tools is not allowed under their terms (essentially screen scraping their free web tool). You should apply for an account with them and expect to pay, albeit a small amount if you are only translating a little bit of text. You may be able to get around it by modifying your URL and web page scraping code (if you haven't already been blocked), but you can't ask for help here to circumvent legal agreements.
If you decide to go the legal route, once you're up and running with an account you can access the API directly using your API key/token. See the quickstart guide for details.
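Once you have a key, a request to the Cloud Translation REST API can be built as a plain URL and the JSON response decoded as UTF-8. The sketch below assumes the v2 ("Basic") endpoint https://translation.googleapis.com/language/translate/v2 with key, q, source, target and format query parameters, and a response shaped like {"data":{"translations":[{"translatedText":"..."}]}}; check the quickstart guide for the current details. TranslateSketch and its methods are hypothetical names, and no network call is made here.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class TranslateSketch
{
    // Assumed endpoint for the Cloud Translation v2 ("Basic") REST API.
    private const string Endpoint = "https://translation.googleapis.com/language/translate/v2";

    // Builds a GET URL; Uri.EscapeDataString percent-encodes each value as UTF-8.
    // format=text asks for plain text rather than HTML-escaped output (assumed v2 behavior).
    public static string BuildUrl(string apiKey, string text, string source, string target) =>
        $"{Endpoint}?key={Uri.EscapeDataString(apiKey)}" +
        $"&q={Uri.EscapeDataString(text)}" +
        $"&source={Uri.EscapeDataString(source)}" +
        $"&target={Uri.EscapeDataString(target)}" +
        "&format=text";

    // Decodes the raw response bytes as UTF-8 and returns the first translation.
    public static string ParseResponse(byte[] body)
    {
        using JsonDocument doc = JsonDocument.Parse(Encoding.UTF8.GetString(body));
        return doc.RootElement
            .GetProperty("data")
            .GetProperty("translations")[0]
            .GetProperty("translatedText")
            .GetString();
    }
}
```

Decoding the bytes yourself as UTF-8, rather than relying on a client's default encoding, also avoids the accented-character problem discussed elsewhere on this page.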

C# WebClient DownloadString and DownloadFile giving different results

I am attempting to retrieve some information from a website, parse out a specific item, and then move on with my life.
I noticed that when I check "view source" on the website, the results match what I see when I use WebClient's DownloadFile method. On the other hand, when I use the DownloadString method, the contents of that string differ from both view source and DownloadFile.
I need DownloadString to return similar contents to view source and DownloadFile. Any suggestions? My relevant code is below:
string criticalPathUrl = "http://blahblahblah&sessionId=" + sessionId;
WebClient wc = new WebClient();
wc.Encoding = System.Text.Encoding.UTF8;
//this is different
string urlContentsString = wc.DownloadString(criticalPathUrl);
//than this
wc.DownloadFile(criticalPathUrl, "rawDlTxt2.txt");
Edit: Please ignore this question as I just didn't scroll up far enough. Ugh. One of those days.
Use DownloadData instead of DownloadString, convert the bytes to a string with the appropriate encoding, and then save the file.
For details, see: https://www.pavey.me/2016/04/aspnet-c-downloadstring-vs-downloaddata.html
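A minimal sketch of the "suitable encoding" step: decode the bytes from DownloadData using the charset named in the response's Content-Type header, falling back to UTF-8. The fallback choice and the class name CharsetDecoder are assumptions for illustration; System.Net.Mime.ContentType does the header parsing.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetDecoder
{
    // Decodes raw response bytes using the charset from a Content-Type header value,
    // falling back to UTF-8 when the header is missing, malformed, or names an
    // encoding the runtime does not know.
    public static string Decode(byte[] data, string contentType) =>
        GetEncoding(contentType).GetString(data);

    private static Encoding GetEncoding(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return Encoding.UTF8; // assumed default when no header is present

        try
        {
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return Encoding.UTF8; // header could not be parsed
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8; // unknown charset name
        }
    }
}
```

With a WebClient you would pass the result of wc.DownloadData(url) together with wc.ResponseHeaders[HttpResponseHeader.ContentType], then write the decoded string to disk if you still need the file.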

Can't decode cyrillic value from Request.QueryString

On IIS7 I have an ASP.NET WebForms site, and I use Cyrillic values in the query string. I encode the parameters with HttpUtility.UrlEncode when redirecting, so I end up with a URL like:
http://mysite.com/Search.aspx?SearchText=текст
When I try to read the SearchText parameter value (including via HttpUtility.UrlDecode()), it returns the wrong value ÑекÑÑ, but it should return текст.
It works on localhost with the ASP.NET Development Server, but not on IIS7 (including a local IIS7).
In my web.config I added the line
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
but it still doesn't work.
Appreciate any help,
Thanks a lot!
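For reference, a correctly encoded version of that URL carries the UTF-8 bytes of текст as percent-escapes, and decoding them as UTF-8 gives the word back. This sketch uses Uri.EscapeDataString and Uri.UnescapeDataString from the base class library instead of HttpUtility so it runs outside ASP.NET; the class name QueryEncodingDemo is hypothetical.

```csharp
using System;

public static class QueryEncodingDemo
{
    // Percent-encodes a value as UTF-8 for use in a query string.
    public static string Encode(string value) => Uri.EscapeDataString(value);

    // Reverses Encode, interpreting the %XX escapes as UTF-8.
    public static string Decode(string escaped) => Uri.UnescapeDataString(escaped);

    // Builds the search URL from the question with the value safely encoded.
    public static string SearchUrl(string text) =>
        "http://mysite.com/Search.aspx?SearchText=" + Encode(text);
}
```

Encode("текст") yields %D1%82%D0%B5%D0%BA%D1%81%D1%82: two bytes per Cyrillic letter. If something along the way turns those bytes back into characters with a single-byte code page instead of UTF-8, each letter splits into two characters, which is the kind of garbage shown in the question.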
The problem was actually in UrlRewriting.net, which I use in my web application.
I solved the same problem by converting the value to a Base64 string with Convert.ToBase64String:
Before redirecting to a target page I encoded the value:
Dim Data() As Byte 'For the data to be encoded
'Convert the string into a byte array
Dim encoding As New System.Text.UTF8Encoding
Data = encoding.GetBytes(ParamToPass)
'Converting to ToBase64String
Dim EncodedStringToPass as string = Convert.ToBase64String(Data)
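'UrlEncode the Base64 text: a bare "+" in a query string is read back as a space
Page.Response.Redirect("TargetPage.aspx?Param=" & HttpUtility.UrlEncode(EncodedStringToPass), False)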
At the target page:
Dim Data() As Byte 'For the data to be decoded
Data = Convert.FromBase64String(Page.Request.Params("Param"))
Dim encoding As New System.Text.UTF8Encoding
Dim ParamToPass As String = encoding.GetString(Data)
P.S. The only disadvantage of this method is that the real value of the parameter cannot be seen in the browser's address bar, but in my case that was not a problem.
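A C# sketch of the same Base64 round trip follows. Standard Base64 output can contain +, / and =, which need URL-encoding in a query string (a bare + is read back as a space). This sketch sidesteps that by using the URL-safe alphabet (- and _ instead of + and /, padding dropped), a variation on the approach described above rather than exactly what it does; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class UrlSafeBase64
{
    // Encodes a string as UTF-8, then as Base64 with '-' and '_' in place of '+' and '/',
    // and with the '=' padding removed, so the result needs no further URL escaping.
    public static string Encode(string value)
    {
        string b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
        return b64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    // Restores the standard alphabet and padding, then decodes the UTF-8 bytes.
    // A token whose length leaves remainder 1 mod 4 is invalid, and
    // Convert.FromBase64String throws FormatException for it.
    public static string Decode(string token)
    {
        string b64 = token.Replace('-', '+').Replace('_', '/');
        switch (b64.Length % 4)
        {
            case 2: b64 += "=="; break;
            case 3: b64 += "="; break;
        }
        return Encoding.UTF8.GetString(Convert.FromBase64String(b64));
    }
}
```

On the sending page you would append UrlSafeBase64.Encode(value) to the redirect URL; on the target page, pass the query-string value to UrlSafeBase64.Decode.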
If you use Response.Redirect: yes, internally it makes this call
url = UrlEncodeRedirect(url);
which breaks Cyrillic, Greek, and probably other characters. If I remember correctly (this comes from my experience some months ago), the characters break after the ? symbol. In any case, I have had the same issue.
Possible solutions:
Write a custom redirect; it may not be as good as the original, but it lets you bypass this issue.
Find an alternative to your redirect logic.
Write a custom text encoding that uses only valid URL characters, which the redirect leaves unchanged, and decode them back on the target page. The downside is that the search word becomes hidden text rather than readable in the URL.
This is the bare-bones version of the redirect:
public static void RedirectSimple(string url, bool endResponse)
{
HttpResponse MyResponse = HttpContext.Current.Response;
MyResponse.Clear();
MyResponse.TrySkipIisCustomErrors = true;
MyResponse.StatusCode = 302;
MyResponse.Status = "302 Temporarily Moved";
MyResponse.RedirectLocation = url;
MyResponse.Write("<html><head><title>Object moved</title></head><body>\r\n");
MyResponse.Write("<h2>Object moved to here.</h2>\r\n");
MyResponse.Write("</body></html>\r\n");
if (endResponse){
MyResponse.End();
}
}
You can use it in place of Response.Redirect and check whether it works correctly in your case.

Google Translate Api and Special Characters

I've recently started using the Google Translate API in a C# project. I am trying to translate some text from English to French, but I am having issues with some special characters.
For example, the word Company comes through as SociÃ©tÃ© instead of Société, as it should. Is there some way in code to convert these to the correct special characters, i.e. Ã© to é?
Thanks
If you need any more info, let me know.
I ran into this same exact issue. If you're using the WebClient class to download the json response from google, try setting the Encoding property to UTF8.
using(var webClient = new WebClient { Encoding = Encoding.UTF8 })
{
string json = webClient.DownloadString(someUri);
...
}
I have reproduced your problem, and it looks like you are using the UTF7 encoding. UTF8 is the way you need to go.
I use Google's API by creating a WebRequest to get an HTTP response from the server, then I read the response stream with a StreamReader. StreamReader defaults to UTF8, but to reproduce your problem, I passed Encoding.UTF7 into the StreamReader's constructor.
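The StreamReader approach can be reproduced offline by wrapping bytes in a MemoryStream instead of a real HTTP response stream, an assumption made so the sketch runs without network access. Passing Encoding.UTF8 keeps é intact, while a single-byte encoding (ISO-8859-1 here, standing in for whatever wrong encoding is in use) produces exactly the Ã© pattern from the question. The class name StreamDecodingDemo is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class StreamDecodingDemo
{
    // Reads a whole stream as text with an explicit encoding, as one would
    // read the stream returned by WebResponse.GetResponseStream().
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using var reader = new StreamReader(stream, encoding);
        return reader.ReadToEnd();
    }

    // Simulates a UTF-8 encoded JSON response body.
    public static MemoryStream FakeResponse(string json) =>
        new MemoryStream(Encoding.UTF8.GetBytes(json));
}
```

Because é is the two bytes 0xC3 0xA9 in UTF-8, any decoder that treats each byte as one character turns it into Ã followed by ©.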