Getting wrong translation from Google translate using C# - c#

I was using this method to translate text from my program with Google Translate, and it worked perfectly until this week:
public string TranslateText(string input, string languagePair)
{
    string r = WebUtility.HtmlDecode(input);
    r = WebUtility.UrlEncode(r);
    string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", r, languagePair);
    WebClient webClient = new WebClient();
    webClient.Encoding = Encoding.GetEncoding("Windows-1252");
    byte[] resultbyte = webClient.DownloadData(url);
    string result = Encoding.Default.GetString(resultbyte);
    result = result.Substring(result.IndexOf("TRANSLATED_TEXT=") + 16);
    result = result.Replace("\\x26", "&");
    result = result.Replace("\\x3d", "=");
    result = WebUtility.HtmlDecode(result);
    result = result.Remove(result.IndexOf(";"));
    result = result.Replace("'", string.Empty);
    return result;
}
But now, running the program exactly as before, every translation comes back as this:
<html lang="en"> <head> <style>@import url(https://fonts.googleapis.com/css?lang=en&family=Product+Sans|Roboto:400,700)
I don't know what happened. Does anyone know what the problem is?

A quick search suggests that Google Translate hasn't been designed to work like that for a while; the fact that it lasted this long for you is probably sheer luck.
The way you are using the Google Translate tools is not allowed under their terms (essentially screen scraping their free web tool). You should apply for an account with them and expect to pay, albeit a small amount if you are only translating a little bit of text. You may be able to get around it by modifying your URL and web page scraping code (if you haven't already been blocked), but you can't ask for help here to circumvent legal agreements.
If you decide to go the legal route, once you're up and running with an account you can access the API directly using your API key/token. See the quickstart guide for details.
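For what it's worth, here is a minimal sketch of what a call through the official API might look like. It assumes the Cloud Translation "Basic" (v2) REST endpoint, an API key passed as the key query parameter, and a response shaped like data.translations[0].translatedText; check all of that against the quickstart before relying on it. The class and method names are made up for illustration, and System.Text.Json assumes .NET Core 3.0 or later (or the NuGet package on .NET Framework).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Google client library.
public static class TranslateSketch
{
    // Assumption: endpoint of the Cloud Translation "Basic" (v2) REST API.
    private const string Endpoint = "https://translation.googleapis.com/language/translate/v2";

    // Pulls data.translations[0].translatedText out of a v2-style JSON response.
    public static string ExtractTranslation(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement
                  .GetProperty("data")
                  .GetProperty("translations")[0]
                  .GetProperty("translatedText")
                  .GetString();
    }

    // Sends one string for translation; source and target are language codes such as "en" and "es".
    public static async Task<string> TranslateAsync(
        HttpClient http, string apiKey, string text, string source, string target)
    {
        string body = JsonSerializer.Serialize(new Dictionary<string, string>
        {
            ["q"] = text,
            ["source"] = source,
            ["target"] = target,
            ["format"] = "text", // ask for plain text instead of HTML-escaped output
        });
        using var content = new StringContent(body, Encoding.UTF8, "application/json");
        using HttpResponseMessage response =
            await http.PostAsync(Endpoint + "?key=" + Uri.EscapeDataString(apiKey), content);
        response.EnsureSuccessStatusCode(); // throws on quota, auth, or other HTTP errors
        return ExtractTranslation(await response.Content.ReadAsStringAsync());
    }
}
```

Parsing is kept in its own method so it can be checked against a canned response without a key or network access.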

Related

C# Best Buy Web Scraping - Can't get add to cart element

I'm writing a simple web scraping application to retrieve information on certain PC components.
I'm using Best Buy as my test website and I'm using the HTMLAgilityPack as my scraper.
I'm able to retrieve the title and the price; however, I can't seem to get the availability.
So, I'm trying to read the Add to Cart button element's text. If it's available, it'll read "Add to Cart", otherwise, it'll read "Unavailable".
But, when I get the XPath and try to save it to a variable, it returns null. Can someone please help me out?
Here's my code.
var url = "https://www.bestbuy.com/site/pny-nvidia-geforce-gt-710-verto-2gb-ddr3-pci-express-2-0-graphics-card-black/5092306.p?skuId=5092306";
HtmlWeb web = new HtmlWeb();
HtmlDocument pageDocument = web.Load(url);
string titleXPath = "/html/body/div[3]/main/div[2]/div[3]/div[1]/div[1]/div/div/div[1]/h1";
string priceXPath = "/html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[1]/div/div/div/div/div[2]/div/div/div/span[1]";
string availabilityXPath = "/html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[7]/div[1]/div/div/div[1]/button";
var title = pageDocument.DocumentNode.SelectSingleNode(titleXPath);
var price = pageDocument.DocumentNode.SelectSingleNode(priceXPath);
bool availability = pageDocument.DocumentNode.SelectSingleNode(availabilityXPath) != null;
Console.WriteLine(title.InnerText);
Console.WriteLine(price.InnerText);
Console.WriteLine(availability);
It correctly outputs the title and price, but availability is always null.
Try string availabilityXPath = "//button[. = 'Add to Cart']"
In web scraping, a long generated XPath will keep working on the same static page, but across multiple pages of the same store the location of certain elements can drift and break it. Yours is breaking at /html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[7]/div[1]/div and I suspect that's what's happening here.
Learning to write one from scratch will be invaluable (and much easier to debug!).
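As a rough illustration of a hand-written XPath, here is a sketch that runs against an HTML string rather than the live page, so the markup in the usage note is invented and the real page may differ. It uses normalize-space so that surrounding whitespace in the button text doesn't break the match; AvailabilitySketch and IsAvailable are hypothetical names.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class AvailabilitySketch
{
    // True when the document contains a <button> whose trimmed text is exactly "Add to Cart",
    // no matter how deeply it is nested.
    public static bool IsAvailable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode button = doc.DocumentNode.SelectSingleNode(
            "//button[normalize-space(.)='Add to Cart']");
        return button != null; // SelectSingleNode returns null when nothing matches
    }
}
```

For example, IsAvailable("<div><div><button> Add to Cart </button></div></div>") is true, while the same markup with the text "Unavailable" gives false.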

C# Json not handling accents correctly [duplicate]

The following code:
var text = (new WebClient()).DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
results in a variable text that contains, among many other things, the string
"$Îº$-Minkowski space, scalar field, and the issue of Lorentz invariance"
However, when I visit that URL in Firefox, I get
$κ$-Minkowski space, scalar field, and the issue of Lorentz invariance
which is actually correct. I also tried
var data = (new WebClient()).DownloadData("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
var text = System.Text.UTF8Encoding.Default.GetString(data);
but this gave the same problem.
I'm not sure where the fault lies here. Is the feed lying about being UTF8-encoded, and the browser is smart enough to figure that out, but not WebClient? Is the feed properly UTF8-encoded, but WebClient is failing in some other way? What can I do to mitigate this?
It's not lying. You should set the WebClient's encoding before calling DownloadString.
using (WebClient webClient = new WebClient())
{
    webClient.Encoding = Encoding.UTF8;
    string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}
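As for why your alternative isn't working, the usage is incorrect: UTF8Encoding.Default is just the inherited static Encoding.Default, which on .NET Framework is the system's ANSI code page rather than UTF-8. It should be:
System.Text.Encoding.UTF8.GetString(data)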
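To see the effect without touching the network, here is a small sketch that decodes the same UTF-8 bytes two ways. ISO-8859-1 stands in for the ANSI code page here (an assumption made so the code runs on any runtime; Windows-1252 maps these particular bytes to the same characters). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingSketch
{
    // Encodes text as UTF-8, then decodes the bytes with the wrong (single-byte) encoding.
    public static string DecodeUtf8BytesAsLatin1(string text)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        // Each byte becomes its own character, so multi-byte sequences fall apart.
        return Encoding.GetEncoding("ISO-8859-1").GetString(data);
    }

    // Decodes the same bytes as UTF-8, which recovers the original text.
    public static string DecodeUtf8BytesAsUtf8(string text)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(data);
    }
}
```

The Greek kappa (U+03BA) is the two bytes CE BA in UTF-8. Read as single-byte characters, those bytes become "Î" (U+00CE) and "º" (U+00BA), which is the kind of garbage you get when the bytes are not decoded as UTF-8.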

Pulling data from a webpage, parsing it for specific pieces, and displaying it

I've been using this site for a long time to find answers to my questions, but I wasn't able to find the answer on this one.
I am working with a small group on a class project. We're to build a small "game trading" website that allows people to register, put in a game they have they want to trade, and accept trades from others or request a trade.
We have the site functioning long ahead of schedule so we're trying to add more to the site. One thing I want to do myself is to link the games that are put in to Metacritic.
Here's what I need to do. Using ASP.NET and C# in Visual Studio 2012, I need to get the correct game page on Metacritic, pull its data, parse it for specific parts, and then display the data on our page.
Essentially when you choose a game you want to trade for we want a small div to display with the game's information and rating. I'm wanting to do it this way to learn more and get something out of this project I didn't have to start with.
I was wondering if anyone could tell me where to start. I don't know how to pull data from a page. I'm still trying to figure out if I need to try and write something to automatically search for the game's title and find the page that way or if I can find some way to go straight to the game's page. And once I've gotten the data, I don't know how to pull the specific information I need from it.
One of the things that doesn't make this easy is that I'm learning C++ along with C# and ASP.NET, so I keep getting my wires crossed. If someone could point me in the right direction it would be a big help. Thanks
This small example uses HtmlAgilityPack with XPath selectors to get to the desired elements.
protected void Page_Load(object sender, EventArgs e)
{
    string url = "http://www.metacritic.com/game/pc/halo-spartan-assault";
    var web = new HtmlAgilityPack.HtmlWeb();
    HtmlDocument doc = web.Load(url);
    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
}
An easy way to obtain the XPath for a given element is by using your web browser (I use Chrome) Developer Tools:
Open the Developer Tools (F12 or Ctrl + Shift + C on Windows or Command + Shift + C for Mac).
Select the element in the page that you want the XPath for.
Right click the element in the "Elements" tab.
Click on "Copy as XPath".
You can paste it exactly like that in C# (as shown in my code), but make sure to escape the quotes.
Make sure you add error handling, because a scraper will start failing as soon as the site changes the HTML structure of the page.
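One way to do that error handling, as a sketch: in HtmlAgilityPack, SelectNodes and SelectSingleNode return null when nothing matches, so indexing the result with [0] throws as soon as the layout changes. The helper names below are hypothetical, and the itemprop-based XPath is an assumption about the markup rather than something verified against the live site.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helpers for illustration only.
public static class SafeScrapeSketch
{
    // Returns the trimmed, entity-decoded text of the first match, or null when nothing matches.
    public static string TryGetText(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Returns the score as a number, or null when the element is missing or not numeric.
    public static int? TryGetScore(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // Assumed markup: <span itemprop="ratingValue">86</span> somewhere under #main.
        string text = TryGetText(doc, "//*[@id='main']//span[@itemprop='ratingValue']");
        return int.TryParse(text, out int score) ? score : (int?)null;
    }
}
```

The caller can then show a "rating unavailable" placeholder when the result is null instead of letting the page crash.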
Edit
Per @knocte's suggestion, here is the link to the NuGet package for HtmlAgilityPack:
https://www.nuget.org/packages/HtmlAgilityPack/
I looked and Metacritic.com doesn't have an API.
You can use an HttpWebRequest to get the contents of a website as a string.
using System.Net;
using System.IO;
using System.Text; // needed for Encoding.UTF8
using System.Windows.Forms;

string result = null;
string url = "http://www.stackoverflow.com";
WebResponse response = null;
StreamReader reader = null;
try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
    // handle error
    MessageBox.Show(ex.Message);
}
finally
{
    // release the stream and the connection even if the request failed
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}
Then you can parse the string for the data that you want by taking advantage of Metacritic's use of meta tags. Here's the information they have available in meta tags:
og:title
og:type
og:url
og:image
og:site_name
og:description
The format of each tag is: <meta name="og:title" content="In a World...">
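Instead of searching the raw string by hand, the tag can be read with HtmlAgilityPack (the library from the other answer, swapped in here for convenience). This is only a sketch: MetaTagSketch and GetMetaContent are hypothetical names, and matching on both name and property is an assumption, since Open Graph tags are often written with a property attribute.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class MetaTagSketch
{
    // Returns the content attribute of the first <meta> tag whose name (or property)
    // equals key, or null when there is no such tag.
    // Note: key is pasted into the XPath as-is, so it must not contain a single quote.
    public static string GetMetaContent(string html, string key)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode meta = doc.DocumentNode.SelectSingleNode(
            "//meta[@name='" + key + "' or @property='" + key + "']");
        return meta == null
            ? null
            : HtmlEntity.DeEntitize(meta.GetAttributeValue("content", ""));
    }
}
```

With the string downloaded above, GetMetaContent(result, "og:title") would give the title text, and GetMetaContent(result, "og:image") the image URL, provided the page really carries those tags.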
I recommend Dcsoup. There's a NuGet package for it, and it uses CSS selectors, so it is familiar if you use jQuery. I've tried others, but it is the best and easiest to use that I've found. There's not much documentation, but it's open source and a port of the Java jsoup library, which has good documentation. (Documentation for the .NET API here.) I absolutely love it.
var timeoutInMilliseconds = 5000;
var uri = new Uri("http://www.metacritic.com/game/pc/fallout-4");
var doc = Supremes.Dcsoup.Parse(uri, timeoutInMilliseconds);
// <span itemprop="ratingValue">86</span>
var ratingSpan = doc.Select("span[itemprop=ratingValue]");
int ratingValue = int.Parse(ratingSpan.Text);
// selectors match both critic and user scores
var scoreDiv = doc.Select("div.score_summary");
var scoreAnchor = scoreDiv.Select("a.metascore_anchor");
int criticRating = int.Parse(scoreAnchor[0].Text);
float userRating = float.Parse(scoreAnchor[1].Text);
I'd recommend WebsiteParser. It's based on HtmlAgilityPack (mentioned by Hanlet Escaño), but it makes web scraping easier with attributes and CSS selectors:
class PersonModel
{
[Selector("#BirdthDate")]
[Converter(typeof(DateTimeConverter))]
public DateTime BirdthDate { get; set; }
}
// ...
PersonModel person = WebContentParser.Parse<PersonModel>(html);
Nuget link

How can I migrate email functionality from ASP Classic to ASP.NET?

I previously used CDO.Message and CDO.Configuration in ASP Classic to create HTML emails, which was VERY simple to do. In .NET, it appears that you have to give the System.Net.Mail.MailMessage object an HTML string for the content and then somehow embed the required images. Is there an easy way to do this in .NET? I'm pretty new to .NET MVC and would most appreciate any help.
This is how it looks in ASP Classic:
Set objCDO = Server.CreateObject("CDO.Message")
objCDO.To = "someone@somthing.com"
objCDO.From = "me@myaddress.com"
objCDO.CreateMHTMLBody "http://www.example.com/somepage.html"
objCDO.Subject = sSubject
'the following are for advanced CDO schematics
'for authentication and external SMTP
Set cdoConfig = CreateObject("CDO.Configuration")
With cdoConfig.Fields
    .Item(cdoSendUsingMethod) = cdoSendUsingPort '2 - send using port
    .Item(cdoSMTPServer) = "mail.myaddress.com"
    .Item(cdoSMTPServerPort) = 25
    .Item(cdoSMTPConnectionTimeout) = 10
    .Item(cdoSMTPAuthenticate) = cdoBasic
    .Item(cdoSendUsername) = "myusername"
    .Item(cdoSendPassword) = "mypassword"
    .Update
End With
Set objCDO.Configuration = cdoConfig
objCDO.Send
Basically I would like to send one of my views (minus site.master) as an email, images embedded.
I don't know of a simple way right off, but you could use WebClient to get your page, then pass the response as the body.
Example:
var webClient = new WebClient();
byte[] returnFromPost = webClient.UploadValues(Url, Inputs);
var utf = new UTF8Encoding();
string returnValue = utf.GetString(returnFromPost);
return returnValue;
Note: Inputs is just a NameValueCollection of POST variables (that is the type UploadValues expects).
One problem I think you'll run into right off is that I don't think you'd get the images. You could parse the HTML you get and then make the images absolute back to your server.
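A rough sketch of that last step, assuming HtmlAgilityPack is available for the parsing: it rewrites each img src to an absolute URL and wraps the result in a MailMessage with IsBodyHtml set. The class and method names are hypothetical, and the images end up linked rather than embedded; System.Net.Mail also has AlternateView and LinkedResource if they truly need to travel inside the message.

```csharp
using System;
using System.Net.Mail;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class HtmlMailSketch
{
    // Rewrites every <img src="..."> to an absolute URL resolved against baseUri.
    public static string MakeImageUrlsAbsolute(string html, Uri baseUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode img in images)
            {
                string src = img.GetAttributeValue("src", "");
                // new Uri(base, relative) leaves already-absolute URLs unchanged
                img.SetAttributeValue("src", new Uri(baseUri, src).AbsoluteUri);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    // Builds (but does not send) an HTML message; the caller sends it with SmtpClient
    // and disposes it afterwards.
    public static MailMessage BuildMessage(
        string from, string to, string subject, string html, Uri baseUri)
    {
        return new MailMessage(from, to)
        {
            Subject = subject,
            Body = MakeImageUrlsAbsolute(html, baseUri),
            IsBodyHtml = true
        };
    }
}
```

Here html would be the string returned by the WebClient call above and baseUri the address of the page it came from, so that relative paths resolve the same way they do in a browser.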
Thank you both for your help - here is a very clean and comprehensive tutorial posted by a .NET MVP
http://msdn.microsoft.com/en-us/vbasic/bb630227.aspx
