HttpClient - getting the same response from different countries - c#

I have a simple Windows Store application.
This application downloads and parses HTML from a website.
I am using the HttpClient class.
Now I have a big problem, because the page looks different in certain countries, so my parsing fails.
Example: when someone in the USA uses my app, the app downloads different HTML content, because the webpage looks different in specific countries.
How do I set a default location on the HttpClient?
I want to get the same HTML on every run.
EDIT
I am calling this page: LINK

You need to set the default language header when you make the request, and/or consider making it a user-definable setting.
http://www.w3.org/TR/WCAG20-TECHS/SVR5
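For example, with HttpClient you can set the header once as a default for every request the client sends (a minimal sketch; "en-US" is just an example value you might expose as a setting):

var client = new HttpClient();
// Ask for English content on every request this client makes.
client.DefaultRequestHeaders.AcceptLanguage.Add(
    new System.Net.Http.Headers.StringWithQualityHeaderValue("en-US"));
string html = await client.GetStringAsync("http://www.livescore.com");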

ignoring the initial question for a moment
PLEASE don't write an app that depends on any kind of HTML parsing for any functionality. All the site you are calling has to do is change an ID or two in the "wrong" place and your app will fail for every user until you put out an update.
back to the answer
OK, assuming that screen-scraping is the way you want to go with your app, and assuming, of course, that the site you are scraping from allows such behaviour in their terms of use (check - it wouldn't be fun for you to get sued if you didn't read them) then I'd suggest a slightly different approach.
Since you are not guaranteed to get the same page layout for any locale your users access your app from, why not set up a web service that does the parsing work for you, and interrogate that service from your app instead of going direct to the site?
Your app <--> Your web service <--> the site providing data
That way, you always know that the data you are getting back is consistently formatted as if for a specific locale (your web server's), and then you only have to maintain one piece of code to parse it. That will be much simpler whenever there is a change to the underlying data structure (and believe me, there will be changes).
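For illustration, here's a rough sketch of such a service as an ASP.NET Web API controller. The controller name, endpoint, and parser are all hypothetical stubs, not a prescribed design:

using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;

public class ScoresController : ApiController
{
    // Your app calls this endpoint instead of the site itself.
    public async Task<IHttpActionResult> Get()
    {
        using (var http = new HttpClient())
        {
            // The server always requests one locale, so the layout stays stable.
            http.DefaultRequestHeaders.Add("Accept-Language", "en");
            string html = await http.GetStringAsync("http://www.livescore.com");
            return Ok(Parse(html)); // the one parser you maintain
        }
    }

    private object Parse(string html)
    {
        // Parsing stub - extract whatever your app needs and return a DTO.
        return new { length = html.Length };
    }
}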

The answer to this depends on how the website implements default language selection. Both of the other answers are potentially correct depending on how the specific site works.
If you can share the site URL, we can tell you a suitable strategy to use.

Setting the design-flaw consideration aside for a moment (you may or may not have a good reason to do screen scraping), here's how to set the Accept-Language header:
// Build a GET request and explicitly ask for English content.
var httpClient = new HttpClient();
var httpRequestMessage = new HttpRequestMessage(HttpMethod.Get, new Uri("http://www.livescore.com"));
httpRequestMessage.Headers.Add("Accept-Language", "en");

// Send it and read the HTML, which should now be consistently localized.
var response = await httpClient.SendAsync(httpRequestMessage);
string content = await response.Content.ReadAsStringAsync();

Try to always call the URL in question with the culture path parameter, if it has one. For example, if you were targeting microsoft.com, you would have something like this:
http://www.microsoft.com/en-us/default.aspx for English
http://www.microsoft.com/de-DE/default.aspx for German
and so on. If this is applicable to your site, it would be an idea.
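If the site does follow such a pattern, a tiny sketch of the idea (the URL pattern is site-specific; the point is to pin one culture rather than use the machine's):

// Pin one culture in the URL so every run fetches the same localized page.
string culture = "en-us"; // deliberately fixed, not taken from CultureInfo
var uri = new Uri(string.Format("http://www.microsoft.com/{0}/default.aspx", culture));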

Related

Handling Authentication for a File Display using a Web Service

This is my first time developing this kind of system, so many of these concepts are very new to me. Any and all help would be appreciated. I'll try to sum up what I'm doing as efficiently as possible.
Background: I have a web application running AngularJS with Bootstrap. The app communicates with the server and DB through a web service written in C#. On the site, users can upload files and reference them later using direct links. There's no restriction on file type (yet), so just about anything is allowed.
My Goal: Having direct links creates a big security problem for me, since the documents/images are supposed to be private data. What I would prefer to do is validate a user's credentials when the link is clicked, then load the file in the browser using a more generic URL path.
--Example--
"mysite.com/attachments/1" ---> (Image)
--instead of--
"mysite.com/data/files/importantImg.jpg"
Where I'm At: Not very far. My first thought was to add a page that sends the server a request and receives a file byte stream along with the MIME type, which I can reassemble and present to the user. However, I have no idea whether this is possible using a web service that sends JSON requests, nor do I have a clue how the reassembly process would work client-side.
Like I said, I'll take any and all advice. I'd love to learn more about this subject for future projects as well, but for now I just need to be pointed in the right direction.
Your first thought is correct. To do it, you need to use the Response object, and more specifically its AddHeader and Write methods. Of course this will be a separate page that only handles file downloads, so it will coexist perfectly well with your JSON web service.
I don't think you want to do this with a web service. Just use a regular IHttpHandler to perform the validation and return the data. You would have the URL "attachments/1" rewritten to "attachments/download.ashx?id=1". Once you've verified access, write the data to the response stream. You can use the Content-Disposition header to set the file name.
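To make that concrete, here is a minimal sketch of such a handler. The permission check and id-to-path lookup are hypothetical stubs you would replace with your own logic:

using System.IO;
using System.Security.Principal;
using System.Web;

public class DownloadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        int id = int.Parse(context.Request.QueryString["id"]);
        if (!UserMayAccess(context.User, id))
        {
            context.Response.StatusCode = 403; // user failed the access check
            return;
        }
        string path = MapIdToPath(id);
        context.Response.ContentType = "application/octet-stream";
        context.Response.AddHeader("Content-Disposition",
            "attachment; filename=" + Path.GetFileName(path));
        context.Response.WriteFile(path);
    }

    public bool IsReusable { get { return false; } }

    private bool UserMayAccess(IPrincipal user, int id)
    {
        return false; // stub: replace with your real permission check
    }

    private string MapIdToPath(int id)
    {
        return null; // stub: look the id up in your DB, return the disk path
    }
}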

accessing websites using C#

I have a problem here. Assume there's a basic calculator implemented in JavaScript and hosted on a website (I googled for an example and found this one: http://www.unitsconverter.net/calculator/ ). What I want to do is make a program that opens this website, enters some values, and gets the returned value. So, for our website calculator, the program would:
- open the website
- enters an operand
- enters an operation
- enters an operand
- retrieve the result
Note: all of this should happen without showing anything to the user (the browser, for example).
I did some searching and found HttpWebRequest and HttpWebResponse. But I think those can only be used to post data to the server, which means the file I'm sending data to must be PHP, ASPX, or JSP. JavaScript, however, is client-side, so I think they are kind of useless to me in this case.
Any help?
Update:
I have managed to develop the web bot using the WebBrowser control (found in System.Windows.Forms).
Here's a sample of the code:
// Load the page. You can add webBrowser1.ScriptErrorsSuppressed = true;
// to silence script errors raised by the page.
webBrowser1.Navigate("LinkOfTheSiteYouWant");

// Once the document has loaded, fill in an element on the page:
webBrowser1.Document.GetElementById("ElementId").SetAttribute("HTMLattribute", "valueToBeSet");

Those are the main methods I used to do what I wanted.
I have found this video useful: http://www.youtube.com/watch?v=5P2KvFN_aLY
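Reading a result back works the same way. A sketch (the element ids here are made up - inspect the page's HTML for the real ones, and only touch Document after the DocumentCompleted event has fired):

// Fill both operands, click the equals button, then read the result field.
webBrowser1.Document.GetElementById("operand1").SetAttribute("value", "2");
webBrowser1.Document.GetElementById("operand2").SetAttribute("value", "3");
webBrowser1.Document.GetElementById("equals").InvokeMember("click");
string result = webBrowser1.Document.GetElementById("result").GetAttribute("value");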
I guess you could use something like WatiN to pipe the user's input/output from your app to the website and return the results, but as another commenter pointed out, the value of this sort of thing when you could just write your own calculator fairly escapes me.
You'll need a JavaScript interpreter (engine) to parse all the JavaScript code on the page.
https://www.google.com/search?q=c%23+javascript+engine
What you're looking for is something more akin to a web service. The page you provided doesn't seem to accept any data in an HTTP POST, and doesn't have any meaningful information in the source that you could scrape. If, for example, you wanted to programmatically run searches for eBay auctions, you could figure out how to correctly post data to it, e.g.:
http://www.ebay.com/sch/i.html?_nkw=http+for+dummies&_sacat=267&_odkw=http+for+dummies&_osacat=0
and then look through the HTTP response for the information you're after. You'd probably need to create a regular expression to match the markup you're looking for; for example, if you wanted to know how many results there were, you'd search the HTTP response for this bit of markup:
<div class="alt w"><div class="cnt">Your search returned <b>0 items.</b></div></div>
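Roughly, in code (a sketch only - the regex is tied to that exact markup and will break whenever eBay changes it):

using System;
using System.Net;
using System.Text.RegularExpressions;

var client = new WebClient();
string html = client.DownloadString(
    "http://www.ebay.com/sch/i.html?_nkw=http+for+dummies&_sacat=267");
// Brittle by design: matches the exact markup shown above.
Match m = Regex.Match(html, @"Your search returned <b>([\d,]+) items?\.</b>");
if (m.Success)
    Console.WriteLine("Items found: " + m.Groups[1].Value);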
As far as the client-side/JavaScript stuff goes, you just plain aren't going to be able to do anything like what you're going for.
It is a matter of API: does the remote website expose an API for the required functionality?
Web resources that expose an interactive API are called web services. There are tons of examples (Google Maps, for instance).
You can access the API -depending on the Terms & Conditions of the service- through a client. The nature of the client depends on the kind of web service you are accessing.
A SOAP based service is based on SOAP protocol.
A REST based service is based on REST principles.
So, if there is an accessible web service called "Calculator", then you can access the service and, for instance, invoke its sum method.
In your example, the calculator is a JavaScript implementation, so it is not a web service and cannot be accessed via HTTP requests. However, its implementation is still accessible: it is the JavaScript file where the calculator is implemented. You can always include that file in your own website and call its functions from JavaScript (always mind the terms and conditions!).
A very common example is the jQuery library stored in Google Libraries.

Both WebClient and HttpWebRequest suddenly failing to pull basic Amazon pages?

We have an in-house application that pulls some data from some of Amazon's pages occasionally (we know they have APIs for certain operations... what we're doing requires some custom info not included in the APIs). We have never had a problem pulling their pages, but suddenly Amazon is returning "(503) Server Unavailable" on pretty much every request, and this has been happening for several days, so we doubt it's a temporary thing. Even something as simple as this fails:
System.Net.WebClient client = new System.Net.WebClient();
string data = client.DownloadString(new Uri("http://www.amazon.com/Bose-Companion-multimedia-speaker-Graphite/dp/B000HZBR64/"));
The strange thing is that these pages load just fine in a web browser, but any time we try to pull them through code, it is failing.
What could cause these functions to fail? Is it possible that they changed something on their end and that we need to do some custom logic with our calls?
After some further testing, it turns out that this was happening because Amazon needs the Accept header of the HttpWebRequest to be set explicitly. When setting it to:
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
Everything worked fine. This is a recent change, so they must have altered something on their end.
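In context, that looks something like this (a sketch using HttpWebRequest, which exposes Accept directly):

using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create(
    "http://www.amazon.com/Bose-Companion-multimedia-speaker-Graphite/dp/B000HZBR64/");
// Without an explicit Accept header the server now answers 503.
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string data = reader.ReadToEnd();
}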
Check the User-Agent of your request, and make it the same as your browser's. Also check whether you have set a proxy for your app; maybe your browser and your app are using different proxies.
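On an HttpWebRequest, both of those checks look like this (the User-Agent string is only an example; copy the one your browser actually sends):

var request = (HttpWebRequest)WebRequest.Create(
    "http://www.amazon.com/Bose-Companion-multimedia-speaker-Graphite/dp/B000HZBR64/");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64)"; // example value only
request.Proxy = null; // or set this to match your browser's proxy settings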

How to call url with basic authentication inside application

I need to create an application (ASP.NET, WinForms, or a Windows Service; I'm not sure which) that makes a call to a URL, supplying a username and password for basic authentication, and gets a CSV file back. I then need to use the CSV file in the application, and I don't know how to do this. How do I call the URL in my app? There can be no user interaction; retrieving the CSV file needs to be completely automated.
Try something like this:
var webClient = new WebClient();
webClient.Credentials = new NetworkCredential("username", "password");
webClient.DownloadFile("http://someurl/file.csv", "c:\\file.csv");
Try out the System.Net.WebRequest class. Here is a page showing general usage:
http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx
You can swap out the assignment of the Credentials property with your own NetworkCredential object to pass a custom user name and password.
I actually handle the class a little differently than the example: I make sure that every class that is IDisposable is initialized in a using statement, to avoid accidentally leaking resources. This is especially important if your service will receive frequent or rapid traffic.
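Putting that advice together, a sketch of the same download with every IDisposable in a using block:

using System.IO;
using System.Net;

var request = WebRequest.Create("http://someurl/file.csv");
request.Credentials = new NetworkCredential("username", "password");
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream))
{
    string csv = reader.ReadToEnd(); // hand this off to your CSV parser
}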
Edit:
If you're interested in a CSV parsing library, there are many you can find from any search. You might like the code from this CodeProject article. This library is flexible enough to handle properly-escaped multi-line fields.
Good luck!

Getting data from a webpage

I have an idea for an App that would really help me out in work but I'm not sure if it's possible.
I want to run a C# desktop application that will ask for a value. When a value is supplied, the application will open a browser, go to a webpage, and enter the value into a form on the site. The form is then submitted and a new page loads containing a table of results. I then want to extract the table of results from the page source and write code to parse the result values.
It is not important that the user sees this happen in an actual browser. In other words, if there's a way to do it by making HTTP requests directly, that's great.
The biggest problem I have is getting the values into the form, and then retrieving the page source after the form is submitted and the next page loads.
Any help really appreciated.
Thanks
Provided that you're only using this in a legal context:
Usually, web forms are sent via a POST request to the web server, specifically to some script that handles it. You can look at the HTML code of the form's page and find the destination of the form (the form's action attribute).
You can then use an HttpWebRequest in C# to "pretend to be the form", sending a POST request with all the required parameters in the request body.
As a result you will get the source code of the destination page, just as it would be sent to the browser. You can then parse this.
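A sketch of such a request (the action URL and field names are placeholders - read the real ones from the form's HTML):

using System.IO;
using System.Net;
using System.Text;

string postData = "field1=value1&field2=value2"; // the form's fields, URL-encoded
byte[] bytes = Encoding.UTF8.GetBytes(postData);

var request = (HttpWebRequest)WebRequest.Create("http://example.com/form-action");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = bytes.Length;
using (var body = request.GetRequestStream())
    body.Write(bytes, 0, bytes.Length);

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string resultPage = reader.ReadToEnd(); // parse the results table from this
}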
This is definitely possible and you don't need to use an actual web browser for this. You can simply use a System.Net.WebClient to send your HTTP request and get an HTTP response.
I suggest using Wireshark (or Firefox + Firebug); it lets you see HTTP requests and responses. By looking at the HTTP traffic you can see exactly how you should form your HTTP request and which parameters you should be setting.
You don't need to involve the browser with this. WebClient should do all that you require. You'll need to see what's actually being posted when you submit the form with the browser, and then you should be able to make a POST request using the WebClient and retrieve the resulting page as a string.
The docs for the WebClient constructor have a nice example.
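With WebClient the same POST is shorter, since UploadValues handles the encoding for you (the URL and field names are placeholders again):

using System.Collections.Specialized;
using System.Net;
using System.Text;

var client = new WebClient();
var form = new NameValueCollection();
form["field1"] = "value1"; // placeholder field name and value
byte[] responseBytes = client.UploadValues("http://example.com/form-action", form);
string resultPage = Encoding.UTF8.GetString(responseBytes);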
See e.g. this question for some pointers on at least the data-retrieval side. You're going to know a lot more about the HTTP protocol before you're done with this...
Why would you do this through web pages if you don't even want the user to do anything?
Web pages are purely for interaction with users; if you simply want data transfer, use WCF.
@Brian: using Wireshark will result in a very angry network manager; make sure you are actually allowed to use it.
