I'm experiencing a strange issue with WebClient.DownloadString that I can't seem to solve. Here is my code:
Dim client As New WebClient()
Dim html = client.DownloadString("http://www.btctrade.com/")
The content doesn't seem to be loaded entirely via AJAX, so it can't be that. Could it be because the web page is in Chinese? I'm guessing the HTML is just served as HTML, so it can't really be that either. The URL works fine when I visit it in a browser, and there don't seem to be any redirects to HTTPS either.
Anyone know why this is happening?
You must set the cookies and the user agent in the WebClient headers. This works:
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1");
client.Headers.Add(HttpRequestHeader.Cookie, "USER_PW=9b1283bfe37ac47b243a1e0c9c1c9e52; PHPSESSID=f692406a0c84dba2605a7065d55a3b53");
And if you want the request to do all of this work for you, you have to use HttpWebRequest instead, then save all of the response's headers and use them in a new request.
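As a rough sketch of that approach (it reuses the URL from the question and the example user agent above; the variable names, and the assumption that carrying the cookies forward is enough, are mine):

using System.IO;
using System.Net;

// First request: any Set-Cookie headers in the response end up in this container.
CookieContainer cookies = new CookieContainer();

HttpWebRequest first = (HttpWebRequest)WebRequest.Create("http://www.btctrade.com/");
first.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1";
first.CookieContainer = cookies;
using (HttpWebResponse warmup = (HttpWebResponse)first.GetResponse()) { }

// Second request: the same container sends those cookies back to the server.
HttpWebRequest second = (HttpWebRequest)WebRequest.Create("http://www.btctrade.com/");
second.UserAgent = first.UserAgent;
second.CookieContainer = cookies;

using (HttpWebResponse response = (HttpWebResponse)second.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}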
WebClient is not buggy, so probably the server is returning data you did not expect. Use Fiddler to watch what happens when you go to the site in a web browser.
When I executed your code the web site returned no data. When I visited the site in a web browser it returned data. Probably, the site is detecting that you are a bot and denying you access. Fake being a browser by mimicking what you see in Fiddler.
Related
Is it possible to make an identical POST with HttpWebRequest in C#, exactly as a browser would, without the page being able to detect that it is not actually a browser?
If so, where could I read up more on that?
Download and become familiar with a tool like Fiddler. It allows you to inspect web requests made from applications, like a normal browser, and see exactly what is being sent. You can then emulate the data being sent with a request created in C#, providing values for headers, cookies, etc.
I think this is doable.
Browser detection is done based on a header in the request. All you need to do is set that header. With HttpWebRequest we don't set the Headers collection but rather the .UserAgent property.
Eg:
.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
There is quite a lot to user agents. Check this link for the complete list of User-Agents
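For example, a minimal sketch along those lines (example.com is a placeholder URL, and the variable names are mine):

using System.IO;
using System.Net;

// Present a browser-like User-Agent so the site doesn't flag the request as a bot.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/");
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}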
Useful Links:
How to create a simple proxy in C#?
Is WebRequest The Right C# Tool For Interacting With Websites?
http://codehelp.smartdev.eu/2009/05/08/improve-webclient-by-adding-useragent-and-cookies-to-your-requests/
I have a web application (which I have no control over) that I need to send an HTTP POST to programmatically. Currently I'm using HttpWebRequest like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://someserver.com/blah/blah.aspx");
However, the application was returning an "Unknown Server Error" (not the IIS error, but a custom application error page) when posting the data. Using Fiddler to compare my POST with IE's POST, I can see the only difference is in the POST line of the request:
In Internet Explorer, Fiddler (Raw view) shows the traffic as
POST /blah/blah.aspx HTTP/1.1
In my C# program, Fiddler (Raw view) records the traffic as
POST https://someserver.com/blah/blah.aspx HTTP/1.1
This is the only difference between the two requests.
From what I've researched so far, it seems there is no way to make HttpWebRequest.Create post the relative URL. Note: I see many posts on "how to use relative URLs", but those suggestions do not work, as the actual POST is still done using an absolute URL (when you sniff the HTTP traffic).
What is the simplest way to accomplish this POST with a relative URL?
(Traffic is NOT going through a proxy)
Update: For the time being I'm using IE automation to do the scheduled perf test instead of the method above. I might look at another scripting language, as I did want to test without any browser.
No, you can't do a POST without the server in the URL.
One possible reason your program fails is that it does not use the correct proxy and as a result can't resolve the server name.
Note: Fiddler shows path and host separately in the view you are talking about.
Configure your program to use Fiddler as a proxy (127.0.0.1:8888) and compare the requests you are making with the browser's. Don't forget to switch Fiddler to "show all processes".
Here is an article on configuring Fiddler for different types of environments, including C# code: Fiddler: Configuring clients
// Route the request through Fiddler so it shows up in the capture.
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
objRequest.Proxy = new WebProxy("127.0.0.1", 8888);
Here's the code I'm trying to run:
var wc = new WebClient();
var stream = wc.OpenRead(
"http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");
But I keep getting a 403 Forbidden error and I don't understand why. It worked fine for other pages, and I can open the page fine in my browser. How can I fix this?
I wouldn't normally use OpenRead(); try DownloadData() or DownloadString() instead.
Also, it might be that Wikipedia is deliberately blocking your request because you have not provided a user agent string:
WebClient client = new WebClient();
client.Headers.Add("user-agent",
"Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
I use WebClient quite often, and learned quite quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. Also, if you make up your own user agent string (e.g. "my super cool web scraper"), you will be blocked.
[Edit]
I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6, which is not a good idea. Why? Some websites may perform filtering based on IE6 and send anyone with that browser a message, or to a different page, saying "Please update your browser", which means you will not get the content you wanted.
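Putting the pieces together, something like this sketch should get past the 403, assuming the missing user agent really is the cause (it just combines the header above with DownloadString):

using System.Net;

WebClient client = new WebClient();
// Wikipedia may be rejecting requests without a recognisable User-Agent, so supply one.
client.Headers.Add("user-agent",
    "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
string html = client.DownloadString(
    "http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");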
I have a Silverlight (v3) application that uses WebRequest to make an HTTP POST request to a webpage on the same website as the Silverlight app. This HTTP request gets back a 302 (a redirect) to another page on the same website, which HttpWebRequest is automatically supposed to follow (according to the documentation).
There's nothing particularly special about the code that makes the request (it uses the browser's HTTP stack; it is not configured to use the alternate built-in Silverlight HTTP stack):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(String.Format("{0}?name={1}&size={2}", _UploadUrl, Uri.EscapeUriString(Name), TotalBytes));
request.Method = "POST";
All this works fine in Firefox and Chrome; Silverlight makes the POST HTTP request, receives a 302 response and automatically does a GET HTTP request of the specified redirect URL and returns that to me (I know this because I used Fiddler to watch the HTTP requests going on).
However, in Internet Explorer (v8), Silverlight does the POST HTTP request and then throws a WebException with a 404 error code!
Using Fiddler, I can see that Silverlight/Internet Explorer was successfully returned the 302 status code for the request. I assume the 404 status code (and associated WebException) that I get in Silverlight is because, as far as I know, HTTP requests made via the browser stack can only return 200 or 404 due to its limitations. The real question is: why does Internet Explorer not follow the redirect like the other browsers do?
Thanks in advance for any help!
EDIT: I would prefer not to use the Silverlight client HTTP stack because to my knowledge requests issued by it do not include cookies that are a part of the browser's session, critically including the ASP.NET authentication cookie that I need to be attached to the HTTP requests being made by the Silverlight control.
EDIT 2: I have discovered that Internet Explorer only exhibits this behaviour when you do a POST request. A GET request redirects successfully. This seems like pretty bad behaviour considering how many websites now do things in the Post-Redirect-Get style.
IE is closer to the specification, in that in responding to a 302 for a POST the user agent should send a POST (though it should not do so without user confirmation).
On the other hand, FF and Chrome are deliberately wrong, in copying ways in which user agents were frequently wrong some considerable time ago (the problem started in the early days of HTTP).
For this reason, 307 was introduced in HTTP/1.1 to be clearer that the same HTTP method should be used (i.e. in this case, it should be a POST) while 303 has always meant that one should use GET.
Therefore, instead of doing a Response.Redirect, which results in a 302 that different user agents will handle in different ways, send a 303. The following code does so (and includes a valid entity body, just to stay within the letter of the spec). There is an overload so you can call it with either a Uri or a string:
private void SeeOther(Uri uri)
{
    // Resolve a relative URI against the URL of the current request.
    if (!uri.IsAbsoluteUri)
        uri = new Uri(Request.Url, uri);

    // 303 See Other: the user agent should issue a GET for the Location header.
    Response.StatusCode = 303;
    Response.AddHeader("Location", uri.AbsoluteUri);

    // Include a valid entity body, just to stay within the letter of the spec.
    Response.ContentType = "text/uri-list";
    Response.Write(uri.AbsoluteUri);

    Context.ApplicationInstance.CompleteRequest();
}

private void SeeOther(string relUri)
{
    SeeOther(new Uri(Request.Url, relUri));
}
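Then, wherever you would otherwise have called Response.Redirect after handling the POST, call SeeOther instead (the page name below is purely hypothetical):

// Instead of: Response.Redirect("results.aspx");
SeeOther("results.aspx");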
I believe this was a feature change in Internet Explorer 7, whereby they changed the expected 200 response to a 302 telling IE to be redirected. There is no smooth solution to this problem that I know of. A similar question was posed a while back here.
Change in behavior with Internet Explorer 7 and later in regard to CONNECT requests
I am trying to download a file over HTTPS and I just keep running into a brick wall with correctly setting Cookies and Headers.
Does anyone have/know of any code that I can review for doing this correctly, i.e. downloading a file over HTTPS and setting cookies/headers?
Thanks!
I did this the other day. In summary, you need to create an HttpWebRequest and an HttpWebResponse to submit/receive data. Since you need to maintain cookies across multiple requests, you need to create a cookie container to hold your cookies. You can also set header properties on the request/response if needed.
Basic Concept:
using System.Net;

// Create cookie container (place to store cookies across multiple requests)
CookieContainer cookies = new CookieContainer();

// Request page
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.amazon.com");
req.CookieContainer = cookies;

// Response output (could be a page, PDF, CSV, etc...)
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

// Add response cookies to the cookie container
// (I only had to do this for the first "login" request)
cookies.Add(resp.Cookies);
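As a rough, hypothetical sketch of a follow-up request once the cookies are in the container (the report URL and output file name are placeholders, not the real site I was hitting):

using System.IO;

// Subsequent request to the same site: the shared CookieContainer sends the
// session cookies back, so the download happens as the logged-in user.
HttpWebRequest dlReq = (HttpWebRequest)WebRequest.Create("https://www.amazon.com/reports/export.csv");
dlReq.CookieContainer = cookies;

using (HttpWebResponse dlResp = (HttpWebResponse)dlReq.GetResponse())
using (Stream input = dlResp.GetResponseStream())
using (FileStream output = File.Create("export.csv"))
{
    // Copy the response body to disk in chunks.
    byte[] buffer = new byte[8192];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        output.Write(buffer, 0, read);
}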
The key to figuring this out is capturing the traffic for a real request. I did this using Fiddler and, over the course of a few captures (almost 10), figured out what I needed to do to reproduce the login to a site where I had to run some reports based on different selection criteria (date range, parts, etc.) and download the results into CSV files. It's working perfectly, but Fiddler was the key to figuring it out.
http://www.fiddler2.com/fiddler2/
Good Luck.
Zach
This fellow wrote an application to download files using HTTP:
http://www.codeproject.com/KB/IP/DownloadDemo.aspx
Not quite sure what you mean by setting cookies and headers. Is that required by the site you are downloading from? If it is, what cookies and headers need to be set?
I've had good luck with the WebClient class. It's a wrapper for HttpWebRequest that can save a few lines of code: http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx
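For example, a minimal sketch with WebClient (the URL, cookie value, and file path are placeholders; only add the cookie header if the site actually needs it):

using System.Net;

WebClient client = new WebClient();
// Headers (including cookies) can be set explicitly if the site requires them.
client.Headers.Add(HttpRequestHeader.UserAgent,
    "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1");
client.Headers.Add(HttpRequestHeader.Cookie, "SESSIONID=placeholder-value");
client.DownloadFile("https://example.com/report.pdf", @"C:\temp\report.pdf");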