Is it possible to make a POST with HttpWebRequest in C# that is exactly identical to the one a browser would make, so that the page cannot detect that the client is not actually a browser?
If so, where could I read up more on that?
Download and become familiar with a tool like Fiddler. It allows you to inspect web requests made from applications, like a normal browser, and see exactly what is being sent. You can then emulate the data being sent with a request created in C#, providing values for headers, cookies, etc.
I think this is doable.
Browser detection is usually based on the User-Agent header in the request, so the first thing to try is setting that header. With HttpWebRequest you do not set it through the Headers collection but through the .UserAgent property.
E.g.:
.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
There is quite a lot to user agents. Check this link for the complete list of User-Agents
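To make the advice above concrete, here is a minimal sketch of a POST built with HttpWebRequest that carries a few headers a browser would typically send. The class name BrowserLikePost, the header values, and the form body are illustrative assumptions, not values captured from a real browser; in practice you would copy whatever Fiddler shows for your own browser. On recent .NET versions WebRequest.Create is marked obsolete in favour of HttpClient, but it is the API discussed here.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: the names and header values below are assumptions for illustration.
public static class BrowserLikePost
{
    // Builds (but does not send) a POST request carrying typical browser headers.
    public static HttpWebRequest Build(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        // User-Agent, Accept and Content-Type are set through properties, not the Headers collection.
        request.UserAgent = userAgent;
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Headers.Add("Accept-Language", "en-US,en;q=0.5");
        // A cookie container lets the request send and receive cookies like a browser session.
        request.CookieContainer = new CookieContainer();
        return request;
    }

    // Writes the form body, sends the request and returns the response text.
    public static string Send(HttpWebRequest request, string formBody)
    {
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would look like BrowserLikePost.Send(BrowserLikePost.Build(url, userAgent), "field=value"). Matching headers is only part of the picture: a site can also look at cookies, request order, or script execution, so this may not be enough on its own.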
Useful Links:
How to create a simple proxy in C#?
Is WebRequest The Right C# Tool For Interacting With Websites?
http://codehelp.smartdev.eu/2009/05/08/improve-webclient-by-adding-useragent-and-cookies-to-your-requests/
I have an issue with a CloudFront distribution. My page uses a DropDownList and a GridView, and when the DropDownList selection changes I want to update the GridView accordingly. This works fine on my local machine, and also on my server when I access it by IP address.
I am using Amazon CloudFront as a CDN, and it does not work behind CloudFront.
I suppose I may need to add some behavior in the CloudFront console to resolve this, but I am not sure about it.
Any help appreciated.
A shot in the dark here (as Michael - sqlbot says - you really need to provide more info).
Is the GridView an ASP.NET web control? If so, it might be that ASP.NET isn't recognising the CloudFront user-agent string, Amazon CloudFront (as opposed to something like Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US), which is an example of a user-agent string you'd typically see when a request is made directly to your site), and therefore isn't rendering the appropriate JavaScript (I've seen the __doPostBack JavaScript omitted in these circumstances).
From https://msdn.microsoft.com/en-us/library/x3k2ssx2.aspx
ASP.NET determines browser capabilities by reading the user-agent
information that is passed from the browser to the server during a
request. It compares the user-agent string that is received from the
browser to user agent strings that are stored in browser definition
files. These browser definition files contain information about the
capabilities of various user agents. When ASP.NET finds a match
between the current user-agent string and a user-agent string in a
browser definition file, it loads the corresponding browser
capabilities into the HttpBrowserCapabilities object. The properties
of the HttpBrowserCapabilities object can then be used to determine
whether the browser type that is represented by the user agent
supports scripting, styles, frames, and so on. Based on these
capabilities, the controls on the page render Web controls using
appropriate markup.
The page contains some details on how you can override this, but none of them seem ideal (i.e. explicitly targeting a specific browser / platform).
The other option is to configure CloudFront to whitelist the User-Agent header for cache behaviours that match the pages where you're using these controls (Edit Behavior > Forward Headers > Whitelist > Add Custom: User-Agent), but be aware this will effectively disable caching for those resources, as user-agent strings often vary per user.
I'm experiencing a strange issue with WebClient.DownloadString that I can't seem to solve. My code:
Dim client As New WebClient()
Dim html = client.DownloadString("http://www.btctrade.com/")
The content doesn't seem to be fully AJAX, so it can't be that. Is it due to the web page being in Chinese? I'm guessing HTML is just served as HTML, so it can't really be that either. The URL is fine when I go to it, and there seem to be no redirects to https either.
Anyone know why this is happening?
You must set the cookies and the user agent in the WebClient headers. This works:
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1");
client.Headers.Add(HttpRequestHeader.Cookie, "USER_PW=9b1283bfe37ac47b243a1e0c9c1c9e52; PHPSESSID=f692406a0c84dba2605a7065d55a3b53");
If you want the request to do all of this work itself, you have to use HttpWebRequest, save the headers (cookies) from the response, and send them in a new request.
WebClient is not buggy, so probably the server is returning data you did not expect. Use Fiddler to watch what happens when you go to the site in a web browser.
When I executed your code the web site returned no data. When I visited the site in a web browser it returned data. Probably, the site is detecting that you are a bot and denying you access. Fake being a browser by mimicking what you see in Fiddler.
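The suggestion above to save what the first response sets and reuse it in a new request can be sketched with a shared CookieContainer, which stores cookies from one response and attaches them to later requests. This is a minimal sketch, not the answerer's code: the class name CookieSession and its method names are hypothetical, and it assumes the site only needs cookies and a browser-like User-Agent.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: names are assumptions for illustration.
public static class CookieSession
{
    // Creates a request that sends a browser-like User-Agent and shares the given cookie container.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        // Cookies set by the response are stored here and sent on later requests
        // that use the same container.
        request.CookieContainer = cookies;
        return request;
    }

    // Performs a GET and returns the response body as text.
    public static string Get(string url, CookieContainer cookies, string userAgent)
    {
        HttpWebRequest request = CreateRequest(url, cookies, userAgent);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling CookieSession.Get twice with the same CookieContainer instance means the second request carries whatever cookies the first response set, so there is no need to hard-code session values such as PHPSESSID.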
I have a web application (which I have no control over) that I need to send an HTTP POST to programmatically. Currently I'm using HttpWebRequest like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://someserver.com/blah/blah.aspx");
However, the application was returning an "Unknown Server Error" (not the IIS error, but a custom application error page) when posting data. Using Fiddler to compare my POST with the IE POST, I can see that the only difference is in the POST line of the request:
In Internet Explorer, Fiddler (RAW view) shows the traffic as
POST /blah/blah.aspx HTTP/1.1
In my C# program, Fiddler (RAW view) records the traffic as
POST https://someserver.com/blah/blah.aspx HTTP/1.1
This is the only difference between the two requests.
From what I've researched so far, it seems there is no way to make HttpWebRequest.Create post the relative URL. Note: I see many posts on "how to use relative URLs", but these suggestions do not work, as the actual POST is still done using an absolute URL (when you sniff the HTTP traffic).
What is simplest way to accomplish this post with relative URL?
(Traffic is NOT going through a proxy)
Update: For the time being I'm using IE automation to run the scheduled perf test instead of the method above. I might look at another scripting language, as I wanted to test without any browser.
No, you can't do a POST without the server in the URL.
One possible reason your program fails is that it does not use the correct proxy and as a result can't resolve the server name.
Note: Fiddler shows path and host separately in the view you are talking about.
Configure your program to use Fiddler as a proxy (127.0.0.1:8888) and compare the requests you are making with the browser's. Don't forget to switch Fiddler to "show all processes".
Here is an article on configuring Fiddler for different types of environments, including C# code: Fiddler: Configuring clients
// Route the request through Fiddler's local proxy so it shows up in the capture.
objRequest = (HttpWebRequest)WebRequest.Create(url);
objRequest.Proxy = new WebProxy("127.0.0.1", 8888);
There is a reports website whose content I want to parse in C#. I tried downloading the HTML with WebClient, but I don't get the complete source, since most of it is generated via JS when I visit the website.
I tried using WebBrowser but couldn't get it to work in a console app, even after using Application.Run() and SetApartmentState(ApartmentState.STA).
Is there another way to access this generated html? I also took a look into mshtml but couldn't figure it out.
Thanks
The Javascript is executed by the browser. If your console app gets the JS, then it is working as expected, and what you really need is for your console app to execute the JS code that was downloaded.
You can use a headless browser. XBrowser may serve.
If not, try HtmlUnit as described in this blog post.
Just a comment here. There shouldn't be any difference between performing an HTTP request with some C# code and the request generated by a browser. If the target web page is getting confused and not generating the correct markup because it can't make heads or tails of the type of browser it thinks it's serving, then maybe all you have to do is set the user agent like so:
((HttpWebRequest)myWebClientRequest).UserAgent = "<a valid user agent>";
For example, my current user agent is:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Maybe once you do that the page will work correctly. There may be other factors at work here, such as the referrer and so on, but I would try this first and see if it works.
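One way to apply that cast in practice is to subclass WebClient and override GetWebRequest, which is where WebClient creates the underlying request. The sketch below is an assumption about how you might wire it up, not code from the answer: BrowserWebClient, its UserAgent and Referer properties, and the Preview method are hypothetical names. Note that it only changes the headers sent; it does not make WebClient execute JavaScript.

```csharp
using System;
using System.Net;

// Hypothetical subclass: names are assumptions for illustration.
public class BrowserWebClient : WebClient
{
    public string UserAgent { get; set; }
    public string Referer { get; set; }

    // WebClient calls this for every request; set browser-like headers here.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            if (!string.IsNullOrEmpty(UserAgent)) http.UserAgent = UserAgent;
            if (!string.IsNullOrEmpty(Referer)) http.Referer = Referer;
        }
        return request;
    }

    // Exposes the prepared request so the headers can be inspected without sending it.
    public WebRequest Preview(Uri address)
    {
        return GetWebRequest(address);
    }
}
```

With this in place, new BrowserWebClient { UserAgent = "..." }.DownloadString(url) sends the chosen user agent on every call.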
Your best bet is to abandon the console app route and build a Windows Forms application. In that case the WebBrowser control will work without any extra setup.
Here's the code I'm trying to run:
var wc = new WebClient();
var stream = wc.OpenRead(
"http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");
But I keep getting a 403 forbidden error. Don't understand why. It worked fine for other pages. I can open the page fine in my browser. How can I fix this?
I wouldn't normally use OpenRead(); try DownloadData() or DownloadString() instead.
Also, it might be that Wikipedia is deliberately blocking your request because you have not provided a user agent string:
WebClient client = new WebClient();
client.Headers.Add("user-agent",
"Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
I use WebClient quite often, and learned quite quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. Also, if you make up your own user agent string (e.g. "my super cool web scraper") you will also be blocked.
[Edit]
I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6 which is not a good idea. Why? Some websites may perform filtering based on IE6 and send anyone with that browser a message or to a different page that says "Please update your browser" - this means you will not get the content you wanted to get.
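Putting the pieces of this answer together, a minimal sketch might look like the following. PageFetcher, CreateClient, and TryDownload are hypothetical names, and reporting the status code from the WebException is an assumption about how you might want to surface a 403, not something the answer prescribes.

```csharp
using System;
using System.Net;

// Hypothetical helper: names are assumptions for illustration.
public static class PageFetcher
{
    // Creates a WebClient that sends the given user agent string.
    public static WebClient CreateClient(string userAgent)
    {
        var client = new WebClient();
        client.Headers.Add(HttpRequestHeader.UserAgent, userAgent);
        return client;
    }

    // Returns the page body, or null when the server answers with an HTTP error such as 403.
    // The status stays null when no HTTP response was received at all.
    public static string TryDownload(string url, string userAgent, out HttpStatusCode? status)
    {
        status = null;
        using (WebClient client = CreateClient(userAgent))
        {
            try
            {
                string html = client.DownloadString(url);
                status = HttpStatusCode.OK;
                return html;
            }
            catch (WebException ex)
            {
                // For protocol errors the response carries the status code (e.g. 403 Forbidden).
                var response = ex.Response as HttpWebResponse;
                if (response != null) status = response.StatusCode;
                return null;
            }
        }
    }
}
```

Checking the returned status makes it easy to tell a block (403) from a network failure, which helps when trying out different user agent strings.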