I want to download the content of a website programmatically, and it looks like the content is loaded by AJAX calls. When I disable JavaScript in my browser, the page makes only one request and all the content is loaded without AJAX.
What I need is to make a web request that tells the page JavaScript is disabled, so that it returns all the content rather than an empty body tag.
Any suggestions on how to do that?
You need to mimic a browser.
Steps:
Use Fiddler to see what the browser sends.
Set the same headers, cookies, and user agent in your C# code.
If that does not work, compare the request your code makes with the browser's by using Fiddler as the proxy for your C# code (set the proxy to http://localhost:8888).
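The steps above might look roughly like this with HttpWebRequest. This is only a sketch: the class and method names are made up for illustration, and every header value is a placeholder that should be replaced with whatever Fiddler shows your browser sending when JavaScript is disabled. The proxy address is the http://localhost:8888 mentioned above.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not part of any library.
static class BrowserLikeRequest
{
    // Builds a GET request carrying browser-like headers and cookies.
    public static HttpWebRequest Create(string url, CookieContainer cookies, bool viaFiddler)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // Placeholder values: copy the real ones from the Fiddler capture.
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        request.Accept = "text/html,application/xhtml+xml";
        request.Headers["Accept-Language"] = "en-US,en;q=0.8";

        // Cookies set by earlier responses are sent back automatically.
        request.CookieContainer = cookies;
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Route the request through Fiddler so it can be compared
        // side by side with the browser's request.
        if (viaFiddler)
            request.Proxy = new WebProxy("http://localhost:8888");

        return request;
    }

    // Sends the request and returns the response body as text.
    public static string Download(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `BrowserLikeRequest.Download(BrowserLikeRequest.Create(url, new CookieContainer(), true))` while Fiddler is running should make the request show up in its session list next to the browser's one.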
Related
We have tried using Selenium for testing, but it has numerous setbacks, delays, and sudden crashes.
jQuery sounds like a good alternative, but the challenge is how to jquerify every page the browser loads.
Brandon Martinez here has an example of how to add jQuery from the Chrome console to jquerify a page:
var element1 = document.createElement("script");
element1.src = "http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js";
element1.type="text/javascript";
document.getElementsByTagName("head")[0].appendChild(element1);
We want that code to be available automatically on every page, without having to click a bookmark link manually each time.
If we get around that, we can use C# code to:
Process.Start("chrome", @"target site");
and since jQuery is already available on every page, it will do the population and submit we want.
How can I automatically include jQuery on every page the browser loads? Is it possible to do that via a Chrome plugin, jQuery, or C# code? Is it possible at all?
I've decided to use Fiddler to modify the response body before it is displayed in the browser. Now I can jquerify every page that comes to the browser. Look at this link for a detailed example.
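The linked example is not reproduced here, so the following is only a guess at the body rewrite itself: a plain string transformation that puts the same jQuery script tag into the page head. The class name is hypothetical, and wiring it into Fiddler (for instance from a response hook such as FiddlerScript's OnBeforeResponse) is deliberately left out.

```csharp
using System;

// Hypothetical helper: rewrites an HTML body so that it loads jQuery.
static class JQueryInjector
{
    // Same URL as in the console snippet from the question.
    const string JQueryUrl =
        "http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js";

    const string ScriptTag =
        "<script type=\"text/javascript\" src=\"" + JQueryUrl + "\"></script>";

    public static string Inject(string html)
    {
        // Leave the page alone if it already references this jQuery file.
        if (html.IndexOf(JQueryUrl, StringComparison.OrdinalIgnoreCase) >= 0)
            return html;

        // Insert the tag just before the closing head tag, if there is one.
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (headEnd < 0)
            return html;

        return html.Insert(headEnd, ScriptTag);
    }
}
```

Responses without a closing head tag (JSON, images, partial fragments) pass through unchanged, which is usually what you want when every response goes through the same hook.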
I am creating a web scraping application in C# 4.0 and I use WebClient in it.
Now I am facing a problem with the click event of an A tag. Please see the code below:
2
Please see this image.
I am trying to download the HTML of that page. How can I execute this href code from WebClient?
Please help.
You cannot execute JavaScript from WebClient. You can simulate such a request and get the required response from the server, but it takes some reverse engineering.
To understand how to get the desired response from the server, first perform the operation in a browser and record the request generated by clicking that link (for that you can use Fiddler Web Debugger). Then recreate that request from your WebClient, that is, send all the required data to the server properly formatted, along with the cookies and the correct request type (GET or POST, synchronous or asynchronous).
Why you cannot use WebClient for JavaScript execution is nicely described here.
How to create a request that resembles the one created by the JavaScript click can be deduced from this.
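As a rough illustration of recreating the recorded request, here is a sketch that assumes the click turns out to be a form POST. WebClient does not keep cookies between calls by itself, so the sketch uses the common workaround of a small subclass that attaches a CookieContainer. The class names, URLs, and form fields are all placeholders; the real ones have to come from your Fiddler capture.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

// WebClient variant that keeps cookies across requests.
class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; private set; }

    public CookieAwareWebClient()
    {
        Cookies = new CookieContainer();
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = Cookies; // reuse the same cookie jar
        return request;
    }
}

// Hypothetical helper that replays the request behind a link click.
static class LinkClickReplay
{
    public static string Replay(string pageUrl, string postUrl, NameValueCollection form)
    {
        using (var client = new CookieAwareWebClient())
        {
            // Load the page first so the server hands out its session cookies.
            client.DownloadString(pageUrl);

            // Send the same form fields the browser sent when the link was clicked.
            client.Headers[HttpRequestHeader.Referer] = pageUrl;
            byte[] body = client.UploadValues(postUrl, "POST", form);

            // Assumes a UTF-8 response; adjust to the charset the site uses.
            return Encoding.UTF8.GetString(body);
        }
    }
}
```

If the capture shows a GET with a query string instead of a POST, `DownloadString` on the captured URL with the same cookie-aware client should be enough.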
I have learned to use the HTTP request methods (Create and GetResponse) to get the header and content of a link.
The problem is that the HTTP response I get is not for the link I want.
An authentication page comes up instead. Only when I click the Accept button do I reach the page I want.
So the header and content I actually get are those of the authentication page.
Is there a way I can use this header and content to create one more HTTP request to get the page I want?
I need to click the Accept button in the background.
Thanks.
I'd recommend using Fiddler to capture the interaction with the site using a browser. You can then use the Fiddler output as a guide for replicating the same functionality using your code.
If the site keeps track, on a per-session basis, of whether each user has clicked the Accept button, you'll need to replicate that, probably with an HTTP POST. You'll be able to see how to construct that POST, if relevant, from the Fiddler output.
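Assuming the Fiddler capture does show a session cookie plus a form POST behind the Accept button, the flow could be sketched like this with the same Create/GetResponse methods. Everything site-specific here is hypothetical: the helper names, the accept URL, and the form fields must be taken from what Fiddler actually recorded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for getting past an "Accept" interstitial page.
static class AcceptPageClient
{
    // Encodes fields as application/x-www-form-urlencoded data.
    public static string EncodeForm(IDictionary<string, string> fields)
    {
        var sb = new StringBuilder();
        foreach (var pair in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(pair.Key))
              .Append('=')
              .Append(Uri.EscapeDataString(pair.Value));
        }
        return sb.ToString();
    }

    static string ReadBody(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    public static string FetchAfterAccept(
        string targetUrl, string acceptUrl, IDictionary<string, string> acceptFields)
    {
        // One cookie jar shared by all three requests keeps the session alive.
        var cookies = new CookieContainer();

        // 1. The first GET returns the Accept page and sets the session cookie.
        var first = (HttpWebRequest)WebRequest.Create(targetUrl);
        first.CookieContainer = cookies;
        ReadBody(first);

        // 2. POST the same fields the browser sends when Accept is clicked.
        var post = (HttpWebRequest)WebRequest.Create(acceptUrl);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.CookieContainer = cookies;
        byte[] payload = Encoding.UTF8.GetBytes(EncodeForm(acceptFields));
        post.ContentLength = payload.Length;
        using (var stream = post.GetRequestStream())
            stream.Write(payload, 0, payload.Length);
        ReadBody(post);

        // 3. Request the target again; the session should now be accepted.
        var again = (HttpWebRequest)WebRequest.Create(targetUrl);
        again.CookieContainer = cookies;
        return ReadBody(again);
    }
}
```

If the POST already redirects to the page you want, the body read in step 2 is the content and step 3 can be dropped.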
I'm trying to read the contents of an AJAX response in the WebBrowser control in C#/WinForms. The Navigating/Navigated/etc. events seem to fire, but they don't give any access to the data being returned.
Is there any way to intercept the requests and read the data?
Note: If I send the request directly (using webBrowser.Navigate(ajaxUrl)), the WebBrowser control pops up a dialog asking the user to Open/Save the page (as it has a content-disposition header), so that isn't an option. I tried doing it manually with a WebClient/WebRequest, but I can't get the cookies to work correctly (the cookies I read from document.cookie do not seem to match the cookies actually sent with the AJAX request!).
No, you cannot capture XMLHttpRequest calls made inside the WebBrowser control using the control's own methods. You might want to have a look at http://www.fiddler2.com/core/
My application uses data from another website. How can I get that data? The site uses JavaScript functions to fetch it, so it does not show up in the page's view-source.
Check the Net tab in Firebug, filter by XHR, find the resource that is requested, and request that same resource yourself.
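Once the XHR resource is known, requesting it from C# could look roughly like the sketch below. It assumes a plain GET that needs no cookies. The X-Requested-With header is a convention many JavaScript libraries follow and some servers check for; whether this site needs it, and which Accept value it expects, should be read off the request shown in Firebug. The class name is made up.

```csharp
using System.Net;

// Hypothetical helper that fetches a resource the way the page's XHR call does.
static class XhrStyleDownload
{
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        // Mark the request as an AJAX call (a convention, not a standard).
        client.Headers["X-Requested-With"] = "XMLHttpRequest";
        // Placeholder: copy the Accept header from the captured request.
        client.Headers[HttpRequestHeader.Accept] = "application/json, text/javascript, */*";
        return client;
    }

    // Downloads the resource body (often JSON or an HTML fragment) as text.
    public static string Fetch(string resourceUrl)
    {
        using (var client = CreateClient())
            return client.DownloadString(resourceUrl);
    }
}
```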
Basically you have to render the webpage and ensure the javascript functions are run (evaluated). You could do this by "borrowing" their javascript files (by linking to them from your own page), but this may not work as you don't know what's in those files - they could be accessing DOM elements that you don't have in your page, or calling to other domains which may prevent them from working correctly.
The easiest way to show the same data is to host the page inside an iframe on your own page. If you are looking to do this from a normal client application (i.e. not a web app), then you will need a browser control that you navigate to the target page. If the browser control is invisible, you could then scrape values from it and show them in your app, although this is a very clumsy way to do it, and it's debatable how ethical it is.
If you just want the other website's page source, use HttpWebRequest in C# and read the response stream.