I have a basic C# HttpWebRequest. My problem is that the page I am sending the GET request to requires JavaScript to be enabled on the client side for the content to be generated.
How can I add JavaScript support to my code? Is it even possible?
The server can't really know whether the client supports Javascript. It can only go on the data you give it.
So there are two possibilities:
It's using headers to work out what response to send, and inferring that you can't run Javascript. Solution: work out what headers it requires, and set them explicitly.
It's sending you back the page, but you're unable to use it because you're not displaying it on a browser. Solution: look at the page, work out what AJAX calls the Javascript is making, and make those instead. You may not even need to fetch the original hosting page.
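A minimal sketch of that second option, assuming you have already found the AJAX endpoint by watching the browser's network traffic; the URL and header below are stand-ins, not from the question:

```csharp
using System;
using System.IO;
using System.Net;

class AjaxDirectCall
{
    static void Main()
    {
        // Hypothetical endpoint -- substitute whatever the page's scripts call
        var request = (HttpWebRequest)WebRequest.Create(
            "http://example.com/api/content?id=42");

        // Some endpoints only respond to requests that look like AJAX
        request.Headers["X-Requested-With"] = "XMLHttpRequest";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // Usually JSON or an HTML fragment rather than a full page
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```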
HttpWebRequest just fetches the raw response; you need a full browser to execute JavaScript (and possibly the CSS files too, as scripts may depend on them).
The built-in approach is to use the WebBrowser control to render the page, then grab the innerHTML once you detect that the JavaScript rendering is done.
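A rough sketch of that approach (WinForms); deciding when JavaScript rendering is actually finished is page-specific, and DocumentCompleted is only a first approximation:

```csharp
using System;
using System.Windows.Forms;

class BrowserScraper
{
    [STAThread] // the WebBrowser control requires an STA thread
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // Note: DocumentCompleted can fire more than once on framed pages.
            // The DOM here reflects whatever the page's scripts have built so far.
            Console.WriteLine(browser.Document.Body.InnerHtml);
            Application.Exit();
        };
        browser.Navigate("http://example.com"); // placeholder URL
        Application.Run(); // pump messages so the control can render
    }
}
```

For pages that keep rendering after DocumentCompleted, a timer that polls for an expected element is a common workaround.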
Related
I want to open some websites which may contain JavaScript code, for example Google Analytics or Piwik. I think using a WebClient is the easiest way to visit websites, but does the WebClient run JavaScript code in the background automatically? If not, how could I get JavaScript code to run when visiting a website in C#?
Have you considered using a headless browser like PhantomJS?
If you are looking for something with a UI interface, look at the WebBrowser Control.
If you need to just get the DOM or the underlying elements of the DOM, I would suggest WatiN. It is very mature, works well, and it's fast.
WebClient only loads data from the web. It does not interpret it in any way.
If you need to treat the page the way a web browser would, then you need to use a Web Browser control.
In C# there is an Internet Explorer control (the WebBrowser control). You can execute JavaScript code on a client PC by pointing this control at a web URL. The Internet Explorer control is a fully functional browser which executes all client-side code.
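For example, once a page is loaded in the control you can call into its scripts; `refreshChart` below is a hypothetical function assumed to exist in the page:

```csharp
using System.Windows.Forms;

class ScriptInvoker
{
    // Call a JavaScript function defined by the loaded page.
    // "refreshChart" and its argument are illustrative only.
    public static void CallPageFunction(WebBrowser browser)
    {
        browser.Document.InvokeScript("refreshChart", new object[] { "sales" });
    }
}
```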
I am working on a project in which one piece of functionality is that a page obtains data from another page (not a web service), then displays it in a grid and uses Highcharts for charting.
The problem is that the data I want to read is on another page.
I know that I can read HTML from other pages... but to get this information onto the page, I need to fill in 2 text inputs for a filter and press a submit button; only then does it display the table I need to extract the information from.
Is there a way to do this automatically in C#?
There are plenty of ways to do this; the most common revolve around AJAX. You can initiate a callback from the client via Javascript to a method on the server, which can update controls in an UpdatePanel, for example.
You can also make client side calls to server side Page Methods. Effectively, this is a static method on your webform that you can call from the client via javascript/jquery and AJAX.
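A minimal sketch of such a page method in ASP.NET WebForms; the method name and return value are illustrative. The client would call it by posting JSON to `Default.aspx/GetChartData` with jQuery's `$.ajax`:

```csharp
using System.Web.Services;

public partial class Default : System.Web.UI.Page
{
    // Static, [WebMethod]-decorated methods can be called directly
    // from client-side script when page methods are enabled.
    [WebMethod]
    public static string GetChartData(string filter)
    {
        // In your case this is where you'd fetch and filter the remote data
        return "{\"series\": [1, 2, 3]}"; // dummy payload
    }
}
```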
EDIT.
It turns out that you want to scrape another site. The easiest way for you to do this is to have a server-side page method on your website that does the work: it requests the page from the other site, extracts the info you want, and then returns that to your client. Your client can of course call this as a page method.
See https://web.archive.org/web/20210513000146/http://www.4guysfromrolla.com/webtech/070601-1.shtml for a tutorial, and I do suggest using the HTML Agility Pack as that article mentions.
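A short sketch with the HTML Agility Pack that article recommends; the URL and the XPath to the target table are placeholders:

```csharp
using System;
using HtmlAgilityPack;

class TableScraper
{
    static void Main()
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://example.com/results"); // placeholder

        // SelectNodes returns null when nothing matches, so guard first
        var cells = doc.DocumentNode.SelectNodes("//table[1]//td");
        if (cells != null)
        {
            foreach (HtmlNode cell in cells)
                Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```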
Further EDIT
You want to further manipulate the page on the remote site; if you can't or don't want to speak to the developers of that site to work out a way of doing it programmatically, then you'll have to cheat. Get Firebug and Tamper Data. Use Firebug and Tamper Data to see how clicking the button on the remote site makes a request and posts it to the server - you want to emulate doing the same. If you know what data is being posted then you can, from your server, make exactly the same post.
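A minimal sketch of replaying such a post from your server; the URL and field names are stand-ins for whatever Tamper Data shows you:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class FormReplayer
{
    static void Main()
    {
        // Copy the exact field names and values you observed being posted
        byte[] body = Encoding.UTF8.GetBytes("filter1=abc&filter2=xyz&submit=Search");

        var request = (HttpWebRequest)WebRequest.Create(
            "http://remote.example/search"); // placeholder URL
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // The HTML containing the table you want to extract
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```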
You often have this kind of problem when trying to scrape AJAX websites.
Is there a way, using either C# or a scripting language such as Python, to load a website in the user's default web browser and continue to interact with it via code (e.g. invoke existing JavaScript methods)? When using WinForms, you can host a WebBrowser control and invoke scripts from there, but only IE is supported. Is there a way of doing the same thing in the user's default browser (not necessarily using WinForms)?
Update: The website is stored on the user's machine, not served from a third party server. It is a help page which works dynamically with my C# program. When the user interacts with my C# program, I want to be able to execute the Javascript methods on the website.
You might want to look into Selenium. It can automate interaction with Firefox, IE, Chrome (with chromedriver) and Opera. It may not be suitable for your purposes because it uses a fresh, stripped-down profile rather than the user's normal browser profile.
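A short sketch of driving a local help page with Selenium's C# bindings; the file path and the `showTopic` function are assumptions about your help page, not real names:

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class HelpPageDriver
{
    static void Main()
    {
        IWebDriver driver = new FirefoxDriver();
        driver.Navigate().GoToUrl("file:///C:/MyApp/help/index.html"); // placeholder

        // Invoke a JavaScript function defined by the page itself
        ((IJavaScriptExecutor)driver).ExecuteScript("showTopic('printing');");

        // driver.Quit(); // when your application is done with the help session
    }
}
```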
If you look at the HTTP request headers you can determine the user agent making the request. Based on that information you can write server-side logic to respond with a unique page per detected user-agent string. Then you add whatever unique JavaScript you want, as a static string, to be executed by the user-agent application.
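As a rough sketch, in ASP.NET that branching could look like this (the script strings are placeholders):

```csharp
using System.Web;

public class UserAgentHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string ua = context.Request.UserAgent ?? "";

        // Pick a script to embed based on the detected user agent
        string script = ua.Contains("MSIE")
            ? "<script>/* IE-specific code */</script>"
            : "<script>/* everyone else */</script>";

        context.Response.Write(script);
    }

    public bool IsReusable { get { return true; } }
}
```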
I need to have 7 tabs, all hosting web browser controls, and each should have a different user agent. I saw this and wondered how they do that:
http://www.howtogeek.com/howto/18450/change-the-user-agent-string-in-internet-explorer-8/
and am using this as the implementation:
Changing the user agent of the WebBrowser control
It works like this: if I change one browser's user-agent string, they all get the same one.
User-Agent is an HTTP header field. There are many ways to change it. For Firefox there are definitely plugins you can download that let you modify your HTTP headers on the fly. There is probably a plugin that will let you do this on a per-tab basis.
This sounds like a browser-specific question. But, in general, when it comes to HTTP, you can send whatever you want. Your browser does it automatically for you, but for Firefox, at least, you can download plugins that give you more control. For other browsers there might be a config file or setting that you can edit.
If you're writing a web browser that lets you do this feature, well, that should be easy as well; the API you are using should let you modify the headers.
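For the WebBrowser control specifically, one per-control workaround (since UrlMkSetSessionOption from the linked answer changes the user agent process-wide) is the Navigate overload that takes additional headers. Note it only affects that one request, not resources the page loads afterwards; a sketch:

```csharp
using System.Windows.Forms;

class PerTabUserAgent
{
    public static void NavigateWithUserAgent(WebBrowser browser, string url, string ua)
    {
        // The additionalHeaders string must be terminated with \r\n
        browser.Navigate(url, null, null, "User-Agent: " + ua + "\r\n");
    }
}
```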
I have a .NET desktop application (not web) with a WebBrowser control.
I cannot find any information on how, or if it is even possible, to obtain the HTTP status code when a document is navigated to inside this control. Does anyone know if this is possible or how?
The purpose is to detect codes other than 200 and perform actions accordingly within the application.
A web page is not made up of a single HTTP GET request. The stackoverflow.com front page, for example, requires 16 requests: JavaScript code, images, page visit counters, some of them coming from different web sites. Some of it is retrieved from the cache instead of downloaded.
WebBrowser (aka Internet Explorer) doesn't support enumerating these individual requests. You'd have to use the HttpWebRequest class, but that of course doesn't render a web page.
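If you only need the status of the top-level document, a common workaround is to probe the URL yourself with HttpWebRequest before (or instead of) handing it to the control; a minimal sketch:

```csharp
using System;
using System.Net;

class StatusProbe
{
    static HttpStatusCode GetStatus(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // only the status line matters here
                                 // (some servers reject HEAD; fall back to GET)
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
                return response.StatusCode;
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse error)
        {
            // 4xx/5xx responses surface as exceptions with the response attached
            return error.StatusCode;
        }
    }

    static void Main()
    {
        Console.WriteLine(GetStatus("http://example.com")); // e.g. OK
    }
}
```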