Access text generated with JavaScript - C#

This website has a custom Google search box:
http://ezinearticles.com/
The search results are generated by a piece of JS code. How would I access these results using wget and/or C#'s WebClient?

It looks like the searches on that page are normal Google site searches. Try wget with the following URL, where 'asdf' is your search term (quote the URL so the shell doesn't misparse it):
wget "http://www.google.com/search?q=site:ezinearticles.com+asdf"
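To do the same from C# with WebClient, a minimal sketch might look like this (assuming Google returns results to a plain HTTP client; it may throttle or block automated queries, and the User-Agent value here is just a placeholder):

using System;
using System.Net;

class GoogleSiteSearch
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Some servers reject requests without a User-Agent header.
            client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0";
            string html = client.DownloadString(
                "http://www.google.com/search?q=site:ezinearticles.com+asdf");
            Console.WriteLine(html);
        }
    }
}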

You need to do what your web browser does: render the page. Alternatively, you may be able to extract the JS call to the web service providing the results, execute that request yourself, and parse the output directly.

You need to access it with a programmable browser supporting JavaScript.
The HtmlUnit library for Java does this, and runs fine headless.
You can automate a real web browser, e.g. with WatiN on Windows, and access the page's content. This requires a GUI desktop, though, because a real browser window is opened.
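For example, a rough WatiN sketch in C# (WatiN drives IE and needs an STA thread; the field name "q" and button value "Search" are assumptions, so inspect the real page for the actual element names):

using System;
using WatiN.Core;

class SearchDriver
{
    [STAThread]
    static void Main()
    {
        using (var browser = new IE("http://ezinearticles.com/"))
        {
            // "q" and "Search" are hypothetical; use the page's real names.
            browser.TextField(Find.ByName("q")).TypeText("asdf");
            browser.Button(Find.ByValue("Search")).Click();
            browser.WaitForComplete();
            Console.WriteLine(browser.Html); // rendered HTML, after JavaScript ran
        }
    }
}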

Related

How to communicate with Google Chrome using C# or Python

I'm developing software in C# which has to get info from a website that the user opens in Chrome. The user has to input some data, and then the website returns a list of different items.
What I want is a way to access the source code of the page in order to get the info. I can't open the page myself, as it doesn't show anything until data has been input, so I need to get it directly from Chrome.
How can I achieve this? A Chrome extension? Or can I access Chrome directly from my software?
Off the top of my head, I don't know of any application that gets data directly from an open instance of Chrome. You'd have to write your own Chrome extension.
Alternatively, you can open the web browser from your application initially.
You can look into these libraries for doing so:
Watin (My personal favourite)
Selenium
Awesomium (You'd have to roll your own UI; it's invisible)
Cef
Essential Objects Web Browser
EDIT: I didn't think about using QA tools as the actual browser hook, as @TheAnathema mentions. That would probably work for your needs.
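For instance, a minimal Selenium WebDriver sketch in C# (assuming the Selenium.WebDriver NuGet package and chromedriver are installed; the URL is a placeholder):

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class PageReader
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://example.com/");
            // PageSource returns the rendered DOM, after JavaScript has run.
            Console.WriteLine(driver.PageSource);
        }
    }
}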
You're going to need to create it as a Chrome extension if you must depend on the user actually going to a specific web site (i.e. if you can't do the requests yourself with either Selenium or standard web requests in Python).
The reason a Chrome extension is required: think of how bad it would be if any software could easily read the pages you browse. Banking, medical, email, etc. could all be accessed anonymously by any process if Google allowed outside processes to tap into the web page.
Even Chrome extensions have to ask for permission to do what they want, but at least an extension is software the user knowingly installed and whose permissions the user agreed to.
A quick search yielded this example of modifying a page's HTML with a Chrome extension: https://blog.lateral.io/2016/04/create-chrome-extension-modify-websites-html-css/
It sounds like you want to do web scraping. Here's a good tutorial to get you started: HTML Scraping.
And this answer has a good example of how to scrape data from a website where you need to submit a form to get access to the data.

Open Website containing Javascript code

I want to open some websites which may contain JavaScript code, for example Google Analytics or Piwik. I think using a WebClient is the easiest way to visit websites, but does the WebClient run JavaScript code in the background automatically? If not, how could I get JavaScript code running when visiting a website in C#?
Have you considered using a headless browser like PhantomJS? (A C# sketch via Selenium's PhantomJS driver follows below.)
If you are looking for something with a UI, look at the WebBrowser control.
If you just need to get the DOM or its underlying elements, I would suggest WatiN. It is very mature, works well, and is fast.
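For the PhantomJS route from C#, older Selenium releases shipped a PhantomJSDriver that runs it headless (since deprecated in favour of headless Chrome); a sketch, assuming phantomjs.exe is on the PATH and the URL is a placeholder:

using System;
using OpenQA.Selenium.PhantomJS;

class HeadlessReader
{
    static void Main()
    {
        // Requires phantomjs.exe on the PATH (or pass its directory).
        using (var driver = new PhantomJSDriver())
        {
            driver.Navigate().GoToUrl("https://example.com/");
            Console.WriteLine(driver.PageSource); // DOM after JavaScript ran
        }
    }
}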
WebClient only loads data from the web. It does not interpret it in any way.
If you need to treat the page the way a web browser would, then you need to use a Web Browser control.
In C# there is an Internet Explorer control. You can execute JavaScript code on a client PC by pointing this control at a web URL. The Internet Explorer control is a fully functional browser that executes all client-side code.
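A minimal sketch with the WinForms WebBrowser control (which wraps IE); the URL is a placeholder:

using System;
using System.Windows.Forms;

class BrowserHost
{
    [STAThread]
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // By this point the page's JavaScript has executed;
            // the rendered DOM is reachable via browser.Document.
            Console.WriteLine(browser.DocumentText);
            Application.ExitThread();
        };
        browser.Navigate("https://example.com/");
        Application.Run(); // pump messages so the control can work
    }
}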

Log in to account server-side instead of client-side

What I'm trying to do is load a web page server-side, for example www.facebook.com, then insert a username and password programmatically and log in. I know this is possible from a desktop application, and I know how to do it in C#, but only on the desktop/client side. What I'm looking for is how to do it server-side.
For example:
I send a request with a username and password to a site [my site], let's say www.fbloger.com. The server then logs in to Facebook using those details, so the server can send me important details. My final requirement is to get an alert when a specific friend is online, so I don't need to stay logged in and keep checking whether she is online; I can log in to Facebook as soon as the server gives me an alert. I don't know if it's really possible.
It sounds like you are trying to write some kind of server-side web crawler/spider. If this is the case, all you need to do is examine the network requests performed in a browser, then emulate them in C#.
In C#, if you send the request with HttpClient exactly as your browser does, you can then capture the returned web page and scrape the content with something like the HTML Agility Pack, which allows you to query the HTML like an XML document to extract the values you require. See http://html-agility-pack.net (get it via NuGet).
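A sketch of that approach (the login URL and form field names are placeholders; copy the real ones from the request your browser sends, as captured in the network tab):

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class ServerSideLogin
{
    static async Task Main()
    {
        // UseCookies keeps the session cookie between requests.
        var handler = new HttpClientHandler { UseCookies = true };
        using (var client = new HttpClient(handler))
        {
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["email"] = "user@example.com", // hypothetical field names
                ["pass"] = "secret"
            });
            var response = await client.PostAsync("https://example.com/login", form);
            string html = await response.Content.ReadAsStringAsync();

            // Query the returned HTML like an XML document.
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title")?.InnerText);
        }
    }
}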
Since you know how to do that in C#, simply use C# for the server-side code.
ASP.NET allows you to use C# for code-behind, and you can copy (or better, reuse) the desktop code that signs in to a web site.
If your desktop code used the WebBrowser control, you'll need to rewrite the crawling code with something like HttpClient and avoid pages that require JavaScript execution to render or log in.

Invoke javascript methods on a page

Is there a way, using either C# or a scripting language such as Python, to load a website in the user's default web browser and continue to interact with it via code (e.g. invoke existing JavaScript methods)? When using WinForms, you can host a WebBrowser control and invoke scripts from there, but only IE is supported. Is there a way of doing the same thing in the user's default browser (not necessarily using WinForms)?
Update: The website is stored on the user's machine, not served from a third-party server. It is a help page which works dynamically with my C# program. When the user interacts with my C# program, I want to be able to execute the JavaScript methods on the website.
You might want to look into Selenium. It can automate interaction with Firefox, IE, Chrome (with chromedriver), and Opera. It may not be suitable for your purposes, because it uses a fresh, stripped-down profile rather than the user's normal browser profile.
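With Selenium's C# bindings, invoking an existing JavaScript function on a loaded page looks roughly like this (the local file path and function name are hypothetical):

using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class HelpPageDriver
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            // The help page lives on the user's machine (hypothetical path).
            driver.Navigate().GoToUrl("file:///C:/MyApp/help/index.html");

            // Call a JavaScript function that already exists on the page.
            ((IJavaScriptExecutor)driver).ExecuteScript("showTopic('getting-started');");
        }
    }
}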
If you look at the HTTP request headers, you can determine the user agent making the request. Based on that information, you can write server-side logic to respond with a unique page per detected user-agent string, and then add whatever JavaScript you want, as a static string, for that user agent to execute.
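In ASP.NET MVC, for example, branching on the User-Agent header might look like this sketch (the view names are placeholders, and real user-agent detection needs more care than a substring check):

using System.Web.Mvc;

public class HelpController : Controller
{
    public ActionResult Index()
    {
        // User-Agent strings vary widely; this check is illustrative only.
        string userAgent = Request.UserAgent ?? "";
        if (userAgent.Contains("Firefox"))
            return View("HelpFirefox"); // page carrying Firefox-specific JavaScript
        return View("HelpDefault");
    }
}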

Replicate steps in downloading file

I'm trying to automate the download of a file from a website. Normally, to download the file, I log in with a username and password, navigate to a particular screen, then click a button.
I've been trying to watch the sequence of POSTs using Chrome's developer tools and then replicate all the steps using the .NET WebClient class, but with no success. I've derived from the WebClient class and added cookie handling, which seems to be working. I go to the login page and post using WebClient.UploadValues. About half the time it seems to work. The next step appears to be another POST to a reporting URL. Once again I use WebClient.UploadValues, but the response from the server is a page showing an internal error.
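For reference, such a cookie-aware WebClient is typically a minimal subclass like this sketch:

using System;
using System.Net;

// Minimal cookie-aware WebClient: attach one CookieContainer
// to every request so the login session is preserved.
class CookieWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
            http.CookieContainer = Cookies;
        return request;
    }
}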
I have a couple of questions.
1) Are there better tools than hand-coding C# to replicate a bunch of web browser interactions? I really only care about being able to download the file at a particular time each day onto a Windows box.
2) WebClient does not seem to be the best class to use for this. Perhaps it's a bit too simplistic. I tried using HttpWebRequest, but it has no facilities for encoding POST requests. Any other recommendations?
3) Although Chrome's developer tools appear to show all interaction, I find them a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted, though; the site is only accessed via HTTPS), so I can see if I'm really replicating all of the steps.
I can even post the exact code I'm using. Specifically, the site I'm pulling data from is the Standard & Poor's website. They provide the ability to create custom reports for downloading historical data, which I need for reporting, not republishing.
Using IE to download the file would be much easier than writing C# / Perl / Java code to replicate HTTP requests.
The reason is that even a slight change in the JavaScript code can break the flow.
With IE, you can automate it using COM. The following VBA example opens IE and performs a Google search:
Sub Search_Google()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate "http://www.google.com" 'load web page google.com
    While IE.Busy
        DoEvents 'wait until IE is done loading page
    Wend
    IE.Document.all("q").Value = "what you want to put in text box"
    IE.Document.all("btnG").Click
    'clicks the button named "btnG", which is Google's "Google Search" button
    While IE.Busy
        DoEvents 'wait until IE is done loading page
    Wend
End Sub
Regarding 3) "I'd be interested in seeing all of the raw communication ... so I can see if I'm really replicating all of the steps":
For this you can use Fiddler to view all the interaction going on and the raw data going back and forth. To make it work with HTTPS, you will need to install Fiddler's root certificate to enable decryption of the traffic.
