I have been using Selenium alongside C# in Visual Studio 2013. I make a call to:
driver.Navigate().GoToUrl("http://<insert webpage>");
...which opens a browser window for whichever WebDriver I have chosen.
From here, I will make calls to links/text boxes/menus as I need to.
However, I was wondering if there is a way to get the information from webpages without actually opening a browser, and if so, could someone explain how or point me in the right direction? It would save time and speed up a lot of my programs. I know applications can get information remotely without opening a browser; I just do not know how the process works or whether Selenium alone provides that ability.
I apologize if this is the wrong place to ask this question.
It is not clear whether you need to interact with the web page (for example, click links or edit text), but here are two options:
You can use PhantomJS. It is a headless browser, and since there is no UI, execution may be faster. There is a Selenium driver for it.
You can use Html Agility Pack to parse the page and WebClient to download it. No Selenium is required in that case. Html Agility Pack lets you make XPath queries and find elements by class name or ID. But you won't be able to manipulate the DOM the way you can with a real browser. It is only for parsing and navigating a static HTML page.
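The second option can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the HtmlAgilityPack NuGet package is referenced, and the class and method names (StaticPageScraper, ExtractLinks, Download) are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class StaticPageScraper
{
    // Downloads the raw HTML. No browser is launched and no JavaScript runs.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Parses HTML that is already in memory and returns the href of every link.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (var anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

A call such as StaticPageScraper.ExtractLinks(StaticPageScraper.Download(url)) would then give the links present in the HTML the server returned, which may differ from what a browser shows after scripts have run.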
Related
I'm trying to scrape a particular webpage which works as follows.
First the page loads, then it runs some sort of javascript to fetch the data it needs to populate the page. I'm interested in that data.
If I get the page with HtmlAgilityPack, the script doesn't run, so what I get is essentially a mostly blank page.
Is there a way to force it to run a script, so I can get the data?
You are getting what the server is returning - the same as a web browser. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only - it has no way to interpret the javascript or bind it to its internal representation of the document. If you wanted to run the script you would need a web browser. The perfect answer to your problem would be a complete "headless" web browser. That is something that incorporates an HTML parser, a javascript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser, except without the rendering part of it. At this time there isn't such a thing that works entirely within the .NET environment.
Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.
Also see my answer to a similar question: Load a DOM and Execute javascript, server side, with .Net which discusses the available technology in .NET to do this. Most of the pieces exist right now but just aren't quite there yet or haven't been integrated in the right way, unfortunately.
You can use Awesomium for this, http://www.awesomium.com/. It works fairly well but has no support for x64 and is not thread safe. I'm using it to scan some web sites 24x7 and it's running fine for at least a couple of days in a row but then it usually crashes.
I have a good understanding of the DOM, HTML, etc., but I'm new to C#. What's currently the best way to download and then render a webpage (executing all JavaScript, DOM changes, etc.) and simulate user interaction with it in C#?
I've seen HTML Agility Pack mentioned quite a few times, but it doesn't look like it's been updated since August 2012. Has anyone used it recently and encountered any problems? Does C# have anything built in for this?
Thanks!
First of all, HtmlAgilityPack is not for simulating user interaction with a web page. HtmlAgilityPack is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...).
HtmlAgilityPack does not support JavaScript. This is an important point, because many developers run into trouble with the difference between the fully loaded page in the browser and the response returned to HtmlAgilityPack or any other library used to make the request.
For user interaction, fully loading a web page, and web testing, I strongly recommend Selenium, which automates browsers. Selenium supports several programming languages (Java, C#, Ruby, Python, etc.); you can read more at the above link, which has very good documentation.
The only drawback of Selenium is that it opens a browser to do the work, but in some environments it can be set up to run a headless browser. You can read more about this in the following links:
Selenium Headless Automated Testing in Ubuntu
Headless Browser and scraping - solutions
I hope this helps.
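As a rough illustration of running Selenium without a visible browser window, the sketch below starts Chrome with its headless command-line switch. This is an assumption layered on top of the answer, which does not name Chrome: it presumes a Chrome and chromedriver installation recent enough to support headless mode, and the names HeadlessExample, BuildOptions, and GetTitle are hypothetical.

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome; // assumption: Selenium.WebDriver plus a matching chromedriver

public static class HeadlessExample
{
    // Builds Chrome options that ask the browser to run without a window.
    public static ChromeOptions BuildOptions()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");
        return options;
    }

    // Loads the page in headless Chrome and returns its title.
    public static string GetTitle(string url)
    {
        // Disposing the driver also shuts the browser process down.
        using (IWebDriver driver = new ChromeDriver(BuildOptions()))
        {
            driver.Navigate().GoToUrl(url);
            return driver.Title;
        }
    }
}
```

Scripts on the page still run, because a real browser engine is doing the work; only the window is absent.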
Could anyone please advise me on the best framework or library for web browser automation? The task is to open a web page in a browser, sign in, perform some long searches, and save the gathered information to Excel. Right now I'm using the IE references in C#, but at work I could use only IE8. I have upgraded it to IE9, but some scripts on the target sites started producing errors.
I tried Awesomium, but as far as I can tell I couldn't parse the page with it. Are there any options that do this at high speed? The size of the libraries doesn't matter.
If there are any solutions compatible with Scala it would be great.
As om-nom-nom hinted already, your best bet is probably a WebDriver implementation like Selenium WebDriver. It has bindings for C# and Java and can drive IE, Firefox, Chrome, PhantomJS (great if you want to go headless), and others.
Note that it might not be the best idea to also gather the information directly through the WebDriver, especially if the site content changes quickly. In such cases it can be useful to save the HTML page source with WebDriver and then switch to a more efficient library for static content, like JSoup.
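The snapshot-then-parse idea could look roughly like this in C#. JSoup is a Java library, so this sketch swaps in Html Agility Pack as the static parser; that substitution, and the names SnapshotScraper, Snapshot, and TextsAt, are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: used here as a C# stand-in for JSoup
using OpenQA.Selenium;

public static class SnapshotScraper
{
    // Copies the page source as the browser currently sees it into a static document.
    // After this call, no further round trips to the browser are needed.
    public static HtmlDocument Snapshot(IWebDriver driver)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(driver.PageSource);
        return doc;
    }

    // Returns the trimmed text of every node matching the XPath expression.
    public static List<string> TextsAt(HtmlDocument doc, string xpath)
    {
        var texts = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return texts; // SelectNodes yields null when there is no match
        }
        foreach (var node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }
}
```

The design point is that every WebDriver element lookup is a call into the browser, while queries against the snapshot run in-process and cannot be invalidated by the page changing underneath them.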
I'd like to create a web page where you can enter your domain name and have it fetch the page and show you all the resources, their download times, etc., similar to Firefox's Net tab.
Here's the page which I'd like to emulate: http://tools.pingdom.com/
Now, I know this is a complex feature, but I'd like to hear general ideas. I know I could easily fetch the HTML via a WebClient, but that's the easy part. I need to fetch and time all the resources too, and not all at the same time. I want to mimic a browser. So, I thought about using something like System.Windows.Forms.WebBrowser, but that will only really give me the page load time.
Anyone have any thoughts / tips?
Using the Html Agility Pack you can easily find which external resources are referenced from an HTML page.
This won't tell you exactly when they would be loaded by the browser, and also won't help you with dynamically loaded resources, but is a good start.
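A minimal sketch of that first step, listing the resources a page references, might look like the following. It assumes the HtmlAgilityPack NuGet package; the names ResourceFinder, FindExternalResources, and Collect are hypothetical, and only script, img, and link elements are inspected here.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ResourceFinder
{
    // Returns the URLs referenced by script, img, and link elements in the markup.
    // Resources requested later by JavaScript are not visible to this approach.
    public static List<string> FindExternalResources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        Collect(doc, "//script[@src]", "src", urls);
        Collect(doc, "//img[@src]", "src", urls);
        Collect(doc, "//link[@href]", "href", urls);
        return urls;
    }

    // Appends the named attribute of every node matching the XPath expression.
    private static void Collect(HtmlDocument doc, string xpath, string attribute, List<string> urls)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return; // no matching elements
        }
        foreach (var node in nodes)
        {
            urls.Add(node.GetAttributeValue(attribute, ""));
        }
    }
}
```

Each returned URL could then be downloaded and timed separately, keeping in mind that relative URLs still need to be resolved against the page address first.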
I'm afraid the only way to be sure is to instantiate an entire browser. You could use a plug in for the Fiddler HTTP debugging proxy to intercept requests from the WebBrowser control to determine which resources are actually loaded in this case.
I need to display HTML in my Silverlight application and cannot find a way of doing it. I cannot use the WebBrowser control, as the application needs to be able to run in or out of a browser.
Does anyone know of a good way to do this? All I can think of at the moment is running replace methods on the text to swap the tags for C# equivalents (e.g. <br /> to \n).
The way I do it is to check if the application is running inside the browser and change the means of display accordingly. If running inside the browser, I overlay the application with an IFrame, as I describe in this article: http://www.silverlightshow.net/items/Building-a-Silverlight-Line-Of-Business-Application-Part-6.aspx. Otherwise, I use the WebBrowser control. I have a control which does this all for you in the source code that accompanies my book, which is downloadable from the Apress website here: http://www.apress.com/book/downloadfile/4638.
Hope this helps...
Chris
I believe what you are looking for is HTML Bridge.
Edit: I'm actually now unsure whether you'll still have access to JavaScript if you're running this OOB. I'm going to look into this some more and will update further. I'll leave the answer up for reference, though.
Second Edit: Here is what I've found. HTML Bridge is disabled when you run Silverlight out of browser. This disables access to the HTML DOM as well as JavaScript. However, according to a comment on this site:
HTML Bridge is not available when you first install a OOB app. But you CAN force it if you modify the index.html in the folder where the app is installed just adding the enablehtmlaccess parameter.
It works!
You can even create dynamic HTML elements using the well-known methods of the HtmlPage class. You can even open a new browser window with the Navigate() method and its "_blank" parameter.
Keep in mind this information was posted about SL 3. It's possible that this has changed, but I doubt it. So what you may want to do is build a check into the startup of your SL app that detects whether or not the app is running out of browser. If it is, you could then call a script that modifies this file for you.
There recently was a similar question.
I posted a link there to an implementation that parses and displays HTML inline in Silverlight. Of course, it will work only with simple HTML, but maybe you can expand it to your needs.