Emulating a browser programmatically in C# / .NET

We would like to automate certain tasks in a website, like having a user 'login', perform some functionality, read their account history etc.
We have tried emulating this with plain GET/POST requests. The problem is that, for 'login' for example, the website uses JavaScript to make an AJAX call and also generates some random tokens.
Is it possible to literally emulate a web-browser? For example:
Visit 'www.[test-website].com'
Fill in these DOM items
DOM item 'username' fill in with 'testuser'
DOM item 'password' fill in with 'testpass'
Click button DOM item 'btnSubmit'
Visit account history
Read HTML (So we can parse information about each distinct history item)
...
The above could be translated into say the below sample code:
var browser = new Browser();
var pageHomepage = browser.Load("www.test-domain.com");
pageHomepage.DOM.GetField("username").SetValue("testUser");
pageHomepage.DOM.GetField("password").SetValue("testPass");
pageHomepage.DOM.GetField("btnSubmit").Click();
var pageAccountHistory = browser.Load("www.test-domain.com/account-history/");
var html = pageAccountHistory.GetHtml();
var historyItems = parseHistoryItems(html);

You could use for example Selenium in C#. There is a good tutorial: Data Driven Testing Using Selenium (webdriver) in C#.
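As a rough sketch of what the steps from the question could look like with Selenium WebDriver in C#: this assumes the Selenium.WebDriver NuGet package, a locally installed Chrome with a matching driver, and that 'username', 'password' and 'btnSubmit' are element ids on the page (the question does not say whether they are ids or names).

```csharp
// Sketch only: requires the Selenium.WebDriver NuGet package and Chrome.
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class LoginSketch
{
    static void Main()
    {
        // Disposing the driver also closes the browser window.
        using (IWebDriver driver = new ChromeDriver())
        {
            // Placeholder URL taken from the question.
            driver.Navigate().GoToUrl("http://www.test-domain.com");

            // Assumption: the fields and the button are addressable by id.
            driver.FindElement(By.Id("username")).SendKeys("testUser");
            driver.FindElement(By.Id("password")).SendKeys("testPass");
            driver.FindElement(By.Id("btnSubmit")).Click();

            driver.Navigate().GoToUrl("http://www.test-domain.com/account-history/");

            // PageSource is the driver's serialization of the current page.
            string html = driver.PageSource;
            Console.WriteLine(html.Length);
        }
    }
}
```

Because a real browser runs the page, the login JavaScript and its tokens are handled by the site's own code. If the login is an AJAX call, you would probably need to wait for it to finish before loading the history page.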

I would suggest instantiating a WebBrowser control in code and doing all your work with this instance, without ever showing it on a form. I've done this several times and it works pretty well. The only flaw is that it relies on Internet Explorer ;-)

Try JMeter. It is a nice tool for automating web requests, and it is also popular for performance testing of websites.

Or just try System.Windows.Forms.WebBrowser, for example:
// Navigate, then pump the message loop until the page has finished loading.
this.webBrowser1.Navigate("http://games.powernet.com.ru/login");
while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
    System.Windows.Forms.Application.DoEvents();
// Fill in the login and password fields, looked up by element id.
HtmlDocument doc = webBrowser1.Document;
HtmlElement elem1 = doc.GetElementById("login");
elem1.Focus();
elem1.InnerText = "login";
HtmlElement elem2 = doc.GetElementById("pass");
elem2.Focus();
elem2.InnerText = "pass";

Related

C# Download a complete weird HTML page

I'm sorry if this question has already been answered, but I have literally spent more than two weeks searching the Internet for a solution to my issue.
Now, I definitely do not perform the best Google searches, and it might seem that my question has several effective answers on the Internet, but I really tried every single solution that I found, without any positive results.
What I'm trying to do is simple, and I did it successfully on many websites:
Navigating to a website using WebBrowser (1).
Waiting for everything to load properly (document completed event).
Download the page using DocumentText property (1).
(1) : I also use WebClient from time to time.
And there it is: I get the HTML page, and I can use it any way I like. The issue is with one particular website, where I cannot obtain the full content in spite of trying all the different solutions that I found. I suspected that this page might need to load several scripts before the full content is available. Then again, I read that WebBrowser runs all the necessary scripts before triggering the "completed" event, so apparently that's not the issue. The page that I'm asking about is: http://www.coolmod.com/tarjetas-graficas-nvidia-pci-express
After the WebBrowser loads the entire page, I tried looking for random elements using the GetElementById method and checking whether I get a null result. When I try to get an element that does not belong to the products list, I'm successful. But whenever I try to get an element that belongs to the list itself, I always get null. That means the list itself does not load, and I really don't know why. By the way, I do not prevent WebBrowser.Navigate() from delivering multiple responses; I let it raise the event as many times as it wants, and still the product list does not load, even when I pass the cookies. I even tried copying all the content of the document and pasting it through the clipboard. Here is a simple example of what I'm trying to do:
private void catalogueDownload()
{
    System.Windows.Forms.WebBrowser wb = new System.Windows.Forms.WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Catalogue_DocumentCompleted);
    wb.Navigate("http://www.coolmod.com/tarjetas-graficas-nvidia-pci-express");
}
public void Catalogue_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var wb = sender as System.Windows.Forms.WebBrowser;
    string output = wb.DocumentText;
    File.WriteAllText("testing.html", output);
}
Thanks for giving up your time to read all this.
System.Windows.Forms.WebBrowser is a bit outdated. If I were you, I would consider using an external library for this. Selenium would be my first choice, given that it has all the necessary integrations with the .NET Framework (and a lot of other languages).
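A minimal sketch of that suggestion, assuming the Selenium.WebDriver and Selenium.Support NuGet packages and a local Chrome install. The CSS selector ".product-item" is purely hypothetical; you would replace it with whatever actually identifies an entry of the product list on that page.

```csharp
// Sketch only: requires Selenium.WebDriver, Selenium.Support and Chrome.
using System;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class CatalogueDownloadSketch
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("http://www.coolmod.com/tarjetas-graficas-nvidia-pci-express");

            // Poll for up to 15 seconds until at least one list entry exists.
            // ".product-item" is a made-up selector, not taken from the site.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
            wait.Until(d => d.FindElements(By.CssSelector(".product-item")).Count > 0);

            // Save the page only after the script-generated list is present.
            File.WriteAllText("testing.html", driver.PageSource);
        }
    }
}
```

The point of the explicit wait is that a "page loaded" signal does not guarantee that content injected later by scripts is already there. If the list never appears, Until throws a WebDriverTimeoutException instead of silently saving an incomplete page.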

Selenium Webdriver not returning Javascript code

Hi, I am new to Selenium WebDriver. I can successfully open a webpage and find elements on it.
In one case I have noticed that there is a link on a page that becomes clickable after a while. In Firebug, on the Script tab, I can see the JavaScript code that implements the timer.
But using Selenium Webdriver if I issue:
driver.PageSource
I cannot see the source code for the Javascript. Delaying for 30 seconds before requesting the source makes no difference. I have tried finding it with various By options using:
driver.FindElement
and so on, but it isn't there.
How does Firebug manage to find and show the JavaScript source code? Is there a way that I can coerce Selenium WebDriver to return all code referenced by the page?
Or is there a better approach?
Thanks for any advice!
EDIT---------------------
I tried the following in Firefox:
Dim Driver2 As IWebDriver = New Chrome.ChromeDriver
Driver2.Url = "http://mypage"
Dim js As IJavaScriptExecutor = TryCast(Driver2, IJavaScriptExecutor)
Dim title As String = DirectCast(js.ExecuteScript("return JSON.stringify(window)"), String)
and I got
Permission denied to access property 'toJSON'
I read that this won't work in Firefox, so I tried in Chrome and got
Blocked a frame with origin "http://mypage" from accessing a
cross-origin frame
and from there I found no solutions because, according to this, it's a security restriction: apparently you can't access an <iframe> with JavaScript.
I'm starting to think I'm a bit out of my depth here.
PageSource probably doesn't return an exact snapshot of the DOM and everything the page references.
You can instead inspect JavaScript state using driver.executeScript(), but the burden of analyzing the returned object may be discouraging.
Regardless, here's a contrived example:
Object result = driver.executeScript("return JSON.stringify(window)");
System.out.println(result.toString());

Filling out a web form programmatically in C# [duplicate]

I'm working on a device that runs Windows CE and I need to automate a login process. I was able to achieve this in a Forms application using the code below, but it doesn't seem like I can use the same process on the smart device. Is there a way to do the same thing while working in CE?
string butts = webBrowser1.Url.AbsoluteUri;
HtmlDocument doc = webBrowser1.Document;
HtmlElement userValue = doc.GetElementById("username");
userValue.SetAttribute("value", "user");
HtmlElement passValue = doc.GetElementById("password");
passValue.SetAttribute("value", "pass");
HtmlElement subButton = doc.GetElementById("submit");
subButton.InvokeMember("click");
The HtmlDocument class, and really all of the System.Windows.Forms.HtmlXxx objects, do not exist in the Compact Framework.
If you have a very small set of things you want to access, you might be able to roll your own implementation. You might be able to borrow some from the Mono code base as well. Otherwise, there really aren't any good answers.

Using Javascript for Google Maps API from WPF

I am creating an application that interfaces with Google's Maps API v3. My current approach is to use a WebBrowser control via WebBrowser.Navigate("Map.html"). This is working correctly at the moment; however, I am also aware of WebBrowser.InvokeScript(). I have seen this used to execute a JavaScript function, but I would like to have something like the following structure:
APICalls.js - Contains different functions that can be called, or even separated out into a file for each function if necessary.
MapInterface.cs
WebBrowser.InvokeScript("APICalls.js", args) - Or control the javascript variables directly.
I have seen the InvokeScript method used, but none of the examples gave any detail about the source of the function, so I'm not sure whether it was being called from an HTML file or a JS file. Is it possible to have a structure like this, or a similarly organized structure, rather than creating an HTML file with JavaScript in each one and using Navigate()?
Additionally, are there any easier ways to use Google Maps with WPF? I checked around, but all of the resources I found were at least 2-3 years old, which I believe predates the newest version of the Maps API.
I can't suggest a better way of using Google Maps API with WPF (although I'm sure it exists), but I can try to answer the rest of the question.
First, make sure to enable FEATURE_BROWSER_EMULATION for your WebBrowser app, so that the Google Maps API recognizes it as a modern HTML5-capable browser.
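One common way to do that is a per-user registry entry named after your executable. The following is a sketch under the assumption that you want IE11 edge mode (DWORD value 11001) and that the code runs before the first WebBrowser control is created; the class and method names are made up for illustration.

```csharp
// Sketch only: Windows-specific, uses the per-user registry hive.
using System.Diagnostics;
using Microsoft.Win32;

static class BrowserEmulation
{
    // Call once at startup, before any WebBrowser control is instantiated.
    public static void Enable()
    {
        // The value name must match the name of the hosting executable.
        string exeName = Process.GetCurrentProcess().ProcessName + ".exe";

        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(
            @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION"))
        {
            // 11001 selects IE11 edge mode regardless of the page's doctype.
            key.SetValue(exeName, 11001, RegistryValueKind.DWord);
        }
    }
}
```

Writing under HKEY_CURRENT_USER avoids the need for administrator rights. The emulation level you can actually get depends on the Internet Explorer version installed on the machine.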
Then, navigate to your "Map.html" page and let it finish loading. Here's how it can be done using async/await (the code is for the WinForms version of WebBrowser control, but the concept remains the same).
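A minimal sketch of that idea for the WinForms control, assuming a single top-level document: wrap the DocumentCompleted event in a Task so the caller can await it. NavigateAsync is a hypothetical helper name, not part of the framework.

```csharp
// Sketch only: targets System.Windows.Forms.WebBrowser.
using System.Threading.Tasks;
using System.Windows.Forms;

static class WebBrowserAsyncExtensions
{
    // Starts navigation and completes the task on the first DocumentCompleted.
    public static Task NavigateAsync(this WebBrowser webBrowser, string url)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe so later navigations do not touch this task.
            webBrowser.DocumentCompleted -= handler;
            tcs.TrySetResult(true);
        };
        webBrowser.DocumentCompleted += handler;
        webBrowser.Navigate(url);
        return tcs.Task;
    }
}
```

From an async event handler on the UI thread you could then write `await webBrowser.NavigateAsync("Map.html");` and inject the script afterwards. Note that DocumentCompleted also fires for frames, so a page with frames may complete this task before the whole page is ready.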
You can have your APICalls.js as a separate local file, but you'd need to create and populate a <script> element for it from C#. You do it once for the session.
Example:
var scriptText = File.ReadAllText("APICalls.js");
dynamic htmlDocument = webBrowser.Document;
var script = htmlDocument.createElement("script");
script.type = "text/javascript";
script.appendChild(htmlDocument.createTextNode(scriptText));
htmlDocument.body.appendChild(script);
Then you can call functions from this script in a few different ways.
For example, your JavaScript entry point function in APICalls.js may look like this:
(function() {
    window.callMeFromCsharp = function(arg1, arg2) {
        window.alert(arg1 + ", " + arg2);
    }
})();
Which you could call from C# like this:
webBrowser.InvokeScript("callMeFromCsharp", "Hello", "World!");
[UPDATE] If you're looking for a bit more modular or object-oriented approach, you can utilize the dynamic feature of C#. Example:
JavaScript:
(function() {
    window.apiObject = function() {
        return {
            property: "I'm a property",
            Method1: function(arg) { alert("I'm method 1, " + arg); },
            Method2: function() { return "I'm method 2"; }
        };
    }
})();
C#:
dynamic apiObject = webBrowser.InvokeScript("apiObject");
string property = apiObject.property;
MessageBox.Show(property);
apiObject.Method1("Hello!");
MessageBox.Show(apiObject.Method2());

Can't get proper information from amazon.com using C# / HtmlAgilityPack

I want to get book information such as author name, page count, publication year, etc. from Amazon using HtmlAgilityPack, but the Amazon pages seem to have some problems and I can't access the appropriate fields.
Here is what I've done:
I use Firefox and Firebug + FirePath to retrieve the desired XPath, and then in my code I use HtmlAgilityPack to get the information with the XPath I got from Firebug.
But no luck: so far I couldn't access the "Product Details" part of the amazon.com page.
This is my XPath (which works only with HtmlAgilityPack):
HtmlAgilityPack.HtmlNodeCollection cnt = doc.DocumentNode.SelectNodes("//*[@class='content']");
int i=1;
foreach (HtmlAgilityPack.HtmlNode content in cnt)
{
    if (i != 3)
    {
        i++;
        continue;
    }
    if (i == 3) // i==3 means I've reached the product details but I can't go any further :(
    {
        s = content.SelectSingleNode("").OuterHtml;
        // break;
    }
}
How can I access Product Details using an appropriate, understandable XPath for HtmlAgilityPack?
And why is the syntax of the Firebug + FirePath XPath different from HtmlAgilityPack's?
As @Mystere said, I suggest using the API. But if you are doing this for testing purposes, or just because you want to use web scraping to obtain the info (I'm not sure whether Amazon allows it; you should check before doing this), here is the thing:
Why are you doing this?
s = content.SelectSingleNode("").OuterHtml;
The following is what you are looking for in case you want to get the HTML source of that part of the page.
s = content.OuterHtml;
When you are scraping, I suggest you try to identify the part you need to scrape and look at the particularities of that block of content.
If you use:
var node = doc.DocumentNode.SelectSingleNode("//td[@class='bucket']/div[@class='content']");
that will give you the Product Details block you are looking for.
If you want to get some fields like Paperback, Publisher, ... you can do:
string paperback = node.SelectSingleNode("./ul/li[1]/text()").InnerText;
string publisher = node.SelectSingleNode("./ul/li[2]/text()").InnerText;
string language = node.SelectSingleNode("./ul/li[3]/text()").InnerText;
...
If you want to be sure that the XPath you are using is correct for HtmlAgilityPack, open the page in Internet Explorer 8 (or 9) and use the Developer Tools (F12) to get the XPath. The thing is that each browser renders the HTML in its own way. For example, Firefox always shows a <tbody> tag right after a <table>, even when the page source has none. HtmlAgilityPack may not add it, so the simple detail of including /tbody/ in your XPath can make your program fail.
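Putting the pieces together, here is a small sketch that assumes the HtmlAgilityPack NuGet package and a page whose Product Details block matches the XPath above. ProductDetailsSketch and ReadDetails are names invented for this example, and real Amazon markup may well differ.

```csharp
// Sketch only: requires the HtmlAgilityPack NuGet package.
using System.Collections.Generic;
using HtmlAgilityPack;

static class ProductDetailsSketch
{
    // Returns the text of each <li> in the Product Details block,
    // or an empty list when the block is not found.
    public static List<string> ReadDetails(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var details = new List<string>();

        // No /tbody/ step here: the XPath follows the markup as parsed.
        HtmlNode content = doc.DocumentNode.SelectSingleNode(
            "//td[@class='bucket']/div[@class='content']");
        if (content == null) return details;

        // SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection items = content.SelectNodes("./ul/li");
        if (items == null) return details;

        foreach (HtmlNode li in items)
            details.Add(HtmlEntity.DeEntitize(li.InnerText).Trim());
        return details;
    }
}
```

Checking for null after every SelectSingleNode or SelectNodes call matters when scraping, because a small change in the page layout otherwise surfaces as a NullReferenceException far from the XPath that caused it.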
Why don't you just use Amazon's web service API, which is designed to do this?
