I am a hobbyist programmer and want to do the following:
(1) Log into a site with a username and password
(2) Click an image that directs me to a certain (sub)site
(3) Fill out a form
(4) Before submitting the form, load the whole page and check the values my application entered into the inputs.
I wrote several web scrapers / parsers in the last few months (all in Java), but now I am facing quite a few difficulties with C# and .NET.
I am using the Visual Studio 2015 IDE and, if possible, I do NOT want to use third-party tools or plugins (so please avoid answers pointing to HtmlAgilityPack, JSoup equivalents or similar). Everything core to .NET (or C# in general) is fine, though.
(1) and (2) do already work, I can log in supplying username and password, and then click() on the picture and get redirected to the form using the SAME code as below.
Currently my code looks like the following (note: it is a WPF project, NOT WinForms, using IHTMLDocument2):
using System;
using System.Windows;
using System.Windows.Input;
using System.Windows.Navigation;
using mshtml;
using System.Diagnostics;
public partial class MainWindow : Window
{
private CustomObject testObject;
public IHTMLDocument2 doc2;
public MainWindow()
{
InitializeComponent();
Browser.Navigate("https://www.xXx.xXx/"); //The Browser is the standard WPF WebBrowser
// Several other functions like LoadComplete etc. here
}
// Function for login
private void Browser_Login(object sender, RoutedEventArgs e)
{
doc2.all.item("ID_OF_USERNAME_TEXTFIELD").value = "MyUsername";
doc2.all.item("ID_OF_PASSWORD_TEXTFIELD").value = "MyPassWord";
doc2.all.item("NAME_OF_SUBMIT_BUTTON").click();
}
private void Browser_ClickOnImageLinkToGetToForm(object sender, RoutedEventArgs e)
{
// Logic to get to the Form, everything works as expected
}
private void Browser_FillForm(object sender, RoutedEventArgs e)
{
doc2.all.item("NAME_OF_THE_TEXTFIELD_TO_FILL").value = "Text I want to put into the field"; // --> Exception is thrown here!
// repeat for all other text fields and a couple of other input elements
}
}
Every time I run the code I do the following:
Start application --> WebBrowser opens, directs me to the homepage.
Click Button1 (Browser_Login) --> autofill username and password --> click Submit (I am logged in now)
Click Button2 (Browser_ClickOnImageLinkToGetToForm) --> "Click()" on Image, get redirected to the form.
Click Button3 (Browser_FillForm) --> RuntimeBinderException:
Additional information: 'System.__ComObject' does not contain a definition for 'textContent' OR definition for 'value' OR definition for 'innerText' OR definition for 'InnerHtml' etc.
I have tried A LOT of different things, none seem to work.
The text field I want to fill has the following properties:
<input class="TxtField1" maxlength="800" type="text" id="Title" name="Title" value="" onkeyup="checkField(this.name);" onblur="checkField(this.name);" style="width: 550px; cursor: help; background-color: rgb(228, 234, 224);" title="Some Title">
I have never encountered such problems coding in Java. Some people mention I should check for 32/64-bit systems, and some suggested writing a wrapper for the __COM object, among other things. I don't want to write a wrapper, though, nor do I want to check for 32/64-bit; I want it to run on every system.
Could someone provide a simple, standard .NET / C# solution for this? Please keep in mind that I am a hobbyist, NOT a professional developer, so I may not understand very in-depth examples right away (though I will definitely learn them).
TL;DR:
How do I fill a form that checks its content on keyup, using the WPF WebBrowser control with .NET / C#?
There are a lot of things you can do if you get hold of the DOM like this:
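// Polls until the WebBrowser exposes a document. Intended for a background
// thread: on the UI thread, Thread.Sleep would block the message loop the
// browser needs in order to load the page.
private dynamic GetDOM(WebBrowser wb)
{
    dynamic document = null;
    System.Threading.Thread.Sleep(500);
    while (document == null)
    {
        // The WebBrowser may only be accessed on its own (UI) thread
        Dispatcher.Invoke(() => document = wb.Document);
        System.Threading.Thread.Sleep(100);
    }
    return document;
}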
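I'm not sure why your approach wasn't working, but I copied this code from a working solution. If you are already on the UI thread, drop the Dispatcher call and the polling loop and read wb.Document directly (for example in the LoadCompleted handler), because sleeping on the UI thread stops the page from loading.
The result is a COM object that exposes many of the same methods as the DOM in JavaScript, so to set a field's text you can do something like this: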
document.getElementById("bob").value = "fred";
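For the form in the question, a typed variant may be easier to debug than dynamic. One likely cause of the RuntimeBinderException is that the dynamic binder cannot always resolve members on a raw System.__ComObject; doc2.all.item(name) can also return a collection instead of a single element when several elements share that name or id. The sketch below is an illustration under assumptions, not the answerer's code: FormFiller and SetInputValue are hypothetical names, it needs a reference to Microsoft.mshtml, and it assumes the field can be found by its id (Title in the question's HTML). After setting the value it fires onkeyup so the page's checkField validation sees the new text.

```csharp
using mshtml; // reference: Microsoft.mshtml ("Microsoft HTML Object Library")

public static class FormFiller
{
    // Sets the value of the <input> with the given id, then fires its
    // onkeyup handler so page-side checks such as checkField(this.name) run.
    // Returns false if there is no document yet or no matching <input>.
    public static bool SetInputValue(object document, string id, string value)
    {
        var doc3 = document as IHTMLDocument3;
        if (doc3 == null)
            return false;

        IHTMLElement element = doc3.getElementById(id);
        var input = element as IHTMLInputElement;
        if (input == null)
            return false;

        // Typed COM interface instead of 'dynamic': no runtime binder involved.
        input.value = value;

        // IHTMLElement3.FireEvent raises the named event on the element;
        // a null event object lets MSHTML create a default one.
        object eventObject = null;
        ((IHTMLElement3)element).FireEvent("onkeyup", ref eventObject);
        return true;
    }
}
```

In Browser_FillForm this would look like FormFiller.SetInputValue(Browser.Document, "Title", "Some text"); repeated per field. If a field has only a name and no id, IHTMLDocument3.getElementsByName is the corresponding lookup.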
Related
I am trying to create a program in C# (maybe using WinForms) which will enter a licence number into a form field of a specific website and validate whether or not the licence number is a currently valid licence.
I am unsure as to where to start, as I can't even find the form field id in the source code of the website, and am unsure what technologies the website uses.
Additionally, the purpose of this program will be to enter a list of licence numbers and return the name and validation status for each licence, with the website as the data source.
Any information on how to go about this would be much appreciated. I am an intermediate C# developer, having mostly worked in ASP.NET, though I feel WinForms may be better suited for this project.
Kind Regards
You can use a WebBrowser control:
You can load the page using webBrowser1.Navigate("url of site")
Find elements in the page using webBrowser1.Document.GetElementById("buttonid"). You can also iterate over the HtmlElement items of webBrowser1.Document.Body.All and check, for example, element.GetAttribute("value") == "some value" to find the one you need.
Set value for element using element.InnerText ="some value" or element.SetAttribute("value", "some value")
Submit your form by invoking the submit of form or click of its submit button using element.InvokeMember("method")
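The attribute-based search from the second step can be wrapped in a small helper. This is a minimal sketch for the WinForms control; HtmlLookup and FindByAttribute are hypothetical names. It returns the first match or null, so callers still need a null check. One quirk that may matter: older MSHTML document modes expect GetAttribute("className") rather than "class".

```csharp
using System;
using System.Linq;
using System.Windows.Forms;

public static class HtmlLookup
{
    // Returns the first element in <body> whose attribute equals 'expected',
    // or null if nothing matches or no document is loaded.
    public static HtmlElement FindByAttribute(HtmlDocument document, string attribute, string expected)
    {
        HtmlElement body = document?.Body;
        if (body == null)
            return null;

        return body.All
            .Cast<HtmlElement>()
            .FirstOrDefault(el => string.Equals(el.GetAttribute(attribute), expected, StringComparison.Ordinal));
    }
}
```

For example, HtmlElement button = HtmlLookup.FindByAttribute(webBrowser1.Document, "value", "Search"); followed by if (button != null) button.InvokeMember("click"); clicks a submit button whose caption is "Search".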
Example
For example, if you browse to Google and look at the page source, you will see that the name of the search text box is "q" and the name of the form that contains it is "f", so you can write this code to automate a search.
Create a form named BrowserSample.
From the Toolbox, drag a WebBrowser onto the form.
Handle the Load event of the form and navigate to Google.
Handle the DocumentCompleted event of webBrowser1: find f and q, set the InnerText of q and invoke submit on f. This event fires after navigation and document loading have completed.
In a real application add required null checking.
Code:
// Requires: using System.Linq; using System.Windows.Forms;
private void BrowserSample_Load(object sender, EventArgs e)
{
    this.webBrowser1.Navigate("https://www.google.com/");
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Because submitting f causes navigation, check the URL of the
    // navigation to prevent a loop: if it differs from the Google URL, return.
    if (e.Url.AbsoluteUri != "https://www.google.com/")
        return;
    var f = this.webBrowser1.Document.Body.All.GetElementsByName("f")
        .Cast<HtmlElement>()
        .FirstOrDefault();
    if (f == null)
        return; // form not found (the page markup may have changed)
    var q = f.All.GetElementsByName("q")
        .Cast<HtmlElement>()
        .FirstOrDefault();
    if (q == null)
        return;
    q.InnerText = "C# Webbrowser Control";
    f.InvokeMember("submit");
}
If you run the program, it first navigates to Google and then shows the search results.
In your special case
Since the site loads its content using AJAX, you need to add a delay in DocumentCompleted:
async void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url.AbsoluteUri != "https://www.onegov.nsw.gov.au/PublicRegister/#/publicregister/search/Security")
        return;
    // Give the page's AJAX calls time to build the search form
    await Task.Delay(5000);
    var f = this.webBrowser1.Document.Body.All.GetElementsByName("searchForm")
        .Cast<HtmlElement>()
        .FirstOrDefault();
    if (f == null)
        return;
    var q = f.All.GetElementsByName("searchText")
        .Cast<HtmlElement>()
        .FirstOrDefault();
    if (q == null)
        return;
    q.InnerText = "123456789";
    f.InvokeMember("submit");
}
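Don't forget to add using System.Threading.Tasks;. If you target .NET 4.0, you can remove async/await and delay some other way, but avoid System.Threading.Thread.Sleep(5000): DocumentCompleted runs on the UI thread, so sleeping there also stops the page's own scripts and AJAX callbacks from running.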
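On .NET 4.0, a one-shot System.Windows.Forms.Timer gives a delay that keeps the message loop (and therefore the page's scripts) running. A minimal sketch assuming a WinForms app; UiDelay and RunAfter are hypothetical helper names, not framework APIs.

```csharp
using System;
using System.Windows.Forms;

public static class UiDelay
{
    // Runs 'action' once on the UI thread after 'milliseconds' without
    // blocking the message loop, unlike Thread.Sleep.
    public static void RunAfter(int milliseconds, Action action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));

        var timer = new System.Windows.Forms.Timer { Interval = milliseconds };
        timer.Tick += (s, args) =>
        {
            // Stop first so the action runs exactly once
            timer.Stop();
            timer.Dispose();
            action();
        };
        timer.Start();
    }
}
```

In a non-async DocumentCompleted handler you would call UiDelay.RunAfter(5000, () => { ... }) and move the searchForm/searchText lookup and the submit into the lambda.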
It looks like the website uses JSON POSTs. In Firefox, open Developer Tools -> Network and look at the "PerformSearch" entry. It shows everything the website expects in the POST request, so you can send the same request yourself and read the response.
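Replaying that request directly could look roughly like the sketch below. Everything site-specific is an assumption: the URL is a placeholder, and the "searchText" property name is a guess that must be replaced with whatever the Network panel actually shows. LicenceSearch, BuildBody and SearchAsync are hypothetical names. On .NET Framework, HttpClient needs a reference to System.Net.Http.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class LicenceSearch
{
    // HYPOTHETICAL: replace with the real URL of the "PerformSearch"
    // request shown in the browser's Network panel.
    private const string SearchUrl = "https://example.com/PerformSearch";

    // Builds the JSON request body. The property name "searchText" is an
    // assumption; copy the exact shape the Network panel shows.
    public static string BuildBody(string licenceNumber)
    {
        if (licenceNumber == null)
            throw new ArgumentNullException(nameof(licenceNumber));
        // Escape backslashes and quotes so the value stays a valid JSON string.
        string escaped = licenceNumber.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{\"searchText\":\"" + escaped + "\"}";
    }

    // Posts the search and returns the raw JSON response text.
    public static async Task<string> SearchAsync(HttpClient client, string licenceNumber)
    {
        using (var content = new StringContent(BuildBody(licenceNumber), Encoding.UTF8, "application/json"))
        using (HttpResponseMessage response = await client.PostAsync(SearchUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Looping over the licence list and awaiting SearchAsync for each number would return one JSON document per licence, from which the name and status can be parsed. If the site also relies on cookies or anti-forgery tokens, a bare POST like this may be rejected.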
I'm using C#, and I've been struggling for a few days to grab the final rendered HTML from a URL.
I've tried using several browser engines, Awesomium, WebBrowser and so on, but none of them returns the actual rendered HTML of the page, as if I right clicked in chrome and chose "inspect element".
What I do is roughly the following (using the WebBrowser WinForms control):
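// Requires a reference to Microsoft.mshtml and: using mshtml;
public static string GetDomSource(WebBrowser wb)
{
    var dd = wb.Document?.DomDocument as IHTMLDocument2;
    // body.parentElement is the <html> element; its outerHTML reflects the current DOM
    return dd?.body?.parentElement?.outerHTML;
}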
(Though I don't know whether you already tried this or whether you are using WinForms at all).
To get the IHTMLDocument2 interface, I added a reference to the "Microsoft.mshtml" assembly.
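If adding the mshtml reference is inconvenient, the managed WinForms wrappers can produce similar output. A minimal sketch; DomSource and GetRenderedHtml are illustrative names, and it assumes (as with GetDomSource) that the outer HTML of the html element reflects script changes made so far, so it should be called once the page's scripts have finished.

```csharp
using System.Windows.Forms;

public static class DomSource
{
    // Returns the current markup of the <html> element using only the
    // managed HtmlDocument/HtmlElement wrappers, or null if nothing is loaded.
    public static string GetRenderedHtml(HtmlDocument document)
    {
        if (document == null)
            return null;

        HtmlElementCollection roots = document.GetElementsByTagName("html");
        return roots.Count > 0 ? roots[0].OuterHtml : null;
    }
}
```

Usage: string html = DomSource.GetRenderedHtml(webBrowser1.Document);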
I am trying to extract some information from a website. But when I navigate to it, it uses JavaScript to connect me to a server before dynamically loading a PHP page. I can follow the sequence in Chrome with the developer tools. I figured it would be easiest to reproduce it in C# with the WebBrowser control and simply navigate to the website. Then the WebBrowser control should contain all the JavaScript files, the text from the dynamically loaded PHP page and so on. But is this true, and where in the control are they stored? I can't seem to find them.
Recreating the whole request sequence you see in Chrome would be a lot of work. However, "extracting some information from a website" can be done quite easily.
Disclaimer: I assumed this question was about the WPF WebBrowser control (it would be almost the same for WinForms).
You can get the HTMLDocument once the page is loaded, using:
using mshtml; // <- don't forget to add the reference
using System.Windows;
using System.Windows.Navigation;
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
browser.Navigate("http://google.com/");
browser.LoadCompleted += browser_LoadCompleted;
}
void browser_LoadCompleted(object sender, NavigationEventArgs e)
{
HTMLDocument doc = (HTMLDocument)browser.Document;
string html = doc.documentElement.innerHTML; // already a string; outerHTML would also include the <html> tag
// from here, you should be able to parse the HTML
// or sniff the HTMLDocument (using HTML Agility Pack for instance)
}
}
From this HTMLDocument, you have access to a lot of properties, including HTML elements, CSS styles and scripts. I invite you to put a break-point and check out what best fits your needs.
Nevertheless, since the page you want to load uses JavaScript to fill in its content, the HTMLDocument will probably not be complete at the time LoadCompleted is raised.
In that case, I suggest using a timer to poll until the content is stable.
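The polling idea could be sketched with a DispatcherTimer as below. It is a heuristic under assumptions: "stable" here means the outerHTML string came back identical for a few consecutive ticks, which may not hold for pages that animate or refresh continuously. DomStabilityWatcher and NextStableCount are hypothetical names; the code needs the Microsoft.mshtml reference.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Threading;
using mshtml; // reference: Microsoft.mshtml

// Polls a WPF WebBrowser and reports the page markup once it has stopped
// changing for a few consecutive ticks.
public sealed class DomStabilityWatcher
{
    private readonly WebBrowser browser;
    private readonly Action<string> onStable;
    private readonly int requiredStableTicks;
    private readonly DispatcherTimer timer;
    private string lastHtml;
    private int stableTicks;

    public DomStabilityWatcher(WebBrowser browser, Action<string> onStable,
                               TimeSpan interval, int requiredStableTicks = 3)
    {
        this.browser = browser;
        this.onStable = onStable;
        this.requiredStableTicks = requiredStableTicks;
        timer = new DispatcherTimer { Interval = interval };
        timer.Tick += OnTick;
    }

    public void Start()
    {
        lastHtml = null;
        stableTicks = 0;
        timer.Start();
    }

    // Pure helper: number of consecutive unchanged samples after a new one.
    public static int NextStableCount(string previous, string current, int stableTicks)
    {
        return previous != null && previous == current ? stableTicks + 1 : 0;
    }

    private void OnTick(object sender, EventArgs e)
    {
        var doc = browser.Document as IHTMLDocument3;
        string html = doc?.documentElement?.outerHTML;
        if (html == null)
            return; // no document yet

        stableTicks = NextStableCount(lastHtml, html, stableTicks);
        lastHtml = html;

        if (stableTicks >= requiredStableTicks)
        {
            timer.Stop();
            onStable(html);
        }
    }
}
```

In browser_LoadCompleted you could start it with new DomStabilityWatcher(browser, html => { /* parse here */ }, TimeSpan.FromMilliseconds(500)).Start(); a running DispatcherTimer is kept alive by the dispatcher, so the watcher does not need to be stored in a field.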
You could also use the HTMLDocument to inject your own JavaScript code and call C# methods through WebBrowser.ObjectForScripting, but this is going to be much more complicated and harder to maintain.
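For reference, a small ObjectForScripting round trip might look like the following sketch. Assumptions: the WPF WebBrowser, the Microsoft.mshtml reference, and the script-element injection technique used elsewhere in this thread for suppressing script errors; ScriptCallbacks, ReportTitle and ScriptBridge are hypothetical names. WPF requires the scripting object to be public and COM-visible.

```csharp
using System.Runtime.InteropServices;
using System.Windows.Controls;
using mshtml; // reference: Microsoft.mshtml

// Must be public and COM-visible, otherwise WPF rejects it as ObjectForScripting.
[ComVisible(true)]
public class ScriptCallbacks
{
    public string LastTitle { get; private set; }

    // Page JavaScript calls this as window.external.ReportTitle(...)
    public void ReportTitle(string title)
    {
        LastTitle = title;
    }
}

public static class ScriptBridge
{
    // Registers the callback object and, after every page load, appends a
    // tiny <script> to <head> that calls back into C#.
    public static void Attach(WebBrowser browser, ScriptCallbacks callbacks)
    {
        browser.ObjectForScripting = callbacks;
        browser.LoadCompleted += (sender, e) =>
        {
            var doc = browser.Document as HTMLDocument;
            if (doc == null)
                return;

            var script = (IHTMLScriptElement)doc.createElement("script");
            script.text = "window.external.ReportTitle(document.title);";
            foreach (IHTMLElement head in doc.getElementsByTagName("head"))
            {
                ((IHTMLDOMNode)head).appendChild((IHTMLDOMNode)script);
                break; // only the first <head>
            }
        };
    }
}
```

Call ScriptBridge.Attach(browser, callbacks) once, for example in the MainWindow constructor before browser.Navigate(...), and read callbacks.LastTitle after the page has loaded.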
I have a situation where a rather clever website updates the latest information on the site via Shockwave Flash through a TCP connection. The data received is then updated onto the page via JavaScript so in order to get the latest data a browser is required. If attempts are made to hit the website with continual requests then a) you get banned and b) you're not actually getting the latest data, only the last updated base framework.
So I need to run a browser with scripts enabled.
My first question: using the standard WPF WebBrowser in .NET, I get script-error warnings that I don't get in standard IE, Chrome or Firefox. What is causing this, and how do I suppress them while still allowing the site's scripts to run?
My second question is whether there is a better way to do this, or whether there are better alternatives to the WebBrowser control that:
allow scripts to run
can access the DOM, or at least the HTML and scripts as text
is compatible with WPF
can hide the browser as I don't actually want it displayed.
So far I've looked into WebKit.NET, which doesn't seem to allow access to the DOM and didn't work well in WPF windows when I tested it, and Awesomium, which again didn't appear to allow direct access to the DOM without JavaScript.
Are there any other options (apart from hacking their scripts)?
Thank you
Set WebBrowser.ScriptErrorsSuppressed = true;
Ultimately I ended up keeping the WPF control and used this code to inject a JavaScript script to disable JavaScript errors. The Microsoft HTML Object Library needs to be added.
private const string DisableScriptError = @"function noError() { return true; } window.onerror = noError;";
private void webBrowser1_Navigated(object sender, System.Windows.Navigation.NavigationEventArgs e)
{
    InjectDisableScript();
}
private void InjectDisableScript()
{
    // Use the HTMLDocument interface rather than HTMLDocumentClass: class types
    // cannot be used when the mshtml interop types are embedded.
    HTMLDocument doc = webBrowser1.Document as HTMLDocument;
    if (doc == null)
        return;
    IHTMLScriptElement scriptErrorSuppressed = (IHTMLScriptElement)doc.createElement("SCRIPT");
    scriptErrorSuppressed.type = "text/javascript";
    scriptErrorSuppressed.text = DisableScriptError;
    IHTMLElementCollection nodes = doc.getElementsByTagName("head");
    foreach (IHTMLElement elem in nodes)
    {
        // Append the script to <head> so it runs in the page
        ((IHTMLDOMNode)elem).appendChild((IHTMLDOMNode)scriptErrorSuppressed);
    }
}
The WPF WebBrowser does not have this property; only the WinForms control does.
You'd be better off using a WindowsFormsHost in your WPF application with the WinForms WebBrowser (so that you can use ScriptErrorsSuppressed). Make sure you run in full trust.
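Hosting the WinForms control in WPF could be sketched as follows. It assumes references to WindowsFormsIntegration and System.Windows.Forms; HostedBrowserWindow is a hypothetical window built in code rather than XAML, only to keep the example self-contained.

```csharp
using System.Windows;
using System.Windows.Forms.Integration; // reference: WindowsFormsIntegration

// A WPF window whose content is the WinForms WebBrowser, so that
// ScriptErrorsSuppressed is available.
public class HostedBrowserWindow : Window
{
    private readonly System.Windows.Forms.WebBrowser browser =
        new System.Windows.Forms.WebBrowser();

    public HostedBrowserWindow(string url)
    {
        browser.ScriptErrorsSuppressed = true; // no script-error dialogs
        Content = new WindowsFormsHost { Child = browser };
        browser.Navigate(url);
    }

    // Exposes the hosted control for DOM access (Document, InvokeMember, ...)
    public System.Windows.Forms.WebBrowser Browser { get { return browser; } }
}
```

In XAML the equivalent is a WindowsFormsHost element containing the WinForms WebBrowser. Because the browser is an ordinary control, it can also be hidden if you don't want it displayed.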
I'm able to navigate to Gmail, but then I want to do something as simple as entering the credentials and clicking the login button.
private void btnSubmit_Click(object sender, EventArgs e)
{
btnSubmit.Enabled = false;
webGmail.LoadURL("http://www.gmail.com");
webGmail.LoadCompleted += ExecuteSomething;
}
private void ExecuteSomething(object sender, EventArgs eventArgs)
{
webGmail.ExecuteJavascript(@"<script src = 'http://code.jquery.com/jquery-latest.min.js' type = 'text/javascript'></script>");
webGmail.ExecuteJavascript(@"$('#Email').val('foo');");
webGmail.ExecuteJavascript(@"$('#Passwd').val('bar');");
webGmail.ExecuteJavascript(@"$('#signIn').click();");
}
Nothing happens. I know, using Chrome's developer tools, that you can't modify anything on the page. But is there a way of filling in forms?
Are there any better headless browsers? I actually need one that provides a web control I can put into my form so that I can see what is going on. This is mandatory.
The problem is that the script tag is not javascript - it's HTML - so executing it as javascript will just throw an error. To load a script with the ExecuteJavascript method, you'd need to create a script element in javascript and inject it into the page head.
See here for an example:
http://www.kobashicomputing.com/injecting-jquery-into-awesomium
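The idea can be sketched as a helper that builds the injection script in C#. This is an illustration, not the linked article's code; JQueryInjection and InjectThen are hypothetical names. The follow-up code runs in the script's onload callback, because calling $(...) immediately would fail while jQuery is still downloading. Serving jQuery over https avoids the page blocking it as mixed content on an https site.

```csharp
using System;

public static class JQueryInjection
{
    // Builds JavaScript that creates a <script> element for jQuery, appends
    // it to <head>, and runs 'afterLoad' only once jQuery has loaded.
    public static string InjectThen(string jqueryUrl, string afterLoad)
    {
        if (jqueryUrl == null) throw new ArgumentNullException(nameof(jqueryUrl));
        if (afterLoad == null) throw new ArgumentNullException(nameof(afterLoad));

        return "var s = document.createElement('script');" +
               "s.type = 'text/javascript';" +
               "s.src = '" + jqueryUrl + "';" +
               "s.onload = function () {" + afterLoad + "};" +
               "document.getElementsByTagName('head')[0].appendChild(s);";
    }
}
```

With the question's control this would be a single call: webGmail.ExecuteJavascript(JQueryInjection.InjectThen("https://code.jquery.com/jquery-latest.min.js", "$('#Email').val('foo'); $('#Passwd').val('bar'); $('#signIn').click();")); The element ids come from the question and may have changed on the live page.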
I recently came across a similar problem. I tried CefSharp, Awesomium, open-webkit-sharp and GeckoFX. The most capable was, oddly enough, WebBrowser: it lets you do almost everything directly from C#. For example, only WebBrowser let me click a submit button from C#. If you still want to use an alternative engine, I recommend open-webkit-sharp, the most capable of the others (although it has the same problem with clicking buttons).
WatiN has a JavaScript implementation for WebKit, which Awesomium is based on; the source code is free and can be downloaded from their homepage. Good luck.
This question about calling JavaScript from C# using Awesomium may also help you.