DotNetBrowser FinishLoadingFrameEvent multiple use - c#

I'm trying to load multiple pages using DotNetBrowser, and I need to know each time a new URL has finished loading:
myBro.FinishLoadingFrameEvent += delegate (object send, FinishLoadingEventArgs es)
{
if (es.IsMainFrame && es.ValidatedURL.Contains("login"))
{
DOMDocument document = myBro.GetDocument();
DOMElement user = document.GetElementById("LoginForm_login");
user.SetAttribute("value", "email");
DOMElement pass = document.GetElementById("LoginForm_password");
pass.SetAttribute("value", "pass");
DOMElement loginbtn = document.GetElementByTagName("button");
loginbtn.Click();
// can't add anything more here //
}
};
But this code only notifies me when the first page is loaded.

The FinishLoadingFrameEvent is fired for each frame loaded on the web page, even after the page is reloaded. You can use it multiple times to be notified when a browser has loaded the web page completely after the LoadURL method is called.
Here is sample code based on the documentation article https://dotnetbrowser.support.teamdev.com/support/solutions/articles/9000110055-loading-url-synchronously :
ManualResetEvent waitEvent = new ManualResetEvent(false);
browser.FinishLoadingFrameEvent += delegate(object sender, FinishLoadingEventArgs e)
{
// Wait until main document of the web page is loaded completely.
if (e.IsMainFrame)
{
waitEvent.Set();
}
};
//Load URL
browser.LoadURL("http://www.google.com");
waitEvent.WaitOne();
//The page http://www.google.com is now loaded completely
//Then, reset the event and load the next URL
waitEvent.Reset();
browser.LoadURL("http://www.microsoft.com");
waitEvent.WaitOne();
//The page http://www.microsoft.com is now loaded completely
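The same wait-then-load sequence can be folded into a loop for any number of URLs. The following is a minimal, self-contained sketch of that pattern, not DotNetBrowser code: FakeBrowser, its Loaded event, and SequentialLoader are hypothetical stand-ins so the example runs without the library, and an AutoResetEvent is swapped in for the ManualResetEvent plus explicit Reset() call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a browser: raises Loaded on a worker thread.
public sealed class FakeBrowser
{
    public event Action<string> Loaded;

    public void Load(string url)
    {
        // Simulate asynchronous loading that finishes on another thread.
        ThreadPool.QueueUserWorkItem(_ => Loaded?.Invoke(url));
    }
}

public static class SequentialLoader
{
    // Loads each URL in turn and blocks until its "finished" event arrives.
    public static List<string> LoadAll(FakeBrowser browser, IEnumerable<string> urls)
    {
        var finished = new List<string>();
        using (var waitEvent = new AutoResetEvent(false))
        {
            // One handler is registered once and reused for every page.
            Action<string> handler = url =>
            {
                finished.Add(url);
                waitEvent.Set();
            };
            browser.Loaded += handler;
            try
            {
                foreach (var url in urls)
                {
                    browser.Load(url);
                    // AutoResetEvent resets itself after releasing the waiter,
                    // so no explicit Reset() is needed between pages.
                    waitEvent.WaitOne();
                }
            }
            finally
            {
                browser.Loaded -= handler;
            }
        }
        return finished;
    }
}
```

With DotNetBrowser, the handler would presumably be the FinishLoadingFrameEvent delegate shown above (signalling only when e.IsMainFrame is true) and Load would be browser.LoadURL(url).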

Related

Get HTML source code from CefSharp web browser

I am using a CefSharp.Wpf.ChromiumWebBrowser (Version 47.0.3.0) to load a web page. At some point after the page has loaded I want to get the source code.
I have called:
wb.GetBrowser().MainFrame.GetSourceAsync()
however it does not appear to be returning all the source code (I believe this is because there are child frames).
If I call:
wb.GetBrowser().MainFrame.ViewSource()
I can see it lists all the source code (including the inner frames).
I would like to get the same result as ViewSource(). Could someone point me in the right direction please?
Update – Added Code example
Note: The address the web browser is pointing to will only work up to and including 10/03/2016. After that it may display different data, which is not what I would be looking at.
In the frmSelection.xaml file
<cefSharp:ChromiumWebBrowser Name="wb" Grid.Column="1" Grid.Row="0" />
In the frmSelection.xaml.cs file
public partial class frmSelection : UserControl
{
private System.Windows.Threading.DispatcherTimer wbTimer = new System.Windows.Threading.DispatcherTimer();
public frmSelection()
{
InitializeComponent();
// This timer will start when a web page has been loaded.
// It will wait 4 seconds and then call wbTimer_Tick which
// will then see if data can be extracted from the web page.
wbTimer.Interval = new TimeSpan(0, 0, 4);
wbTimer.Tick += new EventHandler(wbTimer_Tick);
wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
wb.FrameLoadEnd += new EventHandler<CefSharp.FrameLoadEndEventArgs>(wb_FrameLoadEnd);
}
void wb_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
{
if (wbTimer.IsEnabled)
wbTimer.Stop();
wbTimer.Start();
}
void wbTimer_Tick(object sender, EventArgs e)
{
wbTimer.Stop();
string html = GetHTMLFromWebBrowser();
}
private string GetHTMLFromWebBrowser()
{
// call the ViewSource method which will open up notepad and display the html.
// this is just so I can compare it to the html returned in GetSourceAsync()
// This is displaying all the html code (including child frames)
wb.GetBrowser().MainFrame.ViewSource();
// Get the html source code from the main Frame.
// This is displaying only code in the main frame and not any child frames of it.
Task<String> taskHtml = wb.GetBrowser().MainFrame.GetSourceAsync();
string response = taskHtml.Result;
return response;
}
}
I don't think I quite get this DispatcherTimer solution. I would do it like this:
public frmSelection()
{
InitializeComponent();
wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
}
private void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
if (e.Frame.IsMain)
{
wb.ViewSource();
wb.GetSourceAsync().ContinueWith(taskHtml =>
{
var html = taskHtml.Result;
});
}
}
I did a diff on the output of ViewSource and the text in the html variable and they are the same, so I can't reproduce your problem here.
This said, I noticed that the main frame gets loaded pretty late, so you have to wait quite a while until the notepad pops up with the source.
I was having the same issue trying to click on an item located in a frame rather than in the main frame. Using the example in your answer, I wrote the following extension method:
public static IFrame GetFrame(this ChromiumWebBrowser browser, string FrameName)
{
IFrame frame = null;
var identifiers = browser.GetBrowser().GetFrameIdentifiers();
foreach (var i in identifiers)
{
frame = browser.GetBrowser().GetFrame(i);
if (frame.Name == FrameName)
return frame;
}
return null;
}
If you have a "using" on your form for the module that contains this method you can do something like:
var frame = browser.GetFrame("nameofframe");
if (frame != null)
{
string HTML = await frame.GetSourceAsync();
}
Of course you need to make sure the page load is complete before using this, but I plan to use it a lot. Hope it helps!
Jim

How do I know when a website finished loaded into WebBrowser?

First I have this button click event:
private void toolStripButton3_Click(object sender, EventArgs e)
{
GetHtmls();
}
Then the method GetHtmls:
private void GetHtmls()
{
for (int i = 1; i < 2; i++)
{
adrBarTextBox.Text = sourceUrl + i;
getCurrentBrowser().Navigate(adrBarTextBox.Text);
targetHtmls = (combinedHtmlsDir + "\\Html" + i + ".txt");
}
}
For now the loop handles a single page, but later I will change the condition to i < 45.
getCurrentBrowser method:
private WebBrowser getCurrentBrowser()
{
return (WebBrowser)browserTabControl.SelectedTab.Controls[0];
}
Then in the load form1 event I have:
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Form1_DocumentCompleted);
And the completed event:
private void Form1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser currentBrowser = getCurrentBrowser();
StreamWriter writer = File.CreateText(targetHtmls);
writer.Write(getCurrentBrowser().DocumentText);
writer.Close();
}
What I'm doing here is loading the HTML into the web browser and creating a file on the hard disk with the HTML source content.
But I'm running into two problems:
First, the completed event keeps firing, so the HTML file is created over and over again until the document is loaded. How can I make it wait until the document has loaded and then create and write the file only once?
Second, how do I make this work when the loop condition is i < 45 instead of i < 2?
It should wait for the first page to load, write it to the file in the completed event, and only then move on to the next page in the loop, and so on, so that the pages do not overwrite each other.
The WebBrowser DocumentCompleted event doesn't behave like other completed events: it keeps firing every second or so until the HTML has finished loading.
I am not sure what the purpose of the WebBrowser is in this case. That control is meant for human interaction, not for loading X number of sites.
I would recommend using HttpWebRequest or the higher-level WebClient class. This is much easier to use in the case you show here.
The WebClient class can be used like this:
WebClient wc = new WebClient();
string html = wc.DownloadString("yourUrl");
This will wait until the request is completed and the result is returned. No need for event handlers or such. You could improve the performance by using async though.
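Applied to the loop from the question, this could look roughly like the sketch below. It assumes the same naming scheme as the question (sourceUrl + i for the address, "Html" + i + ".txt" for the file); PageSaver, SaveAll, and the optional download delegate are hypothetical names introduced here, the delegate only so the loop can be tried without network access.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageSaver
{
    // Downloads sourceUrl + 1 .. sourceUrl + count and writes each page to
    // "Html<i>.txt" in targetDir. The download delegate is an assumption of
    // this sketch; when omitted, WebClient.DownloadString is used.
    public static void SaveAll(string sourceUrl, string targetDir, int count,
                               Func<string, string> download = null)
    {
        Directory.CreateDirectory(targetDir);
        using (var wc = new WebClient())
        {
            if (download == null)
                download = url => wc.DownloadString(url);

            for (int i = 1; i <= count; i++)
            {
                // DownloadString blocks until the whole response has arrived,
                // so each file is written exactly once, in order.
                string html = download(sourceUrl + i);
                File.WriteAllText(Path.Combine(targetDir, "Html" + i + ".txt"), html);
            }
        }
    }
}
```

For the question's eventual i < 45 loop the call would be PageSaver.SaveAll(sourceUrl, combinedHtmlsDir, 44). Note that this returns the HTML as sent by the server, not a DOM modified by scripts, and that newer .NET versions mark WebClient as obsolete in favor of HttpClient.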

WebBrowser Control in ASP.NET Web Session

Scenario:
I'd like to use a WebBrowser control to proxy navigation on external websites for a research project. To do so, I tried to use the WebBrowser control to load the site within a page request and forward the received HTML with some modifications (such as changed src/href attributes and JavaScript event handlers). When a participant triggers an onclick event on the proxied website, I capture this event on the server and would like to re-trigger it within my WebBrowser control.
Problem:
I can't figure out how to handle the WebBrowser Control. Initially I thought it is just the matter of storing it as a session object, but the fact that it has to run in an STA thread makes this difficult. I need the same, active, browser object when the user invokes an onclick event to allow me to proxy this onclick on the control.
For now I use a Wrapper Class IEBrowser: System.Windows.Forms.ApplicationContext.
I copied the code from different sources, mainly from (http://www.codeproject.com/Articles/50544/Using-the-WebBrowser-Control-in-ASP-NET) but it does not consider using the same WebBrowser Control over many Requests.
Here is some of the code from the IEBrowser class:
public void Nav(string url)
{
this.url = url;
this.resultEvent = new AutoResetEvent(false);
htmlResult = null;
ths = new ThreadStart(delegate
{
// create a WebBrowser control
ieBrowser = new WebBrowser();
//Reset Session
InternetSetOption(IntPtr.Zero, INTERNET_OPTION_END_BROWSER_SESSION, IntPtr.Zero, 0);
// set WebBrowser event handlers
ieBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(IEBrowser_DocumentIsCompleted);
//make request
ieBrowser.Navigate(url);
System.Windows.Forms.Application.Run(this);
//remove WebBrowser event handler
ieBrowser.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(IEBrowser_DocumentIsCompleted);
//for now, we keep the webBrowser open
//ieBrowser.Dispose();
});
thrd = new Thread(ths);
thrd.Name = "Thread 2";
thrd.IsBackground = true;
// set thread to STA state before starting
thrd.SetApartmentState(ApartmentState.STA);
thrd.Start();
EventWaitHandle.WaitAll(new AutoResetEvent[] { resultEvent });
thrd.Join();
}
// DocumentCompleted event handler
void IEBrowser_DocumentIsCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (ieBrowser.ReadyState == WebBrowserReadyState.Complete && ieBrowser.IsBusy == false)
{
//Replace or Set IDs on every HTML Element [...]
//...
//
ieBrowser.Stop();
ExitThread();
//Dispose();
resultEvent.Set();
}
}
Limitations:
This is not about performance. I need to do this remotely, but only 1-5 people will use the site simultaneously. I know that using the WebBrowser control is probably not a good solution in general, but in this case it is exactly what I need to capture all user navigation.

Web scraping - can't get a table present in the webpage

I'm parsing the HTML of a webpage for getting some information. In my webpage, I have a <table> which I'm trying to access. But when I write the following code, 0 elements are returned:
WebBrowser csexBrowser = new WebBrowser();
HtmlElementCollection table2 = this.csexBrowser.Document.GetElementsByTagName("table");
Here, table2 has nothing: 0 elements. I'm using WinForms.
EDIT: This is the link. If you search for a name, then it will show you some results in a table.
There is a verification step which precedes access to the link you provided. In the http://www.nsopw.gov/en-US/Search/Verification document, there are no tables.
Are you sure you pass the verification URL first?
[EDIT]
Please try this:
public Form1()
{
InitializeComponent();
WebBrowser csexBrowser = new WebBrowser();
//here we say what we want to do when the DocumentCompleted event occurs
//(Navigated fires when loading has only just begun, so the table may not exist yet)
csexBrowser.DocumentCompleted += csexBrowser_DocumentCompleted;
//this takes some time
csexBrowser.Navigate("http://www.nsopw.gov/en-US/Search");
}
void csexBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//here the document is loaded and we will find the table
HtmlElementCollection table2 = ((WebBrowser)sender).Document.GetElementsByTagName("table");
}
If you insist on navigating with a browser then you must wait for the navigation to finish. Personally I dislike this approach, and I have found that the event most people rely on can fire multiple times.
Do this:
csexBrowser.Navigate(Url);
while (csexBrowser.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
This simply navigates to the given URL and doesn't continue until the page has finished loading.
Done and done.

How to make WebBrowser wait till it loads fully?

I have a C# form with a web browser control on it.
I am trying to visit different websites in a loop.
However, I cannot control the URL being loaded into the web browser control on my form.
This is the function I am using for navigating through URL addresses:
public String WebNavigateBrowser(String urlString, WebBrowser wb)
{
string data = "";
wb.Navigate(urlString);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
data = wb.DocumentText;
return data;
}
How can I make my loop wait until it fully loads?
My loop is something like this:
foreach (string urlAddresses in urls)
{
WebNavigateBrowser(urlAddresses, webBrowser1);
// I need to add a code to make webbrowser in Form to wait till it loads
}
Add this to your code:
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Then fill in this function:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
//This line is so you only do the event once
if (e.Url != webBrowser1.Url)
return;
//do your actual work here
}
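The URL check above can also be wrapped into an awaitable helper, so that a loop over several addresses reads sequentially without polling. This is only a sketch under stated assumptions: WebBrowserExtensions and NavigateAndWaitAsync are hypothetical names, the code must be called on the UI thread that owns the control, and it has no timeout or error handling.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class WebBrowserExtensions
{
    // Starts navigation and returns a task that completes with the page
    // source once DocumentCompleted fires for the main document.
    public static Task<string> NavigateAndWaitAsync(this WebBrowser wb, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Same check as above: ignore completions raised for sub-frames.
            if (e.Url != wb.Url)
                return;

            // Unsubscribe so the handler runs only once per navigation.
            wb.DocumentCompleted -= handler;
            tcs.TrySetResult(wb.DocumentText);
        };
        wb.DocumentCompleted += handler;
        wb.Navigate(url);
        return tcs.Task;
    }
}
```

Inside an async event handler on the form, the loop from the question would then become: foreach (string url in urls) { string html = await webBrowser1.NavigateAndWaitAsync(url); }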
After some time spent frustrated with IE's behavior, I came up with what I found to be the most accurate way to tell when a page has finished loading.
Never use the WebBrowserDocumentCompletedEventHandler event;
use WebBrowserProgressChangedEventHandler with some modifications, as seen below.
//"ie" is our web browser object
ie.ProgressChanged += new WebBrowserProgressChangedEventHandler(_ie);
private void _ie(object sender, WebBrowserProgressChangedEventArgs e)
{
int max = (int)Math.Max(e.MaximumProgress, e.CurrentProgress);
int min = (int)Math.Min(e.MaximumProgress, e.CurrentProgress);
if (min.Equals(max))
{
//Run your code here when page is actually 100% complete
}
}
It's a simple way of going about this. I found this question while googling "How to sleep web browser or put to pause".
According to MSDN (contains sample source) you can use the DocumentCompleted event for that. Additional very helpful information and source that shows how to differentiate between event invocations can be found here.
What you experienced happened to me too. ReadyState.Complete doesn't work in some cases. Here I used a bool set in DocumentCompleted to check the state:
private void button1_Click(object sender, EventArgs e)
{
// go to site1
wb.Navigate("site1.com");
// wait for DocumentCompleted before executing any further
waitWebBrowserToComplete(wb);
// set some values in the HTML page
wb.Document.GetElementById("input1").SetAttribute("Value", "hello");
// then submit the form (submit triggers a navigation)
wb.Document.GetElementById("formid").InvokeMember("submit");
// then wait for the document to complete again
waitWebBrowserToComplete(wb);
var processedHtml = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
var rawHtml = wb.DocumentText;
}
// helpers
// instead of checking ReadyState, we get the state from the DocumentCompleted event via a bool value
private bool webbrowserDocumentCompleted = false;
// not static, because it reads and resets the instance field above
public void waitWebBrowserToComplete(WebBrowser wb)
{
while (!webbrowserDocumentCompleted)
Application.DoEvents();
webbrowserDocumentCompleted = false;
}
private void Form1_Load(object sender, EventArgs e)
{
wb.DocumentCompleted += (o, args) => {
webbrowserDocumentCompleted = true;
};
}
