Get HTML source code from CefSharp web browser


I am using a CefSharp.Wpf.ChromiumWebBrowser (Version 47.0.3.0) to load a web page. At some point after the page has loaded I want to get the source code.
I have called:
wb.GetBrowser().MainFrame.GetSourceAsync()
however it does not appear to be returning all the source code (I believe this is because there are child frames).
If I call:
wb.GetBrowser().MainFrame.ViewSource()
I can see it lists all the source code (including the inner frames).
I would like to get the same result as ViewSource(). Could someone point me in the right direction please?
Update – Added Code example
Note: The address the web browser is pointing to will only work up to and including 10/03/2016. After that it may display different data, which is not what I would be looking at.
In the frmSelection.xaml file
<cefSharp:ChromiumWebBrowser Name="wb" Grid.Column="1" Grid.Row="0" />
In the frmSelection.xaml.cs file
public partial class frmSelection : UserControl
{
private System.Windows.Threading.DispatcherTimer wbTimer = new System.Windows.Threading.DispatcherTimer();
public frmSelection()
{
InitializeComponent();
// This timer will start when a web page has been loaded.
// It will wait 4 seconds and then call wbTimer_Tick which
// will then see if data can be extracted from the web page.
wbTimer.Interval = new TimeSpan(0, 0, 4);
wbTimer.Tick += new EventHandler(wbTimer_Tick);
wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
wb.FrameLoadEnd += new EventHandler<CefSharp.FrameLoadEndEventArgs>(wb_FrameLoadEnd);
}
void wb_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
{
if (wbTimer.IsEnabled)
wbTimer.Stop();
wbTimer.Start();
}
void wbTimer_Tick(object sender, EventArgs e)
{
wbTimer.Stop();
string html = GetHTMLFromWebBrowser();
}
private string GetHTMLFromWebBrowser()
{
// call the ViewSource method which will open up notepad and display the html.
// this is just so I can compare it to the html returned in GetSourceAsync()
// This is displaying all the html code (including child frames)
wb.GetBrowser().MainFrame.ViewSource();
// Get the html source code from the main Frame.
// This is displaying only code in the main frame and not any child frames of it.
Task<String> taskHtml = wb.GetBrowser().MainFrame.GetSourceAsync();
string response = taskHtml.Result;
return response;
}
}

I don't think I quite get this DispatcherTimer solution. I would do it like this:
public frmSelection()
{
InitializeComponent();
wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
}
private void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
if (e.Frame.IsMain)
{
wb.ViewSource();
wb.GetSourceAsync().ContinueWith(taskHtml =>
{
var html = taskHtml.Result;
});
}
}
I did a diff on the output of ViewSource and the text in the html variable and they are the same, so I can't reproduce your problem here.
This said, I noticed that the main frame gets loaded pretty late, so you have to wait quite a while until the notepad pops up with the source.

I was having the same issue trying to click on an item located in a frame rather than on the main frame. Using the example in your answer, I wrote the following extension method:
public static IFrame GetFrame(this ChromiumWebBrowser browser, string FrameName)
{
IFrame frame = null;
var identifiers = browser.GetBrowser().GetFrameIdentifiers();
foreach (var i in identifiers)
{
frame = browser.GetBrowser().GetFrame(i);
if (frame.Name == FrameName)
return frame;
}
return null;
}
If you have a "using" on your form for the module that contains this method you can do something like:
var frame = browser.GetFrame("nameofframe");
if (frame != null)
{
string HTML = await frame.GetSourceAsync();
}
Of course you need to make sure the page load is complete before using this, but I plan to use it a lot. Hope it helps!
Jim
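Building on the extension method above, the HTML of the main frame and of every child frame can be collected in one pass. This is only a sketch: GetAllFrameSourcesAsync is a name made up here, it relies solely on the calls already shown in this thread (GetBrowser, GetFrameIdentifiers, GetFrame, GetSourceAsync), and it assumes the page has finished loading before it is called.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.Wpf;

public static class FrameSourceExtensions
{
    // Hypothetical helper: returns one (frame name, HTML) pair per frame
    // the browser currently knows about, main frame included.
    public static async Task<List<KeyValuePair<string, string>>> GetAllFrameSourcesAsync(
        this ChromiumWebBrowser browser)
    {
        var sources = new List<KeyValuePair<string, string>>();
        var cefBrowser = browser.GetBrowser();

        foreach (var id in cefBrowser.GetFrameIdentifiers())
        {
            var frame = cefBrowser.GetFrame(id);
            if (frame == null)
                continue; // the frame may have gone away since the identifiers were listed

            var html = await frame.GetSourceAsync();
            sources.Add(new KeyValuePair<string, string>(frame.Name, html));
        }

        return sources;
    }
}
```

It could be called as var sources = await wb.GetAllFrameSourcesAsync(); once the main frame has loaded. Frames that the page adds later through script would need another call.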

Related

Winform WebBrowser Pass Cookie Then Process Links?

I asked this question a while ago but it seems that there are no answers, so I tried an alternative solution and am now stuck. Please see the following code:
WebBrowser objWebBrowser = new WebBrowser();
objWebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(objWebBrowser_DocumentCompleted);
objWebBrowser.Navigate("http://www.website.com/login.php?user=xxx&pass=xxx");
objWebBrowser.Navigate("http://www.website.com/page.php?link=url");
And here is the event code:
WebBrowser objWebBrowser = (WebBrowser)sender;
String data = new StreamReader(objWebBrowser.DocumentStream).ReadToEnd();
Since it's impossible for me to use WebBrowser.Document.Cookies before a document is loaded, I first have to navigate to the login page, which stores a cookie automatically. After that I want to call the other Navigate to get a result. The above code doesn't work because only the second Navigate takes effect, and putting it in the event handler won't work for me either, because what I want is this:
Navigate with the login page and store cookie for one time only.
Pass a different url each time i want to get some results.
Can anybody suggest a solution?
Edit:
Maybe the sample code I provided was misleading. What I want is:
foreach (string url in urls)
{
webBrowser1.Navigate(url);
//Then wait for the above to complete and get the result from the event, then continue
}

I think you want to simulate a blocking call to Navigate if you are not authorized. There are probably many ways to accomplish this and other approaches to get what you want, but here's some code I wrote up quickly that might help you get started.
If you have any questions about what I'm trying to do here, let me know. I admit it feels like "a hack" which makes me think there's a smarter solution, but anyway....
bool authorized = false;
bool navigated;
WebBrowser objWebBrowser = new WebBrowser();
void GetResults(string url)
{
if(!authorized)
{
NavigateAndBlockWithSpinLock("http://www.website.com/login.php?user=xxx&pass=xxx");
authorized = true;
}
objWebBrowser.Navigate(url);
}
void NavigateAndBlockWithSpinLock(string url)
{
navigated = false;
objWebBrowser.DocumentCompleted += NavigateDone;
objWebBrowser.Navigate(url);
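// Pump the message loop while waiting (up to about 10 seconds). A bare
// Thread.Sleep on the UI thread would stop DocumentCompleted from ever firing.
DateTime deadline = DateTime.Now.AddSeconds(10);
while(!navigated && DateTime.Now < deadline)
{
Application.DoEvents();
Thread.Sleep(50);
}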
objWebBrowser.DocumentCompleted -= NavigateDone;
if(!navigated)
throw new Exception("fail");
}
void NavigateDone(object sender, WebBrowserDocumentCompletedEventArgs e)
{
navigated = true;
}
void objWebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if(authorized)
{
WebBrowser objWebBrowser = (WebBrowser)sender;
String data = new StreamReader(objWebBrowser.DocumentStream).ReadToEnd();
}
}

How to make WebBrowser wait till it loads fully?

I have a C# form with a web browser control on it.
I am trying to visit different websites in a loop.
However, I can not control URL address to load into my form web browser element.
This is the function I am using for navigating through URL addresses:
public String WebNavigateBrowser(String urlString, WebBrowser wb)
{
string data = "";
wb.Navigate(urlString);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
data = wb.DocumentText;
return data;
}
How can I make my loop wait until it fully loads?
My loop is something like this:
foreach (string urlAddresses in urls)
{
WebNavigateBrowser(urlAddresses, webBrowser1);
// I need to add a code to make webbrowser in Form to wait till it loads
}

Add this to your code:
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Fill in this function:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
// This check ensures you only handle the event once, for the top-level document
if (e.Url != webBrowser1.Url)
return;
// do your actual work here
}
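For the loop in the question, one option is to turn the completion event into a Task and await it, so each URL finishes loading before the next Navigate is issued. The sketch below shows the pattern with TaskCompletionSource. FakeBrowser, NavigateAsync and VisitAllAsync are hypothetical names, and FakeBrowser is only a stand-in so the example runs without Windows Forms.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for WebBrowser so the sketch runs without Windows Forms.
public class FakeBrowser
{
    public event EventHandler<string> DocumentCompleted;

    public void Navigate(string url)
    {
        // Simulate a page load that finishes a little later.
        Task.Delay(10).ContinueWith(_ => DocumentCompleted?.Invoke(this, url));
    }
}

public static class FakeBrowserExtensions
{
    // Starts navigation and completes the returned task when the matching
    // DocumentCompleted event fires.
    public static Task<string> NavigateAsync(this FakeBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        EventHandler<string> handler = null;
        handler = (sender, loadedUrl) =>
        {
            if (loadedUrl != url)
                return;                            // ignore events for other documents
            browser.DocumentCompleted -= handler;  // one-shot subscription
            tcs.TrySetResult(loadedUrl);
        };
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
        return tcs.Task;
    }

    // Visits each URL in order, waiting for one load to finish before starting the next.
    public static async Task<List<string>> VisitAllAsync(
        this FakeBrowser browser, IEnumerable<string> urls)
    {
        var visited = new List<string>();
        foreach (var url in urls)
            visited.Add(await browser.NavigateAsync(url));
        return visited;
    }
}
```

With the real control, the handler would subscribe to DocumentCompleted and compare e.Url as in the handler above, and the result would be read from DocumentText instead of the URL. Awaiting from the UI thread keeps the form responsive without calling Application.DoEvents().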

After some time being angry at the poor IE functionality, I came across what I found to be the most accurate way to judge when a page has finished loading.
Don't use the WebBrowserDocumentCompletedEventHandler event;
use WebBrowserProgressChangedEventHandler with some modifications, as seen below.
//"ie" is our web browser object
ie.ProgressChanged += new WebBrowserProgressChangedEventHandler(_ie);
private void _ie(object sender, WebBrowserProgressChangedEventArgs e)
{
int max = (int)Math.Max(e.MaximumProgress, e.CurrentProgress);
int min = (int)Math.Min(e.MaximumProgress, e.CurrentProgress);
if (min.Equals(max))
{
//Run your code here when page is actually 100% complete
}
}
It is a simple way of going about this. I found this question while googling "How to sleep web browser or put to pause".

According to MSDN (contains sample source) you can use the DocumentCompleted event for that. Additional very helpful information and source that shows how to differentiate between event invocations can be found here.

What you experienced happened to me too. ReadyState.Complete doesn't work in some cases. Here I used a bool set in DocumentCompleted to check the state:
button1_click(){
//go site1
wb.Navigate("site1.com");
//wait for documentCompleted before continue to execute any further
waitWebBrowserToComplete(wb);
// set some values in html page
wb.Document.GetElementById("input1").SetAttribute("Value", "hello");
// then click submit. (submit does navigation)
wb.Document.GetElementById("formid").InvokeMember("submit");
// then wait for doc complete
waitWebBrowserToComplete(wb);
var processedHtml = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
var rawHtml = wb.DocumentText;
}
// helpers
// Instead of checking ReadyState, we get the state from the DocumentCompleted event via a bool value.
bool webbrowserDocumentCompleted = false;
// Not static, because it reads and resets the instance field above.
public void waitWebBrowserToComplete(WebBrowser wb)
{
while (!webbrowserDocumentCompleted)
Application.DoEvents();
webbrowserDocumentCompleted = false;
}
form_load(){
wb.DocumentCompleted += (o, e) => {
webbrowserDocumentCompleted = true;
};
}

Filling Wrappanel with own usercontrol in code

I'm trying to create a page where you get an overview of the sponsors of the project, the data is fetched from the database with the following service:
[OperationContract]
public IEnumerable<Sponsor> getSponsors()
{
var query = (from p in dc.Sponsors select p);
IEnumerable<Sponsor> i = query;
return i;
}
When I put my breakpoint on the i I can see that the data is correctly in there.
In my Sponsorspage I do the following
public partial class Sponsorspage : UserControl
{
IEnumerable<Sponsor> sponsors = null;
public Sponsorspage()
{
SponsorsServiceClient client = new SponsorsServiceClient();
client.getSponsorsCompleted +=new EventHandler<getSponsorsCompletedEventArgs>(client_getSponsorsCompleted);
client.getSponsorsAsync();
InitializeComponent();
}
void client_getSponsorsCompleted(object sender, getSponsorsCompletedEventArgs e)
{
if (e.Error != null)
MessageBox.Show(e.Error.ToString());
else
{
sponsors = e.Result;
foreach (Sponsor s in sponsors)
{
SponsorView control = new SponsorView(s.tekst);
SLWrapPanel.Children.Add(control);
}
}
}
For each sponsor in the database, I create the Sponsorview to which I give the source and text. You can see the code for my Sponsorview here.
public partial class SponsorView : UserControl
{
public SponsorView(string tekst)
{
txtSponsor.Text = tekst;
//Uri uri = new Uri(imageSource, UriKind.Relative);
//ImageSource imgSource = new BitmapImage(uri);
//imgSponsor.Source = imgSource;
InitializeComponent();
}
}
But when I run the page, I get the following error:
Object reference not set to an instance of an object.
at OndernemersAward.Views.SponsorView..ctor(String tekst)
at OndernemersAward.Views.Sponsorspage.client_getSponsorsCompleted(Object sender, getSponsorsCompletedEventArgs e)
at OndernemersAward.SponsorsServiceReference.SponsorsServiceClient.OngetSponsorsCompleted(Object state)
What I'm trying to do is give information (here string tekst) from the sponsor s to my user control, which it then uses to fill a textblock. Am I doing this wrong or?
Thanks! :)

Well, you're trying to iterate over results that you are supposed to hold in the sponsors variable. However, please note that you're calling the asynchronous version of the getSponsors method (the only one available in Silverlight, as I recall). This means you will not get results immediately after calling the service method; instead you need to wait until the completed event is raised.
I don't know why this would create problems with debugging, but it's definitely an error in the code that could result in problems showing the page.
Here is a very simple example of how you should retrieve the result from the service. I hope this helps you notice the error in your approach.
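Separately from the asynchronous call, the stack trace in the question points at the SponsorView constructor itself, where txtSponsor.Text is set before InitializeComponent() has run. Named elements such as txtSponsor are only assigned inside InitializeComponent(), so the field is still null at that point. The sketch below mimics the ordering with plain classes so it runs without Silverlight. FakeTextBlock, SponsorViewSketch and the hand-written InitializeComponent are stand-ins, not the generated code.

```csharp
using System;

// Stand-in for a XAML TextBlock.
public class FakeTextBlock
{
    public string Text { get; set; }
}

public class SponsorViewSketch
{
    private FakeTextBlock txtSponsor;   // null until InitializeComponent runs

    public SponsorViewSketch(string tekst)
    {
        InitializeComponent();          // must come first
        txtSponsor.Text = tekst;        // safe now: the field has been assigned
    }

    // In a real control this method is generated from the XAML file.
    private void InitializeComponent()
    {
        txtSponsor = new FakeTextBlock();
    }

    public string DisplayedText
    {
        get { return txtSponsor.Text; }
    }
}
```

In the question's SponsorView, the equivalent change would be moving InitializeComponent() to the first line of the constructor.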

Get HtmlDocument after javascript manipulations

In C#, using the System.Windows.Forms.HtmlDocument class (or another class that allows DOM parsing), is it possible to wait until a webpage finishes its javascript manipulations of the HTML before retrieving that HTML? Certain sites add innerhtml to pages through javascript, but those changes do not show up when I parse the HtmlElements of the HtmlDocument.
One possibility would be to update the HtmlDocument of the page after a second. Does anybody know how to do this?

Someone revived this question by posting what I think is an incorrect answer. So, here are my thoughts to address it.
Non-deterministically, it's possible to get close to finding out if the page has finished its AJAX stuff. However, it completely depends on the logic of that particular page: some pages are perpetually dynamic.
To approach this, one can handle DocumentCompleted event first, then asynchronously poll the WebBrowser.IsBusy property and monitor the current HTML snapshot of the page for changes, like below.
The complete sample can be found here.
// get the root element
var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];
// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
// wait asynchronously, this will throw if cancellation requested
await Task.Delay(500, token);
// continue polling if the WebBrowser is still busy
if (this.webBrowser.IsBusy)
continue;
var htmlNow = documentElement.OuterHtml;
if (html == htmlNow)
break; // no changes detected, end the poll loop
html = htmlNow;
}
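The polling loop above can be pulled out into a small reusable helper. The following is a sketch under assumptions rather than part of the linked sample: StablePoll and WaitUntilStableAsync are hypothetical names, and the snapshot delegate stands in for reading documentElement.OuterHtml, which lets the helper run without a WebBrowser.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StablePoll
{
    // Calls snapshot() every `interval` until two consecutive readings are equal,
    // then returns that reading. Throws OperationCanceledException if the token
    // is cancelled while waiting.
    public static async Task<string> WaitUntilStableAsync(
        Func<string> snapshot, TimeSpan interval, CancellationToken token)
    {
        var previous = snapshot();
        while (true)
        {
            await Task.Delay(interval, token);
            var current = snapshot();
            if (current == previous)
                return current;   // no change since the last poll
            previous = current;
        }
    }
}
```

With the WebBrowser control the call might look like await StablePoll.WaitUntilStableAsync(() => documentElement.OuterHtml, TimeSpan.FromMilliseconds(500), token). A token created with a timeout guards against pages that never settle. Unlike the loop above, this helper does not check IsBusy.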

In general the answer is "no": unless script on the page notifies your code in some way, you simply have to wait some time and grab the HTML. Waiting a second after the document-ready notification will likely cover most sites (i.e. jQuery's $(code) cases).

You need to give the application a second to process the JavaScript. Simply halting the current thread will delay the JavaScript processing as well, so your document will still come up outdated.
WebBrowserDocumentCompletedEventArgs cachedLoadArgs;
private void TimerDone(object sender, EventArgs e)
{
((Timer)sender).Stop();
respondToPageLoaded(cachedLoadArgs);
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
cachedLoadArgs = e;
System.Windows.Forms.Timer timer = new Timer();
int interval = 1000;
timer.Interval = interval;
timer.Tick += new EventHandler(TimerDone);
timer.Start();
}

What about using 'WebBrowser.Navigated' event?

I did it with WebBrowser; take a look at my class:
public class MYCLASSProduct: IProduct
{
public string Name { get; set; }
public double Price { get; set; }
public string Url { get; set; }
private WebBrowser _WebBrowser;
private AutoResetEvent _lock;
public void Load(string url)
{
_lock = new AutoResetEvent(false);
this.Url = url;
browserInitializeBecauseJavascriptLoadThePage();
}
private void browserInitializeBecauseJavascriptLoadThePage()
{
_WebBrowser = new WebBrowser();
_WebBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
_WebBrowser.Dock = DockStyle.Fill;
_WebBrowser.Name = "webBrowser";
_WebBrowser.ScrollBarsEnabled = false;
_WebBrowser.TabIndex = 0;
_WebBrowser.Navigate(Url);
Form form = new Form();
form.Hide();
form.Controls.Add(_WebBrowser);
Application.Run(form);
_lock.WaitOne();
}
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlAgilityPack.HtmlDocument hDocument = new HtmlAgilityPack.HtmlDocument();
hDocument.LoadHtml(_WebBrowser.Document.Body.OuterHtml);
this.Price = Convert.ToDouble(hDocument.DocumentNode.SelectNodes("//td[@class='ask']").FirstOrDefault().InnerText.Trim());
_WebBrowser.FindForm().Close();
_lock.Set();
}
}
If you're trying to do this in a console application, you need to put this attribute above your Main method, because Windows needs to communicate with COM components:
[STAThread]
static void Main(string[] args)
I don't like this solution, but I don't think there is a better one!

How can I display multiple images in a loop in a WP7 app?

In my (Silverlight) weather app I am downloading up to 6 separate weather radar images (each one taken about 20 mins apart) from a web site and what I need to do is display each image for a second then at the end of the loop, pause 2 seconds then start the loop again. (This means the loop of images will play until the user clicks the back or home button which is what I want.)
So, I have a RadarImage class as follows, and each image is getting downloaded (via WebClient) and then loaded into an instance of RadarImage which is then added to a collection (ie: List<RadarImage>)...
//Following code is in my radar.xaml.cs to download the images....
int imagesToDownload = 6;
int imagesDownloaded = 0;
RadarImage rdr = new RadarImage(<image url>); //this happens in a loop of image URLs
rdr.FileCompleteEvent += ImageDownloadedEventHandler;
//This code in a class library.
public class RadarImage
{
public int ImageIndex;
public string ImageURL;
public DateTime ImageTime;
public Boolean Downloaded;
public BitmapImage Bitmap;
private WebClient client;
public delegate void FileCompleteHandler(object sender);
public event FileCompleteHandler FileCompleteEvent;
public RadarImage(int index, string imageURL)
{
this.ImageIndex = index;
this.ImageURL = imageURL;
//...other code here to load in datetime properties etc...
client = new WebClient();
client.OpenReadCompleted += new OpenReadCompletedEventHandler(wc_OpenReadCompleted);
client.OpenReadAsync(new Uri(this.ImageURL, UriKind.Absolute));
}
private void wc_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
if (e.Error == null)
{
StreamResourceInfo sri = new StreamResourceInfo(e.Result as Stream, null);
this.Bitmap = new BitmapImage();
this.Bitmap.SetSource(sri.Stream);
this.Downloaded = true;
if (FileCompleteEvent != null) FileCompleteEvent(this); //Fire the event (if anyone is subscribed) to let the app page know to add it to its List<RadarImage> collection
}
}
}
As you can see, in the class above I have exposed an event handler to let my app page know when each image has downloaded. When they have all downloaded I then run the following code in my xaml page - but only the last image ever shows up and I can't work out why!
private void ImageDownloadedEventHandler(object sender)
{
imagesDownloaded++;
if (imagesDownloaded == imagesToDownload)
{
AllImagesDownloaded = true;
DisplayRadarImages();
}
}
private void DisplayRadarImages()
{
TimerSingleton.Timer.Stop();
foreach (RadarImage img in radarImages)
{
imgRadar.Source = img.Bitmap;
Thread.Sleep(1000);
}
TimerSingleton.Timer.Start(); //Tick property is set to 2000 milliseconds
}
private void SingleTimer_Tick(object sender, EventArgs e)
{
DisplayRadarImages();
}
So you can see that I have a static instance of a timer class which is stopped (if running), then the loop should show each image for a second. When all 6 have been displayed then it pauses, the timer starts and after two seconds DisplayRadarImages() gets called again.
But as I said before, I can only ever get the last image to show for some reason and I can't seem to get this working properly.
I'm fairly new to WP7 development (though not to .Net) so just wondering how best to do this - I was thinking of trying this with a web browser control but surely there must be a more elegant way to loop through a bunch of images!
Sorry this is so long but any help or suggestions would be really appreciated.
Mike

You can use a background thread with either a Timer or Sleep to periodically update your image control.
Phạm Tiểu Giao - Threads in WP7
You'll need to dispatch updates to the UI with
Dispatcher.BeginInvoke( () => { /* your UI code */ } );

Why don't you add the last image twice to radarImages, set the Timer to 1000 and display just one image on each tick?
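The reason only the last image appears is that Thread.Sleep inside DisplayRadarImages blocks the UI thread, so the screen is not redrawn until the whole loop has finished. Showing one image per timer tick avoids this. The sketch below covers only the bookkeeping for that approach. FrameCycler is a hypothetical helper name, and it is generic so it can run here without the WP7 image types.

```csharp
using System;
using System.Collections.Generic;

// Keeps track of which frame to show next; the list is played in order and then restarts.
public class FrameCycler<T>
{
    private readonly IList<T> frames;
    private int index = -1;

    public FrameCycler(IList<T> frames)
    {
        if (frames == null || frames.Count == 0)
            throw new ArgumentException("At least one frame is required.", "frames");
        this.frames = frames;
    }

    // True when the frame most recently returned by Next() is the last one in the list.
    public bool AtEnd
    {
        get { return index == frames.Count - 1; }
    }

    // Advances to the next frame, wrapping to the first after the last.
    public T Next()
    {
        index = (index + 1) % frames.Count;
        return frames[index];
    }
}
```

In a tick handler this might be used as imgRadar.Source = cycler.Next().Bitmap; followed by setting the timer interval to three seconds when cycler.AtEnd is true and one second otherwise, which gives the two-second pause after the final image. This assumes the timer is a DispatcherTimer, so the handler already runs on the UI thread.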
