Get source of Webpage in WebBrowser C#

Currently I have a website loaded in a WebBrowser component, and the page keeps changing the content of a certain <a> element. To get the data, I have to create another WebRequest every 5 seconds just to refresh it (I think these are called dynamic pages). I've tried fetching the data from the WebBrowser (WebBrowser.DocumentText), but the value stayed the same, even though I am pretty sure it changed, because I can see the change on screen. I think issuing a WebRequest every 5 seconds takes up unnecessary memory, and that this can be done more easily.
Do you guys maybe know a way for me to do this?

Guessing at Winforms. You'll want to use the Document property to read back the DOM. Here's an example. Start a new Winforms project and drop a WebBrowser on the form. Then a label and a timer. Make the code look like this:
public partial class Form1 : Form {
    public Form1() {
        InitializeComponent();
        webBrowser1.Url = new Uri("http://stackoverflow.com/questions/10781011/get-source-of-webpage-in-webbrowser-c-sharp");
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
        timer1.Interval = 100;
        timer1.Tick += new EventHandler(timer1_Tick);
    }
    void timer1_Tick(object sender, EventArgs e) {
        // Read the live DOM; the element may be missing, so guard against null
        var elem = webBrowser1.Document.GetElementById("wmd-input");
        if (elem != null) label1.Text = elem.InnerText;
    }
    void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
        // Start polling only once the document has loaded
        timer1.Enabled = true;
    }
}
The browser will navigate to your question. Type something in the Answer box and note how the label displays what you typed.
You'll need to tweak this code to work with your specific web page: change the "wmd-input" element id to the id of the element you want to read. Use a DOM inspection tool to find the id. I like Firebug.

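You could try to get the source via JavaScript.
Use the InvokeScript method of the browser's Document to evaluate document.documentElement.outerHTML in the page (for example by invoking the script engine's eval function with that expression).
This will return an Object which you should be able to cast to a String.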
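A minimal sketch of that idea, assuming a WinForms app and that the page's script engine exposes eval to InvokeScript (this can depend on the page and the IE document mode). The form name, the placeholder URL, and the way the result is displayed are all made up for illustration:

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form: loads a page and reads the current DOM as HTML via script.
public class SourceForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

    public SourceForm()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate("http://example.com/"); // placeholder URL, replace with your page
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Ask the page's script engine to serialize the live DOM.
        object result = browser.Document.InvokeScript(
            "eval", new object[] { "document.documentElement.outerHTML" });

        // InvokeScript returns null if the call could not be made.
        string html = result as string;
        Text = html == null ? "no result" : html.Length + " characters of HTML";
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new SourceForm());
    }
}
```

Because the script runs against the live DOM, calling it again later (for example from a timer) should reflect changes the page has made since loading.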

Related

C# winforms webbrowser not going to url's asked for

I was asked by a friend to develop a winform app to be able to extract data. I figured it would be easy enough - how wrong I was!
In my winform, I have included a webbrowser control and some buttons. The URL for the webbrowser is http://www.racingpost.com/greyhounds/card.sd and, as you can imagine, it is the place to get data for greyhounds. On that page there are a number of links, each specific to a race time. If you click on any of these, it takes you to that race, and it's this data that I need to extract. So, my initial thought was to get ALL the links off the page above, store them in a list, and then have a button that takes whatever link comes next and sends the webbrowser to that location. Once there, I can extract the data and store it as needed.
So, in the first instance, I use
//url = link above
wb1.Url = new Uri(url);
grab the data (which are links for each race on that day)
once I have this, use a further button to go to the specific race
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd#resultday=2015-01-17&raceid=1344640");
then, once there, click another button to capture the data, after which, return to the original link above.
The problem is, it will not go to the location present in the link. BUT, if I click the link manually within the webbrowser, it goes there no problem.
I have looked at the properties of the webbrowser, and these all look fine, although I can't vouch for that tbh!
I know that if I try to go to the links manually, I can, but if I try to do it through code, it just won't budge. I can only assume I have done something wrong in the code.
Hope some of that makes sense. This is my first posting, so apologies if I made a mess of it. I will provide all the code no problem, but I can't seem to figure out how to post the code in 'code format'.
//here is the code
public partial class Form1 : Form
{
Uri _url;
public Form1()
{
InitializeComponent();
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd");
wb1.Navigated +=new WebBrowserNavigatedEventHandler(wb1_Navigated);
}
classmodules.trackUrl tu;
private void btnGrabData_Click(object sender, EventArgs e)
{
classmodules.utility u = new classmodules.utility();
rtb1.Text = u.GetWebData("http://www.racingpost.com/greyhounds/card.sd");
HtmlDocument doc = wb1.Document;
string innerText = (((mshtml.HTMLDocument)(doc.DomDocument)).documentElement).outerHTML;
innerText = Regex.Replace(innerText, @"\r\n?|\n", "");
rtb1.Text = innerText;
tu = new classmodules.trackUrl();
u.splitOLs(ref tu, innerText);
classmodules.StaticUtils su = new classmodules.StaticUtils();
su.SerializeObject(tu, typeof(classmodules.trackUrl)).Save(@"d:\dogsUTL.xml");
classmodules.ExcelProcessor xl = new classmodules.ExcelProcessor();
xl.createExcel(tu);
}
private void wb1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb1 = sender as WebBrowser;
this.Text = wb1.Url.ToString();
}
void wb1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
_url = e.Url;
}
private void btnGoBack_Click(object sender, EventArgs e)
{
goBack();
}
private void goBack()
{
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd");
}
private void btnGetRaceData_Click(object sender, EventArgs e)
{
HtmlDocument doc = wb1.Document;
string innerText = (((mshtml.HTMLDocument)(doc.DomDocument)).documentElement).outerHTML;
rtb2.Text = innerText;
}
//###############################
// OK, here is the point where I want to take in the URL and click a button
// to instruct the webbrowser to go to that location. I initialise a counter
// to 0, then get the first url from the list and increment the counter;
// when I click the button again, urlNo will be 1, so it tries the second url.
int urlNo = 0;
private void btnUseData_Click(object sender, EventArgs e)
{
if (tu.race.Count > urlNo)
{
string url = tu.race[urlNo].url;
wb1.Url = new Uri(url);
lblUrl.Text = url;
urlNo++;
}
else
{
lblUrl.Text = "No More";
}
}
Did you try the Navigate(...) method? In theory, setting Url and calling Navigate should behave the same, but in practice they seem to behave slightly differently.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigate(v=vs.110).aspx
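As a rough sketch of that suggestion, the form below calls Navigate from a button click instead of assigning the Url property. The form and button names are hypothetical, the two URLs are the ones from the question, and whether this behaves differently from setting Url for this particular site is untested here:

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form: navigates with WebBrowser.Navigate instead of setting Url.
public class RaceBrowserForm : Form
{
    private readonly WebBrowser wb1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly Button btnGo = new Button { Text = "Go to race", Dock = DockStyle.Top };

    public RaceBrowserForm()
    {
        Controls.Add(wb1);
        Controls.Add(btnGo);

        // Navigate explicitly when the button is clicked.
        btnGo.Click += (sender, e) =>
            wb1.Navigate(new Uri("http://www.racingpost.com/greyhounds/card.sd#resultday=2015-01-17&raceid=1344640"));

        // Load the card overview first.
        wb1.Navigate(new Uri("http://www.racingpost.com/greyhounds/card.sd"));
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new RaceBrowserForm());
    }
}
```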

WebBrowser control - wait for page loading after submit form

I am new to C# and its concepts, so I am sorry if this question is kind of dumb.
I am trying to do some automation using the WinForms WebBrowser control:
elements = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement element in elements)
{
if (element.GetAttribute("value") == "Anzeigen")
element.InvokeMember("click");
}
while (webBrowser1.ReadyState != WebBrowserReadyState.Complete) Application.DoEvents();
// do some math on received html
// ......
// show results
MessageBox.Show(numPlanets.ToString() );
So, to explain it:
I'm looking for a button with the value "Anzeigen", simulating a click on it, then waiting until the NEW page is loaded and doing my calculations.
Unfortunately my calculations are done on the OLD HTML content, because the code does not wait for the page to load. Strangely, if I add a Thread.Sleep(5000); after the foreach loop, this Sleep is executed BEFORE the click is simulated, and the calculation fails as well.
I just need some synchronous behavior for that click, without using an event.
Hope you can help me with that, and sorry for my bad English.
EDIT:
Solved it like this:
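I added a field bool webbrowserfinished = false to the class, which gets set to true when the page has finished loading (e.g. in the DocumentCompleted handler). When I want synchronous behavior, I do it like this: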
webbrowserfinished = false;
// do navigation here
while (!webbrowserfinished)
{
Application.DoEvents();
Thread.Sleep(100);
}
webbrowserfinished = false;
You can try the WebBrowser.DocumentCompleted event.
It occurs when the WebBrowser control finishes loading a document.
private void Form1_Load(object sender, EventArgs e)
{
    // Subscribe before navigating so the event cannot be missed
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
    webBrowser1.Navigate("google.com");
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    MessageBox.Show("Completed Now!");
}
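If the goal is code that reads sequentially without a busy-wait loop, one possible sketch is to wrap the DocumentCompleted event in a Task and await it. This assumes C# 5 / .NET 4.5 or later; the extension method name WaitForDocumentCompletedAsync is invented for this example and is not part of the WebBrowser API:

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class WebBrowserExtensions
{
    // Hypothetical helper: returns a task that completes the next time
    // DocumentCompleted fires on the given browser.
    // Note: pages with frames can raise DocumentCompleted more than once;
    // this sketch only waits for the first occurrence.
    public static Task WaitForDocumentCompletedAsync(this WebBrowser browser)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            browser.DocumentCompleted -= handler; // one-shot: unsubscribe first
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;
        return tcs.Task;
    }
}

// Usage inside an async event handler on the form (names from the question):
//
//     Task loaded = webBrowser1.WaitForDocumentCompletedAsync();
//     element.InvokeMember("click");   // triggers the navigation
//     await loaded;                    // UI stays responsive while waiting
//     // ... work with the new webBrowser1.Document here
```

The task is created before the click so the event cannot fire before the handler is attached. The message loop keeps running while the method is suspended at await, which is what Application.DoEvents() was being used for in the polling version.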
Well, it isn't the best of solutions, but you could always start a timer when the web browser navigates and set timer1.Interval to something like 3000; then, within the timer1_Tick method, you can do your calculations on the new page and call timer1.Stop();.
There is probably a better solution using events but I'm not too good with web browsers myself.
You can use Thread.Sleep(5000) to wait for your page to load, because if you don't, the Navigate method will load a new document into the WebBrowser control but will not call the DocumentCompleted event handler.

Determining the page number of an inline element while using FlowDocumentPageViewer?

I have a FlowDocumentPageViewer control in my application that programmatically advances through each block and inline element in a FlowDocument (this is because it's part of a typing application and doing so gives visual cues which tell the user what to type). Each time I change the inline element I'm focused on, I want to check what page the inline element is on, and if it's not on the current page, to navigate to the page it is on.
If this is not possible, please suggest any alternate solutions.
Also, if it matters, every inline element I'm dealing with is a Run element.
Are you just trying to automatically navigate to the page? If so we don't need to know the page number and should be able to just use BringIntoView? I'm assuming you have a reference to the block?
The following code navigates to the page the 301st block is on when the button is pressed
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
this.Loaded += new RoutedEventHandler(MainWindow_Loaded);
}
void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
FlowDocument fd = new FlowDocument();
for (int i = 0; i < 1000; i++)
{
fd.Blocks.Add(new Paragraph(new Run(i.ToString())));
}
view.Document = fd;
}
private void Button_Click(object sender, RoutedEventArgs e)
{
(view.Document as FlowDocument).Blocks.Skip(300).First().BringIntoView();
}
}
If you really, really want the page number you could do:
var previousPage = view.MasterPageNumber;
(view.Document as FlowDocument).Blocks.Skip(300).First().BringIntoView();
var pageOfControl = view.MasterPageNumber;
view.GoToPage(previousPage);
It didn't flicker or anything in the test app lol! My mate didn't like that though so he suggested:
var ddp = (DynamicDocumentPaginator)view.Document.DocumentPaginator;
var position = ddp.GetObjectPosition((view.Document as FlowDocument).Blocks.Skip(300).First());
var page = ddp.GetPageNumber(position);
Just be aware that this page number is 0-indexed, as opposed to the "lol" method, which starts at 1.

How to find a control inside an asp.net calendar control

After adding a control in the dayrender event, is there a way to find the control later? I have tried
calendar.FindControl("lblSample")
but without success.
Here is some of my code to be more clear:
protected void calSample_DayRender(object sender, DayRenderEventArgs e)
{
Label lblSample = new Label();
lblSample.ID = "lblSample";
lblSample.Text = "Sample";
e.Cell.Controls.Add(lblSample);
}
After the day render event and the page loads completely, I have a link button event where I try and get the control back
protected void lbtnSave_Click(object sender, EventArgs e)
{
//Not working
Label lblSample = (Label)calSample.FindControl("lblSample");
//Also can't get this to work. It uses Ross' suggestion and the recursive find function he wrote about. I'm probably just not using it correctly.
Label lblSample2 = ControlFinder.FindControl<Label>(calSample, "lblSample");
}
The issue was because the control was not added to the page until the dayrender method - meaning you could not get a reference to it on a post back. Using the Page.Request.Params collection the OP was able to grab the value out on the postback.
The problem is that the find control is not recursive and the control you want is probably inside another control.
This shows you how to make a recursive find control method that would help: http://stevesmithblog.com/blog/recursive-findcontrol/
Alternatively if you post the calendar controls code I can probably help you a bit more.
Ross
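For reference, a recursive lookup along those lines might look like the sketch below. It is not the code from the linked post; the class and method names simply mirror the ControlFinder.FindControl<Label>(...) call in the question. It can only find controls that exist in the control tree at the time of the call, so it does not help with controls that are created during DayRender and are absent on postback:

```csharp
using System.Web.UI;

public static class ControlFinder
{
    // Depth-first search of the control tree under 'root' for a control
    // of type T whose ID equals 'id'. Returns null when nothing matches.
    public static T FindControl<T>(Control root, string id) where T : Control
    {
        if (root == null)
        {
            return null;
        }

        foreach (Control child in root.Controls)
        {
            // Check the child itself first.
            T match = child as T;
            if (match != null && match.ID == id)
            {
                return match;
            }

            // Then search the child's own children.
            T nested = FindControl<T>(child, id);
            if (nested != null)
            {
                return nested;
            }
        }

        return null;
    }
}
```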
This answer is because of Ross' comment above showing me that I could use the Page.Request.Params to find the value I was after. It's not the cleanest solution but it works!
If you add a dropdownlist to a calendar control in the day render event
protected void calSample_DayRender(object sender, DayRenderEventArgs e)
{
DropDownList ddlSample = new DropDownList();
ddlSample.ID = "ddlSample";
ddlSample.DataSource = sampleDS;
ddlSample.DataBind();
e.Cell.Controls.Add(ddlSample);
}
You can get the selected value back like this, of course I need to put in more checks to verify that the dropdownlist exists, but you get the picture
protected void lbtnSave_Click(object sender, EventArgs e)
{
string sampleID = Page.Request.Params.GetValues("ddlSample")[0];
}

.NET C#: WebBrowser control Navigate() does not load targeted URL

I'm trying to programmatically load a web page via the WebBrowser control with the intent of testing the page & its JavaScript functions. Basically, I want to compare the HTML & JavaScript run through this control against a known output to ascertain whether there is a problem.
However, I'm having trouble simply creating and navigating the WebBrowser control. The code below is intended to load the HtmlDocument into the WebBrowser.Document property:
WebBrowser wb = new WebBrowser();
wb.AllowNavigation = true;
wb.Navigate("http://www.google.com/");
When examining the web browser's state via Intellisense after Navigate() runs, the WebBrowser.ReadyState is 'Uninitialized', WebBrowser.Document = null, and it overall appears completely unaffected by my call.
On a contextual note, I'm running this control outside of a Windows form object: I do not need to load a window or actually look at the page. Requirements dictate the need to simply execute the page's JavaScript and examine the resultant HTML.
Any suggestions are greatly appreciated, thanks!
You should handle the WebBrowser.DocumentCompleted event; once that event is raised you will have the Document etc.
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = sender as WebBrowser;
// wb.Document is not null at this point
}
Here is a complete example, that I quickly did in a Windows Forms application and tested.
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
WebBrowser wb = new WebBrowser();
wb.AllowNavigation = true;
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate("http://www.google.com");
}
private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = sender as WebBrowser;
// wb.Document is not null at this point
}
}
Edit: Here is a simple version of code that runs a window from a console application. You can of course go further and expose the events to the console code etc.
using System;
using System.Windows;
using System.Windows.Forms;
namespace ConsoleApplication1
{
class Program
{
[STAThread]
static void Main(string[] args)
{
Application.Run(new BrowserWindow());
Console.ReadKey();
}
}
class BrowserWindow : Form
{
public BrowserWindow()
{
ShowInTaskbar = false;
WindowState = FormWindowState.Minimized;
Load += new EventHandler(Window_Load);
}
void Window_Load(object sender, EventArgs e)
{
WebBrowser wb = new WebBrowser();
wb.AllowNavigation = true;
wb.DocumentCompleted += wb_DocumentCompleted;
wb.Navigate("http://www.bing.com");
}
void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Console.WriteLine("We have Bing");
}
}
}
You probably need to host the control in a parent window. You can do this without breaking requirements by simply not showing the window that hosts the browser control by moving it off screen. It might also be useful for development to "see" that it does actually load something for testing, verification etc.
So try:
// in a form's Load handler:
WebBrowser wb = new WebBrowser();
this.Controls.Add(wb);
wb.AllowNavigation = true;
wb.Navigate("http://www.google.com/");
Also check to see what other properties are set on the WebBrowser object when you instantiate it via the IDE. E.g. create a Form, drop a browser control onto it and then check the form's designer file to see what code is generated. You might be missing some key property that needs to be set. I've discovered many-an-omission in my code in this way and also learned how to properly instantiate visual objects programmatically.
P.S. If you do use a host window, it should only be visible during development. You would hide in some manner for production.
Another approach:
You could go "raw" by trying something like this:
System.Net.WebClient wc = new System.Net.WebClient();
System.IO.StreamReader webReader = new System.IO.StreamReader(
wc.OpenRead("http://your_website.com"));
string webPageData = webReader.ReadToEnd();
...then RegEx or parse webPageData for what you need. Or do you need the jscript in the page to actually execute? (Which should be possible with .NET 4.0)
I had this problem, and I did not realize that I had uninstalled Internet Explorer. If you have, nothing will ever happen, since the WebBrowser control only instantiates IE.
The Webbrowser control is just a wrapper around Internet Explorer.
You can place it on an invisible Windows Forms window to completely instantiate it.
