Load a page into the WebBrowser and grab its HTML - C#

I need to load a page into the WebBrowser control, wait for it to finish loading (including its AJAX requests), and then grab the HTML of that page.
I tried this, but it doesn't work as intended. Any help would be great!
WebBrowser webBrowser = new WebBrowser();
webBrowser.Navigate("http://www.mysite.com");
String htmldoc = webBrowser.DocumentText;

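Navigate() returns immediately, so DocumentText is read before the page has loaded. Subscribe to the DocumentCompleted event and read the HTML there:
webBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
webBrowser.Navigate("http://www.mysite.com");
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;
    // Body.InnerHtml reflects the current DOM of the loaded page
    string htmldoc = browser.Document.Body.InnerHtml;
}
That should do the trick.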
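Note that DocumentCompleted fires when the document (and its frames) finish loading, not when AJAX requests started later by the page's scripts complete, so the HTML read in the handler may still be missing AJAX-loaded content. One common workaround is to poll for a condition that only becomes true once that content is present, while letting the browser keep processing messages. The sketch below is a hypothetical, generic `WaitHelper.WaitUntil` helper, not part of the WinForms API. In a WinForms app you might pass `Application.DoEvents` as the pump and something like `() => webBrowser.Document?.GetElementById("results") != null` as the condition, where `"results"` is an assumed id of an element your AJAX call creates.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WaitHelper
{
    // Repeatedly checks condition, calling pump (e.g. Application.DoEvents in
    // WinForms) between checks, until the condition is true or timeout elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> condition, Action pump,
                                 TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
                return false;

            pump();                     // let the browser process messages and scripts
            Thread.Sleep(pollInterval); // avoid spinning in a tight busy loop
        }
        return true;
    }
}
```

Application.DoEvents has well-known re-entrancy pitfalls; checking the same condition from a System.Windows.Forms.Timer tick is an alternative that avoids a blocking loop.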

Related

Why the Login web page isn't showing in webBrowser?

I am using C# to log in to a local web page, and a WebBrowser control to display the page after logging in.
First I navigate to the page, then I fill in the username and password, and then I invoke a click. The element to be clicked is found, so I assume the click happened. But the result page isn't shown; nothing appears when I run the program.
I tried this:
public WebBrowser webBrowser;
public MainWindow()
{
InitializeComponent();
webBrowser = new WebBrowser();
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(LoginEvent);
webBrowser.AllowNavigation = true;
webBrowser.Navigate("http://192.168.1.100/login.html");
}
private void LoginEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser webBrowser = sender as WebBrowser;
//To execute the event just one time
webBrowser.DocumentCompleted -= LoginEvent;
//load page's document
HtmlDocument doc = webBrowser.Document;
doc.GetElementById("u").SetAttribute("value", "admin");
doc.GetElementById("pw").SetAttribute("value", "123456");
foreach (HtmlElement elem in doc.GetElementsByTagName("a"))
{
elem.InvokeMember("click");
}
}
Can anyone help me figure out why the page isn't showing?
1) Make sure your WebBrowser object is declared as a class member rather than a local variable in your MainWindow() constructor; a local object can be garbage-collected once the constructor ends.
It also has to be added to the window (for example, to the form's Controls collection). A WebBrowser created with new WebBrowser() and never added to a Controls collection is not displayed.
2) Multiple DocumentCompleted events may fire, for example one per iframe. Filter out the iframe events and wait until the main page is fully loaded:
private void LoginEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// filter out non main documents
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//To execute the event just one time
webBrowser.DocumentCompleted -= LoginEvent;
//load page's document
HtmlDocument doc = webBrowser.Document;
doc.GetElementById("u").SetAttribute("value", "admin");
doc.GetElementById("pw").SetAttribute("value", "123456");
foreach (HtmlElement elem in doc.GetElementsByTagName("a"))
{
elem.InvokeMember("click");
}
}
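The filter above compares only `AbsolutePath`, so a frame from a different host that happens to share the same path would slip through. A slightly stricter sketch compares scheme, host, port, path and query while ignoring the fragment. `FrameFilter.IsMainDocument` is a hypothetical helper name, not part of the WinForms API; you would call it as `FrameFilter.IsMainDocument(e.Url, ((WebBrowser)sender).Url)` at the top of the handler.

```csharp
using System;

public static class FrameFilter
{
    // True when the URL from a DocumentCompleted event refers to the same
    // document as the browser's top-level URL. Scheme, host, port, path and
    // query must match; fragments such as "#top" are ignored.
    public static bool IsMainDocument(Uri eventUrl, Uri browserUrl)
    {
        if (eventUrl == null || browserUrl == null)
            return false;

        return Uri.Compare(eventUrl, browserUrl, UriComponents.HttpRequestUrl,
                           UriFormat.SafeUnescaped, StringComparison.Ordinal) == 0;
    }
}
```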

How to get a control item from a pop-up window in C#

I am working on a screen-scraping Windows application.
I can automatically navigate through the login page and all the other pages using the WebBrowser methods, sometimes using '.Click' to trigger buttons on some of the pages.
Here's the problem: when I do the final 'click' to get my data, the web browser opens a new Explorer window (a pop-up window) that contains another link button, and I have to click this link button from C# to get my final data.
How can I access the new (pop-up) window to scrape it?
I am using the code below, and it opens the URL in a new pop-up window.
HtmlElement toollinkbutton = WebBrowser1.Document.Window.Document.Body.Document.GetElementsByTagName("a")[48];
toollinkbutton.InvokeMember("click");
The new window is probably caused by target="_blank" or by JavaScript, and calling InvokeMember makes that new window open. Add a handler for the WebBrowser control's NewWindow event, cancel the new window, and call Navigate() on the existing control instead:
private string url = "";
public Form1()
{
    InitializeComponent();
    WebBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    WebBrowser1.NewWindow += webBrowser1_NewWindow;
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Record the href of any link that gets clicked (by the user or by InvokeMember)
    foreach (HtmlElement link in WebBrowser1.Document.Links)
    {
        link.AttachEventHandler("onclick", LinkClicked);
    }
}
private void LinkClicked(object sender, EventArgs e)
{
    HtmlElement link = WebBrowser1.Document.ActiveElement;
    if (link != null)
    {
        url = link.GetAttribute("href");
    }
}
void webBrowser1_NewWindow(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Suppress the pop-up and open the recorded URL in the existing control instead
    e.Cancel = true;
    if (!string.IsNullOrEmpty(url))
    {
        WebBrowser1.Navigate(url);
    }
}
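The href captured in LinkClicked is not always something Navigate() can replay: depending on how the page writes its links it may be relative, and when the pop-up is opened by script it is often a `javascript:` pseudo-URL that does not contain the target address at all. A small sketch, using a hypothetical `LinkTargets.ResolveLinkTarget` helper, resolves the href against the current document URL and returns null for anything that is not http/https.

```csharp
using System;

public static class LinkTargets
{
    // Resolves href against the URL of the page it came from. Returns null when
    // the href is empty, cannot be parsed, or is not an http/https URL
    // (for example "javascript:..." links), since Navigate() cannot replay those.
    public static Uri ResolveLinkTarget(Uri documentUrl, string href)
    {
        if (documentUrl == null || string.IsNullOrWhiteSpace(href))
            return null;

        if (!Uri.TryCreate(documentUrl, href.Trim(), out Uri target))
            return null;

        return target.Scheme == Uri.UriSchemeHttp || target.Scheme == Uri.UriSchemeHttps
            ? target
            : null;
    }
}
```

One possible design in the NewWindow handler is to cancel only when a usable target exists, so script-driven pop-ups still open normally: `Uri target = LinkTargets.ResolveLinkTarget(WebBrowser1.Url, url); if (target != null) { e.Cancel = true; WebBrowser1.Navigate(target); }`.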

How to show an HTML string in WebBrowser control when the address is "about:blank"

I would like to show a specific string (e.g. help instructions) in the WebBrowser control when it navigates to "about:blank". I can write my string in Form_Load using DocumentText, and the control then automatically navigates to "about:blank":
webBrowser1.DocumentText = "introduction....";
But if the user refreshes the WebBrowser control, it shows a blank page. I would like it to show my string again whenever the address is "about:blank". Where is the best place to put my string into the WebBrowser control?
Refresh simply reloads the current page, and the Navigating, Navigated, and DocumentCompleted events do not occur when you call the Refresh method, so there is no event in which to put your string back after a refresh.
Instead, in the Navigating or Navigated event, check whether the browser is navigating (or has navigated) to about:blank, and if so disable every way the user can refresh the page: browser shortcuts, the browser context menu, and any custom toolbar buttons or context menus you created that trigger a refresh.
For other URLs, enable them again.
private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
var state = (e.Url.ToString().ToLower() == "about:blank");
this.webBrowser1.WebBrowserShortcutsEnabled = !state;
this.webBrowser1.IsWebBrowserContextMenuEnabled = !state;
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var content = "Custom Content";
if (e.Url.ToString().ToLower() == "about:blank" &&
this.webBrowser1.DocumentText != content)
{
this.webBrowser1.DocumentText = content;
}
}
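Setting DocumentText itself navigates the control to about:blank, so DocumentCompleted fires again after the assignment; the `DocumentText != content` check is what stops the handler from rewriting the page over and over. Pulling that decision into a pure function makes it easy to check on its own. `BlankPageContent.ShouldInject` is a hypothetical helper name, and it uses a case-insensitive comparison instead of ToLower().

```csharp
using System;

public static class BlankPageContent
{
    // True when the browser has landed on about:blank and does not already
    // display our custom text (this guard prevents an endless rewrite loop).
    public static bool ShouldInject(Uri url, string currentText, string content)
    {
        bool isBlank = url != null &&
            string.Equals(url.AbsoluteUri, "about:blank", StringComparison.OrdinalIgnoreCase);
        return isBlank && currentText != content;
    }
}
```

In the DocumentCompleted handler this would read `if (BlankPageContent.ShouldInject(e.Url, webBrowser1.DocumentText, content)) webBrowser1.DocumentText = content;`.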

Browser control NavigateToString display HTML code instead of rendering page

I am developing a browser app using Windows Phone 8 browser control.
The app downloads an external web page into a string in the background using WebClient. Then the browser navigates to the content using
webBrowser.NavigateToString(str);
However, instead of rendering the page, the browser shows the HTML code. I thought that since no changes were made to the string, NavigateToString would handle it seamlessly. Or perhaps I am missing something.
So how do I display the HTML page instead of its code?
EDIT
Here's some of my code
webClient = new WebClient();
webClient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(webClient_DownloadStringCompleted);
webClient.DownloadStringAsync(new Uri(uri));
private void webClient_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
PageString = e.Result;
}
...
webBrowser.NavigateToString(PageString);
This is an issue with Windows Phone 8.
Here is a workaround.
When you use DownloadStringAsync, the downloaded string also contains the DOCTYPE declaration. Remove it so that the string starts with the <html> block, because NavigateToString doesn't seem to like the <!DOCTYPE HTML> declaration.
webClient = new WebClient();
webClient.DownloadStringCompleted += webClient_DownloadStringCompleted;
webClient.DownloadStringAsync(new Uri(uri));
void webClient_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    // Remove "<!DOCTYPE HTML>" (note: Replace is case-sensitive)
    PageString = e.Result.Replace("<!DOCTYPE HTML>", "").Trim();
    // Navigate only once the download has finished
    webBrowser.NavigateToString(PageString);
}
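The `Replace` call above only removes the declaration when it is spelled exactly `<!DOCTYPE HTML>`; many pages use `<!DOCTYPE html>` or a longer HTML 4/XHTML doctype. A slightly more tolerant sketch, assuming you only want to drop a doctype at the very start of the string, uses a case-insensitive regular expression. `DoctypeStripper` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class DoctypeStripper
{
    // Optional leading whitespace followed by any <!DOCTYPE ...> declaration.
    private static readonly Regex LeadingDoctype =
        new Regex(@"^\s*<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);

    // Returns the HTML with a leading DOCTYPE removed (at most one match)
    // and surrounding whitespace trimmed.
    public static string Strip(string html)
    {
        if (html == null)
            return null;
        return LeadingDoctype.Replace(html, "", 1).Trim();
    }
}
```

In the download handler you would then write `PageString = DoctypeStripper.Strip(e.Result);` before calling NavigateToString.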
Documentation for WebBrowser.NavigateToString says:
If the text parameter is not in valid HTML format, it will be displayed as plain text.
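Can you check whether str is in valid HTML format? Also make sure NavigateToString is called only after the download has finished, i.e. from the completion handler: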
private void webClient_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
PageString = e.Result;
webBrowser.NavigateToString(PageString);
}
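Another way (with the WinForms WebBrowser control):
wb.Navigate("about:blank");
// Pump messages until the blank document has finished loading
do
{
    Application.DoEvents();
} while (wb.ReadyState != WebBrowserReadyState.Complete);
wb.Document.Body.InnerHtml = "Html";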

Printing WebBrowser control content

I'm absolutely new to printing in .NET. I would like to print a page that is displayed in WebBrowser control. How do I do that?
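To print a page that is already displayed, call Print() (or ShowPrintDialog()) on that WebBrowser control. MSDN has an article about printing with the WebBrowser control, although its code example shows how to use the control to print a Web page without displaying it:
How to: Print with a WebBrowser Control
The C# code: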
private void PrintHelpPage()
{
// Create a WebBrowser instance.
WebBrowser webBrowserForPrinting = new WebBrowser();
// Add an event handler that prints the document after it loads.
webBrowserForPrinting.DocumentCompleted +=
new WebBrowserDocumentCompletedEventHandler(PrintDocument);
// Set the Url property to load the document.
webBrowserForPrinting.Url = new Uri(@"\\myshare\help.html");
}
private void PrintDocument(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
// Print the document now that it is fully loaded.
((WebBrowser)sender).Print();
// Dispose the WebBrowser now that the task is complete.
((WebBrowser)sender).Dispose();
}
