WebBrowser.Document.Body is always null - c#

I have a WebBrowser control whose document is set to edit mode. I am trying to manipulate the inner text of the body element through WebBrowser.Document.Body.InnerText; however, WebBrowser.Document.Body remains null.
Here is the code where I create the document contents:
private WebBrowser HtmlEditor = new WebBrowser();

public HtmlEditControl()
{
    InitializeComponent();
    HtmlEditor.DocumentText = "<html><body></body></html>";
    myDoc = (IHTMLDocument2)HtmlEditor.Document.DomDocument;
    myDoc.designMode = "On";
    HtmlEditor.Refresh(WebBrowserRefreshOption.Completely);
    myContentsChanged = false;
}
I can edit content and everything works fine, but I don't understand why HtmlEditor.Document.Body remains null. I know I could always just reset the document body whenever I need to load text into the form, but I would prefer to understand why it behaves this way, if only for the sake of knowing.
Any help on this is greatly appreciated.

You have to wait for the WebBrowser's DocumentCompleted event to fire before Document.Body is non-null. I just tested this to verify it. I suppose the question still remains: how are you able to edit through the underlying COM interface when the document has not completely loaded?
I checked to see if the IHTMLDocument2 pointers were the same in DocumentCompleted and the constructor. They are, which might indicate that the underlying COM object reuses a single HTML document object. It seems like any changes you make in the constructor at least have a pretty good chance of getting overwritten or throwing an exception.
For example, if I do this in the constructor, I get an error:
IHTMLDocument2 p1 = (IHTMLDocument2)HtmlEditor.Document.DomDocument;
p1.title = "Hello world!";
If I do the same in a DocumentCompleted handler, it works fine.
Hope this helps. Thanks.

Handle the DocumentCompleted event first; it occurs when the WebBrowser control finishes loading a document:
public HtmlEditControl()
{
    InitializeComponent();
    HtmlEditor.DocumentText = "<html><body></body></html>";
    HtmlEditor.DocumentCompleted += HtmlEditorDocumentCompleted;
}

void HtmlEditorDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    myDoc = (IHTMLDocument2)((WebBrowser)sender).Document.DomDocument;
    myDoc.designMode = "On";
    HtmlEditor.Refresh(WebBrowserRefreshOption.Completely);
    myContentsChanged = false;
}
Or, the simpler way, with a lambda:
public HtmlEditControl()
{
    InitializeComponent();
    HtmlEditor.DocumentText = "<html><body></body></html>";
    HtmlEditor.DocumentCompleted += (sender, e) =>
    {
        myDoc = (IHTMLDocument2)HtmlEditor.Document.DomDocument;
        myDoc.designMode = "On";
        HtmlEditor.Refresh(WebBrowserRefreshOption.Completely);
        myContentsChanged = false;
    };
}
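The same idea can be wrapped in a small awaitable helper. This is a sketch under assumptions: HtmlEditorLoading and LoadDocumentTextAsync are hypothetical names, and the helper relies only on the DocumentText property and the DocumentCompleted event used above. It subscribes before assigning DocumentText and unsubscribes after the first completion, so a later reload does not re-run the setup code.

```csharp
using System.Threading.Tasks;
using System.Windows.Forms;

static class HtmlEditorLoading
{
    // Hypothetical helper: assigns DocumentText and completes when the control
    // raises DocumentCompleted, i.e. once Document.Body has been created.
    public static Task LoadDocumentTextAsync(WebBrowser browser, string html)
    {
        var tcs = new TaskCompletionSource<object>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            browser.DocumentCompleted -= handler; // react to the first completion only
            tcs.TrySetResult(null);
        };

        // Subscribe before assigning DocumentText so the event cannot be missed.
        browser.DocumentCompleted += handler;
        browser.DocumentText = html;
        return tcs.Task;
    }
}
```

A constructor cannot await, so the call belongs in an async Load handler (or another event handler): await HtmlEditorLoading.LoadDocumentTextAsync(HtmlEditor, "<html><body></body></html>"); after which HtmlEditor.Document.Body should be non-null and the designMode setup can run.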

You need to let the WebBrowser control work on its own for a bit, to give it time to set the Document.Body property.
I do that by calling Application.DoEvents() in a loop.
For instance, in your code:
private WebBrowser HtmlEditor = new WebBrowser();

public HtmlEditControl()
{
    InitializeComponent();
    HtmlEditor.DocumentText = "<html><body></body></html>";
    // Let the WebBrowser control work on its own (process pending messages)
    // until it has created the document body.
    while (HtmlEditor.Document == null || HtmlEditor.Document.Body == null)
    {
        Application.DoEvents();
    }
    myDoc = (IHTMLDocument2)HtmlEditor.Document.DomDocument;
    myDoc.designMode = "On";
    HtmlEditor.Refresh(WebBrowserRefreshOption.Completely);
    myContentsChanged = false;
}

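If the body is still null, you can write a minimal document yourself with OpenNew and then make the body editable:
if (HtmlEditor.Document.Body == null)
{
    HtmlEditor.Document.OpenNew(false).Write(@"<html><body><div id=""editable""></div></body></html>");
}
HtmlEditor.Document.Body.SetAttribute("contentEditable", "true");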

Related

WebBrowser.Document is null on return from thread, not updating in new thread

public static User registerUser()
{
    Uri test = new Uri("https://www.example.com/signup");
    HtmlDocument testdoc = runBrowserThread(test);
    string tosend = "test";
    User user = new User();
    user.apikey = tosend;
    return user;
}
public static HtmlDocument runBrowserThread(Uri url)
{
    HtmlDocument value = null;
    var th = new Thread(() =>
    {
        var br = new WebBrowser();
        br.DocumentCompleted += browser_DocumentCompleted;
        br.Navigate(url);
        value = br.Document;
        Application.Run();
    });
    th.SetApartmentState(ApartmentState.STA);
    th.Start();
    th.Join(8000);
    return value;
}
static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var br = sender as WebBrowser;
    if (br.Url == e.Url)
    {
        Console.WriteLine("Navigated to {0}", e.Url);
        Console.WriteLine(br.Document.Body.InnerHtml);
        System.Console.ReadLine();
        Application.ExitThread(); // Stops the thread
    }
}
I am trying to scan this page, and while the thread does get the HTML, it does not pass it back to the caller; instead the function returns null (I presume because the value is read before the page has been processed).
How can I make it so that the new thread passes back its result?
There are several problems with your approach.
You're not waiting until the web page has been navigated, i.e. until the Navigated event fires, so Document could be null until then.
You're quitting after 8 seconds; if the page takes more than 8 seconds to load, you won't get the document.
If the document isn't loaded properly, you're leaving the thread alive.
I suspect the WebBrowser control will not work as expected unless you add it to a form and show it (it needs to be visible on screen).
Etc.
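If you do want to keep the WebBrowser, the points above (wait for the top-level DocumentCompleted, handle a timeout, always end the thread) can be combined roughly as follows. This is a sketch, not a tested drop-in: StaBrowser and GetDocumentTextAsync are hypothetical names. It passes the DocumentText string back through a TaskCompletionSource instead of the HtmlDocument, because an HtmlDocument is tied to the browser's STA thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

static class StaBrowser
{
    // Hypothetical helper: hosts a WebBrowser on its own STA thread with a message loop
    // and completes with the top-level DocumentText, or faults on timeout.
    public static Task<string> GetDocumentTextAsync(Uri url, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer { Interval = (int)timeout.TotalMilliseconds })
            {
                browser.DocumentCompleted += (s, e) =>
                {
                    if (e.Url != browser.Url) return; // a frame finished; keep waiting
                    tcs.TrySetResult(browser.DocumentText);
                    Application.ExitThread();          // ends Application.Run below
                };
                timer.Tick += (s, e) =>
                {
                    tcs.TrySetException(new TimeoutException("The page did not finish loading in time."));
                    Application.ExitThread();
                };

                timer.Start();
                browser.Navigate(url);
                Application.Run(); // pumps messages so the browser can load and raise events
            }
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
        return tcs.Task;
    }
}
```

In registerUser you could block with .Result (the browser runs on its own thread), e.g. StaBrowser.GetDocumentTextAsync(new Uri("https://www.example.com/signup"), TimeSpan.FromSeconds(30)).Result, or await it from async code. The timeout must be at least one millisecond, since a WinForms Timer rejects smaller intervals.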
Don't mix things up. Your goal shouldn't be to use WebBrowser for its own sake. If you just need to download the page's HTML as a string, use HttpClient.GetStringAsync.
Once you have the page as a string, if you want to manipulate the HTML, use HtmlAgilityPack.
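A minimal sketch of that route, assuming the content you want does not depend on JavaScript (HttpClient only downloads the raw HTML; it does not run scripts). PageDownloader and FixedHtmlHandler are hypothetical names; the handler only stands in for a real server so the example runs offline.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

static class PageDownloader
{
    // Downloads the page body as a string; GetStringAsync throws on non-success status codes.
    public static Task<string> DownloadHtmlAsync(HttpClient client, string url)
        => client.GetStringAsync(url);
}

// Hypothetical offline stand-in for a web server, for demonstration only.
sealed class FixedHtmlHandler : HttpMessageHandler
{
    private readonly string _html;
    public FixedHtmlHandler(string html) => _html = html;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_html, Encoding.UTF8, "text/html")
        });
}
```

With a real HttpClient (typically one shared instance), call await PageDownloader.DownloadHtmlAsync(client, "https://www.example.com/signup") and hand the string to HtmlAgilityPack's HtmlDocument.LoadHtml.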
I moved over to using WatiN instead of the default browser model. It's a bit buggy, but it now works as it should.
using (var browser = new FireFox("https://www.example.com/signup"))
{
    browser.GoTo("https://example.com/signup");
    browser.WaitForComplete();
}

Suppress the "are you sure you want to leave this page" popup in the .NET WebBrowser control

I have a web browser automation project written in WinForms C#.
During the navigation there is a point where the browser does the "are you sure you want to leave this page?" popup.
We need this popup, so I cannot remove it from the website code, which means I have to override it in my automation app.
Does anyone have an idea how to do this?
And here was the smooth solution:
Add a reference to mshtml and add a using mshtml; directive, then:
Browser.Navigated +=
    new WebBrowserNavigatedEventHandler(
        (object sender, WebBrowserNavigatedEventArgs args) =>
        {
            Action<HtmlDocument> blockAlerts = (HtmlDocument d) =>
            {
                HtmlElement h = d.GetElementsByTagName("head")[0];
                HtmlElement s = d.CreateElement("script");
                IHTMLScriptElement e = (IHTMLScriptElement)s.DomElement;
                e.text = "window.alert=function(){};";
                h.AppendChild(s);
            };
            WebBrowser b = sender as WebBrowser;
            blockAlerts(b.Document);
            for (int i = 0; i < b.Document.Window.Frames.Count; i++)
                try { blockAlerts(b.Document.Window.Frames[i].Document); }
                catch (Exception) { }
        }
    );
Are you able to make any changes to the website code?
If so, you might look at exposing an object through ObjectForScripting, then having the website code check window.external (and possibly interrogating your object) before it decides to display the popup - so if it can't find your object, it assumes it's being used normally and shows it.
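A sketch of the C# side of that idea. AutomationBridge and IsAutomationHost are hypothetical names; WebBrowser.ObjectForScripting needs a COM-visible object, hence [ComVisible(true)] on a public class.

```csharp
using System.Runtime.InteropServices;

// Hypothetical bridge object exposed to page script as window.external.
[ComVisible(true)]
public class AutomationBridge
{
    // Page script can call window.external.IsAutomationHost(); a true result
    // tells it that it is running inside the automation app.
    public bool IsAutomationHost() => true;
}
```

Assign it with webBrowser.ObjectForScripting = new AutomationBridge(); and have the page's beforeunload logic skip the prompt when window.external.IsAutomationHost() returns true, wrapping that call in a try/catch. In an ordinary browser the method does not exist, so the call throws and the catch falls through to the normal popup.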
No need to add anything else. Try it; it works like a charm. ^_^
private void webNavigated(object sender, WebBrowserNavigatedEventArgs e)
{
    HtmlDocument doc = webBrowser.Document;
    HtmlElement head = doc.GetElementsByTagName("head")[0];
    HtmlElement s = doc.CreateElement("script");
    s.SetAttribute("text", "function cancelOut() { window.onbeforeunload = null; window.alert = function () { }; window.confirm=function () { }}");
    head.AppendChild(s);
    webBrowser.Document.InvokeScript("cancelOut");
}

How to make WebBrowser wait till it loads fully?

I have a C# form with a web browser control on it.
I am trying to visit different websites in a loop.
However, I cannot control which URL gets loaded into the web browser control on my form.
This is the function I am using for navigating through URL addresses:
public String WebNavigateBrowser(String urlString, WebBrowser wb)
{
    string data = "";
    wb.Navigate(urlString);
    while (wb.ReadyState != WebBrowserReadyState.Complete)
    {
        Application.DoEvents();
    }
    data = wb.DocumentText;
    return data;
}
How can I make my loop wait until it fully loads?
My loop is something like this:
foreach (string urlAddresses in urls)
{
    WebNavigateBrowser(urlAddresses, webBrowser1);
    // I need to add code here to make the WebBrowser on the form wait until it loads
}
Add this to your code:
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Then fill in this handler:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // This check makes sure you only handle the event once per page: it also fires
    // for each frame, and frames usually have a different URL than webBrowser1.Url.
    if (e.Url != webBrowser1.Url)
        return;
    // do your actual work here
}
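The same check can drive the loop from the question without Application.DoEvents. This sketch assumes WinForms on .NET 4.5 or later (for async/await); BrowserNavigation, NavigateAndWaitAsync and VisitAllAsync are hypothetical names. Each navigation completes a TaskCompletionSource from the top-level DocumentCompleted, so the loop awaits one page before starting the next.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

static class BrowserNavigation
{
    // Navigates and completes with DocumentText once the top-level document
    // (not just a frame) has raised DocumentCompleted.
    public static Task<string> NavigateAndWaitAsync(this WebBrowser wb, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            if (e.Url != wb.Url) return; // a frame finished; keep waiting
            wb.DocumentCompleted -= handler;
            tcs.TrySetResult(wb.DocumentText);
        };
        wb.DocumentCompleted += handler;
        wb.Navigate(url);
        return tcs.Task;
    }

    // Visits each URL in turn, awaiting each load before starting the next one.
    public static async Task<List<string>> VisitAllAsync(WebBrowser wb, IEnumerable<string> urls)
    {
        var pages = new List<string>();
        foreach (var url in urls)
            pages.Add(await wb.NavigateAndWaitAsync(url));
        return pages;
    }
}
```

Call it from an async event handler, e.g. var pages = await BrowserNavigation.VisitAllAsync(webBrowser1, urls); on the UI thread each await resumes on the UI thread, so touching webBrowser1 afterwards is safe. A navigation that never raises a matching DocumentCompleted (for example a failed load) would leave the await pending, so a timeout may be worth adding.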
After some time spent angry at the crappy IE functionality, I came up with what I've found to be the most accurate way to judge that a page has completely loaded.
Never use the DocumentCompleted event (WebBrowserDocumentCompletedEventHandler);
use the ProgressChanged event (WebBrowserProgressChangedEventHandler) instead, with the modifications seen below.
// "ie" is our web browser object
ie.ProgressChanged += new WebBrowserProgressChangedEventHandler(_ie);

private void _ie(object sender, WebBrowserProgressChangedEventArgs e)
{
    int max = (int)Math.Max(e.MaximumProgress, e.CurrentProgress);
    int min = (int)Math.Min(e.MaximumProgress, e.CurrentProgress);
    if (min.Equals(max))
    {
        // Run your code here when the page is actually 100% complete
    }
}
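The check above can be pulled into a small pure function so it is easy to test. This is a hypothetical variant, not the answerer's exact logic: LoadProgress and IsComplete are illustrative names. It keeps the "current reached maximum" idea but does not report completion while both values are still 0, and it treats CurrentProgress == -1 as finished, which is my reading of the WebBrowserProgressChangedEventArgs documentation; verify that against your IE version.

```csharp
static class LoadProgress
{
    public static bool IsComplete(long currentProgress, long maximumProgress)
    {
        if (currentProgress == -1) return true;     // assumed "download finished" signal
        if (maximumProgress <= 0) return false;     // nothing measured yet (e.g. 0 of 0)
        return currentProgress >= maximumProgress;  // same idea as min == max above
    }
}
```

In the handler this becomes if (LoadProgress.IsComplete(e.CurrentProgress, e.MaximumProgress)) { ... }, since both properties are long.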
A simple, genius way of going about this. I found this question by googling "How to sleep web browser or put to pause".
According to MSDN (which includes sample source), you can use the DocumentCompleted event for that. Additional, very helpful information and source showing how to differentiate between event invocations can be found here.
What you experienced happened to me too: checking for ReadyState.Complete doesn't work in some cases. Here I used a bool set in DocumentCompleted to check the state:
private void button1_Click(object sender, EventArgs e)
{
    // go to site1
    wb.Navigate("site1.com");
    // wait for DocumentCompleted before executing anything further
    waitWebBrowserToComplete(wb);
    // set some values in the HTML page
    wb.Document.GetElementById("input1").SetAttribute("Value", "hello");
    // then click submit (submit triggers a navigation)
    wb.Document.GetElementById("formid").InvokeMember("submit");
    // then wait for the document to complete again
    waitWebBrowserToComplete(wb);
    var processedHtml = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
    var rawHtml = wb.DocumentText;
}

// helpers
// Instead of checking ReadyState, we get the state from the DocumentCompleted event via a bool.
// The flag is static because the helper below is static.
static bool webbrowserDocumentCompleted = false;

public static void waitWebBrowserToComplete(WebBrowser wb)
{
    while (!webbrowserDocumentCompleted)
        Application.DoEvents();
    webbrowserDocumentCompleted = false;
}

private void Form1_Load(object sender, EventArgs e)
{
    wb.DocumentCompleted += (o, args) =>
    {
        webbrowserDocumentCompleted = true;
    };
}

Get HtmlDocument after javascript manipulations

In C#, using the System.Windows.Forms.HtmlDocument class (or another class that allows DOM parsing), is it possible to wait until a webpage finishes its javascript manipulations of the HTML before retrieving that HTML? Certain sites add innerhtml to pages through javascript, but those changes do not show up when I parse the HtmlElements of the HtmlDocument.
One possibility would be to update the HtmlDocument of the page after a second. Does anybody know how to do this?
Someone revived this question by posting what I think is an incorrect answer. So, here are my thoughts to address it.
Non-deterministically, it's possible to get close to finding out if the page has finished its AJAX stuff. However, it completely depends on the logic of that particular page: some pages are perpetually dynamic.
To approach this, one can handle the DocumentCompleted event first, then asynchronously poll the WebBrowser.IsBusy property and monitor the current HTML snapshot of the page for changes, like below.
The complete sample can be found here.
// get the root element
var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];

// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
    // wait asynchronously; this will throw if cancellation is requested
    await Task.Delay(500, token);

    // continue polling if the WebBrowser is still busy
    if (this.webBrowser.IsBusy)
        continue;

    var htmlNow = documentElement.OuterHtml;
    if (html == htmlNow)
        break; // no changes detected, end the poll loop

    html = htmlNow;
}
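The polling loop above can also be written as a helper that does not depend on the WebBrowser, which makes it testable on its own. HtmlStability and WaitForStableHtmlAsync are hypothetical names; the caller supplies getHtml and isBusy, which for the real control would be () => documentElement.OuterHtml and () => this.webBrowser.IsBusy.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class HtmlStability
{
    // Polls every `interval` until two consecutive non-busy snapshots are identical,
    // then returns that snapshot. Throws if the token is cancelled.
    public static async Task<string> WaitForStableHtmlAsync(
        Func<string> getHtml, Func<bool> isBusy, TimeSpan interval, CancellationToken token)
    {
        var html = getHtml();
        while (true)
        {
            await Task.Delay(interval, token); // throws if cancellation is requested
            if (isBusy()) continue;            // still loading; keep polling
            var htmlNow = getHtml();
            if (htmlNow == html) return html;  // unchanged since the last poll
            html = htmlNow;
        }
    }
}
```

When awaited from the UI thread of a WinForms app, each continuation runs on the UI thread, so calling OuterHtml inside getHtml stays on the thread that owns the document. Pass a token with a timeout for pages that never stop changing.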
In general the answer is "no": unless a script on the page notifies your code in some way, you simply have to wait some time and then grab the HTML. Waiting a second after the document-ready notification will likely cover most sites (e.g., jQuery's $(code) cases).
You need to give the application a second to process the JavaScript. Simply halting the current thread will delay the JavaScript processing as well, so your document will still come up outdated.
WebBrowserDocumentCompletedEventArgs cachedLoadArgs;

private void TimerDone(object sender, EventArgs e)
{
    var timer = (System.Windows.Forms.Timer)sender;
    timer.Stop();
    timer.Dispose(); // one-shot timer; release it once it has fired
    respondToPageLoaded(cachedLoadArgs);
}

void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    cachedLoadArgs = e;
    // A WinForms timer ticks on the UI thread without blocking it,
    // so the page's JavaScript keeps running while we wait.
    var timer = new System.Windows.Forms.Timer();
    timer.Interval = 1000; // milliseconds
    timer.Tick += new EventHandler(TimerDone);
    timer.Start();
}
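An alternative to the Timer, sketched under the same assumption of a one-second settling time: an async DocumentCompleted handler that awaits Task.Delay. Awaiting returns control to the message loop, so page script keeps running during the wait, unlike halting the thread. DelayedCaptureForm and PageSettled are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public class DelayedCaptureForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly string url;

    // Hypothetical event: raised with the page's HTML about one second after it loads.
    public event Action<string> PageSettled;

    public DelayedCaptureForm(string url)
    {
        this.url = url;
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        webBrowser1.Navigate(url);
    }

    private async void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (e.Url != webBrowser1.Url) return; // ignore frame completions

        // Unlike Thread.Sleep, this frees the UI thread so page script can keep
        // changing the DOM; execution resumes here on the UI thread afterwards.
        await Task.Delay(1000);

        var root = webBrowser1.Document.GetElementsByTagName("html")[0];
        PageSettled?.Invoke(root.OuterHtml);
    }
}
```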
What about using the WebBrowser.Navigated event?
I did it with a WebBrowser; take a look at my class:
using System;
using System.Linq;
using System.Threading;
using System.Windows.Forms;

public class MYCLASSProduct : IProduct
{
    public string Name { get; set; }
    public double Price { get; set; }
    public string Url { get; set; }

    private WebBrowser _WebBrowser;
    private AutoResetEvent _lock;

    public void Load(string url)
    {
        _lock = new AutoResetEvent(false);
        this.Url = url;
        browserInitializeBecauseJavascriptLoadThePage();
    }

    private void browserInitializeBecauseJavascriptLoadThePage()
    {
        _WebBrowser = new WebBrowser();
        _WebBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
        _WebBrowser.Dock = DockStyle.Fill;
        _WebBrowser.Name = "webBrowser";
        _WebBrowser.ScrollBarsEnabled = false;
        _WebBrowser.TabIndex = 0;
        _WebBrowser.Navigate(Url);

        Form form = new Form();
        form.Hide();
        form.Controls.Add(_WebBrowser);
        Application.Run(form);
        _lock.WaitOne();
    }

    private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlAgilityPack.HtmlDocument hDocument = new HtmlAgilityPack.HtmlDocument();
        hDocument.LoadHtml(_WebBrowser.Document.Body.OuterHtml);
        this.Price = Convert.ToDouble(hDocument.DocumentNode.SelectNodes("//td[@class='ask']").FirstOrDefault().InnerText.Trim());
        _WebBrowser.FindForm().Close();
        _lock.Set();
    }
}
If you're trying to do this in a console application, you need to put this attribute above your Main method, because the thread must be a single-threaded apartment to communicate with COM components such as WebBrowser:
[STAThread]
static void Main(string[] args)
I don't like this solution, but I don't think there is a better one!

IE Instance, DocumentCompleted Executing Too Soon

I create an instance of IE outside my program, which the program finds and attaches to correctly. I set up my event handler and tell the program to advance to the login screen. The DocumentCompleted handler is supposed to fire when the web page has completely loaded, but mine seems to fire before the new page has appeared. The handler only fires once (meaning there is only one frame?).
This code also executes fine if I modify it to start directly from the login page. Am I doing something wrong? Thanks for any assistance :)
Process.Start(@"IESpecial.exe");
SHDocVw.ShellWindows allBrowsers = new SHDocVw.ShellWindows();
while (true)
{
    foreach (SHDocVw.WebBrowser ie in allBrowsers)
    {
        if (ie.LocationURL == "http://website/home.asp")
        {
            loggingIn = true;
            webBrowser = ie;
            webBrowser.DocumentComplete += new SHDocVw.DWebBrowserEvents2_DocumentCompleteEventHandler(webBrowser1_DocumentCompleted);
            webBrowser.Navigate("http://website/logon.asp");
            return;
        }
    }
    Thread.Sleep(10);
}
private void webBrowser1_DocumentCompleted(object pDisp, ref object URL)
{
    // we are attempting to log in
    if (loggingIn)
    {
        mshtml.HTMLDocumentClass doc = (mshtml.HTMLDocumentClass)webBrowser.Document;
        mshtml.HTMLWindow2 window = (mshtml.HTMLWindow2)doc.IHTMLDocument2_parentWindow;
        doc.getElementById("Username").setAttribute("value", "MLAPAGLIA");
        doc.getElementById("Password").setAttribute("value", "PASSWORD");
        window.execScript("SubmitAction()", "javascript");
        loggingIn = false;
        return;
    }
}
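One thing that may help here is filtering the DocumentComplete calls before acting on them. This is a hypothetical sketch, not a verified fix: CompletionFilter and IsTargetDocument are illustrative names. It assumes the interop layer returns the same wrapper object for the top-level browser, so a reference comparison with pDisp identifies the top-level document, and it also checks that the URL argument is the page you navigated to.

```csharp
using System;

static class CompletionFilter
{
    // pDisp: the object the event reports as finished (a frame or the top-level browser).
    // topLevelBrowser: the SHDocVw.WebBrowser the handler was attached to.
    // url: the event's URL argument; expectedUrl: the page you navigated to.
    public static bool IsTargetDocument(object pDisp, object topLevelBrowser, object url, string expectedUrl)
    {
        // Assumption: same COM object identity yields the same wrapper instance.
        bool isTopLevel = ReferenceEquals(pDisp, topLevelBrowser);

        // Ignore completions for any page other than the one you navigated to.
        bool isExpectedPage = string.Equals(url as string, expectedUrl, StringComparison.OrdinalIgnoreCase);

        return isTopLevel && isExpectedPage;
    }
}
```

At the top of the handler this would read: if (!CompletionFilter.IsTargetDocument(pDisp, webBrowser, URL, "http://website/logon.asp")) return;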
