I am using the GeckoFX 22 C# web browser control but cannot access tags inside an iframe. When I check the Gecko InnerHtml, the iframe tag appears in the HTML but its contents do not.
This is the code I used to get the inner HTML of the browser control; it shows the iframe tag as empty (when it should contain another document):
GeckoHtmlElement element = null;
var geckoDomElement = webBrowser.Document.DocumentElement;
if (geckoDomElement is GeckoHtmlElement)
{
element = (GeckoHtmlElement)geckoDomElement;
var innerHtml = element.InnerHtml;
}
Previously I used code similar to the following to access individual elements, which works fine:
GeckoDocument checkDoc = (GeckoDocument)webBrowser.Window.Document;
var x = (checkDoc.GetElementsByTagName("a").Where(b => b.Id == "ipt-form-format-aside").First());
I can get individual elements in the main HTML document and change their values, trigger events, and so on without problems, but I cannot get at any element inside an iframe. I suspect the iframe has not finished loading yet. Is there a way to make the control wait for the iframe to load before I try to access its elements?
string content = null;
var iframe = webBrowser.Document.GetElementsByTagName("iframe").FirstOrDefault() as Gecko.DOM.GeckoIFrameElement;
if(iframe != null)
{
var html = iframe.ContentDocument.DocumentElement as GeckoHtmlElement;
if (html != null)
content = html.OuterHtml;
}
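One way to address the "wait for the iframe to load" part of the question is to defer the read until the browser reports that loading has finished. The following is only a minimal sketch: the class and method names (IFrameReader, ReadFirstIFrameHtml, ReadWhenLoaded) are hypothetical, and it assumes the GeckoWebBrowser control exposes a DocumentCompleted event in your GeckoFX version. Whether that event fires only after every sub-frame has finished loading is something to verify for the version you use, so the code also treats a missing ContentDocument as "not ready yet".

```csharp
using System;
using System.Linq;
using Gecko;
using Gecko.DOM;

static class IFrameReader
{
    // Returns the outer HTML of the first iframe's document,
    // or null if there is no iframe or its document is not available yet.
    public static string ReadFirstIFrameHtml(GeckoWebBrowser browser)
    {
        var iframe = browser.Document.GetElementsByTagName("iframe")
                            .FirstOrDefault() as GeckoIFrameElement;
        if (iframe == null || iframe.ContentDocument == null)
            return null;

        var html = iframe.ContentDocument.DocumentElement as GeckoHtmlElement;
        return html == null ? null : html.OuterHtml;
    }

    // Assumption: DocumentCompleted exists on the control. The lambda form
    // is used so the handler does not depend on the exact event-args type.
    public static void ReadWhenLoaded(GeckoWebBrowser browser, Action<string> onHtml)
    {
        browser.DocumentCompleted += (sender, e) => onHtml(ReadFirstIFrameHtml(browser));
    }
}
```

Call ReadWhenLoaded before Navigate so the handler is attached by the time the page finishes loading; if the callback receives null, the iframe content was not ready and the read can be retried.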
I'm just posting this for anyone else who might run into this problem:
foreach (GeckoIFrameElement _E in geckoWebBrowser1.Document.GetElementsByTagName("iframe"))
{
if (_E.GetAttribute("class") == "testClass")
{
var innerHTML = _E.ContentDocument;
foreach (GeckoHtmlElement _A in innerHTML.GetElementsByTagName("input"))
{
_A.SetAttribute("value", "Test");
}
}
}
I had a similar problem, so I used this:
checkDoc.Window.Frames(1)
instead of
checkDoc.GetElementsByTagName("iframe")
The value in the parentheses (1 here) is the index of the frame you want.
Related
I am trying to parse the number shown in this page:
https://www.edf.org/embed/methane-counters
I have tried WebBrowser, WebClient ... etc. with no good result. Every time I try something new, in the HTML returned I get this (HTML area where the number is shown):
<strong id="methane"></strong>
... as you can see, there is no number between the 'strong' tags. Just in case, this is the latest code I have tried, which still does not work:
using (WebBrowser myWebBrowser = new WebBrowser()) {
myWebBrowser.ScriptErrorsSuppressed = true;
myWebBrowser.Navigate(myURL);
while ((myWebBrowser.ReadyState != WebBrowserReadyState.Complete))
Application.DoEvents();
myContent = myWebBrowser.Document.Body.InnerHtml;
myContent = myWebBrowser.DocumentText;
}
... neither of the last two calls returns the HTML with the number on it.
Any ideas on how to get the proper content of this page?
Background Info: I'm using an ItemCheckedIn receiver in SharePoint 2010, targeting .NET 3.5 Framework. The goal of the receiver is to:
Make sure the properties (columns) of the page match the data in a Content Editor WebPart on the page so that the page can be found in a search using Filter web parts. The pages are automatically generated, so barring any errors they are guaranteed to fit the expected format.
If there is a mismatch, check out the page, fix the properties, then check it back in.
I've kept the receiver from falling into an infinite check-in/check-out loop, although right now it's a very clumsy fix that I'm trying to work on. However, right now I can't work on it because I'm getting a DisconnectedContext error whenever I hit the UpdatePage function:
public override void ItemCheckedIn(SPItemEventProperties properties)
{
// If the main page or machine information is being checked in, do nothing
if (properties.AfterUrl.Contains("home") || properties.AfterUrl.Contains("machines")) return;
// Otherwise make sure that the page properties reflect any changes that may have been made
using (SPSite site = new SPSite("http://san1web.net.jbtc.com/sites/depts/VPC/"))
using (SPWeb web = site.OpenWeb())
{
SPFile page = web.GetFile(properties.AfterUrl);
// Make sure the event receiver doesn't get called infinitely by checking version history
...
UpdatePage(page);
}
}
private static void UpdatePage(SPFile page)
{
bool checkOut = false;
var th = new Thread(() =>
{
using (WebBrowser wb = new WebBrowser())
using (SPLimitedWebPartManager manager = page.GetLimitedWebPartManager(PersonalizationScope.Shared))
{
// Get web part's contents into HtmlDocument
ContentEditorWebPart cewp = (ContentEditorWebPart)manager.WebParts[0];
HtmlDocument htmlDoc;
wb.Navigate("about:blank");
htmlDoc = wb.Document;
htmlDoc.OpenNew(true);
htmlDoc.Write(cewp.Content.InnerText);
foreach (var prop in props)
{
// Check that each property matches the information on the page
string element;
try
{
element = htmlDoc.GetElementById(prop).InnerText;
}
catch (NullReferenceException)
{
break;
}
if (!element.Equals(page.GetProperty(prop).ToString()))
{
if (!prop.Contains("Request"))
{
checkOut = true;
break;
}
else if (!element.Equals(page.GetProperty(prop).ToString().Split(' ')[0]))
{
checkOut = true;
break;
}
}
}
if (!checkOut) return;
// If there was a mismatch, check the page out and fix the properties
page.CheckOut();
foreach (var prop in props)
{
page.SetProperty(prop, htmlDoc.GetElementById(prop).InnerText);
page.Item[prop] = htmlDoc.GetElementById(prop).InnerText;
try
{
page.Update();
}
catch
{
page.SetProperty(prop, Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1));
page.Item[prop] = Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1);
page.Update();
}
}
page.CheckIn("");
}
});
th.SetApartmentState(ApartmentState.STA);
th.Start();
}
From what I understand, using a WebBrowser is the only way to fill an HtmlDocument in this version of .NET, so that's why I have to use this thread.
In addition, I've done some reading and it looks like the DisconnectedContext error has to do with threading and COM, which are subjects I know next to nothing about. What can I do to prevent/fix this error?
EDIT
As @Yevgeniy.Chernobrivets pointed out in the comments, I could insert an editable field bound to the page column and not worry about parsing any HTML. However, the current page layout uses an HTML table within a Content Editor WebPart, where this kind of field wouldn't work properly, so I'd need to make a new page layout and rebuild my solution from the bottom up, which I would really rather avoid.
I'd also like to avoid downloading anything, as the company I work for normally doesn't allow the use of unapproved software.
You shouldn't parse HTML with the WebBrowser class. It is part of Windows Forms and is suited neither to server-side web code nor to plain HTML parsing. Try an HTML parser such as HtmlAgilityPack instead.
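As a rough illustration of that suggestion, the sketch below reads an element's text by id with HtmlAgilityPack, with no WebBrowser control and no STA thread. The class name, method name, and sample HTML are made up for the example, and HtmlAgilityPack is a third-party assembly that would have to be deployed with the solution. The code avoids newer C# syntax so it should fit a .NET 3.5 target.

```csharp
using System;
using HtmlAgilityPack;

static class WebPartHtml
{
    // Returns the trimmed inner text of the element with the given id,
    // or null when the markup contains no such element.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Note the library's spelling: GetElementbyId (lowercase "b").
        HtmlNode node = doc.GetElementbyId(id);
        return node == null ? null : node.InnerText.Trim();
    }

    static void Main()
    {
        // Hypothetical web part content; the real markup comes from the page.
        string html = "<table><tr><td id=\"Machine\">Press 4</td></tr></table>";
        Console.WriteLine(GetTextById(html, "Machine"));
    }
}
```

Because the lookup returns null for a missing id, the caller can test for null instead of catching a NullReferenceException as the original loop does.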
I've been trying, without luck, to use IJavaScriptExecutor to find a specific header string in a page. Here's the HTML code from the page:
<div class="wrap">
<h2>Edit Page <a href="http://www.webtest.bugrit.net/wordpress/wp-admin/post-new.php?post_type=page" class="add-new-h2">Add New</a></h2>
<div id...
The text I need to check for is the "Edit Page" string.
This is the closest I've come, which isn't very close:
var element = FFDriver.Instance.FindElements(By.ClassName("add-new-h2"));
IJavaScriptExecutor js = FFDriver.Instance as IJavaScriptExecutor;
if (js != null) {
string innerHtml = (string)js.ExecuteScript("return arguments[0].innerHTML;", element);
//System.Windows.Forms.MessageBox.Show(innerHtml);
if (innerHtml.Equals("Edit Page")) {
return true;
} else {
return false;
}
}
Now, I realize that the text I should expect to get from that code isn't the exact string "Edit Page". But shouldn't it return something? When I enable the MessageBox line, the innerHtml string is empty.
Or, of course, if someone knows another, possibly easier, way to check for the existence of a specific string inside a specific HTML tag, I'm all ears.
Your locator returns the <a> element, not the <h2>, and the <a> does not contain the string "Edit Page".
Instead, locate the link's parent <h2> like this (this assumes the class name add-new-h2 is unique; otherwise you get the parent of the first match):
var element = FFDriver.Instance.FindElement(By.XPath(".//a[@class='add-new-h2']/.."));
var containsText = element.Text.Contains("Edit Page");
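If the goal is only to know whether such a heading exists, another option is to put the text check into the XPath itself and count the matches, which avoids an exception when nothing is found. This is a sketch rather than the answerer's method: the class and method names are hypothetical, and the XPath assumes the div class is exactly "wrap", as in the snippet from the question.

```csharp
using OpenQA.Selenium;

static class PageChecks
{
    // True when an <h2> directly under <div class="wrap"> contains "Edit Page".
    public static bool HasEditPageHeading(IWebDriver driver)
    {
        // contains(., '...') tests the heading's full text, including the
        // text of the nested link ("Edit Page Add New" for the page above).
        var headings = driver.FindElements(
            By.XPath("//div[@class='wrap']/h2[contains(., 'Edit Page')]"));

        // FindElements returns an empty collection instead of throwing.
        return headings.Count > 0;
    }
}
```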
I use an IE object in my app, so I have to use mshtml to interact with the IE's document. But I have several problems:
I. Using element.innerText/innerHTML/outerText/outerHTML returns className. Here's the code example:
public SHDocVw.InternetExplorer internetExplorer = new SHDocVw.InternetExplorer();
<...>
foreach (mshtml.HTMLSpanElement element in webBrowser.Document.GetElementsByTagName("SPAN"))
{
if (category.className == "classNameNeeded") //ClassName returns className
{
if (category.innerText == "InnerTextNeeded") //InnerText too
{
webBrowser.Navigate(category.parentElement.getAttribute("HREF"));
return true;
}
}
}
return false;
So my app works incorrectly
II. Using getElementById returns DBNull instead of element on a webpage. I'm completely sure that there's such an element on a webpage. Code example:
if(!webBrowser.Document.getElementById("IdNeeded").Equals(DBNull.value)){
<...>
}
I think this problem is connected with the HTML code of the page and its parsing.
How can I solve these problems?
Thanks in advance.
Good day
I have a question about displaying HTML documents in a Windows Forms application. The app I'm working on should display information from the database in HTML format. I will try to describe the approaches I have taken (and which failed):
1) I tried to load a "virtual" HTML page that exists only in memory and dynamically change its parameters (webbMain is a WebBrowser control):
public static string CreateBookHtml()
{
StringBuilder sb = new StringBuilder();
//Declaration
sb.AppendLine(@"<?xml version=""1.0"" encoding=""utf-8""?>");
sb.AppendLine(@"<?xml-stylesheet type=""text/css"" href=""style.css""?>");
sb.AppendLine(@"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN""
""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">");
sb.AppendLine(@"<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"">");
//Head
sb.AppendLine(@"<head>");
sb.AppendLine(@"<title>Exemplary document</title>");
sb.AppendLine(@"<meta http-equiv=""Content-Type"" content=""application/xhtml+xml; charset=utf-8"" />");
sb.AppendLine(@"</head>");
//Body
sb.AppendLine(@"<body>");
sb.AppendLine(@"<p id=""paragraph"">Example.</p>");
sb.AppendLine(@"</body>");
sb.AppendLine(@"</html>");
return sb.ToString();
}
void LoadBrowser()
{
this.webbMain.Navigate("about:blank");
this.webbMain.DocumentText = CreateBookHtml();
HtmlDocument doc = this.webbMain.Document;
}
This failed because doc.Body is null, and doc.GetElementById("paragraph") returns null too, so I cannot change the paragraph's InnerText property.
Furthermore, this.webbMain.DocumentText is "\0"...
2) I tried to create an HTML file in a specified folder, load it into the WebBrowser, and then change its parameters. The HTML is the same as that created by the CreateBookHtml() method:
private void LoadBrowser()
{
this.webbMain.Navigate("HTML\\BookPage.html");
HtmlDocument doc = this.webbMain.Document;
}
This time this.webbMain.DocumentText contains the HTML data read from the file, but doc.Body is null again, and I still cannot get an element using the GetElementById() method. Of course, since I have the text, I could try a regex to extract the fields I need, or use other tricks to reach the goal, but I wonder: is there a simple way to manipulate HTML? For me, the ideal way would be to create the HTML text in memory, load it into the WebBrowser control, and then dynamically change its parameters using IDs. Is it possible? Thanks in advance for the answers, best regards,
Paweł
Some time ago I worked with the WebBrowser control and, like you, wanted to load HTML from memory, and I hit the same problem: the body was null. After some investigation I noticed that the Navigate and NavigateToString methods work asynchronously, so the control needs a little time to load the document; it is not available right after the call to Navigate. So I did something like this (wbChat is the WebBrowser control):
wbChat.NavigateToString("<html><body><div>first line</div></body></html>");
DoEvents();
where DoEvents() is implemented as:
[SecurityPermissionAttribute(SecurityAction.Demand, Flags = SecurityPermissionFlag.UnmanagedCode)]
public void DoEvents()
{
DispatcherFrame frame = new DispatcherFrame();
Dispatcher.CurrentDispatcher.BeginInvoke(DispatcherPriority.Background,
new DispatcherOperationCallback(ExitFrame), frame);
Dispatcher.PushFrame(frame);
}
and it worked for me, after the DoEvents call, I could obtain a non-null body:
mshtml.IHTMLDocument2 doc2 = (mshtml.IHTMLDocument2)wbChat.Document;
mshtml.HTMLDivElement div = (mshtml.HTMLDivElement)doc2.createElement("div");
div.innerHTML = "some text";
mshtml.HTMLBodyClass body = (mshtml.HTMLBodyClass)doc2.body;
if (body != null)
{
body.appendChild((mshtml.IHTMLDOMNode)div);
body.scrollTop = body.scrollHeight;
}
else
Console.WriteLine("body is still null");
I don't know if this is the right way of doing this, but it fixed the problem for me, maybe it helps you too.
Later Edit:
public object ExitFrame(object f)
{
((DispatcherFrame)f).Continue = false;
return null;
}
The DoEvents method is necessary on WPF. For System.Windows.Forms one can use Application.DoEvents().
Another way to do the same thing is:
webBrowser1.DocumentText = "<html><body>blabla<hr/>yadayada</body></html>";
This works without any extra initialization.
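Since the answer above traces the null body to asynchronous loading, an alternative to pumping messages with DoEvents in Windows Forms is to change the element in the control's DocumentCompleted handler. The sketch below assumes a WinForms WebBrowser named webbMain, as in the question; the form class name and the replacement text are made up for illustration.

```csharp
using System;
using System.Windows.Forms;

class BookForm : Form
{
    private readonly WebBrowser webbMain = new WebBrowser { Dock = DockStyle.Fill };

    public BookForm()
    {
        Controls.Add(webbMain);
        // Attach the handler before any content is loaded.
        webbMain.DocumentCompleted += OnDocumentCompleted;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        // Loading is asynchronous: Document.Body is not usable right after this line.
        webbMain.DocumentText =
            "<html><body><p id=\"paragraph\">Example.</p></body></html>";
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The document is loaded here, so lookups by id can succeed.
        HtmlElement paragraph = webbMain.Document.GetElementById("paragraph");
        if (paragraph != null)
            paragraph.InnerText = "Updated from code.";
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new BookForm());
    }
}
```

The null check is kept because the handler can also run for a document that does not contain the element, such as an initial blank page.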