Multiple mshtml problems

I use an IE object in my app, so I have to use mshtml to interact with IE's document. However, I have several problems:
I. Using element.innerText/innerHTML/outerText/outerHTML returns className. Here's a code example:
public SHDocVw.InternetExplorer internetExplorer = new SHDocVw.InternetExplorer();
<...>
foreach (mshtml.HTMLSpanElement element in webBrowser.Document.GetElementsByTagName("SPAN"))
{
    if (element.className == "classNameNeeded") // ClassName returns className
    {
        if (element.innerText == "InnerTextNeeded") // InnerText too
        {
            webBrowser.Navigate(element.parentElement.getAttribute("HREF"));
            return true;
        }
    }
}
return false;
So my app works incorrectly.
II. Using getElementById returns DBNull instead of the element. I'm completely sure that the element exists on the page. Code example:
if (!webBrowser.Document.getElementById("IdNeeded").Equals(DBNull.Value))
{
    <...>
}
I think this problem is connected with the page's HTML code and its parsing.
How can I solve these problems?
Thanks in advance.

Related

c# selenium form submission

I'm having trouble with the following flow: if the search form is available, submit the data and then check the response. The code seems to find the form and submit the data, but it doesn't process the response. Example code:
if (driver.FindElements(By.Name("search")).Count > 0 && driver.FindElement(By.Name("search")).Displayed)
{
    driver.FindElement(By.Name("search")).SendKeys(query + Keys.Enter);
    if (driver.FindElements(By.XPath("//*[@id='not found']/h2")).Count > 0 && driver.FindElement(By.XPath("//*[@id='not found']/h2")).Displayed)
    {
        Console.WriteLine("search not found");
        driver.Manage().Cookies.DeleteAllCookies();
        driver.Navigate().GoToUrl("https://example.com");
    }
}
What this should do is: if driver.FindElement(By.Name("search")) finds a displayed element, call driver.FindElement(By.Name("search")).SendKeys(query), then check the response and handle it using the commands inside the if statement.

I would rewrite this slightly to make it more readable and to avoid hitting the page so many times. Every time you call driver.FindElement(), Selenium scrapes the page. Scrape it once, do all your analysis using that first scrape, and then proceed.
IReadOnlyCollection<IWebElement> search = GetVisibleElements(By.Name("search"));
if (search.Any())
{
    search.ElementAt(0).SendKeys(query + Keys.Enter);
    if (GetVisibleElements(By.XPath("//*[@id='not found']/h2")).Any())
    {
        // search not found
        Console.WriteLine("search not found");
        Driver.Manage().Cookies.DeleteAllCookies();
        Driver.Navigate().GoToUrl("https://example.com");
    }
    else
    {
        // search found
        // do stuff here
    }
}
Since you check more than once whether an element exists and is visible, I would wrap that code in a function to make it reusable and make your code easier to read.
public IReadOnlyCollection<IWebElement> GetVisibleElements(By locator)
{
    return Driver.FindElements(locator).Where(e => e.Displayed).ToList();
}
This function locates the elements matching the locator provided, filters them down to only those that are displayed, and returns the list. You can then check in your script whether the returned list contains any elements.
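One thing the snippets above do not cover: with the default implicit wait of zero, FindElements returns straight away, so a check made right after SendKeys can run before the response has rendered. A minimal sketch of a polling helper that uses only the base class library follows. The names Poll, Until, timeout, and interval are hypothetical and not part of Selenium; Selenium's own WebDriverWait class in OpenQA.Selenium.Support.UI serves a similar purpose.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poll
{
    // Repeatedly evaluates `condition` until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false if the timeout was reached first.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;                 // condition met, stop polling
            if (clock.Elapsed >= timeout) return false;   // give up after the timeout
            Thread.Sleep(interval);                       // pause before the next check
        }
    }
}
```

It could be used, for example, as Poll.Until(() => GetVisibleElements(By.XPath("//*[@id='not found']/h2")).Any(), TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(250)), treating a false result as "search found".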

DisconnectedContext detected when using STA thread to modify SharePoint page

Background Info: I'm using an ItemCheckedIn receiver in SharePoint 2010, targeting .NET 3.5 Framework. The goal of the receiver is to:
1. Make sure the properties (columns) of the page match the data in a Content Editor WebPart on the page so that the page can be found in a search using Filter web parts. The pages are automatically generated, so barring any errors they are guaranteed to fit the expected format.
2. If there is a mismatch, check out the page, fix the properties, then check it back in.
I've kept the receiver from falling into an infinite check-in/check-out loop, although right now it's a very clumsy fix that I'm trying to work on. However, right now I can't work on it because I'm getting a DisconnectedContext error whenever I hit the UpdatePage function:
public override void ItemCheckedIn(SPItemEventProperties properties)
{
    // If the main page or machine information is being checked in, do nothing
    if (properties.AfterUrl.Contains("home") || properties.AfterUrl.Contains("machines")) return;
    // Otherwise make sure that the page properties reflect any changes that may have been made
    using (SPSite site = new SPSite("http://san1web.net.jbtc.com/sites/depts/VPC/"))
    using (SPWeb web = site.OpenWeb())
    {
        SPFile page = web.GetFile(properties.AfterUrl);
        // Make sure the event receiver doesn't get called infinitely by checking version history
        ...
        UpdatePage(page);
    }
}
private static void UpdatePage(SPFile page)
{
    bool checkOut = false;
    var th = new Thread(() =>
    {
        using (WebBrowser wb = new WebBrowser())
        using (SPLimitedWebPartManager manager = page.GetLimitedWebPartManager(PersonalizationScope.Shared))
        {
            // Get web part's contents into HtmlDocument
            ContentEditorWebPart cewp = (ContentEditorWebPart)manager.WebParts[0];
            HtmlDocument htmlDoc;
            wb.Navigate("about:blank");
            htmlDoc = wb.Document;
            htmlDoc.OpenNew(true);
            htmlDoc.Write(cewp.Content.InnerText);
            foreach (var prop in props)
            {
                // Check that each property matches the information on the page
                string element;
                try
                {
                    element = htmlDoc.GetElementById(prop).InnerText;
                }
                catch (NullReferenceException)
                {
                    break;
                }
                if (!element.Equals(page.GetProperty(prop).ToString()))
                {
                    if (!prop.Contains("Request"))
                    {
                        checkOut = true;
                        break;
                    }
                    else if (!element.Equals(page.GetProperty(prop).ToString().Split(' ')[0]))
                    {
                        checkOut = true;
                        break;
                    }
                }
            }
            if (!checkOut) return;
            // If there was a mismatch, check the page out and fix the properties
            page.CheckOut();
            foreach (var prop in props)
            {
                page.SetProperty(prop, htmlDoc.GetElementById(prop).InnerText);
                page.Item[prop] = htmlDoc.GetElementById(prop).InnerText;
                try
                {
                    page.Update();
                }
                catch
                {
                    page.SetProperty(prop, Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1));
                    page.Item[prop] = Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1);
                    page.Update();
                }
            }
            page.CheckIn("");
        }
    });
    th.SetApartmentState(ApartmentState.STA);
    th.Start();
}
From what I understand, using a WebBrowser is the only way to fill an HtmlDocument in this version of .NET, so that's why I have to use this thread.
In addition, I've done some reading and it looks like the DisconnectedContext error has to do with threading and COM, which are subjects I know next to nothing about. What can I do to prevent/fix this error?
EDIT
As @Yevgeniy.Chernobrivets pointed out in the comments, I could insert an editable field bound to the page column and not worry about parsing any HTML. However, the current page layout uses an HTML table within a Content Editor WebPart, where this kind of field wouldn't work properly, so I'd need to make a new page layout and rebuild my solution from the bottom up, which I would really rather avoid.
I'd also like to avoid downloading anything, as the company I work for normally doesn't allow the use of unapproved software.

You shouldn't parse HTML with the WebBrowser class. It is part of Windows Forms and is suited neither to server-side web code nor to pure HTML parsing. Try using an HTML parser such as HtmlAgilityPack instead.
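A minimal sketch of what that could look like, assuming the HtmlAgilityPack assembly is referenced by the project. The class and method names PageFieldReader and ReadFieldText are hypothetical; HtmlDocument.LoadHtml and GetElementbyId (note the lowercase "b") are the library's own calls. Parsing happens entirely in managed code, so no WebBrowser control or STA thread is needed for this step.

```csharp
using HtmlAgilityPack;

public static class PageFieldReader
{
    // Returns the inner text of the element with the given id,
    // or null when the markup contains no such element.
    public static string ReadFieldText(string html, string elementId)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);                              // parse the string in memory
        HtmlNode node = document.GetElementbyId(elementId);   // null if the id is absent
        return node == null ? null : node.InnerText;
    }
}
```

In the question's UpdatePage method, the string passed as html would presumably be cewp.Content.InnerText, and the null return replaces the NullReferenceException catch.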

Get GeckoFx firefox browser control iframe html not accessible

I am using the GeckoFX 22 c# web browser control but cannot manage to access tags within an iframe. When I check the gecko innerhtml it seems that although the iframe tag shows in the html, the contents of it do not.
This is the code I used to get the inner html of the browser control which just shows the iframe tag as empty (when it should have another doc inside of it):
GeckoHtmlElement element = null;
var geckoDomElement = webBrowser.Document.DocumentElement;
if (geckoDomElement is GeckoHtmlElement)
{
    element = (GeckoHtmlElement)geckoDomElement;
    var innerHtml = element.InnerHtml;
}
Previously I used code similar to the code below to access individual elements which works fine:
GeckoDocument checkDoc = (GeckoDocument)webBrowser.Window.Document;
var x = (checkDoc.GetElementsByTagName("a").Where(b => b.Id == "ipt-form-format-aside").First());
I am able to get individual elements in the main HTML document and change their values, trigger events, and so on without problems, but I cannot get at any elements inside an iframe. I think perhaps the iframe has not been loaded yet, or something like that. Is there a way to force the control to wait for the iframe to load before attempting to access its elements?

string content = null;
var iframe = webBrowser.Document.GetElementsByTagName("iframe").FirstOrDefault() as Gecko.DOM.GeckoIFrameElement;
if (iframe != null)
{
    var html = iframe.ContentDocument.DocumentElement as GeckoHtmlElement;
    if (html != null)
        content = html.OuterHtml;
}

I'm just posting this for anyone else who might run into this problem:
foreach (GeckoIFrameElement _E in geckoWebBrowser1.Document.GetElementsByTagName("iframe"))
{
    if (_E.GetAttribute("class") == "testClass")
    {
        var innerHTML = _E.ContentDocument;
        foreach (GeckoHtmlElement _A in innerHTML.GetElementsByTagName("input"))
        {
            _A.SetAttribute("value", "Test");
        }
    }
}

I had a similar problem, so I used this:
checkDoc.Window.Frames(1)
instead of
checkDoc.GetElementsByTagName("iframe")
The value within the parentheses (1 here) is the index of the frame you want.

The difference between "IE9 debug tools" HTML output and webpage source HTML that I got via C#

This is not the first time I've asked a question like this here.
I have a Volvo auto parts catalog that is implemented as a client application to a local database and works only in IE8/9. I need to find and get some positions displayed in IE.
Here's an example of IE output:
It's just a table and nothing more.
And here's what I see in IE9 debug tools:
IE shows me full layout of a page where I can see a target table and rows with the data I need to get.
I wrote a simple class that should walk through all IE tabs and get HTML from the target page:
using System.Globalization;
using System.Text.RegularExpressions;
using SHDocVw;
namespace WebpageHtmlMiner
{
    static class HtmlMiner
    {
        public static string GetWebpageHtml(string uriPattern)
        {
            var uriRegexPattern = uriPattern;
            var regex = new Regex(uriRegexPattern);
            var shellWindows = new ShellWindows();
            InternetExplorer internetExplorer = null;
            foreach (InternetExplorer ie in shellWindows)
            {
                Match match = regex.Match(ie.LocationURL);
                if (!string.IsNullOrEmpty(match.Value))
                {
                    internetExplorer = ie;
                    break;
                }
            }
            if (internetExplorer == null)
            {
                return "Target page is not opened in IE";
            }
            var mshtmlDocument = (mshtml.IHTMLDocument2)internetExplorer.Document;
            var webpageHtml = mshtmlDocument.body.parentElement.outerHTML.ToString(CultureInfo.InvariantCulture);
            return webpageHtml; //profit
        }
    }
}
It seems to work fine but instead of what I see in IE debug tools I get HTML code with tons of javascript functions and no data in target table.
Is there any way to get exactly what I see in IE debug tools?
Thanks.

You can get the original source (the one sent by the server) in the "Script" tab (this works on both my IE8 and my IE10).
If you do not use AJAX, I think you can also right-click on the page and choose the Display Source option.

How to feed WebBrowser control and manipulate the HTML document?

Good day,
I have a question about displaying HTML documents in a Windows Forms application. The app I'm working on should display information from the database in HTML format. I will try to describe the actions I have taken (and which failed):
1) I tried to load a "virtual" HTML page that exists only in memory and dynamically change its parameters (webbMain is a WebBrowser control):
public static string CreateBookHtml()
{
    StringBuilder sb = new StringBuilder();
    //Declaration
    sb.AppendLine(@"<?xml version=""1.0"" encoding=""utf-8""?>");
    sb.AppendLine(@"<?xml-stylesheet type=""text/css"" href=""style.css""?>");
    sb.AppendLine(@"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN""
    ""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">");
    sb.AppendLine(@"<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"">");
    //Head
    sb.AppendLine(@"<head>");
    sb.AppendLine(@"<title>Exemplary document</title>");
    sb.AppendLine(@"<meta http-equiv=""Content-Type"" content=""application/xhtml+xml; charset=utf-8"" />");
    sb.AppendLine(@"</head>");
    //Body
    sb.AppendLine(@"<body>");
    sb.AppendLine(@"<p id=""paragraph"">Example.</p>");
    sb.AppendLine(@"</body>");
    sb.AppendLine(@"</html>");
    return sb.ToString();
}
void LoadBrowser()
{
    this.webbMain.Navigate("about:blank");
    this.webbMain.DocumentText = CreateBookHtml();
    HtmlDocument doc = this.webbMain.Document;
}
This failed because doc.Body is null, and doc.GetElementById("paragraph") returns null too, so I cannot change the paragraph's InnerText property.
Furthermore, this.webbMain.DocumentText is "\0"...
2) I tried to create an HTML file in a specified folder, load it into the WebBrowser, and then change its parameters. The HTML is the same as that created by the CreateBookHtml() method:
private void LoadBrowser()
{
    this.webbMain.Navigate("HTML\\BookPage.html");
    HtmlDocument doc = this.webbMain.Document;
}
This time this.webbMain.DocumentText contains the HTML read from the file, but doc.Body is null again, and I still cannot get an element using the GetElementById() method. Of course, since I have the text, I could try a regex to extract specific fields, or use other tricks to achieve the goal, but I wonder: is there a simple way to manipulate the HTML? For me, the ideal way would be to create the HTML text in memory, load it into the WebBrowser control, and then dynamically change its parameters using IDs. Is that possible? Thanks in advance for your answers. Best regards,
Paweł

Some time ago I worked with the WebBrowser control and, like you, wanted to load HTML from memory, and I had the same problem: the body was null. After some investigation, I noticed that the Navigate and NavigateToString methods work asynchronously. The control needs a little time to load the document, so the document is not available right after the call to Navigate. So I did something like this (wbChat is the WebBrowser control):
wbChat.NavigateToString("<html><body><div>first line</div></body></html>");
DoEvents();
where DoEvents() is implemented as:
[SecurityPermissionAttribute(SecurityAction.Demand, Flags = SecurityPermissionFlag.UnmanagedCode)]
public void DoEvents()
{
    DispatcherFrame frame = new DispatcherFrame();
    Dispatcher.CurrentDispatcher.BeginInvoke(DispatcherPriority.Background,
        new DispatcherOperationCallback(ExitFrame), frame);
    Dispatcher.PushFrame(frame);
}
and it worked for me. After the DoEvents call, I could obtain a non-null body:
mshtml.IHTMLDocument2 doc2 = (mshtml.IHTMLDocument2)wbChat.Document;
mshtml.HTMLDivElement div = (mshtml.HTMLDivElement)doc2.createElement("div");
div.innerHTML = "some text";
mshtml.HTMLBodyClass body = (mshtml.HTMLBodyClass)doc2.body;
if (body != null)
{
    body.appendChild((mshtml.IHTMLDOMNode)div);
    body.scrollTop = body.scrollHeight;
}
else
    Console.WriteLine("body is still null");
I don't know if this is the right way of doing this, but it fixed the problem for me; maybe it helps you too.
Later Edit:
public object ExitFrame(object f)
{
    ((DispatcherFrame)f).Continue = false;
    return null;
}
The DoEvents method is necessary on WPF. For System.Windows.Forms one can use Application.DoEvents().
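For Windows Forms, an alternative to pumping messages is to wait for the control's DocumentCompleted event before touching the document. The following is a minimal sketch under that assumption, not a tested fix for the question's project. The class name BookForm is hypothetical; webbMain and the "paragraph" id are borrowed from the question.

```csharp
using System;
using System.Windows.Forms;

public class BookForm : Form
{
    private readonly WebBrowser webbMain = new WebBrowser { Dock = DockStyle.Fill };

    public BookForm()
    {
        Controls.Add(webbMain);
        // Subscribe before loading so the completion event is not missed.
        webbMain.DocumentCompleted += OnDocumentCompleted;
        webbMain.DocumentText =
            "<html><body><p id=\"paragraph\">Example.</p></body></html>";
    }

    // Runs once the control has finished loading; Document.Body is available here.
    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlElement paragraph = webbMain.Document.GetElementById("paragraph");
        if (paragraph != null)   // guard in case the event fires for an intermediate page
        {
            paragraph.InnerText = "Changed after load.";
        }
    }

    [STAThread]
    private static void Main()
    {
        Application.Run(new BookForm());
    }
}
```

The design choice here is to move all document manipulation into the event handler, so nothing reads webbMain.Document immediately after the load is requested.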

Another way to do the same thing is:
webBrowser1.DocumentText = "<html><body>blabla<hr/>yadayada</body></html>";
This works without any extra initialization.
