Selenium WebDriver and C# (VS): Look for a specific header string - c#

I've been trying, without luck, to use IJavaScriptExecutor to find a specific header string in a page. Here's the HTML from the page:
<div class="wrap">
<h2>Edit Page <a href="http://www.webtest.bugrit.net/wordpress/wp-admin/post-new.php?post_type=page" class="add-new-h2">Add New</a></h2>
<div id...
The text I need to check for is the "Edit Page" string.
This is the closest I've come, which isn't very close:
var element = FFDriver.Instance.FindElements(By.ClassName("add-new-h2"));
IJavaScriptExecutor js = FFDriver.Instance as IJavaScriptExecutor;
if (js != null) {
string innerHtml = (string)js.ExecuteScript("return arguments[0].innerHTML;", element);
//System.Windows.Forms.MessageBox.Show(innerHtml);
if (innerHtml.Equals("Edit Page")) {
return true;
} else {
return false;
}
}
Now, I realize that the text I should expect to get from that code isn't the exact string "Edit Page". But shouldn't it return something? When I enable the MessageBox line, the innerHtml string is empty.
Or, of course, if someone knows another, possibly easier, way to check for the existence of a specific string inside a specific HTML tag, I'm all ears.

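Your locator returns the <a> element, not the <h2>, and the <a> doesn't contain the "Edit Page" string. Note also that FindElements returns a collection, so arguments[0] in your script is an array rather than a single element, which would explain the empty result.
Locate the <a> and step up to its parent <h2> instead (this assumes the class name add-new-h2 is unique; otherwise you will get the first match):
var element = FFDriver.Instance.FindElement(By.XPath(".//a[@class='add-new-h2']/.."));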
var containsText = element.Text.Contains("Edit Page");

Related

How to prevent "stale element" inside a foreach loop?

I'm using Selenium to retrieve data from this site, and I ran into a problem when I try to click an element inside a foreach loop.
What I'm trying to do
I'm trying to get the table associated to a specific category of odds, in the link above we have different categories:
As you can see from the image, I clicked on Asian handicap -1.75 and the site has generated a table through javascript, so inside my code I'm trying to get that table finding the corresponding element and clicking it.
Code
I have two methods. The first, GetAsianHandicap, iterates over all the odds categories:
public List<T> GetAsianHandicap(Uri fixtureLink)
{
//Contains all the categories displayed on the page
string[] categories = new string[] { "-1.75", "-1.5", "-1.25", "-1", "-0.75", "-0.5", "-0.25", "0", "+0.25", "+0.5", "+0.75", "+1", "+1.25", "+1.5", "+1.75" };
foreach(string cat in categories)
{
//Get the html of the table for the current category
string html = GetSelector("Asian handicap " + cat);
if(html == string.Empty)
continue;
//other code
}
}
The second, GetSelector, clicks the element that matches the searched category:
public string GetSelector(string selector)
{
//Get the available table container (the category).
var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));
//Store the html to return.
string html = string.Empty;
foreach (IWebElement container in containers)
{
//Container not available for click.
if (container.GetAttribute("style") == "display: none;")
continue;
//Get container header (contains the description).
IWebElement header = container.FindElement(By.XPath(".//div[starts-with(@class, 'table-header')]"));
//Store the table description.
string description = header.FindElement(By.TagName("a")).Text;
//The container contains the searched category
if (description.Trim() == selector)
{
//Get the available links.
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
//Get the element to click.
IWebElement element = listItems.Where(li => li.Text == selector).FirstOrDefault();
//The element exist
if (element != null)
{
//Click on the container for load the table.
element.Click();
//Wait few seconds on ChromeDriver for table loading.
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(20);
//Get the new html of the page
html = driver.PageSource;
}
return html;
}
}
return string.Empty;
}
Problem and exception details
When the foreach reaches this line:
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
I get this exception:
'OpenQA.Selenium.StaleElementReferenceException' in WebDriver.dll
stale element reference: element is not attached to the page document
Searching for the error suggests that the page's HTML changed, but I store the element to click in one variable and the HTML in another, so I can't see how to work around this.
Could someone help me?
Thanks in advance.
I looked at your code and I think you're making it more complicated than it needs to be. I'm assuming you want to scrape the table that is exposed when you click one of the handicap links. Here's some simple code to do this. It dumps the text of the elements, which ends up unformatted, but you can use it as a starting point and add functionality if you want. I didn't run into any StaleElementReferenceException when running this code and I never saw the page refresh, so I'm not sure what other people were seeing.
string url = "http://www.oddsportal.com/soccer/europe/champions-league/paok-spartak-moscow-pIXFEt8o/#ah;2";
driver.Url = url;
// get all the (visible) handicap links and click them to open the page and display the table with odds
IReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//a[contains(.,'Asian handicap')]")).Where(e => e.Displayed).ToList();
foreach (var link in links)
{
link.Click();
}
// print all the odds tables
foreach (var item in driver.FindElements(By.XPath("//div[@class='table-container']")))
{
Console.WriteLine(item.Text);
Console.WriteLine("====================================");
}
I would suggest that you spend some more time learning locators. Locators are very powerful and can save you having to stack nested loops looking for one thing... and then children of that thing... and then children of that thing... and so on. The right locator can find all that in one scrape of the page which saves a lot of code and time.
As you mentioned in the related post, this happens because the site refreshes itself automatically.
Solution 1:
If the page offers an explicit way to refresh, trigger that refresh yourself, either periodically or at the points where you know fresh data is needed.
Solution 2:
Create extension methods for FindElement and FindElements that wait up to a given timeout for the element. They must live in a static class and need using OpenQA.Selenium.Support.UI;
public static IWebElement FindElement(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        // Wait until the element is clickable, or time out.
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.ElementToBeClickable(by));
    }
    return driver.FindElement(by);
}
public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        // Wait until at least one matching element is present, or time out.
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(by));
    }
    return driver.FindElements(by);
}
Your code will use them like this (the second call is made on the returned element, so it is the built-in FindElements):
var listItems = driver.FindElement(By.Id("odds-data-table"), 30).FindElements(By.TagName("a"));
Solution 3:
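Handle StaleElementReferenceException using an extension method that retries the lookup:
public static IWebElement FindElement(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt - 1; attempt++)
    {
        try
        {
            return driver.FindElement(by);
        }
        catch (StaleElementReferenceException)
        {
            // The page changed underneath us; try again.
        }
    }
    // Final attempt: let any exception propagate to the caller.
    return driver.FindElement(by);
}
public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt - 1; attempt++)
    {
        try
        {
            return driver.FindElements(by);
        }
        catch (StaleElementReferenceException)
        {
            // The page changed underneath us; try again.
        }
    }
    // Final attempt: let any exception propagate to the caller.
    return driver.FindElements(by);
}
Your code will use them like this (the second call is made on the returned element, so it is the built-in FindElements):
var listItems = driver.FindElement(By.Id("odds-data-table"), 2).FindElements(By.TagName("a"));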
Use this:
string description = header.FindElement(By.XPath("strong/a")).Text;
instead of your:
string description = header.FindElement(By.TagName("a")).Text;

HTMLAgilityPack Selectnodes always returns null

I heard good things about the HTMLAgilityPack library, so I thought I'd give it a try but I have had absolutely zero success with it. I've been trying to figure this out for months. No matter what I do, I cannot get this code to give me anything other than null. I tried following this example (http://www.c-sharpcorner.com/uploadfile/9b86d4/getting-started-with-html-agility-pack/), but I do not get the same results and I cannot explain why.
I try loading the file and then run SelectNodes to select all hyperlinks, but it always returns an empty list. I've tried selecting all kinds of nodes (divs, p, a, everything and anything) and it always returns an empty list. I've tried using doc.Descendants, I've tried using different source files, locally and on the web, and nothing I do will ever return an actual result.
I must have overlooked something important, but I cannot figure out what it is. What could I be missing?
Code:
public string GetSource()
{
try
{
string result = "";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
if (!System.IO.File.Exists("htmldoc.html"))
throw new Exception("Unable to load doc");
doc.LoadHtml("htmldoc.html"); // copied locally to bin folder, confirmed it found the file and loaded it
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a"); // Always returns null, regardless of what I put in here
if (nodes != null)
{
foreach (HtmlNode item in nodes)
{
result += item.InnerText;
}
}
else
{
// Every. Single. Time.
throw new Exception("No matching nodes found in document");
}
return result;
}
catch (Exception ex)
{
return ex.ToString();
}
}
The source HTML file 'htmldoc.html' I'm using looks like this:
<html>
<head>
<title>Testing HTML Agility Pack</title>
</head>
<body>
<div id="div1">
Link 1 inside div1
Link 2 inside div1
</div>
Link 3 outside all divs
<div id="div2">
Link 1 inside div2
Link 2 inside div2
</div>
</body>
</html>
To load a file, use the Load method. LoadHtml expects a string that contains the HTML itself, so your code was parsing the literal text "htmldoc.html" as the document:
doc.Load("htmldoc.html");

How to parse a dynamically updating webpage in C#

I am trying to parse the number shown in this page:
https://www.edf.org/embed/methane-counters
I have tried WebBrowser, WebClient ... etc. with no good result. Every time I try something new, in the HTML returned I get this (HTML area where the number is shown):
<strong id=\"methane\"></strong>
... as you can see, there is no number between the 'strong' tags. Just in case, this is the latest code I have tried, which still does not work:
using (WebBrowser myWebBrowser = new WebBrowser()) {
myWebBrowser.ScriptErrorsSuppressed = true;
myWebBrowser.Navigate(myURL);
while ((myWebBrowser.ReadyState != WebBrowserReadyState.Complete))
Application.DoEvents();
myContent = myWebBrowser.Document.Body.InnerHtml;
myContent = myWebBrowser.DocumentText;
}
... neither of the last two calls returns the HTML with the number on it.
Any ideas on how to get the proper content of this page?

Get GeckoFx firefox browser control iframe html not accessible

I am using the GeckoFX 22 c# web browser control but cannot manage to access tags within an iframe. When I check the gecko innerhtml it seems that although the iframe tag shows in the html, the contents of it do not.
This is the code I used to get the inner html of the browser control which just shows the iframe tag as empty (when it should have another doc inside of it):
GeckoHtmlElement element = null;
var geckoDomElement = webBrowser.Document.DocumentElement;
if (geckoDomElement is GeckoHtmlElement)
{
element = (GeckoHtmlElement)geckoDomElement;
var innerHtml = element.InnerHtml;
}
Previously I used code similar to the code below to access individual elements which works fine:
GeckoDocument checkDoc = (GeckoDocument)webBrowser.Window.Document;
var x = (checkDoc.GetElementsByTagName("a").Where(b => b.Id == "ipt-form-format-aside").First());
I am able to get individual elements and change their values, trigger events and so on without problems in the main HTML document, but I cannot get at the elements of anything inside an iframe. I think perhaps the iframe has not been loaded yet, or something like that. Is there a way to force the control to wait for the iframe to load before attempting to access its elements?
string content = null;
var iframe = webBrowser.Document.GetElementsByTagName("iframe").FirstOrDefault() as Gecko.DOM.GeckoIFrameElement;
if(iframe != null)
{
var html = iframe.ContentDocument.DocumentElement as GeckoHtmlElement;
if (html != null)
content = html.OuterHtml;
}
I'm just posting this for anyone else that might get this problem
foreach (GeckoIFrameElement _E in geckoWebBrowser1.Document.GetElementsByTagName("iframe"))
{
if (_E.GetAttribute("class") == "testClass")
{
var innerHTML = _E.ContentDocument;
foreach (GeckoHtmlElement _A in innerHTML.GetElementsByTagName("input"))
{
_A.SetAttribute("value", "Test");
}
}
}
I had a similar problem, so I used this:
checkDoc.Window.Frames(1)
instead of
checkDoc.GetElementsByTagName("iframe")
The value in the parentheses (1 here) is the index of the frame you want.

Regular Expression to avoid HTML tags and empty values

I have added click validation to a text box and want to reject any HTML tags in it, as well as a bare < (open tag) or > (close tag). The code below works for tags, but I also want to validate against empty strings and other HTML tags. Can someone please help modify the regex for this requirement?
function htmlValidation()
{
    var re = /(<([^>]+)>)/gi;
    if (document.getElementById('<%=TextBox2.ClientID%>').value.match(re)) {
        document.getElementById('<%=TextBox2.ClientID%>').value = "";
        return false;
    }
    return true;
}
Corrected Code above
In my opinion, validating this kind of thing is going to be hard work.
Instead of preventing HTML content in a text box, another solution is to HTML-entity-encode the Text property, so <p>a</p> would be converted to &lt;p&gt;a&lt;/p&gt;.
As a result, the HTML is rendered as text instead of being interpreted by the web browser.
Check this MSDN article:
http://msdn.microsoft.com/en-us/library/73z22y6h(v=vs.110).aspx
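For illustration only, the encoding idea can be sketched in JavaScript. This is a minimal sketch under assumptions: encodeHtml is a hypothetical helper name, not part of ASP.NET or jQuery, and it covers just the five characters that are commonly escaped in HTML text.

```javascript
// Hypothetical helper (not a library function): replace the characters
// that are significant in HTML with their entity equivalents.
function encodeHtml(text) {
    var entities = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        '"': "&quot;",
        "'": "&#39;"
    };
    // Each matched character is looked up in the table above.
    return String(text).replace(/[&<>"']/g, function (ch) {
        return entities[ch];
    });
}

// Example: the markup is turned into plain text.
console.log(encodeHtml("<p>a</p>")); // prints &lt;p&gt;a&lt;/p&gt;
```

A client-side helper like this can be bypassed, so it would complement the server-side encoding described in the article rather than replace it.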
$("#<%= btnAdd.ClientID %>").click(function () {
    var txt = $("#<%= txtBox1.ClientID %>");
    var svc = txt.val(); // current value of the text box
    var re = /(<([^>]+)>)/gi;
    if (svc != "") {
        if (!svc.match(re)) {
            // my operations go here
            return false;
        }
        else {
            alert("Invalid Content");
        }
    }
    else {
        alert("Blank value selected");
    }
});
I used a jQuery click handler to run the regular expression check. This question is linked to:
Using Jquery to add items in Listbox from Textbox
Now I can mark this as my final answer.
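The checks discussed in this thread (blank value, HTML tags, and a bare < or >) can also be pulled into one plain function that is easy to test apart from the page. The following is only a sketch under assumptions: checkInput and its return values "empty", "invalid" and "ok" are hypothetical names chosen for illustration, and it rejects any < or > character rather than matching complete tags.

```javascript
// Hypothetical helper: classify a text box value.
// Returns "empty" for blank input, "invalid" if it contains < or >,
// and "ok" otherwise.
function checkInput(value) {
    // Treat null/undefined as an empty string and ignore surrounding spaces.
    var text = String(value == null ? "" : value).trim();
    if (text === "") {
        return "empty";
    }
    // A single character class catches full tags and lone brackets alike.
    if (/[<>]/.test(text)) {
        return "invalid";
    }
    return "ok";
}

console.log(checkInput("   "));      // prints empty
console.log(checkInput("<b>x</b>")); // prints invalid
console.log(checkInput("a < b"));    // prints invalid
console.log(checkInput("hello"));    // prints ok
```

Inside the click handler, the result could drive the two alerts already shown above; rejecting every < and > is stricter than the original tag regex, which may or may not suit the form.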
