I'm using Selenium to retrieve data from this site, and I ran into a small problem when I try to click an element inside a foreach.
What I'm trying to do
I'm trying to get the table associated with a specific category of odds; on the page linked above we have different categories:
As you can see from the image, I clicked on Asian handicap -1.75 and the site generated a table via JavaScript, so in my code I'm trying to get that table by finding the corresponding element and clicking it.
Code
Currently I have two methods. The first, GetAsianHandicap, iterates over all the categories of odds:
public List<T> GetAsianHandicap(Uri fixtureLink)
{
    // Contains all the categories displayed on the page.
    string[] categories = new string[] { "-1.75", "-1.5", "-1.25", "-1", "-0.75", "-0.5", "-0.25", "0", "+0.25", "+0.5", "+0.75", "+1", "+1.25", "+1.5", "+1.75" };
    foreach (string cat in categories)
    {
        // Get the html of the table for the current category.
        string html = GetSelector("Asian handicap " + cat);
        if (html == string.Empty)
            continue;
        // other code
    }
}
and then the method GetSelector, which clicks on the searched element; this is the design:
public string GetSelector(string selector)
{
    // Get the available table containers (the categories).
    var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));
    // Store the html to return.
    string html = string.Empty;
    foreach (IWebElement container in containers)
    {
        // Container not available for click.
        if (container.GetAttribute("style") == "display: none;")
            continue;
        // Get the container header (contains the description).
        IWebElement header = container.FindElement(By.XPath(".//div[starts-with(@class, 'table-header')]"));
        // Store the table description.
        string description = header.FindElement(By.TagName("a")).Text;
        // The container contains the searched category.
        if (description.Trim() == selector)
        {
            // Get the available links.
            var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
            // Get the element to click.
            IWebElement element = listItems.Where(li => li.Text == selector).FirstOrDefault();
            // The element exists.
            if (element != null)
            {
                // Click on the container to load the table.
                element.Click();
                // Wait a few seconds on ChromeDriver for the table to load.
                driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(20);
                // Get the new html of the page.
                html = driver.PageSource;
            }
            return html;
        }
    }
    return string.Empty;
}
Problem and exception details
When the foreach reaches this line:
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
I get this exception:
'OpenQA.Selenium.StaleElementReferenceException' in WebDriver.dll
stale element reference: element is not attached to the page document
Searching for the error suggests that the HTML page source changed, but in this case I store the element to click in one variable and the html itself in another, so I can't figure out how to fix this issue.
Could someone help me?
Thanks in advance.
I looked at your code and I think you're making it more complicated than it needs to be. I'm assuming you want to scrape the table that is exposed when you click one of the handicap links. Here's some simple code to do this. It dumps the text of the elements, which ends up unformatted, but you can use this as a starting point and add functionality if you want. I didn't run into any StaleElementReferenceExceptions when running this code, and I never saw the page refresh, so I'm not sure what other people were seeing.
string url = "http://www.oddsportal.com/soccer/europe/champions-league/paok-spartak-moscow-pIXFEt8o/#ah;2";
driver.Url = url;

// Get all the (visible) handicap links and click them to open the page and display the table with odds.
IReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//a[contains(.,'Asian handicap')]")).Where(e => e.Displayed).ToList();
foreach (var link in links)
{
    link.Click();
}

// Print all the odds tables.
foreach (var item in driver.FindElements(By.XPath("//div[@class='table-container']")))
{
    Console.WriteLine(item.Text);
    Console.WriteLine("====================================");
}
I would suggest that you spend some more time learning locators. Locators are very powerful and can save you from stacking nested loops looking for one thing... and then children of that thing... and then children of that thing... and so on. The right locator can find all of that in one scrape of the page, which saves a lot of code and time.
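For example (a minimal sketch, not from the original code; the handicap value and the System.Linq using are assumptions), a single locator can reach the link in one query:
// Assumes: using System.Linq; and the page structure described in the question.
// One locator instead of nested loops: find the visible link whose text is
// exactly "Asian handicap -1.75" inside any table container.
IWebElement link = driver
    .FindElements(By.XPath("//div[@class='table-container']//a[normalize-space()='Asian handicap -1.75']"))
    .FirstOrDefault(e => e.Displayed);
link?.Click();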
As you mentioned in the related post, this issue occurs because the site performs an auto-refresh.
Solution 1:
If there is an explicit way to trigger the refresh, I would suggest performing it on a periodic basis, or only at the points where you know a refresh is needed.
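A minimal sketch of the idea (assumption: refreshing on your own schedule and re-locating afterwards keeps the site's auto-refresh from invalidating references mid-loop):
// Trigger the refresh yourself at a point you control...
driver.Navigate().Refresh();
// ...then re-locate everything, because any IWebElement obtained before
// the refresh is now stale.
var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));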
Solution 2:
Create extension methods for FindElement and FindElements so that they try to get the element within a given timeout:
public static IWebElement FindElement(this IWebDriver driver, By by, int timeout)
{
    // Requires OpenQA.Selenium.Support.UI for WebDriverWait
    // (in newer bindings ExpectedConditions lives in SeleniumExtras.WaitHelpers).
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.ElementToBeClickable(by));
    }
    return driver.FindElement(by);
}

public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(by));
    }
    return driver.FindElements(by);
}
so your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 30).FindElements(By.TagName("a"), 30);
Solution 3:
Handle StaleElementReferenceException using extension methods:
public static IWebElement FindElement(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt; attempt++)
    {
        try
        {
            return driver.FindElement(by);
        }
        catch (StaleElementReferenceException)
        {
            // The element went stale; try again.
        }
    }
    return null;
}

public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt; attempt++)
    {
        try
        {
            return driver.FindElements(by);
        }
        catch (StaleElementReferenceException)
        {
            // The elements went stale; try again.
        }
    }
    return null;
}
Your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 2).FindElements(By.TagName("a"), 2);
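The same retry idea can be applied to the interaction itself (a sketch of my own, not part of the solutions above), since staleness often surfaces on the click rather than on the lookup:
public static void ClickWithRetry(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt; attempt++)
    {
        try
        {
            // Re-locate on every attempt so a stale reference is replaced.
            driver.FindElement(by).Click();
            return;
        }
        catch (StaleElementReferenceException)
        {
            // The element was replaced between lookup and click; retry.
        }
    }
}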
Use this:
string description = header.FindElement(By.XPath("strong/a")).Text;
instead of your:
string description = header.FindElement(By.TagName("a")).Text;
I'm creating a Windows Forms application for a group of friends, and I'm using HtmlAgilityPack in it.
I need to find all versions of the TacO addon, like this:
<li><a href='https://www.dropbox.com/s/nks140nf794tx77/GW2TacO_034r.zip?dl=0'>Download Build 034.1866r</a></li>
Additionally, I need to check the latest version so I can download the file from its URL, as in the code below:
public static bool Tacoisuptodate(string Version)
{
    // Load the HtmlDocument.
    var doc = new HtmlWeb().Load("http://www.gw2taco.com/");
    var body = doc.DocumentNode.SelectNodes("//body").Single();
    // Walk the document to find the part that interests us.
    foreach (var node in doc.DocumentNode.SelectNodes("//div"))
    {
        // Check for null values.
        var classeValue = node.Attributes["class"]?.Value;
        var idValue = node.Attributes["id"]?.Value;
        var hrefValue = node.Attributes["href"]?.Value;
        // We search for <div class="widget LinkList" id="LinkList1"> in the home page.
        if (classeValue == "widget LinkList" && idValue == "LinkList1")
        {
            foreach (HtmlNode content in node.SelectNodes("//li"))
            {
                Debug.Write(content.GetAttributeValue("href", false));
            }
        }
    }
    return false;
}
If somebody could help me, I would really appreciate it.
A single XPath is enough.
var xpath = "//h2[text()='Downloads']/following-sibling::div[@class='widget-content']/ul/li/a";
var doc = new HtmlAgilityPack.HtmlWeb().Load("http://www.gw2taco.com/");
var downloads = doc.DocumentNode.SelectNodes(xpath)
    .Select(li => new
    {
        href = li.Attributes["href"].Value,
        name = li.InnerText
    })
    .ToList();
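From there you can inspect the results however you like; for example (a trivial sketch):
// Dump every download link the XPath found.
foreach (var d in downloads)
    Console.WriteLine(d.name + " -> " + d.href);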
I am using HtmlAgilityPack, and at university I was given a task to get all links located next to the word "source" and related words. I tried this code:
foreach (HtmlNode link in document.DocumentNode.SelectNodes(".//a[@href]"))
{
    if (document.DocumentNode.InnerHtml.ToString().Contains(sourcesDictionary[i]))
    {
        string hrefValue = link.GetAttributeValue("href", string.Empty);
        Console.WriteLine(hrefValue);
    }
}
But it just prints all links at HTML Document. What can I change to get it working properly?
Adding the parent-node check below may help:
foreach (HtmlNode link in document.DocumentNode.SelectNodes(".//a[@href]"))
{
    // Only keep links whose parent element mentions "source".
    if (link.ParentNode.InnerHtml.Contains("source"))
    {
        if (document.DocumentNode.InnerHtml.ToString().Contains(sourcesDictionary[i]))
        {
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine(hrefValue);
        }
    }
}
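Alternatively (an untested sketch of my own), the parent-text check can be pushed into the XPath itself so only candidate links come back:
// contains(.., 'source') tests the string value of the link's parent node.
var nodes = document.DocumentNode.SelectNodes(".//a[@href][contains(..,'source')]");
if (nodes != null)
    foreach (var link in nodes)
        Console.WriteLine(link.GetAttributeValue("href", string.Empty));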
I'm trying to parse a website's content on Windows Phone using the HtmlAgilityPack. My current code is:
HtmlWeb.LoadAsync(url, DownloadCompleted);
...
void DownloadCompleted(object sender, HtmlDocumentLoadCompleted e)
{
    if (e.Error == null)
    {
        HtmlDocument doc = e.Document;
        if (doc != null)
        {
            string test = doc.DocumentNode.Element("html").Element("body").Element("form").Elements("div").ElementAt(2).Element("table").Element("tbody").Elements("tr").ElementAt(4).Element("td").Element("center").Element("div").InnerText.ToString();
            System.Diagnostics.Debug.WriteLine(test);
        }
    }
}
Currently, when I run the above, I get an ArgumentOutOfRangeException at string test = doc.DocumentNode.Element("html").Element("body").Element("form").Elements("div").ElementAt(2).Element("table").Element("tbody").Elements("tr").ElementAt(4).Element("td").Element("center").Element("div").InnerText.ToString();.
doc.DocumentNode.Element("html").InnerText.ToString() seems to give me the source code for the entire page.
The URL of the website I'm trying to parse is: http://polyclinic.singhealth.com.sg/Webcams/QimgPage.aspx?Loc_Code=BDP
It looks like you're after a specific DIV; if I'm not mistaken, the one you're after has a unique identifier: <td class="queueNo"><center><div id="divRegPtwVal">0</div></center></td>.
Why not simply use doc.DocumentNode.SelectSingleNode("//div[@id='divRegPtwVal']") or doc.DocumentNode.Descendants("div").Where(div => div.Id == "divRegPtwVal").FirstOrDefault()?
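Wired into your DownloadCompleted handler, that would look roughly like this (a sketch based on the code in your question):
void DownloadCompleted(object sender, HtmlDocumentLoadCompleted e)
{
    if (e.Error != null || e.Document == null)
        return;
    // Grab the queue-number div directly by its id instead of walking the tree.
    var div = e.Document.DocumentNode.SelectSingleNode("//div[@id='divRegPtwVal']");
    if (div != null)
        System.Diagnostics.Debug.WriteLine(div.InnerText);
}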
Select the image source for a specific image by id:
var attrib = doc.DocumentNode.SelectSingleNode("//img[@id='imgCam2']/@src");
// I suspect this might be a slightly different property; I can't check right now.
string src = attrib.InnerText;
Or:
var img = doc.DocumentNode.Descendants("img").Where(node => node.Id == "imgCam2").FirstOrDefault();
string src = img.Attributes["src"].Value;
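A slightly safer variant (a sketch; GetAttributeValue is HtmlAgilityPack's usual accessor and returns the supplied default when the attribute is missing):
var cam = doc.DocumentNode.SelectSingleNode("//img[@id='imgCam2']");
// Returns string.Empty instead of throwing when src is absent.
string src = cam != null ? cam.GetAttributeValue("src", string.Empty) : null;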
I am trying to figure out how to read header links using C#/.NET. I want to get the edit link from Browser1 and put it in Browser2. My problem is that I can't figure out how to get at the attributes, or even the link tags for that matter. Below is what I have now.
using System.Xml.Linq;
...
string source = webKitBrowser1.DocumentText.ToString();
XDocument doc = XDocument.Parse(source);
webKitBrowser2.Navigate(doc.Element("link").Attribute("href").Value.ToString());
This would work except that XML is different from HTML, and right off the bat it says that it was expecting "doctype" to be uppercase.
I finally figured it out, so I will post it for anyone who has the same question.
string site = webKitBrowser1.Url.Scheme + "://" + webKitBrowser1.Url.Authority;
WebKit.DOM.Document doc = webKitBrowser1.Document;
WebKit.DOM.NodeList links = doc.GetElementsByTagName("link");
WebKit.DOM.Element link;
string editlink = "none";
foreach (var item in links)
{
    link = (WebKit.DOM.Element)item;
    if (link.Attributes["rel"].NodeValue == "edit") { editlink = link.Attributes["href"].NodeValue; }
}
if (editlink != "none") { webKitBrowser2.Navigate(site + editlink); }
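For reference, HtmlAgilityPack copes with real-world HTML such as the lowercase doctype that tripped up XDocument; a minimal sketch of the same lookup (untested against WebKitBrowser, reusing the site variable from above):
var hapDoc = new HtmlAgilityPack.HtmlDocument();
hapDoc.LoadHtml(webKitBrowser1.DocumentText.ToString());
// Pick the <link> element marked rel="edit" and read its href.
var edit = hapDoc.DocumentNode.SelectSingleNode("//link[@rel='edit']");
if (edit != null)
    webKitBrowser2.Navigate(site + edit.GetAttributeValue("href", string.Empty));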