I'm using Selenium to retrieve data from this site, and I've run into a problem when I try to click an element inside a foreach loop.
What I'm trying to do
I'm trying to get the table associated with a specific category of odds. The link above has several categories:
As you can see from the image, I clicked on Asian handicap -1.75 and the site generated a table via JavaScript, so in my code I'm trying to get that table by finding the corresponding element and clicking it.
Code
I have two methods. The first, GetAsianHandicap, iterates over all the odds categories:
public List<T> GetAsianHandicap(Uri fixtureLink)
{
//Contains all the categories displayed on the page
string[] categories = new string[] { "-1.75", "-1.5", "-1.25", "-1", "-0.75", "-0.5", "-0.25", "0", "+0.25", "+0.5", "+0.75", "+1", "+1.25", "+1.5", "+1.75" };
foreach(string cat in categories)
{
//Get the html of the table for the current category
string html = GetSelector("Asian handicap " + cat);
if(html == string.Empty)
continue;
//other code
}
}
The second, GetSelector, clicks the requested element:
public string GetSelector(string selector)
{
//Get the available table container (the category).
var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));
//Store the html to return.
string html = string.Empty;
foreach (IWebElement container in containers)
{
//Container not available for click.
if (container.GetAttribute("style") == "display: none;")
continue;
//Get container header (contains the description).
IWebElement header = container.FindElement(By.XPath(".//div[starts-with(@class, 'table-header')]"));
//Store the table description.
string description = header.FindElement(By.TagName("a")).Text;
//The container contains the searched category
if (description.Trim() == selector)
{
//Get the available links.
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
//Get the element to click.
IWebElement element = listItems.Where(li => li.Text == selector).FirstOrDefault();
//The element exist
if (element != null)
{
//Click on the container for load the table.
element.Click();
//Set the implicit wait (up to 20 seconds) used by later FindElement calls; this does not pause by itself.
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(20);
//Get the new html of the page
html = driver.PageSource;
}
return html;
}
return string.Empty;
}
Problem and exception details
When the foreach loop reaches this line:
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
I get this exception:
'OpenQA.Selenium.StaleElementReferenceException' in WebDriver.dll
stale element reference: element is not attached to the page document
From what I found searching, this error means the page's HTML has changed. But here I store the element to click in one variable and the HTML itself in another, so I can't figure out how to fix it.
Can anyone help?
Thanks in advance.
I looked at your code and I think you're making it more complicated than it needs to be. I'm assuming you want to scrape the table that is displayed when you click one of the handicap links. Here's some simple code to do that. It dumps the text of the elements, which ends up unformatted, but you can use it as a starting point and add functionality as needed. I didn't run into any StaleElementReferenceExceptions when running this code, and I never saw the page refresh, so I'm not sure what other people were seeing.
string url = "http://www.oddsportal.com/soccer/europe/champions-league/paok-spartak-moscow-pIXFEt8o/#ah;2";
driver.Url = url;
// get all the (visible) handicap links and click them to open the page and display the table with odds
IReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//a[contains(.,'Asian handicap')]")).Where(e => e.Displayed).ToList();
foreach (var link in links)
{
link.Click();
}
// print all the odds tables
foreach (var item in driver.FindElements(By.XPath("//div[@class='table-container']")))
{
Console.WriteLine(item.Text);
Console.WriteLine("====================================");
}
I would suggest that you spend some more time learning locators. Locators are very powerful and can save you having to stack nested loops looking for one thing... and then children of that thing... and then children of that thing... and so on. The right locator can find all that in one scrape of the page which saves a lot of code and time.
As you mentioned in a related post, this issue occurs because the site performs an auto-refresh.
Solution 1:
If there is an explicit way to trigger the refresh, perform it yourself periodically, or whenever you know for sure that a refresh is needed.
Solution 2:
Create extension methods for FindElement and FindElements that keep trying to get the element(s) until a given timeout expires.
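//Extension methods must be declared in a non-nested static class.
//ExpectedConditions lives in OpenQA.Selenium.Support.UI in older releases and in the SeleniumExtras.WaitHelpers package in newer ones.
public static class WebDriverWaitExtensions
{
//Waits up to `timeout` seconds for the element to be clickable, then returns it.
public static IWebElement FindElement(this IWebDriver driver, By by, int timeout)
{
if(timeout > 0)
{
return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.ElementToBeClickable(by));
}
return driver.FindElement(by);
}
//Waits up to `timeout` seconds for at least one matching element to be present.
public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int timeout)
{
if(timeout > 0)
{
return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(by));
}
return driver.FindElements(by);
}
}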
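Your code would then use them like this (the timeout overload is defined on IWebDriver, so the call on the returned IWebElement uses the normal FindElements):
var listItems = driver.FindElement(By.Id("odds-data-table"), 30).FindElements(By.TagName("a"));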
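Under the hood, an explicit wait just re-runs a check until it succeeds or a timeout expires. The sketch below shows that polling loop in plain C# so the idea is visible without Selenium. `Polling.WaitFor` is a hypothetical helper, not part of Selenium's API; Selenium's own `WebDriverWait` (which also offers `IgnoreExceptionTypes`) already does this for you.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not a Selenium API): polls `probe` until it returns a
// non-null value, treating TIgnored exceptions as "not ready yet".
public static class Polling
{
    public static T WaitFor<T, TIgnored>(Func<T> probe, TimeSpan timeout, TimeSpan pollInterval)
        where T : class
        where TIgnored : Exception
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                T result = probe();
                if (result != null)
                {
                    return result; // condition met
                }
            }
            catch (TIgnored)
            {
                // e.g. the element went stale mid-check; fall through and poll again
            }

            if (DateTime.UtcNow >= deadline)
            {
                throw new TimeoutException($"Condition not met within {timeout}.");
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

With Selenium, `probe` would wrap a lookup such as `() => driver.FindElement(By.Id("odds-data-table"))`, and `TIgnored` would be `StaleElementReferenceException`.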
Solution 3:
Handle StaleElementReferenceException using extension methods:
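//Use either these or the Solution 2 methods, not both: they have the same signatures.
public static class StaleRetryExtensions
{
//Retries the lookup when a StaleElementReferenceException is thrown;
//the final attempt lets the exception reach the caller.
public static IWebElement FindElement(this IWebDriver driver, By by, int maxAttempt)
{
for(int attempt = 1; attempt < maxAttempt; attempt++)
{
try
{
return driver.FindElement(by);
}
catch(StaleElementReferenceException)
{
//The DOM changed during the lookup; try again.
}
}
return driver.FindElement(by);
}
public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int maxAttempt)
{
for(int attempt = 1; attempt < maxAttempt; attempt++)
{
try
{
return driver.FindElements(by);
}
catch(StaleElementReferenceException)
{
}
}
return driver.FindElements(by);
}
}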
Your code would then use them like this (a single locator keeps the whole lookup inside one retried call):
var listItems = driver.FindElements(By.XPath("//*[@id='odds-data-table']//a"), 2);
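One caveat with retrying: catching the exception and then reusing the same IWebElement just fails again, so the retry has to wrap the whole find-and-use step. A minimal generic sketch, assuming a hypothetical `Retry.OnException` helper (not a Selenium API) that re-runs the entire operation on each attempt:

```csharp
using System;

// Hypothetical helper (not a Selenium API): re-runs the whole operation when
// TException is thrown, up to maxAttempts times in total.
public static class Retry
{
    public static T OnException<T, TException>(Func<T> operation, int maxAttempts)
        where TException : Exception
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }
        for (int attempt = 1; attempt < maxAttempts; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TException)
            {
                // Swallow and retry; the last attempt below lets the exception propagate.
            }
        }
        return operation();
    }
}
```

With Selenium the call would look something like `Retry.OnException<List<string>, StaleElementReferenceException>(() => driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a")).Select(a => a.Text).ToList(), 3)`, so the container is re-found on every attempt instead of reusing a stale reference.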
Use this:
string description = header.FindElement(By.XPath("strong/a")).Text;
instead of yours:
string description = header.FindElement(By.TagName("a")).Text;
Related
The web scraper from the library works with HtmlNodes. It's hard to explain, but I'm scraping a tag and then its contents, and I want to handle the contents like an array, which this library returns by default. The problem is that I can iterate over it with a for loop like any other array, but for some reason I can't access it by index.
Here is my code, with the website link, set up exactly as the library's documentation shows:
In Main:
static void Main(string[] args) {
var scraper = new HelloScraper();
scraper.Start();
}
Then Init:
public override void Init() {
this.LoggingLevel = WebScraper.LogLevel.None;
this.Request("https://1337x.to/sort-search/Aquaman/time/desc/1/", Parse);
}
And now Parse, which is giving me trouble. I'll split it to show what works and what doesn't.
This works:
public override void Parse(Response response) {
foreach (var torrentLink in response.Css("tr")) {
HtmlNode[] torrentContents = torrentLink.Css("td");
for (int i = 0; i < torrentContents.Length; i++) {
Console.WriteLine($"{i}: {torrentContents[i].InnerText}");
}
Console.WriteLine();
}
}
To keep it easy to follow, I'll show a single "torrent" here.
For it, this working code produces:
0: Aquaman IMAX (2019) AC3 5.1 ITA.ENG 1080p H265 sub NUita.eng Sp33dy94 MIRCrew1
1: 7
2: 0
3: 8pm Oct. 2nd
4: 4.2 GB7
5: Sp33dy94
But this code, which selects what I need from the same array using the indexes I can see working in the for loop:
public override void Parse(Response response) {
foreach (var torrentLink in response.Css("tr")) {
HtmlNode[] torrentContents = torrentLink.Css("td");
string torrentName = torrentContents[0].InnerText;
string torrentSeeds = torrentContents[1].InnerText;
string torrentSize = torrentContents[4].InnerText;
Console.WriteLine($"{torrentName} --> [Size:{torrentSize} | Seeds:{torrentSeeds}]");
Console.WriteLine();
}
}
produces nothing. The console doesn't show an error, and when I debug it, it looks like accessing by index "points to a null reference".
Maybe I'm missing something, but if an array can be accessed by index in a for loop, it should be accessible outside of it too, shouldn't it? What is the issue here?
By the way, I don't know whether 1337x.to allows web scraping, but I don't intend to use this commercially or even for myself; it's just a website I chose to practice with.
After many hours in the debugger, I figured it out:
a for loop never runs its body for an empty array, so nothing failed there, but the first row's array was empty. That row is the header of the page's table, which has no td cells.
Adding a simple if statement that skips rows without the expected cells fixes the issue:
public override void Parse (Response response) {
foreach (var torrentLink in response.Css ("tr")) {
HtmlNode[] torrentContents = torrentLink.Css ("td");
// Skip rows without enough <td> cells (e.g. the header row); index 4 needs at least 5 cells.
if (torrentContents.Length >= 5) {
string torrentName = torrentContents[0].InnerText;
string torrentSeeds = torrentContents[1].InnerText;
string torrentSize = torrentContents[4].InnerText;
Console.WriteLine ($"{torrentName} --> [Size:{torrentSize} | Seeds:{torrentSeeds}]");
Console.WriteLine ();
}
}
}
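The difference comes down to the loop condition: with an empty array, `0 < torrentContents.Length` is false, so the for loop body never runs and nothing fails, whereas `torrentContents[0]` throws an `IndexOutOfRangeException`. A minimal sketch of the guarded version, using plain `string[]` cells instead of the library's `HtmlNode[]` to keep it self-contained; `TorrentRows.Describe` is a hypothetical helper:

```csharp
using System;

public static class TorrentRows
{
    // Formats a row like the Parse method above, but returns null for rows that
    // don't have the five cells being indexed (e.g. the header row, whose cell list is empty).
    public static string Describe(string[] cells)
    {
        const int requiredCells = 5; // index 4 (size) is the highest index used
        if (cells.Length < requiredCells)
        {
            return null;
        }
        return $"{cells[0]} --> [Size:{cells[4]} | Seeds:{cells[1]}]";
    }
}
```

Checking for the exact number of cells you index into also protects against rows that have some cells but fewer than five.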
I have an async method that calls a mapper to turn an HTML string into an IEnumerable:
public async Task<IEnumerable<MovieRatingScrape>> GetMovieRatingsAsync(string username, int page)
{
var response = await _httpClient.GetAsync($"/betyg/{username}?p={page}");
response.EnsureSuccessStatusCode();
var html = await response.Content.ReadAsStringAsync();
return new MovieRatingsHtmlMapper().Map(html);
}
...
public class MovieRatingsHtmlMapper : HtmlMapperBase<IEnumerable<MovieRatingScrape>>
{
// In reality, this method belongs to base class with signature T Map(string html)
public IEnumerable<MovieRatingScrape> Map(string html)
{
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
return Map(htmlDocument);
}
public override IEnumerable<MovieRatingScrape> Map(HtmlDocument item)
{
var movieRatings = new List<MovieRatingScrape>();
var nodes = item.DocumentNode.SelectNodes("//table[@class='list']/tr");
foreach (var node in nodes)
{
var title = node.SelectSingleNode("//td[1]/a")?.InnerText;
movieRatings.Add(new MovieRatingScrape
{
Date = DateTime.Parse(node.SelectSingleNode("//td[2]")?.InnerText),
Slug = node.SelectSingleNode("//td[1]/a[starts-with(@href, '/film/')]")?
.GetAttributeValue("href", null)?
.Replace("/film/", string.Empty),
SwedishTitle = title,
Rating = node.SelectNodes($"//td[3]/i[{XPathHasClass("fa-star")}]").Count
});
}
return movieRatings;
}
}
The resulting list movieRatings contains copies of the same object, but when I look at the HTML, and when I debug and inspect each HtmlNode node, they differ as they are supposed to.
Either I'm missing something really obvious, or I'm hitting some async issue that I don't grasp. Any ideas? I should be getting 50 unique objects from this call; instead I'm getting the first one 50 times.
Thank you in advance, Viktor.
Edit: Adding some screenshots to show my predicament. Look at the locals InnerHtml (of node) and title for items 1 and 2 of the foreach loop.
Edit 2: Managed to reproduce on .NET Fiddle: https://dotnetfiddle.net/A2I4CQ
You need to use .// instead of //.
Here is the fixed Fiddle: https://dotnetfiddle.net/dZkSRN
// searches anywhere in the document, even when you call SelectSingleNode on a node.
.// searches only within the current node.
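A small self-contained illustration of the difference. It uses System.Xml's XmlDocument rather than HtmlAgilityPack, on the assumption that both follow the same XPath 1.0 rules for `//` and `.//`; `XPathScopeDemo` and its two-row table are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathScopeDemo
{
    // A tiny stand-in for the scraped table: two rows, each with one cell.
    private const string Html =
        "<table class='list'><tr><td>First</td></tr><tr><td>Second</td></tr></table>";

    // Evaluates `cellXPath` against each <tr> and collects the text of the first match.
    public static List<string> CellTexts(string cellXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Html);
        var texts = new List<string>();
        foreach (XmlNode row in doc.SelectNodes("//table[@class='list']/tr"))
        {
            texts.Add(row.SelectSingleNode(cellXPath).InnerText);
        }
        return texts;
    }
}
```

`CellTexts("//td[1]")` returns First twice, because the search restarts at the document root for every row, while `CellTexts(".//td[1]")` returns First and then Second.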
I'm not entirely sure how to describe this, but I think your issue is here:
"//table[@class='list']/tr"
specifically the //.
I ran into the same thing while looking for a span, and I had to use something similar:
var nodes = htmlDoc.DocumentNode.SelectNodes("//li[@class='itemRow productItemWrapper']");
foreach(HtmlNode node in nodes)
{
var nodeDoc = new HtmlDocument();
nodeDoc.LoadHtml(node.InnerHtml);
string name = nodeDoc.DocumentNode.SelectSingleNode("//span[@class='productDetailTitle']").InnerText;
}
I'm trying to check all the links on a page. Some questions have already been asked on this topic, but for some reason none of the answers worked when I tried them. One particular issue: after navigating to a page and collecting all the links into a list variable, looping through them fails with an error saying the link is a stale reference. Here is the code snippet:
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl(URLPROD);
driver.Manage().Window.Maximize();
ICollection<IWebElement> links = driver.FindElements(By.TagName("a"));
foreach (var link in links)
{
if (!string.IsNullOrEmpty(link.Text) && !link.Text.Contains("Email") && !link.Text.Contains("Element"))
{
((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].scrollIntoView(true);", link);
Console.WriteLine(link);
driver.ExecuteScript("arguments[0].click();", link);
driver.Navigate().Back();
}
}
Error message: OpenQA.Selenium.StaleElementReferenceException: 'The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed'
What should I be doing to correct this error so that I can check each link on a page?
You could just re-find the links.
So:
1. Get the number of links.
2. Loop that many times, fetching the links fresh on each iteration (to avoid stale errors).
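var links = driver.FindElements(By.TagName("a"));
for (int i = 0; i < links.Count; i++)
{
//Re-find the links on every iteration so no reference from before the navigation is reused.
var newLinks = driver.FindElements(By.TagName("a"));
//The page may have fewer links after navigating back.
if (i >= newLinks.Count)
break;
newLinks[i].Click();
driver.Navigate().Back();
}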
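The same pattern can be factored into a small generic helper. This is a sketch with a hypothetical `RefindLoop.VisitEachByIndex` method (not part of Selenium): `findAll` would wrap `driver.FindElements(...)`, and `visit` would click the link and navigate back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Selenium API): visits the i-th item of a freshly
// re-queried list on each iteration, so no handle from before a navigation is reused.
public static class RefindLoop
{
    public static int VisitEachByIndex<T>(Func<IReadOnlyList<T>> findAll, Action<T> visit)
    {
        int initialCount = findAll().Count;
        int visited = 0;
        for (int i = 0; i < initialCount; i++)
        {
            IReadOnlyList<T> fresh = findAll(); // re-find instead of reusing old references
            if (i >= fresh.Count)
            {
                break; // fewer items now than at the start
            }
            visit(fresh[i]);
            visited++;
        }
        return visited;
    }
}
```

With Selenium this could be called as `RefindLoop.VisitEachByIndex(() => driver.FindElements(By.TagName("a")), link => { link.Click(); driver.Navigate().Back(); });`.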
I'm using C# and Sitecore to use tokens in certain places (see: how to create a custom token in sitecore). I think I have a solution, but I'm not sure why it isn't working, even though I get no errors.
Item tokenItem = Sitecore.Context.Database.Items["/sitecore/content/Site Content/Tokens"];
if (tokenItem.HasChildren)
{
var sValue = args.FieldValue.ToString();
foreach (Item child in tokenItem.Children)
{
if (child.Template.Name == "Token")
{
string home = child.Fields["Title"].Value;
string hContent = child.Fields["Content"].Value;
if (sValue.Contains(home))
{
home.Replace(home, hContent);
}
}
}
}
home and hContent pull up the correct values for each container, but when the page loads, the content area still shows the "home" value (e.g. ##sales) instead of the new value stored in hContent. sValue contains everything (tables, divs, text), and I was trying to single out the value that equals "home" and replace it with hContent. What am I missing?
If your code is implemented as a processor for the RenderField pipeline, you need to put the result of your work back into args. Try something like this:
Item tokenItem = Sitecore.Context.Database.Items["/sitecore/content/Site Content/Tokens"];
if (tokenItem.HasChildren)
{
var sValue = args.Result.FirstPart;
foreach (Item child in tokenItem.Children){
if (child.Template.Name == "Token") {
string home = child.Fields["Title"].Value;
string hContent = child.Fields["Content"].Value;
if (sValue.Contains(home)) {
sValue = sValue.Replace(home, hContent);
}
}
}
args.Result.FirstPart = sValue;
}
Make sure you patch this processor into the pipeline after the GetFieldValue processor, which is responsible for pulling the field value into args.Result.FirstPart.
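The core of the fix is plain string handling: every `Replace` result must be assigned back, and all tokens should be applied to the same accumulating string. A minimal sketch, with a dictionary standing in for the Title/Content fields of the token items (the `TokenReplacer` class and the sample values are hypothetical):

```csharp
using System.Collections.Generic;

public static class TokenReplacer
{
    // Replaces every token found in `content`. Strings are immutable, so each
    // Replace result is assigned back before moving on to the next token.
    public static string ReplaceAll(string content, IReadOnlyDictionary<string, string> tokens)
    {
        foreach (KeyValuePair<string, string> token in tokens)
        {
            // string.Replace throws on an empty search string, so skip empty titles.
            if (!string.IsNullOrEmpty(token.Key) && content.Contains(token.Key))
            {
                content = content.Replace(token.Key, token.Value);
            }
        }
        return content;
    }
}
```

Inside the processor, you would build the dictionary from the token items and then set `args.Result.FirstPart = TokenReplacer.ReplaceAll(args.Result.FirstPart, tokens);`.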
Your code isn't really doing anything. You seem to be replacing the tokens on the token item field itself (child.Fields["Title"] and child.Fields["Content"]), not on the output content stream.
Try the following: you need to set args to the replaced value, replacing both the FirstPart and LastPart properties. See "Replace Tokens in Rich Text Fields Using the Sitecore ASP.NET CMS" (the code is behind the "untested prototype" link).
I would refactor your code to make it easier:
public void Process(RenderFieldArgs args)
{
args.Result.FirstPart = this.Replace(args.Result.FirstPart);
args.Result.LastPart = this.Replace(args.Result.LastPart);
}
protected string Replace(string input)
{
Item tokenItem = Sitecore.Context.Database.Items["/sitecore/content/Site Content/Tokens"];
if (tokenItem != null && tokenItem.HasChildren)
{
foreach (Item child in tokenItem.Children)
{
if (child.Template.Name == "Token")
{
string home = child.Fields["Title"].Value;
string hContent = child.Fields["Content"].Value;
//Keep replacing on the same string so every token is applied, not just the first match.
if (!string.IsNullOrEmpty(home) && input.Contains(home))
{
input = input.Replace(home, hContent);
}
}
}
}
return input;
}
This is still not optimal, but gets you closer.
When you call home.Replace(home, hContent), it creates a new string instance in which the content of home is replaced by what is in hContent; the original string is not modified. So you need to assign that new instance to a variable, either a new one or home itself. Hence the snippet becomes:
if (sValue.Contains(home))
{
home = home.Replace(home, hContent);
}
Have you tried:
home = home.Replace(home,hContent);
I'm trying to verify whether text added to a list box has been successfully removed. What is the best way to handle this kind of scenario in Selenium with C#?
Below is the code I'm currently using.
//Verify that the subject is added and then deleted
public static void VerifySubjectDel()
{
string subjectAddValue = GenerateRandomAlphaCode(200);
productPage.subjectAddTxtBx.SendKeys(subjectAddValue);
productPage.subjectAddBtn.Click();
IWebElement elem = WebDriver.FindElement(By.Id("Subjects_ListBox"));
SelectElement selectList = new SelectElement(elem);
IList<IWebElement> options = selectList.Options;
if (options.ToList().Any(tagname => tagname.Text.Contains(subjectAddValue)))
{
Assert.IsTrue(true);
selectList.SelectByText(subjectAddValue);
productPage.subjectDelBtn.Click();
WebDriver.SwitchTo().Alert().Accept();
bool subjectDel = WebDriver.FindElements(By.XPath(".//*[@id='Subjects_ListBox']//option[contains(text(),'" + subjectAddValue + "')]")).Count == 0;
if (subjectDel)
{
Assert.IsTrue(subjectDel);
}
else
Assert.IsTrue(subjectDel, "Subject not deleted successfully");
}
else
Assert.IsTrue(false, "The Subject added is not present in the Subject-ListBox");
}
I would call FindElements on the IWebElement you captured above and return all the elements within the list box. Then using LINQ you could do something like
bool success = !listBoxItems.Any(x => string.Compare(x.Text, subjectAddValue) == 0);
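A minimal sketch of that check on plain strings, using `string.Equals` with `StringComparison.Ordinal` for an exact match (the two-argument `string.Compare` is culture-sensitive). `ListBoxCheck.IsRemoved` is a hypothetical helper; in your Selenium test you could pass it the option texts, e.g. `new SelectElement(elem).Options.Select(o => o.Text)`, re-finding the list box after the delete in case the old element reference has gone stale.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListBoxCheck
{
    // True when no option text exactly matches `removedValue` (ordinal comparison).
    public static bool IsRemoved(IEnumerable<string> optionTexts, string removedValue)
    {
        return !optionTexts.Any(text => string.Equals(text, removedValue, StringComparison.Ordinal));
    }
}
```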