Selenium C# - How to Check All Links

I am trying to check all the links on a page. Some questions have already been asked on this topic, but for some reason none of the suggested solutions worked when I tried them. One particular issue I'm having: after navigating to a page and collecting all the links into a list variable, looping through them produces an error saying the link is a stale reference. Here is the code snippet:
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl(URLPROD);
driver.Manage().Window.Maximize();
ICollection<IWebElement> links = driver.FindElements(By.TagName("a"));
foreach (var link in links)
{
    if (!(link.Text.Contains("Email")) || !(link.Text == "") || !(link.Text == null) || !(link.Text.Contains("Element")))
    {
        ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].scrollIntoView(true);", link);
        Console.WriteLine(link);
        driver.ExecuteScript("arguments[0].click();", link);
        driver.Navigate().Back();
    }
}
Error message: OpenQA.Selenium.StaleElementReferenceException: 'The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed'
What should I be doing to correct this error so that I can check each link on a page?

You could just re-find the links each time around the loop.
So: 1. get the number of links, 2. loop that many times, re-finding the links fresh on each iteration (to avoid stale element errors).
var links = driver.FindElements(By.TagName("a"));
for (int i = 0; i < links.Count; i++)
{
    var newLinks = driver.FindElements(By.TagName("a"));
    newLinks[i].Click();
    driver.Navigate().Back();
}
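An alternative (a minimal sketch, not part of the original answer) is to avoid stale references altogether: read each link's href into a plain list of strings first, then check every URL with an HTTP request instead of clicking it. This assumes the goal is only to verify that the links respond, and that System.Net.Http and System.Collections.Generic are referenced.
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl(URLPROD); // URLPROD as in the question
// Collect the href values up front; plain strings cannot go stale.
var hrefs = new List<string>();
foreach (var link in driver.FindElements(By.TagName("a")))
{
    string href = link.GetAttribute("href");
    if (!string.IsNullOrEmpty(href))
        hrefs.Add(href);
}
// Check each URL's status code without touching the DOM again.
using (var client = new HttpClient())
{
    foreach (var href in hrefs)
    {
        var response = client.GetAsync(href).Result; // .Result keeps the sketch synchronous; prefer await in real code
        Console.WriteLine($"{(int)response.StatusCode} {href}");
    }
}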

Related

Element not interactable with C# app that uses Chrome WebDriver

PREFACE: After a lengthy Stack Overflow search I found two suggested solutions to solve the "element not interactable" problem I am having when I try to interact with the target node element. Neither of them worked, as described below.
I have a C# app that uses the OpenQA.Selenium package to remote control a YouTube web page. I am trying to click on a button on the page that opens a dialog box, but when I do I get the notorious "element not interactable" message. I found the following two suggestions on Stack Overflow:
Actions actions = new Actions(chromeDriver);
actions.MoveToElement(webElem);
actions.Perform();
And this suggestion that one commenter said is ill-advised because it can click on elements that are not visible or are below modal objects:
IJavaScriptExecutor executor = (IJavaScriptExecutor)chromeDriver;
executor.ExecuteScript("arguments[0].click();", webElem);
I tried the second one anyway to see if it worked. Unfortunately, with the first suggestion that uses the Actions interface, I still got the "element not interactable" message, but this time on the Perform() statement. The third attempt (the JavaScript click) did not produce the error message, but it failed to click the button. I know this because clicking the button opens a dialog window when it works, and no dialog window appeared when I tried the third solution.
Below is the code I am using to try to click on the element. The collection it iterates over contains the elements I select via an XPath statement that finds the button I want to click. It tries every button that matches the XPath statement and skips those that fail to work. Unfortunately, none of the 3 buttons found by the XPath statement work.
What is strange is that if I take the exact same XPath statement I am using in my C# app and plug it into the Chrome DevTools debugger, referencing the first element in the array of found elements, it works:
$x(strXPath)[0].click()
But so far nothing I have tried from my C# app works. Does anyone have an idea why I am having this problem?
public IWebElement ClickFirstInteractable(ChromeDriver chromeDriver)
{
    string errPrefix = "(ClickFirstInteractable) ";

    if (this.DOM_WebElemensFound == null || this.DOM_WebElemensFound.Count() < 1)
        throw new NullReferenceException(errPrefix + "The DOM_WebElementsFound collection is empty.");

    IWebElement webElemClicked = null;

    foreach (IWebElement webElem in this.DOM_WebElemensFound)
    {
        // Try and "click" it.
        try
        {
            // First make sure the element is visible, or we will get
            // the "element not interactable" error.

            /* FIRST ATTEMPT, didn't work.
            webElem.scrollIntoView(true);
            webElem.Click(); // <<<<<----- Error occurs here
            */

            /* SECOND ATTEMPT using Actions, didn't work,
             * and I got the error message when the Perform() statement executed.
            Actions actions = new Actions(chromeDriver);
            actions.MoveToElement(webElem);
            actions.Perform(); // <<<<<----- Error occurs here
            */

            /* THIRD ATTEMPT using script execution, didn't work.
             * I did not get the error message, but the button did not get clicked.
             */
            IJavaScriptExecutor executor = (IJavaScriptExecutor)chromeDriver;
            executor.ExecuteScript("arguments[0].scrollIntoView();", webElem);
            executor.ExecuteScript("arguments[0].click();", webElem);

            // Click operation accepted. Stop iteration.
            webElemClicked = webElem;
            break;
        }
        catch (ElementNotInteractableException exc)
        {
            // Swallow this exception and go on to the next element found by the XPath expression.
            System.Console.WriteLine(exc.Message);
        }
    }

    return webElemClicked;
}
I tried to reproduce your scenario by clicking on a "hidden" button, waiting for the modal to appear, then acting on that modal, etc.
I hope it helps you!
const string Target = @"https://www.youtube.com/";

using var driver = new ChromeDriver();
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(20))
{
    PollingInterval = TimeSpan.FromMilliseconds(250),
};

driver.Navigate().GoToUrl(Target);

// I don't automate the cookie consent dialog here, to save time,
// so dismiss it manually and then press Enter in the console.
Console.ReadLine();

var menuLocator = By.XPath("//a[@id = 'video-title-link'][1]" +
    "/ancestor::div[@id = 'meta']" +
    "/following-sibling::div[@id = 'menu']" +
    "//button[@class = 'style-scope yt-icon-button']");
var menu = wait.Until(d => d.FindElement(menuLocator));

var actions = new Actions(driver);
actions.MoveToElement(menu).Click().Perform();

var shareLocator = By.XPath("//div[@id = 'contentWrapper']//*[normalize-space(text()) = 'Share']");
var share = wait.Until(d => d.FindElement(shareLocator));
actions.MoveToElement(share).Click().Perform();

var copyLinkLocator = By.XPath("//button[@aria-label = 'Copy']");
var copyLink = wait.Until(d => d.FindElement(copyLinkLocator));
actions.MoveToElement(copyLink).Click().Perform();
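If a button still reports "element not interactable" after scrolling, it often just has not finished rendering. A small complement to the code above (a sketch, not part of the original answer) is to let WebDriverWait confirm the element is clickable before acting on it; this assumes the ExpectedConditions helpers from the SeleniumExtras.WaitHelpers package are installed, since recent Selenium C# bindings no longer ship them.
// Wait until the element is both visible and enabled before clicking it.
var clickWait = new WebDriverWait(driver, TimeSpan.FromSeconds(20));
var button = clickWait.Until(SeleniumExtras.WaitHelpers.ExpectedConditions.ElementToBeClickable(menuLocator));
new Actions(driver).MoveToElement(button).Click().Perform();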

Iterate through web pages and download PDFs

I have code that crawls through all the PDF files on a web page and downloads them to a folder. However, it has now started to throw an error:
System.NullReferenceException HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=NW Crawler
StackTrace: at NW_Crawler.Program.Main(String[] args) in C:\Users\PC\source\repos\NW Crawler\NW Crawler\Program.cs:line 16
The error points to ProductListPage in foreach (HtmlNode src in ProductListPage).
Is there any hint on how to fix this issue? I have tried to implement async/await with no success. Maybe I was doing something wrong, though...
Here is the process to be done:
Go to https://www.nordicwater.com/products/waste-water/
List all links in section (related products). They are: <a class="ap-area-link" href="https://www.nordicwater.com/product/mrs-meva-multi-rake-screen/">MRS MEVA multi rake screen</a>
Proceed to each link and search for PDF files. PDF files are in:
<div class="dl-items">
<a href="https://www.nordicwater.com/wp-content/uploads/2016/04/S1126-MRS-brochure-EN.pdf" download="">
Here is my full code for testing:
using HtmlAgilityPack;
using System;
using System.Net;

namespace NW_Crawler
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument htmlDoc = new HtmlWeb().Load("https://www.nordicwater.com/products/waste-water/");
            HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//a[@class='ap-area-link']//a");
            Console.WriteLine("Here are the links:" + ProductListPage);
            foreach (HtmlNode src in ProductListPage)
            {
                htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);
                // Thread.Sleep(5000); // wait some time
                HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='dl-items']//a");
                if (LinkTester != null)
                {
                    foreach (var dllink in LinkTester)
                    {
                        string LinkURL = dllink.Attributes["href"].Value;
                        Console.WriteLine(LinkURL);
                        string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
                        var DLClient = new WebClient();
                        // Thread.Sleep(5000); // wait some time
                        DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
                    }
                }
            }
        }
    }
}
I made a couple of changes to cover the errors you might be seeing.
Changes
Use src.GetAttributeValue("href", string.Empty) instead of src.Attributes["href"].Value. If the href attribute is not present or is null, the latter throws "Object reference not set to an instance of an object".
Check that ProductListPage is valid and not null.
ExtractFilename includes a leading / in the name. Use + 1 in the Substring call to skip the last / returned by LastIndexOf.
Move on to the next iteration if the href is null in either of the loops.
Changed the product list query to //a[@class='ap-area-link'] from //a[@class='ap-area-link']//a. You were searching for an <a> within the <a> tag, which returns null. Still, if you want to query it this way, the first if statement checking ProductListPage != null will take care of the error.
HtmlDocument htmlDoc = new HtmlWeb().Load("https://www.nordicwater.com/products/waste-water/");
HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//a[@class='ap-area-link']");
if (ProductListPage != null)
    foreach (HtmlNode src in ProductListPage)
    {
        string href = src.GetAttributeValue("href", string.Empty);
        if (string.IsNullOrEmpty(href))
            continue;
        htmlDoc = new HtmlWeb().Load(href);
        HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='dl-items']//a");
        if (LinkTester != null)
            foreach (var dllink in LinkTester)
            {
                string LinkURL = dllink.GetAttributeValue("href", string.Empty);
                if (string.IsNullOrEmpty(LinkURL))
                    continue;
                string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/") + 1);
                new WebClient().DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
            }
    }
The XPath that you used seems to be incorrect. I loaded the web page in a browser, searched for your XPath, and got no results. I replaced it with //a[@class='ap-area-link'] and was able to find matching elements.
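One more thing worth noting (not covered in the answer above): DownloadFileAsync returns immediately, so if Main finishes before the downloads complete you can end up with missing or partial files. A minimal alternative, sketched with the same variables as the loop above, is the synchronous WebClient.DownloadFile call:
// Blocks until the file is fully written, so the program cannot exit
// before the PDF is on disk.
using (var client = new WebClient())
{
    client.DownloadFile(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
}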

Adding page numbers to PDFs including the cover page

I'm trying to add a page number to the bottom of each page, but it doesn't work as expected.
Currently I'm using this:
<span class="page"></span>/<span class="topage"></span>
The problem with this solution is that it doesn't count the cover as a page.
So a 7-page PDF "has" 6 pages according to my code.
I'm looking for a way to include the cover as a page, so the numbering is correct.
Currently I'm looking into some JS to adjust it afterwards, but there has to be some "official" solution?
Solved using JavaScript. :)
If anyone is looking for the solution, here you go:
// The PDF generator passes "page" and "topage" as query-string parameters.
var vars = {};
var x = window.location.search.substring(1).split('&');
for (var i in x) {
    var z = x[i].split('=', 2);
    vars[z[0]] = unescape(z[1]);
}

// Offset both numbers by 1 so the cover counts as a page.
var pageNumberStart = parseInt(vars.page);
var pageNumberEnd = parseInt(vars.topage);
if (!isNaN(pageNumberStart) && !isNaN(pageNumberEnd)) {
    // Note: this expects elements with id="page" and id="topage".
    document.getElementById('page').innerHTML = pageNumberStart + 1;
    document.getElementById('topage').innerHTML = pageNumberEnd + 1;
}
Maybe someone knows the official way to do it? :D

How to prevent "stale element" inside a foreach loop?

I'm using Selenium to retrieve data from this site, and I encountered a little problem when I try to click an element inside a foreach loop.
What I'm trying to do
I'm trying to get the table associated with a specific category of odds; in the link above we have different categories:
As you can see from the image, I clicked on Asian handicap -1.75 and the site generated a table through JavaScript, so in my code I'm trying to get that table by finding the corresponding element and clicking it.
Code
Currently I have two methods. The first, called GetAsianHandicap, iterates over all categories of odds:
public List<T> GetAsianHandicap(Uri fixtureLink)
{
    //Contains all the categories displayed on the page
    string[] categories = new string[] { "-1.75", "-1.5", "-1.25", "-1", "-0.75", "-0.5", "-0.25", "0", "+0.25", "+0.5", "+0.75", "+1", "+1.25", "+1.5", "+1.75" };

    foreach (string cat in categories)
    {
        //Get the html of the table for the current category
        string html = GetSelector("Asian handicap " + cat);
        if (html == string.Empty)
            continue;

        //other code
    }
}
and then the method GetSelector, which clicks on the searched element; this is the design:
public string GetSelector(string selector)
{
    //Get the available table containers (the categories).
    var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));

    //Store the html to return.
    string html = string.Empty;

    foreach (IWebElement container in containers)
    {
        //Container not available for click.
        if (container.GetAttribute("style") == "display: none;")
            continue;

        //Get container header (contains the description).
        IWebElement header = container.FindElement(By.XPath(".//div[starts-with(@class, 'table-header')]"));

        //Store the table description.
        string description = header.FindElement(By.TagName("a")).Text;

        //The container contains the searched category.
        if (description.Trim() == selector)
        {
            //Get the available links.
            var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));

            //Get the element to click.
            IWebElement element = listItems.Where(li => li.Text == selector).FirstOrDefault();

            //The element exists.
            if (element != null)
            {
                //Click on the container to load the table.
                element.Click();

                //Wait a few seconds on ChromeDriver for the table to load.
                driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(20);

                //Get the new html of the page.
                html = driver.PageSource;
            }

            return html;
        }
    }

    return string.Empty;
}
Problem and exception details
When the foreach reaches this line:
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
I get this exception:
'OpenQA.Selenium.StaleElementReferenceException' in WebDriver.dll
stale element reference: element is not attached to the page document
Searching for the error tells me that the html page source has changed, but in this case I store the element to click in one variable and the html itself in another variable, so I can't figure out how to patch this issue.
Could someone help me?
Thanks in advance.
I looked at your code and I think you're making it more complicated than it needs to be. I'm assuming you want to scrape the table that is exposed when you click one of the handicap links. Here's some simple code to do this. It dumps the text of the elements, which ends up unformatted, but you can use it as a starting point and add functionality if you want. I didn't run into any StaleElementReferenceExceptions when running this code, and I never saw the page refresh, so I'm not sure what other people were seeing.
string url = "http://www.oddsportal.com/soccer/europe/champions-league/paok-spartak-moscow-pIXFEt8o/#ah;2";
driver.Url = url;

// get all the (visible) handicap links and click them to open the page and display the table with odds
IReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//a[contains(.,'Asian handicap')]")).Where(e => e.Displayed).ToList();
foreach (var link in links)
{
    link.Click();
}

// print all the odds tables
foreach (var item in driver.FindElements(By.XPath("//div[@class='table-container']")))
{
    Console.WriteLine(item.Text);
    Console.WriteLine("====================================");
}
I would suggest that you spend some more time learning locators. Locators are very powerful and can save you having to stack nested loops looking for one thing... and then children of that thing... and then children of that thing... and so on. The right locator can find all that in one scrape of the page which saves a lot of code and time.
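For example, here is a sketch (mine, not the answerer's code) of finding the clickable link for one specific category with a single locator, reusing the same "contains Asian handicap" idea and the Displayed filter from the code above; "Asian handicap -1.75" is just an example value and System.Linq is assumed:
// One query instead of nested loops over containers, headers, and list items.
IWebElement categoryLink = driver
    .FindElements(By.XPath("//a[contains(., 'Asian handicap -1.75')]"))
    .First(e => e.Displayed);
categoryLink.Click();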
As you mentioned in the related post, this issue occurs because the site executes an auto refresh.
Solution 1:
If there is an explicit way to refresh the page, I would suggest performing that refresh on a periodic basis, or (if you can tell) only at the point where a refresh is actually needed.
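A minimal sketch of that idea (not from the original answer): trigger the refresh yourself at a point you control, then re-locate the elements so that no reference predates the refresh.
// Refresh explicitly, then re-find everything afterwards.
driver.Navigate().Refresh();
var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));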
Solution 2:
Create extension methods for FindElement and FindElements that try to get the element within a given timeout.
public static IWebElement FindElement(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.ElementToBeClickable(by));
    }
    return driver.FindElement(by);
}

public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(by));
    }
    return driver.FindElements(by);
}
so your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 30).FindElements(By.TagName("a"),30);
Solution 3:
Handle StaleElementReferenceException using an extension method:
public static IWebElement FindElement(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt; attempt++)
    {
        try
        {
            return driver.FindElement(by);
        }
        catch (StaleElementReferenceException)
        {
            // Element went stale; try again.
        }
    }
    return driver.FindElement(by);
}

public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt; attempt++)
    {
        try
        {
            return driver.FindElements(by);
        }
        catch (StaleElementReferenceException)
        {
            // Elements went stale; try again.
        }
    }
    return driver.FindElements(by);
}
Your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 2).FindElements(By.TagName("a"),2);
Use this:
string description = header.FindElement(By.XPath("strong/a")).Text;
instead of your:
string description = header.FindElement(By.TagName("a")).Text;

Working with Pagination (Selenium webdriver C#)

I am facing a problem while automating my test for pagination.
My code is able to iterate through the table on the first page, and it is also able to click Next page if the searched element is not found. The problem is that for the second page it does not pick up the table data, even though the class of the table is the same on all pages.
Please help me through this. Here's my chunk of code:
IWebElement pagingInfo = webDriver.FindElement(By.ClassName("Dj")); // Getting text from page info in the form "1-emailsPerPage of totalNumberOfEmails"
string[] stringArray = pagingInfo.Text.Split(' ');
int totalNumberOfEmails = Convert.ToInt32(stringArray[2]);
int emailsPerPage = Convert.ToInt32(stringArray[0].Substring(2));
int clickCount = totalNumberOfEmails / emailsPerPage;

for (int i = 0; i <= clickCount; i++)
{
    IWebElement tableInbox = webDriver.FindElement(By.ClassName("Cp")).FindElement(By.ClassName("F"));
    IList<IWebElement> rowsCollection = tableInbox.FindElements(By.TagName("tr"));
    foreach (IWebElement row in rowsCollection)
    {
        IList<IWebElement> columnCollection = row.FindElements(By.TagName("td"));
        if (columnCollection[5].Text.Contains("Fwd: Security"))
        {
            Console.WriteLine("Record found");
            recordFound = true;
            break;
        }
    }
    if (recordFound == true)
        break;
    webDriver.FindElement(By.ClassName("ar5")).FindElement(By.ClassName("amJ")).Click();
    Thread.Sleep(5000);
}

if (recordFound == true)
    Console.WriteLine("Record Found");
else
    Console.WriteLine("Record Not Found");
Please Help!! Thanks in advance :)
Actually, the page is using Ajax, so I have to switch into the iframe to navigate through the same table on each page.
Switching to the iframe solved it for me.
The problem is finally solved after a lot of research. :)
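For reference, here is a minimal sketch of what switching into the iframe looks like in Selenium C#; the By.TagName("iframe") locator is a placeholder, since the original post does not say how the frame is identified.
// Switch into the frame that hosts the table, re-read the rows,
// then switch back to the top-level document before clicking Next page.
IWebElement frame = webDriver.FindElement(By.TagName("iframe")); // placeholder locator
webDriver.SwitchTo().Frame(frame);
IWebElement tableInbox = webDriver.FindElement(By.ClassName("Cp")).FindElement(By.ClassName("F"));
IList<IWebElement> rowsCollection = tableInbox.FindElements(By.TagName("tr"));
webDriver.SwitchTo().DefaultContent();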
