HtmlAgilityPack - How to get a child div ignoring subgroups

I have the following two HTML snippets:
-- first HTML
<div id="FIRST">
<span>foo</span>
<div id="SECOND">
<span>bar</span>
</div>
</div>
-- second HTML
<div id="FIRST">
<div id="SECOND">
<span>bar</span>
</div>
</div>
I would like to get the span inside the FIRST div in the first HTML, but there are situations where this span doesn't exist, as you can see in the second HTML.
I am currently using the following code, but it returns the span inside the SECOND div.
SelectSingleNode(".//span")
Note: my example has only two levels of divs, but my real HTML has many more levels.
I need to get the span considering only the tags directly inside the first div.

To get only a <span> that is a direct child of <div id="FIRST">, you can use either ./span or span, assuming that the context node on which you call SelectSingleNode() is that <div id="FIRST">:
SelectSingleNode("./span")
SelectSingleNode("span")
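As a minimal sketch of the difference, the example below runs ./span against both HTML snippets from the question. It assumes the markup is well-formed enough to parse as XML and uses System.Xml.Linq from the .NET base library as a stand-in for HtmlAgilityPack; the XPath strings are the same ones you would pass to SelectSingleNode(). The class, field, and method names are illustrative only.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ChildSpanDemo
{
    // The two snippets from the question, with whitespace removed.
    public const string FirstHtml =
        "<div id='FIRST'><span>foo</span><div id='SECOND'><span>bar</span></div></div>";
    public const string SecondHtml =
        "<div id='FIRST'><div id='SECOND'><span>bar</span></div></div>";

    // Returns the text of the span that is a direct child of div#FIRST,
    // or null when no such span exists.
    public static string DirectChildSpanText(string markup)
    {
        XElement first = XDocument.Parse(markup).XPathSelectElement("//div[@id='FIRST']");
        // "./span" (or just "span") matches direct children only;
        // ".//span" would also match spans nested in deeper divs.
        XElement span = first?.XPathSelectElement("./span");
        return span?.Value;
    }

    public static void Main()
    {
        Console.WriteLine(DirectChildSpanText(FirstHtml) ?? "(none)");  // foo
        Console.WriteLine(DirectChildSpanText(SecondHtml) ?? "(none)"); // (none)
    }
}
```

With HtmlAgilityPack the null check matters in the same way: SelectSingleNode() returns null when nothing matches, which is the case for the second snippet.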

Here is an alternative:
SelectSingleNode("span[1]");
This selects the first span element that is a direct child of the context node, not the first span in the whole HtmlDocument.

Related

Selenium not recognizing span element within a div, thinks it's text?

I'm trying to grab the text from a span that's inside a div. The div is currently selected, so it has "curr" within its class.
The DOM:
<a id="ctl00_oAjaxContentPlaceHolder_LinkButtonAlerts" href="javascript:__doPostBack('ctl00$oAjaxContentPlaceHolder$LinkButtonAlerts','')">
<div id="ctl00_oAjaxContentPlaceHolder_divAlertAlertsHolder" class="profile-menu-alerts curr" title="Activities & Alerts">
<span>Activities & Alerts</span>
</div>
</a>
This XPath should find the span (it works when I use the Find tool in DevTools), but it fails to find the element:
//div[contains(@class,'curr')]/span
If I remove the /span from the xpath, it finds the div just fine. And the strange part is that if I grab the text of that div with
driver.FindElement(By.XPath("//div[contains(@class,'curr')]")).Text;
it returns "<span>Activities & Alerts</span>". Why is this span element being incorrectly recognized as Text?

I ran this in my solution using the code below and had no issues.
var test = Driver.FindElement_byXPath("//div[contains(@class,'curr')]/span").Text;
HTML (I added another option):
<a id="ctl00_oAjaxContentPlaceHolder_LinkButtonAlerts" href="javascript:__doPostBack('ctl00$oAjaxContentPlaceHolder$LinkButtonAlerts','')">
<div id="ctl00_oAjaxContentPlaceHolder_divAlertAlertsHolder" class="profile-menu-alerts" title="Activities & Alerts">
<span>Test 1</span>
</div>
</a>
<a id="ctl00_oAjaxContentPlaceHolder_LinkButtonAlerts" href="javascript:__doPostBack('ctl00$oAjaxContentPlaceHolder$LinkButtonAlerts','')">
<div id="ctl00_oAjaxContentPlaceHolder_divAlertAlertsHolder" class="profile-menu-alerts curr" title="Activities & Alerts">
<span>Activities & Alerts</span>
</div>
</a>

xPath working for only one of multiple div tags found at the same level

I'm writing tests with Selenium WebDriver in C#. I can't understand why only the first in a list of same-level div elements can be read with XPath.
I inspected two elements on the page, two different divs. I managed to get the text of the first element by running this simple code:
IWebElement chapterElement = webDriver.FindElement(By.XPath("/html/body/div[3]/main/div[2]/div[3]/article/div[1]"));
...after which I can read chapterElement.Text to get the inner text.
The other one is another div at the same level as the first. I copied its XPath from the HTML (copy full XPath):
IWebElement chapterElement = webDriver.FindElement(By.XPath("/html/body/div[3]/main/div[2]/div[3]/article/div[2]"));
...and it doesn't fail, but it doesn't return the text either: the text is "" (an empty string).
The only differences between the two divs are:
the last segment in the path: div[1] versus div[2].
the second div is actually hidden from the page (probably because it lacks the class "chapter_visible"), but does show up completely in the html with Inspect!
In case this helps, I'm gonna say
"/html/body/div[3]/main/div[2]/div[3]/article/div[1]"
corresponds with:
<div class="chapter chapter chapter_visible" data-chapterno="0" data-chapterid="5e8798266cee070006f5a3d1" style="display: block;">
<h1>some text</h1>
<div class="chapter__content"><p>some text</p>
<p>some text</p>
<p>some text</p>
<ul>
<li>some text</li>
<li>some text</li>
<li>some text.</li>
</ul></div>
</div>
and
"/html/body/div[3]/main/div[2]/div[3]/article/div[2]" (the second xPath)
corresponds to the following (as is located at the same level as the first):
<div class="chapter chapter" data-chapterno="1" data-chapterid="5e8798436cee070006f5a3d2">
<h1>some text</h1>
<div class="chapter__content"><p>some text</p>
<p><strong>some text</strong></p>
<p>some text.</p>
<p>some text</p>
<p>some text</p></div>
</div>
This is my first experience playing around with xPath, a bit disappointed because I just copied the xPath, I didn't even write it manually. It was supposed to be fast and straightforward, right? Thank you.

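IWebElement chapterElement = webDriver.FindElement(By.XPath("//div[@class='chapter chapter']"));
Can you try this?
If you want an attribute value instead, note that GetAttribute returns a string rather than an IWebElement:
string value = webDriver.FindElement(By.XPath("//div[@class='chapter chapter']")).GetAttribute("attribute_name");
Also note that .Text only returns text that Selenium considers visible, so a hidden div yields an empty string; reading the "textContent" attribute with GetAttribute is the usual way to get the text of a hidden element.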

Selenium XPath Query - FindElement After Text

I am trying to get a link in a website which changes name on a daily basis. The structure is similar to this (but with many more levels):
<li>
<div class = "contentPlaceHolder1">
<div class="content">
<p>
<strong>'Today's File Here:</strong>
</p>
</div>
</div>
</li>
<li>...</li>
<li>...</li>
<li>...</li>
<li>
<div class = "contentPlaceHolder1">
<div class="content">
<div class="DocLink">
<li>
Download
</li>
</div>
</div>
</div>
</li>
<li>...</li>
etc...
If I find the text (which will remain constant) which is immediately above it in the page by using
IWebElement foundTextElement = chrome.FindElement(By.XPath("//p/strong[contains(., \"Today's File Here:\")]"));
How can I find the next link in the page by using XPath (or alternative solution)? I am unsure of how to search for the next element after this.
If I use
IWebElement link = chrome.FindElement(By.XPath("//a[@class='txtLnk']"));
then this finds the first link in the page. I only want the first occurrence of it after 'foundTextElement'.
I have had it working by navigating up the tree to the parent <li> and finding the 4th sibling using By.XPath("following-sibling::*[4]/div/div/div/li/a[@class='txtLnk']"), but that seems a little precarious to me.
I could parse the HTML until I find the next occurrence, but I was wondering whether there is a cleverer way of doing this.
Thanks.

You can try this XPath. It's complicated because we can't see the rest of the page to optimize it:
//li[preceding-sibling::li[.//*[contains(text(),'File Here')]]][.//a[contains(@class,'txtLnk')]][1]
It selects the first li that contains an a tag with the txtLnk class and that comes after a sibling li whose text contains File Here.
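To see how the preceding-sibling predicate behaves, here is a small sketch that evaluates the XPath above against a simplified list. The sample markup, the href values, and the class and method names are hypothetical, and System.Xml.Linq is used as a stand-in so the example runs without a browser; with Selenium you would pass the same expression to By.XPath().

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LinkAfterTextDemo
{
    // Hypothetical, simplified page: one link before the marker text, two after it.
    public const string Sample =
        "<ul>" +
        "<li><a class='txtLnk' href='/before.pdf'>Earlier link</a></li>" +
        "<li><div class='content'><p><strong>Today's File Here:</strong></p></div></li>" +
        "<li>unrelated</li>" +
        "<li><div class='DocLink'><a class='txtLnk' href='/today.pdf'>Download</a></div></li>" +
        "<li><a class='txtLnk' href='/later.pdf'>Later link</a></li>" +
        "</ul>";

    // Returns the href of the first txtLnk link that follows the marker text,
    // or null when there is none.
    public static string FirstLinkAfterMarker(string markup)
    {
        const string xpath =
            // li elements that come after the li holding the marker text...
            "//li[preceding-sibling::li[.//*[contains(text(),'File Here')]]]" +
            // ...that contain a txtLnk link; keep only the first such li...
            "[.//a[contains(@class,'txtLnk')]][1]" +
            // ...and step down to the link itself.
            "//a[contains(@class,'txtLnk')]";
        XElement link = XDocument.Parse(markup).XPathSelectElement(xpath);
        return link?.Attribute("href")?.Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstLinkAfterMarker(Sample)); // /today.pdf
    }
}
```

The link that appears before the marker is skipped because its li has no preceding sibling containing the marker text, and the later link is dropped by the [1] position filter.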

By.XPath("//a[@class='txtLnk']")
is a very generic selector; there might be other elements on the page using the same class.
You can find the link using a CssSelector instead. Try this:
IWebElement aElement = chrome.FindElement(By.CssSelector("div.contentPlaceHolder1 div.content div.DocLink li a"));
Then you can get the href using:
string link = aElement.GetAttribute("href");

Retrieving deep nested values looping through a HTML page using HTMLAgilityPack C#

I'm trying to use the HTMLAgilityPack to retrieve various specific values from a web page. The web page is always the same and the data I want to scrape from it is always in the same place (same divs/classes/attributes etc.).
I've tried to loop through and get the values, but I always mess up somewhere. I'd provide some code to help but honestly I've tried 5 times and each time I don't get results close to what I want to - I'm well and truly in a pickle.
I have written the main chunk of HTML:
<div id ="markers">
<div class="row">
<div class="span2 filter-pane ">
<div class="teaser teaser-small">
<h1 class="teaser-title">
...
</div>
<p> Value4 </p>
</div>
</div>
<div class="span2 filter-pane ">
</div>
<div class="span2 filter-pane ">
</div>
</div>
<div class="row"></div>
<div class="row"></div>
</div>
Basically the values (1-4) are the values I want to extract from the data.
The <div id="markers"> is ONE div on the page, all the information I need is in this div.
There are multiple <div class="row"> divs, I need to loop through all of these.
Inside each of these divs, there are three or fewer <div class="span2 filter-pane "> divs. I need to loop through these divs as well.
My data is inside here - Value3 is here in the <p>...</p>. And the other values can be found within the <h1 class="teaser-title"> node, where they are attributes in an <a> element.
I hope somebody can provide me with a solution, or at least some good guidance to accessing all pieces of data I want. I've tried various things but I don't get the results I want.
Thanks.

Here are some hints for you. First you need to get div#markers, because you mentioned that it contains all the info you need.
// Requires: using System.Linq;
string mainURL = "your url"; // replace with the page URL
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(mainURL);
var markerDiv = doc.DocumentNode.Descendants("div").FirstOrDefault(n => n.Id == "markers");
// Check whether markerDiv is null before continuing
// Same idea: get the list of row divs (a simple class check; you can write your own helper)
var rows = markerDiv.Descendants("div").Where(n => n.GetAttributeValue("class", "").Split(' ').Contains("row"));
// Iterate through the rows
foreach (var row in rows)
{
    // Add more criteria here if a row has more than one <a> element
    var aElement = row.Descendants("a").FirstOrDefault();
    // Returns Value1; do the same for the other attributes and for the <p>
    var value1 = aElement?.GetAttributeValue("data-lat", "");
}
Hope it helps.

webdriver - Get a count on all class names contained within a specific div

Using webdriver with c#, I'm trying to get a count of all items contained within a specific drop down menu (it's not a select element). The trouble is, there are many other elements on my page containing the same class name so what I need is a way to filter class names within a specific div only.
Here is an example of the code I'm looking at:
<div id="DropDownMenu1">
<span class="drop-combobox">
<div class="drop-item-content">list item number 1</div></div>
<div class="drop-item-content">list item number 2</div></div>
<div class="drop-item-content">list item number 3</div></div>
</span>
</div>
Throughout my page I will also have additional drop-down menus like this (all class names are the same; only the divs have different ids):
<div id="DropDownMenu2">
<span class="drop-combobox">
<div class="drop-item-content">list item number 1</div></div>
<div class="drop-item-content">list item number 2</div></div>
<div class="drop-item-content">list item number 3</div></div>
</span>
</div>
<div id="DropDownMenu3">
<span class="drop-combobox">
<div class="drop-item-content">list item number 1</div></div>
<div class="drop-item-content">list item number 2</div></div>
<div class="drop-item-content">list item number 3</div></div>
</span>
</div>
I have been able to get a count using (from memory) something like this:
driver.FindElements(By.ClassName("drop-item-content")).Count;
The trouble is that my count includes every element with the class "drop-item-content", but I need a count of the elements with that class inside one specific div.
I hope that makes sense (and I hope that someone could help) :)
Thanks very much

First select the whole element by its id, then filter by element type (div in this case, because you do not want the span), and then by class name. The class name is not strictly required because you want all the divs; if the selector fails, try it without the class name. A CssSelector should work. Try the following line:
Driver.FindElements(By.CssSelector("#DropDownMenu3 div.drop-item-content")).Count;

Off the top of my head:
IWebElement masterDiv = driver.FindElement(By.Id("DropDownMenu3"));
// Find subelements of that element (.// because the items sit inside the span, not directly under the div)
int count = masterDiv.FindElements(By.XPath(".//div[@class='drop-item-content']")).Count;

Try using XPath to find all the elements:
//div[@id='DropDownMenu1']//div[@class='drop-item-content']
FindElements returns all matching elements inside the specified div:
driver.FindElements(By.XPath("//div[@id='DropDownMenu1']//div[@class='drop-item-content']")).Count;
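The scoping idea can be checked without a browser. The sketch below is only an illustration: it uses System.Xml.Linq instead of Selenium, the sample page is a hypothetical cut-down version of the menus in the question (with a different number of items per menu so the counts are distinguishable), and the class and method names are made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ScopedCountDemo
{
    // Hypothetical page with two menus that share the same class names.
    public const string Page =
        "<body>" +
        "<div id='DropDownMenu1'><span class='drop-combobox'>" +
        "<div class='drop-item-content'>list item number 1</div>" +
        "<div class='drop-item-content'>list item number 2</div>" +
        "<div class='drop-item-content'>list item number 3</div>" +
        "</span></div>" +
        "<div id='DropDownMenu2'><span class='drop-combobox'>" +
        "<div class='drop-item-content'>list item number 1</div>" +
        "<div class='drop-item-content'>list item number 2</div>" +
        "</span></div>" +
        "</body>";

    // Counts the drop-item-content divs inside the div with the given id only.
    public static int CountItems(string markup, string menuId)
    {
        string xpath = $"//div[@id='{menuId}']//div[@class='drop-item-content']";
        return XDocument.Parse(markup).XPathSelectElements(xpath).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountItems(Page, "DropDownMenu1")); // 3
        Console.WriteLine(CountItems(Page, "DropDownMenu2")); // 2
    }
}
```

Anchoring the expression on the id first is what keeps items from the other menus out of the count; the CssSelector "#DropDownMenu1 div.drop-item-content" expresses the same scoping.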
