When to use double slash in HtmlAgilityPack SelectNodes - c#

I want to loop through all rows in a table and select all <p> elements in each row.
foreach (var r in Table.SelectNodes("tr"))
{
var Paragraphs = r.SelectNodes("//p");
}
Why do I have to use SelectNodes("//p") and not just SelectNodes("p")? If I use the latter I always get null.
I'm also wondering why I don't need //tr in the foreach statement.

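Without a leading slash, p is a relative path: it selects only <p> elements that are direct children of the current tr. In a real row the <p> usually sits inside a <td>, so it is not a direct child, and HtmlAgilityPack returns null when nothing matches. That is why you always get null.
//p, on the other hand, is not relative to r at all: the leading // starts from the document root, so it finds every <p> at any depth in the whole document, and every iteration of your loop gets the paragraphs of all rows. To search at any depth but only inside the current tr, prefix the path with a dot: .//p.
As for the foreach: tr works there because, in your markup, the rows are direct children of Table. If the source HTML contains a <tbody>, you would need tbody/tr or .//tr instead.
Example:
With .//p you will find both <p> elements below; with plain p you will find neither, and null is returned.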
<tr>
<div>
<p></p>
</div>
<div>
<div>
<p></p>
</div>
</div>
</tr>
In this case, where the <p> is a direct child of the <tr>, plain p will find it.
<tr>
<p></p>
</tr>
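A minimal sketch of the difference, using System.Xml's XmlDocument as a stand-in for HtmlAgilityPack (both evaluate XPath 1.0 relative to the node you call SelectNodes on). The two-row table is hypothetical, and note that XmlDocument returns an empty list where HtmlAgilityPack returns null.

```csharp
using System.Xml;

// Sketch: XmlDocument stands in for HtmlAgilityPack; the XPath semantics
// (relative vs. absolute paths) are the same XPath 1.0 rules.
public static class XPathContextDemo
{
    // Hypothetical sample markup: two rows, each <p> nested inside a <td>.
    const string Markup =
        "<table>" +
        "<tr><td><p>a1</p></td><td><p>a2</p></td></tr>" +
        "<tr><td><p>b1</p></td></tr>" +
        "</table>";

    // Counts matches of `xpath` evaluated with the FIRST row as the context node.
    public static int CountFromFirstRow(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XmlNode firstRow = doc.SelectSingleNode("/table/tr");
        // XmlDocument returns an empty list when nothing matches;
        // HtmlAgilityPack's SelectNodes returns null instead.
        return firstRow.SelectNodes(xpath).Count;
    }
}
```

Called with "p", ".//p" and "//p", CountFromFirstRow returns 0, 2 and 3 respectively: only .//p stays inside the first row.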

Related

How to get a table inside a div, searching div by a certain id using htmlagilitypack

I have HTML like this:
<div id="info_tab_members">
<div id="info_members" class="tabslevel">
<ul>
<li>Past members</li>
<li>Live musicians</li>
</ul>
<div id="info_tab_members_all">
<div class="ui-tabs-panel">
<!-- THIS TABLE I WANT -->
<table class="display tblClass" cellpadding="0" cellspacing="0">....
<!-- DATA I WANT -->
</table>
</div>
</div>
<div id="info_tab_members_current">
<div class="ui-tabs-panel">
<table class="display tblClass" cellpadding="0" cellspacing="0"> ...
</table>
</div>
</div>
</div>
</div>
How do I get the table that is inside the div with id info_tab_members_all?
Something to consider is that several tables share the common class display tblClass.
First I tried:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("table[@class='display tblClass']/tbody/tr"))
{
...
}
but the issue is that I get data from all tables that have display tblClass
so then I tried:
var tbl = doc.DocumentNode.
SelectSingleNode("//*[@id='info_tab_members_all']").
SelectNodes("table[@class='display tblClass']/tbody/tr").
ToList();
but I get the error:
“Object reference not set to an instance of an object”
How can I specify the table I want using the div id 'info_tab_members_all'?
If you're able to use HtmlAgilityPack.CssSelectors, then you're in luck:
var table = htmlDoc.QuerySelectorAll("#info_tab_members_all table");
// table is `IList<HtmlNode>`
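If not, then you just need the right XPath. Here's a great reference for converting CSS to XPath and vice versa.
var table = htmlDoc.DocumentNode.SelectSingleNode("//*[@id='info_tab_members_all']/*/table");
// table is `HtmlNode`
Your second attempt finds nothing because table[...] without a leading path only matches a table that is a direct child of the div, whereas here the table sits one level deeper inside div.ui-tabs-panel; */table (or //table) steps over that wrapper. HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so the chained call then fails.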
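A runnable sketch of the anchored XPath, with System.Xml's XmlDocument standing in for HtmlAgilityPack (the path syntax is the same XPath 1.0). It uses //table after the div id rather than /*/table, which also tolerates extra wrapper levels; the cell texts are hypothetical placeholders.

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch: XmlDocument in place of HtmlAgilityPack; same XPath.
public static class TableByDivId
{
    // Trimmed version of the question's markup; cell texts are hypothetical.
    const string Markup =
        "<div id='info_tab_members'>" +
        "<div id='info_tab_members_all'><div class='ui-tabs-panel'>" +
        "<table class='display tblClass'><tr><td>all-1</td></tr><tr><td>all-2</td></tr></table>" +
        "</div></div>" +
        "<div id='info_tab_members_current'><div class='ui-tabs-panel'>" +
        "<table class='display tblClass'><tr><td>current-1</td></tr></table>" +
        "</div></div>" +
        "</div>";

    // Returns each row's text, but only from the table under the given div id.
    public static List<string> RowTexts(string divId)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        // Anchor on the div id first, then descend with // so the
        // intermediate div.ui-tabs-panel wrapper does not matter.
        string xpath = $"//*[@id='{divId}']//table[@class='display tblClass']//tr";
        var texts = new List<string>();
        foreach (XmlNode row in doc.SelectNodes(xpath))
            texts.Add(row.InnerText);
        return texts;
    }
}
```

RowTexts("info_tab_members_all") returns ["all-1", "all-2"]; the row of the other table with the same class is not included.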

Identify XPath from particular element

I'm working on a page that loads dynamically; data gets added while scrolling. To identify the properties of an item I located its parent div, and to get the address I need an XPath from that parent down to the span elements.
Below is my DOM structure:
<div class = "parentdiv">
<div class = "search">
<div class="header">
<div class="data"></div>
<div class="address-data">
<div class="address" itemprop="address">
<a itemprop="url" href="/search/Los-Angeles-CA-90025">
<span itemprop="streetAddress">
Avenue
</span>
<br>
<span itemprop="Locality">Los Angeles</span>
<span itemprop="Region">CA</span>
</a>
</div>
</div>
</div>
</div>
</div>
Here I want to locate the three spans, while I'm currently positioned on the parent div.
Can someone explain how to locate an element using XPath starting from a particular div?
You can try the following XPaths. Note that the <a> sits four div levels below div.parentdiv, so /div/div/a would not match; // lets you skip the intermediate divs:
To locate the street address:
//div[@class="parentdiv"]//span[@itemprop="streetAddress"]
To locate the locality/city:
//div[@class="parentdiv"]//span[@itemprop="Locality"]
To locate the state:
//div[@class="parentdiv"]//span[@itemprop="Region"]
To print the text of each <span> (such as Avenue) under the div class = "parentdiv" node, you can use the following block of code (div.address is not a direct child of div.parentdiv, so a descendant combinator is needed between them):
IList<IWebElement> myList = driver.FindElements(By.CssSelector("div.parentdiv div.address > a[itemprop='url'] > span"));
foreach (IWebElement element in myList)
{
string my_add = element.GetAttribute("innerHTML");
Console.WriteLine(my_add);
}
Your DOM might become fairly large, since it adds elements while scrolling, so using CSS selectors might be quicker.
To get all the span tags in the div, use:
div[class='address'] span
To get a specific span by using the itemprop attribute use:
div[class='address'] span[itemprop='streetAddress']
div[class='address'] span[itemprop='Locality']
div[class='address'] span[itemprop='Region']
You can store the elements in a variable like so:
var streetAddress = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='streetAddress']"));
var locality = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='Locality']"));
var region = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='Region']"));
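Once you already hold the parent div, a path that starts with a dot searches only inside it, so you don't have to spell out each intermediate level. The sketch below uses System.Xml's XmlDocument in place of the browser DOM, with the question's markup trimmed and made well-formed (<br/>); in Selenium the analogous call would be parent.FindElement(By.XPath(".//span[@itemprop='Locality']")).

```csharp
using System.Xml;

// Sketch: XmlDocument stands in for the browser DOM; the relative
// XPath is what you would pass to Selenium from the parent element.
public static class RelativeFromParent
{
    // The question's structure, trimmed and made well-formed (<br/>).
    const string Markup =
        "<div class='parentdiv'><div class='search'><div class='header'>" +
        "<div class='data'/><div class='address-data'>" +
        "<div class='address' itemprop='address'>" +
        "<a itemprop='url' href='/search/Los-Angeles-CA-90025'>" +
        "<span itemprop='streetAddress'> Avenue </span><br/>" +
        "<span itemprop='Locality'>Los Angeles</span>" +
        "<span itemprop='Region'>CA</span>" +
        "</a></div></div></div></div></div>";

    // Looks up a span by its itemprop, relative to the parentdiv node.
    public static string SpanText(string itemprop)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XmlNode parent = doc.SelectSingleNode("//div[@class='parentdiv']");
        // ".//" means any depth below `parent`, so the intermediate
        // divs don't have to be listed.
        XmlNode span = parent.SelectSingleNode($".//span[@itemprop='{itemprop}']");
        return span.InnerText.Trim();
    }
}
```

SpanText("Locality") returns "Los Angeles", and SpanText("streetAddress") returns "Avenue" once the surrounding whitespace is trimmed.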

Fetch child elements under selected element using Selenium C#

Find the elements below the ul element, given the following sample HTML:
<ul _ngcontent-nkg-43="" ngmodelgroup="option">
<span _ngcontent-nkg-17="" style="cursor: pointer;">Option 1</span>
<span _ngcontent-nkg-17="" style="cursor: pointer;">Option 2</span>
<span _ngcontent-nkg-17="" style="cursor: pointer;">Option 3</span>
</ul>
var yourParentElement = driver.FindElement(By.XPath(".//ul[@ngmodelgroup='option']"));
var children = yourParentElement.FindElements(By.XPath(".//*"));
This latter call will return all descendant elements of yourParentElement; use "./*" if you only want its direct children.
If you're trying to fetch the span elements you could do:
driver.FindElement(By.XPath(".//ul[@ngmodelgroup='option']")).FindElements(By.TagName("span"));
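To see the difference between ./* (direct children) and .//* (all descendants), here is a small sketch evaluated with System.Xml's XmlDocument rather than Selenium. The nested <b> in the third option is a hypothetical addition so that the two counts differ.

```csharp
using System.Xml;

// Sketch: XmlDocument instead of Selenium; same XPath expressions.
public static class ChildVsDescendant
{
    // Based on the question's <ul>; the nested <b> is hypothetical.
    const string Markup =
        "<ul ngmodelgroup='option'>" +
        "<span>Option 1</span><span>Option 2</span><span><b>Option 3</b></span>" +
        "</ul>";

    // Evaluates `relativeXPath` with the <ul> as context node.
    public static int Count(string relativeXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        // Attribute tests need '@'; [ngmodelgroup='option'] would test a child element.
        XmlNode ul = doc.SelectSingleNode("//ul[@ngmodelgroup='option']");
        return ul.SelectNodes(relativeXPath).Count;
    }
}
```

Count("./*") is 3 (the spans only), while Count(".//*") is 4 because it also includes the nested <b>.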

HtmlAgilityPack adding div elements to existing html file

This is my original html:
<tr>
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">Tove</font></td>
<td style="padding-left: 40pt;"><font style="background-color: lightgreen" color="black">To</font></td>
</tr>
And my goal is to have this:
<div class="select-me"> <tr>...</tr> </div>
I am using HtmlAgilityPack and going through each font tag, checking whether its style is lightgreen. But I'm not sure how to get back up to the table row and put a div tag around it.
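You can select the matching rows directly with XPath instead of checking each font tag one by one. Note that the lightgreen style is on the <font> element, not on the <td>, and that OuterHtml is read-only in HtmlAgilityPack, so create a wrapper node and move the row into it:
var selectMe = doc.DocumentNode.SelectNodes("//tr[.//font[contains(@style,'background-color: lightgreen')]]");
if (selectMe != null) // SelectNodes returns null when nothing matches
    foreach (var tr in selectMe)
    {
        var wrapper = HtmlNode.CreateNode("<div class=\"select-me\"></div>");
        tr.ParentNode.ReplaceChild(wrapper, tr); // the div takes the row's place
        wrapper.AppendChild(tr);                 // the row moves inside the div
    }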
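The same wrap-in-place idea can be checked without the NuGet package. This sketch swaps in System.Xml's XmlDocument, whose ReplaceChild/AppendChild behave analogously; the enclosing <table> and the second, non-matching row are hypothetical additions.

```csharp
using System.Linq;
using System.Xml;

// Sketch with XmlDocument standing in for HtmlAgilityPack.
public static class WrapRows
{
    // The question's row plus a hypothetical <table> root and non-matching row.
    const string Markup =
        "<table>" +
        "<tr><td style='padding-left: 40pt;'><font style='background-color: lightgreen' color='black'>Tove</font></td>" +
        "<td style='padding-left: 40pt;'><font style='background-color: lightgreen' color='black'>To</font></td></tr>" +
        "<tr><td><font color='black'>Other</font></td></tr>" +
        "</table>";

    public static string Wrap()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        // Each row is selected once, even though it has two matching <font> elements.
        var rows = doc.SelectNodes("//tr[.//font[contains(@style,'background-color: lightgreen')]]")
                      .Cast<XmlNode>().ToList(); // snapshot before mutating the tree
        foreach (XmlNode tr in rows)
        {
            XmlElement wrapper = doc.CreateElement("div");
            wrapper.SetAttribute("class", "select-me");
            tr.ParentNode.ReplaceChild(wrapper, tr); // div takes the row's place
            wrapper.AppendChild(tr);                 // row moves inside the div
        }
        return doc.OuterXml;
    }
}
```

The returned markup contains exactly one div.select-me, wrapping the first row, while the non-matching row is left where it was.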

Get node by its attribute in LINQ to XML

Consider this HTML:
<table>
<tr>
<td width="45%" align="right">
<h2>1.00 NZD</h2>
</td>
<td valign="top" align="center">
<h2>=</h2>
</td>
<td width="45%" align="left">
<h2>0.415528 GBP</h2>
</td>
</tr>
</table>
I take this markup as a string and parse it into an XElement:
string raw = "<table><tr><td width=\"45%\" align=\"right\"><h2>1.00 NZD</h2></td><td valign=\"top\" align=\"center\"><h2>=</h2></td><td width=\"45%\" align=\"left\"> <h2>0.415528 GBP</h2> </td></tr></table>";
XElement info = XElement.Parse(raw);
Now I want to get all td elements that have align="right", so I wrote this code:
var elementToChange = (from c in info.Elements("td")
where c.Attribute("align").Value == "right"
select c);
Label1.Text = Server.HtmlEncode(elementToChange.First().ToString());
but I get this error:
Sequence contains no elements
Where is the problem?
Thanks
You look for "td" elements but at the top level of the node you only find a "tr" element. That's why you don't get any elements. Try this:
from c in info.Elements("tr").Elements("td")
Also, you should check if there are any elements in the sequence before calling First() (or use FirstOrDefault() which returns null if there is no element).
If you don't know the path and just want every "td" element with that attribute value, at any depth, you can use the Descendants extension method instead:
from c in info.Descendants("td")
Elements gives you only the immediate child elements of the root, in this case info. You need to drill down to the elements you are looking for and then filter them with the where clause. In your case the immediate children are the <tr>s.
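Putting both suggestions together (Descendants plus FirstOrDefault), here is a minimal sketch against the question's own string. The (string) cast on the attribute is a standard LINQ to XML idiom that yields null rather than throwing when a td has no align attribute.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AlignRight
{
    // The question's markup, unchanged.
    const string Raw = "<table><tr><td width=\"45%\" align=\"right\"><h2>1.00 NZD</h2></td><td valign=\"top\" align=\"center\"><h2>=</h2></td><td width=\"45%\" align=\"left\"> <h2>0.415528 GBP</h2> </td></tr></table>";

    public static string FirstRightAligned()
    {
        XElement info = XElement.Parse(Raw);
        // Descendants searches at any depth below <table>; the (string) cast
        // yields null instead of throwing when a <td> has no align attribute.
        XElement td = info.Descendants("td")
                          .FirstOrDefault(c => (string)c.Attribute("align") == "right");
        return td?.Value; // null if no cell matched, instead of an exception
    }
}
```

FirstRightAligned() returns "1.00 NZD".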
