How to get text from an XML file with XPath in C#

I want to show data from my XML file.
This is my XML file:
<table>
<tr class="even">
<td class="ltid">1</td>
<td class="ltn">لستر سیتی</td>
<td class="ltg">31</td>
<td class="ltw">19</td>
<td class="ltd">9</td>
<td class="ltl">3</td>
<td class="ltgf">54</td>
<td class="ltga">31</td>
<td class="ltgd" dir="ltr">+23</td>
<td class="ltp">66</td>
</tr>
<tr>
<td class="ltid">2</td>
<td class="ltn">تاتنهام</td>
<td class="ltg">31</td>
<td class="ltw">17</td>
<td class="ltd">10</td>
<td class="ltl">4</td>
<td class="ltgf">56</td>
<td class="ltga">24</td>
<td class="ltgd" dir="ltr">+32</td>
<td class="ltp">61</td>
</tr>
<tr>
<td class="ltid">3</td>
<td class="ltn">آرسنال</td>
<td class="ltg">30</td>
<td class="ltw">16</td>
<td class="ltd">7</td>
<td class="ltl">7</td>
<td class="ltgf">48</td>
<td class="ltga">30</td>
<td class="ltgd" dir="ltr">+18</td>
<td class="ltp">55</td>
</tr>
</table>
I want to get the third team, so I want to get <td class="ltid">3</td>.
This is the code I tried:
// Needs: using System.Xml.Linq; using System.Xml.XPath; (CreateNavigator is an extension method on XNode)
var doc = XDocument.Parse(richTextBox2.Text);
var navigator = doc.CreateNavigator();
var contentCell = navigator.SelectSingleNode("//td[@class='ltid']");
txtTeam.Text = contentCell.Value;
But I don't know how to get the third td with this class value; this code always returns the first one. I searched for an answer but couldn't find one.
I also wrote a different version before this one, but since the first <tr> matches as well, it only found the cell in the first <tr>, not the one in the third <tr>.
Please help me get the value from the third <tr>.

This is one way:
(//td[@class='ltid'])[3]
This XPath returns the third td[@class='ltid'] in document order, counted across the entire XML document.
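The parentheses matter: without them, [3] is applied to each tr's own td children rather than to the whole result set. A minimal sketch using System.Xml.Linq and System.Xml.XPath (the same APIs as the question's code) on a trimmed copy of the table; ThirdTeamDemo and its methods are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class. The XML is a trimmed copy of the question's table
// (only the ltid and ltp cells), with single-quoted attributes.
public static class ThirdTeamDemo
{
    public const string TableXml =
        @"<table>
            <tr class='even'><td class='ltid'>1</td><td class='ltp'>66</td></tr>
            <tr><td class='ltid'>2</td><td class='ltp'>61</td></tr>
            <tr><td class='ltid'>3</td><td class='ltp'>55</td></tr>
          </table>";

    // (//td[@class='ltid'])[3]: gather every matching td in document order, then take the 3rd.
    public static string ThirdInDocument()
    {
        var doc = XDocument.Parse(TableXml);
        return doc.XPathSelectElement("(//td[@class='ltid'])[3]")?.Value;
    }

    // //td[@class='ltid'][3]: here [3] is applied per parent <tr> ("the 3rd ltid cell
    // in each row"); every row has only one, so nothing matches.
    public static int CountWithoutParentheses()
    {
        var doc = XDocument.Parse(TableXml);
        return doc.XPathSelectElements("//td[@class='ltid'][3]").Count();
    }

    // The same fix applied to the question's navigator-based code.
    public static string ThirdViaNavigator()
    {
        XPathNavigator navigator = XDocument.Parse(TableXml).CreateNavigator();
        XPathNavigator contentCell = navigator.SelectSingleNode("(//td[@class='ltid'])[3]");
        return contentCell?.Value;
    }
}
```

In the question's code, that means passing "(//td[@class='ltid'])[3]" to SelectSingleNode instead of "//td[@class='ltid']".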

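You can also iterate over every match:
var nav = doc.CreateNavigator();
XPathNodeIterator iterator = nav.Select("//td[@class='ltid']");
while (iterator.MoveNext())
{
    // iterator.Current is the matched td; iterator.CurrentPosition is its 1-based index,
    // e.g. if (iterator.CurrentPosition == 3) txtTeam.Text = iterator.Current.Value;
}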
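The loop can be filled in to pick out one particular match. A minimal sketch, assuming the XML is loaded with XDocument as in the question; IteratorDemo and ValueAt are hypothetical names. XPathNodeIterator.CurrentPosition counts matches (1-based), so position 3 is the third ltid cell regardless of which row it sits in.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration.
public static class IteratorDemo
{
    // Walks every td[@class='ltid'] in document order and returns the text of the
    // one at the requested 1-based position, or null if there are fewer matches.
    public static string ValueAt(string xml, int position)
    {
        XPathNavigator nav = XDocument.Parse(xml).CreateNavigator();
        XPathNodeIterator iterator = nav.Select("//td[@class='ltid']");
        while (iterator.MoveNext())
        {
            // CurrentPosition counts matches, not <tr> rows.
            if (iterator.CurrentPosition == position)
                return iterator.Current.Value;
        }
        return null;
    }
}
```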

There are three ways you could do this:
XPath 1: //tr[3]/td[@class='ltid']
XPath 2: (//td[@class='ltid'])[3]
XPath 3: //td[@class='ltid' and text()='3']
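The three expressions can be checked side by side. A minimal sketch using System.Xml.XPath on a trimmed copy of the question's rows; ThreeWaysDemo and Evaluate are hypothetical names. The third expression uses an exact text() comparison together with the class test: contains(text(), '3') would also match cells such as 31 or +23, and without the @class test the first row's ltl cell (whose text is also 3) would come first in document order.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration.
public static class ThreeWaysDemo
{
    public static readonly string[] Expressions =
    {
        "//tr[3]/td[@class='ltid']",          // 1: the ltid cell of the 3rd row
        "(//td[@class='ltid'])[3]",           // 2: the 3rd ltid cell in the document
        "//td[@class='ltid' and text()='3']", // 3: the ltid cell whose text is exactly "3"
    };

    // Evaluates each expression against the XML and returns the matched text
    // (null when an expression matches nothing).
    public static string[] Evaluate(string xml)
    {
        var doc = XDocument.Parse(xml);
        return Expressions
            .Select(xp => doc.XPathSelectElement(xp)?.Value)
            .ToArray();
    }
}
```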

Related

How to check a condition in a .cshtml file

I want to check the FormattedLastFillDate field, but the syntax somehow throws an error. Can anyone help me write an if condition in a .cshtml file? Below is the block of code.
@if (FormattedLastFillDate != "My logic")
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
You need to wrap the conditional markup in braces; try this:
@if (FormattedLastFillDate != "My logic")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}
The variable must also be in scope (accessible) in the view.
I think you were pretty close; try this:
@{ string FormattedLastFillDate = "test"; }
@if (FormattedLastFillDate != "test")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}

HtmlAgilityPack multiple tbody in table

There are multiple tbody elements in a table, and I am trying to parse them out using HtmlAgilityPack. Normally the code below would work, but here it doesn't: it prints only the first tbody and ignores the second.
Code
var tableOffense = doc.DocumentNode.SelectSingleNode("//table[@id='OFF']");
var tbody = tableOffense.SelectNodes("tbody");
foreach(var bodies in tbody)
{
Console.WriteLine("id "+offender.offenderId +" "+ Utilities.RemoveHtmlCharacters(bodies.InnerText));
}
HTML
<table id="OFF" class="centerTable" cols="2" style="margin-top:0; width:100%;" cellpadding="0" cellspacing="0">
<tbody>
<!-- %%$SPLIT -->
<tr> <th id="offenseCodeColHdr" scope="row" style="width:25%;" class="uline">Offense Code</th> <td headers="offenseCodeColHdr" class="uline">288(a)</td> </tr> <tr> <th id="descriptionColHdr" scope="row" style="width:25%;" class="uline">Description</th> <td headers="descriptionColHdr" class="uline">LEWD OR LASCIVIOUS ACTS WITH A CHILD UNDER 14 YEARS OF AGE</td> </tr> <tr> <th id="lastConvictionColHdr" scope="row" style="width:25%;" class="uline">Year of Last Conviction</th> <td headers="lastConvictionColHdr" class="uline"> </td> </tr> <tr> <th id="lastReleaseColHdr" scope="row" style="width:25%;" class="uline">Year of Last Release</th> <td headers="lastReleaseColHdr" class="uline"> </td> </tr>
<tr><th colspan="2"><hr style="height:2px;background-color:#000;"></th></tr> </tbody>
<!-- %%$SPLIT -->
<tbody><tr> <th id="offenseCodeColHdr" scope="row" style="width:25%;" class="uline">Offense Code</th> <td headers="offenseCodeColHdr" class="uline">261(a)(2)</td> </tr> <tr> <th id="descriptionColHdr" scope="row" style="width:25%;" class="uline">Description</th> <td headers="descriptionColHdr" class="uline">RAPE BY FORCE OR FEAR</td> </tr> <tr> <th id="lastConvictionColHdr" scope="row" style="width:25%;" class="uline">Year of Last Conviction</th> <td headers="lastConvictionColHdr" class="uline"> </td> </tr> <tr> <th id="lastReleaseColHdr" scope="row" style="width:25%;" class="uline">Year of Last Release</th> <td headers="lastReleaseColHdr" class="uline"> </td> </tr>
<tr><th colspan="2"><hr style="height:2px;background-color:#000;"></th></tr> </tbody>
<!-- %%$SPLIT -->
</table>
I printed the tableOffense node on its own to make sure the second tbody exists after loading, and it does.
Question
Why does the code only print out the first tbody and not both?
I haven't figured out why your code gives you only one tbody, but may I suggest an alternative way to select all your <tbody> elements?
Personally, I would use XPath and select all tbody elements in one go, without an additional SelectNodes() call:
var tbody = doc.DocumentNode.SelectNodes("//table[@id='OFF']//tbody");
foreach (var elem in tbody)
{
    // The original called LINQPad's Dump(); Console.WriteLine works in any console app
    Console.WriteLine(elem.InnerText);
}
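Edit:
The following code (your code, with //tbody instead of tbody) also yields the same results, i.e. both tbody elements:
var tableOffense = doc.DocumentNode.SelectSingleNode("//table[@id='OFF']");
var tbody = tableOffense.SelectNodes("//tbody");
Note that an expression starting with // is evaluated from the document root even when SelectNodes is called on a node, so this selects every tbody in the document, not only those inside tableOffense; .//tbody limits the search to the table's descendants.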
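In standard XPath 1.0, the child step tbody evaluated from the table node should already select both tbody elements, so the question's expression itself looks correct; a difference is more likely to come from how the HTML was parsed into nodes. The sketch below compares the three variants with .NET's System.Xml.XPath rather than HtmlAgilityPack, on a simplified, well-formed stand-in for the question's markup. The second table (id OTHER) is invented to show the scope of //, and TbodyDemo and CountFromTable are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration (System.Xml.XPath, not HtmlAgilityPack).
public static class TbodyDemo
{
    // Simplified stand-in: the OFF table with two <tbody> elements, plus an
    // unrelated table so the difference between the expressions shows up.
    public const string Html =
        @"<html><body>
            <table id='OFF'>
              <tbody><tr><td>288(a)</td></tr></tbody>
              <tbody><tr><td>261(a)(2)</td></tr></tbody>
            </table>
            <table id='OTHER'>
              <tbody><tr><td>unrelated</td></tr></tbody>
            </table>
          </body></html>";

    // Counts the nodes an XPath expression selects when evaluated from the OFF table.
    public static int CountFromTable(string xpath)
    {
        var doc = XDocument.Parse(Html);
        XElement table = doc.XPathSelectElement("//table[@id='OFF']");
        return table.XPathSelectElements(xpath).Count();
    }
}
```

Here tbody and .//tbody stay inside the OFF table (2 matches each), while //tbody also picks up the tbody of the unrelated table (3 matches).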

XPath expression not working properly in HtmlAgilityPack

I'm trying to search for an HTML node using XPath expressions.
The objective is to match all tr nodes that have exactly two child td nodes with the attribute class="shiftHolder" (the second tr in the example).
<table>
<tr class="staffRow">
<td class="staff" data-staffid="2" data-primaryrole="1">
<div class="sn">Leyla-claire Collins</div>
</td>
<td class="shiftHolder">
</td>
<td class="shiftHolder unavailable">
Holiday
</td>
</tr>
<tr class="staffRow">
<td class="staff" data-staffid="11" data-primaryrole="4">
<div class="sn">Natale Dersley</div>
</td>
<td class="shiftHolder">
</td>
<td class="shiftHolder">
</td>
</tr>
</table>
The following expressions work in three other XPath testers, but not in HtmlAgilityPack, where both tr elements are returned:
//tr[@class='staffRow'][count(td[@class='shiftHolder'])=2][td[@class='staff' and @data-staffid and @data-primaryrole][div[@class='sn']]]
//tr[@class='staffRow' and count(td[@class='shiftHolder'])=2 and td[@class='staff' and @data-staffid and @data-primaryrole and div[@class='sn']]]
Is there any difference in HtmlAgilityPack that I'm not aware of?
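One way to confirm that the expression itself is fine is to run it through a different XPath 1.0 engine. The sketch below uses .NET's System.Xml.XPath (not HtmlAgilityPack) on a compacted copy of the question's table; ShiftHolderDemo and MatchingNames are hypothetical names. Because @class='shiftHolder' is an exact string comparison, the cell with class "shiftHolder unavailable" does not count, so only the second row should match.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration (System.Xml.XPath, not HtmlAgilityPack).
public static class ShiftHolderDemo
{
    // Compacted copy of the question's table, with single-quoted attributes.
    public const string Html = @"<table>
  <tr class='staffRow'>
    <td class='staff' data-staffid='2' data-primaryrole='1'><div class='sn'>Leyla-claire Collins</div></td>
    <td class='shiftHolder'></td>
    <td class='shiftHolder unavailable'>Holiday</td>
  </tr>
  <tr class='staffRow'>
    <td class='staff' data-staffid='11' data-primaryrole='4'><div class='sn'>Natale Dersley</div></td>
    <td class='shiftHolder'></td>
    <td class='shiftHolder'></td>
  </tr>
</table>";

    // The question's second expression.
    public const string RowXPath =
        "//tr[@class='staffRow' and count(td[@class='shiftHolder'])=2 and " +
        "td[@class='staff' and @data-staffid and @data-primaryrole and div[@class='sn']]]";

    // Returns the staff names of the rows the expression matches.
    public static string[] MatchingNames()
    {
        var doc = XDocument.Parse(Html);
        return doc.XPathSelectElements(RowXPath)
            .Select(tr => tr.XPathSelectElement("td/div[@class='sn']").Value)
            .ToArray();
    }
}
```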

Scraping With HtmlAgilityPack

I have a huge HTML page that I want to scrape values from.
I tried using Firebug to get the XPath of the element I want, but that XPath is not static; it changes from time to time. How can I reliably get the values I want?
In the following snippet, I want to get the Lumber production per hour, which is the 20 in the Lumber row:
<div class="boxes-contents cf"><table id="production" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="4">
Production per hour: </th>
</tr>
</thead>
<tbody>
<tr>
<td class="ico">
<img class="r1" src="img/x.gif" alt="Lumber" title="Lumber" />
</td>
<td class="res">
Lumber:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r2" src="img/x.gif" alt="Clay" title="Clay" />
</td>
<td class="res">
Clay:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r3" src="img/x.gif" alt="Iron" title="Iron" />
</td>
<td class="res">
Iron:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r4" src="img/x.gif" alt="Crop" title="Crop" />
</td>
<td class="res">
Crop:
</td>
<td class="num">
59 </td>
</tr>
</tbody>
</table>
</div>
Using HtmlAgilityPack, you would do something like the following:
// Assumes webclient is a System.Net.WebClient and url is the page address;
// needs using System.IO; and using System.Linq;
byte[] htmlBytes = webclient.DownloadData(url);
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
using (var htmlMemStream = new MemoryStream(htmlBytes))
using (var htmlStreamReader = new StreamReader(htmlMemStream))
{
    htmlDoc.LoadHtml(htmlStreamReader.ReadToEnd());
}
// Select the production table by its id rather than taking the first table on a large page
var table = htmlDoc.DocumentNode.SelectSingleNode("//table[@id='production']");
// The first <td class="num"> in that table belongs to the Lumber row
var lumberTd = table.Descendants("td")
    .FirstOrDefault(node => node.GetAttributeValue("class", "") == "num");
string lumberValue = lumberTd.InnerText.Trim();
Be aware that FirstOrDefault() can return null, so you should add null checks before using its result.
Hope that helps.
// Needs using System.Linq; finds the row whose image title is "Lumber", then its "num" cell
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(fileName);
var result = doc.DocumentNode.SelectNodes("//div[@class='boxes-contents cf']//tbody/tr")
    .First(tr => tr.Element("td").Element("img").Attributes["title"].Value == "Lumber")
    .Elements("td")
    .First(td => td.Attributes["class"].Value == "num")
    .InnerText
    .Trim();
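Rather than a positional XPath copied from Firebug, one option is an XPath anchored on attributes that identify the data: the table's id and the image's alt text. A minimal sketch using System.Xml.XPath on a well-formed excerpt of the snippet (HtmlAgilityPack's SelectSingleNode should accept the same XPath 1.0 string, but that is not exercised here); ProductionDemo, NumberXPath and ProductionPerHour are hypothetical names, and the resource name is assumed to contain no apostrophe.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration.
public static class ProductionDemo
{
    // Builds an XPath anchored on stable attributes (table id, image alt text)
    // instead of on element positions. Assumes resource contains no apostrophe.
    public static string NumberXPath(string resource) =>
        $"//table[@id='production']//tr[td/img/@alt='{resource}']/td[@class='num']";

    // Returns the trimmed per-hour value for one resource, or null if not found.
    public static string ProductionPerHour(string html, string resource)
    {
        var doc = XDocument.Parse(html);
        return doc.XPathSelectElement(NumberXPath(resource))?.Value.Trim();
    }
}
```

Because the expression keys on the row's content rather than its position, adding or reordering rows elsewhere on the page does not change which cell it selects.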

Extract data from an HTML tbody using C#

I am using C#'s WebClient to download an HTML string.
A small example of the HTML being returned:
<tbody class='resultBody ' id='Tbody2'>
<tr id='Tr2' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='452' id='Checkbox2' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td2'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
123 ABC St</dd>
</dl>
</td>
</tr>
</tbody>
<tbody class='resultBody ' id='Tbody3'>
<tr id='Tr3' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='99' id='Checkbox3' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td3'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
1111 Some St</dd>
</dl>
</td>
</tr>
</tbody>
I am interested in two elements of this HTML, but I have no idea of the best way to get to them. What is the best way to get the value from one element and the inner HTML from the other?
Any suggestions would be great!
Download the HTML Agility Pack (free).
Create a new HtmlDocument.
Load the HTML with LoadHtml.
Use DOM navigation or an XPath query (SelectSingleNode etc.) to find the elements.
Access the InnerHtml of the elements you want.
The API is similar to XmlDocument, but it also works on HTML that isn't well-formed XHTML.
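The steps above can be sketched with .NET's built-in System.Xml.Linq and System.Xml.XPath instead of HtmlAgilityPack. This only works here because the snippet is well-formed XML once wrapped in a single root element (such as <table>); on real pages, HtmlAgilityPack's LoadHtml followed by SelectNodes/SelectSingleNode with the same XPath strings is the safer route. ListingDemo and Extract are hypothetical names. Each listing's checkbox value and address are read from inside their own tbody, so the two values stay paired.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class for illustration (System.Xml.XPath, not HtmlAgilityPack).
public static class ListingDemo
{
    // For each <tbody> whose class contains "resultBody", returns the checkbox's
    // value attribute and the trimmed text of the bizAddr <dd>.
    public static List<(string Id, string Address)> Extract(string xml)
    {
        var doc = XDocument.Parse(xml);
        return doc.XPathSelectElements("//tbody[contains(@class, 'resultBody')]")
            .Select(body => (
                Id: (string)body.XPathSelectElement(".//input[@name='listingId']")?.Attribute("value"),
                Address: body.XPathSelectElement(".//dd[@class='bizAddr']")?.Value.Trim()))
            .ToList();
    }
}
```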
