C# HtmlAgilityPack: get table values

I have a webpage that needs to be parsed so that its values can be stored in a SQL Server database. I have tried to use the HTML Agility Pack:
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(HTML);
var cols = hdoc.DocumentNode.SelectNodes("//table[@id='results']//tr//th//td");
for (int i = 0; i < cols.Count; i = i + 2)
{
DataRow dr = dt.NewRow();
string name = cols[i].InnerText.Trim();
}
This is how my HTML looks:
<table id="results">
<tr>
<th style="white-space: nowrap;">
ID
</th>
<th style="text-align: left;">
Entity Name /<br>
Type
</th>
<th style="white-space: nowrap;">
Registered<br>
Effective Date
</th>
<th>
Status /<br>
Status Date
</th>
</tr>
<tr class="exactMatch" valign="top">
<td class="entityID">
123456
</td>
<td class="nameAndTypeDescription">
<span class="name"><a href="test.aspx?entityID=123456&hash=2055339395&orgTypes=01%2c99">
NAME1 COMPANY </a></span>
<br />
<span class="typeDescription">55 - TRadeUnion Company </span>
</td>
<td class="registeredEffectiveDate">
01/12/1912
</td>
<td class="statusDescriptionAndStatusDate">
<span class="statusDescription">Exists Now </span>
<br>
<span class="statusDate">12/14/1943</span>
</td>
</tr>
<tr class="exactMatch" valign="top">
<td class="entityID">
A23456
</td>
<td class="nameAndTypeDescription">
<span class="name"><a href="test.aspx?entityID=A23456&hash=615278445&orgTypes=01%2c99">
TESTA, INC. </a></span>
<br />
<span class="typeDescription">09 - Domestic Corporation </span>
</td>
<td class="registeredEffectiveDate">
04/29/1926
</td>
<td class="statusDescriptionAndStatusDate">
<span class="statusDescription">Dissolved Company </span>
<br>
<span class="statusDate">06/16/1998</span>
</td>
</tr>
</table>
I need to insert entityID, name, hyperlink, type description, registered effective date, status description and status date. Right now they all print on one single line and I don't know how to parse them. Please help.
Thanks
MR

The <td> elements are not nested under the <th> elements; the header cells and the data cells are in different rows.
Try this: SelectNodes("//table[@id='results']/tr/td");
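A minimal sketch of both points, under a few assumptions: it uses the BCL's System.Xml.XPath on a trimmed, well-formed copy of the table (the <br> tags closed and & escaped as &amp;), because the real page is not valid XML. The same XPath strings should work with HtmlAgilityPack's SelectNodes and SelectSingleNode, which also evaluate XPath 1.0. The tuple field names (EntityId, Link and so on) and the Text helper are hypothetical. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed, well-formed copy of the question's table: <br> tags closed and & escaped.
const string html = @"<table id='results'>
  <tr><th>ID</th><th>Entity Name /<br/>Type</th><th>Registered<br/>Effective Date</th><th>Status /<br/>Status Date</th></tr>
  <tr class='exactMatch'>
    <td class='entityID'> 123456 </td>
    <td class='nameAndTypeDescription'>
      <span class='name'><a href='test.aspx?entityID=123456&amp;hash=2055339395&amp;orgTypes=01%2c99'> NAME1 COMPANY </a></span><br/>
      <span class='typeDescription'>55 - TRadeUnion Company </span>
    </td>
    <td class='registeredEffectiveDate'> 01/12/1912 </td>
    <td class='statusDescriptionAndStatusDate'>
      <span class='statusDescription'>Exists Now </span><br/>
      <span class='statusDate'>12/14/1943</span>
    </td>
  </tr>
  <tr class='exactMatch'>
    <td class='entityID'> A23456 </td>
    <td class='nameAndTypeDescription'>
      <span class='name'><a href='test.aspx?entityID=A23456&amp;hash=615278445&amp;orgTypes=01%2c99'> TESTA, INC. </a></span><br/>
      <span class='typeDescription'>09 - Domestic Corporation </span>
    </td>
    <td class='registeredEffectiveDate'> 04/29/1926 </td>
    <td class='statusDescriptionAndStatusDate'>
      <span class='statusDescription'>Dissolved Company </span><br/>
      <span class='statusDate'>06/16/1998</span>
    </td>
  </tr>
</table>";

XDocument doc = XDocument.Parse(html);

// The question's expression asks for <td> inside <th>, which never occurs, so it finds nothing.
int nested = doc.XPathSelectElements("//table[@id='results']//tr//th//td").Count();

// Take only data rows (rows that contain a <td>) and read every field relative to its row,
// so the values belonging to one entity stay together.
var rows = doc.XPathSelectElements("//table[@id='results']/tr[td]")
    .Select(tr => (
        EntityId: Text(tr, "td[@class='entityID']"),
        Name: Text(tr, ".//span[@class='name']/a"),
        Link: tr.XPathSelectElement(".//span[@class='name']/a")?.Attribute("href")?.Value ?? "",
        TypeDescription: Text(tr, ".//span[@class='typeDescription']"),
        Registered: Text(tr, "td[@class='registeredEffectiveDate']"),
        Status: Text(tr, ".//span[@class='statusDescription']"),
        StatusDate: Text(tr, ".//span[@class='statusDate']")))
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.EntityId} | {r.Name} | {r.Link} | {r.TypeDescription} | {r.Registered} | {r.Status} | {r.StatusDate}");

// Trimmed text of the first element matching a row-relative XPath, or "" if none.
static string Text(XElement row, string xpath) =>
    row.XPathSelectElement(xpath)?.Value.Trim() ?? "";
```

Each tuple can then be copied into a DataRow or passed as parameters to a SQL Server insert. With HtmlAgilityPack, InnerText may still contain entities such as &amp;; HtmlEntity.DeEntitize can decode them.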

Related

Unable to locate element inside a table using xpath with selenium

I want to click on a button inside my table. Each row has an update button, and I want to click the update button of a specific row.
Here is what my table looks like:
<table _ngcontent-vhp-c82="" datatable="" id="dtOptionsComments" class="display table table-striped table-bordered dt-responsive dataTable dtr-inline" aria-describedby="dtOptionsComments_info" style="width: 100%;" width="100%">
<thead _ngcontent-vhp-c82="">
<tr _ngcontent-vhp-c82="">
<th _ngcontent-vhp-c82="" class="no-marking sorting_disabled" rowspan="1" colspan="1" style="width: 50.4px;" aria-label=""></th>
<th _ngcontent-vhp-c82="" class="sorting sorting_asc" tabindex="0" aria-controls="dtOptionsComments" rowspan="1" colspan="1" style="width: 1109.4px;" aria-label="Comment.Comment Shipping.ShippingDatatable.aria.sortDescending" aria-sort="ascending">Comentario Shipping.Shipping</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td class="no-marking dtr-control">
<a href="javascript:void(0);">
<span data-toggle="modal" data-target="#update-modal" update-comment-text="6 MESES DE GARANTIA" update-comment-id="5" class="material-icons md-18 clickable"> edit </span>
</a>
<a href="javascript:void(0);">
<span data-toggle="modal" data-target="#delete-modal" delete-comment-id="5" class="material-icons clickable">delete</span>
</a>
</td>
<td class="sorting_1">6 MESES DE GARANTIA</td>
</tr>
<!-- MORE ROWS!!! -->
</tbody>
<tfoot _ngcontent-vhp-c82="">
<tr _ngcontent-vhp-c82="">
<td _ngcontent-vhp-c82="" class="no-marking" rowspan="1" colspan="1">
<a _ngcontent-vhp-c82="" href="javascript:void(0);">
<span _ngcontent-vhp-c82="" class="material-icons clickable"> add_box </span>
</a>
</td>
<td _ngcontent-vhp-c82="" rowspan="1" colspan="1">
<input _ngcontent-vhp-c82="" formcontrolname="addComment" type="text" id="addComment" name="addComment" class="form-control ng-untouched ng-pristine ng-invalid">
<!---->
</td>
</tr>
</tfoot>
</table>
Here is what I have tried:
IWebElement btnUpdate = _driver.FindElement(By.XPath("//*[update-comment-id='" + commentAction.GetLastQuoteInsertId().ToString() + "']"));
btnUpdate.Click();
I have verified that the function GetLastQuoteInsertId returns the proper value.
Why is my XPath selector wrong, and how can I fix it? Thank you for your help.
You were almost there. In an XPath expression, an attribute name must always be preceded by an @ sign; without it, update-comment-id is treated as the name of a child element, so nothing matches.
Additionally, since the element is a <span>, you can start the XPath with //span to make it more specific.
Effectively, your line of code will be:
IWebElement btnUpdate = _driver.FindElement(By.XPath("//span[@update-comment-id='" + commentAction.GetLastQuoteInsertId().ToString() + "']"));
btnUpdate.Click();
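The difference between the two expressions can be checked offline. This sketch runs both against a trimmed copy of one row using the BCL's System.Xml.XPath rather than a browser, on the assumption that the attribute test behaves the same in both because both implement XPath 1.0; commentId is a hypothetical stand-in for GetLastQuoteInsertId().

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed copy of one table row from the question (well-formed so XDocument can parse it).
const string markup = @"<tr class='odd'>
  <td class='no-marking dtr-control'>
    <a href='javascript:void(0);'><span data-toggle='modal' data-target='#update-modal' update-comment-text='6 MESES DE GARANTIA' update-comment-id='5' class='material-icons md-18 clickable'> edit </span></a>
    <a href='javascript:void(0);'><span data-toggle='modal' data-target='#delete-modal' delete-comment-id='5' class='material-icons clickable'>delete</span></a>
  </td>
  <td class='sorting_1'>6 MESES DE GARANTIA</td>
</tr>";

XDocument doc = XDocument.Parse(markup);
int commentId = 5; // hypothetical stand-in for commentAction.GetLastQuoteInsertId()

// Without '@' the predicate tests for a CHILD ELEMENT named update-comment-id, so nothing matches.
string wrong = "//*[update-comment-id='" + commentId + "']";
// With '@' it tests the attribute on the <span>.
string right = "//span[@update-comment-id='" + commentId + "']";

int wrongMatches = doc.XPathSelectElements(wrong).Count();
var matches = doc.XPathSelectElements(right).ToList();

Console.WriteLine($"{wrong} -> {wrongMatches} match(es)");
Console.WriteLine($"{right} -> {matches.Count} match(es): '{matches[0].Value.Trim()}'");
```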

Move a button to another tr programmatically

I have a login form designed using a <table> in an ASP.NET application.
A client is asking to modify the layout by moving around a few UI controls. They want to move the Forgot Password link below the Login button so they are on separate lines.
However, for all other clients everything needs to remain the same, so any CSS or style changes would need to be done programmatically. Is there an easy way to do this?
I am trying to avoid creating duplicate user controls to fit the layout they want, and then hide/show controls, depending on the client.
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>
EDIT: jQuery NOT allowed. JavaScript OK.
jQuery can do this for you. The code might look something like this:
$("#toggle").click(toggleButtonPosition);
function toggleButtonPosition() {
var buttonRow = $("button[type=submit]").closest("tr"), buttonCell, newTR, tr;
if (buttonRow.find("td").length === 2) {
// the table is in its initial configuration. Need to extract the button cell and add it to a new row
buttonCell = buttonRow.find("button").closest("td");
buttonCell.remove();
newTR = $("<tr />").append(buttonCell);
newTR.insertBefore(buttonRow);
} else {
// Reverse the process
buttonCell = buttonRow.find("td");
tr = buttonRow.siblings().first();
tr.append(buttonCell);
buttonRow.remove();
}
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>
<button id="toggle">Toggle Button Position</button>
You don't need to duplicate the layout. You can add or remove a CSS class depending on the client, using jQuery or plain JavaScript.
$(document).ready(function(){
var client1 = true; // variable to store client
if(client1){
$('button').removeClass('client2').addClass('client1');
}
else {
$('button').removeClass('client1').addClass('client2');
}});
These CSS classes are added to or removed from the button depending on the client. I have put the 'client1' class on the button in the markup.
button.client1{display:block;margin:0 auto}
button.client2{float:right;margin-left:4px}
The HTML is as follows:
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td>
<button type="submit" class='client1'>Login</button>
Forgot Password
</td>
</tr>
</table>
</td>
</tr>
</table>
You could use jQuery to achieve that.
Here is an example that moves the button into its own row, above the Forgot Password link:
$(document).ready(function() {
var loginLayOut = $(".login"); // login Table
var btnTd = loginLayOut.find("button").parent(); // the button's container, in this case a td
var tbody = btnTd.parent().parent(); // go up two levels (td -> tr -> tbody)
var tr = $("<tr></tr>"); // create a new tr
tr.append(btnTd); // append the td to the new created tr
tbody.prepend(tr); // insert the tr to the tbody at position 0
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table class='login' style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>

Handling tables in c# selenium

<table cellpadding="0" cellspacing="0" onclick="" style="width: 1345px;">
<tbody>
<tr id="item_tcm:222-382904-131104" title="2. Publish to WIP (tcm:222-382904-131104)" class="item even" c:drawn="true">
<td class="col0 icon odd" value="T131104L0P0">
<div class="icon" style="background-image: url("/WebUI/Editors/CME/Themes/Carbon2/icon_v7.1.0.66.55_.png?name=T131104L0P0&size=16");"></div>
</td>
<td class="col1 even">
<div class="text">2. Publish to WIP</div>
</td>
<td class="col2 odd">
<div class="text">JH Anchor link 2</div>
</td>
<td class="col3 even">
<div class="text">S070 Public Site US English</div>
</td>
<td class="col4 odd" value="2015-12-23T14:41:04">
<div class="text">12/23/2015 2:41 PM</div>
</td>
<td class="col5 even">
<div class="text">NT AUTHORITY\SYSTEM</div>
</td>
<td class="col6 odd" value="">
<div class="text">
<span style="color: #f00"></span>
</div>
</td>
<td class="col7 even" value="16">
<div class="text">Suspended</div>
</td>
<td class="col8 odd">
<div class="text">NT AUTHORITY\SYSTEM</div>
</td>
<td class="col9 even">
<div class="text">Publishing Failed</div>
</td>
</tr>
<tr id="item_tcm:222-382901-131104" title="2. Publish to WIP (tcm:222-382901-131104)" class="item even" c:drawn="true">
<td class="col0 icon odd" value="T131104L0P0">
<div class="icon" style="background-image: url("/WebUI/Editors/CME/Themes/Carbon2/icon_v7.1.0.66.55_.png?name=T131104L0P0&size=16");"></div>
</td>
<td class="col1 even">
<div class="text">2. Publish to WIP</div>
</td>
<td class="col2 odd">
<div class="text">JH_anchor link</div>
</td>
<td class="col3 even">
<div class="text">S070 Public Site US English</div>
</td>
<td class="col4 odd" value="2015-12-23T14:17:51">
<div class="text">12/23/2015 2:17 PM</div>
</td>
<td class="col5 even">
<div class="text">NT AUTHORITY\SYSTEM</div>
</td>
<td class="col6 odd" value="">
<div class="text">
<span style="color: #f00"></span>
</div>
</td>
<td class="col7 even" value="16">
<div class="text">Suspended</div>
</td>
<td class="col8 odd">
<div class="text">NT AUTHORITY\SYSTEM</div>
</td>
<td class="col9 even">
<div class="text">Publishing Failed</div>
</td>
</tr>
.....
</tbody>
</table>
I have a collection of rows, and each row has 10 columns (td). I want to iterate over each row and, for each row, get the 8th and 10th columns.
Note: the test case should fail if the 8th column value is "Suspended" and the 10th column value is "Publishing Failed"; otherwise it should pass.
I tried the following logic:
IWebElement tableElement = driver.FindElement(By.XPath("/html/body/table"));
IList<IWebElement> tableRow = tableElement.FindElements(By.TagName("tr"));
foreach (var item in tableRow)
{
}
I'm not sure how to proceed further. Could anyone help me? Thanks in advance
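Your logic is good; you just need to read the cells of each row. Keep in mind that C# list indexes are 0-based, so the 8th column is rowTD[7] and the 10th column is rowTD[9]:
IWebElement tableElement = driver.FindElement(By.XPath("/html/body/table"));
IList<IWebElement> tableRow = tableElement.FindElements(By.TagName("tr"));
IList<IWebElement> rowTD;
foreach (IWebElement row in tableRow)
{
rowTD = row.FindElements(By.TagName("td"));
if (rowTD.Count > 9)
{
// the 8th column is index 7 and the 10th column is index 9
if (rowTD[7].Text.Equals("Suspended") && rowTD[9].Text.Equals("Publishing Failed"))
{
// test failed
}
}
}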
Alternatively, you could look only for the rows whose 8th column is "Suspended" and whose 10th column is "Publishing Failed" (XPath positions are 1-based, and the leading . keeps the search inside tableElement):
IList<IWebElement> rows = tableElement.FindElements(By.XPath(".//tr[td[8]/div = 'Suspended' and td[10]/div = 'Publishing Failed']"));
Then you can fail the test if the rows list is not empty.
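XPath positions start at 1, while C# list indexes start at 0, so td[8] in an XPath and cells[7] in C# refer to the same column. The sketch below checks that on a simplified two-row table with XDocument instead of Selenium; the second row's values ("Completed", "Success") are made up to give a row that should not be flagged, and the Row helper is hypothetical. With Selenium, the same XPath string would go to By.XPath, and FindElements returns an empty list when no row matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Two simplified rows: the first mirrors the question, the second is a made-up passing row.
string html = "<table><tbody>"
    + Row("Suspended", "Publishing Failed")
    + Row("Completed", "Success")
    + "</tbody></table>";
XDocument doc = XDocument.Parse(html);

// XPath positions are 1-based: td[8] is the 8th cell and td[10] the 10th.
var failedByXPath = doc.XPathSelectElements(
    "//table//tr[td[8]/div = 'Suspended' and td[10]/div = 'Publishing Failed']").ToList();

// C# list indexes are 0-based: the 8th cell is [7] and the 10th is [9].
var failedByIndex = doc.Descendants("tr")
    .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToList())
    .Where(cells => cells.Count > 9 && cells[7] == "Suspended" && cells[9] == "Publishing Failed")
    .ToList();

Console.WriteLine($"XPath: {failedByXPath.Count} failing row(s); 0-based index: {failedByIndex.Count}");

// Hypothetical helper: builds one <tr> with ten <td><div class='text'>...</div></td> cells.
static string Row(string status, string result)
{
    var cells = new List<string>
    {
        "", "2. Publish to WIP", "JH Anchor link 2", "S070 Public Site US English", "12/23/2015 2:41 PM",
        @"NT AUTHORITY\SYSTEM", "", status, @"NT AUTHORITY\SYSTEM", result
    };
    return "<tr>" + string.Concat(cells.Select(c => "<td><div class='text'>" + c + "</div></td>")) + "</tr>";
}
```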
Try this:
foreach (var item in tableRow)
{
// the class names are 0-based: col7 is the 8th column and col9 is the 10th
IWebElement column7 = item.FindElement(By.CssSelector("[class*='col7']"));
IWebElement column9 = item.FindElement(By.CssSelector("[class*='col9']"));
if (column7.Text.Equals("Suspended") && column9.Text.Equals("Publishing Failed"))
Assert.Fail("Failed because column8 is 'Suspended' and column10 is 'Publishing Failed'");
}
// reached only if no row matched; calling Assert.Pass() inside the loop would end the test after the first row
Assert.Pass();
Please note that this code stops testing at the first row where it finds "Suspended" and "Publishing Failed". If you want to keep checking until the last row of the table, you need multiple assertions; see the question "NUnit, is it possible to continue executing test after Assert fails?"

Scraping With HtmlAgilityPack

I have a huge HTML page that I want to scrape values from.
I tried to use Firebug to get the XPath of the element I want, but the XPath is not static; it changes from time to time. How can I get the values I want?
In the following snippet I want to get the Lumber production per hour, which is the 20 in the first row.
<div class="boxes-contents cf"><table id="production" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="4">
Production per hour: </th>
</tr>
</thead>
<tbody>
<tr>
<td class="ico">
<img class="r1" src="img/x.gif" alt="Lumber" title="Lumber" />
</td>
<td class="res">
Lumber:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r2" src="img/x.gif" alt="Clay" title="Clay" />
</td>
<td class="res">
Clay:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r3" src="img/x.gif" alt="Iron" title="Iron" />
</td>
<td class="res">
Iron:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r4" src="img/x.gif" alt="Crop" title="Crop" />
</td>
<td class="res">
Crop:
</td>
<td class="num">
59 </td>
</tr>
</tbody>
</table>
</div>
Using the HTML Agility Pack, you will want to do something like the following:
using System.Linq;
using System.Net;

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
using (WebClient webclient = new WebClient())
{
// DownloadString returns the page as a string, so no MemoryStream/StreamReader is needed
htmlDoc.LoadHtml(webclient.DownloadString(url));
}
var table = htmlDoc.DocumentNode.Descendants("table").FirstOrDefault();
// the first <td class="num"> in this table belongs to the Lumber row
var lumberTd = table.Descendants("td").FirstOrDefault(node => node.GetAttributeValue("class", "") == "num");
string lumberValue = lumberTd.InnerText.Trim();
Warning: FirstOrDefault() can return null, so you should add some null checks.
Hope that helps.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(fileName);
var result = doc.DocumentNode.SelectNodes("//div[@class='boxes-contents cf']//tbody/tr")
.First(tr => tr.Element("td").Element("img").Attributes["title"].Value == "Lumber")
.Elements("td")
.First(td=>td.Attributes["class"].Value=="num")
.InnerText
.Trim();
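Because the Firebug-generated path keeps changing, one option is an expression anchored on things that are less likely to move: the table's id and the title of the icon in each row. Here is a sketch with the BCL's XPath support on a trimmed copy of the table (the Iron row left out); the assumption is that the same expression can be passed to HtmlAgilityPack's SelectSingleNode, which likewise returns null when nothing matches. Production is a hypothetical helper.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed copy of the "production" table from the question.
const string html = @"<div class='boxes-contents cf'><table id='production'>
<tbody>
  <tr><td class='ico'><img class='r1' src='img/x.gif' alt='Lumber' title='Lumber' /></td><td class='res'> Lumber: </td><td class='num'> 20 </td></tr>
  <tr><td class='ico'><img class='r2' src='img/x.gif' alt='Clay' title='Clay' /></td><td class='res'> Clay: </td><td class='num'> 20 </td></tr>
  <tr><td class='ico'><img class='r4' src='img/x.gif' alt='Crop' title='Crop' /></td><td class='res'> Crop: </td><td class='num'> 59 </td></tr>
</tbody></table></div>";

XDocument doc = XDocument.Parse(html);

// Select by the table id and the icon title instead of by row position,
// so the lookup still works if rows are reordered or new rows are added.
string Production(string resource) =>
    doc.XPathSelectElement(
        $"//table[@id='production']//tr[td/img/@title='{resource}']/td[@class='num']")
    ?.Value.Trim() ?? "";

Console.WriteLine($"Lumber: {Production("Lumber")}, Crop: {Production("Crop")}");
```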

extract data from an html tbody using c#

I am using the C# WebClient to download an HTML string.
A small example of the HTML being returned is:
<tbody class='resultBody ' id='Tbody2'>
<tr id='Tr2' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='452' id='Checkbox2' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td2'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
123 ABC St</dd>
</dl>
</td>
</tr>
</tbody>
<tbody class='resultBody ' id='Tbody3'>
<tr id='Tr3' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='99' id='Checkbox3' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td3'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
1111 Some St</dd>
</dl>
</td>
</tr>
</tbody>
I am interested in two elements of the HTML, but I have no idea of the best way to get to them. What is the best way to get the value from one element and the inner HTML from the other?
Any suggestions would be great!
1. Download the HTML Agility Pack (free).
2. Create a new HtmlDocument.
3. Call LoadHtml.
4. Use DOM navigation or an XPath query (SelectSingleNode, etc.) to find the elements.
5. Access the InnerHtml of the elements you want.
The API is similar to XmlDocument, but it works on HTML that isn't XHTML.
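The question does not say which two elements are meant, so this sketch assumes, for illustration only, that they are the checkbox's value attribute and the text of dd.bizAddr. It reads both per tbody with XDocument and XPath on a trimmed copy wrapped in a <table> root; with HtmlAgilityPack the same XPath strings would go to SelectNodes and SelectSingleNode, and InnerText or InnerHtml would take the place of Value. The ListingId and Address names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed copy of the question's markup, wrapped in <table> so it has a single root element.
const string html = @"<table>
<tbody class='resultBody ' id='Tbody2'>
  <tr id='Tr2' class='firstRow'><td class='cbrow tier_Gold' rowspan='4'><input type='checkbox' name='listingId' value='452' id='Checkbox2' /></td></tr>
  <tr><td class='datarow'><dl class='addrBlock'><dd class='bizAddr'>
    123 ABC St</dd></dl></td></tr>
</tbody>
<tbody class='resultBody ' id='Tbody3'>
  <tr id='Tr3' class='firstRow'><td class='cbrow tier_Gold' rowspan='4'><input type='checkbox' name='listingId' value='99' id='Checkbox3' /></td></tr>
  <tr><td class='datarow'><dl class='addrBlock'><dd class='bizAddr'>
    1111 Some St</dd></dl></td></tr>
</tbody>
</table>";

XDocument doc = XDocument.Parse(html);

// One result per <tbody>; both lookups are relative to that tbody, so the values stay paired.
// contains() tolerates the trailing space in class='resultBody '.
var listings = doc.XPathSelectElements("//tbody[contains(@class, 'resultBody')]")
    .Select(body => (
        ListingId: body.XPathSelectElement(".//input[@name='listingId']")?.Attribute("value")?.Value ?? "",
        Address: body.XPathSelectElement(".//dd[@class='bizAddr']")?.Value.Trim() ?? ""))
    .ToList();

foreach (var l in listings)
    Console.WriteLine($"{l.ListingId}: {l.Address}");
```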
