How do I get my XPath to search only within each table? - c#

I have a bit of HTML that looks like this:
<table class="resultsTable">
<tbody>
<tr class="even">
<td width="35%"><strong>Name</strong></td>
<td>ACME ANVILS, INC</td>
</tr>
</tbody>
</table>
and some C# code that looks like this:
var name = document.DocumentNode
.SelectSingleNode("//*[text()='Name']/following::td").InnerText;
which happily returns
ACME ANVILS, INC.
However, there's a new wrinkle. The page in question now returns multiple results:
<table class="resultsTable">
<tbody>
<tr class="even">
<td width="35%"><strong>Name</strong></td>
<td>ACME ANVILS, INC.</td>
</tr>
</tbody>
</table>
<table class="resultsTable">
<tbody>
<tr class="even">
<td width="35%"><strong>Name</strong></td>
<td>ROAD RUNNER RACES, LLC</td>
</tr>
</tbody>
</table>
So now I'm working with
var tables = document.DocumentNode.SelectNodes("//table/tbody");
foreach (var table in tables)
{
var name = table.SelectSingleNode("//*[text()='Name']/following::td").InnerText;
...
}
Which falls over, because SelectSingleNode returns null.
How do I get my XPath to actually return a result, searching only within the specific table I have selected?

With the addition of a second table, two adjustments are required:
1. Change your absolute XPath,
//*[text()='Name']/following::td
to one relative to the current table or tbody element:
.//*[text()='Name']/following::td
2. Account for there now being more than one td element on the following:: axis. Either just grab the first,
(.//*[text()='Name']/following::td)[1]
or, better, use the following-sibling:: axis instead, in combination with a test on the string value of td rather than a test on a text node, which might be buried beneath intervening formatting elements:
.//td[.='Name']/following-sibling::td
See also Difference between Testing text() nodes vs string values in XPath.
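Put together, the per-table lookup might look like the sketch below. It assumes the HtmlAgilityPack NuGet package and the resultsTable markup from the question. The class and method names are made up for illustration, and normalize-space(.) is used in place of a bare . only to tolerate stray whitespace around the label.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class PerTableLookup // hypothetical helper, not part of the library
{
    // Returns the text of the cell that follows the "Name" cell in each results table.
    public static List<string> NamesPerTable(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var names = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = document.DocumentNode.SelectNodes("//table[@class='resultsTable']");
        if (tables == null) return names;

        foreach (var table in tables)
        {
            // The leading "." anchors the search at the current table;
            // a path starting with "//" would search the whole document again.
            var cell = table.SelectSingleNode(
                ".//td[normalize-space(.)='Name']/following-sibling::td");
            if (cell != null) names.Add(cell.InnerText.Trim());
        }
        return names;
    }
}
```

Calling PerTableLookup.NamesPerTable with the two-table page from the question should yield one entry per table, in document order.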

Related

Using HtmlAgilityPack with C# to find all href links within td elements in html page

I am attempting to use HtmlAgilityPack package to find each of the href links within td tags throughout an entire html page. The trick is that these tables start deep down into the html structure. I noticed with HtmlAgilityPack you can't just say get all tds that are within trs on a page. There is a parent div wrapped around each table with a class on it "table-group" that I am not showing in my sample below. Maybe I can use that as a starting point? The biggest trouble that I am dealing with is that there are several parent elements above everything in my sample below, but I want to skip all of that and start here.
Here is a sample of the structure I am trying to navigate:
<table>
<thead>
</thead>
<tbody>
<tr>
<td>Link 1</td>
<td>1</td>
</tr>
<tr>
<td>Link 2</td>
<td>2</td>
</tr>
<tr>
<td>Link 3</td>
<td>3</td>
</tr>
</tbody>
</table>
<table>
<thead>
</thead>
<tbody>
<tr>
<td>Link 4</td>
<td>4</td>
</tr>
<tr>
<td>Link 5</td>
<td>5</td>
</tr>
<tr>
<td>Link 6</td>
<td>6</td>
</tr>
</tbody>
</table>
I would like my end result to be:
https://path-to-pdf1
https://path-to-pdf2
https://path-to-pdf3
https://path-to-pdf4
https://path-to-pdf5
https://path-to-pdf6
Here is what I have tried:
var html = @"https://myurl.com";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td/a[0]");
foreach (var item in nodes)
{
Console.WriteLine(item.Attributes["href"].Value);
}
Console.ReadKey();
Modify
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td/a[0]");
to
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td[1]/a");
Then you will get the result you want. XPath positions are 1-based, so a[0] never matches anything, whereas td[1] selects the first cell of each row. See the XPath documentation for more details.
I tried this in an MVC project with the same HTML file.
Update:
I copied the HTML into a local page and retrieved the nodes successfully.
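A minimal sketch of the corrected query follows. The sample markup in the question shows no anchor tags, so this assumes each first cell wraps its text in an a element with an href attribute. LinkExtractor and FirstCellLinks are hypothetical names, and //table//tr is used so the query also works when a table has no tbody.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkExtractor // hypothetical helper
{
    // Collects the href of the link in the first cell of every table row.
    public static List<string> FirstCellLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // "//" finds tables at any depth, so the wrapping elements above them
        // do not need to be spelled out. td[1] is the first cell (1-based).
        var anchors = doc.DocumentNode.SelectNodes("//table//tr/td[1]/a[@href]");
        if (anchors == null) return hrefs; // null means no match

        foreach (var anchor in anchors)
        {
            // GetAttributeValue falls back to the default if the attribute is absent.
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

To restrict the search to the wrapped tables mentioned in the question, the path could start from the wrapper instead, for example //div[@class='table-group']//table//tr/td[1]/a[@href], assuming the class attribute holds exactly that value.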

GridView with Jquery data table implementation

Recently I was trying to use the features of "https://datatables.net" in one GridView render. It wasn't possible because the render always gives a table without the correct formatting (without thead). Is there a way to transform the render into the correct format?
Correct format:
<table id="table_id" class="display">
<thead>
<tr>
<th>Column 1</th>
<th>Column 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1 Data 1</td>
<td>Row 1 Data 2</td>
</tr>
<tr>
<td>Row 2 Data 1</td>
<td>Row 2 Data 2</td>
</tr>
</tbody>
</table>
This code reshapes a single table into the correct format and then calls .DataTable() on the corrected table.
To use it, replace '#gdVscQuote' with the ID of your own table.
It has not been tested on pages with more than one table and will probably not work there, because the thead selectors are not scoped to a single table.
$(document).ready( function () {
//replace tr
$($('#gdVscQuote')[0].childNodes[1].childNodes[0]).wrap('<thead/>').contents().unwrap();
//replace all td with th inside thead
$('thead td').wrap('<th/>').contents().unwrap();
//get thead
var thead = $("thead").get(0);
//remove saved thead to replace above tbody thead
$("thead").remove();
//add thead correctly
$('#gdVscQuote')[0].prepend(thead);
// replace tds for tr
$($('thead')[0].childNodes).wrapAll("<tr/>")
//add jQuery table functionality
$('#gdVscQuote').DataTable();
});
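A different approach, not the one used in the script above, is to have Web Forms emit the thead itself so that no client-side rewriting is needed. The sketch below assumes a System.Web.UI.WebControls.GridView that has already been data-bound. GridViewHeaderHelper and RenderHeaderAsThead are hypothetical names.

```csharp
using System.Web.UI.WebControls;

public static class GridViewHeaderHelper // hypothetical helper
{
    // Call after the grid has been data-bound, for example from its
    // DataBound or PreRender handler.
    public static void RenderHeaderAsThead(GridView grid)
    {
        // HeaderRow is null when the grid has nothing to render.
        if (grid.HeaderRow == null) return;

        // Render header cells as <th> rather than <td>.
        grid.UseAccessibleHeader = true;

        // Place the header row inside <thead> instead of the table body.
        grid.HeaderRow.TableSection = TableRowSection.TableHeader;
    }
}
```

With the header emitted this way, the client script would only need the final $('#gdVscQuote').DataTable(); call.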

Multiplying a textbox with a cell in a dynamically created table with JQuery

I have a dynamically created table with id called "editTable" that looks as follows:
<tbody>
@{var i = 0;}
@foreach (var item in Model)
{
<tr>
<td width="25%">
@Html.DisplayFor(modelItem => item.Product.Name)
</td>
<td width="25%">
@Html.DisplayFor(modelItem => item.Quantity)
</td>
<td width="25%">
<div class="editor-field">
@Html.EditorFor(modelItem => item.UnitPrice)
@Html.ValidationMessageFor(model => item.UnitPrice)
</div>
</div>
</td>
<td width="25%" id="total"></td>
</td>
</tr>
}
</tbody>
The 3rd td element contains a C# textbox that is rendered as an <input> element in HTML.
Now I want to multiply the quantity by the unit price to display this value in the 4th td element next to it. This value should update every time the value in the textbox is adjusted. I am a newbie at JQuery / JavaScript and came up with the following code:
// Calculating quantity*unitprice
$('#editTable tr td:nth-child(3) input').each( function (event) {
var $quant = $('#editTable tr td:nth-child(2)', this).val();
var $unitPrice = $('#editTable tr td:nth-child(3) input', this).val();
$('#editTable tr td:nth-child(4)').text($quant * $unitPrice);
});
This doesn't work and only displays NaN in the 4th element. Can anyone help me update this code to a working version? Any help would be very much appreciated.
I guessed you accidentally switched units and price, because it makes more sense to change the number of units than the price. I took your HTML and JavaScript and tried to change as little as possible to make it work (I'm not saying the solution is perfect, I just don't want to give you a totally different example of how to do it).
The html (The C# is irrelevant for this problem):
<table id="editTable">
<tbody>
<tr>
<td width="25%">
Product name
</td>
<td width="25%">
5
</td>
<td width="25%">
<div class="editor-field">
<input id="UnitPrice" name="UnitPrice" type="number" value="2" style="width:40px" />
</div>
</td>
<td width="25%" id="total"></td>
</tr>
</tbody>
</table>
The javascript/jquery (which should run on load):
$('#editTable tr td:nth-child(3) input').each(updateTotal);
$('#editTable tr td:nth-child(3) input').change(updateTotal);
var element;
function updateTotal(element)
{
var quantity = $(this).closest('tr').find('td:nth-child(2)').text();
var price = $(this).closest('tr').find('td:nth-child(3) input').val();
$(this).closest('tr').find('td:nth-child(4)').text(quantity * price);
}
The problems you had were with jQuery. I've created a function that is called for an element (in our case your UnitPrice input); it grabs the closest ancestor of type tr (the row it's in) and from there does what you tried to do.
Your jQuery selector matched the 2nd cell of every table row; closest('tr').find limits it to the current row.
You tried to use .val() on a td element; use .text() or .html() instead. Alternatively, add a data-val="<%=value%>" attribute to the td and read it with .data('val').
It would be better to take the input's value directly with $(this).val() rather than going up to the tr and then back down into the td and the input.
To see it working: http://jsfiddle.net/Ynsgf/1/
I hope I didn't cause you any confusion with my explanation and the options I gave you.
Here is another way to write the jQuery part.
$('#editTable tr').each(function (i, row) {
// Quantity is plain text in the 2nd cell; the unit price is the input in the 3rd.
var $quant = $(row).find('td:nth-child(2)').text();
var $unitPrice = $(row).find('.editor-field input').val();
$(row).find('td:nth-child(4)').text($quant * $unitPrice);
});

How to search through html table rows?

Using Windows Forms and C#.
For example...
<table id=tbl>
<tbody>
<tr>
<td>HELLO</td>
<td>YES</td>
<td>TEST</td>
</tr>
<tr>
<td>BLAH BLAH</td>
<td>YES</td>
<td>TEST</td>
</tr>
</tbody>
</table>
I load the page using the WebBrowser Control. The page loads perfectly.
The next thing I want to do is search through all the rows in the table and check whether they contain a specific value; in this instance, YES.
If a row contains it, I want that row passed back to me so I can store it as a string.
But I want the row to be in HTML form (including the tags).
How can I accomplish this?
Please help me.
You can use the HtmlAgilityPack to easily parse the html. For example, to get all of the TD elements, you can do this:
string value = @" <table id=tbl>
<tbody>
<tr>
<td>HELLO</td>
<td>YES</td>
<td>TEST</td>
</tr>
<tr>
<td>BLAH BLAH</td>
<td>YES</td>
<td>TEST</td>
</tr>
</tbody>
</table>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(value);
var nodes = doc.GetElementbyId("tbl").SelectNodes("tbody/tr/td");
foreach (var node in nodes)
{
Debug.WriteLine(node.InnerText);
}
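To get whole rows rather than individual cells, the same approach can filter on the row and read its OuterHtml. This is a sketch under the assumption that the table keeps the id tbl from the question. RowSearch and RowsContaining are hypothetical names.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class RowSearch // hypothetical helper
{
    // Returns the HTML (tags included) of every row in table "tbl"
    // that has at least one cell whose text equals `value`.
    public static List<string> RowsContaining(string html, string value)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string>();

        // tr[td[...]] keeps a row if any of its cells passes the inner test.
        // NOTE: `value` is pasted into an XPath string literal, so this
        // sketch only handles values that contain no single quote.
        var matches = doc.DocumentNode.SelectNodes(
            "//table[@id='tbl']//tr[td[normalize-space(.)='" + value + "']]");
        if (matches == null) return rows; // null means no row matched

        foreach (var row in matches)
        {
            rows.Add(row.OuterHtml); // markup of the row, including <tr> and <td> tags
        }
        return rows;
    }
}
```

With the WebBrowser control, the page markup could be passed in from its DocumentText property once the document has finished loading, for example RowSearch.RowsContaining(webBrowser1.DocumentText, "YES"), where webBrowser1 is an assumed control name.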
You can use this: http://simplehtmldom.sourceforge.net/. It is a really simple way to search in HTML files.
Just include simple_html_dom.php in your file and then follow this manual:
http://simplehtmldom.sourceforge.net/manual.htm
Your PHP code will then look like this:
$html = file_get_html('File.html');
foreach($html->find('td') as $element)
echo $element->plaintext . '<br>';

HTMLAgilityPack - Detecting a blank table?

I'm using c# with htmlagilitypack. Everything works fine except when the table I'm looking for contains no rows. I'm trying to read only the data from the 1st table on the page. The problem is if the first table contains no rows, the htmlagilitypack seems to jump down to the 2nd table for some reason.
The html I'm trying to read looks something like this:
<table class='stats'>
<tr>
<td colspan='2'>This is the 1st table</td>
<tr>
<td>Column A</td>
<td>Column B</td>
</tr>
<tr>
<td>Value A</td>
<td>Value B</td>
</tr>
</table>
<table class='stats'>
<tr>
<td colspan='2'>This is the 2nd table</td>
<tr>
<td>Column 1</td>
<td>Column 2</td>
</tr>
<tr>
<td>Value 111</td>
<td>Value 222</td>
</tr>
</table>
I then retrieve the 1st table's values using the following line:
foreach (HtmlNode node in root.SelectNodes("//table[@class='stats']/tr[position() > 2]/td"))
How do I ensure the data I'm grabbing is only from the 1st table?
Thanks.
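You can make sure that you select only the first matching table by adding a position index [1] after the table selector.
Try the following:
"//table[@class='stats'][1]/tr[position()>2]/td"
Note that [1] written this way is evaluated per parent element, so it picks the first matching table under each parent. If the tables might not share a parent, wrap the path in parentheses to take the first match in the whole document: "(//table[@class='stats'])[1]/tr[position()>2]/td".
If the first table has no rows, SelectNodes returns null, so check for that before iterating in the foreach.
For example, you might want to do the following:
var elements = root.SelectNodes("//table[@class='stats'][1]/tr[position()>2]/td");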
if (elements != null)
{
foreach (HtmlNode node in elements)
{
// process the td node
}
}
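As a self-contained variant of the same idea, the sketch below wraps the table selector in parentheses so that [1] means the first matching table in the whole document, and returns an empty list when that table has no data rows. FirstTableReader and FirstTableCells are hypothetical names, and the markup is assumed to be well formed with tr elements directly under table.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FirstTableReader // hypothetical helper
{
    // Returns the cell text of every row after the second one,
    // taken from the first table with class "stats" only.
    public static List<string> FirstTableCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // (...)[1] picks the first matching table in document order;
        // later tables are never consulted, even if the first has no data rows.
        var cells = doc.DocumentNode.SelectNodes(
            "(//table[@class='stats'])[1]/tr[position()>2]/td");
        if (cells == null) return values; // first table has no data rows

        foreach (var cell in cells)
        {
            values.Add(cell.InnerText.Trim());
        }
        return values;
    }
}
```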
You need to have an id on the table or row that uniquely identifies it, and then use that id in the XPath.
