How to count the href - c#

How can I count the href attributes of my HTML?
<table>
<tbody>
<tr>
<td align="right" colspan="8">
2
3
4
</td>
</tr>
</tbody>
</table>

Get the matching elements and take the size of the result (Java bindings shown):
driver.findElements(By.xpath("//a[@href]")).size()

Whilst I generally avoid XPath, this seems like the time to use it.
If you are simply trying to get the number of links on a page without having to filter on specific links, you can do this in C# by:
int linkCount = _driver.FindElements(By.XPath("//a")).Count;
You can then assert on the number returned (without an assertion the test will always pass). If you want to filter on specific links, I would use something other than XPath.

C# and Html Agility Pack

I have multiple files from which I have to extract tables containing data. The problem is that the tables don't have IDs, so I have to search based on the content (which is constant in each file). There are multiple tables in each file, and the table of interest doesn't have a constant XPath.
<table border="0" cellspacing="0" cellpadding="0" style="BORDER-COLLAPSE: collapse" bordercolor="#111111">
<tbody>
<tr>
<td class="s">CONSTANT_TEXT</td>
<td class="l">CHANGING_VALUE</td>
</tr>
<tr>
<td class="s"> </td>
<td class="l"><a style="" id="CONSTANT_ID" href="mailto: XXXX</a>
</td>
</tr>
</tbody>
</table>
How do I:
1. Search based on CONSTANT_TEXT and return the value of the 2nd TD (CHANGING_VALUE) without knowing the path (the table doesn't have an ID and its position changes from file to file).
2. Search based on CONSTANT_TEXT and return the parent table of that TD.
What I did was search for and return CONSTANT_TEXT with Html Agility Pack, then walk the XPath upwards until the table is reached.
var output = document.DocumentNode.SelectNodes("//a[@id='CONSTANT_ID']");
output[0].XPath ="/html[1]/body[1]/table[1]/thead[1]/tr[1]/td[1]/table[1]/tbody[1]/tr[2]/td[2]/a[1]"
My plan was to iterate each output and get the XPath for lowest table occurring, table[1], then extract the data.
Thanks,
Mike
Strictly speaking, you'll need the following XPath expressions.
Search based on CONSTANT_TEXT and return the value of the 2nd TD (CHANGING_VALUE):
//td[.="CONSTANT_TEXT"]/following-sibling::td[1]/text()
Output: CHANGING_VALUE
Search based on CONSTANT_TEXT and return the parent table of that TD:
//td[.="CONSTANT_TEXT"]/ancestor::table[1]
Output: <table> element
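As a rough illustration of the two expressions above, here is a minimal sketch that evaluates them with LINQ to XML from the .NET base class library. It assumes the markup is well-formed XML, which real-world HTML often is not; in that case the same XPath strings would be passed to Html Agility Pack's SelectSingleNode instead. The class and method names (TableLookup, ValueNextTo, TableContaining) are hypothetical and not part of the original answer.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class TableLookup
{
    // Returns the text of the cell that directly follows the cell whose text
    // equals label, or null when there is no such cell.
    // Assumption: label contains no single quote, because it is spliced into
    // the XPath string literal as-is.
    public static string ValueNextTo(string markup, string label)
    {
        XDocument doc = XDocument.Parse(markup);
        XElement cell = doc.XPathSelectElement(
            "//td[.='" + label + "']/following-sibling::td[1]");
        return cell == null ? null : cell.Value;
    }

    // Returns the nearest enclosing <table> of the cell whose text equals
    // label, or null when there is no such cell. ancestor:: is a reverse
    // axis, so [1] picks the closest table, not the outermost one.
    public static XElement TableContaining(string markup, string label)
    {
        XDocument doc = XDocument.Parse(markup);
        return doc.XPathSelectElement(
            "//td[.='" + label + "']/ancestor::table[1]");
    }
}
```

Because the lookup is driven by the cell text, it does not matter how deeply the table is nested or where it sits in the file.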

Reorder html elements in a string using C#

I have a string of HTML returned from a service. I need to update this HTML server side (using .NET) and reorder some of the elements before sending it to the client. As a simple example, say the string is a table like the one below. How can I manipulate it to put the last name <th> and <td> into their own <tr>? The real HTML is much larger and more complex, but the example below illustrates the change I need for one section of it. Plain string replacement hasn't worked well because of the complexity of the actual HTML.
Initial String
"<table>
<tbody>
<tr>
<th>First name</th>
<td>some first name</td>
<th>Last name</th>
<td>some last name</td>
</tr>
<tr>
<th>blah</th>
<td>blah blah</td>
</tr>
</tbody>
</table>
"
After Modification
"<table>
<tbody>
<tr>
<th>First name</th>
<td>some first name</td>
</tr>
<tr>
<th>Last name</th>
<td>some last name</td>
</tr>
<tr>
<th>blah</th>
<td>blah blah</td>
</tr>
</tbody>
</table>
"
I know URL answers are frowned upon, but you should look into the HTML Agility Pack. It's designed for this kind of thing.
http://html-agility-pack.net/?z=codeplex
For the purposes of this answer, I will make the simplifying assumption that you have read the file into a list of strings, one line per entry. Let us name this list HTMLLines. Then the following should do what you want:
int length = HTMLLines.Count;
for (int loop = 0; loop < length; loop++)
{
    if (HTMLLines[loop].Equals("<th>Last name</th>"))
    {
        // Close the current row and open a new one in front of the header cell.
        HTMLLines[loop] = "</tr>\n<tr>\n" + HTMLLines[loop];
        // break; // Uncomment if there is only one occurrence; leave commented to handle every occurrence.
    }
}
If you save the list after this loop, you should have the desired output.
This code assumes that there are no nulls in the list. If there are any nulls, you should replace HTMLLines[loop].Equals("<th>Last name</th>") with HTMLLines[loop]=="<th>Last name</th>", since calling Equals on a null entry throws.
If "<th>Last name</th>" is just a sample you used for this question and cannot be matched exactly, then you should place all possible matches in an array and check each line against them. In this case, if we name the array theHeaders, the code will be something like:
int length = HTMLLines.Count;
for (int loop = 0; loop < length; loop++)
{
    for (int loop1 = 0; loop1 < theHeaders.Length; loop1++)
    {
        if (HTMLLines[loop].Equals(theHeaders[loop1]))
        {
            HTMLLines[loop] = "</tr>\n<tr>\n" + HTMLLines[loop];
            break;
        }
    }
}
I hope this helps to point you to the right direction.
A very simple approach could be...
var result = htmlString.Replace("<th>Last name</th>", "</tr><tr><th>Last name</th>");
If you need something more complex than this you'll need to add more detail to your question.
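When the markup is too complex for string replacement, a parser-based approach tends to be more robust. The following is only a sketch under stated assumptions: it uses LINQ to XML, so the input must be well-formed XML (for messy real-world HTML, Html Agility Pack would play the same role), and the names RowSplitter and SplitRowAt are hypothetical. It moves the header cell with the given text, plus every cell after it in the same row, into a new row placed directly after the original one.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RowSplitter
{
    // Splits the row that contains a <th> with the given text: that <th> and
    // all following cells of the same row are moved into a new <tr>.
    // Returns the markup unchanged (apart from whitespace) if no header matches.
    public static string SplitRowAt(string tableMarkup, string headerText)
    {
        XElement table = XElement.Parse(tableMarkup);
        XElement header = table.Descendants("th")
                               .FirstOrDefault(th => th.Value == headerText);
        if (header == null)
        {
            return table.ToString(SaveOptions.DisableFormatting);
        }

        XElement row = header.Parent;

        // Materialize the list first, because removing nodes while
        // enumerating siblings would invalidate the enumeration.
        List<XElement> moved = new List<XElement> { header };
        moved.AddRange(header.ElementsAfterSelf());

        foreach (XElement cell in moved)
        {
            cell.Remove();
        }

        row.AddAfterSelf(new XElement("tr", moved));
        return table.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling RowSplitter.SplitRowAt(html, "Last name") on the table from the question (with the mismatched blah tags corrected) would give the three-row layout, without depending on how the source happens to be broken into lines.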

Take HTML Table and put into custom List

I have an html table I need to query, get the contents and then act on that.
This is the table:
<table>
<thead>
<tr>
<th>Version</th>
<th>Usage</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.0.1.1</td>
<td>86</td>
</tr>
<tr>
<td>1.0.0.1</td>
<td>65</td>
</tr>
<tr>
<td>1.0.1.0</td>
<td>28</td>
</tr>
<tr>
<td>1.0.0.0</td>
<td>1</td>
</tr>
</tbody>
</table>
I'm getting that by passing the WebResponse through a regular expression.
What is the best way to get this into some data structure in C# so that I can query on the Version and Usage? Basically, I want to have a List of a class Foo:
class Foo
{
    public string Version { get; set; }
    public int Usage { get; set; }
}
Along those lines.
Thanks for your help
The best tool I've found for parsing HTML is the Html Agility Pack library. It is a fairly easy to use library, and will handle improperly formatted markup fairly well. You'll have to do the footwork of getting the data out from the library and into your own structures, but it'll make it easy for getting at the data.
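To give an idea of the footwork involved, here is a minimal sketch that turns the table rows into a List of Foo. It is not the Html Agility Pack approach itself: it uses LINQ to XML from the base class library and therefore assumes the table is well-formed XML with a tbody element. The property types (string for Version, int for Usage) and the UsageTable helper are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Foo
{
    public string Version { get; set; }
    public int Usage { get; set; }
}

public static class UsageTable
{
    // Reads every <tr> inside <tbody> and maps its first two <td> cells
    // to a Foo. Rows with fewer than two cells are skipped.
    // int.Parse throws if the second cell is not a whole number.
    public static List<Foo> Parse(string tableMarkup)
    {
        return XElement.Parse(tableMarkup)
            .Descendants("tbody")
            .Elements("tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToArray())
            .Where(cells => cells.Length >= 2)
            .Select(cells => new Foo { Version = cells[0], Usage = int.Parse(cells[1]) })
            .ToList();
    }
}
```

Once the data is in a list, ordinary LINQ queries apply, for example rows.Where(f => f.Usage > 50) or rows.OrderByDescending(f => f.Usage).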

C# Xpath Tables

I have XML code that contains a table with 2 rows.
<table>
<thead>
<tr>
<td class="num">test</td>
<td class="num">test2</td>
</tr>
</thead>
</table>
I am using XPath to grab the data from the row.
How do I retrieve only the first row's data from the table and not all the data?
The XPath code I am using now is:
/table/thead/tr/th[@class='num']
And my current output is:
test
test2
What do I have to add in the xpath code so I can select the first row only?
Your result is the expected output: the XPath expression asks for all nodes that match, so the two you get are correct.
If you want only the first one, you can do this:
/table/thead/tr/th[@class='num'][1]
Otherwise, post the output you expect.
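One detail worth checking when using a positional predicate: in XPath 1.0, th[@class='num'][1] is evaluated per parent, so it yields the first matching cell of every tr, whereas wrapping the whole path in parentheses before [1] yields the first match in the document. The sketch below shows the difference with LINQ to XML; the helper name FirstMatch.Select and the two-row sample data in the usage note are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class FirstMatch
{
    // Evaluates an XPath expression against the XML and returns the text
    // of each selected element, in document order.
    public static List<string> Select(string xml, string xpath)
    {
        return XDocument.Parse(xml)
            .XPathSelectElements(xpath)
            .Select(element => element.Value)
            .ToList();
    }
}
```

For a thead with two rows holding a1, a2 and b1, b2, the expression /table/thead/tr/th[@class='num'][1] selects a1 and b1, while (/table/thead/tr/th[@class='num'])[1] selects only a1. With a single row, as in the question, both select just the first cell.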

Scraping html tables in .NET and taking care of colspans

I am trying to scrape HTML tables in my .NET application, but I have come across tables that use colspan and rowspan attributes aggressively, which is giving me a headache. Is there a library that can convert a table into an array of strings while taking care of colspan? For example, if a TD element has colspan=5, its value would be used for the next 5 elements.
<table>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td colspan=4>1</td>
<td>2</td>
</tr></table>
The output would be an array of the following:
[1,2,3,4,5]
[1,1,1,1,2]
You may be able to use ParseControl, which would make the whole thing fairly trivial, since you can access the ColSpan property.
You could put it in a XmlDocument and then loop through it. Not sure if that's the best solution, but it works.
Maybe LINQ to XML?
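Following the XmlDocument and LINQ to XML suggestions, the colspan expansion could look roughly like this. It is a sketch under clear assumptions: the table must be well-formed XML (so attribute values need quotes, unlike colspan=4 in the question), only colspan is handled and rowspan is ignored, and the ColspanExpander name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ColspanExpander
{
    // Converts each <tr> into a string array, repeating a cell's text once
    // per column it spans. A missing or invalid colspan counts as 1.
    public static List<string[]> Expand(string tableMarkup)
    {
        var rows = new List<string[]>();
        foreach (XElement tr in XElement.Parse(tableMarkup).Descendants("tr"))
        {
            var cells = new List<string>();
            foreach (XElement cell in tr.Elements()
                         .Where(e => e.Name == "td" || e.Name == "th"))
            {
                int span;
                string raw = (string)cell.Attribute("colspan");
                if (raw == null || !int.TryParse(raw, out span) || span < 1)
                {
                    span = 1;
                }
                cells.AddRange(Enumerable.Repeat(cell.Value.Trim(), span));
            }
            rows.Add(cells.ToArray());
        }
        return rows;
    }
}
```

Supporting rowspan as well would need a grid that remembers which positions in later rows are already occupied, which this sketch deliberately leaves out.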
