I am not sure the title suits my problem.
I have HTML like the following:
<table id="searchResultsTable" class="">
<tbody>
<tr class="searchResultsItem even ">
<td class="searchResultsPriceValue">
<div> 26.500 TL</div></td>
<td class="searchResultsTitleValue ">
<a class="classifiedTitle" href="xxxx"> some text</a>
</tr>
<tr class="searchResultsItem odd ">
.
//same as "searchResultsItem even "
.
</tr>
</tbody>
</table>
I am new to HtmlAgilityPack. I have succeeded in getting the price value of both the "searchResultsItem even" and the "searchResultsItem odd" rows.
I want to get the href value when the price is below or above some threshold. I can get an href, but it is always the one from the "searchResultsItem even" row. I want the even row's href when the even row's price matches my condition, and the odd row's href when the odd row's price matches.
My code is below:
foreach (HtmlNode node1 in doc.DocumentNode.SelectNodes("//table[@id='searchResultsTable']"))
{
    foreach (HtmlNode node2 in node1.SelectNodes("//td[@class='searchResultsPriceValue']"))
    {
        string price = node2.InnerText;
        price = price.Trim().Replace(".", String.Empty);
        price = price.Replace("TL", String.Empty);
        if (Convert.ToInt32(price) < 28000)
        {
            HtmlNode node3 = node1.SelectSingleNode(".//a[@class='classifiedTitle']");
            listBox1.Items.Add(node3.Attributes["href"].Value);
        }
    }
}
Thanks
Get the tr class name as an attribute value. Loop through rows first, then tds.
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@id='searchResultsTable']"))
{
    // ".//tr" is relative to this table; a bare "//tr" would search the whole document.
    foreach (HtmlNode tr in table.SelectNodes(".//tr"))
    {
        var @class = tr.GetAttributeValue("class", string.Empty);
        switch (@class) {
            // rest of your parsing
        }
    }
}
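The price cleanup from the question can also be isolated in a small helper. This is only a sketch using the standard library: ParsePrice is a hypothetical name, and it assumes the cell text always looks like " 26.500 TL", with "." as a thousands separator and a trailing "TL".

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of HtmlAgilityPack): turns text such as
// " 26.500 TL" into the integer 26500.
static int ParsePrice(string cellText)
{
    // Assumption: '.' groups thousands and ',' would be the decimal mark.
    var format = new NumberFormatInfo
    {
        NumberGroupSeparator = ".",
        NumberDecimalSeparator = ","
    };

    // Drop the currency suffix and the surrounding whitespace.
    string digits = cellText.Replace("TL", string.Empty).Trim();

    // AllowThousands lets the parser accept the group separator.
    return int.Parse(digits, NumberStyles.AllowThousands, format);
}

// Same comparison as in the question's code.
Console.WriteLine(ParsePrice(" 26.500 TL") < 28000);
```

Inside the row loop, the helper would be called with the InnerText of that row's price cell.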
I am developing add to read web browser data and store it into a dictionary.
During this process, I need to access data by ID, but the IDs are not unique on the page. The page looks like this:
<div id="ID1">
<tbody>
<tr>
<td id="1000" data-field="1">
text
</td>
</tr>
</tbody>
<div id="ID2">
<tbody>
<tr>
<td id="1000" data-field="2">
Some other text
</td>
</tr>
</tbody>
Both div elements are on the same page.
When I get an element by ID, it only gives me the first element, not the second one.
Here is my code:
HtmlElement myElements = webBrowser1.Document.GetElementById("ID2");
HtmlElement myElements2 = myElements.Document.GetElementById("1000");
if (myElements2.InnerText != null)
{
//Do something
}
How can I get the inner text of the second element by ID?
This is the best and easiest answer I came up with.
I figured out that data-field is unique within the page, so I looped through the td elements and compared each one's data-field value:
HtmlElement Buildingcontacts = webBrowser1.Document.GetElementById("ID2");
HtmlElementCollection ifiels = Buildingcontacts.Document.GetElementsByTagName("td");
foreach (HtmlElement element in ifiels)
{
    string datafieldx = element.GetAttribute("data-field");
    if (datafieldx == "2")
    {
        if (element.InnerText != null)
        {
            // do something
        }
    }
}
I am using Selenium to get data from a table on a web page.
I have HTML with this structure:
<table>
<tbody>
<tr>
<td>
<span>1</span>
<span>0</span>
<br>
<span>
<span>Good Luck</span>
<img src="/App_Themes/Resources/img/icon_tick.gif" width="3" height="7">
</span>
</td>
</tr>
<tr>
<td>
<b>Nowaday<br></b>
<p>hook<br>zp</p>
</td>
</tr>
</tbody>
</table>
I am using this code to get all the values in this table:
ReadOnlyCollection<IWebElement> lstTable = browser.FindElements(By.XPath("table/tbody/tr"));
foreach (IWebElement val in lstTable)
{
ReadOnlyCollection<IWebElement> lstTDElement = val.FindElements(By.XPath("td"));
}
But it shows a result like this:
10Good LuckNowadayhookzp
I want a result like this:
1 0 Good Luck Nowaday hookzp
That is, with whitespace between the tags.
I think something like this would have to be added:
<span>1</span>
<span> </span>
<span>0</span>
And:
<b>Nowaday<br></b>
<p> </p>
<p>hook<br>zp</p>
Try the following:
ReadOnlyCollection<IWebElement> lstTDElements = browser.FindElements(By.TagName("td"));
// Requires using System.Linq; the lambda parameter name must match its use (C# is case-sensitive).
var allTextList = lstTDElements.Select(el => el.Text).ToList();
string FinalString = allTextList.Aggregate(new System.Text.StringBuilder(), (sb, s) => sb.Append(" "+s)).ToString().Replace("\n", "");
Console.WriteLine(FinalString);
Edit: you can also select the individual elements together with the XPath union operator (|), as below:
ReadOnlyCollection<IWebElement> lstTable = browser.FindElements(By.XPath("table/tbody/tr"));
foreach (IWebElement val in lstTable)
{
    // The leading "./" keeps each path relative to the current row; "//td/..." would search the whole page.
    ReadOnlyCollection<IWebElement> lstTDElement = val.FindElements(By.XPath("./td/span | ./td/b | ./td/p"));
}
Hope it helps...:)
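As a sketch of the whitespace handling alone, with no Selenium dependency: assuming the text of each td has already been read into strings (the sample values below are made up and include the line breaks that a br can produce), string.Join puts one space between cells and a regular expression collapses any remaining whitespace runs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample data standing in for the Text of each td element.
var cellTexts = new List<string> { "1 0\nGood Luck", "Nowaday\nhook\nzp" };

// One space between cells.
string joined = string.Join(" ", cellTexts);

// Collapse every run of whitespace (including newlines) into a single space.
string normalized = Regex.Replace(joined, @"\s+", " ").Trim();

Console.WriteLine(normalized);
```

Replacing newlines with a space, rather than removing them, avoids gluing together words that were separated only by a line break.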
I have some HTML to parse (see below):
<div id="mailbox" class="div-w div-m-0">
<h2 class="h-line">InBox</h2>
<div id="mailbox-table">
<table id="maillist">
<tr>
<th>From</th>
<th>Subject</th>
<th>Date</th>
</tr>
<tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
<td>no-reply@somemail.net</td>
<td>
Hi, Welcome
</td>
<td>
<span title="2016-02-16 13:23:50 UTC">just now</span>
</td>
</tr>
<tr onclick="location='readmail.html?mid=T0wM6P'" style="font-weight: bold;">
<td>someone@outlook.com</td>
<td>
sa
</td>
<td>
<span title="2016-02-16 13:24:04">just now</span>
</td>
</tr>
</table>
</div>
</div>
I need to parse the links in the <tr onclick= attributes and the email addresses in the <td> tags.
So far I have managed to get only the first occurrence of an email/link from my HTML.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
Could someone show me how this is properly done? Basically, I want to take all the email addresses and links that are in those tags.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    Console.WriteLine(att.Value);
}
EDIT: I need to store the parsed values in a list of class instances, in pairs: the link to the email and the sender's email address.
public class ClassMailBox
{
public string From { get; set; }
public string LinkToMail { get; set; }
}
You can write the following code:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
// Requires using System.Collections.Generic;
var classMailBoxes = new List<ClassMailBox>();
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    ClassMailBox classMailbox = new ClassMailBox() { LinkToMail = att.Value };
    classMailBoxes.Add(classMailbox);
}
int currentPosition = 0;
foreach (HtmlNode tableDef in doc.DocumentNode.SelectNodes("//tr[@onclick]/td[1]"))
{
    classMailBoxes[currentPosition].From = tableDef.InnerText;
    currentPosition++;
}
To keep this code simple, I'm assuming some things:
The email is always in the first td inside the tr that carries the onclick attribute
Every tr with an onclick attribute contains an email
If those conditions don't hold, this code won't work: it could throw exceptions (SelectNodes returns null when nothing matches, so the foreach would fail with a NullReferenceException) or pair links with the wrong email addresses.
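The LinkToMail values collected this way still hold the whole onclick text, for example location='readmail.html?mid=welcome'. If only the URL is wanted, a regular expression can pull it out. This is a sketch, not part of the answer: ExtractLocation is a hypothetical helper, and it assumes the attribute always has the form location='...' with single quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the URL inside location='...',
// or an empty string when the value does not have that shape.
static string ExtractLocation(string onclick)
{
    Match match = Regex.Match(onclick, @"location\s*=\s*'([^']*)'");
    return match.Success ? match.Groups[1].Value : string.Empty;
}

Console.WriteLine(ExtractLocation("location='readmail.html?mid=welcome'"));
```

The result could then be assigned to LinkToMail in place of att.Value.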
Let's say I have this HTML:
<table class="c1">
<tr>
<td>Dog</td>
<td>Dog<td>
</tr>
<tr>
<td>Cat</td>
<td>Cat<td>
</tr>
</table>
What I tried:
HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[@class='c1']");
HtmlNodeCollection urls = node.SelectNodes("a");
node holds the table, but urls is null. Why?
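Use Descendants("a") instead of SelectNodes("a"). The XPath "a" only matches a elements that are direct children of the table, and SelectNodes returns null rather than an empty collection when nothing matches.
This should work:
// ToList() requires using System.Linq;
var node = doc.DocumentNode.SelectSingleNode("//table[@class='c1']");
var urls = node.Descendants("a").ToList();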
In the following HTML, I can parse the table element, but I don't know how to skip the th elements.
I want to get only the td elements, but when I try to use:
foreach (HtmlNode cell in row.SelectNodes("td"))
...I get an exception.
<table class="tab03">
<tbody>
<tr>
<th class="right" rowspan="2">first</th>
</tr>
<tr>
<th class="right">lp</th>
<th class="right">name</th>
</tr>
<tr>
<td class="right">1</td>
<td class="left">house</td>
</tr>
<tr>
<th class="right" rowspan="2">Second</th>
</tr>
<tr>
<td class="right">2</td>
<td class="left">door</td>
</tr>
</tbody>
</table>
My code:
var document = doc.DocumentNode.SelectNodes("//table");
string store = "";
if (document != null)
{
    foreach (HtmlNode table in document)
    {
        if (table != null)
        {
            foreach (HtmlNode row in table.SelectNodes("tr"))
            {
                store = "";
                foreach (HtmlNode cell in row.SelectNodes("th|td"))
                {
                    store = store + cell.InnerText + "|";
                }
                sw.Write(store);
                sw.WriteLine();
            }
        }
    }
}
sw.Flush();
sw.Close();
This method uses LINQ to query for HtmlNode instances that have the name td.
I also noticed that your output appears as val|val| (with a trailing pipe). This sample uses string.Join(pipe, array) as a cleaner way to avoid that trailing pipe, giving val|val.
using System.Linq;
// ...
var tablecollection = doc.DocumentNode.SelectNodes("//table");
string store = string.Empty;
if (tablecollection != null)
{
    foreach (HtmlNode table in tablecollection)
    {
        // For all rows with at least one child with the 'td' tag.
        foreach (HtmlNode row in table.DescendantNodes()
            .Where(desc =>
                desc.Name.Equals("tr", StringComparison.OrdinalIgnoreCase) &&
                desc.DescendantNodes().Any(child => child.Name.Equals("td",
                    StringComparison.OrdinalIgnoreCase))))
        {
            // Combine the child 'td' elements into an array, join with the pipe
            // to create the output in 'val|val|val' format.
            store = string.Join("|", row.DescendantNodes().Where(desc =>
                desc.Name.Equals("td", StringComparison.OrdinalIgnoreCase))
                .Select(desc => desc.InnerText));
            // You can probably get rid of the 'store' variable as it's
            // no longer necessary to store the value of the table's
            // cells over the iteration.
            sw.Write(store);
            sw.WriteLine();
        }
    }
}
sw.Flush();
sw.Close();
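The difference between appending a pipe after every cell and using string.Join can be seen in isolation. This is a minimal sketch with made-up cell values taken from the first data row of the table:

```csharp
using System;
using System.Collections.Generic;

// Sample cell texts (from the row containing "1" and "house").
var cells = new List<string> { "1", "house" };

// Appending the separator after every cell leaves a trailing pipe.
string appended = string.Empty;
foreach (string cell in cells)
{
    appended += cell + "|";
}

// string.Join places the separator only between items.
string joined = string.Join("|", cells);

Console.WriteLine(appended); // 1|house|
Console.WriteLine(joined);   // 1|house
```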
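Rows that contain only th cells have no td children, so row.SelectNodes("td") returns null for them, and the foreach then throws. Check for null before iterating, and keep the XPath relative to the row (a leading // would search the whole document):
HtmlNodeCollection cells = row.SelectNodes("./td");
if (cells != null)
{
    foreach (HtmlNode cell in cells)
    {
        // use cell.InnerText
    }
}
This skips the header-only rows and iterates only the td elements of each data row.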