I am developing an add-in that reads data from a WebBrowser control and stores it in a dictionary.
During this process I need to access elements by ID, but the IDs are not unique on the page. The page looks like this:
<div id="ID1">
  <tbody>
    <tr>
      <td id="1000" data-field="1">
        text
      </td>
    </tr>
  </tbody>
</div>
<div id="ID2">
  <tbody>
    <tr>
      <td id="1000" data-field="2">
        Some other text
      </td>
    </tr>
  </tbody>
</div>
Both div elements are on the same page.
When I call GetElementById, it returns only the first element, not the second one.
Here is my code:
HtmlElement myElements = webBrowser1.Document.GetElementById("ID2");
HtmlElement myElements2 = myElements.Document.GetElementById("1000");
if (myElements2.InnerText != null)
{
//Do something
}
How can I get the inner text of the second element by ID?
This is the best and easiest answer I came up with.
I figured out that data-field is a unique value on the page, so I looped through the td elements and compared each one's data-field attribute:
HtmlElement Buildingcontacts = webBrowser1.Document.GetElementById("ID2");
// Note: going through .Document searches every <td> on the page, not just those inside ID2.
HtmlElementCollection ifiels = Buildingcontacts.Document.GetElementsByTagName("td");
foreach (HtmlElement element in ifiels)
{
    string datafieldx = element.GetAttribute("data-field");
    if (datafieldx == "2")
    {
        if (element.InnerText != null)
        {
            // Do something
        }
    }
}
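If data-field were not unique, another option would be to search only inside the ID2 container. The sketch below relies on HtmlElement.GetElementsByTagName, which looks at the descendants of the element it is called on, so the duplicate id under ID1 never comes into play. ScopedLookup and FindCellInContainer are hypothetical names, and the code assumes a WinForms WebBrowser whose document has finished loading.

```csharp
using System.Windows.Forms;

static class ScopedLookup
{
    // Hypothetical helper: returns the first <td> with the given id that
    // sits inside the container element, or null when nothing matches.
    public static HtmlElement FindCellInContainer(HtmlDocument document, string containerId, string cellId)
    {
        HtmlElement container = document.GetElementById(containerId);
        if (container == null)
        {
            return null;
        }

        // Searches only the container's descendants, not the whole page.
        foreach (HtmlElement cell in container.GetElementsByTagName("td"))
        {
            if (cell.Id == cellId)
            {
                return cell;
            }
        }
        return null;
    }
}
```

It could be called as ScopedLookup.FindCellInContainer(webBrowser1.Document, "ID2", "1000"), followed by a null check before reading InnerText.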
I am just getting into traversing XML documents to learn how to use XPath.
I have stumbled onto an issue: every time I execute my XPath, it returns null, as if it didn't find anything.
I tried the same XPath in XMLQuire and it worked there.
using HtmlAgilityPack;

class Program
{
    private static string URL = "https://www.kijiji.ca/b-renovation-contracting-handyman/ontario/home-renovations/k0c753l9004";
    private static HtmlWeb client = new HtmlWeb();

    static void Main(string[] args)
    {
        var DOM = client.Load(URL); // //table/tbody/tr/td[@class = 'description']/p
        var Featured = DOM.DocumentNode.SelectNodes("//table[contains(@class,'top-feature')]/tbody/tr/td/a");
        foreach (var Listing in Featured)
        {
        }
    }
}
I commented out the other XPath I tried. I have tried both, and both return null. Why is that?
Here is the part of the DOM I want to access:
<table class="top-feature js-hover" data-ad-id="1299717863" data-vip-url="/v-renovation-contracting-handyman/sudbury/c-l-contracting-any-job-big-or-small/1299717863">
<tbody><tr>
<td class="watchlist">
<div class="watch js-hover p-vap-lnk-actn-addwtch" data-action="add" data-adid="1299717863" title="Click to add to My Favourites"><div class="icon"></div></div>
<input id="watchlistXsrf" name="ca.kijiji.xsrf.token" value="1527418405414.9b71d1309fdd8a315258ea5a3dac1a09e4a99ec7f32041df88307c46e26a5b1b" type="hidden">
</td>
<td class="image">
<div class="multiple-images"><img src="https://i.ebayimg.com/00/s/NjAwWDgwMA==/z/fXEAAOSwaZdZxTv~/$_2.JPG" alt="C.L. Contracting. Any job big or small."></div>
</td>
<td class="description">
<a href="/v-renovation-contracting-handyman/sudbury/c-l-contracting-any-job-big-or-small/1299717863" class="title ">
C.L. Contracting. Any job big or small.</a>
<p>
Contractor handyman home renovations and repairs. Contractor for Dollarama, Rexall, LaSenza and more. Fully licensed and insured. Able to do drywall, decks, framing, plumbing, flooring windows, ...</p>
<p class="details">
</p>
</td>
<td class="posted">
</td>
</tr>
</tbody></table>
My solution (I need help turning my XPath into a single expression instead of traversing with a bunch of nested loops):
private static string URL = "https://www.kijiji.ca/b-renovation-contracting-handyman/ontario/home-renovations/k0c753l9004";
private static HtmlWeb client = new HtmlWeb();

static void Main(string[] args)
{
    var DOM = client.Load(URL); // //table/tbody/tr/td[@class = 'description']/p
    var Featured = DOM.DocumentNode.SelectNodes("//table[contains(@class,'top-feature')]/tbody/tr/td/a");
    foreach (var table in DOM.DocumentNode.SelectNodes("//table[contains(@class, 'top-feature')]"))
    {
        Console.WriteLine($"Found: {table}");
        foreach (var rows in table.SelectNodes("tr"))
        {
            Console.WriteLine(rows);
            foreach (var cell in rows.SelectNodes("td[@class='description']/a"))
            {
                Console.WriteLine(cell.InnerText.Trim());
            }
        }
    }
    Console.ReadKey();
}
I've managed to fix it; however, I am still curious why this XPath works:
//table[contains(@class, 'top-feature')]/tr/td[@class='description']/a
And this one doesn't:
//table[contains(@class,'top-feature')]/tbody/tr/td/a
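As mentioned in the comments, the <tbody> element you see comes from the browser: it inserts <tbody> when it builds the DOM, and the developer tools show that DOM rather than the raw HTML the server sent. HtmlAgilityPack parses the raw HTML, which has no <tbody>.
If you inspect the InnerHtml property of your DOM object in the debugger at runtime, you can see the markup it actually received: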
<table class="regular-ad js-hover" data-ad-id=".." data-vip-url="..">
<tr>
<td class="watchlist">
...
</td>
<td class="image">
...
</td>
...
</tr>
</table>
There is no <tbody> element, so your XPath has to look like this:
DOM.DocumentNode.SelectNodes("//table[contains(@class,'top-feature')]/tr/td/a");
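To see the difference in isolation, here is a minimal sketch that runs the XPath variants against an inline string instead of the live page. It assumes the HtmlAgilityPack NuGet package is referenced. TbodyDemo, CountMatches and the sample markup (including the /ad/1 link) are made up for illustration.

```csharp
using HtmlAgilityPack;

static class TbodyDemo
{
    // Hypothetical raw markup, as a server might send it: no <tbody> inside the table.
    public const string SampleHtml =
        "<table class=\"top-feature js-hover\"><tr>" +
        "<td class=\"description\"><a href=\"/ad/1\" class=\"title\">Sample ad</a></td>" +
        "</tr></table>";

    // Returns how many nodes the XPath matches.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

With this sample, the /tbody/tr/td/a variant should match nothing, while /tr/td/a matches the link. A single expression such as //table[contains(@class,'top-feature')]//td[@class='description']/a uses the descendant axis, so it matches whether or not a <tbody> is present, which may also be a way to replace the nested loops.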
I have some HTML to parse (see below):
<div id="mailbox" class="div-w div-m-0">
<h2 class="h-line">InBox</h2>
<div id="mailbox-table">
<table id="maillist">
<tr>
<th>From</th>
<th>Subject</th>
<th>Date</th>
</tr>
<tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
<td>no-reply@somemail.net</td>
<td>
Hi, Welcome
</td>
<td>
<span title="2016-02-16 13:23:50 UTC">just now</span>
</td>
</tr>
<tr onclick="location='readmail.html?mid=T0wM6P'" style="font-weight: bold;">
<td>someone@outlook.com</td>
<td>
sa
</td>
<td>
<span title="2016-02-16 13:24:04">just now</span>
</td>
</tr>
</table>
</div>
</div>
I need to parse the links in the <tr onclick=...> attributes and the email addresses in the <td> tags.
So far I have managed to get only the first occurrence of an email/link from my HTML.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    Console.WriteLine(att.Value);
}
Could someone show me how this is done properly? Basically, I want to extract all the email addresses and links from the HTML that are in those tags.
EDIT: I need to store the parsed values in a list of class instances, in pairs: the link to the email and the sender's address.
public class ClassMailBox
{
public string From { get; set; }
public string LinkToMail { get; set; }
}
You can write the following code:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);
List<ClassMailBox> classMailBoxes = new List<ClassMailBox>();
// First pass: one entry per clickable row, holding its onclick value.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    ClassMailBox classMailbox = new ClassMailBox() { LinkToMail = att.Value };
    classMailBoxes.Add(classMailbox);
}
// Second pass: fill in the sender from the first cell of each clickable row.
int currentPosition = 0;
foreach (HtmlNode tableDef in doc.DocumentNode.SelectNodes("//tr[@onclick]/td[1]"))
{
    classMailBoxes[currentPosition].From = tableDef.InnerText;
    currentPosition++;
}
To keep this code simple, I'm assuming a few things:
The email address is always in the first td inside each tr that has an onclick attribute.
Every tr with an onclick attribute contains an email address.
If those conditions don't hold, this code won't work: it could throw exceptions (for instance a NullReferenceException, because SelectNodes returns null when nothing matches) or it could pair links with the wrong email addresses.
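One way to loosen those assumptions is to build each pair inside a single loop over the rows, so a link can never be matched with an address from a different row. This is only a sketch: it assumes the HtmlAgilityPack NuGet package, repeats the ClassMailBox class from the question so the block is self-contained, and uses a hypothetical MailBoxParser.Parse helper.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public class ClassMailBox
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

static class MailBoxParser
{
    // Hypothetical helper: returns one ClassMailBox per <tr onclick="..."> row.
    public static List<ClassMailBox> Parse(string html)
    {
        var result = new List<ClassMailBox>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when there are no matching rows.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr[@onclick]");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            // Relative XPath: the first <td> child of this particular row.
            HtmlNode firstCell = row.SelectSingleNode("td[1]");
            if (firstCell == null)
            {
                continue; // Skip rows without cells instead of mispairing.
            }

            result.Add(new ClassMailBox
            {
                From = firstCell.InnerText.Trim(),
                LinkToMail = row.GetAttributeValue("onclick", string.Empty)
            });
        }
        return result;
    }
}
```

Calling MailBoxParser.Parse(responseFromServer) would then return the list of pairs. LinkToMail still holds the full onclick value (for example location='readmail.html?mid=welcome'), so the URL itself would need to be extracted from that string separately.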
I have HTML that looks basically like the following:
....
<div id="a">
<table class="a1">
<tbody>
<tr>
<td><a href="a11.html>a11</a>
</tr>
<tr>
<td><a href="a12.html>a12</a>
</tr>
</tbody>
<table>
</div>
...
I am using the following C# code; however, I cannot retrieve the URL at this stage:
IWebElement baseTable = driver.FindElement(By.ClassName(TableID));
// gets all table rows
ICollection<IWebElement> rows = baseTable.FindElements(By.TagName("tr"));
// for every row
IWebElement matchedRow = null;
foreach(var row in rows)
{
Console.Write (row.FindElements(By.XPath("td/a")));
}
First of all, the markup you posted is invalid. Here is the corrected version:
<div id="a">
  <table class="a1">
    <tbody>
      <tr>
        <td><a href="a11.html">a11</a></td>
      </tr>
      <tr>
        <td><a href="a12.html">a12</a></td>
      </tr>
    </tbody>
  </table>
</div>
If you have only one anchor in each table row, you can use this code to retrieve the URL:
IWebElement baseTable = driver.FindElement(By.ClassName(TableID));
// Get all table rows
ICollection<IWebElement> rows = baseTable.FindElements(By.TagName("tr"));
// For every row, print the href of its anchor
foreach (var row in rows)
{
    Console.WriteLine(row.FindElement(By.XPath("td/a")).GetAttribute("href"));
}
You need to read the href attribute of the element you found. Otherwise, printing the result of row.FindElement(By.XPath("td/a")) only outputs the type name of the IWebElement implementation, because it is an object, not a string.
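The XPath in this line is valid; td/a is evaluated relative to row:
Console.Write (row.FindElements(By.XPath("td/a")));
If you want to make the relative context explicit, try
Console.Write (row.FindElements(By.XPath("./td/a")));
Avoid a leading slash, as in /td/a: that makes the path absolute (evaluated from the document root), so it would not match anything here.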
I want to retrieve data from an HTML document.
I am scraping data from a website and am almost done, but I ran into an issue when I tried to retrieve data from a table.
Here is the HTML:
<div id="middle_column">
<form action="url?" method="post" name="inquirydetail">
<input type="hidden" name="ServiceName" value="SurgeWebService">
<input type="hidden" name="TemplateName" value="Inpat_AvailableResponses.htm">
<input type="hidden" name="CurrentPage" value="inquirydetail">
<form method="post" action="url" name="ResponseSel" onSubmit="return EditPage(document.forms[3])">
<TABLE>
<tBody>
<table>
....
</table>
<table>
....
</table>
<table border="0" width="90%">
<tr>
<td width="10%" valign="bottom" class="content"> Service Number</td>
<td width="30%" valign="bottom" class="content"> Status</td>
<td width="50%" valign="bottom" class="content"> Status Date</td>
</tr>
<tr>
<td width="20%" bgcolor="white" class="subtitle">1</td>
<td width="40%" bgcolor="white" class="subtitle">Approved</td>
<td width="40%" bgcolor="white" class="subtitle">03042014</td>
</tr>
<tr>
<td></td>
</tr>
</table>
</tbody>
</TABle>
</div>
I have to retrieve the value of the Status field (here it is Approved) and write it to a SQL database.
There are many tables inside the form tag, and the tables do not have IDs. How can I get the correct table, row and cell?
Here is my code:
HtmlElement tBody = WB.Document.GetElementById("middle_column");
if (tBody != null)
{
string sURL = WB.Url.ToString();
int iTableCount = tBody.GetElementsByTagName("table").Count;
}
for (int i = 0; i <= iTableCount; i++)
{
HtmlElement tb=tBody.GetElementsByTagName("table")[i];
}
Something is wrong here
Please help with this.
Do you have any control over the page being displayed in the WebBrowser control? If you do, it is better to add an id attribute to the Status td. Then your life would be much easier.
Anyway, here's how you could search for a value within a table:
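HtmlElementCollection tables = this.WB.Document.GetElementsByTagName("table");
foreach (HtmlElement TBL in tables)
{
    // GetElementsByTagName limits each loop to actual rows and cells;
    // TBL.All would return every descendant element of the table.
    foreach (HtmlElement ROW in TBL.GetElementsByTagName("tr"))
    {
        foreach (HtmlElement CELL in ROW.GetElementsByTagName("td"))
        {
            // Now you are looping through all cells in each table.
            // Here you could use CELL.InnerText to search for "Status" or "Approved".
        }
    }
}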
But this is not a good approach, because you are looping through every table and every cell within each table to find your text. Keep this as a last resort.
I hope this gives you an idea.
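As a rough sketch of that search, the helper below looks for a cell whose text equals a header such as "Status" and returns the text of the cell at the same position in the following row. TableLookup and FindValueUnderHeader are hypothetical names. The code assumes a WinForms WebBrowser with a fully loaded document and a layout where the value row directly follows the header row, as in the question's table.

```csharp
using System;
using System.Windows.Forms;

static class TableLookup
{
    // Hypothetical helper: finds the first cell whose trimmed text equals
    // headerText and returns the text of the cell below it, or null.
    public static string FindValueUnderHeader(HtmlDocument document, string headerText)
    {
        foreach (HtmlElement row in document.GetElementsByTagName("tr"))
        {
            // Children holds only the row's direct cells, so cells of
            // nested tables are not mixed into the column index.
            HtmlElementCollection cells = row.Children;
            for (int i = 0; i < cells.Count; i++)
            {
                string text = cells[i].InnerText;
                if (text == null ||
                    !string.Equals(text.Trim(), headerText, StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                // The value is expected in the same column of the next row.
                HtmlElement nextRow = row.NextSibling;
                if (nextRow == null || i >= nextRow.Children.Count)
                {
                    return null;
                }
                string value = nextRow.Children[i].InnerText;
                return value == null ? null : value.Trim();
            }
        }
        return null;
    }
}
```

For the page in the question, TableLookup.FindValueUnderHeader(WB.Document, "Status") would be expected to return "Approved", provided the header cell's text trims to exactly "Status".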
I prefer using the dynamic type and the DomElement property, but this requires .NET 4 or later.
For tables, the main advantage is that you don't have to loop through everything. If you know the row and column you are looking for, you can target the data directly by row and column index instead of looping through the whole table.
The other big advantage is that you can use essentially the entire DOM and read more than just the contents of the table. Make sure you use lowercase property names, as required in JavaScript, even though you are writing C#.
HtmlElement myTableElement;
// Set myTableElement using any GetElement... method.
// Use a loop or square bracket index if the method returns an HtmlElementCollection.
dynamic myTable = myTableElement.DomElement;
for (int i = 0; i < myTable.rows.length; i++)
{
    for (int j = 0; j < myTable.rows[i].cells.length; j++)
    {
        string CellContents = myTable.rows[i].cells[j].innerText;
        // You are not limited to innerText; you have the whole DOM available.
        // Do something with the CellContents.
    }
}
Let's say I have this HTML:
<table class="c1">
<tr>
<td>Dog</td>
<td>Dog<td>
</tr>
<tr>
<td>Cat</td>
<td>Cat<td>
</tr>
</table>
What I tried:
HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[@class='c1']");
HtmlNodeCollection urls = node.SelectNodes("a");
The node variable holds the table, but urls is null. Why?
Use Descendants("a") instead of SelectNodes("a").
This should work:
var node = doc.DocumentNode.SelectSingleNode("//table[@class='c1']");
var urls = node.Descendants("a").ToList();
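The reason SelectNodes("a") returns null is that a bare a is a relative XPath matching only direct a children of the table, whereas the links sit deeper, inside tr and td. If you would rather stay with XPath, a sketch along these lines should also work. It assumes the HtmlAgilityPack NuGet package. LinkFinder, GetHrefs and the sample markup with its dog.html and cat.html links are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinkFinder
{
    // Hypothetical markup: same table shape as in the question, with links in the cells.
    public const string SampleHtml =
        "<table class=\"c1\">" +
        "<tr><td><a href=\"dog.html\">Dog</a></td></tr>" +
        "<tr><td><a href=\"cat.html\">Cat</a></td></tr>" +
        "</table>";

    // Returns the href of every <a> anywhere below the table with class c1.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@class='c1']");
        if (table == null)
        {
            return new List<string>();
        }

        // "a" alone would match only direct <a> children of <table>;
        // ".//a" matches <a> elements at any depth below it.
        HtmlNodeCollection links = table.SelectNodes(".//a");
        if (links == null)
        {
            return new List<string>();
        }
        return links.Select(a => a.GetAttributeValue("href", string.Empty)).ToList();
    }
}
```

The leading dot matters: .//a is evaluated relative to the table node, while //a would search the whole document again.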