I have a table like this:
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<th>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<th>Age: 92
</td>
</tr>
</table>
I want to use HTML Agility Pack to parse it. I have tried this code, to no avail:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
foreach (HtmlNode col in row.SelectNodes("//td"))
{
Response.Write(col.InnerText);
}
}
What am I doing wrong?
Why don't you just select the tds directly?
foreach (HtmlNode col in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
Response.Write(col.InnerText);
Alternately, if you really need the trs separately for some other processing, drop the // and do:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
foreach (HtmlNode col in row.SelectNodes("td"))
Response.Write(col.InnerText);
Of course that will only work if the tds are direct children of the trs but they should be, right?
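To make the difference concrete, here is a minimal sketch of how the three XPath forms behave when evaluated on a single row. The sample markup is an assumption (a cleaned-up two-person version of the table in the question), and the class and method names are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Assumed sample markup: a valid version of the table from the question.
    const string Html = @"
<table id='table2'>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Mario</td><td>Age: 78</td></tr>
  <tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

    // Evaluates an XPath expression on the second row and counts the matches.
    public static int CountFromSecondRow(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        var row = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr")[1];
        // SelectNodes returns null (not an empty collection) when nothing matches.
        return row.SelectNodes(xpath)?.Count ?? 0;
    }

    public static void Main()
    {
        Console.WriteLine(CountFromSecondRow("//td"));  // searches the whole document
        Console.WriteLine(CountFromSecondRow("td"));    // direct children of this row only
        Console.WriteLine(CountFromSecondRow(".//td")); // descendants of this row only
    }
}
```

A leading // always starts at the document root, even when it is called on a row node, which is why the nested loop in the question visits every cell once per row.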
EDIT:
var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");
for (int ii = 0; ii < cols.Count; ii=ii+2)
{
string name = cols[ii].InnerText.Trim();
int age = int.Parse(cols[ii+1].InnerText.Split(' ')[1]);
}
There's probably a more impressive way to do this with LINQ.
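One possible LINQ version is sketched below. It assumes the markup is valid (each data row has exactly two td cells) and that the age cell always has the form "Age: NN"; the LinqTableDemo and ParsePeople names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinqTableDemo
{
    // Returns one (Name, Age) pair per data row of the table with id 'table2'.
    public static List<(string Name, int Age)> ParsePeople(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null) return new List<(string Name, int Age)>(); // no such table

        return rows
            .Select(row => row.Elements("td").ToList())
            .Where(cells => cells.Count == 2) // skips the header row, which only has th cells
            .Select(cells => (
                Name: cells[0].InnerText.Trim(),
                Age: int.Parse(cells[1].InnerText.Replace("Age:", "").Trim())))
            .ToList();
    }

    public static void Main()
    {
        // Assumed sample input, not the exact markup from the question.
        const string html = @"<table id='table2'>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Mario</td><td>Age: 78</td></tr>
<tr><td>Jane</td><td>Age: 67</td></tr>
</table>";

        foreach (var person in ParsePeople(html))
            Console.WriteLine(person.Name + " is " + person.Age);
    }
}
```

Working row by row avoids the index arithmetic of the flat loop, at the cost of silently skipping rows that do not have exactly two cells.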
I've run the code and it displays only the Names, which is correct, because the Ages are defined using invalid HTML: <th></td> (probably a typo).
By the way, the code can be simplified to only one loop:
foreach (var cell in doc.DocumentNode.SelectNodes("//table[@id='table2']/tr/td"))
{
Response.Write(cell.InnerText);
}
Here's the code I used to test: http://pastebin.com/euzhUAAh
I had to provide the full XPath. I got it from Firebug, following a suggestion by @Coda (https://stackoverflow.com/a/3104048/1238850), and I ended up with this code:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr/td/table[@id='table2']/tbody/tr"))
{
HtmlNodeCollection cells = row.SelectNodes("td");
for (int i = 0; i < cells.Count; ++i)
{
if (i == 0)
{ Response.Write("Person Name : " + cells[i].InnerText + "<br>"); }
else {
Response.Write("Other attributes are: " + cells[i].InnerText + "<br>");
}
}
}
I am sure it can be written way better than this but it is working for me now.
I did the same project with this:
private List<PhrasalVerb> ExtractVerbsFromMainPage(string content)
{
var verbs = new List<PhrasalVerb>();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(content);
var rows = doc.DocumentNode.SelectNodes("//table[@class='idioms-table']//tr");
rows.RemoveAt(0); //remove header
foreach (var row in rows)
{
var cols = row.SelectNodes("td");
verbs.Add(new PhrasalVerb {
Uid = Guid.NewGuid(),
Name = cols[0].InnerHtml,
Definition = cols[1].InnerText,
Count = int.TryParse(cols[2].InnerText, out var count) ? count : 0 // parse once; fall back to 0
});
}
return verbs;
}
private List<Table1> getTable1Data(string result)
{
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(result);
var table1 = htmlDoc.DocumentNode.SelectNodes("//table").First();
var tbody = table1.ChildNodes["tbody"];
var lst = new List<Table1>();
foreach (var row in tbody.ChildNodes.Where(r => r.Name == "tr"))
{
var tbl1 = new Table1();
var columnsArray = row.ChildNodes.Where(c => c.Name == "td").ToArray();
for (int i = 0; i < columnsArray.Length; i++)
{
if (i == 0)
tbl1.Course = columnsArray[i].InnerText.Trim();
if (i == 1)
tbl1.Count = columnsArray[i].InnerText.Trim();
if (i == 2)
tbl1.Correct = columnsArray[i].InnerText.Trim();
}
lst.Add(tbl1);
}
return lst;
}
public class Table1
{
public string Course { get; set; }
public string Count { get; set; }
public string Correct { get; set; }
}
Related
I want to add columns to my DataTable from my <th> tags using foreach.
I have a problem with it: I don't understand why I get a null reference exception. My HTML file doesn't contain any empty tags.
Fragment of my C# code:
DataTable dt = new DataTable();
int i = 0;
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
foreach (var row in table.SelectNodes("tr"))
{
var headers = row.SelectNodes("th");
foreach (var el in headers)
{
if (headers != null)
{
dt.Columns.Add(headers[i].InnerText);
i++;
}
}
}
Here is a fragment of my HTML file:
<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>
This works for your HTML:
var str = @"<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>inserted_at</th><th>DisplayName</th><th>DistinguishedName</th><th>Enabled</th><th>GivenName</th><th>HomeDirectory</th><th>Manager</th><th>Name</th><th>ObjectClass</th><th>ObjectGUID</th><th>SamAccountName</th><th>Surname</th><th>UserPrincipalName</th><th>RowError</th><th>RowState</th><th>Table</th><th>ItemArray</th><th>HasErrors</th></tr>";
var hdoc = new HtmlAgilityPack.HtmlDocument();
hdoc.LoadHtml(str);
var headerElements = hdoc.DocumentNode.Descendants("th");
foreach(var headerElement in headerElements)
{
Console.WriteLine(headerElement.InnerText);
}
I also need to select the headers from a specific table, so this is what actually worked for me:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]");
var headerElements = table.Descendants("th");
foreach (var headerElement in headerElements)
{
dt.Columns.Add(headerElement.InnerText, typeof(string));
}
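The exception in the question most likely comes from SelectNodes returning null, rather than an empty collection, for any row that has no th cells (for example, the data rows). The sketch below keeps the row-by-row loop and adds that guard; the BuildColumns name and the sample markup are assumptions made for illustration.

```csharp
using System;
using System.Data;
using HtmlAgilityPack;

public static class HeaderColumnsDemo
{
    // Adds one string column per th cell found in the rows of the first table.
    public static DataTable BuildColumns(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var dt = new DataTable();

        var rows = doc.DocumentNode.SelectNodes("//table[1]/tr");
        if (rows == null) return dt; // no table or no rows

        foreach (var row in rows)
        {
            var headers = row.SelectNodes("th");
            if (headers == null) continue; // rows without th cells yield null, not an empty list

            foreach (var header in headers)
                dt.Columns.Add(header.InnerText.Trim(), typeof(string));
        }
        return dt;
    }

    public static void Main()
    {
        // Assumed sample input: one header row followed by one data row.
        const string html = "<table><tr><th>id</th><th>Name</th></tr><tr><td>1</td><td>Ann</td></tr></table>";
        foreach (DataColumn column in BuildColumns(html).Columns)
            Console.WriteLine(column.ColumnName);
    }
}
```

Note that the null check has to happen before the foreach over headers; checking inside the loop body, as in the question, is too late.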
How can I find out the sixth column in this html table (using for example HTML Agility Pack or Regex)?
<tr><td>So, 22.05.16</td><td>1</td><td>D</td><td>E</td><td>190</td><td>DifferentThings</td></tr>
The last column can contain anything, and this is only one row of many, so I want the entire last column with every entry.
Edit:
If there is a blank
<td></td>
in the 6th row, I always get a
System.NullReferenceException
What should I do now?
innerTextOfLastCell = lastTdCell.InnerText.Trim();
is causing the error
Edit:
Solved it!
Just typed:
if (lastTdCell != null) //Not lastTdCell.InnerText.Trim()!
{
innerTextOfLastCell = lastTdCell.InnerText.Trim();
s = s + innerTextOfLastCell + "\n";
run.Text = s;
}
else
{
s = s + "\n\n";
run.Text = s;
}
Using HtmlAgilityPack, this should work regardless of the number of columns the table has.
var html = new HtmlDocument();
html.LoadHtml("<table><tr><td>So, 22.05.16</td><td>1</td><td>D</td><td>E</td><td>190</td><td>DifferentThings</td></tr></table>");
var root = html.DocumentNode;
var tableNodes = root.Descendants("table");
var innerTextOfLastCell = string.Empty;
foreach (var tbs in tableNodes.Select((tbNodes, i) => new { tbNodes = tbNodes, i = i }))
{
var trs = tbs.tbNodes.Descendants("tr");
foreach (var tr in trs.Select((trNodes, j) => new { trNodes = trNodes, j = j }))
{
var tds = tr.trNodes.Descendants("td");
var lastTdCell = tds.LastOrDefault();
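// LastOrDefault returns null for rows without any td (e.g. header rows), so guard against it.
innerTextOfLastCell = lastTdCell?.InnerText.Trim() ?? string.Empty;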
}
}
[edit]
If you did want to use the other option from How to get the value from a specific cell C# Html-Agility-Pack, then you could try the following code:
HtmlNode lastTdnode = root.SelectSingleNode("//table[1]/tr[last()]/td[last()]");
This will give you the last <td> from the last <tr> from the first <table>
If you wanted the sixth cell, you can use something like this; for this table it gives the same result as above:
HtmlNode sixthTdNode = root.SelectSingleNode("//table[1]/tr[last()]/td[6]");
If you wanted to mix it up even more you can try this:
HtmlNode nthTdNode = root.SelectSingleNode("//table[1]/tr[last()]/td[" + 6 + "]");
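Since the question asks for the whole column rather than a single cell, the same positional predicate can be used with SelectNodes to collect the nth cell of every row. This is only a sketch: the Column helper and the sample rows are hypothetical, and rows with fewer cells than the requested index are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ColumnDemo
{
    // Returns the trimmed text of the nth td (1-based, as in XPath) of every row in the first table.
    public static List<string> Column(string html, int oneBasedIndex)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // td[n] is evaluated per parent row, so this picks one cell from each tr.
        var cells = doc.DocumentNode.SelectNodes($"//table[1]//tr/td[{oneBasedIndex}]");
        return cells == null
            ? new List<string>() // SelectNodes returns null when nothing matches
            : cells.Select(c => c.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Assumed sample input: the row from the question plus a row with an empty last cell.
        const string html = "<table>"
            + "<tr><td>So, 22.05.16</td><td>1</td><td>D</td><td>E</td><td>190</td><td>DifferentThings</td></tr>"
            + "<tr><td>Mo, 23.05.16</td><td>2</td><td>D</td><td>E</td><td>191</td><td></td></tr>"
            + "</table>";

        foreach (var value in Column(html, 6))
            Console.WriteLine("[" + value + "]");
    }
}
```

Replacing the index with last() in the XPath string would select the last cell of each row instead of a fixed position.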
I am trying to loop through the contents of my client-side HTML table on my C# server side. Setting the HTML table to runat="server" is not an option because it conflicts with the JavaScript in use.
I use AJAX to pass my client-side HTML table's InnerHtml to my server-side method. I thought I would be able to simply create an HtmlTable variable in C# and set its InnerHtml property, but I quickly realized this is not possible because I got the error {"'HtmlTable' does not support the InnerHtml property."}
For simplicity, let's say the InnerHtml string passed from client to server is:
string myInnerHtml = "<colgroup>col width="100"/></colgroup><tbody><tr><td>hello</td></tr></tbody>"
I followed a post from another stack overflow question but can not quite get it working.
Can someone point out my errors?
string myInnerHtml = "<colgroup>col width="100"/></colgroup><tbody><tr><td>hello</td></tr></tbody>"
HtmlTable table = new HtmlTable();
System.Text.StringBuilder sb = new System.Text.StringBuilder(myInnerHtml);
System.IO.StringWriter tw = new System.IO.StringWriter(sb);
HtmlTextWriter hw = new HtmlTextWriter(tw);
table.RenderControl(hw);
for (int i = 0; i < table.Rows.Count; i++)
{
for (int c = 0; c < table.Rows[i].Cells.Count; i++)
{
// get cell contents
}
}
Hope this helps:
string myInnerHtml = @"<table>
<colgroup>col width='100'/></colgroup>
<tbody>
<tr>
<td>hello 1</td><td>hello 2</td>
</tr>
<tr>
<td>hello 3</td><td>hello 4</td>
</tr>
</tbody>
</table>";
DataSet ds = new DataSet();
ds.ReadXml(new XmlTextReader(new StringReader(myInnerHtml)));
var tr = ds.Tables["tr"];
var td = ds.Tables["td"];
foreach (DataRow trRow in tr.Rows)
foreach(DataRow tdRow in td.AsEnumerable().Where(x => (int)x["tr_Id"] == (int)trRow["tr_Id"]))
Console.WriteLine( tdRow["tr_Id"] + " | " + tdRow["td_Text"]);
output
=========================
//0 | hello 1
//0 | hello 2
//1 | hello 3
//1 | hello 4
Ended up using the HTMLAgilityPack. I found it a lot more readable and manageable for diverse situations.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<html><body><table>" + innerHtml + "</table></body></html>");
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
{
foreach (HtmlNode row in table.SelectNodes(".//tr")) // ".//" keeps the search inside this table
{
foreach (HtmlNode cell in row.SelectNodes("td"))
{
var divExists = cell.SelectNodes("div");
if (divExists != null)
{
foreach (HtmlNode div in cell.SelectNodes("div"))
{
string test = div.InnerText + div.Attributes["data-id"].Value;
}
}
}
}
}
I'm trying to download data from a website into a DataTable. The problem is that I cannot access the right node, apparently because of blank spaces. Here is my code so far:
public static DataTable downloadtable()
{
DataTable dt = new DataTable();
string htmlCode = "";
using (WebClient client = new WebClient())
{
client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
}
//this is just to check the file structure from text file
System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt");
file.WriteLine(htmlCode);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
dt = new DataTable();
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
{
//This is the problem name where I get the error
foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.=' 00-01 ']"))
{
foreach (var cell in row.SelectNodes("//td"))
{
//this is to check for correct result, final result would be to dump it into datatable
Console.WriteLine(cell.InnerText);
}
}
}
return dt;
}
I'm trying to download the hourly prices from the link in the code, but it seems to fail because of trailing blanks (I think).
Is there something like a LIKE statement for matching a node's text? Or can you remove the trailing blanks?
I believe your problem is that you are trying to retrieve td's from inside a td node which obviously doesn't have more td's.
<tr>
<td class="title"> 00-01 </td>
<td class="spacer"></td>
<td class="r">€/MWh</td>
<td class="spacer"></td>
<td>35.34</td>
<td class="spacer"></td>
<td>34.02</td>
<td class="spacer"></td>
<td>34.02</td>
</tr>
So if you try to iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), it will contain no td's inside of it.
If you want all the rows starting from 00-01 you can use this one:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
foreach (var cell in row.SelectNodes("./tr/td"))
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
If you want only the 00-01 row you can use this one:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
if (row.InnerText.Trim() == "00-01")
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
}
Or you can use it as:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
I am trying to get some data from an HTML string using HTML Agility Pack.
Each row I am trying to get the data from has InnerHtml like this:
<td class="street">Riksdagen</td>
<td class="number"> </td>
<td class="number"> </td>
<td class="postalcode">100 12</td>
<td class="locality">Stockholm</td>
<td class="region_code">018001</td>
<td class="county">Stockholm</td>
<td class="namnkommun">Stockholm</td>
How can I assign each class to the right addressDataModel property?
var row = doc.DocumentNode.SelectNodes("//*[@id='thetable']/tr");
foreach (var rowItem in row)
{
var addressDataModel = new AddressDataModel
{
street = rowItem.FirstChild.InnerText,
zipCodeFrom = // Next item,
zipCodeTo = // Next item,
zipCode = // Next item,
locality = // Next item,
regionCode = // Next item,
state = // Next item,
county = // Next item
};
}
You can write something like this (make sure the node exists before using its InnerText property):
var addressDataModel = new AddressDataModel
{
street = rowItem.SelectSingleNode("./td[@class='street']").InnerText,
zipCodeFrom = // Next item,
zipCodeTo = // Next item,
zipCode = // Next item,
locality = // Next item,
regionCode = // Next item,
state = // Next item,
county = rowItem.SelectSingleNode("./td[@class='county']").InnerText
};
Reference: http://www.w3schools.com/xpath/xpath_syntax.asp
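The "make sure the node exists" advice can be wrapped in a small helper so that a missing cell yields an empty string instead of a NullReferenceException. This is a sketch under assumptions: CellText is a hypothetical helper name, the sample row is shortened, and it matches the class attribute exactly (a cell with several classes would not match).

```csharp
using System;
using HtmlAgilityPack;

public static class AddressRowDemo
{
    // Returns the trimmed text of the td with the given class, or "" if the row has no such cell.
    public static string CellText(HtmlNode row, string cssClass)
    {
        var cell = row.SelectSingleNode($"./td[@class='{cssClass}']");
        return cell?.InnerText.Trim() ?? string.Empty; // SelectSingleNode returns null when nothing matches
    }

    public static void Main()
    {
        // Assumed sample input: a shortened version of the row from the question.
        const string html = "<table id='thetable'><tr>"
            + "<td class='street'>Riksdagen</td>"
            + "<td class='postalcode'>100 12</td>"
            + "<td class='county'>Stockholm</td>"
            + "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var row in doc.DocumentNode.SelectNodes("//*[@id='thetable']/tr"))
        {
            Console.WriteLine(CellText(row, "street"));
            Console.WriteLine(CellText(row, "postalcode"));
            Console.WriteLine(CellText(row, "locality")); // not present in this row: prints an empty line
        }
    }
}
```

Each property of the model could then be assigned as, for example, street = CellText(rowItem, "street").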
You can also use this approach if you don't want to use XPath:
HtmlAgilityPack.HtmlDocument htmlContent = new HtmlAgilityPack.HtmlDocument();
htmlContent.LoadHtml(htmlCode);
if (htmlContent.DocumentNode != null)
{
foreach (HtmlNode n in htmlContent.DocumentNode.Descendants("div"))
{
if (n.HasAttributes && n.Attributes["class"] != null)
{
if (n.Attributes["class"].Value == "className")
{
// Do something
}
}
}
}