I have a string which contains html code from a webpage. There's a table in the code I'm interested in. I want to parse the numbers present in the table cells and put them in textboxes, each number in its own textbox. Here's the table:
<table class="tblSkills">
<tr>
<th class="th_first">Strength</th><td class="align_center">15</td>
<th>Passing</th><td class="align_center">17</td>
</tr>
<tr>
<th class="th_first">Stamina</th><td class="align_center">16</td>
<th>Crossing</th><td class="align_center"><img src='/pics/star.png' alt='20' title='20' /></td>
</tr>
<tr>
<th class="th_first">Pace</th><td class="align_center"><img src='/pics/star_silver.png' alt='19' title='19' /></td>
<th>Technique</th><td class="align_center">16</td>
</tr>
<tr>
<th class="th_first">Marking</th><td class="align_center">15</td>
<th>Heading</th><td class="align_center">10</td>
</tr>
<tr>
<th class="th_first">Tackling</th><td class="align_center"><span class='subtle'>5</span></td>
<th>Finishing</th><td class="align_center">15</td>
</tr>
<tr>
<th class="th_first">Workrate</th><td class="align_center">16</td>
<th>Longshots</th><td class="align_center">8</td>
</tr>
<tr>
<th class="th_first">Positioning</th><td class="align_center">18</td>
<th>Set Pieces</th><td class="align_center"><span class='subtle'>2</span></td>
</tr>
</table>
As you can see there are 14 numbers. To make things worse numbers like 19 and 20 are replaced by images and numbers lower than 6 have a span class.
I know I could use HTML Agility Pack or something similar, but I'm not yet experienced enough to figure out how to do it by myself, so I need your help.
Your HTML sample also happens to be well-formed XML, so you could use any of .NET's XML reading/parsing techniques.
Using LINQ to XML in C#:
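// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
var doc = XDocument.Parse(yourHtml);

// Skill names, in document order.
var properties = new List<string>(
    from th in doc.Descendants("th")
    select th.Value);

// Skill values: star images carry the number in alt; plain and <span> cells carry it as text.
var values = new List<int>(
    from td in doc.Descendants("td")
    let img = td.Element("img")
    let textValue = img == null ? td.Value : img.Attribute("alt").Value
    select int.Parse(textValue));

// Pair each name with the value at the same position.
var dict = new Dictionary<string, int>();
for (var i = 0; i < properties.Count; i++)
{
    dict[properties[i]] = values[i];
}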
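Pairing names and values by list position works for this table, but it silently misaligns if a cell is ever missing. A minimal sketch of an alternative, assuming the same well-formed markup, looks up the `<td>` that directly follows each `<th>`. The shortened sample string and the `skills` name are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Shortened copy of the table from the question (two rows only).
var html = @"<table class=""tblSkills"">
<tr>
<th class=""th_first"">Stamina</th><td class=""align_center"">16</td>
<th>Crossing</th><td class=""align_center""><img src='/pics/star.png' alt='20' title='20' /></td>
</tr>
<tr>
<th class=""th_first"">Tackling</th><td class=""align_center""><span class='subtle'>5</span></td>
<th>Finishing</th><td class=""align_center"">15</td>
</tr>
</table>";

var doc = XDocument.Parse(html);
var skills = new Dictionary<string, int>();

foreach (var th in doc.Descendants("th"))
{
    // The value cell is the first <td> sibling after this <th>.
    var td = th.ElementsAfterSelf("td").FirstOrDefault();
    if (td == null) continue;

    // Star images carry the number in alt; plain and <span> cells carry it as text.
    var text = td.Element("img")?.Attribute("alt")?.Value ?? td.Value;

    // TryParse skips cells that do not contain a number instead of throwing.
    if (int.TryParse(text.Trim(), out var number))
    {
        skills[th.Value.Trim()] = number;
    }
}

foreach (var pair in skills)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

Each dictionary entry can then be assigned to its own textbox, for example by naming the textboxes after the skills.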
I am attempting to use HtmlAgilityPack package to find each of the href links within td tags throughout an entire html page. The trick is that these tables start deep down into the html structure. I noticed with HtmlAgilityPack you can't just say get all tds that are within trs on a page. There is a parent div wrapped around each table with a class on it "table-group" that I am not showing in my sample below. Maybe I can use that as a starting point? The biggest trouble that I am dealing with is that there are several parent elements above everything in my sample below, but I want to skip all of that and start here.
Here is a sample of the structure I am trying to navigate:
<table>
<thead>
</thead>
<tbody>
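<tr>
<td><a href="https://path-to-pdf1">Link 1</a></td>
<td>1</td>
</tr>
<tr>
<td><a href="https://path-to-pdf2">Link 2</a></td>
<td>2</td>
</tr>
<tr>
<td><a href="https://path-to-pdf3">Link 3</a></td>
<td>3</td>
</tr>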
</tbody>
</table>
<table>
<thead>
</thead>
<tbody>
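<tr>
<td><a href="https://path-to-pdf4">Link 4</a></td>
<td>4</td>
</tr>
<tr>
<td><a href="https://path-to-pdf5">Link 5</a></td>
<td>5</td>
</tr>
<tr>
<td><a href="https://path-to-pdf6">Link 6</a></td>
<td>6</td>
</tr>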
</tbody>
</table>
I would like my end result to be:
https://path-to-pdf1
https://path-to-pdf2
https://path-to-pdf3
https://path-to-pdf4
https://path-to-pdf5
https://path-to-pdf6
Here is what I have tried:
var html = @"https://myurl.com";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td/a[0]");
foreach (var item in nodes)
{
Console.WriteLine(item.Attributes["href"].Value);
}
Console.ReadKey();
Modify
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td/a[0]");
to
var nodes = htmlDoc.DocumentNode.SelectNodes("//table/tbody/tr/td[1]/a");
Then you will get the result you want. XPath positions are 1-based, so a[0] never matches anything, whereas td[1]/a selects the links in the first cell of each row. See the XPath documentation for more details.
I tried this in an MVC project with the same HTML file.
Update: I copied the HTML into a local page and retrieved the nodes successfully.
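To illustrate the 1-based indexing, here is a minimal sketch that runs both XPath expressions against a small well-formed sample. It uses `System.Xml` from the base class library instead of HtmlAgilityPack so that it runs without extra packages; positional predicates behave the same way in both. The sample markup and its `href` values are placeholders. As far as I know, HtmlAgilityPack's `SelectNodes` returns null rather than an empty list when nothing matches, so with the original query the `foreach` would fail rather than print nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample mirroring the structure in the question.
var xml = @"<table><tbody>
  <tr><td><a href='https://path-to-pdf1'>Link 1</a></td><td>1</td></tr>
  <tr><td><a href='https://path-to-pdf2'>Link 2</a></td><td>2</td></tr>
</tbody></table>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// a[0] can never match, because position() starts at 1.
var zeroBased = doc.SelectNodes("//table/tbody/tr/td/a[0]")!;

// td[1] picks the first cell of each row, then every <a> inside it.
var firstCellLinks = doc.SelectNodes("//table/tbody/tr/td[1]/a")!;

var hrefs = new List<string>();
foreach (XmlElement a in firstCellLinks)
{
    hrefs.Add(a.GetAttribute("href"));
}

Console.WriteLine($"a[0] matched {zeroBased.Count} nodes");
Console.WriteLine(string.Join(Environment.NewLine, hrefs));
```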
I have multiple tables, and the Location value appears at a different cell index in each one.
How can I get the Location value when the previous cell's text is "Location" as I loop through a table? In the example below it is cells[7], but in another table it is cells[9]. Basically: find the cell whose text is "Location" and get the inner text of the next cell.
Html Table:
<table class="tbfix FieldsTable">
<tbody>
<tr>
<td class="name">Last Movement</td>
<td class="value">Port Exit</td>
</tr>
<tr>
<td class="name">Date</td>
<td class="value">26/06/2017 00:00:00</td>
</tr>
<tr>
<td class="name">From</td>
<td class="value">HAMBURGE</td>
</tr>
<tr>
<td class="name">Location</td>
<td class="value">EUROGATE HAMBURG</td>
</tr>
<tr>
<td class="name">E/F</td>
<td class="value">E</td>
</tr>
</tbody>
</table>
Controller Loop Through:
foreach (var eachNode in driver.FindElements(By.XPath("//table[contains(descendant::*, 'Last Movement')]")))
{
var cells = eachNode.FindElements(By.XPath(".//td"));
cd = new Detail();
for (int i = 0; i < cells.Count(); i++)
{
cd.ActionType = cells[1].Text.Trim();
string s = cells[3].Text.Trim();
DateTime dt = Convert.ToDateTime(s);
if (_minDate > dt) _minDate = dt;
cd.ActionDate = dt;
}
}
In your foreach loop you could use this:
var location = eachNode.FindElement(By.XPath(".//td[contains(text(),'Location')]/following-sibling::td"));
Assuming your data is always structured like that, I would loop over all the tr elements and add the data to a dictionary.
Try something like this:
Dictionary<string, string> tableData = new Dictionary<string, string>();
var trNodes = eachNode.FindElements(By.TagName("tr"));
foreach (var trNode in trNodes)
{
    var name = trNode.FindElement(By.CssSelector(".name")).Text.Trim();
    var value = trNode.FindElement(By.CssSelector(".value")).Text.Trim();
    tableData.Add(name, value);
}
// Dictionary keys are case-sensitive by default, so match the cell text exactly.
var location = tableData["Location"];
You would have to add validation and checks for the dictionary and the structure but that is the general idea.
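The validation mentioned above could look roughly like the following sketch. It leaves Selenium out and works on plain (name, value) pairs, so the `rows` list is a hypothetical stand-in for the text read from each `<tr>`. The choices it makes (case-insensitive keys, keeping the first value on duplicates, a fallback string) are assumptions, not requirements:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows standing in for the name/value text read from each <tr>.
var rows = new List<(string Name, string Value)>
{
    ("Last Movement", "Port Exit"),
    ("Date", "26/06/2017 00:00:00"),
    ("Location", "EUROGATE HAMBURG"),
    ("Location", "DUPLICATE ROW"),
    ("", "row without a label"),
};

// Case-insensitive keys, so "location" and "Location" resolve to the same entry.
var tableData = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

foreach (var (name, value) in rows)
{
    // Skip rows that have no label at all.
    if (string.IsNullOrWhiteSpace(name)) continue;

    // TryAdd keeps the first value and ignores later duplicates instead of throwing.
    tableData.TryAdd(name.Trim(), value.Trim());
}

// TryGetValue avoids a KeyNotFoundException when a table has no Location row.
string location = tableData.TryGetValue("location", out var found) ? found : "(unknown)";

Console.WriteLine(location);
```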
<table style="width:80%">
<tr align="Left">
<th>Member_No</th>
<th>Member_Name</th>
<th>Amount</th>
<th>Entry_Type</th>
</tr>
<tr>
<td>A-251</td>
<td>Alpesh</td>
<td>2000</td>
<td>Credit</td>
</tr>
<tr>
<td>A-252</td>
<td>Haresh</td>
<td>2000</td>
<td>Debit</td>
</tr>
<tr>
<td>A-253</td>
<td>Suresh</td>
<td>2000</td>
<td>Debit</td>
</tr>
<tr>
<td>A-254</td>
<td>Johny</td>
<td>5000</td>
<td>Credit</td>
</tr>
<tr>
<td>A-255</td>
<td>Vishal</td>
<td>1000</td>
<td>Debit</td>
</tr>
</table>
I am using a DataGridView to display the table data above.
When I click the Total button, I need the total of the Amount column shown in a TextBox or Label.
When Entry_Type is Credit, the amount should be added,
and when Entry_Type is Debit, it should be subtracted.
Please help me with this.
C# WinForms
Try this:
var total = dataGridView1.Rows.Cast<DataGridViewRow>()
    .Where(x => !x.IsNewRow) // skip the empty "new row" placeholder, whose cell values are null
    .Sum(x => x.Cells[3].Value.ToString() == "Credit"
        ? int.Parse(x.Cells[2].Value.ToString())
        : -int.Parse(x.Cells[2].Value.ToString()))
    .ToString();
textBox1.Text = total;
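As a sanity check of the sign logic, here is a minimal sketch that applies the same Credit/Debit rule to the five rows from the table in the question, without a DataGridView. The tuple layout, the use of `decimal`, and the case-insensitive comparison are assumptions made for illustration:

```csharp
using System;
using System.Linq;

// Amount and Entry_Type of the five rows in the question's table.
var entries = new[]
{
    (Amount: 2000m, EntryType: "Credit"),
    (Amount: 2000m, EntryType: "Debit"),
    (Amount: 2000m, EntryType: "Debit"),
    (Amount: 5000m, EntryType: "Credit"),
    (Amount: 1000m, EntryType: "Debit"),
};

// Credits are added, everything else is subtracted.
decimal total = entries.Sum(e =>
    string.Equals(e.EntryType, "Credit", StringComparison.OrdinalIgnoreCase)
        ? e.Amount
        : -e.Amount);

Console.WriteLine(total); // 2000 + 5000 - 2000 - 2000 - 1000 = 2000
```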
Using Windows Forms and C#.
For example...
<table id=tbl>
<tbody>
<tr>
<td>HELLO</td>
<td>YES</td>
<td>TEST</td>
</tr>
<tr>
<td>BLAH BLAH</td>
<td>YES</td>
<td>TEST</td>
</tr>
</tbody>
</table>
I load the page using the WebBrowser Control. The page loads perfectly.
The next thing I want to do is search through all the rows in the table and check whether they contain a specific value, for example YES in this instance.
If a row contains it, I want that row passed back to me so I can store it as a string.
But I want the row in HTML form (including the tags).
How can I accomplish this?
Please help me.
You can use the HtmlAgilityPack to easily parse the html. For example, to get all of the TD elements, you can do this:
string value = @" <table id=tbl>
<tbody>
<tr>
<td>HELLO</td>
<td>YES</td>
<td>TEST</td>
</tr>
<tr>
<td>BLAH BLAH</td>
<td>YES</td>
<td>TEST</td>
</tr>
</tbody>
</table>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(value);
var nodes = doc.GetElementbyId("tbl").SelectNodes("tbody/tr/td");
foreach (var node in nodes)
{
Debug.WriteLine(node.InnerText);
}
You can use this: http://simplehtmldom.sourceforge.net/. It is a really simple way to search in HTML files.
Just include simple_html_dom.php in your file and then follow this manual:
http://simplehtmldom.sourceforge.net/manual.htm
Your PHP code will look like this:
$html = file_get_html('File.html');
foreach ($html->find('td') as $element)
    echo $element->plaintext . '<br>';
Hi everyone. I am working on my first MVC 3 website, built in C# with ASP.NET. I am having trouble with the speed of the website: it takes over 45 seconds to expand or collapse the tables that show the database.
I have included the code for my ReportTemplate view, which contains my jQuery.
I have been trying to figure out how to use caching, but I am worried that the data will not be current when the managers need up-to-date data.
@{
ViewBag.Title = "ReportTemplate";
}
<script language="javascript" type="text/javascript">
$(document).ready(function () {
toggle = function (className) {
$('.' + className).toggle('fast');
}
});
</script>
<h2>ReportTemplate</h2>
<p>
@Html.ActionLink("Create New", "Create")
</p>
<table>
<tr>
<th>
INVESTMENT AREA
</th>
<th>
MAJOR PROGRAM
</th>
<th>
MANAGER
</th>
<th>
PROJECT
</th>
<th>
SPA
</th>
<th>
PA
</th>
</tr>
@{
string investment_area = "";
string major_program = "";
string manager = "";
string project = "";
string spa = "";
string pa = "";
string iaClass = "";
string mpClass = "";
string manClass = "";
string pjClass = "";
string spaClass = "";
string paClass = "";
}
@foreach (var item in Model)
{
iaClass = item.investment_area.Substring(0, 2);
mpClass = item.major_program;
manClass = item.manager;
pjClass = item.project;
spaClass = item.spa;
paClass = item.pa;
if (investment_area != item.investment_area)
{
<tr>
<td class = "ndeTable ui-widget-header pointer border" onclick="toggle('@iaClass')" colspan="6">
@item.investment_area
</td>
</tr>
}
investment_area = item.investment_area;
if (major_program != item.major_program)
{
<tr class="@iaClass">
<td style = "width :200px"></td>
<td class = "ndeTable pointer border" onclick="toggle('@mpClass')" colspan="5">
@item.major_program
</td>
</tr>
}
major_program = item.major_program;
if (manager != item.manager)
{
<tr class = "@iaClass @mpClass">
<td></td>
<td style = "width : 100px"></td>
<td class = "ndeTable pointer border" onclick="toggle('@manClass')" colspan="4">
@item.manager
</td>
</tr>
}
manager = item.manager;
if (project != item.project)
{
<tr class = "@iaClass @mpClass @manClass">
<td></td>
<td></td>
<td style = "width : 200px"></td>
<td class = "ndeTable pointer border" onclick = "toggle('@pjClass')" colspan="3">
@item.project
</td>
</tr>
}
project = item.project;
if (spa != item.spa)
{
<tr class = "@iaClass @mpClass @manClass @pjClass">
<td></td>
<td></td>
<td></td>
<td style = "width : 325px"></td>
<td class = "ndeTable pointer border" onclick = "toggle('@spaClass')" colspan = "2">
@item.spa
</td>
</tr>
}
spa = item.spa;
<tr class = "@iaClass @mpClass @manClass @pjClass @spaClass">
<td></td>
<td></td>
<td></td>
<td></td>
<td style = "width: 200px"></td>
<td class = "ndeTable pointer border" onclick = "toggle('@paClass')" colspan = "1">
@item.pa
</td>
</tr>
pa = item.pa;
}
</table>
I am going to continue trying to optimize the code in the meantime but if anyone has any tips or tricks I can use I would be very thankful as this has been troubling me for the past couple days and 45 secs for a small database is ridiculous. Thank you for your time.
Cheers,
James
Fire up SQL Profiler (SQL Trace) and check how many queries you run during the single HTTP request that creates the page. Queries are often to blame for performance issues. It is very unlikely that rendering the view alone takes so much time.
It might be that you are trying to display too many records in the table. You should use paging. Take a look at datatables.net. It is a nice jQuery plugin that will help you with paging. Make sure that you are using server-side processing. Of course, you will have to change a lot in your code (both server side and client side).
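On the server side, paging usually comes down to `Skip` and `Take`. The sketch below shows the idea on an in-memory sequence. `GetPage`, the row strings, and the page size are hypothetical, and a real implementation would apply the same operators to an ordered database query so that only one page of rows is fetched per request:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the report query; real data would come from the database.
IEnumerable<string> allRows = Enumerable.Range(1, 95).Select(i => $"Row {i}");

// Hypothetical helper: returns one page of items plus the total page count.
(List<string> Items, int TotalPages) GetPage(IEnumerable<string> source, int pageNumber, int pageSize)
{
    int totalCount = source.Count();
    int totalPages = (totalCount + pageSize - 1) / pageSize; // ceiling division

    var items = source
        .Skip((pageNumber - 1) * pageSize) // page numbers start at 1
        .Take(pageSize)
        .ToList();

    return (items, totalPages);
}

var page = GetPage(allRows, pageNumber: 4, pageSize: 25);

// The last page holds the remaining 20 of the 95 rows.
Console.WriteLine($"{page.Items.Count} rows, first: {page.Items[0]}, pages: {page.TotalPages}");
```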
Which browser are you working with? Older browsers don't have the JavaScript performance of newer ones, and with the number of panels you could be showing or hiding simultaneously, that is undoubtedly the cause of your issue.
I agree with the answer that deals with paging. That would be a good option: either reduce the amount of data you present in one go, or reduce the number of elements you reveal simultaneously. JavaScript execution isn't instant, and on old browsers it's really slow.