<td valign="top" class="m92_h_bigimg">
<img border=0 src="http://i2.giatamedia.de/s.php?uid=168846&source=xml&size=320&vea=5vf&cid=2492&file=007399_8790757.jpg" name="bigpic">
</td>
<td valign="top" class="m92_h_bigimg2">
<table border=0 cellpadding=0 cellspacing=0>
<tr>
<td valign="top" class="m92_h_para">Hotel:</td>
<td valign="top" class="m92_h_name">
Melia Tropical <br>
<img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*">
</td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Zimmer:</td>
<td valign="top" class="m92_h_wert"><b>Suite</b></td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Verpflegung:</td>
<td valign="top" class="m92_h_wert"><b>All Inclusive</b></td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Ort:</td>
<td valign="top" class="m92_h_wert">Punta Cana</td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Region:</td>
<td valign="top" class="m92_h_wert">Punta Cana</td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Land:</td>
<td valign="top" class="m92_h_wert">Dom. Republik</td>
</tr>
<tr>
<td valign="top" class="m92_h_para">Anbieter:</td>
<td valign="top" class="m92_h_wert"><img border=0 src="http://www.lmweb.net/lmi/va/gifs/5VF.gif" alt="5 vor Flug" title="5 vor Flug"><br>5 vor Flug</td>
</tr>
</table>
<table border=0 cellpadding=0 cellspacing=0>
<tr>
<td><img src="/images/dropleftw.gif" height="16" width="18"></td>
<td>
<div id="mark" class="m92_notice">
<a target="vakanz" href="siteplus/reminder.php?session_id=rslr1ejntpmj07n0f2smqfhsj5&REC=147203&m_flag=1&m_typ=hotel">Dieses Hotel merken</a>
</div>
</td>
</tr>
<tr>
<td><img src="/images/dropleftw.gif" height="16" width="18"></td>
<td>
<div class="m92_notice">
Hotelbewertung anzeigen
</div>
</td>
</tr>
</table>
</td>
Using the HtmlAgilityPack, how can I get the content between <td valign="top" class="m92_h_bigimg"> and its matching closing </td>? I first tried the code below, which does not use the HtmlAgilityPack. It works, but it stops at the first </td> it finds, so the code is not correct. I have read that the HtmlAgilityPack is the best solution for this kind of problem.
public static string[] GetStringInBetween(string strBegin, string strEnd, string strSource, bool includeBegin, bool includeEnd)
{
string[] result = { "", "" };
int iIndexOfBegin = strSource.IndexOf(strBegin, StringComparison.Ordinal);
if (iIndexOfBegin != -1)
{
int iEnd = strSource.IndexOf(strEnd, iIndexOfBegin, StringComparison.Ordinal);
if (iEnd != -1)
{
result[0] = strSource.Substring(iIndexOfBegin + (includeBegin ? 0 : strBegin.Length), iEnd + (includeEnd ? strEnd.Length : 0) - iIndexOfBegin);
if (iEnd + strEnd.Length < strSource.Length)
result[1] = strSource.Substring(iEnd + strEnd.Length);
}
}
return result;
}
How can I do this?
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var str = htmlDoc.DocumentNode
.Descendants("td")
.Where(x => x.Attributes["class"] != null && x.Attributes["class"].Value == "m92_h_bigimg")
.Select(x => x.InnerHtml)
.First();
The HtmlAgilityPack supports standard XPath queries, so I think you could do something like:
foreach (var node in doc.DocumentNode.SelectNodes("//td[@class='m92_h_bigimg']"))
{
// Do work on your node.
}
... where doc is your instance of HtmlDocument
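Because the parser pairs every <td> with its own </td>, a cell that contains a nested table comes back whole, which is exactly where the IndexOf approach in the question breaks. The following is a minimal sketch of the XPath route, not a definitive implementation. It assumes the HtmlAgilityPack NuGet package is referenced; the class name, the helper GetCellInnerHtml, and the shortened sample markup are made up for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class CellByClassExample
{
    // Hypothetical helper: inner HTML of the first <td> whose class attribute
    // equals cssClass exactly, or null when no such cell exists.
    // cssClass must not contain an apostrophe, since it is pasted into the XPath.
    public static string GetCellInnerHtml(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td[@class='" + cssClass + "']");
        return cell?.InnerHtml;
    }

    public static void Main()
    {
        // Shortened stand-in for the markup in the question (not the real page).
        string html =
            "<table><tr>" +
            "<td valign=\"top\" class=\"m92_h_bigimg\"><img src=\"a.jpg\" name=\"bigpic\"></td>" +
            "<td valign=\"top\" class=\"m92_h_bigimg2\">" +
            "<table><tr><td class=\"m92_h_para\">Hotel:</td><td class=\"m92_h_name\">Melia Tropical</td></tr></table>" +
            "</td>" +
            "</tr></table>";

        // The outer cell is returned together with its nested table.
        Console.WriteLine(GetCellInnerHtml(html, "m92_h_bigimg2") ?? "(not found)");
        Console.WriteLine(GetCellInnerHtml(html, "m92_h_bigimg") ?? "(not found)");
    }
}
```

Returning null instead of throwing lets the caller decide what to do when the page layout changes and the cell is missing.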
I want to check the FormattedLastFillDate field, but somehow the syntax is throwing an error. Can anyone help me write an if condition in a .cshtml file? Below is the block of code.
@if ( FormattedLastFillDate!= "My logic")
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
Try this instead, with braces around the markup:
@if (FormattedLastFillDate != "My logic")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}
The variable must also be accessible in the view.
I think you were pretty close, try this:
@{ string FormattedLastFillDate = "test"; }
@if (FormattedLastFillDate != "test")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}
I have a model that is a List of items, and I want to display it with a foreach loop. The HTML I need to produce looks like this:
<tr>
<td class="lottery-unit lottery-unit-0"><img src="~/images/temp/1.png"></td>
<td class="lottery-unit lottery-unit-1"><img src="~/images/temp/2.png"></td>
<td class="lottery-unit lottery-unit-2"><img src="~/images/temp/4.png"></td>
<td class="lottery-unit lottery-unit-3"><img src="~/images/temp/3.png"></td>
</tr>
<tr>
<td class="lottery-unit lottery-unit-11"><img src="~/images/temp/7.png"></td>
<td colspan="2" rowspan="2">
</td>
<td class="lottery-unit lottery-unit-4"><img src="~/images/temp/5.png"></td>
</tr>
<tr>
<td class="lottery-unit lottery-unit-10"><img src="~/images/temp/1.png"></td>
<td class="lottery-unit lottery-unit-5"><img src="~/images/temp/6.png"></td>
</tr>
<tr>
<td class="lottery-unit lottery-unit-9"><img src="~/images/temp/3.png"></td>
<td class="lottery-unit lottery-unit-8"><img src="~/images/temp/6.png"></td>
<td class="lottery-unit lottery-unit-7"><img src="~/images/temp/8.png"></td>
<td class="lottery-unit lottery-unit-6"><img src="~/images/temp/7.png"></td>
</tr>
The cells lottery-unit-0 to lottery-unit-11 are the twelve DTOs I want to display, and the number of cells per <tr> is not fixed.
If I write it in Razor like this:
foreach (var item in Model)
{
if (item.Order >= 0 && item.Order <= 3)
{
if (item.Order == 0)
{
<tr>
}
<td class="lottery-unit lottery-unit-0"><img src="~/images/temp/1.png"></td>
@if(item.Order == 3)
{
</tr>
}
}
The code above produces errors. How should I write it correctly?
You need to add @ in front of the foreach and also add @: in front of the <tr> and </tr> inside the if statements, so that Razor treats those unbalanced tags as markup. Your code should look like this:
@foreach (var item in Model)
{
if (item.Order >= 0 && item.Order <= 3)
{
if (item.Order == 0)
{
@:<tr>
}
<td class="lottery-unit lottery-unit-0"><img src="~/images/temp/1.png"></td>
if (item.Order == 3)
{
@:</tr>
}
}
}
I want to show data from my XML file.
This is the XML file:
<table>
<tr class="even">
<td class="ltid">1</td>
<td class="ltn">لستر سیتی</td>
<td class="ltg">31</td>
<td class="ltw">19</td>
<td class="ltd">9</td>
<td class="ltl">3</td>
<td class="ltgf">54</td>
<td class="ltga">31</td>
<td class="ltgd" dir="ltr">+23</td>
<td class="ltp">66</td>
</tr>
<tr>
<td class="ltid">2</td>
<td class="ltn">تاتنهام</td>
<td class="ltg">31</td>
<td class="ltw">17</td>
<td class="ltd">10</td>
<td class="ltl">4</td>
<td class="ltgf">56</td>
<td class="ltga">24</td>
<td class="ltgd" dir="ltr">+32</td>
<td class="ltp">61</td>
</tr>
<tr>
<td class="ltid">3</td>
<td class="ltn">آرسنال</td>
<td class="ltg">30</td>
<td class="ltw">16</td>
<td class="ltd">7</td>
<td class="ltl">7</td>
<td class="ltgf">48</td>
<td class="ltga">30</td>
<td class="ltgd" dir="ltr">+18</td>
<td class="ltp">55</td>
</tr>
</table>
I want to get the third team, so
I want to get '<td class="ltid">3</td>'.
This is the code I tried:
var doc = XDocument.Parse(richTextBox2.Text);
var navigator = doc.CreateNavigator();
var contentCell = navigator.SelectSingleNode("//td[@class='ltid']");
txtTeam.Text = contentCell.Value;
But I don't know how to get the third td with this class value.
I searched for an answer but could not find one.
I also wrote another version before this one, but the first <tr> contains a cell with the value 3, so it matched that cell in the first <tr> instead of the third <tr>.
Please help me get the value from the third <tr>.
This is one way:
(//td[@class='ltid'])[3]
The XPath will return the 3rd occurrence of td[@class='ltid'] in the entire XML document.
You can try:
var nav = doc.CreateNavigator();
XPathNodeIterator iterator = nav.Select("//td[@class='ltid']");
while (iterator.MoveNext())
{
// do whatever you want with your item
}
There are three ways you could do this:
xpath 1: //tr[3]/td[@class='ltid']
xpath 2: (//td[@class='ltid'])[3]
xpath 3: //td[@class='ltid' and text()='3']
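The positional expressions above can be tried from C# with LINQ to XML. This is a sketch under stated assumptions rather than the asker's actual code: the class ThirdTeamExample and its two helpers are hypothetical, and the sample keeps only three columns of the table from the question. It uses XPathSelectElement from System.Xml.XPath, which returns null when nothing matches.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement extension method for LINQ to XML

public static class ThirdTeamExample
{
    // Value of the n-th <td class="ltid"> in document order, or null if there is none.
    public static string GetRank(string xml, int n)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement cell = doc.XPathSelectElement("(//td[@class='ltid'])[" + n + "]");
        return cell?.Value;
    }

    // Team name taken from the n-th <tr>, or null if that row does not exist.
    public static string GetTeamName(string xml, int n)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement cell = doc.XPathSelectElement("//tr[" + n + "]/td[@class='ltn']");
        return cell?.Value;
    }

    public static void Main()
    {
        // Reduced copy of the table in the question (three columns only).
        string xml =
            "<table>" +
            "<tr class=\"even\"><td class=\"ltid\">1</td><td class=\"ltn\">لستر سیتی</td><td class=\"ltl\">3</td></tr>" +
            "<tr><td class=\"ltid\">2</td><td class=\"ltn\">تاتنهام</td><td class=\"ltl\">4</td></tr>" +
            "<tr><td class=\"ltid\">3</td><td class=\"ltn\">آرسنال</td><td class=\"ltl\">7</td></tr>" +
            "</table>";

        Console.WriteLine(GetRank(xml, 3));
        Console.WriteLine(GetTeamName(xml, 3));
    }
}
```

Selecting by position and class avoids the false match on the first row, where the ltl cell also holds the text 3.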
I have HTML code in a string named gridHTML:
<html>
<body>
<style>a{text-decoration:none; color: black;} th { border: solid thin; }
td{text-align: center;vertical-align: middle;font-family: Arial;font-size: 8pt; height: 50px;
border-width: 1px;border-left-style: solid;border-right-style: solid;}
table { border-collapse: collapse; } tr:nth-child(1) { border: solid thin; border-width: 2px;}
tr{ border: solid thin; border-style: dashed solid dashed solid;}
</style>
<div>
<table >
<tr class='leftColumnTableHeadO' align='center' style='font-family: Arial; font-size: 8pt; font-weight: normal; width: 100px;'>
<th scope='col'>TM No.</th>
<th scope='col' style='width: 83px;'>Filing Date</th>
<th scope='col'>TradeMark</th>
<th scope='col'>Class</th>
<th scope='col'>Jr#</th>
<th scope='col'>Applicant</th>
<th scope='col'>Agent / Attorney</th>
<th scope='col'>Status</th>
<th scope='col'>City</th>
<th scope='col'>Logo</th>
</tr>
<tr class='lightGrayBg' >
<td ><a title='View Report' class='calBtn' href='javascript:__doPostBack('ctl00$MainContent$grdTradeMarkNumber$ctl02$ctl00','')'>38255</a> </td>
<td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_0'>09-12-1962</span> </td>
<td >IMIDAN</td>
<td >5</td>
<td >158</td>
<td >test</td>
<td >test</td>
<td >Registered</td>
<td >DELWARE</td>
<td ></td>
</tr>
<tr >
<td ><a title='View Report' class='calBtn' href='javascript:__doPostBack('ctl00$MainContent$grdTradeMarkNumber$ctl03$ctl00','')'>188389</a> </td>
<td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_1'>09-09-2003</span> </td>
<td >RAND</td>
<td >16</td>
<td >682</td>
<td >Ttest </td>
<td >test </td>
<td >Advertised</td>
<td >CALIFORNIA</td>
<td ></td>
</tr>
<tr class='lightGrayBg' >
<td ><a title='View Report' class='calBtn' href='javascript:__doPostBack('ctl00$MainContent$grdTradeMarkNumber$ctl04$ctl00','')'>207063</a> </td>
<td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_2'>11-03-2005</span> </td>
<td >FP DIESEL</td>
<td >7</td>
<td >690</td>
<td >testtest</td>
<td >testtest</td>
<td >Advertised</td>
<td >-</td>
<td ></td>
</tr>
</table>
</div>
</body>
</html>
I want to get all rows separately in a list.
I am using the Split method to do this:
List<string> rows = gridHTML.Split(new string[] { "<tr" }, StringSplitOptions.None).ToList();
The problem is that when I look into the list, the "<tr" delimiter has been removed from every item.
Is there another way to get all the rows in a list?
For this one, you could easily use LINQ to XML, provided the markup is well-formed XML. For example:
var rows = XElement.Parse(gridHTML).Descendants("tr");
var cells = rows.Elements("td");
var cellContentsAsString = cells.Select(c => (string)c);
etc.
You should not use string methods (or regex) to parse HTML. I recommend the HtmlAgilityPack:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(gridHTML);
List<HtmlNode> trList = doc.DocumentNode.Descendants("tr").ToList();
Since it seems that you want to load this table data into a collection, the following approach may suit your requirement better. It loads the rows and cells into a DataTable, and the DataColumns are initialized with the table-header values:
DataTable table = new DataTable();
bool firstRowContainsHeader = true;
var tableRows = doc.DocumentNode.Descendants("tr");
var tableData = tableRows.Skip(firstRowContainsHeader ? 1 : 0)
.Select(row => row.Descendants("td")
.Select((cell, index) => new { row, cell, index, cell.InnerText })
.ToList());
var headerCells = tableRows.First().Descendants()
.Where(n => n.Name == "td" || n.Name == "th");
int columnIndex = 0;
foreach (HtmlNode cell in headerCells)
{
string colName = firstRowContainsHeader
? cell.InnerText
: String.Format("Column {0}", (++columnIndex).ToString());
table.Columns.Add(colName, typeof(string));
}
foreach (var rowCells in tableData)
{
DataRow row = table.Rows.Add();
for (int i = 0; i < Math.Min(rowCells.Count, table.Columns.Count); i++)
{
row.SetField(i, rowCells[i].InnerText);
}
}
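If a plain list of row strings is all that is needed, as the question asks, each row's OuterHtml keeps the <tr> tag that Split throws away. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class RowListExample, the helper GetRowsAsHtml, and the two-row sample are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class RowListExample
{
    // Hypothetical helper: every <tr> in the document as its own HTML string,
    // including the opening and closing tr tags.
    public static List<string> GetRowsAsHtml(string gridHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(gridHtml);

        // Descendants returns an empty sequence, not null, when there are no rows.
        return doc.DocumentNode
                  .Descendants("tr")
                  .Select(tr => tr.OuterHtml)
                  .ToList();
    }

    public static void Main()
    {
        // Two-row stand-in for the gridHTML string in the question.
        string gridHtml =
            "<html><body><table>" +
            "<tr class='lightGrayBg'><td>IMIDAN</td><td>5</td></tr>" +
            "<tr><td>RAND</td><td>16</td></tr>" +
            "</table></body></html>";

        foreach (string row in GetRowsAsHtml(gridHtml))
        {
            Console.WriteLine(row);
        }
    }
}
```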
I have a huge HTML page that I want to scrape values from.
I tried to use Firebug to get the XPath of the element I want, but the XPath is not static: it changes from time to time. How can I get the values I want?
In the following snippet, I want to get the production of Lumber per hour, which is the 20 in the first row:
<div class="boxes-contents cf"><table id="production" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="4">
Production per hour: </th>
</tr>
</thead>
<tbody>
<tr>
<td class="ico">
<img class="r1" src="img/x.gif" alt="Lumber" title="Lumber" />
</td>
<td class="res">
Lumber:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r2" src="img/x.gif" alt="Clay" title="Clay" />
</td>
<td class="res">
Clay:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r3" src="img/x.gif" alt="Iron" title="Iron" />
</td>
<td class="res">
Iron:
</td>
<td class="num">
20 </td>
</tr>
<tr>
<td class="ico">
<img class="r4" src="img/x.gif" alt="Crop" title="Crop" />
</td>
<td class="res">
Crop:
</td>
<td class="num">
59 </td>
</tr>
</tbody>
</table>
</div>
Using the Html Agility Pack, you will want to do something like the following.
// Assumes webclient is a System.Net.WebClient and url is the address of the page.
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
byte[] htmlBytes = webclient.DownloadData(url);
using (var htmlMemStream = new MemoryStream(htmlBytes))
using (var htmlStreamReader = new StreamReader(htmlMemStream))
{
htmlDoc.LoadHtml(htmlStreamReader.ReadToEnd());
}
// Take the first <table> in the document, then its first <td class="num">.
var table = htmlDoc.DocumentNode.Descendants("table").FirstOrDefault();
var lumberTd = table.Descendants("td").Where(node => node.Attributes["class"] != null && node.Attributes["class"].Value == "num").FirstOrDefault();
string lumberValue = lumberTd.InnerText.Trim();
Be aware that FirstOrDefault() can return null, so you should add null checks before using the results.
Hope that helps.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(fileName);
var result = doc.DocumentNode.SelectNodes("//div[@class='boxes-contents cf']//tbody/tr")
.First(tr => tr.Element("td").Element("img").Attributes["title"].Value == "Lumber")
.Elements("td")
.First(td=>td.Attributes["class"].Value=="num")
.InnerText
.Trim();
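Both answers throw if the row or cell is missing. One way to avoid that, and to avoid depending on a position that changes, is to anchor a single XPath on the table id and the image title, then handle a null result. This is a sketch under assumptions, not a drop-in replacement: it needs the HtmlAgilityPack NuGet package, and the class ProductionExample, the helper GetProduction, and the shortened sample are made up for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class ProductionExample
{
    // Hypothetical helper: trimmed text of the "num" cell in the row whose icon
    // has the given title, or null when the table, row, or cell is missing.
    // resource must not contain an apostrophe, since it is pasted into the XPath.
    public static string GetProduction(string html, string resource)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode cell = doc.DocumentNode.SelectSingleNode(
            "//table[@id='production']//tr[td/img[@title='" + resource + "']]/td[@class='num']");

        // The null-conditional operator skips Trim() when no cell was found.
        return cell?.InnerText.Trim();
    }

    public static void Main()
    {
        // Shortened stand-in for the snippet in the question.
        string html =
            "<div class=\"boxes-contents cf\"><table id=\"production\"><tbody>" +
            "<tr><td class=\"ico\"><img class=\"r1\" src=\"img/x.gif\" alt=\"Lumber\" title=\"Lumber\" /></td>" +
            "<td class=\"res\">Lumber:</td><td class=\"num\"> 20 </td></tr>" +
            "<tr><td class=\"ico\"><img class=\"r4\" src=\"img/x.gif\" alt=\"Crop\" title=\"Crop\" /></td>" +
            "<td class=\"res\">Crop:</td><td class=\"num\"> 59 </td></tr>" +
            "</tbody></table></div>";

        Console.WriteLine(GetProduction(html, "Lumber") ?? "(not found)");
        Console.WriteLine(GetProduction(html, "Stone") ?? "(not found)");
    }
}
```

Matching on the id and title attributes ties the lookup to the meaning of the row rather than to its place in the page, which is what makes it tolerant of layout changes elsewhere.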