Parse HTML in C#

I am trying to extract the text SCN08_SS_GetCustomer_CAM from this HTML:
<tr>
<td class="line2left bordered">
<div class="tablelabel typped virtualuser" style="margin-left:00px">SCN08_SS_GetCustomer_CAM</div>
</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">1</td>
<td class="line2right bordered">0</td>
<td class="line2right bordered">0</td>
<td class="line2right bordered"></td>
</tr>
I am building a desktop application with WPF, coding in C# on .NET.
HtmlAgilityPack has GetElementbyId but nothing equivalent for selecting by class, and this HTML has no id, so I have to select the element by its class.
Any ideas on how to code this?

Here is a simple example:
using System;
using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.Load(@"pathtoyourpage.html");
// XPath attribute tests use @; this one matches the exact class string.
var result = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='tablelabel typped virtualuser']").InnerText;
Console.WriteLine(result);
I haven't tested it, though.
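Matching the whole class string is brittle: it stops working if the classes are reordered or another class is added, and SelectSingleNode returns null when nothing matches. Below is a minimal sketch of matching a single class token instead. It uses System.Xml from the base class library so that it runs without extra packages, which assumes the fragment is well-formed XML (the row in the question is). The same XPath string should also be usable with HtmlAgilityPack's SelectSingleNode. The names ClassTokenDemo, HasClass and FindLabel are made up for illustration.

```csharp
using System;
using System.Xml;

static class ClassTokenDemo
{
    // Builds an XPath test that is true when the class attribute contains
    // the token as a whole word, regardless of order or extra classes.
    public static string HasClass(string token) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')";

    // Returns the text of the first matching div, or null when none matches.
    public static string FindLabel(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode($"//div[{HasClass("virtualuser")}]");
        return node?.InnerText;
    }

    static void Main()
    {
        const string row =
            "<tr><td class=\"line2left bordered\">" +
            "<div class=\"tablelabel typped virtualuser\" style=\"margin-left:00px\">" +
            "SCN08_SS_GetCustomer_CAM</div></td>" +
            "<td class=\"line2right bordered\">875.2</td></tr>";

        Console.WriteLine(FindLabel(row)); // SCN08_SS_GetCustomer_CAM
    }
}
```

Padding the attribute value with spaces before the contains test is what keeps a class such as virtualusers from matching by accident.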

Related

Using a C# variable as a table element

What I have is this:
<table>
<tr bgcolor="#007ACC" style="color:White">
<td width="145">Account Group</td>
<td width="80"></td>
<td width="10">Active</td>
</tr>
<tr>
·
·
</tr>
</table>
What I need to do is make "Account Group" change based on the user's TreeView selection; e.g., if the user selects a child node, I need to change it to "Account Number".
Is it possible to change a table element on-the-fly like that? If so, how would I do this?
Place a Label inside the <td> to display the text, so that you can change it through the label's ID:
<td width="145">
<asp:Label Text="Account Group" ID="lblUserContent" runat="server" />
</td>
When the TreeView selection changes, you can update the text with the following code:
if (/* your condition */)
    lblUserContent.Text = "Account Number";
else
    lblUserContent.Text = "Account Group";
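The best way to do this will depend on how you're using your TreeView, but here's a quick way to output the value of a C# variable into your table. Note that Eval only works inside a data-bound control; for a field or property of the page itself, <%= MyCSharpVariable %> is the usual form: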
<table>
<tr bgcolor="#007ACC" style="color:White">
<td width="145"><%# Eval("MyCSharpVariable") %></td>
<td width="80"></td>
<td width="10">Active</td>
</tr>
<tr>
·
·
</tr>
</table>

XPath drops contents of td column on an HTML page for screen scraping

Below you find an excerpt of code used to screen scrape an economic calendar.
The HTML page that it parses using XPath includes this row as the first row
in a table. (I pasted only this row instead of the entire HTML page.)
<tr class="calendar_row newday singleevent" data-eventid="42064"> <td class="date"><div class="date">Sun<div>Dec 23</div></div></td> <td class="time">All Day</td> <td class="currency">JPY</td> <td class="impact"> <div title="Non-Economic" class="holiday"></div> </td> <td class="event"><div>Bank Holiday</div></td> <td class="detail"><a class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr>
This code selects the first tr row using XPath:
var doc = new HtmlDocument();
doc.Load(new StringReader(html));
var rows = doc.DocumentNode.SelectNodes("//tr[@class=\"calendar_row\"]");
var rowHtml = rows[0].InnerHtml;
The problem is that rowHtml returns this:
<td class="date"></td> <td class="time">All Day</td> <td class="currency">EUR</td> <td class="impact"> <div title="Non-Economic" class="holiday"></div> </td> <td class="event"> <div>French Bank Holiday</div> </td> <td class="detail"><a class="calendar_detail level2" data-level="2"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td>
Now you can see that the contents of the td column for the date vanished! Why?
I've tried many things and am stumped as to why it drops the contents of that column.
The other columns have content that it keeps. So what's wrong with the date column?
Is there some kind of setting or property somewhere to cause or prevent dropping contents?
Even if you don't know what's wrong, suggestions for how to investigate it further are welcome.
As @AlexeiLevenkov mentioned, you must be selecting a different row than the one you want. You've pruned too much of the essential problem away in an effort to simplify, but it's still clear what's wrong...
Consider that your input document might basically look like this:
<?xml version="1.0" encoding="UTF-8"?>
<table>
<tr class="calendar_row" data-eventid="12345">
<td>This IS NOT the tr you're looking for</td>
</tr>
<tr class="calendar_row newday singleevent" data-eventid="42064">
<td>This IS the tr you're looking for</td>
</tr>
</table>
The test @class="calendar_row" won't match the tr you show, because it compares against the whole attribute value, but it will match the first row.
You could change your test to contains(@class,'calendar_row') instead, but that would match both rows. You're going to have to identify some content or attribute that's unique to the row you want. Perhaps the @data-eventid attribute would work; I can't tell without seeing your whole input file.
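The difference between the three tests can be checked with a small sketch. It assumes a two-row document modelled on the sample in this answer (the cell text is shortened) and uses System.Xml so it runs without HtmlAgilityPack; the XPath strings themselves are the part that carries over. RowSelectionDemo, Count and FirstText are illustrative names only.

```csharp
using System;
using System.Xml;

static class RowSelectionDemo
{
    // Two rows modelled on the sample document; only the second is wanted.
    const string Sample =
        "<table>" +
        "<tr class=\"calendar_row\" data-eventid=\"12345\"><td>not wanted</td></tr>" +
        "<tr class=\"calendar_row newday singleevent\" data-eventid=\"42064\"><td>wanted</td></tr>" +
        "</table>";

    static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc;
    }

    // Number of nodes the XPath expression selects in the sample.
    public static int Count(string xpath) => Load().SelectNodes(xpath).Count;

    // Text of the first selected node, or null when nothing matches.
    public static string FirstText(string xpath) =>
        Load().SelectSingleNode(xpath)?.InnerText;

    static void Main()
    {
        Console.WriteLine(Count("//tr[@class='calendar_row']"));           // 1
        Console.WriteLine(Count("//tr[contains(@class,'calendar_row')]")); // 2
        Console.WriteLine(FirstText("//tr[@data-eventid='42064']"));       // wanted
    }
}
```

The exact-match test selects only the row whose class attribute is literally calendar_row, which is why rows[0] in the question is not the row that was pasted.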

Match Table w/ Regex

I'm trying to match a table w/ regex but I'm having some issues. I can't figure out exactly why it will not match properly. Here is the HTML:
<table class="integrationteamstats">
<tbody>
<tr>
<td class="right">
<span class="mediumtextBlack">Queue:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">Aban:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0%</span>
</td>
<td class="right">
<span class="mediumtextBlack">Staffed:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
<tr>
<td class="right">
<span class="mediumtextBlack">Wait:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0:00</span>
</td>
<td class="right">
<span class="mediumtextBlack">Total:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
<td class="right">
<span class="mediumtextBlack">On ACD:</span>
</td>
<td class="left">
<span class="mediumtextBlack">0</span>
</td>
</tr>
</tbody>
</table>
I need to get 2 pieces of information:
the data inside the td that follows Queue and the data inside the td that follows Wait (so the queue count and the wait time). Obviously the numbers are going to update frequently.
This is the regex I have for pulling the initial table, but it isn't working:
Match statstable = Regex.Match(this.html, "<table class=\"integrationteamstats\">(.*?)</table>");
And I'm not sure what regex I should use to get the data from the tds.
Before anyone asks: no, there is no way I can update the HTML to add an ID or anything of that nature. It's pretty much as is. The only thing that is consistent is the location of the tds.
Instead of regex, I suggest using the HTML Agility Pack to parse the HTML and query its structure.
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
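In general, regex is a poor choice for parsing HTML. (The pattern above fails because, by default, . does not match a newline, so (.*?) cannot span the lines of the table; RegexOptions.Singleline changes that, but the approach stays brittle.)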
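Because only the position of the cells is reliable, one option is an XPath query that finds the label cell and then steps to the cell right after it. The sketch below is an illustration under assumptions: it uses System.Xml so it runs without extra packages, which only works because the excerpt happens to be well-formed XML. For real-world pages the same XPath string would be passed to HtmlAgilityPack's SelectSingleNode instead. StatsDemo and ValueAfter are hypothetical names.

```csharp
using System;
using System.Xml;

static class StatsDemo
{
    // Finds the td whose span holds the label, then reads the span in the
    // td that immediately follows it. Returns null when the label is absent.
    // The label is inserted into the XPath as-is, so it must not contain '.
    public static string ValueAfter(string xml, string label)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        string xpath = "//td[span[normalize-space()='" + label + "']]" +
                       "/following-sibling::td[1]/span";
        return doc.SelectSingleNode(xpath)?.InnerText.Trim();
    }

    static void Main()
    {
        // Shortened copy of the table from the question.
        const string html = @"<table class=""integrationteamstats""><tbody>
<tr>
  <td class=""right""><span class=""mediumtextBlack"">Queue:</span></td>
  <td class=""left""><span class=""mediumtextBlack"">0</span></td>
</tr>
<tr>
  <td class=""right""><span class=""mediumtextBlack"">Wait:</span></td>
  <td class=""left""><span class=""mediumtextBlack"">0:00</span></td>
</tr>
</tbody></table>";

        Console.WriteLine(ValueAfter(html, "Queue:")); // 0
        Console.WriteLine(ValueAfter(html, "Wait:"));  // 0:00
    }
}
```

Anchoring on the label text rather than on a row or column index keeps the query working if cells are added before Queue or Wait.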

How do I create an email template within my application [duplicate]

Possible Duplicate:
Razor views as email templates
I am sending an email to the user from a service within my website.
I want to format it so that it looks nice and follows a specific layout.
Here is what I have:
<table bgcolor="#FFE680" border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td width="100%" height="51" valign="middle"><h1> <strong>[COMPANY] User ID Reminder </strong></h1></td>
<td align="right" valign="top" width="8"><img alt="" width="8" height="8" align="top" /></td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td align="left" valign="top" width="100%"><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td colspan="1" align="left"></td>
</tr>
<tr>
<td align="left" valign="top" width="100%"><p> <br />
Dear [USERNAME], </p>
<p> In response to your request to be reminded of your User ID, please find below the information we have on file for you. If you didn't submit this request, ignore this email. </p>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td>Your User ID is: </td>
<td>[USERNAME]</td>
</tr>
<tr>
<td>Your registered email address is:  </td>
<td>[EMAIL]</td>
</tr>
</tbody>
</table>
<p>
If you have forgotten your password, you can request it here.<br />
</p>
<p>
Thank you,<br />[COMPANY]
</p></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table>
My question isn't about the look of the email, but about how to use this as the body of my email while populating the appropriate fields.
I put it in resources as a string and figured I'd just replace the specific [fields]. This started to become tedious and seemed sloppy:
var body = SuburbanHUB.Properties.Resources.ForgotPasswordEmailBody.Replace("[USERNAME]", username).Replace("[EMAIL]")...
I'm sure there is a better way of doing this, but I don't have the experience to know what it is.
=== CLARIFICATION ===
I am using razor to call a WCF service written in C#. The service that is being called is where the email is being sent from and not the view. I am using Razor with MVC with C# as the underlying code.
I'm in the middle of trying to do the same thing. We're using Razor templates to generate the email, passing in a Model that holds all the variables needed to fill it out. This lets us use everything that Razor and MVC support, including @if and @Html.Partial, so we can construct an email from pieces.
The answer we just went with involves running an internal MVC web server, requesting pages from it with the Model as a parameter, and capturing the response text, but there are variants under my question that are more self-contained. Take a look and see if any of the other answers or comments help you.
I had a similar requirement and ended up using NVelocity, which worked great for me. Please see this article for a walkthrough: http://www.codeproject.com/Articles/12751/Template-merging-with-NVelocity-and-ASP-NET. You have to rewrite the fields in your template in the Velocity templating language. Download NVelocity from http://nvelocity.sourceforge.net/.
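If a full template engine is more than the job needs, the [FIELD] placeholders from the question can also be filled in a single pass from a dictionary, which avoids the chain of Replace calls. This is only a sketch of that idea, not what either answer above describes: EmailTemplateDemo and Fill are made-up names, the sample values are invented, and it assumes placeholders are upper-case letters in square brackets.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class EmailTemplateDemo
{
    // Replaces every [NAME] token that has an entry in values.
    // Values are HTML-encoded; unknown tokens are left untouched.
    public static string Fill(string template, IReadOnlyDictionary<string, string> values) =>
        Regex.Replace(template, @"\[([A-Z]+)\]", m =>
            values.TryGetValue(m.Groups[1].Value, out var v)
                ? WebUtility.HtmlEncode(v)
                : m.Value);

    static void Main()
    {
        // Invented sample data for illustration.
        var values = new Dictionary<string, string>
        {
            ["USERNAME"] = "jsmith",
            ["EMAIL"] = "jsmith@example.com",
            ["COMPANY"] = "Example & Co",
        };

        string body = Fill("<p>Dear [USERNAME],</p><p>Thank you,<br />[COMPANY]</p>", values);
        Console.WriteLine(body);
        // <p>Dear jsmith,</p><p>Thank you,<br />Example &amp; Co</p>
    }
}
```

Encoding each value before it goes into the HTML keeps a user name or company name containing & or < from breaking the markup.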

Convert a html table into an rss feed

I have a table similar to the one below which I would like to convert into an RSS feed. What is the best way to approach this? Should I scrape the contents and build up the RSS myself, or is there a simpler and easier way (I'm hoping)? I'm using ASP.NET / C#. If anyone can point me to tutorials that will help me achieve this, that would be great :)
<table align="left" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;" colspan="1"><img title="Costa Rica" alt="Costa Rica" src="/CR_sq.jpg?n=4185" /></td>
<td align="left" valign="top" colspan="1"><strong><font color="#fff" size="2">Costa Rica <br /></font><span class="SubHeadingGrey_7_0">16 August 2012</span></strong><br /><br />Some Text Here <a title="...read on" href="/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=1234">...read on</a></td>
</tr>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;"><img width="117" height="117" title="South Africa" style="width: 117px; height: 117px;" alt="AL 2012 Icon" src="/SA2012.jpg?width=117&height=117&mode=max" /></td>
<td align="left" valign="top"><p><strong><font color="#fff" size="2">South African Story<br /></font><span class="SubHeadingGrey_7_0">16 August 2012</span></strong></p>
<p>This is summary text <a title="... read on" href="/SA.aspx">... read on</a></p>
</td>
</tr>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;"><img title="ITALY" alt="ITALY" src="/Italy.jpg?n=43" /></td>
<td align="left" valign="top"><strong><font color="#fff" size="2">Italian Article<br /></font><span class="SubHeadingGrey_7_0">15 August 2012</span></strong><br /><br />Italian Visit Article<a title="...read on" href="/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=1256">...read on</a></td>
</tr>
</tbody>
</table>
As long as the HTML is well formed and valid as XML, you can read it in as XML and then use XSLT to convert it to an RSS feed with XslTransform (superseded by XslCompiledTransform in .NET 2.0 and later). Here is a simple example of how to use XslTransform: http://www.xmlfiles.com/articles/cynthia/xslt/default.asp
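A rough sketch of that approach with XslCompiledTransform follows. The stylesheet is a guess at one possible mapping (title from the font element, link from the anchor), and the channel title, link and description are placeholders, not values from the question. Note that the table in the question would not load as XML as posted, because its URLs contain bare & characters; the sample input here is a cleaned-up, shortened row.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class TableToRssDemo
{
    // Hypothetical stylesheet: one <item> per table row.
    const string Stylesheet = @"<?xml version=""1.0""?>
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" indent=""yes""/>
  <xsl:template match=""/"">
    <rss version=""2.0"">
      <channel>
        <title>Articles</title>
        <link>http://example.com/</link>
        <description>Placeholder channel description</description>
        <xsl:for-each select=""//tr"">
          <item>
            <title><xsl:value-of select=""normalize-space(td[2]//font)""/></title>
            <link><xsl:value-of select=""td[2]//a/@href""/></link>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to well-formed table markup and returns the RSS text.
    public static string Transform(string tableXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(styleReader);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(tableXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            xslt.Transform(input, writer);
        return output.ToString();
    }

    static void Main()
    {
        // Shortened row; the href is a made-up path without query string.
        const string table =
            "<table><tbody><tr>" +
            "<td><img alt=\"Costa Rica\" src=\"/CR_sq.jpg\" /></td>" +
            "<td><strong><font>Costa Rica <br /></font><span>16 August 2012</span></strong>" +
            "Some Text Here <a href=\"/costa-rica.aspx\">...read on</a></td>" +
            "</tr></tbody></table>";

        Console.WriteLine(Transform(table));
    }
}
```

The links in the table are site-relative, so a real feed would need them turned into absolute URLs, and HTML that is not well-formed would first have to be cleaned up (for example by an HTML parser) before this step.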
