C# XML Parse string [duplicate]

I would like to read an HTML file from a dynamic URL and process it like an XML file, node by node (by HTML tag). Is this possible?
I mean, there is this HTML code:
<table class="bidders" cellpadding="0" cellspacing="0">
<tr class="bidRow4">
<td>kucik (automata)</td>
<td class="right">9 374 Ft</td>
<td class="bidders_date">2010-06-10 18:19:52</td>
</tr>
<tr class="bidRow4">
<td>macszaf (automata)</td>
<td class="right">9 373 Ft</td>
<td class="bidders_date">2010-06-10 18:19:52</td>
</tr>
<tr class="bidRow2">
<td>kucik (automata)</td>
<td class="right">9 372 Ft</td>
<td class="bidders_date">2010-06-10 18:19:42</td>
</tr>
<tr class="bidRow2">
<td>macszaf (automata)</td>
<td class="right">9 371 Ft</td>
<td class="bidders_date">2010-06-10 18:19:42</td>
</tr>
<tr class="bidRow0">
<td>kucik (automata)</td>
<td class="right">9 370 Ft</td>
<td class="bidders_date">2010-06-10 18:19:32</td>
</tr>
<tr class="bidRow0">
<td>macszaf (automata)</td>
<td class="right">9 369 Ft</td>
<td class="bidders_date">2010-06-10 18:19:32</td>
</tr>
<tr class="bidRow8">
<td>kucik (automata)</td>
<td class="right">9 368 Ft</td>
<td class="bidders_date">2010-06-10 18:19:22</td>
</tr>
<tr class="bidRow8">
<td>macszaf (automata)</td>
<td class="right">9 367 Ft</td>
<td class="bidders_date">2010-06-10 18:19:22</td>
</tr>
<tr class="bidRow6">
<td>kucik (automata)</td>
<td class="right">9 366 Ft</td>
<td class="bidders_date">2010-06-10 18:19:12</td>
</tr>
<tr class="bidRow6">
<td>macszaf (automata)</td>
<td class="right">9 365 Ft</td>
<td class="bidders_date">2010-06-10 18:19:12</td>
</tr>
</table>
I want to parse this into a ListView (or a Grid) and create one row per data row. Each tr is a separate row, and each td within a given tr is a column in that row.
I also want it to be as fast as possible, as it would refresh every 5 seconds.
Is there any library for this?

I recommend the HTML Agility Pack. It doesn't require valid HTML, and it builds an HtmlDocument that you can navigate much like an XmlDocument. You'll have to handle the GUI part yourself.
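As a rough illustration of that suggestion, the sketch below reads the bidders table into one string array per row. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (BidTableParser, ParseRows) are made up for this example, and filling the ListView or Grid is left out.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class BidTableParser // hypothetical helper name
{
    // Returns one string[] per <tr>, with one entry per <td> in that row.
    public static List<string[]> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@class='bidders']//tr");
        if (rows == null)
        {
            return new List<string[]>();
        }

        return rows
            .Select(tr => tr.Elements("td")
                            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                            .ToArray())
            .ToList();
    }
}
```

Each returned array can then be turned into a ListViewItem or a grid row by whatever UI code you already have.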

Sure, it's possible. But be warned: a compliant XML processor is supposed to treat anything that's not well-formed as a fatal error. That means it's only going to work on HTML that is also well-formed XML, which in practice means XHTML-style markup.
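To make that caveat concrete, here is a minimal sketch using LINQ to XML from the base class library. The table in the question happens to be well-formed, so it parses; anything with an unclosed tag or an undeclared entity makes the parser throw. The names XmlTableReader and TryReadRows are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlTableReader // hypothetical helper name
{
    // Returns rows of cell text, or null if the markup is not well-formed XML.
    public static List<string[]> TryReadRows(string markup)
    {
        try
        {
            XElement table = XElement.Parse(markup);
            return table.Descendants("tr")
                        .Select(tr => tr.Elements("td")
                                        .Select(td => td.Value.Trim())
                                        .ToArray())
                        .ToList();
        }
        catch (XmlException)
        {
            // A strict XML parser treats any well-formedness error as fatal.
            return null;
        }
    }
}
```

If the pages you fetch are not under your control, a null result here is the signal to fall back to a tolerant HTML parser.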

I normally use Fast XPath Reader in combination with LinqToXML for the job. It is rather old (2007) though.
I wasn't aware of the HTML Agility Pack, so I can't say how it compares (in both performance and ease of use).

Why not just do string replacement to convert the HTML table into XML:
<table class="bidders" cellpadding="0" cellspacing="0">
becomes:
<?xml version="1.0" encoding="UTF-8"?>
and
<tr class="bidRow4">
becomes
<item>
and
<td class="right">
becomes
<field1>
etc
EDIT 1:
I think the DataSet class also has a ReadXml method, so you could then databind to that dataset:
DataSet ds = new DataSet();
ds.ReadXml("foo.xml");
DataGrid.DataSource = ds;
DataGrid.DataBind();
or something similar

Related

Why do the rows in my column suddenly jump to the right? And how do I fix that?

Good day everyone. While trying to add a few date fields to a popup window on a site I am making, I ran into something odd. Adding these 3 rows to the popup caused the column they were supposed to be in to jump to the right, and now I cannot get them to line up.
I do not know how important this is, but there is a textbox to the left of the column; the boxes/rows I am adding sit below the textbox's height.
Below is a slice of the code as an example; if it is not enough, I will try to add more:
<table style="float: left;">
<tr>
<div>
<tr>(the following code shows normally like it should)
<td align="left" valign="top" colspan="4">Lable:</td>
</tr>
<tr>
<td width="2px"></td>
<td align="left" valign="top">Label</td>
<td colspan="3">
<telerik:RadDatePicker ID="RDP1" runat="server"
Culture="Language"
DbSelectedDate='<%# (Container is GridEditFormInsertItem)? DateTime.Today : Eval("EVAL1") %>'
Width="145px">
<Calendar ID="Calendar3" runat="server" UseColumnHeadersAsSelectors="False" UseRowHeadersAsSelectors="False" ViewSelectorText="x">
</Calendar>
<DatePopupButton HoverImageUrl="" ImageUrl="" />
<DateInput ID="DateInput3" runat="server" DateFormat="dd-MM-yyyy" DisplayDateFormat="dd-MM-yyyy">
</DateInput>
</telerik:RadDatePicker>
</td>
</tr>
<tr>
<td></td>
<td colspan="3">Label</td>
</tr>
<tr>
<td></td>
<td align="left" valign="top">Label:</td>
<td align="left" valign="top" colspan="3">
<telerik:RadDatePicker ID="RDP2" runat="server" Culture="Language" DbSelectedDate='<%# Eval("EVAL2") %>' Width="170px">
<Calendar ID="Calendar5" runat="server" UseColumnHeadersAsSelectors="False" UseRowHeadersAsSelectors="False" ViewSelectorText="x">
</Calendar>
<DateInput ID="DateInput5" runat="server" DateFormat="dd-MM-yyyy" DisplayDateFormat="dd-MM-yyyy">
</DateInput>
</telerik:RadDatePicker>
<asp:ImageButton ID="btnDelete" runat="server" ImageUrl="url" OnClick="btnFunction_Click" ToolTip="Text" Style="vertical-align:middle;" />
</td>
</tr>
</div>
</tr>
For those who would like to see the CSS: there is none, or at least none that would affect my problem, as it targets the actual webpage rather than the popup window.
In advance I would like to say Thank you for the help and your time.
The problem occurs because your rows don't all add up to the same number of columns (<td> cells, counting colspan):
First row - 1 td with colspan 4 > total 4
Second row - 1 td + 1 td + 1 td with colspan 3 > total 5
Third row - 1 td + 1 td with colspan 3 > total 4
Fourth row - 1 td + 1 td + 1 td with colspan 3 > total 5
Your table is also not structured correctly (you have a div and further tr elements nested directly inside a tr).
Ensure your table has the following structure:
<table>
<tr>
<td>
Also check that every row adds up to the same number of columns; use colspan to merge cells where needed.
<table>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="4"></td>
</tr>
</table>
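The per-row arithmetic above can be checked mechanically. The sketch below is only an illustration: it assumes the table markup is well-formed XML (server-side tags such as telerik:RadDatePicker would have to be stripped first), it ignores rowspan, and the names ColumnCounter and CountColumns are made up.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ColumnCounter // hypothetical helper name
{
    // For each <tr>, sums the colspan of its direct <td>/<th> children.
    // A cell without a colspan attribute counts as 1. rowspan is ignored.
    public static List<int> CountColumns(string wellFormedTable)
    {
        XElement table = XElement.Parse(wellFormedTable);
        return table.Descendants("tr")
                    .Select(tr => tr.Elements()
                        .Where(c => c.Name.LocalName == "td" || c.Name.LocalName == "th")
                        .Sum(c => (int?)c.Attribute("colspan") ?? 1))
                    .ToList();
    }
}
```

If the returned list contains more than one distinct value, some row is wider than the others and the layout will shift.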

Parsing from HTML file using HtmlAgilityPack

I have an HTML (DTD HTML 4.0 Transitional) file generated by Oracle Reports.
Here is source of HTML file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><META content="IE=5.0000" http-equiv="X-UA-Compatible">
<META http-equiv="Content-Type" content="text/html; charset=windows-1251">
<META name="GENERATOR" content="MSHTML 11.00.9600.17801"></HEAD>
<BODY dir="LTR" bgcolor="#ffffff"> <!-- Created by Oracle Reports -->
<TABLE width="960" border="0" cellspacing="0" cellpadding="0">
<TBODY>
<TR valign="top">
<TD height="9">
<TD width="71" rowspan="3" colspan="3"><FONT face="Courier New"
size="1"><B><TT>Date</TT></B></FONT><BR>
<TD>
<TD width="89" rowspan="3" colspan="3"><FONT face="Courier New"
size="1"><B><TT>Target Number</TT></B></FONT>
<TD>
<TD width="143" rowspan="3" colspan="7"><FONT face="Courier New"
size="1"><B><TT>Description</TT></B></FONT>
<TD colspan="11">
<TD width="101" rowspan="3" colspan="4"><FONT face="Courier New"
size="1"><B><TT>Transaction </TT></B></FONT><BR><FONT face="Courier New" size="1"><B><TT>Sum</TT></B></FONT><BR>
<TD colspan="2">
<TD width="89" rowspan="3"><FONT face="Courier New"
size="1"><B><TT>Fee</TT></B></FONT>
<TD>
<TD width="113" rowspan="3" colspan="4"><FONT face="Courier New"
size="1"><B><TT>Sum</TT></B></FONT>
<TD>
<TD width="137" rowspan="3" colspan="2"><FONT face="Courier New"
size="1"><B><TT>Device </TT></B></FONT><BR><FONT face="Courier New" size="1"><B><TT>Id</TT></B></FONT><BR>
<TD>
<TR valign="top">
<TD height="9">
<TD>
<TD>
<TD colspan="3">
<TD width="40" colspan="5"><FONT face="Courier New"
size="1"><B><TT>Reference</TT></B></FONT>
<TD colspan="3">
<TD colspan="2">
<TD>
<TD>
<TD>
<TR valign="top">
<TD height="9">
<TD>
<TD>
<TD colspan="11">
<TD colspan="2">
<TD>
<TD>
<TD>
<TR valign="top">
<TD height="9">
<TD width="71" rowspan="2" colspan="3"><FONT face="Courier New"
size="1"><TT>03/09/2015</TT></FONT>
<TD>
<TD width="89" rowspan="2" colspan="3"><FONT face="Courier New"
size="1"><TT>4405641418</TT></FONT>
<TD>
<TD width="143" rowspan="2" colspan="7"><FONT face="Courier New"
size="1"><TT>WWW.EXAMPLE.COM</TT></FONT>
<TD>
<TD width="71" rowspan="2" colspan="9"><FONT face="Courier New"
size="1"><TT>524601231313</TT></FONT>
<TD>
<TD width="101" rowspan="2" colspan="4"><FONT face="Courier New"
size="1"><TT> 1 087,00</TT></FONT>
<TD colspan="2">
<TD width="89" rowspan="2"><FONT face="Courier New"
size="1"><TT>-26,09</TT></FONT>
<TD>
<TD width="113" rowspan="2" colspan="4"><FONT face="Courier New"
size="1"><TT> 1 060,91</TT></FONT>
<TD>
<TD width="137" rowspan="2" colspan="2"><FONT face="Courier New"
size="1"><TT>11055700</TT></FONT>
<TD>
<TR valign="top">
<TD height="9">
<TD>
<TD>
<TD>
<TD>
<TD colspan="2">
<TD>
<TD>
<TD>
<TR>
<TD height="5" colspan="43">
<TR valign="top">
<TD height="9">
<TD width="71" rowspan="2" colspan="3"><FONT face="Courier New"
size="1"><TT>03/09/2015</TT></FONT>
<TD>
<TD width="89" rowspan="2" colspan="3"><FONT face="Courier New"
size="1"><TT>4405641418</TT></FONT>
<TD>
<TD width="143" rowspan="2" colspan="7"><FONT face="Courier New"
size="1"><TT>WWW.EXAMPLE.COM</TT></FONT>
<TD>
<TD width="71" rowspan="2" colspan="9"><FONT face="Courier New"
size="1"><TT>524601231313</TT></FONT>
<TD>
<TD width="101" rowspan="2" colspan="4"><FONT face="Courier New"
size="1"><TT> 55,00</TT></FONT>
<TD colspan="2">
<TD width="89" rowspan="2"><FONT face="Courier New"
size="1"><TT>-1,32</TT></FONT>
<TD>
<TD width="113" rowspan="2" colspan="4"><FONT face="Courier New"
size="1"><TT> 53,68</TT></FONT>
<TD>
<TD width="137" rowspan="2" colspan="2"><FONT face="Courier New"
size="1"><TT>11055700</TT></FONT>
<TD>
</BODY></HTML>
I need to parse that HTML into my C# entities using the HTML Agility Pack. I'm not able to access the TT tag inside a TD tag.
Here is the C# code:
var tds = DocumentNode.SelectSingleNode("//body").SelectNodes("//tr[td[contains(@width,'71') and contains(@colspan,'3')]]").Descendants("tt");
How can I access a TT tag?
I think Kent is on to something: your document has a lot of unclosed <td> tags, and that will cause issues when parsing. I suppose there is a reason even Oracle is forcing this to render in IE5-compatible mode.
If you look in the debugger, you will see that the HtmlAgilityPack has added a whole lot of closing tags to the end of the document (check doc.DocumentNode.OuterHtml):
</td></td></td></td></td></td></td></td></td></td></td></td></td></td></td>
</td></td></tr></td></tr></td></td></td></td></td></td></td></td></td></tr>
</td></td></td></td></td></td></td></td></td></td></td></td></td></td></td>
</td></td></tr></td></td></td></td></td></td></td></td></tr></td></td></td>
</td></td></td></td></td></td></td></tr></td></td></td></td></td></td></td>
</td></td></td></td></td></td></td></td></tr></tbody></table></body></html>
These aren't closed where they're supposed to be. Unfortunately, OptionFixNestedTags is turned on by default and doesn't seem to influence the parser, as it still needs to close these tags. Neither does OptionAutoCloseOnEnd = false.
The next issue you're facing is that the SelectSingleNode and SelectNodes methods return null, not an empty collection, so your code will throw a NullReferenceException whenever nothing is found (which is probably the case in your code; at least it is in my little test project). If you're using C# 6 you can at least use ?. to pre-empt the exception, but that won't fix the search itself.
You're also first calling SelectSingleNode("//body") followed by .SelectNodes("//..."). That second call should not use //, which is anchored at the document root, but .//, which is anchored below the body tag. As written, you might as well remove the SelectSingleNode("//body") call.
Due to the nesting issues, the XPath won't find any td's directly under a tr that fit your description. That is because, as far as the Agility Pack is concerned, the td you're looking for is a child of the td that precedes it.
This is the structure as it is read:
<TR valign="top">
<TD height="9">
<TD width="71" rowspan="3" colspan="3"><FONT face="Courier New"
size="1"><B><TT>Date</TT></B></FONT><BR>
<TD></td>
</td>
</td>
</tr>
So in order to find your tt tags, you'll have to resort to:
var tds = doc.DocumentNode.SelectNodes("//body//tr//td[@width=71 and @colspan=3]");
Note that I also simplified the attribute lookups, as contains will cause issues if there is a colspan=33 or width=171, for example.
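Putting the points above together (null results, .// versus //, exact attribute matches, and td elements nested inside each other), a defensive version could look roughly like this. It is a sketch rather than a tested fix for the Oracle report: it assumes the HtmlAgilityPack NuGet package, and ReportCells and FindTexts are invented names.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class ReportCells // hypothetical helper name
{
    // Returns the text of each <tt> that belongs to a <td> with exactly
    // the given width and colspan attribute values.
    public static List<string> FindTexts(string html, string width, string colspan)
    {
        var result = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null) return result;

        // ".//" searches below body; "//" would restart at the document root.
        string xpath = ".//td[@width='" + width + "' and @colspan='" + colspan + "']";
        HtmlNodeCollection cells = body.SelectNodes(xpath);
        if (cells == null) return result; // null, not an empty collection

        foreach (HtmlNode cell in cells)
        {
            // Unclosed <td> tags can end up nested inside each other, so keep
            // only the <tt> elements whose nearest <td> ancestor is this cell.
            result.AddRange(cell.Descendants("tt")
                .Where(tt => tt.Ancestors("td").FirstOrDefault() == cell)
                .Select(tt => HtmlEntity.DeEntitize(tt.InnerText).Trim()));
        }
        return result;
    }
}
```

Calling FindTexts(html, "71", "3") would then target the date column from the question without also matching width=171 or colspan=33.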
Your best option is probably to go back to the source of the report and query the database directly, or to fix the document first by closing any empty <td>'s before parsing it further.
There may be ways to make the parser treat td and tr differently by changing the ElementsFlags for those nodes before loading the document, but my attempts all ran into the same issues you're already encountering.
HtmlNode.ElementsFlags.Remove("td");
HtmlNode.ElementsFlags.Add("td", HtmlElementFlag.Closed | HtmlElementFlag.Empty);
HtmlNode.ElementsFlags.Remove("tr");
HtmlNode.ElementsFlags.Add("tr", HtmlElementFlag.Closed);
https://stackoverflow.com/a/293357/736079
If it is only the TT tags you want:
HtmlNodeCollection tds = DocumentNode.SelectNodes("//body[@dir='LTR']//table//tbody//tr//td//tt");
Should give you all the TT-tags.
Next time, could you give a shorter and more concrete HTML file? This one doesn't have closing tags for table or tbody.
Also, I think you have to set the option for fixing nested tags to true, or else it will ignore td and tt tags.
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags=true;

Parse HTML in C#

I am trying to get SCN08_SS_GetCustomer_CAM from this HTML code.
<tr>
<td class="line2left bordered">
<div class="tablelabel typped virtualuser" style="margin-left:00px">SCN08_SS_GetCustomer_CAM</div>
</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">875.2</td>
<td class="line2right bordered">1</td>
<td class="line2right bordered">0</td>
<td class="line2right bordered">0</td>
<td class="line2right bordered"></td>
</tr>
I am building a desktop application using WPF, coding in .NET C#.
HtmlAgilityPack has GetElementbyId, but nothing like a "get element by class". This HTML has no id, so I will have to find the element by its class.
Any ideas on how to code this?
Here is a nice and simple application
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.Load(@"pathtoyourpage.html");
var result = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='tablelabel typped virtualuser']").InnerText;
Console.WriteLine(result.ToString());
Haven't tested it though
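The XPath above compares the whole class attribute, so it stops matching if the classes are reordered or one is added. A common XPath 1.0 workaround is to match a single class token instead. The sketch below assumes the HtmlAgilityPack NuGet package; ClassLookup and FirstTextByClass are hypothetical names, and className is assumed not to contain quotes.

```csharp
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class ClassLookup // hypothetical helper name
{
    // Returns the trimmed text of the first element whose class attribute
    // contains className as a whole token, or null if none is found.
    public static string FirstTextByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class list with spaces so "label" does not match "tablelabel".
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

For the row in the question, FirstTextByClass(html, "tablelabel") is meant to pick out the div regardless of the other classes it carries.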

How do I create an email template within my application [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Razor views as email templates
I am sending an email to the user from within a service of my website.
I want to format it so that it looks nice and follows a specific layout.
Here is what I have:
<table bgcolor="#FFE680" border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td width="100%" height="51" valign="middle"><h1> <strong>[COMPANY] User ID Reminder </strong></h1></td>
<td align="right" valign="top" width="8"><img alt="" width="8" height="8" align="top" /></td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td align="left" valign="top" width="100%"><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td><table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td colspan="1" align="left"></td>
</tr>
<tr>
<td align="left" valign="top" width="100%"><p> <br />
Dear [USERNAME], </p>
<p> In response to your request to be reminded of your User ID, please find below the information we have on file for you. If you didn't submit this request, ignore this email. </p>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td>Your User ID is: </td>
<td>[USERNAME]</td>
</tr>
<tr>
<td>Your registered email address is:  </td>
<td>[EMAIL]</td>
</tr>
</tbody>
</table>
<p>
If you have forgotten your password, you can request it here.<br />
</p>
<p>
Thank you,<br />[COMPANY]
</p></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table>
My question isn't about the look of the email, but about how to use this as the body of my email while populating the appropriate fields.
I put it in resources as a string and figured I'd just replace the specific [fields]. This started to become tedious and seemed sloppy:
var body = SuburbanHUB.Properties.Resources.ForgotPasswordEmailBody.Replace("[USERNAME]", username).Replace("[EMAIL]")...
I'm sure there is a better way of doing this, but I don't have the experience.
=== CLARIFICATION ===
I am using razor to call a WCF service written in C#. The service that is being called is where the email is being sent from and not the view. I am using Razor with MVC with C# as the underlying code.
I'm in the middle of trying to do the same thing. We're using Razor templates to generate the email, passing in a Model that holds all the variables needed to fill it out. This lets us use everything that Razor and MVC support, including @if and @Html.Partial, so we can construct an email from pieces.
The answer we went with involves running an internal MVC web server, requesting pages from it with the Model as a parameter, and capturing the response text, but there are variants on my question that are more self-contained. Take a look and see if any of the other answers or comments help you.
I had a similar requirement and ended up using NVelocity, which worked great for me. Please see this article for a walkthrough: http://www.codeproject.com/Articles/12751/Template-merging-with-NVelocity-and-ASP-NET. You have to modify the fields in your template according to the Velocity templating language. Download NVelocity from http://nvelocity.sourceforge.net/.
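If a full template engine is more than you need, the chained Replace calls from the question can be collapsed into one dictionary-driven pass. This is only a sketch of that idea, not one of the approaches described above: TemplateFiller and Fill are invented names, placeholders are assumed to look like [UPPERCASE_NAME], and values are HTML-encoded because the body is HTML.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class TemplateFiller // hypothetical helper name
{
    // Matches placeholders such as [USERNAME] or [COMPANY].
    private static readonly Regex Placeholder = new Regex(@"\[([A-Z_]+)\]");

    // Replaces each [NAME] with the HTML-encoded value for "NAME".
    // Placeholders with no matching key are left unchanged.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            string value;
            return values.TryGetValue(match.Groups[1].Value, out value)
                ? WebUtility.HtmlEncode(value)
                : match.Value;
        });
    }
}
```

The template string stays in resources as before; only the dictionary changes per email.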

Convert a html table into an rss feed

I have a table similar to the one below that I would like to convert into an RSS feed. What is the best way to approach this? Should I scrape the contents and build up the RSS myself, or is there a much simpler and easier way (I'm hoping)? I'm using ASP.NET / C#. If anyone can point me to tutorials that would help me achieve this, that would be great :)
<table align="left" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;" colspan="1"><img title="Costa Rica" alt="Costa Rica" src="/CR_sq.jpg?n=4185" /></td>
<td align="left" valign="top" colspan="1"><strong><font color="#fff" size="2">Costa Rica <br /></font><span class="SubHeadingGrey_7_0">16 August 2012</span></strong><br /><br />Some Text Here <a title="...read on" href="/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=1234">...read on</a></td>
</tr>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;"><img width="117" height="117" title="South Africa" style="width: 117px; height: 117px;" alt="AL 2012 Icon" src="/SA2012.jpg?width=117&height=117&mode=max" /></td>
<td align="left" valign="top"><p><strong><font color="#fff" size="2">South African Story<br /></font><span class="SubHeadingGrey_7_0">16 August 2012</span></strong></p>
<p>This is summary text <a title="... read on" href="/SA.aspx">... read on</a></p>
</td>
</tr>
<tr>
<td align="left" valign="top" style="width: 125px; height: 125px;"><img title="ITALY" alt="ITALY" src="/Italy.jpg?n=43" /></td>
<td align="left" valign="top"><strong><font color="#fff" size="2">Italian Article<br /></font><span class="SubHeadingGrey_7_0">15 August 2012</span></strong><br /><br />Italian Visit Article<a title="...read on" href="/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=1256">...read on</a></td>
</tr>
</tbody>
</table>
As long as the HTML is well formed and parses as XML, you can read it in as XML and then use XSLT to convert it to an RSS feed with XslTransform. Here is a simple example of how to use XslTransform: http://www.xmlfiles.com/articles/cynthia/xslt/default.asp
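A minimal sketch of that approach follows. It uses XslCompiledTransform, the replacement for the now-obsolete XslTransform class. The stylesheet, the channel metadata, and the names TableToRss and Transform are all assumptions made for illustration. Note that the sample table has bare & characters in its URLs, which would need escaping as &amp; before an XML parser accepts it, and that RSS links are normally absolute URLs.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TableToRss // hypothetical helper name
{
    // Assumed layout: the second <td> of each row holds the headline in <font>
    // and the article link in <a href>. Channel values are placeholders.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='yes'/>
  <xsl:template match='/'>
    <rss version='2.0'>
      <channel>
        <title>Articles</title>
        <link>http://example.com/</link>
        <description>Generated from an HTML table</description>
        <xsl:for-each select='//tr'>
          <item>
            <title><xsl:value-of select='normalize-space(td[2]//font)'/></title>
            <link><xsl:value-of select='td[2]//a/@href'/></link>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>";

    // Transforms a well-formed table into an RSS 2.0 document string.
    public static string Transform(string wellFormedTable)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(wellFormedTable)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

If the table cannot be made well-formed, parsing it with a tolerant HTML parser and writing the feed with XmlWriter is the fallback.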
