Retrieve only the HTML structure (All DOM elements) from HTML file - c#

I want to retrieve only the structure of an HTML document using C#. The requirement is to create a template from the document and store it in a database, so that it can be used later to check whether such a document was received before and process it accordingly. For example, given the simple HTML below:
<HTML>
<BODY>
<DIV name="Span1">Simple HTML Form</DIV>
<FORM>
<SPAN name="TextLabel">EID: 12345</SPAN>
<SPAN name="TextLabel1">Date:'2019-07-10'</SPAN>
</FORM>
<table>
<tr>
<td>Name </td>
<td> Occupation</td>
</tr>
<tr>
<td> XYZ </td>
<td> SSE </td>
</tr>
</table>
</BODY>
</HTML>
I want the following output:
<HTML>
<BODY>
<DIV></DIV>
<FORM>
<SPAN></SPAN>
<SPAN></SPAN>
</FORM>
<table>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</table>
</BODY>
</HTML>

Using HtmlAgilityPack might be an option. You can start from this example and develop...
HtmlDocument doc = new HtmlDocument();
string html = @"<HTML>
<BODY>
<DIV name=""Span1"">Simple HTML Form</DIV>
<FORM>
<SPAN name=""TextLabel"">EID: 12345</SPAN>
<SPAN name=""TextLabel1"">Date:'2019-07-10'</SPAN>
</FORM>
<table>
<tr>
<td>Name </td>
<td> Occupation</td>
</tr>
<tr>
<td> XYZ </td>
<td> SSE </td>
</tr>
</table>
</BODY>
</HTML>";
doc.LoadHtml(html);
var nodes = doc.DocumentNode.Descendants();
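Building on that starting point, here is a minimal sketch of how the walk over Descendants() could produce the bare structure. It assumes the HtmlAgilityPack NuGet package is installed; HtmlSkeleton and Extract are hypothetical names chosen for illustration, not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HtmlSkeleton
{
    // Hypothetical helper: returns the markup with attributes, text and comments removed.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy to a list first, because nodes are removed while walking the tree.
        foreach (HtmlNode node in doc.DocumentNode.Descendants().ToList())
        {
            if (node.NodeType == HtmlNodeType.Element)
                node.Attributes.RemoveAll(); // drop name="...", class="..." and so on
            else
                node.Remove();               // drop text and comment nodes
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("<DIV name=\"Span1\">Simple HTML Form</DIV>"));
    }
}
```

Because whitespace-only text nodes are removed as well, the result comes back on a single line, and the tag casing in the output may differ from the input. Some HtmlAgilityPack versions are reported to treat <FORM> as an empty element, so it is worth checking how the form's children are nested in the result before storing it as a template.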

You can use Regex:
string html = @"<HTML>
<BODY>
<DIV name=""Span1"">Simple HTML Form</DIV>
<FORM>
<SPAN name=""TextLabel"">EID: 12345</SPAN>
<SPAN name=""TextLabel1"">Date:'2019-07-10'</SPAN>
</FORM>
<table>
<tr>
<td>Name </td>
<td> Occupation</td>
</tr>
<tr>
<td> XYZ </td>
<td> SSE </td>
</tr>
</table>
</BODY>
</HTML>";
Regex regex = new Regex(@"<.+?>");
MatchCollection matches = regex.Matches(html);
// Print every tag (opening or closing) found in the document
foreach (Match item in matches)
{
    Console.WriteLine(item.Value);
}
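Note that the pattern above returns each tag together with its attributes. A rough sketch of a variant that keeps only the tag names is shown here; it uses only the standard library, and TagSkeleton and Extract are hypothetical names. It assumes attribute values never contain a '>' character, which regular expressions cannot handle reliably in arbitrary HTML.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TagSkeleton
{
    // Captures the optional closing slash (group 1) and the tag name (group 2);
    // everything else up to the next '>' (the attributes) is matched but discarded.
    private static readonly Regex Tag =
        new Regex(@"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Hypothetical helper: rebuilds the document from bare tags only.
    public static string Extract(string html)
    {
        var sb = new StringBuilder();
        foreach (Match m in Tag.Matches(html))
        {
            sb.Append('<')
              .Append(m.Groups[1].Value)
              .Append(m.Groups[2].Value)
              .Append('>');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints <DIV></DIV>
        Console.WriteLine(Extract("<DIV name=\"Span1\">Simple HTML Form</DIV>"));
    }
}
```

Comments and doctype declarations are skipped because they do not start with a letter, and a self-closing tag such as <br/> is reduced to <br>.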

Related

ASP.Net listview layout behaving oddly

I'm trying to use a ListView in an ASP.Net page and failing to get the results I was expecting. My page looks like this:
<table>
<tr>
<td><label class="subHeading">Contacts</label></td>
</tr>
<tr>
<asp:ListView runat="server" id="lvwContacts">
<LayoutTemplate>
<div class="tableWrapper">
<div class="tableScroll">
<table>
<tr>
<th><label>Full Name</label></th>
<th><label>Job Title</label></th>
<th><label>Direct Line</label></th>
<th><label>Mobile Phone</label></th>
<th><label>Email</label></th>
</tr>
<tr id="itemPlaceHolder" runat="server"></tr>
</table>
</div>
</div>
</LayoutTemplate>
<ItemTemplate>
<tr>
... etc
but when I look at the output the table is not appearing inside the divs:
<div class="tableWrapper">
<div class="tableScroll"></div>
</div>
<table>
<tbody>
<tr>
<td><label class="subHeading">Contacts</label></td>
</tr>
<tr></tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<th><label>Full Name</label></th>
<th><label>Job Title</label></th>
<th><label>Direct Line</label></th>
<th><label>Mobile Phone</label></th>
<th><label>Email</label></th>
</tr>
... etc
I've tried putting the divs around the whole listview with much the same result. What on earth is going on here? Have I done something stupid or do ListViews really behave like this?
Thanks
John
You must make sure you have valid HTML markup. Currently one of your <tr> elements has a <div> as a child instead of a <td> or <th>.
See this demo:
/* style used to illustrate problem */
.tableWrapper {
padding: 10px;
background: red;
}
<label>Invalid markup</label>
<table>
<tr>
<td><label class="subHeading">Contacts</label></td>
</tr>
<tr> <!-- Invalid. child is a div not a td or th -->
<div class="tableWrapper">
<div class="tableScroll">
<table>
<tr>
<th><label>Full Name</label></th>
<th><label>Job Title</label></th>
<th><label>Direct Line</label></th>
<th><label>Mobile Phone</label></th>
<th><label>Email</label></th>
</tr>
</table>
</div>
</div>
</tr>
</table>
<hr>
<label>Valid markup</label>
<table>
<tr>
<td><label class="subHeading">Contacts</label></td>
</tr>
<tr>
<td> <!-- This is required! -->
<div class="tableWrapper">
<div class="tableScroll">
<table>
<tr>
<th><label>Full Name</label></th>
<th><label>Job Title</label></th>
<th><label>Direct Line</label></th>
<th><label>Mobile Phone</label></th>
<th><label>Email</label></th>
</tr>
</table>
</div>
</div>
</td>
</tr>
</table>
Inspect the rendered output of both tables and you will see what happens when the markup is not valid (which is what you are experiencing): the browser removes the <div> from the table. The second table has correct markup, so it renders as written.

How to check a Condition in .cshtml file

I want to check the FormattedLastFillDate field, but somehow the syntax is throwing an error. Can anyone help me write an if condition in a .cshtml file? Below is the block of code.
@if ( FormattedLastFillDate!= "My logic")
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
You need to try this:
@if (FormattedLastFillDate != "My logic")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}
The variable must also be accessible (in scope) in the view.
I think you were pretty close, try this:
@{ string FormattedLastFillDate = "test"; }
@if (FormattedLastFillDate != "test")
{
<tr>
<td class="td--numeric">{{OrderNumber}}</td>
<td>
{{DrugName}}
<div class="order-directions">{{Directions}}</div>
<div class="order-message">{{Message}}</div>
</td>
<td>{{DrugStrength}}</td>
<td>{{DrugForm}}</td>
<td class="td--numeric">{{FormattedRefillsLeft}}</td>
<td class="td--numeric">{{Ndc}}</td>
<td class="td--numeric">{{FormattedLastFillDate}}</td>
</tr>
}

Move a button to another tr programmatically

I have a login form designed using a <table> in an ASP.NET application.
A client is asking to modify the layout by moving around a few UI controls. They want to move the Forgot Password link below the Login button so they are on separate lines.
However, for all other clients, everything needs to remain the same, so if any CSS or style changes need to be made, I would need to make them programmatically. Are there any easy ways to do this?
I am trying to avoid creating duplicate user controls to fit the layout they want, and then hide/show controls, depending on the client.
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>
EDIT: jQuery NOT allowed. JavaScript OK.
jQuery can do this for you. The code might look something like this:
$("#toggle").click(toggleButtonPosition);
function toggleButtonPosition() {
var buttonRow = $("button[type=submit]").closest("tr"), buttonCell, newTR, tr;
if (buttonRow.find("td").length === 2) {
// the table is in its initial configuration. Need to extract the button cell and add it to a new row
buttonCell = buttonRow.find("button").closest("td");
buttonCell.remove();
newTR = $("<tr />").append(buttonCell);
newTR.insertBefore(buttonRow);
} else {
// Reverse the process
buttonCell = buttonRow.find("td");
tr = buttonRow.siblings().first();
tr.append(buttonCell);
buttonRow.remove();
}
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>
<button id="toggle" type="button">Toggle Button Position</button>
You don't need to duplicate the layout. You can do it by adding or removing a CSS class depending on the client, using jQuery or JavaScript.
$(document).ready(function(){
var client1 = true; // variable to store client
if(client1){
$('button').removeClass('client2').addClass('client1');
}
else {
$('button').removeClass('client1').addClass('client2');
}});
These CSS classes are added to or removed from the button depending on the client. I have put the 'client1' CSS class on the button.
button.client1{display:block;margin:0 auto}
button.client2{float:right;margin-left:4px}
The HTML is as follows:
<table style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td>
<button type="submit" class='client1'>Login</button>
Forgot Password
</td>
</tr>
</table>
</td>
</tr>
</table>
You could use jQuery to achieve that.
Here is an example that moves the button into a new row at the top of its table:
$(document).ready(function() {
var loginLayOut = $(".login"); // login Table
var btnTd = loginLayOut.find("button").parent(); // button container, in this case its td
var tbody = btnTd.parent().parent(); // we get the tbody by moving two step back
var tr = $("<tr></tr>"); // create a new tr
tr.append(btnTd); // append the td to the new created tr
tbody.prepend(tr); // insert the tr to the tbody at position 0
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table class='login' style="TABLE-LAYOUT: fixed; WIDTH: 370px;">
<COLGROUP>
<COL width="120px">
<COL width="250px">
</COLGROUP>
<tr>
<td>Username:</td>
<td><input type="text" name="username"></td>
</tr>
<tr>
<td>Password:</td>
<td><input type="text" name="password"></td>
</tr>
<tr>
<td> </td>
<td>
<table>
<tr>
<td style="text-align:right;">
Forgot Password
</td>
<td>
<button type="submit">Login</button>
</td>
</tr>
</table>
</td>
</tr>
</table>

How to get HTML selector from this source?

Hello, I'm trying to get this "a" attribute from this HTML source code using HtmlAgilityPack in C#.
<table width='200'>
<tr>
<td width='50'>
<a href='index.php?action=shop&type=koszulka'>
<img src='images/lay_game/miasto/sklep.png' width='40' class="dymek" style='cursor:pointer;' title="Tutaj możesz kupić wyposażenie dla swojego zawodnika" /></a>
</td>
<td>
<a href='index.php?action=shop&type=koszulka' >Sklepy</a>
</td>
</tr>
<tr>
<td width='50'>
<a href='index.php?action=37317|lbr5tlbphafc3cf30b08vl8601|trening|MCMxIzI=|a32a443dd66c39e8cce9a4903171d81b|162f3a6d72c860855a5dc3de18c8855c'>
<img src='images/lay_game/miasto/trening.png' width='40' class="dymek" style='cursor:pointer;' title="Chcesz podnieść swoje umiejętności? Dobrze trafiłeś"/></a>
</td>
<td>
<a href='index.php?action=37317|lbr5tlbphafc3cf30b08vl8601|trening|MCMxIzI=|a32a443dd66c39e8cce9a4903171d81b|162f3a6d72c860855a5dc3de18c8855c'>Trening</a>
</td>
</tr>
<tr>
<td width='50'>
<a href='index.php?action=hospital'>
<img src='images/lay_game/miasto/szpital.png' width='40' class="dymek" style='cursor:pointer;' title="Możesz tu zredukować zmęczenie, wyleczyć kontuzję lub podnieść formę"/></a>
</td>
<td>
<a href='index.php?action=hospital'>Szpital</a>
</td>
</tr>
<tr>
<td width='50'>
<a href='index.php?action=gielda'>
<img src='images/lay_game/miasto/centrum.png' width='40' class="dymek" style='cursor:pointer;' title="Chcesz zarobić i nie boisz się ryzyka? Zatem witamy na giełdzie FT" /></a>
</td>
<td>
<a href='index.php?action=gielda'>Giełda</a>
</td>
</tr>
<tr>
<td width='50'>
<a href='index.php?action=pojedynek'>
<img src='images/lay_game/miasto/pojedynek.png' width='40' class="dymek" style='cursor:pointer;' title="Pojedynek Uliczny." /></a>
</td>
<td>
<a href='index.php?action=pojedynek'>Pojedynek</a>
</td>
</tr>
</table>
My target is the <a> element with href="index.php?action=37317|lbr5tlbphafc3cf30b08vl8601|trening|MCMxIzI=|a32a443dd66c39e8cce9a4903171d81b|162f3a6d72c860855a5dc3de18c8855c"
I really don't know how to get this. The code I tried is below:
HtmlAgilityPack.HtmlDocument HTMLParser = new HtmlAgilityPack.HtmlDocument();
HTMLParser.LoadHtml(result);
string href;
foreach (HtmlNode node in HTMLParser.DocumentNode.SelectNodes("//table//tr//td//a"))
{
href = node.ChildNodes[0].InnerHtml;
}
But it is not working :(
The following should work fine, assuming all you care about is that particular <a> tag:
HtmlNode anchor = HTMLParser.DocumentNode.SelectSingleNode(@"//table/tr[2]/td/a");
There are two <a> elements with the href attribute you want. More importantly, it isn't clear how you want to identify that particular <a>. Assuming that you want to identify it by the inner text "Trening", try this:
HtmlNode a = HTMLParser.DocumentNode.SelectSingleNode(@"//table/tr/td/a[.='Trening']");
String href = a.GetAttributeValue("href", "");
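If neither the row position nor the link text is stable, another option is to filter on part of the href itself. The following is only a sketch under that assumption: LinkFinder and FindHrefs are hypothetical names, the HtmlAgilityPack NuGet package is assumed to be installed, and "|trening|" is taken from the href in the question as the fragment to look for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class LinkFinder
{
    // Hypothetical helper: returns the distinct href values that contain the given fragment.
    public static List<string> FindHrefs(string html, string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Contains(fragment))
            .Distinct() // the image link and the text link share the same href
            .ToList();
    }

    public static void Main()
    {
        string html = "<table><tr><td><a href='index.php?action=1|trening|x'>Trening</a></td></tr></table>";
        foreach (string href in FindHrefs(html, "|trening|"))
            Console.WriteLine(href);
    }
}
```

Filtering in C# rather than inside the XPath expression keeps the query simple and avoids having to escape quotes in the fragment.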

extract data from an html tbody using c#

I am using the C# WebClient to download an HTML string.
A small example of the HTML being returned is:
<tbody class='resultBody ' id='Tbody2'>
<tr id='Tr2' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='452' id='Checkbox2' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td2'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
123 ABC St</dd>
</dl>
</td>
</tr>
</tbody>
<tbody class='resultBody ' id='Tbody3'>
<tr id='Tr3' class='firstRow'>
<td class='cbrow tier_Gold' rowspan='4'>
<input type='checkbox' name='listingId' value='99' id='Checkbox3' />
</td>
<td class='resNum' rowspan='4'>
<div class='node'>
B</div>
</td>
<td class='datarow busName' id='Td3'>
</td>
<td rowspan='2' class='resLinks'>
</td>
<td class="hoops" rowspan='2'>
</td>
</tr>
<tr>
<td class="datarow">
<dl class="addrBlock">
<dd class="bizAddr">
1111 Some St</dd>
</dl>
</td>
</tr>
</tbody>
I am interested in two elements of the HTML, but I have no idea of the best way to get to them. What would be the best way to get the value from one element and the inner HTML from the other?
Any suggestions would be great!
Download the HTML Agility Pack (free).
Create a new HtmlDocument.
Call LoadHtml with the downloaded string.
Use DOM navigation or an XPath query (SelectSingleNode etc.) to find the elements.
Access the InnerHtml of the elements you want.
The API is similar to XmlDocument, but it works on html that isn't xhtml.
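The steps above can be sketched roughly as follows. The question does not name the two elements of interest, so this assumes they are the listingId checkbox (for its value) and the dd with class bizAddr (for its content); ListingParser and Parse are hypothetical names, and the HtmlAgilityPack NuGet package is assumed to be installed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class ListingParser
{
    // Hypothetical helper: returns one (listing id, address) pair per result tbody.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard before looping.
        HtmlNodeCollection bodies =
            doc.DocumentNode.SelectNodes("//tbody[contains(@class,'resultBody')]");
        if (bodies == null)
            return results;

        foreach (HtmlNode body in bodies)
        {
            // Assumption: these are the two elements the question is after.
            HtmlNode input = body.SelectSingleNode(".//input[@name='listingId']");
            HtmlNode addr = body.SelectSingleNode(".//dd[contains(@class,'bizAddr')]");

            string id = input == null ? "" : input.GetAttributeValue("value", "");
            string address = addr == null ? "" : addr.InnerText.Trim();
            results.Add(new KeyValuePair<string, string>(id, address));
        }
        return results;
    }

    public static void Main()
    {
        string html = "<table><tbody class='resultBody '><tr><td>"
                    + "<input type='checkbox' name='listingId' value='452' /></td></tr>"
                    + "<tr><td><dl><dd class='bizAddr'>\n123 ABC St</dd></dl></td></tr></tbody></table>";
        foreach (var pair in Parse(html))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

InnerText is used for the address so that the surrounding whitespace can be trimmed; use InnerHtml instead if nested markup needs to be preserved.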
