What is the intended use of the HtmlAgilityPack MixedCodeDocument? - c#

I am using version 1.4 of the HtmlAgilityPack and as I understand it, the MixedCodeDocument and related classes are there to help you parse asp.net markup as found in aspx and ascx files. I've found zero documentation or examples for the MixedCodeDocument class. From what I've tried, it seems that the MixedCodeDocument breaks a file's text into chunks separating asp.net fragments from non-asp.net fragments. For example, the following snippet:
<asp:Label ID="lbl_xyz" runat="server" Text='<%=Name%>'></asp:Label>
<a href='#'>blah</a>
would be broken up into:
// Text fragment 1
<asp:Label ID="lbl_xyz" runat="server" Text='
// Code fragment 1
<%=Name%>
// Text fragment 2 (two lines)
'></asp:Label>
<a href='#'>blah</a>
But there is no parsing done any deeper than that, i.e. the a tag is not parsed into its own node with attributes or anything like that.
So my best guess is that the MixedCodeDocument is expected to be used to strip out the code fragments so that the remaining text fragments can be pieced together and then parsed using the HtmlDocument class.
Does anybody know if that's correct? Or even better, does anybody have any tips for ways to successfully parse and manipulate an aspx or ascx file using the HAP or other?

Your guess is 100% correct.
The MixedCodeDocument class was designed to parse text that mixes two languages, such as classic ASP or ASP.NET pages, hence the name :-)
Originally the Html Agility Pack was used in a tool capable of processing and transforming a whole tree of various files, including HTML and other types of file. If you needed to replace only the HTML parts of those other files, this class helped you split code and markup. The separated code and markup blocks can then be parsed by other means.
I don't think anyone's using it today :)
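To make the split-then-reassemble idea concrete, here is a minimal sketch. It does not call the HtmlAgilityPack at all: MixedCodeSplitter and Fragment are hypothetical names, and the only assumption is that code fragments are delimited by <% and %>. It separates code from markup and joins the remaining markup so that it could be handed to an HTML parser afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper types for illustration; not part of the HtmlAgilityPack.
public sealed class Fragment
{
    public bool IsCode;
    public string Text;
}

public static class MixedCodeSplitter
{
    // Splits the source into alternating text and code fragments,
    // assuming code is delimited by <% ... %>.
    public static List<Fragment> Split(string source)
    {
        var fragments = new List<Fragment>();
        int pos = 0;
        foreach (Match m in Regex.Matches(source, @"<%.*?%>", RegexOptions.Singleline))
        {
            if (m.Index > pos)
            {
                fragments.Add(new Fragment { IsCode = false, Text = source.Substring(pos, m.Index - pos) });
            }
            fragments.Add(new Fragment { IsCode = true, Text = m.Value });
            pos = m.Index + m.Length;
        }
        if (pos < source.Length)
        {
            fragments.Add(new Fragment { IsCode = false, Text = source.Substring(pos) });
        }
        return fragments;
    }

    // Joins only the text fragments, dropping the code fragments.
    public static string MarkupOnly(string source)
    {
        var sb = new StringBuilder();
        foreach (Fragment f in Split(source))
        {
            if (!f.IsCode) sb.Append(f.Text);
        }
        return sb.ToString();
    }
}
```

For the snippet in the question, Split returns three fragments (text, code, text), and MarkupOnly returns the markup with an empty Text='' attribute, which a regular HTML parser should be able to handle.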

Related

Export contenteditable div data to Word causes blank line

I have a contenteditable div where the user enters data. When they enter a line break, each browser stores the data differently. When I export this data to Word using HtmlToOpenXml, it adds a blank line to the content. I want to avoid that so the HTML page and the Word doc look the same.
One option is to replace the tags <br>, <div>, <p> with blank and then replace </div> and </p> with <br/> in the C# code using RegEx. But I do not know all the formatting that different browsers use for a contenteditable div, so this implementation may not cover every case.
I would like to know the best way to address this, or whether there is any open source tool/dll that helps with this issue.
e.g. ContentEditable div actual data in browsers looks like below
Chrome -
line1<div>line2</div><div>line3</div>
IE Edge-
<div>line1</div><div>line22</div><div>line3<br></div>
FireFox - I read it uses <p> </p> instead of <div> </div>
Safari - ????
A Solution I found:
You could use RegEx, which I highly recommend in C# for parsing information.
Based on the formatting, you could narrow down which browser produced the markup and then parse its output into a form that means the same thing for every browser. This will not be easy, but nothing cross-platform ever truly is. I would give an example of how this could be done, but RegEx in all honesty takes a good amount of work, and it would take quite a bit of code to show how to parse the markup and work out which browser it came from.
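As a rough illustration of the RegEx route, the sketch below skips browser detection and simply treats every <div>, <p> and <br> tag as a line boundary. ContentEditableNormalizer is a hypothetical name, and the approach assumes plain one-line-per-block content like the Chrome and Edge samples above; it also collapses blank lines the user typed on purpose, so treat it as a starting point rather than a complete solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ContentEditableNormalizer
{
    // Matches opening/closing <div> and <p> tags (with optional attributes) and <br> variants.
    private static readonly Regex LineBoundary = new Regex(
        @"</?(div|p)(\s[^>]*)?>|<br\s*/?>", RegexOptions.IgnoreCase);

    // Turns browser-specific line markup into lines separated by a single <br/>.
    public static string Normalize(string html)
    {
        string withNewlines = LineBoundary.Replace(html, "\n");
        // Dropping empty entries also removes intentional blank lines (a known limitation).
        string[] lines = withNewlines.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join("<br/>", lines);
    }
}
```

With this, the Chrome sample becomes line1<br/>line2<br/>line3 and the Edge sample becomes line1<br/>line22<br/>line3, so both browsers yield the same shape before the export step.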

Which plugins can I use to print HTML code?

I have some HTML code to show up on an HTML page, so it must not be interpreted as HTML.
Also, I'd like to maintain space/empty line and so on.
I'm on C#/.NET 3.5 : what can I use?
Just use HtmlEncode.
Encodes a string to be displayed in a browser.
And documented in the overloads:
HTML encoding makes sure that text is displayed correctly in the browser and not interpreted by the browser as HTML. For example, if a text string contains a less than sign (<) or greater than sign (>), the browser would interpret these characters as the opening or closing bracket of an HTML tag. When the characters are HTML encoded, they are converted to the strings &lt; and &gt;, which causes the browser to display the less than sign and greater than sign correctly.
It is not clear for what purpose you want to display this, but you may want to pretty print before HTML encoding (the HTML Agility Pack may do this, not sure) - and to show it as fixed width you can enclose it in a <pre> element.
Since you're not actually saying which technology within .Net you are using to render your Html page (Asp.Net WebForms or MVC or whatever) the answer falls back to how you would do it in HTML, regardless of your server technology. After that, how you actually achieve this output is entirely up to you.
Render it in a <pre /> block:
<pre>
&lt;p&gt;Hello world!&lt;/p&gt;
</pre>
Here the text will appear as <p>Hello world!</p> and, by default, appear in a fixed-width font and all whitespace will be retained.
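Putting the two answers together, a minimal sketch might look like this. HtmlSnippet and ToPreBlock are hypothetical names. The sketch uses System.Net.WebUtility.HtmlEncode, which assumes .NET 4.0 or later; on .NET 3.5, HttpUtility.HtmlEncode from System.Web plays the same role.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
public static class HtmlSnippet
{
    // Encodes the raw HTML so the browser shows it as text,
    // then wraps it in <pre> to keep whitespace and line breaks.
    public static string ToPreBlock(string rawHtml)
    {
        return "<pre>" + WebUtility.HtmlEncode(rawHtml) + "</pre>";
    }
}
```

Calling HtmlSnippet.ToPreBlock("<p>Hello world!</p>") returns <pre>&lt;p&gt;Hello world!&lt;/p&gt;</pre>, which the browser displays as the literal tag text.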

$(selector).text() equivalent in c# (Revised)

I am trying to check whether the inner HTML of an element is empty, but I want to do the validation on the server side, treating the HTML as a string. Here is my code:
public string HasContent(string htmlString){
// this is the expected value of the htmlString
// <span class="spanArea">
// <STYLE>.ExternalClass234B6D3CB6ED46EEB13945B1427AA47{;}</STYLE>
// </span>
// From this jquery code-------------->
// if($('.spanArea').text().length>0){
//
// }
// <------------------
// I wanted to convert the jquery statement above into c# code.
/// c# code goes here
return htmlString;
}
using this line
$('.spanArea').text() // what is the equivalent of this line in c#
I will know whether the .spanArea really has something to display in the UI or not. I want to do the checking on the server side. There is no need to worry about how I access the DOM; I have already taken care of that. Consider htmlString to be the HTML string.
My question is if there is any equivalent for this jquery line in C#?
Thanks in advance! :)
If you really need to get that data from the HTML in the ServerSide then I would recommend you to use a Html-Parser for that job.
If you check other SO posts you will find that Html Agility Pack was recommended many times.
Tag the SpanArea with runat="server" and you can then access it in the code behind:
<span id="mySpan" class="spanArea" runat="server" />
You can then:
string spanContent = mySpan.InnerText;
Your code-behind for the page that includes this AJAX call will already have executed (in presenting the page to the browser) before the AJAX call is ever executed, so your question doesn't appear correct.
The code-behind that is delivering the HTML fragment you indicated is probably constructing that using a StringBuilder or similar so you should be able to verify in that code whether there is any data.
The fragment you provided only includes a DIV, a SPAN and a STYLE tag. This is all likely to collapse to a zero width element and display nothing.
Have a look at this article which will help you understand the ASP.NET page life cycle.
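For the specific question of a server-side stand-in for $('.spanArea').text(), here is a small sketch that uses System.Xml.Linq instead of an HTML parser. That is a swapped-in technique: it assumes the fragment is well-formed XML (as the sample in the question is), and SpanText is a hypothetical name. Real-world HTML is often not well-formed, in which case a tolerant parser such as the Html Agility Pack is the safer choice. Note that jQuery's .text() combines the text of all descendants, so it would also pick up the rule inside the STYLE element; VisibleText below skips style and script content for that reason.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; assumes well-formed markup.
public static class SpanText
{
    // Concatenated text of all descendants, roughly what jQuery's .text() returns.
    public static string Text(string xhtml)
    {
        return XElement.Parse(xhtml).Value;
    }

    // Same idea, but ignores text whose parent is a <style> or <script> element.
    public static string VisibleText(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        var parts = root.DescendantNodes()
            .OfType<XText>()
            .Where(t => !IsNonVisual(t.Parent))
            .Select(t => t.Value);
        return string.Concat(parts).Trim();
    }

    private static bool IsNonVisual(XElement element)
    {
        string name = element.Name.LocalName;
        return name.Equals("style", StringComparison.OrdinalIgnoreCase)
            || name.Equals("script", StringComparison.OrdinalIgnoreCase);
    }
}
```

A check such as SpanText.VisibleText(htmlString).Length > 0 then tells you whether the span has anything to show, while SpanText.Text(htmlString) mirrors the raw jQuery behaviour.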

C# - Cutting HTML String into separate lines without breaking HTML tags

I have to break a HTML content string in to multiple lines...
And each line should have some fixed characters, 50 or 60
Also I don't want to break the word..or html tags...
ex : <p>Email: someone@gmail.com</p>
<p><em>"Text goes <font color=red>Hello world</font> Text goes here and Text goes here &nbsp Text goes here 1976."</em> </p>
How can I achieve this in C#?
Any help would be appreciated...
I think you will need a HTML parser, and then you will have to serialize it again.
Instead of creating your own parser and serializer you should look into existing libraries.
For the parser I recommend the OSS Html Agility Pack
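If a full parse-and-serialize round trip is more than you need, a much simpler sketch is to tokenize the string into tags and words and wrap greedily, never splitting a token. This is not what a parser would do: HtmlLineWrapper is a hypothetical name, the code assumes every < starts a tag that is closed by >, and it counts only the characters outside tags toward the line width (one possible reading of "fixed characters").

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class HtmlLineWrapper
{
    // Group 1: whitespace before the token. Group 2: a whole tag or a whole word.
    private static readonly Regex Token = new Regex(@"(\s*)(<[^>]+>|[^<\s]+)");

    public static List<string> Wrap(string html, int width)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        int visible = 0; // characters outside tags on the current line

        foreach (Match m in Token.Matches(html))
        {
            bool space = m.Groups[1].Length > 0 && current.Length > 0;
            string token = m.Groups[2].Value;

            if (token[0] == '<')
            {
                // Tags are copied whole and do not count toward the width.
                if (space) { current.Append(' '); visible++; }
                current.Append(token);
                continue;
            }

            int needed = token.Length + (space ? 1 : 0);
            if (visible > 0 && visible + needed > width)
            {
                // The word does not fit: close the line and start a new one.
                lines.Add(current.ToString());
                current.Clear();
                visible = 0;
                space = false;
            }
            if (space) current.Append(' ');
            current.Append(token);
            visible += token.Length + (space ? 1 : 0);
        }

        if (current.Length > 0) lines.Add(current.ToString());
        return lines;
    }
}
```

Calling HtmlLineWrapper.Wrap(html, 50) returns the lines in order. Runs of whitespace are collapsed to a single space, and a word longer than the width is kept whole on its own line rather than broken.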

Wiki rendering engine for C#? like redcloth, bluecloth, or something decent

We have used the redcloth and bluecloth wiki renderers with Ruby; basically you can do something like this...
html = RedCloth.to_html(wiki_content)
and poof, you get back HTML.
Is there something out there for C#/.NET ?
Try http://wikiplex.codeplex.com/
There are some wiki rendering engines but the names escape me right now. Perhaps check out some of these open-source options? I've previously reviewed MindTouch from that list for an application and it was quite rich, but it did much more than I needed to do.
If you just need something to turn text into HTML content, I use Halide, which lets people type in a textarea and then HTML-ifies links, removes dangerous content, adds <p></p> and <br />, etc. Very simple, but no built-in formatting options.
SO uses a custom version of Markdown for their text editor and HTML content rendering. Search google for Markdown.NET for a number of ports.
