I have some HTML code that I need to show on an HTML page, so it must not be interpreted as HTML.
I'd also like to preserve whitespace, empty lines and so on.
I'm on C#/.NET 3.5: what can I use?
Just use HtmlEncode.
Encodes a string to be displayed in a browser.
And documented in the overloads:
HTML encoding makes sure that text is displayed correctly in the browser and not interpreted by the browser as HTML. For example, if a text string contains a less than sign (<) or greater than sign (>), the browser would interpret these characters as the opening or closing bracket of an HTML tag. When the characters are HTML encoded, they are converted to the strings &lt; and &gt;, which causes the browser to display the less than sign and greater than sign correctly.
It is not clear for what purpose you want to display this, but you may want to pretty print before HTML encoding (the HTML Agility Pack may do this, not sure) - and to show it as fixed width you can enclose in a <pre> element.
Since you're not actually saying which technology within .Net you are using to render your Html page (Asp.Net WebForms or MVC or whatever) the answer falls back to how you would do it in HTML, regardless of your server technology. After that, how you actually achieve this output is entirely up to you.
Render it in a <pre> block, with the markup HTML encoded:
<pre>
&lt;p&gt;Hello world!&lt;/p&gt;
</pre>
Here the text will appear as <p>Hello world!</p> and, by default, will be shown in a fixed-width font with all whitespace retained.
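The two answers above can be combined: encode the markup first, then wrap the result in a <pre> element. The following is a minimal sketch, assuming System.Web.HttpUtility is the HtmlEncode being referred to; the names HtmlDisplay and ToPreBlock are made up for illustration.

```csharp
using System.Web;

public static class HtmlDisplay
{
    // Hypothetical helper: returns markup that shows `source` literally,
    // in a fixed-width font, with whitespace and line breaks preserved.
    public static string ToPreBlock(string source)
    {
        // Encode first so that <, > and & reach the browser as text, not tags.
        string encoded = HttpUtility.HtmlEncode(source);
        return "<pre>" + encoded + "</pre>";
    }
}
```

For example, HtmlDisplay.ToPreBlock("<p>Hello world!</p>") returns <pre>&lt;p&gt;Hello world!&lt;/p&gt;</pre>, which the browser displays as the literal text <p>Hello world!</p>.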
Related
I have a contenteditable div where the user enters data. When they enter a line break, each browser stores the data differently. When I export this data to Word using HtmlToOpenXml, it adds a blank line for the content, and I want to avoid that so the HTML page and the Word document look the same.
One option is to replace the tags <br>, <div>, <p> with an empty string and then replace </div> and </p> with <br/> in the C# code using RegEx. But I do not know all the formatting that different browsers use for a contenteditable div, so this implementation may not help.
What is the best way to address this, and is there an open-source tool or DLL that helps with this issue?
For example, the actual contenteditable div data looks like this in each browser:
Chrome -
line1<div>line2</div><div>line3</div>
IE Edge-
<div>line1</div><div>line22</div><div>line3<br></div>
FireFox - I read it uses <p> </p> instead of <div> </div>
Safari - ????
A Solution I found:
You could use RegEx, which I highly recommend in C# for parsing information.
Then, based on the formatting, you could effectively narrow down which browser it is and move on to parsing its output and working out what its markup means universally. This will not be easy, but nothing cross-platform ever truly is. I would give an example of how this could be done, but RegEx in all honesty takes a good amount of work, and it would take quite a bit of code to make an example that shows how to parse the data and find out what the browser is.
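A rough sketch of the RegEx idea follows. Instead of detecting the browser, it normalizes every variant to the same <br/>-separated form. It only covers the Chrome and Edge samples quoted in the question plus <p> wrappers; the class name ContentEditableNormalizer and the rules themselves are assumptions for illustration, not a complete treatment of what browsers emit.

```csharp
using System.Text.RegularExpressions;

public static class ContentEditableNormalizer
{
    // Hypothetical helper: collapses <div>/<p> line wrappers into <br/> separators.
    public static string Normalize(string html)
    {
        const RegexOptions Opts = RegexOptions.IgnoreCase;

        // A <br> directly before a closing block tag is only a placeholder; drop it.
        string s = Regex.Replace(html, @"<br\s*/?>\s*(?=</(div|p)>)", "", Opts);
        // Every opening <div> or <p> starts a new line.
        s = Regex.Replace(s, @"<(div|p)(\s[^>]*)?>", "<br/>", Opts);
        // Closing block tags carry no further information.
        s = Regex.Replace(s, @"</(div|p)>", "", Opts);
        // Unify the remaining <br> spellings.
        s = Regex.Replace(s, @"<br\s*/?>", "<br/>", Opts);
        // A single leading break is an artifact of a wrapper around the first line.
        s = Regex.Replace(s, @"^<br/>", "");
        return s;
    }
}
```

With these rules, both line1<div>line2</div><div>line3</div> and <div>line1</div><div>line2</div><div>line3<br></div> come out as line1<br/>line2<br/>line3.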
I have this string below:
<List>\r\n <First>\r\n <Second>BlaBla..</Second>\r\n...
But in the View (ASP.NET MVC) it is presented on one line only.
What can I do so that each \r\n breaks onto a new line?
Thanks
\r\n will format your string in source, it will be visible when you view HTML source of your page. You need to use HTML <br> tag instead of \r\n, so your browser will format your output accordingly.
Apart from the other answers that suggest using <br/>, you can also use the CSS white-space property which can actually make line breaks when it sees \r\n, or the <pre> tag which has this set up by default. See: http://www.w3schools.com/cssref/pr_text_white-space.asp
Assuming that your view is an HTML page rendered in the browser, understand that HTML does not render line break characters. They show up in the page source, but not in the rendered output. If you need line breaks, you should use a tag structure that renders them, such as <p> or <br>.
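The <br> suggestion from the answers above might look like the sketch below. It assumes System.Web.HttpUtility for the encoding step; LineBreakFormatter and ToHtmlWithBreaks are hypothetical names. The text is encoded before the breaks are inserted, so tags such as <List> in the string are shown rather than interpreted.

```csharp
using System.Web;

public static class LineBreakFormatter
{
    // Hypothetical helper: encodes the text, then turns line endings into <br />.
    public static string ToHtmlWithBreaks(string text)
    {
        string encoded = HttpUtility.HtmlEncode(text);
        // Handle Windows line endings first, then any bare \n that remains.
        return encoded.Replace("\r\n", "<br />").Replace("\n", "<br />");
    }
}
```

In a view, the result would have to be written out without being encoded a second time. Leading spaces on each line are still collapsed by the browser, which is where the CSS white-space property or a <pre> element is the better fit.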
Basically I have a web page with embedded CSS and JavaScript, and what I want to do is extract only the HTML itself: text, tables, images and so on.
So far I have the whole web page stored in a string called "html". The content of this page is just the Facebook homepage, for example, but as you will see it includes all the scripts and other embedded stuff that I don't want.
HTMLEdit = //webpage I chose to store in here//
string html = HTMLEdit.DocumentText;
String result = "this i want to only contain the <head>,<body>,<foot>."
I am only interested in displaying the result, which should contain only HTML; I don't want the JavaScript, CSS or anything else.
I have looked at the HTML Agility Pack, but there's no documentation on their website for doing this, and this is my first ever C# project, so excuse my ignorance if I don't make sense.
See this question
HTML Agility Pack strip tags NOT IN whitelist
Maybe adapt that answer, and drop link and script tags.
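As a rough illustration of dropping script, style and link tags, here is a sketch that uses plain regular expressions instead of the HTML Agility Pack. That is a deliberate simplification: regular expressions are a blunt tool for HTML and will miss unusual markup, so a real parser is the more robust choice. ScriptStripper and RemoveScriptsAndStyles are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class ScriptStripper
{
    // Hypothetical helper: removes <script> and <style> blocks and <link> tags.
    public static string RemoveScriptsAndStyles(string html)
    {
        const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Remove <script>...</script> and <style>...</style> including their content.
        string result = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "", Opts);
        // Remove <link ...> tags, which usually pull in external stylesheets.
        result = Regex.Replace(result, @"<link\b[^>]*>", "", Opts);
        return result;
    }
}
```

Inline event handlers such as onclick attributes are not touched by this sketch.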
I wrote a small application that monitors the clipboard and pastes text directly into a WebBrowser component.
...
DocumentWysiwyg = ClipBoardWebBrowser.Document.DomDocument as IHTMLDocument2;
DocumentWysiwyg.designMode = "On";
...
When the clipboard changes, I paste the clipboard content into the WebBrowser using:
ClipBoardWebBrowser.Document.Write(Clipboard.GetText(TextDataFormat.Html));
Now, when pasting any content copied from the web as HTML, I get this inside the HTML:
<span class="Apple-converted-space">Â </span>
which does not belong to the HTML I copied from the website.
What are those, and how can I get rid of them?
Any help is really appreciated.
Here is the HTML code for google.de as an example:
http://pastie.org/pastes/3706386/text
How would I make sure that the pasted content is exactly the same as the copied data in the clipboard?
On Mac, runs of multiple regular spaces " " are converted so that every other regular space is replaced with a non-breaking space character, aka &nbsp;. This allows the text to still break at every other character while preserving such runs of spaces.
The reason for this is that without this trick HTML would collapse multiple spaces into a single one. Having only &nbsp; characters would disable line wrapping because, as their name states, they are non-breaking.
In addition to "Apple-converted-space" there was also "Apple-style-span" but that was eliminated from Webkit in 2011: https://www.webkit.org/blog/1737/apple-style-span-is-gone/
So to answer your question: since Webkit is filling the pasteboard you cannot do anything to prevent such behavior.
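Although the spans cannot be prevented at the source, they could be unwrapped after the clipboard text has been read. The sketch below is one assumed approach, not something the answer proposes: it replaces each wrapper with a plain &nbsp; entity so that the spacing is kept. AppleSpanCleaner and Unwrap are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AppleSpanCleaner
{
    // Hypothetical helper: replaces each Apple-converted-space span with &nbsp;.
    public static string Unwrap(string html)
    {
        return Regex.Replace(
            html,
            @"<span class=""Apple-converted-space"">.*?</span>",
            "&nbsp;",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

The cleaned string would then be passed to Document.Write in place of the raw clipboard text.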
I have a requirement that the user can input HTML tags in an ASP.NET TextBox. The value of the textbox is saved in the database, and then we need to show what he entered on some other page. To do so I set ValidateRequest="false" on the Page directive.
Now the problem is when the user inputs something like:
<script> window.location = 'http://www.xyz.com'; </script>
The value is saved in the database, but when I show it on some other page it redirects me to "http://www.xyz.com", which is expected because the browser executes the JavaScript. But I need to find a solution, as I need to show exactly what he entered.
I am thinking of Server.HtmlEncode. Can you point me in the right direction for my requirement?
Always, always, always encode the input from the user, and only then persist it in your database. You can achieve this easily by doing
Server.HtmlEncode(userinput)
Now, when it comes time to display the content to the user, decode the user input and put it on the screen:
Server.HtmlDecode(userinput)
You need to encode all of the input before you output it back to the user and you could consider implementing a whitelist based approach to what kind of HTML you allow a user to submit.
I suggest a whitelist approach because it's much easier to write rules to allow p,br,em,strong,a (for example) rather than to try and identify every kind of malicious input and blacklist them.
Possibly consider using something like MarkDown (as used on StackOverflow) instead of allowing plain HTML?
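The whitelist approach can be sketched as follows: encode everything, then turn only the allowed tags back into markup. This is a minimal illustration that assumes System.Web.HttpUtility and handles attribute-less p, br, em and strong tags only; links with attributes need far more care. WhitelistSanitizer and Sanitize are hypothetical names.

```csharp
using System.Text.RegularExpressions;
using System.Web;

public static class WhitelistSanitizer
{
    // Matches an encoded, attribute-less opening, closing or self-closing allowed tag.
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(p|br|em|strong)\s*(/?)&gt;",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: everything is encoded, then allowed tags are restored.
    public static string Sanitize(string input)
    {
        string encoded = HttpUtility.HtmlEncode(input);
        return AllowedTag.Replace(encoded, "<$1$2$3>");
    }
}
```

With this sketch, <em>hi</em><script>x</script> becomes <em>hi</em>&lt;script&gt;x&lt;/script&gt;, so the emphasis is kept while the script is displayed as text.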
You need to escape some characters when generating the HTML: '<' -> &lt;, '>' -> &gt;, '&' -> &amp;. This way exactly what the user entered is displayed; otherwise the HTML parser might recognize HTML tags and act on them.
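Done by hand, the three replacements could look like the sketch below; MinimalEscaper is a hypothetical name. The ampersand has to be replaced first, otherwise the & in the freshly produced &lt; and &gt; would be escaped again.

```csharp
public static class MinimalEscaper
{
    // Hypothetical helper: escapes the three characters named above.
    public static string Escape(string text)
    {
        return text
            .Replace("&", "&amp;")  // must come first to avoid double escaping
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");
    }
}
```

This covers text placed between tags only; text written into attribute values would also need its quotes escaped, which HtmlEncode takes care of.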
Have you tried using HTMLEncode on all of your inputs? I personally use the Telerik RadEditor that escapes the characters before submitting them... that way the system doesn't barf on exceptions.
Here's an SO question along the same lines.
You should have a look at the HTML tags you do not want to support because of vulnerabilities such as the one you described, for example
script
img
iframe
applet
object
embed
form, button, input
and replace the leading "<" with "&lt;".
Also replace </body> and </html>.
HTML editors such as CKEditor allow you to require well-formed XHTML, and define tags to be excluded from input.