C#: turn HTML into a valid HTML e-mail

I want to turn an HTML page that can easily be edited on the web into a valid HTML e-mail (inline styles, absolute links, etc.).
I have found the premailer project, which changes your HTML to work well in as many e-mail clients as possible. I want to know whether a .NET equivalent exists, or whether it would be possible to run this project in IronRuby, for example.

It's Ruby code, so I would expect that it would run in IronRuby. Did you try it and run into problems?

Related

In C#, how to prevent XSS while allowing HTML input, including br's?

I've been using MS's AntiXSS library for a while now. Recently I decided to change the textareas in my site to be plain textareas (used to be WYSIWYG), and run a conversion on the newlines to br's.
Problem is, MS's AntiXSS library doesn't support this... it strips out the br's. I don't want to let the user's entry go directly into my DB unchecked. Without using the MS AntiXSS library, what's a reliable way to prevent XSS while allowing HTML input, including br's (in C#)?
You can disable AntiXSS for this field and store the user's input directly in your database.
That way, you'll be able to render this text in any output format, not only HTML.
Then, when you want to display this text on an HTML page using ASP.NET MVC Razor, you can use something like this:
@Html.Raw(Html.Encode(Model.MyMultilineTextField).Replace("\n", "<br />"))
Html.Encode encodes the text so that HTML tags are not interpreted and XSS is not possible. Html.Raw then stops Razor from encoding the inserted <br /> tags a second time.
You may add an extension method on Html that does the transformation (with Replace) for you. You may also want to handle \r.
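A minimal sketch of such a helper, written as a plain static method so it does not depend on MVC. The class and method names are hypothetical, and it assumes System.Net.WebUtility is available (.NET 4.0 or later); on older frameworks HttpUtility.HtmlEncode would be the equivalent call.

```csharp
using System.Net;

public static class MultilineHtml
{
    // Hypothetical helper: HTML-encode the raw text first, then turn
    // line breaks (\r\n, \r or \n) into <br /> tags.
    public static string EncodeWithBreaks(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Encoding first means any markup typed by the user is neutralised;
        // the only real tags in the result are the <br /> we add ourselves.
        string encoded = WebUtility.HtmlEncode(text);

        return encoded
            .Replace("\r\n", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "<br />");
    }
}
```

In a Razor view the result would still need to go through Html.Raw, since it already contains the intended markup.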
Is it possible to get a copy of AntiXSS's output? If so, run your input through AntiXSS, make the replacement afterward, and store the data yourself.
To resolve this, I decided to store the raw HTML as-is, replacing each Environment.NewLine with <br /> before storing it.
Then on the flip side, when showing it to visitors I use the MS AntiXSS code to clean it up. Not 100% the ideal way I'd like to do it, but gets the job done.
I do a bit of caching here to make sure it's not running through AntiXSS on every request too.

jQuery + C# code to resolve URL's from user-supplied text

I'd like to add some kind of simple URL resolution and formatting to my C# and jQuery-based ASP.NET web application. I currently allow users to add simple text-based descriptions to items and leave simple comments ('simple' as in I only allow plain text).
What I need to support is the ability for a user to enter something like:
Check out this cool link: http://www.really-cool-site.com
...and have the URL above properly resolved and automagically turned into a clickable link, kind of like the way the editor in StackOverflow works, except that we don't want to support BBCode or any of its variants. The user experience would actually be more like the way Facebook resolves user-generated URLs.
What are some jQuery + C# solutions I should consider?
There's another question with a solution that might help you. It uses a regex in pure JS.
Personally though, I would do it server-side when the user submits it. That way, you only need to do it once, rather than every time you display that text. You could use a similar regex in C#.
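A rough sketch of what the server-side version could look like in C#. The URL pattern, the class name and the trailing-punctuation rule are all assumptions made for illustration, not the regex from the linked question. The text around each URL is HTML-encoded, so the result can be written to the page as-is.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Assumed pattern: http/https URLs running up to whitespace,
    // a quote character or an angle bracket.
    private static readonly Regex UrlPattern = new Regex(
        @"https?://[^\s<>""']+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Linkify(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        var result = new StringBuilder();
        int last = 0;

        foreach (Match match in UrlPattern.Matches(text))
        {
            // Treat trailing punctuation as part of the sentence, not the URL.
            string url = match.Value.TrimEnd('.', ',', ';', ':', '!', '?', ')');

            // Encode the plain text that precedes this URL.
            result.Append(WebUtility.HtmlEncode(text.Substring(last, match.Index - last)));

            string safeUrl = WebUtility.HtmlEncode(url);
            result.Append("<a href=\"").Append(safeUrl).Append("\">")
                  .Append(safeUrl).Append("</a>");

            last = match.Index + url.Length;
        }

        // Encode whatever follows the last URL.
        result.Append(WebUtility.HtmlEncode(text.Substring(last)));
        return result.ToString();
    }
}
```

Running this once at submit time and storing the result matches the "do it once" argument above; storing the raw text and linkifying on output is the alternative if the stored text must stay format-neutral.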
I ended up using server-side C# code to do the linkification. I use an AJAX-jQuery wrapper to call into a PageMethod that does the work.
The PageMethod both linkifies and sanitizes the user-supplied string, then returns the result.
I use the Microsoft Anti-Cross Site Scripting Library (AntiXSS) to sanitize:
http://www.microsoft.com/download/en/details.aspx?id=5242
And I use C# code I found here and there to resolve and shorten links using good olde string parsing and regular expressions.
My method is not as cool as the way Facebook does it in real time, but at least now my users can add links to their descriptions and comments.

Detect HTML in ASP.NET

(clarification: this is an old question that has been tweaked for admin purposes)
There have been a fair number of questions on this site about parsing HTML from textareas and whatnot, or not allowing HTML in textboxes. My question is similar: how would I detect whether HTML is present in the textbox? Would I need to run it through a regular expression of all known HTML tags? Is there a current library for .NET that can detect when HTML is inserted into a textarea?
Edit: Similarly, is there a JavaScript Library that does this?
Edit #2: Due to the way that the web app works (it validates textarea text on asynchronous postback using the Validate method of ASP.NET), it bombs before it can get back to the code-behind to use HTML.Encode. My concern was trying to find another way of handling HTML in those instances.
Not really an answer, but why do you need it at all? You need to sanitize HTML input only if you are going to output it without modification, i.e. if you want your users to actually be able to use HTML. And if you want that, you do not have to "detect" HTML; you just need to make sure that you handle it safely. Jeff Atwood has a good routine for this.
If you want to prevent HTML output altogether, you can take whatever the user inputs without any checks. Just take care to HtmlEncode it and store it that way. Then your output will not actually contain any "real" HTML from what the user wrote.
Yes, a regular expression is probably the easiest way to do that.
One regex would be (with case-insensitive matching turned on, since the pattern only lists uppercase letters): <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
You can run that in both ASP.NET and JavaScript. The .NET Framework class to use is System.Text.RegularExpressions.Regex.
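As a sketch, the pattern above could be wrapped like this in C#. The class and method names are hypothetical. IgnoreCase is needed because the pattern only spells out uppercase letters, and Singleline lets the tag content span several lines. Note that the pattern only finds paired tags, so a lone `<br />` or `<img>` is not detected.

```csharp
using System.Text.RegularExpressions;

public static class HtmlDetector
{
    // The paired-tag pattern quoted above, compiled once.
    private static readonly Regex PairedTag = new Regex(
        @"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns true when the text contains an opening tag
    // followed later by its matching closing tag.
    public static bool ContainsPairedTag(string text)
    {
        return !string.IsNullOrEmpty(text) && PairedTag.IsMatch(text);
    }
}
```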
Hope that helps!
bool containsHtml = Regex.IsMatch(MyTextbox.Text, @"<(.|\n)*?>");
As far as I know, you cannot paste HTML into a TextArea and have it work automatically, at least in .NET 2.0. ASP.NET automatically sanitizes input. You need to set ValidateRequest to false in the page directive (if I remember correctly).
If you want to allow HTML tags and want to pick from a possible list of tags, I suggest you lookup 'Markdown' and this Jeff Atwood Post.
+1 Sunny. “detecting” HTML is a fool's errand. You need to escape it on output, and as long as you're doing that you're safe. If you're not escaping it, sanitisation hacks aren't going to make you secure, they're just going to obfuscate the problem.
"Due to the way that the web app works (it validates textarea text on asynchronous postback using the Validate method of ASP.NET)"
Yeah, you'll want to stop doing that. ASP.NET's “request validation” is utterly bogus and needs to be turned off if you want to have any chance of processing uploaded content consistently.
Well, in HTML you can't do a lot without a less-than symbol "<".
So, I would look for a less-than symbol followed by some characters followed by a greater-than symbol. If you find that, you can pretty much be assured that it is HTML.
I don't think you have to look for specific tags, as HTML will ignore invalid tags as part of the specification and it would still be considered HTML.
EDIT: Oops! Almost forgot... the ampersand character! If you see one in the text, you MIGHT have HTML, since it is used to specify special characters (like &copy; for ©). This can be dangerous because the user could specify &lt; for <, so it might turn into HTML later...

Compile ASPX in WinForms App

I'm writing a WinForms application that sends email messages (like a mail merge).
I'd like to use ASP.Net's rendering engine to render the HTML bodies of the messages.
What's the simplest way to get the rendered output of a single ASPX page without the entire ASP.Net runtime?
To make things harder, I'd prefer to compile the ASPX at runtime so that it can be modified without rebuilding the application. However, this is not a requirement; if it's too difficult, I'll give up on it.
Rick Strahl posted an article how to do this at this location: http://www.west-wind.com/presentations/aspnetruntime/aspnetruntime.asp. I know there is a way to call some internal .NET Framework methods but I can't remember what they are off hand.
You may want to consider using a templating lib like NVelocity. Using the WebForms rendering engine in this manner is a bit overkill and hackish at best.
As an aside: keep in mind that HTML in email sucks. Even the most elementary of CSS is ignored by the majority of email clients. If you want my advice, KISS and save your sanity: if you're going to automate emails, send only plain text.

How to allow simple HTML tags in comments or anywhere?

In my web application I am developing a comment feature where users can leave comments. The problem I am facing is that I want to allow simple HTML tags in the comment box: tags like <b>, <strong>, <i>, <em>, <u>, etc., which are normally allowed in a comment box. I also want each press of Enter to be automatically converted into a line break (<br /> tag) and stored in the database, so that when I display the comments on the web page they look the way the user entered them.
Can you please tell me how to parse the input so that only the allowed set of HTML tags gets through, how to convert Enter presses into <br /> tags, and then how to store the result in the database?
Or does anyone have a better idea or suggestion for implementing this kind of functionality? I am using ASP.NET 2.0 (C#).
I noticed that StackOverflow.com is doing the same thing on Profile Editing. When we edit our profile then below the "About Me" field "basic HTML allowed" line is written, I want to do almost the same functionality.
I don't have a C# specific answer for you, but you can go about it a few different ways. One is to let the user input whatever they want, then you run a filter over it to strip out the "bad" html. There are numerous open source filters that do this for PHP, Python, etc. In general, it's a pretty difficult problem, and it's best to let some well developed 3rd party code do this rather than write it yourself.
Another way to handle it is to allow the user to enter comments in some kind of simpler markup language like BBCode, Textile, or Markdown (stackoverflow is using Markdown), perhaps in conjunction with a nice Javascript editor. You then run the user's text through a processor for one of these markup languages to get the HTML. You can usually obtain implementations of these processors for whatever language you are using. These processors usually strip out the "bad" HTML.
It's rather "simple" to do that in PHP and Python due to the large number of built-in functions. I am still learning C# and haven't yet come across an equivalent function, but chances are that it exists and all you need to do is search for it. I mean a function that can take the user input, search for the allowed tags (which are in an array, of course), replace their <> with something else like [], and then use a function to escape the other HTML tags. In PHP we use htmlentities().
Something like
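<code>
$txt = $_POST['comment'];
// Swap the allowed tags for placeholders so htmlentities() leaves them alone
$txt = str_replace(array('<b>', '</b>'), array('[b]', '[/b]'), $txt);
$securetxt = htmlentities($txt);
// Restore the allowed tags
$finaltxt = str_replace(array('[b]', '[/b]'), array('<b>', '</b>'), $securetxt);
// Now save $finaltxt to the DB
</code>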
I'm not sure, but I think you have to escape html characters when inserting in database and when retrieving echo them unescaped, so the browser can see it just like html.
I don't know ASP.NET, but in PHP there's an easy function, strip_tags, that lets you add exceptions (in your case, b, em, etc.). If there's nothing like that in C#, you can write a regular expression that strips out all tags except the allowed ones, but chances are that such an expression already exists, so it should be easy to find.
Replacing \n (or something similar) with br shouldn't be a problem either with a simple search and replace.
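One way to approximate an allow-list in C# without a strip_tags equivalent is to encode everything and then restore only the bare allowed tags, rather than stripping the disallowed ones. This is a minimal sketch under that assumption: the class name and tag list are hypothetical, System.Net.WebUtility needs .NET 4.0 or later (HttpUtility.HtmlEncode is the older equivalent), and it does not check that tags are balanced.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class CommentFormatter
{
    // Matches an already-encoded allowed tag with no attributes,
    // e.g. "&lt;b&gt;" or "&lt;/em&gt;".
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(b|strong|i|em|u)&gt;",
        RegexOptions.IgnoreCase);

    public static string Format(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // 1. Neutralise all markup the user typed.
        string encoded = WebUtility.HtmlEncode(input);

        // 2. Turn only the allowed, attribute-free tags back into real tags.
        string restored = AllowedTag.Replace(encoded, "<$1$2>");

        // 3. Convert line breaks into <br /> tags.
        return restored
            .Replace("\r\n", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "<br />");
    }
}
```

Because a tag is only restored when it has no attributes, input such as `<b onclick="...">` stays encoded. For anything beyond a handful of simple tags, a maintained sanitizer library is likely the safer choice, as the other answers point out.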
This is a dangerous road to go down. You might think you can do some awesome regexes, or find someone who can help you with them, but sanitizing SOME markup and leaving the rest is just crazy talk.
I highly recommend you look into BBCode or another token system. Even something untokenized such as what SO uses, is probably a much better solution.
