Asp.net Mvc, Razor and Localization


I know this matter has already been brought on these pages many times, but still I haven't found the "good solution" I am required to find. Let's start the explanation.
Localization in .net, and in mvc, is made in 2 ways that can even be mixed together:
Resource files (both local or global)
Localized views with a viewengine to call the appropriate view based on culture
I'll explain the solutions I tried and all the problems I got with every one of them.
Text in resource files, all tags in the view
This solution would have me put every text in resources, and every tag in the view, even the inline tags such as [strong] or [span].
Pros:
Clean separation, no structure whatsoever in localization.
Easy encoding: everything that is returned from the resource gets html encoded.
Cons:
If I have a paragraph with some strongs, a couple of links etc., I have to split it into many resource keys. This is considered to make the view too unreadable, and it also takes too much time to create.
For the same reason as above, if in two different languages the [strong] text is in different places (like "Il cane di Marco" and "Marco's dog"), I can't achieve it, since all my tags are in the view.
Text and inline tags in resource files, through parameters
This method will have the resources contain some placeholders for string.Format, and those placeholders will be filled with inline tags post-encoding.
Pros:
Clean separation, with just placeholders in the text, so if I am ever to replace [strong] with [em] I do it in the view where I pass it as parameter and it gets changed in every language
Cons:
Encoding is a bit harder, I have to pre-encode the value from the resource, then use string.Format, and finally return it as MvcHtmlString to tell the view engine to not re-encode it when displaying.
For the same reason as above, including, for instance, an ActionLink as parameter would be troublesome. Let's say I get the text for my actionlink from a resource. My method already encodes it. But then, the ActionLink method would re-encode it again. I would need a distinct method to get resources without encoding them, or new helper methods that get an MvcHtmlString instead of a string as text parameter, but both are rather unpractical.
Still takes a whole lot of time to build views, having to create all the resource keys and then fill them.
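The encode-then-format sequence described for this approach can be sketched roughly as below. This is only an illustration, not the asker's code: the class and method names (LocalizedHtml, Format) are hypothetical, and it uses System.Net.WebUtility so that it runs outside MVC. In a real view helper you would presumably wrap the result in MvcHtmlString.Create so Razor does not encode it again.

```csharp
using System.Net;

public static class LocalizedHtml
{
    // Hypothetical helper: resourceText comes from a .resx file and contains
    // string.Format placeholders such as {0} and {1}.
    public static string Format(string resourceText, params object[] trustedMarkup)
    {
        // 1. Encode the translated text. Curly braces are not touched by
        //    HtmlEncode, so the placeholders survive this step.
        string encoded = WebUtility.HtmlEncode(resourceText);

        // 2. Substitute the markup supplied by the view. It is inserted
        //    after encoding, so it is NOT encoded and must be trusted.
        return string.Format(encoded, trustedMarkup);
    }
}
```

Called as LocalizedHtml.Format("The {0}dog{1} of Marco & Co", "<strong>", "</strong>"), the text is encoded while the tags passed by the view are kept as markup. Because the tags are parameters, each translation can place {0} and {1} wherever its grammar requires.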
Localized views
Pros:
All views are plain html. No resources to read.
Cons:
Duplicated html everywhere. I don't even need to explain how this is totally evil.
Have to manually encode all troublesome characters like grave vowels, quotes and such.
Conclusions
A mix of the above techniques inherits pros and cons, but it's still no good. I am challenged to find a proper productive solution, while all of the above are considered "unpractical" and "time consuming".
To make things worse, I found out that there isn't a single tool that refactors "text" from aspx or cshtml (or even html) views/pages into resources. All the tools out there can refactor System.String instances in code files (.cs or .vb) into resources only (resharper for instance, and a couple of others I can't remember now).
So I'm stuck, can't find anything appropriate on my own, and can't find anything on the web either. Is it possible no one else has been challenged with this problem before and found a solution?

I personally like the idea of storing inline tags in the resource file. However I do it a little differently. I store very plain tags like <span class='emphasis'>dog</span> and then I use CSS to style the tags appropriately. Now, instead of "passing in" a tag as a parameter, I simply style the span.emphasis rule in my CSS appropriately. Change carries over to all languages.
The Sexier Option:
Another option I thought of and quite enjoy is to use a "readable markup" language like StackOverflow's very own MarkdownSharp. This way you aren't storing any HTML in the resource file, only markdown text. So in your resource you would have **dog** and then it gets shunted through markdown in the view (I created a helper for this; usage: Html.Markdown(string text)). Now you're not storing tags, you're storing a common human-readable markup language. The MarkdownSharp source is one .cs file and it's easy to modify, so you could always change the way it renders the final HTML. This gives you total control over all your resources without storing HTML, and without duplicating views or chunks of HTML.
EDIT
This also gives you control over the encoding. You could easily make sure the content of your resource files contain no valid HTML. Markdown syntax (as you know from using stack overflow) does not contain HTML tags and thus can be encoded without harm. Then you just use your helper to convert the Markdown syntax to valid HTML.
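The order of operations described here (encode the resource text first, then convert the markup) could look roughly like the sketch below. To keep it self-contained, a single regular expression that only understands **bold** stands in for the real MarkdownSharp transform; that stand-in and the names EncodedMarkup and Render are assumptions made for illustration.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class EncodedMarkup
{
    // Stand-in for a full Markdown transform: handles **bold** only.
    private static readonly Regex Bold = new Regex(@"\*\*(.+?)\*\*");

    public static string Render(string resourceText)
    {
        // Encode first, so any HTML that slipped into the resource is neutralized.
        string encoded = WebUtility.HtmlEncode(resourceText);

        // Then turn the (unaffected) markup syntax into real tags.
        return Bold.Replace(encoded, "<strong>$1</strong>");
    }
}
```

With this ordering, a resource value such as "The **dog** <script>" keeps its emphasis while the stray tag is rendered as harmless text.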
EDIT #2
There is one bug in markdown that I had to fix myself. Anything markdown detects as a "code" block will be HTML encoded. This is a problem if you have already HTML encoded all content being passed to markdown, because anything in the code blocks is essentially re-encoded, which turns &gt; into &amp;gt; and completely screws up the text within code blocks. To fix this I modified the markdown.cs file to include a boolean option that stops markdown from encoding text within code blocks. See this issue for the fixed .cs file that I added to the MarkdownSharp project issues.
EDIT #3 - Html Helper Sample
using System.Web.Mvc;

public static class HtmlHelpers
{
    // Converts Markdown stored in a resource into HTML that the view will not re-encode.
    public static MvcHtmlString Markdown(this HtmlHelper helper, string text)
    {
        var markdown = new MarkdownSharp.Markdown
        {
            AutoHyperlink = true,
            EncodeCodeBlocks = false, // This option is my custom option to stop the code block encoding problem.
            LinkEmails = true,
            EncodeProblemUrlCharacters = true
        };
        string html = markdown.Transform(text);
        return MvcHtmlString.Create(html);
    }
}

Nothing stops you from storing HTML in resource files, then calling @Html.Raw(MyResources.Resource).

Have you thought about using localized models? Have your view be strongly typed to IMyModel and pass in the appropriately decorated model. You can then use or change how you're doing your localization fairly easily by modifying the appropriate model.
It's clean, very flexible, and very easy to maintain.
You could start out with resource file based localization and then, for places you need to update more often, switch that model to a cached DB based localization model.

Related

How to use AntiXss with a Web API

This is a question that has been asked before, but I've not found the information I'm looking for or maybe I'm just missing the point so please bear with me. I can always adjust my question if I'm asking it the wrong way.
For example, say I have a POST endpoint that uses a simple DTO object with 2 properties (i.e. companyRequestDto), and the request contains a script tag in one of its properties. When I call my endpoint from Postman I use the following:
{
"company": "My Company<script>alert(1);</script>",
"description": "This is a description"
}
When it is received by the action in my endpoint,
public void Post(CompanyRequestDto companyRequestDto)
my DTO object will automatically be set and its properties will be set to:
companyRequestDto.Company = "My Company<script>alert(1);</script>";
companyRequestDto.Description = "This is a description";
I clearly don't want this information to be stored in our database as is, nor do I want it stored as an escaped string as displayed above.
1) Request: So my first question is how do I throw an error if the DTO posted contains some invalid content such as the <script> tag?
I've looked at Microsoft AntiXss, but I don't understand how to handle this, as the data provided in the properties of a DTO object is not an HTML string but just a string. What am I missing here? I don't understand how this helps with sanitizing or validating the passed data.
When I call
var test = AntiXss.AntiXssEncoder.HtmlEncode(companyRequestDto.Company, true);
It returns an encoded string, but then what??
Is there a way to remove disallowed keywords or just simply throw an error?
2) Response: Assuming 1) was not implemented or didn't work properly and the value ended up being stored in our database, am I supposed to return encoded data as a JSON string, so instead of returning:
"My company"
Am I supposed to return:
"My Company&lt;script&gt;alert(1)&lt;/script&gt;"
Is the browser (or whatever app) just supposed to display as below then?:
"My Company<script>alert(1)</script>"
3) Code: Assuming there is a way to sanitize or throw an error, should I use this at the property level using attribute on all the properties of my various DTO objects or is there a way to apply this at the class level using an attribute that will validate and/or sanitize all string properties of a DTO object for example?
I found interesting articles but none really answering my problems or I'm having other problems with some of the answers:
asp.net mvc What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?
Stopping XSS when using WebAPI (currently looking into this one but don't see how example is solving problem as property is always failing whether I use the script tag or not)
how to sanitize input data in web api using anti xss attack (also looking at this one but having a problem calling ReadFromStreamAsync from my project at work. Might be down to some of the settings in my web.config but haven't figured out why but it always seems to return an empty string)
Thanks.
UPDATE 1:
I've just finished going through the answer from Stopping XSS when using WebAPI
This is probably the closest one to what I am looking for. Except I don't want to encode the data, as I don't want to store it in my database, so I'll see if I can figure out how to throw an error but I'm not sure what the condition will be. Maybe I should just look for characters such as <, >, ; , etc... as these will not likely be used in any of our fields.

You need to consider where your data will be used when you think about encoding. Data with <script> in it is only a problem if it's rendered as HTML, so if you are going to display user-provided data anywhere, the point of display is probably where you want to HTML encode it (you want to avoid repeatedly HTML encoding the same string, for example each time it is saved).
Again, it depends what the response is going to be used for. You probably want to HTML encode it at the point it's going to be displayed. Remember that if you encode something in the response, it may not match what's in the data, so calling code that does something like call your API to search for a company with that name could run into problems. If the browser does display the HTML encoded version it might look ugly, but it's better than users being compromised by XSS attacks.
It's quite difficult to sanitize text for things like tags if you allow most characters for normal use. It's easier if you can whitelist the characters allowed and only allow, say, alphanumeric, but that isn't often possible. This can be done using a regex validation attribute on the DTO object. The best approach, I think, is to encode values for display if you can't stop certain characters. It's really difficult to allow all characters but avoid things like <script>, as people can start using ASCII character codes etc.
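The regex validation attribute mentioned above might be applied as in the following sketch. The DTO shape follows the question, but the specific patterns, the error messages, and the DtoValidator helper are assumptions for illustration; the whitelist would need adjusting to whatever characters your fields legitimately contain. In Web API, data annotation failures are normally surfaced through ModelState, so the explicit Validator call is only there to make the example runnable on its own.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class CompanyRequestDto
{
    // Hypothetical whitelist: letters, digits, spaces and a little punctuation.
    [Required]
    [RegularExpression(@"^[\p{L}\p{N} .,'&-]+$",
        ErrorMessage = "Company contains disallowed characters.")]
    public string Company { get; set; }

    // Looser rule: anything except angle brackets.
    [RegularExpression(@"^[^<>]*$",
        ErrorMessage = "Description must not contain < or >.")]
    public string Description { get; set; }
}

public static class DtoValidator
{
    // Runs the data annotation rules and collects any failures.
    public static bool IsValid(object dto, out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(
            dto, new ValidationContext(dto), errors, validateAllProperties: true);
    }
}
```

A request whose Company is "My Company<script>alert(1);</script>" fails validation, so the action can return an error instead of storing the value. This rejects input rather than sanitizing it, and it does not replace encoding at display time.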

HTML in C# Resource File [duplicate]

I am developing a standard small ASP.NET MVC website which will be made multilingual using ASP.NET resource files.
My question is about the resource files. See if you have a block of text which should be shown in paragraphs, is it appropriate to add <p> tags inside your resource file text?
If not, what would be the best way to deal with it?

You can use the @Html.Raw method inside your view, e.g. @Html.Raw(STRING FROM RESX FILE HERE)

When I faced the same question I decided against having paragraphs directly in resources. Separate text block by simple new lines (Shift+Enter if I'm not mistaken). Then create a helper method to wrap blocks into paragraphs.
using System;
using System.Text;
using System.Web;

namespace MyHtmlExtensions
{
    public static class ResourceHelpers
    {
        // Wraps each non-empty line of the resource text in its own <p> element.
        // Note: the text is inserted as-is, so it must not contain untrusted HTML.
        public static IHtmlString WrapTextBlockIntoParagraphs(this string s)
        {
            if (s == null) return new HtmlString(string.Empty);

            var blocks = s.Split(new string[] { "\r\n", "\n" },
                StringSplitOptions.RemoveEmptyEntries);

            StringBuilder htmlParagraphs = new StringBuilder();
            foreach (string block in blocks)
            {
                htmlParagraphs.Append("<p>" + block + "</p>");
            }
            return new HtmlString(htmlParagraphs.ToString());
        }
    }
}
Then in your view you import your extension method namespace:
<%@ Import Namespace="MyHtmlExtensions" %>
And apply it:
<%= Resources.Texts.HelloWorld.WrapTextBlockIntoParagraphs () %>
I think it is a cleaner way than putting markup tags into text resources, where in principle they do not belong.

You can wrap the data for the resx value field in a CDATA block, <![CDATA[ ]]>.
e.g.
<![CDATA[My text in <b>bold</b>]]>
Depending on how you want to use it, you will probably have to strip off the <![CDATA[ and ]]> within your app.
This approach allows you to include any html markup in the resx file. I do this to store formatted text messages to be displayed in a jquery-ui dialog from various views within an asp.net mvc app.
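Stripping the wrapper, as mentioned above, could be done with a small helper along these lines. This is a sketch under the assumption that the resource value arrives with the literal <![CDATA[ and ]]> markers still attached; the names CDataHelper and Strip are hypothetical.

```csharp
using System;

public static class CDataHelper
{
    private const string Open = "<![CDATA[";
    private const string Close = "]]>";

    // Returns the content between the CDATA markers, or the input unchanged
    // when the value is not wrapped.
    public static string Strip(string value)
    {
        if (value == null) return string.Empty;

        string trimmed = value.Trim();
        bool wrapped = trimmed.Length >= Open.Length + Close.Length
            && trimmed.StartsWith(Open, StringComparison.Ordinal)
            && trimmed.EndsWith(Close, StringComparison.Ordinal);

        return wrapped
            ? trimmed.Substring(Open.Length, trimmed.Length - Open.Length - Close.Length)
            : value;
    }
}
```

The stripped value still contains raw HTML, so the view has to output it unencoded (for example through Html.Raw), and the resource content has to be trusted.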

Yes and no. Resources are a good place to store localized text and basically this will allow you to send the whole resource to some translation agency who will do all translations for you. So, that's a plus.
However, it also increases the chance of bugs simply because you might forget to encode or decode the HTML. The .resx file is an XML file, so the HTML needs to be stored encoded in the resource. (Fortunately, the editor will encode it automatically.) When you retrieve it again, it will be decoded automatically. However, if you pass the text to other functions, there is a risk that it will be encoded again, resulting in the literal text &lt;b&gt;bold&lt;/b&gt; on the page instead of bold text...
So you will need some additional testing. Which is something you can manage.
But the big problem could be for the person who does the translations. If they just translate the .resx file, they'll see encoded strings. This could be solved if they use something to make the resources more readable. Just as long as they won't translate the tags themselves. They could confuse the persons who will be doing the translations.
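The double-encoding risk described above is easy to reproduce. The sketch below uses System.Net.WebUtility purely for demonstration; the class and method names are made up, and in an MVC application the second encoding would typically come from the view engine or a helper rather than from an explicit call.

```csharp
using System.Net;

public static class EncodingDemo
{
    // What you want when the resource holds plain text.
    public static string EncodeOnce(string s)
    {
        return WebUtility.HtmlEncode(s);
    }

    // What happens when already-encoded text is passed to something
    // that encodes it again.
    public static string EncodeTwice(string s)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlEncode(s));
    }
}
```

For the input <b>bold</b>, one pass yields &lt;b&gt;bold&lt;/b&gt;, which the browser shows as the literal tags. A second pass yields &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;, which the browser shows as visible entity text.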

You might as well open your resource file in designer and simply type html and text as you wish ..
For example: Resource Text <b>Data</b>
However the solution from @Tony Bennett works perfectly..
Something like this also works; the special character sequences here are used in place of the angle brackets < and >:
are you targeting to &lt;b&gt;sustain&lt;/b&gt; the Mark they achieved

I would say that this should be avoided, but sometimes it is practically unavoidable. Ideally it would be best to store your resources at a level of granularity that allows you to add the presentation markup in common for all languages in the appropriate area of the app.
Sometimes grammatical differences between languages make this difficult. Sometimes you want to store large chunks of text-based resources (large enough that they would have to encompass things like p tags). It depends a little on what you are doing.

Use URLEncoder to store the HTML string within the resource; then Decode the String and use webview rather than converting all the HTML tags.
URLDecoder decodes the argument which is assumed to be encoded in the
x-www-form-urlencoded MIME content type. '+' will be converted to
space, '%' and two following hex digit characters are converted to the
equivalent byte value. All other characters are passed through
unmodified.
For example "A+B+C %24%25" -> "A B C $%".

We do.
You could use a text-to-HTML converter, but as text contains so much less information than HTML you'd be restricted.

Detect Razor/C# code?

Is there a way to detect if an HTML page contains any Razor/C# code? Essentially I want users to be able to provide custom layouts, with tags that I will replace with RenderSection. Before making this replacement, I want to validate that none of the HTML contains anything like, for example, <a href="@(some C# code)".
All discussions about alternative ways to do this, should/could/would aside, just simply:
Is there a way to programmatically detect if a file contains C#/Razor code?

I don't know a lot about the Razor markup, but I am thinking that when you grab the layout string they are passing in, you will want to parse the text and grab everything that starts with an @ and toss those words into an array. Then, when you republish it to your website, use Razor code to access the data in the array...
Alternatively, and easier, would be to go through all the passed-in code and replace all the @ signs with a different symbol, say &, so that it won't get interpreted by the Razor processor:
layoutString = layoutString.Replace('@', '&');

In the browser? No, because unless the programmer made a mistake, there is no Razor/C# code in the rendered HTML, only the HTML that resulted from it.
What you ask is like asking what type of oven was used to bake a pizza, judging from the pizza. Bad news: you will never know.
If you provide sensible tags for those, you could parse them in JavaScript, but you have to output that metadata yourself as part of the generated HTML.

After reading your comment to TomTom; the answer is:
No. Razor does not come with any public syntax parser.
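Without a parser, a rough heuristic is still possible: flag any @ that is not an escaped @@ and not part of something that looks like an email address. The sketch below is only that, a heuristic under stated assumptions, and not a Razor parser; the names RazorDetector and LooksLikeRazor are hypothetical. It will reject some harmless input (for example CSS at-rules such as @media inside a style block), which may be acceptable when the goal is to refuse anything suspicious.

```csharp
using System.Text.RegularExpressions;

public static class RazorDetector
{
    // Matches an '@' that is not preceded by a word character or another '@'
    // (skips emails and the second half of '@@') and is not followed by '@'
    // (skips the first half of '@@').
    private static readonly Regex RazorMarker = new Regex(@"(?<![\w@])@(?!@)");

    public static bool LooksLikeRazor(string html)
    {
        return !string.IsNullOrEmpty(html) && RazorMarker.IsMatch(html);
    }
}
```

A layout containing <a href="@(Model.Url)"> would be flagged, while plain HTML, an address like user@example.com, or an escaped @@ would not.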

Moving strings that contain HTML tag to resource files

I wish to move some of the UI strings into the resource files. The strings contain some of the styling tags, for example :- "I wish to move this < i >string< / i > into resource files."
What is the best way to do this? If possible, please give an example with the code?
PS :- 1) Breaking up the string in 3 parts is not an option, as it makes translation tough.
2) I tried using :- @string.Format(Resources.ResourceString, string).
Where ResourceString = I wish to move this < i >{0}< / i > into resource files.

It's quite common to include HTML formatting in localizable string resources and translators (as well as the CAT tools that translators use) are generally knowledgeable about basic HTML and will keep your HTML tags in the translations. Although of course you should check that your localized strings are OK when you receive your translations and as part of your QA plans.
As you said, breaking up the strings can cause more problems for the translator than keeping them as they are with their HTML tags.
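If the tags stay in the resource string, one way to combine it with string.Format is to treat the resource as trusted markup and HTML encode only the values substituted into it. The sketch below assumes exactly that trust model; the names ResourceFormatter and Format are hypothetical, and in a Razor view the result would still need to be emitted unencoded (for example with Html.Raw).

```csharp
using System;
using System.Net;

public static class ResourceFormatter
{
    // resourceHtml: trusted text from the .resx file, may contain tags such as <i>.
    // args: runtime values, treated as untrusted and therefore encoded.
    public static string Format(string resourceHtml, params object[] args)
    {
        var encodedArgs = new object[args.Length];
        for (int i = 0; i < args.Length; i++)
        {
            encodedArgs[i] = WebUtility.HtmlEncode(Convert.ToString(args[i]));
        }
        return string.Format(resourceHtml, encodedArgs);
    }
}
```

This keeps the sentence in one piece for the translator while making sure a substituted value cannot inject markup of its own.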

Why isn't MarkdownSharp encoding my HTML?

In my mind, one of the bigger goals of Markdown is to prevent the user from typing potentially malformed HTML directly.
Well that isn't exactly working for me in MarkdownSharp.
This example works properly when you have the extra line break immediately after "abc"...
But when that line break isn't there, I think it should still be HtmlEncoded, but that isn't happening here...
Behind the scenes, the rendered markup is coming from an iframe. And this is the code behind it...
<%
var md = new MarkdownSharp.Markdown();
%>
<%= md.Transform(Request.Form[0]) %>
Surely I must be missing something. Oh, and I am using v1.13 (the latest version as of this writing).
EDIT (this is a test for StackOverflow's implementation)
abc
this shouldn't be red

For those not wanting to use Steve Wortham's customized solution, I have submitted an issue and a proposed fix to the MarkdownSharp guys: http://code.google.com/p/markdownsharp/issues/detail?id=43
If you download my attached Markdown.cs file you will find a new option that you can set. It will stop MarkdownSharp from re-encoding text within the code blocks.
Just don't forget to HTML encode your input BEFORE you pass it into markdown, NOT after.
Another solution is to white-list HTML tags like Stack Overflow does. You would do this AFTER you pass your content to markdown.
See this for more information: http://www.CodeTunnel.com/blog/post/24/mardownsharp-and-encoded-html

Since it became clear that the StackOverflow implementation contains quite a few customizations that could be time consuming to test and figure out, I decided to go another direction.
I created my own simplified markup language that's a subset of Markdown. The open-source project is at http://ultralight.codeplex.com/ and you can see a working example at http://www.bucketsoft.com/ultralight/
The project is a complete ASP.NET MVC solution with a Javascript editor. And unlike MarkdownSharp, safe HTML is guaranteed. The Javascript parser is used both client-side and server-side to guarantee consistent markup (special thanks to the Jurassic Javascript compiler). It's a beautiful thing to only have to maintain one codebase for that parser.
Although the project is still in beta, I'm using it on my own site already and it seems to be working well so far.

Maybe I'm not understanding? If you are starting a new code block in Markdown, in all its varieties, you do need a double linebreak and four-space indentation -- a single newline won't do in any of the renderers I have to hand.
abc -- Here comes a code block:
<div style="background-color: red"> This is code</div>
yielding:
abc -- Here comes a code block:
<div style="background-color: red"> This is code</div>
From what you are saying it seems that MarkdownSharp does fine with this rule, so with just one newline (but indentation):
abc -- Here comes a code block:
<div style="background-color: red"> This should be code</div>
we get a mess not a code block:
abc -- Here comes a code block:
This should be code
I assume StackOverflow is stripping the <div> tags, because they think comments shouldn't have divisions and suchlike things. (?) (In general they have to do a lot of other processing don't they, e.g. to get syntax highlighting and so on?)
EDIT: I think people are expecting the wrong thing of a Markdown implementation. For example, as I say below, there is no such thing as 'invalid markdown'. It isn't a programming language or anything like one. I have verified that all three markdown implementations I have available from the command line indifferently 'convert' random .js and .c files, or those inserted into otherwise sensible markdown -- and also interpolated zip files and other nonsense -- into valid html that browsers don't mind displaying at all -- chicken scratches though it is. If you want to exclude something, e.g. in a wiki program, you do something further, of course, as most markdown-employing wiki programs do.
