HTML in C# Resource File [duplicate]


I am developing a standard small ASP.NET MVC website which will be made multilingual using ASP.NET resource files.
My question is about the resource files. If you have a block of text that should be shown in paragraphs, is it appropriate to add <p> tags inside the resource file text?
If not, what would be the best way to deal with it?

You can use the @Html.Raw method inside your view, e.g. @Html.Raw(STRING FROM RESX FILE HERE)

When I faced the same question I decided against having paragraphs directly in resources. Separate the text blocks with simple new lines (Shift+Enter if I'm not mistaken). Then create a helper method to wrap the blocks into paragraphs.
    using System;
    using System.Text;
    using System.Web;
    namespace MyHtmlExtensions
    {
        public static class ResourceHelpers
        {
            // Splits the resource text on line breaks and wraps each block in <p>...</p>.
            // Note: the block text is emitted as-is, without HTML encoding.
            public static IHtmlString WrapTextBlockIntoParagraphs(this string s)
            {
                if (s == null) return new HtmlString(string.Empty);
                var blocks = s.Split(new string[] { "\r\n", "\n" },
                    StringSplitOptions.RemoveEmptyEntries);
                StringBuilder htmlParagraphs = new StringBuilder();
                foreach (string block in blocks)
                {
                    htmlParagraphs.Append("<p>" + block + "</p>");
                }
                return new HtmlString(htmlParagraphs.ToString());
            }
        }
    }
Then in your view you import your extension method namespace:
    <%@ Import Namespace="MyHtmlExtensions" %>
And apply it:
    <%= Resources.Texts.HelloWorld.WrapTextBlockIntoParagraphs() %>
I think it is a cleaner way than putting markup tags into text resources, where in principle they do not belong.
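
The helper above emits each block unencoded, so a stray < or & in a translation ends up in the page as markup. A minimal sketch of an encoding variant follows; the class and method names are hypothetical, and it returns a plain string (using System.Net.WebUtility) so that it compiles outside of System.Web. In a view you would still wrap the result in an HtmlString as the original helper does.

```csharp
using System;
using System.Net;
using System.Text;

public static class ParagraphWrapper
{
    // Hypothetical variant: HTML-encodes each text block before wrapping it in <p>...</p>.
    public static string WrapEncoded(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Same splitting rule as the helper above: one paragraph per non-empty line.
        string[] blocks = text.Split(new[] { "\r\n", "\n" },
            StringSplitOptions.RemoveEmptyEntries);

        var html = new StringBuilder();
        foreach (string block in blocks)
        {
            html.Append("<p>").Append(WebUtility.HtmlEncode(block)).Append("</p>");
        }
        return html.ToString();
    }
}
```

The trade-off is that translators can no longer put any inline markup into the resource text; whether that is acceptable depends on the content.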

You can wrap the data for the resx value field in a CDATA block, <![CDATA[ ]]>.
e.g.
<![CDATA[My text in <b>bold</b>]]>
Depending on how you want to use it, you will probably have to strip off the <![CDATA[ and ]]> within your app.
This approach allows you to include any html markup in the resx file. I do this to store formatted text messages to be displayed in a jquery-ui dialog from various views within an asp.net mvc app.
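
Stripping the markers could look roughly like this. It is only a sketch: the class and method names are made up for illustration, and it assumes the whole resource value is wrapped in a single CDATA block.

```csharp
using System;

public static class ResxCData
{
    private const string Open = "<![CDATA[";
    private const string Close = "]]>";

    // Returns the text between the CDATA markers, or the input unchanged
    // when the value is not wrapped in them.
    public static string Unwrap(string value)
    {
        if (value == null) return string.Empty;

        string trimmed = value.Trim();
        bool wrapped = trimmed.Length >= Open.Length + Close.Length
            && trimmed.StartsWith(Open, StringComparison.Ordinal)
            && trimmed.EndsWith(Close, StringComparison.Ordinal);

        return wrapped
            ? trimmed.Substring(Open.Length, trimmed.Length - Open.Length - Close.Length)
            : value;
    }
}
```

Calling Unwrap on the example value above would leave just the text with its <b> tags, ready to be written to the page unencoded.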

Yes and no. Resources are a good place to store localized text and basically this will allow you to send the whole resource to some translation agency who will do all translations for you. So, that's a plus.
However, it also increases the chance of bugs simply because you might forget to encode or decode the HTML. The .resx file is an XML file, so the HTML has to be stored encoded in the resource. (Fortunately, this happens automatically.) When you retrieve it again, it is decoded automatically as well. However, if you pass the text to other functions, there is a risk that it will be encoded again, so the page shows the literal text &lt;b&gt;bold&lt;/b&gt; instead of the word bold in bold type.
So you will need some additional testing. Which is something you can manage.
But the big problem could be for the person who does the translations. If they just translate the .resx file, they'll see encoded strings. This could be solved if they use a tool that makes the resources more readable, as long as they don't translate the tags themselves. The tags could confuse the people doing the translations.
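
The double-encoding risk mentioned above is easy to reproduce. This is a small sketch using System.Net.WebUtility; the class and method names are only for illustration.

```csharp
using System;
using System.Net;

public static class EncodingDemo
{
    // What a view engine does once when it outputs a plain string.
    public static string EncodeOnce(string html)
    {
        return WebUtility.HtmlEncode(html);
    }

    // What happens when an already encoded string is passed through
    // a second helper that encodes again.
    public static string EncodeTwice(string html)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlEncode(html));
    }
}
```

EncodeOnce("<b>bold</b>") gives &lt;b&gt;bold&lt;/b&gt;, which the browser shows as the literal tags. EncodeTwice gives &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;, which the browser shows as the entity text itself.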

You can also open your resource file in the designer and simply type HTML and text as you wish.
For example: Resource Text <b>Data</b>
However, the solution from @Tony Bennett works perfectly.
Something like this also works; the character entities &lt; and &gt; stand in for the angle brackets < and >:
are you targeting to &lt;b&gt;sustain&lt;/b&gt; the Mark they achieved

I would say that this should be avoided, but sometimes it is practically unavoidable. Ideally it would be best to store your resources at a level of granularity that allows you to add the presentation markup in common for all languages in the appropriate area of the app.
Sometimes grammatical differences between languages make this difficult. Sometimes you want to store large chunks of text-based resources (large enough that they would have to encompass things like p tags). It depends a little on what you are doing.

Use URLEncoder to store the HTML string within the resource; then Decode the String and use webview rather than converting all the HTML tags.
URLDecoder decodes the argument which is assumed to be encoded in the
x-www-form-urlencoded MIME content type. '+' will be converted to
space, '%' and two following hex digit characters are converted to the
equivalent byte value. All other characters are passed through
unmodified.
For example "A+B+C %24%25" -> "A B C $%".

We do.
You could use a text-to-HTML converter, but as text contains so much less information than HTML you'd be restricted.

Related

Using regex to find descriptions in the javascript files of MVC project?

I have an ASP.NET MVC application where I pass certain translations to my views so I can show my pages in different languages (the translations are used in the corresponding javascript file, not in the razor view). Now I have a List<string> for every view which contains the descriptions of the translations I need, which is a real pain to maintain of course. If I change a single javascript file, I need to update the corresponding collection etc.
Now I had a crazy idea, in all my js files I use dictionary.find('<description>') to get access to the translations. Would it be a bad idea to populate my lists when the model is first accessed by using a regex on the javascript files? It would look something like this:
    protected static List<string> Descriptions;
    // A static constructor cannot have an access modifier.
    static Model()
    {
        string basePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Scripts");
        string fileName = Path.Combine(basePath, $"{modelName}.js");
        string javascript = File.ReadAllText(fileName);
        Regex regex = new Regex(@"dictionary\.find\('(.+?)'\)");
        // Collect every key passed to dictionary.find('...') in the script.
        Descriptions = regex.Matches(javascript)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
This code basically reads the javascript file that is used for the view and finds all the words that are used between dictionary.find('...').
I've already tested this, and it seems to work but my question is:
How bad of an idea is this? Or is it good? My models/scripts are named very consistently etc so that wouldn't be a problem.

I would think that you would want to store data in file types meant for that.
I think there is no problem having a conventions based approach to this, but you might be better suited putting your <description> data in a JSON file. The JSON file could live right next to your js files.
If you do that, you can just load up your translations by using a json serializer and you won't have to muck around with regex.
EDIT: I think it would still be a good idea to have a test verifying that all of your expected files do exist and match your expected format.
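
A rough sketch of the JSON approach, assuming System.Text.Json is available and assuming a made-up convention where each script has a sibling file named <modelName>.descriptions.json that holds a plain array of description keys. The class, method, and file names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public static class DescriptionLoader
{
    // Parses a JSON array of description keys, e.g. ["Save", "Cancel"].
    public static List<string> Parse(string json)
    {
        return JsonSerializer.Deserialize<List<string>>(json) ?? new List<string>();
    }

    // Hypothetical convention: Scripts/<modelName>.descriptions.json
    // lives next to Scripts/<modelName>.js.
    public static List<string> LoadFor(string scriptsDirectory, string modelName)
    {
        string path = Path.Combine(scriptsDirectory, modelName + ".descriptions.json");
        return Parse(File.ReadAllText(path));
    }
}
```

The test suggested in the EDIT could then simply call LoadFor for every model and check that the file exists and parses to a non-empty list.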

Detect Razor/C# code?

Is there a way to detect if an HTML page contains any Razor/C# code? Essentially I want users to be able to provide custom layouts, with tags that I will replace with RenderSection. Before making this replacement, I want to validate that none of the HTML contains anything like, for example, <a href="@(some C# code)".
All discussions about alternative ways to do this, should/could/would aside, just simply:
Is there a way to programmatically detect if a file contains C#/Razor code?

I don't know a lot about the Razor markup, but I am thinking that when you grab the layout string they are passing in, you will want to parse the text and grab everything that starts with an @ and toss those words into an array. Then, when you republish it to your website, use Razor code to access the data in the array...
Alternatively, and easier, you could go through all the passed-in code and replace all the @ signs with a different symbol, say &, so that it won't get interpreted by the Razor processor:
    layoutString = layoutString.Replace('@', '&');

In the browser? No, because unless the programmer made a mistake, there is no Razor/C# code in the rendered HTML, only HTML that was the result of that code.
What you ask is like asking what type of oven was used to bake a pizza, given only the pizza. Bad news: you never will know.
If you provide sensible tags for those, you could parse them in JavaScript, but you have to output that metadata yourself as part of the generated HTML.

After reading your comment to TomTom; the answer is:
No. Razor does not come with any public syntax parser.

Moving strings that contain HTML tag to resource files

I wish to move some of the UI strings into the resource files. The strings contain some of the styling tags, for example :- "I wish to move this < i >string< / i > into resource files."
What is the best way to do this? If possible, please give an example with the code?
PS :- 1) Breaking up the string in 3 parts is not an option, as it makes translation tough.
2) I tried using :- @string.Format(Resources.ResourceString, string).
Where ResourceString = I wish to move this < i >{0}< / i > into resource files.

It's quite common to include HTML formatting in localizable string resources and translators (as well as the CAT tools that translators use) are generally knowledgeable about basic HTML and will keep your HTML tags in the translations. Although of course you should check that your localized strings are OK when you receive your translations and as part of your QA plans.
As you said, breaking up the strings can cause more problems for the translator than keeping them as they are with their HTML tags.
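
One way to keep the tags in the resource while still treating the inserted value as untrusted is to encode only the arguments. The following is a sketch under that assumption; the class and method names are invented for illustration, and the resource text is passed in as a plain string.

```csharp
using System;
using System.Net;

public static class ResourceFormatter
{
    // template: trusted resource text that may contain HTML tags and {0}-style placeholders.
    // args: values inserted into the placeholders; each one is HTML-encoded first.
    public static string FormatHtml(string template, params object[] args)
    {
        var encoded = new object[args.Length];
        for (int i = 0; i < args.Length; i++)
        {
            encoded[i] = WebUtility.HtmlEncode(Convert.ToString(args[i]));
        }
        return string.Format(template, encoded);
    }
}
```

In a Razor view the result would have to be written with @Html.Raw(...), since a plain @ expression encodes the whole string again, including the <i> tags from the resource.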

Asp.net Mvc, Razor and Localization

I know this matter has already been brought on these pages many times, but still I haven't found the "good solution" I am required to find. Let's start the explanation.
Localization in .net, and in mvc, is made in 2 ways that can even be mixed together:
Resource files (both local or global)
Localized views with a viewengine to call the appropriate view based on culture
I'll explain the solutions I tried and all the problems I got with every one of them.
Text in resource files, all tags in the view
This solution would have me put every text in resources, and every tag in the view, even the inline tags such as [strong] or [span].
Pros:
Clean separation, no structure whatsoever in localization.
Easy encoding: everything that is returned from the resource gets html encoded.
Cons:
If I have a paragraph with some strongs, a couple of link etc I have to split it in many resource keys. This is considered to make the view too unreadable and also takes too much time to create it.
For the same reason as above, if in two different languages the [strong] text is in different places (like "Il cane di Marco" and "Marco's dog"), I can't achieve it, since all my tags are in the view.
Text and inline tags in resource files, through parameters
This method will have the resources contain some placeholders for string.Format, and those placeholders will be filled with inline tags post-encoding.
Pros:
Clean separation, with just placeholders in the text, so if I am ever to replace [strong] with [em] I do it in the view where I pass it as parameter and it gets changed in every language
Cons:
Encoding is a bit harder, I have to pre-encode the value from the resource, then use string.Format, and finally return it as MvcHtmlString to tell the view engine to not re-encode it when displaying.
For the same reason as above, including, for instance, an ActionLink as parameter would be troublesome. Let's say I get the text for my actionlink from a resource. My method already encodes it. But then, the ActionLink method would re-encode it again. I would need a distinct method to get resources without encoding them, or new helper methods that get an MvcHtmlString instead of a string as text parameter, but both are rather unpractical.
Still takes a whole lot of time to build views, having to create all the resource keys and then fill them.
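
The encoding order described for this approach (encode the resource text first, then fill in the tags) can be sketched as follows. The names are hypothetical, and the method returns a plain string so the example stays self-contained; in an MVC project the result would be wrapped with MvcHtmlString.Create so the view engine does not encode it again.

```csharp
using System;
using System.Net;

public static class InlineTagFormatter
{
    // Encodes the resource text first, then substitutes the markup passed
    // by the view for the {0}, {1}, ... placeholders. The braces survive
    // encoding, so the placeholders are still intact for string.Format.
    public static string Format(string resourceText, params object[] markup)
    {
        string safeText = WebUtility.HtmlEncode(resourceText);
        return string.Format(safeText, markup);
    }
}
```

For example, a resource "Il cane di {0}Marco{1}" can be rendered with "<strong>" and "</strong>" as the two arguments, and switching to [em] only means changing the arguments in the view.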
Localized views
Pros:
All views are plain html. No resources to read.
Cons:
Duplicated html everywhere. I don't even need to explain how this is totally evil.
Have to manually encode all troublesome characters like grave vowels, quotes and such.
Conclusions
A mix of the above techniques inherits their pros and cons, but it's still no good. I am challenged to find a proper productive solution, while all of the above are considered "impractical" and "time consuming".
To make things worse, I found out that there isn't a single tool that refactors "text" from aspx or cshtml (or even html) views/pages into resources. All the tools out there can refactor System.String instances in code files (.cs or .vb) into resources only (resharper for instance, and a couple of others I can't remember now).
So I'm stuck, can't find anything appropriate on my own, and can't find anything on the web either. Is it possible that no one else has faced this problem before and found a solution?

I personally like the idea of storing inline tags in the resource file. However I do it a little differently. I store very plain tags like <span class='emphasis'>dog</span> and then I use CSS to style the tags appropriately. Now, instead of "passing in" a tag as a parameter, I simply style the span.emphasis rule in my CSS appropriately. Change carries over to all languages.
The Sexier Option:
Another option I thought of and quite enjoy is to use a "readable markup" language like StackOverflow's very own MarkdownSharp. This way you aren't storing any HTML in the resource file, only markdown text. So in your resource you would have **dog** and then it gets shunted through markdown in the view (I created a helper for this, (Usage: Html.Markdown(string text)). Now you're not storing tags, you're storing a common human readable markup language. The markdownsharp source is one .CS file and it's easy to modify. So you could always change the way it renders the ending HTML. This gives you total control over all your resources without storing HTML, and without duplicating views or chunks of HTML.
EDIT
This also gives you control over the encoding. You could easily make sure the content of your resource files contain no valid HTML. Markdown syntax (as you know from using stack overflow) does not contain HTML tags and thus can be encoded without harm. Then you just use your helper to convert the Markdown syntax to valid HTML.
EDIT #2
There is one bug in markdown that I had to fix myself. Anything markdown detects as a "code" block will be HTML encoded. This is a problem if you have already HTML encoded all content being passed to markdown, as anything in the code blocks will essentially be re-encoded, which turns &gt; into &amp;gt; and completely screws up the text within code blocks. To fix this I modified the markdown.cs file to include a boolean option that stops markdown from encoding text within code blocks. See this issue for the fixed .cs file that I added to the MarkdownSharp project issues.
EDIT #3 - Html Helper Sample
    using System.Web.Mvc;
    public static class HtmlHelpers
    {
        public static MvcHtmlString Markdown(this HtmlHelper helper, string text)
        {
            var markdown = new MarkdownSharp.Markdown
            {
                AutoHyperlink = true,
                EncodeCodeBlocks = false, // This option is my custom option to stop the code block encoding problem.
                LinkEmails = true,
                EncodeProblemUrlCharacters = true
            };
            // Convert the Markdown text from the resource into HTML.
            string html = markdown.Transform(text);
            return MvcHtmlString.Create(html);
        }
    }

Nothing stops you from storing HTML in resource files, then calling @Html.Raw(MyResources.Resource).

Have you thought about using localized models? Have your view be strongly typed to IMyModel and then pass in the appropriately decorated model. Then you can change how you're doing your localization fairly easily by modifying the appropriate model.
It's clean, very flexible, and very easy to maintain.
You could start out with resource file based localization and then, for places you need to update more often, switch that model to a cached DB based localization model.

Removing <div>'s from text file?

I've made a small program in C#.NET which doesn't really serve much of a purpose; it tells you the chance of your DOOM based on today's news lol. It takes an RSS feed on load from the BBC website and will then look for keywords which either increment or decrease the percentage chance of DOOM.
Crazy little project, but maybe one day the classes will come in handy for something more important.
I receive the RSS in an XML format but it contains a lot of div tags and formatting characters which I don't really want to be in the database of keywords.
What is the best way of removing these unwanted characters and divs?
Thanks,
Ash

If you want to remove the DIV tags WITH content as well:
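    string start = "<div>";
    string end = "</div>";
    // Lazy match up to the first closing tag. A negated character class built from "</div>"
    // would instead reject any content containing the letters d, i or v.
    string txt = Regex.Replace(htmlString, Regex.Escape(start) + "(?<data>.*?)" + Regex.Escape(end), string.Empty, RegexOptions.Singleline);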
Input: <xml><div>junk</div>XXX<div>junk2</div></xml>
Output: <xml>XXX</xml>

IMHO the easiest way is to use regular expressions. Something like:
    string txt = Regex.Replace(htmlString, @"<(.|\n)*?>", string.Empty);
Depending on which tags and characters you want to remove you will modify the regex, of course. You will find a lot of material on this and other methods if you do a web search for 'strip html C#'.
SO question Render or convert Html to ‘formatted’ Text (.NET) might help you, too.

Stripping HTML tags from a given string is a common requirement and you can probably find many resources online that do it for you.
The accepted method, however, is to use a Regular expression based Search and Replace. This article provides a good sample along with benchmarks. Another point worth mentioning is that you would require separate Regex based lookups for the different kinds of unwanted characters you are seeing. (Perhaps showing us an example of the HTML you receive would help)
Note that your requirements may vary based on which tags you want to remove. In your question, you only mention DIV tags. If that is the only tag you need to replace, a simple string search and replace should suffice.

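A regular expression such as this:
    <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
would match every element that has an opening tag and a matching closing tag. The pattern is written with uppercase letters only, so use RegexOptions.IgnoreCase to match lowercase tags as well.
Use this to remove them from your data.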
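
Putting the regex suggestions from these answers together, a minimal sketch might look like this. The class and method names are hypothetical, and regular expressions are not a real HTML parser: nested divs, comments, and a > inside an attribute value will trip them up, so this only suits simple feed content.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Removes every tag but keeps the text between the tags.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, @"<[^>]+>", string.Empty);
    }

    // Removes <div ...>...</div> elements together with their content.
    // Handles attributes on the opening tag, but not nested divs.
    public static string RemoveDivsWithContent(string html)
    {
        return Regex.Replace(html, @"<div\b[^>]*>.*?</div>", string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

For keyword extraction, StripTags is probably the one you want, since it keeps the headline text; RemoveDivsWithContent is for the case where the div content itself is junk.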
