Using regex to find descriptions in the javascript files of MVC project?

I have an ASP.NET MVC application where I pass certain translations to my views so I can show my pages in different languages (the translations are used in the corresponding javascript file, not in the razor view). Now I have a List<string> for every view which contains the descriptions of the translations I need, which is a real pain to maintain of course. If I change a single javascript file, I need to update the corresponding collection etc.
Now I had a crazy idea: in all my js files I use dictionary.find('<description>') to get access to the translations. Would it be a bad idea to populate my lists when the model is first accessed by running a regex over the javascript files? It would look something like this:
protected static List<string> Descriptions;
// A static constructor cannot have an access modifier.
static Model()
{
    string basePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Scripts");
    string fileName = Path.Combine(basePath, $"{modelName}.js");
    string javascript = File.ReadAllText(fileName);
    Regex regex = new Regex(@"dictionary\.find\('(.+?)'\)");
    // Store the captured descriptions in the static list.
    Descriptions = regex.Matches(javascript)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToList();
}
This code basically reads the javascript file that is used for the view and finds all the words that are used between dictionary.find('...').
I've already tested this, and it seems to work but my question is:
How bad of an idea is this? Or is it good? My models/scripts are named very consistently etc so that wouldn't be a problem.

I would think that you would want to store data in file types meant for that.
I think there is no problem having a conventions based approach to this, but you might be better suited putting your <description> data in a JSON file. The JSON file could live right next to your js files.
If you do that, you can just load up your translations by using a json serializer and you won't have to muck around with regex.
EDIT: I think it would still be a good idea to have a test verifying that all of your expected files do exist and match your expected format.
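A minimal sketch of the JSON approach, assuming each script has a sibling file containing a plain JSON array of description keys. The DescriptionLoader class, the "<modelName>.descriptions.json" naming convention and the use of System.Text.Json are all assumptions for illustration; on older ASP.NET MVC projects a serializer such as Json.NET would fill the same role.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public static class DescriptionLoader
{
    // Reads a JSON array of description keys, e.g. ["title", "saveButton"].
    // The "<modelName>.descriptions.json" file name is a hypothetical convention.
    public static List<string> Load(string scriptsPath, string modelName)
    {
        string fileName = Path.Combine(scriptsPath, $"{modelName}.descriptions.json");
        string json = File.ReadAllText(fileName);
        // Deserialize returns null for a JSON "null" document, so fall back to an empty list.
        return JsonSerializer.Deserialize<List<string>>(json) ?? new List<string>();
    }
}
```

A static constructor like the one in the question could then call DescriptionLoader.Load(basePath, modelName) instead of running a regex over the script.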

Related

HTML in C# Resource File [duplicate]

I am developing a standard small ASP.NET MVC website which will be made multilingual using ASP.NET resource files.
My question is about the resource files. See if you have a block of text which should be shown in paragraphs, is it appropriate to add <p> tags inside your resource file text?
If not, what would be the best way to deal with it?

You can use the @Html.Raw method inside your view, e.g. @Html.Raw(STRING FROM RESX FILE HERE)

When I faced the same question I decided against having paragraphs directly in resources. Separate the text blocks with simple new lines (Shift+Enter if I'm not mistaken), then create a helper method to wrap the blocks into paragraphs.
using System;
using System.Text;
using System.Web;

namespace MyHtmlExtensions
{
    public static class ResourceHelpers
    {
        // Wraps each non-empty line of the text in its own <p> element.
        public static IHtmlString WrapTextBlockIntoParagraphs(this string s)
        {
            if (s == null) return new HtmlString(string.Empty);
            var blocks = s.Split(new string[] { "\r\n", "\n" },
                StringSplitOptions.RemoveEmptyEntries);
            StringBuilder htmlParagraphs = new StringBuilder();
            foreach (string block in blocks)
            {
                htmlParagraphs.Append("<p>" + block + "</p>");
            }
            return new HtmlString(htmlParagraphs.ToString());
        }
    }
}
Then in your view you import your extension method namespace:
<%@ Import Namespace="MyHtmlExtensions" %>
And apply it:
<%= Resources.Texts.HelloWorld.WrapTextBlockIntoParagraphs() %>
I think this is cleaner than putting markup tags into text resources, where in principle they do not belong.

You can wrap the data for the resx value field in a CDATA block, <![CDATA[ ]]>.
e.g.
<![CDATA[My text in <b>bold</b>]]>
Depending on how you want to use it, you will probably have to strip off the <![CDATA[ and ]]> within your app.
This approach allows you to include any html markup in the resx file. I do this to store formatted text messages to be displayed in a jquery-ui dialog from various views within an asp.net mvc app.

Yes and no. Resources are a good place to store localized text and basically this will allow you to send the whole resource to some translation agency who will do all translations for you. So, that's a plus.
However, it also increases the chance of bugs simply because you might forget to encode or decode the HTML. The .resx file is an XML file, so the markup needs to be stored encoded in the resource. (Fortunately, it will be encoded automatically.) When you retrieve it again, it will be decoded automatically. However, if you pass the text to other functions, there is a risk that it will be encoded again, so the page shows the literal text <b>bold</b> instead of the word in bold...
So you will need some additional testing. Which is something you can manage.
But the big problem could be for the person who does the translations. If they just translate the .resx file, they'll see encoded strings. This could be solved if they use a tool that makes the resources more readable, as long as they don't translate the tags themselves. Even so, the tags could confuse the people doing the translations.

You might as well open your resource file in designer and simply type html and text as you wish ..
For example: Resource Text <b>Data</b>
However, the solution from @Tony Bennett works perfectly.
Something like this also works; here character entities are used in place of the angle brackets < and >:
are you targeting to &lt;b&gt;sustain&lt;/b&gt; the Mark they achieved

I would say that this should be avoided, but sometimes it is practically unavoidable. Ideally it would be best to store your resources at a level of granularity that allows you to add the presentation markup in common for all languages in the appropriate area of the app.
Sometimes grammatical differences between languages make this difficult. Sometimes you want to store large chunks of text-based resources (large enough that they would have to encompass things like p tags). It depends a little on what you are doing.

Use URLEncoder to store the HTML string within the resource; then Decode the String and use webview rather than converting all the HTML tags.
URLDecoder decodes the argument which is assumed to be encoded in the
x-www-form-urlencoded MIME content type. '+' will be converted to
space, '%' and two following hex digit characters are converted to the
equivalent byte value. All other characters are passed through
unmodified.
For example "A+B+C %24%25" -> "A B C $%".

We do.
You could use a text-to-HTML converter, but as text contains so much less information than HTML you'd be restricted.

Is it possible to check for a verb without creating my own database?

I am playing around with a sentence string entry for a project I'm working on in C# and wanted to see if there was an alternative way to search for a verb using a built in function.
Currently, I am using a database table with a list of regular verbs and cycling through those to check if there is a match but wanted to see if there would be a better way to do this?
Consider the following input:
"Develop string matching software for verb"
Program will read the string and check each word,
if (IsVerb(word)) // IsVerb stands for the check I'm looking for
{
    m_verbs.Add(word);
}

Short answer :
There is a better way.
Long answer :
It's not that simple. The problem is that the string class in C# has no built-in natural-language functionality. Implementing it rests on the developer's shoulders.
You have some grammatical (or perhaps lexical is a better word) issues to consider as Owen79 pointed out in his comment. Then there is the question of environment / resource restrictions.
You have a few options available to you :
Web based dictionary services. You can query those with the words of your sentence and get back the 'status' of each word. Then you will take only the statuses you want, like verbs for instance. Here is a link to DictService which also includes a C# code sample.
A text / xml / other file based solution. Similar approach, you simply look up the words in the file and act according to the presence or absence of the word in the file. You can cache (load into memory) the contents of the file to save on IO operations. Here are the links to lists of regular and irregular verbs.
Database solution is identical to the previous one with the exception of loading contents into memory. That part may be unnecessary but that depends on your implementation requirements.
Bottom line: each solution will require some work, but whichever option you go for, the key aspects to consider are the platform and the resources available to you. If computational speed is a concern, you will most likely need some tricks to cut down on lookup times.
Hope this helps

You could load the common verbs from a text file on disk. If you have lots of verbs and worry about memory, you could bucket them into common and uncommon, or alphabetically, and then load each dictionary only when needed.
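A rough sketch of the text-file idea, assuming one verb per line. The VerbFinder class and its method names are made up for illustration. A case-insensitive HashSet keeps lookups cheap and lets "Develop" match "develop" without lower-casing each word.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class VerbFinder
{
    // Loads one verb per line into a case-insensitive set.
    public static HashSet<string> LoadVerbs(string path)
    {
        var verbs = File.ReadLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0);
        return new HashSet<string>(verbs, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the words of the sentence that appear in the verb set.
    public static List<string> FindVerbs(string sentence, HashSet<string> verbs)
    {
        return sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => verbs.Contains(word))
            .ToList();
    }
}
```

This only matches the exact forms listed in the file; inflected forms such as "developing" would need their own entries or some stemming step.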

If you don't want to use the database option (although it is highly recommended), then you need to put the verbs in a data structure (e.g. an array or list). You can then use the powerful System.Linq extension methods.
For example:
string[] allVerbs = new[] { "eat", "drink" }; // etc
string s = "Develop string matching software for verb";
var words = s.Split(' ');
foreach (var word in words)
{
    if (allVerbs.Contains(word.ToLower()))
    {
        m_verbs.Add(word);
    }
}

XML: Searching elements for specific text using C#

I'm trying to get a list of PDF links from different websites. First I'm using the Web client class to download the page source. I then use sgmlReader to convert the HTML to XML. So for one particular site, I'll get a tag that looks like this:
<p>1985 to 1997 Board Action Summary</p>
I need to grab all the links that contain ".pdf". Obviously not all websites are laid out the same, so just searching for a <p> tag won't be dynamic enough. I'd rather not use LINQ, but I will if I have to. Thanks in advance.

Linq makes this easy...
var hrefs = doc.Root.Descendants("a")
    .Where(a => a.Attribute("href").Value.ToUpper().EndsWith(".PDF"))
    .Select(a => a.Attribute("href"));
away you go! (note: did this from memory, so you might have to fix it somewhat)
This will break down for <a/> tags that don't have an href (anchors) but you can fix that surely...
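One possible way to handle anchors without an href, sketched under the assumption that the converted page is loaded into an XDocument. The PdfLinkFinder name is made up. Matching on the local element name is a precaution in case the converter puts elements in an XHTML namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PdfLinkFinder
{
    public static List<string> FindPdfLinks(XDocument doc)
    {
        return doc.Descendants()
            // Compare local names so a default namespace does not hide the <a> elements.
            .Where(e => e.Name.LocalName == "a")
            // ?.Value yields null when the href attribute is missing.
            .Select(a => a.Attribute("href")?.Value)
            .Where(href => href != null
                && href.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Links that carry a query string after ".pdf" would not match EndsWith and would need extra handling.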

I think you have 2 options here. If you need only the links, you can use Regular Expressions to find the matches for strings ending with .pdf. If you need to manipulate the XML structure or get other values from the XML, it would be better to use XmlDocument and use an XPath query to find out the nodes which have a link to a pdf file in it. Using LINQ to XML just reduces the number of lines of code you have to write.

How to handle paths to files with extra parameters in C#?

I'm downloading files from the Internet inside of my application. I'm dealing with multiple file types, so I need to be able to detect what type a file is before my application can continue. The problem I ran into is that some of the URLs the files are downloaded from contain extra parameters.
For example:
http://www.myfaketestsite.com/myaudio.mp3?id=20
Originally I was using String.EndsWith(). Obviously this doesn't work anymore. Any idea on how to detect the file type?

Wrap the URL in a Uri class. It will split it up into different segments that you can use, or you can use the helper methods on the Uri class itself:
var uri = new Uri("http://www.myfaketestsite.com/myaudio.mp3?id=20");
string path = uri.GetLeftPart(UriPartial.Path);
// path = "http://www.myfaketestsite.com/myaudio.mp3"
Your question is a duplicate of:
Truncating Query String & Returning Clean URL C# ASP.net
Get url without querystring
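To get from the URL to a file type, one option is to combine Uri with Path.GetExtension. This is a small sketch; the UrlFileType class name is made up, and it assumes the URL is absolute and that the extension in the path reflects the actual content.

```csharp
using System;
using System.IO;

public static class UrlFileType
{
    // Returns the lower-cased extension (including the leading dot) of the URL's path,
    // ignoring any query string or fragment; returns an empty string if there is none.
    public static string GetExtension(string url)
    {
        var uri = new Uri(url);
        // AbsolutePath is the path portion only, e.g. "/myaudio.mp3".
        return Path.GetExtension(uri.AbsolutePath).ToLowerInvariant();
    }
}
```

For the example URL above, GetExtension is given "/myaudio.mp3" as the path, so the "?id=20" part never reaches the extension check.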

You could always split on the question mark to eliminate the parameters. e.g.
string s = "http://www.myfaketestsite.com/myaudio.mp3?id=20";
string withoutQueryString = s.Split('?')[0];
If no question mark exists, it won't matter, as you'll still be grabbing the value from the zero index. You can then do your logic on the withoutQueryString string.

Asp.net Mvc, Razor and Localization

I know this matter has already been brought on these pages many times, but still I haven't found the "good solution" I am required to find. Let's start the explanation.
Localization in .net, and in mvc, is made in 2 ways that can even be mixed together:
Resource files (both local or global)
Localized views with a viewengine to call the appropriate view based on culture
I'll explain the solutions I tried and all the problems I got with every one of them.
Text in resource files, all tags in the view
This solution would have me put every text in resources, and every tag in the view, even the inline tags such as [strong] or [span].
Pros:
Clean separation, no structure whatsoever in localization.
Easy encoding: everything that is returned from the resource gets html encoded.
Cons:
If I have a paragraph with some strongs, a couple of links etc., I have to split it into many resource keys. This is considered to make the view too unreadable, and it also takes too much time to create.
For the same reason as above, if in two different languages the [strong] text is in different places (like "Il cane di Marco" and "Marco's dog"), I can't achieve it, since all my tags are in the view.
Text and inline tags in resource files, through parameters
This method will have the resources contain some placeholders for string.Format, and those placeholders will be filled with inline tags post-encoding.
Pros:
Clean separation, with just placeholders in the text, so if I am ever to replace [strong] with [em] I do it in the view where I pass it as parameter and it gets changed in every language
Cons:
Encoding is a bit harder, I have to pre-encode the value from the resource, then use string.Format, and finally return it as MvcHtmlString to tell the view engine to not re-encode it when displaying.
For the same reason as above, including, for instance, an ActionLink as a parameter would be troublesome. Let's say I get the text for my ActionLink from a resource. My method already encodes it, but then the ActionLink method would encode it again. I would need a distinct method to get resources without encoding them, or new helper methods that take an MvcHtmlString instead of a string as the text parameter, but both are rather impractical.
Still takes a whole lot of time to build views, having to create all the resource keys and then fill them.
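For reference, the encode-then-format step described in this approach might look roughly like the sketch below. LocalizedHtml is a made-up name, and the sketch returns a plain string so it stands alone. In an MVC view the result would be wrapped with MvcHtmlString.Create so the view engine does not encode it a second time.

```csharp
using System.Linq;
using System.Net;

public static class LocalizedHtml
{
    // Encodes the resource text first, then fills the {0}, {1}, ... placeholders
    // with markup fragments supplied by the view. The fragments are inserted
    // as-is, so they must come from trusted code, never from user input.
    public static string Format(string resourceText, params string[] htmlFragments)
    {
        string encoded = WebUtility.HtmlEncode(resourceText);
        return string.Format(encoded, htmlFragments.Cast<object>().ToArray());
    }
}
```

With a resource value of "Il cane di {0}Marco{1} & Anna" and the fragments "<strong>" and "</strong>", the ampersand is encoded while the tags pass through untouched. Literal braces in a resource would have to be doubled, as with any string.Format template.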
Localized views
Pros:
All views are plain html. No resources to read.
Cons:
Duplicated html everywhere. I don't even need to explain how this is totally evil.
Have to manually encode all troublesome characters like grave vowels, quotes and such.
Conclusions
A mix of the above techniques inherits their pros and cons, but it's still no good. I am challenged to find a proper, productive solution, while all of the above are considered "unpractical" and "time consuming".
To make things worse, I found out that there isn't a single tool that refactors "text" from aspx or cshtml (or even html) views/pages into resources. All the tools out there can refactor System.String instances in code files (.cs or .vb) into resources only (resharper for instance, and a couple of others I can't remember now).
So I'm stuck: I can't find anything appropriate on my own, and I can't find anything on the web either. Is it possible that no one else has been challenged with this problem before and found a solution?

I personally like the idea of storing inline tags in the resource file. However I do it a little differently. I store very plain tags like <span class='emphasis'>dog</span> and then I use CSS to style the tags appropriately. Now, instead of "passing in" a tag as a parameter, I simply style the span.emphasis rule in my CSS appropriately. Change carries over to all languages.
The Sexier Option:
Another option I thought of and quite enjoy is to use a "readable markup" language like StackOverflow's very own MarkdownSharp. This way you aren't storing any HTML in the resource file, only markdown text. So in your resource you would have **dog**, and then it gets shunted through markdown in the view (I created a helper for this; usage: Html.Markdown(string text)). Now you're not storing tags, you're storing a common human-readable markup language. The MarkdownSharp source is one .cs file and it's easy to modify, so you could always change the way it renders the final HTML. This gives you total control over all your resources without storing HTML, and without duplicating views or chunks of HTML.
EDIT
This also gives you control over the encoding. You could easily make sure the content of your resource files contains no valid HTML. Markdown syntax (as you know from using Stack Overflow) does not contain HTML tags and thus can be encoded without harm. Then you just use your helper to convert the Markdown syntax to valid HTML.
EDIT #2
There is one bug in markdown that I had to fix myself. Anything markdown decides should be rendered as a "code" block will be HTML encoded. This is a problem if you have already HTML encoded all content being passed to markdown, as anything in the code blocks will essentially be re-encoded, which turns &gt; into &amp;gt; and completely screws up the text within code blocks. To fix this I modified the markdown.cs file to include a boolean option that stops markdown from encoding text within code blocks. See this issue for the fixed .cs file that I added to the MarkdownSharp project issues.
EDIT #3 - Html Helper Sample
public static class HtmlHelpers
{
    public static MvcHtmlString Markdown(this HtmlHelper helper, string text)
    {
        var markdown = new MarkdownSharp.Markdown
        {
            AutoHyperlink = true,
            EncodeCodeBlocks = false, // This option is my custom option to stop the code block encoding problem.
            LinkEmails = true,
            EncodeProblemUrlCharacters = true
        };
        // Convert the Markdown text to HTML.
        string html = markdown.Transform(text);
        return MvcHtmlString.Create(html);
    }
}

Nothing stops you from storing HTML in resource files, then calling @Html.Raw(MyResources.Resource).

Have you thought about using localized models? Have your view be strongly typed to IMyModel and then pass in the appropriately decorated model. You can then change how you're doing your localization fairly easily by modifying the appropriate model.
It's clean, very flexible, and very easy to maintain.
You could start out with resource-file-based localization and then, for places you need to update more often, switch that model to a cached DB-based localization model.
