Tag Cloud in C#

I am making a small C# application and would like to extract a tag cloud from simple plain text. Is there a function that could do that for me?

Building a tag cloud is, as I see it, a two-part process:
First, you need to split and count your tokens. Depending on how the document is structured, as well as the language it is written in, this could be as easy as counting the space-separated words. However, this is a very naive approach, as words like "the", "of", "a", etc. will have the biggest word counts and are not very useful as tags. I would suggest implementing some sort of word blacklist (a stop-word list) in order to exclude the most common and meaningless words.
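As a rough illustration of this first step, here is a minimal C# sketch that tokenizes the text, drops stop words and counts what is left. It assumes a "word" is a run of letters or apostrophes and uses a tiny, hypothetical stop-word list; the class name TagCounter is also made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCounter
{
    // Illustrative stop-word list only; a real one would contain a few hundred entries.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "of", "a", "an", "and", "to", "in", "is", "it" },
        StringComparer.OrdinalIgnoreCase);

    // Splits the text into words, drops stop words and returns (tag, count) pairs,
    // most frequent first (ties broken alphabetically).
    public static List<KeyValuePair<string, int>> CountTags(string text)
    {
        return Regex.Matches(text, @"[\p{L}']+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Where(word => !StopWords.Contains(word))
            .GroupBy(word => word)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

The resulting (tag, count) pairs are the input for the second step.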
Once you have the result as (tag, count) pairs, you could use something similar to the following code:
(Searches is a list of SearchRecordEntity objects; SearchRecordEntity holds the tag and its count; SearchTagElement is a subclass of SearchRecordEntity that adds the TagCategory property; and processedTags is a List<SearchTagElement> which holds the result.)
// Largest count in the list; Max() throws if Searches is empty, so check that first.
double max = Searches.Max(x => (double)x.Count);
List<SearchTagElement> processedTags = new List<SearchTagElement>();
foreach (SearchRecordEntity sd in Searches)
{
    var element = new SearchTagElement();
    // Copy the tag and its count from sd into element here; the property
    // names depend on how SearchRecordEntity is defined.
    double count = (double)sd.Count;
    // Size of this tag relative to the most frequent one, from 0 to 100.
    double percent = (count / max) * 100;
    if (percent < 20)
    {
        element.TagCategory = "smallestTag";
    }
    else if (percent < 40)
    {
        element.TagCategory = "smallTag";
    }
    else if (percent < 60)
    {
        element.TagCategory = "mediumTag";
    }
    else if (percent < 80)
    {
        element.TagCategory = "largeTag";
    }
    else
    {
        element.TagCategory = "largestTag";
    }
    processedTags.Add(element);
}

I would really recommend using http://thetagcloud.codeplex.com/. It is a very clean implementation that takes care of grouping, counting and rendering of tags. It also provides filtering capabilities.

Take a look at http://sourcecodecloud.codeplex.com/

Here is an ASP.NET Cloud Control that might help you at least get started; full source is included.

You may want to take a look at WordCloud, a project on CodeProject. It includes 430 stop words (like "the", "an", "a", etc.) and uses the Porter stemming algorithm, which reduces words to their root form so that "stemmed", "stemming" and "stem" are all counted as occurrences of the same word.
It's all in C#; the only thing you would have to do is modify it to output HTML instead of the visualization it creates.

Have a look at this answer for an algorithm:
Algorithm to implement a word cloud like Wordle
The "DisOrganizer" mentioned in the answers could serve your purpose. With a little change, you can make this DisOrganizer serve an image, the way you wanted. PS: The code is written in C#: https://github.com/chandru9279/zasz.me/blob/master/zasz.me/

Take a look at this. It worked for me. There is a project named WebExample under the Examples folder which should help you solve this.
https://github.com/chrisdavies/Sparc.TagCloud

I'm not sure if this is exactly what you're looking for, but it may help you get started:
LINQ that counts word frequency (in VB, but I'm converting it to C# now):
Dim Words = "Hello World ))))) This is a test Hello World"
Dim CountTheWords = From str In Words.Split(" "c) _
                    Where str.Length > 0 AndAlso Char.IsLetter(str(0)) _
                    Group By str Into Count()
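A possible C# translation of the VB query above, assuming the letter test is meant to check the first character of each space-separated token (the class and method names WordFrequency.CountWords are hypothetical):

```csharp
using System.Linq;

public static class WordFrequency
{
    // Split on spaces, keep tokens that start with a letter, then group
    // identical tokens and count them (groups keep first-occurrence order).
    public static (string Word, int Count)[] CountWords(string words)
    {
        return words.Split(' ')
            .Where(str => str.Length > 0 && char.IsLetter(str[0]))
            .GroupBy(str => str)
            .Select(g => (Word: g.Key, Count: g.Count()))
            .ToArray();
    }
}
```

For the sample string this yields Hello and World with a count of 2, and This, is, a and test with a count of 1; the ")))))" token is filtered out.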

You could store a category and the amount of items it has in some sort of collection, or database table.
From that, you can get the count for a certain category and have certain bounds. So your parameter is the category, and your return value is a count.
So if the count is greater than 10 and less than 20, apply a CSS class to the link that gives it a certain size.
You can store these counts as keys in a collection, and then get the value where the key matches your return value (as I mentioned above).
I haven't got source code at hand for this process, but you won't find a simple function to do all this for you either. A control, yes (as above).
This is a very conventional approach and the standard way of doing it from what I've seen in magazine tutorials, etc, and the first approach I would think of (not necessarily the best).
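The count-to-style lookup described above could be sketched like this. The thresholds and CSS class names are hypothetical placeholders; the table is ordered from the largest minimum down, so the first bound a count reaches decides its class.

```csharp
public static class TagCss
{
    // Hypothetical bounds: (minimum count, CSS class), largest minimum first.
    private static readonly (int MinCount, string CssClass)[] Bounds =
    {
        (50, "tag-xl"),
        (20, "tag-l"),
        (10, "tag-m"),
        (1,  "tag-s"),
    };

    // Returns the CSS class for a tag that occurs `count` times.
    public static string CssClassFor(int count)
    {
        foreach (var (minCount, cssClass) in Bounds)
        {
            if (count >= minCount)
                return cssClass;
        }
        return "tag-none";   // count below every bound (e.g. 0)
    }
}
```

Keeping the bounds in one table means the sizes can be tuned (or loaded from configuration) without touching the lookup logic.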

Try the Zoomable TagCloud Generator, which extracts keywords from a given source (a text file and other sources) and displays the tag cloud as a Zooming User Interface (ZUI).

Related

Dynamic Regex generation for predictable repeating string patterns in a data feed

I'm currently trying to process a number of data feeds that I have no control over, where I am using Regular Expressions in C# to extract information.
The originator of the data feed is extracting basic row data from their database (like a product name, price, etc), and then formatting that data within rows of English text. For each row, some of the text is repeated static text and some is the dynamically generated text from the database.
E.g.:
Panasonic TV with FREE Blu-Ray Player
Sony TV with FREE DVD Player + Box Office DVD
Kenwood Hi-Fi Unit with $20 Amazon MP3 Voucher
So the format in this instance is: PRODUCT with FREEGIFT.
PRODUCT and FREEGIFT are dynamic parts of each row, and the "with" text is static. Each feed has about 2000 rows.
Creating a Regular Expression to extract the dynamic parts is trivial.
The problem is that the marketing bods in control of the data feed keep on changing the structure of the static text, usually once a fortnight, so this week I might have:
Brand new Panasonic TV and a FREE Blu-Ray Player if you order today
Brand new Sony TV and a FREE DVD Player + Box Office DVD if you order today
Brand new Kenwood Hi-Fi unit and a $20 Amazon MP3 Voucher if you order today
And next week it will probably be something different, so I have to keep modifying my Regular Expressions...
How would you handle this?
Is there an algorithm to determine static and variable text within repeating rows of strings? If so, what would be the best way to use the output of such an algorithm to programmatically create a dynamic Regular Expression?
Thanks for any help or advice.

This code isn't perfect, it certainly isn't efficient, and it's very likely to be too late to help you, but it does work. If given a set of strings, it will return the common content above a certain length.
However, as others have mentioned, an algorithm can only give you an approximation, as you could hit a bad batch where all products have the same initial word, and then the code would accidentally identify that content as static. It may also produce mismatches when dynamic content shares values with static content, but as the size of samples you feed into it grows, the chance of error will shrink.
I'd recommend running this on a subset of your data (20000 rows would be a bad idea!) with some sort of extra sanity checking (a maximum number of static elements, etc.).
Final caveat: it may do a perfect job, but even if it does, how do you know which item is the PRODUCT and which one is the FREEGIFT?
The algorithm
1. If all strings in the set start with the same character, add that character to the "current match", then remove the leading character from all strings.
2. If not, remove the first character from every string whose first x characters (x = the minimum match length) aren't contained in all the other strings.
3. As soon as a mismatch is reached (case 2), yield the current match if it meets the length requirement.
4. Continue until one of the strings is exhausted.
The implementation
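// Requires: using System; using System.Collections.Generic; using System.Linq;
private static IEnumerable<string> FindCommonContent(string[] strings, int minimumMatchLength)
{
    // Work on a copy so the caller's array is not modified.
    string[] remaining = strings.ToArray();
    string sharedContent = "";
    while (remaining.All(x => x.Length > 0))
    {
        var item1FirstCharacter = remaining[0][0];
        if (remaining.All(x => x[0] == item1FirstCharacter))
        {
            // All strings agree on the next character: it belongs to the static text.
            sharedContent += item1FirstCharacter;
            for (int index = 0; index < remaining.Length; index++)
                remaining[index] = remaining[index].Substring(1);
            continue;
        }
        if (sharedContent.Length >= minimumMatchLength)
            yield return sharedContent;
        sharedContent = "";
        // If the first minimumMatchLength characters of a string aren't in all the
        // other strings, consume the first character of that string.
        bool consumedAny = false;
        for (int index = 0; index < remaining.Length; index++)
        {
            string testBlock = remaining[index].Substring(0, Math.Min(minimumMatchLength, remaining[index].Length));
            if (!remaining.All(x => x.Contains(testBlock)))
            {
                remaining[index] = remaining[index].Substring(1);
                consumedAny = true;
            }
        }
        // Guard against an endless loop: every leading block occurs in all strings
        // but the first characters still differ, so consume one character anyway.
        if (!consumedAny)
            remaining[0] = remaining[0].Substring(1);
    }
    if (sharedContent.Length >= minimumMatchLength)
        yield return sharedContent;
}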
Output
Set 1 (from your example):
FindCommonContent(strings, 4);
=> "with "
Set 2 (from your example):
FindCommonContent(strings, 4);
=> "Brand new ", "and a ", "if you order today"
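Building the regex
This should be as simple as the following (capture groups use parentheses, and each static part goes through Regex.Escape so characters such as $ or + are matched literally):
"^(.*)" + string.Join("(.*)", FindCommonContent(strings, 4).Select(Regex.Escape)) + "(.*)$";
=> "^(.*)Brand\ new\ (.*)and\ a\ (.*)if\ you\ order\ today(.*)$"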
Although you could modify the algorithm to return information about where the matches are (between or outside the static content), this will be fine, as you know some will match zero-length strings anyway.
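To actually pull the dynamic parts out of each row, the static strings can be turned into an anchored pattern with one capture group per gap. This is a sketch under the assumption that the static parts are already known (for example from FindCommonContent) and are passed in order; FeedPattern, Build and Extract are hypothetical names. It uses lazy (.*?) groups so a dynamic part stops at the first occurrence of the following static text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeedPattern
{
    // Anchored pattern with a lazy capture group before, between and after the
    // (escaped) static parts.
    public static Regex Build(IEnumerable<string> staticParts)
    {
        string body = string.Join("(.*?)", staticParts.Select(Regex.Escape));
        return new Regex("^(.*?)" + body + "(.*?)$");
    }

    // Non-empty captured (dynamic) values of one row, or null if the row no
    // longer matches, e.g. because the static text changed again.
    public static List<string> Extract(Regex pattern, string row)
    {
        Match m = pattern.Match(row);
        if (!m.Success)
            return null;
        return m.Groups.Cast<Group>()
            .Skip(1)                  // group 0 is the whole match
            .Select(g => g.Value)
            .Where(v => v.Length > 0)
            .ToList();
    }
}
```

Empty captures (typically the leading and trailing groups) are dropped, so deciding which value is the PRODUCT and which is the FREEGIFT still has to be done by position. A null result is a cheap signal that the feed format has changed.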

I think it would be possible with an algorithm, but the time it would take you to code it, compared with simply updating the Regular Expression, might not be worth it.
You could, however, make your update process faster. If you put your regex pattern in a text file instead of hard-coding it in your application, you wouldn't have to recompile and redeploy everything every time there's a change; you could simply edit the text file.
Depending on your project size and implementation, this could save you a generous amount of time.
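A minimal sketch of that idea, assuming the pattern is the only content of a text file whose path you choose (FeedPatternLoader and Load are hypothetical names):

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FeedPatternLoader
{
    // Reads the regex from a plain-text file at runtime, so a new feed format only
    // requires editing the file, not recompiling and redeploying.
    // Trailing line breaks are trimmed; spaces inside the pattern are kept.
    public static Regex Load(string path)
    {
        string pattern = File.ReadAllText(path).Trim('\r', '\n');
        return new Regex(pattern);
    }
}
```

Calling Load when each feed is processed (rather than caching the Regex forever) means an edited file takes effect on the next run.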

Max edit distance and suggestion based on word frequency

I need a spell checker with the following specification:
Very scalable.
To be able to set a maximum edit distance for the suggested words.
To get suggestion based on provided words frequencies (most common word first).
I took a look at Hunspell:
I found the parameter MAXDIFF in the man page, but it doesn't seem to work as expected. Maybe I'm using it the wrong way.
file t.aff:
MAXDIFF 1
file dico.dic:
5
rouge
vert
bleu
bleue
orange
-
NHunspell.Hunspell h = new NHunspell.Hunspell("t.aff", "dico.dic");
List<string> s = h.Suggest("bleuue");
returns the same result whether t.aff is empty or not:
bleue
bleu

We decided to use Apache Solr, which exactly fulfills our needs.
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck

A MAXDIFF of one should return fewer suggestions, but it can still return more than one.
Even a MAXDIFF of zero can give more than a single result, but it should lower the chance. MAXDIFF only tunes the n-gram based suggestions, so try a MAXDIFF of zero for fewer results, but this still doesn't guarantee you will get a single suggestion.
For your requirement to sort on the most frequent word, the Google ngram corpus is publicly available.
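Since MAXDIFF does not act as a hard edit-distance limit, the two requirements can be combined by hand for a small word list. This brute-force sketch assumes you already have a word-to-frequency table (for example built from a corpus such as the Google n-grams); it computes the Levenshtein distance to every dictionary word, so it does not meet the "very scalable" requirement and is only meant to show the filtering and ordering. FrequencySpeller and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequencySpeller
{
    // Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1),
    // keeping only two rows of the table.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);   // the row just computed becomes prev
        }
        return prev[b.Length];
    }

    // Dictionary words within maxDistance of the input, most frequent first.
    public static List<string> Suggest(string word, IDictionary<string, long> frequencies, int maxDistance)
    {
        return frequencies
            .Where(kv => Levenshtein(word, kv.Key) <= maxDistance)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

With hypothetical frequencies such as bleu = 50 and bleue = 10, Suggest("bleuue", frequencies, 1) returns only bleue, while a maximum distance of 2 returns bleu before bleue.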

performance problem

OK, so I need to know if anyone can see a way to reduce the number of iterations of these loops, because I can't. The outer while loop goes through a file, reading one line at a time. The foreach loop then compares each word in compareSet with what was read in the while loop. The innermost while loop does the bit counting.
As requested, an explanation of my algorithm:
There is a file that is too large to fit in memory. It contains a word followed by the pages in a very large document that this word is on. EG:
sky 1 7 9 32....... (it is not in this format, but you get the idea).
So parseLine reads in the line and converts it into a list of ints that works like a bit array, where 1 means the word is on the page and 0 means it isn't.
CompareSet is a bunch of other words. I can't fit my entire list of words into memory so I can only fit a subset of them. This is a bunch of words just like the "sky" example. I then compare each word in compareSet with Sky by seeing if they are on the same page.
So if sky and some other word both have a 1 set at a certain index in the bit array (simulated as an int array for performance), they are on the same page. The algorithm therefore counts the occurrences of any two words on a particular page. So in the end I will have a list like:
(for all words in list) is on the same page as (for all words in list) x number of times.
e.g. sky and land are on the same page x times.
while ((line = parseLine(s)) != null) {
    getPageList(line.Item2, compareWord);
    foreach (Tuple<int, uint[], List<Tuple<int, int>>> word in compareSet) {
        unchecked {
            for (int i = 0; i < 327395; i++) {
                // Skip blocks of 32 pages where either word is absent.
                if (word.Item2[i] == 0 || compareWord[i] == 0)
                    continue;
                uint combinedNumber = word.Item2[i] & compareWord[i];
                // Count the set bits: each pass clears the lowest set bit.
                while (combinedNumber != 0) {
                    actual++;
                    combinedNumber = combinedNumber & (combinedNumber - 1);
                }
            }
        }
    }
}
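If the loop structure has to stay, the innermost while loop can at least be replaced by a single population count per element. This sketch assumes .NET Core 3.0 or later, where System.Numerics.BitOperations.PopCount is available; PageOverlap.SharedPages is a hypothetical helper that returns the shared-page count for one pair of words instead of accumulating into a shared counter.

```csharp
using System;
using System.Numerics;

public static class PageOverlap
{
    // Number of pages two words share, given their page bit arrays (32 pages per uint).
    public static int SharedPages(uint[] pagesOfA, uint[] pagesOfB)
    {
        int shared = 0;
        int length = Math.Min(pagesOfA.Length, pagesOfB.Length);
        for (int i = 0; i < length; i++)
        {
            // A set bit in the AND means both words are on that page; PopCount
            // counts those bits in one call instead of a bit-clearing loop.
            shared += BitOperations.PopCount(pagesOfA[i] & pagesOfB[i]);
        }
        return shared;
    }
}
```

This does not change the number of word pairs that must be compared; it only makes each comparison cheaper, which may or may not be enough for your data size.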

As my old professor Bud used to say: "When you see nested loops like this, your spidey senses should be goin' CRAZY!"
You have a while loop containing a foreach, which contains a for, which contains another while. Each level of nesting multiplies the number of operations. Your for loop alone has 327,395 iterations. Assuming the other loops have the same or a similar number of iterations, that means you have an order of operations of
327,395 * 327,395 * 327,395 = 35,092,646,987,154,875 (insane)
It's no wonder that things would be slowing down. You need to redefine your algorithm to remove these nested loops or combine work somewhere. Even if the numbers are smaller than my assumptions, the nesting of the loops is creating a LOT of operations that are probably unnecessary.

As Joal already mentioned, nobody is able to optimize this looping algorithm as it stands. What you can do is explain better what you are trying to accomplish and what your hard requirements are. Maybe you can take a different approach using something like HashSet<T>.IntersectWith() or a Bloom filter.
So if you really want help here, you should post not only the code that doesn't work, but also the overall task you want to accomplish. Maybe someone has a completely different idea for solving your problem, making your whole algorithm obsolete.
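For illustration, the HashSet<T>.IntersectWith idea could look like the sketch below, assuming each word's pages are available as page numbers (as in the "sky 1 7 9 32" line) rather than as a bit array. PageSets.SharedPages is a hypothetical name. For words that appear on only a few pages, such sets can be much smaller than a 327,395-element array; for very common words the bit array is likely to be more compact.

```csharp
using System.Collections.Generic;

public static class PageSets
{
    // Counts the pages two words share using sets of page numbers.
    // IntersectWith modifies the set it is called on, so work on a new set.
    public static int SharedPages(IEnumerable<int> pagesOfA, IEnumerable<int> pagesOfB)
    {
        var shared = new HashSet<int>(pagesOfA);
        shared.IntersectWith(pagesOfB);
        return shared.Count;
    }
}
```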

Many times, when generating messages to show to the user, the message will contain a number of something that I want to inform the customer about.
I'll give an example: the customer has selected one or more items and has clicked delete. Now I want to show a confirmation message to the customer, and I want to mention the number of items he has selected, to minimize the chance of him making a mistake by selecting a bunch of items and clicking delete when he only wants to delete one of them.
One way is to make the generic message like this:
int noofitemsselected = SomeFunction();
string message = "You have selected " + noofitemsselected + " item(s). Are you sure you want to delete it/them?";
The "problem" here is the case where noofitemsselected is 1, and we have to write item and it instead of items and them.
My normal solution will be something like this
int noofitemsselected = SomeFunction();
string message = "You have selected " + noofitemsselected + " " + (noofitemsselected==1?"item" : "items") + ". Are you sure you want to delete " + (noofitemsselected==1?"it" : "them") + "?";
This gets quite long and quite nasty really fast if there are many references to the number's plurality in the code, and the actual message gets hard to read.
So my question is simply: are there any better ways of generating messages like this?
EDIT
I see that a lot of people have got hung up on the fact that I mentioned the message would be displayed in a message box, and have simply answered how to avoid using the message box at all, which is all good.
But remember that the problem of pluralization also applies to text in other places in the program, not just message boxes. For example, a label alongside a grid displaying the number of lines selected in the grid will have the same pluralization problem.
So this basically applies to most text that a program outputs, and the solution is not as simple as just changing the program to stop outputting text :)

You can avoid all of this messy plurality by just deleting the items without any message and giving the user a really good Undo facility. Users never read anything. You should build a good Undo facility as part of your program anyway.
You actually get 2 benefits when you create a comprehensive Undo facility. The first benefit makes the user's life easier by allowing him/her to reverse mistakes and minimise reading. The second benefit is that your app is reflecting real life by allowing the reversal of non-trivial workflow (not just mistakes).
I once wrote an app without using a single dialog or confirmation message. It took some serious thinking and was significantly harder to implement than using confirmation-type messages. But the end result was rather nice to use according to its end-users.

If there is ever any chance, no matter how small, that this app will need to be translated to other languages then both are wrong. The correct way of doing this is:
string message = ( noofitemsselected==1 ?
"You have selected " + noofitemsselected + " item. Are you sure you want to delete it?":
"You have selected " + noofitemsselected + " items. Are you sure you want to delete them?"
);
This is because different languages handle plurality differently. Some like Malay don't even have syntactic plurals so the strings would generally be identical. Separating the two strings makes it easier to support other languages later on.
Otherwise if this app is meant to be consumed by the general public and is supposed to be user friendly then the second method is preferable. Sorry but I don't really know a shorter way of doing this.
If this app is meant to be consumed only internally by your company then do the shortcut "item(s)" thing. You don't really have to impress anybody when writing enterprisy code. But I'd advise against doing this for a publicly consumed app, because it gives the impression that the programmer is lazy and thus lowers the users' opinion of the quality of the app. Trust me, small things like this matter.

How about just:
string message = "Are you sure you want to delete " + noofitemsselected + " item(s)?";
That way, you eliminate the number agreement difficulties, and end up with an even shorter, more to-the-point error message for the user as a bonus. We all know users don't read error messages anyway. The shorter they are, the more likely they are to at least glance at the text.
Or, armed with this knowledge that users don't read error messages, you could approach this a different way. Skip the confirmation message altogether, and just provide an undo feature that Just Works, regardless of what was deleted. Most users are already accustomed to undoing an operation when they notice it was not what they wanted, and are likely to find this behavior more natural than having to deal with another annoying pop-up.

What about what Java has had for years: java.text.MessageFormat and ChoiceFormat? See http://download.oracle.com/javase/1.4.2/docs/api/java/text/MessageFormat.html for more information.
MessageFormat form = new MessageFormat(
    "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.");
Object[] testArgs = {1273L};
System.out.println(form.format(testArgs));
// output with testArgs[0] set to 0, 1 and 1273 (English locale):
// There are no files.
// There is one file.
// There are 1,273 files.
In your case you want something somewhat simpler:
MessageFormat form = new MessageFormat("Are you sure you want to delete {0,choice,1#one item|1<{0,number,integer} items}?");
The advantage of this approach is that it works well with the i18n bundles, and you can provide translations properly for languages (like Japanese) that have no concept of plural or singular words.

I'd go with not hardcoding the message, but providing two messages in a separate resource file, like:
string DELETE_SINGLE = "You have selected {0} item. Are you sure you want to delete it?";
string DELETE_MULTI = "You have selected {0} items. Are you sure you want to delete them?";
and then feeding them into String.Format like:
string messageTemplate;
if (noofitemsselected == 1)
    messageTemplate = MessageResources.DELETE_SINGLE;
else
    messageTemplate = MessageResources.DELETE_MULTI;
string message = String.Format(messageTemplate, noofitemsselected);
I think that this approach is easier to localize and maintain. All UI messages would be in a single location.

You can sidestep the issue entirely by phrasing the message differently.
string message = "The number of selected items is " + noofitemsselected + ". Are you sure you want to delete everything in this selection?";

The first thing I'd suggest is: use string.Format. That allows you to do something like this:
int numOfItems = GetNumOfItems();
string msgTemplate;
msgTemplate = numOfItems == 1 ? "You selected only {0} item." : "Wow, you selected {0} items!";
string msg = string.Format(msgTemplate, numOfItems);
Further, in WPF apps, I've seen systems where a resource string would be pipe-delimited to have two messages: a singular and a plural message (or a zero/single/many message, even). A custom converter could then be used to parse this resource and use the relevant (formatted) string, so your Xaml is something like this:
<TextBlock Text="{Binding numOfItems, Converter={StaticResource c:NumericMessageFormatter}, ConverterParameter={StaticResource s:SuitableMessageTemplate}}" />
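The converter's core job, choosing one part of a pipe-delimited resource string and formatting it, does not depend on WPF, so it can be sketched as a plain C# helper. The format assumed here ("singular|plural" or "zero|singular|plural", each part a string.Format pattern) is a guess at such a convention, and PipeMessageTemplate.Format is a hypothetical name; an IValueConverter's Convert method could call it with the bound value and the ConverterParameter.

```csharp
using System.Globalization;

public static class PipeMessageTemplate
{
    // template: "singular|plural" or "zero|singular|plural"; each part is a
    // string.Format pattern that receives the count as {0}.
    public static string Format(string template, int count)
    {
        string[] parts = template.Split('|');
        string chosen;
        if (parts.Length >= 3)
            chosen = count == 0 ? parts[0] : count == 1 ? parts[1] : parts[2];
        else if (parts.Length == 2)
            chosen = count == 1 ? parts[0] : parts[1];
        else
            chosen = parts[0];   // no variants: use the text as-is
        return string.Format(CultureInfo.CurrentCulture, chosen, count);
    }
}
```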

For English, plenty of answers above. For other languages it is more difficult, as plurals depend on the gender of the noun and the word ending. Some examples in French:
Regular masculine:
Vous avez choisi 1 compte. Voulez-vous vraiment le supprimer?
Vous avez choisi 2 comptes. Voulez-vous vraiment les supprimer?
Regular feminine:
Vous avez choisi 1 table. Voulez-vous vraiment la supprimer?
Vous avez choisi 2 tables. Voulez-vous vraiment les supprimer?
Irregular masculine (ends with 's'):
Vous avez choisi 1 pays. Voulez-vous vraiment le supprimer?
Vous avez choisi 2 pays. Voulez-vous vraiment les supprimer?
The same problem exists in most Latin languages and gets worse in German or Russian, where there are 3 genders (masculine, feminine and neuter).
You'll need to take care if your objective is to handle more than just English.

To be able to have pluralized messages which will be possible to localize properly, my opinion is that it would be wise to first create a layer of indirection between the number and a message.
For example, use a constant of some sort to specify which message you want to display. Fetch the message using some function that will hide the implementation details.
get_message(DELETE_WARNING, quantity)
Next, create a dictionary that holds the possible messages and variations, and make variations know when they should be used.
DELETE_WARNING = {
1: 'Are you sure you want to delete %s item',
>1: 'Are you sure you want to delete %s items',
>5: 'My language has special plural above five, do you wish to delete it?'
}
Now you could simply find the key that corresponds to the quantity and interpolate the value of the quantity with that message.
This is an oversimplified and naive example, but I don't really see any other sane way to do this and still provide good support for L10N and I18N.
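A C# version of this lookup might look like the following sketch. The message key, the templates and the rule order are hypothetical; each key maps to an ordered list of (condition, template) rules and the first rule whose condition matches the quantity wins, which avoids the ambiguity of overlapping keys such as >1 and >5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PluralMessages
{
    // One variation of a message, used when When(quantity) is true.
    private sealed class Rule
    {
        public Func<int, bool> When { get; }
        public string Template { get; }
        public Rule(Func<int, bool> when, string template) { When = when; Template = template; }
    }

    // Rules are checked in order; a translation could list more (or different) cases.
    private static readonly Dictionary<string, List<Rule>> Messages = new Dictionary<string, List<Rule>>
    {
        ["DELETE_WARNING"] = new List<Rule>
        {
            new Rule(n => n == 1, "Are you sure you want to delete {0} item?"),
            new Rule(n => n != 1, "Are you sure you want to delete {0} items?"),
        },
    };

    // Picks the first matching variation and inserts the quantity into it.
    public static string GetMessage(string key, int quantity)
    {
        Rule rule = Messages[key].First(r => r.When(quantity));
        return string.Format(rule.Template, quantity);
    }
}
```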

You'll have to translate the function below from VBA to C#, but your usage would change to:
int noofitemsselected = SomeFunction();
string message = Pluralize("You have selected # item[s]. Are you sure you want to delete [it/them]?", noofitemsselected);
I have a VBA function that I use in MS Access to do exactly what you are talking about. I know I'll get hacked to pieces for posting VBA, but here goes anyway. The algorithm should be apparent from the comments:
'---------------------------------------------------------------------------------------'
' Procedure : Pluralize'
' Purpose : Formats an English phrase to make verbs agree in number.'
' Usage : Msg = "There [is/are] # record[s]. [It/They] consist[s/] of # part[y/ies] each."'
' Pluralize(Msg, 1) --> "There is 1 record. It consists of 1 party each."'
' Pluralize(Msg, 6) --> "There are 6 records. They consist of 6 parties each."'
'---------------------------------------------------------------------------------------'
''
Function Pluralize(Text As String, Num As Variant, Optional NumToken As String = "#")
    Const OpeningBracket = "\["
    Const ClosingBracket = "\]"
    Const DividingSlash = "/"
    Const CharGroup = "([^\]]*)" 'Group of 0 or more characters not equal to closing bracket'
    Dim IsPlural As Boolean, Msg As String, Pattern As String
    On Error GoTo Err_Pluralize
    If IsNumeric(Num) Then
        IsPlural = (Num <> 1)
    End If
    Msg = Text
    'Replace the number token with the actual number'
    Msg = Replace(Msg, NumToken, Num)
    'Replace [y/ies] style references'
    Pattern = OpeningBracket & CharGroup & DividingSlash & CharGroup & ClosingBracket
    Msg = RegExReplace(Pattern, Msg, "$" & IIf(IsPlural, 2, 1))
    'Replace [s] style references'
    Pattern = OpeningBracket & CharGroup & ClosingBracket
    Msg = RegExReplace(Pattern, Msg, IIf(IsPlural, "$1", ""))
    'Return the modified message'
    Pluralize = Msg
    Exit Function
Err_Pluralize:
    'Error handler: return the text unchanged'
    Pluralize = Text
End Function
Function RegExReplace(SearchPattern As String, _
                      TextToSearch As String, _
                      ReplacePattern As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = True
        .IgnoreCase = False
        .Pattern = SearchPattern
    End With
    RegExReplace = RE.Replace(TextToSearch, ReplacePattern)
End Function
The usage got cut off a bit in the code comments above, so I'll repeat it here:
Msg = "There [is/are] # record[s]. [It/They] consist[s/] of # part[y/ies] each."
Pluralize(Msg, 1) --> "There is 1 record. It consists of 1 party each."
Pluralize(Msg, 6) --> "There are 6 records. They consist of 6 parties each."
Yes, this solution ignores languages that are not English. Whether that matters depends on your requirements.
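Since the answer suggests translating Pluralize to C#, here is one possible translation. It is a sketch that keeps the same bracket syntax and the same rule (only exactly 1 is singular); it uses System.Text.RegularExpressions instead of the VBScript RegExp object, and the class name Pluralizer is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class Pluralizer
{
    // [singular/plural] alternatives such as [is/are], [y/ies] or [s/].
    private static readonly Regex Alternatives = new Regex(@"\[([^\]]*)/([^\]]*)\]");
    // [plural-only] text such as [s]; it disappears in the singular.
    private static readonly Regex PluralOnly = new Regex(@"\[([^\]]*)\]");

    public static string Pluralize(string text, int num, string numToken = "#")
    {
        bool isPlural = num != 1;
        // Replace the number token with the actual number.
        string msg = text.Replace(numToken, num.ToString());
        // Pick the first or second alternative of every [a/b] group.
        msg = Alternatives.Replace(msg, isPlural ? "$2" : "$1");
        // Keep or drop the remaining [x] groups.
        msg = PluralOnly.Replace(msg, isPlural ? "$1" : "");
        return msg;
    }
}
```

For the question's message, Pluralize("You have selected # item[s]. Are you sure you want to delete [it/them]?", 3) gives "You have selected 3 items. Are you sure you want to delete them?".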

You could generate the plural automatically; see e.g. a plural generator.
For plural-generation rules, see Wikipedia.
string msg = "Do you want to delete " + numItems + GetPlural(" item", numItems) + "?";

How about a more generic way. Avoid pluralization in the second sentence:
Number of selected items to be deleted: noofitemsselected.
Are you sure?
I find that doing it this way puts the number at the end of the line, where it is really easy to spot. This solution would work with the same logic in any language.

My general approach is to write a "single/plural function", like this:
public static string noun(int n, string single, string plural)
{
if (n==1)
return single;
else
return plural;
}
Then in the body of the message I call this function:
string message="Congratulations! You have won "+n+" "+noun(n, "foobar", "foobars")+"!";
This isn't a whole lot better, but at least it, (a) puts the decision in a function and so unclutters the code a little, and (b) is flexible enough to handle irregular plurals. i.e. it's easy enough to say noun(n, "child", "children") and the like.
Of course this only works for English, but the concept is readily extensible to languages with more complex endings.
It occurs to me that you could make the last parameter optional for the easy case:
public static string noun(int n, string single, string plural=null)
{
if (n==1)
return single;
else if (plural==null)
return single+"s";
else
return plural;
}

Internationalization
I assume you want internationalization support, in which case different languages have different patterns for plurals (e.g. a special plural form for 2 of something, or more complicated languages like Polish), and you can't rely on applying some simple pattern to your string to fix it.
You can use GNU Gettext's ngettext function and provide two English messages in your source code. Gettext will provide the infrastructure to choose from other (potentially more) messages when translated into other languages. See http://www.gnu.org/software/hello/manual/gettext/Plural-forms.html for a full description of GNU gettext's plural support.
GNU Gettext is under the LGPL. ngettext is named GettextResourceManager.GetPluralString in the C# port of Gettext.
(If you don't need localization support, and don't want to use Gettext right away, then write your own function that does this for English, and pass two full messages to it, that way if you need l10n later, you can add by rewriting a single function.)
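A minimal sketch of such an English-only function, shaped like ngettext (singular message, plural message, count) so that it could later be swapped for a real gettext call such as GetPluralString; PluralChooser, NGetText and Format are hypothetical names.

```csharp
using System.Globalization;

public static class PluralChooser
{
    // English-only stand-in with the same shape as ngettext(msgid, msgid_plural, n):
    // singular message when n == 1, plural message otherwise.
    public static string NGetText(string singular, string plural, long n)
    {
        return n == 1 ? singular : plural;
    }

    // Convenience wrapper that also inserts the count into the chosen message.
    public static string Format(string singular, string plural, long n)
    {
        return string.Format(CultureInfo.CurrentCulture, NGetText(singular, plural, n), n);
    }
}
```

Because every call site passes two complete messages, replacing the body of NGetText later is the only change needed to add real localization.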

How about writing a function like
string GetOutputMessage(int count, string oneItemMsg, string multiItemMsg)
{
    // Only exactly one item is singular; zero uses the plural form ("0 items").
    return string.Format("{0} {1}", count, count == 1 ? oneItemMsg : multiItemMsg);
}
.. and use it whenever you need?
string message = "You have selected " + GetOutputMessage(noofitemsselected,"item","items") + ". Are you sure you want to delete it/them?";

For the first problem (pluralization), you can use Inflector.
And for the second, you can use a string representation extension with a name such as ToPronounString.

I had this exact same question posed to me yesterday by a member of our team.
Since it came up again here on StackOverflow I figured the universe was telling me to have a bash at producing a decent solution.
I've quickly put something together and it's by no means perfect however it might be of use or spark some discussion/development.
This code is based on the idea that there can be 3 messages. One for zero items, one for one item and one for more than one item which follow the following structure:
singlePropertyName
singlePropertyName_Zero
singlePropertyName_Plural
I've created an internal class to test with in order to mimic the resource class. I haven't tested this using an actual resource file yet, so I'm yet to see the full result.
Here's the code. (Currently I've included some generics where I know I could have specified the third param simply as a Type, and the second param is a string; I think there's a way to combine these two parameters into something better, but I'll come back to that when I have a spare moment.)
public static string GetMessage<T>(int count, string resourceSingularName, T resourceType) where T : Type
{
    var resourcePluralName = resourceSingularName + "_Plural";
    var resourceZeroName = resourceSingularName + "_Zero";
    string resource = string.Empty;
    if (count == 0)
    {
        resource = resourceZeroName;
    }
    else
    {
        resource = (count <= 1) ? resourceSingularName : resourcePluralName;
    }
    // Read the chosen property from a new instance of the resource class via reflection.
    var x = resourceType.GetProperty(resource).GetValue(Activator.CreateInstance(resourceType), null);
    return x.ToString();
}
Test resource class:
internal class TestMessenger
{
    public string Tester { get { return "Hello World of one"; } }
    public string Tester_Zero { get { return "Hello no world"; } }
    public string Tester_Plural { get { return "Hello Worlds"; } }
}
and my quick executing method:
void Main()
{
    var message = GetMessage(56, "Tester", typeof(TestMessenger));
    message.Dump(); // Dump() is LINQPad's output method; use Console.WriteLine elsewhere
}

From my point of view, your first solution is the most suitable one. The reason is that if you need the application to support multiple languages, the second option can be painstaking. With the first approach it is easy to localize the text without much effort.

You could go for a more generic message like 'Are you sure you want to delete the selected item(s)'.

It depends on how nice a message you want to have. From easiest to hardest:
Re-write your error message to avoid pluralization. Not as nice for your user, but faster.
Use more general language but still include the number(s).
Use a "pluralization" and inflector system à la Rails, so you can say pluralize(5,'bunch') and get 5 bunches. Rails has a good pattern for this.
For internationalization, you need to look at what Java provides. That will support a wide variety of languages, including those that have different forms of adjectives with 2 or 3 items. The "s" solution is very English centric.
Which option you go with depends on your product goals. - ndp
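To give an idea of what such an inflector does, here is a very small rule-based sketch for regular English nouns. It is not Rails's actual rule set; EnglishInflector and its methods are hypothetical, and irregular nouns (child/children, mouse/mice, ...) would need an exception table on top of it.

```csharp
using System;

public static class EnglishInflector
{
    // Plural of a regular English noun using a few suffix rules.
    public static string PluralForm(string noun)
    {
        if (EndsWithAny(noun, "s", "x", "z", "ch", "sh"))
            return noun + "es";                                  // bunch -> bunches
        if (noun.Length > 1 && noun.EndsWith("y", StringComparison.Ordinal)
            && "aeiou".IndexOf(noun[noun.Length - 2]) < 0)
            return noun.Substring(0, noun.Length - 1) + "ies";   // party -> parties
        return noun + "s";                                       // item -> items, day -> days
    }

    // Rails-style pluralize(count, word): "1 bunch", "5 bunches".
    public static string Pluralize(int count, string noun)
    {
        return count + " " + (count == 1 ? noun : PluralForm(noun));
    }

    private static bool EndsWithAny(string word, params string[] suffixes)
    {
        foreach (string suffix in suffixes)
            if (word.EndsWith(suffix, StringComparison.Ordinal))
                return true;
        return false;
    }
}
```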

Why would you want to present a message the users can actually understand? It goes against 40 years of programming history. Nooooo, we have a good thing going on, don't spoil it with understandable messages.
(j/k)

Do it like it's done in World of Warcraft:
BILLING_NAG_WARNING = "Your play time expires in %d |4minute:minutes;";

It gets a little bit shorter with
string message = "Are you sure you want to delete " + noofitemsselected + " item" + (noofitemsselected != 1 ? "s" : "") + "?";

One approach I haven't seen mentioned is a substitution/select tag, e.g. something like "You are about to squash {0} [?i({0}=1):/cactus/cacti/]", in other words a format-like expression that picks the substitution based on whether argument zero, taken as an integer, equals 1. I've seen such tags used in the days before .NET; I'm not aware of any standard for them in .NET, nor do I know the best way to format them.
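`string.Format` has no such tag built in, but a similar effect can be approximated by passing a custom `ICustomFormatter` to `string.Format`. In this sketch, `PluralFormatProvider` and its `P:singular;plural` format syntax are invented for illustration; only `IFormatProvider`, `ICustomFormatter` and the `string.Format(IFormatProvider, ...)` overload are standard .NET.

```csharp
using System;

// Hypothetical formatter: a format item such as {0:P:cactus;cacti} chooses the
// singular form when the (int) argument equals 1 and the plural form otherwise.
public sealed class PluralFormatProvider : IFormatProvider, ICustomFormatter
{
    public object GetFormat(Type formatType) =>
        formatType == typeof(ICustomFormatter) ? this : null;

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        if (format != null && format.StartsWith("P:", StringComparison.Ordinal) && arg is int n)
        {
            string[] forms = format.Substring(2).Split(';');
            if (forms.Length == 2)
                return n == 1 ? forms[0] : forms[1];
        }
        // Anything else is formatted as usual.
        if (arg is IFormattable formattable)
            return formattable.ToString(format, null);
        return arg?.ToString() ?? string.Empty;
    }
}
```

For example, `string.Format(new PluralFormatProvider(), "You are about to squash {0} {0:P:cactus;cacti}", 3)` returns "You are about to squash 3 cacti". The colon after `P` is allowed because everything after the first colon of a format item is passed through as the format string.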

I would think outside the box for a minute: all of the suggestions here either do the pluralization (and have to worry about more than one plural form, gender, etc.) or avoid it altogether and provide a nice undo.
I would go the non-linguistic way and use visual cues instead. For example, imagine an iPhone app where you select items by swiping your finger. Before deleting them with the master delete button, it "shakes" the selected items and shows a box titled with a question mark, with V (OK) and X (Cancel) buttons.
Or, in the 3D world of Kinect / Move / Wii, imagine selecting the files, moving your hand to the delete button, and being told to raise your hand above your head to confirm, using the same visual symbols as before. Instead of asking "Delete 3 files?", it shows you the 3 files with a hovering, half-transparent red X on them and tells you to do something to confirm.

working with incredibly large numbers in .NET

I'm trying to work through the problems on projecteuler.net, but I keep running into a couple of problems.
The first is about storing large quantities of elements in a List<T>. I keep getting OutOfMemoryExceptions when storing large numbers of elements in the list.
Now, I admit I might not be doing these things in the best way, but is there some way of defining how much memory the app can consume?
It usually crashes when I get to about 100,000,000 elements :S
Secondly, some of the questions require the addition of massive numbers. I use the ulong data type where I think the number is going to get very big, but I still manage to wrap past the largest supported int and get into negative numbers.
Do you have any tips for working with incredibly large numbers?

Consider System.Numerics.BigInteger.
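A short sketch of System.Numerics.BigInteger (added in .NET Framework 4.0; on .NET Framework the project needs a reference to System.Numerics.dll). `BigNumberDemo` is a hypothetical class, and the digit sum of 2^1000 is just an example of a value far outside the ulong range.

```csharp
using System.Numerics;

public static class BigNumberDemo
{
    // Sum of the decimal digits of 2^exponent, computed exactly.
    public static int DigitSumOfPowerOfTwo(int exponent)
    {
        BigInteger value = BigInteger.Pow(2, exponent);
        int sum = 0;
        foreach (char c in value.ToString())
            sum += c - '0';
        return sum;
    }

    // ulong silently wraps to 0 in an unchecked context; BigInteger keeps growing.
    public static bool ULongWrapsButBigIntegerDoesNot()
    {
        ulong wrapped = unchecked(ulong.MaxValue + 1UL);
        BigInteger exact = (BigInteger)ulong.MaxValue + 1;
        return wrapped == 0UL && exact == BigInteger.Pow(2, 64);
    }
}
```

For example, `DigitSumOfPowerOfTwo(15)` returns 26, since 2^15 = 32768.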

You need to use a large-number class that uses some basic math principles to split these operations up. This implementation of a C# BigInteger library on CodeProject seems to be the most promising. The article also has some good explanations of how operations on massive numbers work.
Also see:
Big integers in C#

As far as Project Euler goes, you might be barking up the wrong tree if you are hitting OutOfMemory exceptions. From their website:
Each problem has been designed according to a "one-minute rule", which means that although it may take several hours to design a successful algorithm with more difficult problems, an efficient implementation will allow a solution to be obtained on a modestly powered computer in less than one minute.

As user Jakers said, if you're using big numbers, you're probably doing it wrong.
Of the Project Euler problems I've done, none have required big-number math so far.
It's more about finding the proper algorithm to avoid big numbers.
Want hints? Post here, and we might get an interesting Euler thread started.
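As an example of choosing the algorithm instead of brute force, the sum of the multiples of 3 or 5 below a limit (Project Euler problem 1) needs no list at all: the multiples of k below L form an arithmetic series summing to k·m(m+1)/2 with m = ⌊(L−1)/k⌋, and the multiples of 15 are subtracted once because they are counted twice (inclusion-exclusion). `MultiplesSum` is a hypothetical helper name.

```csharp
public static class MultiplesSum
{
    // Sum of k, 2k, ..., mk where m = (limit - 1) / k, i.e. the multiples of k below limit.
    // m * (m + 1) is always even, so the division by 2 is exact.
    private static long SumOfMultiples(long k, long limit)
    {
        long m = (limit - 1) / k;
        return k * m * (m + 1) / 2;
    }

    // Multiples of 3 or 5 below limit, in O(1) time and memory.
    public static long SumOfMultiplesOf3Or5(long limit) =>
        SumOfMultiples(3, limit) + SumOfMultiples(5, limit) - SumOfMultiples(15, limit);
}
```

For example, `SumOfMultiplesOf3Or5(10)` returns 23 (3 + 5 + 6 + 9).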

I assume this is C#? F# has built-in ways of handling both of these problems (the BigInt type and lazy sequences).
You can use both F# techniques from C# if you like. The BigInt type is reasonably usable from other languages if you add a reference to the core F# assembly.
Lazy sequences are basically just syntax-friendly enumerators. Putting 100,000,000 elements in a list isn't a great plan, so you should rethink your solutions to get around that. If you don't need to keep information around, throw it away! If it's cheaper to recompute it than to store it, throw it away!
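C# has its own lazy sequences: iterator methods with `yield return`. This sketch (hypothetical `LazyFib` class) streams Fibonacci numbers instead of storing them, so only two values are held in memory however far the sequence is consumed; the start values 1 and 2 follow Project Euler problem 2 and are just an example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyFib
{
    // Infinite, lazily evaluated sequence: each term is computed only when requested.
    public static IEnumerable<long> Fibonacci()
    {
        long a = 1, b = 2;
        while (true)
        {
            yield return a;
            (a, b) = (b, a + b);
        }
    }

    // Nothing is stored: TakeWhile stops pulling terms once the limit is reached.
    public static long SumOfEvenTermsBelow(long limit) =>
        Fibonacci().TakeWhile(x => x < limit).Where(x => x % 2 == 0).Sum();
}
```

For example, `SumOfEvenTermsBelow(100)` returns 44 (2 + 8 + 34).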

See the answers in this thread. You probably need to use one of the third-party big integer libraries/classes available, or wait for .NET Framework 4.0, which includes a native BigInteger type (System.Numerics.BigInteger).

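As far as defining how much memory an app will use goes, you can check for available memory before performing an operation by using the MemoryFailPoint class (in System.Runtime).
It does not actually allocate the memory; instead, its constructor checks whether roughly the requested number of megabytes is likely to be available and throws an InsufficientMemoryException if not, so you find out that an operation would fail before you start running it.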
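A minimal sketch of the MemoryFailPoint pattern. `MemoryCheck`, `EstimateMegabytes` and `TryRun` are hypothetical helpers. The estimate counts only the element data (for example 100,000,000 longs at 8 bytes each, about 763 MB), and a growing List<T> can temporarily need more, because it grows by copying its contents into a larger array.

```csharp
using System;
using System.Runtime;

public static class MemoryCheck
{
    // Rough size of the element data alone, rounded up to whole megabytes (minimum 1,
    // because MemoryFailPoint requires a positive size). Object headers and a List<T>'s
    // spare capacity are not included.
    public static int EstimateMegabytes(long elementCount, int bytesPerElement)
    {
        long bytes = elementCount * bytesPerElement;
        long megabytes = (bytes + (1L << 20) - 1) >> 20;
        return (int)Math.Max(1L, megabytes);
    }

    // Runs the work only if the memory gate passes; returns false instead of
    // failing partway through with an OutOfMemoryException.
    public static bool TryRun(int megabytes, Action work)
    {
        try
        {
            // The constructor throws InsufficientMemoryException if the memory
            // is unlikely to be available; it does not allocate the memory itself.
            using (new MemoryFailPoint(megabytes))
            {
                work();
            }
            return true;
        }
        catch (InsufficientMemoryException)
        {
            return false;
        }
    }
}
```

A typical call would gate the work with `EstimateMegabytes(100_000_000, sizeof(long))`; passing the final capacity to the List<T> constructor inside the work also avoids the repeated grow-and-copy steps.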

// Adds two non-negative integers written as decimal digit strings,
// e.g. Add("999", "1") returns "1000".
string Add(string s1, string s2)
{
    // Pad the shorter operand with leading zeros so the digits line up.
    int length = Math.Max(s1.Length, s2.Length);
    s1 = s1.PadLeft(length, '0');
    s2 = s2.PadLeft(length, '0');
    // One extra slot for a possible final carry.
    var digits = new char[length + 1];
    int carry = 0;
    // Walk from the least significant digit to the most significant one.
    for (int i = length - 1; i >= 0; i--)
    {
        int sum = (s1[i] - '0') + (s2[i] - '0') + carry;
        carry = sum / 10;
        digits[i + 1] = (char)('0' + sum % 10);
    }
    digits[0] = (char)('0' + carry);
    // Drop the leading slot when there was no final carry.
    return carry > 0 ? new string(digits) : new string(digits, 1, length);
}

I am not sure if it is a good way of handling it, but I use the following in my project.
For each item I keep a "double theRelevantNumber" and an "int PowerOfTen", and in my relevant class I have an "int relevantDecimals" field.
When large numbers are encountered, they are handled like this.
First they are changed to x.yyy form. So if the number 123456.789 was input and PowerOfTen was 10, it would start like this:
theRelevantNumber = 123456.789
PowerOfTen = 10
The number is then 123456.789*10^10.
It is then changed to:
1.23456789*10^15
It is then truncated to the number of relevant decimals (for example 5), giving 1.23456, and saved along with PowerOfTen = 15.
When adding or subtracting numbers, any digits beyond the relevant decimals are ignored. This means that if you take:
1*10^15 + 1*10^10, the result changes to 1.00001*10^15 if relevantDecimals is 5, but does not change at all if relevantDecimals is 4.
This lets you deal with numbers up to roughly double.MaxValue*10^int.MaxValue without any problem, and, at least in OOP, it is not that hard to keep track of.
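The scheme described above can be sketched as a small value type. This is an illustration under assumptions, not the poster's code: `ScaledNumber`, its `Add` method and the truncation rule are hypothetical, and it uses decimal rather than the answer's double for the mantissa so that truncating to a fixed number of decimal digits is exact.

```csharp
using System;

// Hypothetical type: Mantissa in [1, 10) times 10^PowerOfTen, keeping
// RelevantDecimals digits after the point; extra digits are truncated.
public readonly struct ScaledNumber
{
    public decimal Mantissa { get; }
    public int PowerOfTen { get; }
    public int RelevantDecimals { get; }

    public ScaledNumber(decimal value, int powerOfTen, int relevantDecimals)
    {
        RelevantDecimals = relevantDecimals;
        if (value == 0m)
        {
            Mantissa = 0m;
            PowerOfTen = 0;
            return;
        }
        // Normalize to x.yyy form, moving the decimal point into the exponent.
        decimal m = Math.Abs(value);
        int p = powerOfTen;
        while (m >= 10m) { m /= 10m; p++; }
        while (m < 1m) { m *= 10m; p--; }
        // Keep only the relevant decimals (truncate, do not round).
        decimal scale = Pow10(relevantDecimals);
        Mantissa = Math.Sign(value) * (Math.Truncate(m * scale) / scale);
        PowerOfTen = p;
    }

    // Aligns the smaller operand to the larger exponent and adds; digits that fall
    // below the relevant decimals of the result are dropped by the constructor.
    public ScaledNumber Add(ScaledNumber other)
    {
        bool thisIsBigger = PowerOfTen >= other.PowerOfTen;
        ScaledNumber big = thisIsBigger ? this : other;
        ScaledNumber small = thisIsBigger ? other : this;
        int gap = big.PowerOfTen - small.PowerOfTen;
        decimal aligned = gap > 28 ? 0m : small.Mantissa / Pow10(gap);
        return new ScaledNumber(big.Mantissa + aligned, big.PowerOfTen, RelevantDecimals);
    }

    private static decimal Pow10(int n)
    {
        decimal result = 1m;
        for (int i = 0; i < n; i++) result *= 10m;
        return result;
    }
}
```

With this sketch, `new ScaledNumber(123456.789m, 10, 5)` holds 1.23456 with PowerOfTen 15, and adding 1*10^10 to 1*10^15 gives 1.00001*10^15 with 5 relevant decimals but stays 1*10^15 with 4.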

You don't need BigInteger just to compare or sort big numbers. If they are non-negative and have no leading zeros, you can keep them as strings: a shorter string is a smaller number, and strings of equal length compare digit by digit.
using System;
class Solution
{
    static void Main(string[] args)
    {
        string[] unsorted = { "3141592653589793238", "1", "3", "5737362592653589793238", "3", "5" };
        string[] result = SortNumericStrings(unsorted);
        foreach (string s in result)
            Console.WriteLine(s);
        Console.ReadLine();
    }
    // Sorts the array in place: by length first, then ordinally (digit by digit).
    static string[] SortNumericStrings(string[] arr)
    {
        Array.Sort(arr, (left, right) =>
        {
            if (left.Length != right.Length)
                return left.Length - right.Length;
            return string.CompareOrdinal(left, right);
        });
        return arr;
    }
}

If you want to work with incredibly large numbers, look here:
MIKI Calculator
I am not a professional programmer; I only write code for myself occasionally, so sorry for the unprofessional use of C#, but the program works. I will be grateful for any advice and corrections.
I use this calculator to generate 32-character passwords from numbers that are around 58 digits long.
Since the program adds numbers in string format, you can perform calculations on numbers up to the maximum length of a string. The program uses long lists for the calculation, so it is possible to calculate on larger numbers, possibly 18x the maximum capacity of the list.


Tag Cloud in C#


Take a look at https://github.com/chrisdavies/Sparc.TagCloud. It worked for me. The Examples folder contains a project named WebExample, which should help you solve this.

Try the Zoomable TagCloud Generator, which extracts keywords from a given source (a text file or other sources) and displays the tag cloud as a Zooming User Interface (ZUI).
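If you would rather roll your own than use a library, the basic pipeline (split the text into words, drop common stop words, count, then bucket each count relative to the most frequent tag) might look like this sketch. `TagCloudBuilder` is hypothetical, its stop-word list is deliberately tiny, the `[a-z']+` tokenizer only handles English letters, and the five size buckets are 20% steps of the maximum count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCloudBuilder
{
    // Illustrative stop-word list only; a real one would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "of", "a", "an", "and", "to", "in", "is", "it" },
        StringComparer.OrdinalIgnoreCase);

    // Lowercases the text, extracts words, skips stop words and returns
    // (tag, count) pairs sorted by count (descending), then alphabetically.
    public static List<KeyValuePair<string, int>> CountTags(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[a-z']+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();

    // Maps a count to a size bucket 0..4 (smallest..largest): below 20% of the
    // maximum is 0, below 40% is 1, and so on; the maximum itself is 4.
    public static int SizeBucket(int count, int maxCount) =>
        Math.Min(4, (int)(5.0 * count / maxCount));
}
```

For example, `CountTags("The cat and the hat. The cat sat.")` yields cat (2), hat (1) and sat (1); the bucket index can then be mapped to a CSS class or font size.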
