Using Dictionary to count the number of appearances - c#

My problem is that I am trying to take a body of text from a text box for example
"Spent the day with "insert famous name" '#excited #happy #happy"
then I want to count how many times each hashtag appears in the body, which can be any length of text.
so the above would return this
excited = 1
happy = 2
I was planning on using a dictionary, but I am not sure how to implement the search for the hashtags and add them to the dictionary.
This is all I have so far
string body = txtBody.Text;
Dictionary<string, string> dic = new Dictionary<string, string>();
foreach(char c in body)
{
}
thanks for any help

This can be achieved with a couple of LINQ methods:
var text = "Spent the day with <insert famous name> #excited #happy #happy";
var hashtags = text.Split(new[] { ' ' })
.Where(word => word.StartsWith("#"))
.GroupBy(hashtag => hashtag)
.ToDictionary(group => group.Key, group => group.Count());
Console.WriteLine(string.Join("; ", hashtags.Select(kvp => kvp.Key + ": " + kvp.Value)));
This will print
#excited: 1; #happy: 2

This will find any hashtags in a string of the form a hash followed by one or more non-whitespace characters and create a dictionary of them versus their count.
You did mean Dictionary<string, int> really, didn't you?
var input = "Spent the day with \"insert famous name\" '#excited #happy #happy";
Dictionary<string, int> dic =
Regex
.Matches(input, @"(?<=\#)\S+")
.Cast<Match>()
.Select(m => m.Value)
.GroupBy(s => s)
.ToDictionary(g => g.Key, g => g.Count());
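For comparison, the dictionary-and-loop approach the question started from could look roughly like the sketch below. The class and method names (HashtagCounter, Count) are made up for illustration. It assumes hashtags are separated by whitespace and start with '#', so a token such as '#excited with a leading apostrophe would not be counted, unlike with the lookbehind regex above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class HashtagCounter
{
    // Counts each whitespace-delimited token that starts with '#'.
    public static Dictionary<string, int> Count(string body)
    {
        var counts = new Dictionary<string, int>();
        var words = body.Split(new[] { ' ', '\t', '\r', '\n' },
                               StringSplitOptions.RemoveEmptyEntries);

        foreach (var word in words)
        {
            // Skip non-hashtags and a bare "#" with nothing after it.
            if (word.Length < 2 || word[0] != '#')
                continue;

            var tag = word.Substring(1);           // drop the leading '#'
            int current;
            counts.TryGetValue(tag, out current);  // current stays 0 if the tag is new
            counts[tag] = current + 1;
        }
        return counts;
    }
}
```

With the text box from the question this would be called as HashtagCounter.Count(txtBody.Text).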

Related

LINQ using .ToDictionary without .Select

I have the following dictionary:
Dictionary<string, string> clauses = new Dictionary<string, string>();
where the clauses are like this:
"A|B" - "some text"
"A|D|E" - "some text"
"G" - "some text"
"E|A" - "some text"
...
and I want to populate the dictionary below:
Dictionary<string, int> columnsBitMap = new Dictionary<string, int>();
where the string values are the unique letters of the first dictionary strings and int values are calculated by math formula.
I have the following which is working perfectly:
columnsBitMap = String.Join("|", clauses.Select(clause => clause).Select(clause => clause.Key)).Split('|')
.Distinct().OrderBy(column => column)
// can I remove the next Select?
.Select((column, index) => new KeyValuePair<string, int>(column, index))
.ToDictionary(column => column.Key, column => Convert.ToInt32(Math.Pow(2, column.Value)));
but I am wondering if this could be simplified removing the .Select part?
The output should be like this:
A 1
B 2
D 4
E 8
G 16
This bit is completely superfluous:
.Select(clause => clause)
Just remove it and the rest should work fine.
I don't see much reason to get rid of the part
.Select((column, index) => new KeyValuePair<string, int>(column, index))
But if you're against using a KeyValuePair<TKey,TValue> you could just make it an anonymous object
.Select((column, index) => new{ Key = column, Value = index })
But there's not a great amount of difference.
I approached your requirement in a slightly different way:
var result = clauses.SelectMany(clause => clause.Key.Split('|'))
.Distinct().OrderBy(column => column)
.Select((column, index) => new {Key=column,Value=index})
.ToDictionary(column => column.Key, column => Convert.ToInt32(Math.Pow(2, column.Value)));
Working example with your test case: http://rextester.com/PWC41147
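As a further sketch along the same lines: ToDictionary has no overload that passes the element index, so the indexed Select stays, but Math.Pow and Convert.ToInt32 can be swapped for a bit shift. The class and method names here (ColumnBitMap, Build) are hypothetical, and the shift is only valid for fewer than 31 distinct columns because the values are stored in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ColumnBitMap
{
    public static Dictionary<string, int> Build(Dictionary<string, string> clauses)
    {
        return clauses.Keys
            .SelectMany(key => key.Split('|'))          // "A|D|E" -> A, D, E
            .Distinct()
            .OrderBy(column => column, StringComparer.Ordinal)
            .Select((column, index) => new { column, index })
            // 1 << n equals 2^n for n < 31 (assumption: fewer than 31 columns)
            .ToDictionary(x => x.column, x => 1 << x.index);
    }
}
```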

C# File to Dictionary, but taking pairs of words

I am thinking about making a dictionary that contains words pairs as well as single words from a file.
Standard "single word" looks like:
private Dictionary<string, int> tempDict = new Dictionary<string, int>();
private void GetWords(string[] file)
{
tempDict = file
.SelectMany(i => File.ReadLines(i)
.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)))
.GroupBy(word => word)
.ToDictionary(g => g.Key, g => g.Count());
}
And the string:
Adam likes coffee
will be:
Adam ; likes ; coffee
But I want to make it so it matches pairs as well (but only the neighbouring ones) so it would look like:
Adam ; Adam likes ; likes ; likes coffee ; coffee
I am not sure if it is manageable to do, and need some help with this one.
MoreLINQ has a MoreEnumerable.Pairwise method, which takes the current value, its predecessor, and a projection function.
Returns a sequence resulting from applying a function to each element in the source sequence and its predecessor, with the exception of the first element which is only returned as the predecessor of the second element.
Concatenating that with the original split value array would output:
var sentence = "Adam likes coffee";
var splitWords = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var pairWise = splitWords.Pairwise((first, second) => string.Format("{0} {1}", first, second))
    .Concat(splitWords)
    .GroupBy(x => x)
    .ToDictionary(x => x.Key, x => x.Count());
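This would result in a dictionary in which each of the following keys has a count of 1:
Adam likes ; likes coffee ; Adam ; likes ; coffee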
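If adding MoreLINQ is not an option, neighbouring pairs can also be built with the standard Zip and Skip operators. The following is only a sketch; the class and method names (WordPairs, Count) are made up, and it handles a single sentence rather than the files from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class WordPairs
{
    public static Dictionary<string, int> Count(string sentence)
    {
        var words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Pair each word with its successor: (w0, w1), (w1, w2), ...
        var pairs = words.Zip(words.Skip(1), (first, second) => first + " " + second);

        // Count single words and neighbouring pairs together.
        return words.Concat(pairs)
            .GroupBy(x => x)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

To apply it per line of a file, the same Zip step could replace the inner Split inside the SelectMany from the question.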

Regex Split String at particular word

I would like to use the .net Regex.Split method to split this input string into an array. It must group the word.
Input: **AAA**-1111,**AAA**-666,**SMT**-QWQE,**SMT**-TTTR
Expected output:
**AAA** : 1111,666
**SMT** : QWQE,TTTR
What pattern do I need to use?
As the comment on the question notes, you cannot do this in a single step (regex or not).
So:
1. Split on commas.
2. Split on dash (but keep the pairs).
3. Group by the first part of each pair.
Something like:
var result = from outer in input.Split(',')
             let p = outer.Split('-') // will be string[2]
             select new { identifier = p[0], value = p[1] }
             into pair
             group pair.value by pair.identifier into g
             select new {
                 identifier = g.Key,
                 values = String.Join(",", g)
             };
This should give you an IEnumerable with a key string and a string listing the values for each key, separated by commas:
var input = "AAA-1111,AAA-666,SMT-QWQE,SMT-TTTR";
var list = input.Split(',')
.Select(pair => pair.Split('-'))
.GroupBy(pair => pair.First())
.Select(grp =>
new{
key = grp.Key,
items = String.Join(",", grp.Select(x => x[1]))
});
You can then use it for example like this (if you just want to output the values):
string output = "";
foreach(var grp in list)
{
output += grp.key + ": " + grp.items + Environment.NewLine;
}
FWIW here's the same solution in fluent syntax which might be easier to understand:
string input = "AAA-1111,AAA-666,SMT-QWQE,SMT-TTTR";
Dictionary<string, string> output = input.Split(',') // first split by ','
.Select(el => el.Split('-')) // then split each inner element by '-'
.GroupBy(el => el.ElementAt(0), el => el.ElementAt(1)) // group by the part that comes before '-'
.ToDictionary(grp => grp.Key, grp => string.Join(",", grp)); // convert to a dictionary with comma separated values
-
output["AAA"] // 1111,666
output["SMT"] // QWQE,TTTR
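Since the question asked for a regex, here is a sketch that uses Regex.Matches instead of Split to pull out the pairs; the grouping still has to happen in a second step. The pattern and the names PairGrouper and Group are assumptions for illustration, and the pattern assumes identifiers contain neither commas nor dashes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PairGrouper
{
    public static Dictionary<string, string> Group(string input)
    {
        // Group 1: identifier before the dash; group 2: value up to the next comma.
        return Regex.Matches(input, @"([^,-]+)-([^,]+)")
            .Cast<Match>()
            .GroupBy(m => m.Groups[1].Value, m => m.Groups[2].Value)
            .ToDictionary(g => g.Key, g => string.Join(",", g));
    }
}
```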

Convert String list to a dictionary

I have a string list like this ["saman=1", "kaman=2"]
How may I convert this to a dictionary like {Saman:1 , kaman:2}
strList.Select(k,v =>new {k,v} , k=> k.split('=')[0], val => v.split('=')[1]);
This should work:
strList.ToDictionary(x => x.Split('=')[0], x => x.Split('=')[1])
If you want Dictionary<string, int> you can parse the Value to integer:
strList.ToDictionary(x => x.Split('=')[0], x => int.Parse(x.Split('=')[1]))
You should split by ", " first, and then split each item by = to get key/value pairs.
Additional Trim call will get rid of [" at the beginning and "] at the end of your input string.
var input = @"[""saman=1"", ""kaman=2""]";
var dict = input.Trim('[', '"', ']')
.Split(new [] {@""", """}, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split('='))
.ToDictionary(x => x[0], x => x[1]);
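Very, very simply with LINQ, assuming list is already a sequence of key/value pairs (for example a List<KeyValuePair<string, string>>) rather than raw strings: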
IDictionary<string, string> dictionary =
list.ToDictionary(pair => pair.Key, pair => pair.Value);
Note that this will fail if there are any duplicate keys - I assume that's okay?
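If duplicate keys or malformed entries are possible, a plain loop avoids the exceptions that ToDictionary and int.Parse would throw. This is a sketch under assumed rules: entries without '=' or with a non-numeric value are skipped, and the last duplicate wins. The names PairParser and Parse are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class PairParser
{
    public static Dictionary<string, int> Parse(IEnumerable<string> items)
    {
        var result = new Dictionary<string, int>();
        foreach (var item in items)
        {
            // Split on the first '=' only, so at most two parts come back.
            var parts = item.Split(new[] { '=' }, 2);
            int value;
            if (parts.Length != 2 || !int.TryParse(parts[1], out value))
                continue; // skip malformed entries instead of throwing

            // The indexer overwrites an existing key, so the last duplicate wins.
            result[parts[0].Trim()] = value;
        }
        return result;
    }
}
```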

Counting the Frequency of Specific Words in Text File

I have a text file stored as a string variable. The text file is processed so that it only contains lowercase words and spaces. Now, say I have a static dictionary, which is just a list of specific words, and I want to count, from within the text file, the frequency of each word in the dictionary. For example:
Text file:
i love love vb development although i m a total newbie
Dictionary:
love, development, fire, stone
The output I'd like to see is something like the following, listing both the dictionary word and its count. If it makes coding simpler, it can also only list the dictionary word that appeared in the text.
===========
WORD, COUNT
love, 2
development, 1
fire, 0
stone, 0
============
Using a regex (eg "\w+") I can get all the word matches, but I have no clue how to get the counts that are also in the dictionary, so I'm stuck. Efficiency is crucial here since the dictionary is quite large (~100,000 words) and the text files are not small either (~200kb each).
I appreciate any kind help.
You can count the words in the string by grouping them and turning it into a dictionary:
Dictionary<string, int> count =
theString.Split(' ')
.GroupBy(s => s)
.ToDictionary(g => g.Key, g => g.Count());
Now you can check whether each dictionary word exists as a key in this count dictionary, and show its count if it does.
var dict = new Dictionary<string, int>();
foreach (var word in file)
if (dict.ContainsKey(word))
dict[word]++;
else
dict[word] = 1;
Using Groovy's regex facility, I would do it as below:
def input="""
i love love vb development although i m a total newbie
"""
def dictionary=["love", "development", "fire", "stone"]
dictionary.each{
def pattern= ~/${it}/
match = input =~ pattern
println "${it}" + "-"+ match.count
}
Try this. The words variable is obviously your string of text. The keywords array is a list of keywords you want to count.
This won't return a 0 for dictionary words that aren't in the text, but you specified that this behavior is okay. This should give you relatively good performance while meeting the requirements of your application.
string words = "i love love vb development although i m a total newbie";
string[] keywords = new[] { "love", "development", "fire", "stone" };
Regex regex = new Regex("\\w+");
var frequencyList = regex.Matches(words)
.Cast<Match>()
.Select(c => c.Value.ToLowerInvariant())
.Where(c => keywords.Contains(c))
.GroupBy(c => c)
.Select(g => new { Word = g.Key, Count = g.Count() })
.OrderByDescending(g => g.Count)
.ThenBy(g => g.Word);
//Convert to a dictionary
Dictionary<string, int> dict = frequencyList.ToDictionary(d => d.Word, d => d.Count);
//Or iterate through them as is
foreach (var item in frequencyList)
Response.Write(String.Format("{0}, {1}", item.Word, item.Count));
If you want to achieve the same thing without using RegEx since you indicated you know everything is lower case and separated by spaces, you could modify the above code like so:
string words = "i love love vb development although i m a total newbie";
string[] keywords = new[] { "love", "development", "fire", "stone" };
var frequencyList = words.Split(' ')
.Select(c => c)
.Where(c => keywords.Contains(c))
.GroupBy(c => c)
.Select(g => new { Word = g.Key, Count = g.Count() })
.OrderByDescending(g => g.Count)
.ThenBy(g => g.Word);
Dictionary<string, int> dict = frequencyList.ToDictionary(d => d.Word, d => d.Count);
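Because the question stresses efficiency with roughly 100,000 dictionary words, note that Contains on an array is a linear scan per text word. One possible alternative, sketched below with made-up names (KeywordFrequency, Count), seeds a Dictionary with every keyword at 0 and then does one hash lookup per word in the text. It assumes the text is already lowercase and space-separated as described, and it also reports the zero counts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class KeywordFrequency
{
    public static Dictionary<string, int> Count(string text, IEnumerable<string> keywords)
    {
        // Seed every keyword with 0 so absent words still appear in the output.
        var counts = new Dictionary<string, int>();
        foreach (var keyword in keywords)
            counts[keyword] = 0;

        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            int current;
            // Hash lookup instead of scanning the keyword list for each word.
            if (counts.TryGetValue(word, out current))
                counts[word] = current + 1;
        }
        return counts;
    }
}
```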
