How to display all mistaken words - c#

I have some text in richTextBox1.
I have to sort the words by their frequency and display them in richTextBox2. It seems to work.
I also have to find all misspelled words and display them in richTextBox4. I'm using Hunspell.
Apparently I'm missing something: almost all words are displayed in richTextBox4, not only the wrong ones.
Code:
foreach (Match match in wordPattern.Matches(str))
{
if (!words.ContainsKey(match.Value))
words.Add(match.Value, 1);
else
words[match.Value]++;
}
string[] words2 = new string[words.Keys.Count];
words.Keys.CopyTo(words2, 0);
int[] freqs = new int[words.Values.Count];
words.Values.CopyTo(freqs, 0);
Array.Sort(freqs, words2);
Array.Reverse(freqs);
Array.Reverse(words2);
Dictionary<string, int> dictByFreq = new Dictionary<string, int>();
for (int i = 0; i < freqs.Length; i++)
{
dictByFreq.Add(words2[i], freqs[i]);
}
Hunspell hunspell = new Hunspell("en_US.aff", "en_US.dic");
StringBuilder resultSb = new StringBuilder(dictByFreq.Count);
foreach (KeyValuePair<string, int> entry in dictByFreq)
{
resultSb.AppendLine(string.Format("{0} [{1}]", entry.Key, entry.Value));
richTextBox2.Text = resultSb.ToString();
bool correct = hunspell.Spell(entry.Key);
if (correct == false)
{
richTextBox4.Text = resultSb.ToString();
}
}

In addition to the above answer (which should work if your Hunspell.Spell method works correctly), I have a few suggestions to shorten your code. You are adding Matches to your dictionary, and counting the number of occurrences of each match. Then you appear to be sorting them in descending value of the frequency (so the highest occurrence match will have index 0 in the result). Here are a few code snippets which should make your function a lot shorter:
IOrderedEnumerable<KeyValuePair<string, int>> dictByFreq = words.OrderBy<KeyValuePair<string, int>, int>((KeyValuePair<string, int> kvp) => -kvp.Value);
This uses the .NET framework to do all the work for you. words.OrderBy takes a Func argument that provides the value to sort on. A dictionary is normally accessed by its keys, but you want to sort on the values, and this call does exactly that. Negating the value makes the sort descending, so the most frequent match comes first. It returns an IOrderedEnumerable object, which has to be stored. And since that is enumerable, you don't even have to put it back into a dictionary! If you really need to do other operations on it later, you can call dictByFreq.ToList(), which returns an object of type List<KeyValuePair<string, int>>.
So your whole function then becomes this:
foreach (Match match in wordPattern.Matches(str))
{
if (!words.ContainsKey(match.Value))
words.Add(match.Value, 1);
else
words[match.Value]++;
}
IOrderedEnumerable<KeyValuePair<string, int>> dictByFreq = words.OrderBy<KeyValuePair<string, int>, int>((KeyValuePair<string, int> kvp) => -kvp.Value);
Hunspell hunspell = new Hunspell("en_US.aff", "en_US.dic");
StringBuilder resultSb = new StringBuilder();
StringBuilder mistakesSb = new StringBuilder();
foreach (KeyValuePair<string, int> entry in dictByFreq)
{
resultSb.AppendLine(string.Format("{0} [{1}]", entry.Key, entry.Value));
bool correct = hunspell.Spell(entry.Key);
if (correct == false)
{
// Collect only the misspelled words
mistakesSb.AppendLine(entry.Key);
}
}
// Assign the text once, after the loop
richTextBox2.Text = resultSb.ToString();
richTextBox4.Text = mistakesSb.ToString();

You are displaying the same text in richTextBox4 as in richTextBox2 :)
I think this should work:
foreach (KeyValuePair<string, int> entry in dictByFreq)
{
resultSb.AppendLine(string.Format("{0} [{1}]", entry.Key, entry.Value));
richTextBox2.Text = resultSb.ToString();
bool correct = hunspell.Spell(entry.Key);
if (correct == false)
{
richTextBox4.Text += entry.Key + Environment.NewLine;
}
}
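The fix described above can be sketched end to end as follows. This is only an illustration under stated assumptions: `isCorrect` is a hypothetical stand-in for `hunspell.Spell` backed by a tiny in-memory word list, and the rich text boxes are replaced by two `StringBuilder` buffers so the snippet runs as a console program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in for hunspell.Spell: a tiny in-memory word list.
var knownWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "cat", "sat", "on", "mat" };
Func<string, bool> isCorrect = word => knownWords.Contains(word);

string text = "the cat sat on the matt the catt";

// Count how often each word occurs.
var words = new Dictionary<string, int>();
foreach (Match match in Regex.Matches(text, @"\w+"))
{
    words.TryGetValue(match.Value, out int count); // count is 0 when the word is new
    words[match.Value] = count + 1;
}

// Two separate buffers: one for every word, one for misspelled words only.
var allWords = new StringBuilder();
var misspelled = new StringBuilder();
foreach (var entry in words.OrderByDescending(kvp => kvp.Value))
{
    string line = $"{entry.Key} [{entry.Value}]";
    allWords.AppendLine(line);
    if (!isCorrect(entry.Key))
        misspelled.AppendLine(line);
}

// In the form, these would be assigned once after the loop:
// richTextBox2.Text = allWords.ToString();
// richTextBox4.Text = misspelled.ToString();
Console.Write(misspelled.ToString());
```

Keeping a second buffer is what separates the two outputs; the original code wrote the same buffer to both boxes.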

Related

Create a 2-column List using a variable for list name

Because the original post (Create List with name from variable) was so old, I didn't want to approach this as an answer.
But, I wanted to add this use of the above solution because it was non-obvious to me. And, it may help some of my fellow noobs... Also, I ran into some issues I don't know how to address.
I needed a way to create a list using a variable name, in this case "mstrClock", for timing diagrams.
I was not able to get .NET to accept a two-column list, though, so I ended up with two dictionaries.
Is there a way to structure this so that I can use a single dictionary for both columns?
dictD.Add("mstrClock", new List<double>());
dictL.Add("mstrClock", new List<string>());
Then as I develop the timing diagram, I add to the lists as follows:
dictD["mstrClock"].Add(x); // This value will normally be the time value.
dictL["mstrClock"].Add("L"); // This value will be the "L", "F" or "H" logic level
Then to get at the data I did this:
for (int n = 0; n < dictD["mstrClock"].Count; n++)
{
listBox1.Items.Add(dictL["mstrClock"][n] + "\t" + dictD["mstrClock"][n].ToString());
}
Why not just store what you want to display, in the dictionary?
dict.Add("mstrClock", new List<string>());
dict["mstrClock"].Add($"L\t{x}");
for (int n = 0; n < dict["mstrClock"].Count; n++)
{
listBox1.Items.Add(dict["mstrClock"][n]);
}
On another point, do you even need a dictionary? What is the point of having a dictionary with one key? If you only need a List<string>, then only create that.
var items = new List<string>();
items.Add($"L\t{x}");
foreach (var item in items)
{
listBox1.Items.Add(item);
}
You can use Tuples in modern C# to create your two-column list as follows:
var list = new List<(double time, string logicLevel)>();
list.Add((1, "L"));
list.Add((2, "F"));
foreach (var element in list)
{
listBox1.Items.Add($"{element.time} \t {element.logicLevel}");
}
If using a dictionary is a must, you can change the above code to something like:
var dict = new Dictionary<string, List<(double time, string logicLevel)>>();
dict["mstrClock"] = new List<(double time, string logicLevel)>();
dict["mstrClock"].Add((1, "L"));
dict["mstrClock"].Add((2, "F"));
var list = dict["mstrClock"];
foreach (var element in list)
{
listBox1.Items.Add($"{element.time} \t {element.logicLevel}");
}
One approach to creating a 2-column list would be to create a list of key/value pairs:
var list = new List<KeyValuePair<double, string>>();
list.Add(new KeyValuePair<double, string>(1, "L"));
foreach (KeyValuePair<double, string> element in list)
{
listBox1.Items.Add($"{element.Key} \t {element.Value}");
}

Iterate a list and keep an index based on the name of the item

I have a list of items which have names and I need to iterate over them, but for each item I also need to know how many items with the same name I have seen so far. Here is an example:
-----
|1|A|
|2|B|
|3|C|
|4|C|
|5|C|
|6|A|
|7|B|
|8|C|
|9|C|
-----
So, when I'm iterating and I'm on row 1, I want to know it is the first A; when I'm on row 6, I want to know it is the second A; when I'm on row 9, I want to know it is the 5th C, etc. How can I achieve this? Is there some index I can keep track of? I was also thinking of filling a hash while iterating, but perhaps that's too much.
You can use a Dictionary<char, int> to keep a count for each character in your list.
Here the key is the character and the value is the number of occurrences of that character in the list.
Dictionary<char, int> occurances = new Dictionary<char, int>();
List<char> elements = new List<char>{'A', 'B','C','C','C','A','B', 'C', 'C'};
int result = 0;
foreach(char element in elements)
{
if(occurances.TryGetValue(element, out result))
occurances[element] = result + 1;
else
occurances.Add(element, 1);
}
foreach(KeyValuePair<char, int> kv in occurances)
Console.WriteLine("Key: "+ kv.Key + " Value: "+kv.Value);
Output:
Key: A Value: 2
Key: B Value: 2
Key: C Value: 5
POC: dotNetFiddler
Use dictionary to keep track of counter.
List<string> input = new List<string> { "A", "B", "C", "C", "C", "A", "B", "C", "C" };
Dictionary<string, int> output = new Dictionary<string, int>();
foreach(var item in input)
{
if (output.ContainsKey(item))
{
output[item] = output[item] + 1;
}
else
{
output.Add(item, 1);
}
}
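The two dictionary answers above produce the total per name. The question also asks for the running position while iterating (first A, second A, fifth C); a minimal sketch of that variant, with the variable names `seen` and `occurrenceByRow` chosen only for this illustration, reads the counter at each row:

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "A", "B", "C", "C", "C", "A", "B", "C", "C" };
var seen = new Dictionary<string, int>();   // name -> how many times seen so far
var occurrenceByRow = new List<int>();      // occurrence number for each row

for (int row = 0; row < names.Count; row++)
{
    seen.TryGetValue(names[row], out int count); // 0 if the name has not been seen yet
    count++;
    seen[names[row]] = count;
    occurrenceByRow.Add(count);
    Console.WriteLine($"row {row + 1}: {names[row]} (occurrence #{count})");
}
```

After the loop, `seen` holds the same totals as the answers above, so nothing is lost by tracking the running value.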
I think you need an inverted index instead of a row-store index.
A row-store index looks like the table you described, while an inverted index maps each term to the rows where it occurs.
Probably like this:
A 1,6
B 2,7
C 3,4,5,8,9
Search engines such as Elasticsearch and Solr store terms like this.
If you are in C#, a Dictionary<string, List<int>> is a good fit. You can keep your inverted-index data there.
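A minimal sketch of building such an index from the example rows, assuming 1-based row numbers as in the question's table:

```csharp
using System;
using System.Collections.Generic;

var names = new[] { "A", "B", "C", "C", "C", "A", "B", "C", "C" };
var index = new Dictionary<string, List<int>>();

for (int row = 0; row < names.Length; row++)
{
    // Create the row list the first time a name is seen.
    if (!index.TryGetValue(names[row], out var rows))
    {
        rows = new List<int>();
        index[names[row]] = rows;
    }
    rows.Add(row + 1); // store 1-based row numbers
}

foreach (var entry in index)
    Console.WriteLine($"{entry.Key} {string.Join(",", entry.Value)}");
```

With this structure, the occurrence number of a row is its position in the list for that name, and the list length is the total count.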
The clean way is to implement your own list whose items are your own objects. With this approach you give the item an additional property and give the list its own Add() method. The new list should inherit from List<Item> and hide the Add() method of List with the new modifier (List<T>.Add is not virtual, so it cannot be overridden). Note that the hidden method is only used when the variable is typed as CounterIterator, not as List<Item>.
I implemented this myself; you can use it. Keep in mind that this is only one of several possible solutions. However, I think it is one of the best with respect to SOLID and OO principles.
public class CounterIterator : List<Item>
{
public new void Add(Item item)
{
base.Add(item);
foreach (var listItem in this)
{
if (listItem.Equals(item))
{
item.CountRepeat++;
}
}
}
}
public class Item
{
public Item(string value)
{
Value = value;
}
public string Value { get; private set; }
public int CountRepeat { get; set; }
public override bool Equals(object obj)
{
var item = obj as Item;
return item != null && Value.Equals(item.Value);
}
}
I tested the code above. It is an extension of List with an added behavior. If anyone thinks it is not a correct answer, please mention me in the comments and I will try to clarify.

Linq query for building a dictionary from a reg file

I'm building a simple dictionary from a reg file (export from Windows Regedit). The .reg file contains a key in square brackets, followed by zero or more lines of text, followed by a blank line. This code will create the dictionary that I need:
var a = File.ReadLines("test.reg");
var dict = new Dictionary<String, List<String>>();
foreach (var key in a) {
if (key.StartsWith("[HKEY")) {
var iter = a.GetEnumerator();
var value = new List<String>();
do {
iter.MoveNext();
value.Add(iter.Current);
} while (String.IsNullOrWhiteSpace(iter.Current) == false);
dict.Add(key, value);
}
}
I feel like there is a cleaner (prettier?) way to do this in a single Linq statement (using a group by), but it's unclear to me how to implement the iteration of the value items into a list. I suspect I could do the same GetEnumerator in a let statement but it seems like there should be a way to implement this without resorting to an explicit iterator.
Sample data:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]
@="Microsoft.System.Update.1"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS]
@="WMP11.AssocFile.M2TS"
"Content Type"="video/vnd.dlna.mpeg-tts"
"PerceivedType"="video"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\OpenWithProgIds]
"WMP11.AssocFile.M2TS"=hex(0):
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx\{BB2E617C-0920-11D1-9A0B-00C04FC2D6C1}]
@="{9DBD2C50-62AD-11D0-B806-00C04FD706EC}"
Update
I'm sorry, I need to be more specific. The files I'm looking at are around 300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory.
You can always use Regex:
var dict = new Dictionary<String, List<String>>();
var a = File.ReadAllText(@"test.reg");
var results = Regex.Matches(a, "(\\[[^\\]]+\\])([^\\[]+)\r\n\r\n", RegexOptions.Singleline);
foreach (Match item in results)
{
dict.Add(
item.Groups[1].Value,
item.Groups[2].Value.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToList()
);
}
I whipped this up real quick. You might be able to improve the regex pattern.
Instead of using GetEnumerator you can take advantage of the TakeWhile and Skip methods to break your list into smaller lists (each sublist represents one key and its values).
var registryLines = File.ReadLines("test.reg");
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
while (registryLines.Count() > 0)
{
// Take the key and values into a single list
var keyValues = registryLines.TakeWhile(x => !String.IsNullOrWhiteSpace(x)).ToList();
// Adds a new entry to the dictionary using the first value as key and the rest of the list as value
if (keyValues != null && keyValues.Count > 0)
resultKeys.Add(keyValues[0], keyValues.Skip(1).ToList());
// Jumps to the next registry (+1 to skip the blank line)
registryLines = registryLines.Skip(keyValues.Count + 1);
}
EDIT based on your update
Update: I'm sorry, I need to be more specific. The files I'm looking at
are around 300 MB, so I took the approach I did to keep the memory
footprint down. I'd prefer an approach that doesn't require pulling
the entire file into memory.
Well, if you can't read the whole file into memory, asking for a LINQ solution makes no sense to me. Here is a sample of how you can do it reading line by line (still no need for GetEnumerator):
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
using (StreamReader reader = File.OpenText("test.reg"))
{
List<string> keyAndValues = new List<string>();
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
// Adds key and values to a list until it finds a blank line
if (!string.IsNullOrWhiteSpace(line))
keyAndValues.Add(line);
else
{
// Adds a new entry to the dictionary using the first value as key and the rest of the list as value
if (keyAndValues != null && keyAndValues.Count > 0)
resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
// Starts a new Key collection
keyAndValues = new List<string>();
}
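}
// Adds the last entry if the file does not end with a blank line
if (keyAndValues.Count > 0)
resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
}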
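If a LINQ-friendly shape is still wanted without loading the file, one option is an iterator that groups lines into blocks lazily. The sketch below is an illustration, not part of the answer above: `ReadBlocks` is a hypothetical helper name, and it assumes that every key line starts with `[` and that blank lines only separate blocks. The sample array stands in for `File.ReadLines("test.reg")`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample lines; with a real file, pass System.IO.File.ReadLines("test.reg") instead.
var sample = new[]
{
    @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]",
    "@=\"Microsoft.System.Update.1\"",
    "",
    @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS]",
    "\"Content Type\"=\"video/vnd.dlna.mpeg-tts\"",
    "\"PerceivedType\"=\"video\"",
    @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]",
};

var dict = ReadBlocks(sample).ToDictionary(block => block.Key, block => block.Value);
foreach (var entry in dict)
    Console.WriteLine($"{entry.Key}: {entry.Value.Count} value(s)");

// Hypothetical helper: yields one (key, values) pair per registry key.
// The iterator itself buffers only the block currently being read.
IEnumerable<KeyValuePair<string, List<string>>> ReadBlocks(IEnumerable<string> lines)
{
    string key = "";
    bool hasKey = false;
    var values = new List<string>();
    foreach (var line in lines)
    {
        if (line.StartsWith("["))
        {
            // A new key line closes the previous block, even without a blank line.
            if (hasKey)
                yield return new KeyValuePair<string, List<string>>(key, values);
            key = line;
            hasKey = true;
            values = new List<string>();
        }
        else if (hasKey && !string.IsNullOrWhiteSpace(line))
        {
            values.Add(line);
        }
    }
    // Emit the final block at end of input.
    if (hasKey)
        yield return new KeyValuePair<string, List<string>>(key, values);
}
```

The resulting dictionary still holds every key and value, so the saving is only that the raw file text is never held in memory as a whole.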
I think you can use code like this, if you can afford the memory:
var lines = File.ReadAllText(fileName);
var result =
Regex.Matches(lines, @"\[(?<key>HKEY[^]]+)\]\s+(?<value>[^[]+)")
.OfType<Match>()
.ToDictionary(k => k.Groups["key"].Value, v => v.Groups["value"].Value.Trim('\n', '\r', ' '));
C# Demo
This will take 24.173 seconds for a file with more than 4 million lines - Size:~550MB - by using 1.2 GB memory.
Edit :
A better way is to use File.ReadLines, which reads the file lazily (File.ReadAllLines loads every line into memory first):
var lines = File.ReadLines(fileName);
var keyRegex = new Regex(@"\[(?<key>HKEY[^]]+)\]");
var currentKey = string.Empty;
var currentValue = string.Empty;
var result = new Dictionary<string, string>();
foreach (var line in lines)
{
var match = keyRegex.Match(line);
if (match.Success)
{
if (!string.IsNullOrEmpty(currentKey))
{
result.Add(currentKey, currentValue);
currentValue = string.Empty;
}
currentKey = match.Groups["key"].Value;
}
else
{
currentValue += line;
}
}
// Add the last key, which has no following key line to trigger the Add above
if (!string.IsNullOrEmpty(currentKey))
{
result.Add(currentKey, currentValue);
}
This will take 17093 milliseconds for a file with 795180 lines.

Searching for dictionary keys contained in a string array

I have a List of strings where each item is a free text describing a skill, so looks kinda like this:
List<string> list = new List<string> {"very good right now", "pretty good",
"convinced me that is good", "pretty medium", "just medium" .....}
And I want to keep a user score for these free texts. So for now, I use conditions:
foreach (var item in list)
{
if (item.Contains("good"))
{
score += 2.5;
Console.WriteLine("good skill, score+= 2.5, is now {0}", score);
}
else if (item.Contains("low"))
{
score += 1.0;
Console.WriteLine("low skill, score+= 1.0, is now {0}", score);
}
}
Suppose in the future I want to use a dictionary for the score mapping, such as:
Dictionary<string, double> dic = new Dictionary<string, double>
{ { "good", 2.5 }, { "low", 1.0 }};
What would be a good way to match the dictionary keys against the string list? The way I see it now is to do a nested loop:
foreach (var item in list)
{
foreach (var key in dic.Keys)
if (item.Contains(key))
score += dic[key];
}
But I'm sure there are better ways. Better being faster, or more pleasant to the eye (LINQ) at the very least.
Thanks.
var scores = from item in list
from word in item.Split()
join kvp in dic on word equals kvp.Key
select kvp.Value;
var totalScore = scores.Sum();
Note: your current solution checks whether each item in the list contains a key from the dictionary, but it also returns true when the key is only part of a longer word in the item. E.g. "follow the rabbit" contains "low". Splitting the item into words solves this issue.
Also, LINQ's join builds a hash-based lookup internally to match items of the first sequence against the second. That gives you O(1) lookups instead of the O(N) scan you get when enumerating all entries of the dictionary.
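To make the difference concrete, here is a small sketch comparing substring matching with whole-word lookup. The sample list is made up for this illustration, and `TryGetValue` is used in place of the join; both rely on hashing the word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dic = new Dictionary<string, double> { { "good", 2.5 }, { "low", 1.0 } };
var list = new List<string> { "very good right now", "pretty good", "follow the rabbit", "quite low" };

// Substring matching: "follow" is counted as "low".
double substringTotal = 0.0;
foreach (var item in list)
    foreach (var key in dic.Keys)
        if (item.Contains(key))
            substringTotal += dic[key];

// Whole-word matching: split each item and look each word up directly.
double wordTotal = list
    .SelectMany(item => item.Split(' '))
    .Sum(word => dic.TryGetValue(word, out var value) ? value : 0.0);

Console.WriteLine($"substring: {substringTotal}, whole word: {wordTotal}");
```

The two totals differ by exactly the score that "follow the rabbit" picks up through the substring "low".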
If your code finds N skill strings containing the word "good", then it adds the score 2.5 N times.
So you can just count the skill strings containing each dictionary word and multiply that count by the corresponding score.
var scores = from pair in dic
let word = pair.Key
let score = pair.Value
let count = list.Count(x => x.Contains(word))
select score * count;
var totalScore = scores.Sum();
It's not really faster, but you can use LINQ:
score = list.Select(s => dic.Where(d => s.Contains(d.Key))
.Sum(d => d.Value))
.Sum();
Note that your example loop will hit 2 different keys if the string matches both; I kept that behavior in my solution.
Well, you aren't really using the Dictionary as a dictionary, so we can simplify this a bit with a new class:
class TermValue
{
public string Term { get; set; }
public double Value { get; set; }
public TermValue(string t, double v)
{
Term = t;
Value = v;
}
}
With that, we can be a bit more direct:
void Main()
{
var dic = new TermValue[] { new TermValue("good", 2.5), new TermValue("low", 1.0)};
List<string> list = new List<string> {"very good right now", "pretty good",
"convinced me that is good", "pretty medium", "just medium" };
double score = 0.0;
foreach (var item in list)
{
var entry = dic.FirstOrDefault(d =>item.Contains(d.Term));
if (entry != null)
score += entry.Value;
}
}
From here, we can just play a bit (the compiled code for this will probably be the same as above)
double score = 0.0;
foreach (var item in list)
{
score += dic.FirstOrDefault(d =>item.Contains(d.Term))?.Value ?? 0.0;
}
Then (in the words of the Purple One) we can go crazy:
double score = list.Aggregate(0.0,
(scre, item) =>scre + (dic.FirstOrDefault(d => item.Contains(d.Term))?.Value ?? 0.0));

How to join values from dictionary?

I have dictionary code as follows:
int entry=0;
string[] numbers ={"123","123","123","456","123"};
Dictionary<string, List<string>> dictionary = new Dictionary<string, List<string>>();
foreach (string number in numbers)
{
if (dictionary.ContainsKey("ABC"))
{
}
else if (!dictionary.ContainsKey("ABC") && entry==0)
{
dictionary.Add("ABC", new List<string>());
dictionary["ABC"].Add(number);
entry = 1;
}
else if (!dictionary.ContainsKey("ABC") && entry == 1)
{
dictionary["ABC"].Add(number);
}
}
foreach(KeyValuePair<string,string> kvp in dictionary)
{
Console.WriteLine("Key={0},Value = {1}", kvp.Key,kvp.Value);
}
Console.ReadKey();
I want output like Key="ABC",Value="123,456", i.e. I need to print all the dictionary values only once without repeat. In the above string array 123 occurs 4 times, but I want to print it only once, along with 456, and join the values with a comma (","). So I need output like Key="ABC",Value="123,456". Please share your ideas. Thanks in advance.
I need to print all the dictionary values only once without repeat.
Use Distinct method.
joint that values with comma(",")
Use String.Join method.
foreach(var kvp in dictionary)
{
Console.WriteLine("Key={0},Value = {1}",
kvp.Key,
String.Join(", ", kvp.Value.Distinct())
);
}
You can try like this:
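foreach(var values in dictionary.Values)
{
// Distinct is applied to the strings inside each list, not to the lists themselves
string names = String.Join(", ", values.Distinct());
Console.WriteLine(names);
}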
The type of the loop variable in the following foreach is incorrect, I think:
foreach(KeyValuePair<string,string> kvp in dictionary)
{
Console.WriteLine("Key={0},Value = {1}", kvp.Key,kvp.Value);
}
It should read as follows (also note the difference in the WriteLine call):
foreach(KeyValuePair<string,List<string>> kvp in dictionary)
{
Console.WriteLine("Key={0},Value = {1}", kvp.Key,string.Join(",", kvp.Value.ToArray()));
}
You can use this simple linq to flatten and join all your dictionary contents:
var result = string.Join(" - ", dic.Select(kvp => string.Format("Key={0}, Values={1}", kvp.Key, string.Join(", ",kvp.Value.Distinct()))));
Instead of using this array :
string[] numbers = {"123","123","123","456","123"};
Add another array as :
string[] uniqueNumbers = numbers.Distinct().ToArray();
and now use this array with unique values to add to the dictionary.
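Another way to avoid repeats, sketched here as an alternative rather than as what the answers above propose, is to store a HashSet<string> per key so duplicates are dropped at insert time. The values are sorted before joining because HashSet does not document an enumeration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] numbers = { "123", "123", "123", "456", "123" };
var dictionary = new Dictionary<string, HashSet<string>>();

foreach (string number in numbers)
{
    // Create the set on first use; Add silently ignores values already present.
    if (!dictionary.TryGetValue("ABC", out var values))
    {
        values = new HashSet<string>();
        dictionary["ABC"] = values;
    }
    values.Add(number);
}

foreach (var kvp in dictionary)
{
    string joined = string.Join(",", kvp.Value.OrderBy(v => v));
    Console.WriteLine("Key={0},Value={1}", kvp.Key, joined);
}
```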
