Storing Word Line and Frequency based on Word

I am working on a problem, in which I have to be able to read a text file, and count the frequency and line number of a specific word.
So for example, a txt file that reads
"Hi my name is
Bob. This is
Cool"
Should return:
1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3
I am having trouble deciding how to store the line number, as well as the word frequency. I have tried a few different things, and so far this is where I am at.
Any help?
Dictionary<string, int> countDictionary = new Dictionary<string,int>();
Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();
List<string> lines = new List<string>();
System.IO.StreamReader file =
new System.IO.StreamReader("Sample.txt");
//Creates a List of lines
string x;
while ((x = file.ReadLine()) != null)
{
lines.Add(x);
}
foreach(var y in Enumerable.Range(0,lines.Count()))
{
foreach(var word in lines[y].Split())
{
if(!countDictionary.Keys.Contains(word.ToLower()) && !lineDictionary.Keys.Contains(word.ToLower()))
{
countDictionary.Add(word.ToLower(), 1);
//lineDictionary.Add(word.ToLower(), /*what to put here*/);
}
else
{
countDictionary[word] += 1;
//ADD line to dictionary???
}
}
}
foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both
{
Console.WriteLine("{0} {1}", pair.Value, pair.Key);
}
file.Close();
System.Console.ReadLine();

You can pretty much do this with a single LINQ query:
var processed =
//get the lines of text as IEnumerable<string>
File.ReadLines(@"myFilePath.txt")
//get a word and a line number for every word
//so you'll have a sequence of objects with 2 properties
//word and lineNumber
.SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
//group these objects by their "word" property
.GroupBy(x => x.word)
//select what you need
.Select(g => new{
//number of objects in the group
//i.e. the frequency of the word
Count = g.Count(),
//the actual word
Word = g.Key,
//a sequence of line numbers of each instance of the
//word in the group
Positions = g.Select(x => x.lineNumber)});
foreach(var entry in processed)
{
Console.WriteLine("{0} {1} {2}",
entry.Count,
entry.Word,
string.Join(" ",entry.Positions));
}
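I like 0-based counting, so the line numbers printed here start at 0; add 1 to lineNumber if you want them to start at 1. Note also that the grouping is case-sensitive, so call ToLower() on each word if "Is" and "is" should count as the same word.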
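As a rough sketch of how the same query could be adapted to the output format in the question, the version below lowercases each word, strips a few trailing punctuation characters, and reports 1-based line numbers. The class name WordIndexSketch, the method name Summarize, and the choice of punctuation characters are assumptions made for illustration; they do not appear in the question or the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIndexSketch
{
    // Hypothetical helper: returns one "count word line-numbers" string per distinct word,
    // in order of first appearance.
    public static List<string> Summarize(IEnumerable<string> lines)
    {
        var separators = new[] { ' ', '\t' };
        return lines
            // pair every word with its 1-based line number
            .SelectMany((line, index) => line
                .Split(separators, StringSplitOptions.RemoveEmptyEntries)
                .Select(raw => new
                {
                    // assumption: normalize case and drop surrounding punctuation
                    Word = raw.Trim('.', ',', '!', '?', ';', ':').ToLowerInvariant(),
                    Line = index + 1
                }))
            .Where(x => x.Word.Length > 0)
            // group the occurrences by normalized word
            .GroupBy(x => x.Word)
            .Select(g => string.Format("{0} {1} {2}",
                g.Count(), g.Key, string.Join(" ", g.Select(x => x.Line))))
            .ToList();
    }
}
```

Calling WordIndexSketch.Summarize(File.ReadLines("Sample.txt")) and writing each returned string to the console should give one line per word. A word that occurs twice on the same line has that line number listed twice.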

You are tracking two different properties of the entity "word" in two separate data structures. I would suggest creating a class to represent that entity, something like
public class WordStats
{
public string Word { get; set; }
public int Count { get; set; }
public List<int> AppearsInLines { get; set; }
public WordStats()
{
AppearsInLines = new List<int>();
}
}
Then track things in a
Dictionary<string, WordStats> wordStats = new Dictionary<string, WordStats>();
Use the word itself as the key. When you encounter a word, check whether the dictionary already holds a WordStats instance for that key. If so, get it and update its Count and AppearsInLines properties; if not, create a new instance and add it to the dictionary.
foreach(var y in Enumerable.Range(0,lines.Count()))
{
foreach(var word in lines[y].Split())
{
WordStats wordStat;
// look the word up once; wordStat is null when it is not there yet
bool alreadyHave = wordStats.TryGetValue(word, out wordStat);
if (alreadyHave)
{
wordStat.Count++;
wordStat.AppearsInLines.Add(y);
}
else
{
wordStat = new WordStats();
wordStat.Word = word;
wordStat.Count = 1;
wordStat.AppearsInLines.Add(y);
wordStats.Add(word, wordStat);
}
}
}

Related

Counting and accessing items in a list of lists ie: invoice with line items

I am trying to wrap my head around C# Lists. I come from a strong PHP background and think of things in PHP array terms. I have a class that includes a list, and I am trying to count the distinct items within it. Is there a simple LINQ way to do this, or would I use some sort of nested foreach?
Thank you in advance
public void main() {
List<invoice> inv = new List<invoice>();
// I do something that populates inv with, say 100 invoices
// Count distinct inv.lines.rowtype ?? to get:
Type A 34
Type B 3
Type X 21 ...etc
}
class invoice {
int invoicenumber;
int customernumber;
List<lineitem> lines;
struct lineitem {
string rowtype;
string somethingelse;
int whatever;
}
public invoice() {
lines = new List<lineitem>();
}
}

Something like this?
inv.SelectMany(i => i.lines).GroupBy(l => l.rowtype).ToDictionary(g => g.Key, g => g.Count())
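For reference, here is a minimal self-contained sketch of that one-liner. The Invoice and LineItem classes below are simplified stand-ins for the classes in the question, with public properties so that the query can reach them; the names RowTypeCounter and CountRowTypes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the question's lineitem struct (assumption).
public class LineItem
{
    public string RowType { get; set; }
}

// Simplified stand-in for the question's invoice class (assumption).
public class Invoice
{
    public List<LineItem> Lines { get; } = new List<LineItem>();
}

public static class RowTypeCounter
{
    // Flattens all line items of all invoices, groups them by row type,
    // and maps each row type to the number of items in its group.
    public static Dictionary<string, int> CountRowTypes(IEnumerable<Invoice> invoices)
    {
        return invoices
            .SelectMany(i => i.Lines)
            .GroupBy(l => l.RowType)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

SelectMany is what replaces the nested foreach: it turns a list of invoices, each with its own list of lines, into one flat sequence of line items.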

You could probably use some LINQ for this; however, for the sake of simplicity and readability, I would recommend plain foreach loops (invoices here is your list of invoices, called inv in the question):
// Keep a dictionary for count
var lineItemDict = new Dictionary<string, int>();
foreach (var inv in invoices)
{
foreach (var line in inv.lines)
{
// If the rowtype already exists, increment the count
if (lineItemDict.TryGetValue(line.rowtype, out int count))
{
lineItemDict[line.rowtype] = count + 1;
}
else
{
// Else add a new entry
lineItemDict.Add(line.rowtype, 1);
}
}
}
With List<T>.ForEach (strictly speaking a List<T> method rather than LINQ):
// Keep a dictionary for count
var lineItemDict = new Dictionary<string, int>();
invoices.ForEach(inv => {
inv.lines.ForEach(line => {
// If the rowtype already exists, increment the count
if (lineItemDict.TryGetValue(line.rowtype, out int count))
{
lineItemDict[line.rowtype] = count + 1;
}
else
{
// Else add a new entry
lineItemDict.Add(line.rowtype, 1);
}
});
});
Both of these will leave you with a dictionary (lineItemDict) that looks like this:
<rowtype> : <count>
For example,
'A' : 34
'B' : 3
'X' : 21

C#: How to split a string with a changing prefix

Hello, I looked at several posts on this topic, but none of the answers helped me.
I extract data about various machines which look like this:
"time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C"
M1 is the prefix which specifies which machine. A,B,C are attributes of this machine like temperature, pressure, etc.
The output should then look like this:
{{"time", "M1.A", "M1.B", "M1.C"}, {"time", "M2.A",....}}
I know that I could possibly split at "," and then create the list but I was wondering if there is another way to detect if the prefix changed.

Regex.Matches(myList, @"M(?<digit>\d+)\..") //find all M1.A etc
.Cast<Match>() //convert the resulting list to an enumerable of Match
.GroupBy(m => m.Groups["digit"].Value) //find the groups with the same digits
.Select(g => new[] { "time" }.Union(g.Select(m => m.Value)).ToArray());
//combine the groups into arrays beginning with "time"

You mention "the output should then look like this...", but then you mention a list, so I'm going to assume that you mean to make the original string into a list of lists of strings.
List<string> split = new List<string>(s.Split(','));
string first = split[0];
split.RemoveAt(0);
List<List<string>> result = new List<List<string>>();
foreach (var dist in split.Select(o => o.Split('.')[0]).Distinct())
{
List<string> temp = new List<string> {first};
temp.AddRange(split.Where(o => o.StartsWith(dist)));
result.Add(temp);
}
This does the original split, removes the first value (you didn't really specify that, I assumed), then loops around each machine. The machines are created by splitting each value further by '.' and making a distinct list. It then selects all values in the list that start with the machine and adds them with the first value to the resulting list.
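A variation on the same idea, sketched with GroupBy so that the prefix is compared exactly (StartsWith would also match "M10.A" when looking for "M1"). The names MachineSplitter and SplitByMachine are made up for this example, and the sketch assumes every field after the first has the form prefix.attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MachineSplitter
{
    // Hypothetical helper: splits "time, M1.A, M1.B, M2.A" into one list per machine,
    // each list starting with the first field ("time").
    public static List<List<string>> SplitByMachine(string header)
    {
        // split on commas and drop the spaces that follow them
        var fields = header.Split(',').Select(f => f.Trim()).ToList();
        string first = fields[0];

        return fields.Skip(1)
            // the part before the first '.' is the machine prefix
            .GroupBy(f => f.Split('.')[0])
            .Select(g => new[] { first }.Concat(g).ToList())
            .ToList();
    }
}
```

GroupBy keeps the groups in order of first appearance, so the machines come out in the same order as in the input string.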

Using Regex, I created a dictionary:
string input = "time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C";
string pattern1 = @"^(?'name'[^,]*),(?'machines'.*)";
Match match1 = Regex.Match(input, pattern1);
string name = match1.Groups["name"].Value;
string machines = match1.Groups["machines"].Value.Trim();
string pattern2 = @"\s*(?'machine'[^.]*)\.(?'attribute'\w+)(,|$)";
MatchCollection matches = Regex.Matches(machines, pattern2);
Dictionary<string, List<string>> dict = matches.Cast<Match>()
.GroupBy(x => x.Groups["machine"].Value, y => y.Groups["attribute"].Value)
.ToDictionary(x => x.Key, y => y.ToList());

Here is a quick example for you. I think it is better to parse the string yourself and keep each Machine-Attribute pair in a small class of its own.
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp4 {
class Program {
static void Main(string[] args) {
string inputString = "time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C";
string[] attrList = inputString.Split(',');
// 1. Get all machines with attributes
List<MachineAttribute> MachineAttributeList = new List<MachineAttribute>();
for (int i = 1; i < attrList.Length; i++) {
MachineAttributeList.Add(new MachineAttribute(attrList[i]));
}
// 2. For each machine create
foreach (var machine in MachineAttributeList.Select(x=>x.Machine).Distinct()) {
Console.Write(attrList[0]);
foreach (var attribute in MachineAttributeList.Where(x=>x.Machine == machine)) {
Console.Write(attribute + ",");
}
Console.WriteLine();
}
Console.ReadLine();
}
}
public class MachineAttribute {
public string Machine { get; }
public string Attribute { get; }
public MachineAttribute(string inputData) {
var array = inputData.Split('.');
if (array.Length > 0) Machine = array[0];
if (array.Length > 1) Attribute = array[1];
}
public override string ToString() {
return Machine + "." + Attribute;
}
}
}

How to parse a text file with alternating lines of names and lists of integers?

I need to read a file and put its data into different arrays.
My .txt file looks like:
w1;
1 2 3
w2;
3 4 5
w3;
4 5 6
I tried something like the following:
int[] w1 = new int [3];
int[] w2 = new int [3];
int[] w3 = new int [3];
string v = "w1:|w2:|w3:";
foreach (string line in File.ReadAllLines(@"D:\\Data.txt"))
{
string[] parts = Regex.Split(line, v);
I got that string, but I have no idea how to split its elements into the arrays shown above.

Rather than parsing the file and putting the arrays into three hardcoded variables corresponding to hardcoded names w1, w2 and w3, I would remove the hardcoding and parse the file into a Dictionary<string, int[]> like so:
public static class DataFileExtensions
{
public static Dictionary<string, int[]> ParseDataFile(string fileName)
{
var separators = new [] { ' ' };
var query = from pair in File.ReadLines(fileName).Chunk(2)
let key = pair[0].TrimEnd(';')
let value = (pair.Count < 2 ? "" : pair[1]).Split(separators, StringSplitOptions.RemoveEmptyEntries).Select(s => int.Parse(s, NumberFormatInfo.InvariantInfo)).ToArray()
select new { key, value };
return query.ToDictionary(p => p.key, p => p.value);
}
}
public static class EnumerableExtensions
{
// Adapted from the answer to "Split List into Sublists with LINQ" by casperOne
// https://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq/
// https://stackoverflow.com/a/419058
// https://stackoverflow.com/users/50776/casperone
public static IEnumerable<List<T>> Chunk<T>(this IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);
// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);
// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;
// Set the list to a new list.
list = new List<T>(groupSize);
}
}
// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}
}
And you would use it as follows:
var dictionary = DataFileExtensions.ParseDataFile(fileName);
Console.WriteLine("Result of parsing {0}, encountered {1} data arrays:", fileName, dictionary.Count);
foreach (var pair in dictionary)
{
var name = pair.Key;
var data = pair.Value;
Console.WriteLine(" Data row name = {0}, values = [{1}]", name, string.Join(",", data));
}
Which outputs:
Result of parsing Question49341548.txt, encountered 3 data arrays:
Data row name = w1, values = [1,2,3]
Data row name = w2, values = [3,4,5]
Data row name = w3, values = [4,5,6]
Notes:
I parse the integer values using NumberFormatInfo.InvariantInfo to ensure consistency of parsing in all locales.
I break the lines of the file into chunks of two by using a lightly modified version of the method from this answer to Split List into Sublists with LINQ by casperOne.
After breaking the file into chunks of pairs of lines, I trim the ; from the first line in each pair and use that as the dictionary key. The second line in each pair gets parsed into an array of integer values.
If the names w1, w2 and so on are not unique, you could deserialize instead into a Lookup<string, int []> by replacing ToDictionary() with ToLookup().
Rather than loading the entire file into memory upfront using File.ReadAllLines(), I enumerate through it sequentially using File.ReadLines(). This should reduce memory usage without any additional complexity.
Sample working .Net fiddle.

Your RegEx doesn't actually do anything, you already have an array with each line separated. What you want to do is just ignore the lines that aren't data:
var lines = File.ReadAllLines(@"D:\\Data.txt");
for (int i = 1; i < lines.Length; i += 2) // i.e. indexes 1, 3 and 5
{
string[] numbers = lines[i].Split(' ');
}
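Or, since you know the order, you could assign directly, converting each piece to an int (Select and ToArray need using System.Linq):
w1 = lines[1].Split(' ').Select(int.Parse).ToArray();
w2 = lines[3].Split(' ').Select(int.Parse).ToArray();
w3 = lines[5].Split(' ').Select(int.Parse).ToArray();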
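If the number of name/data pairs is not fixed, the index-stepping loop above could be generalized along these lines. This is only a sketch: the names AlternatingLineParser and Parse are invented here, and it assumes the file strictly alternates between a name line ending in ';' and a line of space-separated integers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class AlternatingLineParser
{
    // Hypothetical helper: treats lines 0, 2, 4, ... as names and
    // lines 1, 3, 5, ... as the integer data belonging to the preceding name.
    public static Dictionary<string, int[]> Parse(IList<string> lines)
    {
        var result = new Dictionary<string, int[]>();
        for (int i = 0; i + 1 < lines.Count; i += 2)
        {
            // "w1;" becomes "w1"
            string name = lines[i].Trim().TrimEnd(';');
            int[] values = lines[i + 1]
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(s => int.Parse(s, CultureInfo.InvariantCulture))
                .ToArray();
            // a repeated name overwrites the earlier entry
            result[name] = values;
        }
        return result;
    }
}
```

The array returned by File.ReadAllLines can be passed in directly, since string[] implements IList<string>. A trailing name line without a data line is ignored.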

Group pairs of connected values into Lists

So I am working on a problem, and coming up against a wall that I can't seem to find a way around. I get so much information from SO that I thought I would ask on here and see if there is a better way to do this than what I'm finding.
Basically, I have a class that has a bunch of values in it, but for our purposes only one matters.
public class GroupPair
{
public string object1 { get; set; }
public string object2 { get; set; }
public List<string> BothObjects
{
get
{
List<string> s= new List<string>();
s.Add(object1);
s.Add(object2);
return s;
}
}
}
I have a List<GroupPair>, and I need to be able to sort its items into groups. Where it becomes tricky is that the values are not unique, and both the group size and the number of groups are variable. I basically need a way to say, "give me every group that can be made from this list, where each group contains all pairs that include any individual member of the group." Let me give an example. Here are some pairs:
a d
f h
d t
n x
h a
n o
q d
w f
o y
After the grouping, this is what I want:
Group 1
a d
h a
q d
f h
w f
d t
Group 2
n x
n o
o y
Melt your brain yet?
Any ideas on how this could be done, or even if there is a name for this kind of concept that I can research myself?

Here's my quick-and-dirty approach.
Short explanation:
The idea is to start with one pair (which can be thought of as a node in a graph). From that node, you add any adjacent nodes (pairs which have a shared member). Then you search the nodes adjacent to those nodes that you just added. All along you keep track of which nodes have been visited so you don't loop endlessly.
public static List<HashSet<GroupPair>> GetGroups(IEnumerable<GroupPair> pairs)
{
var groups = new List<HashSet<GroupPair>>();
var unassignedPairs = new HashSet<GroupPair>(pairs);
while (unassignedPairs.Count != 0)
{
var group = new HashSet<GroupPair>();
var rootPair = unassignedPairs.First();
group.Add(rootPair);
unassignedPairs.Remove(rootPair);
var membersToVisit = new Queue<string>(rootPair.BothObjects);
var visited = new HashSet<string>();
while (membersToVisit.Count != 0)
{
string member = membersToVisit.Dequeue();
visited.Add(member);
foreach (var newPair in unassignedPairs
.Where(p => p.BothObjects.Contains(member)).ToList())
{
group.Add(newPair);
unassignedPairs.Remove(newPair);
foreach (var newMember in newPair.BothObjects.Except(visited))
{
membersToVisit.Enqueue(newMember);
}
}
}
groups.Add(group);
}
return groups;
}

This is just an idea for a solution.
You'll need to know how many unique 'individuals' you have. For your example, it's 26.
First, you create a dictionary of 26 pairs, where key is an individual, in our case a letter, and a value is a group number where it will be in the end. For each pair, initial value should be zero.
Second, you keep a 'groupNumber' integer variable that will store the next group number. You initialise it with 1.
Then, you iterate over the list of GroupPairs. You take the first GroupPair, which contains 'a' and 'd' and set the respective values in the dictionary to '1'.
For each following GroupPair you take its individuals and look up the respective values in the dictionary.
If one of the values is non-zero, i.e. one of the individuals already belongs to a group, you set the other value to the same number, thus putting it in the same group.
If both values are zeros you set them to 'groupNumber' and increment 'groupNumber'.
If both values are non-zero, this is where it gets a bit tricky. You find all pairs in the group dictionary where value equals the second value from that pair, and set their value to the first value from that pair.
After that is done, you iterate over the list of GroupPairs once again. For each pair you look up the first individual in the group dictionary and thus find out which group the pair belongs to.
Hope that makes sense...
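The steps described above might be sketched roughly as follows. This is an interpretation rather than the answerer's own code: instead of pre-filling the dictionary with zeros, a missing key plays the role of "zero", and the names GroupLabelSketch and Group are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupLabelSketch
{
    // Hypothetical helper: labels every individual with a group number, then
    // buckets the pairs by the label of their first individual.
    public static List<List<Tuple<string, string>>> Group(IList<Tuple<string, string>> pairs)
    {
        var groupOf = new Dictionary<string, int>(); // individual -> group number
        int nextGroup = 1;

        foreach (var pair in pairs)
        {
            int g1, g2;
            bool has1 = groupOf.TryGetValue(pair.Item1, out g1);
            bool has2 = groupOf.TryGetValue(pair.Item2, out g2);

            if (!has1 && !has2)
            {
                // neither individual seen before: open a new group
                groupOf[pair.Item1] = nextGroup;
                groupOf[pair.Item2] = nextGroup;
                nextGroup++;
            }
            else if (has1 && !has2)
            {
                groupOf[pair.Item2] = g1;
            }
            else if (!has1 && has2)
            {
                groupOf[pair.Item1] = g2;
            }
            else if (g1 != g2)
            {
                // the pair connects two existing groups: relabel group g2 as g1
                var toMove = groupOf.Where(kv => kv.Value == g2).Select(kv => kv.Key).ToList();
                foreach (var member in toMove)
                {
                    groupOf[member] = g1;
                }
            }
        }

        // second pass: both individuals of a pair now share a label
        return pairs.GroupBy(p => groupOf[p.Item1]).Select(g => g.ToList()).ToList();
    }
}
```

The relabeling step scans the whole dictionary each time two groups meet, which is fine for small inputs. For large inputs, a union-find (disjoint-set) structure is the usual way to make the merging cheaper.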

This code matches the sample input and produces the required output. Basically, I keep a HashSet of items per group and a list of the remaining pairs to process.
private static void GroupPairs(List<Tuple<string, string>> pairs)
{
int groupCounter = 0;
while (pairs.Count > 0)
{
var onegroup = new HashSet<string>();
Console.WriteLine("Group {0}", ++groupCounter);
int initialGroupCount;
do
{
var remainder = new List<Tuple<string, string>>();
initialGroupCount = onegroup.Count;
foreach (var curr in pairs)
{
if (onegroup.Contains(curr.Item1) ||
onegroup.Contains((curr.Item2)) ||
onegroup.Count == 0)
{
Console.WriteLine("{0} {1}", curr.Item1, curr.Item2);
onegroup.Add(curr.Item1);
onegroup.Add(curr.Item2);
}
else
{
remainder.Add(curr);
}
}
pairs = remainder;
} while (initialGroupCount < onegroup.Count);
}
}

For the sake of completeness I also have a recursive solution.
Near the end is the GroupPair class that acts as datacontainer with two helper methods: Add and Merge.
You invoke it like so:
var gp = GroupByPairs(
new List<Tuple<string, string>>
{
new Tuple<string, string>("a", "d"),
new Tuple<string, string>("f", "h"),
/* you get the idea */
}.GetEnumerator());
foreach (var groupData in gp)
{
Console.WriteLine(groupData.ToString());
}
//recursive take on the problem
private static IEnumerable<GroupPair> GroupByPairs(
IEnumerator<Tuple<string, string>> pairs)
{
// result Groups
var listGroup = new List<GroupPair>();
if (pairs.MoveNext())
{
var item = pairs.Current;
var current = new GroupPair(item);
var subgroup = GroupByPairs(pairs); // recurse
// loop over the groups
GroupPair target = null;
foreach (var groupData in subgroup)
{
// find the group the current item matches
if (groupData.Keys.Contains(item.Item1) ||
groupData.Keys.Contains(item.Item2))
{
// determine if we already have a target
if (target == null)
{
// add item and keep groupData
target = groupData;
groupData.Add(item);
listGroup.Add(groupData);
}
else
{
// merge this with target
// do not keep groupData
target.Merge(groupData);
}
}
else
{
// keep groupData
listGroup.Add(groupData);
}
}
// current item not added
// store its group in the listGroup
if (target == null)
{
listGroup.Add(current);
}
}
return listGroup;
}
public class GroupPair
{
private static int _groupsCount = 0;
private int id;
public GroupPair(Tuple<string, string> item)
{
id = Interlocked.Increment(ref _groupsCount);
Keys = new HashSet<string>();
Items = new List<Tuple<string, string>>();
Add(item);
}
// add the pair and update the Keys
public void Add(Tuple<string, string> item)
{
Keys.Add(item.Item1);
Keys.Add(item.Item2);
Items.Add(item);
}
// Add all items from another GroupPair
public void Merge(GroupPair groupPair)
{
foreach (var item in groupPair.Items)
{
Add(item);
}
}
public HashSet<string> Keys { get; private set; }
public List<Tuple<string, string>> Items { get; private set; }
public override string ToString()
{
var build = new StringBuilder();
build.AppendFormat("Group {0}", id);
build.AppendLine();
foreach (var pair in Items)
{
build.AppendFormat("{0} {1}", pair.Item1, pair.Item2);
build.AppendLine();
}
return build.ToString();
}
}

Adding comma separated strings to an ArrayList c#

How do I add a comma-separated string to an ArrayList? My string can hold one or many items that I'd like to add to the ArrayList. Each item is combined with its own ID value, separated by an underscore (_), so items with different IDs must end up as separate ArrayList entries.
e.g :
string supplierIdWithProducts = "1_1001,1_1002,20_1003,100_1005,100_1006";
ArrayList myArrayList= new ArrayList();
myArrayList.Add("1001,1002"); // 1
myArrayList.Add("1003"); // 20
myArrayList.Add("1005,1006"); // 100
After the ArrayList has been populated, I'd like to pass it to a web service. That part is OK for me:
foreach (string item in myArrayList){}
How could I do this?
Thanks..

string supplierIdWithProducts = "1_1001,1_1002,20_1003,100_1005,100_1006";
var lookup =
supplierIdWithProducts.Split(',')
.ToLookup(id => id.Split('_')[0],
id => id.Split('_')[1]);
foreach (var grp in lookup)
{
Console.WriteLine("{0} - {1}", grp.Key, string.Join(", ", grp));
}
will print:
1 - 1001, 1002
20 - 1003
100 - 1005, 1006
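If the goal is the exact shape from the question (one comma-joined string per supplier), the same grouping could be sketched like this. It returns a List<string> rather than an ArrayList, which is an assumption about what the web service call can accept; SupplierGrouping and GroupProducts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SupplierGrouping
{
    // Hypothetical helper: "1_1001,1_1002,20_1003" -> ["1001,1002", "1003"]
    public static List<string> GroupProducts(string supplierIdWithProducts)
    {
        return supplierIdWithProducts
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(entry => entry.Split('_'))
            // skip anything that is not exactly supplierId_productId
            .Where(parts => parts.Length == 2)
            // key = supplier ID, element = product ID
            .GroupBy(parts => parts[0], parts => parts[1])
            .Select(g => string.Join(",", g))
            .ToList();
    }
}
```

Each entry is split only once here, and malformed entries are skipped silently instead of throwing an IndexOutOfRangeException; whether skipping is the right behavior depends on how trustworthy the input is.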

Firstly, I suggest you try to use a Dictionary or any other generic collection instead of an ArrayList to make it type-safe. Then use a string.Split(char c) and start the processing from there.
Here's an idea on how you can do it. It might get shorter with Extension methods of course. But here's just a thought-process on how you can do it.
static void ParseSupplierIdWithProducts()
{
string supplierIdWithProducts = "1_1001,1_1002,20_1003,100_1005,100_1006";
//eg. [0] = "1_1001", [1] = "1_1002", etc
List<string> supplierIdAndProductsListSeparatedByUnderscore = supplierIdWithProducts.Split(',').ToList();
//this dictionary holds each supplier ID together with the list of its product IDs
//eg. [0] = key:"1", value(s):["1001", "1002"]
// [1] = key:"20", value(s):["1003"]
Dictionary<string, List<string>> supplierIdWithProductsDict = new Dictionary<string, List<string>>();
foreach (string s in supplierIdAndProductsListSeparatedByUnderscore)
{
string key = s.Split('_')[0];
string value = s.Split('_')[1];
List<string> val = null;
//look if the supplier ID is present
if (supplierIdWithProductsDict.TryGetValue(key, out val))
{
if (val == null)
{
//the supplier ID is present but the values are null
supplierIdWithProductsDict[key] = new List<string> { value };
}
else
{
supplierIdWithProductsDict[key].Add(value);
}
}
else
{
//that supplier ID is not present, add it and the value/product
supplierIdWithProductsDict.Add(key, new List<string> { value });
}
}
}
