So I am working on a problem and have hit a wall that I can't seem to find a way around. I get so much information from SO that I thought I would ask here and see if there is a better way to do this than what I'm finding.
Basically, I have a class that has a bunch of values in it, but for our purposes only one matters.
public class GroupPair
{
public string object1 { get; set; }
public string object2 { get; set; }
public List<string> BothObjects
{
get
{
List<string> s= new List<string>();
s.Add(object1);
s.Add(object2);
return s;
}
}
}
I have a List<GroupPair>, and I need to be able to sort the pairs into groups. Where it becomes tricky is that neither value is unique, and the group size and number of groups are variable. I basically need a way to say, "give me every group that can be made from this list, where each group contains all pairs that include any individual member of the group." Let me give an example. Here are some pairs:
a d
f h
d t
n x
h a
n o
q d
w f
o y
After the grouping, this is what I want:
Group 1
a d
h a
q d
f h
w f
d t
Group 2
n x
n o
o y
Melt your brain yet?
Any ideas on how this could be done, or even if there is a name for this kind of concept that I can research myself?
Here's my quick-and-dirty approach.
Short explanation:
The idea is to start with one pair (which can be thought of as a node in a graph). From that node, you add any adjacent nodes (pairs which have a shared member). Then you search the nodes adjacent to those nodes that you just added. All along you keep track of which nodes have been visited so you don't loop endlessly.
public static List<HashSet<GroupPair>> GetGroups(IEnumerable<GroupPair> pairs)
{
var groups = new List<HashSet<GroupPair>>();
var unassignedPairs = new HashSet<GroupPair>(pairs);
while (unassignedPairs.Count != 0)
{
var group = new HashSet<GroupPair>();
var rootPair = unassignedPairs.First();
group.Add(rootPair);
unassignedPairs.Remove(rootPair);
var membersToVisit = new Queue<string>(rootPair.BothObjects);
var visited = new HashSet<string>();
while (membersToVisit.Count != 0)
{
string member = membersToVisit.Dequeue();
visited.Add(member);
foreach (var newPair in unassignedPairs
.Where(p => p.BothObjects.Contains(member)).ToList())
{
group.Add(newPair);
unassignedPairs.Remove(newPair);
foreach (var newMember in newPair.BothObjects.Except(visited))
{
membersToVisit.Enqueue(newMember);
}
}
}
groups.Add(group);
}
return groups;
}
This is just an idea for a solution.
You'll need to know how many unique 'individuals' you have. For your example, it's at most 26 (one per letter).
First, you create a dictionary of 26 pairs, where key is an individual, in our case a letter, and a value is a group number where it will be in the end. For each pair, initial value should be zero.
Second, you keep a 'groupNumber' integer variable that will store the next group number. You initialise it with 1.
Then, you iterate over the list of GroupPairs. You take the first GroupPair, which contains 'a' and 'd' and set the respective values in the dictionary to '1'.
For each following GroupPair you take its individuals and look up the respective values in the dictionary.
If one of the values is non-zero, i.e. one of the individuals already belongs to a group, you set the other value to the same number, thus putting it in the same group.
If both values are zeros you set them to 'groupNumber' and increment 'groupNumber'.
If both values are non-zero and different, this is where it gets a bit tricky. You find all entries in the group dictionary whose value equals the second individual's group number and set them to the first individual's group number, which merges the two groups.
After that is done, you iterate over the list of GroupPairs once again. For each pair you look up the first individual in the group dictionary and thus find out which group the pair belongs to.
Hope that makes sense...
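The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the pairs are Tuple<string, string>, the names PairGrouping and LabelMembers are made up for the example, and a missing dictionary key stands in for the initial value of zero instead of pre-filling the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairGrouping
{
    // Hypothetical helper: returns individual -> group number.
    public static Dictionary<string, int> LabelMembers(
        IEnumerable<Tuple<string, string>> pairs)
    {
        var groupOf = new Dictionary<string, int>();
        int nextGroup = 1;
        foreach (var pair in pairs)
        {
            int g1, g2;
            // TryGetValue leaves 0 in the out variable when the key is
            // missing, which plays the role of "no group yet".
            groupOf.TryGetValue(pair.Item1, out g1);
            groupOf.TryGetValue(pair.Item2, out g2);
            if (g1 == 0 && g2 == 0)
            {
                // Neither individual has a group: start a new one.
                groupOf[pair.Item1] = nextGroup;
                groupOf[pair.Item2] = nextGroup;
                nextGroup++;
            }
            else if (g1 == 0)
            {
                groupOf[pair.Item1] = g2;
            }
            else if (g2 == 0)
            {
                groupOf[pair.Item2] = g1;
            }
            else if (g1 != g2)
            {
                // Both already grouped: relabel group g2 as g1 (merge).
                foreach (var key in groupOf.Keys.ToList())
                {
                    if (groupOf[key] == g2) groupOf[key] = g1;
                }
            }
        }
        return groupOf;
    }

    public static void Main()
    {
        var pairs = new List<Tuple<string, string>>
        {
            Tuple.Create("a", "d"),
            Tuple.Create("f", "h"),
            Tuple.Create("h", "a"),
            Tuple.Create("n", "o")
        };
        var groupOf = LabelMembers(pairs);
        // Second pass: a pair belongs to the group of its first individual.
        foreach (var g in pairs.GroupBy(p => groupOf[p.Item1]))
        {
            Console.WriteLine("Group {0}", g.Key);
            foreach (var p in g) Console.WriteLine("{0} {1}", p.Item1, p.Item2);
        }
    }
}
```

Note that group numbers can have gaps after a merge, since the absorbed group's number is never reused. The relabeling step scans the whole dictionary, so a union-find structure would probably scale better for large inputs.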
This code matches the sample input and produces the required output. Basically, I keep a HashSet of items per group and a list of the remaining pairs to process.
private static void GroupPairs(List<Tuple<string, string>> pairs)
{
int groupCounter = 0;
while (pairs.Count > 0)
{
var onegroup = new HashSet<string>();
Console.WriteLine("Group {0}", ++groupCounter);
int initialGroupCount;
do
{
var remainder = new List<Tuple<string, string>>();
initialGroupCount = onegroup.Count;
foreach (var curr in pairs)
{
if (onegroup.Contains(curr.Item1) ||
onegroup.Contains((curr.Item2)) ||
onegroup.Count == 0)
{
Console.WriteLine("{0} {1}", curr.Item1, curr.Item2);
onegroup.Add(curr.Item1);
onegroup.Add(curr.Item2);
}
else
{
remainder.Add(curr);
}
}
pairs = remainder;
} while (initialGroupCount < onegroup.Count);
}
}
For the sake of completeness I also have a recursive solution.
Near the end is the GroupPair class that acts as a data container with two helper methods: Add and Merge.
You invoke it like so:
var gp = GroupByPairs(
new List<Tuple<string, string>>
{
new Tuple<string, string>("a", "d"),
new Tuple<string, string>("f", "h"),
/* you get the idea */
}.GetEnumerator());
foreach (var groupData in gp)
{
Console.WriteLine(groupData.ToString());
}
//recursive take on the problem
private static IEnumerable<GroupPair> GroupByPairs(
IEnumerator<Tuple<string, string>> pairs)
{
// result Groups
var listGroup = new List<GroupPair>();
if (pairs.MoveNext())
{
var item = pairs.Current;
var current = new GroupPair(item);
var subgroup = GroupByPairs(pairs); // recurse
// loop over the groups
GroupPair target = null;
foreach (var groupData in subgroup)
{
// find the group the current item matches
if (groupData.Keys.Contains(item.Item1) ||
groupData.Keys.Contains(item.Item2))
{
// determine if we already have a target
if (target == null)
{
// add item and keep groupData
target = groupData;
groupData.Add(item);
listGroup.Add(groupData);
}
else
{
// merge this with target
// do not keep groupData
target.Merge(groupData);
}
}
else
{
// keep groupData
listGroup.Add(groupData);
}
}
// current item not added
// store its group in the listGroup
if (target == null)
{
listGroup.Add(current);
}
}
return listGroup;
}
public class GroupPair
{
private static int _groupsCount = 0;
private int id;
public GroupPair(Tuple<string, string> item)
{
id = Interlocked.Increment(ref _groupsCount);
Keys = new HashSet<string>();
Items = new List<Tuple<string, string>>();
Add(item);
}
// add the pair and update the Keys
public void Add(Tuple<string, string> item)
{
Keys.Add(item.Item1);
Keys.Add(item.Item2);
Items.Add(item);
}
// Add all items from another GroupPair
public void Merge(GroupPair groupPair)
{
foreach (var item in groupPair.Items)
{
Add(item);
}
}
public HashSet<string> Keys { get; private set; }
public List<Tuple<string, string>> Items { get; private set; }
public override string ToString()
{
var build = new StringBuilder();
build.AppendFormat("Group {0}", id);
build.AppendLine();
foreach (var pair in Items)
{
build.AppendFormat("{0} {1}", pair.Item1, pair.Item2);
build.AppendLine();
}
return build.ToString();
}
}
I have two lists, each list is of type "Node". So I have a StartNodeList and an EndNodeList.
Each Node consists of 3 properties of type Double... X, Y and Z.
The StartNodeList and EndNodeList currently contain Nodes with identical property values.
The output I need is a single list of type Node that contains only Nodes with unique property values (i.e. no duplicate Nodes).
I have tried all manner of foreach loops and comparison operators that I can think of with varying levels of success with nothing working perfectly, and several hours of researching the problem online hasn't helped.
Could someone please help me toward a solution?
while (selector.MoveNext())
{
Beam beam = selector.Current as Beam;
if (beam != null)
{
Node nodeEnd = new Node();
nodeEnd.x = beam.EndPoint.X;
nodeEnd.y = beam.EndPoint.Y;
nodeEnd.z = beam.EndPoint.Z;
Node nodeStart = new Node();
nodeStart.x = beam.StartPoint.X;
nodeStart.y = beam.StartPoint.Y;
nodeStart.z = beam.StartPoint.Z;
Member member = new Member() { member_start = nodeStart, member_end = nodeEnd, member_id = 1 };
memberList.Add(member);
nodeEndList.Add(nodeEnd);
nodeStartList.Add(nodeStart);
memberNumber++;
}
}
Console.WriteLine(nodeStartList.Count());
Console.ReadLine();
int count = nodeStartList.Count();
foreach(Node i in nodeEndList)
{
nodeListSorted = EqualityComparer.Compare(i, nodeStartList);
}
public static class EqualityComparer
{
public static List<Node> Compare(Node node, List<Node> list)
{
List<Node> output = new List<Node>();
output.Add(node);
foreach(Node i in list)
{
if (node.x.Equals(i.x) && node.y.Equals(i.y) && node.z.Equals(i.z))
{
}
else
{
output.Add(i);
}
}
return output;
}
}
I would recommend using LINQ. You can use Union, which produces the set union of two sequences, and Any, which determines whether any element of a sequence satisfies the given condition.
var uniqueList = list1.Where(el => !list2.Any(l2 => l2.x == el.x && l2.y == el.y && l2.z == el.z)).Union(list2).ToList();
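One caveat with the one-liner: Union compares Node instances with the default equality comparer, which for a plain class means reference equality, so duplicates inside a single list would survive. Here is a sketch of a different technique (grouping on the coordinates rather than Union/Any), assuming Node is a simple class with public double fields x, y and z as the question implies. The names NodeDemo and UniqueNodes are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's Node class.
public class Node
{
    public double x, y, z;
}

public static class NodeDemo
{
    // Hypothetical helper: one Node per distinct (x, y, z) triple.
    public static List<Node> UniqueNodes(List<Node> list1, List<Node> list2)
    {
        return list1
            .Concat(list2)
            // Anonymous types compare by value, so equal coordinates
            // end up in the same group.
            .GroupBy(n => new { n.x, n.y, n.z })
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var starts = new List<Node>
        {
            new Node { x = 0, y = 0, z = 0 },
            new Node { x = 1, y = 0, z = 0 }
        };
        var ends = new List<Node>
        {
            new Node { x = 1, y = 0, z = 0 },
            new Node { x = 2, y = 0, z = 0 },
            new Node { x = 2, y = 0, z = 0 }
        };
        Console.WriteLine(UniqueNodes(starts, ends).Count); // prints 3
    }
}
```

This uses exact double comparison, the same as the == checks above. If the coordinates come from calculations, rounding them before grouping may be needed.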
I am trying to wrap my head around C# Lists, coming from a strong PHP background and thinking of things in PHP array terms. I have a class that includes a list, and I am trying to count distinct items within it. Is there a simple LINQ way to do this, or would I use some sort of nested foreach?
Thank you in advance
public void main() {
List<invoice> inv = new List<invoice>();
// I do something that populates inv with, say 100 invoices
// Count distinct inv.lines.rowtype ?? to get:
// Type A 34
// Type B 3
// Type X 21 ...etc
}
class invoice {
public int invoicenumber;
public int customernumber;
public List<lineitem> lines;
public struct lineitem {
public string rowtype;
public string somethingelse;
public int whatever;
}
public invoice() {
lines = new List<lineitem>();
}
}
Something like this?
inv.SelectMany(i => i.lines).GroupBy(l => l.rowtype).ToDictionary(g => g.Key, g => g.Count())
You could probably use some LINQ for this. However, for the sake of simplicity and readability, I would recommend using foreach loops.
// Keep a dictionary for count
var lineItemDict = new Dictionary<string, int>();
foreach (var inv in invoices)
{
foreach (var line in inv.lines)
{
// If the rowtype already exists, increment the count
if (lineItemDict.ContainsKey(line.rowtype))
{
lineItemDict.TryGetValue(line.rowtype, out int count);
lineItemDict[line.rowtype] = count + 1;
}
else
{
// Else add a new entry
lineItemDict.Add(line.rowtype, 1);
}
}
}
With List<T>.ForEach (strictly a List method rather than LINQ):
// Keep a dictionary for count
var lineItemDict = new Dictionary<string, int>();
invoices.ForEach(inv => {
inv.lines.ForEach(line => {
// If the rowtype already exists, increment the count
if (lineItemDict.ContainsKey(line.rowtype))
{
lineItemDict.TryGetValue(line.rowtype, out int count);
lineItemDict[line.rowtype] = count + 1;
}
else
{
// Else add a new entry
lineItemDict.Add(line.rowtype, 1);
}
});
});
Both of these will leave you with a dictionary (lineItemDict) that looks like this:
<rowtype> : <count>
For example,
'A' : 34
'B' : 3
'X' : 21
In my ASP.NET C# application, I have the following list of occurrences of item combinations. I want to list the most frequently occurring combinations.
Item1
Item1, Item2
Item3
Item1, Item3, Item2
Item3, Item1
Item2, Item1
According to the above example, I should get the output below.
The most frequently occurring combinations are:
Item1 & Item2 - 3 occurrences (#2, #4 & #6)
Item1 & Item3 - 2 occurrences (#4 & #5)
My structure is as below.
public class MyList
{
public List<MyItem> MyItems { get; set; }
}
public class MyItem
{
public string ItemName { get; set; }
}
Off the top of my head, I would map all possible combinations using a hash where ab is the same as ba (or you could order your items alphabetically, for example, and then hash them) and then just count occurrences of the hashes...
You can create a weighted graph from your list, with the weight between two nodes representing the frequency of occurrence. This StackExchange post has some information, and you can learn about adjacency matrices in this previous SO post.
In my opinion, it would be wise to use
HashSet<Tuple<Item1, Item2>> to represent a connection and have its value stored in a dictionary.
For multiple items, the problem is similar to finding out which path was traversed most, in path traversal algorithm for graphs.
Though for very large set of data, I recommend using SSAS and SSIS services through SQL Statements and Analysis Queries dynamically with C# to create a market basket analysis, which should generate desired statistics for you.
Here is a quick and dirty way to do this to get you started. You should probably use hash tables for performance, but I think Dictionaries are easier to visualize.
Fiddle: https://dotnetfiddle.net/yofkLf
public static void Main()
{
List<MyItem[]> MyItems = new List<MyItem[]>()
{
new MyItem[] { new MyItem("Item1") },
new MyItem[] { new MyItem("Item1"), new MyItem("Item2") },
new MyItem[] { new MyItem("Item3") },
new MyItem[] { new MyItem("Item1"), new MyItem("Item3"), new MyItem("Item2") },
new MyItem[] { new MyItem("Item3"), new MyItem("Item1") },
new MyItem[] { new MyItem("Item2"), new MyItem("Item1") }
};
Dictionary<Tuple<string, string>, int> results = new Dictionary<Tuple<string, string>, int>();
foreach (MyItem[] arr in MyItems)
{
// Iterate through the items in the array. Then, iterate through the items after that item in the array to get all combinations.
for (int i = 0; i < arr.Length; i++)
{
string s1 = arr[i].ItemName;
for (int j = i + 1; j < arr.Length; j++)
{
string s2 = arr[j].ItemName;
// Order the Tuple so that (Item1, Item2) is the same as (Item2, Item1).
Tuple<string, string> t = new Tuple<string, string>(s1, s2);
if (string.Compare(s1, s2) > 0)
{
t = new Tuple<string, string>(s2, s1);
}
if (results.ContainsKey(t))
{
results[t]++;
}
else
{
results[t] = 1;
}
}
}
}
// And here are your results.
// You can always use Linq to sort the dictionary by values.
foreach (var v in results)
{
Console.WriteLine(v.Key.ToString() + " = " + v.Value.ToString());
// Outputs:
// (Item1, Item2) = 3
// (Item1, Item3) = 2
// (Item2, Item3) = 1
}
}
...
public class MyItem
{
public string ItemName { get; set; }
public MyItem(string ItemName)
{
this.ItemName = ItemName;
}
}
Of course this would be different if you didn't have that string property in MyItems.
Here's a rough O(N^2) approach:
Iterate over the outer collection (the List<List<Item>>)
Come up with a way to define the current row, call it rowId
Now iterate the known row ids (inner iteration).
Count when one of these is a complete subset of the other: either the current row is contained in a previous set, or the previous set is contained in the current row. (This is the solution you want.) This works by incrementing the count of each previously seen row that is a subset of the current row, and by tracking the number of times the current row is a subset of a previously seen combination and storing that count at the end of each inner iteration.
Some assumptions:
You don't care about every possible combination of items, only combinations that have already been seen.
Items have a unique identifier
Like I said above, this is an O(N^2) approach, so performance may be a concern. There are also two checks for subset membership, which may be a performance issue. I'm also just joining and splitting ids as strings; you can probably get a more optimal solution by setting up another dictionary that tracks ids. There's also some room for improvement with Dictionary.TryGetValue. Extracting the sets of items you want is left as an exercise for the reader, but should be a straightforward OrderBy(...).Where(...) operation. This should get you started, though.
public class MyItem
{
public string ItemName { get; set; }
}
class Program
{
public static void GetComboCount()
{
var itemsCollection = new List<List<MyItem>>() {
new List<MyItem>() { new MyItem() { ItemName = "Item1" } },
new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item2" } },
new List<MyItem>() { new MyItem() { ItemName = "Item3" } },
new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item2" } },
new List<MyItem>() { new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item1" } },
new List<MyItem>() { new MyItem() { ItemName = "Item2" }, new MyItem() { ItemName = "Item1" } }
};
var comboCount = new Dictionary<string, int>();
foreach (var row in itemsCollection)
{
var ids = row.Select(x => x.ItemName).OrderBy(x => x);
var rowId = String.Join(",", ids);
var rowIdCount = ids.Count();
var seen = false;
var comboCountList = comboCount.ToList();
int currentRowCount = 1;
foreach (var kvp in comboCountList)
{
var key = kvp.Key;
if (key == rowId)
{
seen = true;
currentRowCount++;
continue;
}
var keySplit = key.Split(',');
var keyIdCount = keySplit.Length;
if (ids.Where(x => keySplit.Contains(x)).Count() == keyIdCount)
{
comboCount[kvp.Key] = kvp.Value + 1;
}
else if (keySplit.Where(x => ids.Contains(x)).Count() == rowIdCount)
{
currentRowCount++;
}
}
if (!seen)
{
comboCount.Add(rowId, currentRowCount);
}
else
{
comboCount[rowId] = currentRowCount;
}
}
foreach (var kvp in comboCount)
{
Console.WriteLine(String.Format("{0}: {1}", kvp.Key, kvp.Value));
}
}
static void Main(string[] args)
{
GetComboCount();
}
}
console output:
Item1: 5
Item1,Item2: 3
Item3: 3
Item1,Item2,Item3: 1
Item1,Item3: 2
I am working with a graph data structure and have a recursive function to calculate the depth of a node by counting the parents to the root node.
There are some other issues that I need to deal with, but for right now my main problem is to do with storing the current value of the recursive dictionary parameter, which stores the path branches.
using System;
using System.Collections.Generic;
using System.Linq;
public class Node {
public string name;
public int ID;
public int maxDepth;
public readonly List<Node> Dependencies = new List<Node>();
public readonly List<Node> Children = new List<Node>();
public bool isOrphan {
get {
return Dependencies.Count == 0;
}
}
public bool isParent {
get {
return Children.Count != 0;
}
}
}
public class test {
private static readonly List<Node> nodes = new List<Node>();
public static void Main() {
Node A = new Node() {
name = "A",
ID = 1
};
Node B = new Node() {
name = "B",
ID = 2
};
Node C = new Node() {
name = "C",
ID = 3
};
Node D = new Node() {
name = "D",
ID = 4
};
Node E = new Node() {
name = "E",
ID = 5
};
Node F = new Node() {
name = "F",
ID = 6
};
Node G = new Node() {
name = "G",
ID = 7
};
nodes.Add(A);
nodes.Add(B);
nodes.Add(C);
nodes.Add(D);
nodes.Add(E);
nodes.Add(F);
nodes.Add(G);
A.Children.Add(B);
A.Children.Add(G);
B.Children.Add(C);
B.Children.Add(D);
C.Children.Add(D);
D.Children.Add(E);
E.Children.Add(F);
B.Dependencies.Add(A);
C.Dependencies.Add(B);
D.Dependencies.Add(B);
D.Dependencies.Add(C);
E.Dependencies.Add(D);
E.Dependencies.Add(G);
F.Dependencies.Add(E);
G.Dependencies.Add(A);
foreach (Node n in nodes) {
n.maxDepth = getMaxNodeDepth(n);
}
Console.ReadLine();
}
private static int getMaxNodeDepth(Node n, string listIndex = "base",
Dictionary<string, List<int>> paths = null) {
bool firstIteration = false;
if (paths == null) {
firstIteration = true;
listIndex = n.name.Replace(" ", "-");
paths = new Dictionary<string, List<int>> {
{listIndex, new List<int>(0)}
};
}
// Prevent the starting node from being added to the path
if (!paths[listIndex].Contains(n.ID) && !firstIteration)
paths[listIndex].Add(n.ID);
// This variable should take the CURRENT path and store it;
// not the value after all the recursion has completed.
// Right now, the current path is affected by the recursions, somehow...
List<int> currentPath = new List<int>(paths[listIndex]);
foreach (Node parent in n.Dependencies) {
if (n.Dependencies.Count >= 2) {
listIndex = parent.name;
paths.Add(listIndex, currentPath);
}
getMaxNodeDepth(parent, listIndex, paths);
}
// Print out branches
if (firstIteration) {
string list = n.name + "\n";
int listNumber = 1;
foreach (List<int> iList in paths.Values) {
list += string.Format("Branch#{0} -- ", paths.Keys.ElementAt(listNumber - 1));
int total = 0;
foreach (int i in iList) {
list += string.Format("{0}, ", nodes.First(x => x.ID == i).name);
total++;
}
listNumber++;
list += string.Format(" -- ({0})\n", total);
}
Console.WriteLine(list);
}
// Order all paths by length, return the highest count
// This is to be used to space out the hierarchy properly
return paths.Values.OrderByDescending(path => path.Count).First().Count;
}
}
When the foreach loop encounters a node with more than one parent, it creates a new branch and should populate it with the current IDs of the nodes.
C D
\ /
B
|
A
|
...
What should happen
Using the above example, beginning with A, it will first iterate B, as its direct parent. Then it begins on B's parents, which it has two of and because of this, it creates a separate branch and should fill that branch with B and its children (until the starting node, this time being A).
What actually happens
Somehow, when B has finished iterating over C, parent D polls the current path and is returned B, C, where it should actually be just B, as C is a sibling, not a direct child or parent.
Huge edit
The code I've attached runs completely out of the box and contains an example. You can see the result contains some anomalous results, such as
F
Branch#G -- E, D, G, A, -- (4)
which should actually be
G
Branch#G -- G, A, -- (2)
When you pass a dictionary as a parameter to a method, the contents of the dictionary are not copied; only the reference to the dictionary is copied.
So altering the dictionary in one recursion branch will change the dictionary for the other branch as well.
To fix it, you can copy the dictionary explicitly yourself when passing the dictionary:
getMaxNodeDepth(parent, listIndex, new Dictionary<string, List<int>>(paths));
EDIT: Actually that wouldn't be enough either since it will copy the reference to the inner list and not the contents of the inner list, so you'll need a more nested cloning code:
// static so it can be called from the static getMaxNodeDepth
private static Dictionary<string, List<int>> clone(Dictionary<string, List<int>> map)
{
var copy = new Dictionary<string, List<int>>(map.Count);
foreach (var pair in map)
{
// copy each inner list so the branches no longer share it
copy[pair.Key] = new List<int>(pair.Value);
}
return copy;
}
//And then call it from your code:
getMaxNodeDepth(parent, listIndex, clone(paths));
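To make the difference concrete, here is a small standalone sketch (the names CopyDemo and DeepCopy are made up for the example) showing that copying the dictionary alone still shares the inner lists, while copying each list does not:

```csharp
using System;
using System.Collections.Generic;

public static class CopyDemo
{
    // Same idea as the clone helper above: copy the dictionary and
    // every inner list.
    public static Dictionary<string, List<int>> DeepCopy(
        Dictionary<string, List<int>> map)
    {
        var copy = new Dictionary<string, List<int>>(map.Count);
        foreach (var pair in map)
        {
            copy[pair.Key] = new List<int>(pair.Value);
        }
        return copy;
    }

    public static void Main()
    {
        var paths = new Dictionary<string, List<int>>
        {
            { "A", new List<int> { 1 } }
        };

        // Shallow copy: a new dictionary, but the same List<int> objects.
        var shallow = new Dictionary<string, List<int>>(paths);
        shallow["A"].Add(2);
        Console.WriteLine(paths["A"].Count); // 2: the original list changed too

        // Deep copy: the inner list is duplicated, so the original is safe.
        var deep = DeepCopy(paths);
        deep["A"].Add(3);
        Console.WriteLine(paths["A"].Count); // still 2
    }
}
```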
However, assuming you don't need to fill this paths dictionary for outside code, and the only output here is the "maximum depth" of the node, you can probably simplify your code a lot, for example:
private static int getMaxNodeDepth(Node n)
{
if (n.Dependencies == null || n.Dependencies.Count == 0) return 1;
return 1 + n.Dependencies.Max(parent => getMaxNodeDepth(parent));
}
EDIT: edited to add a solution that returns the "maximum path" as well:
private static List<Node> getMaxNodeDepth(Node n)
{
// take the longest path among the parents, or an empty path for a root node
List<Node> path =
n.Dependencies.Select(getMaxNodeDepth).OrderByDescending(p => p.Count).
FirstOrDefault() ?? new List<Node>();
path.Insert(0, n);
return path;
}
EDIT: and based on the comment from the OP, here's a solution that returns all available paths:
private static List<List<Node>> getAllPaths(Node n)
{
if (n.Dependencies == null || n.Dependencies.Count == 0)
return new List<List<Node>> { new List<Node> { n }};
List<List<Node>> allPaths = n.Dependencies.SelectMany(getAllPaths).ToList();
allPaths.ForEach(path => path.Insert(0, n));
return allPaths;
}
private static int getMaxDepth(Node n)
{
return getAllPaths(n).Max(p => p.Count);
}
I am working on a problem, in which I have to be able to read a text file, and count the frequency and line number of a specific word.
So for example, a txt file that reads
"Hi my name is
Bob. This is
Cool"
Should return:
1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3
I am having trouble deciding how to store the line number, as well as the word frequency. I have tried a few different things, and so far this is where I am at.
Any help?
Dictionary<string, int> countDictionary = new Dictionary<string,int>();
Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();
List<string> lines = new List<string>();
System.IO.StreamReader file =
new System.IO.StreamReader("Sample.txt");
//Creates a List of lines
string x;
while ((x = file.ReadLine()) != null)
{
lines.Add(x);
}
foreach(var y in Enumerable.Range(0,lines.Count()))
{
foreach(var word in lines[y].Split())
{
if(!countDictionary.Keys.Contains(word.ToLower()) && !lineDictionary.Keys.Contains(word.ToLower()))
{
countDictionary.Add(word.ToLower(), 1);
//lineDictionary.Add(word.ToLower(), /*what to put here*/);
}
else
{
countDictionary[word] += 1;
//ADD line to dictionary???
}
}
}
foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both
{
Console.WriteLine("{0} {1}", pair.Value, pair.Key);
}
file.Close();
System.Console.ReadLine();
You can pretty much do this with one line of linq
var processed =
//get the lines of text as IEnumerable<string>
File.ReadLines(@"myFilePath.txt")
//get a word and a line number for every word
//so you'll have a sequence of objects with 2 properties
//word and lineNumber
.SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
//group these objects by their "word" property
.GroupBy(x => x.word)
//select what you need
.Select(g => new{
//number of objects in the group
//i.e. the frequency of the word
Count = g.Count(),
//the actual word
Word = g.Key,
//a sequence of line numbers of each instance of the
//word in the group
Positions = g.Select(x => x.lineNumber)});
foreach(var entry in processed)
{
Console.WriteLine("{0} {1} {2}",
entry.Count,
entry.Word,
string.Join(" ",entry.Positions));
}
I like 0 based counting, so you may want to add 1 in the appropriate place.
You are tracking two different properties of the entity "word" in two separate data structures. I would suggest creating a class to represent that entity, something like
public class WordStats
{
public string Word { get; set; }
public int Count { get; set; }
public List<int> AppearsInLines { get; set; }
public WordStats()
{
AppearsInLines = new List<int>();
}
}
Then track things in a
Dictionary<string, WordStats> wordStats = new Dictionary<string, WordStats>();
Use the word itself as the key. When you encounter a word, check whether there is already an instance of WordStats with that key. If so, get it and update its Count and AppearsInLines properties; if not, create a new instance and add it to the dictionary.
foreach(var y in Enumerable.Range(0,lines.Count()))
{
foreach(var word in lines[y].Split())
{
WordStats wordStat;
bool alreadyHave = wordStats.TryGetValue(word, out wordStat);
if (alreadyHave)
{
wordStat.Count++;
wordStat.AppearsInLines.Add(y);
}
else
{
wordStat = new WordStats();
wordStat.Count = 1;
wordStat.AppearsInLines.Add(y);
wordStats.Add(word, wordStat);
}