I have two different text files and I have to find the 10 longest words that occur in both of them. I have to print the list of those words and write the frequency, that is, how many times each word is repeated in each file separately. The problem with my current code is that it finds the words, but it combines the frequency counts. How can I change the code to get the frequency count for each file separately?
Here is my code for finding words that are in both text files:
public static Dictionary<string, int> PopularWords(string data1, string data2, char[] punctuation)
{
string[] book1 = data1.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
string[] book2 = data2.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, int> matches = new Dictionary<string, int>();
for (int i = 0; i < book1.Length; i++)
{
if (matches.ContainsKey(book1[i]))
{
matches[book1[i]]++;
continue;
}
for (int j = 0; j < book2.Length; j++)
{
if (book1[i] == book2[j])
{
if (matches.ContainsKey(book1[i]))
{
matches[book1[i]]++;
} else
{
matches.Add(book1[i], 2);
}
}
}
}
return matches;
}
And here is my code for reading and printing:
public static void ProcessPopular(string data, string data1, string results)
{
char[] punctuation = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\n' };
string lines = File.ReadAllText(data, Encoding.UTF8);
string lines2 = File.ReadAllText(data1, Encoding.UTF8);
var popular = PopularWords(lines, lines2, punctuation);
KeyValuePair<string, int>[] popularWords = popular.ToArray();
Array.Sort(popularWords, (x, y) => y.Key.Length.CompareTo(x.Key.Length));
using (var writerF = File.CreateText(results))
{
int foundWords = 0;
writerF.WriteLine("{0, -25} | {1, -35} | {2, -35}", "Longest words", "Frequency in 1 .txt file", "Frequency in 2 .txt file");
writerF.WriteLine(new string('-', 101));
// not finished
}
}
Here's my take on this:
public static Dictionary<string, Dictionary<string, int>> PopularWords(string data1, string data2, char[] punctuation)
{
string[] book1 = data1.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
string[] book2 = data2.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
return
Enumerable
.Concat(
book1.Select(x => (word: x, book: "book1")),
book2.Select(x => (word: x, book: "book2")))
.ToLookup(x => x.word, x => x.book)
.OrderByDescending(x => x.Key.Length)
.Take(10)
.ToDictionary(x => x.Key, x => x.GroupBy(y => y).ToDictionary(y => y.Key, y => y.Count()));
}
If I start with this data:
char[] punctuation = new char[] { ' ', ',', '.', '?', '-', ':' };
string data1 = "I have two different text files and I have to find 10 longest words that are in both of them. I have to print the list of those words out and write the frequency - how many times they are repeated in those separate files. The problem I have with my current code is that it finds the words, but when it comes to frequency - it combines the frequency count. How can I change the code to know the frequency count for separate files?";
string data2 = "This solution is more general: it works whatever number of files you wish to process. This is an extremely raw query that could be separated in smaller queries, but it gives the logical basis. Other requirements, like only 10 words or minimum word length etc can be easily applied. Please do mind that this a bare-bone example, without any safety checks. It also omits reading data from files. The problem I have with my current code is that it finds the words, but when it comes to frequency - it combines the frequency count. How can I change the code to know the frequency count for separate files?";
I get this result:
"requirements": { "book2" = 1 }
"different": { "book1" = 1 }
"frequency": { "book1" = 4, "book2" = 3 }
"extremely": { "book2" = 1 }
"separated": { "book2" = 1 }
"repeated": { "book1" = 1 }
"separate": { "book1" = 2, "book2" = 1 }
"combines": { "book1" = 1, "book2" = 1 }
"solution": { "book2" = 1 }
"whatever": { "book2" = 1 }
If performance is not critical here, I would simplify it this way:
public static void Method()
{
var a = "A deep blue raffle, very deep and blue, raffle raffle. An old one was there";
var b = "deep blue raffle, very very very long and blue, raffle RAFFLE. A new one was there";
char[] punctuation = { '.', ',', '!', '?', ':', ';', '(', ')', '\n' };
var fileOne = new string(a.Where(c => punctuation.Contains(c) is false).ToArray()).Split(" ");
var fileTwo = new string(b.Where(c => punctuation.Contains(c) is false).ToArray()).Split(" ");
var duplicates = fileOne.Intersect(fileTwo, StringComparer.OrdinalIgnoreCase);
var result = new List<(int, int, string)>(duplicates.Count());
foreach(var duplicat in duplicates)
{
result.Add((fileOne.Count(x => x.Equals(duplicat, StringComparison.OrdinalIgnoreCase)), fileTwo.Count(x => x.Equals(duplicat, StringComparison.OrdinalIgnoreCase)), duplicat));
}
foreach (var val in result)
{
Output.WriteLine($"Word: {val.Item3} | In file one: {val.Item1} | In file two: {val.Item2}");
}
}
This will give you the result of
Word: A | In file one: 1 | In file two: 1
Word: deep | In file one: 2 | In file two: 1
Word: blue | In file one: 2 | In file two: 2
Word: raffle | In file one: 3 | In file two: 3
Word: very | In file one: 1 | In file two: 3
Word: and | In file one: 1 | In file two: 1
Word: one | In file one: 1 | In file two: 1
Word: was | In file one: 1 | In file two: 1
Word: there | In file one: 1 | In file two: 1
Other requirements, like only 10 words or a minimum word length, can easily be applied.
Please bear in mind that this is a bare-bones example without any safety checks. It also omits reading the data from files.
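As one way of applying the "only 10 words" requirement together with separate counts per text, here is a minimal sketch. The class name CommonWords, the method name LongestCommon and the tuple field names are made up for illustration, and it assumes case-insensitive matching and that each text has already been read into a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonWords
{
    // Hypothetical helper: returns the longest words found in BOTH texts,
    // together with a separate occurrence count for each text.
    public static List<(string Word, int InFirst, int InSecond)> LongestCommon(
        string data1, string data2, char[] punctuation, int take = 10)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;

        // Count the occurrences of every word in one text, ignoring case.
        Dictionary<string, int> Count(string text) =>
            text.Split(punctuation, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(w => w, comparer)
                .ToDictionary(g => g.Key, g => g.Count(), comparer);

        var first = Count(data1);
        var second = Count(data2);

        return first.Keys
            .Where(w => second.ContainsKey(w))   // keep only words present in both texts
            .OrderByDescending(w => w.Length)    // longest first
            .ThenBy(w => w, comparer)            // stable order among equal lengths
            .Take(take)
            .Select(w => (Word: w, InFirst: first[w], InSecond: second[w]))
            .ToList();
    }
}
```

Calling CommonWords.LongestCommon(text1, text2, punctuation) returns at most ten entries, each carrying its own count for the first and the second text, so the two frequencies are never merged.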
EDIT: I was not very pleased with my original solution, so I reworked it. I abandoned one thing I liked in my previous solution: it didn't depend on an external list of punctuation characters, because that list was generated by the query itself. But that made the query longer and more complicated.
In case you are curious about a different coding style, here is a solution using LINQ.
This solution is more general: it works for any number of files you wish to process.
This is a LINQPad query that you can run directly via copy/paste, but you need to provide the text files, of course:
// Choose here how many different words you want.
var resultCount = 10;
// Add as many files as needed.
var Files = new List<string>
{
@"C:\Temp\FileA.txt",
@"C:\Temp\FileB.txt",
@"C:\Temp\FileC.txt",
};
char[] punctuation = { '.', ',', '!', '?', ':', ';', '(', ')', '\n', '"', ' ' };
// Perform the calculation.
var LongestCommonWords = Files
.SelectMany(f => File.ReadAllText(f)
.Split(punctuation, StringSplitOptions.TrimEntries)
.ToLookup(w => ( word: w.ToLower(), fileName: f))
)
.ToLookup(e => e.Key.word)
.Where(g => g.Count() == Files.Count())
.OrderByDescending(g => g.Key.Length)
.Take(resultCount); // Take only the desired amount (10 for instance)
// Display the results.
foreach (var word in LongestCommonWords)
{
var occurences = string.Join(" / ", word.Select(g => $"{Path.GetFileName(g.Key.fileName)} - {g.Count()}"));
Console.WriteLine($"{word.Key} - {occurences}");
}
Here is an output obtained with the content of three Wikipedia pages:
contribution - FileA.txt - 9 / FileB.txt - 1 / FileC.txt - 5
subsequently - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 1
introduction - FileA.txt - 1 / FileB.txt - 4 / FileC.txt - 3
alternative - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 1
independent - FileA.txt - 5 / FileB.txt - 3 / FileC.txt - 3
significant - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 3
established - FileA.txt - 1 / FileB.txt - 1 / FileC.txt - 1
outstanding - FileA.txt - 1 / FileB.txt - 3 / FileC.txt - 3
programming - FileA.txt - 1 / FileB.txt - 2 / FileC.txt - 4
university - FileA.txt - 44 / FileB.txt - 17 / FileC.txt - 7
I'm trying to merge two lists and I thought I had a solution but if there are two PackItems with the same length the results are not as expected.
Expectations/requirements.
Both lists contain the same total number of pieces for each length.
EDIT: Added code to clarify the input requirements.
The same length can be used in multiple PackItems.
The same length can be produced out of multiple CoilNums.
The goal is to obtain a list that contains a unique entry for each PackItem.ID/CoilNum pair.
The requirement for the output is that the total number of pieces for each length matches the input lists.
Here is the code I have so far.
public class PackItem
{
public int ID { get; set; }
public int Quantity { get; set; }
public string Length { get; set; }
}
public class ProductionInfo
{
public ProductionInfo AddID(PackItem item)
{
LineID = item.ID;
Quantity = Math.Min(Quantity, item.Quantity);
return this;
}
public int LineID { get; set; }
public string CoilNum { get; set; }
public int Quantity { get; set; }
public string Length { get; set; }
}
private void DoTest()
{
var packItems = new List<PackItem>()
{
new PackItem() {ID = 4, Quantity = 5, Length = "10"},
new PackItem() {ID = 5, Quantity = 2, Length = "4"},
new PackItem() {ID = 6, Quantity = 1, Length = "4"}
};
var productionInfoList = new List<ProductionInfo>()
{
new ProductionInfo() { CoilNum = "A", Quantity = 4, Length = "10"},
new ProductionInfo() { CoilNum = "B", Quantity = 1, Length = "10"},
new ProductionInfo() { CoilNum = "B", Quantity = 2, Length = "4"},
new ProductionInfo() { CoilNum = "A", Quantity = 1, Length = "4"},
};
//assert that both lists meet input requirements
var result1 = "";
var sum1 = packItems.GroupBy(i => i.Length);
foreach (var group in sum1) result1 += $"{group.Sum(i=>i.Quantity)} | {group.Key}\n";
var input2 = "";
var result2 = "";
var sum2 = productionInfoList.GroupBy(i => i.Length);
foreach (var group in sum2) result2 += $"{group.Sum(i => i.Quantity)} | {group.Key}\n";
Console.WriteLine("packItems: \nSum(Quantity) | Length");
Console.WriteLine(result1);
Console.WriteLine("productionInfoList: \nSum(Quantity) | Length");
Console.WriteLine(result2);
if (result1 == result2)
{
Console.WriteLine("Both Lists have the same quantity of each length");
}
else
{
Console.WriteLine("Error: Both Lists do not have the same quantity of each length");
return;
}
var merged = productionInfoList.SelectMany(x => packItems, (x, y) => new { x, y })
.Where(i => i.x.Length == i.y.Length)
.Select(i => i.x.AddID(i.y));
Console.WriteLine("ID | Coil | Qty | Length");
foreach (var item in merged)
{
Console.WriteLine($"{item.LineID} | {item.CoilNum} | {item.Quantity} | {item.Length}");
}
}
//expected output
ID | Coil | Qty | Length
4 | A | 4 | 10
4 | B | 1 | 10
5 | B | 2 | 4
6 | A | 1 | 4
//actual output
ID | Coil | Qty | Length
4 | A | 4 | 10
4 | B | 1 | 10
5 | B | 2 | 4
6 | B | 1 | 4
5 | A | 1 | 4
6 | A | 1 | 4
I'm stuck at this point, and the only way I can think of is splitting each of these lists into individual items of quantity one, and then compiling a list by looping through them one by one.
Is there a way this can be done with Linq?
Here is a method that produces the correct output. Is there an easier way to do this? Can this be done with Linq only?
private void DoTest()
{
var packItems = new List<PackItem>()
{
new PackItem() {ID = 4, Quantity = 5, Length = "10"},
new PackItem() {ID = 5, Quantity = 2, Length = "4"},
new PackItem() {ID = 6, Quantity = 1, Length = "4"}
};
var productionInfoList = new List<ProductionInfo>()
{
new ProductionInfo() { CoilNum = "A", Quantity = 4, Length = "10"},
new ProductionInfo() { CoilNum = "B", Quantity = 1, Length = "10"},
new ProductionInfo() { CoilNum = "B", Quantity = 2, Length = "4"},
new ProductionInfo() { CoilNum = "A", Quantity = 1, Length = "4"},
};
//first create a list with one item for each pieces
var individualProduction = new List<ProductionInfo>();
foreach (var item in productionInfoList)
{
for (int i = 0; i < item.Quantity; i++)
{
individualProduction.Add(new ProductionInfo()
{
Quantity = 1,
Length = item.Length,
CoilNum = item.CoilNum
});
}
}
//next loop through and assign all the pack line ids
foreach (var item in individualProduction)
{
var packItem = packItems.FirstOrDefault(i => i.Quantity > 0 && i.Length == item.Length);
if (packItem != null)
{
packItem.Quantity -= 1;
item.LineID = packItem.ID;
}
else
{
item.Quantity = 0;
}
}
//now group them back into a merged list
var grouped = individualProduction.GroupBy(i => (i.CoilNum, i.LineID, i.Length));
//output the merged list
var merged1 = grouped.Select(g => new ProductionInfo()
{
LineID = g.Key.LineID,
CoilNum = g.Key.CoilNum,
Length = g.Key.Length,
Quantity = g.Count()
});
}
Quite unclear ...
This one is close to the desired result but doesn't take quantity into account, so the first matching PackItem is always chosen. Decreasing pItem.Quantity and selecting the next available pItem.ID where Quantity > 0 would fix that, but it requires more code :)
var results = productionInfoList.Select(pInfo =>
{
var pItem = packItems.First(z => z.Length == pInfo.Length);
return new { pItem.ID, pInfo.CoilNum, pInfo.Quantity, pInfo.Length };
}).ToList();
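The quantity handling mentioned above can be sketched as follows: walk through the production rows in order and let each one draw from the pack items of the same length that still have pieces left, splitting a production row across several pack items when needed. This is only a sketch. It assumes the PackItem and ProductionInfo shapes from the question (redeclared here in reduced form so the block stands alone), the Allocator class and Merge method are hypothetical names, and the rule "the first pack item with remaining quantity wins" is an assumption about the intended matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PackItem
{
    public int ID { get; set; }
    public int Quantity { get; set; }
    public string Length { get; set; }
}

public class ProductionInfo
{
    public int LineID { get; set; }
    public string CoilNum { get; set; }
    public int Quantity { get; set; }
    public string Length { get; set; }
}

public static class Allocator
{
    // Hypothetical helper: assigns each production row to pack items of the same length.
    public static List<ProductionInfo> Merge(
        IEnumerable<PackItem> packItems, IEnumerable<ProductionInfo> production)
    {
        // Work on copies so the caller's pack items keep their quantities.
        var remaining = packItems
            .Select(p => new PackItem { ID = p.ID, Quantity = p.Quantity, Length = p.Length })
            .ToList();
        var result = new List<ProductionInfo>();

        foreach (var info in production)
        {
            int toAssign = info.Quantity;
            foreach (var pack in remaining.Where(p => p.Length == info.Length))
            {
                if (toAssign == 0) break;
                int taken = Math.Min(toAssign, pack.Quantity);
                if (taken == 0) continue; // this pack item is already used up
                pack.Quantity -= taken;
                toAssign -= taken;
                result.Add(new ProductionInfo
                {
                    LineID = pack.ID,
                    CoilNum = info.CoilNum,
                    Quantity = taken,
                    Length = info.Length
                });
            }
        }
        return result;
    }
}
```

With the sample lists from the question, Allocator.Merge(packItems, productionInfoList) produces the four rows listed under the expected output, in the same order.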
Given the stated goal of a list with a unique entry for each PackItem.ID/CoilNum pair, your actual output is correct, since every ID/CoilNum pair in it is unique. What you are looking for is a different kind of uniqueness.
var l = packItems.Join(productionInfoList, x => x.Length, y => y.Length, (x, y) => { y.AddID(x); return y; }).GroupBy(x => new { x.CoilNum, x.Length }).Select(x => x.First());
The exact rules of the case are unclear, but here I am using Length as the key for the join operation (I would recommend having a different, truly unique key for join operations).
I'm trying to merge several values from different lists into one line.
for example:
list A = [1,2,3,4,5,6,7,8,9]
list B = [A,B,C,D]
list C = [!,?,-]
Then I'll loop through all the lists, and the output should be:
line = [1,A,!]
line = [2,B,?]
line = [3,C,-]
line = [4,D,NULL]
line = [5,NULL, NULL]
line = [6 ,NULL ,NULL]...
The result will be added into one object.
So far I have tried to iterate through my lists with nested foreach loops, but after debugging it is clear that my approach can't work:
foreach (var item in list1){
foreach (var item2 in list2){
foreach (var item3 in list3){
string line = makeStringFrom(item, item2, item3);
}
}
}
But I don't know how to make it work.
You can also use LINQ functions.
var listA = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var listB = new List<string> { "A", "B", "C", "D" };
var listC = new List<string> { "!", "?", "-" };
var result = Enumerable.Range(0, Math.Max(Math.Max(listA.Count, listB.Count), listC.Count))
.Select(i => new
{
a = listA.ElementAtOrDefault(i),
b = listB.ElementAtOrDefault(i),
c = listC.ElementAtOrDefault(i)
}).ToList();
foreach (var item in result)
{
Console.WriteLine("{0} {1} {2}", item.a, item.b, item.c);
}
Result:
1 A !
2 B ?
3 C -
4 D
5
6
7
8
9
The general method would be:
Find the maximum length of all of the lists
Then create a loop to go from 0 to the max length-1
Check if each list contains that index of the item, and if so,
retrieve the value, otherwise return null
Build your line from those values
You can use this:
var A = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9" };
var B = new List<string>() { "A", "B", "C", "D" };
var C = new List<string>() { "!", "?", "-"};
var lists = new List<List<string>>() { A, B, C };
int count = 0;
foreach ( var list in lists )
count = Math.Max(count, list.Count);
var result = new List<List<string>>();
for ( int index = 0; index < count; index++ )
{
var item = new List<string>();
result.Add(item);
foreach ( var list in lists )
item.Add(index < list.Count ? list[index] : null);
}
foreach ( var list in result )
{
string str = "";
foreach ( var item in list )
str += ( item == null ? "(null)" : item ) + " ";
str = str.TrimEnd(' '); // TrimEnd returns a new string, so the result must be assigned
Console.WriteLine(str);
}
We create a list of the lists so that the code works for any number of lists.
Next we take the maximum count over these lists.
Then we process them as described by the algorithm:
We create a new list.
We add this list to the result, which is a list of lists.
We add to this new list the item at the current index from each of the other lists, using null when a list has no more items.
If you plan to handle many large lists, you can use a StringBuilder to reduce the memory cost of string concatenation.
Output
1 A !
2 B ?
3 C -
4 D (null)
5 (null) (null)
6 (null) (null)
7 (null) (null)
8 (null) (null)
9 (null) (null)
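The StringBuilder suggestion above could look like the sketch below. It assumes the same list-of-lists shape as the code above; the LineFormatter class and FormatLines method are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class LineFormatter
{
    // Builds one line per index; lists that are too short contribute "(null)".
    public static List<string> FormatLines(List<List<string>> lists)
    {
        int count = lists.Count == 0 ? 0 : lists.Max(l => l.Count);
        var lines = new List<string>(count);
        var sb = new StringBuilder();

        for (int index = 0; index < count; index++)
        {
            sb.Clear(); // reuse the same buffer for every line
            for (int i = 0; i < lists.Count; i++)
            {
                if (i > 0) sb.Append(' '); // separator between columns only
                var list = lists[i];
                string value = index < list.Count ? list[index] : null;
                sb.Append(value ?? "(null)");
            }
            lines.Add(sb.ToString());
        }
        return lines;
    }
}
```

Because the separator is only written between columns, no trailing space has to be trimmed afterwards.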
I have a list of simple Player objects, as follows:
Name | Team | Position | Total
Tom Brady | Team 1 | QB | 200
Adrian Peterson | Team 1 | RB | 250
Calvin Johnson | Team 2 | WR | 260
LeVon Bell | Team 2 | RB | 220
Peyton Manning | Team 3 | QB | 220
Arian Foster | Team 3 | RB | 220
This is a simple sample, in reality there are about 200 records. What I want to do is to get all possible combinations of players per team, and sum their total, so the end product would be as follows
Possibilities
Teams | Players | Total
Team 1 | Tom Brady, Adrian Peterson | 450
Team 2 | Calvin Johnson, LeVon Bell | 480
Team 3 | Peyton Manning, Arian Foster | 440
Basically I am looking for trade possibilities, so I need to get combinations of players per team. The largest possible combination I am looking for is 5 players per team, where I would have the players and their points combined in a new object. Right now I can get there with the code below.
var playerList = players.GroupBy(p => p.Team)
.Select(g => new
{
Team = g.Key,
g
}).ToList();
List<Possibilities> possibilities = new List<Possibilities>();
foreach (var a in playerList)
{
List<Possibilities> onePlayer = (from b in a.g
select new Possibilities
{
Players = b.Name,
Total = b.Total,
Team = a.Team
}).ToList();
List<Possibilities> twoPlayer = (from b in a.g
from c in a.g
select new Possibilities
{
Players = b.Name + ", " + c.Name,
Total = b.Total + c.Total,
Team = a.Team
}).ToList();
And this gives me all combinations of 1, 2 and 3 players per team, but I want to add 4 and 5. This also does not remove duplicate combinations (Player 1, Player 2 and Player 2, Player 1). Is there a cleaner way to do this?
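You can generate all combinations of a limited set of items by counting using a binary number. Each set bit in the binary number represents an item being present in the combination. With a 32-bit int this allows at most 31 items, and the code below is limited to 30, because the number of combinations itself must also fit in an int.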
For example, counting (in binary) for a set of 3 items:
000, 001, 010, 011, 100, 101, 110, 111
A 1 in the binary number indicates that the corresponding item in the set should be included in that combination.
If you want to ensure that the combination includes a number of items in a certain range, you need to count the bits set in the binary number and check whether that count is in range. There's an efficient way to do that using bit twiddling (a population count, as in NumBitsSet below).
Putting that together gives code like the following. Run it and check the output. Hopefully you can see how to use it with your program.
This example produces all combinations of between 2 and 3 items taken from A, B, C, D. Its output is:
A,B
A,C
B,C
A,B,C
A,D
B,D
A,B,D
C,D
A,C,D
B,C,D
The code is:
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
public class Program
{
public static void Main()
{
var data = new [] {"A", "B", "C", "D"};
// Get all the combinations of elements from A,B,C,D with between 2 and 3 values:
var combinations = Combinations(data, 2, 3);
// Combinations() has returned an IEnumerable<IEnumerable<T>>,
// that is, a sequence of subsequences where each subsequence is one combination.
foreach (var combination in combinations)
Console.WriteLine(string.Join(",", combination));
}
public static IEnumerable<IEnumerable<T>> Combinations<T>(T[] input, int minElements, int maxElements)
{
int numCombinations = 2 << (input.Length - 1);
for (int bits = 0; bits < numCombinations; ++bits)
{
int bitCount = NumBitsSet(bits);
if (minElements <= bitCount && bitCount <= maxElements)
yield return combination(input, bits);
}
}
private static IEnumerable<T> combination<T>(T[] input, int bits)
{
for (int bit = 1, i = 0; i < input.Length; ++i, bit <<= 1)
if ((bits & bit) != 0)
yield return input[i];
}
public static int NumBitsSet(int i)
{
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
}
}
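To connect this back to the question, the sketch below applies the same bit-mask idea per team and sums the totals. The Player and Possibility classes are assumed shapes based on the tables in the question, TradeCombinations and ForTeams are hypothetical names, and the subset enumeration is re-implemented inline so the block stands alone. Like the code above, it only suits small teams, because it walks all 2^n bit patterns for a team of n players.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Player
{
    public string Name { get; set; }
    public string Team { get; set; }
    public int Total { get; set; }
}

public class Possibility
{
    public string Team { get; set; }
    public string Players { get; set; }
    public int Total { get; set; }
}

public static class TradeCombinations
{
    // Hypothetical helper: yields every combination of minSize..maxSize players per team.
    public static IEnumerable<Possibility> ForTeams(
        IEnumerable<Player> players, int minSize, int maxSize)
    {
        foreach (var team in players.GroupBy(p => p.Team))
        {
            var roster = team.ToArray();
            if (roster.Length > 30)
                throw new ArgumentException("Team too large for an int bit mask.");

            int limit = 1 << roster.Length; // number of subsets of this roster
            for (int bits = 1; bits < limit; bits++)
            {
                // Collect the players whose bit is set in the current mask.
                var chosen = new List<Player>();
                for (int i = 0; i < roster.Length; i++)
                    if ((bits & (1 << i)) != 0)
                        chosen.Add(roster[i]);

                if (chosen.Count < minSize || chosen.Count > maxSize)
                    continue;

                yield return new Possibility
                {
                    Team = team.Key,
                    Players = string.Join(", ", chosen.Select(p => p.Name)),
                    Total = chosen.Sum(p => p.Total)
                };
            }
        }
    }
}
```

Each subset is visited exactly once, so "Player 1, Player 2" and "Player 2, Player 1" cannot both appear. Calling TradeCombinations.ForTeams(players, 1, 5) would cover the one-to-five-player range asked about.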
Although probably very inefficient, the below might work. Of course that's just a simple example and you'll need to adjust the code to your situation.
static void Main(string[] args)
{
var nums = new[] { 1, 2, 3, 4, 5, 6 };
var combinations = new List<int[]>();
int[] current;
foreach (int i in nums)
{
combinations.Add(new[] { i });
foreach (int j in nums.Where(n => n != i))
{
current = new[] { i, j };
if (!combinations.Any(c => current.Length == c.Length && current.All(n => c.Contains(n))))
{
combinations.Add(current);
}
foreach (int k in nums.Where(n => n != i && n != j))
{
current = new[] { i, j, k };
if (!combinations.Any(c => current.Length == c.Length && current.All(n => c.Contains(n))))
{
combinations.Add(current);
}
foreach (int l in nums.Where(n => n != i && n != j && n != k))
{
current = new[] { i, j, k, l };
if (!combinations.Any(c => current.Length == c.Length && current.All(n => c.Contains(n))))
{
combinations.Add(current);
}
foreach (int m in nums.Where(n => n != i && n != j && n != k && n != l))
{
current = new[] { i, j, k, l, m };
if (!combinations.Any(c => current.Length == c.Length && current.All(n => c.Contains(n))))
{
combinations.Add(current);
}
}
}
}
}
}
foreach (var c in combinations)
{
foreach (var num in c)
{
Console.Write(num + " ");
}
Console.WriteLine();
}
Console.ReadKey();
}
I've imported a DataTable from a SQL database using SqlDataAdapter and its Fill method.
My datatable looks like this:
Timestamp(unix time) | Value
x | 10
x | 42
x | 643
y | 5
y | 9
y | 70
...and so on. The table contains a lot of values (1000+) but always has three rows with the same timestamp.
Now I want it to look like this:
Timestamp(unix time) | Value 1 | Value 2 | Value 3
x | 10 | 42 | 643
y | 5 | 9 | 70
How can I sort it this way?
(If there are more than three values, the program should just insert the first three values it finds.)
Thanks for any help!
Thanks for your approach! I solved it myself now.
This is how I've done it:
var grouped = from myRow in myDataTable.AsEnumerable()
group myRow by myRow.Field<int>("TIMESTAMP");
foreach (var timestamp in grouped)
{
string[] myRow = new string[5];
myRow[0] = timestamp.Key.ToString();
int i = 1;
foreach (var value in timestamp)
{
myRow[i] = value.Field<double>("VALUE").ToString();
i++;
if (i > 4)
break;
}
mySortedTable.Rows.Add(myRow);
}
I think this may also be solvable in SQL, but if you want to do it programmatically, I have tested the following in LINQPad:
void Main()
{
var list = new List<Tuple<string,int>> {
Tuple.Create("x", 10),
Tuple.Create("x", 42),
Tuple.Create("x", 643),
Tuple.Create("y", 5),
Tuple.Create("y", 9),
Tuple.Create("y", 70),
};
var result =
from grp in list.GroupBy(t => t.Item1)
let firstThree = grp.Select(t => t.Item2).Take(3).ToList()
select new {
Key = grp.Key,
Value1 = firstThree[0],
Value2 = firstThree[1],
Value3 = firstThree[2] };
foreach (var item in result)
Console.WriteLine(item);
}
It assumes that each group has at least three elements; otherwise you'll get an ArgumentOutOfRangeException.
While the end result is an anonymous type, you could easily pipe the results of the operation into a DataRow instead.
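As a sketch of that last step, the following builds a DataTable directly and pads groups that have fewer than three values with DBNull instead of throwing. The column names, the PivotHelper class and the PivotFirstThree method are assumptions for illustration, as is the choice of string keys and int values taken from the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class PivotHelper
{
    // Hypothetical helper: one output row per key, holding the first three values.
    public static DataTable PivotFirstThree(IEnumerable<Tuple<string, int>> rows)
    {
        var table = new DataTable();
        table.Columns.Add("Timestamp", typeof(string));
        table.Columns.Add("Value1", typeof(int));
        table.Columns.Add("Value2", typeof(int));
        table.Columns.Add("Value3", typeof(int));

        foreach (var grp in rows.GroupBy(t => t.Item1))
        {
            // Take at most three values; extra values are ignored.
            var values = grp.Select(t => t.Item2).Take(3).ToList();

            var row = table.NewRow();
            row["Timestamp"] = grp.Key;
            for (int i = 0; i < 3; i++)
            {
                // Missing values become DBNull rather than causing an exception.
                row[i + 1] = i < values.Count ? (object)values[i] : DBNull.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Whether a missing value should be stored as DBNull or should make the whole group invalid depends on the data, so treat the padding as one possible choice.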