Finding the 10 longest words that are in both text files (C#)

I have two different text files and I have to find the 10 longest words that appear in both of them. I have to print the list of those words and write out their frequency, that is, how many times each word is repeated in each file separately. The problem with my current code is that it finds the words, but it combines the frequency counts. How can I change the code to get the frequency count for each file separately?
Here is my code for finding words that are in both text files:
public static Dictionary<string, int> PopularWords(string data1, string data2, char[] punctuation)
{
string[] book1 = data1.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
string[] book2 = data2.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, int> matches = new Dictionary<string, int>();
for (int i = 0; i < book1.Length; i++)
{
if (matches.ContainsKey(book1[i]))
{
matches[book1[i]]++;
continue;
}
for (int j = 0; j < book2.Length; j++)
{
if (book1[i] == book2[j])
{
if (matches.ContainsKey(book1[i]))
{
matches[book1[i]]++;
} else
{
matches.Add(book1[i], 2);
}
}
}
}
return matches;
}
And here is my code for reading and printing:
public static void ProcessPopular(string data, string data1, string results)
{
char[] punctuation = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\n' };
string lines = File.ReadAllText(data, Encoding.UTF8);
string lines2 = File.ReadAllText(data1, Encoding.UTF8);
var popular = PopularWords(lines, lines2, punctuation);
KeyValuePair<string, int>[] popularWords = popular.ToArray();
Array.Sort(popularWords, (x, y) => y.Key.Length.CompareTo(x.Key.Length));
using (var writerF = File.CreateText(results))
{
int foundWords = 0;
writerF.WriteLine("{0, -25} | {1, -35} | {2, -35}", "Longest words", "Frequency in 1 .txt file", "Frequency in 2 .txt file");
writerF.WriteLine(new string('-', 101));
// not finished
}
}

Here's my take on this:
public static Dictionary<string, Dictionary<string, int>> PopularWords(string data1, string data2, char[] punctuation)
{
string[] book1 = data1.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
string[] book2 = data2.Split(punctuation, StringSplitOptions.RemoveEmptyEntries);
return
Enumerable
.Concat(
book1.Select(x => (word: x, book: "book1")),
book2.Select(x => (word: x, book: "book2")))
.ToLookup(x => x.word, x => x.book)
.OrderByDescending(x => x.Key.Length)
.Take(10)
.ToDictionary(x => x.Key, x => x.GroupBy(y => y).ToDictionary(y => y.Key, y => y.Count()));
}
If I start with this data:
char[] punctuation = new char[] { ' ', ',', '.', '?', '-', ':' };
string data1 = "I have two different text files and I have to find 10 longest words that are in both of them. I have to print the list of those words out and write the frequency - how many times they are repeated in those separate files. The problem I have with my current code is that it finds the words, but when it comes to frequency - it combines the frequency count. How can I change the code to know the frequency count for separate files?";
string data2 = "This solution is more general: it works whatever number of files you wish to process. This is an extremely raw query that could be separated in smaller queries, but it gives the logical basis. Other requirements, like only 10 words or minimum word length etc can be easily applied. Please do mind that this a bare-bone example, without any safety checks. It also omits reading data from files. The problem I have with my current code is that it finds the words, but when it comes to frequency - it combines the frequency count. How can I change the code to know the frequency count for separate files?";
I get this result:
"requirements": { "book2" = 1 }
"different": { "book1" = 1 }
"frequency": { "book1" = 4, "book2" = 3 }
"extremely": { "book2" = 1 }
"separated": { "book2" = 1 }
"repeated": { "book1" = 1 }
"separate": { "book1" = 2, "book2" = 1 }
"combines": { "book1" = 1, "book2" = 1 }
"solution": { "book2" = 1 }
"whatever": { "book2" = 1 }
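Note that the query above keeps the ten longest words overall, so words found in only one text (such as "requirements") still show up in the result. If the goal is strictly the words present in both files, one possible sketch is to count each text separately and keep only the shared keys. The names CommonWords, CountWords and LongestCommon are made up for this illustration, and case-insensitive matching is an assumption rather than something the question specifies:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class CommonWords
{
    // Counts how often each word occurs in one text (assumption: case-insensitive).
    private static Dictionary<string, int> CountWords(string text, char[] separators)
    {
        return text
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    // Returns up to `top` of the longest words present in BOTH texts,
    // each with its own count per text.
    public static List<(string Word, int InFirst, int InSecond)> LongestCommon(
        string data1, string data2, char[] separators, int top = 10)
    {
        var counts1 = CountWords(data1, separators);
        var counts2 = CountWords(data2, separators);

        return counts1
            .Where(kv => counts2.ContainsKey(kv.Key))          // keep shared words only
            .OrderByDescending(kv => kv.Key.Length)            // longest first
            .ThenBy(kv => kv.Key, StringComparer.OrdinalIgnoreCase) // stable tie-break
            .Take(top)
            .Select(kv => (Word: kv.Key, InFirst: kv.Value, InSecond: counts2[kv.Key]))
            .ToList();
    }
}
```

Calling CommonWords.LongestCommon(data1, data2, punctuation) returns one tuple per shared word, so the two frequencies can be written into separate columns of the output file.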

To simplify, if performance is not the key here, I would go this way:
public static void Method()
{
var a = "A deep blue raffle, very deep and blue, raffle raffle. An old one was there";
var b = "deep blue raffle, very very very long and blue, raffle RAFFLE. A new one was there";
char[] punctuation = { '.', ',', '!', '?', ':', ';', '(', ')', '\n' };
var fileOne = new string(a.Where(c => punctuation.Contains(c) is false).ToArray()).Split(" ");
var fileTwo = new string(b.Where(c => punctuation.Contains(c) is false).ToArray()).Split(" ");
var duplicates = fileOne.Intersect(fileTwo, StringComparer.OrdinalIgnoreCase);
var result = new List<(int, int, string)>(duplicates.Count());
foreach(var duplicat in duplicates)
{
result.Add((fileOne.Count(x => x.Equals(duplicat, StringComparison.OrdinalIgnoreCase)), fileTwo.Count(x => x.Equals(duplicat, StringComparison.OrdinalIgnoreCase)), duplicat));
}
foreach (var val in result)
{
Output.WriteLine($"Word: {val.Item3} | In file one: {val.Item1} | In file two: {val.Item2}");
}
}
This will give you the result of
Word: A | In file one: 1 | In file two: 1
Word: deep | In file one: 2 | In file two: 1
Word: blue | In file one: 2 | In file two: 2
Word: raffle | In file one: 3 | In file two: 3
Word: very | In file one: 1 | In file two: 3
Word: and | In file one: 1 | In file two: 1
Word: one | In file one: 1 | In file two: 1
Word: was | In file one: 1 | In file two: 1
Word: there | In file one: 1 | In file two: 1
Other requirements, such as keeping only 10 words or enforcing a minimum word length, can easily be applied.
Please bear in mind that this is a bare-bones example, without any safety checks. It also omits reading data from files.
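As a rough illustration of how the extra requirements could be layered on top of the result list, here is a small sketch. ResultFilter and LongestOnly are hypothetical names, and the tuple layout (count in file one, count in file two, word) is assumed to match the result list built above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class ResultFilter
{
    // Keeps words of at least `minLength` characters and returns the `top`
    // longest ones. OrderByDescending is a stable sort, so words of equal
    // length keep their original relative order.
    public static List<(int InOne, int InTwo, string Word)> LongestOnly(
        IEnumerable<(int InOne, int InTwo, string Word)> result, int minLength, int top)
    {
        return result
            .Where(r => r.Word.Length >= minLength)
            .OrderByDescending(r => r.Word.Length)
            .Take(top)
            .ToList();
    }
}
```

For the question's requirement this would be called as ResultFilter.LongestOnly(result, 1, 10).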

EDIT: I was not very pleased with my original solution, so I reworked it. I abandoned one thing I liked in my previous solution: it did not depend on an external list of punctuation characters, because that list was generated by the query itself. But that made the query longer and more complicated.
In case you are curious about a different coding style, here is a solution using LINQ.
This solution is more general: it works for any number of files you wish to process.
This is a LINQPad query that you can run directly via copy/paste, but you need to provide the text files, of course:
// Choose here how many different words you want.
var resultCount = 10;
// Add as many files as needed.
var Files = new List<string>
{
@"C:\Temp\FileA.txt",
@"C:\Temp\FileB.txt",
@"C:\Temp\FileC.txt",
};
char[] punctuation = { '.', ',', '!', '?', ':', ';', '(', ')', '\n', '"', ' ' };
// Perform the calculation.
var LongestCommonWords = Files
.SelectMany(f => File.ReadAllText(f)
.Split(punctuation, StringSplitOptions.TrimEntries)
.ToLookup(w => ( word: w.ToLower(), fileName: f))
)
.ToLookup(e => e.Key.word)
.Where(g => g.Count() == Files.Count())
.OrderByDescending(g => g.Key.Length)
.Take(resultCount); // Take only the desired amount (10 for instance)
// Display the results.
foreach (var word in LongestCommonWords)
{
var occurrences = string.Join(" / ", word.Select(g => $"{Path.GetFileName(g.Key.fileName)} - {g.Count()}"));
Console.WriteLine($"{word.Key} - {occurrences}");
}
Here is an output obtained with the content of three Wikipedia pages:
contribution - FileA.txt - 9 / FileB.txt - 1 / FileC.txt - 5
subsequently - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 1
introduction - FileA.txt - 1 / FileB.txt - 4 / FileC.txt - 3
alternative - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 1
independent - FileA.txt - 5 / FileB.txt - 3 / FileC.txt - 3
significant - FileA.txt - 2 / FileB.txt - 1 / FileC.txt - 3
established - FileA.txt - 1 / FileB.txt - 1 / FileC.txt - 1
outstanding - FileA.txt - 1 / FileB.txt - 3 / FileC.txt - 3
programming - FileA.txt - 1 / FileB.txt - 2 / FileC.txt - 4
university - FileA.txt - 44 / FileB.txt - 17 / FileC.txt - 7

Related

Split Array into 2D based on 2 parameters C#

I have a text file which I have split up into a string array based on new line.
string[] arr = s.Split('\n');
Now, I need to further categorize this into a 2-dimensional array wherein each column is a new "transaction".
So the text file basically contains info about bank transactions, an example being given below:
21......
22....
23.....
31....
32.....
31.....
32.....
21....
21.....
22....
The number at the start of each line identifies the record type, and 21 marks the start of a new tx record. I want to turn this into a 2D array in which each column is one tx, running from a line starting with 21 up to the line just before the next 21.
for (int i = 0; i < arr.Length; i++)
{
if (arr[i].StartsWith("21"))
{
indices[i] = i;
}
}
I tried to write the code above to check for array element beginning with 21 and then storing the index but it ends up storing all the indices.
Any help will be appreciated!
What you'd need to do is
string[] arr = s.Split('\n');
List<List<string>> listOfLists = new List<List<string>>(); //dynamic multi-dimensional list
//list to hold the lines after the line with "21" and that line
List<string> newList = new List<string>();
listOfLists.Add(newList);
for(int i = 0; i < arr.Length; i++)
{
if(arr[i].StartsWith("21"))
{
if(newList.Count > 0)
{
newList = new List<string>(); //make a new list for a column
listOfLists.Add(newList); //add the list of lines (one column) to the main list
}
}
newList.Add(arr[i]); //add the line to a column
}
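If an actual array of arrays is needed at the end, the same grouping idea can be wrapped in a method and converted with ToArray. This is only a sketch: TransactionGrouper and Group are hypothetical names, and since transactions have different lengths the result is a jagged array (string[][]) rather than a rectangular one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class TransactionGrouper
{
    // Starts a new group at every line beginning with `marker`
    // (and at the very first line, even if it lacks the marker).
    public static string[][] Group(IEnumerable<string> lines, string marker = "21")
    {
        var groups = new List<List<string>>();
        foreach (var line in lines)
        {
            if (groups.Count == 0 || line.StartsWith(marker, StringComparison.Ordinal))
            {
                groups.Add(new List<string>());
            }
            groups[groups.Count - 1].Add(line); // append to the current transaction
        }
        return groups.Select(g => g.ToArray()).ToArray();
    }
}
```

One thing to watch: if the file uses Windows line endings, s.Split('\n') leaves a trailing '\r' on each line, so trimming the lines first may be needed.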
If I understand you right, you can try regular expressions (i.e. instead of splitting, extract transactions):
using System.Linq;
using System.Text.RegularExpressions;
...
string line = "21 A 22 B 23 C 31 D 32 E 31 F 32 G 21 H 21 I 22 J";
var result = Regex
.Matches(line, "21 .*?((?=21 )|$)")
.OfType<Match>()
.Select(match => match.Value)
.ToArray(); // <- let's materialize as an array
Console.Write(string.Join(Environment.NewLine, result));
Outcome:
21 A 22 B 23 C 31 D 32 E 31 F 32 G
21 H
21 I 22 J

Sort List<string> by Leading Numbers

I am having trouble properly sorting my list based on the leading number. When I sort, it starts with 1, then goes to 10, 11, etc.
I am trying to sort the following in order:
1 | Text One
10 | Text Two
11 | Text Three
The method I'm trying to sort is here:
finalnoteslist = finalnoteslist.OrderBy(num => num).ToList();
System.Text.StringBuilder clipData = new System.Text.StringBuilder();
foreach (object value in finalnoteslist)
{
clipData.AppendLine(value.ToString());
}
Clipboard.Clear();
Clipboard.SetText(clipData.ToString());
MessageBox.Show(clipData.ToString() + Environment.NewLine + "NOTES COPIED TO CLIPBOARD. CONTROL + V TO PASTE IN DRAWING");
}
int CompareStringBuilders(System.Text.StringBuilder a, System.Text.StringBuilder b)
{
for (int i = 0; i < a.Length && i < b.Length; i++)
{
var comparison = a[i].CompareTo(b[i]);
if (comparison != 0)
return comparison;
}
return a.Length.CompareTo(b.Length);
}
Split each item by its separator | and parse the first part into an int value. Then sort by that value.
List<string> finalnoteslist = new List<string>()
{ "1 | Text One",
"10 | Text Two",
"11 | Text Three"
};
finalnoteslist = finalnoteslist.OrderBy(x => int.Parse(x.Split('|').First())).ToList();
You could use string.Split to split and get the leading integer, which can be used to sort your list.
finalnoteslist = finalnoteslist.OrderBy(x=> int.Parse(x.Split('|')[0])).ToList();
To Sort the List in-place:
List<string> strings = new List<string>()
{
"1 | Text One", "12 | Text Two", "100 | Text Three", "2 | Text Four"
};
Func<string, int> getNumber = (str) => Int32.Parse(str.Split('|').FirstOrDefault());
strings.Sort((x, y) => getNumber(x).CompareTo(getNumber(y)));
To sort using LINQ (creates a new List):
strings = strings.OrderBy(x => getNumber(x)).ToList();
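The snippets above throw a FormatException as soon as one item does not start with a number. If the list may contain such lines, a more forgiving sketch could use int.TryParse and push unparsable items to the end. LeadingNumberSort, LeadingNumber and Sort are hypothetical names, and placing malformed items last is an assumption about the desired behaviour:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class LeadingNumberSort
{
    // Returns the integer before the first '|', or int.MaxValue when the
    // leading part is not a valid integer (assumption: such items sort last).
    public static int LeadingNumber(string item)
    {
        var head = item.Split('|')[0].Trim();
        return int.TryParse(head, out var n) ? n : int.MaxValue;
    }

    // Returns a new list ordered numerically by the leading number.
    public static List<string> Sort(IEnumerable<string> items)
    {
        return items.OrderBy(x => LeadingNumber(x)).ToList();
    }
}
```

Because OrderBy is stable, items with the same leading number keep their original relative order.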

How to exchange string content in a text file in C#?

I have a text file as follows:
1 ... 3 4 2
2 ... 3 21 4
3 ... 6 4 21 15
4 ... 14 21 12
I want to invert these lines, so that each number appearing after the dotted part is listed together with the first numbers of the lines in which it appears. For example,
1
2 1
3 1 2
4 1 2 3
...
21 3 4
How can I do this?
Note: I obtain the first number group from a text file and edit it line by line. After that, I write the edited strings to the text file. A sample of the code I use to obtain the first number group is given below:
for (var i = 0; i < existingLines.Length; i++)
{
var split = existingLines[i].Split('\t');
var act = i - 1;
var sc1 = int.Parse(split[6]);
var sc2 = int.Parse(split[7]);
appendedLines.Add(string.Format("{0} {1} {2}", act, sc1, sc2));
}
This LINQ code should get you started
string path = "c:\\temp\\test.txt";
using (var sr = new StreamReader(path))
{
var lines = new List<IEnumerable<int>>();
while (!sr.EndOfStream)
{
lines.Add(sr.ReadLine().Split(new[] { '.', ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => int.Parse(x)));
}
foreach (var node in lines.SelectMany(x => x).Distinct().OrderBy(x => x))
{
var predecessors = lines.Where(x => x.Skip(1).Contains(node))
.Select(x => x.First())
.OrderBy(x => x);
Console.WriteLine(node + " " + string.Join(" ", predecessors));
}
}
Output
1
2 1
3 1 2
4 1 2 3
6 3
12 4
14 4
15 3
21 2 3 4
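The query above re-scans (and, because the Select is deferred, re-parses) every line for each node. For larger files, one alternative is to build the inverted index in a single pass. This is a sketch under the same input format as the question; PredecessorIndex and Build are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class PredecessorIndex
{
    // Maps every number in the input to the sorted set of line-leading
    // numbers whose line mentions it after the dotted part.
    public static SortedDictionary<int, SortedSet<int>> Build(IEnumerable<string> lines)
    {
        var index = new SortedDictionary<int, SortedSet<int>>();
        foreach (var line in lines)
        {
            var numbers = line
                .Split(new[] { '.', ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => int.Parse(x))
                .ToArray();
            if (numbers.Length == 0) continue; // skip blank lines

            // Make sure every number has an entry, even without predecessors.
            foreach (var n in numbers)
            {
                if (!index.ContainsKey(n)) index[n] = new SortedSet<int>();
            }
            // The first number of the line is a predecessor of all the others.
            foreach (var n in numbers.Skip(1))
            {
                index[n].Add(numbers[0]);
            }
        }
        return index;
    }
}
```

Each entry can then be printed with Console.WriteLine(entry.Key + " " + string.Join(" ", entry.Value)) while iterating the dictionary, which is already ordered by key.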

How many elements ( values ) are in each line in a text file

What should I use to get the number of elements in each line? An example of the text file is given below. All I want is the number of elements in each line: the first line has 4 elements, the second one 3, and so on.
1 5 4 6
2 4 6
1 9 8 7 5 3
3 2 1 1
private static void Skaitymaz(Trikampis[] trikampiai)
{
string line = null;
using (StreamReader reader = new StreamReader(@"U2.txt"))
{
string eilute = null;
while (null != (eilute = reader.ReadLine()))
{
int[] values = eilute.Split(' ');
}
}
}
Try,
string line = null;
using (StreamReader reader = new StreamReader(@"U2.txt"))
{
string eilute = null;
while (null != (eilute = reader.ReadLine()))
{
string[] values = eilute.Split(' ');
int noOfElement = values.Length;
}
}
You need to get length of the array after split,
values.Length
Something like that (Linq): read each line, split it by space or, probably, tabulation and count the items:
var numbers = File
.ReadLines(@"C:\MyText.txt")
.Select(line => line.Split(new Char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).Length);
// Test: 4, 3, 6, 4
Console.Write(String.Join(", ", numbers));

Split string array into smaller arrays

I've looked around but can't find anything that has helped me. I have the following issue - I have a string array that contains:
[0] = "2.4 kWh # 105.00 c/kWh"
where [0] is the index of the array. I need to split it by a space, so that I can have several smaller arrays. So it should look like:
[0] will contain 2.4
[1] will contain kWh
[2] will contain #
[3] will contain 105.00
[4] will contain c/kWh
I've tried several solutions but none works. Any assistance would be highly appreciated.
string s = "2.4 kWh # 105.00 c/kWh";
string[] words = s.Split(new char [] {' '}); // Split string on spaces.
foreach (string word in words)
{
Console.WriteLine(word);
}
Then you can get the console output as
2.4
kWh
#
105.00
c/kWh
We'll use string[] strings = new[] { "2.4 kWh # 105.00 c/kWh", "this is a test" }; as an example of your array.
This is how you can put it all into one array. I've kept it as an IEnumerable<T> to keep that benefit, but feel free to append .ToArray().
public IEnumerable<string> SplitAll(IEnumerable<string> collection)
{
return collection.SelectMany(c => c.Split(' '));
}
Here, this would evaluate to { "2.4", "kWh", "#", "105.00", "c/kWh", "this", "is", "a", "test" }.
Or if I'm misunderstanding you and you actually do want an array of arrays,
public IEnumerable<string[]> SplitAll(IEnumerable<string> collection)
{
return collection.Select(c => c.Split(' '));
}
Here, { { "2.4", "kWh", "#", "105.00", "c/kWh" }, { "this", "is", "a", "test" } }.
Or if I'm totally misunderstanding you and you just want to split the one string, that's even easier, and I've already shown it, but you can use string.Split.
This will give you a jagged array (an array of string arrays):
var newArr = strArr.Select(s => s.Split(' ').ToArray()).ToArray();
for example:
string[] strArr = new string[] { "2.4 kWh # 105.00 c/kWh", "Hello, world" };
var newArr = strArr.Select(s => s.Split(' ').ToArray()).ToArray();
for (int i = 0; i < newArr.Length; i++)
{
for(int j = 0; j < newArr[i].Length; j++)
Console.WriteLine(newArr[i][j]);
Console.WriteLine();
}
// 2.4
// kWh
// #
// 105.00
// c/kWh
//
// Hello,
// world
