How to insert a text file into a Binary Search Tree? - c#

I know that a loop is needed to insert each word into the BST, but I'm not sure how to implement it.

You already created an insert function in your binary search tree; use it.
using System;
using System.IO;

class Program
{
    // BSTree<T> is your own tree class (with InsertItem and InOrder)
    static BSTree<string> myTree = new BSTree<string>();

    static void Main(string[] args)
    {
        readFile("textfile.txt");
        string buffer = "";
        myTree.InOrder(ref buffer);
        Console.WriteLine(buffer);
    }

    static void readFile(string filename)
    {
        // relative path: resolved against the working directory (usually bin/Debug)
        // ReadAllLines allocates the array itself, so no pre-sized array is needed
        string[] allLines = File.ReadAllLines(filename);
        foreach (string line in allLines)
        {
            // split words on space , . ? ; : !
            string[] words = line.Split(' ', ',', '.', '?', ';', ':', '!');
            foreach (string word in words)
            {
                if (word != "")
                    myTree.InsertItem(word.ToLower());
            }
        }
    }
}
On another note, your InOrder function will return a string that starts with a ',' character; I'm not sure whether that was intended. Also, because strings are immutable and every concatenation creates a new string, you may want to build the result with a StringBuilder instead of manipulating a string.
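To illustrate the StringBuilder suggestion, here is a minimal sketch of a tree whose in-order traversal writes the separator only between items, so the result no longer starts with ','. SimpleBSTree<T> is a hypothetical stand-in that borrows only the InsertItem/InOrder names from the code above; unlike the asker's version, its InOrder returns the string instead of filling a ref parameter. As design choices for this sketch, duplicate words are ignored and the tree is not balanced (insertion is iterative, but the traversal recurses as deep as the tree is tall).

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the asker's BSTree<T>; only the method names are borrowed.
public class SimpleBSTree<T> where T : IComparable<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Left;
        public Node Right;
        public Node(T value) { Value = value; }
    }

    private Node root;

    // Iterative insert; an item equal to an existing one is ignored.
    public void InsertItem(T item)
    {
        if (root == null) { root = new Node(item); return; }
        Node current = root;
        while (true)
        {
            int cmp = item.CompareTo(current.Value);
            if (cmp == 0) return;
            if (cmp < 0)
            {
                if (current.Left == null) { current.Left = new Node(item); return; }
                current = current.Left;
            }
            else
            {
                if (current.Right == null) { current.Right = new Node(item); return; }
                current = current.Right;
            }
        }
    }

    // In-order traversal into a StringBuilder; the separator is written only between items.
    public string InOrder(string separator = ", ")
    {
        var sb = new StringBuilder();
        bool first = true;
        Append(root, sb, separator, ref first);
        return sb.ToString();
    }

    private static void Append(Node node, StringBuilder sb, string separator, ref bool first)
    {
        if (node == null) return;
        Append(node.Left, sb, separator, ref first);
        if (!first) sb.Append(separator);
        sb.Append(node.Value);
        first = false;
        Append(node.Right, sb, separator, ref first);
    }
}
```

With this sketch, the readFile loop stays the same (myTree.InsertItem(word.ToLower())), and Main can simply call Console.WriteLine(myTree.InOrder()).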

Related

Removing words from text together with the separators after them (using Regex or List)

I need to remove words from the text together with the separators next to them. The problem is that the program only removes one separator after the word, but there are often several. Any suggestions on how to remove the remaining separators?
Also, I need to make sure that the word is not joined to other letters. For example, if the word is fHouse or Housef, it should not be removed.
At the moment I have:
// requires: using System.IO;
public static void Process(string fin, string fout)
{
    string[] WordsToRemove = { "Home", "House", "Room" };
    char[] separators = { ';', ' ', '.', ',', '!', '?', ':' };
    using (var foutv = File.CreateText(fout)) // fout - OutPut.txt
    using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string word in WordsToRemove)
            {
                foreach (char separator in separators)
                {
                    // removes the word plus only ONE following separator,
                    // and also matches inside longer words such as fhgkHouse
                    line = line.Replace(word + separator, string.Empty);
                }
            }
            foutv.WriteLine(line);
        }
    }
}
Input:
fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;
Output I get:
fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;
Expected output:
fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
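Try this regex:
\b(Home|House|Room)(!|;)*\b|;+\.\.;+
See it at: https://regex101.com/r/LUsyM8/1
It replaces each listed word, plus any '!' or ';' characters that follow it, with an empty string. The \b before the word keeps fhgkHouse intact, and the second alternative, ;+\.\.;+, removes the ";..;" left over at the end of the sample line.
It produces the expected output for the sample input above.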
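A sketch of how this could be used from C#, assuming System.Text.RegularExpressions. It keeps the answer's pattern verbatim and adds a generalized variant (not from the original answer) that builds the alternation from the WordsToRemove array and then consumes any run of the question's separator characters after the word. The class and method names (SeparatorRemover, Clean, CleanWithAnswerPattern) are hypothetical. \b on both sides of the word keeps fhgkHouse and Housef intact, and [; .,!?:]* removes every separator that follows, which also covers the trailing ;..; without a special case.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorRemover
{
    // Pattern from the answer above, kept verbatim for comparison.
    private static readonly Regex AnswerPattern =
        new Regex(@"\b(Home|House|Room)(!|;)*\b|;+\.\.;+");

    public static string CleanWithAnswerPattern(string line) =>
        AnswerPattern.Replace(line, string.Empty);

    // Generalized variant: a whole word from the list (\b on both sides),
    // followed by any run of the separators ; space . , ! ? :
    public static string Clean(string line, string[] wordsToRemove)
    {
        string alternation = string.Join("|", wordsToRemove.Select(Regex.Escape));
        var pattern = new Regex(@"\b(?:" + alternation + @")\b[; .,!?:]*");
        return pattern.Replace(line, string.Empty);
    }
}
```

In the question's Process method, the two nested foreach loops could then be replaced by line = SeparatorRemover.Clean(line, WordsToRemove);. Separators that come before a removed word are kept, which matches the expected output.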

How can I get a text file path from console user input and use it in an array?

My code below reads a text file from a path hard-coded in the program and saves its lines to an array. It then splits each line into words with the Split() method, using the delimiters ' ', ',', '.', ':', ';' and '-', so that any word containing one of them is divided into separate words.
It then saves each word (skipping duplicates) to an ArrayList, which is written to a text file at another hard-coded path.
My code does what I want. How can I make it so that the user enters the file path in the console, so that the file can be read into the array?
Also, how can I let the user enter, from the console, the path where the new file should be created?
using System;
using System.Collections;
using System.IO;

namespace TextParser
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            Console.WriteLine("Please enter location of the text file you would like to sort: ");
            string[] lines = File.ReadAllLines(@"C:\Users\Desktop\dictionary.txt");
            Console.ReadKey();
            char[] delimiterChars = { ' ', ',', '.', ':', ';', '-' };
            int wordCount = 0;
            ArrayList noDuplicates = new ArrayList();
            TextWriter writeToNewFile = new StreamWriter(@"C:\Users\Desktop\dictionaryFIXED.txt");
            foreach (string line in lines)
            {
                string[] newListWithDuplicates = line.Split(delimiterChars);
                foreach (string word in newListWithDuplicates)
                {
                    if (!noDuplicates.Contains(word))
                    {
                        noDuplicates.Add(word);
                        Console.WriteLine(word);
                        wordCount++;
                    }
                }
            }
            foreach (string s in noDuplicates)
            {
                writeToNewFile.WriteLine(s);
            }
            writeToNewFile.Close();
            Console.WriteLine(wordCount);
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}
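You need to read the path from user input, then try to read the file inside a try/catch block:
Console.WriteLine("Input file path");
string input = Console.ReadLine();
try
{
    string[] lines = System.IO.File.ReadAllLines(input);
    // The rest of your code here
}
catch (Exception ex) when (ex is System.IO.IOException || ex is UnauthorizedAccessException
                           || ex is ArgumentException || ex is NotSupportedException)
{
    // missing file or directory, bad path syntax, empty input, or no permission
    Console.WriteLine("You did not input a valid path");
}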
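A fuller sketch that reads both the input and the output path from the console, as the question asks. TextParserSketch, UniqueWords and Run are hypothetical names. It swaps the ArrayList for a HashSet plus a List (fast duplicate checks, same first-seen order) and uses StringSplitOptions.RemoveEmptyEntries, so, unlike the original, it does not record the empty string produced by consecutive delimiters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TextParserSketch
{
    private static readonly char[] DelimiterChars = { ' ', ',', '.', ':', ';', '-' };

    // Splits every line on the delimiters and keeps the first occurrence of each word,
    // preserving order. HashSet.Add returns false for a word already seen.
    public static List<string> UniqueWords(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string line in lines)
        {
            foreach (string word in line.Split(DelimiterChars, StringSplitOptions.RemoveEmptyEntries))
            {
                if (seen.Add(word))
                    result.Add(word);
            }
        }
        return result;
    }

    // Prompts for both paths, then reads, deduplicates and writes the words.
    public static void Run()
    {
        Console.WriteLine("Please enter the path of the text file you would like to sort:");
        string inputPath = Console.ReadLine();
        Console.WriteLine("Please enter the path for the new file:");
        string outputPath = Console.ReadLine();

        try
        {
            List<string> words = UniqueWords(File.ReadAllLines(inputPath));
            File.WriteAllLines(outputPath, words);
            Console.WriteLine(words.Count);
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException
                                   || ex is ArgumentException || ex is NotSupportedException)
        {
            Console.WriteLine("Could not read or write the file: " + ex.Message);
        }
    }
}
```

Calling TextParserSketch.Run() from Main replaces the hard-coded paths; keeping UniqueWords separate from the console code also makes it easy to test without any files.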

Replace List<string> with words from a file, keep the order

public static List<string> items = new List<string>() { "a","b","c","d","e" };
I am trying to replace each of those by loading a file that contains the current inventory.
if (File.Exists("darColItems.txt"))
{
    char c = ',';
    StreamReader st = new StreamReader("darColItems.txt");
    char[] temp = st.ReadToEnd().ToCharArray();
    foreach (char ch in temp)
    {
        // (loop body not written yet)
    }
    st.Close();
}
Edit: I want to take a file such as iron,bronze,gold,diamond,iron and put each name into the corresponding spot in the list.
File.txt: "string1","string2","string3","string4","string5"
Startup of program:
List inventory (current):
"a","b","c","d","e"
Load inventory....
List inventory (final):
"string1","string2","string3","string4","string5"
Assuming that you actually want to replace all items in the list with all items in the file, in the order in which they occur, and that the delimiter is a comma, you could use String.Split:
items = File.ReadAllText("path").Split(new [] { ',' }, StringSplitOptions.None).ToList();
If you have quotes around the words in the file which you want to remove, you can use String.Trim:
items = File.ReadAllText("path")
.Split(new char[] { ',' }, StringSplitOptions.None)
.Select(s => s.Trim('"', ' ')) // remove quotes + spaces at the beginning and end
.ToList();
//keep the filename in a constant/variable for easy reuse (better still, put it in a config file)
const string SourceFile = "darColItems.txt";
//the character that separates data elements (if the elements may contain this character, look into a regex instead; for now we'll keep it simple and assume that's not the case)
const char delimiter = ',';
//here's where we'll store our values
var values = new List<string>();
//check that our input file exists
if (File.Exists(SourceFile))
{
    //the using statement ensures the file is closed & disposed, even if there's an error mid-process
    using (var reader = File.OpenText(SourceFile))
    {
        string line;
        //read each line until the end of file (at which point the line read will be null)
        while ((line = reader.ReadLine()) != null)
        {
            //split the string by the delimiter (',') and feed those values into our list
            foreach (string value in line.Split(delimiter))
            {
                values.Add(value);
            }
        }
    }
}
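A testable sketch combining the Split and Trim approaches above; InventoryLoader and ParseInventory are hypothetical names. It takes the file's text rather than a path so it can be checked without a file. It also drops empty entries (for example from a trailing comma), which the StringSplitOptions.None version keeps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InventoryLoader
{
    // Splits the file contents on commas, strips surrounding whitespace and quotes,
    // drops empty entries, and keeps the items in the order they appear in the file.
    public static List<string> ParseInventory(string fileText)
    {
        return fileText
            .Split(',')
            .Select(s => s.Trim().Trim('"'))
            .Where(s => s.Length > 0)
            .ToList();
    }
}
```

In the program this would be called as items = InventoryLoader.ParseInventory(File.ReadAllText("darColItems.txt")); which replaces "a" through "e" with the file's items in file order.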

Punctuation Problems

This program reads a CSV file, adds its values to a Dictionary, and then analyses the string in a text box to see whether any of its words match a dictionary entry. It replaces abbreviations (LOL, ROFL, etc.) with the words they stand for, matching strings by splitting the input text into individual words.
// requires: using System.Collections.Generic; using System.IO; using System.Windows.Forms;
public void btnanalyze_Click(object sender, EventArgs e)
{
    var abbrev = new Dictionary<string, string>();
    using (StreamReader reader = new StreamReader("C:/Users/Jordan Moffat/Desktop/coursework/textwords0.csv"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // each CSV row is "abbreviation,expansion"
            string[] row = line.Split(',');
            abbrev.Add(row[0], row[1]);
        }
    }
    string twitterinput = txtInput.Text;
    char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
    string[] words = twitterinput.Split(delimiterChars);
    foreach (string s in words)
    {
        // if/else (rather than two separate ifs) guarantees merge is always assigned
        string merge;
        if (abbrev.TryGetValue(s, out string value))
            merge = value;
        else
            merge = s;
        MessageBox.Show(merge);
    }
}
The problem is that the program won't translate a word if it is followed by punctuation. I realised that splitting on the punctuation characters solved the matching problem, but it also meant the punctuation was lost when printing the output. Is there a way to ignore the last character, as opposed to removing it, and still retain it for the output? I tried to store it in a new variable, but I can't find a way to do that either...
That seems overly complicated. You can do the same thing with regular expressions and backreferences.
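// requires: using System.Text.RegularExpressions;
var dict = new Dictionary<string, string>(); // your replacement dictionary
foreach (var line in yourReader)
{
    string result = line;
    foreach (var kvp in dict)
    {
        // ${1} and ${2} put back the delimiters captured on either side of the abbreviation
        result = Regex.Replace(result,
            @"(\s|,|\.|:|\t)" + Regex.Escape(kvp.Key) + @"(\s|,|\.|:|\t)",
            "${1}" + kvp.Value.Replace("$", "$$") + "${2}");
    }
}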
I hacked this regex together so it may not be right, but it's the basic idea.
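One limitation of the pattern above is that it needs a delimiter on both sides, so abbreviations at the very start or end of the text are missed. An alternative sketch: instead of splitting, match each run of word characters with \b\w+\b and let a MatchEvaluator swap it for its expansion. Everything the regex does not match (spaces, commas, full stops, exclamation marks) is copied through unchanged, so the punctuation is retained. AbbreviationExpander and Expand are hypothetical names, and the sketch assumes every abbreviation key consists only of word characters (letters, digits, underscore); a key such as "w/e" would never match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbbreviationExpander
{
    private static readonly Regex Word = new Regex(@"\b\w+\b");

    // Replaces each whole word found in the dictionary; all other text,
    // including punctuation, stays exactly where it was.
    public static string Expand(string input, IReadOnlyDictionary<string, string> abbrev)
    {
        return Word.Replace(input, m =>
            abbrev.TryGetValue(m.Value, out string expansion) ? expansion : m.Value);
    }
}
```

In the click handler this could become MessageBox.Show(AbbreviationExpander.Expand(txtInput.Text, abbrev));. Lookups are case-sensitive by default; constructing the dictionary with new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) would also match "lol".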

How to perform tokenization and stopword removal in C#?

Basically, I want to tokenize each word of a paragraph and then remove stopwords. The result will be the preprocessed data for my algorithm.
You can remove all punctuation and split the string on whitespace.
string s = "This is, a sentence.";
s = s.Replace(",", "").Replace(".", "");
string[] words = s.Split(' ');
If you are reading from a text file (or any other text), you can:
char[] dele = { ' ', ',', '.', '\t', ';', '#', '!' };
// textFilePath: the path to your file; ToList() requires using System.Linq;
List<string> allLinesText = File.ReadAllText(textFilePath).Split(dele).ToList();
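Then you can load the stopwords into a dictionary, store your document's words in a list, and remove each stopword from that list:
foreach (KeyValuePair<string, string> word in StopWords)
{
    // RemoveAll deletes every occurrence, so a Contains check first is unnecessary
    list.RemoveAll(s => s == word.Key);
}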
You can store all separator symbols and stopwords in constants or in a database:
public static class TextOperationConstants
{
    // '\u0085' is the Unicode "next line" character
    public static readonly char[] WordsSeparators = {
        ' ', '\t', '\n', '\r', '\u0085'
    };
    public static readonly string[] StopWords = {
        "stop", "word", "is", "here"
    };
}
Remove all punctuation, then split the text and filter:
var words = new List<string>();
var stopWords = new HashSet<string>(TextOperationConstants.StopWords);
foreach (var term in text.Split(TextOperationConstants.WordsSeparators))
{
    if (string.IsNullOrWhiteSpace(term)) continue;
    if (stopWords.Contains(term)) continue;
    words.Add(term);
}
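A compact, self-contained sketch of the same pipeline (tokenize, then filter); Tokenizer and Tokenize are hypothetical names. It lower-cases the text and splits on runs of non-word characters with Regex.Split(text, @"\W+"), which removes punctuation and splits on whitespace in one step. Note that this also splits contractions such as "don't" into "don" and "t". Stopwords are compared case-insensitively through a HashSet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Lower-cases the text, splits on runs of non-word characters (punctuation and
    // whitespace), then drops empty tokens and any token contained in stopWords.
    public static List<string> Tokenize(string text, IEnumerable<string> stopWords)
    {
        var stop = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        return Regex.Split(text.ToLowerInvariant(), @"\W+")
            .Where(token => token.Length > 0 && !stop.Contains(token))
            .ToList();
    }
}
```

For example, Tokenizer.Tokenize(File.ReadAllText(path), TextOperationConstants.StopWords) would combine it with the constants above.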
