Punctuation Problems - c#

This program reads a CSV file into a dictionary, then checks the text in a textbox to see whether any of its words match a dictionary key. It should replace abbreviations (LOL, ROFL, etc.) with the phrases they stand for. It finds matches by splitting the input text into individual words.
public void btnanalyze_Click(object sender, EventArgs e)
{
    var abbrev = new Dictionary<string, string>();
    using (StreamReader reader = new StreamReader("C:/Users/Jordan Moffat/Desktop/coursework/textwords0.csv"))
    {
        string line;
        string[] row;
        while ((line = reader.ReadLine()) != null)
        {
            row = line.Split(',');
            abbrev.Add(row[0], row[1]);
            Console.WriteLine(abbrev);
        }
    }
    string twitterinput;
    twitterinput = "";
    // string output;
    twitterinput = txtInput.Text;
    char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
    string text = twitterinput;
    string[] words = twitterinput.Split(delimiterChars);
    string merge;
    foreach (string s in words)
    {
        if (abbrev.ContainsKey(s))
        {
            string value = abbrev[s];
            merge = string.Join(" ", value);
        }
        if (!abbrev.ContainsKey(s))
        {
            string not = s;
            merge = string.Join(" ", not);
        }
        MessageBox.Show(merge);
    }
}
The problem is that the program won't translate the word if there's punctuation. I realised the character set I was using meant that punctuation wasn't a problem, but also didn't allow me to retain it when printing out. Is there a way that I can ignore the last character, as opposed to removing it, and still retain it for the output? I was trying to write it into a new variable, but I can't find a way to do that either...

That seems overly complicated. You can do the same thing with regular expressions and backreferences.
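var dict = new Dictionary<string, string>(); // your replacement dictionary
foreach (var line in yourReader) // yourReader: placeholder for your sequence of input lines
{
    string result = line;
    foreach (var kvp in dict)
    {
        // Groups 1 and 2 capture the delimiter (or start/end of line) on each side
        // so that it can be written back around the replacement.
        result = System.Text.RegularExpressions.Regex.Replace(
            result,
            @"(^|\s|,|\.|:)" + System.Text.RegularExpressions.Regex.Escape(kvp.Key) + @"($|\s|,|\.|:)",
            "${1}" + kvp.Value + "${2}");
    }
    // result now holds the line with the abbreviations expanded
}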
I hacked this regex together so it may not be right, but it's the basic idea.
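As a rough sketch of another way to keep the punctuation from the question above: instead of splitting on delimiters, match only the word characters and look each match up in the dictionary. The class and method names here (AbbreviationExpander, Expand) are made up for illustration, and the lookup is case-sensitive unless the dictionary is built with a case-insensitive comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
public static class AbbreviationExpander
{
    // Replaces each whole word found in the dictionary and leaves everything
    // else (spaces, commas, full stops, ...) exactly where it was.
    public static string Expand(string text, IDictionary<string, string> abbrev)
    {
        // \w+ matches one run of letters, digits or underscores, so punctuation
        // is never part of a match and is therefore never removed.
        return Regex.Replace(text, @"\w+", match =>
        {
            string full;
            return abbrev.TryGetValue(match.Value, out full) ? full : match.Value;
        });
    }
}
```

With a dictionary that maps "LOL" to "laughing out loud", calling AbbreviationExpander.Expand("LOL, really?", abbrev) should return "laughing out loud, really?", so the comma and question mark survive.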

Related

Removing words from text with separators in front(using Regex or List)

I need to remove certain words from a text, together with the separators that follow them. The problem is that my program removes only one separator after the word, even when several follow it. Any suggestions on how to remove the other separators?
I also need to make sure that the word is not attached to other letters. For example, if the text contains fHouse or Housef, it should not be removed.
At the moment I have:
public static void Process(string fin, string fout)
{
    using (var foutv = File.CreateText(fout)) //fout - OutPut.txt
    {
        using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] WordsToRemove = { "Home", "House", "Room" };
                char[] seperators = {';', ' ', '.', ',', '!', '?', ':'};
                foreach(string word in WordsToRemove)
                {
                    foreach (char seperator in seperators)
                    {
                        line = line.Replace(word + seperator, string.Empty);
                    }
                }
                foutv.WriteLine(line);
            }
        }
    }
}
Input:
fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;
Result I get:
fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;
Expected result:
fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
Try this regex: \b(Home|House|Room)(!|;)*\b|;+\.\.;+
See it at: https://regex101.com/r/LUsyM8/1
There, I replace the matched words and special characters with an empty string.
For your sample input, it produces the expected result.
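A minimal sketch of how the pattern suggested above could be applied from C#, assuming each line is cleaned with Regex.Replace before it is written out. The class and method names (WordRemover, Clean) are invented for this example, and the ;+\.\.;+ alternative is tailored to the sample line rather than being a general rule.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that wraps the suggested pattern.
public static class WordRemover
{
    // \b keeps "House" inside "fhgkHouse" from matching, and (!|;)* also
    // consumes the run of '!' and ';' separators that follows a removed word.
    private const string Pattern = @"\b(Home|House|Room)(!|;)*\b|;+\.\.;+";

    // Returns the line with every match of the pattern deleted.
    public static string Clean(string line)
    {
        return Regex.Replace(line, Pattern, string.Empty);
    }
}
```

Inside the question's loop this would replace the two nested foreach loops with a single call, for example line = WordRemover.Clean(line); before foutv.WriteLine(line).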

Reading from txt file to array/list<>

I need to read an entire .txt file and save the data to an array or list. The file looks like this:
row11 row12 row13
row21 row22 row23
row31 row32 row33
The values are separated only by spaces.
Next I will insert the data from the array/list<> into MySQL, but that part is not a problem.
Thanks.
EDIT: I need to insert the data into MySQL as 3 columns, matching the .txt file.
Use String.Split(Char[], StringSplitOptions) where the first parameter specifies that you want to split your string using spaces and tabs, and the second parameter specifies that you ignore empty entries (for cases where there are multiple spaces between entries)
Use this code:
var lines = System.IO.File.ReadAllLines(@"D:\test.txt");
var data = new List<List<string>>();
foreach (var line in lines)
{
    var split = line.Split(new[]{' ', '\t'}, StringSplitOptions.RemoveEmptyEntries);
    data.Add(split.ToList());
}
You can use File.ReadLines() to read the lines from the file, and then Regex.Split() to split each line into multiple strings:
static IEnumerable<String> SplitLines(string path, string splitPattern)
{
    foreach (string line in File.ReadLines(path))
        foreach (string part in Regex.Split(line, splitPattern))
            yield return part;
}
To split by white space, you can use the regex pattern \s+:
var individualStrings = SplitLines(@"C:\path\to\file.txt", @"\s+");
You can use the ToList() extension method to convert it to a list:
List<string> individualStrings = SplitLines(@"D:\test\rows.txt", @"\s+").ToList();
As long as there are never spaces in the "values", a simple line-by-line parser will work.
A simple example:
var resultList = new List<List<string>>();
// the using statement closes the file when reading is done
using (var reader = new StreamReader(filePath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        var currentValues = new List<string>();
        // You can also use a StringBuilder
        string currentValue = String.Empty;
        foreach (char c in line)
        {
            if (Char.IsWhiteSpace(c))
            {
                if (currentValue.Length > 0)
                {
                    currentValues.Add(currentValue);
                    currentValue = String.Empty;
                }
                continue;
            }
            currentValue += c;
        }
        // add the last value when the line does not end with whitespace
        if (currentValue.Length > 0)
        {
            currentValues.Add(currentValue);
        }
        resultList.Add(currentValues);
    }
}
Here's a nifty one-liner based off Amadeusz's answer:
var lines = File.ReadAllLines(fileName).Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)).SelectMany(words => words);

Replace List<string> with words from a file, keep the order

public static List<string> items = new List<string>() { "a","b","c","d","e" };
I am trying to replace each of those items with values loaded from a file that holds the current inventory.
if (File.Exists("darColItems.txt") == true)
{
    char c = ',';
    StreamReader st = new StreamReader("darColItems.txt");
    temp = st.ReadToEnd().ToCharArray();
    foreach (c in temp)
    {
    }
    st.Close();
}
Edit: Take a file such as iron,bronze,gold,diamond,iron, and place each name into the corresponding position in the list.
File.txt: "string1","string2","string3","string4","string5"
Startup of program:
List inventory (current):
"a","b","c","d","e"
Load inventory....
List inventory (final):
"string1","string2","string3","string4","string5"
Assuming that you want to replace all items in the list with the items in the file, in order of occurrence, and that the delimiter is a comma, you could use String.Split:
items = File.ReadAllText("path").Split(new [] { ',' }, StringSplitOptions.None).ToList();
If you have quotes around the words in the file which you want to remove, you can use String.Trim:
items = File.ReadAllText("path")
    .Split(new char[] { ',' }, StringSplitOptions.None)
    .Select(s => s.Trim('"', ' ')) // remove quotes + spaces at the beginning and end
    .ToList();
// keep the file name in a constant/variable for easy reuse (better, put it in a config file)
const string SourceFile = "darColItems.txt";
// the character that separates data elements (if the elements may contain this character you may want to look into a regex instead; for now we'll keep it simple and assume that's not the case)
const char delimiter = ',';
// here's where we'll store our values
var values = new List<string>();
// check that our input file exists
if (File.Exists(SourceFile))
{
    // using statement ensures file is closed & disposed, even if there's an error mid-process
    using (var reader = File.OpenText(SourceFile))
    {
        string line;
        // read each line until the end of file (at which point the line read will be null)
        while ((line = reader.ReadLine()) != null)
        {
            // split the string by the delimiter (',') and feed those values into our list
            foreach (string value in line.Split(delimiter))
            {
                values.Add(value);
            }
        }
    }
}

Replace word with match in CSV

public void replaceText(string messageText)
{
    int counter = 1;
    string csvFile = "textwords.csv";
    string[] words = messageText.Split(' ');
    char csvSeparator = ',';
    foreach (string word in words)
    {
        foreach (string line in File.ReadLines(csvFile))
        {
            foreach (string value in line.Replace("\"", "").Split('\r', '\n', csvSeparator))
                if (value.Trim() == word.Trim()) // case sensitive
                {
                    messageText = Regex.Replace(messageText, value, string.Empty);
                    messageText = messageText.Insert(counter, " " + line);
                }
        }
        counter++;
    }
    MessageBox.Show(messageText);
}
So I have the above code. It searches my CSV file for a match for every word in messageText. The CSV file contains textspeak abbreviations, and each time a match is found the word in messageText should be replaced with the matching CSV entry. For example, "hi LOL" would find "LOL, Laugh out loud" in the CSV and replace it.
However, this only works for one replacement. If I put in "Hi LOL" it outputs "Hi LOL, Laugh out Loud".
But if I put in "Hi LOL, how are you? LMAO" it outputs "Hi LOL LMFAO, Laughing my A** off, how are you?"
Can anyone tell me where I'm going wrong? I can't figure out why it is doing this.
There are some issues with this method:
1. It has two responsibilities (loading key/value pairs from the CSV file and replacing text), so the CSV file is reloaded every time it is called.
2. The variable 'counter' looks out of place for the purpose of the method.
Here is the rewritten code:
static void Main(string[] args) {
    // verbatim string, so that \t in the path is not read as a tab escape
    var dictionary = LoadFromFile(@"c:\textWords.csv");
    var message = "Hi LOL, LMAO";
    message = ReplaceMessage(message, dictionary);
    //
}
static Dictionary<String, String> LoadFromFile(String csvFile) {
    var dictionary = new Dictionary<String, String>();
    var lines = File.ReadAllLines(csvFile);
    foreach (var line in lines) {
        var fields = line.Split(',', '\r', '\n');
        dictionary[fields[0].Trim()] = fields[1].Trim();
    }
    return dictionary;
}
static String ReplaceMessage(String message, Dictionary<String, String> dictionary) {
    // drop the empty entries produced by ", " so that no extra spaces are appended
    var words = message.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
    var s = new StringBuilder();
    foreach (var word in words) {
        // TryGetValue avoids the KeyNotFoundException that dictionary[word] throws for unknown words
        String expansion;
        if (dictionary.TryGetValue(word, out expansion)) {
            s.Append(String.Format("{0}, {1} ", word, expansion));
        } else {
            s.Append(word + " ");
        }
    }
    return s.ToString().TrimEnd(' ');
}

Exception "String cannot be of Zero length"

We are trying to read each word from a text file and replace it with another word.
For smaller text files it works well, but for larger text files we keep getting the exception:
"String cannot be of zero length. Parameter name: oldValue"
void replace()
{
    string s1 = " ", s2 = " ";
    StreamReader streamReader;
    streamReader = File.OpenText("C:\\sample.txt");
    StreamWriter streamWriter = File.CreateText("C:\\sample1.txt");
    //int x = st.Rows.Count;
    while ((line = streamReader.ReadLine()) != null)
    {
        char[] delimiterChars = { ' ', '\t' };
        String[] words = line.Split(delimiterChars);
        foreach (string str in words)
        {
            s1 = str;
            DataRow drow = st.Rows.Find(str);
            if (drow != null)
            {
                index = st.Rows.IndexOf(drow);
                s2 = Convert.ToString(st.Rows[index]["Binary"]);
                s2 += "000";
                // Console.WriteLine(s1);
                // Console.WriteLine(s2);
                streamWriter.Write(s1.Replace(s1,s2)); // Exception occurs here
            }
            else
                break;
        }
    }
    streamReader.Close();
    streamWriter.Close();
}
We're unable to find the reason.
Thanks in advance.
When you do your string.Split you may get empty entries if there are multiple spaces or tabs in sequence. These can't be replaced as the strings are 0 length.
Use the overload that strips empty results using the StringSplitOptions argument:
var words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
The exception occurs because s1 is an empty string at some point. You can avoid this by replacing the line
String[] words = line.Split(delimiterChars);
with this:
String[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You want to change your Split method call like this:
String[] words = line.Split(delimiterChars,StringSplitOptions.RemoveEmptyEntries);
It means that s1 contains an empty string ("") which can happen if you have two consecutive white spaces or tabs in your file.
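To make the cause concrete, here is a small sketch (the names SplitDemo and Words are invented for illustration) that splits the same line with and without StringSplitOptions.RemoveEmptyEntries, using the delimiters from the question.

```csharp
using System;

// Hypothetical demo class, not part of the original program.
public static class SplitDemo
{
    // Splits on spaces and tabs. With removeEmpty = false, two delimiters in a
    // row produce an empty string between them; with removeEmpty = true those
    // empty strings are dropped from the result.
    public static string[] Words(string line, bool removeEmpty)
    {
        char[] delimiterChars = { ' ', '\t' };
        return line.Split(delimiterChars,
            removeEmpty ? StringSplitOptions.RemoveEmptyEntries : StringSplitOptions.None);
    }
}
```

For a line such as "a  b" (two spaces), the first form yields three entries with an empty string in the middle, and passing that empty string as oldValue to Replace is what raises the ArgumentException. The second form yields only "a" and "b".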
