Read a file and replace text after a certain word - c#

I have a few files, for example:
FileBegin Finance Open 87547.25 Close 548484.54 EndDay 4 End
Another file example:
FileBegin Finance Open 344.34 Close -3434.34 EndDay 5 End
I need to read the text in the file and replace only the numeric value after the word Open leaving the rest of the text before and after the word Open intact. I have been using this code:
string fileToRead = "c:\\file.txt";
public void EditValue(string oldValue, string newValue, Control Item)
{
if (Item is TextBox)
{
string text = File.ReadAllText(fileToRead);
text = text.Replace(oldValue, newValue);
File.WriteAllText(activeSaveFile, text);
}
}
What would be the best way of going about replacing just the numeric value after the word open?

Using Regular Expressions:
Regex rgx = new Regex(@"Open [^\s]+");
string result = rgx.Replace(text, "Open " + newValue); // keep the word "Open", swap only the value
File.WriteAllText(activeSaveFile, result );
Using this approach, you can store the regex object outside the method so you avoid recompiling it each time. I'm guessing it won't have a significant performance impact compared to the file I/O in your case, but it is a good practice in other situations.
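A minimal sketch of the same idea with the regex stored outside the method, as suggested above. The names OpenValueEditor and ReplaceOpenValue are hypothetical, and the sketch assumes the value is a single token with no whitespace that follows the word Open:

```csharp
using System.Text.RegularExpressions;

public static class OpenValueEditor
{
    // Built once and reused. Group 1 captures "Open" plus the whitespace after it.
    private static readonly Regex OpenValue = new Regex(@"(\bOpen\s+)\S+");

    public static string ReplaceOpenValue(string text, string newValue)
    {
        // A MatchEvaluator is used so that a '$' in newValue is not read as a substitution token.
        return OpenValue.Replace(text, m => m.Groups[1].Value + newValue);
    }
}
```

Calling OpenValueEditor.ReplaceOpenValue(File.ReadAllText(fileToRead), newValue) and writing the result back would leave everything before and after the value untouched.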

Split the row on spaces, e.g. row.Split(new char[] { ' ' }, StringSplitOptions.None), then take element [3] of the resulting array (the value after Open), replace it, and join the pieces back into a row.

If I understand you, the line:
FileBegin Finance Open 344.34 Close -3434.34 EndDay 5 End
is the entire file? And you have been typing in "344.34" for the old value and "something" for the new value? And you'd like to just type the new value only?
You could say:
string fileToRead = "c:\\file.txt";
public void EditValue(string oldValue, string newValue, Control Item)
{
if (Item is TextBox)
{
string text = File.ReadAllText(fileToRead);
string[] words = text.Split(new char[] {' '}); // assuming space-delimited
words[3] = newValue; // replace the target value (the token after "Open")
text = "";
foreach (string w in words)
{
text += w + " "; // build our new string
}
File.WriteAllText(activeSaveFile, text.Trim()); // and write it back out
}
}
That's a lot of ifs, but I think this is what you mean. Also there are a lot of different ways to replace that one part of the string, I just thought this would give you the flexibility to do other things with a convenient array of words.

Trimming degree symbol on C#

Can anyone tell me why this is not working:
string txt = "+0°1,0'";
string degree = txt.TrimEnd('°');
I am trying to separate the degrees in this string, but after this call degree still holds the same content as txt.
I am using C# in Visual Studio.
string.TrimEnd removes characters only from the end of the string. In your example, '°' isn't at the end.
For example:
string txt = "+0°°°°";
string degree = txt.TrimEnd('°');
// degree => "+0"
If you want to remove '°' and everything after it, you can use:
string txt = "+0°1,0'";
string degree = txt.Remove(txt.IndexOf('°'));
// degree => "+0"
string txt = "+0°1,0'";
if(txt.IndexOf('°') >= 0) // Check that the character '°' exists in the string (IndexOf returns -1 if not)
{
string withoutdegree = txt.Remove(txt.IndexOf('°'),1);
}
Another safe way of handling the same is using the String.Split method. You will not have to bother to verify the presence of the character in this case.
string txt = "+0°1,0'";
var str = txt.Split('°')[0]; // "+0"
string txt = "+01,0'";
var str = txt.Split('°')[0]; // "+01,0'"
You can remove all the '°' symbols present in your string using String.Replace:
string txt = "+0°1,0'°°";
var text = txt.Replace("°", ""); // +01,0'
Edit: Added a safe way to handle the OP's exact query.

Find a delimiter of csv or text files in c#

I want to find a delimiter being used to separate the columns in csv or text files.
I am using TextFieldParser class to read those files.
Below is my code,
String path = @"c:\abc.csv";
DataTable dt = new DataTable();
if (File.Exists(path))
{
using (Microsoft.VisualBasic.FileIO.TextFieldParser parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
if (path.Contains(".txt"))
{
parser.SetDelimiters("|");
}
else
{
parser.SetDelimiters(",");
}
parser.HasFieldsEnclosedInQuotes = true;
bool firstLine = true;
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (firstLine)
{
foreach (var val in fields)
{
dt.Columns.Add(val);
}
firstLine = false;
continue;
}
dt.Rows.Add(fields);
}
}
lblCount.Text = "Count of total rows in the file: " + dt.Rows.Count.ToString();
dgvTextFieldParser1.DataSource = dt;
}
Instead of passing the delimiters manually based on the file type, I want to read the delimiter from the file and then pass it.
How can I do that?
Mathematically correct but totally useless answer: It's not possible.
Pragmatic answer: It's possible, but it depends on how much you know about the file's structure. It boils down to a set of assumptions, and the answer varies depending on which ones we make. And if you can't make any assumptions, well... see the mathematically correct answer.
For instance, can we assume that the delimiter is one or any of the elements in the set below?
List<char> delimiters = new List<char>{' ', ';', '|'};
Or can we assume that the delimiter is such that it produces elements of equal length?
Should we try to find a delimiter that's a single character or can a word be one?
Etc.
Based on the question, I'll assume that it's the first option and that we have a limited set of possible characters, precisely one of which is the delimiter for a given file.
How about you count the number of occurrences of each such character and assume that the one that's occurring most frequently is the one? Is that sufficiently rigid or do you need to be more sure than that?
List<char> delimiters = new List<char>{' ', ';', '-'};
Dictionary<char, int> counts = delimiters.ToDictionary(key => key, value => 0);
foreach(char c in delimiters)
counts[c] = textArray.Count(t => t == c);
I'm not in front of a computer so I can't verify this, but the last step would be returning the dictionary key whose value is the largest.
You'll need to handle special cases, such as no delimiter being detected or two delimiter types occurring equally often.
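The last step and the special cases mentioned above could be sketched as follows. This is only an illustration: DelimiterGuesser and Guess are hypothetical names, and returning null for "nothing found" and for a tie is one possible policy, not the only sensible one.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DelimiterGuesser
{
    // Returns the most frequent candidate, or null when no candidate occurs
    // or when the two most frequent candidates occur equally often.
    public static char? Guess(string text, IEnumerable<char> candidates)
    {
        var ranked = candidates
            .Distinct()
            .Select(c => new { Delimiter = c, Count = text.Count(t => t == c) })
            .Where(x => x.Count > 0)
            .OrderByDescending(x => x.Count)
            .ToList();

        if (ranked.Count == 0)
            return null; // no delimiter detected
        if (ranked.Count > 1 && ranked[0].Count == ranked[1].Count)
            return null; // ambiguous: two candidates tie for first place
        return ranked[0].Delimiter;
    }
}
```

The caller decides what to do with null, for example falling back to a default delimiter or asking the user.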
Very simple guessing approach using LINQ:
static class CsvSeperatorDetector
{
private static readonly char[] SeparatorChars = {';', '|', '\t', ','};
public static char DetectSeparator(string csvFilePath)
{
string[] lines = File.ReadAllLines(csvFilePath);
return DetectSeparator(lines);
}
public static char DetectSeparator(string[] lines)
{
var q = SeparatorChars.Select(sep => new
{Separator = sep, Found = lines.GroupBy(line => line.Count(ch => ch == sep))})
// rank by the number of lines that contain the separator at all
.OrderByDescending(res => res.Found.Where(grp => grp.Key > 0).Sum(grp => grp.Count()))
.ThenBy(res => res.Found.Count())
.First();
return q.Separator;
}
}
What this does is it reads the file line by line (note that CSV files may include line breaks), then checks for each potential separator how often it occurs in each line.
Then we check which separator occurs on the most lines, and of those which occur on the same number of lines, we take the one with the most even distribution (e.g. 5 occurrences on every line rank higher than one occurrence on one line and 10 on another).
Of course you might have to tweak this for your own purposes, add error handling, fallback logic and so forth. I'm sure it's not perfect, but it's good enough for me.
You could probably take n bytes from the file, count possible delimiter characters(or all characters found) using a hash map/dictionary, and then the character repeated most is probably the delimiter you're looking for. It would make sense to me that the characters used as delimiters would be the ones used the most. When done you reset the stream, but since you're using a text reader you would have to probably initialize another text reader or something. This would get slightly more hairy if the CSV used more than one delimiter. You would probably have to ignore some characters like alpha and numeric.
In Python this is easy to do with csv.Sniffer. It works for text files as well, and it only needs a sample of the file's contents rather than the whole file.

Dictionary re-adding words as new keys instead of increasing value

I am writing a program that finds every unique word in a text and prints it in a text box. I do this by printing each key in a dictionary; however, my dictionary is adding each word as a separate key instead of ignoring words that are already there.
The function is being called correctly and it does run; however, it simply prints the entire text I hand it.
EDIT: I am reading the string from a text file then sending it to the function.
This is the input string and the output:
Output:
To be or not to that is the question Whether tis nobler in mind suffer
The slings and arrows of outrageous fortune Or take arms against a sea
troubles And by opposing end them die sleep No more sleep say we end
The heartache thousand natural shocks That flesh heir Tis consummation
public string FindUniqueWords(string text)
{
Dictionary<string, int> dictionary = new Dictionary<string, int>();
string uniqueWord = "";
text = text.Replace(",", ""); //Just cleaning up a bit
text = text.Replace(".", ""); //Just cleaning up a bit
string[] arr = text.Split(' '); //Create an array of words
foreach (string word in arr) //let's loop over the words
{
if (dictionary.ContainsKey(word)) //if it's in the dictionary
dictionary[word] = dictionary[word] + 1; //Increment the count
else
dictionary[word] = 1; //put it in the dictionary with a count 1
}
foreach (KeyValuePair<string, int> pair in dictionary) //loop through the dictionary
{
uniqueWord += (pair.Key + " ");
}
uniqueWords.Text = uniqueWord;
return ("");
}
You're reading the text with System.IO.File.ReadAllText, so text may also contain newline characters.
Replace arr = text.Split(' ') by arr = text.Split(' ', '\r', '\n') or add another replace: text = text.Replace(Environment.NewLine, " ");
Of course, by looking at arr in the debugger, you could have found out by yourself.
A shorter way (don't forget using System.Linq;):
string strInput = "TEST TEST Text 123";
var words = strInput.Split().Distinct();
foreach (var word in words )
{
Console.WriteLine(word);
}
Your code works as it's supposed to (apart from being case-sensitive, so "To" and "to" count as different words). The problem almost certainly lies with showing the results in your application, or with how you are calling the FindUniqueWords method (not the complete text at once).
Also, pretty important to note here: a Dictionary<TKey, TValue> by default simply cannot contain a single key multiple times. It would defeat the whole purpose of the dictionary in the first place. It's only possible if you override the Equality comparison somewhere, which you aren't doing.
If I try your code, with the following input:
To be or not to that is is is is is is is the question
The output becomes :
To be or not to that is the question
It works like it's supposed to.
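If "To" and "to" should count as the same word, the equality comparison mentioned above can be changed by passing a comparer to the dictionary. A minimal sketch, assuming words are separated by spaces, tabs, line breaks, commas or periods; WordCounter and CountWords are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    public static Dictionary<string, int> CountWords(string text)
    {
        // The comparer makes keys that differ only in case map to the same entry.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            counts.TryGetValue(word, out int current); // current is 0 when the key is absent
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

The key that is stored is whichever casing was seen first, so the output would show "To" rather than "to" for the sample text.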

String Concatenation / Overwriting?

This is a program that reads in a CSV file, adds the values to a dictionary class and then analyses a string in a textbox to see if any of the words match the dictionary entry. It will replace abbreviations (LOL, ROFL etc) into their real words. It matches strings by splitting the inputted text into individual words.
public void btnanalyze_Click(object sender, EventArgs e)
{
var abbrev = new Dictionary<string, string>();
using (StreamReader reader = new StreamReader("C:/Users/Jordan Moffat/Desktop/coursework/textwords0.csv"))
{
string line;
string[] row;
while ((line = reader.ReadLine()) != null)
{
row = line.Split(',');
abbrev.Add(row[0], row[1]);
Console.WriteLine(abbrev);
}
}
string twitterinput;
twitterinput = "";
// string output;
twitterinput = txtInput.Text;
{
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = twitterinput;
string[] words = twitterinput.Split(delimiterChars);
string merge;
foreach (string s in words)
{
if (abbrev.ContainsKey(s))
{
string value = abbrev[s];
merge = string.Join(" ", value);
}
if (!abbrev.ContainsKey(s))
{
string not = s;
merge = string.Join(" ", not);
}
;
MessageBox.Show(merge);
}
The problem so far is that the final string is outputted into a text box, but only prints the last word as it overwrites. This is a University assignment, so I'm looking for a push in the correct direction as opposed to an actual answer. Many thanks!
string.Join() takes a collection of strings, concatenates them together and returns the result. But in your case, the collection contains only one item: value, or not.
To make your code work, you could use something like:
merge = string.Join(" ", merge, value);
But because of the way strings work, this will be quite slow, so you should use StringBuilder instead.
This is the problem:
string not = s;
merge = string.Join(" ", not);
You are just joining a single element (the latest) with a space delimiter, thus overwriting what you previously put into merge.
If you want to stick with string you need to use Concat to append the new word onto the output, though this will be slow as you are recreating the string each time. It will be more efficient to use StringBuilder to create the output.
If your assignment requires that you use Join to build up the output, then you'll need to replace the target words in the words array as you loop over them. However, for that you'll need to use some other looping mechanism than foreach as that doesn't let you modify the array you're looping over.
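A rough sketch of that last suggestion: an index-based loop replaces words in place, and a single Join builds the output at the end. AbbreviationExpander and Expand are hypothetical names, and the sketch assumes words are separated by spaces only, so punctuation handling is left out.

```csharp
using System;
using System.Collections.Generic;

public static class AbbreviationExpander
{
    public static string Expand(string input, IDictionary<string, string> abbreviations)
    {
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // A for loop allows assigning back into the array, which foreach does not.
        for (int i = 0; i < words.Length; i++)
        {
            if (abbreviations.TryGetValue(words[i], out string expansion))
                words[i] = expansion;
        }

        // One Join over the whole array instead of one Join per word.
        return string.Join(" ", words);
    }
}
```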
It is better to use the StringBuilder class for this purpose:
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
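For illustration, appending to a StringBuilder inside the loop might look like the sketch below. OutputBuilder and JoinWords are hypothetical names; in the question's code the words passed in would be the expanded or original word for each token.

```csharp
using System.Collections.Generic;
using System.Text;

public static class OutputBuilder
{
    public static string JoinWords(IEnumerable<string> words)
    {
        var sb = new StringBuilder();
        foreach (string word in words)
        {
            if (sb.Length > 0)
                sb.Append(' '); // add a separator once something has been written
            sb.Append(word);    // appends in place instead of creating a new string each time
        }
        return sb.ToString();
    }
}
```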

How can I efficiently process a delimited text file?

I'm simply trying to execute File.ReadAllLines against a specific file and, for every line, split on |. I have to use regex on this one.
The code below doesn't work, but you'll see what I'm trying to do:
string[] contents = File.ReadAllLines(filename);
string[] splitlines = Regex.Split(contents, '|');
foreach (string split in splitlines)
{
//Regex line = content.Split('|');
//content.Split('|');
string prefix = prefix = Regex.Match(line, @"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
It's not entirely clear to me what you are trying to do, but there are a number of errors in your code. I have tried to guess what you are doing, but if this isn't what you want, please explain what you do want preferably with some examples:
string inputFilename = "input.txt";
string outputFilename = "output.txt";
using (StreamWriter streamWriter = File.AppendText(outputFilename))
{
using (StreamReader streamReader = File.OpenText(inputFilename))
{
while (true)
{
string line = streamReader.ReadLine();
if (line == null)
{
break;
}
string[] splitlines = line.Split('|');
foreach (string split in splitlines)
{
Match match = Regex.Match(split, @"\S+\d+");
if (match.Success)
{
string prefix = match.Groups[0].Value;
streamWriter.WriteLine(prefix);
}
else
{
// Handle match failed...
}
}
}
}
}
Key points:
You seem to want to perform an operation on each line, so you need to iterate over the lines.
Use the simple string.Split method if you want to split on a single character. Regex.Split doesn't accept a character and "|" has a special meaning in regular expressions so it wouldn't have worked anyway unless you escaped it.
You were opening and closing the output file multiple times. You should open it just once and keep it open until you have finished writing to it. The using keyword is useful here.
Use WriteLine instead of appending "\r\n".
If the input file is large, use a StreamReader instead of ReadAllLines.
If the match fails, Groups[0].Value is an empty string, so your program silently writes a blank line. You should probably check match.Success before using the match and, if it is false, handle the failure appropriately (skip the line, report a warning, throw an exception with an appropriate message, etc.)
You aren't actually using groups 1 and 2 in the regular expression, so you can remove the parentheses to save the regular expression engine from having to store results that you won't use anyway.
You should pass the original string to Regex.Split and not an array.
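The point about "|" in the list above can be shown side by side. This is a small sketch with hypothetical names (PipeSplitter, SplitPlain, SplitWithRegex); Regex.Escape is used here to produce the escaped pattern \| rather than writing it by hand.

```csharp
using System.Text.RegularExpressions;

public static class PipeSplitter
{
    public static string[] SplitPlain(string line)
    {
        // No regex is needed to split on a single character.
        return line.Split('|');
    }

    public static string[] SplitWithRegex(string line)
    {
        // Unescaped, "|" means alternation; Regex.Escape turns it into a literal pipe.
        return Regex.Split(line, Regex.Escape("|"));
    }
}
```

For a line such as Text1231|abc|abc both methods return the same three fields, so the plain string.Split is the simpler choice unless the delimiter is a real pattern.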
It looks like you are using line instead of split when setting the prefix. Without knowing more about your code I can't tell whether that's right, but in any case it sticks out as the error (it shouldn't build either).
This is really inefficient on at least two levels :)
Regex.Split takes a string, not an array of strings.
I would recommend calling Regex.Split on each item of contents individually, then looping over the results of that call. This would mean nested for loops.
string[] contents = File.ReadAllLines(filename);
foreach (string line in contents)
{
string[] splitlines = Regex.Split(line, @"\|"); // the pipe must be escaped to be a literal
foreach (string splitline in splitlines)
{
string prefix = Regex.Match(splitline, @"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
}
This, of course isn't the most efficient way to go about it.
A more efficient way might be to split on a regular expression instead. I think this works:
string[] splitlines = Regex.Split(File.ReadAllText(filename), @"\r?\n|\|"); // split on line breaks or pipes
I have to assume, based on the limited feedback, that this is what you're looking for:
string inputFile = filename;
string outputFile = Path.Combine( workingdirform2, "configuration.txt" );
using ( StreamReader inputFileStream = File.OpenText( inputFile ) )
{
using ( StreamWriter ouputFileStream = File.AppendText( outputFile ) )
{
// Iterate over the file contents to extract the prefix
string currentLine;
while ( ( currentLine = inputFileStream.ReadLine() ) != null )
{
// Notice the updated Regex - yours is a bit broken
string prefix = Regex.Match( currentLine, @"^(\S+?)\d+" ).Groups[1].Value;
ouputFileStream.WriteLine( prefix );
}
}
}
This would take a file full of:
Text1231|abc|abc
Text1232|abc|abc
Text1233|abc|abc
Text1234|abc|abc
and place:
Text
Text
Text
Text
into a new file.
I hope this, at least, gets you on the right path. My crystal ball is getting hazy.. haaazzzy..
Probably one of the best ways to process text files in C# is to use the FileHelpers library. Give it a look. It allows you to strongly type your import data.
