C#: Search and replace txt line

I am looking for a way to search a comma-separated txt file for a keyword, and then replace another keyword on that exact line. For example, if I have the following line in a big txt file:
Help, 0
I want to find this line in the file (by telling the program to look for the first word, 'help') and replace the 0 with 1 to indicate that I have read it once, so that it looks like:
Help, 1
Thanks

It is generally a very bad idea to try and overwrite data in the same file: if your code throws an exception, you'll be left with a partially processed file; if your search target and replacement value have different lengths, you have to re-write the rest of the file. Note that these don't apply in your specific situation - but it's best not to let it become habit.
My recommendation:
Open both the input file and a temporary file (Path.GetTempFileName).
Process and write each line (StreamReader.ReadLine).
When finished with no errors, rename the original file to something like origFile.old.
Rename the temporary file to the original file name.
If something goes wrong, delete the temporary file and exit. This way the original file is left intact in the event of an error.
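The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the LineRewriter class, its RewriteLines and MarkRead methods, and the .tmp suffix are names made up for illustration. The temporary file is created next to the original instead of with Path.GetTempFileName (an assumption, so that the final rename never has to cross volumes), and the output is assumed to be acceptable as UTF-8.

```csharp
using System;
using System.IO;

public static class LineRewriter
{
    // Rewrites the file at `path`, passing each line through `transform`,
    // and keeps the previous version as `path + ".old"`.
    public static void RewriteLines(string path, Func<string, string> transform)
    {
        // Assumption: temp file lives beside the original, not in the temp folder.
        string tempPath = path + ".tmp";
        string backupPath = path + ".old";

        try
        {
            using (var reader = new StreamReader(path))
            using (var writer = new StreamWriter(tempPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(transform(line));
                }
            }
        }
        catch
        {
            // Something went wrong: discard the partial output, keep the original.
            File.Delete(tempPath);
            throw;
        }

        File.Delete(backupPath);      // does nothing if there is no earlier backup
        File.Move(path, backupPath);  // original -> origFile.old
        File.Move(tempPath, path);    // temporary -> original name
    }

    // Hypothetical transform for the "Help, 0" -> "Help, 1" case in the question.
    public static string MarkRead(string line, string keyword)
    {
        string[] parts = line.Split(',');
        if (parts.Length == 2 &&
            parts[0].Trim().Equals(keyword, StringComparison.OrdinalIgnoreCase))
        {
            return parts[0] + ", 1";
        }
        return line;
    }
}
```

It would be called along the lines of LineRewriter.RewriteLines("data.txt", line => LineRewriter.MarkRead(line, "help")), where data.txt is a placeholder file name.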

If you want to do the replacement "in place" (meaning you don't want to use another, temporary, file) then you would do so with a FileStream.
You have a couple of options. You can Read through the file stream until you find the text that you're looking for, then issue a Write. Keep in mind that FileStream works at the byte level, so you'll need to take character encoding into consideration. Encoding.GetString will do the conversion.
Alternatively, you can search for the text, and note its position. Then you can open a FileStream and just Seek to that position. Then you can issue the Write.
This may be the most efficient way, but it's definitely more challenging than the naive option. With the naive implementation, you:
Read the entire file into memory (File.ReadAllText)
Perform the replace (Regex.Replace)
Write it back to disk (File.WriteAllText)
There's no second file, but you are bound by the amount of memory the system has. If you know you're always dealing with small files, then this could be an option. Otherwise, you need to read up on character encoding and file streams.
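For a small file, the naive read/replace/write sequence might look something like this. It is only a sketch: the NaiveReplace class and its method names are invented for the example, and the pattern assumes the "keyword, value" layout from the question.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class NaiveReplace
{
    // Replaces whatever follows "<keyword>," on any line that starts with the keyword.
    public static string ReplaceValue(string text, string keyword, string newValue)
    {
        // RegexOptions.Multiline makes ^ match at the start of every line.
        // [^\r\n]* stops before the line ending, so line breaks are preserved.
        string pattern = "^(" + Regex.Escape(keyword) + @"[ \t]*,[ \t]*)[^\r\n]*";
        return Regex.Replace(
            text,
            pattern,
            m => m.Groups[1].Value + newValue,
            RegexOptions.Multiline | RegexOptions.IgnoreCase);
    }

    // Reads the whole file into memory, replaces, and writes it back.
    public static void ReplaceInFile(string path, string keyword, string newValue)
    {
        string text = File.ReadAllText(path);
        File.WriteAllText(path, ReplaceValue(text, keyword, newValue));
    }
}
```

A match evaluator is used rather than a "$1" replacement string so that a new value beginning with a digit is not misread as part of the group number.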
Here's another SO question on the topic (including sample code): Editing a text file in place through C#

Related

Using StreamReader.ReadLine() to read a specific line number w/o reading entire file

I'm using StreamReader.ReadLine() in C# to read through a text file to find specific content like "Step-xx", and then read and use the contents up to the next occurrence, "Step-xx+1". I know that the "Step-xx" lines are 100 lines apart in my text file. How can I jump to line 2500 and read the contents following "Step-25", rather than reading 2500 lines and comparing each one to "Step-25", which is what I'm doing now? I need to speed up this search.
Thanks.
Files are not line-based (or even character-based), so you can't skip to a specific line in a file.
If you really need to skip ahead in the file, you would need to make a guess where the 2500th line might start based on average line lengths, seek to that position and start reading. You would need to use a FileStream directly, not a StreamReader, and read the file as bytes. You would be looking for the 0x0d 0x0a byte combination that is used as newline in a Windows text file. When you have the bytes between two newlines, you can decode them into a string and look for the Step-xx markers.
Thanks for all the replies. This will do the trick.
string line = File.ReadLines(FileName).Skip(14).Take(1).First();
I need to figure out how changing from StreamReader to ReadLines would impact other things.
Thanks again

Why am I getting "�" characters?

I've written a quick-and-dirty utility to parse a text file, but in some cases it's writing out a "�" character. My utility reads from a .txt file which contains "records" in this format:
Biography
Title:George F. Kennan: An American Life
Author:John Lewis Gaddis
Kindle: B0054TVO1G
Hardcover: B007R93I1U
Paperback: 0143122150
Image link: <img src="http://images.amazon.com/images/P/B0054TVO1G.01.MZZZZZZZ.jpg" alt="Book Cover" />
...and writes out lines from that to a CSV file such as:
Biography,"George F. Kennan: An American Life","John Lewis Gaddis",B0054TVO1G,B007R93I1U,0143122150,<img src="http://images.amazon.com/images/P/B0054TVO1G.01.MZZZZZZZ.jpg" alt="Book Cover" />
...but in several cases, as mentioned, that weird character is appending itself to an author's name. In most cases where this is happening, it's what appears to be a space character in the .txt file. I'm trimming the author's name prior to writing it out to the CSV file, so it's obviously not being seen as a space, though.
When I save the text file with these characters, I get the message about non-unicode characters, etc.
What could be the cause of that? And better yet, how can I delete them with a search and replace operation? In Notepad, they are not found, so I have to delete them one-by-one.
Prior to being in the .txt file, this data was in an Open Office/.odt file, if that means anything to anyone.
BTW, I have no idea how that "stackoverflow" got into the href above; it's not in the original text I pasted in...
UPDATE
I am curious how that character got in my files. I sure didn't put it there (deliberately), any more than I added the "stackoverflow" to the URL above. Could it be that a call to Environment.NewLine would add that?
Here was my process:
1) Copy and paste info from the interwebs into an Open Office/.odt file
2) Copy and paste that into a text (Notepad) file
3) Open that text file programmatically and loop through it, writing to a new "csv"/.txt file.
UPDATE 2
Silly me - all I had to do was save the file (which wouldn't save those weird characters), then open it again. IOW, when I opened it today (at home, after work) those were gone.
UPDATE 3
I wrote too soon - it replaced the weird character with a question mark (a "normal" one, not a stylized one).
They are almost certainly non-breaking spaces, U+00A0 (although there are other fixed-width space characters which are also possible.) These won't be trimmed as spaces, but will be rendered as spaces if the encoding of the file matches the encoding of the output device.
My guess is that your text file is in CP-1252 (i.e., Windows default one-byte coding) but your output is being rendered as though it were UTF-8.
Normally you would type these characters as AltGr+Space. You might try that with Notepad, but no guarantees.
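If that guess is right, the effect can be reproduced in a few lines. This is a rough sketch under that assumption: NbspDemo and its methods are made-up names, and ISO-8859-1 is used as a stand-in for CP-1252 because the two agree on byte 0xA0 and ISO-8859-1 is available without registering extra code pages.

```csharp
using System.Text;

public static class NbspDemo
{
    // Decodes bytes as ISO-8859-1, which agrees with CP-1252 for byte 0xA0.
    public static string DecodeLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    // Turns non-breaking spaces (U+00A0) into ordinary spaces, then trims.
    public static string CleanField(string field)
    {
        return field.Replace('\u00A0', ' ').Trim();
    }
}
```

Decoding the bytes for "Gaddis" followed by 0xA0 with Encoding.UTF8 yields the replacement character U+FFFD at the end, since a lone 0xA0 is not valid UTF-8. Decoding the same bytes with DecodeLatin1 yields a real non-breaking space, which CleanField can then remove.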

Adding a Line to the Middle of a File with .NET

Hello, I am working on something, and I need to be able to add text to a .txt file. Although I have this completed, I have a small problem: I need to write the string more or less in the middle of the file. Example:
Hello my name is Brandon,
I hope someone can help, //I want the string under this line.
Thank you.
Hopefully someone can help with a solution.
Edit: Alright, thanks guys, I'll try to figure it out; I'm probably going to just rewrite the whole file. The program I am making is related to the hosts file, and not everyone has the same hosts file, so I was wondering if there is a way to read their hosts file and copy all of it while adding the string to it?
With regular files there's no way around it - you must read the text that follows the line you wish to append after, overwrite the file, and then append the original trailing text.
Think of files on disk as arrays - if you want to insert some items into the middle of an array, you need to shift all of the following items down to make room. The difference is that .NET offers convenience methods for arrays and Lists that make this easy to do. The file I/O APIs offer no such convenience methods, as far as I'm aware.
When you know in advance you need to insert in the middle of a file, it is often easier to simply write a new file with the altered content, and then perform a rename. If the file is small enough to read into memory, you can do this quite easily with some LINQ:
var allLines = File.ReadAllLines( filename ).ToList();
allLines.Insert( insertPos, "This is a new line..." );
File.WriteAllLines( filename, allLines.ToArray() );
This is the best method to insert text in the middle of a text file.
string[] full_file = File.ReadAllLines("test.txt");
List<string> l = new List<string>();
l.AddRange(full_file);
l.Insert(20, "Inserted String");
File.WriteAllLines("test.txt", l.ToArray());
One trick is to treat the edit as a file transaction. First, read the file up to the line where you want to add text, saving each line you read to a separate file (for example tmp.txt). Then append your text to the end of tmp.txt, and continue copying from the source file until its end. Finally, replace the source file with tmp.txt. You end up with a file that has the added text in the middle :)
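A sketch of that copy-and-insert approach is below. The LineInserter class, the InsertAfter method and the .tmp suffix are hypothetical names chosen for the example, and the output is assumed to be acceptable as UTF-8.

```csharp
using System.IO;

public static class LineInserter
{
    // Copies `path` to a temporary file, adding `newLine` directly below the
    // first line equal to `anchorLine`, then swaps the temporary file in.
    // Returns false (and leaves the file untouched) if the anchor is not found.
    public static bool InsertAfter(string path, string anchorLine, string newLine)
    {
        string tempPath = path + ".tmp"; // hypothetical name, same folder
        bool inserted = false;

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
                if (!inserted && line == anchorLine)
                {
                    writer.WriteLine(newLine);
                    inserted = true;
                }
            }
        }

        if (!inserted)
        {
            File.Delete(tempPath);
            return false;
        }

        File.Delete(path);
        File.Move(tempPath, path);
        return true;
    }
}
```

Only one line is held in memory at a time, so this also works for files too large for File.ReadAllLines.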
Check out File.ReadAllLines(). Probably the easiest way.
If you know the line index, use ReadLine until you reach that line and write under it.
If you know the exact text of that line, do the same, but compare the text returned from ReadLine with the text that you are searching for, and then write under that line.
Or you can search for the index of a specified string and write after it using the escape sequence \n.
As others mentioned, there is no way around rewriting the file after the point of the newly inserted text if you must stick with a simple text file. Depending on your requirements, though, it might be possible to speed up the finding of location to start writing. If you knew that you needed to add data after line N, then you could maintain a separate "index" of the offsets of line numbers. That would allow you to seek directly to the necessary location to start reading/writing.

C#: String.IndexOf to FileStream.Seek

Given a FileStream that I read with a StreamReader (it is a very large file), how can I set the Seek position of the FileStream to the first occurrence of a certain substring, so that I can start reading this large file from a given point?
Thanks
What's in the file? Just lines of Unicode text? Then you've got a problem.
You will never know the position of the start of a line until you've read all the previous lines at least once. Unless the file is encoded in UTF-32, each character may take a variable number of bytes to represent it. Each line will have a variable length.
The best you can do is to scan through the file once and then make note of the positions of the starts of lines, in an index.
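Such an index could be built roughly like this. The sketch assumes the file is UTF-8 (or ASCII) with lines ending in \n or \r\n, where a 0x0A byte can only ever be a line feed; it would not be correct for UTF-16 or UTF-32. LineIndex, Build and ReadLineAt are names invented for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Scans the file once and records the byte offset at which each line starts.
    // If the file ends with a newline, the last entry equals the file length.
    public static List<long> Build(string path)
    {
        var offsets = new List<long> { 0 };
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[64 * 1024];
            long position = 0; // file offset of buffer[0]
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        offsets.Add(position + i + 1);
                    }
                }
                position += read;
            }
        }
        return offsets;
    }

    // Seeks straight to a zero-based line number and reads that one line.
    public static string ReadLineAt(string path, List<long> offsets, int lineNumber)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            stream.Seek(offsets[lineNumber], SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

The index is only valid as long as the file does not change; any edit that shifts bytes means it has to be rebuilt.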
FileStream cannot do the search for you; you'll have to search for it manually. You'll probably want to use an efficient string-searching algorithm such as Knuth-Morris-Pratt.
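As a starting point, a manual search over the raw bytes might look like the sketch below. Note that it uses a plain byte-by-byte comparison rather than Knuth-Morris-Pratt, to keep the example short. StreamSearch and IndexOf are made-up names, and the caller is assumed to know the file's encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamSearch
{
    // Returns the byte offset of the first occurrence of `text` in the file,
    // or -1 if it does not occur. The offset can be passed to FileStream.Seek.
    public static long IndexOf(string path, string text, Encoding encoding)
    {
        byte[] pattern = encoding.GetBytes(text);
        if (pattern.Length == 0) return 0;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[Math.Max(64 * 1024, pattern.Length * 2)];
            long bufferStart = 0; // file offset of buffer[0]
            int filled = 0;       // number of valid bytes in buffer
            int read;
            while ((read = stream.Read(buffer, filled, buffer.Length - filled)) > 0)
            {
                filled += read;
                int last = filled - pattern.Length;
                for (int i = 0; i <= last; i++)
                {
                    int j = 0;
                    while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                    if (j == pattern.Length) return bufferStart + i;
                }

                // Keep the tail that could still begin a match spanning two reads.
                int keep = Math.Min(filled, pattern.Length - 1);
                Array.Copy(buffer, filled - keep, buffer, 0, keep);
                bufferStart += filled - keep;
                filled = keep;
            }
        }
        return -1;
    }
}
```

Once the offset is known, the stream can be positioned there with Seek and a new StreamReader created on top of it to continue reading text from that point.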
Maybe this can help (Building a Regular Expression Stream Search with the .NET Framework):
https://www.developer.com/design/building-a-regular-expression-stream-search-with-the-net-framework/
If you mean the first time you read the file, then you will have to read it to find the position (of the particular string). The next time, if the content of the file has not changed, you can reuse the remembered position (kept in a variable during the same run of the program), set the stream position, and start reading from there.
Take a look at this example on MSDN
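filestream = new FileStream(path, FileMode.Open, FileAccess.Read); // path: the file's location (assumed name); s: the file's text
filestream.Seek(Encoding.UTF8.GetByteCount(s.Substring(0, s.IndexOf("string"))), SeekOrigin.Begin); // assumes UTF-8 without a BOM and that "string" occurs in s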

How do you specify where to start reading in a file when using StreamReader?

I have created a StreamReader object, along with a FileStream object. After both objects are created, how would I go about controlling where the StreamReader starts reading from in a file?
Let's say the file's contents are as follows,
// song list.
// junk info.
1. Song Name
2. Song Name
3. Song Name
4. Song Name
5. Song Name
6. Song Name
How would I make the StreamReader start reading from, let's say, #2? Also, how could I make it stop reading at a similar delimiter, such as #5?
Edit: By delimiter I mean, a way to make StreamReader start reading from ('2.')
Are you trying to deserialize a file into some in-memory object? If so, you may want to simply parse the entire file in using ReadLine or something similar, store each line, and then access it via a data structure such as a KeyValuePair<int, string>.
Update: Ok... With the new info, I think you have two options. If you're looking at reading until you find a match, you can Peek(), check to see if the character is the one you're looking for, and then Read(). Alternatively, if you're looking for a set position, you can simply Read() that many characters and throw away the return value.
If you're looking for a complex delimiter, you can read the entire line or even the entire file into memory and use regular expressions.
Hope that helps...
If the file contains new line delimiters you can use ReadLine to read a line at a time.
So to start reading at line #2, you would read the first line and discard and then read lines until line #5.
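Applied to the song list in the question, that read-and-discard loop might be sketched as follows. RangeReader and ReadBetween are hypothetical names, and the markers are assumed to appear at the very start of a line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RangeReader
{
    // Returns the lines from the first one starting with `startMarker`
    // through the first later one starting with `endMarker`, inclusive.
    public static List<string> ReadBetween(TextReader reader, string startMarker, string endMarker)
    {
        var result = new List<string>();
        bool started = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Lines before the start marker are read and discarded.
            if (!started && line.StartsWith(startMarker, StringComparison.Ordinal))
            {
                started = true;
            }
            if (started)
            {
                result.Add(line);
                if (line.StartsWith(endMarker, StringComparison.Ordinal))
                {
                    break; // stop reading once the end marker is reached
                }
            }
        }
        return result;
    }
}
```

Because it takes a TextReader, the same method works with a StreamReader over a file or a StringReader over text already in memory, for example RangeReader.ReadBetween(new StreamReader("songs.txt"), "2.", "5."), where songs.txt is a placeholder name.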
Well if the content is just plain text like that, you should use the StreamReader's ReadLine method.
http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
-Oisin
