I realize that I cannot hold my complete data set in memory, so I want to stream parts of it into memory, work with them, and then write them back out.
yield is a very useful keyword; it saves all the boilerplate of using an enumerator and keeping track of an index.
But when I want to pass an IEnumerable produced via yield around and write its items back to a collection or file, do I need to fall back on the enumerator concept, or is there something like the opposite of yield?
I have heard about Rx, but I'm not clear whether it solves my problem.
public static IEnumerable<string> ReadFile()
{
string line;
var reader = new System.IO.StreamReader("c:\\temp\\test.txt");
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
reader.Close();
}
public static void StreamFile()
{
foreach (string line in ReadFile())
{
WriteFile(line);
}
}
public static void WriteFile(string line)
{
// how to save the state, of observe an collection/stream???
var writer = new System.IO.StreamWriter("c:\\temp\\test.txt");
writer.WriteLine(line);
writer.Close();
}
In your case, you can pass the IEnumerable<string> directly to WriteFile:
public static void WriteFile(IEnumerable<string> lines)
{
// note: write to a different file than the one ReadFile is enumerating,
// otherwise the reader and the writer would be fighting over the same file
using(var writer = new System.IO.StreamWriter("c:\\temp\\output.txt"))
{
foreach(var line in lines)
writer.WriteLine(line);
}
}
Since the input is streamed through an IEnumerable<T>, the data is never held in memory all at once.
Note that, in this case, you could just use File.ReadLines to perform the read, as it already streams the results back via an IEnumerable<string>. With File.WriteAllLines, your code could be reduced to (though you could also just use File.Copy):
File.WriteAllLines(outputFile, File.ReadLines(inputFile));
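This pattern also composes with LINQ, so the data can be transformed on the way through without ever being materialized. A minimal sketch, reusing the inputFile/outputFile names above and an arbitrary ToUpperInvariant transform for illustration:
var transformed = System.IO.File.ReadLines(inputFile)      // streams lines lazily
    .Select(line => line.ToUpperInvariant());               // requires a using System.Linq;
System.IO.File.WriteAllLines(outputFile, transformed);      // writes the lines out one at a time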
I've found plenty of examples of how to read/write text to a file asynchronously, but I'm having a hard time finding how to do it with a List.
For the reading I've got this, which seems to work:
public async Task<List<string>> GetTextFromFile(string file)
{
using (var reader = File.OpenText(file))
{
var fileText = await reader.ReadToEndAsync();
return fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.None).ToList();
}
}
The writing is a bit trickier though ...
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
if (!append && File.Exists(file)) File.Delete(file);
using (var writer = File.OpenWrite(file))
{
StringBuilder builder = new StringBuilder();
foreach (string value in lines)
{
builder.Append(value);
builder.Append(Environment.NewLine);
}
Byte[] info = new UTF8Encoding(true).GetBytes(builder.ToString());
await writer.WriteAsync(info, 0, info.Length);
}
}
My problem with this is that, for a moment, my data is held in memory three times:
the original List of lines, then the StringBuilder that joins them into a single string with newlines, and then the byte representation of that string in info.
It seems excessive to have three copies of essentially the same data in memory.
I am concerned with this because at times I'll be reading and writing large text files.
Following up on that, let me be clear - I know that for extremely large text files I can do this all line by line. What I am looking for are two methods of reading/writing data. The first is to read in the whole thing and process it, and the second is to do it line by line. Right now I am working on the first approach for my small and moderate sized text files. But I am still concerned with the data replication issue.
The following might suit your needs, as it does not duplicate the data and writes it out line by line:
public async Task WriteTextToFile(string file, List<string> lines, bool append)
{
    // new StreamWriter(file, append) truncates the file when append is false
    // and appends when it is true, so no explicit File.Delete is needed
    // (File.OpenWrite would start writing at the beginning of an existing
    // file rather than appending)
    using (var streamWriter = new StreamWriter(file, append))
    {
        foreach (var line in lines)
            await streamWriter.WriteLineAsync(line);
    }
}
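If you are on a runtime that provides the asynchronous File helpers (.NET Core 2.0+ / .NET 5+), this can be reduced even further; a hedged sketch, with the method name WriteTextToFileAsync chosen here only for illustration:
public Task WriteTextToFileAsync(string file, List<string> lines, bool append)
{
    // the framework streams the lines itself, so the List is never duplicated
    // into a single string or byte array
    return append
        ? File.AppendAllLinesAsync(file, lines)
        : File.WriteAllLinesAsync(file, lines);
}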
So, let's say I have a text file with 20 lines, each containing different text.
I want to have a string that holds the first line, but when I call NextLine() I want it to become the next line. I tried this, but it doesn't seem to work:
string CurrentLine;
int LastLineNumber;
void NextLine()
{
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
CurrentLine = file.ReadLine(LastLineNumber + 1);
LastLineNumber++;
}
How would I be able to do this?
Thanks in advance.
In general, it would be better to design this so that the file stays open, rather than reopening it each time.
If that is not practical, you'll need to call ReadLine multiple times:
string CurrentLine;
int LastLineNumber;
void NextLine()
{
// using will make sure the file is closed
using(System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt"))
{
// Skip lines
for (int i=0;i<LastLineNumber;++i)
file.ReadLine();
// Store your line
CurrentLine = file.ReadLine();
LastLineNumber++;
}
}
Note that this can be simplified via File.ReadLines:
void NextLine()
{
var lines = File.ReadLines("C:\\test.txt");
CurrentLine = lines.Skip(LastLineNumber).First();
LastLineNumber++;
}
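If you do want the file to stay open between calls instead, one option (a sketch, not from the original answer; LineCursor is a made-up name, and the usual System, System.Collections.Generic, and System.IO usings are assumed) is to hold on to the enumerator that File.ReadLines returns:
public sealed class LineCursor : IDisposable
{
    private readonly IEnumerator<string> enumerator;

    public LineCursor(string path)
    {
        // File.ReadLines streams lazily; the file stays open for as long
        // as this enumerator is alive
        enumerator = File.ReadLines(path).GetEnumerator();
    }

    public string CurrentLine { get; private set; }

    // advances to the next line; returns false once the end of the file is reached
    public bool NextLine()
    {
        if (!enumerator.MoveNext())
            return false;
        CurrentLine = enumerator.Current;
        return true;
    }

    public void Dispose()
    {
        enumerator.Dispose();
    }
}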
One simple call should do it:
var fileLines = System.IO.File.ReadAllLines(fileName);
You will want to validate that the file exists, and of course you still need to watch for blank lines or invalid values, but that should give you the basics. To loop over the file you can use the following:
foreach (var singleLine in fileLines) {
// process "singleLine" here
}
One more note - you won't want to do this with large files since it processes everything in memory.
Well, if you really don't mind re-opening the file each time, you can use:
CurrentLine = File.ReadLines("c:\\test.txt").Skip(LastLineNumber).First();
LastLineNumber++;
However, I'd advise you to just read the whole thing in one go using File.ReadAllLines, or perhaps File.ReadLines(...).ToList().
The ReadLine method already reads the next line in the StreamReader, you don't need the counter, or your custom function for that matter. Just keep reading until you reach your 20 lines or until the file ends.
You can't pass a line number to ReadLine and expect it to find that particular line. If you look at the ReadLine documentation, you'll see it doesn't accept any parameters.
public override string ReadLine()
When working with files, you must treat them as streams of data. Every time you open the file, you start at the very first byte/character of the file.
var reader = new StreamReader("c:\\test.txt"); // Starts at byte/character 0
You have to keep the stream open if you want to read more lines.
using (var reader = new StreamReader("c:\\test.txt"))
{
string line1 = reader.ReadLine();
string line2 = reader.ReadLine();
string line3 = reader.ReadLine();
// etc..
}
If you really want to write a method NextLine, then you need to store the created StreamReader object somewhere and use it every time. Something like this:
public class MyClass : IDisposable
{
StreamReader reader;
public MyClass(string path)
{
this.reader = new StreamReader(path);
}
public string NextLine()
{
return this.reader.ReadLine();
}
public void Dispose()
{
reader.Dispose();
}
}
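Used, for example, like this (a hypothetical usage sketch based on the class above):
using (var lines = new MyClass("c:\\test.txt"))
{
    string first = lines.NextLine();
    string second = lines.NextLine();
    // NextLine() returns null once the end of the file has been reached
}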
But I suggest you either loop through the stream:
using (var reader = new StreamReader("c:\\test.txt"))
{
while (some_condition)
{
string line = reader.ReadLine();
// Do something
}
}
Or get all the lines at once using the File class ReadAllLines method:
string[] lines = System.IO.File.ReadAllLines("c:\\test.txt");
for (int i = 0; i < lines.Length; i++)
{
string line = lines[i];
// Do something
}
I can currently remove the last line of a text file using:
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Take(lines.Length - 1).ToArray());
Although, how is it possible to instead remove the beginning of the text file?
Instead of lines.Take, you can use lines.Skip, like:
var lines = File.ReadAllLines("test.txt");
File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
This truncates at the beginning, although the technique used (read everything into memory and write everything back) is quite inefficient for large files.
About the efficient way: the inefficiency comes from the need to read the whole file into memory. The alternative is to skip past the first line in a stream, copy the remainder to another output file, delete the original, and rename the new file. That is just as fast and consumes far less memory.
Truncating a file at the end is much easier: you can simply find the truncation position and call FileStream.SetLength().
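A minimal sketch of that copy-to-a-new-file idea, copying line by line rather than seeking at the byte level (the .tmp suffix and the method name are just illustrative choices):
static void RemoveFirstLine(string path)
{
    string tempPath = path + ".tmp";
    using (var reader = new StreamReader(path))
    using (var writer = new StreamWriter(tempPath))
    {
        reader.ReadLine();                     // discard the first line
        string line;
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(line);            // copy the rest, one line at a time
    }
    File.Delete(path);
    File.Move(tempPath, path);
}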
Here is an alternative:
using (var stream = File.OpenRead("C:\\yourfile"))
{
var items = new LinkedList<string>();
using (var reader = new StreamReader(stream))
{
reader.ReadLine(); // skip one line
string line;
while ((line = reader.ReadLine()) != null)
{
//it's far better to do the actual processing here
items.AddLast(line);
}
}
}
Update
If you need an IEnumerable<string> and don't want to waste memory you could do something like this:
public static IEnumerable<string> GetFileLines(string filename)
{
using (var stream = File.OpenRead(filename))
{
using (var reader = new StreamReader(stream))
{
reader.ReadLine(); // skip one line
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
}
static void Main(string[] args)
{
foreach (var line in GetFileLines("C:\\yourfile.txt"))
{
// do something with the line here.
}
}
var lines = System.IO.File.ReadAllLines("test.txt");
System.IO.File.WriteAllLines("test.txt", lines.Skip(1).ToArray());
Skip bypasses the given number of elements at the start of the sequence and returns the rest. Take does the opposite: it keeps only the given number of elements from the start and discards the rest.
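To illustrate with a small made-up sequence (not part of the original answer):
var numbers = new[] { 1, 2, 3, 4, 5 };
var withoutFirst = numbers.Skip(1);   // yields 2, 3, 4, 5 (drops the first element)
var firstFour = numbers.Take(4);      // yields 1, 2, 3, 4 (keeps only the first four)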
To remove the first line from a text file:
System.IO.StreamReader reader = new System.IO.StreamReader(filePath);
string data = reader.ReadToEnd();
reader.Close();
// strip everything up to and including the first line break
data = System.Text.RegularExpressions.Regex.Replace(data, "^.*\n", "");
System.IO.StreamWriter writer = new System.IO.StreamWriter(filePath, false);
writer.Write(data);
writer.Close();
This can also be done in one line:
File.WriteAllLines(filePath, File.ReadAllLines(filePath).Skip(1));
assuming you pass filePath as a parameter to the function.
I know normally you would use File.ReadAllLines, but I'm trying to do it with an uploaded file.
Can I somehow put it into a temporary location, or read it from memory?
I was able to get this working
Is this a string, a Stream, or what? Either way, you want a TextReader - the question is simply StringReader vs StreamReader. Once you have that, I would do something like:
public static IEnumerable<string> ReadLines(TextReader reader) {
string line;
while((line = reader.ReadLine()) != null) yield return line;
}
Then, with whichever reader, I can either use:
foreach(var line in ReadLines(reader)) {
// note: non-buffered - i.e. more memory-efficient
}
or:
string[] lines = ReadLines(reader).ToArray();
// note: buffered - all read into memory at once (less memory efficient)
i.e. if it is a Stream you are reading from:
using(var reader = new StreamReader(inputStream)) {
foreach(var line in ReadLines(reader)) {
// do something fun and interesting
}
}
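And if what you have is already a string in memory (for example the uploaded file's contents), a StringReader plugs into the same helper; a sketch where uploadedText is just a placeholder name:
string uploadedText = "first line\r\nsecond line";
using (var reader = new StringReader(uploadedText))
{
    foreach (var line in ReadLines(reader))
    {
        // do something fun and interesting
    }
}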
This is the way I read a file:
public static string readFile(string path)
{
StringBuilder stringFromFile = new StringBuilder();
StreamReader SR;
string S;
SR = File.OpenText(path);
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(SR.ReadLine());
}
SR.Close();
return stringFromFile.ToString();
}
The problem is that it takes so long (the .txt file is about 2.5 MB); it took over 5 minutes. Is there a better way?
Solution taken
public static string readFile(string path)
{
return File.ReadAllText(path);
}
Took less than 1 second... :)
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(SR.ReadLine());
}
Of note here: S is never reassigned after that initial ReadLine(), so the S != null condition never becomes false once you enter the while loop, and the loop never terminates. Try:
S = SR.ReadLine();
while (S != null)
{
stringFromFile.Append(S = SR.ReadLine());
}
or use one of the other comments.
If you need to remove newlines, use string.Replace(Environment.NewLine, "")
Leaving aside the horrible variable names and the lack of a using statement (you won't close the file if there are any exceptions), that should be okay, and it certainly shouldn't take 5 minutes to read 2.5 MB.
Where does the file live? Is it on a flaky network share?
By the way, the only difference between what you're doing and using File.ReadAllText is that you're losing line breaks. Is this deliberate? How long does ReadAllText take?
return System.IO.File.ReadAllText(path);
Marcus Griep has it right: it's taking so long because YOU HAVE AN INFINITE LOOP. I copied your code, made his changes, and it read a 2.4 MB text file in less than a second.
But I think you might miss the first line of the file. Try this:
S = SR.ReadLine();
while (S != null){
stringFromFile.Append(S);
S = SR.ReadLine();
}
Do you need the entire 2.5 MB in memory at once?
If not, I would try to work with what you need.
Use System.IO.File.ReadAllLines instead.
http://msdn.microsoft.com/en-us/library/system.io.file.readalllines.aspx
Alternatively, estimating the character count and passing that to StringBuilder's constructor as the capacity should speed it up.
Try this, should be much faster:
var str = System.IO.File.ReadAllText(path);
return str.Replace(Environment.NewLine, "");
By the way: Next time you're in a similar situation, try pre-allocating memory. This improves runtime drastically, regardless of the exact data structures you use. Most containers (StringBuilder as well) have a constructor that allow you to reserve memory. This way, less time-consuming reallocations are necessary during the read process.
For example, you could write the following if you want to read data from a file into a StringBuilder:
var info = new FileInfo(path);
var sb = new StringBuilder((int)info.Length);
(Cast necessary because System.IO.FileInfo.Length is long.)
ReadAllText was a very good solution for me. I used the following code for a 3,000,000-line text file and it took 4-5 seconds to read all the lines.
string fileContent = System.IO.File.ReadAllText(txtFilePath.Text);
string[] arr = fileContent.Split('\n');
The loop and StringBuilder may be redundant; try using ReadToEnd.
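For instance, something along these lines (a sketch; path is assumed to be the file path):
using (var reader = File.OpenText(path))
{
    return reader.ReadToEnd();
}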
To read a text file as fast as possible, you can use something like this:
public static string ReadFileAndFetchStringInSingleLine(string file)
{
StringBuilder sb;
try
{
sb = new StringBuilder();
using (FileStream fs = File.Open(file, FileMode.Open))
{
using (BufferedStream bs = new BufferedStream(fs))
{
using (StreamReader sr = new StreamReader(bs))
{
string str;
while ((str = sr.ReadLine()) != null)
{
sb.Append(str);
}
}
}
}
return sb.ToString();
}
catch (Exception ex)
{
return "";
}
}
Hope this will help you. For more info, please visit the following link:
Fastest Way to Read Text Files