I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.
The problem is, after the very first call to
streamReader.ReadLine();
the streamReader.BaseStream.Position property is set to the end of the file! Now I assume some buffering is done behind the scenes, but I was expecting this property to reflect the number of bytes that I had consumed from the file. And yes, the file has more than one line :-)
For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.
How can I find the actual position where the 1st line ends, so I can return there later?
I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:
ReadLine() strips the new-line character(s), which may have a variable length (is it '\n'? Is it "\r\n"? Etc.)
I'm not sure if this would work OK with variable-length characters
...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.
If it helps, I open my file like this:
using (var reader = new StreamReader(
new FileStream(
m_path,
FileMode.Open,
FileAccess.Read,
FileShare.ReadWrite)))
{...}
Any suggestions?
If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.
You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
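A minimal sketch of that caching approach (the method name and path are illustrative, not from the original post):

```csharp
using System.Collections.Generic;
using System.IO;

static List<string> ReadAllLinesIntoList(string path)
{
    var lines = new List<string>();
    using (var reader = new StreamReader(File.OpenRead(path)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);   // keep every line so earlier chunks stay reachable
    }
    return lines;
}
```

Rewinding to chunk n of size chunkSize is then just indexing: lines[n * chunkSize].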
I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.
The reason streamReader.BaseStream.Position pointed to the end of the file is that it has a built-in buffer, as you expected.
Bookkeeping by counting the number of bytes read from each ReadLine() call will work for most plain text files. However, I have encountered cases where control characters, the unprintable ones, were mixed into the text file. The calculated byte count was wrong, and my program could not seek to the correct location afterwards.
My final solution was to implement the line reader on my own. It has worked well so far. This should give you an idea of what it looks like:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
int ch;
int currentLine = 1, offset = 0;
while ((ch = fs.ReadByte()) >= 0)
{
offset++;
// This covers both cases: \r\n and bare \n (for UNIX files)
if (ch == 10)
{
currentLine++;
// ... do something such as logging the current offset with the line number
}
}
}
And to go back to logged offset:
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
fs.Seek(yourOffset, SeekOrigin.Begin);
TextReader tr = new StreamReader(fs);
string line = tr.ReadLine();
}
Also note that there is already a buffering mechanism built into FileStream.
StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.
A problem with the accepted answer is that if ReadLine() throws an exception, say because a logging framework locks the file temporarily right when you call ReadLine(), then you will not have that line "saved" into a list, because it never returned a line. If you catch the exception, you cannot retry ReadLine() a second time, because the StreamReader's internal state and buffer are corrupted from the last ReadLine(): you will only get part of a line back, and you cannot ignore that broken line and seek back to the beginning of it, as the OP found out.
If you want the true seekable location, you need to use reflection to reach StreamReader's private fields, which let you calculate its position inside its own buffer. Granger's solution, seen here: StreamReader and seeking, should work. Or do what answers to other related questions have done: create your own StreamReader that exposes the true seekable location (this answer in this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking, which for some reason was designed to rule out seeking in nearly every situation.
Edit: I used Granger's solution and it works. Just be sure you go in this order: call GetActualPosition(), then set BaseStream.Position to that value, then call DiscardBufferedData(), and finally call ReadLine(); you will get the full line starting from the position returned by the method.
Related
This question already has answers here:
Failed to write large amount of data to stream
(2 answers)
Closed 1 year ago.
I have a big stream (4 GB) and I need to replace some characters in it (one specific character with 2 or 3 others). I get the stream from a service, and I have to return a stream.
This is what I'm doing
private static Stream UpdateStream(Stream stream, string oldCharacters, string newCharacters, int size = 2048)
{
stream.Position = 0;
StreamReader reader = new StreamReader(stream);
MemoryStream outputStream = new MemoryStream();
StreamWriter writer = new StreamWriter(outputStream);
writer.AutoFlush = true;
char[] buffer = new char[size];
while (!reader.EndOfStream)
{
reader.Read(buffer, 0, buffer.Length);
if (buffer != null)
{
string line = new string(buffer);
if (!string.IsNullOrEmpty(line))
{
string newLine = line.Replace(oldCharacters, newCharacters);
writer.Write(newLine);
}
}
}
return outputStream;
}
But I'm getting an OutOfMemoryException at some point on the following line, even though the computer still has plenty of memory available:
writer.Write(newLine);
Any advice?
This is not an answer, but I couldn't possibly fit it into a comment.
Currently your problem is not solvable without making some assumptions. The problem, as I hear it, is that you want to replace some parts of a large body of text saved in a file and save the modified text in the file again.
Some unknown variables:
How long are those strings you are replacing?
How long are those strings you are replacing it with? The same length as the replaced strings?
What kinds of strings are you looking to replace? A single word? A whole sentence? A whole paragraph?
A solution to your problem would be to read the file into memory in chunks, replace the necessary text and save the "updated" text in a new file and then finally rename the "new file" to the name of the old file. However, without knowing the answers to the above points, you could potentially be wanting to replace a string as long as all text in the file (unlikely, yes). This means in order to do the "replacing" I would have to read the whole file into memory before I can replace any of the text, which causes an OutOfMemoryException. (Yes, you could do some clever scanning to replace such large strings without reading it all into memory at once, but I doubt such a solution is necessary here).
Please edit your question to address the above points.
So to make it work I had to:
use the HugeMemoryStream class from this post Failed to write large amount of data to stream
and set the gcAllowVeryLargeObjects setting to true
and set the build to 64-bit ("Prefer 32-bit" unchecked)
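For reference, gcAllowVeryLargeObjects is a runtime setting in the application's App.config; it allows arrays larger than 2 GB in total size on 64-bit targets:

```xml
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```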
I have a huge text file which I need to read. Currently I am reading the text file like this:
string[] lines = File.ReadAllLines(FileToCopy);
But here all the lines get stored in the lines array, and only afterwards are they processed according to the condition. This is not efficient, because it first reads irrelevant rows (lines) of the text file into the array, and then processes them the same way.
So my question is: can I specify the line number to start reading from? Suppose last time it read 10001 lines; next time it should start from 10002.
How to achieve it?
Well, you don't have to store all those lines - but you definitely have to read them. Unless the lines are of a fixed length (in bytes, not characters), how would you expect to be able to skip to a particular part of the file?
To store only the lines you want in memory though, use:
List<string> lines = File.ReadLines(FileToCopy).Skip(linesToSkip).ToList();
Note that File.ReadLines() was introduced in .NET 4, and reads the lines on-demand with an iterator instead of reading the entire file into memory.
If you only want to process a certain number of lines, you can use Take as well:
List<string> lines = File.ReadLines(FileToCopy)
.Skip(linesToSkip)
.Take(linesToRead)
.ToList();
So for example, linesToSkip=10000 and linesToRead=1000 would give you lines 10001-11000.
Ignore the line numbers - they're useless. If every line isn't the same length, you're going to have to read the lines one by one again, and that's a huge waste.
Instead, use the position of the file stream. This way, you can skip right there on the second attempt, no need to read the data all over again. After that, you'll just use ReadLine in a loop until you get to the end, and mark the new end position.
Please, don't use ReadLines().Skip(). If you have a 10 GB file, it will read all 10 GB, create the corresponding strings, throw them away, and then, finally, read the 100 bytes you want. That's just crazy :) Of course, it's better than using File.ReadAllLines, but only because it doesn't need to keep the whole file in memory at once. Other than that, you're still reading every single byte of the file (you have to find out where the lines end).
Sample code of a method to read from last known location:
string[] ReadAllLinesFromBookmark(string fileName, ref long lastPosition)
{
using (var fs = File.OpenRead(fileName))
{
fs.Position = lastPosition;
using (var sr = new StreamReader(fs))
{
string line = null;
List<string> lines = new List<string>();
while ((line = sr.ReadLine()) != null)
{
lines.Add(line);
}
lastPosition = fs.Position;
return lines.ToArray();
}
}
}
Well, you do have line numbers, in the form of the array index. Keep a note of the previously read line's array index, and you can start reading from the next one.
Use the FileStream.Position property to get the current position in the file, and later set it to jump back.
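A minimal sketch of that idea (the file name is illustrative; note that Position is only trustworthy when read from the FileStream directly, not through a buffering StreamReader on top of it):

```csharp
using System.IO;

string path = "data.txt";   // illustrative path
long bookmark;

using (var fs = File.OpenRead(path))
{
    // ... read some bytes directly from fs ...
    fs.ReadByte();
    fs.ReadByte();
    bookmark = fs.Position;   // remember where we stopped
}

using (var fs = File.OpenRead(path))
{
    fs.Position = bookmark;   // jump straight back on the next pass
    // ... continue reading from here ...
}
```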
I have a text file which contains 200000 rows. I want to read the first 50000 rows, process them, then read the second part, say rows 50001 to 100000, and so on. When I read the second block I don't want to loop over rows 1 to 50000 again; I want the reader to go directly to row number 50001 and start reading.
How can this be done? Which reader should be used for it?
You need the StreamReader class.
With it you can read line by line using the ReadLine() method. You will need to keep track of the line count yourself and call a method to process your data every 50000 lines, but as long as you keep the reader open you will not need to restart the reading.
No, unfortunately there is no way to skip counting the lines. At the raw level, files do not work on a line-number basis; they work on a position/offset basis. The underlying file system has no concept of lines - that's a concept added by higher-level components.
So there is no way to tell the operating system "please open the file at the specified line". Instead you have to open the file and skip ahead, counting newlines until you've passed the specified number, then store the following bytes into an array until you hit the next newline.
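The scanning described above can be sketched like this (a helper I'm naming FindLineOffset for illustration; it counts '\n' bytes, which works for ASCII-compatible encodings with either \n or \r\n line endings):

```csharp
using System.IO;

// Returns the byte offset at which the 1-based line `lineNumber` starts.
static long FindLineOffset(string path, int lineNumber)
{
    using (var fs = File.OpenRead(path))
    {
        long offset = 0;
        int currentLine = 1, b;
        while (currentLine < lineNumber && (b = fs.ReadByte()) >= 0)
        {
            offset++;
            if (b == '\n') currentLine++;   // a line starts right after each newline
        }
        return offset;
    }
}
```

The returned offset can then be passed to Stream.Seek, as in the fixed-length example below.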
Though if each line has an equal number of bytes, you can try the following:
using( Stream stream = File.Open(fileName, FileMode.Open) )
{
stream.Seek(bytesPerLine * (myLine - 1), SeekOrigin.Begin);
using( StreamReader reader = new StreamReader(stream) )
{
string line = reader.ReadLine();
}
}
I believe the best way would be to use a StreamReader.
Here are two questions related to yours, from which you can get answers. Ultimately, though, reading blocks of text is very hard to do unless each block is a set size.
However I believe these would be a good read for you to use:
Reading Block of text file
This one shows you how to separate blocks of text to read. Its answer would be best suited here: you can set a condition to track how many lines you have read, check whether the line count == 50000 (and so on), and then do something.
As you can see, that answer makes use of the continue keyword, which I believe will be useful for what you intend to do.
Reading text file block by block
This one shows a more readable answer, but it doesn't really address reading in blocks.
For your question, I believe what you want to do has confused you a little. It seems like you want to select 50000 lines and then read them as one unit; that is not the way StreamReader works. Yes, reading line by line makes the process longer, but unfortunately that's the case.
Unless the rows are exactly the same length, you can't start directly at row 50001.
What you can do, however, is when reading the first 50000 rows, remember where the last row ends. You can then seek directly to that offset and continue reading from there.
Where the row length is fixed, you do something like this:
myfile.Seek(50000 * (rowCharacters + 2), SeekOrigin.Begin);
Seek goes to a specific offset in bytes, so you just need to tell it how many bytes 50000 rows occupy. Given an ASCII encoding, that's the number of characters in the line, plus 2 for the newline sequence.
I am trying to use StreamReader and StreamWriter to Open a text file (fixed width) and to modify a few specific columns of data. I have dates with the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates from the file using StreamReader, convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to write to a specific position, and I am beginning to wonder if it is possible.
I have the following code snippet.
System.IO.StreamWriter writer;
this.fileName = @"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); //Substring Containing the Date
reader.Close();
...
// Convert currentLine to Packed Field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following:
After:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = Ascii Characters SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is a picture of the date "20120123" packed in COMP-3. This is my end result; I have included it because I wasn't sure whether it would affect possible answers.
My question is: how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?
I have always found it better to read the input file, filter/process the data, and write the output to a temporary file. When finished, delete the original file (or make a backup) and copy the temporary file over. That way you haven't lost half your input file if something goes wrong in the middle of processing.
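A sketch of that pattern, assuming a line-based transform (the method name is illustrative; File.Replace swaps the files atomically and can keep a backup):

```csharp
using System;
using System.IO;

static void RewriteFile(string path, Func<string, string> transform)
{
    string tempPath = path + ".tmp";
    using (var reader = new StreamReader(File.OpenRead(path)))
    using (var writer = new StreamWriter(File.Create(tempPath)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(transform(line));   // process one line at a time
    }
    // Atomically replace the original, keeping a backup copy.
    File.Replace(tempPath, path, path + ".bak");
}
```

Because the output is a separate file, the rewritten lines are free to be longer or shorter than the originals, which is exactly what in-place writing can't do.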
You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.
However, I haven't found a way to use StreamWriter to write to a specific position, and
I am beginning to wonder if it is possible.
You can use StreamWriter.BaseStream.Seek method
using (StreamWriter wr = new StreamWriter(File.Create(#"c:\Temp\aaa.txt")))
{
wr.Write("ABC");
wr.Flush();
wr.BaseStream.Seek(0, SeekOrigin.Begin);
wr.Write("Z");
}
Hey there! I'm trying to read a 150 MB file with a FileStream, but every time I do, all I get is |zl instead of the whole stream. Note that the file has some special characters in it.
Does anybody know what the problem could be? Here is my code:
using (FileStream fs = File.OpenRead(path))
{
byte[] buffer = new byte[fs.Length];
fs.Read(buffer, 0, buffer.Length);
extract = Encoding.Default.GetString(buffer);
}
Edit:
I tried to read all the text, but it still returned the same four characters. It works fine on any other file except for these few. When I use ReadAllLines it only gets the first line.
fs.Read() does not necessarily read the whole run of bytes all at once; it reads some number of bytes and returns the number of bytes actually read. MSDN has an excellent example of how to use it to get the whole file:
http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx
For what it's worth, reading the entire 150 MB of data into memory is really going to put a strain on your client's system - the preferred option would be to restructure the code so that you don't need the whole file at once.
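A chunked version of that read loop, honouring the count returned by Read instead of assuming the buffer fills completely (the method name is illustrative; note that with multi-byte encodings a character can straddle a chunk boundary, in which case a Decoder, or simply File.ReadAllText, is the safer tool):

```csharp
using System.IO;
using System.Text;

static string ReadWholeFile(string path)
{
    var sb = new StringBuilder();
    using (var fs = File.OpenRead(path))
    {
        byte[] buffer = new byte[4096];
        int read;
        // Read returns 0 at end of stream; anything less than
        // buffer.Length just means "that's all for this call".
        while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            sb.Append(Encoding.Default.GetString(buffer, 0, read));
    }
    return sb.ToString();
}
```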
If you want to read text this way, File.ReadAllLines (or File.ReadAllText) - http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx - is a better option.
My guess is that the file is not a text file to start with, and that the way you display the resulting string stops at a NUL ('\0') character.
As debracey pointed out, Read returns the number of bytes read - check that. Also, for file operations it is unlikely to stop at exactly 4 characters...