Advanced TextReader to EndOfFile - c#

I have a textReader that in a specific instance I want to be able to advance to the end of file quickly so other classes that might hold a reference to this object will not be able to call tr.ReadLine() without getting a null.
This is a large file. I cannot use TextReader.ReadToEnd() as it will often lead to an OutOfMemoryException
I thought I would ask the community if there was a way SEEK the stream without using TextReader.ReadToEnd() which returns a string of all data in the file.
Current method, inefficient.
The following example code is a mock up. Obviously I am not opening a file with an if statement directly following it asking if I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
while(tr.ReadLine() != null) { }
}
Desired solution (Note this code block contains fake 'concept' methods or methods that cannot be used due to risk of outofmemoryexception)
TextReader tr = new StreamReader("Largefile");
if(needToAdvanceToEndOfFile)
{
tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
// tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.

You have to discard any buffered data if you have read any file content - since data is buffered you might get content even if you seek the underlying stream to the end - working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem as mentioned in the documentation is
The StreamReader class buffers input from the underlying stream when
you call one of the Read methods. If you manipulate the position of
the underlying stream after reading data into the buffer, the position
of the underlying stream might not match the position of the internal
buffer. To reset the internal buffer, call the DiscardBufferedData
method

Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(#"Your Large File"))
{
reader.BaseStream.Seek(0, SeekOrigin.End);
int read = reader.Read();//read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt"))//test.txt is a file that has data and lines
{
((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
string foo = tr.ReadLine();
Debug.WriteLine(foo ?? "foo is null");//foo is null
int read = tr.Read();
Debug.WriteLine(read);//-1
}

Related

DeflateStream.Flush() has no functionality?

I was trying to compress a byte array using DeflateStream. After wring the data, I was looking for a way to close the compression (mark as done). At first, I tried Dispose() then Close(), but those made the result MemoryStream unreadable. Then I thought I may need to Flush(), but the description said "The current implementation of this method has no functionality"
But it seems that without Flush(), the result seems empty. What does it mean that "The current implementation of this method has no functionality"?
static void Main(string[] args)
{
byte[] result = Encoding.UTF8.GetBytes("わたしのこいはみなみのかぜにのってはしるわ");
var output = new MemoryStream();
var dstream = new DeflateStream(output, CompressionLevel.Optimal);
dstream.Write(result, 0, result.Length);
var compressedSize1 = output.Position;
dstream.Flush();
var compressedSize2 = output.Position;
What does it mean that "The current implementation of this method has no functionality" ?
it means: it doesn't do anything, so calling Flush() by itself isn't going to help. You need to close the deflate stream for it to finish writing what it needs, but without closing the underlying stream (blocks will be written as you write to the stream, as the internal buffer fills - but the final partial block cannot be written until the data is ready to be terminated; not all compression sequences support being able to flush at an arbitrary point and then resume compressed data in additional blocks)
What you probably want is:
using (var dstream = new DeflateStream(output, CompressionLevel.Optimal, true))
{
// Your deflate code
}
// Now check the memory stream
Note the extra bool leaveOpen parameter usage.

Memory Issue in string C#

I have little test program
public class Test
{
public string Response { get; set; }
}
My console simply call Test class
class Program
{
static void Main(string[] args)
{
Test t = new Test();
using (StreamReader reader = new StreamReader("C:\\Test.txt"))
{
t.Response = reader.ReadToEnd();
}
t.Response = t.Response.Substring(0, 5);
Console.WriteLine(t.Response);
Console.Read();
}
}
I have appox 60 MB data in my Test.txt file. When the program get executes, it is taking lot of memory because string is immutable. What is the better way handle this kind of scenario using string.
I know that i can use string builder. but i have created this program to replicate a scenario in one of my production application which uses string.
when i tried with GC.Collect(), memory is released immediately. I am not sure whether i can call GC in code.
Please help. Thanks.
UPDATE:
I think i did not explain it clearly. sorry for the confusion.
I am just reading data from file to get huge data as don't want create 60MB of data in code.
My pain point is below line of code where i have huge data in Response field.
t.Response = t.Response.Substring(0, 5);
You could limit your reads to a block of bytes (buffer). Loop through and read the next block into your buffer and write that buffer out. This will prevent a large chunk of data being stored in memory.
using (StreamReader reader = new StreamReader(#"C:\Test.txt", true))
{
char[] buffer = new char[1024];
int idx = 0;
while (reader.ReadBlock(buffer, idx, buffer.Length) > 0)
{
idx += buffer.Length;
Console.Write(buffer);
}
}
Can you read your file line by line? If so, I would recommend calling:
IEnumerable<string> lines = File.ReadLines(path)
When you iterate this collection using
foreach(string line in lines)
{
// do something with line
}
the collection will be iterated using lazy evaluation. That means the entire contents of the file won't need to be kept in memory while you do something with each line.
StreamReader provides just version of Read that you looking for - Read(Char[], Int32, Int32) - which lets you pick out first characters of the stream. Alternatively you can read char-by-char with regular StreamReader.Read till you decided that you have enough.
var textBuffer = new char[5];
reader.ReadToEnd(textBuffer, 0, 5); // TODO: check if it actually read engough
t.Response = new string(textBuffer);
Note that if you know encoding of the stream you may use lower level reading as byte array and use System.Text.Encoding classes to construct strings with encoding yourself instead of relaying on StreamReader.

StreamReader is too greedy

I'm trying to process part of a text file, and write the remainder of the text file to a cloud blob using UploadFromStream. The problem is that the StreamReader appears to be grabbing too much content from the underlying stream, and so the subsequent write does nothing.
Text file:
3
Col1,String
Col2,Integer
Col3,Boolean
abc,123,True
def,3456,False
ghijkl,532,True
mnop,1211,False
Code:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
using (var reader = new StreamReader(stream))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
// Write remaining contents to another file, for testing
using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
{
stream.CopyTo(destination);
destination.Flush();
}
// Actual intended usage:
// CloudBlockBlob blob = ...;
// blob.UploadFromStream(stream);
}
When debugging, I observe that stream.Position jumps to the end of the file on the first call to reader.ReadLine(), which I don't expect. I expected the stream to be advanced only as many positions as the reader needed to read some content.
I imagine that the stream reader is doing some buffering for performance reasons, but there doesn't seem to be a way to ask the reader where in the underlying stream it "really" is. (If there was, I could manually Seek the stream to that position before CopyingTo).
I know that I could keep taking lines using the same reader and sequentially append them to the text file I'm writing, but I'm wondering if there's a cleaner way?
EDIT:
I found a StreamReader constructor which leaves the underlying stream open when it is disposed, so I tried this, hoping that the reader would set the stream's position as it's being disposed:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
{
using (var reader = new StreamReader(stream, Encoding.UTF8,
detectEncodingFromByteOrderMarks: true,
bufferSize: 1 << 12,
leaveOpen: true))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
}
// Write remaining contents to another file
using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
{
stream.CopyTo(destination);
destination.Flush();
}
}
But it doesn't. Why would this constructor be exposed if it doesn't leave the stream in an intuitive state/position?
Sure, there's a cleaner way. Use ReadToEnd to read the remaining data, and then write it to a new file. For example:
using (var reader = new StreamReader("c:\\test\\testinput.txt"))
{
var numColumns = int.Parse(reader.ReadLine());
while (numColumns-- > 0)
{
var colDescription = reader.ReadLine();
// do stuff
}
// write everything else to another file.
File.WriteAllText("c:\\test\\testoutput.txt", reader.ReadToEnd());
}
Edit after comment
If you want to read the text and upload it to a stream, you could replace the File.WriteAllText with code that reads the remaining text, writes it to a StreamWriter backed by a MemoryStream, and then sends the contents of that MemoryStream. Something like:
using (var memStream = new MemoryStream())
{
using (var writer = new StreamWriter(memStream))
{
writer.Write(reader.ReadToEnd());
writer.Flush();
memStream.Position = 0;
blob.UploadFromStream(memStream);
}
}
You should never access the underlying stream of a StreamReader. Trying to use both is going to have an undefined behavior.
What's going on here is that the reader is buffering the data from the underlying stream. It doesn't read each byte exactly when you request it, because that's often going to be very inefficient. Instead it will grab chunks, put them in a buffer, and then provide you with data from that buffer, grabbing a new chunk when it needs to.
You should continue to use the StreamReader throughout the remainder of that block, instead of using stream. To minimize the memory footprint of the program, the most effective way of doing this would be to read the next line from the reader in a loop until it his the end of the file, writing each line to the output stream as you go.
Also note that you don't need to be disposing of both the stream reader and the underlying stream. The stream reader will dispose of the underlying stream itself, so you can simply adjust your header to:
using (var reader = new StreamReader(
File.OpenRead("c:\\test\\testinput.txt")))

Using StreamReader after the underlying stream has beed disposed?

Using a StreamReader, if you dispose the underlying stream, I thought you shouldn't be able to read anymore.
That this is true suggests this question, where it's suggested that you don't have to dispose the StreamWriter (in their case) if the life of the underlying stream is handled elsewhere.
But that's not the case. What I did was the following:
I have a file called delme.txt containing the following
abc
def
ghi
The I run this:
Stream s = File.OpenRead(#"C:\delme.txt");
StreamReader sr = new StreamReader(s, Encoding.ASCII);
Console.WriteLine(sr.ReadLine());
s.Dispose();
Console.WriteLine(sr.ReadLine());
And the result is:
abc
def
How is this possible?
Your StreamReader already read the next line into its buffer.
It won't go back to the source Stream until it runs out of data in its buffer.
In fact, it would be impossible for it to throw an exception in that case, since there is no idempotent way to find out whether a Stream has been disposed. (There is no IsDisposed property)
To add to #SLaks answer, here will demonstrate (using a file with a couple thousand lines of text):
Stream s = File.OpenRead(path);
StreamReader sr = new StreamReader(s, Encoding.ASCII);
Console.WriteLine(sr.ReadLine());
s.Dispose();
int i = 1;
try
{
while (!sr.EndOfStream)
{
Console.WriteLine(sr.ReadLine());
i++;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Console.WriteLine(i + " lines total");
Console.ReadLine();
It will print out lots and lots of lines, like a couple hundred, then will throw an Exception. My output ended like this:
qrs
tuv
wxy
zab
cde
fgh
ijk
lmn
Cannot access a closed file.
204 lines total
In fact we see that there is a constructor for StreamReader that takes a parameter bufferSize as the fourth parameter:
StreamReader sr = new StreamReader(s, Encoding.ASCII, false, 10000);
Using 10000, it actually prints out a total of 1248 lines for me before crashing. Also, the smallest possible value you can use is 1, and for that case, it still pre-fetches 25 lines.
What you need to understand here is what dispose is trying to do.
http://msdn.microsoft.com/en-us/library/ms227563.aspx
It says the TextReader will be an unusable state if the TextReader is finished. Perhaps, since it hasn't read everything, it is not considered finished; therefore, you can continue to use it. That is my guess.

Close a filestream without Flush()

Can I close a file stream without calling Flush (in C#)? I understood that Close and Dispose calls the Flush method first.
MSDN is not 100% clear, but Jon Skeet is saying "Flush", so do it before close/dispose. It won't hurt, right?
From FileStream.Close Method:
Any data previously written to the buffer is copied to the file before
the file stream is closed, so it is not necessary to call Flush before
invoking Close. Following a call to Close, any operations on the file
stream might raise exceptions. After Close has been called once, it
does nothing if called again.
Dispose is not as clear:
This method disposes the stream, by writing any changes to the backing
store and closing the stream to release resources.
Remark: the commentators might be right, it's not 100% clear from the Flush:
Override Flush on streams that implement a buffer. Use this method to
move any information from an underlying buffer to its destination,
clear the buffer, or both. Depending upon the state of the object, you
might have to modify the current position within the stream (for
example, if the underlying stream supports seeking). For additional
information see CanSeek.
When using the StreamWriter or BinaryWriter class, do not flush the
base Stream object. Instead, use the class's Flush or Close method,
which makes sure that the data is flushed to the underlying stream
first and then written to the file.
TESTS:
var textBytes = Encoding.ASCII.GetBytes("Test123");
using (var fileTest = System.IO.File.Open(#"c:\temp\fileNoCloseNoFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes,0,textBytes.Length);
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseNoFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Close();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileFlushNoClose.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseAndFlush.txt", FileMode.CreateNew))
{
fileTest.Write(textBytes, 0, textBytes.Length);
fileTest.Flush();
fileTest.Close();
}
What can I say ... all files got the text - maybe this is just too little data?
Test2
var rnd = new Random();
var size = 1024*1024*10;
var randomBytes = new byte[size];
rnd.NextBytes(randomBytes);
using (var fileTest = System.IO.File.Open(#"c:\temp\fileNoCloseNoFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseNoFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Close();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileFlushNoClose.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Flush();
}
using (var fileTest = System.IO.File.Open(#"c:\temp\fileCloseAndFlush.bin", FileMode.CreateNew))
{
fileTest.Write(randomBytes, 0, randomBytes.Length);
fileTest.Flush();
fileTest.Close();
}
And again - every file got its bytes ... to me it looks like it's doing what I read from MSDN: it doesn't matter if you call Flush or Close before dispose ... any thoughts on that?
You don't have to call Flush() on Close()/Dispose(), FileStream will do it for you as you can see from its source code:
http://referencesource.microsoft.com/#mscorlib/system/io/filestream.cs,e23a38af5d11ddd3
[System.Security.SecuritySafeCritical] // auto-generated
protected override void Dispose(bool disposing)
{
// Nothing will be done differently based on whether we are
// disposing vs. finalizing. This is taking advantage of the
// weak ordering between normal finalizable objects & critical
// finalizable objects, which I included in the SafeHandle
// design for FileStream, which would often "just work" when
// finalized.
try {
if (_handle != null && !_handle.IsClosed) {
// Flush data to disk iff we were writing. After
// thinking about this, we also don't need to flush
// our read position, regardless of whether the handle
// was exposed to the user. They probably would NOT
// want us to do this.
if (_writePos > 0) {
FlushWrite(!disposing); // <- Note this
}
}
}
finally {
if (_handle != null && !_handle.IsClosed)
_handle.Dispose();
_canRead = false;
_canWrite = false;
_canSeek = false;
// Don't set the buffer to null, to avoid a NullReferenceException
// when users have a race condition in their code (ie, they call
// Close when calling another method on Stream like Read).
//_buffer = null;
base.Dispose(disposing);
}
}
I've been tracking a newly introduced bug that seems to indicate .NET 4 does not reliably flush changes to disk when the stream is disposed (unlike .NET 2.0 and 3.5, which always did so reliably).
The .NET 4 FileStream class has been heavily modified in .NET 4, and while the Flush*() methods have been rewritten, similar attention seems to have been forgotten for .Dispose().
This is resulting in incomplete files.
Since you've stated that you understood that close & dispose called the flush method if it was not called explicitly by user code, I believe that (by close without flush) you actually want to have a possibility to discard changes made to a FileStream, if necessary.
If that is correct, using a FileStream alone won't help. You will need to load this file into a MemoryStream (or an array, depending on how you modify its contents), and then decide whether you want to save changes or not after you're done.
A problem with this is file size, obviously. FileStream uses limited size write buffers to speed up operations, but once they are depleted, changes need to be flushed. Due to .NET memory limits, you can only expect to load smaller files in memory, if you need to hold them entirely.
An easier alternative would be to make a disk copy of your file, and work on it using a plain FileStream. When finished, if you need to discard changes, simply delete the temporary file, otherwise replace the original with a modified copy.
Wrap the FileStream in a BufferedStream and close the filestream before the buffered stream.
var fs = new FileStream(...);
var bs = new BufferedStream(fs, buffersize);
bs.Write(datatosend, 0, length);
fs.Close();
try {
bs.Close();
}
catch (IOException) {
}
Using Flush() is worthy inside big Loops.
when you have to read and write a big File inside one Loop. In other case the buffer or the computer is big enough, and doesn´t matter to close() without making one Flush() before.
Example: YOU HAVE TO READ A BIG FILE (in one format) AND WRITE IT IN .txt
StreamWriter sw = .... // using StreamWriter
// you read the File ...
// and now you want to write each line for this big File using WriteLine ();
for ( .....) // this is a big Loop because the File is big and has many Lines
{
sw.WriteLine ( *whatever i read* ); //we write here somrewhere ex. one .txt anywhere
sw.Flush(); // each time the sw.flush() is called, the sw.WriteLine is executed
}
sw.Close();
Here it is very important to use Flush(); beacause otherwise each writeLine is save in the buffer and does not write it until the buffer is frull or until the program reaches sw.close();
I hope this helps a little to understand the function of Flush
I think it is safe to use simple using statement, which closes the stream after the call to GetBytes();
public static byte[] GetBytes(string fileName)
{
byte[] buffer = new byte[4096];
using (FileStream fs = new FileStream(fileName))
using (MemoryStream ms = new MemoryStream())
{
fs.BlockCopy(ms, buffer, 4096); // extension method for the Stream class
fs.Close();
return ms.ToByteArray();
}
}

Categories

Resources