StreamReader is too greedy

I'm trying to process part of a text file, and write the remainder of the text file to a cloud blob using UploadFromStream. The problem is that the StreamReader appears to be grabbing too much content from the underlying stream, and so the subsequent write does nothing.
Text file:
3
Col1,String
Col2,Integer
Col3,Boolean
abc,123,True
def,3456,False
ghijkl,532,True
mnop,1211,False
Code:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
using (var reader = new StreamReader(stream))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }
    // Write remaining contents to another file, for testing
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }
    // Actual intended usage:
    // CloudBlockBlob blob = ...;
    // blob.UploadFromStream(stream);
}
When debugging, I observe that stream.Position jumps to the end of the file on the first call to reader.ReadLine(), which I don't expect. I expected the stream to be advanced only as many positions as the reader needed to read some content.
I imagine the StreamReader is buffering for performance reasons, but there doesn't seem to be a way to ask the reader where in the underlying stream it "really" is. (If there were, I could manually Seek the stream to that position before calling CopyTo.)
I know I could keep reading lines with the same reader and append them one by one to the file I'm writing, but I'm wondering whether there's a cleaner way.
EDIT:
I found a StreamReader constructor which leaves the underlying stream open when it is disposed, so I tried this, hoping that the reader would set the stream's position as it's being disposed:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
{
    using (var reader = new StreamReader(stream, Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 1 << 12,
        leaveOpen: true))
    {
        var numColumns = int.Parse(reader.ReadLine());
        while (numColumns-- > 0)
        {
            var colDescription = reader.ReadLine();
            // do stuff
        }
    }
    // Write remaining contents to another file
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }
}
But it doesn't. Why would this constructor be exposed if it doesn't leave the stream in an intuitive state/position?

Sure, there's a cleaner way. Use ReadToEnd to read the remaining data, and then write it to a new file. For example:
using (var reader = new StreamReader("c:\\test\\testinput.txt"))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }
    // write everything else to another file.
    File.WriteAllText("c:\\test\\testoutput.txt", reader.ReadToEnd());
}
Edit after comment
If you want to upload the remaining text as a stream, you could replace File.WriteAllText with code that writes the rest of the text to a StreamWriter backed by a MemoryStream and then uploads the contents of that MemoryStream. Something like:
using (var memStream = new MemoryStream())
{
    using (var writer = new StreamWriter(memStream))
    {
        writer.Write(reader.ReadToEnd());
        writer.Flush();
        memStream.Position = 0;
        blob.UploadFromStream(memStream);
    }
}

Once a stream is wrapped in a StreamReader, you should not read from the underlying stream directly. Mixing the two gives unpredictable results.
What's going on here is that the reader is buffering the data from the underlying stream. It doesn't read each byte exactly when you request it, because that's often going to be very inefficient. Instead it will grab chunks, put them in a buffer, and then provide you with data from that buffer, grabbing a new chunk when it needs to.
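To make the buffering visible, here is a small sketch (the class name BufferingDemo and the sample data are illustrative) that reads one line from an in-memory copy of the question's file and reports the underlying stream's position. Because the content is shorter than the reader's buffer, the first ReadLine call pulls in all of it, so Position ends up equal to Length, which matches what the asker saw in the debugger.

```csharp
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Returns the underlying stream's position after a single ReadLine,
    // together with the total length of the data.
    public static (long PositionAfterReadLine, long Length) Run()
    {
        var bytes = Encoding.UTF8.GetBytes(
            "3\nCol1,String\nCol2,Integer\nCol3,Boolean\nabc,123,True\n");

        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            // Returns only "3", but the reader fills its whole internal buffer first.
            reader.ReadLine();
            return (stream.Position, stream.Length);
        }
    }
}
```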
You should keep using the StreamReader for the rest of that block instead of using stream. To minimize the program's memory footprint, the most effective approach is to read the next line from the reader in a loop until it hits the end of the file, writing each line to the output stream as you go.
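The line-by-line loop described above could be sketched like this. The helper name CopyRemainingLines is hypothetical, and the sketch assumes that normalizing line endings to "\n" (and always writing a final newline) is acceptable. Only one line is held in memory at a time, and the destination is left open so the caller can, for example, rewind a MemoryStream and pass it to blob.UploadFromStream.

```csharp
using System.IO;
using System.Text;

public static class RemainderCopier
{
    // Writes every line the reader has not yet returned to `destination`.
    // Line endings are normalized to "\n" (an assumption of this sketch).
    public static void CopyRemainingLines(StreamReader reader, Stream destination)
    {
        // leaveOpen: true so the caller still owns `destination`.
        using (var writer = new StreamWriter(destination, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            writer.NewLine = "\n";
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
            }
        } // disposing the writer flushes any remaining buffered text
    }
}
```

Because the same reader that parsed the header is used to copy the rest, nothing that the reader has already buffered is lost.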
Also note that you don't need to dispose of both the StreamReader and the underlying stream. The StreamReader disposes of the underlying stream itself, so you can simply change your header to:
using (var reader = new StreamReader(
File.OpenRead("c:\\test\\testinput.txt")))

Related

ITextSharp - Cannot Open .pdf because it is being used by another process?

I am having an issue where I write to a PDF file, close it, and later open it again to read it.
I get "Cannot Open .pdf because it is being used by another process?"
var path = ...; // get path
Directory.CreateDirectory(Path.GetDirectoryName(path));
using (var writer = new PdfWriter(path /*, writer properties */))
{
    var reader = new PdfReader(...); // template
    var pdfDoc = new PdfDocument(reader, writer);
    writer.SetCloseStream(false);
    // some code to fill in the pdf
    reader.Close();
    pdfDoc.Close();
}
// later in code
using (var stream = new MemoryStream(File.ReadAllBytes(path))) // the file created above
{
    // do some stuff here
}
I get the error on the using statement above. I thought all the streams used to create the PDF were closed at this point, but it seems they are not.

The issue is the call to writer.SetCloseStream(false), which tells the writer not to close its underlying stream when the writer is closed. Because the file stream is left open, you get an IOException when you open the file again for reading. Remove this line, or pass true, to resolve the issue.
If you need to keep the stream open for some reason (for example, the write buffer being flushed too soon when the PdfWriter is closed), you can get hold of the output stream and close it yourself once you are ready to read the file. Something like this:
Stream outputStream;
using (var writer = new PdfWriter(path))
{
    writer.SetCloseStream(false);
    // do whatever you need here
    outputStream = writer.GetOutputStream();
}
// do whatever else you need here
// close the stream before creating a read stream
if (outputStream != null)
{
    outputStream.Close();
}
using (var stream = new MemoryStream(File.ReadAllBytes(path)))
{
    // do some stuff here
}

Stream.CopyToAsync is empty after first iteration

Background: I need to relay the content of the request to multiple other servers (via client.SendAsync(request)).
Problem: after the first request, the content stream is empty.
[HttpPost]
public async Task<IActionResult> PostAsync() {
    for (var n = 0; n <= 1; n++) {
        using (var stream = new MemoryStream()) {
            await Request.Body.CopyToAsync(stream);
            // why is stream.Length == 0 in the second iteration?
        }
    }
    return StatusCode((int)HttpStatusCode.OK);
}

Streams have a pointer indicating at which position the stream is; after copying it, the pointer is at the end. You need to rewind a stream by setting its position to 0.
However, this only works on streams that support seeking. The request stream can be read only once, because it is read "from the wire" and therefore doesn't support seeking.
When you want to copy the request stream to multiple output streams, you have two options:
1. Forward while you read.
2. Read once into memory, then forward at will.
The first option means all forwards happen at the same speed: the entire transfer goes only as fast as the input, or as fast as the slowest recipient. You read a chunk from the caller and forward that chunk to all forward addresses.
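A minimal sketch of the first option, assuming each forward address is represented by a writable Stream (how you obtain such a stream from your HTTP client is not shown here). FanOut and ForwardAsync are hypothetical names. Each chunk is written to every destination before the next chunk is read, so the whole transfer moves at the pace of the slowest destination, and only one chunk is buffered at a time.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FanOut
{
    // Reads `source` once, chunk by chunk, and writes each chunk to every destination.
    public static async Task ForwardAsync(Stream source, IReadOnlyList<Stream> destinations, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        int read;
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            var writes = new Task[destinations.Count];
            for (var i = 0; i < destinations.Count; i++)
            {
                writes[i] = destinations[i].WriteAsync(buffer, 0, read);
            }
            // The buffer is reused, so every write must finish before the next read.
            await Task.WhenAll(writes);
        }
    }
}
```

In the controller, source would be Request.Body; since it is only read once, no rewinding is needed.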
For the second approach, you'll want to evaluate whether you can hold the entire request body, plus the body for each forward address, in memory. If that's not expected to be a problem and you have configured sensible size limits, simply copy the request stream to a single MemoryStream once, then rewind it before every copy:
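using (var bodyStream = new MemoryStream())
{
    await Request.Body.CopyToAsync(bodyStream);
    for (var n = 0; n <= 1; n++) // one iteration per forward address
    {
        // Rewind before each copy: the CopyToAsync above (and the previous
        // iteration) left bodyStream positioned at its end.
        bodyStream.Position = 0;
        using (var stream = new MemoryStream())
        {
            await bodyStream.CopyToAsync(stream);
            stream.Position = 0; // rewind the copy too before sending it
            // forward `stream` here
        }
    }
}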

I found out that CopyToAsync leaves the source stream positioned where it stopped reading, i.e. at the end. The next time I call CopyToAsync, it starts reading from that position and finds no more content. However, I could not use Request.Body.Position = 0, since it is not supported. I ended up copying the request body into a MemoryStream once and resetting that stream's position before each copy.
If someone knows a cleaner solution you are welcome to point it out.
using (var contentStream = new MemoryStream()) {
    await Request.Body.CopyToAsync(contentStream);
    for (var n = 0; n <= 1; n++) {
        using (var stream = new MemoryStream()) {
            contentStream.Position = 0;
            await contentStream.CopyToAsync(stream);
            // works
        }
    }
}

Modify File Stream in memory

I am reading a file using StreamReader fileReader = File.OpenText(filePath). I would like to modify one line in the file in memory and push the modified stream to another method.
What I would like to avoid is reading the whole file into a string and modifying the string (doesn't scale). I would also like to avoid modifying the actual file.
Is there a straightforward way of doing this?

There is no built-in way to do that in the .NET Framework.
The Stream and TextReader/TextWriter classes are designed to be chained when necessary (for example, GZipStream wraps another stream to compress it). So you can write a wrapper reader that calls the wrapped reader and modifies the data as needed on each read.
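A minimal sketch of such a wrapper, written as a TextReader rather than a Stream (an assumption; the receiving method would need to accept a TextReader). LineRewritingReader is a hypothetical name. It reads one line at a time from the inner reader and passes it through a caller-supplied function, so only the current line is held in memory. It overrides only Read() and Peek(); the TextReader base class builds ReadLine and ReadToEnd on top of those. As written it emits "\n" after every line, including the last one.

```csharp
using System;
using System.IO;

public sealed class LineRewritingReader : TextReader
{
    private readonly TextReader _inner;
    private readonly Func<string, string> _rewrite;
    private string _pending;  // current rewritten line plus "\n"
    private int _offset;      // next character of _pending to return

    public LineRewritingReader(TextReader inner, Func<string, string> rewrite)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _rewrite = rewrite ?? throw new ArgumentNullException(nameof(rewrite));
    }

    // Makes sure there is unread text in _pending; returns false at end of input.
    private bool Fill()
    {
        while (_pending == null || _offset >= _pending.Length)
        {
            var line = _inner.ReadLine();
            if (line == null) return false;
            _pending = _rewrite(line) + "\n";
            _offset = 0;
        }
        return true;
    }

    public override int Peek() => Fill() ? _pending[_offset] : -1;

    public override int Read() => Fill() ? _pending[_offset++] : -1;

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Usage would look like new LineRewritingReader(File.OpenText(filePath), line => line.StartsWith("2.") ? "replacement" : line). The file on disk is never modified.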

You can open two streams at the same time, one for reading and one for writing. I tested some simple code that works, but I'm not sure it's what you want:
// the line "2.bar" will be replaced by "!!!!!"
// (lines are joined with Environment.NewLine so the position tracking below matches)
File.WriteAllText("test.txt",
    string.Join(Environment.NewLine, "1.foo", "2.bar", "3.fake"));
// open inputStream for StreamReader, and open outputStream for StreamWriter
using (var inputStream = File.Open("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(inputStream))
using (var outputStream = File.Open("test.txt", FileMode.Open, FileAccess.Write, FileShare.Read))
using (var writer = new StreamWriter(outputStream))
{
    var position = 0L; // track the reading position in bytes (assumes single-byte characters)
    var newLineLength = Environment.NewLine.Length;
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // your particular conditions here.
        if (line.StartsWith("2."))
        {
            // seek to the start of the line
            outputStream.Seek(position, SeekOrigin.Begin);
            // replace it with something,
            // but the length must equal the original in this case.
            writer.WriteLine(new String('!', line.Length));
            // flush now so the bytes land at the position we just sought to
            writer.Flush();
        }
        position += line.Length + newLineLength;
    }
}
/* as a result, test.txt will be:
1.foo
!!!!!
3.fake
*/
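As you can see, the same file can be accessed by a StreamReader and a StreamWriter at the same time, and you can manipulate the read and write positions independently. Note that this only works when the replacement has the same length as the original line, and the position tracking assumes single-byte characters.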

Copy MemoryStream to FileStream and save the file?

I don't understand what I'm doing wrong here. I generate a couple of memory streams, and in debug mode I can see that they are populated. But when I try to copy the MemoryStream to a FileStream to save the file, fileStream is not populated and the file is 0 bytes long (empty).
Here is my code:
if (file.ContentLength > 0)
{
    var bytes = ImageUploader.FilestreamToBytes(file); // bytes is populated
    using (var inStream = new MemoryStream(bytes)) // inStream is populated
    {
        using (var outStream = new MemoryStream())
        {
            using (var imageFactory = new ImageFactory())
            {
                imageFactory.Load(inStream)
                    .Resize(new Size(320, 0))
                    .Format(ImageFormat.Jpeg)
                    .Quality(70)
                    .Save(outStream);
            }
            // outStream is populated here
            var fileName = "test.jpg";
            using (var fileStream = new FileStream(Server.MapPath("~/content/u/") + fileName, FileMode.CreateNew, FileAccess.ReadWrite))
            {
                outStream.CopyTo(fileStream); // fileStream is not populated
            }
        }
    }
}

You need to reset the position of the stream before copying.
outStream.Position = 0;
outStream.CopyTo(fileStream);
You used outStream when saving the image with imageFactory. That call populated outStream, and while writing, the stream's position is moved to the end of the written data. That way, if you keep writing bytes to the stream, they don't overwrite the existing bytes. But to read the data back (for copying, in this case), you need to set the position to the start so reading begins at the first byte.
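The effect is easy to reproduce with plain MemoryStreams. In this sketch (RewindDemo is a hypothetical name), a four-byte write stands in for the imageFactory.Save call. Copying immediately copies nothing, because Position is already at the end; after rewinding, all four bytes are copied.

```csharp
using System.IO;

public static class RewindDemo
{
    // Returns how many bytes CopyTo copied without and with rewinding first.
    public static (long WithoutRewind, long WithRewind) Run()
    {
        using (var outStream = new MemoryStream())
        {
            outStream.Write(new byte[] { 1, 2, 3, 4 }, 0, 4); // Position is now 4

            var first = new MemoryStream();
            outStream.CopyTo(first);      // starts at Position 4: nothing left to copy

            outStream.Position = 0;       // rewind
            var second = new MemoryStream();
            outStream.CopyTo(second);     // copies all 4 bytes

            return (first.Length, second.Length);
        }
    }
}
```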

If your objective is simply to dump the memory stream to a physical file (e.g. to look at the contents) - it can be done in one move:
System.IO.File.WriteAllBytes(@"C:\filename", memoryStream.ToArray());
There's no need to reset the stream position first either, since ToArray() ignores Position and always returns the entire contents of the stream, as per @BaconBits's comment below and the documentation: https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.toarray?view=netframework-4.7.2.

Another alternative to CopyTo is WriteTo.
Advantage:
No need to reset Position.
Usage:
outStream.WriteTo(fileStream);
Function Description:
Writes the entire contents of this memory stream to another stream.
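As a quick check of the two Position-independent alternatives above, this sketch (WholeContentsDemo is a hypothetical name) leaves a MemoryStream positioned at its end and shows that ToArray and WriteTo still see all of its bytes.

```csharp
using System.IO;

public static class WholeContentsDemo
{
    public static (int ToArrayLength, long WriteToLength) Run()
    {
        var ms = new MemoryStream();
        ms.Write(new byte[] { 10, 20, 30 }, 0, 3); // Position is 3, at the end

        byte[] snapshot = ms.ToArray();            // whole contents, regardless of Position

        var target = new MemoryStream();
        ms.WriteTo(target);                        // also writes the whole contents

        return (snapshot.Length, target.Length);
    }
}
```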

Advance TextReader to EndOfFile

I have a TextReader that, in a specific case, I want to advance to the end of the file quickly, so that other classes holding a reference to it get null when they call tr.ReadLine().
The file is large. I cannot use TextReader.ReadToEnd(), as it often leads to an OutOfMemoryException.
I thought I would ask the community whether there is a way to seek the stream without using TextReader.ReadToEnd(), which returns a string containing all the data in the file.
Current method, inefficient.
The following example code is a mock-up. Obviously I am not opening a file with an if statement directly following it asking whether I want to read to the end.
TextReader tr = new StreamReader("Largefile");
if (needToAdvanceToEndOfFile)
{
    while (tr.ReadLine() != null) { }
}
Desired solution (Note this code block contains fake 'concept' methods or methods that cannot be used due to risk of outofmemoryexception)
TextReader tr = new StreamReader("Largefile");
if (needToAdvanceToEndOfFile)
{
    tr.SeekToEnd(); // A method that does not return anything. This method does not exist.
    // tr.ReadToEnd() not acceptable as it can lead to OutOfMemoryException error as it is very large file.
}
A possible alternative is to read through the file in bigger chunks using tr.ReadBlock(args).
I poked around ((StreamReader)tr).BaseStream but could not find anything that worked.
As I am new to the community I figured I would see if someone knew the answer off the top of their head.

If you have already read any of the file's content, you also have to discard the reader's buffered data: because the data is buffered, you might still get content even after seeking the underlying stream to the end. Working example:
StreamReader sr = new StreamReader(fileName);
string sampleLine = sr.ReadLine();
//discard all buffered data and seek to end
sr.DiscardBufferedData();
sr.BaseStream.Seek(0, SeekOrigin.End);
The problem, as the documentation explains:
The StreamReader class buffers input from the underlying stream when you call one of the Read methods. If you manipulate the position of the underlying stream after reading data into the buffer, the position of the underlying stream might not match the position of the internal buffer. To reset the internal buffer, call the DiscardBufferedData method.
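The two calls can be wrapped in a small extension method that gives the asker roughly the SeekToEnd they wanted. The name SeekToEnd and the class StreamReaderExtensions are hypothetical; the sketch assumes the underlying stream is seekable (a FileStream is), and BaseStream.Seek throws NotSupportedException otherwise. No remaining content is read into memory.

```csharp
using System.IO;

public static class StreamReaderExtensions
{
    // Moves a StreamReader to end-of-file without reading the remaining content.
    public static void SeekToEnd(this StreamReader reader)
    {
        reader.DiscardBufferedData();              // drop characters already buffered
        reader.BaseStream.Seek(0, SeekOrigin.End); // move the underlying stream to its end
    }
}
```

After the call, ReadLine() returns null and Read() returns -1, even if some lines had already been read before.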

Use
reader.BaseStream.Seek(0, SeekOrigin.End);
Test:
using (StreamReader reader = new StreamReader(@"Your Large File"))
{
    reader.BaseStream.Seek(0, SeekOrigin.End);
    int read = reader.Read(); // read will be -1 since you are at the end of the stream
}
Edit: Test it with your code:
using (TextReader tr = new StreamReader("C:\\test.txt")) // test.txt is a file that has data and lines
{
    ((StreamReader)tr).BaseStream.Seek(0, SeekOrigin.End);
    string foo = tr.ReadLine();
    Debug.WriteLine(foo ?? "foo is null"); // foo is null
    int read = tr.Read();
    Debug.WriteLine(read); // -1
}
