Background: I need to relay the content of the request to multiple other servers (via client.SendAsync(request)).
Problem: After the first request, the content stream is empty.
[HttpPost]
public async Task<IActionResult> PostAsync()
{
    for (var n = 0; n <= 1; n++)
    {
        using (var stream = new MemoryStream())
        {
            await Request.Body.CopyToAsync(stream);
            // why is stream.Length == 0 in the second iteration?
        }
    }
    return StatusCode((int)HttpStatusCode.OK);
}
Streams have a position pointer indicating where in the stream you currently are; after copying, that pointer is at the end. To read the data again, you need to rewind the stream by setting its position to 0.
That is, however, only supported by streams that support seeking. The request stream can be read only once: it is read "from the wire", and therefore doesn't support seeking.
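As a minimal illustration (RewindIfPossible is a hypothetical helper, not an existing API):

static void RewindIfPossible(Stream stream)
{
    // Rewinding only works on seekable streams; on a non-seekable
    // stream (such as Request.Body by default), setting Position
    // throws NotSupportedException.
    if (stream.CanSeek)
    {
        stream.Position = 0;
    }
}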
When you want to copy the request stream to multiple output streams, you have two options:
Forward while you read
Read once into memory, then forward at will
The first option means all forwards happen at the same speed: the entire transfer goes only as fast as the input, or as slow as the slowest reader. You read a chunk from the caller and forward that chunk to each forward address.
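A minimal sketch of that first option (ForwardAsync is an illustrative helper; the destination streams would be the request streams for each forward address):

// Read the source once; write each chunk to every destination.
static async Task ForwardAsync(Stream source, params Stream[] destinations)
{
    var buffer = new byte[81920];
    int bytesRead;
    while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        foreach (var destination in destinations)
        {
            await destination.WriteAsync(buffer, 0, bytesRead);
        }
    }
}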
For the second approach, you'll want to evaluate whether you can hold the entire request body (plus the copy made for each forward address) in memory. If that's not expected to be a problem, and you have configured sensible request size limits, then simply copy the request stream into a single MemoryStream once, and copy and rewind that one for every call:
using (var bodyStream = new MemoryStream())
{
    await Request.Body.CopyToAsync(bodyStream);
    for (...)
    {
        using (var stream = new MemoryStream())
        {
            // Rewind before every copy; CopyToAsync leaves the
            // position at the end of the stream.
            bodyStream.Position = 0;
            await bodyStream.CopyToAsync(stream);
        }
    }
}
I found out that CopyToAsync reads from the stream's current position and leaves the position at the last byte read. The next time I use CopyToAsync, the stream starts reading from that position and finds no more content. However, I could not use Request.Body.Position = 0, since that is not supported on the request stream. I ended up copying the stream once and resetting the position of the copy before each subsequent copy.
If someone knows a cleaner solution you are welcome to point it out.
using (var contentStream = new MemoryStream())
{
    await Request.Body.CopyToAsync(contentStream);
    for (var n = 0; n <= 1; n++)
    {
        using (var stream = new MemoryStream())
        {
            contentStream.Position = 0; // rewind before each copy
            await contentStream.CopyToAsync(stream);
            // works
        }
    }
}
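One cleaner option, assuming ASP.NET Core 2.1 or later: Request.EnableBuffering() makes the request body rewindable by buffering it (in memory, spilling to disk above a threshold), so Request.Body.Position = 0 becomes legal. A minimal sketch:

[HttpPost]
public async Task<IActionResult> PostAsync()
{
    // Must be called before the body is read for the first time.
    Request.EnableBuffering();

    for (var n = 0; n <= 1; n++)
    {
        Request.Body.Position = 0; // rewinding is now supported
        using (var stream = new MemoryStream())
        {
            await Request.Body.CopyToAsync(stream);
        }
    }
    return StatusCode((int)HttpStatusCode.OK);
}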
Below is a simplified example of a larger piece of code. Basically, I'm calling one or more API endpoints and downloading a CSV file that gets written to an Azure blob container. If there are multiple files, the blob is appended to for every new CSV file loaded.
The issue is that when I append to the target blob, I end up with multiple header rows scattered throughout the file, depending on how many CSVs I consumed. All the CSVs have the same header row, and I know the first row will always end with a line feed. Is there a way to read the stream, skip the content until after the first line feed, and then copy the rest of the stream to the blob?
It seemed simple in my head, but I'm having trouble finding my way there code-wise. I don't want to wait for the whole file to download and then delete the header row in memory, since some of these files can be several gigabytes.
I am using .NET 6, if that helps.
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        sourceStream.CopyTo(blobStream);
    }
}
.CopyTo copies from the current position in the stream, so all you need to do is throw away bytes until you have consumed the first line feed.
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        if (i != 0)
        {
            // ReadByte returns -1 at end of stream; guard against
            // a file that contains no line feed at all.
            int b;
            do { b = sourceStream.ReadByte(); } while (b != -1 && b != '\n');
        }
        sourceStream.CopyTo(blobStream);
    }
}
If all the files always have the same-size header row, you can define a constant for its length. That way you can skip exactly the right number of bytes, like this:
using Stream blobStream = await blockBlobClient.OpenWriteAsync(true);
{
    for (int i = 0; i < 3; i++)
    {
        using HttpResponseMessage response = await client.GetAsync(downloadUrls[i], HttpCompletionOption.ResponseHeadersRead);
        Stream sourceStream = response.Content.ReadAsStream();
        if (i != 0)
        {
            // The response stream is not seekable (Seek would throw),
            // so skip the header by reading and discarding exactly
            // HeaderSizeInBytes bytes.
            var discard = new byte[HeaderSizeInBytes];
            int skipped = 0, read;
            while (skipped < HeaderSizeInBytes &&
                   (read = sourceStream.Read(discard, skipped, HeaderSizeInBytes - skipped)) > 0)
            {
                skipped += read;
            }
        }
        sourceStream.CopyTo(blobStream);
    }
}
This will be slightly quicker, but has the downside that the file format can't easily change in the future.
P.S. You probably want to Dispose sourceStream. Either directly or by wrapping its creation in a using statement.
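A small sketch of the using variant (reusing response and blobStream from the loop above):

// A using declaration disposes sourceStream at the end of the
// enclosing loop iteration.
using Stream sourceStream = response.Content.ReadAsStream();
sourceStream.CopyTo(blobStream);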
If we can assume that the stream contains UTF-8 encoded text, then you can do the following:
Create a StreamReader over sourceStream:
var reader = new StreamReader(sourceStream);
Read the first line (this assumes lines end with \n):
var header = reader.ReadLine();
Convert the first line plus a \n back to a byte array (ReadLine strips the newline; append "\n" rather than Environment.NewLine, which is "\r\n" on Windows and would throw the byte count off):
var headerInBytes = Encoding.UTF8.GetBytes(header + "\n");
Set the position to just after the first line (this requires a seekable stream):
sourceStream.Position = headerInBytes.Length;
Copy the source stream from that position onward:
sourceStream.CopyTo(blobStream);
This proposed solution is just an example; depending on the actual stream content you might need to adjust it further and make it more robust. In particular, setting Position won't work on a non-seekable stream such as a raw HTTP response stream.
In Windows Phone 8.1 (WinRT) I'm grabbing a file from the user's documents folder and trying to read through it twice: once to count the total lines for progress-tracking purposes, and a second time to actually parse the data. However, on the second pass I get a "File is not readable" type error. So I have a rough idea of what's going on, but not entirely. Am I getting this error because the stream is already at the end of the file? Can't I just open a new stream from the same file, or do I have to close the first stream?
Here's my code:
public async Task UploadBerData(StorageFile file)
{
    _csvParser = new CsvParser();
    var stream = await file.OpenAsync(FileAccessMode.Read);
    using (var readStream = stream.AsStreamForRead())
    {
        dataCount = _csvParser.GetDataCount(stream.AsStreamForRead());
        // Set the progressBar total to 2x dataCount.
        // Once for reading, twice for uploading data.
        TotalProgress = dataCount * 2;
        CurrentProgress = 0;
    }
    var csvData = _csvParser.GetFileData(stream.AsStreamForRead());
    ...
}
After reading the stream, its position is at the end.
You can set it back to the beginning to read the stream again.
Add the following line before your parse-data call (IRandomAccessStream exposes a read-only Position property, so use Seek):
stream.Seek(0);
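Applied to the posted method, a minimal sketch that reuses a single AsStreamForRead() wrapper (rather than creating several) and rewinds it between the two passes:

public async Task UploadBerData(StorageFile file)
{
    _csvParser = new CsvParser();
    var stream = await file.OpenAsync(FileAccessMode.Read);
    using (var readStream = stream.AsStreamForRead())
    {
        dataCount = _csvParser.GetDataCount(readStream);
        // Once for reading, twice for uploading data.
        TotalProgress = dataCount * 2;
        CurrentProgress = 0;

        // Rewind the wrapper before the second pass; one wrapper,
        // one position to keep track of.
        readStream.Position = 0;
        var csvData = _csvParser.GetFileData(readStream);
        // ...
    }
}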
Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk in order to avoid possible memory exhaustion using MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
    using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
    {
        using (var fileStream = File.OpenRead(filename))
        using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
        {
            fileStream.CopyTo(compressedStream);
        }
        temporaryFileStream.Position = 0;
        Uploader.Upload(temporaryFileStream);
    }
}
What I'd like to do is eliminate the temporary storage by creating a GZipStream, and have it read from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    // Start the upload in a separate thread. Pass the method
    // itself as a ParameterizedThreadStart and hand pcStream
    // to Start (calling UploadThreadProc(pcStream) here would
    // not compile, since it returns void).
    var uploadThread = new Thread(UploadThreadProc);
    uploadThread.Start(pcStream);

    // Open the input file and attach the gzip stream to the pcStream
    using (var inputFile = File.OpenRead("inputFilename"))
    {
        // create gzip stream
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            var bytesRead = 0;
            var buff = new byte[65536]; // 64K buffer
            while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
            {
                gz.Write(buff, 0, bytesRead);
            }
        }
    }

    // The entire file has been compressed and copied to the buffer.
    // Mark the stream as "input complete".
    pcStream.CompleteAdding();

    // Wait for the upload thread to complete.
    uploadThread.Join();

    // It's very important that you don't close the pcStream before
    // the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
    var pcStream = (ProducerConsumerStream)state;
    Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
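A sketch of the same idea with the producer on a background task instead (Task.Run replaces the manual thread; ProducerConsumerStream is the custom stream described above):

using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    // Run the compression (producer) in the background...
    var producer = Task.Run(() =>
    {
        using (var inputFile = File.OpenRead("inputFilename"))
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            inputFile.CopyTo(gz);
        }
        pcStream.CompleteAdding();
    });

    // ...while the upload (consumer) runs on this thread.
    Uploader.Upload(pcStream);
    producer.Wait(); // surface any producer exceptions
}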
I'm trying to process part of a text file, and write the remainder of the text file to a cloud blob using UploadFromStream. The problem is that the StreamReader appears to be grabbing too much content from the underlying stream, and so the subsequent write does nothing.
Text file:
3
Col1,String
Col2,Integer
Col3,Boolean
abc,123,True
def,3456,False
ghijkl,532,True
mnop,1211,False
Code:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
using (var reader = new StreamReader(stream))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }

    // Write remaining contents to another file, for testing
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }

    // Actual intended usage:
    // CloudBlockBlob blob = ...;
    // blob.UploadFromStream(stream);
}
When debugging, I observe that stream.Position jumps to the end of the file on the first call to reader.ReadLine(), which I don't expect. I expected the stream to be advanced only as many positions as the reader needed to read some content.
I imagine that the stream reader is doing some buffering for performance reasons, but there doesn't seem to be a way to ask the reader where in the underlying stream it "really" is. (If there were, I could manually Seek the stream to that position before copying.)
I know that I could keep taking lines using the same reader and sequentially append them to the text file I'm writing, but I'm wondering if there's a cleaner way?
EDIT:
I found a StreamReader constructor which leaves the underlying stream open when it is disposed, so I tried this, hoping that the reader would set the stream's position as it's being disposed:
using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
{
    using (var reader = new StreamReader(stream, Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 1 << 12,
        leaveOpen: true))
    {
        var numColumns = int.Parse(reader.ReadLine());
        while (numColumns-- > 0)
        {
            var colDescription = reader.ReadLine();
            // do stuff
        }
    }

    // Write remaining contents to another file
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }
}
But it doesn't. Why would this constructor be exposed if it doesn't leave the stream in an intuitive state/position?
Sure, there's a cleaner way. Use ReadToEnd to read the remaining data, and then write it to a new file. For example:
using (var reader = new StreamReader("c:\\test\\testinput.txt"))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }

    // write everything else to another file.
    File.WriteAllText("c:\\test\\testoutput.txt", reader.ReadToEnd());
}
Edit after comment
If you want to read the text and upload it to a stream, you could replace the File.WriteAllText with code that reads the remaining text, writes it to a StreamWriter backed by a MemoryStream, and then sends the contents of that MemoryStream. Something like:
using (var memStream = new MemoryStream())
{
    using (var writer = new StreamWriter(memStream))
    {
        writer.Write(reader.ReadToEnd());
        writer.Flush();
        memStream.Position = 0;
        blob.UploadFromStream(memStream);
    }
}
You should never access the underlying stream of a StreamReader. Mixing the two will give undefined behavior.
What's going on here is that the reader is buffering the data from the underlying stream. It doesn't read each byte exactly when you request it, because that's often going to be very inefficient. Instead it will grab chunks, put them in a buffer, and then provide you with data from that buffer, grabbing a new chunk when it needs to.
You should continue to use the StreamReader throughout the remainder of that block, instead of using stream. To minimize the memory footprint of the program, the most effective way of doing this is to read the next line from the reader in a loop until it hits the end of the file, writing each line to the output stream as you go, as sketched below.
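A minimal sketch of that loop (destination stands for whatever output stream you're writing to; note that WriteLine rewrites line endings as Environment.NewLine):

// Keep using the reader; copy the remaining lines one at a time.
using (var writer = new StreamWriter(destination, Encoding.UTF8, 4096, leaveOpen: true))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        writer.WriteLine(line);
    }
}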
Also note that you don't need to dispose of both the stream reader and the underlying stream. The stream reader will dispose of the underlying stream itself, so you can simply adjust your header to:
using (var reader = new StreamReader(
File.OpenRead("c:\\test\\testinput.txt")))
using (var stream = GetS3ObjectStream(fooObj))
{
    WriteStreamToFtp(stream, "ftp://firstserver");
    WriteStreamToFtp(stream, "ftp://SecondServer");
}
The first call works, but the second one just creates an empty file.
GetS3ObjectStream gets a stream for an Amazon Simple Storage Service (S3) object. It's an unseekable stream, and you can't change the cursor position in it.
Now, I either have to somehow prevent the stream object from being used more than once, or I have to move the cursor back to the beginning.
Any ideas?
upd: Yeah, of course you can simply save the stream contents in a temporary variable, but if you don't want to do that, what then? Is it OK to make methods that take streams and leave them with such side effects?
What if I close and dispose the source stream in the method?
You can either re-create your stream, or buffer it into a byte[] or MemoryStream before use, as in the sketch below.
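A minimal buffering sketch, using the identifiers from the question (GetS3ObjectStream and WriteStreamToFtp):

using (var source = GetS3ObjectStream(fooObj))
using (var buffered = new MemoryStream())
{
    // Read the unseekable source once into a seekable buffer.
    source.CopyTo(buffered);

    buffered.Position = 0;
    WriteStreamToFtp(buffered, "ftp://firstserver");

    buffered.Position = 0; // rewind between uses
    WriteStreamToFtp(buffered, "ftp://SecondServer");
}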
Edit: I forgot to mention a solution which does not buffer:
Open the two destination streams
Open the source stream
Until the source is drained, read a buffer of N bytes and write the buffer to both destination streams (you can even do the writing in parallel using Parallel.Invoke())
Here’s an example of how it may be done reading the stream just once (without initializing a temporary copy of the stream’s contents).
This assumes that you can replace your WriteStreamToFtp call with access to the actual target stream. Note that the FtpStream constructor is placeholder code.
using (var source = GetS3ObjectStream(fooObj))
using (var target1 = new FtpStream("ftp://firstserver"))
using (var target2 = new FtpStream("ftp://SecondServer"))
{
    byte[] buffer = new byte[1024];
    while (true)
    {
        int count = source.Read(buffer, 0, buffer.Length);
        if (count == 0)
            break;

        target1.Write(buffer, 0, count);
        target2.Write(buffer, 0, count);
    }
}
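And, picking up the Parallel.Invoke() suggestion from the earlier answer, the two writes inside the loop could be issued concurrently (a sketch; only worthwhile when both targets are slow enough for the overlap to matter):

// Write the same chunk to both targets at the same time, replacing
// the two sequential Write calls in the loop above.
Parallel.Invoke(
    () => target1.Write(buffer, 0, count),
    () => target2.Write(buffer, 0, count));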