This is a little trickier than I first imagined. I'm trying to read n bytes from a stream.
MSDN says that Read does not have to return n bytes; it just has to return at least 1 and at most n bytes, with 0 bytes being the special case of reaching the end of the stream.
Typically, I'm using something like
var buf = new byte[size];
var count = stream.Read (buf, 0, size);
if (count != size) {
buf = buf.Take (count).ToArray ();
}
yield return buf;
I'm hoping for exactly size bytes, but by the spec a FileStream would be allowed to return a long series of 1-byte chunks as well. That must be avoided.
One way to solve this would be to have 2 buffers, one for reading and one for collecting the chunks until we got the requested number of bytes. That's a little cumbersome though.
I also had a look at BinaryReader but its spec also does not clearly state that n bytes will be returned for sure.
To clarify: Of course, upon the end of the stream the returned number of bytes may be less than size - that's not a problem. I'm only talking about not receiving n bytes even though they are available in the stream.
A slightly more readable version of the read loop:
int offset = 0;
while (offset < count)
{
int read = stream.Read(buffer, offset, count - offset);
if (read == 0)
throw new System.IO.EndOfStreamException();
offset += read;
}
Or written as an extension method for the Stream class:
public static class StreamUtils
{
public static byte[] ReadExactly(this System.IO.Stream stream, int count)
{
byte[] buffer = new byte[count];
int offset = 0;
while (offset < count)
{
int read = stream.Read(buffer, offset, count - offset);
if (read == 0)
throw new System.IO.EndOfStreamException();
offset += read;
}
System.Diagnostics.Debug.Assert(offset == count);
return buffer;
}
}
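A usage sketch (the length-prefixed framing and file name here are illustrative assumptions, not part of the original question):

using (var stream = System.IO.File.OpenRead("message.bin"))
{
    // Read a 4-byte length prefix, then exactly that many payload bytes.
    byte[] lengthPrefix = stream.ReadExactly(4);
    int payloadLength = System.BitConverter.ToInt32(lengthPrefix, 0);
    byte[] payload = stream.ReadExactly(payloadLength);
}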
Simply: you loop:
int read, offset = 0;
while(leftToRead > 0 && (read = stream.Read(buf, offset, leftToRead)) > 0) {
leftToRead -= read;
offset += read;
}
if(leftToRead > 0) throw new EndOfStreamException(); // not enough!
After this, buf will have been populated with exactly the right amount of data from the stream, or an EndOfStreamException will have been thrown.
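(Aside: on .NET 7 and later, Stream ships a built-in ReadExactly that performs this loop for you; a minimal sketch, assuming a .NET 7+ target:)

var buf = new byte[size];
// Built-in since .NET 7: throws EndOfStreamException if the stream ends before 'size' bytes arrive.
stream.ReadExactly(buf, 0, size);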
Getting everything together from the answers here, I came up with the following solution. It relies on the source stream's length. Works on .NET Core 3.1.
/// <summary>
/// Copy stream based on source stream length
/// </summary>
/// <param name="source"></param>
/// <param name="destination"></param>
/// <param name="bufferSize">
A value that is the largest multiple of 4096 that is still smaller than the LOH threshold (85K),
so the buffer is likely to be collected at Gen0, which offers a significant improvement in copy performance.
/// </param>
/// <returns></returns>
private async Task CopyStream(Stream source, Stream destination, int bufferSize = 81920)
{
var buffer = new byte[bufferSize];
var offset = 0;
while (offset < source.Length)
{
var leftToRead = source.Length - offset;
var lengthToRead = (int)Math.Min(leftToRead, buffer.Length);
var read = await source.ReadAsync(buffer, 0, lengthToRead).ConfigureAwait(false);
if (read == 0)
break;
// Write only the bytes actually read; ReadAsync may return fewer than requested.
await destination.WriteAsync(buffer, 0, read).ConfigureAwait(false);
offset += read;
}
destination.Seek(0, SeekOrigin.Begin);
}
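A usage sketch, assuming it is called from an async method of the same class (the file name is illustrative):

using (var source = File.OpenRead("input.dat"))
using (var destination = new MemoryStream())
{
    await CopyStream(source, destination);
    // CopyStream rewinds the destination, so it can be read from position 0 here.
}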
Here is the method I like to use. I believe there is nothing new in this code.
public static byte[] ReadFully(Stream stream, int initialLength)
{
// If we've been passed an unhelpful initial length, just
// use 1K.
if (initialLength < 1)
{
initialLength = 1024;
}
byte[] buffer = new byte[initialLength];
int read = 0;
int chunk;
while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
{
read += chunk;
// If we've reached the end of our buffer, check to see if there's
// any more information
if (read == buffer.Length)
{
int nextByte = stream.ReadByte();
// End of stream? If so, we're done
if (nextByte == -1)
{
return buffer;
}
// Nope. Resize the buffer, put in the byte we've just
// read, and continue
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[read] = (byte)nextByte;
buffer = newBuffer;
read++;
}
}
// Buffer is now too big. Shrink it.
byte[] ret = new byte[read];
Array.Copy(buffer, ret, read);
return ret;
}
My goal is to read data sent from TCP clients, e.g. box{"id":1,"aid":1}.
It is a command for my application to interpret, written in JSON-like text.
And this text is not necessarily the same size each time.
Next time it can be run{"id":1,"aid":1,"opt":1}.
The method is called by this line:
var serializedMessageBytes = ReadFully(_receiveMemoryStream, 1024);
(Screenshot: the received data in _receiveMemoryStream, viewed in the debugger's Watch window.)
Although we can see the data in the stream,
in the ReadFully method, chunk is always 0 and the method returns {byte[0]}.
Any help is greatly appreciated.
Looking at your stream in the Watch window, the Position of the stream (19) is at the end of the data, hence there is nothing left to read. This is possibly because you have just written data to the stream and have not subsequently reset the position.
Add a stream.Position = 0; or stream.Seek(0, System.IO.SeekOrigin.Begin); statement at the start of the function if you are happy to always read from the start of the stream, or check the code that populates the stream. Note though that some stream implementations do not support seeking.
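A minimal sketch of that fix, using the identifiers from the question:

// Rewind: otherwise reading starts where the last write left off.
_receiveMemoryStream.Position = 0;
var serializedMessageBytes = ReadFully(_receiveMemoryStream, 1024);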
While converting some older code to use async in C#, I started seeing problems with varying return values from the Read() and ReadAsync() methods of DeflateStream.
I thought that the transition from synchronous code like
bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);
to its equivalent asynchronous version of
bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);
should always return the same value.
See the updated code added to the bottom of the question - it uses streams the correct way, making the initial question irrelevant.
I found that after a number of iterations this didn't hold true, and in my specific case it was causing random errors in the converted application.
Am I missing something here?
Below is a simple repro case (in a console app), where the Assert will break for me in the ReadAsync method on iteration #412, giving output that looks like this:
....
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync #412 - 453 bytes read
---- DEBUG ASSERTION FAILED ----
My question is, why is the DeflateStream.ReadAsync method returning 453 bytes at this point?
Note: this only happens with certain input strings - the massive StringBuilder stuff in CreateProblemDataString was the best way I could think of to construct the string for this post.
class Program
{
static byte[] DataAsByteArray;
static int uncompressedSize;
static void Main(string[] args)
{
string problemDataString = CreateProblemDataString();
DataAsByteArray = Encoding.ASCII.GetBytes(problemDataString);
uncompressedSize = DataAsByteArray.Length;
MemoryStream memoryStream = new MemoryStream();
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress, true))
{
for (int i = 0; i < 1000; i++)
{
deflateStream.Write(DataAsByteArray, 0, uncompressedSize);
}
}
// now read it back synchronously
Read(memoryStream);
// now read it back asynchronously
Task retval = ReadAsync(memoryStream);
retval.Wait();
}
static void Read(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize];
int bytesRead = -1;
int i = 0;
while (bytesRead > 0 || bytesRead == -1)
{
bytesRead = deflateStream.Read(buffer, 0, uncompressedSize);
System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
i++;
}
}
}
static async Task ReadAsync(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize];
int bytesRead = -1;
int i = 0;
while (bytesRead > 0 || bytesRead == -1)
{
bytesRead = await deflateStream.ReadAsync(buffer, 0, uncompressedSize);
System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
System.Diagnostics.Debug.Assert(bytesRead == 0 || bytesRead == uncompressedSize);
i++;
}
}
}
/// <summary>
/// This is one of the strings of data that was causing issues.
/// </summary>
/// <returns></returns>
static string CreateProblemDataString()
{
StringBuilder sb = new StringBuilder();
sb.Append("0601051081 ");
sb.Append(" ");
sb.Append(" 225021 0300420");
sb.Append("34056064070072076361102 13115016017");
sb.Append("5 192 230237260250 2722");
sb.Append("73280296 326329332 34535535");
sb.Append("7 3 ");
sb.Append(" 4");
sb.Append(" ");
sb.Append(" 50");
sb.Append("6020009 030034045 063071076 360102 13");
sb.Append("1152176160170 208206 23023726025825027227328");
sb.Append("2283285 320321333335341355357 622005009 0");
sb.Append("34053 060070 361096 130151176174178172208");
sb.Append("210198 235237257258256275276280290293 3293");
sb.Append("30334 344348350 ");
sb.Append(" ");
sb.Append(" ");
sb.Append(" ");
sb.Append(" 225020012014 046042044034061");
sb.Append("075078 361098 131152176160170 208195210 230");
sb.Append("231260257258271272283306 331332336 3443483");
sb.Append("54 29 ");
sb.Append(" ");
sb.Append(" 2");
sb.Append("5 29 06 0");
sb.Append("1 178 17");
sb.Append("4 205 2");
sb.Append("05 195 2");
sb.Append("31 231 23");
sb.Append("7 01 01 0");
sb.Append("2 260 26");
sb.Append("2 274 2");
sb.Append("72 274 01 01 0");
sb.Append("3 1 5 3 6 43 52 ");
return sb.ToString();
}
}
UPDATED CODE TO READ STREAMS INTO BUFFER CORRECTLY
Output now looks like this:
...
ReadAsync #410 - 2055 bytes read
ReadAsync #411 - 2055 bytes read
ReadAsync PARTIAL #412 - 453 bytes read, offset for next read = 453
ReadAsync #412 - 1602 bytes read
ReadAsync #413 - 2055 bytes read
...
static void Read(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
int bytesRead; // number of bytes read from Read operation
int offset = 0; // offset for writing into buffer
int i = -1; // counter to track iteration #
while ((bytesRead = deflateStream.Read(buffer, offset, uncompressedSize - offset)) > 0)
{
offset += bytesRead; // offset in buffer for results of next reading
System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
{
offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
i++; // increment counter that tracks iteration #
System.Diagnostics.Debug.WriteLine("Read #{0} - {1} bytes read", i, bytesRead);
}
else // buffer still not full
{
System.Diagnostics.Debug.WriteLine("Read PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
}
}
}
}
static async Task ReadAsync(MemoryStream memoryStream)
{
memoryStream.Position = 0;
using (DeflateStream deflateStream = new DeflateStream(memoryStream, CompressionMode.Decompress, true))
{
byte[] buffer = new byte[uncompressedSize]; // buffer to hold known fixed size record.
int bytesRead; // number of bytes read from Read operation
int offset = 0; // offset for writing into buffer
int i = -1; // counter to track iteration #
while ((bytesRead = await deflateStream.ReadAsync(buffer, offset, uncompressedSize - offset)) > 0)
{
offset += bytesRead; // offset in buffer for results of next reading
System.Diagnostics.Debug.Assert(offset <= uncompressedSize, "should never happen - because would mean more bytes read than requested.");
if (offset == uncompressedSize) // buffer full, complete fixed size record in buffer.
{
offset = 0; // buffer is now filled, next read to start at beginning of buffer again.
i++; // increment counter that tracks iteration #
System.Diagnostics.Debug.WriteLine("ReadAsync #{0} - {1} bytes read", i, bytesRead);
}
else // buffer still not full
{
System.Diagnostics.Debug.WriteLine("ReadAsync PARTIAL #{0} - {1} bytes read, offset for next read = {2}", i+1, bytesRead, offset);
}
}
}
}
Damien's comments are exactly correct. But your mistake is a common enough one, and IMHO the question deserves an actual answer, if for no other reason than to help others who make the same mistake find the answer more easily.
So, to be clear:
As is true for all of the stream-oriented I/O methods in .NET where one provides a byte[] buffer and the number of bytes read is returned by the method, the only assumptions you can make about the number of bytes are:
The number will not be larger than the maximum number of bytes you asked to read (i.e. passed to the method as the count of bytes to read)
The number will be non-negative, and will be greater than 0 as long as there were in fact data remaining to be read (0 will be returned when you reach the end of the stream).
When reading using any of these methods, you cannot even count on the same method returning the same number of bytes from one call to the next (in some contexts the count happens to be deterministic, but you should still not rely on that), and there is no guarantee of any sort that different methods, even ones reading from the same source, will return the same number of bytes as each other.
It is up to the caller to read the bytes as a stream, taking into account the return value specifying the number of bytes read for each call, and reassembling those bytes in whatever manner is appropriate for that particular stream of bytes.
Note that when dealing with Stream objects, you can use the Stream.CopyTo() method. Of course, it only copies to another Stream object. But in many cases, the destination object can be used without treating it as a Stream. E.g. you just want to write the data as a file, or you want to copy it to a MemoryStream and then use the MemoryStream.ToArray() method to turn that into an array of bytes (which you can then access without any concern about how many bytes have been read in a given read operation…by the time you get to the array, all of them have been read :) ).
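For example, a minimal sketch (sourceStream stands in for whatever stream is being read):

using (var memoryStream = new MemoryStream())
{
    sourceStream.CopyTo(memoryStream);        // CopyTo performs the read loop internally
    byte[] allBytes = memoryStream.ToArray(); // by this point, every byte has been read
}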
Is there any way to use Stream.CopyTo to copy only a certain number of bytes to the destination stream? What is the best workaround?
Edit:
My workaround (some code omitted):
internal sealed class Substream : Stream
{
private readonly Stream stream;
private readonly long origin;
private readonly long length;
private long position;
public Substream(Stream stream, long length)
{
this.stream = stream;
this.origin = stream.Position;
this.position = stream.Position;
this.length = length;
}
public override int Read(byte[] buffer, int offset, int count)
{
var n = Math.Max(Math.Min(count, origin + length - position), 0);
int bytesRead = stream.Read(buffer, offset, (int) n);
position += bytesRead;
return bytesRead;
}
}
then to copy n bytes:
var substream = new Substream(stream, n);
substream.CopyTo(stm);
The implementation of copying streams is not overly complicated. If you want to adapt it to copy only a certain number of bytes then it shouldn't be too difficult to tweak the existing method, something like this
public static void CopyStream(Stream input, Stream output, int bytes)
{
byte[] buffer = new byte[32768];
int read;
while (bytes > 0 &&
(read = input.Read(buffer, 0, Math.Min(buffer.Length, bytes))) > 0)
{
output.Write(buffer, 0, read);
bytes -= read;
}
}
The check for bytes > 0 probably isn't strictly necessary, but can't do any harm.
What's wrong with just copying the bytes you need using a buffer?
long CopyBytes(long bytesRequired, Stream inStream, Stream outStream)
{
long readSoFar = 0L;
var buffer = new byte[64*1024];
do
{
var toRead = Math.Min(bytesRequired - readSoFar, buffer.Length);
var readNow = inStream.Read(buffer, 0, (int)toRead);
if (readNow == 0)
break; // End of stream
outStream.Write(buffer, 0, readNow);
readSoFar += readNow;
} while (readSoFar < bytesRequired);
return readSoFar;
}
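Because the method returns the number of bytes actually copied, the caller can detect a truncated source; a sketch (the stream variables are illustrative):

long copied = CopyBytes(1024, inStream, outStream);
if (copied < 1024)
{
    // the input ended early; handle the short copy as appropriate
}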
Best explained with code:
long pieceLength = (long)Math.Pow(2, 18); // simplification (Math.Pow returns a double, so a cast is needed)
...
public void HashFile(string path)
{
using (FileStream fin = File.OpenRead(path))
{
byte[] buffer = new byte[(int)pieceLength];
int pieceNum = 0;
long remaining = fin.Length;
int done = 0;
int offset = 0;
while (remaining > 0)
{
while (done < pieceLength)
{
int toRead = (int)Math.Min(pieceLength, remaining);
int read = fin.Read(buffer, offset, toRead);
//if read == 0, EOF reached
if (read == 0)
break;
offset += read;
done += read;
remaining -= read;
}
HashPiece(buffer, pieceNum);
done = 0;
pieceNum++;
buffer = new byte[(int)pieceLength];
}
}
}
This works fine if the file is smaller than pieceLength and only does the outer loop once. However, if the file is larger, it throws this at me:
This is in the int read = fin.Read(buffer, offset, toRead); line.
Unhandled Exception: System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
at System.IO.FileStream.Read(Byte[] array, Int32 offset, Int32 count)
done and buffer DO get reinitialized properly. The file is larger than 1 MB.
Thanks in advance
Well, at least one problem is that you're not taking into account the "piece already read" when you work out how much to read. Try this:
int toRead = (int) Math.Min(pieceLength - done, remaining);
And then also adjust where you're reading to within the buffer:
int read = fin.Read(buffer, done, toRead);
(as you're resetting done for the new buffer, but not offset).
Oh, and at that point offset is irrelevant, so remove it.
Then note djna's answer as well - consider the case where for whatever reason you read to the end of the file, but without remaining becoming zero. You may want to consider whether remaining is actually useful at all... why not just keep reading blocks until you get to the end of the stream?
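Putting those fixes together, the inner loop would look something like this (a sketch of the suggested changes, not a tested drop-in):

while (done < pieceLength)
{
    // ask only for what this piece still needs, and read into the buffer at 'done'
    int toRead = (int)Math.Min(pieceLength - done, remaining);
    int read = fin.Read(buffer, done, toRead);
    if (read == 0)
        break; // EOF
    done += read;
    remaining -= read;
}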
You don't adjust the value of "remaining" in this case
if (read == 0)
break;
The FileStream.Read method's Offset and Length parameters relate to positions in the buffer, not to positions in the file.
Basically, this should fix it:
int read = fin.Read(buffer, 0, toRead);
I was searching for a BinaryReader.Skip function when I came across this feature request on MSDN.
He said you can provide your own BinaryReader.Skip() function by using this.
Looking at this code, I'm wondering why he chose this way to skip a certain number of bytes:
for (int i = 0; i < count; i++) {
reader.ReadByte();
}
Is there a difference between that and:
reader.ReadBytes(count);
Even if it's just a small optimization, I'd like to understand, because right now it doesn't make sense to me why you would use the for loop.
public void Skip(this BinaryReader reader, int count) {
if (reader.BaseStream.CanSeek) {
reader.BaseStream.Seek(count, SeekOrigin.Current);
}
else {
for (int i = 0; i < count; i++) {
reader.ReadByte();
}
}
}
No, there is no difference. EDIT: assuming that the stream has enough bytes.
The ReadByte method simply forwards to the underlying Stream's ReadByte method.
The ReadBytes method calls the underlying stream's Read until it reads the required number of bytes.
It's defined like this:
public virtual byte[] ReadBytes(int count) {
if (count < 0) throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
Contract.Ensures(Contract.Result<byte[]>() != null);
Contract.Ensures(Contract.Result<byte[]>().Length <= Contract.OldValue(count));
Contract.EndContractBlock();
if (m_stream==null) __Error.FileNotOpen();
byte[] result = new byte[count];
int numRead = 0;
do {
int n = m_stream.Read(result, numRead, count);
if (n == 0)
break;
numRead += n;
count -= n;
} while (count > 0);
if (numRead != result.Length) {
// Trim array. This should happen on EOF & possibly net streams.
byte[] copy = new byte[numRead];
Buffer.InternalBlockCopy(result, 0, copy, 0, numRead);
result = copy;
}
return result;
}
For most streams, ReadBytes will probably be faster.
ReadByte will throw an EndOfStreamException if the end of the stream is reached, whereas ReadBytes will not. It depends on whether you want Skip to throw if it cannot skip the requested number of bytes without reaching the end of the stream.
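If Skip should fail loudly when the stream is too short, one option is to build the non-seekable fallback on ReadBytes and check the result (a sketch; the class and method names are my own):

public static class BinaryReaderExtensions
{
    public static void SkipStrict(this System.IO.BinaryReader reader, int count)
    {
        if (reader.BaseStream.CanSeek)
        {
            reader.BaseStream.Seek(count, System.IO.SeekOrigin.Current);
            return;
        }
        // ReadBytes loops internally but returns a short array at EOF,
        // so check the length to mirror ReadByte's throwing behavior.
        if (reader.ReadBytes(count).Length < count)
            throw new System.IO.EndOfStreamException();
    }
}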
ReadBytes is faster than multiple ReadByte calls.
It's a very small optimization which will occasionally skip bytes (rather than reading them via ReadByte). Think of it this way:
if(vowel)
{
println(vowel);
}
else
{
nextLetter();
}
If you can avoid that extra function call, you save a little runtime.