Copying from one stream to another? - C#

For work, the specification on my project is to use .Net 2.0, so I don't get the handy CopyTo method introduced in later versions of the framework.
I need to copy the response stream from an HttpWebResponse to another stream (most likely a MemoryStream, but it could be any subclass of Stream). My normal tactic has been something along the lines of:
BufferedStream bufferedresponse = new BufferedStream(HttpResponse.GetResponseStream());
int count = 0;
byte[] buffer = new byte[1024];
do {
    count = bufferedresponse.Read(buffer, 0, buffer.Length);
    target.Write(buffer, 0, count);
} while (count > 0);
bufferedresponse.Close();
Are there more efficient ways to do this? Does the size of the buffer really matter? What is the best way to copy from one stream to another in .Net 2.0?
P.S. This is for downloading large, 200+ MB GIS TIFF images, so of course reliability is paramount.

This is a handy function. And yes, the buffer size matters. Increasing it might give you better performance on large files.
public static void WriteTo(Stream sourceStream, Stream targetStream)
{
    byte[] buffer = new byte[0x10000]; // 64 KB
    int n;
    while ((n = sourceStream.Read(buffer, 0, buffer.Length)) != 0)
        targetStream.Write(buffer, 0, n);
}
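For the original question, here is a sketch of how this might be wired up against an HttpWebResponse in .Net 2.0. The URL is a placeholder, and the using blocks ensure the streams are closed even if the copy fails:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/large.tif"); // placeholder URL
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream source = response.GetResponseStream())
{
    MemoryStream target = new MemoryStream();
    WriteTo(source, target);
    // target now holds the downloaded bytes; target.ToArray() copies them out.
}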

The size of the buffer does matter. If you're copying a megabyte of data one byte at a time, for example, you're going to make 2^20 iterations through the loop. If you're copying 1 kilobyte at a time, you'll only make 2^10 iterations. There is significant overhead in the calls to Read and Write when you're making a million of them.
For reading FileStream, I typically use a buffer that's between 64K and 256K. Anything less than 32K shows a marked decrease in performance, as does anything above 256K. The difference between using a 64K buffer and a 256K buffer is not worth the extra memory. Be aware, though, that those numbers are on my system and across my network. Your numbers will vary depending on hardware and operating system.
For network streams, you should select a buffer size that will keep up with the incoming data stream. I'd suggest at least 4 kilobytes, which will give you some buffer if the write stalls for any reason.

You can get rid of the BufferedStream; it's only useful if you are reading the stream in small chunks. Just get the response stream into a variable and use that:
Stream response = HttpResponse.GetResponseStream();
A buffer that is too small will reduce performance. Use a larger buffer, so that at least the data from an entire IP packet fits. I looked around a bit, and 4096 bytes should be enough for that. You can use any size up to 85 KB; beyond that the buffer is allocated on the large object heap, which you should avoid when there is no reason for it.
Other than that, it's about as efficient as it gets.

I can think of two ways.
Check whether Stream.MemberwiseClone() fits your need. It gets you a shallow copy of your object.
Check whether this works, if both ends are of the Stream type:
BufferedStream bs = new BufferedStream((Stream)memoryStreamObject);

Related

C# - How many bytes should I read from a FileStream at one time?

The title doesn't make it that obvious what I'm asking, but I have created an algorithm that compresses a bunch of files into a single file and then decompresses them again. To avoid OutOfMemoryExceptions, I use two FileStreams: one to read segments of data from the original files, and the other to write those segments into the final file.
I have included my code excerpt below. In this case, rStream and wStream are already declared accordingly, and bufferSize is currently 16 MB. fInfo is the FileInfo for the file we are reading from.
Obviously, the higher the bufferSize, the faster the operation completes. I want to know the maximum bufferSize I should use to maximize the efficiency of the operation.
int bytesRead = 0;
long toRead = fInfo.Length - curFileSize;
if (toRead > bufferSize) { toRead = bufferSize; }
byte[] fileSegment = new byte[toRead];
while (bytesRead < toRead)
{
    bytesRead += rStream.Read(fileSegment, bytesRead, (int)toRead - bytesRead);
}
wStream.Seek(finalFileSize, SeekOrigin.Begin);
wStream.Write(fileSegment, 0, (int)toRead);
A buffer of 16 MB definitely sounds like overkill. Usually a few kilobytes is used for a buffer like that. At 16 MB you would have very little gain in making the buffer larger, or none at all.
Consider that if you are using a large buffer, it won't fit in the processor cache and would be slower to access. If you make it really large some part of it may even be swapped out to disk, so at that point making the buffer larger would only make it slower.
The amount of data to read per chunk depends on a variety of factors. Smaller chunks mean the compression algorithm doesn't have to wait as long for the first chunk to arrive, so compression can begin sooner; larger chunks take more memory but result in fewer read operations, improving throughput. If your file is smaller than 16 MB, then with a 16 MB buffer your read is no different from ReadToEnd(), and compression can't begin until the whole file has been read.
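To make the trade-off concrete, here is a rough sketch of chunked reading feeding a compressor, so compression can begin before the whole file has been read. GZipStream stands in for whatever compression the algorithm actually uses, and the file names and the 64 KB chunk size are placeholders:
// Requires System.IO and System.IO.Compression.
using (FileStream input = File.OpenRead("original.dat"))  // placeholder name
using (FileStream output = File.Create("final.gz"))       // placeholder name
using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
{
    byte[] chunk = new byte[64 * 1024]; // arbitrary chunk size
    int read;
    while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
    {
        gzip.Write(chunk, 0, read); // the compressor sees data after the first chunk
    }
}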

What does BinaryReader do if the bytes I am reading aren't present yet?

I am using a BinaryReader on top of a NetworkStream to read data off of a network. This has worked really well for me, but I want to understand what's going on behind the scenes, so I took a look at the documentation for BinaryReader and found it to be extremely sparse.
My question is this: What will BinaryReader.ReadBytes(bufferSize) do if bufferSize bytes are not present on the network stream when I call ReadBytes?
In my mind there are a few options:
1) Read any bytes that are present on the network stream and return only that many
2) Wait until bufferSize bytes are present on the stream, then read
3) Throw an exception
I assume option 2 is happening, since I've never received any exceptions and all my data is received whole, not in pieces. However, I would like to know for sure what is going on. If someone could enlighten me, I would be grateful.
I believe it actually goes for hidden option 4:
Read the data as it becomes available, looping round in the same way that you normally would do manually. It will only return a value less than the number of bytes you asked for if it reaches the end of the stream while reading.
This is subtly different from your option 2 as it does drain the stream as data becomes available - it doesn't wait until it could read all of the data in one go.
It's easy to show that it does return a lower number of bytes than you asked for if it reaches the end:
var ms = new MemoryStream(new byte[10]);
var readData = new BinaryReader(ms).ReadBytes(100);
Console.WriteLine(readData.Length); // 10
It's harder to prove the looping part, without a custom stream which would explicitly require multiple Read calls to return all the data.
The documentation isn't as clear as it might be, but the return value part is at least somewhat helpful:
A byte array containing data read from the underlying stream. This might be less than the number of bytes requested if the end of the stream is reached.
Note the final clause (less than requested only if the end of the stream is reached) and compare that with Stream.Read:
The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
If you're expecting an exact amount of data and only that amount will be useful, I suggest you write a ReadExactly method which calls Read and throws EndOfStreamException if you need more data than the stream provided before it was closed.
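A minimal sketch of such a ReadExactly helper, assuming you want the data back as a new array (the name and shape are just one option):
public static byte[] ReadExactly(Stream stream, int count)
{
    byte[] buffer = new byte[count];
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, total, count - total);
        if (read == 0) // stream ended before we got everything
            throw new EndOfStreamException(
                "Expected " + count + " bytes but only read " + total);
        total += read;
    }
    return buffer;
}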
If, by “present on the stream”, you’re asking whether the method would block until the specified number of bytes are available, then it is option 2. It would only return a smaller amount of bytes if the end of the stream is reached.
Here is some sample code on how BinaryReader.ReadBytes(int) may be implemented:
byte[] ReadBytes(int count)
{
    byte[] buffer = new byte[count];
    int total = 0;
    int read = 0;
    do
    {
        read = stream.Read(buffer, total, count - total); // offset must be total, not read
        total += read;
    }
    while (read > 0 && total < count);
    // Resize buffer if smaller than count (code not shown).
    return buffer;
}

Is there any reason GZipStream.Read would return less than the number of requested bytes?

The docs say:
Return Value:
The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
But why would the bytes "not be available" when reading from disk?
Let me clarify a bit:
I'm reading from disk (underlying type is FileStream)
There are at least N bytes left to be read (before EOF)
I request to read N bytes
Will the return value/number of bytes read ever be less than N in this scenario?
In answer to your edited question:
Will the return value/number of bytes read ever be less than N in this scenario?
I think you need to ask a hardware expert, and I suppose this isn't the right forum to attract the attention of such a person.
Disclaimer: I'm not a hardware expert, and I don't play one on TV. This is just speculation:
I think that when you're reading from disk, the only reason you'd get fewer bytes than you request is because the stream has run out of bytes to give you. However, it's conceivable that you might have a situation similar to a network stream, where your program is reading bytes faster than the hardware can provide them. In that case, the Read method would presumably populate the buffer only partially and then return.
Obviously, the answer to the question depends on whether such a situation could occur. I think the answer is "no". I have certainly never seen a counterexample. But it would be a mistake to write code depending on that assumption.
Consider: even if you could examine the specifications of all the hardware your code will run on, and prove that the buffer will always be completely filled until the end of the stream is reached, there's no saying what new disk drive somebody might install on the machine in the future that might behave differently. It's much simpler just to treat all streams the same and undertake the modest amount of work required to handle the possibility that the buffer comes back incompletely filled.
My solution:
public static class StreamExt
{
    public static void ReadBytes(this Stream stream, byte[] buffer, int offset, int count)
    {
        int totalBytesRead = 0;
        while (totalBytesRead < count)
        {
            int bytesRead = stream.Read(buffer, offset + totalBytesRead, count - totalBytesRead);
            if (bytesRead == 0) throw new IOException("Premature end of stream");
            totalBytesRead += bytesRead;
        }
    }
}
Using this method should safely read in all the bytes you requested.
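For example, reading a hypothetical length-prefixed message with it (the 4-byte prefix protocol is made up for illustration):
byte[] header = new byte[4];
stream.ReadBytes(header, 0, header.Length); // throws if the stream ends early
int payloadLength = BitConverter.ToInt32(header, 0);
byte[] payload = new byte[payloadLength];
stream.ReadBytes(payload, 0, payload.Length);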
Compression streams are wrapper streams, so their behavior depends on the behavior of the underlying stream. As a result, they carry the same caveat as any generic stream: they can return fewer bytes than requested.
As for potential reasons (see also those provided by Lloyd):
EOF reached on the last read
No data in the network stream yet
A custom stream decided to return data in fixed-size chunks (perfectly OK)
Stream.Read requires that you pre-allocate the space to read into, and if you allocate more space than can be read from the stream, the return value is its way of telling you how much of it has been used.
Consider the following:
If you allocated a buffer of, say, 4096 bytes and the EOF is reached at 2046 bytes, then the return value would only be 2046. This lets you know how much of your buffer was filled on return.

What is the best memory buffer size to allocate to download a file from Internet?

What is the best memory buffer size to allocate when downloading a file from the Internet? Some samples say it should be 1 KB. I'd like to know why that is in general, and also whether it makes a difference if we download a small .PNG or a large .AVI.
Stream remoteStream;
Stream localStream;
WebResponse response;
try
{
    response = request.EndGetResponse(result);
    if (response == null)
        return;
    remoteStream = response.GetResponseStream();
    var localFile = Path.Combine(FileManager.GetFolderContent(), TaskResult.ContentItem.FileName);
    localStream = File.Create(localFile);
    var buffer = new byte[1024];
    int bytesRead;
    do
    {
        bytesRead = remoteStream.Read(buffer, 0, buffer.Length);
        localStream.Write(buffer, 0, bytesRead);
        BytesProcessed += bytesRead;
    } while (bytesRead > 0);
}
For what it's worth, I tested reading a 1484 KB text file using progressive powers of two (buffer sizes of 2, 4, 8, 16, ...) and printed the number of milliseconds required for each read to the console window. Much past 8192 there didn't seem to be much of a difference. Here are the results on my Windows 7 64-bit machine.
2^1 = 2 :264.0151
2^2 = 4 :193.011
2^3 = 8 :175.01
2^4 = 16 :153.0088
2^5 = 32 :139.0079
2^6 = 64 :134.0077
2^7 = 128 :132.0075
2^8 = 256 :130.0075
2^9 = 512 :133.0076
2^10 = 1024 :133.0076
2^11 = 2048 :90.0051
2^12 = 4096 :69.0039
2^13 = 8192 :60.0035
2^14 = 16384 :56.0032
2^15 = 32768 :53.003
2^16 = 65536 :53.003
2^17 = 131072 :52.003
2^18 = 262144 :53.003
2^19 = 524288 :54.0031
2^20 = 1048576 :55.0031
2^21 = 2097152 :54.0031
2^22 = 4194304 :54.0031
2^23 = 8388608 :54.003
2^24 = 16777216 :55.0032
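For reference, here is a sketch of the kind of timing loop that produces numbers like these. The file path is a placeholder, and OS file caching will flatter every run after the first, so treat the results as relative rather than absolute:
// Requires System.Diagnostics for Stopwatch.
for (int power = 1; power <= 24; power++)
{
    int bufferSize = 1 << power;
    byte[] buffer = new byte[bufferSize];
    Stopwatch sw = Stopwatch.StartNew();
    using (FileStream fs = File.OpenRead(@"C:\temp\test.txt")) // placeholder path
    {
        while (fs.Read(buffer, 0, buffer.Length) > 0) { } // read and discard
    }
    sw.Stop();
    Console.WriteLine("2^{0} = {1} : {2}", power, bufferSize, sw.Elapsed.TotalMilliseconds);
}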
Use at least 4KB. It's the normal page size for Windows (i.e. the granularity at which Windows itself manages memory), which means that the .Net memory allocator doesn't need to break down a 4KB page into 1KB allocations.
Of course, using a 64KB block will be faster, but only marginally so.
2K, 4K, or 8K are good choices.
The page size is not important; any change in speed would be marginal and unpredictable.
First, C# memory can be moved: C# uses a compacting, generational garbage collector, so there is no guarantee about where data will be allocated.
Second, arrays in C# can be backed by non-contiguous areas of memory! Arrays are stored contiguously in virtual memory, but contiguous virtual memory doesn't mean contiguous physical memory.
Third, the array data structure in C# occupies a few more bytes than the content itself (it stores the array length and other information). If you allocate exactly a page-size worth of bytes, using the array will almost always cross a page boundary!
So I would think that tuning your code to the page size is a non-optimization.
Usually C# arrays perform very well, but if you really need precise allocation of data you need to use pinned arrays or Marshal allocation, and that will slow down the garbage collector.
Using Marshal allocation and unsafe code can be a little faster, but it really isn't worth the effort.
I would say it is better to just use your arrays without thinking too much about the page size. Use 2K, 4K, or 8K buffers.
I had a problem with the remote machine closing the connection when a 64K buffer was used while downloading from IIS.
I solved the problem by raising the buffer to 2 MB.
It will depend on the hardware and the scope, too. I work on cloud-deployed workloads; in the server world you may find 40G Ethernet cards and can assume MTUs of 9000 bytes. Additionally, you don't want your Ethernet card to interrupt your processor for every single frame. So, ignoring the middle actors in the Windows/Linux kernel, you should go one or two orders higher:
100 * 9000 ≈ 900 KB, so I generally choose 512 KB as a default value (as long as I know this value isn't oversized for the typical file being downloaded).
In some cases you can find out (or know, or hack around in a debugger and hence find out albeit in a non-change-resistant way) the size of a buffer used by the stream(s) you are writing to or reading from. In this case it will give a slight advantage if you match that size, or failing that, for one buffer to be a whole multiple of the other.
Otherwise 4096 unless you've a reason otherwise (wanting a small buffer to give rapid UI feedback for example), for the reasons already given.

Why do I need to read a file piece by piece into a buffer?

I have seen the following code for getting a file into an array, which is in turn used as a parameter for a SQL command inserting it into a blob column:
byte[] buffer;
using (FileStream fs = new FileStream(soubor, FileMode.Open, FileAccess.Read))
{
    int length = (int)fs.Length;
    buffer = new byte[length];
    int count;
    int sum = 0;
    while ((count = fs.Read(buffer, sum, length - sum)) > 0)
        sum += count;
}
Why can't I simply do fs.Read(buffer, 0, length) to copy the content of the file into the buffer?
Thanks
There's more to it than just "the file may not fit in memory". The contract for Stream.Read explicitly says:
Implementations of this method read a maximum of count bytes from the current stream and store them in buffer beginning at offset. The current position within the stream is advanced by the number of bytes read; however, if an exception occurs, the current position within the stream remains unchanged. Implementations return the number of bytes read. The return value is zero only if the position is currently at the end of the stream. The implementation will block until at least one byte of data can be read, in the event that no data is available. Read returns 0 only when there is no more data in the stream and no more is expected (such as a closed socket or end of file). An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
Note the last sentence - you can't rely on a single call to Stream.Read to read everything.
The docs for FileStream.Read have a similar warning:
The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached.
For a local file system I don't know for sure whether this will ever actually happen, but it could for a network-mounted file. Do you want your app to be brittle in that way?
Reading in a loop is the robust way to do things. Personally I prefer not to require the stream to support the Length property, either:
public static byte[] ReadFully(Stream stream)
{
    byte[] buffer = new byte[8192];
    using (MemoryStream tmpStream = new MemoryStream())
    {
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            tmpStream.Write(buffer, 0, bytesRead);
        }
        return tmpStream.ToArray();
    }
}
That is slightly less efficient when the length is known beforehand, but it's nice and simple. You only need to implement it once, put it in a utility library, and call it whenever you need to. If you really mind the efficiency loss, you could use CanSeek to test whether the Length property is supported, and repeatedly read into a single buffer in that case. Be aware of the possibility that the length of the stream could change while you're reading though...
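A sketch of that variant, falling back to the general ReadFully above when the stream can't report its length (note it still loops, since Length is only a hint and could even change mid-read):
public static byte[] ReadFullyWithLength(Stream stream)
{
    if (!stream.CanSeek)
        return ReadFully(stream); // length unknown: use the general version

    byte[] data = new byte[stream.Length]; // assumes the stream fits in one array
    int total = 0;
    while (total < data.Length)
    {
        int read = stream.Read(data, total, data.Length - total);
        if (read == 0)
            break; // stream ended earlier than Length claimed
        total += read;
    }
    if (total == data.Length)
        return data;

    // The stream shrank while we were reading: trim to what we actually got.
    byte[] trimmed = new byte[total];
    Array.Copy(data, trimmed, total);
    return trimmed;
}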
Of course, File.ReadAllBytes will do the trick even more simply when you only need to deal with a file rather than a general stream.
Because your file could be very large, and the buffer usually has a fixed size of 4-32 KB. This way you know you're not filling your memory unnecessarily.
Of course, if you KNOW the size of your file is not too large, or if you store the contents in memory anyway, there is no reason not to read it all in one shot.
Although, if you want to read the contents of your file directly into a variable, you don't need the Stream API. Rather, use
File.ReadAllText(...)
or
File.ReadAllBytes(...)
A simple fs.Read(buffer, 0, length) will probably work, and it will even be hard to find a test to break it. But it simply is not guaranteed, and it might break in the future.
The best answer here is to use a specialized method from the library. In this case
byte[] buffer = System.IO.File.ReadAllBytes(fileName);
A quick look with Reflector confirms that this will get you the partial-buffer logic and the exception-safe Dispose() of your stream.
And when future versions of the Framework allow for better ways to do this your code will automatically profit.
