var h = (HttpWebRequest)WebRequest.Create(url);
using (var hr = (HttpWebResponse)(await h.GetResponseAsync()))
{
    using (var s = hr.GetResponseStream())
    {
        using (var f = new FileStream(saveTo, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            int bytesCount = 0;
            byte[] buf = new byte[2048]; //<------------------------------
            while ((bytesCount = await s.ReadAsync(buf, 0, buf.Length)) > 0)
            {
                await f.WriteAsync(buf, 0, bytesCount);
                // Update UI : downloaded size, percent,...
            }
        }
    }
}
I'm writing a downloader that updates the UI (an ObservableCollection of thousands of items - batch download) whenever download progress changes, and that supports resuming downloads but not multi-segment downloads (each item's size is usually < 10 MB).
I run about 5-20 downloads concurrently. What buffer size is suitable for this case (good both for the UI updates and for the download)?
You want to use a buffer size that is a multiple of the OS page size, because that is the granularity for writes to disk and pages in memory. Using anything smaller than an OS page size will be suboptimal.
OS pages are generally 4096 bytes. The default buffer size for a FileStream, used if no buffer size is provided during its construction, is also 4096 bytes.
For disk I/O it is generally preferable to have a buffer that is somewhat larger (32-128 KB).
In your scenario, using a maximum of 20 concurrent downloads, if you were to use a buffer size of 32 or 64 KB, this would only require 640 KB or 1.2 MB of memory, so those are clearly viable options.
Let's assume you are in the USA, where the average download speeds are 23 Mbps and 12 Mbps for broadband and mobile respectively. If you were using 64 KB buffers (1.2 MB for 20 concurrent downloads), you could still update the UI for each of the 20 downloads several times per second.
So, use at least 32 - 64 KB buffers.
One thing to take care of is not to continuously allocate new byte buffers, but to recycle these fixed-size buffers by using a buffer pool.
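A minimal sketch of such a pool, using ArrayPool<byte>.Shared (an assumption on my part; it is built into .NET Core and available via the System.Buffers package on .NET Framework, and any fixed-size buffer pool works the same way):

using System.Buffers;

// Rent a 64 KB buffer from the shared pool for the copy loop and return it when
// the download finishes, instead of allocating a new array per download.
static async Task DownloadToFileAsync(Stream source, Stream destination)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
    try
    {
        int bytesCount;
        while ((bytesCount = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await destination.WriteAsync(buffer, 0, bytesCount);
            // Update UI : downloaded size, percent,...
        }
    }
    finally
    {
        // Always return the buffer to the pool, even if the download fails.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

Note that Rent may hand back a buffer larger than requested; the loop above simply uses buffer.Length, so that is fine.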
Related
I am using this code to copy a big file:
const int CopyBufferSize = 64 * 1024;
string src = @"F:\Test\src\Setup.exe";
string dst = @"F:\Test\dst\Setup.exe";
public void CopyFile()
{
    Stream input = File.OpenRead(src);
    long length = input.Length;
    byte[] buffer = new byte[CopyBufferSize];
    Stopwatch swTotal = Stopwatch.StartNew();
    Invoke((MethodInvoker)delegate
    {
        progressBar1.Maximum = (int)Math.Abs(length / CopyBufferSize) + 1;
    });
    using (Stream output = File.OpenWrite(dst))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(CopyBufferSize, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
            Invoke((MethodInvoker)delegate
            {
                progressBar1.Value++;
                label1.Text = (100 * progressBar1.Value / progressBar1.Maximum).ToString() + " %";
                label3.Text = ((int)swTotal.Elapsed.TotalSeconds).ToString() + " Seconds";
            });
        }
        Invoke((MethodInvoker)delegate
        {
            progressBar1.Value = progressBar1.Maximum;
        });
    }
    Invoke((MethodInvoker)delegate
    {
        swTotal.Stop();
        Console.WriteLine("Total time: {0:N4} seconds.", swTotal.Elapsed.TotalSeconds);
        label3.Text += ((int)swTotal.Elapsed.TotalSeconds - int.Parse(label3.Text.Replace(" Seconds",""))).ToString() + " Seconds";
    });
}
The file size is about 4 GB.
In the first 7 seconds it can copy up to 400 MB, then this hot speed calms down.
What happens, and how can I keep this hot speed, or even increase it?
Another question is here:
When the file is copied, Windows is still working on the destination file (for about 10 seconds).
Copy Time: 116 seconds
extra time: 10-15 seconds or even more
How to remove or decrease this extra time?
What happens? Caching, mostly.
The OS pretends you copied 400 MiB in seven seconds, but you didn't. You just handed 400 MiB to the OS (or file system) to write in the future, and that's as much as the buffer can take. If you try to write a 400 MiB file and you pull the plug as soon as it's "done", your file will not be written. The same thing explains the "overtime": your application has sent all it has to the buffer, but the buffer hasn't yet been written to the drive itself (either to its own buffer, or, even slower, to the actual physical platter).
This is especially visible with USB flash drives, which tend to use caching heavily. This makes working with the (usually very slow) drive much more pleasant, with the trade-off that you have to wait for the OS to finish writing everything before pulling the drive out (that's why you get the "Safe remove" icon).
So it should be obvious that you can't really make the total time shorter. All you can do is try and make the user interface reflect reality a bit better, so that the user doesn't see the "first 400 MiB are so fast!" thing... but it doesn't really work well. In any case, your read->write speed is ~30 MiB/s. The OS just hides the peaks to make it easier to deal with the slow hard drive - very useful when you're dealing with lots of small files, worthless when dealing with files bigger than the buffer.
You have a bit of control over this when you use the FileStream constructor directly, instead of using File.OpenWrite - you can use FileOptions.WriteThrough to instruct the OS to avoid any caching and write directly to disk[1], giving you a better idea of the real write speed. Do note that this usually makes the total time larger, though, and it may make concurrent access even worse. You definitely don't want to use it for small files.
[1] - Haha, right. The drive usually has caching of its own, and some ignore the OS' pleas. Tough luck.
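A minimal sketch of the FileOptions.WriteThrough variant mentioned above (the 1 MB buffer size is my assumption; src and dst are the paths from the question):

// Open the destination with FileOptions.WriteThrough so writes bypass the OS
// write cache; the reported progress then tracks the real disk speed more closely.
using (var input = File.OpenRead(src))
using (var output = new FileStream(dst, FileMode.Create, FileAccess.Write,
    FileShare.None, 1024 * 1024, FileOptions.WriteThrough))
{
    input.CopyTo(output, 1024 * 1024);
}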
One thing you could try is to increase the buffer size. This really matters when the write cache can no longer keep up (as discussed in the other answer). Writing a lot of small blocks is often slower than writing a few large blocks. Instead of 64 kB, try 1 MB, 4 MB or even bigger:
const int CopyBufferSize = 1 * 1024 * 1024; // 1 MB
// or
const int CopyBufferSize = 4 * 1024 * 1024; // 4 MB
I've read quite a few SO posts and general articles on trying to allocate over 1GB of memory, so before getting shot down like the others, here is some context.
This app will run as a kiosk with a dedicated machine running no unnecessary processes.
My app acquires images from a high-speed camera with a rolling shutter at a rate of 120 frames per second at a resolution of 1920 x 1080 with a bit depth of 24. The app needs to write every single frame to disk for post-processing. The current problem I am facing is that the Disk I/O won't keep up with the capture rate even though it is limited to 120 frames per second. The Disk I/O bandwidth needed is around 750MBps!
The total length of the recording needs to be at least 10 seconds (7.5GB) in raw form. Performing any on-the-fly transcoding or compression brings the frame-rate down to utterly unacceptable levels.
To work around this, I have tried the following:
Compromising on quality by reducing the bit-depth at hardware-level to 16 which is still around 500MBps.
Disabled all image encoding and writing raw camera data to disk. This has saved some processing time.
Creating a single 10GB file on disk and doing a sequential write-through as frames come in (roughly as in the sketch after this list). This has helped most so far. All dev and production systems have a 100GB dedicated drive for this application.
Using Contig.exe from Sysinternals to defragment the file. This has had astonishing gains on non-SSD drives.
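For reference, a rough sketch of that pre-allocated sequential write-through approach (the frame size, path, and AcquireFrame call are placeholders of mine, not the actual capture code):

// Pre-allocate the capture file and open it with WriteThrough so each frame goes
// straight to the dedicated drive instead of piling up in the OS write cache.
const int FrameSize = 1920 * 1080 * 3;                  // one 24-bit 1080p frame
const long CaptureFileSize = 10L * 1024 * 1024 * 1024;  // pre-allocated 10 GB file

using (var capture = new FileStream(@"D:\Capture.raw", FileMode.Create, FileAccess.Write,
    FileShare.None, FrameSize, FileOptions.WriteThrough))
{
    capture.SetLength(CaptureFileSize);                 // grow the file once, up front

    byte[] frame = new byte[FrameSize];
    while (AcquireFrame(frame))                         // AcquireFrame: placeholder for the camera API
    {
        capture.Write(frame, 0, frame.Length);
    }
}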
Out of options to explore here. I am not familiar with memory-mapped files, and when trying to create them, I get an IOException saying "Not enough storage is available to process this command."
using (var file = MemoryMappedFile.CreateFromFile(@"D:\Temp.VideoCache", FileMode.OpenOrCreate, "MyMapName", int.MaxValue, MemoryMappedFileAccess.CopyOnWrite))
{
    ...
}
The large file I currently use requires either sequential write-through or sequential read access. Any pointers would be appreciated.
I could even force the overall recording size down to 1.8GB if only there was a way to allocate that much RAM. Once again, this will run on a dedicated machine with 8GB of available memory and 100GB of free space. However, not all production systems will have SSD drives.
A 32-bit process on a 64-bit system can address up to 4 GB (if the executable is built large-address-aware), so it should be possible to get 1.8 GB of RAM for storing the video, but of course you need to account for loaded DLLs and a buffer until the video is compressed.
Other than that, you could use a RAMDisk, e.g. from DataRam. You just need to find a balance between how much memory the application needs and how much memory you can grant the disk. IMHO a 5 GB / 3 GB setting could work well: 1 GB for the OS, 4 GB for your application and 3 GB for the file.
Don't forget to copy the file from the RAM disk to HDD if you want it persistent.
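For instance (drive letters and paths are placeholders, assuming R: is the RAM disk and D: is the data drive):

// After recording finishes, persist the capture from the RAM disk to the HDD.
File.Copy(@"R:\Capture\Temp.VideoCache", @"D:\Capture\Temp.VideoCache", overwrite: true);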
Commodity hardware is cheap for a reason. You need faster hardware.
Buy a faster disk system. A good RAID controller and four SSDs. Put the drives into a RAID 1+0 configuration and be done with this problem.
How much money is your company planning on spending developing and testing software to push cheap hardware past its limitations? And even if you can get it to work fast enough, how much do they plan on spending to maintain that software?
Memory-mapped files don't speed up writing to a file very much...
If you have a big file, you normally don't try to map it entirely into RAM... you map a "window" of it, then "move" the window (in C#/the Windows API you create a "view" of the file starting at any location and with a certain size).
Example code (here the window is 1 MB; bigger windows are possible... on 32 bits it should be possible to allocate a 64 or 128 MB window without any problem):
const string fileName = "Test.bin";
const long fileSize = 1024L * 1024 * 16;
const long windowSize = 1024 * 1024;

if (!File.Exists(fileName)) {
    using (var file = File.Create(fileName)) {
        file.SetLength(fileSize);
    }
}

long realFileSize = new FileInfo(fileName).Length;
if (realFileSize < fileSize) {
    using (var file = File.Create(fileName)) {
        file.SetLength(fileSize);
    }
}

using (var mm = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open)) {
    long start = 0;
    while (true) {
        long size = Math.Min(fileSize - start, windowSize);
        if (size <= 0) {
            break;
        }
        using (var acc = mm.CreateViewAccessor(start, size)) {
            for (int i = 0; i < size; i++) {
                // It is probably faster if you write the file with
                // acc.WriteArray()
                acc.Write(i, (byte)i);
            }
        }
        start += windowSize;
    }
}
Note that here I'm writing code that writes a fixed, pre-known number of bytes (fileSize)... your code will have to be different (because you can't know the exact fileSize in advance). Still, remember: memory-mapped files don't speed up writing to a file very much.
Will MemoryMappedFile.CreateViewStream(0, len) allocate a managed block of memory of size len, or will it allocate smaller buffer that acts as a sliding window over the unmanaged data?
I wonder because I aim to replace an intermediate buffer for deserialization that is a MemoryStream today, which is giving me trouble for large datasets, both because of the size of the buffer and because of LOH fragmentation.
If the view stream's internal buffer ends up being the same size, then making this switch wouldn't make sense.
Edit:
In a quick test I found these numbers when comparing the MemoryStream to the memory-mapped file. Readings are from GC.GetTotalMemory(true)/1024 and Process.GetCurrentProcess().VirtualMemorySize64/1024.
Allocate an 1GB memory stream:
                 Managed          Virtual
Initial:         81 kB            190 896 kB
After alloc:     1 024 084 kB     1 244 852 kB
As expected, a gig of both managed and virtual memory.
Now, for the MemoryMappedFile:
                         Managed    Virtual
Initial:                 81 kB      189 616 kB
MMF allocated:           84 kB      189 684 kB
1 GB viewstream allocd:  84 kB      1 213 368 kB
Viewstream disposed:     84 kB      190 964 kB
So using a not very scientific test, my assumption is that the ViewStream uses only unmanaged data. Correct?
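For reference, a rough sketch of how such a before/after comparison can be reproduced (the file path and sizes below are placeholders, not the original test code):

using System;
using System.Diagnostics;
using System.IO;
using System.IO.MemoryMappedFiles;

static void PrintMemory(string label)
{
    long managedKb = GC.GetTotalMemory(true) / 1024;
    long virtualKb = Process.GetCurrentProcess().VirtualMemorySize64 / 1024;
    Console.WriteLine("{0}: managed {1:N0} kB, virtual {2:N0} kB", label, managedKb, virtualKb);
}

static void Compare()
{
    PrintMemory("Initial");

    // 1 GB MemoryStream: the backing byte[] lands on the managed heap (LOH).
    using (var ms = new MemoryStream(1024 * 1024 * 1024))
    {
        PrintMemory("After MemoryStream alloc");
    }

    // Memory-mapped file plus view stream: only virtual address space is reserved.
    using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\Temp\mmf.bin",
        FileMode.Create, "TestMap", 1024L * 1024 * 1024))
    {
        PrintMemory("MMF allocated");
        using (var view = mmf.CreateViewStream(0, 1024L * 1024 * 1024))
        {
            PrintMemory("1 GB view stream allocated");
        }
        PrintMemory("View stream disposed");
    }
}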
An MMF like that doesn't solve your problem. A program bombs with OOM because there isn't a hole in the virtual memory address space big enough to fit the allocation. You are still consuming VM address space with an MMF, as you can tell.
Using a small sliding view would be a workaround, but that isn't any different from writing to a file, which is what an MMF does when you remap the view: it needs to flush the dirty pages to disk. Simply streaming to a FileStream is the proper workaround. That still uses RAM; the file system cache helps make writing fast. If you've got a gigabyte of RAM available, not hard to come by these days, then writing to a FileStream is just a memory-to-memory copy. Very fast, 5 gigabytes/sec and up. The file gets written in a lazy fashion in the background.
Trying too hard to keep data in memory is unproductive in Windows. Private data in memory is backed by the paging file and will be written to that file when Windows needs the RAM for other processes. And read back when you access it again. That's slow, the more memory you use, the worse it gets. Like any demand-paged virtual memory operating system, the distinction between disk and memory is a small one.
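A minimal sketch of that FileStream approach; the path, buffer size, and the serializer call are placeholders for whatever source and serializer you actually use:

// Buffer the incoming data in a temporary file instead of a MemoryStream, then
// deserialize from the FileStream; no gigabyte-sized managed buffer or LOH
// allocation is needed, and the file system cache keeps the writes fast.
string tempPath = Path.GetTempFileName();
using (var file = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite,
    FileShare.None, 64 * 1024, FileOptions.DeleteOnClose))
{
    incomingStream.CopyTo(file, 64 * 1024);    // incomingStream: your data source (placeholder)
    file.Position = 0;
    var result = serializer.Deserialize(file); // serializer: your deserializer (placeholder)
}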
Given the example at http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx, it seems to me that you get a sliding window; at least, that is how I interpret the example.
Here the example for convenience:
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

class Program
{
    static void Main(string[] args)
    {
        long offset = 0x10000000; // 256 megabytes
        long length = 0x20000000; // 512 megabytes

        // Create the memory-mapped file.
        using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\ExtremelyLargeImage.data", FileMode.Open, "ImgA"))
        {
            // Create a random access view, from the 256th megabyte (the offset)
            // to the 768th megabyte (the offset plus length).
            using (var accessor = mmf.CreateViewAccessor(offset, length))
            {
                int colorSize = Marshal.SizeOf(typeof(MyColor));
                MyColor color;

                // Make changes to the view.
                for (long i = 0; i < length; i += colorSize)
                {
                    accessor.Read(i, out color);
                    color.Brighten(10);
                    accessor.Write(i, ref color);
                }
            }
        }
    }
}

public struct MyColor
{
    public short Red;
    public short Green;
    public short Blue;
    public short Alpha;

    // Make the view brighter.
    public void Brighten(short value)
    {
        Red = (short)Math.Min(short.MaxValue, (int)Red + value);
        Green = (short)Math.Min(short.MaxValue, (int)Green + value);
        Blue = (short)Math.Min(short.MaxValue, (int)Blue + value);
        Alpha = (short)Math.Min(short.MaxValue, (int)Alpha + value);
    }
}
I am reading all files (around 3000 files, 50 GB in total) from a specified path, 4 KB at a time. Below is the code for this. My question is: when I watch the CPU and memory of the application in Task Manager, I can see the I/O reads gradually climbing to a high level. I understand this might be because of the 4 KB reads, but does it affect anything else, or is it OK for the I/O reads to increase? Also, is FileStream the optimal way to read the files, given that it does not load the entire file into memory?
const int MAX_BUFFER = 4 * 1024; // 4 KB reads, as described above
byte[] Buffer = new byte[MAX_BUFFER];
int BytesRead;

var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
do
{
    BytesRead = fileStream.Read(Buffer, 0, MAX_BUFFER);
}
while (BytesRead != 0);
fileStream.Close();
Check Hans Passant's answer about this issue; I find it very clear:
Files are already buffered by the file system cache. You just need to
pick a buffer size that doesn't force FileStream to make the native
Windows ReadFile() API call to fill the buffer too often. Don't go
below a kilobyte; more than 16 KB is a waste of memory.
Take a look at this post too; it provides some benchmarking code.
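Applied to the code in the question, a small sketch of that advice might look like this (the 16 KB figure comes from the quote above; FileOptions.SequentialScan is an extra hint of mine, not something the quoted answer requires):

const int MAX_BUFFER = 16 * 1024;   // FileStream's internal buffer and our read chunk
byte[] Buffer = new byte[MAX_BUFFER];
int BytesRead;

using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read,
    FileShare.Read, MAX_BUFFER, FileOptions.SequentialScan))
{
    while ((BytesRead = fileStream.Read(Buffer, 0, Buffer.Length)) > 0)
    {
        // process Buffer[0..BytesRead)
    }
}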
I have implemented a file transfer rate calculator to display kB/sec for an upload process occurring in my app. However, with the following code I seem to be getting 'bursts' in my kB/s readings just after the file starts to upload.
This is the relevant portion of my stream code; it streams a file in 1024-byte chunks to a server using HttpWebRequest:
using (Stream httpWebRequestStream = httpWebRequest.GetRequestStream())
{
    if (request.DataStream != null)
    {
        byte[] buffer = new byte[1024];
        int bytesRead = 0;
        Debug.WriteLine("File Start");
        var duration = new Stopwatch();
        duration.Start();
        while (true)
        {
            bytesRead = request.DataStream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
                break;
            httpWebRequestStream.Write(buffer, 0, bytesRead);
            totalBytes += bytesRead;
            double bytesPerSecond = 0;
            if (duration.Elapsed.TotalSeconds > 0)
                bytesPerSecond = (totalBytes / duration.Elapsed.TotalSeconds);
            Debug.WriteLine(((long)bytesPerSecond).FormatAsFileSize());
        }
        duration.Stop();
        Debug.WriteLine("File End");
        request.DataStream.Close();
    }
}
Now an output log of the upload process and associated kB/sec readings are as follows:
(You will note a new file starts and ends with 'File Start' and 'File End')
File Start
5.19 MB
7.89 MB
9.35 MB
11.12 MB
12.2 MB
13.13 MB
13.84 MB
14.42 MB
41.97 kB
37.44 kB
41.17 kB
37.68 kB
40.81 kB
40.21 kB
33.8 kB
34.68 kB
33.34 kB
35.3 kB
33.92 kB
35.7 kB
34.36 kB
35.99 kB
34.7 kB
34.85 kB
File End
File Start
11.32 MB
14.7 MB
15.98 MB
17.82 MB
18.02 MB
18.88 MB
18.93 MB
19.44 MB
40.76 kB
36.53 kB
40.17 kB
36.99 kB
40.07 kB
37.27 kB
39.92 kB
37.44 kB
39.77 kB
36.49 kB
34.81 kB
36.63 kB
35.15 kB
36.82 kB
35.51 kB
37.04 kB
35.71 kB
37.13 kB
34.66 kB
33.6 kB
34.8 kB
33.96 kB
35.09 kB
34.1 kB
35.17 kB
34.34 kB
35.35 kB
34.28 kB
File End
My problem is, as you will notice, that the 'burst' I am talking about starts at the beginning of every new file, peaking in MBs and then evening out properly. Is this normal for an upload to burst like this? My upload speeds typically won't go higher than 40 kB/sec here, so it can't be right.
This is a real issue: when I take an average of the last 5-10 seconds for on-screen display, it really throws things off, producing a result around ~3 MB/sec!
Any ideas whether I am approaching this problem the best way, and what I should do? :S
Graham
Also: why can't I do 'bytesPerSecond = (bytesRead / duration.Elapsed.TotalSeconds)' and move duration.Start and duration.Stop into the while loop to get accurate results? I would have thought that would be more accurate, but each speed then reads as 900 bytes/sec, 800 bytes/sec, etc.
The way I do this is:
Save up all bytes transferred in a long.
Then every second, check how much has been transferred. So I basically only trigger the code that records the speed once per second. Your while loop is going to loop many, many times per second on a fast network.
Depending on the speed of your network, you may need to check the bytes transferred in a separate thread or function. I prefer doing this with a Timer so I can easily update the UI.
EDIT:
From looking at your code, I'm guessing what you're doing wrong is that you don't take into account that one iteration of the while(true) loop is not one second.
EDIT 2:
Another advantage of only doing the speed check once per second is that things will go much quicker. In cases like this, updating the UI can be the slowest thing you are doing, so if you try to update the UI on every loop iteration, that is most likely your slowest point and will produce an unresponsive UI.
You're also correct that you should average out the values so you don't get the 'Microsoft minutes' bug. I normally do this in the Timer function by doing something like this:
//Global variables
long gTotalDownloadedBytes;
long gCurrentDownloaded; // Where you add up the download/upload until the speed check is done.
int gTotalDownloadSpeedChecks;

//Inside the function that does the speed check
gTotalDownloadedBytes += gCurrentDownloaded;
gTotalDownloadSpeedChecks++;
long AvgDwnSpeed = gTotalDownloadedBytes / gTotalDownloadSpeedChecks; // Assumes 1 speed check per second.
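A minimal WinForms-style sketch of the once-per-second check using the globals above (speedTimer and label1 are placeholder names of mine; requires System.Threading and System.Windows.Forms):

System.Windows.Forms.Timer speedTimer;

void StartSpeedTimer()
{
    speedTimer = new System.Windows.Forms.Timer { Interval = 1000 }; // fire once per second
    speedTimer.Tick += (s, e) =>
    {
        // Grab (and reset) whatever the transfer loop added since the last tick.
        long bytesThisSecond = Interlocked.Exchange(ref gCurrentDownloaded, 0);
        gTotalDownloadedBytes += bytesThisSecond;
        gTotalDownloadSpeedChecks++;

        long avgBytesPerSecond = gTotalDownloadedBytes / gTotalDownloadSpeedChecks;
        label1.Text = string.Format("{0:N0} kB/s (avg {1:N0} kB/s)",
            bytesThisSecond / 1024, avgBytesPerSecond / 1024);
    };
    speedTimer.Start();
}

// Inside the upload loop, after each Write:
// Interlocked.Add(ref gCurrentDownloaded, bytesRead);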
There are many layers of software and hardware between you and the system you're sending to, and several of those layers have a certain amount of buffer space available.
When you first start sending, you can pump out data quite quickly until you fill those buffers - it's not actually getting all the way to the other end that fast, though! After you fill up the send buffers, you're limited to putting more data into them at the same rate it's draining out, so the rate you see will drop to the underlying networking sending rate.
All, I think I have fixed my issue by adjusting the 5-10 second averaging variable to wait one second to account for the burst. Not the best solution, but it lets the connection sort itself out and allows me to capture a smooth transfer rate.
From my network traffic it appears the upload really is bursting, so there is nothing I could do differently in code to stop this.
I will still be interested in more answers before I hesitantly accept my own.