Consider the following code:
var container = new BlobContainerClient(...);
// fileStream is a stream delivering 10 MB of data
await container.UploadBlobAsync("name-of-blob", fileStream);
Using the Fiddler proxy to watch the HTTP requests, I can see that this results in 4 HTTP PUT requests (the address is 127.0.0.1 as I am testing locally with the Azurite emulator):
The first two requests (603 and 607) are 4 MB in size, the third one (613) is 2 MB in size and the fourth one (614) finally commits all sent blocks.
1. Instead of making 3 requests (4 MB + 4 MB + 2 MB) for the data, is it somehow possible to stream the 10 MB of data in one request to save some overhead?
2. As the data is sent in 4 MB chunks, does this mean that the Azure Storage client waits until it has read 4 MB from the fileStream before it starts sending, meaning 4 MB of RAM is used for buffering? My intention in using a fileStream was to reduce memory usage by passing it straight through to Azure Blob Storage.
I am using Azure.Storage.Blobs version 12.8.0 (the latest stable version as I am writing this).
Ad 1) The maximum size of a single block in a PUT operation depends on the Azure Storage service version (see here). For testing purposes I just created a new storage account in Azure and started an UploadBlobAsync() operation with a 10.5 MB video file; Fiddler shows me this:
A single PUT operation with 10,544,014 bytes. Note the x-ms-version request header, which gives the client the ability to specify which service version it wants to use (see here). I suppose your local emulator is just using an older API version.
Ad 2) Yes, for larger files UploadBlobAsync() chunks the upload: it reads a set of bytes from the stream, performs a PUT, reads the next set of bytes, performs another PUT, and so on. Only one chunk at a time needs to be held in memory, not the whole file.
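If you want to influence the chunking yourself, the 12.x client exposes StorageTransferOptions. A minimal sketch, assuming Azure.Storage.Blobs 12.x where BlobUploadOptions and StorageTransferOptions are available; InitialTransferSize controls how much the client will try to send in a single request before splitting the upload into blocks:
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// container and fileStream are the same as in the question
var blob = container.GetBlobClient("name-of-blob");

var options = new BlobUploadOptions
{
    TransferOptions = new StorageTransferOptions
    {
        // Try to send up to 20 MB in the initial (single) request.
        InitialTransferSize = 20 * 1024 * 1024,
        // Block size to fall back to if the upload does get split.
        MaximumTransferSize = 4 * 1024 * 1024
    }
};

await blob.UploadAsync(fileStream, options);
Whether a 10 MB single request is actually accepted still depends on the service version that the storage account or emulator supports.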
Related
All the theory around C# Azure Function apps says there is no size limitation on the response, but in practice it is limited to 250 MB. Anything more than that produces a 0-byte response, with no error anywhere. My use case is to return a ~2 GB response. Here's the production App Service plan that I'm using. Do I need to configure anything to raise the 250 MB size limitation?
foreach (var outputData in outputList)
{
    await httpResponseData.Body.WriteAsync(outputData);
}
I'm writing a web service that creates a file a user can download. The data source for that file is given by one or more URIs, so I end up using Stream and Reader a lot. All of them are in using blocks. Even in the endpoint that serves the file I'm using return File(byte[], string, string), and the source stream of the byte[] is disposed.
Since the data I receive comes in as strings that need to be typed, I'm using double.TryParse and DateTime.TryParse a lot.
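For illustration, the endpoint follows roughly this shape (a minimal sketch only; the names, the single source URI, the _httpClient field and the line-based parsing are placeholders, not the actual code):
[HttpGet("download")]
public async Task<IActionResult> Download()
{
    byte[] content;

    // Every stream and reader lives in a using block.
    using (var source = await _httpClient.GetStreamAsync(_sourceUri))
    using (var reader = new StreamReader(source))
    using (var buffer = new MemoryStream())
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Incoming values are strings, so they get typed via TryParse.
            if (double.TryParse(line, out var number))
            {
                var bytes = System.Text.Encoding.UTF8.GetBytes(number + "\n");
                await buffer.WriteAsync(bytes, 0, bytes.Length);
            }
        }
        content = buffer.ToArray();
    }

    // The byte[] goes to File(); all source streams above are disposed by now.
    return File(content, "text/csv", "result.csv");
}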
I see some garbage collector runs, but they don't free much (less than 1%).
But I'm observing the heap size grow with every request I send via Swagger.
Some numbers:
Memory usage before 1st request: 90 MB
Memory usage after 1st, 2nd, 3rd request: 180 MB, 277 MB, 480 MB
File size: 200 KB, the same for every request.
This led me to these questions:
Where are the files that a web site hosted in IIS Express serves stored while debugging with Swagger? In memory or on disk?
Could there be some Swagger overhead that is the reason for this memory growth?
What else could be the source of this memory leak?
.NET 5.0
Third-party library in use: CsvHelper (latest)
I'm a newbie to ASP.NET Core.
I'm writing a Web API service which stores the posted data in a database. In theory there will be about 300-400 requests per second to the server in the future, and the response time must be less than 10 seconds.
But first of all I tried to run some load tests with Locust.
I wrote a simple app with one controller and only one POST method, which simply returns Ok() without any processing.
I tried to generate load against this service with 1000 users. The service runs under Ubuntu 16.04 with .NET Core 2.1 (2 Xeon 8175M CPUs with 8 GB of RAM). Locust runs from a dedicated computer.
But I see only ~400 RPS and a response time of about 1400 ms. For an empty action that is a very big value.
I turned off all logging and ran in production mode, but no luck: still ~400 RPS.
In a system monitor (I use nmon) I see that both CPUs load only to 12-15% (24-30% in total). I have about 3 GB of free RAM, almost no network usage (about 200-300 KB/s) and no disk usage, so the system has the hardware resources to handle the requests.
So I think there is a problem with some configuration, or maybe with system resources like sockets, handles, etc.
I also tried to use libuv instead of managed sockets, but the result is the same.
In the Kestrel configuration I explicitly set Limits.MaxConcurrentConnections and Limits.MaxConcurrentUpgradedConnections to null (but that is the default value anyway).
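For reference, here is roughly what that part of the setup looks like (the host bootstrap and Startup class around it are illustrative, not my exact code):
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        CreateWebHostBuilder(args).Build().Run();

    public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .UseKestrel(options =>
            {
                // null means "unlimited"; these are already the default values.
                options.Limits.MaxConcurrentConnections = null;
                options.Limits.MaxConcurrentUpgradedConnections = null;
            })
            .UseStartup<Startup>();
}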
So, I have two questions:
- In theory, can Kestrel provide a high RPS?
- If so, can you give me some advice on where to start (links, articles and so on)?
I'm using Google Cloud Storage to store and retrieve some files, and my problem is that the response times I'm getting are inconsistent, and sometimes very slow.
My application is an ASP.NET Core app running in the Google Container Engine. The Container Engine cluster is in europe-west1-c. The Cloud Storage bucket is Multi-Regional, in the location EU, and it's a secure bucket (not publicly accessible). I'm using the latest version of the official Google.Cloud.Storage.V1 SDK package to access the Cloud Storage. (I tried both 1.0.0 and the new 2.0.0-beta01.) I'm using a singleton instance of the StorageClient object, which should do connection pooling under the hood.
I'm measuring and logging the time it takes to download a file from Cloud Storage; this is the measurement I do:
var sw = Stopwatch.StartNew();
// ms is a MemoryStream that receives the downloaded object
await client.DownloadObjectAsync(googleCloudOptions.StorageBucketName, filepath, ms);
sw.Stop();
So I'm directly measuring the SDK call without any of my own application logic.
The numbers I'm getting for this measurement look like this in an average period.
44ms
56ms
501ms
274ms
90ms
237ms
145ms
979ms
446ms
148ms
You can see that the variance is already pretty large to begin with (and the response time is often really sluggish).
But occasionally I even get response times like this (the slowest I've seen was over 10 seconds).
172ms
4,348ms
72ms
51ms
179ms
2,508ms
2,592ms
100ms
This is really bad considering that the file I'm downloading is ~2 KB in size, my application is doing less than 1 request per second, and I'm running it inside Google Cloud. I don't think the bucket not being warmed up can be the problem, since I'm mainly downloading the same handful of files, and I'm doing at least a couple of requests every minute.
Does anyone know what can be the reason for this slowness, or how I could investigate what's going wrong?
Update: Following @jterrace's suggestion, I've run gsutil perfdiag on the production environment, and uploaded both the terminal output and the generated JSON report here.
I also collected some more measurements, here you can see the statistics for the last 7 days.
You can see that slow requests don't happen very often, but response times over half a second are not rare, and we even have a handful of requests over 5 seconds every day.
What I'd like to figure out is whether we're doing something wrong, or this is expected with Cloud Storage and we have to be prepared to be able to handle these slow responses on our side.
We have the same issue with GCS. The only answer we got (from GCS support) is to use exponential backoff.
The first request should use a 200 ms timeout, the next try 400 ms, and so on.
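A minimal sketch of that retry pattern, reusing the DownloadObjectAsync call from the question; the timeout values and attempt count are just the ones suggested above, not an official recommendation:
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Storage.V1;

static async Task DownloadWithBackoffAsync(
    StorageClient client, string bucket, string objectName, Stream destination)
{
    var timeout = TimeSpan.FromMilliseconds(200);
    const int maxAttempts = 5;

    for (var attempt = 1; ; attempt++)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await client.DownloadObjectAsync(bucket, objectName, destination,
                cancellationToken: cts.Token);
            return;
        }
        catch (OperationCanceledException) when (attempt < maxAttempts)
        {
            // The attempt exceeded the timeout: double it and try again.
            timeout = TimeSpan.FromTicks(timeout.Ticks * 2);
            destination.SetLength(0); // discard partial data (works for a MemoryStream)
        }
    }
}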
A common problem I've seen in GCE is that, because the gcloud clients have a heavy DNS dependency, bursts of traffic end up being throttled by DNS queries rather than by the actual clients (storage or otherwise). I highly recommend adding etcd or some other DNS cache to your container. Any real amount of traffic in GCE will choke otherwise.
I need to download with maximum available download speed in C#.
FlashGet, IDM and other download managers seem to be able to do this.
It's nothing special: they simply open multiple download connections to the same file and use segmented downloading, so each connection pulls down a different range of bytes from the file.
For more information see for example - http://www.ehow.com/how-does_4615524_download-accelerator-work.html
For the C# side you might want to look at existing .NET projects such as this - http://www.codeproject.com/Articles/21053/MyDownloader-A-Multi-thread-C-Segmented-Download-M
The magic is in multiple connections and the HTTP Range header.
Say a file is 100 MB in size and you plan to open 10 connections, so each connection downloads 10 MB. Open 10 HTTP connections to the same file, with each connection assigned to a different segment:
Connection 1 sends Range: bytes=0-10485759
Connection 2 sends Range: bytes=10485760-20971519
and so on
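Here is a rough sketch of that idea with HttpClient; the HEAD request for the length, the fixed segment count and the lack of retries are simplifications, not a production-ready downloader:
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static async Task DownloadSegmentedAsync(string url, string outputPath, int connections = 10)
{
    using var http = new HttpClient();

    // Ask for the total size first.
    using var head = new HttpRequestMessage(HttpMethod.Head, url);
    using var headResponse = await http.SendAsync(head);
    var total = headResponse.Content.Headers.ContentLength
                ?? throw new InvalidOperationException("Server did not report a length.");

    var segmentSize = (total + connections - 1) / connections;

    // Request each byte range in parallel.
    var tasks = Enumerable.Range(0, connections).Select(async i =>
    {
        var from = i * segmentSize;
        var to = Math.Min(from + segmentSize - 1, total - 1);
        if (from > to) return (i, Array.Empty<byte>());

        using var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(from, to);

        using var response = await http.SendAsync(request);
        return (i, await response.Content.ReadAsByteArrayAsync());
    });

    var segments = await Task.WhenAll(tasks);

    // Stitch the segments back together in order.
    using var output = File.Create(outputPath);
    foreach (var (_, bytes) in segments.OrderBy(s => s.Item1))
        await output.WriteAsync(bytes, 0, bytes.Length);
}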
You would also have to tune the TCP window size, but this feature is not supported in .NET.