I'm attempting to stream a large JSON file built on the fly to a client (could be 500 MB+). I'm trying to disable response buffering for a variety of reasons, though mostly for memory efficiency.
I've tried writing directly to the HttpContext.Response.BodyWriter but the response seems to be buffered in memory before writing to the output. The return type of this method is Task.
HttpContext.Response.ContentType = "application/json";
HttpContext.Response.ContentLength = null;
await HttpContext.Response.StartAsync(cancellationToken);
var bodyStream = HttpContext.Response.BodyWriter.AsStream(true);
await bodyStream.WriteAsync(Encoding.UTF8.GetBytes("["), cancellationToken);
await foreach (var item in cursor.WithCancellation(cancellationToken)
.ConfigureAwait(false))
{
await bodyStream.WriteAsync(JsonSerializer.SerializeToUtf8Bytes(item, DefaultSettings.JsonSerializerOptions), cancellationToken);
await bodyStream.WriteAsync(Encoding.UTF8.GetBytes(","), cancellationToken);
await bodyStream.FlushAsync(cancellationToken);
await Task.Delay(100,cancellationToken);
}
await bodyStream.WriteAsync(Encoding.UTF8.GetBytes("]"), cancellationToken);
bodyStream.Close();
await HttpContext.Response.CompleteAsync().ConfigureAwait(false);
Note: I realize this code is very hacky; I'm trying to make it work first and will clean it up afterwards.
I'm using the Task.Delay to verify the response is not being buffered when testing locally as I do not have full production data. I have also tried IAsyncEnumerable and yield return, but that fails because the response is so large that Kestrel thinks the enumerable is infinite.
I've tried
Setting KestrelServerLimits.MaxResponseBufferSize to a small number, even 0;
Writing with HttpContext.Response.WriteAsync
Writing with HttpContext.Response.BodyWriter.AsStream()
Writing with a pipe writer pattern and HttpContext.Response.BodyWriter
Removing all middleware
Removing calls to IApplicationBuilder.UseResponseCompression
Update
Tried disabling response buffering before setting the ContentType (so before any writes to the response) with no effect
var responseBufferingFeature = context.Features.Get<IHttpResponseBodyFeature>();
responseBufferingFeature?.DisableBuffering();
Updated Sample Code
This reproduces the issue quite simply. The client doesn't receive any data until response.CompleteAsync() is called.
[HttpGet]
[Route("stream")]
public async Task<EmptyResult> FileStream(CancellationToken cancellationToken)
{
var response = DisableResponseBuffering(HttpContext);
HttpContext.Response.Headers.Add("Content-Type", "application/gzip");
HttpContext.Response.Headers.Add("Content-Disposition", $"attachment; filename=\"player-data.csv.gz\"");
await response.StartAsync().ConfigureAwait(false);
var memory = response.Writer.GetMemory(1024*1024*10);
response.Writer.Advance(1024*1024*10);
await response.Writer.FlushAsync(cancellationToken).ConfigureAwait(false);
await Task.Delay(5000).ConfigureAwait(false);
var str2 = Encoding.UTF8.GetBytes("Bar!\r\n");
memory = response.Writer.GetMemory(str2.Length);
str2.CopyTo(memory);
response.Writer.Advance(str2.Length);
await response.CompleteAsync().ConfigureAwait(false);
return new EmptyResult();
}
private IHttpResponseBodyFeature DisableResponseBuffering(HttpContext context)
{
var responseBufferingFeature = context.Features.Get<IHttpResponseBodyFeature>();
responseBufferingFeature?.DisableBuffering();
return responseBufferingFeature;
}
I was able to get this working when using http.sys (with ASP.NET Core 6):
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Http.Features;
public class Program
{
public static void Main(string[] args)
{
var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseHttpSys();
var app = builder.Build();
app.MapGet("/", async (context) =>
{
context.Response.StatusCode = 201;
await context.Response.StartAsync();
await context.Response.WriteAsync("x"); // client gets status code after this line
await context.Response.WriteAsync("Hello World!");
});
app.Run();
}
}
Try disabling buffering through the response features:
// As mentioned in the documentation, call this before any writes for it to take effect.
HttpContext.Features.Get<IHttpResponseBodyFeature>().DisableBuffering();
And use BodyWriter in Utf8JsonWriter for more efficiency:
var pipe = context.HttpContext.Response.BodyWriter;
await pipe.WriteAsync(startArray);
using (var writer = new Utf8JsonWriter(pipe,
new JsonWriterOptions
{
Indented = option.WriteIndented,
Encoder = option.Encoder,
SkipValidation = true
}))
{
var dotSet = false;
foreach (var item in enumerable)
{
if (dotSet)
await pipe.WriteAsync(dot);
JsonSerializer.Serialize(writer, item, itemType, option);
await pipe.FlushAsync();
writer.Reset();
dotSet = true;
}
}
await pipe.WriteAsync(endArray);
In my case this gave results: after the first requests, total memory allocation was more than 80% higher than on netcoreapp2.2, but there were no more memory leaks.
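A variation on the idea above, as a rough sketch rather than a confirmed fix: let Utf8JsonWriter emit the array brackets and commas itself, and hand the transport flush in as a delegate. JsonArrayStreamer, WriteArrayAsync and the flushAsync parameter are names made up for this example. The writer targets an IBufferWriter<byte> (a PipeWriter such as Response.BodyWriter is one), so the writer's own Flush() only commits bytes to the buffer and performs no synchronous I/O.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class JsonArrayStreamer
{
    // Writes each item as one element of a JSON array and flushes after every element.
    // flushAsync is a hypothetical hook: the caller decides how buffered bytes get sent.
    public static async Task WriteArrayAsync<T>(
        IBufferWriter<byte> output,
        IAsyncEnumerable<T> items,
        Func<CancellationToken, ValueTask> flushAsync,
        JsonSerializerOptions options = null,
        CancellationToken cancellationToken = default)
    {
        using var writer = new Utf8JsonWriter(output);
        writer.WriteStartArray();

        await foreach (var item in items.WithCancellation(cancellationToken))
        {
            // The writer tracks array state, so it inserts the separating commas itself.
            JsonSerializer.Serialize(writer, item, options);
            writer.Flush();                       // commit the writer's pending bytes to output
            await flushAsync(cancellationToken);  // ask the transport to send what it has
        }

        writer.WriteEndArray();
        writer.Flush();
        await flushAsync(cancellationToken);
    }
}
```

In a controller this might be called as WriteArrayAsync(Response.BodyWriter, cursor, async ct => { await Response.BodyWriter.FlushAsync(ct); }, DefaultSettings.JsonSerializerOptions, cancellationToken). Whether the client sees each element immediately still depends on the server and on the client, as discussed in this thread.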
For those who are still interested, this code sends data right away when using curl:
public async Task Invoke(HttpContext context)
{
var g = context.Features.Get<IHttpResponseBodyFeature>();
g.DisableBuffering(); // doesn't seem to make a difference
context.Response.StatusCode = 200;
context.Response.ContentType = "text/plain; charset=utf-8";
//context.Response.ContentLength = null;
await g.StartAsync();
for (int i = 0; i < 10; ++i)
{
var line = $"this is line {i}\r\n";
var bytes = utf8.GetBytes(line);
// it seems context.Response.Body.WriteAsync() and
// context.Response.BodyWriter.WriteAsync() work exactly the same
await g.Writer.WriteAsync(new ReadOnlyMemory<byte>(bytes));
await g.Writer.FlushAsync();
await Task.Delay(1000);
}
await g.CompleteAsync();
}
The variations I tried, with and without DisableBuffering() and writing to a pipe versus a stream (IHttpResponseBodyFeature.Writer vs HttpContext.Response.Body), didn't seem to make a difference.
In curl the messages show up right away; however, Chrome and some REST clients wait for the whole stream before showing anything.
So I would recommend testing your code's behavior with a client that doesn't wait for the whole stream before presenting it. Another thing I am still checking is whether ASP.NET Core automatically applies compression when the client asks for it, even though compression is not configured in the pipeline.
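If curl is not at hand, a small console probe can serve as the non-buffering client. This is only a sketch: StreamingProbe, PrintLinesAsync and ProbeAsync are names invented for the example, and the URL you pass in is whatever your own endpoint is. It asks HttpClient to return as soon as the headers arrive and then prints each line with the elapsed time, so buffered responses show up as all lines arriving at once.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingProbe
{
    // Reads the body line by line and prints how long after the start each line arrived.
    // Returns the number of lines read.
    public static async Task<int> PrintLinesAsync(Stream body, TextWriter output)
    {
        var clock = Stopwatch.StartNew();
        using var reader = new StreamReader(body);
        var count = 0;
        while (await reader.ReadLineAsync() is { } line)
        {
            output.WriteLine($"{clock.ElapsedMilliseconds,6} ms: {line}");
            count++;
        }
        return count;
    }

    // Requests the URL without waiting for the whole body to download first.
    public static async Task ProbeAsync(string url)
    {
        using var client = new HttpClient();
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        await using var body = await response.Content.ReadAsStreamAsync();
        await PrintLinesAsync(body, Console.Out);
    }
}
```

Against the middleware above, unbuffered output should appear roughly one line per second; if every line carries nearly the same timestamp, something between the server and this client is buffering.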
I have a cron job which queries a database table and gets about half a million records back. I need to loop through all of that data and send API POSTs to a third-party API. In general this works fine, but the processing takes forever (10 hours). I need a way to speed it up. I've been trying to use a list of Task with SemaphoreSlim, but I'm running into issues (it doesn't like that my API call returns a Task). I'm wondering if anyone has a solution to this that won't destroy the VM's memory?
Current code looks something like:
foreach(var data in dataList)
{
try
{
var response = await _apiService.PostData(data);
_logger.Trace(response.Message);
}
catch
{
// error handling omitted
}
}
But I'm trying to do this and getting the syntax wrong:
var tasks = new List<Task<DataObj>>();
var throttler = new SemaphoreSlim(10);
foreach(var data in dataList)
{
await throttler.WaitAsync();
tasks.Add(Task.Run(async () => {
try
{
var response = await _apiService.PostData(data);
_logger.Trace(response.Message);
}
finally
{
throttler.Release();
}
}));
}
Your list is of type Task<DataObj>, but your async lambda doesn't return anything, so its return type is Task. To fix the syntax, just return the value:
var response = await _apiService.PostData(data);
_logger.Trace(response.Message);
return response;
As others have noted in the comments, I also recommend not using Task.Run here. A local async method would work fine:
var tasks = new List<Task<DataObj>>();
var throttler = new SemaphoreSlim(10);
foreach(var data in dataList)
{
tasks.Add(ThrottledPostData(data));
}
var results = await Task.WhenAll(tasks);
async Task<DataObj> ThrottledPostData(Data data)
{
await throttler.WaitAsync();
try
{
var response = await _apiService.PostData(data);
_logger.Trace(response.Message);
return response;
}
finally
{
throttler.Release();
}
}
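The same throttling pattern can be pulled into a reusable helper. The sketch below is self-contained so it can be tried outside the original project; Throttled and SelectAsync are hypothetical names, and the work delegate stands in for _apiService.PostData.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work for every item with at most maxConcurrency calls in flight at once.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> SelectAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrency,
        Func<TItem, Task<TResult>> work)
    {
        using var throttler = new SemaphoreSlim(maxConcurrency);

        async Task<TResult> RunOne(TItem item)
        {
            await throttler.WaitAsync();
            try
            {
                return await work(item);
            }
            finally
            {
                throttler.Release();
            }
        }

        // ToList starts every task right away; most of them simply wait on the semaphore.
        var tasks = items.Select(RunOne).ToList();
        return await Task.WhenAll(tasks);
    }
}
```

Usage would look like var results = await Throttled.SelectAsync(dataList, 10, data => _apiService.PostData(data));. Note that this still creates one Task per record up front, so with half a million records it may be worth feeding the helper in batches if memory is tight.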
I have an endpoint which uses the handlers of two other endpoints. It's probably not best practice, but that's not the point. In these methods I use a lot of MemoryStreams, ZipStreams and the like, and of course I dispose all of them. Everything works fine until I run all the tests together; then tests throw errors like "Input string was not in a correct format.", "Cannot read Zip file" or other weird messages. These are also the tests of the two handlers that I use in the previous test.
The solution I found is to add "Thread.Sleep(1);" at the end of the "Handle" method, just before the return. It looks like something needs more time to dispose, but why? Do you have any idea why this 1 ms sleep helps?
ExtractFilesFromZipAndWriteToGivenZipArchive is an async method.
public async Task<MemoryStream> Handle(MultipleTypesExportQuery request, CancellationToken cancellationToken)
{
var stepwiseData = await HandleStepwise(request.RainmeterId, request.StepwiseQueries, cancellationToken);
var periodicData = await HandlePeriodic(request.RainmeterId, request.PeriodicQueries, cancellationToken);
var data = new List<MemoryStream>();
data.AddRange(stepwiseData);
data.AddRange(periodicData);
await using (var ms = new MemoryStream())
using (var archive = new ZipArchive(ms, ZipArchiveMode.Create,false))
{
int i = 0;
foreach (var d in data)
{
d.Open();
d.Position = 0;
var file = ZipFile.Read(d);
ExtractFilesFromZipAndWriteToGivenZipArchive(file, archive, i, cancellationToken);
i++;
file.Dispose();
d.Dispose();
}
//Thread.Sleep(100);
return ms;
}
}
ExtractFilesFromZipAndWriteToGivenZipArchive() is an asynchronous function which means, in this case, that you need to await it:
await ExtractFilesFromZipAndWriteToGivenZipArchive(file, archive, i, cancellationToken);
Otherwise, execution will keep going without waiting for the function to finish.
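A tiny illustration of the difference, using stand-in names rather than the code from the question: AddEntryAsync plays the role of ExtractFilesFromZipAndWriteToGivenZipArchive and a List<string> plays the role of the archive. Without await, the caller carries on (and in the question, disposes the streams) before the work has finished, which would also explain why a short sleep appears to help.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Stand-in for the async extraction method: it completes its work a little later.
    private static async Task AddEntryAsync(List<string> archive, string name)
    {
        await Task.Delay(50);
        archive.Add(name);
    }

    public static int WithoutAwait()
    {
        var archive = new List<string>();
        _ = AddEntryAsync(archive, "a.csv"); // started, but nobody waits for it
        return archive.Count;                // the entry has not been added yet
    }

    public static async Task<int> WithAwait()
    {
        var archive = new List<string>();
        await AddEntryAsync(archive, "a.csv"); // resumes only after the entry was added
        return archive.Count;
    }
}
```

WithoutAwait returns 0 and WithAwait returns 1.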
I'm using TLSharp. My goal is to send files to the user. I created an ASP.NET Core Web API service and make an HTTP request to it whenever I need to send a file.
It works well with one file, but every time I get two or more requests in a short period of time I get an error:
System.InvalidOperationException: invalid checksum! skip.
Controller:
[Route("upload/{driveId}")]
public async Task<ActionResult> Upload(string driveId)
{
var ms = new MemoryStream();
var file = service.Files.Get(driveId);
string filename = file.Execute().Name;
await file.DownloadAsync(ms);
ms.Position = 0;
new FileExtensionContentTypeProvider().TryGetContentType(filename, out var mime);
var stream = new StreamReader(ms, true);
await _client.SendFileToBot(filename, mime, stream, driveId);
return Ok();
}
SendFileToBot method:
public async Task SendFileToBot(string filename, string mime, StreamReader stream)
{
var found = await client.SearchUserAsync("username", 1);
//find user
var userToSend = found.Users
.Where(x => x.GetType() == typeof(TLUser))
.Cast<TLUser>()
.FirstOrDefault(x => x.Id == 1234567);
var fileResult = await client.UploadFile(filename, stream);
var attr = new TLVector<TLAbsDocumentAttribute>()
{
new TLDocumentAttributeFilename { FileName = filename }
};
var bot = new TLInputPeerUser() { UserId = userToSend.Id, AccessHash = userToSend.AccessHash.Value };
await client.SendUploadedDocument(bot, fileResult, "caption", mime, attr);
}
When the requests are sent together (or in short period of time), they're sent in a single packet to Telegram server and this error occurs. I need help with this error. I've tried to use Task.Delay but it doesn't help.
How can I handle requests to avoid this error?
According to this issue, you are not the first person to receive this error.
It seems there are request/response validation issues when the TLSharp library is used from multiple threads.
There is one stable workaround for this type of problem.
Make all upload requests synchronous
Actually, they will still be asynchronous, but with only one task running at a time.
This dirty but workable solution can be achieved by creating a task queue:
public class TaskQueue
{
private readonly SemaphoreSlim _semaphoreSlim;
public TaskQueue()
{
_semaphoreSlim = new SemaphoreSlim(1, 1); // Max threads limited to 1.
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await _semaphoreSlim.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
_semaphoreSlim.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await _semaphoreSlim.WaitAsync();
try
{
await taskGenerator();
}
finally
{
_semaphoreSlim.Release();
}
}
}
Now register the queue as a singleton in Startup.cs, so that your ASP.NET Core application uses a single task queue instance for all uploads to the Telegram servers:
public void ConfigureServices(IServiceCollection services)
{
//...
services.AddSingleton<TaskQueue>();
//...
}
Next, get your task queue instance in your api controller constructor like:
private readonly TaskQueue taskQueue;
public MyController(TaskQueue taskQueue)
{
this.taskQueue = taskQueue;
}
Then, just use it in all of your api methods:
// Code from your API method...
await taskQueue.Enqueue(() => client.SendUploadedDocument(bot, fileResult, "caption", mime, attr));
This makes all requests to the Telegram servers through the TLSharp library run one at a time and prevents multithreading issues like the one in the question.
To be honest, it's just a workaround, not a solution to the problem. I'm sure the checksum error issue on GitHub should be investigated in more detail and fixed if possible.
I am sending five HttpClient requests to the same URL, but with a varying page number parameter. They all fire async, and then I wait for them all to finish using Task.WaitAll(). My requests are using System.Net.Http.HttpClient.
This mostly works fine, and I get five distinct results representing each page of the data about 99% of the time.
But every so often, and I have not dug into deep analysis yet, I get the exact same response for each task. Each task does indeed instantiate its own HttpClient. When I was reusing one client instance, I got this problem. But since I started instantiating new clients for every call, the problem went away.
I am calling a 3rd party web service over which I have no control. So before nagging their team too much about this, I do want to know if I may be doing something wrong here, or if there is some aspect of HttpClient or Task that I'm missing.
Here is the calling code:
for (int i = 1; i <= 5; i++)
{
page = load_made + i;
var t_page = page;
var t_url = url;
var task = new Task<List<T>>(() => DoPagedLoad<T>(t_page, per_page, t_url));
task.Run();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
Here is the code in the DoPagedLoad, which returns a Task:
var client = new HttpClient();
var response = client.GetAsync(url).Result;
var results = response.Content.ReadAsStringAsync().Result;
I would appreciate any help from folks familiar with the possible quirks of Task and HttpClient
NOTE: Run is an extension method to help with async exceptions.
public static Task Run(this Task task)
{
task.Start();
task.ContinueWith(t =>
{
if(t.Exception != null)
Log.Error(t.Exception.Flatten().ToString());
});
return task;
}
It's hard to give a definitive answer because we don't have all the detail but here's a sample implementation of how you should fire off HTTP requests. Notice that all async operations are awaited - Result and Wait / WaitAll are not used. You should almost never need / use any of those - they block synchronously and can create problems.
Also notice that there are no global cookie containers, default headers, etc. defined for the HTTP client. If you need any of that stuff, just create individual HttpRequestMessage objects and add whatever headers you need to add. Don't use the global properties - it's a lot cleaner to just set per-request properties.
// Globally defined HTTP client.
private static readonly HttpClient _httpClient = new HttpClient();
// Other stuff here...
private async Task SomeFunctionToGetContent()
{
var requestTasks = new List<Task<HttpResponseMessage>>();
var responseTasks = new List<Task>();
for (var i = 0; i < 5; i++)
{
// Fake URI but still based on the counter (or other
// variable, similar to page in the question)
var uri = new Uri($"https://.../{i}.html");
requestTasks.Add(_httpClient.GetAsync(uri));
}
await (Task.WhenAll(requestTasks));
for (var i = 0; i < 5; i++)
{
var response = await (requestTasks[i]);
responseTasks.Add(HandleResponse(response));
}
await (Task.WhenAll(responseTasks));
}
private async Task HandleResponse(HttpResponseMessage response)
{
try
{
if (response.Content != null)
{
var content = await (response.Content.ReadAsStringAsync());
// do something with content here; check IsSuccessStatusCode to
// see if the request failed or succeeded
}
else
{
// Do something when no content
}
}
finally
{
response.Dispose();
}
}
I'm working on a console app that takes a list of endpoints to video data, makes an HTTP request, and saves the result to a file. These are relatively small videos. Because of an issue outside of my control, one of the videos is very large (145 minutes instead of a few seconds).
The problem I'm seeing is that my memory usage spikes to ~1 GB after that request is called, and I eventually get a "Task was cancelled" error (presumably because the client timed out). This is fine, I don't want this video, but what is concerning is that my allocated memory stays high no matter what I do. I want to be able to release the memory. It seems concerning that Task Manager shows ~14 MB memory usage until this call, then trickles up continuously afterwards. In the VS debugger I just see a spike.
I tried throwing everything in a using statement, re-initializing the HttpClient on exception, manually invoking GC.Collect() with no luck. The code I'm working with looks something like this:
consumer.Received += async (model, ea) =>
{
InitializeHttpClient(source);
...
foreach(var item in queue)
{
await SaveFileFromEndpoint(url, fileName);
...
}
}
and the methods:
public void InitializeHttpClient(string source)
{
...
_client = new HttpClient();
...
}
public async Task SaveFileFromEndpoint(string endpoint, string fileName)
{
try
{
using (HttpResponseMessage response = await _client.GetAsync(endpoint))
{
if (response.IsSuccessStatusCode)
{
using(var content = await response.Content.ReadAsStreamAsync())
using (var fileStream = File.Create($"{fileName}"))
{
await response.Content.CopyToAsync(fileStream);
}
}
}
}
catch (Exception ex)
{
}
}
Here is a look at my debugger output:
I guess I have a few questions about what I'm seeing:
Is the memory usage I'm seeing actually an issue?
Is there any way I can release the memory being allocated by a large HTTP request?
Is there any way I can see the content length of the request before the call is made and memory is allocated? So far I haven't been able to find a way to find out before the actual memory is allocated.
Thanks in advance for your help!
If you use HttpClient.SendAsync(HttpRequestMessage, HttpCompletionOption) instead of GetAsync, you can supply HttpCompletionOption.ResponseHeadersRead, (as opposed to the default ResponseContentRead). This means that the response stream will be handed back to you before the response body has downloaded (rather than after it), and will require significantly less buffer to operate.
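As a rough sketch of that suggestion, which also touches on the question about seeing the content length up front: once only the headers have been read, response.Content.Headers.ContentLength can be inspected before any of the body is pulled. GuardedDownload, IsWithinLimit, SaveIfSmallAsync and the maxBytes limit are hypothetical names for this example, and the server may not send a Content-Length at all (for example with chunked transfer), in which case this sketch skips the download.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class GuardedDownload
{
    // Hypothetical policy: accept only responses that declare a length within the limit.
    public static bool IsWithinLimit(long? contentLength, long maxBytes) =>
        contentLength.HasValue && contentLength.Value <= maxBytes;

    // Returns true if the file was saved, false if it was skipped as too large or unknown.
    public static async Task<bool> SaveIfSmallAsync(
        HttpClient client, string endpoint, string fileName, long maxBytes)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, endpoint);
        using var response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        // Only the headers have been read here; the body has not been buffered yet.
        if (!IsWithinLimit(response.Content.Headers.ContentLength, maxBytes))
        {
            return false; // disposing the response abandons the rest of the body
        }

        await using var body = await response.Content.ReadAsStreamAsync();
        await using var file = File.Create(fileName);
        await body.CopyToAsync(file); // streams to disk in chunks
        return true;
    }
}
```

In SaveFileFromEndpoint this could replace the GetAsync call, for example await GuardedDownload.SaveIfSmallAsync(_client, endpoint, fileName, 50_000_000), where the 50 MB limit is an arbitrary value chosen for illustration.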
In addition to @spender's answer (which is on point), you also need to make sure that you dispose the response when you are done with it. You can find more information about this in the article "Efficiently Streaming Large HTTP Responses With HttpClient".
Here is a code sample:
using (HttpClient client = new HttpClient())
{
const string url = "https://github.com/tugberkugurlu/ASPNETWebAPISamples/archive/master.zip";
using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
using (Stream streamToReadFrom = await response.Content.ReadAsStreamAsync())
{
string fileToWriteTo = Path.GetTempFileName();
using (Stream streamToWriteTo = File.Open(fileToWriteTo, FileMode.Create))
{
await streamToReadFrom.CopyToAsync(streamToWriteTo);
}
}
}
You should also take into account that you should not be creating an HttpClient instance per operation. HttpClientFactory is a well-organised way to make sure HttpClient instances flow through your app safely and with good performance.