Run HttpClient POST in parallel

I am using the .NET HttpClient to save some data through a POST REST API. The payload size for this API is around 10 MB. I am splitting my data into chunks and calling the POST API once per chunk. My questions are mostly about the approach:
I am planning to create a single static instance of HttpClient and use that same instance across the application. Is that the right approach, or should I create a new client for each chunk's POST call?
I would like to make all of these chunk POST calls in parallel (using Tasks in .NET). Is there any way to stop the remaining tasks if any one task fails? I am looking for some sample code.
_factory = new TaskFactory();
_factory.StartNew(() =>
{
    // Call to async POST API using HttpClient
}).ContinueWith((response) =>
{
    if (!response.IsFaulted)
    {
        // Do something
    }
    else
    {
        this._logger.Error("log the error");
    }
});

If your calls are all to the same host, use a shared HttpClient instance.
There is no need to explicitly create tasks using TaskFactory.StartNew in order to do I/O-bound asynchronous work in parallel. I would suggest using Task.WhenAll. Something like this:
try
{
    await Task.WhenAll(chunks.Select(MakeCall));
}
catch (Exception)
{
    // Cancel the requests that are still in flight once any call fails.
    _client.CancelPendingRequests();
}

private async Task MakeCall(string chunk)
{
    // PostAsync needs a request URI and content; "url" is a placeholder for your endpoint.
    var response = await _client.PostAsync(url, new StringContent(chunk));
    if (response.IsSuccessStatusCode)
    {
        // Do something
    }
    else
    {
        this._logger.Error("log the error");
        throw new Exception("call failed!");
    }
}
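As an alternative to CancelPendingRequests, which cancels everything pending on the shared client, the "stop the rest when one fails" behaviour can be sketched with a CancellationTokenSource that is passed to each call. This is a minimal sketch, not a definitive implementation: the FailFast class and the RunAllAsync helper are hypothetical names, and the work delegate stands in for the real POST call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FailFast
{
    // Runs work(item, token) for every item concurrently. The first failure
    // cancels the shared token so the remaining calls can stop early.
    public static async Task RunAllAsync<T>(
        IEnumerable<T> items, Func<T, CancellationToken, Task> work)
    {
        using (var cts = new CancellationTokenSource())
        {
            var tasks = items.Select(async item =>
            {
                try
                {
                    await work(item, cts.Token);
                }
                catch
                {
                    cts.Cancel();   // signal the other calls to stop
                    throw;
                }
            }).ToList();            // ToList starts all the calls

            // Completes only when every call has finished, failed, or been cancelled.
            await Task.WhenAll(tasks);
        }
    }
}
```

With HttpClient, the delegate could be something like async (chunk, token) => { var r = await _client.PostAsync(url, new StringContent(chunk), token); r.EnsureSuccessStatusCode(); }, where url is again a placeholder. Passing the token to PostAsync is what lets in-flight requests be abandoned, and it affects only the calls made through this helper.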

Related

Parallel HttpClient requests timing out due to async problem?

I'm running a method synchronously in parallel using System.Threading.Tasks.Parallel.ForEach. At the end of the method, it needs to make a few dozen HTTP POST requests, which do not depend on each other. Since I'm on .NET Framework 4.6.2, System.Net.Http.HttpClient is exclusively asynchronous, so I'm using Nito.AsyncEx.AsyncContext to avoid deadlocks, in the form:
public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
{
    AsyncContext.Run(async () => await Task.WhenAll(enumerable.Select(async c =>
        await getResultsFor(c).ConfigureAwait(false))));
}
The getResultsFor(MyClass c) method then creates an HttpRequestMessage and sends it using:
await httpClient.SendAsync(request);
The response is then parsed and the relevant fields are set on the instance of MyClass.
My understanding is that the synchronous thread will block at AsyncContext.Run(...), while a number of tasks are performed asynchronously by the single AsyncContextThread owned by AsyncContext. When they are all complete, the synchronous thread will unblock.
This works fine for a few hundred requests, but when it scales up to a few thousand over five minutes, some of the requests start returning HTTP 408 Request Timeout errors from the server. My logs indicate that these timeouts are happening at the peak load, when there are the most requests being sent, and the timeouts happen long after many of the other requests have been received back.
I think the problem is that the tasks are awaiting the server handshake inside HttpClient, but they are not continued in FIFO order, so by the time they are continued the handshake has expired. However, I can't think of any way to deal with this, short of using a System.Threading.SemaphoreSlim to enforce that only one task can await httpClient.SendAsync(...) at a time.
My application is very large, and converting it entirely to async is not viable.

This isn't something that can be done with wrapping the tasks before blocking. For starters, if the requests go through, you may end up nuking the server. Right now you're nuking the client. There's a 2 concurrent-request per domain limit in .NET Framework that can be relaxed, but if you set it too high you may end up nuking the server.
You can solve this by using DataFlow blocks in a pipeline to execute requests with a fixed degree of parallelism and then parse them. Let's say you have a class called MyPayload with lots of Items in a property:
ServicePointManager.DefaultConnectionLimit = 1000;
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};
var downloader = new TransformBlock<string, MyPayload>(async url =>
{
    var json = await _client.GetStringAsync(url);
    var data = JsonConvert.DeserializeObject<MyPayload>(json);
    return data;
}, options);
var importer = new ActionBlock<MyPayload>(async data =>
{
    var items = data.Items;
    using (var connection = new SqlConnection(connectionString))
    using (var bcp = new SqlBulkCopy(connection))
    using (var reader = ObjectReader.Create(items))
    {
        bcp.DestinationTableName = destination;
        connection.Open();
        await bcp.WriteToServerAsync(reader);
    }
});
downloader.LinkTo(importer, new DataflowLinkOptions
{
    PropagateCompletion = true
});
I'm using FastMember's ObjectReader to wrap the items in a DbDataReader that can be used to bulk insert the records to a database.
Once you have this pipeline, you can start posting URLs to the head block, downloader:
foreach (var url in hugeList)
{
    downloader.Post(url);
}
downloader.Complete();
Once all URLs are posted, you tell downloader to complete and then await the last block in the pipeline:
await importer.Completion;

Firstly, Nito.AsyncEx.AsyncContext will execute on a threadpool thread; to avoid deadlocks in the way described requires an instance of Nito.AsyncEx.AsyncContextThread, as outlined in the documentation.
There are two possible causes:
a bug in System.Net.Http.HttpClient in .NET Framework 4.6.2
the continuation priority issue outlined in the question, in which individual requests are not continued promptly enough and so time out.
As described in this answer and its comments, from a similar question, it may be possible to deal with the priority problem using a custom TaskScheduler, but throttling the number of concurrent requests using a semaphore is probably the best answer:
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Nito.AsyncEx;

public class MyClass
{
    private static readonly AsyncContextThread asyncContextThread
        = new AsyncContextThread();
    private static readonly HttpClient httpClient = new HttpClient();
    private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(10);

    public HttpRequestMessage Request { get; set; }
    public HttpResponseMessage Response { get; private set; }

    private async Task GetResponseAsync()
    {
        await semaphore.WaitAsync();
        try
        {
            Response = await httpClient.SendAsync(Request);
        }
        finally
        {
            semaphore.Release();
        }
    }

    public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
    {
        Task.WaitAll(enumerable.Select(c =>
            asyncContextThread.Factory.Run(() =>
                c.GetResponseAsync())).ToArray());
    }
}
Edited to use AsyncContextThread for executing async code on non-threadpool thread, as intended. AsyncContext does not do this on its own.

Multiple HttpClients with proxies, trying to achieve maximum download speed

I need to use proxies to download a forum. The problem with my code is that it takes only 10% of my internet bandwidth. Also I have read that I need to use a single HttpClient instance, but with multiple proxies I don't know how to do it. Changing MaxDegreeOfParallelism doesn't change anything.
public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
    this IEnumerable<Url> urls, FetchContext context)
{
    var fetchBlcock = new TransformBlock<Url, IFetchResult>(
        transform: url => url.FetchAsync(context),
        dataflowBlockOptions: new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 128
        }
    );
    foreach (var url in urls)
        fetchBlcock.Post(url);
    fetchBlcock.Complete();
    var result = fetchBlcock.ToAsyncEnumerable();
    return result;
}
Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.
public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
    var httpClient = context.ProxyPool.Rent();
    var result = await url.FetchAsync(httpClient, context.Observer, context.Delay,
        context.isReloadWithCookie);
    context.ProxyPool.Return(httpClient);
    return result;
}

public HttpClient Rent()
{
    lock (_lockObject)
    {
        if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
        {
            var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
            return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
        }
        return _proxiesQueue.Dequeue();
    }
}
I am a novice at software developing, but the task of downloading using hundreds or thousands of proxies asynchronously looks like a trivial task that many should have been faced with and found a correct way to do it. So far I was unable to find any solutions to my problem on the internet. Any thoughts of how to achieve maximum download speed?

Let's take a look at what happens here:
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);
You are actually awaiting before you continue with the next item. That's why it is asynchronous and not parallel programming. From the Microsoft docs on async:
"The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic."
In essence, it frees the calling thread to do other work, but the original calling code is suspended until the I/O operation is done.
Now to your problem:
You can either use this excellent solution here: foreach async
You can use the Parallel library to execute your code in different threads.
Something like the following, using Parallel.For, for example:
var urlList = urls.ToList(); // Parallel.For needs a count and an indexer
Parallel.For(0, urlList.Count, index =>
{
    fetchBlcock.Post(urlList[index]);
});
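The "foreach async" idea mentioned above can be sketched as follows. For I/O-bound work, an await inside one item's task suspends only that task, so starting every task first and awaiting them together lets the requests overlap without extra threads. This is a minimal sketch under assumptions: ThrottledRunner and RunAsync are hypothetical names, and the fetch delegate stands in for something like url.FetchAsync(context).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Starts fetch(item) for every item, keeps at most maxConcurrency calls
    // in flight at once, and returns the results in input order.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency,
        Func<TItem, Task<TResult>> fetch)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();       // wait for a free slot
                try
                {
                    return await fetch(item);
                }
                finally
                {
                    gate.Release();           // free the slot for the next item
                }
            }).ToList();                      // ToList starts every task up front

            return await Task.WhenAll(tasks);
        }
    }
}
```

The semaphore plays the same role as MaxDegreeOfParallelism in the Dataflow version: it caps the number of simultaneous requests without dedicating a thread to each one.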

Reading from multiple WebSockets with async/await

I'm writing a .NET Core Console App that needs to continuously read data from multiple WebSockets. My current approach is to create a new Task (via Task.Run) per WebSocket that runs an infinite while loop and blocks until it reads the data from the socket. However, since the data is pushed at a rather low frequency, the threads just block most of the time which seems quite inefficient.
From my understanding, the async/await pattern should be ideal for blocking I/O operations. However, I'm not sure how to apply it for my situation or even if async/await can improve this in any way - especially since it's a Console app.
I've put together a proof of concept (doing a HTTP GET instead of reading from WebSocket for simplicity). The only way I was able to achieve this was without actually awaiting. Code:
static void Main(string[] args)
{
    Console.WriteLine($"ThreadId={ThreadId}: Main");
    Task task = Task.Run(() => Process("https://duckduckgo.com", "https://stackoverflow.com/"));
    // Do main work.
    task.Wait();
}

private static void Process(params string[] urls)
{
    Dictionary<string, Task<string>> tasks = urls.ToDictionary(x => x, x => (Task<string>)null);
    HttpClient client = new HttpClient();
    while (true)
    {
        foreach (string url in urls)
        {
            Task<string> task = tasks[url];
            if (task == null || task.IsCompleted)
            {
                if (task != null)
                {
                    string result = task.Result;
                    Console.WriteLine($"ThreadId={ThreadId}: Length={result.Length}");
                }
                tasks[url] = ReadString(client, url);
            }
        }
        Thread.Yield();
    }
}

private static async Task<string> ReadString(HttpClient client, string url)
{
    var response = await client.GetAsync(url);
    Console.WriteLine($"ThreadId={ThreadId}: Url={url}");
    return await response.Content.ReadAsStringAsync();
}

private static int ThreadId => Thread.CurrentThread.ManagedThreadId;
This seems to work, and it executes on various worker threads from the ThreadPool. However, it definitely doesn't look like typical async/await code, which makes me think there has to be a better way.
Is there a more proper / more elegant way of doing this?

You've basically written a version of Task.WhenAny that uses a CPU loop to check for completed tasks rather than... whatever magic the framework method uses behind the scenes.
A more idiomatic version might look like this. (Although it might not - I feel like there should be an easier method of "re-run the completed task" than the reverse dictionary I've used here.)
static void Main(string[] args)
{
    Console.WriteLine($"ThreadId={ThreadId}: Main");
    // No need for Task.Run here.
    var task = Process("https://duckduckgo.com", "https://stackoverflow.com/");
    task.Wait();
}

private static async Task Process(params string[] urls)
{
    // Set up initial dictionary mapping task (per URL) to the URL used.
    HttpClient client = new HttpClient();
    var tasks = urls.ToDictionary(u => client.GetAsync(u), u => u);
    while (true)
    {
        // Wait for any task to complete, get its URL and remove it from the current tasks.
        var firstCompletedTask = await Task.WhenAny(tasks.Keys);
        var firstCompletedUrl = tasks[firstCompletedTask];
        tasks.Remove(firstCompletedTask);
        // Do work with completed task.
        try
        {
            Console.WriteLine($"ThreadId={ThreadId}: URL={firstCompletedUrl}");
            using (var response = await firstCompletedTask)
            {
                var content = await response.Content.ReadAsStringAsync();
                Console.WriteLine($"ThreadId={ThreadId}: Length={content.Length}");
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"ThreadId={ThreadId}: Ex={ex}");
        }
        // Queue the task again.
        tasks.Add(client.GetAsync(firstCompletedUrl), firstCompletedUrl);
    }
}

private static int ThreadId => Thread.CurrentThread.ManagedThreadId;

I've accepted Rawling's answer - I believe it is correct for the exact scenario I described. However, with a bit of inverted logic, I ended up with something way simpler - leaving it in case anyone needs something like this:
static void Main(string[] args)
{
    string[] urls = { "https://duckduckgo.com", "https://stackoverflow.com/" };
    HttpClient client = new HttpClient();
    var tasks = urls.Select(async url =>
    {
        while (true) await ReadString(client, url);
    });
    Task.WhenAll(tasks).Wait();
}

private static async Task<string> ReadString(HttpClient client, string url)
{
    var response = await client.GetAsync(url);
    string data = await response.Content.ReadAsStringAsync();
    Console.WriteLine($"Fetched data from url={url}. Length={data.Length}");
    return data;
}

Maybe a better question is: do you really need a thread per socket in this case? You should think of threads as a system-wide resource and take that into consideration when spawning them, especially if you don't really know how many threads your application will be using. This is a good read: What's the maximum number of threads in Windows Server 2003?
A few years ago the .NET team introduced asynchronous sockets.
"...The client is built with an asynchronous socket, so execution of the client application is not suspended while the server returns a response. The application sends a string to the server and then displays the string returned by the server on the console."
Asynchronous Client Socket Example
There are a lot more examples out there showcasing this approach. While it is a bit more complicated and "low level", it lets you be in control.

Xamarin Gcm Network Manager await httpclient

I'm using the Gcm Network Manager to schedule tasks, in one of those tasks I need to perform an HTTP request. Until now it was written with HttpWebRequest so nothing was async.
Now I would like to reuse code that is written with HttpClient and is async.
The problem that arises is that I cannot make OnRunTask() async, as it needs to return an int:
e.g.
public override int OnRunTask(TaskParams @params)
{
    var result = await performSync();
    if (result)
    {
        return GcmNetworkManager.ResultSuccess;
    }
    return GcmNetworkManager.ResultReschedule;
}
What could I do to be able to reuse async code here ?

You can use Task.Run inside your OnRunTask method like this :
Task.Run(async () =>
{
    // Do your stuff here
    await asyncTask();
});
With this technique, OnRunTask does not need to be async.
Hope it helps
Edit
If you need the return value to match the framework / library signature, you can also use .Result
E.g.
var result = asyncTask().Result;
...
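The two suggestions above can be combined so that the synchronous method both runs the async code and returns a value. This is a minimal sketch under assumptions: SyncBridge and RunTask are hypothetical names, the two constants are stand-ins for GcmNetworkManager.ResultSuccess and GcmNetworkManager.ResultReschedule, and rescheduling on an exception is an assumed policy rather than anything the library requires.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncBridge
{
    // Hypothetical stand-ins for the GcmNetworkManager result codes.
    public const int ResultSuccess = 0;
    public const int ResultReschedule = 1;

    // Synchronous entry point that blocks until the async work finishes.
    public static int RunTask(Func<Task<bool>> performAsync)
    {
        try
        {
            // Task.Run moves the work to the thread pool, so its continuations
            // do not depend on the blocked calling thread. GetResult rethrows
            // the original exception rather than an AggregateException.
            bool ok = Task.Run(() => performAsync()).GetAwaiter().GetResult();
            return ok ? ResultSuccess : ResultReschedule;
        }
        catch (Exception)
        {
            // Assumption: treat a failure as "try again later".
            return ResultReschedule;
        }
    }
}
```

Blocking like this is only reasonable because OnRunTask has to return its result synchronously; where the caller can be made async, awaiting the task directly is simpler.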

Relating Task exception to a Task<T> response

I have just started working with tasks. We have a system setup that uses requests/responses. The service running the tasks accepts a master request that has a list of request objects and returns a master response that has a list of response objects. So it looks something like this
var MasterRequest = new MasterRequest();
MasterRequest.Requests.Add(new BlueRequest());
MasterRequest.Requests.Add(new RedRequest());
MasterRequest.Requests.Add(new YellowRequest());
The request implements a simple IRequest interface and each color is a concrete class. The service has concrete classes (request processors) set up to be able to process each request separately and simultaneously according to a concrete request object. Each concrete class on the service has a GetTask method with a signature like this:
Task<IResponse> GetTask(IRequest request)
{
    // some setup stuff
    return Task.Factory.StartNew<IResponse>(() =>
    {
        // do task stuff
        return response; // implements IResponse
    });
}
My service takes the passed in MasterRequest and builds a list of tasks by calling the GetTask call listed above on the concrete request processors. I then use a Parallel.ForEach on the list to process the tasks.
// this is what is returned from the service.
// it has a List<IResponse> on it to hold the responses
MasterResponse response = new MasterResponse();
List<Task<IResponse>> tasks = new List<Task<IResponse>>();
foreach (IRequest req in MasterRequest.Requests)
{
    // factory to get the proper request processor
    RequestProcessor p = rp.GetProcessor(req);
    tasks.Add(p.GetTask(req));
}
Parallel.ForEach(tasks, t =>
{
    t.Wait();
    // check for faulted and cancelled
    // this is where I need help
    response.Responses.Add(t.Result);
});
This all works great. But if the task throws an exception I don't know how to tie it back to the specific concrete request that triggered it. I need to know so I can pass back a properly built response to the caller.
My first thought was to subclass Task, but that brings up its own set of issues that I don't want to deal with.
I read this SO article and it seems like I want to do something like this
Is this ok to derive from TPL Task to return more details from method?
I think Reed's second example is my solution but I still cannot see how to run the tasks simultaneously and be able to tie exceptions to the request so I can return a properly built list of responses.
Thanks in advance.

So I was able to use Reed's solution from the link I supplied. My service code to process the requests turned into this
// this is what is returned from the service.
// it has a List<IResponse> on it to hold the responses
MasterResponse response = new MasterResponse();
List<ExecutionResult> tasks = new List<ExecutionResult>();
foreach (IRequest req in MasterRequest.Requests)
{
    // factory to get the proper request processor
    RequestProcessor p = rp.GetProcessor(req);
    tasks.Add(p.GetResult(req));
}
Parallel.ForEach(tasks, t =>
{
    t.task.Wait();
    response.Responses.Add(t.Result);
});
Where ExecutionResult is defined like so
class ExecutionResult
{
public IResult Result;
public Task<IResponse> task;
}
That gives me access to a pre-built response object so I can pass it back to the caller.
EDIT:
So I reviewed my Parallel.ForEach and was able to redo my code and use await Task.WhenAll as suggested. New code looks more like this:
// this is what is returned from the service.
// it has a List<IResponse> on it to hold the responses
MasterResponse response = new MasterResponse();
List<Task<IResponse>> tasks = new List<Task<IResponse>>();
List<ExecutionResult> executionResults = new List<ExecutionResult>();
foreach (IRequest req in MasterRequest.Requests)
{
    // factory to get the proper request processor
    RequestProcessor p = rp.GetProcessor(req);
    ExecutionResult er = engine.GetResult(req);
    executionResults.Add(er);
    tasks.Add(er.Task);
}
try
{
    await Task.WhenAll<IResponse>(tasks);
}
catch (Exception)
{
    // WhenAll rethrows the first failure; each task is inspected individually below.
}
foreach (ExecutionResult r in executionResults)
{
    // IsCompleted is also true for faulted tasks, so check for successful completion.
    if (r.Task.Status == TaskStatus.RanToCompletion)
    {
        response.AddResponse(r.Task.Result);
    }
    else
    {
        r.Response.Status = false;
        AggregateException flat = r.Task.Exception.Flatten();
        foreach (Exception e in flat.InnerExceptions)
        {
            Log.ErrorFormat("Request [{0}] threw [{1}]", r.Response.RequestId, e);
            r.Response.StatusReason.AppendLine(e.Message);
        }
    }
}
This allows me to tie my request information to my task and get the response back that I need to return to my caller.
Thanks for the guidance.

"I then use a Parallel.ForEach on the list to process the tasks."
This is actually pretty bad. It's throwing a ton of threads into the mix just to block on the tasks completing.
"But if the task throws an exception I don't know how to tie it back to the specific concrete request that triggered it. I need to know so I can pass back a properly built response to the caller."
Whenever you have a "process tasks after they complete" kind of problem, usually the best solution is a higher-level asynchronous operation:
private async Task<IResponse> ProcessAsync(IRequest request)
{
    try
    {
        return await engine.GetResult(request);
    }
    catch (Exception ex)
    {
        IResponse result = /* create error response */;
        return result;
    }
}
This allows a much simpler main function:
MasterResponse response = new MasterResponse();
var tasks = MasterRequest.Requests.Select(req => ProcessAsync(req));
response.AddRange(await Task.WhenAll(tasks));
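To make the pattern concrete, here is a self-contained sketch of the same idea. The Request and Response classes, the RequestRunner helper, and the handler delegate are all hypothetical stand-ins for the real IRequest, IResponse, and request processors. The point it illustrates is that the try/catch sits inside the per-request method, so the request is still in scope when the error response is built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical minimal shapes; the real IRequest/IResponse will differ.
public sealed class Request
{
    public string Id { get; set; }
}

public sealed class Response
{
    public string RequestId { get; set; }
    public bool Status { get; set; }
    public string StatusReason { get; set; }
}

public static class RequestRunner
{
    // Wraps one request so that a failure becomes an error response that
    // still carries the id of the request that caused it.
    private static async Task<Response> ProcessAsync(
        Request request, Func<Request, Task<Response>> handler)
    {
        try
        {
            return await handler(request);
        }
        catch (Exception ex)
        {
            return new Response
            {
                RequestId = request.Id,
                Status = false,
                StatusReason = ex.Message
            };
        }
    }

    // Runs all requests concurrently; results come back in request order.
    public static Task<Response[]> ProcessAllAsync(
        IEnumerable<Request> requests, Func<Request, Task<Response>> handler)
    {
        return Task.WhenAll(requests.Select(r => ProcessAsync(r, handler)));
    }
}
```

Because every per-request task catches its own exception, the Task.WhenAll call itself does not throw for handler failures, and the returned array has exactly one response per request.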
