Does re-using HttpClient with simultaneous requests cause them to queue up? - c#

I'm writing a responsive API that has to handle 10 requests per second.
The problem is that sending a response takes half a second, so you can imagine how quickly the server gets overwhelmed. To help mitigate this, I made the code process requests asynchronously, up to 10 tasks at once.
However, I have concerns about whether using a single instance of HttpClient is the correct approach. The advice, as soon as someone mentions HttpClient, is always to create a single instance of it, and I do have a static instance. Although it is thread-safe, at least for PostAsync, should I really create 10 HttpClients (or a pool of HttpClients) to be able to send data out faster?
My worry is that during the half second one request is being sent out, the client won't let other PostAsync calls go out. However, I can't confirm this behaviour.
Most benchmarks and resources simply look at sending requests synchronously, i.e. one after the other (i.e. await PostAsync).
However, for my use case I need to send several simultaneously, i.e. from separate threads. The only way to reply to 10 messages per second that take half a second each is to send five simultaneous messages back: not five queued to go out one by one, but five truly simultaneous messages.
I cannot find any documentation on how HttpClient handles this. I've only seen a few references to it having a connection pool, but it's unclear whether it will actually open multiple connections simultaneously, or whether I need to create a small pool of 5 HttpClients to rotate through.
Question: Does a single instance of HttpClient support multiple connections simultaneously?
And I don't mean just letting you call PostAsync lots of times in a thread-safe way before earlier calls have finished; I mean truly opening five simultaneous connections and sending data through each of them at the exact same time.
An example would be: you're sending fifty 10-byte files to the moon, and there is a latency of 10 seconds. Your program scoops up all fifty files and makes fifty calls to HttpClient.PostAsync almost instantly.
Assuming the listening service can support it, would the cross-thread calls to HttpClient.PostAsync open fifty connections (or whatever the limit is, but more than 1) and send the data, meaning the server receives all fifty files ~10 seconds later?
Or would it internally queue them up, so you'd end up waiting 10 × 50 = 500 seconds?

It seems there is no limit, or at least it's a high one.
I made a default Web API application and modified the boilerplate controller method to this:
// GET api/values
public async Task<IEnumerable<string>> Get()
{
    Debug.Print("Called");
    Thread.Sleep(100000);
    return new string[] { "value1", "value2" };
}
I then made a program that, using a single instance of HttpClient, would make lots of simultaneous connections using Task.Run:
var tasks = new List<Task>();
for (var i = 0; i < 14; i++)
{
    tasks.Add(Task.Run(() => httpClient.GetAsync("http://localhost:57984/api/values")));
}
await Task.WhenAll(tasks);
I ran them and the word 'Called' was logged 14 times.
Since the Thread.Sleep blocks each response for 100 seconds, that should mean there were 14 simultaneous connections.
Searching around, I found two properties that might affect the maximum number of connections:
ServicePointManager.DefaultConnectionLimit, which defaults to 2 for client processes on .NET Framework,
and HttpClientHandler.MaxConnectionsPerServer, which defaults to int.MaxValue.
As I'm able to make many more than 2 connections, I really don't know whether the limit is being ignored or whether these are the wrong settings; changing them appears to have no effect. (A likely explanation for a localhost test: on .NET Framework the connection limit is not enforced for connections to the local machine.)
I noticed, after a lot of stopping and starting of my test projects, that new connections became much slower to establish; I'm guessing I saturated the connection pool.
My conclusion is that if you set those two values to something higher (just in case; why not?), then you can use a single HttpClient concurrently, and the connections will be truly concurrent rather than merely sequential and thread-safe.
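For completeness, here is a minimal sketch of raising both limits on a shared client. The value 50 is illustrative, not a recommendation, and MaxConnectionsPerServer requires .NET Core or .NET Framework 4.7.1+:
using System;
using System.Net;
using System.Net.Http;

static class ApiClient
{
    public static readonly HttpClient Instance;

    static ApiClient()
    {
        // .NET Framework: per-host limit used by the underlying HTTP stack
        // (default 2). Must be set before the first request to the host.
        ServicePointManager.DefaultConnectionLimit = 50;

        // .NET Core (and .NET Framework 4.7.1+): per-handler cap on
        // connections to a single server (default int.MaxValue).
        Instance = new HttpClient(new HttpClientHandler { MaxConnectionsPerServer = 50 });
    }
}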

"However I can't confirm this behaviour."
Why not? Just create a Web API with a few seconds of delay and test calling it with HttpClient, or use a service like Slowwly.
static async Task Main(string[] args)
{
    var stopwatch = Stopwatch.StartNew();
    await Serial(stopwatch);
    Console.WriteLine($"Serial took {stopwatch.Elapsed}");
    stopwatch.Restart();
    await Concurrent(stopwatch);
    Console.WriteLine($"Concurrent took {stopwatch.Elapsed}");
}

private static async Task Serial(Stopwatch stopwatch)
{
    for (var i = 0; i != 5; ++i)
    {
        var client = new HttpClient();
        await MakeRequest(stopwatch, client);
    }
}

private static async Task Concurrent(Stopwatch stopwatch)
{
    var client = new HttpClient();
    var tasks = Enumerable.Range(0, 5).Select(async _ => { await MakeRequest(stopwatch, client); }).ToList();
    await Task.WhenAll(tasks);
}

private static async Task MakeRequest(Stopwatch stopwatch, HttpClient client)
{
    Console.WriteLine($"{stopwatch.Elapsed}: Issuing request.");
    var response = await client.GetStringAsync("http://slowwly.robertomurray.co.uk/delay/3000/url/http://www.google.com");
    Console.WriteLine($"{stopwatch.Elapsed}: Received {response.Length} bytes.");
}
Output for me (from the US):
00:00:00.0463664: Issuing request.
00:00:04.2560734: Received 49237 bytes.
00:00:04.2562498: Issuing request.
00:00:07.6731908: Received 49247 bytes.
00:00:07.6734158: Issuing request.
00:00:11.0882322: Received 49364 bytes.
00:00:11.0883803: Issuing request.
00:00:14.4990981: Received 49294 bytes.
00:00:14.4993977: Issuing request.
00:00:17.9082167: Received 49328 bytes.
Serial took 00:00:17.9083969
00:00:00.0025096: Issuing request.
00:00:00.0252402: Issuing request.
00:00:00.0422682: Issuing request.
00:00:00.0588887: Issuing request.
00:00:00.0755351: Issuing request.
00:00:03.4631815: Received 49278 bytes.
00:00:03.4632073: Received 49293 bytes.
00:00:03.4844698: Received 49313 bytes.
00:00:03.4913929: Received 49308 bytes.
00:00:03.4915415: Received 49280 bytes.
Concurrent took 00:00:03.4917199
Question: Does a single instance of HttpClient support multiple connections simultaneously?
Yes.

Related

HttpClient SendAsync blocks main thread

I have written a little WinForms application that sends HTTP requests to every IP address within my local network to discover a certain device of mine. With my particular subnet mask, that's 512 addresses. I originally wrote this using a BackgroundWorker, but I wanted to try out HttpClient and the async/await pattern to achieve the same thing. The code below uses a single instance of HttpClient, and I wait until all the requests have completed. The issue is that the main thread gets blocked. I know this because I have a PictureBox with a loading GIF and it's not animating uniformly. I put the GetAsync method in a Task.Run as suggested here, but that didn't work either.
private async void button1_Click(object sender, EventArgs e)
{
    var addresses = networkUtils.generateIPRange..
    await MakeMultipleHttpRequests(addresses);
}

public async Task MakeMultipleHttpRequests(IPAddress[] addresses)
{
    List<Task<HttpResponseMessage>> httpTasks = new List<Task<HttpResponseMessage>>();
    foreach (var address in addresses)
    {
        Task<HttpResponseMessage> response = MakeHttpGetRequest(address.ToString());
        httpTasks.Add(response);
    }
    try
    {
        if (httpTasks.ToArray().Length != 0)
        {
            await Task.WhenAll(httpTasks.ToArray());
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine("\thttp tasks did not complete Exception : {0}", ex.Message);
    }
}

private async Task<HttpResponseMessage> MakeHttpGetRequest(string address)
{
    var url = string.Format("http://{0}/getStatus", address);
    var cts = new System.Threading.CancellationTokenSource();
    cts.CancelAfter(TimeSpan.FromSeconds(10));
    HttpResponseMessage response = null;
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    response = await httpClient.SendAsync(request, cts.Token);
    return response;
}
I have read about a similar issue here, but my GUI thread is not doing much. I have also read here that I may be running out of threads. Is that the issue, and how can I resolve it?
I know it's the SendAsync, because if I replace the code with the simple task below, there is no blocking.
await Task.Run(() =>
{
    Thread.Sleep(1000);
});
So one of the issues here is that you are creating 500+ tasks, one after another in quick succession, with a timeout set outside the task creation.
Just because you ask to run 500+ tasks doesn't mean all 500+ are going to run at the same time. They get queued up and run when the scheduler deems it possible.
You set a 10-second timeout at creation time, but the tasks could sit in the scheduler for 10 seconds before they even execute.
If you want your HTTP requests to time out organically, you can do that when you create the HttpClient:
private static readonly HttpClient _httpClient = new HttpClient
{
    Timeout = TimeSpan.FromSeconds(10)
};
So, by moving the timeout to the HttpClient, your method should now look like this:
private static Task<HttpResponseMessage> MakeHttpGetRequest(string address)
{
    return _httpClient.SendAsync(new HttpRequestMessage(HttpMethod.Get, new UriBuilder
    {
        Host = address,
        Path = "getStatus"
    }.Uri));
}
Try using that method and see if it improves your lock-up issue in Debug mode.
As for the issue you were having: it's locking up because you are in Debug mode and the debugger is trying to say "hey, you got an exception" 500 times, all at once, because the tasks were all spawned at the same time. Run it in Release mode and see if it still locks up.
What I would also consider doing is batching out your operations: do 20, wait until those 20 finish, do 20 more, and so on (see the sketch below).
If you'd like to see a slick way of batching tasks, let me know and I would be more than happy to show you.
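For illustration, the batching could look something like this (a sketch only, reusing MakeHttpGetRequest from the question; the batch size of 20 is arbitrary):
public async Task MakeRequestsInBatches(IPAddress[] addresses)
{
    const int batchSize = 20;
    for (int i = 0; i < addresses.Length; i += batchSize)
    {
        // Start one batch of requests, then wait for the whole batch to
        // finish before starting the next, so at most 20 are in flight.
        var batch = addresses.Skip(i).Take(batchSize)
            .Select(a => MakeHttpGetRequest(a.ToString()))
            .ToList();
        try
        {
            await Task.WhenAll(batch);
        }
        catch
        {
            // Timeouts and refused connections are expected when scanning a
            // subnet; inspect the individual tasks if you need details.
        }
    }
}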
On .NET Framework, the number of connections to a server is controlled by the ServicePointManager class.
For a client, the default connection limit is 2 on client processes.
No matter how many HttpClient.SendAsync invocations you make, only 2 will be active at the same time.
But you can manage the connections yourself (see the example below).
On .NET Core there isn't the concept of a service point manager, and the equivalent default limit is int.MaxValue.
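For example, on .NET Framework you could raise the limit once at startup, before any requests are made (the value 10 is illustrative):
// .NET Framework only: allow up to 10 concurrent connections per host
// instead of the default 2.
ServicePointManager.DefaultConnectionLimit = 10;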

Are these webrequests actually concurrent?

I have a UrlList of only 4 URLs which I want to use to make 4 concurrent requests. Does the code below truly make 4 requests which start at the same time?
My testing appears to show that it does, but am I correct in thinking that there will actually be 4 requests retrieving data from the URL target at the same time, or does it just appear that way?
static void Main(string[] args)
{
    var t = Do_TaskWhenAll();
    t.Wait();
}

public static async Task Do_TaskWhenAll()
{
    var downloadTasksQuery = from url in UrlList select Run(url);
    var downloadTasks = downloadTasksQuery.ToArray();
    Results = await Task.WhenAll(downloadTasks);
}

public static async Task<string> Run(string url)
{
    var client = new WebClient();
    AddHeaders(client);
    var content = await client.DownloadStringTaskAsync(new Uri(url));
    return content;
}
Correct: when ToArray is called, the enumerable downloadTasksQuery yields a task for every URL, running your web requests concurrently.
await Task.WhenAll ensures your task completes only when all the web requests have completed.
You can rewrite your code to be less verbose, while doing effectively the same thing, like so:
public static async Task Do_TaskWhenAll()
{
    var downloadTasks = from url in UrlList select Run(url);
    Results = await Task.WhenAll(downloadTasks);
}
There's no need for ToArray, because Task.WhenAll will enumerate your enumerable for you.
I advise you to use HttpClient instead of WebClient. With HttpClient, you won't have to create a new instance of the client for each concurrent request; it lets you reuse the same client for multiple requests, concurrently.
The short answer is yes: if you generate multiple Tasks without awaiting each one individually, they can run simultaneously, as long as they are truly asynchronous.
When DownloadStringTaskAsync is awaited, a Task is returned from your Run method, allowing the next iteration to occur whilst waiting for the response.
So the next HTTP request is allowed to be sent without waiting for the first to complete.
As an aside, your method can be written more concisely:
public static async Task Do_TaskWhenAll()
{
    Results = await Task.WhenAll(UrlList.Select(Run));
}
Task.WhenAll has an overload that accepts IEnumerable<Task<TResult>> which is returned from UrlList.Select(Run).
No, there is no guarantee that your requests will execute in parallel, or immediately.
Starting a task merely queues it to the thread pool. If all of the pool's threads are occupied, that task necessarily waits until a thread frees up.
In your case, since there is a relatively large number of threads available in the pool and you are queueing only a small number of items, the pool has no problem servicing them as they come in. The more tasks you queue at once, the more likely this is to change.
If you truly need concurrency, you need to be aware of the thread pool's size and how busy it is. The ThreadPool class will help you manage this; see the sketch below.
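For instance, a rough sketch of inspecting (and, if needed, raising) the pool's limits:
// How many pool threads exist and how many are currently free.
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
Console.WriteLine($"Worker: {freeWorker}/{maxWorker} free, IO: {freeIo}/{maxIo} free");

// Optionally raise the minimum so a burst of work doesn't wait for the
// pool's gradual thread injection.
ThreadPool.SetMinThreads(workerThreads: 32, completionPortThreads: 32);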

Throttle/block async http request

I have a number of producer tasks that push data into a BlockingCollection; let's call it requestQueue.
I also have a consumer task that pops requests from requestQueue and forwards asynchronous HTTP requests to a remote web service.
I need to throttle or block the number of active requests sent to the web service. On some machines that are far from the service or have a slower internet connection, the HTTP response time is long enough that the active requests fill up more memory than I'd like.
At the moment I am using a semaphore approach, calling WaitOne on the consumer thread multiple times and Release in the HTTP response callback. Is there a more elegant solution?
I am bound to .net 4.0, and would like a standard library based solution.
You are already using a BlockingCollection, so why have a WaitHandle?
The way I would do it is to have a BlockingCollection with n as its bounded capacity, where n is the maximum number of concurrent requests you want to have in flight at any given time.
You can then do something like:
var n = 4;
var blockingQueue = new BlockingCollection<Request>(n);

Action<Request> consumer = request =>
{
    // do something with request.
};

var noOfWorkers = 4;
var workers = new Task[noOfWorkers];
for (int i = 0; i < noOfWorkers; i++)
{
    var task = new Task(() =>
    {
        foreach (var item in blockingQueue.GetConsumingEnumerable())
        {
            consumer(item);
        }
    }, TaskCreationOptions.LongRunning | TaskCreationOptions.DenyChildAttach);
    workers[i] = task;
    workers[i].Start();
}

Task.WaitAll(workers);
I'll let you take care of cancellation and error handling, but with this setup you can also control how many workers you want at any given time. If the workers are busy sending and processing requests, any producer will be blocked until more room is available in the queue. A sketch of the producer side follows below.
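A possible producer side for the sketch above (GetRequests() is a placeholder for however you create requests):
// Add blocks whenever the queue already holds n items, which is what
// throttles the producers. CompleteAdding ends GetConsumingEnumerable.
foreach (var request in GetRequests())
{
    blockingQueue.Add(request);
}
blockingQueue.CompleteAdding();   // lets the worker loops drain and exit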

Optimizing for fire & forget using async/await and tasks

I have about 5 million items to update. I don't really care about the response (a response would be nice to have so I can log it, but I don't want one if it will cost me time). Having said that, is this code optimized to run as fast as possible? With 5 million items, would I run the risk of getting task-cancelled or timeout errors? I currently get about 1 or 2 responses back every second.
var tasks = items.Select(async item =>
{
    await Update(CreateUrl(item));
}).ToList();

if (tasks.Any())
{
    await Task.WhenAll(tasks);
}

private async Task<HttpResponseMessage> Update(string url)
{
    var client = new HttpClient();
    var response = await client.GetAsync(url).ConfigureAwait(false);
    // log response.
    return response;
}
UPDATE:
I am actually getting TaskCanceledExceptions. Did my system run out of threads? What could I do to avoid this?
Your method will kick off all the tasks at the same time, which may not be what you want. There wouldn't be any threads involved, because with async operations there is no thread, but there may be limits on the number of concurrent connections.
There may be better tools to do this, but if you want to use async/await, one option is Stephen Toub's ForEachAsync, as documented in this article. It allows you to control how many simultaneous operations you want to execute, so you don't overrun your connection limit.
Here it is from the article:
public static class Extensions
{
    public static async Task ExecuteInPartition<T>(IEnumerator<T> partition, Func<T, Task> body)
    {
        using (partition)
            while (partition.MoveNext())
                await body(partition.Current);
    }

    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select ExecuteInPartition(partition, body));
    }
}
Usage:
public async Task UpdateAll()
{
    // Allow for 100 concurrent Updates
    await items.ForEachAsync(100, async t => await Update(t));
}
A much better approach would be to use TPL Dataflow's ActionBlock with MaxDegreeOfParallelism and a single HttpClient:
Task UpdateAll(IEnumerable<Item> items)
{
    var block = new ActionBlock<Item>(
        item => UpdateAsync(CreateUrl(item)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1000 });

    foreach (var item in items)
    {
        block.Post(item);
    }

    block.Complete();
    return block.Completion;
}

async Task UpdateAsync(string url)
{
    var response = await _client.GetAsync(url).ConfigureAwait(false);
    Console.WriteLine(response.StatusCode);
}
A single HttpClient can be used concurrently for multiple requests, so it's much better to create and dispose a single instance instead of 5 million.
There are numerous problems with firing so many requests at the same time: the machine's network stack, the target web site, timeouts, and so forth. The ActionBlock caps the number in flight with MaxDegreeOfParallelism (which you should test and optimize for your specific case). It's important to note that TPL may choose a lower degree of parallelism when it deems that appropriate.
When you have a single async call at the end of an async method or lambda expression, it's better for performance to remove the redundant async/await and just return the task (i.e. return block.Completion;).
Complete will notify the ActionBlock to not accept any more items, but finish processing items it already has. When it's done the Completion task will be done so you can await it.
I suspect you are suffering from outgoing connection management preventing large numbers of simultaneous connections to the same domain. The answers given in this extensive Q&A might give you some avenues to investigate:
What is limiting the # of simultaneous connections my ASP.NET application can make to a web service?
In terms of your code structure, I'd personally try to use a dynamic pool of connections. You know that you can't actually have 5 million connections open simultaneously, so attempting that will simply fail to work; you may as well deal with a reasonable, configured limit of (for instance) 20 connections and use them in a pool. That way you can tune the number up or down; a sketch follows below.
Alternatively, you could investigate HTTP pipelining (which I've not used), which is intended specifically for the job you are doing (batching up HTTP requests): http://en.wikipedia.org/wiki/HTTP_pipelining
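As a rough illustration of the pooling idea (a sketch only; the limit of 20 and the method name are mine, not the original poster's):
private static readonly HttpClient _client = new HttpClient();
private static readonly SemaphoreSlim _gate = new SemaphoreSlim(20);   // "pool" size

private static async Task<HttpResponseMessage> ThrottledGetAsync(string url)
{
    await _gate.WaitAsync();   // wait for a free slot
    try
    {
        return await _client.GetAsync(url).ConfigureAwait(false);
    }
    finally
    {
        _gate.Release();       // return the slot
    }
}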

Sending multiple requests to a server using multithreading

I have a task where I form thousands of requests which are later sent to a server. The server returns a response for each request, and that response is then dumped to an output file line by line.
The pseudo code goes like this:
// requests contains thousands of requests to be sent to the server
string[] requests = GetRequestsString();
foreach (string request in requests)
{
    string response = MakeWebRequest(request);
    ParseandDump(response);
}
Now, as can be seen, the server is handling my requests one by one. I want to make this entire process faster. The server in question is capable of handling multiple requests at a time, so I want to apply multithreading and send, let's say, 4 requests to the server at a time, dumping each response in the same thread.
Can you please give me any pointers to possible approaches?
You can take advantage of Task from .NET 4.0 and the new HttpClient. The sample code below shows how to send requests in parallel and then dump each response in the same thread using ContinueWith:
var httpClient = new HttpClient();
var tasks = requests.Select(r => httpClient.GetStringAsync(r).ContinueWith(t =>
{
    ParseandDump(t.Result);
}));
Task uses the ThreadPool under the hood, so you don't need to specify how many threads should be used; the ThreadPool will manage this for you in an optimized way.
The easiest way would be to use Parallel.ForEach like this:
string[] requests = GetRequestsString();
Parallel.ForEach(requests, request => ParseandDump(MakeWebRequest(request)));
.NET Framework 4.0 or greater is required to use Parallel.
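If you want to match the "4 requests at a time" in the question, you can cap the parallelism explicitly (a variant of the line above; the value 4 is illustrative):
Parallel.ForEach(requests,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    request => ParseandDump(MakeWebRequest(request)));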
I think this could be done with a producer-consumer pattern. You could use a ConcurrentQueue (from the System.Collections.Concurrent namespace) as a shared resource between the many parallel web requests and the dumping thread.
The pseudo code would be something like:
var requests = GetRequestsString();
var queue = new ConcurrentQueue<string>();

Task.Factory.StartNew(() =>
{
    Parallel.ForEach(requests, currentRequest =>
    {
        queue.Enqueue(MakeWebRequest(currentRequest));
    });
});

Task.Factory.StartNew(() =>
{
    while (true)
    {
        string response;
        if (queue.TryDequeue(out response))
        {
            ParseandDump(response);
        }
    }
});
A BlockingCollection might serve you even better, depending on how you want to synchronize the threads to signal the end of the incoming requests; see the sketch below.
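A sketch of that BlockingCollection variant, which removes the busy-wait loop and gives the consumer a clean way to stop:
var requests = GetRequestsString();
var queue = new BlockingCollection<string>();

var producer = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(requests, currentRequest =>
    {
        queue.Add(MakeWebRequest(currentRequest));
    });
    queue.CompleteAdding();   // signal that no more items are coming
});

var consumer = Task.Factory.StartNew(() =>
{
    // Blocks until items arrive; the loop ends once CompleteAdding has been
    // called and the queue is empty.
    foreach (var response in queue.GetConsumingEnumerable())
    {
        ParseandDump(response);
    }
});

Task.WaitAll(producer, consumer);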
