I have an API that must call 4 HttpClients in parallel while supporting 500 concurrent users (all of them calling the API at the same time).
There must be a strict timeout that lets the API return a result even if not all of the HttpClient calls have returned a value.
The endpoints are external third-party APIs; I have no control over them and do not know their code.
I did extensive research on the matter, and although many solutions work, I need the one that consumes as little CPU as possible since I have a low server budget.
So far I came up with this:
var conn0 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn1 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn2 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var conn3 = new HttpClient
{
Timeout = TimeSpan.FromMilliseconds(1000),
BaseAddress = new Uri("http://endpoint")
};
var list = new List<HttpClient>() { conn0, conn1, conn2, conn3 };
var timeout = TimeSpan.FromMilliseconds(1000);
var allTasks = new List<Task<Task>>();
//the async DoCall method just call the HttpClient endpoint and return a MyResponse object
foreach (var call in list)
{
allTasks.Add(Task.WhenAny(DoCall(call), Task.Delay(timeout)));
}
var completedTasks = await Task.WhenAll(allTasks);
var allResults = completedTasks.OfType<Task<MyResponse>>().Select(task => task.Result).ToList();
return allResults;
I use WhenAny and two tasks, one for the call and one for the timeout. If the call task is late, the other one returns anyway.
Now, this code works perfectly and everything is async, but I wonder if there is a better way of achieving this.
Every single call to this API creates a lot of threads, and with 500 concurrent users it needs an average of 8 (eight) D3_V2 Azure 4-core machines, resulting in crazy expenses. The higher the timeout is, the higher the CPU use is.
Is there a better way to do this without using so many CPU resources (is Parallel LINQ maybe a better choice than this)?
Is the HttpClient timeout alone sufficient to stop the call and return if the endpoint does not reply in time, without having to use the second task in WhenAny?
UPDATE:
The endpoints are third party APIs, I don't know the code or have any control, the call is done in JSON and return JSON or a string.
Some of them reply after 10+ seconds once in a while, or get stuck and are extremely slow, so the timeout is there to free the threads and return, even if only with partial data from the others that returned in time.
Caching is possible but only partially, since the data changes all the time, like stocks and real-time forex currency trading.
Your approach of using two tasks just for the timeout does work, but you can do better: create a CancellationToken with a timeout and pass it to the request itself:
var cts = new CancellationTokenSource();
// set the timeout equal to the 1 second
cts.CancelAfter(1000);
// provide the token for your request
var response = await client.GetAsync(url, cts.Token);
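After that, you can simply filter for the tasks that completed successfully:
var allResults = completedTasks
    // IsCompleted is also true for cancelled and faulted tasks, whose Result would throw
    .Where(t => t.Status == TaskStatus.RanToCompletion)
    .Select(task => task.Result).ToList();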
This approach at least halves the number of tasks you create and reduces the overhead on your server. It also gives you a simple way to cancel part of the handling, or all of it. If your tasks are completely independent of each other, you may use Parallel.ForEach to call the HTTP clients, still using the token to cancel the operation:
ParallelLoopResult result = Parallel.ForEach(list, call => DoCall(call, cts.Token));
// handle the result of the parallel tasks
or, using the PLINQ:
var results = list
.AsParallel()
.Select(call => DoCall(call, cts.Token))
.ToList();
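As a rough sketch of the token-based approach described above, the following helper starts every call with one shared token and keeps only the results that finished in time. The names PartialResults, CollectAsync and FakeCall are made up for illustration (FakeCall stands in for DoCall), and the sketch assumes each call actually honours the token, as HttpClient.GetAsync does; a call that ignores it would keep WhenAll waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartialResults
{
    // Runs every call concurrently and returns only the results that
    // completed successfully before the shared timeout elapsed.
    public static async Task<List<T>> CollectAsync<T>(
        IEnumerable<Func<CancellationToken, Task<T>>> calls, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            // Start all calls up front; each one observes the same token.
            var tasks = calls.Select(call => call(cts.Token)).ToList();
            try
            {
                await Task.WhenAll(tasks);
            }
            catch (Exception)
            {
                // At least one call was cancelled or failed. The calls that
                // did succeed are selected by status below.
            }
            return tasks
                .Where(t => t.Status == TaskStatus.RanToCompletion)
                .Select(t => t.Result)
                .ToList();
        }
    }

    // Hypothetical stand-in for DoCall: completes after delayMs unless cancelled.
    public static async Task<string> FakeCall(string name, int delayMs, CancellationToken token)
    {
        await Task.Delay(delayMs, token);
        return name;
    }
}
```

No extra Task.Delay or WhenAny task is created per call here; the only timer is the one inside the CancellationTokenSource.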
Related
I need to use proxies to download a forum. The problem with my code is that it takes only 10% of my internet bandwidth. Also I have read that I need to use a single HttpClient instance, but with multiple proxies I don't know how to do it. Changing MaxDegreeOfParallelism doesn't change anything.
public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
this IEnumerable<Url> urls, FetchContext context)
{
var fetchBlcock = new TransformBlock<Url, IFetchResult>(
transform: url => url.FetchAsync(context),
dataflowBlockOptions: new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 128
}
);
foreach(var url in urls)
fetchBlcock.Post(url);
fetchBlcock.Complete();
var result = fetchBlcock.ToAsyncEnumerable();
return result;
}
Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.
public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
var httpClient = context.ProxyPool.Rent();
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay,
context.isReloadWithCookie);
context.ProxyPool.Return(httpClient);
return result;
}
public HttpClient Rent()
{
lock(_lockObject)
{
if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
{
var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
}
return _proxiesQueue.Dequeue();
}
}
I am a novice at software developing, but the task of downloading using hundreds or thousands of proxies asynchronously looks like a trivial task that many should have been faced with and found a correct way to do it. So far I was unable to find any solutions to my problem on the internet. Any thoughts of how to achieve maximum download speed?
Let's take a look at what happens here:
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);
You are actually awaiting before you continue with the next item. That's why it is asynchronous and not parallel programming. From the Microsoft docs on async:
The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic.
In essence, it frees the calling thread to do other stuff but the original calling code is suspended from executing, until the IO operation is done.
Now to your problem:
You can either use this excellent solution here: foreach async
Or you can use the Parallel library to execute your code on different threads.
Something like the following, using Parallel.ForEach (urls is an IEnumerable, so it cannot be indexed):
Parallel.ForEach(urls, url => fetchBlcock.Post(url));
Note: I am running on .NET Framework 4.6.2
Background
I have a long-running Windows Service that, once a minute, queues up a series of business-related tasks that are run on their own threads and are each awaited by the main thread. Only one set of business-related tasks can run at a time, so as to prevent race conditions. At certain points, each business task makes a series of asynchronous calls, in parallel, to an external API via an HttpClient in a singleton wrapper. This results in anywhere between 20 and 100 API calls per second being made via HttpClient.
The issue
About twice a week for the past month, a deadlock issue (I believe) has been cropping up. Whenever it does happen, I have been restarting the Windows Service frequently as we can't afford to have the service going down for more than 20 minutes at a time without it causing serious business impact. From what I can see, any one of the business tasks will try sending a set of API calls and further API calls made using the HttpClient will fail to ever return, resulting in the task running up against a fairly generous timeout on the cancellation token that is created for each business task. I can see that the requests are reaching the await HttpClientInstance.SendAsync(request, cts.Token).ConfigureAwait(false) line, but do not advance past it.
For additional clarification: once the first business task begins deadlocking with HttpClient, any new threads attempting to send API requests using the HttpClient end up timing out. New business threads are being queued up, but they cannot use the HttpClient instance at all.
Is this a deadlocking situation? If so, how do I avoid it?
Relevant Code
HttpClientWrapper
public static class HttpClientWrapper
{
private static HttpClientHandler _httpClientHandler;
//legacy class that is extension of DelegatingHandler. I don't believe we are using any part of
//it outside of the inner handler. This could probably be cleaned up a little more to be fair
private static TimeoutHandler _timeoutHandler;
private static readonly Lazy<HttpClient> _httpClient =
new Lazy<HttpClient>(() => new HttpClient(_timeoutHandler));
public static HttpClient HttpClientInstance => _httpClient.Value;
public static async Task<Response> CallAPI(string url, HttpMethod httpMethod, CancellationTokenSource cts, string requestObj = "")
{
//class that contains fields for logging purposes
var response = new Response();
string accessToken;
var content = new StringContent(requestObj, Encoding.UTF8, "application/json");
var request = new HttpRequestMessage(httpMethod, new Uri(url));
if (!string.IsNullOrWhiteSpace(requestObj))
{
request.Content = content;
}
HttpResponseMessage resp = null;
try
{
resp = await HttpClientInstance.SendAsync(request, cts.Token).ConfigureAwait(false);
}
catch (Exception ex)
{
if ((ex.InnerException is OperationCanceledException || ex.InnerException is TaskCanceledException) && !cts.IsCancellationRequested)
throw new TimeoutException();
throw;
}
response.ReturnedJson = await resp.Content.ReadAsStringAsync();
// non-relevant post-call variables being set for logging...
return response;
}
//called on start up of the Windows Service
public static void SetProxyUse(bool useProxy)
{
if (useProxy || !ServerEnv.IsOnServer)
{
_httpClientHandler = new HttpClientHandler
{
UseProxy = true,
Proxy = new WebProxy {Address = /* in-house proxy */},
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
}
else
{
_httpClientHandler = new HttpClientHandler
{
UseProxy = false,
AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
}
_timeoutHandler = new TimeoutHandler
{
DefaultTimeout = TimeSpan.FromSeconds(120),
InnerHandler = _httpClientHandler
};
}
}
Generalized function from a business class
For more context.
//Code for generating work parameters in each batch of work
...
foreach (var workBatch in batchesOfWork)
{
var tasks = workBatch.Select(async batch =>
workBatch.Result = await GetData(/* work related parms*/)
);
await Task.WhenAll(tasks);
}
...
GetData() function
//code for formating url
try
{
response = await HttpClientWrapper.CallAPI(formattedUrl, HttpMethod.Get, cts);
}
catch (TimeoutException)
{
//retry logic
}
...
//JSON deserialization, error handling, etc.....
Edit
I forgot to mention that this is also set on start-up.
ServicePointManager
.FindServicePoint(/* base uri for the API that we are contacting*/)
.ConnectionLeaseTimeout = 60000; // 1 minute
ServicePointManager.DnsRefreshTimeout = 60000;
The above mentioned code example shows that a common instance of HttpClient is being used by all the running applications.
Microsoft documentation recommends that the HttpClient object be instantiated once per application, rather than per-use.
This recommendation is applicable for the requests within one application.
This is for the purpose of ensuring common connection settings for all requests made to specific destination API.
However, when there are multiple applications, then the recommended approach is to have one instance of HttpClient per application instance, in order to avoid the scenario of one application waiting for the other to finish.
Removing the static keyword for the HttpClientWrapper class and updating the code so that each application can have its own instance of HttpClient will resolve the reported problem.
More information:
https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=netcore-3.1
After taking @David Browne - Microsoft's advice in the comment section, I changed the connection limit from the default (2) to the API provider's rate limit for my organization (100), and that seems to have done the trick. It has been several days since I deployed the change to production, and it is humming along nicely.
Additionally, I slimmed down the HttpClientWrapper class so that it contains only the CallAPI function and a default HttpClientHandler implementation with the proxy/decompression settings I have above. It doesn't override the default timeout anymore, as my thought is that I should just retry the API call if it takes more than the default 100 seconds.
To anyone stumbling upon this thread:
1) One HttpClient used throughout the entirety of your application will be fine, no matter how many threads or API calls go through it. Just make sure to increase the default connection limit via the ServicePointManager (ServicePointManager.DefaultConnectionLimit). You also DO NOT have to use the HttpClient in a using block. It will work just fine in a lazy singleton, as I demonstrate above. Don't worry about disposing of the HttpClient in a long-running service.
2) Use async-await throughout your application. It is worth the pay-off as it makes the application much more readable and allows your threads to be freed up as you are awaiting a response back from the API. This might seem obvious, but it isn't if you haven't used the async-await architecture in an application before.
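A minimal sketch of the two points above, assuming .NET Framework (as in the question): one lazily created HttpClient for the whole process, and the connection limit raised once at start-up. The class name SharedHttp and the Configure method are hypothetical. On .NET Core and later, HttpClient does not go through ServicePointManager, so there the limit would be set on the handler instead (HttpClientHandler.MaxConnectionsPerServer).

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class SharedHttp
{
    // One client for the lifetime of the process; never wrapped in a using block.
    private static readonly Lazy<HttpClient> _client =
        new Lazy<HttpClient>(() => new HttpClient());

    public static HttpClient Instance => _client.Value;

    // Call once at start-up, before the first request is sent.
    // On .NET Framework the default is 2 concurrent connections per host.
    public static void Configure(int maxConnectionsPerHost)
    {
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;
    }
}
```

At service start-up this would be called as SharedHttp.Configure(100), with 100 being the provider's rate limit mentioned above.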
I have a Web API method that takes in a list of strings, performs a web request for each of those strings, and compiles all of the data into a list for returning.
The input list can be variable length, up into the thousands. Do I need to manually limit the number of concurrent tasks, by batching them into groups, or is it safe to create thousands of tasks and await them with Task.WhenAll()? Here is a snippet of what I am using now:
public async Task<List<Customer>> GetDashboard(List<string> customerIds)
{
HttpClient client = new HttpClient();
var tasks = new List<Task>();
var customers = new List<Customer>();
foreach (var customerId in customerIds)
{
string customerIdCopy = customerId;
tasks.Add(client.GetStringAsync("http://testurl.com/" + customerId)
.ContinueWith(t => {
customers.Add(new Customer { Id = customerIdCopy, Data = t.Result });
}));
}
await Task.WhenAll(tasks);
return customers;
}
HttpClient can perform concurrent requests efficiently, with the caveat that it limits the number of concurrent requests to a single server.
If your requests are all going to the same site, the excess requests will be put into a queue. When requests are in this queue, the request timeout is ticking down... before it ever tries to connect to the server. So, manage that carefully, and if appropriate maybe even turn the timeout off.
Beyond this, it is perfectly fine to launch thousands of requests at once.
If you think that'll affect you, you can use a SemaphoreSlim or maybe TPL Dataflow to limit the number of concurrent requests.
The first thing that comes to mind is to delegate all of this "multithreading performance" work to the TPL. Use async/await in your requests instead of manually created tasks and ContinueWith.
The runtime will then take care of scheduling the continuations for you.
private async Task<Customer> GetDashboardAsync(string customerId)
{
    using (var httpClient = new HttpClient())
    {
        string data = await httpClient.GetStringAsync("http://testurl.com/" + customerId);
        return new Customer { Id = customerId, Data = data };
    }
}
public async Task<Customer[]> GetDashboardAsync(List<string> customerIds)
{
var tasks = customerIds
.Select(GetDashboardAsync)
.ToArray();
return await Task.WhenAll(tasks);
}
Do I need to manually limit the number of concurrent tasks, by batching them into groups, or is it safe to create thousands of tasks and await them with Task.WhenAll()?
If you just create tasks using a List, foreach and ContinueWith, performance can drop because of the excessive number of tasks.
However, if you use async/await in your code as above, then you don't need to bother about it. TPL will use asynchronous tasks and continuations to provide the best performance.
You can just try this out to make sure that it works :)
Inside a C# project I'm making some calls to a web API, and I'm doing them within a loop in a method. Usually there are not many of them, but even so I was thinking of taking advantage of parallelism.
What I am trying so far is
public void DeployView(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
});
Task.WhenAll(tasks);
}
}
But I wonder if that's the correct path, or whether I should try to parallelize the whole DeployView (i.e. even before using the HttpClient).
Now that I see it posted, I reckon I can't just remove the variable response as well, just do the await without setting it to any variable
Thanks
Usually there is no need to parallelize the requests - one thread making async requests should be enough (even if you have hundreds of requests). Consider this code:
var tasks = agents.Select(a =>
{
    var viewPostRequest = new
    {
        AgentId = a.AgentId,
        itemCode = itemCode,
        EnvironmentId = environmentTypeId
    };
    return client.PostAsJsonAsync("api/postView", viewPostRequest);
}).ToList(); // materialize once, otherwise each enumeration re-sends the requests
//now tasks is List<Task<HttpResponseMessage>>
await Task.WhenAll(tasks);
//now all the responses are available
foreach (HttpResponseMessage response in tasks.Select(p => p.Result))
{
    //do something with the response
}
However, you can utilize parallelism when processing the responses. Instead of the above 'foreach' loop you may use:
Parallel.ForEach(tasks.Select(p => p.Result), response => ProcessResponse(response));
But IMO, this is the best use of asynchrony and parallelism:
var tasks = agents.Select(async a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
var response = await client.PostAsJsonAsync("api/postView", viewPostRequest);
ProcessResponse(response);
});
await Task.WhenAll(tasks);
There is a major difference between the first and last examples:
In the first one, you have one thread launching async requests, waiting (non-blocking) for all of them to return, and only then processing them.
In the last example, you attach a continuation to each Task. That way, every response gets processed as soon as it arrives. Assuming the current TaskScheduler allows parallel (multithreaded) execution of Tasks, no response sits idle as in the first example.
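The difference can be seen in a small sketch that replaces the HTTP call with Task.Delay. The names ArrivalOrderDemo, FakeRequest and ProcessAsTheyArrive are invented for this illustration, and enqueueing the value stands in for ProcessResponse. Because each response is handled in its own continuation, the handling order follows completion order rather than input order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ArrivalOrderDemo
{
    // Hypothetical stand-in for an HTTP call: waits delayMs, then returns it.
    private static async Task<int> FakeRequest(int delayMs)
    {
        await Task.Delay(delayMs);
        return delayMs;
    }

    // Handles each response inside its own continuation and records the
    // order in which the responses were handled.
    public static async Task<int[]> ProcessAsTheyArrive(int[] delaysMs)
    {
        var processed = new ConcurrentQueue<int>();
        var tasks = delaysMs.Select(async d =>
        {
            var response = await FakeRequest(d);
            processed.Enqueue(response); // stands in for ProcessResponse(response)
        }).ToList(); // materialize so every request is started exactly once
        await Task.WhenAll(tasks);
        return processed.ToArray();
    }
}
```

Called with delays of 800 ms and 50 ms, the 50 ms response is handled first even though it was requested second.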
*Edit - if you do decide to do it parallel, you can use just one instance of HttpClient - it's thread safe.
What you're introducing is concurrency, not parallelism. More on that here.
Your direction is good, though a few minor changes that I would make:
First, you should mark your method as async Task, since you're using Task.WhenAll, which returns an awaitable that you need to await asynchronously. Next, you can simply return the task from PostAsJsonAsync instead of awaiting each call inside your Select. This saves a little overhead, as no state machine is generated for the async lambda:
public async Task DeployViewAsync(int itemId, string itemCode, int environmentTypeId)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(ConfigurationManager.AppSettings["ApiUrl"]);
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(
new MediaTypeWithQualityHeaderValue("application/json"));
var agents = _agentRepository.GetAgentsByitemId(itemId);
var agentTasks = agents.Select(a =>
{
var viewPostRequest = new
{
AgentId = a.AgentId,
itemCode = itemCode,
EnvironmentId = environmentTypeId
};
return client.PostAsJsonAsync("api/postView", viewPostRequest);
});
await Task.WhenAll(agentTasks);
}
}
HttpClient is able to make concurrent requests (see @usr's link for more), so I don't see a reason to create a new instance each time inside your lambda. Note that if you call DeployViewAsync multiple times, you may want to keep your HttpClient around instead of allocating one each time, and dispose of it once you no longer need its services.
HttpClient appears to be usable for concurrent requests. I have not verified this myself, this is just what I gather from searching. Therefore, you don't have to create a new client for each task that you are starting. You can do what is most convenient to you.
In general I strive to share as little (mutable) state as possible. Resource acquisitions should generally be pushed inwards towards their usage. I think it's better style to create a helper CreateHttpClient and create a new client for each request here. Consider making the Select body a new async method. Then, the HttpClient usage is completely hidden from DeployView.
Don't forget to await the WhenAll task and make the method async Task. (If you do not understand why that is necessary you've got some research about await to do.)
I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes a synchronous and an asynchronous function that each return a result set. I am a bit confused and have a couple of initial questions, and then I will share how far I got with each. I am using some logging, the Concurrency Visualizer, and Fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks what would be the limited resources that would potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new Factory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
return client.GetResponsesAsync( localTransaction.Request );
// log some more stuff
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs, but according to Fiddler it just tries to do all of the requests at once. It could be the scheduler, but I trust that more than I do the code above.
I have also tried to implement it without the Unwrap call, using an async/await delegate instead, and it does the same thing. I have also tried referencing .Result, and that seems to do it sequentially. Using the synchronous service function with the scheduler/factory, it only gets up to about 20 simultaneous requests per client.
1) Yes. It will allow your application to scale better by using fewer threads to accomplish more.
2) Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O), you have a blocked thread waiting for the operation to complete. You could instead be using that thread in the meantime to execute CPU-bound operations.
The simplest way to limit the number of concurrent requests is to use a SemaphoreSlim, which lets you wait asynchronously to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of whether you call the WCF service asynchronously or synchronously, you will be bound by the WCF serviceThrottling limits. You should look at these settings and possibly raise them (if they are set to low values for some reason). In .NET 4 the defaults are pretty good; in older versions of the .NET Framework they were much more conservative.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
1.)Yes.
2.)Yes.
If you want to control the number of simultaneous requests, you can try using Stephen Toub's ForEachAsync method. It allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
    var myClientProxy = new ClientProxy();
    // ConcurrentBag, because the callbacks below may add from several threads at once
    var responses = new ConcurrentBag<Response>();
    // Max 10 concurrent requests
    Requests.ForEachAsync<Request>(10, async (r) =>
    {
        var response = await myClientProxy.GetResponsesAsync(r);
        responses.Add(response);
    }).Wait();
}