I'm running a method synchronously in parallel using System.Threading.Tasks.Parallel.ForEach. At the end of the method, it needs to make a few dozen HTTP POST requests, which do not depend on each other. Since I'm on .NET Framework 4.6.2, System.Net.Http.HttpClient is exclusively asynchronous, so I'm using Nito.AsyncEx.AsyncContext to avoid deadlocks, in the form:
public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
{
AsyncContext.Run(async () => await Task.WhenAll(enumerable.Select(async c =>
await getResultsFor(c).ConfigureAwait(false))));
}
The getResultsFor(MyClass c) method then creates an HttpRequestMessage and sends it using:
await httpClient.SendAsync(request);
The response is then parsed and the relevant fields are set on the instance of MyClass.
My understanding is that the synchronous thread will block at AsyncContext.Run(...), while a number of tasks are performed asynchronously by the single AsyncContextThread owned by AsyncContext. When they are all complete, the synchronous thread will unblock.
This works fine for a few hundred requests, but when it scales up to a few thousand over five minutes, some of the requests start returning HTTP 408 Request Timeout errors from the server. My logs indicate that these timeouts are happening at the peak load, when there are the most requests being sent, and the timeouts happen long after many of the other requests have been received back.
I think the problem is that the tasks are awaiting the server handshake inside HttpClient, but they are not continued in FIFO order, so by the time they are continued the handshake has expired. However, I can't think of any way to deal with this, short of using a System.Threading.SemaphoreSlim to enforce that only one task can await httpClient.SendAsync(...) at a time.
My application is very large, and converting it entirely to async is not viable.
This can't be fixed by wrapping the tasks before blocking. Right now you're nuking the client: .NET Framework has a limit of 2 concurrent requests per domain, so the rest of your requests pile up on the client side. That limit can be relaxed, but if you set it too high and all the requests go through at once, you may end up nuking the server instead.
You can solve this by using TPL Dataflow blocks in a pipeline to execute requests with a fixed degree of parallelism and then parse them. Let's say you have a class called MyPayload with lots of Items in a property:
ServicePointManager.DefaultConnectionLimit = 1000;
var options=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var downloader=new TransformBlock<string,MyPayload>(async url=>{
var json=await _client.GetStringAsync(url);
var data=JsonConvert.DeserializeObject<MyPayload>(json);
return data;
},options);
var importer=new ActionBlock<MyPayload>(async data=>
{
var items=data.Items;
using(var connection=new SqlConnection(connectionString))
using(var bcp=new SqlBulkCopy(connection))
using(var reader=ObjectReader.Create(items))
{
bcp.DestinationTableName = destination;
connection.Open();
await bcp.WriteToServerAsync(reader);
}
});
downloader.LinkTo(importer,new DataflowLinkOptions {
PropagateCompletion=true
});
I'm using FastMember's ObjectReader to wrap the items in a DbDataReader that can be used to bulk insert the records to a database.
Once you have this pipeline, you can start posting URLs to the head block, downloader:
foreach(var url in hugeList)
{
downloader.Post(url);
}
downloader.Complete();
Once all URLs are posted, you tell downloader to complete and await the last block in the pipeline to finish with:
await importer.Completion;
Firstly, Nito.AsyncEx.AsyncContext will execute on a threadpool thread; to avoid deadlocks in the way described requires an instance of Nito.AsyncEx.AsyncContextThread, as outlined in the documentation.
There are two possible causes:
a bug in System.Net.Http.HttpClient in .NET Framework 4.6.2
the continuation priority issue outlined in the question, in which individual requests are not continued promptly enough and so time out.
As described in this answer and its comments, from a similar question, it may be possible to deal with the priority problem using a custom TaskScheduler, but throttling the number of concurrent requests using a semaphore is probably the best answer:
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Nito.AsyncEx;
public class MyClass
{
private static readonly AsyncContextThread asyncContextThread
= new AsyncContextThread();
private static readonly HttpClient httpClient = new HttpClient();
private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(10);
public HttpRequestMessage Request { get; set; }
public HttpResponseMessage Response { get; private set; }
private async Task GetResponseAsync()
{
await semaphore.WaitAsync();
try
{
Response = await httpClient.SendAsync(Request);
}
finally
{
semaphore.Release();
}
}
public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
{
Task.WaitAll(enumerable.Select(c =>
asyncContextThread.Factory.Run(() =>
c.GetResponseAsync())).ToArray());
}
}
Edited to use AsyncContextThread for executing async code on non-threadpool thread, as intended. AsyncContext does not do this on its own.
The following situation is given:
A new job is sent to an API via Post Request. This API returns a JobID and the HTTP ResponseCode 202.
This JobID is then used to request a status endpoint. If the end point has a "Finished" property set in the response body, you can continue with step 3.
The results are queried via a result endpoint using the JobID and can be processed.
My question is how I can solve this elegantly and cleanly. Are there perhaps already ready-to-use libraries that implement exactly this functionality? I could not find such functionality for RestSharp or another HttpClient.
The current solution looks like this:
async Task<string> PostNewJob()
{
var restClient = new RestClient("https://baseUrl/");
var restRequest = new RestRequest("jobs");
//add headers
var response = await restClient.ExecutePostTaskAsync(restRequest);
string jobId = JsonConvert.DeserializeObject<string>(response.Content);
return jobId;
}
async Task WaitTillJobIsReady(string jobId)
{
string jobStatus = string.Empty;
var request= new RestRequest(jobId) { Method = Method.GET };
do
{
if (!String.IsNullOrEmpty(jobStatus))
Thread.Sleep(5000); //wait for next status update
var response = await restClient.ExecuteGetTaskAsync(request, CancellationToken.None);
jobStatus = JsonConvert.DeserializeObject<string>(response.Content);
} while (jobStatus != "finished");
}
async Task<List<dynamic>> GetJobResponse(string jobID)
{
var restClient = new RestClient(@"Url/bulk/" + jobID);
var restRequest = new RestRequest(){Method = Method.GET};
var response = await restClient.ExecuteGetTaskAsync(restRequest, CancellationToken.None);
dynamic downloadResponse = JsonConvert.DeserializeObject(response.Content);
var responseResult = new List<dynamic>() { downloadResponse?.ToList() };
return responseResult;
}
async Task Main()
{
var jobId = await PostNewJob();
// await instead of .Wait(), so the calling thread is not blocked
await WaitTillJobIsReady(jobId);
var responseResult = await GetJobResponse(jobId);
//handle result
}
As @Paulo Morgado said, I should not use Thread.Sleep / Task.Delay in production code. But in my opinion I have to use it in the method WaitTillJobIsReady()? Otherwise I would overwhelm the API with GET requests in the loop.
What is the best practice for this type of problem?
Long Polling
There are multiple ways you can handle this type of problem, but as others have already pointed out no library such as RestSharp currently has this built in. In my opinion, the preferred way of overcoming this would be to modify the API to support some type of long-polling like Nikita suggested. This is where:
The server holds the request open until new data is available. Once
available, the server responds and sends the new information. When the
client receives the new information, it immediately sends another
request, and the operation is repeated. This effectively emulates a
server push feature.
Using a scheduler
Unfortunately this isn't always possible. Another, more elegant solution would be to create a service that checks the status, and then use a scheduler such as Quartz.NET or Hangfire to run the service at recurring intervals, such as every 500 ms to 3 s, until it is successful. Once it gets back the "Finished" property, you can mark the task as complete to stop the process from continuing to poll. This would arguably be better than your current solution and would offer much more control and feedback over what's going on.
Using Timers
Instead of using Thread.Sleep, a better choice would be to use a Timer. This lets you call a delegate repeatedly at specified intervals, which seems to be what you want to do here.
Below is an example usage of a timer that will run every 2 seconds until it hits 10 runs. (Taken from the Microsoft documentation)
using System;
using System.Threading;
using System.Threading.Tasks;
class Program
{
private static Timer timer;
static void Main(string[] args)
{
var timerState = new TimerState { Counter = 0 };
timer = new Timer(
callback: new TimerCallback(TimerTask),
state: timerState,
dueTime: 1000,
period: 2000);
while (timerState.Counter <= 10)
{
Task.Delay(1000).Wait();
}
timer.Dispose();
Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff}: done.");
}
private static void TimerTask(object timerState)
{
Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff}: starting a new callback.");
var state = timerState as TimerState;
Interlocked.Increment(ref state.Counter);
}
class TimerState
{
public int Counter;
}
}
Why you don't want to use Thread.Sleep
The reason you don't want to use Thread.Sleep for operations that should run on a recurring schedule is that Thread.Sleep relinquishes control, and when the thread regains control is ultimately not up to the thread. It is simply saying that it wants to relinquish control of its remaining time for at least x milliseconds, but in reality it could take much longer to regain it.
Per the Microsoft documentation:
The system clock ticks at a specific rate called the clock resolution.
The actual timeout might not be exactly the specified timeout, because
the specified timeout will be adjusted to coincide with clock ticks.
For more information on clock resolution and the waiting time, see the
Sleep function from the Windows system APIs.
Peter Ritchie actually wrote an entire blog post on why you shouldn't use Thread.Sleep.
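As a rough sketch of what a non-blocking version of the polling loop could look like, the status check can be awaited with Task.Delay between attempts, which waits without holding a thread. The names PollUntilAsync and checkStatus, as well as the interval and the attempt limit, are assumptions made for illustration; checkStatus stands in for the HTTP status request and is not part of RestSharp.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Polling
{
    // Calls checkStatus until it returns the expected value, waiting
    // asynchronously between attempts. Returns the number of status calls made.
    // Hypothetical helper: checkStatus stands in for the HTTP status request.
    public static async Task<int> PollUntilAsync(
        Func<Task<string>> checkStatus,
        string expected,
        TimeSpan interval,
        int maxAttempts,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string status = await checkStatus().ConfigureAwait(false);
            if (string.Equals(status, expected, StringComparison.OrdinalIgnoreCase))
                return attempt;

            // Task.Delay frees the thread while waiting; Thread.Sleep would block it.
            if (attempt < maxAttempts)
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }

        throw new TimeoutException(
            "Status '" + expected + "' was not reached after " + maxAttempts + " attempts.");
    }
}
```

The attempt limit and the cancellation token are there so that a job that never finishes does not keep the loop polling forever.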
EndNote
Overall, I would say your current approach has the right idea about how this should be handled; however, you may want to 'future proof' it by refactoring it to use one of the methods mentioned above.
I need to use proxies to download a forum. The problem with my code is that it takes only 10% of my internet bandwidth. Also I have read that I need to use a single HttpClient instance, but with multiple proxies I don't know how to do it. Changing MaxDegreeOfParallelism doesn't change anything.
public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
this IEnumerable<Url> urls, FetchContext context)
{
var fetchBlcock = new TransformBlock<Url, IFetchResult>(
transform: url => url.FetchAsync(context),
dataflowBlockOptions: new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 128
}
);
foreach(var url in urls)
fetchBlcock.Post(url);
fetchBlcock.Complete();
var result = fetchBlcock.ToAsyncEnumerable();
return result;
}
Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.
public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
var httpClient = context.ProxyPool.Rent();
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay,
context.isReloadWithCookie);
context.ProxyPool.Return(httpClient);
return result;
}
public HttpClient Rent()
{
lock(_lockObject)
{
if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
{
var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
}
return _proxiesQueue.Dequeue();
}
}
I am a novice at software developing, but the task of downloading using hundreds or thousands of proxies asynchronously looks like a trivial task that many should have been faced with and found a correct way to do it. So far I was unable to find any solutions to my problem on the internet. Any thoughts of how to achieve maximum download speed?
Let's take a look at what happens here:
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);
You are actually awaiting before you continue with the next item. That's why this is asynchronous programming and not parallel programming. From the Microsoft docs on async:
The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic.
In essence, it frees the calling thread to do other stuff but the original calling code is suspended from executing, until the IO operation is done.
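To make the difference concrete, here is a small sketch that contrasts awaiting each operation in turn with starting all operations first and awaiting them together. FakeFetchAsync is a hypothetical stand-in for an I/O call (it only delays and records how many calls are in flight); none of these names come from the question's code.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class AwaitDemo
{
    private static readonly object Gate = new object();
    private static int inFlight;
    private static int peak;

    // Hypothetical stand-in for an I/O-bound call; records peak concurrency.
    private static async Task FakeFetchAsync()
    {
        lock (Gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
        await Task.Delay(200);
        lock (Gate) { inFlight--; }
    }

    private static void Reset()
    {
        lock (Gate) { inFlight = 0; peak = 0; }
    }

    // Each call is awaited before the next one starts: one operation at a time.
    public static async Task<int> PeakWhenAwaitedOneByOneAsync(int count)
    {
        Reset();
        for (int i = 0; i < count; i++)
            await FakeFetchAsync();
        lock (Gate) { return peak; }
    }

    // All calls are started first and awaited together: they overlap.
    public static async Task<int> PeakWhenStartedTogetherAsync(int count)
    {
        Reset();
        var tasks = Enumerable.Range(0, count).Select(_ => FakeFetchAsync()).ToList();
        await Task.WhenAll(tasks);
        lock (Gate) { return peak; }
    }
}
```

In the first method the awaiting code is suspended until each operation completes, so nothing overlaps. In the second, the operations are already running before the first await, which is what gives concurrency without extra threads.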
Now to your problem:
You can either use this excellent solution here: foreach async
You can use the Parallel library to execute your code in different threads.
Something like the following, using Parallel.For, for example:
Parallel.For(0, urls.Count, index =>
{
    fetchBlcock.Post(urls[index]);
});
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend, what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
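SemaphoreSlim sem = new SemaphoreSlim(4, 4);
foreach (var service in RunData.Demand)
{
    // Wait asynchronously until one of the 4 slots is free.
    await sem.WaitAsync();
    var serviceCopy = service; // per-iteration copy captured by the lambda
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(serviceCopy);
        // do your other stuff here with the result of QueryAvailability
    });
    // Free the slot when the task finishes, whether it succeeded or failed.
    t.ContinueWith(_ => sem.Release());
}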
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (or WaitAsync), which subtracts one from the count. Calling Release adds one to the count.
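The counting behaviour can be checked directly through SemaphoreSlim.CurrentCount. The following is only an illustrative sketch; the class and method names are made up for this example.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreCountDemo
{
    // Returns the counts observed initially, after two entries,
    // and after one release.
    public static async Task<int[]> ObserveCountsAsync()
    {
        var sem = new SemaphoreSlim(4, 4); // 4 free slots, at most 4
        int initial = sem.CurrentCount;

        await sem.WaitAsync();             // takes one slot
        await sem.WaitAsync();             // takes another
        int afterTwoWaits = sem.CurrentCount;

        sem.Release();                     // gives one slot back
        int afterOneRelease = sem.CurrentCount;

        sem.Release();                     // leave the semaphore as we found it
        return new[] { initial, afterTwoWaits, afterOneRelease };
    }
}
```

When the count is zero, WaitAsync returns a task that completes only after someone calls Release, which is what holds back the fifth request in the loop.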
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
In my class I have a download function. Now in order to not allow a too high number of concurrent downloads, I would like to block this function until a "download-spot" is free ;)
void Download(Uri uri)
{
currentDownloads++;
if (currentDownloads > MAX_DOWNLOADS)
{
//wait here
}
DoActualDownload(uri); // blocks long time
currentDownloads--;
}
Is there a ready made programming pattern for this in C# / .NET?
Edit: unfortunately I can't use features from .NET 4.5, only .NET 4.0.
Maybe this:
var parallelOptions = new ParallelOptions
{
MaxDegreeOfParallelism = 3
};
Parallel.ForEach(downloadUri, parallelOptions, (uri, state, index) =>
{
YourDownLoad(uri);
});
You should use Semaphore for this concurrency problem, see more in the documentation:
https://msdn.microsoft.com/en-us/library/system.threading.semaphore(v=vs.110).aspx
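A minimal sketch of that approach for a blocking Download method, using only the System.Threading.Semaphore type that is available in .NET 4.0. The ThrottledDownloader class and the injected doActualDownload delegate are assumptions made for illustration; the delegate stands in for the DoActualDownload method from the question.

```csharp
using System;
using System.Threading;

public class ThrottledDownloader
{
    private readonly Semaphore slots;
    private readonly Action<Uri> doActualDownload;

    // maxDownloads is the number of "download spots" available at once.
    public ThrottledDownloader(int maxDownloads, Action<Uri> doActualDownload)
    {
        this.slots = new Semaphore(maxDownloads, maxDownloads);
        this.doActualDownload = doActualDownload;
    }

    // Blocks the calling thread until a spot is free, then downloads.
    public void Download(Uri uri)
    {
        slots.WaitOne();
        try
        {
            doActualDownload(uri); // blocks for a long time
        }
        finally
        {
            // Always give the spot back, even if the download throws.
            slots.Release();
        }
    }
}
```

Compared with the counter in the question, the semaphore does both the counting and the waiting, so there is no window in which two threads read the counter at the same time.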
For such cases, I create my own Queue<MyQuery> in a custom class such as QueryManager, with a few methods:
Each new query is enqueued in Queue<MyQuery> queries.
After each "enqueue" AND in each query answer, I call checkIfQueryCouldBeSent().
The checkIfQueryCouldBeSent() method checks your conditions: the number of concurrent queries, and so on. In your case you launch a new query if the global counter is less than 5, and you increment the counter.
Decrement the counter in the query answer.
This works only if all your queries are asynchronous.
You have to store a callback in the MyQuery class, and call it when the query is over.
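The steps above could be sketched roughly as follows. This is an assumption-laden illustration, not the answerer's actual class: the MyQuery type is replaced by a delegate that starts the query and receives a "done" callback, and the names Enqueue and OnQueryAnswered are invented here.

```csharp
using System;
using System.Collections.Generic;

public class QueryManager
{
    private readonly object gate = new object();
    private readonly Queue<Action<Action>> queries = new Queue<Action<Action>>();
    private readonly int maxConcurrent;
    private int running; // the "global counter" of queries in flight

    public QueryManager(int maxConcurrent)
    {
        this.maxConcurrent = maxConcurrent;
    }

    // startQuery must begin an asynchronous query and invoke the supplied
    // callback exactly once when the answer arrives.
    public void Enqueue(Action<Action> startQuery)
    {
        lock (gate) { queries.Enqueue(startQuery); }
        CheckIfQueryCouldBeSent();
    }

    private void CheckIfQueryCouldBeSent()
    {
        while (true)
        {
            Action<Action> next;
            lock (gate)
            {
                if (running >= maxConcurrent || queries.Count == 0)
                    return;
                running++;
                next = queries.Dequeue();
            }
            // Start the query outside the lock so a slow start cannot block others.
            next(OnQueryAnswered);
        }
    }

    private void OnQueryAnswered()
    {
        lock (gate) { running--; }
        CheckIfQueryCouldBeSent();
    }
}
```

If a query completes synchronously inside startQuery, the callback re-enters CheckIfQueryCouldBeSent recursively, which is one reason the approach fits only truly asynchronous queries.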
You're doing async IO-bound work, so there's no need to use multiple threads with a call such as Parallel.ForEach.
You can simply use the naturally async APIs exposed in the BCL, such as the ones that make HTTP calls using HttpClient. Then you can throttle your connections using SemaphoreSlim and its WaitAsync method, which waits asynchronously:
private readonly SemaphoreSlim semaphoreSlim = new SemaphoreSlim(3);
public async Task DownloadAsync(Uri uri)
{
await semaphoreSlim.WaitAsync();
try
{
string result = await DoActualDownloadAsync(uri);
}
finally
{
semaphoreSlim.Release();
}
}
And your DoActualDownloadAsync will use HttpClient to do its work. Something along the lines of:
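// Share one HttpClient; creating a new instance per call can exhaust sockets under load.
private static readonly HttpClient httpClient = new HttpClient();
public Task<string> DoActualDownloadAsync(Uri uri)
{
return httpClient.GetStringAsync(uri);
}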
I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes a synchronous and an asynchronous function that return a result set. I am a bit confused, so I have a couple of initial questions, and then I will share how far I got with each. I am using some logging, the Concurrency Visualizer, and Fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks what would be the limited resources that would potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new Factory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
return myClientProxy.GetResponsesAsync( localrequest );
// log some more stuff
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs, but according to Fiddler it just tries to do all of the requests at once. It could be the scheduler, but I trust that more than I do the code above.
I have also tried to implement it without the Unwrap call, using an async/await delegate instead, and it does the same thing. I have also tried referencing .Result, and that seems to run the requests sequentially. Using the synchronous service function call with the scheduler/factory, it only gets up to about 20 simultaneous requests per client.
1) Yes. It will allow your application to scale better by using fewer threads to accomplish more.
2) Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O) you have a blocked thread waiting for the operation to complete. You could however be using this thread in the meantime to execute CPU-bound operations.
The simplest way to limit the number of concurrent requests is to use a SemaphoreSlim, which lets you wait asynchronously to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of whether you call the WCF service asynchronously or synchronously, you will be bound by the WCF serviceThrottling limits. You should look at these settings and possibly raise them (if they are set to low values for some reason). In .NET 4 the defaults are pretty good; in older versions of the .NET Framework the defaults were much more conservative.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
1.)Yes.
2.)Yes.
If you want to control the number of simultaneous requests, you can try using Stephen Toub's ForEachAsync method. It allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
var myClientProxy = new ClientProxy();
var responses = new List<Response>();
// Max 10 concurrent requests
Requests.ForEachAsync<Request>(10, async (r) =>
{
var response = await myClientProxy.GetResponsesAsync(r);
// List<T> is not thread-safe and the partitions run concurrently
lock (responses)
{
responses.Add(response);
}
}).Wait();
}