Limit number of parallel running methods - c#

In my class I have a download function. Now in order to not allow a too high number of concurrent downloads, I would like to block this function until a "download-spot" is free ;)
void Download(Uri uri)
{
currentDownloads++;
if (currentDownloads > MAX_DOWNLOADS)
{
//wait here
}
DoActualDownload(uri); // blocks long time
currentDownloads--;
}
Is there a ready made programming pattern for this in C# / .NET?
edit: unfortunatelyt i cant use features from .net4.5 but only .net4.0

May be this
var parallelOptions = new ParallelOptions
{
MaxDegreeOfParallelism = 3
};
Parallel.ForEach(downloadUri, parallelOptions, (uri, state, index) =>
{
YourDownLoad(uri);
});

You should use Semaphore for this concurrency problem, see more in the documentation:
https://msdn.microsoft.com/en-us/library/system.threading.semaphore(v=vs.110).aspx

For such cases, I create an own Queue<MyQuery> in a custom class like QueryManager, with some methods :
Each new query is enqueued in Queue<MyQuery> queries
After each "enqueue" AND in each query answer, I call checkIfQueryCouldBeSent()
The checkIfQueryCouldBeSent() method checks your conditions : number of concomitant queries, and so on. In your case you accept to launch a new query if global counter is less than 5. And you increment the counter
Decrement the counter in query answer
It works only if all your queries are asynchronous.
You have to store Callback in MyQuery class, and call it when query is over.

You're doing async IO bound work, there's no need to be using multiple threads with a call such as Parallel.ForEach.
You can simply use naturally async API's exposed in the BCL, such ones that make HTTP calls using HttpClient. Then, you can throttle your connections using SemaphoreSlim and it's WaitAsync method which asynchronously waits:
private readonly SemaphoreSlim semaphoreSlim = new SemaphoreSlim(3);
public async Task DownloadAsync(Uri uri)
{
await semaphoreSlim.WaitAsync();
try
{
string result = await DoActualDownloadAsync(uri);
}
finally
{
semaphoreSlim.Release();
}
}
And your DoActualyDownloadAsync will use HttpClient to do it's work. Something along the lines of:
public Task<string> DoActualDownloadAsync(Uri uri)
{
var httpClient = new HttpClient();
return httpClient.GetStringAsync(uri);
}

Related

Parallel HttpClient requests timing out due to async problem?

I'm running a method synchronously in parallel using System.Threading.Tasks.Parallel.ForEach. At the end of the method, it needs to make a few dozen HTTP POST requests, which do not depend on each other. Since I'm on .NET Framework 4.6.2, System.Net.Http.HttpClient is exclusively asynchronous, so I'm using Nito.AsyncEx.AsyncContext to avoid deadlocks, in the form:
public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
{
AsyncContext.Run(async () => await Task.WhenAll(enumerable.Select(async c =>
await getResultsFor(c).ConfigureAwait(false))));
}
The getResultsFor(MyClass c) method then creates an HttpRequestMessage and sends it using:
await httpClient.SendAsync(request);
The response is then parsed and the relevant fields are set on the instance of MyClass.
My understanding is that the synchronous thread will block at AsyncContext.Run(...), while a number of tasks are performed asynchronously by the single AsyncContextThread owned by AsyncContext. When they are all complete, the synchronous thread will unblock.
This works fine for a few hundred requests, but when it scales up to a few thousand over five minutes, some of the requests start returning HTTP 408 Request Timeout errors from the server. My logs indicate that these timeouts are happening at the peak load, when there are the most requests being sent, and the timeouts happen long after many of the other requests have been received back.
I think the problem is that the tasks are awaiting the server handshake inside HttpClient, but they are not continued in FIFO order, so by the time they are continued the handshake has expired. However, I can't think of any way to deal with this, short of using a System.Threading.SemaphoreSlim to enforce that only one task can await httpClient.SendAsync(...) at a time.
My application is very large, and converting it entirely to async is not viable.
This isn't something that can be done with wrapping the tasks before blocking. For starters, if the requests go through, you may end up nuking the server. Right now you're nuking the client. There's a 2 concurrent-request per domain limit in .NET Framework that can be relaxed, but if you set it too high you may end up nuking the server.
You can solve this by using DataFlow blocks in a pipeline to execute requests with a fixed degree of parallelism and then parse them. Let's say you have a class called MyPayload with lots of Items in a property:
ServicePointManager.DefaultConnectionLimit = 1000;
var options=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var downloader=new TransformBlock<string,MyPayload>(async url=>{
var json=await _client.GetStringAsync(url);
var data=JsonConvert.DeserializeObject<MyPayload>(json);
return data;
},options);
var importer=new ActionBlock<MyPayload>(async data=>
{
var items=data.Items;
using(var connection=new SqlConnection(connectionString))
using(var bcp=new SqlBulkCopy(connection))
using(var reader=ObjectReader.Create(items))
{
bcp.DestinationTableName = destination;
connection.Open();
await bcp.WriteToServerAsync(reader);
}
});
downloader.LinkTo(importer,new DataflowLinkOptions {
PropagateCompletion=true
});
I'm using FastMember's ObjectReader to wrap the items in a DbDataReader that can be used to bulk insert the records to a database.
Once you have this pipeline, you can start posting URLs to the head block, downloader :
foreach(var url in hugeList)
{
downloader.Post(url);
}
downloader.Complete();
Once all URLs are posted, you tell donwloader to complete and await for the last block in the pipeline to finish with :
await importer.Completion;
Firstly, Nito.AsyncEx.AsyncContext will execute on a threadpool thread; to avoid deadlocks in the way described requires an instance of Nito.AsyncEx.AsyncContextThread, as outlined in the documentation.
There are two possible causes:
a bug in System.Net.Http.HttpClient in .NET Framework 4.6.2
the continuation priority issue outlined in the question, in which individual requests are not continued promptly enough and so time out.
As described at this answer and its comments, from a similar question, it may be possible to deal with the priority problem using a custom TaskScheduler, but throttling the number of concurrent requests using a semaphore is probably the best answer:
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Nito.AsyncEx;
public class MyClass
{
private static readonly AsyncContextThread asyncContextThread
= new AsyncContextThread();
private static readonly HttpClient httpClient = new HttpClient();
private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(10);
public HttpRequestMessage Request { get; set; }
public HttpResponseMessage Response { get; private set; }
private async Task GetResponseAsync()
{
await semaphore.WaitAsync();
try
{
Response = await httpClient.SendAsync(Request);
}
finally
{
semaphore.Release();
}
}
public static void MakeMultipleRequests(IEnumerable<MyClass> enumerable)
{
Task.WaitAll(enumerable.Select(c =>
asyncContextThread.Factory.Run(() =>
c.GetResponseAsync())).ToArray());
}
}
Edited to use AsyncContextThread for executing async code on non-threadpool thread, as intended. AsyncContext does not do this on its own.

Multiple HttpClients with proxies, trying to achieve maximum download speed

I need to use proxies to download a forum. The problem with my code is that it takes only 10% of my internet bandwidth. Also I have read that I need to use a single HttpClient instance, but with multiple proxies I don't know how to do it. Changing MaxDegreeOfParallelism doesn't change anything.
public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
this IEnumerable<Url> urls, FetchContext context)
{
var fetchBlcock = new TransformBlock<Url, IFetchResult>(
transform: url => url.FetchAsync(context),
dataflowBlockOptions: new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 128
}
);
foreach(var url in urls)
fetchBlcock.Post(url);
fetchBlcock.Complete();
var result = fetchBlcock.ToAsyncEnumerable();
return result;
}
Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.
public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
var httpClient = context.ProxyPool.Rent();
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay,
context.isReloadWithCookie);
context.ProxyPool.Return(httpClient);
return result;
}
public HttpClient Rent()
{
lock(_lockObject)
{
if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
{
var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
}
return _proxiesQueue.Dequeue();
}
}
I am a novice at software developing, but the task of downloading using hundreds or thousands of proxies asynchronously looks like a trivial task that many should have been faced with and found a correct way to do it. So far I was unable to find any solutions to my problem on the internet. Any thoughts of how to achieve maximum download speed?
Let's take a look at what happens here:
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);
You are actually awaiting before you continue with the next item. That's why it is asynchronous and not parallel programming. async in Microsoft docs
The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic.
In essence, it frees the calling thread to do other stuff but the original calling code is suspended from executing, until the IO operation is done.
Now to your problem:
You can either use this excellent solution here: foreach async
You can use the Parallel library to execute your code in different threads.
Something like the following from Parallel for example
Parallel.For(0, urls.Count,
index => fetchBlcock.Post(urls[index])
});

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

Running Task<T> on a custom scheduler

I am creating a generic helper class that will help prioritise requests made to an API whilst restricting parallelisation at which they occur.
Consider the key method of the application below;
public IQueuedTaskHandle<TResponse> InvokeRequest<TResponse>(Func<TClient, Task<TResponse>> invocation, QueuedClientPriority priority, CancellationToken ct) where TResponse : IServiceResponse
{
var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
_logger.Debug("Queueing task.");
var taskToQueue = Task.Factory.StartNew(async () =>
{
_logger.Debug("Starting request {0}", Task.CurrentId);
return await invocation(_client);
}, cts.Token, TaskCreationOptions.None, _schedulers[priority]).Unwrap();
taskToQueue.ContinueWith(task => _logger.Debug("Finished task {0}", task.Id), cts.Token);
return new EcosystemQueuedTaskHandle<TResponse>(cts, priority, taskToQueue);
}
Without going into too many details, I want to invoke tasks returned by Task<TResponse>> invocation when their turn in the queue arises. I am using a collection of queues constructed using QueuedTaskScheduler indexed by a unique enumeration;
_queuedTaskScheduler = new QueuedTaskScheduler(TaskScheduler.Default, 3);
_schedulers = new Dictionary<QueuedClientPriority, TaskScheduler>();
//Enumerate the priorities
foreach (var priority in Enum.GetValues(typeof(QueuedClientPriority)))
{
_schedulers.Add((QueuedClientPriority)priority, _queuedTaskScheduler.ActivateNewQueue((int)priority));
}
However, with little success I can't get the tasks to execute in a limited parallelised environment, leading to 100 API requests being constructed, fired, and completed in one big batch. I can tell this using a Fiddler session;
I have read some interesting articles and SO posts (here, here and here) that I thought would detail how to go about this, but so far I have not been able to figure it out. From what I understand, the async nature of the lambda is working in a continuation structure as designed, which is marking the generated task as complete, basically "insta-completing" it. This means that whilst the queues are working fine, runing a generated Task<T> on a custom scheduler is turning out to be the problem.
This means that whilst the queues are working fine, runing a generated Task on a custom scheduler is turning out to be the problem.
Correct. One way to think about it[1] is that an async method is split into several tasks - it's broken up at each await point. Each one of these "sub-tasks" are then run on the task scheduler. So, the async method will run entirely on the task scheduler (assuming you don't use ConfigureAwait(false)), but at each await it will leave the task scheduler, and then re-enter that task scheduler after the await completes.
So, if you want to coordinate asynchronous work at a higher level, you need to take a different approach. It's possible to write the code yourself for this, but it can get messy. I recommend you first try ActionBlock<T> from the TPL Dataflow library, passing your custom task scheduler to its ExecutionDataflowBlockOptions.
[1] This is a simplification. The state machine will avoid creating actual task objects unless necessary (in this case, they are necessary because they're being scheduled to a task scheduler). Also, only await points where the awaitable isn't complete actually cause a "method split".
Stephen Cleary's answer explains well why you can't use TaskScheduler for this purpose and how you can use ActionBlock to limit the degree of parallelism. But if you want to add priorities to that, I think you'll have to do that manually. Your approach of using a Dictionary of queues is reasonable, a simple implementation (with no support for cancellation or completion) of that could look something like this:
class Scheduler
{
private static readonly Priority[] Priorities =
(Priority[])Enum.GetValues(typeof(Priority));
private readonly IReadOnlyDictionary<Priority, ConcurrentQueue<Func<Task>>> queues;
private readonly ActionBlock<Func<Task>> executor;
private readonly SemaphoreSlim semaphore;
public Scheduler(int degreeOfParallelism)
{
queues = Priorities.ToDictionary(
priority => priority, _ => new ConcurrentQueue<Func<Task>>());
executor = new ActionBlock<Func<Task>>(
invocation => invocation(),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = degreeOfParallelism,
BoundedCapacity = degreeOfParallelism
});
semaphore = new SemaphoreSlim(0);
Task.Run(Watch);
}
private async Task Watch()
{
while (true)
{
await semaphore.WaitAsync();
// find item with highest priority and send it for execution
foreach (var priority in Priorities.Reverse())
{
Func<Task> invocation;
if (queues[priority].TryDequeue(out invocation))
{
await executor.SendAsync(invocation);
}
}
}
}
public void Invoke(Func<Task> invocation, Priority priority)
{
queues[priority].Enqueue(invocation);
semaphore.Release(1);
}
}

Task Scheduler with WCF Service Reference async function

I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes an synchronous and an asynchronous function that returns a result set. I am a bit confused, and I have a couple of initial questions, and then I will share how far I got in each. I am using some logging, concurrency visualizer, and fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks what would be the limited resources that would potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new Factory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
return client.GetResponsesAsync( localTransaction.Request );
// log some more stuff
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs but according to fiddler it just tries to do all of the requests at once. It could be the scheduler but I trust that more then I do the above.
I have also tried to implement it without the unwrap command and instead using an async await delegate and it does the same thing. I have also tried referencing the .result and that seems to do it sequentially. Using the non synchronous service function call with the scheduler/factory it only gets up to about 20 simultaneous requests at the same time per client.
Yes. It will allow your application to scale better by using fewer threads to accomplish more.
Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O) you have a blocked thread waiting for the operation to complete. You could however be using this thread in the meantime to execute CPU bound operations.
The simplest way to limit the amount of concurrent requests is to use a SemaphoreSlim which allows to asynchronously wait to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of how you are calling the WCF service whether it is an Async call or a Synchronous one you will be bound by the WCF serviceThrottling limits. You should look at these settings and possible adjust them higher (if you have them set to low values for some reason), in .NET4 the defaults are pretty good, however In older versions of the .NET framework, these defaults were much more conservative than .NET4.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
1.)Yes.
2.)Yes.
If you want to control the number of simultaneous requests you can try using Stephen Toub's ForEachAsync method. it allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
var myClientProxy = new ClientProxy();
var responses = new List<Response>();
// Max 10 concurrent requests
Requests.ForEachAsync<Request>(10, async (r) =>
{
var response = await client.GetResponsesAsync( localTransaction.Request );
responses.Add(response);
}).Wait();
}

Categories

Resources