I have a List<string> containing 50K to 100K words
I would like to iterate through it in a parallel and asynchronous fashion
As an example, I could use
while (true)
{
Parallel.ForEach(words, new ParallelOptions { MaxDegreeOfParallelism = 100 }, ...)
}
But problems are:
Parallel.ForEach isn't asynchronous
When we arrive at the end of the list, we have to wait for each thread to end before the while (true) statement continues
Which means that there aren't always 100 threads running, which is what I want
How would I be able to achieve this?
Please let me know if this is confusing, or if I'm bad at explaining things.
Here is a totally contrived async friendly TPL DataFlow example of how you could achieve what you ask.
It's suitable for an async IO workload
It's cancellable
It limits max parallelism
It has bounded capacity so there is always 100 jobs available
It's infinite
Given
private static CancellationTokenSource _cs;
private static CancellationToken _token;
private static ActionBlock<string> _block;
private static async Task MethodAsync(string something)
{
// Your async workload
}
public static async Task EndlessRunner(string[] someArray)
{
try
{
var index = 0;
while (!_token.IsCancellationRequested)
{
await _block.SendAsync(someArray[index],_token);
if (++index >= someArray.Length) index = 0;
}
}
catch (OperationCanceledException)
{
Console.WriteLine("Cancelled");
}
}
Example
private static async Task Main()
{
_cs = new CancellationTokenSource();
_token = _cs.Token;
_block = new ActionBlock<string>(
MethodAsync,
new ExecutionDataflowBlockOptions()
{
EnsureOrdered = false,
MaxDegreeOfParallelism = 100,
BoundedCapacity = 100,
CancellationToken = _cs.Token,
SingleProducerConstrained = true
});
var someList = Enumerable
.Range(0,5000)
.Select(I => $"something {I}")
.ToArray();
Task.Run(() => EndlessRunner(someList));
Console.ReadKey();
_cs.Cancel();
_block.Complete();
await _block.Completion;
}
Related
I'm using Polly to make parallel API calls. The server however can't process more than 25 calls per second and so I'm wondering if there is a way to add a 1s delay after each batch of 25 calls?
var policy = Policy
.Handle<HttpRequestException>()
.RetryAsync(3);
foreach (var mediaItem in uploadedMedia)
{
var mediaRequest = new HttpRequestMessage { *** }
async Task<string> func()
{
var response = await client.SendAsync(mediaRequest);
return await response.Content.ReadAsStringAsync();
}
tasks.Add(policy.ExecuteAsync(() => func()));
}
await Task.WhenAll(tasks);
I added a count as per the suggestion below but doesn't seem to work
foreach (var mediaItem in uploadedMedia.Items)
{
var mediaRequest = new HttpRequestMessage
{
RequestUri = new Uri($"https://u48ydao1w4.execute-api.ap-southeast-2.amazonaws.com/test/downloads/thumbnails/{mediaItem.filename.S}"),
Method = HttpMethod.Get,
Headers = {
{ "id-token", id_Token },
{ "access-token", access_Token }
}
};
async Task<string> func()
{
if (count == 24)
{
Thread.Sleep(1000);
count = 0;
}
var response = await client.SendAsync(mediaRequest);
count++;
return await response.Content.ReadAsStringAsync();
}
tasks.Add(policy.ExecuteAsync(() => func()));
}
await Task.WhenAll(tasks);
foreach (var t in tasks)
{
var postResponse = await t;
urls.Add(postResponse);
}
Just scanned quickly over the code and perhaps another similar solution would be to add a Thread.Sleep(calculatedDelay):
foreach (var mediaItem in uploadedMedia.Items)
{
Thread.Sleep(calculatedDelay);
var mediaRequest = new HttpRequestMessage
Where calculatedDelay is some value based on 1000/25.
However I feel you would need a better solution than putting in a delay of some specified value as you cannot be sure of overhead delays issues in transferring data. Also you dont indicate what happens when you reach the 25+ limit, how does the server respond.. do you get an error or is it handled more elegantly? Here is perhaps the area where you can find a more reliable solution?
There are many ways to do this, however it's fairly easy to write a simple thread safe reusable async rate limiter.
The advantages with the async approach, it doesn't block thread pool threads, it's fairly efficient, and would work well in existing async workflows and pipelines like TPL Dataflow, and Reactive Extensions.
Example
// 3 calls every 3 seconds as an example
var rateLimiter = new RateLimiter(3, TimeSpan.FromSeconds(3));
// create some work
var task1 = Task.Run(async () =>
{
for (var i = 0; i < 5; i++)
{
await rateLimiter.WaitAsync();
Console.WriteLine($"{Thread.CurrentThread.ManagedThreadId} : {DateTime.Now}");
}
}
);
var task2 = Task.Run(async () =>
{
for (var i = 0; i < 5; i++)
{
await rateLimiter.WaitAsync();
Console.WriteLine($"{Thread.CurrentThread.ManagedThreadId} : {DateTime.Now}");
}
}
);
await Task.WhenAll(task1, task2);
Output
4 : 10/25/2020 05:16:15
5 : 10/25/2020 05:16:15
4 : 10/25/2020 05:16:15
5 : 10/25/2020 05:16:18
5 : 10/25/2020 05:16:18
5 : 10/25/2020 05:16:18
5 : 10/25/2020 05:16:21
5 : 10/25/2020 05:16:21
5 : 10/25/2020 05:16:21
4 : 10/25/2020 05:16:24
Full Demo Here
Usage
private RateLimiter _rateLimiter = new RateLimiter(25 , TimeSpan.FromSeconds(1));
...
async Task<string> func()
{
await _rateLimiter.WaitAsync();
var response = await client.SendAsync(mediaRequest);
return await response.Content.ReadAsStringAsync();
}
Class
public class RateLimiter
{
private readonly CancellationToken _cancellationToken;
private readonly TimeSpan _timeSpan;
private bool _isProcessing;
private readonly int _count;
private readonly Queue<DateTime> _completed = new Queue<DateTime>();
private readonly Queue<TaskCompletionSource<bool>> _waiting = new Queue<TaskCompletionSource<bool>>();
private readonly object _sync = new object();
public RateLimiter(int count, TimeSpan timeSpan, CancellationToken cancellationToken = default)
{
_count = count;
_timeSpan = timeSpan;
_cancellationToken = cancellationToken;
}
private void Cleanup()
{
// if the cancellation was request, we need to throw on all waiting items
while (_cancellationToken.IsCancellationRequested && _waiting.Any())
if (_waiting.TryDequeue(out var item))
item.TrySetCanceled();
_waiting.Clear();
_completed.Clear();
_isProcessing = false;
}
private async Task ProcessAsync()
{
try
{
while (true)
{
_cancellationToken.ThrowIfCancellationRequested();
var time = DateTime.Now - _timeSpan;
lock (_sync)
{
// remove anything out of date from the queue
while (_completed.Any() && _completed.Peek() < time)
_completed.TryDequeue(out _);
// signal waiting tasks to process
while (_completed.Count < _count && _waiting.Any())
{
if (_waiting.TryDequeue(out var item))
item.TrySetResult(true);
_completed.Enqueue(DateTime.Now);
}
if (!_waiting.Any() && !_completed.Any())
{
Cleanup();
break;
}
}
var delay = (_completed.Peek() - time) + TimeSpan.FromMilliseconds(20);
if (delay.Ticks > 0)
await Task.Delay(delay, _cancellationToken);
Console.WriteLine(delay);
}
}
catch (OperationCanceledException)
{
lock (_sync)
Cleanup();
}
}
public ValueTask WaitAsync()
{
// ReSharper disable once InconsistentlySynchronizedField
_cancellationToken.ThrowIfCancellationRequested();
lock (_sync)
{
try
{
if (_completed.Count < _count && !_waiting.Any())
{
_completed.Enqueue(DateTime.Now);
return new ValueTask();
}
var tcs = new TaskCompletionSource<bool>();
_waiting.Enqueue(tcs);
return new ValueTask(tcs.Task);
}
finally
{
if (!_isProcessing)
{
_isProcessing = true;
_ = ProcessAsync();
}
}
}
}
}
Note 1 : It would be optimal to use this with a max degree of parallelism.
Note 2 : Although I tested this, I only wrote it for this answer as a novel solution.
The Polly library currently lacks a rate-limiting policy (requests/time). Fortunately this functionality is relatively easy to implement using a SemaphoreSlim. The trick to make the rate-limiting happen is to configure the capacity of the semaphore equal to the dividend (requests), and delay the Release of the semaphore for a time span equal to the divisor (time), after acquiring the semaphore. This way the rate limit will be applied consistently to any possible time window.
Update: I realized that the Polly library is extensible, and allows to implement custom policies with custom functionality. So I'm scraping my original suggestion in favor of the custom RateLimitAsyncPolicy class below:
public class RateLimitAsyncPolicy : AsyncPolicy
{
private readonly SemaphoreSlim _semaphore;
private readonly TimeSpan _timeUnit;
public RateLimitAsyncPolicy(int maxOperationsPerTimeUnit, TimeSpan timeUnit)
{
// Arguments validation omitted
_semaphore = new SemaphoreSlim(maxOperationsPerTimeUnit);
_timeUnit = timeUnit;
}
protected async override Task<TResult> ImplementationAsync<TResult>(
Func<Context, CancellationToken, Task<TResult>> action,
Context context,
CancellationToken cancellationToken,
bool continueOnCapturedContext)
{
await _semaphore.WaitAsync(cancellationToken)
.ConfigureAwait(continueOnCapturedContext);
ScheduleSemaphoreRelease();
return await action(context, cancellationToken).ConfigureAwait(false);
}
private async void ScheduleSemaphoreRelease()
{
await Task.Delay(_timeUnit);
_semaphore.Release();
}
}
This policy ensures that no more than maxOperationsPerTimeUnit operations will be started during any time window of timeUnit size. The duration of the operations is not taken into account. In other words no restriction is imposed on how many operations can be running concurrently at any given moment. This restriction can be optionally imposed by the BulkheadAsync policy. Combining these two policies (the RateLimitAsyncPolicy and the BulkheadAsync) is possible, as shown in the example below:
var policy = Policy.WrapAsync
(
Policy
.Handle<HttpRequestException>()
.RetryAsync(retryCount: 3),
new RateLimitAsyncPolicy(
maxOperationsPerTimeUnit: 25, timeUnit: TimeSpan.FromSeconds(1)),
Policy.BulkheadAsync( // Optional
maxParallelization: 25, maxQueuingActions: Int32.MaxValue)
);
The order is important only for the RetryAsync policy, that must be placed first for a reason explained in the documentation:
BulkheadPolicy: Usually innermost unless wraps a final TimeoutPolicy. Certainly inside any WaitAndRetry. The Bulkhead intentionally limits the parallelization. You want that parallelization devoted to running the delegate, not occupied by waits for a retry.
Similarly the RateLimitPolicy must follow the Retry, so that each retry to be considered an independent operation, and to count towards the rate limit.
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
HttpRequestMessage MakeMessage(MediaItem mi) => new HttpRequestMessage
{
RequestUri = new Uri($"https://u48ydao1w4.execute-api.ap-southeast-2.amazonaws.com/test/downloads/thumbnails/{mi.filename}"),
Method = HttpMethod.Get,
Headers = {
{ "id-token", id_Token },
{ "access-token", access_Token }
}
};
var urls = await
uploadedMedia
.Items
.ToObservable()
.Buffer(24)
.Zip(Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1.0)), (mrs, _) => mrs)
.SelectMany(mrs => mrs.ToObservable().SelectMany(mr => Observable.FromAsync(() => client.SendAsync(MakeMessage(mr)))))
.SelectMany(x => Observable.FromAsync(() => x.Content.ReadAsStringAsync()))
.ToList();
I haven't been able to test it, but it should be fairly close.
I have this function which checks for proxy servers and currently it checks only a number of threads and waits for all to finish until the next set is starting. Is it possible to start a new thread as soon as one is finished from the maximum allowed?
for (int i = 0; i < listProxies.Count(); i+=nThreadsNum)
{
for (nCurrentThread = 0; nCurrentThread < nThreadsNum; nCurrentThread++)
{
if (nCurrentThread < nThreadsNum)
{
string strProxyIP = listProxies[i + nCurrentThread].sIPAddress;
int nPort = listProxies[i + nCurrentThread].nPort;
tasks.Add(Task.Factory.StartNew<ProxyAddress>(() => CheckProxyServer(strProxyIP, nPort, nCurrentThread)));
}
}
Task.WaitAll(tasks.ToArray());
foreach (var tsk in tasks)
{
ProxyAddress result = tsk.Result;
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
}
tasks.Clear();
}
This seems much more simple:
int numberProcessed = 0;
Parallel.ForEach(listProxies,
new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
(p)=> {
var result = CheckProxyServer(p.sIPAddress, s.nPort, Thread.CurrentThread.ManagedThreadId);
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
Interlocked.Increment(numberProcessed);
});
With slots:
var obj = new Object();
var slots = new List<int>();
Parallel.ForEach(listProxies,
new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
(p)=> {
int threadId = Thread.CurrentThread.ManagedThreadId;
int slot = slots.IndexOf(threadId);
if (slot == -1)
{
lock(obj)
{
slots.Add(threadId);
}
slot = slots.IndexOf(threadId);
}
var result = CheckProxyServer(p.sIPAddress, s.nPort, slot);
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
});
I took a few shortcuts there to guarantee thread safety. You don't have to do the normal check-lock-check dance because there will never be two threads attempting to add the same threadid to the list, so the second check will always fail and isn't needed. Secondly, for the same reason, I don't believe you need to ever lock around the outer IndexOf either. That makes this a very highly efficient concurrent routine that rarely locks (it should only lock nThreadsNum times) no matter how many items are in the enumerable.
Another solution is to use a SemaphoreSlim or the Producer-Consumer Pattern using a BlockinCollection<T>. Both solution support cancellation.
SemaphoreSlim
private async Task CheckProxyServerAsync(IEnumerable<object> proxies)
{
var tasks = new List<Task>();
int currentThreadNumber = 0;
int maxNumberOfThreads = 8;
using (semaphore = new SemaphoreSlim(maxNumberOfThreads, maxNumberOfThreads))
{
foreach (var proxy in proxies)
{
// Asynchronously wait until thread is available if thread limit reached
await semaphore.WaitAsync();
string proxyIP = proxy.IPAddress;
int port = proxy.Port;
tasks.Add(Task.Run(() => CheckProxyServer(proxyIP, port, Interlocked.Increment(ref currentThreadNumber)))
.ContinueWith(
(task) =>
{
ProxyAddress result = task.Result;
// Method call must be thread-safe!
UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);
Interlocked.Decrement(ref currentThreadNumber);
// Allow to start next thread if thread limit was reached
semaphore.Release();
},
TaskContinuationOptions.OnlyOnRanToCompletion));
}
// Asynchronously wait until all tasks are completed
// to prevent premature disposal of semaphore
await Task.WhenAll(tasks);
}
}
Producer-Consumer Pattern
// Uses a fixed number of same threads
private async Task CheckProxyServerAsync(IEnumerable<ProxyInfo> proxies)
{
var pipe = new BlockingCollection<ProxyInfo>();
int maxNumberOfThreads = 8;
var tasks = new List<Task>();
// Create all threads (count == maxNumberOfThreads)
for (int currentThreadNumber = 0; currentThreadNumber < maxNumberOfThreads; currentThreadNumber++)
{
tasks.Add(
Task.Run(() => ConsumeProxyInfo(pipe, currentThreadNumber)));
}
proxies.ToList().ForEach(pipe.Add);
pipe.CompleteAdding();
await Task.WhenAll(tasks);
}
private void ConsumeProxyInfo(BlockingCollection<ProxyInfo> proxiesPipe, int currentThreadNumber)
{
while (!proxiesPipe.IsCompleted)
{
if (proxiesPipe.TryTake(out ProxyInfo proxy))
{
int port = proxy.Port;
string proxyIP = proxy.IPAddress;
ProxyAddress result = CheckProxyServer(proxyIP, port, currentThreadNumber);
// Method call must be thread-safe!
UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);
}
}
}
If I'm understanding your question properly, this is actually fairly simple to do with await Task.WhenAny. Basically, you keep a collection of all of the running tasks. Once you reach a certain number of tasks running, you wait for one or more of your tasks to finish, and then you remove the tasks that were completed from your collection and continue to add more tasks.
Here's an example of what I mean below:
var tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
// I want my list of tasks to contain at most 5 tasks at once
if (tasks.Count == 5)
{
// Wait for at least one of the tasks to complete
await Task.WhenAny(tasks.ToArray());
// Remove all of the completed tasks from the list
tasks = tasks.Where(t => !t.IsCompleted).ToList();
}
// Add some task to the list
tasks.Add(Task.Factory.StartNew(async delegate ()
{
await Task.Delay(1000);
}));
}
I suggest changing your approach slightly. Instead of starting and stopping threads, put your proxy server data in a concurrent queue, one item for each proxy server. Then create a fixed number of threads (or async tasks) to work on the queue. This is more likely to provide smooth performance (you aren't starting and stopping threads over and over, which has overhead) and is a lot easier to code, in my opinion.
A simple example:
class ProxyChecker
{
private ConcurrentQueue<ProxyInfo> _masterQueue = new ConcurrentQueue<ProxyInfo>();
public ProxyChecker(IEnumerable<ProxyInfo> listProxies)
{
foreach (var proxy in listProxies)
{
_masterQueue.Enqueue(proxy);
}
}
public async Task RunChecks(int maximumConcurrency)
{
var count = Math.Max(maximumConcurrency, _masterQueue.Count);
var tasks = Enumerable.Range(0, count).Select( i => WorkerTask() ).ToList();
await Task.WhenAll(tasks);
}
private async Task WorkerTask()
{
ProxyInfo proxyInfo;
while ( _masterList.TryDequeue(out proxyInfo))
{
DoTheTest(proxyInfo.IP, proxyInfo.Port)
}
}
}
How to cancel all tasks, if one of them return i.e. false (bool) result?
Is it possible to identify which task returned a result?
class Program
{
private static Random _rnd = new Random();
static void Main(string[] args)
{
var tasksCounter = _rnd.Next(4, 7);
var cts = new CancellationTokenSource();
var tasks = new Task<bool>[tasksCounter];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = CreateTask(cts);
}
Console.WriteLine("Waiting..");
Task.WaitAny(tasks);
Console.WriteLine("Done!");
Console.ReadKey();
}
private static Task<bool> CreateTask(CancellationTokenSource cts)
{
return Task.Factory.StartNew(TaskAction, cts.Token).Unwrap();
}
private static async Task<bool> TaskAction()
{
var delay = _rnd.Next(2, 5);
await Task.Delay(delay * 1000);
var taskResult = _rnd.Next(10) < 4;
return await Task.FromResult(taskResult);
}
}
I tried to use Task.WaitAll, Task.WaitAny etc, but none of these methods provide useful (in my case) functionality.
Edit:
As #ckuri stated in the comments, it would be easier to leverage the already existing properties of Task instead of writing a custom result class. Adjusted my answer accordingly.
One solution would be to check for the result in the CreateTask() method and pass a CancellationToken into your TaskAction() method:
private static Task<bool> CreateTask(CancellationTokenSource cts)
{
return Task.Run(async () =>
{
var result = await TaskAction(cts.Token);
// If result is false, cancel all tasks
if (!result)
cts.Cancel();
return result;
});
}
private static async Task<bool> TaskAction(CancellationToken token)
{
// Check for cancellation
token.ThrowIfCancellationRequested();
var delay = Rnd.Next(2, 5);
// Pass the cancellation token to Task.Delay()
await Task.Delay(delay * 1000, token);
var taskResult = Rnd.Next(10) < 4;
// Check for cancellation
token.ThrowIfCancellationRequested();
return taskResult;
}
Now you can do something like this in your Main method to receive all tasks which did not get cancelled:
try
{
// Wait for all tasks inside a try catch block because `WhenAll` throws a `AggregationException`
// containing a `TaskCanceledException`, if the token gets canceled
await Task.WhenAll(tasks);
}
catch { }
var tasksWithResult = tasks.Where(t => !t.IsCanceled).ToList();
I don't have an interpreter on hand, so excuse any mistakes. Your Task needs access to the token if you want to cancel all tasks from the task itself. See https://learn.microsoft.com/en-us/dotnet/api/system.threading.cancellationtokensource?view=netcore-3.0 for an example. I.e. pass it as an argument to the task so one task can cancel the other tasks.
private static async Task<bool> TaskAction(CancellationTokenSource cts)
{
var delay = _rnd.Next(2, 5);
await Task.Delay(delay * 1000);
var taskResult = _rnd.Next(10) < 4;
if (!taskResult)
cts.Cancel()
return await Task.FromResult(taskResult);
}
Just be sure to capture the AggregateException and check if one of the inner excepts is an TaskCanceledException.
Let's say I have 100 tasks that do something that takes 10 seconds.
Now I want to only run 10 at a time like when 1 of those 10 finishes another task gets executed till all are finished.
Now I always used ThreadPool.QueueUserWorkItem() for such task but I've read that it is bad practice to do so and that I should use Tasks instead.
My problem is that I nowhere found a good example for my scenario so could you get me started on how to achieve this goal with Tasks?
SemaphoreSlim maxThread = new SemaphoreSlim(10);
for (int i = 0; i < 115; i++)
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
//Your Works
}
, TaskCreationOptions.LongRunning)
.ContinueWith( (task) => maxThread.Release() );
}
TPL Dataflow is great for doing things like this. You can create a 100% async version of Parallel.Invoke pretty easily:
async Task ProcessTenAtOnce<T>(IEnumerable<T> items, Func<T, Task> func)
{
ExecutionDataflowBlockOptions edfbo = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
ActionBlock<T> ab = new ActionBlock<T>(func, edfbo);
foreach (T item in items)
{
await ab.SendAsync(item);
}
ab.Complete();
await ab.Completion;
}
You have several options. You can use Parallel.Invoke for starters:
public void DoWork(IEnumerable<Action> actions)
{
Parallel.Invoke(new ParallelOptions() { MaxDegreeOfParallelism = 10 }
, actions.ToArray());
}
Here is an alternate option that will work much harder to have exactly 10 tasks running (although the number of threads in the thread pool processing those tasks may be different) and that returns a Task indicating when it finishes, rather than blocking until done.
public Task DoWork(IList<Action> actions)
{
List<Task> tasks = new List<Task>();
int numWorkers = 10;
int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);
foreach (var batch in actions.Batch(actions.Count / 10))
{
tasks.Add(Task.Factory.StartNew(() =>
{
foreach (var action in batch)
{
action();
}
}));
}
return Task.WhenAll(tasks);
}
If you don't have MoreLinq, for the Batch function, here's my simpler implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
List<T> buffer = new List<T>(batchSize);
foreach (T item in source)
{
buffer.Add(item);
if (buffer.Count >= batchSize)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count >= 0)
{
yield return buffer;
}
}
You can create a method like this:
public static async Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
IEnumerable<T> inputList, Func<T, Task> asyncFunc)
{
Queue<T> inputQueue = new Queue<T>(inputList);
List<Task> runningTasks = new List<Task>(numberOfTasksConcurrent);
for (int i = 0; i < numberOfTasksConcurrent && inputQueue.Count > 0; i++)
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
while (inputQueue.Count > 0)
{
Task task = await Task.WhenAny(runningTasks);
runningTasks.Remove(task);
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
}
await Task.WhenAll(runningTasks);
}
And then you can call any async method n times with a limit like this:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
async x =>
{
Console.WriteLine($"Starting task {x}");
await Task.Delay(100);
Console.WriteLine($"Finishing task {x}");
});
Or if you want to run long running non async methods, you can do it that way:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
x => Task.Factory.StartNew(() => {
Console.WriteLine($"Starting task {x}");
System.Threading.Thread.Sleep(100);
Console.WriteLine($"Finishing task {x}");
}, TaskCreationOptions.LongRunning));
Maybe there is a similar method somewhere in the framework, but I didn't find it yet.
I would love to use the simplest solution I can think of which as I think using the TPL:
string[] urls={};
Parallel.ForEach(urls, new ParallelOptions() { MaxDegreeOfParallelism = 2}, url =>
{
//Download the content or do whatever you want with each URL
});
Let's say I have 100 tasks that do something that takes 10 seconds.
Now I want to only run 10 at a time like when 1 of those 10 finishes another task gets executed till all are finished.
Now I always used ThreadPool.QueueUserWorkItem() for such task but I've read that it is bad practice to do so and that I should use Tasks instead.
My problem is that I nowhere found a good example for my scenario so could you get me started on how to achieve this goal with Tasks?
SemaphoreSlim maxThread = new SemaphoreSlim(10);
for (int i = 0; i < 115; i++)
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
//Your Works
}
, TaskCreationOptions.LongRunning)
.ContinueWith( (task) => maxThread.Release() );
}
TPL Dataflow is great for doing things like this. You can create a 100% async version of Parallel.Invoke pretty easily:
async Task ProcessTenAtOnce<T>(IEnumerable<T> items, Func<T, Task> func)
{
ExecutionDataflowBlockOptions edfbo = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
ActionBlock<T> ab = new ActionBlock<T>(func, edfbo);
foreach (T item in items)
{
await ab.SendAsync(item);
}
ab.Complete();
await ab.Completion;
}
You have several options. You can use Parallel.Invoke for starters:
public void DoWork(IEnumerable<Action> actions)
{
Parallel.Invoke(new ParallelOptions() { MaxDegreeOfParallelism = 10 }
, actions.ToArray());
}
Here is an alternate option that will work much harder to have exactly 10 tasks running (although the number of threads in the thread pool processing those tasks may be different) and that returns a Task indicating when it finishes, rather than blocking until done.
public Task DoWork(IList<Action> actions)
{
List<Task> tasks = new List<Task>();
int numWorkers = 10;
int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);
foreach (var batch in actions.Batch(actions.Count / 10))
{
tasks.Add(Task.Factory.StartNew(() =>
{
foreach (var action in batch)
{
action();
}
}));
}
return Task.WhenAll(tasks);
}
If you don't have MoreLinq, for the Batch function, here's my simpler implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
List<T> buffer = new List<T>(batchSize);
foreach (T item in source)
{
buffer.Add(item);
if (buffer.Count >= batchSize)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count >= 0)
{
yield return buffer;
}
}
You can create a method like this:
public static async Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
IEnumerable<T> inputList, Func<T, Task> asyncFunc)
{
Queue<T> inputQueue = new Queue<T>(inputList);
List<Task> runningTasks = new List<Task>(numberOfTasksConcurrent);
for (int i = 0; i < numberOfTasksConcurrent && inputQueue.Count > 0; i++)
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
while (inputQueue.Count > 0)
{
Task task = await Task.WhenAny(runningTasks);
runningTasks.Remove(task);
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
}
await Task.WhenAll(runningTasks);
}
And then you can call any async method n times with a limit like this:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
async x =>
{
Console.WriteLine($"Starting task {x}");
await Task.Delay(100);
Console.WriteLine($"Finishing task {x}");
});
Or if you want to run long running non async methods, you can do it that way:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
x => Task.Factory.StartNew(() => {
Console.WriteLine($"Starting task {x}");
System.Threading.Thread.Sleep(100);
Console.WriteLine($"Finishing task {x}");
}, TaskCreationOptions.LongRunning));
Maybe there is a similar method somewhere in the framework, but I didn't find it yet.
I would love to use the simplest solution I can think of which as I think using the TPL:
string[] urls={};
Parallel.ForEach(urls, new ParallelOptions() { MaxDegreeOfParallelism = 2}, url =>
{
//Download the content or do whatever you want with each URL
});