Have a set of Tasks with only X running at a time - C#

Let's say I have 100 tasks that each take about 10 seconds to complete.
I want to run only 10 at a time: when one of those 10 finishes, another task should start, until all of them have run.
I've always used ThreadPool.QueueUserWorkItem() for this kind of thing, but I've read that it's bad practice and that I should use Tasks instead.
My problem is that I couldn't find a good example for my scenario anywhere, so could you get me started on how to achieve this with Tasks?

SemaphoreSlim maxThread = new SemaphoreSlim(10);

for (int i = 0; i < 115; i++)
{
    maxThread.Wait();
    Task.Factory.StartNew(() =>
    {
        // Your work here
    }, TaskCreationOptions.LongRunning)
    .ContinueWith((task) => maxThread.Release());
}
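If the loop itself runs inside an async method, a fully asynchronous variant of the same idea is possible (a sketch, with Task.Delay standing in for the real work): wait on the semaphore with WaitAsync and release it in a finally block so a faulted task can't leak a slot.

SemaphoreSlim throttler = new SemaphoreSlim(10);
List<Task> tasks = new List<Task>();

for (int i = 0; i < 115; i++)
{
    await throttler.WaitAsync();
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            await Task.Delay(10000); // stand-in for the 10-second work item
        }
        finally
        {
            throttler.Release();
        }
    }));
}

await Task.WhenAll(tasks);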

TPL Dataflow is great for doing things like this. You can create a 100% async version of Parallel.Invoke pretty easily:
async Task ProcessTenAtOnce<T>(IEnumerable<T> items, Func<T, Task> func)
{
    ExecutionDataflowBlockOptions edfbo = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 10
    };

    ActionBlock<T> ab = new ActionBlock<T>(func, edfbo);

    foreach (T item in items)
    {
        await ab.SendAsync(item);
    }

    ab.Complete();
    await ab.Completion;
}
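A quick usage sketch (the Task.Delay stands in for the 10-second work items from the question):

await ProcessTenAtOnce(Enumerable.Range(1, 100), async item =>
{
    await Task.Delay(TimeSpan.FromSeconds(10)); // simulated work
    Console.WriteLine($"Item {item} done");
});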

You have several options. You can use Parallel.Invoke for starters:
public void DoWork(IEnumerable<Action> actions)
{
    Parallel.Invoke(
        new ParallelOptions() { MaxDegreeOfParallelism = 10 },
        actions.ToArray());
}
Here is an alternative that works harder to keep exactly 10 tasks running (although the number of thread pool threads actually processing them may differ), and that returns a Task indicating when it finishes rather than blocking until done.
public Task DoWork(IList<Action> actions)
{
    List<Task> tasks = new List<Task>();
    int numWorkers = 10;
    int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);

    foreach (var batch in actions.Batch(batchSize))
    {
        tasks.Add(Task.Factory.StartNew(() =>
        {
            foreach (var action in batch)
            {
                action();
            }
        }));
    }

    return Task.WhenAll(tasks);
}
If you don't have MoreLinq for the Batch function, here's my simpler implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
    List<T> buffer = new List<T>(batchSize);

    foreach (T item in source)
    {
        buffer.Add(item);
        if (buffer.Count >= batchSize)
        {
            yield return buffer;
            buffer = new List<T>();
        }
    }

    if (buffer.Count > 0)
    {
        yield return buffer;
    }
}
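A quick usage sketch for the batched overload (the console write is just a stand-in for real work):

IList<Action> actions = Enumerable.Range(1, 100)
    .Select(i => (Action)(() => Console.WriteLine($"Action {i} done")))
    .ToList();

await DoWork(actions); // the IList<Action> overload returns a Task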

You can create a method like this:
public static async Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
    IEnumerable<T> inputList, Func<T, Task> asyncFunc)
{
    Queue<T> inputQueue = new Queue<T>(inputList);
    List<Task> runningTasks = new List<Task>(numberOfTasksConcurrent);

    for (int i = 0; i < numberOfTasksConcurrent && inputQueue.Count > 0; i++)
        runningTasks.Add(asyncFunc(inputQueue.Dequeue()));

    while (inputQueue.Count > 0)
    {
        Task task = await Task.WhenAny(runningTasks);
        runningTasks.Remove(task);
        runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
    }

    await Task.WhenAll(runningTasks);
}
And then you can call any async method n times with a limit like this:
Task task = RunLimitedNumberAtATime(10,
    Enumerable.Range(1, 100),
    async x =>
    {
        Console.WriteLine($"Starting task {x}");
        await Task.Delay(100);
        Console.WriteLine($"Finishing task {x}");
    });
Or, if you want to run long-running non-async methods, you can do it this way:
Task task = RunLimitedNumberAtATime(10,
    Enumerable.Range(1, 100),
    x => Task.Factory.StartNew(() =>
    {
        Console.WriteLine($"Starting task {x}");
        System.Threading.Thread.Sleep(100);
        Console.WriteLine($"Finishing task {x}");
    }, TaskCreationOptions.LongRunning));
Maybe there is a similar method somewhere in the framework, but I haven't found it yet.
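For what it's worth, later runtimes did add something very similar: on .NET 6 and newer, Parallel.ForEachAsync accepts a MaxDegreeOfParallelism and an async body (a sketch of the same example, assuming .NET 6+):

await Parallel.ForEachAsync(
    Enumerable.Range(1, 100),
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async (x, cancellationToken) =>
    {
        Console.WriteLine($"Starting task {x}");
        await Task.Delay(100, cancellationToken);
        Console.WriteLine($"Finishing task {x}");
    });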

I would go with the simplest solution I can think of, which in my opinion is using the TPL:

string[] urls = { };

Parallel.ForEach(urls, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, url =>
{
    // Download the content or do whatever you want with each URL
});

Related

"Infinite" asynchronous parallel foreach loop

I have a List<string> containing 50K to 100K words.
I would like to iterate through it in a parallel and asynchronous fashion.
As an example, I could use:
while (true)
{
    Parallel.ForEach(words, new ParallelOptions { MaxDegreeOfParallelism = 100 }, ...)
}
But the problems are:
Parallel.ForEach isn't asynchronous
When we arrive at the end of the list, we have to wait for every thread to finish before the while (true) loop continues
Which means that there aren't always 100 threads running, which is what I want
How would I be able to achieve this?
Please let me know if this is confusing, or if I'm bad at explaining things.
Here is a totally contrived, async-friendly TPL Dataflow example of how you could achieve what you ask.
It's suitable for an async IO workload
It's cancellable
It limits max parallelism
It has bounded capacity, so there are always 100 jobs available
It's infinite
Given
private static CancellationTokenSource _cs;
private static CancellationToken _token;
private static ActionBlock<string> _block;

private static async Task MethodAsync(string something)
{
    // Your async workload
}

public static async Task EndlessRunner(string[] someArray)
{
    try
    {
        var index = 0;
        while (!_token.IsCancellationRequested)
        {
            await _block.SendAsync(someArray[index], _token);
            if (++index >= someArray.Length) index = 0;
        }
    }
    catch (OperationCanceledException)
    {
        Console.WriteLine("Cancelled");
    }
}
Example
private static async Task Main()
{
    _cs = new CancellationTokenSource();
    _token = _cs.Token;

    _block = new ActionBlock<string>(
        MethodAsync,
        new ExecutionDataflowBlockOptions()
        {
            EnsureOrdered = false,
            MaxDegreeOfParallelism = 100,
            BoundedCapacity = 100,
            CancellationToken = _cs.Token,
            SingleProducerConstrained = true
        });

    var someList = Enumerable
        .Range(0, 5000)
        .Select(i => $"something {i}")
        .ToArray();

    Task.Run(() => EndlessRunner(someList));

    Console.ReadKey();

    _cs.Cancel();
    _block.Complete();
    await _block.Completion;
}

C# Multithreading with slots

I have a function which checks proxy servers. Currently it runs a fixed number of threads and waits for all of them to finish before starting the next set. Is it possible to start a new thread as soon as one of the maximum allowed finishes?
for (int i = 0; i < listProxies.Count(); i += nThreadsNum)
{
    for (nCurrentThread = 0; nCurrentThread < nThreadsNum; nCurrentThread++)
    {
        if (nCurrentThread < nThreadsNum)
        {
            string strProxyIP = listProxies[i + nCurrentThread].sIPAddress;
            int nPort = listProxies[i + nCurrentThread].nPort;
            tasks.Add(Task.Factory.StartNew<ProxyAddress>(() => CheckProxyServer(strProxyIP, nPort, nCurrentThread)));
        }
    }

    Task.WaitAll(tasks.ToArray());

    foreach (var tsk in tasks)
    {
        ProxyAddress result = tsk.Result;
        UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
    }

    tasks.Clear();
}
This seems much simpler:
int numberProcessed = 0;

Parallel.ForEach(listProxies,
    new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
    (p) =>
    {
        var result = CheckProxyServer(p.sIPAddress, p.nPort, Thread.CurrentThread.ManagedThreadId);
        UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
        Interlocked.Increment(ref numberProcessed);
    });
With slots:
var obj = new Object();
var slots = new List<int>();

Parallel.ForEach(listProxies,
    new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
    (p) =>
    {
        int threadId = Thread.CurrentThread.ManagedThreadId;
        int slot = slots.IndexOf(threadId);
        if (slot == -1)
        {
            lock (obj)
            {
                slots.Add(threadId);
            }
            slot = slots.IndexOf(threadId);
        }

        var result = CheckProxyServer(p.sIPAddress, p.nPort, slot);
        UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
    });
I took a few shortcuts there to guarantee thread safety. You don't have to do the usual check-lock-check dance, because there will never be two threads attempting to add the same thread ID to the list, so the second check would always fail and isn't needed. Secondly, for the same reason, I don't believe you ever need to lock around the outer IndexOf either. That makes this a highly efficient concurrent routine that rarely locks (it should only lock nThreadsNum times), no matter how many items are in the enumerable.
Another solution is to use a SemaphoreSlim, or the producer-consumer pattern with a BlockingCollection<T>. Both solutions support cancellation.
SemaphoreSlim
private async Task CheckProxyServerAsync(IEnumerable<ProxyInfo> proxies)
{
    var tasks = new List<Task>();
    int currentThreadNumber = 0;
    int maxNumberOfThreads = 8;

    using (var semaphore = new SemaphoreSlim(maxNumberOfThreads, maxNumberOfThreads))
    {
        foreach (var proxy in proxies)
        {
            // Asynchronously wait until a slot is available if the thread limit is reached
            await semaphore.WaitAsync();

            string proxyIP = proxy.IPAddress;
            int port = proxy.Port;

            tasks.Add(Task.Run(() => CheckProxyServer(proxyIP, port, Interlocked.Increment(ref currentThreadNumber)))
                .ContinueWith(
                    (task) =>
                    {
                        ProxyAddress result = task.Result;

                        // Method call must be thread-safe!
                        UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);

                        Interlocked.Decrement(ref currentThreadNumber);

                        // Allow the next task to start if the limit was reached
                        semaphore.Release();
                    },
                    TaskContinuationOptions.OnlyOnRanToCompletion));
        }

        // Asynchronously wait until all tasks are completed
        // to prevent premature disposal of the semaphore
        await Task.WhenAll(tasks);
    }
}
Producer-Consumer Pattern
// Uses a fixed number of the same threads
private async Task CheckProxyServerAsync(IEnumerable<ProxyInfo> proxies)
{
    var pipe = new BlockingCollection<ProxyInfo>();
    int maxNumberOfThreads = 8;
    var tasks = new List<Task>();

    // Create all consumers (count == maxNumberOfThreads)
    for (int currentThreadNumber = 0; currentThreadNumber < maxNumberOfThreads; currentThreadNumber++)
    {
        int threadNumber = currentThreadNumber; // local copy so the lambda doesn't capture the loop variable
        tasks.Add(
            Task.Run(() => ConsumeProxyInfo(pipe, threadNumber)));
    }

    proxies.ToList().ForEach(pipe.Add);
    pipe.CompleteAdding();

    await Task.WhenAll(tasks);
}

private void ConsumeProxyInfo(BlockingCollection<ProxyInfo> proxiesPipe, int currentThreadNumber)
{
    while (!proxiesPipe.IsCompleted)
    {
        if (proxiesPipe.TryTake(out ProxyInfo proxy))
        {
            int port = proxy.Port;
            string proxyIP = proxy.IPAddress;

            ProxyAddress result = CheckProxyServer(proxyIP, port, currentThreadNumber);

            // Method call must be thread-safe!
            UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);
        }
    }
}
If I'm understanding your question properly, this is actually fairly simple to do with await Task.WhenAny. Basically, you keep a collection of all of the running tasks. Once you reach a certain number of tasks running, you wait for one or more of your tasks to finish, and then you remove the tasks that were completed from your collection and continue to add more tasks.
Here's an example of what I mean below:
var tasks = new List<Task>();

for (int i = 0; i < 20; i++)
{
    // I want my list of tasks to contain at most 5 tasks at once
    if (tasks.Count == 5)
    {
        // Wait for at least one of the tasks to complete
        await Task.WhenAny(tasks.ToArray());

        // Remove all of the completed tasks from the list
        tasks = tasks.Where(t => !t.IsCompleted).ToList();
    }

    // Add some task to the list (Task.Run unwraps the async delegate, so the
    // task completes only when the awaited work inside has actually finished)
    tasks.Add(Task.Run(async () =>
    {
        await Task.Delay(1000);
    }));
}
I suggest changing your approach slightly. Instead of starting and stopping threads, put your proxy server data in a concurrent queue, one item for each proxy server. Then create a fixed number of threads (or async tasks) to work on the queue. This is more likely to provide smooth performance (you aren't starting and stopping threads over and over, which has overhead) and is a lot easier to code, in my opinion.
A simple example:
class ProxyChecker
{
    private ConcurrentQueue<ProxyInfo> _masterQueue = new ConcurrentQueue<ProxyInfo>();

    public ProxyChecker(IEnumerable<ProxyInfo> listProxies)
    {
        foreach (var proxy in listProxies)
        {
            _masterQueue.Enqueue(proxy);
        }
    }

    public async Task RunChecks(int maximumConcurrency)
    {
        // Don't start more workers than there are items to process
        var count = Math.Min(maximumConcurrency, _masterQueue.Count);
        var tasks = Enumerable.Range(0, count).Select(i => WorkerTask()).ToList();
        await Task.WhenAll(tasks);
    }

    private async Task WorkerTask()
    {
        ProxyInfo proxyInfo;
        while (_masterQueue.TryDequeue(out proxyInfo))
        {
            DoTheTest(proxyInfo.IP, proxyInfo.Port);
        }
    }
}
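A hypothetical usage sketch (assuming the listProxies collection and the ProxyInfo/DoTheTest members referenced in the snippet above):

var checker = new ProxyChecker(listProxies);
await checker.RunChecks(maximumConcurrency: 10);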

Immediately process asynchronous results in the order they were requested

Suppose I kick off 5 async tasks, and I want to print the results in the order they were requested:
public async void RunTasks()
{
    var tasks = new List<Task<int>>();
    for (int i = 1; i <= 5; i++)
    {
        tasks.Add(DoSomething(i));
    }

    var results = await Task.WhenAll(tasks);
    Console.WriteLine(String.Join(',', results));
}

public async Task<int> DoSomething(int taskNumber)
{
    var random = new Random();
    await Task.Delay(random.Next(5000));
    return taskNumber;
}
This will always print "1,2,3,4,5" - because Task.WhenAll() orders the results by the order requested, not by the order in which they finished.
Unfortunately this means I have to wait for ALL tasks to finish before I can print anything.
How might I instead print the result of each task as soon as it's finished, but still respecting the order they were requested?
So I should always see "1,2,3,4,5" - but it may arrive gradually:
"1"
"1,2,3"
"1,2,3,4"
"1,2,3,4,5"
(no need to worry about the actual reasoning for doing this, treat it as a fun problem)
var tasks = new List<Task<int>>();
for (int i = 1; i <= 5; i++)
{
    tasks.Add(DoSomething(i));
}

foreach (var task in tasks)
{
    var result = await task;
    Console.WriteLine(result);
}
We kick off all of the tasks first, then loop over them in order, awaiting each in turn. If the task being awaited has previously completed, the await just returns its result. Otherwise we wait until it completes.
Try a TransformBlock: by default it outputs the items it processes one by one, in the order they were received, even if the elements are processed in parallel.
public async Task Order()
{
    var tBlock = new TransformBlock<int, string>(async x =>
    {
        await Task.Delay(100);
        return x.ToString();
    }, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 10 });

    var sub = tBlock.AsObservable().Subscribe(x => Console.Write(x));

    foreach (var num in Enumerable.Range(0, 10))
    {
        tBlock.Post(num);
    }

    tBlock.Complete();
    await tBlock.Completion;
    sub.Dispose();
}
Output:
0123456789


C# async within an action

I would like to write a method which accepts several parameters, including an action and a retry count, and invokes it.
So I have this code:
public static IEnumerable<Task> RunWithRetries<T>(List<T> source, int threads, Func<T, Task<bool>> action, int retries, string method)
{
    object lockObj = new object();
    int index = 0;

    return new Action(async () =>
    {
        while (true)
        {
            T item;
            lock (lockObj)
            {
                if (index < source.Count)
                {
                    item = source[index];
                    index++;
                }
                else
                    break;
            }

            int retry = retries;
            while (retry > 0)
            {
                try
                {
                    bool res = await action(item);
                    if (res)
                        retry = -1;
                    else
                        // sleep if not successful..
                        Thread.Sleep(200);
                }
                catch (Exception e)
                {
                    LoggerAgent.LogException(e, method);
                }
                finally
                {
                    retry--;
                }
            }
        }
    }).RunParallel(threads);
}
RunParallel is an extension method for Action; it looks like this:
public static IEnumerable<Task> RunParallel(this Action action, int amount)
{
    List<Task> tasks = new List<Task>();
    for (int i = 0; i < amount; i++)
    {
        Task task = Task.Factory.StartNew(action);
        tasks.Add(task);
    }
    return tasks;
}
Now, the issue: the thread just seems to disappear or collapse without waiting for the action to finish.
I wrote this example code:
private static async Task ex()
{
    List<int> ints = new List<int>();
    for (int i = 0; i < 1000; i++)
    {
        ints.Add(i);
    }

    var tasks = RetryComponent.RunWithRetries(ints, 100, async (num) =>
    {
        try
        {
            List<string> test = await fetchSmthFromDb();
            Console.WriteLine("#" + num + " " + test[0]);
            return test[0] == "test";
        }
        catch (Exception e)
        {
            Console.WriteLine(e.StackTrace);
            return false;
        }
    }, 5, "test");

    await Task.WhenAll(tasks);
}
fetchSmthFromDb is a simple method returning Task<List<string>> which fetches something from the database, and it works perfectly fine when invoked outside of this example.
Whenever the List<string> test = await fetchSmthFromDb(); line is reached, the thread seems to shut down: the Console.WriteLine("#" + num + " " + test[0]); line is never executed, and when debugging, a breakpoint there is never hit.
The Final Working Code
private static async Task DoWithRetries(Func<Task> action, int retryCount, string method)
{
    while (true)
    {
        try
        {
            await action();
            break;
        }
        catch (Exception e)
        {
            LoggerAgent.LogException(e, method);
        }

        if (retryCount <= 0)
            break;

        retryCount--;
        await Task.Delay(200);
    }
}

public static async Task RunWithRetries<T>(List<T> source, int threads, Func<T, Task<bool>> action, int retries, string method)
{
    Func<T, Task> newAction = async (item) =>
    {
        await DoWithRetries(async () =>
        {
            await action(item);
        }, retries, method);
    };

    await source.ParallelForEachAsync(newAction, threads);
}
The problem is in this line:
return new Action(async () => ...
You start an async operation with the async lambda, but don't return a task to await on. I.e. it runs on worker threads, but you'll never find out when it's done, and your program terminates before the async operation is complete - that's why you don't see any output.
It needs to be:
return new Func<Task>(async () => ...
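A minimal sketch of the difference (made-up delays, to be run inside an async method): wrapping an async lambda in an Action produces an async void delegate whose completion cannot be observed, whereas Func<Task> hands the caller a Task that can be awaited.

// Wrapped in Action: the lambda becomes async void, the caller gets no Task to await
Action fireAndForget = new Action(async () => await Task.Delay(1000));

// Wrapped in Func<Task>: the caller receives the Task and can await it
Func<Task> awaitable = new Func<Task>(async () => await Task.Delay(1000));

fireAndForget();   // returns immediately; completion is invisible to the caller
await awaitable(); // completes only after the delay has elapsed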
UPDATE
First, you need to split responsibilities of methods, so you don't mix retry policy (which should not be hardcoded to a check of a boolean result) with running tasks in parallel.
Then, as previously mentioned, you run your while (true) loop 100 times instead of doing things in parallel.
As #MachineLearning pointed out, use Task.Delay instead of Thread.Sleep.
Overall, your solution looks like this:
using System.Collections.Async;

static async Task DoWithRetries(Func<Task> action, int retryCount, string method)
{
    while (true)
    {
        try
        {
            await action();
            break;
        }
        catch (Exception e)
        {
            LoggerAgent.LogException(e, method);
        }

        if (retryCount <= 0)
            break;

        retryCount--;
        await Task.Delay(millisecondsDelay: 200);
    }
}
static async Task Example()
{
    List<int> ints = new List<int>();
    for (int i = 0; i < 1000; i++)
        ints.Add(i);

    Func<int, Task> actionOnItem =
        async item =>
        {
            await DoWithRetries(async () =>
            {
                List<string> test = await fetchSmthFromDb();
                Console.WriteLine("#" + item + " " + test[0]);
                if (test[0] != "test")
                    throw new InvalidOperationException("unexpected result"); // will be re-tried
            },
            retryCount: 5,
            method: "test");
        };

    await ints.ParallelForEachAsync(actionOnItem, maxDegreeOfParalellism: 100);
}
You need to use the AsyncEnumerator NuGet Package in order to use the ParallelForEachAsync extension method from the System.Collections.Async namespace.
Besides the final complete reengineering, I think it's very important to underline what was really wrong with the original code.
0) First of all, as #Serge Semenov immediately pointed out, Action has to be replaced with Func<Task>.
But there are still two other essential changes.
1) With an async delegate as the argument, it is necessary to use the more recent Task.Run instead of the older Task.Factory.StartNew pattern (or else you have to add Unwrap() explicitly); see the sketch at the end of this answer.
2) Moreover, the ex() method can't be async, since Task.WhenAll must be waited on with Wait() and without await.
At that point, even though there are logical errors that need reengineering, from a pure technical standpoint it does work and the output is produced.
A test is available online: http://rextester.com/HMMI93124
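To illustrate point 1 (a minimal sketch with a made-up delay, not the original workload, to be run inside an async method): Task.Factory.StartNew with an async lambda returns a Task<Task> whose outer task completes as soon as the lambda hits its first await, so it has to be unwrapped, while Task.Run unwraps automatically.

Func<Task> work = async () => await Task.Delay(1000);

// Task.Factory.StartNew returns Task<Task>; the outer task completes almost immediately
Task<Task> wrapped = Task.Factory.StartNew(work);
await wrapped.Unwrap();   // must unwrap to actually wait for the delay

// Task.Run unwraps the async delegate automatically
await Task.Run(work);     // completes only after the delay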
