Cancel background running jobs in Parallel.foreach - c#

I have a parallel foreach for few api calls. Api will return some data and I can process it. Lets say from front end, I call to ProcessInsu method and suddenly the user change his mind and go to other page. (There should be a clear button and user can exit the page after clicking clear button.) After calling ProcessInsu method, it will run APIs in background. It is waste of resources because, the user already change his mind and do other work. I need some methodology to cancel background running jobs.
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts,string resourceId)
{
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Convert.ToInt32(Math.Ceiling((Environment.ProcessorCount * 0.75) * 2.0));
Parallel.ForEach(insuranceCompAccounts, parallelOptions, async (insuranceComp) =>
{
await Processor.Process(resourceId,insuranceComp); //When looping, From each call, it will call to different different insurance companies.
});
}
I tried this sample code and I could not do my work with that. Any expert can guide me to do this?

As others have noted, you can't use async with Parallel - it won't work correctly. Instead, you want to do asynchronous concurrency as such:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp);
}).ToList();
await Task.WhenAll(tasks);
}
Now that the code is corrected, you can add cancellation support. E.g.:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var cts = new CancellationTokenSource();
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp, cts.Token);
}).ToList();
await Task.WhenAll(tasks);
}
When you are ready to cancel the operation, call cts.Cancel();.
Note that Process now takes a CancellationToken. It should pass this token on to whatever I/O APIs it's using.

Related

Creating and stopping threads from listview

I need your help, with threads I'm full 0 and you only need to create a certain thread and complete it on command, BUT I do not create each thread in advance, as there will be a lot of them, I do it like this:
Thread thread = new Thread(() => Go(..... many many variables that are taken from the listview ......));
thread.Start();
So, as noted above, variables are taken from the listview, which in turn is loaded by me from the file and then I run the threads I need. BUT the process in the stream is infinite and will end only if I completely close the program, and I would like to end the stream in the same way as I started it (right click on the desired line-start/stop). As I said, I have never worked with threads and thought that it was somehow simple, like when you start a thread, you assign it an ID and end it with the same ID, but alas. I have searched all over Google and have not found an EXAMPLE that suits me (I will repeat for the third time - I have never worked with threads and I do not need to say "go read about TPL"), so I ask for help, preferably with an example)
I have a very bad idea: in the sheet there is an invisible column in which an id is generated at the start, then when I send a command to start the thread, a unique variable is created with the name for example int id1=0 and its name is passed to the thread itself and each time the loop starts, id1=0 or 1 is checked in it, respectively, if 0-continue, if 1-empty. Well, it is logical that when you click the stop button, its value changes to 1. But something seems to me that the holy spirit of multithreading will punish me for this when the threads become 100+. I read this idea somewhere, so don't swear)
You do not need hundreds of threads for this. Your worker "threads" are performing HTTP requests, which can be done asynchronously without requiring a new thread. Also, hundreds of threads wouldn't really help you unless you have hundreds of CPU cores (you don't).
For this sort of work, I'd recommend the following:
Write a method that does all the work your thread does, but also checks a CancellationToken with each iteration.
Calls the method in a loop, once for each account, and store the resulting tasks in an array or list. Or use LINQ (as I do in this example) to create the list.
When your program terminates, activate the CancellationToken.
After cancelling, you have to await all the tasks in order to observe any possible exceptions and exit cleanly.
For example
public async Task DoTheWork(Account account, CancellationToken token)
{
while (!token.IsCancellationRequested)
{
var result = await httpClient.GetAsync(account.Url);
await DoSomethingWithResult(result);
await Task.Delay(1000);
}
}
//Main program
var accounts = GetAccountList();
var source = new CancellationTokenSource();
var tasks = accounts.Select( x => DoTheWork(x, source.Token) ).ToList();
//When exiting
source.Cancel();
await Task.WhenAll( tasks );
source.Dispose();
Indivivdual cancellation
Here's another approach that keeps a list of the accounts and a delegate that can be used for cancelling the task for that specific account.
//Declare this somewhere it will persist for the duration of the program
//The key to this dictionary is the account you wish to cancel
//The value is a delegate that you can call to cancel its task
Dictionary<Account, Func<Task>> _tasks = new Dictionary<Account, Func<Task>>();
async Task CreateTasks()
{
var accounts = GetAccounts();
foreach (var account in accounts)
{
var source = new CancellationTokenSource();
var task = DoTheWork(account, source.Token);
_tasks.Add(account, () => { source.Cancel(); return task; });
}
}
//Retrieve the delegate from the dictionary and call it to cancel its task
//Then await the task to observe any exceptions
//Then remove it from the list
async Task CancelTask(Account account)
{
var cancelAction = _tasks[account];
var task = cancelAction();
await task;
_tasks.Remove(account);
}
async Task CancelAllTasks()
{
var tasks = _tasks.Select(x => x.Value()).ToList();
await Task.WhenAll(tasks);
}

Nested async methods in a Parallel.ForEach

I have a method that runs multiple async methods within it. I have to iterate over a list of devices, and pass the device to this method. I am noticing that this is taking a long time to complete so I am thinking of using Parallel.ForEach so it can run this process against multiple devices at the same time.
Let's say this is my method.
public async Task ProcessDevice(Device device) {
var dev = await _deviceService.LookupDeviceIndbAsNoTracking(device);
var result = await DoSomething(dev);
await DoSomething2(dev);
}
Then DoSomething2 also calls an async method.
public async Task DoSomething2(Device dev) {
foreach(var obj in dev.Objects) {
await DoSomething3(obj);
}
}
The list of devices continuously gets larger over time, so the more this list grows, the longer it takes the program to finish running ProcessDevice() against each device. I would like to process more than one device at a time. So I have been looking into using Parallel.ForEach.
Parallel.ForEach(devices, async device => {
try {
await ProcessDevice(device);
} catch (Exception ex) {
throw ex;
}
})
It appears that the program is finishing before the device is fully processed. I have also tried creating a list of tasks, and then foreach device, add a new task running ProcessDevice to that list and then awaiting Task.WhenAll(listOfTasks);
var listOfTasks = new List<Task>();
foreach(var device in devices) {
var task = Task.Run(async () => await ProcessDevice(device));
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
But it appears that the task is marked as completed before ProcessDevice() is actually finished running.
Please excuse my ignorance on this issue as I am new to parallel processing and not sure what is going on. What is happening to cause this behavior and is there any documentation that you could provide that could help me better understand what to do?
You can't mix async with Parallel.ForEach. Since your underlying operation is asynchronous, you'd want to use asynchronous concurrency, not parallelism. Asynchronous concurrency is most easily expressed with WhenAll:
var listOfTasks = devices.Select(ProcessDevice).ToList();
await Task.WhenAll(listOfTasks);
In your last example there's a few problems:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
await Task.Run(async () => await ProcessDevice(device));
}
await Task.WhenAll(listOfTasks);
Doing await Task.Run(async () => await ProcessDevice(device)); means you are not moving to the next iteration of the foreach loop until the previous one is done. Essentially, you're still doing them one at a time.
Additionally, you aren't adding any tasks to listOfTasks so it remains empty and therefore Task.WhenAll(listOfTasks) completes instantly because there's no tasks to await.
Try this:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
var task = Task.Run(async () => await ProcessDevice(device))
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
I can explain the problem with Parallel.ForEach. An important thing to understand is that when the await keyword acts on an incomplete Task, it returns. It will return its own incomplete Task if the method signature allows (if it's not void). Then it is up to the caller to use that Task object to wait for the job to finish.
But the second parameter in Parallel.ForEach is an Action<T>, which is a void method, which means no Task can be returned, which means the caller (Parallel.ForEach in this case) has no way to wait until the job has finished.
So in your case, as soon as it hits await ProcessDevice(device), it returns and nothing waits for it to finish so it starts the next iteration. By the time Parallel.ForEach is finished, all it has done is started all the tasks, but not waited for them.
So don't use Parallel.ForEach with asynchronous code.
Stephen's answer is more appropriate. You can also use WSC's answer, but that can be dangerous with larger lists. Creating hundreds or thousands of new threads all at once will not help your performance.
not very sure it this if what you are asking for, but I can give example of how we start async process
private readonly Func<Worker> _worker;
private void StartWorkers(IEnumerable<Props> props){
Parallel.ForEach(props, timestamp => { _worker.Invoke().Consume(timestamp); });
}
Would recommend reading about Parallel.ForEach as it will do some part for you.

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

Running Task<T> on a custom scheduler

I am creating a generic helper class that will help prioritise requests made to an API whilst restricting parallelisation at which they occur.
Consider the key method of the application below;
public IQueuedTaskHandle<TResponse> InvokeRequest<TResponse>(Func<TClient, Task<TResponse>> invocation, QueuedClientPriority priority, CancellationToken ct) where TResponse : IServiceResponse
{
var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
_logger.Debug("Queueing task.");
var taskToQueue = Task.Factory.StartNew(async () =>
{
_logger.Debug("Starting request {0}", Task.CurrentId);
return await invocation(_client);
}, cts.Token, TaskCreationOptions.None, _schedulers[priority]).Unwrap();
taskToQueue.ContinueWith(task => _logger.Debug("Finished task {0}", task.Id), cts.Token);
return new EcosystemQueuedTaskHandle<TResponse>(cts, priority, taskToQueue);
}
Without going into too many details, I want to invoke tasks returned by Task<TResponse>> invocation when their turn in the queue arises. I am using a collection of queues constructed using QueuedTaskScheduler indexed by a unique enumeration;
_queuedTaskScheduler = new QueuedTaskScheduler(TaskScheduler.Default, 3);
_schedulers = new Dictionary<QueuedClientPriority, TaskScheduler>();
//Enumerate the priorities
foreach (var priority in Enum.GetValues(typeof(QueuedClientPriority)))
{
_schedulers.Add((QueuedClientPriority)priority, _queuedTaskScheduler.ActivateNewQueue((int)priority));
}
However, with little success I can't get the tasks to execute in a limited parallelised environment, leading to 100 API requests being constructed, fired, and completed in one big batch. I can tell this using a Fiddler session;
I have read some interesting articles and SO posts (here, here and here) that I thought would detail how to go about this, but so far I have not been able to figure it out. From what I understand, the async nature of the lambda is working in a continuation structure as designed, which is marking the generated task as complete, basically "insta-completing" it. This means that whilst the queues are working fine, runing a generated Task<T> on a custom scheduler is turning out to be the problem.
This means that whilst the queues are working fine, runing a generated Task on a custom scheduler is turning out to be the problem.
Correct. One way to think about it[1] is that an async method is split into several tasks - it's broken up at each await point. Each one of these "sub-tasks" are then run on the task scheduler. So, the async method will run entirely on the task scheduler (assuming you don't use ConfigureAwait(false)), but at each await it will leave the task scheduler, and then re-enter that task scheduler after the await completes.
So, if you want to coordinate asynchronous work at a higher level, you need to take a different approach. It's possible to write the code yourself for this, but it can get messy. I recommend you first try ActionBlock<T> from the TPL Dataflow library, passing your custom task scheduler to its ExecutionDataflowBlockOptions.
[1] This is a simplification. The state machine will avoid creating actual task objects unless necessary (in this case, they are necessary because they're being scheduled to a task scheduler). Also, only await points where the awaitable isn't complete actually cause a "method split".
Stephen Cleary's answer explains well why you can't use TaskScheduler for this purpose and how you can use ActionBlock to limit the degree of parallelism. But if you want to add priorities to that, I think you'll have to do that manually. Your approach of using a Dictionary of queues is reasonable, a simple implementation (with no support for cancellation or completion) of that could look something like this:
class Scheduler
{
private static readonly Priority[] Priorities =
(Priority[])Enum.GetValues(typeof(Priority));
private readonly IReadOnlyDictionary<Priority, ConcurrentQueue<Func<Task>>> queues;
private readonly ActionBlock<Func<Task>> executor;
private readonly SemaphoreSlim semaphore;
public Scheduler(int degreeOfParallelism)
{
queues = Priorities.ToDictionary(
priority => priority, _ => new ConcurrentQueue<Func<Task>>());
executor = new ActionBlock<Func<Task>>(
invocation => invocation(),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = degreeOfParallelism,
BoundedCapacity = degreeOfParallelism
});
semaphore = new SemaphoreSlim(0);
Task.Run(Watch);
}
private async Task Watch()
{
while (true)
{
await semaphore.WaitAsync();
// find item with highest priority and send it for execution
foreach (var priority in Priorities.Reverse())
{
Func<Task> invocation;
if (queues[priority].TryDequeue(out invocation))
{
await executor.SendAsync(invocation);
}
}
}
}
public void Invoke(Func<Task> invocation, Priority priority)
{
queues[priority].Enqueue(invocation);
semaphore.Release(1);
}
}

Parallel.ForEach using Thread.Sleep equivalent

So here's the situation: I need to make a call to a web site that starts a search. This search continues for an unknown amount of time, and the only way I know if the search has finished is by periodically querying the website to see if there's a "Download Data" link somewhere on it (it uses some strange ajax call on a javascript timer to check the backend and update the page, I think).
So here's the trick: I have hundreds of items I need to search for, one at a time. So I have some code that looks a little bit like this:
var items = getItems();
Parallel.ForEach(items, item =>
{
startSearch(item);
var finished = isSearchFinished(item);
while(finished == false)
{
finished = isSearchFinished(item); //<--- How do I delay this action 30 Secs?
}
downloadData(item);
}
Now, obviously this isn't the real code, because there could be things that cause isSearchFinished to always be false.
Obvious infinite loop danger aside, how would I correctly keep isSearchFinished() from calling over and over and over, but instead call every, say, 30 seconds or 1 minute?
I know Thread.Sleep() isn't the right solution, and I think the solution might be accomplished by using Threading.Timer() but I'm not very familiar with it, and there are so many threading options that I'm just not sure which to use.
It's quite easy to implement with tasks and async/await, as noted by #KevinS in the comments:
async Task<ItemData> ProcessItemAsync(Item item)
{
while (true)
{
if (await isSearchFinishedAsync(item))
break;
await Task.Delay(30 * 1000);
}
return await downloadDataAsync(item);
}
// ...
var items = getItems();
var tasks = items.Select(i => ProcessItemAsync(i)).ToArray();
await Task.WhenAll(tasks);
var data = tasks.Select(t = > t.Result);
This way, you don't block ThreadPool threads in vain for what is mostly a bunch of I/O-bound network operations. If you're not familiar with async/await, the async-await tag wiki might be a good place to start.
I assume you can convert your synchronous methods isSearchFinished and downloadData to asynchronous versions using something like HttpClient for non-blocking HTTP request and returning a Task<>. If you are unable to do so, you still can simply wrap them with Task.Run, as await Task.Run(() => isSearchFinished(item)) and await Task.Run(() => downloadData(item)). Normally this is not recommended, but as you have hundreds of items, it sill would give you a much better level of concurrency than with Parallel.ForEach in this case, because you won't be blocking pool threads for 30s, thanks to asynchronous Task.Delay.
You can also write a generic function using TaskCompletionSource and Threading.Timer to return a Task that becomes complete once a specified retry function succeeds.
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval)
{
return RetryAsync(retryFunc, retryInterval, CancellationToken.None);
}
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<object>();
cancellationToken.Register(() => tcs.TrySetCanceled());
var timer = new Timer((state) =>
{
var taskCompletionSource = (TaskCompletionSource<object>) state;
try
{
if (retryFunc())
{
taskCompletionSource.TrySetResult(null);
}
}
catch (Exception ex)
{
taskCompletionSource.TrySetException(ex);
}
}, tcs, TimeSpan.FromMilliseconds(0), retryInterval);
// Once the task is complete, dispose of the timer so it doesn't keep firing. Also captures the timer
// in a closure so it does not get disposed.
tcs.Task.ContinueWith(t => timer.Dispose(),
CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
return tcs.Task;
}
You can then use RetryAsync like this:
var searchTasks = new List<Task>();
searchTasks.AddRange(items.Select(
downloadItem => RetryAsync( () => isSearchFinished(downloadItem), TimeSpan.FromSeconds(2)) // retry timout
.ContinueWith(t => downloadData(downloadItem),
CancellationToken.None,
TaskContinuationOptions.OnlyOnRanToCompletion,
TaskScheduler.Default)));
await Task.WhenAll(searchTasks.ToArray());
The ContinueWith part specifies what you do once the task has completed successfully. In this case it will run your downloadData method on a thread pool thread because we specified TaskScheduler.Default and the continuation will only execute if the task ran to completion, i.e. it was not canceled and no exception was thrown.

Categories

Resources