Parallel.ForEach using Thread.Sleep equivalent - c#

So here's the situation: I need to make a call to a web site that starts a search. This search continues for an unknown amount of time, and the only way I know if the search has finished is by periodically querying the website to see if there's a "Download Data" link somewhere on it (it uses some strange ajax call on a javascript timer to check the backend and update the page, I think).
So here's the trick: I have hundreds of items I need to search for, one at a time. So I have some code that looks a little bit like this:
var items = getItems();
Parallel.ForEach(items, item =>
{
startSearch(item);
var finished = isSearchFinished(item);
while(finished == false)
{
finished = isSearchFinished(item); //<--- How do I delay this action 30 Secs?
}
downloadData(item);
}
Now, obviously this isn't the real code, because there could be things that cause isSearchFinished to always be false.
Obvious infinite loop danger aside, how would I correctly keep isSearchFinished() from calling over and over and over, but instead call every, say, 30 seconds or 1 minute?
I know Thread.Sleep() isn't the right solution, and I think the solution might be accomplished by using Threading.Timer() but I'm not very familiar with it, and there are so many threading options that I'm just not sure which to use.

It's quite easy to implement with tasks and async/await, as noted by #KevinS in the comments:
async Task<ItemData> ProcessItemAsync(Item item)
{
while (true)
{
if (await isSearchFinishedAsync(item))
break;
await Task.Delay(30 * 1000);
}
return await downloadDataAsync(item);
}
// ...
var items = getItems();
var tasks = items.Select(i => ProcessItemAsync(i)).ToArray();
await Task.WhenAll(tasks);
var data = tasks.Select(t = > t.Result);
This way, you don't block ThreadPool threads in vain for what is mostly a bunch of I/O-bound network operations. If you're not familiar with async/await, the async-await tag wiki might be a good place to start.
I assume you can convert your synchronous methods isSearchFinished and downloadData to asynchronous versions using something like HttpClient for non-blocking HTTP request and returning a Task<>. If you are unable to do so, you still can simply wrap them with Task.Run, as await Task.Run(() => isSearchFinished(item)) and await Task.Run(() => downloadData(item)). Normally this is not recommended, but as you have hundreds of items, it sill would give you a much better level of concurrency than with Parallel.ForEach in this case, because you won't be blocking pool threads for 30s, thanks to asynchronous Task.Delay.

You can also write a generic function using TaskCompletionSource and Threading.Timer to return a Task that becomes complete once a specified retry function succeeds.
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval)
{
return RetryAsync(retryFunc, retryInterval, CancellationToken.None);
}
public static Task RetryAsync(Func<bool> retryFunc, TimeSpan retryInterval, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<object>();
cancellationToken.Register(() => tcs.TrySetCanceled());
var timer = new Timer((state) =>
{
var taskCompletionSource = (TaskCompletionSource<object>) state;
try
{
if (retryFunc())
{
taskCompletionSource.TrySetResult(null);
}
}
catch (Exception ex)
{
taskCompletionSource.TrySetException(ex);
}
}, tcs, TimeSpan.FromMilliseconds(0), retryInterval);
// Once the task is complete, dispose of the timer so it doesn't keep firing. Also captures the timer
// in a closure so it does not get disposed.
tcs.Task.ContinueWith(t => timer.Dispose(),
CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
return tcs.Task;
}
You can then use RetryAsync like this:
var searchTasks = new List<Task>();
searchTasks.AddRange(items.Select(
downloadItem => RetryAsync( () => isSearchFinished(downloadItem), TimeSpan.FromSeconds(2)) // retry timout
.ContinueWith(t => downloadData(downloadItem),
CancellationToken.None,
TaskContinuationOptions.OnlyOnRanToCompletion,
TaskScheduler.Default)));
await Task.WhenAll(searchTasks.ToArray());
The ContinueWith part specifies what you do once the task has completed successfully. In this case it will run your downloadData method on a thread pool thread because we specified TaskScheduler.Default and the continuation will only execute if the task ran to completion, i.e. it was not canceled and no exception was thrown.

Related

Cancel background running jobs in Parallel.foreach

I have a parallel foreach for few api calls. Api will return some data and I can process it. Lets say from front end, I call to ProcessInsu method and suddenly the user change his mind and go to other page. (There should be a clear button and user can exit the page after clicking clear button.) After calling ProcessInsu method, it will run APIs in background. It is waste of resources because, the user already change his mind and do other work. I need some methodology to cancel background running jobs.
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts,string resourceId)
{
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Convert.ToInt32(Math.Ceiling((Environment.ProcessorCount * 0.75) * 2.0));
Parallel.ForEach(insuranceCompAccounts, parallelOptions, async (insuranceComp) =>
{
await Processor.Process(resourceId,insuranceComp); //When looping, From each call, it will call to different different insurance companies.
});
}
I tried this sample code and I could not do my work with that. Any expert can guide me to do this?
As others have noted, you can't use async with Parallel - it won't work correctly. Instead, you want to do asynchronous concurrency as such:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp);
}).ToList();
await Task.WhenAll(tasks);
}
Now that the code is corrected, you can add cancellation support. E.g.:
public async Task ProcessInsu(InsuranceAccounts insuranceCompAccounts, string resourceId)
{
var cts = new CancellationTokenSource();
var tasks = insuranceCompAccounts.Select(async (insuranceComp) =>
{
await Processor.Process(resourceId, insuranceComp, cts.Token);
}).ToList();
await Task.WhenAll(tasks);
}
When you are ready to cancel the operation, call cts.Cancel();.
Note that Process now takes a CancellationToken. It should pass this token on to whatever I/O APIs it's using.

Nested async methods in a Parallel.ForEach

I have a method that runs multiple async methods within it. I have to iterate over a list of devices, and pass the device to this method. I am noticing that this is taking a long time to complete so I am thinking of using Parallel.ForEach so it can run this process against multiple devices at the same time.
Let's say this is my method.
public async Task ProcessDevice(Device device) {
var dev = await _deviceService.LookupDeviceIndbAsNoTracking(device);
var result = await DoSomething(dev);
await DoSomething2(dev);
}
Then DoSomething2 also calls an async method.
public async Task DoSomething2(Device dev) {
foreach(var obj in dev.Objects) {
await DoSomething3(obj);
}
}
The list of devices continuously gets larger over time, so the more this list grows, the longer it takes the program to finish running ProcessDevice() against each device. I would like to process more than one device at a time. So I have been looking into using Parallel.ForEach.
Parallel.ForEach(devices, async device => {
try {
await ProcessDevice(device);
} catch (Exception ex) {
throw ex;
}
})
It appears that the program is finishing before the device is fully processed. I have also tried creating a list of tasks, and then foreach device, add a new task running ProcessDevice to that list and then awaiting Task.WhenAll(listOfTasks);
var listOfTasks = new List<Task>();
foreach(var device in devices) {
var task = Task.Run(async () => await ProcessDevice(device));
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
But it appears that the task is marked as completed before ProcessDevice() is actually finished running.
Please excuse my ignorance on this issue as I am new to parallel processing and not sure what is going on. What is happening to cause this behavior and is there any documentation that you could provide that could help me better understand what to do?
You can't mix async with Parallel.ForEach. Since your underlying operation is asynchronous, you'd want to use asynchronous concurrency, not parallelism. Asynchronous concurrency is most easily expressed with WhenAll:
var listOfTasks = devices.Select(ProcessDevice).ToList();
await Task.WhenAll(listOfTasks);
In your last example there's a few problems:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
await Task.Run(async () => await ProcessDevice(device));
}
await Task.WhenAll(listOfTasks);
Doing await Task.Run(async () => await ProcessDevice(device)); means you are not moving to the next iteration of the foreach loop until the previous one is done. Essentially, you're still doing them one at a time.
Additionally, you aren't adding any tasks to listOfTasks so it remains empty and therefore Task.WhenAll(listOfTasks) completes instantly because there's no tasks to await.
Try this:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
var task = Task.Run(async () => await ProcessDevice(device))
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
I can explain the problem with Parallel.ForEach. An important thing to understand is that when the await keyword acts on an incomplete Task, it returns. It will return its own incomplete Task if the method signature allows (if it's not void). Then it is up to the caller to use that Task object to wait for the job to finish.
But the second parameter in Parallel.ForEach is an Action<T>, which is a void method, which means no Task can be returned, which means the caller (Parallel.ForEach in this case) has no way to wait until the job has finished.
So in your case, as soon as it hits await ProcessDevice(device), it returns and nothing waits for it to finish so it starts the next iteration. By the time Parallel.ForEach is finished, all it has done is started all the tasks, but not waited for them.
So don't use Parallel.ForEach with asynchronous code.
Stephen's answer is more appropriate. You can also use WSC's answer, but that can be dangerous with larger lists. Creating hundreds or thousands of new threads all at once will not help your performance.
not very sure it this if what you are asking for, but I can give example of how we start async process
private readonly Func<Worker> _worker;
private void StartWorkers(IEnumerable<Props> props){
Parallel.ForEach(props, timestamp => { _worker.Invoke().Consume(timestamp); });
}
Would recommend reading about Parallel.ForEach as it will do some part for you.

How do I stop awaiting a task but keep the task running in background?

If I have a list of tasks which I want to execute together but at the same time I want to execute a certain number of them together, so I await for one of them until one of them finishes, which then should mean awaiting should stop and a new task should be allowed to start, but when one of them finishes, I don't know how to stop awaiting for the task which is currently being awaited, I don't want to cancel the task, just stop awaiting and let it continue running in the background.
I have the following code
foreach (var link in SharedVars.DownloadQueue)
{
if (currentRunningCount != batch)
{
var task = DownloadFile(extraPathPerLink, link, totalLen);
_ = task.ContinueWith(_ =>
{
downloadQueueLinksTasks.Remove(task);
currentRunningCount--;
// TODO SHOULD CHANGE WHAT IS AWAITED
});
currentRunningCount++;
downloadQueueLinksTasks.Add(task);
}
if (currentRunningCount == batch)
{
// TODO SHOULD NOT AWAIT 0
await downloadQueueLinksTasks[0];
}
}
I found about Task.WhenAny but from this comment here I understood that the other tasks will be ignored so it's not the solution I want to achieve.
I'm sorry if the question is stupid or wrong but I can't seem to find any information related on how to solve it, or what is the name of the operation I want to achieve so I can even search correctly.
Solution Edit
All the answers provided are correct, I accepted the one I decided to use but still all of them are correct.
Thank you everyone, I learned a lot from all of you from these different answers and different ways to approach the problem and how to think about it.
What I learned about this specific problem was that I still needed to await for the other tasks left, so the solution was to have the Task.WhenAny inside the loop (which returns the finished task (this is also important)) AND Task.WhenAll outside the loop to await the other left tasks.
Task.WhenAny returns the Task which completed.
foreach (var link in SharedVars.DownloadQueue)
{
var task = DownloadFile(extraPathPerLink, link, totalLen);
downloadQueueLinksTasks.Add(task);
if (downloadQueueLinksTasks.Count == batch)
{
// Wait for any Task to complete, then remove it from
// the list of pending tasks.
var completedTask = await Task.WhenAny(downloadQueueLinksTasks);
downloadQueueLinksTasks.Remove(completedTask);
}
}
// Wait for all of the remaining Tasks to complete
await Task.WhenAll(downloadQueueLinksTasks);
You can use Task.WaitAny()
Here is the demonstration of the behavior:
public static async Task Main(string[] args)
{
IList<Task> tasks = new List<Task>();
tasks.Add(TestAsync(0));
tasks.Add(TestAsync(1));
tasks.Add(TestAsync(2));
tasks.Add(TestAsync(3));
tasks.Add(TestAsync(4));
tasks.Add(TestAsync(5));
var result = Task.WaitAny(tasks.ToArray());
Console.WriteLine("returned task id is {0}", result);
///do other operations where
//before exiting wait for other tasks so that your tasks won't get cancellation signal
await Task.WhenAll(tasks.ToArray());
}
public static async Task TestAsync(int i)
{
Console.WriteLine("Staring to wait" + i);
await Task.Delay(new Random().Next(1000, 10000));
Console.WriteLine("Task finished" + i);
}
Output:
Staring to wait0
Staring to wait1
Staring to wait2
Staring to wait3
Staring to wait4
Staring to wait5
Task finished0
returned task id is 0
Task finished4
Task finished2
Task finished1
Task finished5
Task finished3
What you're asking about is throttling, which for asynchronous code is best expressed via SemaphoreSlim:
var semaphore = new SemaphoreSlim(batch);
var tasks = SharedVars.DownloadQueue.Select(link =>
{
await semaphore.WaitAsync();
try { return DownloadFile(extraPathPerLink, link, totalLen); }
finally { semaphore.Release(); }
});
var results = await Task.WhenAll(tasks);
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
IDisposable subscription =
SharedVars.DownloadQueue
.ToObservable()
.Select(link =>
Observable.FromAsync(() => DownloadFile(extraPathPerLink, link, totalLen)))
.Merge(batch) //max concurrent downloads
.Subscribe(file =>
{
/* process downloaded file here
(no longer a task) */
});
If you need to stop the downloads before they would naturally finish just call subscription.Dispose().

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

Parallel.Invoke does not wait for async methods to complete

I have an application that pulls a fair amount of data from different sources. A local database, a networked database, and a web query. Any of these can take a few seconds to complete. So, first I decided to run these in parallel:
Parallel.Invoke(
() => dataX = loadX(),
() => dataY = loadY(),
() => dataZ = loadZ()
);
As expected, all three execute in parallel, but execution on the whole block doesn't come back until the last one is done.
Next, I decided to add a spinner or "busy indicator" to the application. I don't want to block the UI thread or the spinner won't spin. So these need to be ran in async mode. But if I run all three in an async mode, then they in affect happen "synchronously", just not in the same thread as the UI. I still want them to run in parallel.
spinner.IsBusy = true;
Parallel.Invoke(
async () => dataX = await Task.Run(() => { return loadX(); }),
async () => dataY = await Task.Run(() => { return loadY(); }),
async () => dataZ = await Task.Run(() => { return loadZ(); })
);
spinner.isBusy = false;
Now, the Parallel.Invoke does not wait for the methods to finish and the spinner is instantly off. Worse, dataX/Y/Z are null and exceptions occur later.
What's the proper way here? Should I use a BackgroundWorker instead? I was hoping to make use of the .NET 4.5 features.
It sounds like you really want something like:
spinner.IsBusy = true;
try
{
Task t1 = Task.Run(() => dataX = loadX());
Task t2 = Task.Run(() => dataY = loadY());
Task t3 = Task.Run(() => dataZ = loadZ());
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
That way you're asynchronously waiting for all the tasks to complete (Task.WhenAll returns a task which completes when all the other tasks complete), without blocking the UI thread... whereas Parallel.Invoke (and Parallel.ForEach etc) are blocking calls, and shouldn't be used in the UI thread.
(The reason that Parallel.Invoke wasn't blocking with your async lambdas is that it was just waiting until each Action returned... which was basically when it hit the start of the await. Normally you'd want to assign an async lambda to Func<Task> or similar, in the same way that you don't want to write async void methods usually.)
As you stated in your question, two of your methods query a database (one via sql, the other via azure) and the third triggers a POST request to a web service. All three of those methods are doing I/O bound work.
What happeneds when you invoke Parallel.Invoke is you basically trigger three ThreadPool threads to block and wait for I/O based operations to complete, which is pretty much a waste of resources, and will scale pretty badly if you ever need to.
Instead, you could use async apis which all three of them expose:
SQL Server via Entity Framework 6 or ADO.NET
Azure has async api's
Web request via HttpClient.PostAsync
Lets assume the following methods:
LoadXAsync();
LoadYAsync();
LoadZAsync();
You can call them like this:
spinner.IsBusy = true;
try
{
Task t1 = LoadXAsync();
Task t2 = LoadYAsync();
Task t3 = LoadZAsync();
await Task.WhenAll(t1, t2, t3);
}
finally
{
spinner.IsBusy = false;
}
This will have the same desired outcome. It wont freeze your UI, and it would let you save valuable resources.

Categories

Resources