Cancelling long-running tasks in PLINQ - C#

I am trying to use the .NET 4.0 parallel task library to handle multiple FTS queries. If a query takes too much time, I want to cancel it and continue processing the rest.
This code doesn't stop when one query goes over the threshold. I think I'm calling it such that the cancellation and time limit apply to the whole process rather than to each individual query. If I set the time period to be very small (300 ms), then the cancellation is triggered for all search strings.
I think I'm missing something obvious... thanks in advance for any insight.
Additionally, this still doesn't seem to stop the very long query from executing. Is this even the correct way to cancel a long-running query once it's been triggered?
Modified code:
CancellationTokenSource cts = new CancellationTokenSource();
CancellationToken token = cts.Token;

var query = searchString.Values.Select(c => myLongQuery(c)).AsParallel().AsOrdered()
    .Skip(counter * numToProcess).Take(numToProcess).WithCancellation(cts.Token);

new Thread(() =>
{
    Thread.Sleep(5000);
    cts.Cancel();
}).Start();

try
{
    List<List<Threads>> results = query.ToList();
    foreach (List<Threads> threads in results)
    {
        // does something with the data
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("query took too long");
}

PLINQ polls the cancellation token periodically, after processing some number of elements. If that frequency of checks is insufficient for your application, make sure that all expensive delegates in the PLINQ query regularly call cts.Token.ThrowIfCancellationRequested().
For more details, see this article: Link
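As a rough sketch (assuming myLongQuery iterates over some result set; RunFtsQuery and the extra token parameter are illustrative additions, not part of the original code):

// Hypothetical: give the expensive delegate access to the token and check it regularly.
List<Threads> myLongQuery(string searchTerm, CancellationToken token)
{
    var result = new List<Threads>();
    foreach (var row in RunFtsQuery(searchTerm)) // placeholder for the real FTS work
    {
        token.ThrowIfCancellationRequested();    // lets cancellation take effect mid-query
        result.Add(row);
    }
    return result;
}

// and in the query itself:
// searchString.Values.AsParallel().AsOrdered().WithCancellation(cts.Token)
//     .Select(c => myLongQuery(c, cts.Token)) ...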

This is just a guess: isn't the problem that the query is lazy (as in normal LINQ) and so it isn't executed until later?
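In other words, deferred execution means nothing runs until the query is enumerated, so the real work only starts at ToList() (a brief illustration using the variables from the question):

// Building the query does no work yet.
var query = searchString.Values.AsParallel().WithCancellation(cts.Token)
    .Select(c => myLongQuery(c));

// Execution only starts here, so start the 5-second cancellation timer
// just before this call rather than when the query is composed.
var results = query.ToList();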

Related

How to handle multiple tasks running in parallel at different intervals inside a C# based Windows service?

I already have some experience working with threads in Windows, but most of that experience comes from using Win32 API functions in C/C++ applications. When it comes to .NET applications, however, I am often not sure how to properly deal with multithreading. There are threads, tasks, the TPL, and all sorts of other things I can use for multithreading, but I never know when to use which of those options.
I am currently working on a C# based Windows service which needs to periodically validate different groups of data from different data sources. Implementing the validation itself is not really an issue for me but I am unsure about how to handle all of the validations running simultaneously.
I need a solution for this which allows me to do all of the following things:
Run the validations at different (predefined) intervals.
Control all of the different validations from one place so I can pause and/or stop them if necessary, for example when a user stops or restarts the service.
Use the system resources as efficiently as possible to avoid performance issues.
So far I've only had one similar project before where I simply used Thread objects combined with a ManualResetEvent and a Thread.Join call with a timeout to notify the threads about when the service is stopped. The logic inside those threads to do something periodically then looked like this:
while (!shutdownEvent.WaitOne(0))
{
    if (DateTime.Now > nextExecutionTime)
    {
        // Do something
        nextExecutionTime = nextExecutionTime.AddMinutes(interval);
    }
    Thread.Sleep(1000);
}
While this did work as expected, I've often heard that using threads directly like this is considered "oldschool" or even bad practice. I also think that this solution does not use threads very efficiently, as they are just sleeping most of the time. How can I achieve something like this in a more modern and efficient way?
If this question is too vague or opinion-based then please let me know and I will try my best to make it as specific as possible.
The question feels a bit broad, but we can use the provided code and try to improve it.
Indeed, the problem with the existing code is that for the majority of the time it holds a thread blocked while doing nothing useful (sleeping). The thread also wakes up every second only to check the interval and, in most cases, go back to sleep because it's not validation time yet. Why does it do that? Because if you slept for a longer period, you might block for a long time when you signal shutdownEvent and then join the thread; Thread.Sleep doesn't provide a way to be interrupted on request.
To solve both problems we can use:
The cooperative cancellation mechanism, in the form of CancellationTokenSource + CancellationToken.
Task.Delay instead of Thread.Sleep.
For example:
async Task ValidationLoop(CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        try
        {
            var now = DateTime.Now;
            if (now >= _nextExecutionTime)
            {
                // do something
                _nextExecutionTime = _nextExecutionTime.AddMinutes(1);
            }
            var waitFor = _nextExecutionTime - now;
            if (waitFor.Ticks > 0)
            {
                await Task.Delay(waitFor, ct);
            }
        }
        catch (OperationCanceledException)
        {
            // expected, just exit;
            // otherwise, let it propagate and handle the cancelled task
            // at the caller of this method (the returned task will be cancelled).
            return;
        }
        catch (Exception)
        {
            // either have a global exception handler here,
            // or expect the task returned by this method to fail
            // and handle that condition at the caller
        }
    }
}
Now we do not hold a thread any more, because await Task.Delay doesn't block one. Instead, after the specified time interval it will execute the subsequent code on a free thread pool thread (it's more complicated than this, but we won't go into details here).
We also don't need to wake up every second for no reason, because Task.Delay accepts a cancellation token as a parameter. When that token is signalled, Task.Delay is immediately interrupted with an exception, which we expect and use to break out of the validation loop.
To stop the provided loop you need to use CancellationTokenSource:
private readonly CancellationTokenSource _cts = new CancellationTokenSource();
You pass its _cts.Token into the provided method. Then, when you want to signal cancellation, just do:
_cts.Cancel();
To further improve resource management: if your validation code uses any I/O operations (reading files from disk, network calls, database access, etc.), use the async versions of those operations. Then, while performing I/O, you won't hold any threads blocked waiting.
Now you don't need to manage threads yourself any more; instead, you operate in terms of the tasks you need to perform, letting the framework / OS manage threads for you.
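For completeness, here is a rough sketch of how the loop above might be wired into the service's start/stop lifecycle (assuming a ServiceBase-derived class; the field names are illustrative):

private readonly CancellationTokenSource _cts = new CancellationTokenSource();
private Task _validationTask;

protected override void OnStart(string[] args)
{
    // start the loop; don't block here, just keep the task around
    _validationTask = ValidationLoop(_cts.Token);
}

protected override void OnStop()
{
    _cts.Cancel();
    try
    {
        // give the loop a moment to finish cleanly
        _validationTask?.Wait(TimeSpan.FromSeconds(5));
    }
    catch (AggregateException)
    {
        // the task may surface a TaskCanceledException here; that's expected
    }
}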
You could use Microsoft's Reactive Framework (aka Rx) - install the NuGet package System.Reactive and add using System.Reactive.Linq; - then you can do this:
Subject<bool> starter = new Subject<bool>();

IObservable<Unit> query =
    starter
        .StartWith(true)
        .Select(x => x
            ? Observable.Interval(TimeSpan.FromSeconds(5.0)).SelectMany(y => Observable.Start(() => Validation()))
            : Observable.Never<Unit>())
        .Switch();

IDisposable subscription = query.Subscribe();
That fires off the Validation() method every 5.0 seconds.
When you need to pause and resume, do this:
starter.OnNext(false);
// Now paused
starter.OnNext(true);
// Now restarted.
When you want to stop it all call subscription.Dispose().

How to limit requests per second in an async task in C#

I'm writing an application that interacts with Azure Cosmos DB. I need to commit 30,000 records to Cosmos DB in one session. Because I'm using .NET Core, I cannot use the BulkInsert DLL, so I use a foreach loop to insert into Cosmos DB. But this issues too many requests per second and exceeds the RU limit set by Cosmos DB.
foreach (var item in listNeedInsert)
{
    await RequestInsertToCosmosDB(item);
}
I want to pause the foreach loop when the number of pending requests reaches 100. After those 100 requests are done, the foreach should continue.
You can partition the list and await the results:
var tasks = new List<Task>();

foreach (var item in listNeedInsert)
{
    var task = RequestInsertToCosmosDB(item);
    tasks.Add(task);

    if (tasks.Count == 100)
    {
        await Task.WhenAll(tasks);
        tasks.Clear();
    }
}

// Wait for anything left to finish
await Task.WhenAll(tasks);
Every time you've got 100 tasks running the code will wait for them all to finish before executing the last batch.
You could add a delay every hundred iterations:
int i = 1;
foreach (var item in listNeedInsert)
{
    await RequestInsertToCosmosDB(item);
    if (i % 100 == 0)
    {
        i = 0;
        await Task.Delay(100); // milliseconds
    }
    i++;
}
If you really want to maximize efficiency and can't do bulk updates, look into using SemaphoreSlim as described in this post:
Throttling asynchronous tasks
Hammering a medium-sized database with 100 concurrent requests at a time isn't a great idea because it's not equipped to handle that kind of throughput. You could try playing with a different throttling number and seeing what's optimal, but I would guess it's in the single digit range.
If you want to do something quick and dirty, you could probably use Sean's solution. But I'd set the Task count to 5 starting out, not 100.
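A rough sketch of the SemaphoreSlim approach from the linked post, using the suggested concurrency limit of 5 (adjust as needed):

var throttler = new SemaphoreSlim(5); // at most 5 inserts in flight at once
var tasks = new List<Task>();

foreach (var item in listNeedInsert)
{
    await throttler.WaitAsync(); // wait for a free slot before starting another insert
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            await RequestInsertToCosmosDB(item);
        }
        finally
        {
            throttler.Release(); // free the slot even if the insert fails
        }
    }));
}

await Task.WhenAll(tasks);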
https://github.com/thomhurst/EnumerableAsyncProcessor
I've written a library to help with this sort of logic.
Usage would be:
await AsyncProcessorBuilder.WithItems(listNeedInsert) // Or Extension Method: listNeedInsert.ToAsyncProcessorBuilder()
    .ForEachAsync(item => RequestInsertToCosmosDB(item), CancellationToken.None)
    .ProcessInBatches(batchSize: 100);

CancellationTokenSource not working properly with Parallel.ForEach

I have the following code:
CancellationTokenSource ts = new CancellationTokenSource(10000);
ParallelOptions po = new ParallelOptions();
po.CancellationToken = ts.Token;

List<int> lItems = new List<int>();
for (int i = 0; i < 20; i++)
    lItems.Add(i);

System.Collections.Concurrent.ConcurrentBag<int> lBgs = new System.Collections.Concurrent.ConcurrentBag<int>();

Stopwatch sp = Stopwatch.StartNew();
try
{
    Parallel.ForEach(lItems, po, i =>
    {
        Task.Delay(i * 1000).Wait();
        lBgs.Add(i);
    });
}
catch (Exception ex)
{
}

Console.WriteLine("Elapsed time: {0:N2} sec Total items: {1}", sp.ElapsedMilliseconds / 1000.0, lBgs.Count);
My question is: why does it take more than 20 seconds to cancel the operation (the parallel for) if the CancellationTokenSource is set to cancel after 10 seconds?
Regards
Without a good Minimal, Complete, and Verifiable code example, it's impossible to fully understand your scenario. But based on the code you posted, it appears that you expect for your CancellationToken to affect the execution of each individual iteration of the Parallel.ForEach().
However, that's not how it works. The Parallel.ForEach() method schedules individual operations concurrently, but once those operations start, they are out of the control of the Parallel.ForEach() method. If you want them to terminate early, you have to do that yourself. E.g.:
Parallel.ForEach(lItems, po, i =>
{
    Task.Delay(i * 1000, ts.Token).Wait();
    lBgs.Add(i);
});
As your code stands now, all 20 actions are started almost immediately (there's a short delay as the thread pool creates enough threads for all the actions, if necessary), before you cancel the token. That is, by the time you cancel the token, the Parallel.ForEach() method no longer has a way to avoid starting the actions; they are already started!
Since your individual actions don't do anything to interrupt themselves, then all that's left is for them all to complete. The start-up time (including waiting for the thread pool to create enough worker threads), plus the longest total delay (i.e. the delay to start an action plus that action's delay), determines the total time the operation takes, with your cancellation token having no effect. Since your longest action is 20 seconds, the total delay for the Parallel.ForEach() operation will always be at least 20 seconds.
By making the change I show above, the delay task for each individual action will be cancelled by your token when it expires, causing a task-cancelled exception. This will cause the action itself to terminate early as well.
Note that there is still value in assigning the cancellation token to the ParallelOptions.CancellationToken property. Even though the cancellation happens too late to stop Parallel.ForEach() from starting all of the actions, by providing the token in the options, it can recognize that the exception thrown by each action was caused by the same cancellation token used in the options. With that, it then can throw just a single OperationCanceledException, instead of wrapping all of the action exceptions in an AggregateException.
In response to:
My question is: why does it take more than 20 seconds to cancel the operation (the parallel for) if the CancellationTokenSource is set to cancel after 10 seconds?
This happens because you are not cancelling the Parallel.ForEach.
In order to actually cancel, you need to use
po.CancellationToken.ThrowIfCancellationRequested();
inside the Parallel.ForEach code
As the previous answer pointed out, if you want to actually cancel the task created by Task.Delay(), you need to use the overload of Task.Delay that accepts a CancellationToken:
Task.Delay(i * 1000, po.CancellationToken).Wait();
public static Task Delay(
    TimeSpan delay,
    CancellationToken cancellationToken
)
More details here
MSDN How to: Cancel a Parallel.For or ForEach Loop
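Putting both pieces together, the body of the loop might look roughly like this (same variables as in the question):

Parallel.ForEach(lItems, po, i =>
{
    // stop promptly once the token source times out
    po.CancellationToken.ThrowIfCancellationRequested();

    // use the cancellable Task.Delay overload so the simulated work can be interrupted too
    Task.Delay(i * 1000, po.CancellationToken).Wait();
    lBgs.Add(i);
});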

High performance async monitoring tasks

I have a couple of hundred devices and I need to check their status every 5 seconds.
The API I'm using contains a blocking function that calls a DLL and returns the status of a single device:
string status = ReadStatus(int deviceID); // waits here until the status is returned
The above function usually returns the status in a couple of ms, but there will be situations where I might not get the status back for a second or more! Or even worse, one device might not respond at all.
I therefore need to introduce a form of asynchronicity to make sure that one device that doesn't respond doesn't impede the monitoring of all the others.
My current approach is as follows:
// triggers every 5 sec
public async void MonitorDevices_ElapsedInterval(object sender, ElapsedEventArgs elapsedEventArgs)
{
    foreach (var device in lstDevices) // several hundred devices in the list
    {
        var task = device.ReadStatusAsync(device.ID, cts.Token);
        tasks.Add(task);
    }

    // await all tasks finished, or time out after 4900 ms
    await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(4900, cts.Token));
    cts.Cancel();

    var devicesThatResponded = tasks.Where(t => t.Status == TaskStatus.RanToCompletion)
                                    .Select(t => t.GetAwaiter().GetResult())
                                    .ToList();
}
And below in the Device class
public async Task ReadStatusAsync(int deviceID, CancellationToken tk)
{
    await Task.Delay(50, tk);
    // calls the dll to return the status. Blocks until the status is returned.
    Status = ReadStatus(deviceID);
}
I'm having several problems with my code:
The foreach loop fires a couple of hundred tasks almost simultaneously, with the continuation after Task.Delay being served by a thread from the thread pool, each task taking a couple of ms.
I see this as a big potential bottleneck. Are there any better approaches?
This might be similar to what Stephen Cleary commented on here, but he didn't provide an alternative: What it costs to use Task.Delay()?
In case ReadStatus fails to return, I'm trying to use a cancellation token to cancel the thread that sits there waiting for the response... This doesn't seem to work.
await Task.Delay(50, tk);
Thread.Sleep(100000); // simulate the device not responding
I still have about 20 worker threads alive, even though I was expecting cts.Cancel() to kill them.
The foreach loop fires a couple of hundred tasks almost simultaneously
Since ReadStatus is synchronous (I'm assuming you can't change this), and since each call needs to be independent because it can block the calling thread, you have to have hundreds of tasks. That's already the most efficient way.
Are there any better approaches?
If each device should be read every 5 seconds, then each device having its own timer would probably be better. After a few cycles, they should "even out".
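A rough sketch of the one-timer-per-device idea, using System.Threading.Timer (the Device members are assumed from the question; the error handling is illustrative):

// One timer per device; the callbacks run on thread pool threads,
// so a slow device only delays its own next reading.
// Keep the list of timers so they aren't garbage collected.
var timers = lstDevices.Select(device =>
    new System.Threading.Timer(_ =>
    {
        try
        {
            device.Status = ReadStatus(device.ID); // blocking call, isolated per device
        }
        catch (Exception)
        {
            // log / mark the device as unresponsive
        }
    }, null, TimeSpan.Zero, TimeSpan.FromSeconds(5))).ToList();

Note that if a single device blocks for longer than the 5-second period, its callbacks can overlap, so a real implementation may need a per-device re-entrancy guard.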
await Task.Delay(50, tk);
I do not recommend using Task.Delay to "trampoline" non-async code. If you wish to run code on the thread pool, just wrap it in a Task.Run:
foreach (var device in lstDevices) // several hundred devices in the list
{
    var task = Task.Run(() => device.ReadStatus(device.ID, cts.Token));
    tasks.Add(task);
}
I'm trying to use a cancellation token to cancel the thread that sits there waiting for the response... This doesn't seem to work.
Cancellation tokens do not kill threads. If ReadStatus observes its cancellation token, then it should cancel; if not, then there isn't much you can do about it.
Thread pool threads should not be terminated anyway; leaving them alive reduces thread churn when the timer next fires.
As you can see on this Microsoft example page for cancellation tokens, the doWork method checks for cancellation on each loop iteration, so the loop has to come back around before it can cancel. In your case, when you simulate a long task, it never checks for cancellation at all while it's running.
From How do I cancel non-cancelable async operations?, it says at the end: "So, can you cancel non-cancelable operations? No. Can you cancel waits on non-cancelable operations? Sure... just be very careful when you do." So the answer is that you can't cancel the operation itself.
What I would suggest is to use threads from a ThreadPool, record the start time of each one, and have a higher-priority thread that checks whether the others have exceeded their maximum allowed time. If so, call Thread.Interrupt() on them.
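For completeness, "cancelling the wait" rather than the operation itself can be sketched like this (the blocking ReadStatus call keeps running; only the caller stops waiting for it):

async Task<string> ReadStatusWithTimeoutAsync(int deviceID)
{
    // Start the blocking call on the thread pool.
    var readTask = Task.Run(() => ReadStatus(deviceID));

    // Wait for either the result or a one-second timeout, whichever comes first.
    var finished = await Task.WhenAny(readTask, Task.Delay(TimeSpan.FromSeconds(1)));
    if (finished == readTask)
        return await readTask; // completed in time

    // Timed out: we stop waiting, but the underlying ReadStatus call
    // is still running on a thread pool thread until it returns on its own.
    return null;
}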

How do I set a timeout for a method?

How do I set a timeout for a busy method in C#?
Ok, here's the real answer.
...
void LongRunningMethod(object monitorSync)
{
    //do stuff
    lock (monitorSync)
    {
        Monitor.Pulse(monitorSync);
    }
}

void ImpatientMethod()
{
    Action<object> longMethod = LongRunningMethod;
    object monitorSync = new object();
    bool timedOut;

    lock (monitorSync)
    {
        longMethod.BeginInvoke(monitorSync, null, null);
        timedOut = !Monitor.Wait(monitorSync, TimeSpan.FromSeconds(30)); // waiting 30 secs
    }

    if (timedOut)
    {
        // it timed out.
    }
}
...
This combines two of the most fun parts of using C#. First off, to call the method asynchronously, use a delegate which has the fancy-pants BeginInvoke magic.
Then, use a monitor to send a message from the LongRunningMethod back to the ImpatientMethod to let it know when it's done, or if it hasn't heard from it in a certain amount of time, just give up on it.
(p.s.- Just kidding about this being the real answer. I know there are 2^9303 ways to skin a cat. Especially in .Net)
You cannot do that unless you change the method.
There are two ways:
The method is built in such a way that it itself measures how long it has been running, and then returns prematurely if it exceeds some threshold.
The method is built in such a way that it monitors a variable/event that says "when this variable is set, please exit", and then you have another thread measure the time spent in the first method, and then set that variable when the time elapsed has exceeded some threshold.
The most obvious, but unfortunately wrong, answer you can get here is "Just run the method in a thread and use Thread.Abort when it has run for too long".
The only correct way is for the method to cooperate in such a way that it will do a clean exit when it has been running too long.
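A minimal sketch of the second approach, using a CancellationToken as the "please exit" signal (workItems and ProcessItem are placeholders for the real work):

// The long-running method cooperates by checking the token between chunks of work.
void LongRunningMethod(CancellationToken token)
{
    for (int i = 0; i < workItems.Count; i++)
    {
        token.ThrowIfCancellationRequested(); // clean early exit when asked to stop
        ProcessItem(workItems[i]);            // placeholder for one chunk of the real work
    }
}

// The caller enforces the timeout.
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
try
{
    LongRunningMethod(cts.Token);
}
catch (OperationCanceledException)
{
    // the method gave up cleanly after 30 seconds
}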
There's also a third way, where you execute the method on a separate thread; if waiting for it to finish takes too long, you simply say "I am not going to wait for it to finish" and discard it. In this case, the method will still run and eventually finish, but the thread that was waiting for it will simply give up.
Think of the third way as calling someone and asking them to search their house for that book you lent them; after waiting on your end of the phone for 5 minutes you simply say "aw, chuck it" and hang up. Eventually the other person will find the book and get back to the phone, only to notice that you no longer care about the result.
This is an old question, but it has a simpler solution now that was not available then: tasks!
Here is some sample code:
var task = Task.Run(() => LongRunningMethod()); // you can pass parameters to the method as well

if (task.Wait(TimeSpan.FromSeconds(30)))
    return task.Result; // the method returned elegantly
else
    throw new TimeoutException(); // the method timed out
While MojoFilter's answer is nice, it can lead to leaks if the "LongMethod" freezes. You should ABORT the operation if you're not interested in the result anymore.
public void LongMethod()
{
    //do stuff
}

public void ImpatientMethod()
{
    Action longMethod = LongMethod; //use Func if you need a return value

    ManualResetEvent mre = new ManualResetEvent(false);

    Thread actionThread = new Thread(new ThreadStart(() =>
    {
        var iar = longMethod.BeginInvoke(null, null);
        longMethod.EndInvoke(iar); //always call EndInvoke
        mre.Set();
    }));

    actionThread.Start();
    mre.WaitOne(30000); // waiting 30 secs (or less)

    if (actionThread.IsAlive) actionThread.Abort();
}
You can run the method in a separate thread and monitor it, forcing it to exit if it runs too long. A good way, if you can call it that, would be to develop an attribute for the method with PostSharp so the watching code isn't littering your application.
I've written the following as sample code (note the "sample code" part: it works, but could suffer from multithreading issues, or break if the method in question catches the ThreadAbortException):
static void ActualMethodWrapper(Action method, Action callBackMethod)
{
    try
    {
        method.Invoke();
    }
    catch (ThreadAbortException)
    {
        Console.WriteLine("Method aborted early");
    }
    finally
    {
        callBackMethod.Invoke();
    }
}

static void CallTimedOutMethod(Action method, Action callBackMethod, int milliseconds)
{
    new Thread(new ThreadStart(() =>
    {
        Thread actionThread = new Thread(new ThreadStart(() =>
        {
            ActualMethodWrapper(method, callBackMethod);
        }));

        actionThread.Start();
        Thread.Sleep(milliseconds);
        if (actionThread.IsAlive) actionThread.Abort();
    })).Start();
}
With the following invocation:
CallTimedOutMethod(() =>
{
    Console.WriteLine("In method");
    Thread.Sleep(2000);
    Console.WriteLine("Method done");
}, () =>
{
    Console.WriteLine("In CallBackMethod");
}, 1000);
I need to work on my code readability.
Methods don't have timeouts in C#, unless you're in the debugger or the OS believes your app has 'hung'. Even then, processing still continues, and as long as you don't kill the application, a response is returned and the app continues to work.
Calls to databases can have timeouts.
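For example, with ADO.NET the timeout is set per command (a minimal illustration; connectionString and the query text are placeholders):

using (var connection = new System.Data.SqlClient.SqlConnection(connectionString))
using (var command = new System.Data.SqlClient.SqlCommand("SELECT ...", connection))
{
    command.CommandTimeout = 30; // seconds; a SqlException is thrown if the query runs longer
    connection.Open();
    var result = command.ExecuteScalar();
}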
Could you create an Asynchronous Method so that you can continue doing other stuff whilst the "busy" method completes?
I regularly write apps where I have to synchronize time-critical tasks across platforms. If you can avoid Thread.Abort, you should. See http://blogs.msdn.com/b/ericlippert/archive/2010/02/22/should-i-specify-a-timeout.aspx and http://www.interact-sw.co.uk/iangblog/2004/11/12/cancellation for guidelines on when Thread.Abort is appropriate. Here are the concepts I implement:
Selective execution: Only run if a reasonable chance of success exists (based on the ability to meet the timeout, or the likelihood of success relative to other queued items). If you break code into segments and know roughly the expected time between task chunks, you can predict whether you should skip any further processing. Total time can be measured by wrapping the tasks in an object with a recursive function for time calculation, or by having a controller class that watches workers to know expected wait times.
Selective orphaning: Only wait for the result if a reasonable chance of success exists. Indexed tasks are run in a managed queue. Tasks that exceed their timeout, or that risk causing other timeouts, are orphaned and a null record is returned in their stead. Longer-running tasks can be wrapped in async calls. See this example async call wrapper: http://www.vbusers.com/codecsharp/codeget.asp?ThreadID=67&PostID=1
Conditional selection: Similar to selective execution, but based on a group instead of an individual task. If many of your tasks are interconnected such that one success or failure renders additional processing irrelevant, create a flag that is checked before execution begins and again before long-running sub-tasks begin. This is especially useful when you are using Parallel.For or other such queued concurrency constructs.
